[FM Discuss] Future of Booki

adam adam at xs4all.nl
Thu Jun 24 15:53:56 PDT 2010


did you see that booki can output epub and you can also push this or any
other format to archive.org?

adam

On Thu, 2010-06-24 at 13:54 -0500, James Simmons wrote:
> Adam,
> 
> I am well aware of the limitations of archive.org in the EPUB
> department.  I document that in my own book. I'm doing some
> proofreading of OCR'd text for my other donation, "Ancient Manners" by
> Pierre Louys, which I'm submitting to Distributed Proofreaders for
> Project Gutenberg.  The book was published in 1906 and I used
> Tesseract on it to get OCR results that were not so good, but no worse
> than I got with ABBYY FineReader.  Because of this, I'm proofing and
> correcting the text myself, one page at a time, before submitting it
> to DP.  Not much fun.
> 
> It seems to me that in the E-Book world a lot of different efforts are
> converging, and I'm trying to document that as much as possible in my
> book.
> 
> James Simmons
> 
> 
> > Date: Thu, 24 Jun 2010 18:37:31 +0200
> > From: adam <adam at xs4all.nl>
> > To: discuss at lists.flossmanuals.net
> > Subject: Re: [FM Discuss] Future of Booki
> > Message-ID: <1277397451.9881.198.camel at esetera>
> > Content-Type: text/plain; charset="UTF-8"
> >
> > nice book by the way...
> >
> > one thing that hasn't been mentioned but is interesting...booki and
> > archive.org are going to be linked together soon i hope. essentially
> > archive.org has a zillion books that have been scanned (and OCR'd) like
> > the book you donated, James.
> >
> > when you scroll down the page of that book:
> > http://www.booki.cc/big-aviation-book-for-boys/pages/
> >
> > you see lots of wacky OCR artifacts. These are created because scanners
> > can't tell if a blotch on the page is a blotch or an image, and they
> > also can't tell the difference between a big letter and an image. If you
> > scroll down that page you will see what I mean.
> >
> > They also don't format the page with headings or well-formatted footnotes
> > etc.
> >
> > Also there is a 5% error rate in the text of OCR scans which is actually
> > quite high...
> >
> > so, this content needs to be improved, and that is what booki will
> > enable. we will create a 'round trip' -> books get imported from archive
> > into booki and placed directly in the Internet Archive Group. then the
> > proofreaders proof and improve them, then they export and push back to
> > archive.org
> >
> > i'm hoping we get to that soon now that Doug is back from his art holiday :)
> > (Doug is the Objavi developer)
> >
> >
> >
> > adam
> >
> > On Thu, 2010-06-24 at 11:10 -0500, James Simmons wrote:
> >> I tried out Booki by importing a book from the Internet Archive which
> >> I donated myself, "The Big Aviation Book For Boys".  I'm incredibly
> >> impressed.
> _______________________________________________
> Discuss mailing list
> Discuss at lists.flossmanuals.net
> http://lists.flossmanuals.net/listinfo.cgi/discuss-flossmanuals.net
> 
