[FM Discuss] Future of Booki

adam adam at xs4all.nl
Thu Jun 24 09:37:31 PDT 2010


nice book by the way...

one thing that hasn't been mentioned but is interesting... Booki and
archive.org are going to be linked together soon, I hope. essentially
archive.org has a zillion books that have been scanned and run through
OCR, similar to the book you have donated, James.

when you scroll down the page of that book:
http://www.booki.cc/big-aviation-book-for-boys/pages/

you see lots of wacky OCR artifacts. These are created because the OCR
software can't tell if a blotch on the page is a blotch or an image, and
it also can't tell the difference between a big letter and an image.

The scans also don't format the page with headings, well-formatted
footnotes, etc.

Also, there is a 5% error rate in the text of OCR scans, which is
actually quite high...
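
to give a feel for what 5% means, here is a rough back-of-envelope sketch.
it assumes the 5% is counted per character, and the page and book sizes
are made-up round numbers, not measurements from any real scan:

```python
def expected_errors(char_count: int, error_rate: float = 0.05) -> int:
    """Rough number of wrong characters for a given error rate."""
    return round(char_count * error_rate)


# Hypothetical figures, chosen only for illustration.
chars_per_page = 2000
pages = 200

per_page = expected_errors(chars_per_page)
per_book = expected_errors(chars_per_page * pages)

print(f"about {per_page} errors per page, {per_book} per book")
```

even if the real numbers differ, the point is the same: that is a lot of
corrections for proofreaders to make by hand.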

so, this content needs to be improved, and that is what Booki will
enable. we will create a 'round trip': books get imported from
archive.org into Booki and placed directly in the Internet Archive
Group. then the proofreaders proof and improve them, then they export
and push them back to archive.org

I'm hoping we get to that soon now that Doug is back from his art
holiday :) (Doug is the Objavi developer)



adam

On Thu, 2010-06-24 at 11:10 -0500, James Simmons wrote:
> I tried out Booki by importing a book from the Internet Archive which
> I donated myself, "The Big Aviation Book For Boys".  I'm incredibly
> impressed. 



