[Booki-dev] Proposed API for archive.org integration

Douglas Bagnall douglas at paradise.net.nz
Wed Jul 28 16:56:50 PDT 2010


Way back in April, Raj wrote:
(quoting in full first, to refresh memories).

> Hi,
> 
> I met with Adam today and we came up with this very simple API for
> integration with archive.org
> 
> There would be two new services that booki.cc would provide, at the
> following URLs:
> 
> - http://booki.cc/edit/IA/bookId.epub
> - http://booki.cc/download/IA/bookId.epub
> 
> (the actual urls could be different to fit with booki naming
> conventions)
> 
> The first url, /edit/IA/bookId.epub, would cause the booki editor to
> launch, pre-populated with an archive.org epub. The workflow would
> look something like this:
> 
> 1. Present log in dialog if user is not already logged in to booki.cc
> 2. Import IA epub, if this is the first time this book is being edited in booki
> 3. Open booki editor with the latest revision
> 
> The second url, /download/IA/bookId.epub, would cause a download of
> the latest version of the book in one of the following two ways:
> 
> 1. If the book has already been cached in an archive.org item, then
> this service would issue a 302 redirect to something like
> http://s3.us.archive.org/booki-bookId/bookId.epub
> 2. If the latest version of the book hasn't been cached yet, booki
> would generate the epub for the user, and then cache it for
> subsequent requests.
> 
> 
> From the IA side, we would add two links to archive.org, one to
> download the corrected epub, and one to open the editor.
> 
> For example, for this book:
> 
> http://www.archive.org/details/birdbookillustra00reedrich
> 
> In the list of available formats for download on the left, we would
> add "Corrected EPUB" below "EPUB". This link would point to
> http://booki.cc/download/IA/birdbookillustra00reedrich.epub
> 
> Elsewhere on the page we would have text that says "Please help us
> correct the OCR data for this book on booki.cc", with a link that
> points to http://booki.cc/edit/IA/birdbookillustra00reedrich.epub
> 
> What do you guys think?
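For concreteness, here is a minimal Python sketch of the two proposed services as IA and booki would see them. The helper names (`booki_links`, `edit_steps`) and the step labels are hypothetical, and the paths are the tentative ones above, which may change to fit booki's naming conventions.

```python
BOOKI = "http://booki.cc"  # base URL from the proposal; real paths may differ


def booki_links(book_id: str) -> dict:
    """The two links IA would add to an item's details page (hypothetical helper)."""
    return {
        "download": f"{BOOKI}/download/IA/{book_id}.epub",  # "Corrected EPUB"
        "edit": f"{BOOKI}/edit/IA/{book_id}.epub",  # "Please help us correct..."
    }


def edit_steps(logged_in: bool, already_imported: bool) -> list[str]:
    """Ordered actions for /edit/IA/<bookId>.epub, mirroring steps 1-3 above."""
    steps = []
    if not logged_in:
        steps.append("show_login")  # 1. log in (or sign up) first
    if not already_imported:
        steps.append("import_ia_epub")  # 2. first edit: pull in the IA epub
    steps.append("open_editor_latest_revision")  # 3. always open the latest revision
    return steps
```

For the bird book, `booki_links("birdbookillustra00reedrich")["download"]` gives the "Corrected EPUB" URL quoted above.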


It looks quite sensible.

To me, the thorniest issue in previous plans looked to be uploading
and versioning the edited epub under the original IA id.  It is much
simpler to ignore that.

However, with this:

> In the list of available formats for download on the left, we would
> add "Corrected EPUB" below "EPUB". This link would point to
> http://booki.cc/download/IA/birdbookillustra00reedrich.epub



how would IA know when to add this link?

At least at first, most books won't be in booki, so if this link is
always added, it will usually be to something that doesn't exist.
That could be handled by adding a third case to the download/IA/*
links: if the book is not there, redirect to the original IA epub
link.  Then the "Corrected EPUB" link will appear to work, though it
will have a misleading title and take a circuitous route.
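The download service with this extra fallback might look roughly like the following Python sketch. It assumes booki tracks a revision number per book and remembers which revision was last uploaded to archive.org; `download`, `Response`, `build_and_cache` and the revision parameters are hypothetical, and the S3 URL pattern is only the "something like" example from Raj's mail. The original IA epub URL is passed in rather than guessed.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Response:
    status: int
    location: Optional[str] = None  # set for redirects
    body: Optional[bytes] = None    # set when booki serves the epub itself


def s3_url(book_id: str) -> str:
    # Pattern taken from the proposal's example; not a confirmed convention.
    return f"http://s3.us.archive.org/booki-{book_id}/{book_id}.epub"


def download(
    book_id: str,
    latest_revision: Optional[int],  # None if the book was never imported into booki
    cached_revision: Optional[int],  # revision last uploaded to the IA item, if any
    original_epub_url: str,          # IA's own epub link, supplied by the caller
    build_and_cache: Callable[[str], bytes],  # hypothetical: builds epub, uploads it
) -> Response:
    # Third case (suggested in this reply): not in booki, fall back to IA's epub.
    if latest_revision is None:
        return Response(302, location=original_epub_url)
    # Case 1: the latest revision is already cached in an archive.org item.
    if cached_revision == latest_revision:
        return Response(302, location=s3_url(book_id))
    # Case 2: generate the epub now; the callback is expected to cache it.
    return Response(200, body=build_and_cache(book_id))
```

Keying the cache on revision is one way to decide what "the latest version hasn't been cached yet" means; it is exactly the versioning question raised at the end of this mail.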

Another way would be for IA to record that somebody has at some point
followed the "edit" link, and add the "corrected" link for that book.
But that is a little fraught: it supposes that everything went well
for them, including logging in or signing up to booki, which casual
browsers and rogue bots will not have done.  So there will still be
false positives, and booki will have to handle them too.

Booki does tell IA that a book has useful edits (according to someone)
when it uploads a corrected epub to its own S3 bucket.  So IA could
check for the existence of that upload, using whatever naming
convention booki uses.
That would be safe, but would miss edits that had not been committed
via S3.  And maybe cross-item references are complicated for IA.
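IA's side of that check could be as small as a HEAD request. This is a hypothetical Python sketch using only the standard library; it assumes the cached copy lives at the S3-style URL suggested in the proposal, which may not be the right public read URL for an IA item. The `head` parameter exists so the check can be exercised without network access.

```python
import urllib.error
import urllib.request
from typing import Callable


def http_head_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of a HEAD request; network errors propagate."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404 when booki has not uploaded anything


def has_corrected_epub(book_id: str, head: Callable[[str], int] = http_head_status) -> bool:
    # Assumed convention: booki caches into an item named "booki-<bookId>".
    url = f"http://s3.us.archive.org/booki-{book_id}/{book_id}.epub"
    return head(url) == 200
```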

Then, for completeness, I guess Booki could also tell IA when somebody
makes an edit, using a new API, whereupon this approach loses the
benefit of simplicity.

> There would be two new services that booki.cc would provide, at the
> following URLs:
> 
> - http://booki.cc/edit/IA/bookId.epub
> - http://booki.cc/download/IA/bookId.epub



Aco, Adam, has any work been done on these?  The first looks simple
enough, but the second involves versioning and other parts of Booki
that I know nothing about.  Does Booki currently record when an
archive upload is done?

Douglas



