[Booki-dev] booki zip format

Mon Sep 28 19:39:42 PDT 2009

hi again

This is just to describe the format that the FM epub software is dealing
with on the other end of the epub transformation, which booki will
accept and emit.  Examples can be found at
http://objavi.halo.gen.nz/booki-books/, though these were created by a
script that imports books from FM's old TWiki site, not by booki.

The format is similar to a simplified epub.  Each book is a zipfile
containing html and images and metadata.  The metadata is contained in
one file only, 'info.json', which is a fairly straightforward
serialisation of whatever booki knows about the book, though with some
leaning toward epub in terminology and layout (e.g., it has a 'spine').
The files are listed in a manifest, which also gives a mime-type.  HTML
files coming from FM have the extension '.html' and the mime-type
'text/html', and files coming from epub documents have the extension
'.html' and the mime-type 'application/xhtml+xml'.  From this, the
various components know how to treat each file, transforming them as
necessary.  All the images are in a 'static' subdirectory, mirroring
their location on booki, where a dedicated web server can serve them
directly.  And that is it.
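
As a rough illustration of the layout described above, here is a minimal
Python sketch that opens a booki-zip, parses 'info.json', groups the
manifest entries by mime-type, and lists the files under 'static'.  The
exact shape of the manifest is an assumption made for this example (each
id mapping to an object with "url" and "mimetype" keys), as are the
function names and the sample data; only 'info.json', the 'spine', the
manifest with mime-types, and the 'static' directory come from the
description.

```python
import io
import json
import zipfile


def read_booki_zip(data):
    """Parse a booki-zip (given as bytes) into metadata and file lists."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        # All metadata lives in the single 'info.json' file.
        info = json.loads(zf.read("info.json").decode("utf-8"))
        names = zf.namelist()

    # ASSUMPTION: the manifest maps an id to {"url": ..., "mimetype": ...}.
    by_mimetype = {}
    for item_id, entry in info.get("manifest", {}).items():
        by_mimetype.setdefault(entry["mimetype"], []).append(entry["url"])

    # Images sit under the 'static' subdirectory; skip directory entries.
    images = sorted(n for n in names
                    if n.startswith("static/") and not n.endswith("/"))
    return info, by_mimetype, images


def make_sample_zip():
    """Build a tiny in-memory zip in the assumed layout (hypothetical data)."""
    info = {
        "spine": ["ch1"],
        "manifest": {
            "ch1": {"url": "ch1.html", "mimetype": "text/html"},
            "cat": {"url": "static/cat.png", "mimetype": "image/png"},
        },
    }
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("info.json", json.dumps(info))
        zf.writestr("ch1.html", "<h1>Chapter 1</h1>")
        zf.writestr("static/cat.png", b"not really a png")
    return buf.getvalue()


info, by_mimetype, images = read_booki_zip(make_sample_zip())
print(info["spine"])
print(by_mimetype["text/html"])
print(images)
```

Grouping by mime-type is what lets a consumer decide how to treat each
file without inspecting its contents.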

So the job of epubjavi, the booki-zip to epub software, is to unpack a
zip containing these 'text/html' files, clean them up for epub, parse
the json metadata, and repackage the bits that are useful.  Espri, the
epub -> booki software, does the reverse transform on the metadata.  With
the content, it tries to find chapters and split the content
accordingly, rewrites image links to use the 'static' directory, and
reserialises the xhtml to make sure it is web-ready.
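
The image-link rewriting step could look something like the following
Python sketch.  This is not Espri's actual code: the function name, the
regular expression, and the choice to flatten every image path to its
basename under 'static' are all assumptions made for illustration.  A
real implementation would more likely work on a parsed document tree and
would need to handle name collisions.

```python
import posixpath
import re

# Matches the src attribute of an <img> tag: (prefix)(quote)(value)quote.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc\s*=\s*)(["\'])(.*?)\2',
                     re.IGNORECASE | re.DOTALL)


def rewrite_image_links(html, static_dir="static"):
    """Point every <img src=...> at static_dir, keeping only the file name."""
    def repl(match):
        prefix, quote, src = match.groups()
        # ASSUMPTION: drop any query string and directory part of the link.
        name = posixpath.basename(src.split("?", 1)[0])
        if not name:
            # Nothing usable to rewrite (e.g. src ends in '/'); leave as-is.
            return match.group(0)
        return f"{prefix}{quote}{static_dir}/{name}{quote}"
    return IMG_SRC.sub(repl, html)


sample = '<p><img alt="x" src="../images/cat.png"/></p>'
print(rewrite_image_links(sample))
```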

This might seem a bit silly, transforming to and fro between two
zip-based formats, neither of which is directly editable, so I will list
some reasons for it.

1. Aco and I are working rapidly on separate parts of this on opposite
sides of the world in opposite time zones.  If we were trying to
interface our code more directly, we would spend our whole time dealing
with a shifting front.  In other words, this format is our API.

2. The booki-zip format is very much simpler and more regular than
epubs.  By starting from this format, booki doesn't have to worry about
where it will find images, metadata, etc.

3. Although this may not be of interest to IA, FM will use these zips to
export to a variety of other formats, like PDF, ODF and docbook, in a
way that doesn't tie booki too closely to the export software.

4. Also not of interest to IA, but important to development: we are able
to produce booki-zips from existing FM books on the old TWiki server
(via http://objavi.halo.gen.nz/booki-twiki-gateway.cgi) which lets us
test early and make sure everything is going to work with our way of
editing.

Well, that's my email quota for the week.

Douglas