booki-zip format
================

This describes the booki-zip format that Booki, Espri, and Objavi use to
communicate with each other. Tools that import into Booki should import
into this format.


Zip container format.
=====================

A booki-zip file is a zip file [1], with certain restrictions. The
ultimate test of whether a zip is correctly encoded is whether its
contents can be extracted by the zipfile module [2] in Python 2.5 and
2.6. This means the contents must be either uncompressed or
deflate-compressed. ZIP64 extensions are OK (though unnecessary in
practical terms), but encryption and comments are not.

The first file in the zip should be uncompressed and named "mimetype".
It should contain only the 23 characters "application/x-booki+zip".
This string will end up in the first few bytes of the zip file, allowing
it to be identified without unzipping.


Directory structure.
====================

As well as the just-mentioned "mimetype", the booki-zip must have a file
called "info.json" in its root directory, the contents of which are
described below. Any other files in the root directory should be html
files intended for editing with Booki. Any associated files that are not
directly editable by Booki should be in a subdirectory named "static".

Here is an example structure:

    /
        mimetype
        Introduction.html
        UseCases.html
        AdamsTips.html
        Credits.html
        info.json
        static/
            BookSprints-ott-adam-en.jpg
            Blog-writers-en.png
            Floss-100-en.gif
            example.css

All references from the html to the files in "static" should use
relative addresses. For example, an image should be linked thus:

    <img src="static/BookSprints-ott-adam-en.jpg">

It is recommended but not required that file names have conventional
extensions (".html", ".jpg", etc). File names should not contain spaces,
and must meet the restrictions imposed by the zip format.

There should be nothing in the root directory other than "mimetype",
"info.json", and the html files, and there should be no subdirectories
other than "static".
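The container and directory rules above are mechanical enough to check in code. The following is a minimal sketch in Python 3 (not the Python 2.5/2.6 the format was tested against); the names write_booki_zip and check_booki_zip are hypothetical helpers, not part of Booki, Espri, or Objavi. The writer stores "mimetype" first and uncompressed and deflates everything else. The checker looks only at the zip-level rules (first entry, "mimetype" contents, compression methods, encryption, comments, spaces, and the "static" subdirectory); it does not confirm that root files are html, since that information lives in the manifest.

```python
import io
import zipfile

MIMETYPE = b"application/x-booki+zip"


def write_booki_zip(target, html_files, static_files, info_json):
    """Write a minimal booki-zip (hypothetical helper).

    html_files and static_files map bare file names to bytes; static files
    are placed under "static/". info_json is the serialised info.json text.
    """
    with zipfile.ZipFile(target, "w") as zf:
        # "mimetype" goes first and is stored uncompressed, so its contents
        # sit near the start of the archive.
        zf.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        # Everything else is deflate-compressed.
        zf.writestr("info.json", info_json, compress_type=zipfile.ZIP_DEFLATED)
        for name, data in html_files.items():
            zf.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
        for name, data in static_files.items():
            zf.writestr("static/" + name, data,
                        compress_type=zipfile.ZIP_DEFLATED)


def check_booki_zip(source):
    """Return a list of problems; an empty list means none were found."""
    problems = []
    with zipfile.ZipFile(source) as zf:
        # infolist() follows the central directory, which normally matches
        # the physical order of entries in the archive.
        infos = zf.infolist()
        if zf.comment:
            problems.append("archive has a comment")
        if not infos or infos[0].filename != "mimetype":
            problems.append('first entry is not "mimetype"')
        elif infos[0].compress_type != zipfile.ZIP_STORED:
            problems.append('"mimetype" is compressed')
        elif infos[0].flag_bits & 0x1:
            problems.append('"mimetype" is encrypted')
        elif zf.read(infos[0]) != MIMETYPE:
            problems.append('"mimetype" has the wrong contents')
        if "info.json" not in {info.filename for info in infos}:
            problems.append('missing "info.json"')
        for info in infos:
            name = info.filename
            if info.flag_bits & 0x1:  # bit 0 marks an encrypted entry
                problems.append(name + ": encrypted")
            if info.comment:
                problems.append(name + ": has a comment")
            if info.compress_type not in (zipfile.ZIP_STORED,
                                          zipfile.ZIP_DEFLATED):
                problems.append(name + ": not stored or deflated")
            if " " in name:
                problems.append(name + ": contains a space")
            # Only the root and "static/" may hold files.
            top, sep, _ = name.partition("/")
            if sep and top != "static":
                problems.append(name + ": outside root and static/")
    return problems


if __name__ == "__main__":
    buf = io.BytesIO()
    write_booki_zip(buf, {"Introduction.html": b"<p>Hello</p>"},
                    {"example.css": b"p { margin: 0 }"}, "{}")
    print(check_booki_zip(buf))  # expected: []
```

Running the module directly builds a tiny archive in memory and prints the problems found, which should be an empty list. Returning a list rather than raising on the first failure lets an importing tool report every violation at once.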
Apart from starting with "mimetype", there is no required order to the
arrangement of entries within the zip file itself. Other than
"mimetype", files should be deflate-compressed.


character encoding
==================

All html files, and info.json, should be encoded as utf-8.


info.json
=========

The "info.json" file describes the structure of the document and carries
metadata. It is a JSON file [3], containing a single JSON object with 4
members, as shown here:

    {
        "spine": [ ... ],
        "TOC": [ ... ],
        "manifest": { ... },
        "metadata": { ... }
    }

Because these are JSON object members, their order is not significant.
The order used below is for narrative purposes only.


info.json manifest
==================

The manifest is a mapping of identifiers to file names and mime-types.
Each entry looks like:

    identifier: [filename, mimetype, authors]

The constraints on *identifier* match the XML name specification [4]
(in short, avoid spaces and most punctuation). In practice, the
*identifier* is often related to the *filename*.

*filename* locates the file within the zip, and must match a path in the
zip index.

*mimetype* is the IANA media type [5] of the file. Booki-editable html
files must be of type "text/html", and other files should be correctly
identified.

*authors* is a list of names of people who have contributed to this
file. It can be empty.

The manifest shouldn't list the "mimetype" or "info.json" files, just
the editable html and associated static files.

An example manifest, containing two html files and an image, is shown
here:

    "manifest": {
        "Introduction": [
            "Introduction.html",
            "text/html",
            ["Adam Hyde", "Aleksander Erkalovic"]
        ],
        "arbitrary-identifier_0005": [
            "UseCases.html",
            "text/html",
            []
        ],
        "BookSprints-ott-adam-en.jpg": [
            "static/BookSprints-ott-adam-en.jpg",
            "image/jpeg",
            ["Ansell Adams"]
        ]
    }


info.json spine
===============

The spine lists the identifiers of all the html files in the order they
appear in the book.
It looks like:

    "spine": [identifier, identifier, ...]

where each *identifier* is the manifest identifier for an editable html
page. Here is a possible spine for the manifest used in the previous
example:

    "spine": ["Introduction", "arbitrary-identifier_0005"]


info.json TOC
=============

The TOC (Table of Contents) specifies navigation points within the book.
It uses a nested structure, with less significant divisions contained
within the "children" attribute of greater divisions. The "TOC" element
itself is a list of objects with the following structure:

    {
        "nav_id": identifier,
        "title": division title (optional),
        "url": filename and possible fragment ID,
        "type": string indicating division type (optional),
        "role": epub guide type (optional),
        "children": list of TOC structures (optional)
    }

*nav_id* is a unique identifier for this navigation point. It uses a
different namespace than manifest identifiers and need have no
relationship to them.

*title* is a free string giving the division's title. It may be omitted.

*url* points to the start of the division. It should consist of a
filename as found in the manifest, optionally followed by a "#" and a
fragment identifier.

*type* is a string indicating what kind of navigation point it is. This
might be used to determine text styles.

*role*, if present, indicates the navigation point has a particular
structural role. It must be a keyword for "reference type" as defined in
the guide section of the epub OPF specification [6].

*children*, if present, contains a list of objects following this same
specification. These are subsections of this section.

An example:

    "TOC": [
        {"nav_id": "section1",
         "title": "INTRODUCTION",
         "url": "Introduction.html",
         "type": "booki-section",
         "children": [
             {"nav_id": "chapter1",
              "title": "WHAT IS GSoC?",
              "url": "Introduction.html",
              "type": "chapter",
              "role": "text"
             },
             {"nav_id": "chapter2",
              "title": "WHY GSOC MATTERS",
              "url": "Testimonials.html",
              "type": "chapter",
              "children": [
                  ...
              ]
             }
         ]
        }
    ]


info.json metadata
==================

The names in the metadata object are "namespaces" in which "keywords"
are defined. The objects referred to by keywords are further divided by
"scheme". Each scheme points to a list of values. If the keyword is
indivisible, there should be a single scheme identified by an empty
string (""). Further, if a scheme is the primary default for that
keyword, it may be identified by an empty string as well as by its
scheme name.

Here's the diagram:

    "metadata": {
        namespace: {
            keyword: {
                scheme: [value, value, ...],
                scheme: [value],
                ...
            },
            ...
        },
        ...
    }

Booki uses Dublin Core [7] metadata keywords wherever possible, which are
stored under the namespace "http://purl.org/dc/elements/1.1/".

An example metadata section is shown below:

    "metadata": {
        "http://purl.org/dc/elements/1.1/": {
            "publisher": {
                "": ["FLOSS Manuals http://flossmanuals.net"]
            },
            "language": {
                "": ["en"]
            },
            "creator": {
                "": ["The Contributors"]
            },
            "contributor": {
                "": ["Jennifer Redman", "Bart Massey", "Alexander Pico",
                     "selena deckelmann", "Anne Gentle", "adam hyde",
                     "Olly Betts", "Jonathan Leto",
                     "Google Inc And The Contributors", "Leslie Hawthorn"]
            },
            "title": {
                "": ["GSoC Mentoring"]
            },
            "date": {
                "start": ["2009-10-23"],
                "last-modified": ["2009-10-30"]
            },
            "identifier": {
                "flossmanuals.net": ["http://en.flossmanuals.net/epub/GSoCMentoring/2009.10.23-19.49.01"],
                "archive.org": ["gsocmentoring00fm"]
            }
        },
        "http://booki.cc/": {
            "server": {
                "": ["en.flossmanuals.net"]
            },
            "book": {
                "": ["GSoCMentoring"]
            },
            "dir": {
                "": ["LTR"]
            }
        }
    }


references
==========

[1] Zip specification: http://www.pkware.com/documents/casestudies/APPNOTE.TXT
[2] zipfile module: http://docs.python.org/library/zipfile.html
[3] JSON specification: http://json.org/
[4] XML name specification: http://www.w3.org/TR/REC-xml/#NT-Name
[5] Media types: http://www.iana.org/assignments/media-types/
[6] Guides in epub: http://www.idpf.org/2007/opf/OPF_2.0_final_spec.html#Section2.6
[7] Dublin Core metadata elements: http://dublincore.org/documents/2004/12/20/dces/