[Booki-dev] [FM Discuss] How to update EPUBs in Internet Archive?

raj kumar rkumar at archive.org
Mon Jan 17 14:57:52 PST 2011


On Jan 17, 2011, at 7:16 AM, James Simmons wrote:

> Raj,
> 
> The attached screen grab shows what I see when I attempt to edit my
> own item in IA for "Make Your Own Sugar Activities!"  As you can see,
> no EPUB is shown, so there is no way for me to remove the existing
> EPUB.  When I have tried to add my own EPUB it doesn't show up in the
> list either, and does not get served.

Hi James! Thanks for the screenshot.

IA generates EPUBs on the fly for most text items that do not have them, which is why you see the EPUB download link on the details page but do not see an EPUB among the item's files.

If you edit your item and add a file named ActivitiesGuideSugar-en-2010.10.08-17.20.43.epub, it will be served instead of the version generated from OCR.

> I also question how I could upload a PDF and an EPUB at the same time.

There are a few ways to do this. The easiest might be to use the FTP upload path, which lets you upload multiple files before "checking in" your item:

http://www.archive.org/create.php?ftp=1

Alternatively, you can use the S3-like upload interface with two PUTs to the same item:
http://www.archive.org/help/abouts3.txt
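
The two-PUT approach could look roughly like the Python sketch below. It is only an illustration, not a tested client: the endpoint host, the "LOW accesskey:secret" authorization header, and the auto-make-bucket header are assumptions taken from my reading of abouts3.txt, and build_put / upload_files are hypothetical helper names. Please check the help file before relying on any of it.

```python
import os
import urllib.parse
import urllib.request

# Assumption: S3-like endpoint as described in abouts3.txt.
S3_ENDPOINT = "http://s3.us.archive.org"


def build_put(item_id, filename, data, access_key, secret_key, create_item=False):
    """Build (but do not send) one PUT request for a file in an item."""
    url = f"{S3_ENDPOINT}/{item_id}/{urllib.parse.quote(filename)}"
    req = urllib.request.Request(url, data=data, method="PUT")
    # Assumption: IA's S3-like API uses a "LOW key:secret" authorization header.
    req.add_header("Authorization", f"LOW {access_key}:{secret_key}")
    if create_item:
        # Assumption: this header asks IA to create the item if it is missing.
        req.add_header("x-archive-auto-make-bucket", "1")
    return req


def upload_files(item_id, paths, access_key, secret_key):
    """Send one PUT per file, all to the same item (e.g. a PDF and an EPUB)."""
    for index, path in enumerate(paths):
        with open(path, "rb") as fh:
            req = build_put(
                item_id,
                os.path.basename(path),
                fh.read(),
                access_key,
                secret_key,
                create_item=(index == 0),  # only the first PUT may create the item
            )
        with urllib.request.urlopen(req) as resp:
            print(path, resp.status)
```

Usage would be something like upload_files("my-item-id", ["book.pdf", "book.epub"], key, secret), where the item identifier, file names, and keys are placeholders for your own values.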


> Just FYI I have found that the only way to upload an item that works
> consistently for me is to use the Flash method in Internet Explorer on
> Windows.  When I have used the non-Flash method in Linux the file gets
> uploaded but nothing gets derived.  I ended up posting emails to the
> support address and someone there ran the derive jobs for me.  I went
> through this several times before giving up and using IE on Windows
> instead, which has never failed me.

Thanks for the feedback. That is too bad. I will see if I can reproduce the problem, but in the meantime, I can confirm that the FTP path works well.

> I'd like to be of use to this project.  If you do a search on
> "nicestep" in IA you'll see that I've donated a fair number of books
> that I've scanned myself.  I've written a FLOSS Manual (E-Book
> Enlightenment) on the subject, and I've created an Activity for Sugar
> Labs called Get Internet Archive Books which lets children easily
> search the Internet Archive catalog and download books in the
> different formats.  I've also donated texts to Project Gutenberg and
> PG Canada (most recently The Big Sleep by Raymond Chandler).

Thanks for all your work, and for the Get Internet Archive Books activity! We love showing it off when we do OLPC demos :)

> If you can explain how to replace a derived EPUB with a corrected one
> by hand I'll document the process in my book.  I'll also do testing
> for Booki (I have an installation in my home office) and make myself
> useful in any other way I can.

Great!

For most books, we don't have EPUBs in the item, so we generate an EPUB file on request (and then cache it out-of-band). If you add an EPUB to an item, we serve that instead.

If the item already has an epub, you can delete it and upload a new one.
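
As a rough model of the rule above (an illustration only, not IA's actual code): an EPUB stored in the item wins, and the generated one is a fallback. The function name choose_epub and the "any file ending in .epub counts" test are assumptions made for the sketch; the real check may depend on the exact file name.

```python
def choose_epub(item_files, generate):
    """Pick which EPUB to serve for an item.

    item_files: names of the files stored in the item.
    generate:   callable that returns the name of an EPUB generated from OCR.
    """
    for name in item_files:
        # Assumption: any stored file ending in .epub takes precedence.
        if name.lower().endswith(".epub"):
            return ("uploaded", name)
    # No EPUB in the item, so fall back to the generated (and cached) one.
    return ("generated", generate())
```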

As an item owner, you can click "Edit Item" at the top of the details page, which takes you to:
http://www.archive.org/edit/itemId

Then click "I want to change the files in my item", which will open a Flash-based file manager.

You can right-click on a file and choose delete.
Then use the 'share' button to upload the new epub.
Finally, click "Update Item!" in the "step 2/Finish my changes" section at the bottom of the page.

Hope that helps!
-raj

> 
> Thanks,
> 
> James Simmons
> 
> 
> On Sat, Jan 15, 2011 at 4:27 PM, raj kumar <rkumar at archive.org> wrote:
>> Hi!
>> 
>> James, if you upload a pdf and epub to the Internet Archive at the same time, we won't generate an epub, but will serve the uploaded one instead. Additionally, as an item owner, you can remove the generated epub and replace it with your own. If this doesn't work for you, I can help you.
>> 
>> As Adam says, we are hoping to get first-class booki integration in the near future :)
>> 
>> Also, it would be nice if the IA deriver used the text in the PDF to generate the djvu.xml file where possible, instead of running OCR. The archive.org book scanning process has traditionally been image-based, but we hope to change this in the future.
>> 
>> -raj
>> 
>> On Jan 15, 2011, at 3:20 AM, adam wrote:
>> 
>>> so. the thing is james you are talking about the future :)
>>> 
>>> We have a bit of a problem here at the booki dev team. It's a question of
>>> capacity. The scenario you suggest is _exactly_ what the Internet
>>> Archive wants to do with booki. We have discussed this with them at
>>> length and they want to build booki into their system so that people can
>>> correct epubs just as you have done.
>>> 
>>> This is an enormous opportunity for booki but we have so far failed to
>>> realise it because, simply, we don't have the people power. It is
>>> _extremely_ frustrating.
>>> 
>>> What we need to do is get a few heads together and solve this, and solve
>>> it fast. We have all the code in place. What I think we need to do is
>>> this:
>>> * find a good server to host an archive instance of booki
>>> * work out some basic code to automate the import
>>> * install
>>> * run test projects
>>> 
>>> if you would like to be a part of this greater picture i would *love* to
>>> work with someone to get this cracking...
>>> 
>>> adam
>>> 
>>> 
>>> 
>>> 
>>> On Fri, 2011-01-14 at 14:51 -0600, James Simmons wrote:
>>>> When I submitted "Make Your Own Sugar Activities!" to the Internet
>>>> Archive it created books in multiple forms, including an EPUB.
>>>> However, the way it makes an EPUB is defective in this case.  It
>>>> assumes that you will give it a PDF composed of page images, which it
>>>> then does OCR on to create the EPUB.  Doing OCR on a file that already
>>>> contains text is not going to give good results, and the resulting
>>>> EPUB is useless.  However, Booki can create a really good EPUB.  I
>>>> know I could donate that EPUB to the Internet Archive, but what I
>>>> really want to do is REPLACE the lousy generated EPUB with my good
>>>> one.  I know that Booki was created in part to do this very thing, but
>>>> I can't figure out how to make it happen.  The Internet Archive page
>>>> does not let you delete the existing EPUB, and when you upload a new
>>>> one it does not seem to replace the existing one.
>>>> 
>>>> I'll ask over at IA but I figured that whoever developed the function
>>>> in Booki would know the answer.
>>>> 
>>>> James Simmons
>>>> _______________________________________________
>>>> Discuss mailing list
>>>> Discuss at lists.flossmanuals.net
>>>> http://lists.flossmanuals.net/listinfo.cgi/discuss-flossmanuals.net
>>> 
>>> 
>>> _______________________________________________
>>> Booki-dev mailing list
>>> Booki-dev at lists.flossmanuals.net
>>> http://lists.flossmanuals.net/listinfo.cgi/booki-dev-flossmanuals.net
>> 
>> 
> <MYOSA.jpg>



