Reflow for PDF
February 29, 2012  

When I argued that academic researchers should abandon paper document formats like PDF, one of the reasons that I gave was that PDF is poorly suited to reading onscreen, since it is not easily reformatted for different devices.

I’m happy to find out that reformatting PDF, while difficult, is feasible. Scribd has just introduced “reflow zoom” for PDF. They say:

… we have to:

  1. Analyze the layout and detect the reading order of the text

  2. Detect and join back words where hyphens were used for line-wrapping

  3. Remove page numbers, headers/footers, table of contents etc.

  4. Interleave images with the text

They process PDF documents server-side, not in the application. I had previously considered doing something similar in-browser, starting from paper2ebook and pdf.js, but that would have been quite a lot of work. Bravo, Scribd!
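
To give a feel for why even one of these steps takes care, here is a minimal sketch of step 2 (joining words that were hyphenated for line-wrapping), applied to text that has already been extracted in reading order. This is not Scribd's implementation, which they have not published in detail; the function names `join_hyphenated` and `reflow` are hypothetical, and the heuristic (a lowercase continuation means the hyphen was only there for wrapping) is an assumption that will wrongly merge genuine compounds such as "well-known" when they happen to break across lines.

```python
import re

# Assumption: a hyphen at the end of a line followed by a lowercase letter
# was inserted for line-wrapping and can be dropped.
_SOFT_BREAK = re.compile(r"(\w)-\n\s*([a-z])")
# Any other hyphen at a line end (e.g. before a capital) is kept,
# and only the line break after it is removed.
_HARD_BREAK = re.compile(r"(\w)-\n\s*")


def join_hyphenated(text: str) -> str:
    """Rejoin words that were split across lines with a hyphen."""
    text = _SOFT_BREAK.sub(r"\1\2", text)
    return _HARD_BREAK.sub(r"\1-", text)


def reflow(text: str) -> str:
    """Collapse hard-wrapped lines into one line per paragraph.

    Paragraphs are assumed to be separated by blank lines.
    """
    paragraphs = re.split(r"\n\s*\n", join_hyphenated(text))
    return "\n\n".join(" ".join(p.split()) for p in paragraphs if p.strip())


if __name__ == "__main__":
    sample = (
        "PDF is poorly suited to read-\n"
        "ing onscreen.\n"
        "\n"
        "It is not easily refor-\n"
        "matted for different devices."
    )
    print(reflow(sample))
```

A production version would need a dictionary or language model to decide which hyphens are real, and it would still depend on step 1 having recovered the reading order correctly.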

This doesn’t change my opinion on paper document formats. All of the reasons I give for abandoning them still hold. We should be targeting document formats that are designed from the start for reformatting, and only convert to a paper format for printing.