Sunday, November 13, 2011

pdfedit - PDFs etc. (FAIL)

Links: sourceforge - pdfedit   www.boost.org   Xournal

Like most blog posts, this one is born from annoyance. My current rage was with PDF books retrieved from Project Gutenberg. A typical PDF book file should be a few hundred KB and fast to load. Some are. Some are several MB and also open quickly. But a few are several megabytes and, while loading, push one's CPU to an unhealthy 100% for minutes instead of a few seconds. This subset of larger PDFs is, of course, impossible to open on portable devices. The problem is that the Gutenberg volunteers make a normal PDF, but then add a 1200 dpi (lines of resolution per inch) photo of the book's cover to the first page. It takes a lot of CPU and memory for PDF software to simultaneously scale that huge photo down to a thumbnail, display it full-screen, and load the first few pages of text. The fix is to edit the first page of such a PDF, reducing the cover photo to a more typical 75 or 150 dpi.
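To put rough numbers on the memory claim, here is a back-of-the-envelope sketch in bash. It assumes "lines of resolution" means dots per inch, a hypothetical 6 x 9 inch cover, and 3 bytes per pixel once the image is decoded to RGB; none of these figures come from an actual Gutenberg file.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope size of a cover image once decoded for display.
# Assumptions: resolution is in dots per inch, the cover is a hypothetical
# 6 x 9 inch page, and the decoded raster is 24-bit RGB (3 bytes/pixel).
raster_bytes() {
  local dpi=$1 width_in=$2 height_in=$3
  echo $(( dpi * width_in * dpi * height_in * 3 ))
}

for dpi in 1200 150 75; do
  printf '%5s dpi: %12s bytes\n' "$dpi" "$(raster_bytes "$dpi" 6 9)"
done
```

Under those assumptions the 1200 dpi cover decodes to about 233 MB, versus about 3.6 MB at 150 dpi: a factor of (1200/150)^2 = 64, because pixel count grows with the square of the resolution.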

So these PDFs need repairing, or else one's CPU will need replacing. But is there a Linux program out there that does this? We can say "yes" definitively if we want to spend hundreds of dollars on Adobe Acrobat. And there are well-proven Linux tools like pdftotext that quickly extract all of a PDF's text, unformatted. But what about a Linux program that simply opens the PDF, lets us edit it, and then saves the file? With this ideal in mind, I decided to give PDFedit a shot.
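One command-line route not tried in this post is Ghostscript's pdfwrite device, which can rewrite a PDF with its images downsampled. The sketch below assumes `gs` is installed; the function name `shrink_pdf` and the 150 dpi default are illustrative choices, not part of any existing tool. Note that this re-encodes the whole document and downsamples every image above the threshold, not just the cover on page one.

```shell
#!/usr/bin/env bash
# shrink_pdf (hypothetical helper): rewrite a PDF through Ghostscript's
# pdfwrite device, downsampling color and grayscale images to a target dpi.
# Assumes Ghostscript (gs) is on the PATH. Affects every image in the file.
shrink_pdf() {
  if [ $# -lt 2 ]; then
    echo "usage: shrink_pdf IN.pdf OUT.pdf [DPI]" >&2
    return 2
  fi
  local in=$1 out=$2 dpi=${3:-150}
  gs -q -dNOPAUSE -dBATCH -dSAFER \
     -sDEVICE=pdfwrite \
     -dDownsampleColorImages=true -dColorImageResolution="$dpi" \
     -dDownsampleGrayImages=true -dGrayImageResolution="$dpi" \
     -sOutputFile="$out" "$in"
}
```

Usage would be something like `shrink_pdf book.pdf book-small.pdf 150`. Since the whole file is re-encoded, it is worth opening the output before discarding the original.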

installation (v.0.4.5)

It comes as a .bz2, apparently because they decided to pander to the Windows crowd. The README lists Boost as the dependency. Boost is just a set of C++ libraries, so I ran configure before doing any checks to see whether they might already be installed. Nope:
checking for boostlib >= 1.20.0... configure: error: We could not detect the boost libraries (version 1.20 or higher). If you have a staged boost library (still not installed) please specify $BOOST_ROOT in your environment and do not give a PATH to --with-boost option. If you are sure you have boost installed, then check your version number looking in <boost/version.hpp>. See http://randspringer.de/boost for more documentation.
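The check skipped here is cheap: installed Boost headers record their version in boost/version.hpp as a BOOST_LIB_VERSION define (for example "1_47"). A minimal sketch, using a hypothetical function name and assuming the headers live under /usr/include, which is where a --prefix=/usr install puts them:

```shell
#!/usr/bin/env bash
# boost_version (hypothetical helper): print the installed Boost version,
# e.g. "1_47", by reading the BOOST_LIB_VERSION define from version.hpp.
# Assumption: headers are under /usr/include unless another path is given.
boost_version() {
  local hdr=${1:-/usr/include/boost/version.hpp}
  if [ ! -r "$hdr" ]; then
    echo "none"
    return 1
  fi
  sed -n 's/^#define BOOST_LIB_VERSION "\([^"]*\)".*/\1/p' "$hdr"
}
```

Running `boost_version` before configure prints either a version string or "none", which is enough to decide whether a 50 MB download is needed at all.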

boost (v.1.47) installation

PDFedit's pretension of requiring Boost is annoying, for several reasons: 1) C++ libraries sufficient for compiling are already on people's systems, so we don't need a redundant set; 2) installing them bloats one's system for no reason; and, worst of all, 3) they are on SourceForge servers, which adds an hour to the installation timeline. (Edit: indeed, the first download was 30 MB and turned out to be a set of PDF documents mislabeled as source.) Half an hour was already wasted, but it's a dependency, so it has to go in. Let's go directly to the Boost site to get the libraries. And... the Boost site bounced me back to SourceForge for another 53 frigging MB at SourceForge's 60 KB/s "speeds". Installing PDFedit is starting to look like a two-hour operation.

bootstrap

So, I opened the Boost source. No configure file, no README. Great. There are some bootstrap files, however, so we're apparently dealing with frigging bootstrap. Now we have bad choices by both the Boost and the PDFedit developers. Boost also appears to require Python. So the real dependency tree is apparently: PYTHON-->BOOST-->XPDF-->PDFEDIT
$ ./bootstrap.sh
$ ./b2 --prefix=/usr
This doesn't work: without the install target, b2 only builds the libraries inside the source tree and installs nothing under /usr. I finally located some installation instructions. They're on the Boost website instead of in a simple README in the source, and they appear partially inaccurate because they assume an install without root. Let's start over and do it in a way that works.
$ ./bootstrap.sh --prefix=/usr
# ./b2 install

back to pdfedit (hours later)

I've almost forgotten why I needed to install PDFedit in the first place, but here we go. I did a mostly standard ./configure --prefix=/usr; however, the results showed that no tools or kernel tests would be included. Start over.
$ ./configure --prefix=/usr --enable-tools --enable-kernel-tests
This went well, except that the kernel tests couldn't be configured because of a missing package called Cppunit, version 1.10 or later. Let's see if we can get that in.
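Requirements like "Cppunit 1.10 or later" (or "Boost 1.20 or higher") are dotted-version comparisons, where a plain string comparison goes wrong because "1.9" sorts after "1.10". A minimal sketch of such a check, assuming GNU coreutils for `sort -V`; the function name is hypothetical and the code is not taken from PDFedit's configure script:

```shell
#!/usr/bin/env bash
# version_ge (hypothetical helper): succeed if version $1 >= version $2.
# Relies on GNU sort's -V (version sort), so 1.9 < 1.10 < 1.12.1.
version_ge() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if version_ge "1.12.1" "1.10"; then
  echo "Cppunit 1.12.1 satisfies >= 1.10"
fi
```

The trick is that whichever version sorts first is the smaller one; if that is the required minimum, the installed version is new enough.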

Cppunit (v.1.12.1) installation

This was a standard ./configure --prefix=/usr, make, and make install as root. No problems.

back to pdfedit

I attempted three ways:
$ ./configure --prefix=/usr --enable-tools --enable-pdfedit-core-dev --enable-kernel-tests
$ ./configure --prefix=/usr --enable-tools --enable-pdfedit-core-dev
$ ./configure --prefix=/usr
All of these resulted in fatal errors during make:
make[2]: *** [cpagecontents.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[2]: Leaving directory `/home/foo/Download/pdfedit-0.4.5/src/kernel'
make[1]: *** [kernel] Error 2
make[1]: Leaving directory `/home/foo/Download/pdfedit-0.4.5/src'
make: *** [source] Error 2
Apparently the PDFedit source has design and documentation flaws deeper than one can suss out in the time a reasonable installation should take. To be clear, the "kernel" in these messages is PDFedit's own core library (the src/kernel directory, the same kernel as in --enable-kernel-tests), not the Linux kernel, so no kernel recompile or kernel switches should be involved. What failed is the compilation of cpagecontents.o, part of PDFedit's own code, and the real compiler error is printed somewhere above these make lines. Nothing in PDFedit's hard-to-locate documentation covers a failure like this, so users are left to dig through the compiler output and then the entire PDFedit source for the needle in the haystack. In short, PDFedit will either build on one's PC or it won't, dealer's choice. That goes far beyond the investment most people should have to make to simply install a program, and I certainly have more appealing things to do with two weeks.
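With parallel make ("Waiting for unfinished jobs"), the compiler's own diagnostic usually scrolls past well above the make summary lines. A small sketch for pulling it out of a saved log, assuming GCC-style "file:line:col: error:" messages; the function name and log file name are hypothetical:

```shell
#!/usr/bin/env bash
# first_error (hypothetical helper): print the first compiler diagnostic
# in a build log, with its line number. make's own "*** [...] Error N"
# summary lines do not match, because they lack the ": error: " marker.
# Assumes GCC-style messages of the form "file:line:col: error: ...".
first_error() {
  grep -n -m 1 ': error: ' "$1"
}

# Usage sketch: keep the full build output, then look at the first error.
#   make 2>&1 | tee build.log
#   first_error build.log
```

The first error is the one worth reading; later errors are often knock-on effects of it.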

I wasted half a day on the shiatty PDFedit product and was unable to install it or edit my PDFs. In the end, I ran pdftotext on the particular PDF I wanted to fix. I'll format that basic text file with LaTeX as I read it and recompile when finished; the resulting PDF should be easy to read on a portable device. This is extra work I'll first have to do on a desktop, so I guess I'll read the book at home.
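The pdftotext-to-LaTeX route can be sketched roughly as follows. It assumes pdftotext's -nopgbrk flag (which drops the form-feed page breaks) and a hypothetical txt_to_tex helper that wraps the text in a bare book document, escaping the LaTeX specials & % $ # _ { } that turn up in ordinary prose. Blank lines in the extracted text already act as paragraph breaks in LaTeX; backslashes, ~ and ^ are left for hand editing.

```shell
#!/usr/bin/env bash
# txt_to_tex (hypothetical helper): wrap plain text in a minimal LaTeX
# document, escaping & % $ # _ { } so pdflatex does not choke on them.
# Backslashes, ~ and ^ are NOT handled and need fixing by hand.
txt_to_tex() {
  printf '%s\n' '\documentclass{book}' '\begin{document}'
  sed -e 's/[&%$#_{}]/\\&/g' "$1"
  printf '%s\n' '\end{document}'
}

# Usage sketch (file names are placeholders):
#   pdftotext -nopgbrk book.pdf book.txt
#   txt_to_tex book.txt > book.tex
#   pdflatex book.tex
```

From there the formatting (chapters, italics, page size for a small screen) is the manual work described above.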
