Saturday, September 29, 2012

EXIF data, renumbering, cropping, rotating (lossless)

Links: ExifTool :: "rename" command notes :: PDF merging

A common activity is stripping annoying JPEG EXIF data and/or renumbering shots into a reasonable schema. Of course, pulling the EXIF with, say, ImageMagick typically means reducing the file quality, since any program that decodes a JPG and saves it again recompresses the image and loses a little quality. As for renaming, for many users this means the prospect of a Perl or Bash script built around the mv command.

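Instead, most batch stripping and renaming can be done without a script. Two CLI applications, exiftool and rename, handle most situations. (The rename commands here use the util-linux syntax, rename FROM TO FILES. The rename in Debian's and Ubuntu's rename package is a Perl script that takes a regular expression instead, e.g. rename 's/^blue/blue00/' blue?.jpg.)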

renumbering

A typical problem is a set of files with the same prefix, say "blue", but numbers of different lengths, e.g. "blue9.jpg", "blue109.jpg", "blue43.jpg". To put these in order, we want every number to have the same number of digits, say three: blue009.jpg, blue043.jpg and blue109.jpg (as-is). Suppose we have hundreds of these files. We could write a script, but are there simple commands instead? Yes. In a terminal, cd to the folder with the files and:
$ rename blue blue00 blue?.jpg
The single question mark matches exactly one character, so blue?.jpg picks up only the one-digit files, and rename replaces "blue" with "blue00" in each: blue9.jpg becomes blue009.jpg. What about the two-digit file, blue43.jpg? Files with two digits need only one added zero:
$ rename blue blue0 blue??.jpg
Now we have blue043.jpg. This is the template for any numbering system: decide on a number of digits, then run one command for each shorter length, using one question mark per digit.
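If the number lengths vary a lot, or the rename on your system is the Perl version that expects a regular expression, the same padding can be done in one pass with a small bash loop. This is a sketch under assumptions: the files are named PREFIX<digits>.jpg in the current directory, and pad_numbers is a hypothetical helper name, not part of rename.

```shell
#!/usr/bin/env bash
# pad_numbers PREFIX WIDTH
# Hypothetical helper: renames PREFIX<digits>.jpg so the number is zero-padded
# to WIDTH digits (blue9.jpg -> blue009.jpg). Works in the current directory.
pad_numbers() {
    local prefix=$1 width=$2 f num new
    for f in "$prefix"[0-9]*.jpg; do
        [ -e "$f" ] || continue                    # glob matched nothing
        num=${f#"$prefix"}                         # blue43.jpg -> 43.jpg
        num=${num%.jpg}                            # 43.jpg     -> 43
        case $num in *[!0-9]*) continue ;; esac    # skip names like blue9b.jpg
        # 10# forces base 10, so an already padded "009" is not read as octal
        new=$(printf "%s%0${width}d.jpg" "$prefix" "$((10#$num))")
        [ "$f" = "$new" ] || mv -n -- "$f" "$new"  # -n: never overwrite a file
    done
}
```

Usage: `pad_numbers blue 3`. Files that already have the target width are left alone, so running it twice is harmless.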

Exif - ExifTool

Available at the link at top. It easily extracts, writes and deletes EXIF data. ExifTool is a Perl script, so there is nothing to compile; as I recall, all I had to do was copy it (with its lib directory) somewhere on the PATH, such as /usr/bin.

ExifTool can do many things, but this post concerns stripping EXIF data from all the photos in a directory. Go to that directory of JPG files:
$ exiftool -all= *.jpg
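The -all= option removes all writable metadata (EXIF, IPTC, XMP and so on), not only EXIF, and does so without recompressing the image. exiftool also keeps each untouched original as name.jpg_original; add -overwrite_original to skip that. Or, if you have exiv2 and want to losslessly remove IPTC, XMP, and EXIF data: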
$ exiv2 -d a *.jpg
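A minimal sketch for keeping the originals untouched: exiftool's -o option writes a stripped copy instead of modifying the file, and a directory name ending in / sends each copy there under its original name. strip_copies and the output directory name are hypothetical.

```shell
#!/usr/bin/env bash
# strip_copies OUTDIR FILE...
# Hypothetical helper: writes metadata-free copies of FILE... into OUTDIR and
# leaves the originals alone. exiftool copies the image data as-is, so the
# copies are not recompressed.
strip_copies() {
    local outdir=$1
    shift
    mkdir -p -- "$outdir"
    # -all= deletes all writable tags; "-o DIR/" writes the results into DIR
    exiftool -all= -o "$outdir/" "$@"
}
```

Usage: `strip_copies stripped *.jpg`. Running plain `exiftool` on one of the copies then lists what's left, which should be little more than file and image-structure information.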

Renaming

Go to the directory with the files and use the rename command. Let's say they all start with "IMG" followed by a number. To replace the "IMG" prefix with "Cam01_" while keeping each file's existing number:
$ rename IMG Cam01_ IMG*
Done
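For a fresh 001, 002, ... sequence rather than the old camera numbers, a short bash loop does it. A sketch, assuming the files are IMG*.jpg in the current directory; number_sequence is a hypothetical helper, and files are numbered in the order the shell lists them (alphabetical, which matches numeric order only when the original numbers are zero-padded).

```shell
#!/usr/bin/env bash
# number_sequence PREFIX FILE...
# Hypothetical helper: renames the given files, in the order given, to
# PREFIX001.ext, PREFIX002.ext, ... keeping each file's own extension.
# Run it in the photo directory; new names are created in the current directory.
number_sequence() {
    local prefix=$1 n=1 f
    shift
    for f in "$@"; do
        mv -n -- "$f" "$(printf '%s%03d.%s' "$prefix" "$n" "${f##*.}")"
        n=$((n + 1))
    done
}
```

Usage: `number_sequence Cam01_ IMG*.jpg` turns the first file into Cam01_001.jpg, the second into Cam01_002.jpg, and so on.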

Cropping

Scanning at 150 dpi (lines per inch) typically makes a 1256x1752 image. If I forget to set the scan size, the bottom 110 pixel rows just show the scanner bed. To get rid of this excess, copy those JPGs into a directory (or use the originals if you don't mind them being changed; mogrify overwrites files in place and re-encodes them) and:
$ mogrify -crop 1256x1642+0+0 *.jpg
The +0+0 is the offset: the crop begins at the top-left corner (0,0) and runs 1256 pixels horizontally and 1642 vertically. Similarly, if I scan at 75 dpi, I get a 624x877 image and typically crop it down to 624x822.
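For a crop that doesn't recompress the image, jpegtran has a -crop WxH+X+Y option. It is lossless as long as the top-left corner falls on a JPEG block boundary; an offset of +0+0 always does, and any other offset is silently moved up and left to the nearest boundary. jpegtran handles one file per run, so a batch needs a loop. A sketch; crop_lossless and the _crop suffix are hypothetical.

```shell
#!/usr/bin/env bash
# crop_lossless GEOMETRY FILE...
# Hypothetical helper: crops each JPEG with jpegtran (one file per run) and
# writes NAME_crop.jpg next to it. GEOMETRY is WxH+X+Y, e.g. 1256x1642+0+0.
crop_lossless() {
    local geom=$1 f
    shift
    for f in "$@"; do
        jpegtran -crop "$geom" -outfile "${f%.*}_crop.jpg" "$f" ||
            echo "crop_lossless: failed on $f" >&2
    done
}
```

Usage: `crop_lossless 1256x1642+0+0 *.jpg`. On a second run, *.jpg would also match the _crop files, so move them out of the way first.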

Rotating

$ jpegtran -rot 90 -trim infile.jpg > outfile.jpg
...where -rot 90 turns the image clockwise and -rot 270 turns it 90 degrees counter-clockwise. -trim drops edge pixels that can't be rotated losslessly. I haven't made a script yet to batch these rotations. jpegtran takes only one input file per run, so batching needs a short shell loop.
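A minimal batch sketch, assuming bash and jpegtran's -outfile option; rotate_all and the output directory name are hypothetical.

```shell
#!/usr/bin/env bash
# rotate_all ANGLE OUTDIR FILE...
# Hypothetical helper: runs jpegtran once per file (it takes a single input)
# and writes the rotated copy under the same name into OUTDIR.
# ANGLE: 90 = clockwise, 270 = counter-clockwise. OUTDIR must not be the
# source directory, or jpegtran would write over the file it is reading.
rotate_all() {
    local angle=$1 outdir=$2 f
    shift 2
    mkdir -p -- "$outdir"
    for f in "$@"; do
        jpegtran -rot "$angle" -trim -outfile "$outdir/${f##*/}" "$f" ||
            echo "rotate_all: failed on $f" >&2
    done
}
```

Usage: `rotate_all 90 rotated *.jpg` leaves the originals in place and puts the rotated copies in ./rotated.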

pdf conversion

Edit - 2016: like nearly all applications over the years, ImageMagick has become less and less helpful to the average user. One must now specify the "density", to avoid page size variance, and "compress" to avoid a PDF file 10 times larger than the JPGs being concatenated:
$ convert *.jpg -compress JPEG -density 150 somefile.pdf
The density is in pixels per inch, so the PDF page size is the image size divided by the density; a rough yardstick is the dpi you scanned at (or would use on a printer). If one wants to change the page orientation to landscape, add -rotate 90. There's more on all this here. At the bottom of that page is a link to an even more elaborate page, etc etc.
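To see how the density sets the page size, here is the arithmetic for the scans above as a tiny shell function (pdf_page_size is a hypothetical name; awk does the division).

```shell
#!/usr/bin/env bash
# pdf_page_size WIDTH_PX HEIGHT_PX DENSITY
# Hypothetical helper: prints the page size in inches that an image of that
# many pixels gets when written to PDF at that -density (pixels per inch).
pdf_page_size() {
    awk -v w="$1" -v h="$2" -v d="$3" \
        'BEGIN { printf "%.2f x %.2f in\n", w / d, h / d }'
}

pdf_page_size 1256 1642 150   # prints: 8.37 x 10.95 in
```

So the cropped 150 dpi scan comes out close to US letter at -density 150, and the 624x822 scan at -density 75 gives 8.32 x 10.96 inches. Using a density that doesn't match the scan shrinks or enlarges the page accordingly.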

Prior to 2013, ImageMagick's "convert" automatically read the density from the JPG and auto-detected the compression format. A much simpler command worked in those days:
$ convert *.jpg somefile.pdf

Concatenate/extract PDF pages

Combining several PDFs into a larger PDF (common with letter attachments) is best done with Ghostscript. Suppose the input files are in1.pdf, in2.pdf, and so on:
$ gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=out.pdf in1.pdf in2.pdf
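A small wrapper sketch around that command; merge_pdfs is a hypothetical name. Pages appear in argument order, so a glob like in*.pdf puts in10.pdf before in2.pdf unless the names are zero-padded as described under renumbering. The check guards against re-running with *.pdf, which would pick up the previous output as an input.

```shell
#!/usr/bin/env bash
# merge_pdfs OUT.pdf IN.pdf...
# Hypothetical wrapper around the Ghostscript merge command. Inputs are
# concatenated in argument order; refuses to run if OUT.pdf is also an input.
merge_pdfs() {
    local out=$1 f
    shift
    [ "$#" -gt 0 ] || { echo "usage: merge_pdfs OUT.pdf IN.pdf..." >&2; return 1; }
    for f in "$@"; do
        [ "$f" != "$out" ] || { echo "merge_pdfs: $out is also an input" >&2; return 1; }
    done
    gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile="$out" "$@"
}
```

Usage: `merge_pdfs out.pdf in1.pdf in2.pdf`.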

To cleanly extract one or more pages from a PDF, use Ghostscript, which avoids the rasterization that extracting as a JPG would cause. -dFirstPage and -dLastPage set the range (here pages 22 through 36):
$ gs -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -dSAFER -dFirstPage=22 -dLastPage=36 -sOutputFile=outfile.pdf 100p-inputfile.pdf
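To get each page as its own file instead of one range, the same Ghostscript call can run once per page. A sketch, assuming bash; split_pages and the pageNNN.pdf naming are hypothetical, and since Ghostscript re-reads the whole input on every run, this is slow for very large files.

```shell
#!/usr/bin/env bash
# split_pages IN.pdf FIRST LAST
# Hypothetical helper: writes page022.pdf, page023.pdf, ... one file per page,
# zero-padded so the names sort in page order.
split_pages() {
    local in=$1 first=$2 last=$3 p
    for ((p = first; p <= last; p++)); do
        gs -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -dSAFER -q \
           -dFirstPage="$p" -dLastPage="$p" \
           -sOutputFile="$(printf 'page%03d.pdf' "$p")" "$in" || return 1
    done
}
```

Usage: `split_pages 100p-inputfile.pdf 22 36`.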

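Once a PDF page is isolated, make it into a reasonably clean JPG for editing. -density has to come before the PDF so the page is rasterized at 300 dpi; -resize 800 then scales it to 800 pixels wide. Use sRGB rather than RGB: in current ImageMagick, RGB means linear RGB and gives a darker image.
$ convert -density 300 input.pdf -colorspace sRGB -resize 800 -interlace none -quality 90 someoutput.jpg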

evince pdf reader

Good GNOME-based reader, but it requires gvfs to be installed to display an index alongside the document. This is an incomprehensibly stupid dependency, on the order of requiring all cows to have wings. Nothing related to file systems and volume management should be a dependency for an application that displays PDFs. It's not a file manager.