Saturday, September 29, 2012

EXIF data, renumbering, cropping, rotating (lossless)

Links: ExifTool :: "rename" command notes :: PDF merging

A common activity is stripping annoying JPEG EXIF data and/or renumbering shots into a reasonable schema. Stripping the EXIF with, say, ImageMagick typically reduces image quality, because a program that decodes a JPG and saves it again recompresses it, and each recompression loses a little. As for renaming, many users face the prospect of a Perl or Bash script built around the mv command.

Instead, most batch stripping and renaming can be done without a script. Two CLI applications, exiftool and rename, handle most situations.


A typical problem is a set of files with the same prefix, say, "blue", but a mixture of digit counts in their numbers, e.g. "blue9.jpg", "blue109.jpg", "blue43.jpg". To put these in order, we want the same number of digits, say three digits. We want blue009.jpg, blue043.jpg and blue109.jpg (as-is). Suppose we have hundreds of these files. We could write a script, but are there simple commands instead? Yes. In a terminal, cd to the folder with the files and:
$ rename blue blue00 blue?.jpg
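The single question mark matches exactly one character, so only the one-digit files are selected, and "blue" in their names is replaced with "blue00". This means blue9 will be changed to blue009. (This is the util-linux rename syntax; the Perl rename that some distributions install under the same name takes a substitution expression instead.) What about the two-digit file, blue43.jpg? We only need to add one zero to files with two digits: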
$ rename blue blue0 blue??.jpg
Now we have blue043.jpg. This is the template for any numbering system: decide on a number of digits and then one or two commands should manage it.
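If the rename command on a given machine uses a different syntax, the same padding can be sketched as a small shell function. This is only a sketch: pad_numbers is a name made up for illustration, and it assumes the files are named prefix, digits, then ".jpg".

```shell
# pad_numbers PREFIX WIDTH (hypothetical helper, not a standard command):
# rename PREFIX<digits>.jpg in the current directory so that every
# number is zero-padded to WIDTH digits.
pad_numbers() {
    prefix=$1
    width=$2
    for f in "$prefix"[0-9]*.jpg; do
        [ -e "$f" ] || continue            # the glob matched nothing
        num=${f#"$prefix"}                 # drop the prefix:  9.jpg
        num=${num%.jpg}                    # drop the suffix:  9
        case $num in
            *[!0-9]*) continue ;;          # skip names like blue9b.jpg
        esac
        # Strip leading zeros so printf does not read the number as octal.
        num=$(printf '%s' "$num" | sed 's/^0*//')
        new=$(printf "%s%0${width}d.jpg" "$prefix" "${num:-0}")
        # -n refuses to overwrite a file that already has the target name.
        [ "$f" = "$new" ] || mv -n -- "$f" "$new"
    done
}
```

Running pad_numbers blue 3 in the folder would cover the one-digit and two-digit files in a single pass and leave blue109.jpg alone.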

Exif - ExifTool

Available at the link at top. It easily extracts, writes, and deletes EXIF data. I don't even recall compiling; I think all I had to do was move a copy of it into /usr/bin or some such. Maybe I had to compile.

ExifTool can do several actions, but this post concerns stripping EXIF data for all photos in a directory. Go to that directory, filled with JPG files:
$ exiftool -all= *.jpg
Or, if you have exiv2 and want to losslessly remove IPTC, XMP, and EXIF data...
$ exiv2 -d a *.jpg
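For anyone nervous about touching the originals, here is a minimal sketch that strips copies instead. strip_copies and the default "stripped" folder name are my own inventions for illustration; the ExifTool options used (-all= and -overwrite_original) are standard ones, the latter simply skipping the "_original" backup files ExifTool otherwise leaves next to each image.

```shell
# strip_copies SRC_DIR DEST_DIR (hypothetical helper):
# copy the JPGs from SRC_DIR into DEST_DIR and strip the metadata
# from the copies only, leaving the originals untouched.
strip_copies() {
    src=${1:-.}
    dest=${2:-stripped}
    mkdir -p "$dest" || return 1
    cp -- "$src"/*.jpg "$dest"/ || return 1
    # Given a directory, exiftool processes the image files inside it.
    # -overwrite_original: no "_original" backups, since these are copies.
    exiftool -all= -overwrite_original "$dest"
}
```

Usage would be something like strip_copies . stripped from inside the photo directory.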


Go to the directory with the files and use the rename command. Let's say they all start with "IMG" and then some number. To swap the "IMG" prefix for "Cam01_" while keeping each file's existing number:
$ rename IMG Cam01_ IMG*
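If the goal is a fresh 1, 2, 3... sequence rather than the camera's own numbers, a loop is needed. The following is a rough sketch, not a polished tool: renumber_seq is a hypothetical name, and it assumes every file has an extension and that the existing names already sort in the order you want (true for zero-padded camera names like IMG_0007.JPG).

```shell
# renumber_seq OLDPREFIX NEWPREFIX WIDTH (hypothetical helper):
# rename OLDPREFIX* to NEWPREFIX001.ext, NEWPREFIX002.ext, ...
# following the shell's sorted glob order.
renumber_seq() {
    old=$1
    new=$2
    width=$3
    n=1
    for f in "$old"*; do
        [ -e "$f" ] || continue            # the glob matched nothing
        ext=${f##*.}                       # keep the original extension
        target=$(printf "%s%0${width}d.%s" "$new" "$n" "$ext")
        mv -n -- "$f" "$target"            # -n: never overwrite a file
        n=$((n + 1))
    done
}
```

For example, renumber_seq IMG Cam01_ 3 would turn the first file into Cam01_001.JPG, the second into Cam01_002.JPG, and so on.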


Scanning at 150 lines per inch typically makes a 1256x1752 image. If I forget to set the scan size, the bottom 110 lines just show the scanner bed. To get rid of this excess, copy those JPGs (or use the originals if you don't mind them being changed) into a directory and:
$ mogrify -crop 1256x1642+0+0 *.jpg
The +0+0 is the offset; in other words, the crop begins at the top left corner (0,0) and extends 1256 pixels horizontally and 1642 vertically (1752 minus the 110 unwanted lines). Similarly, if I scan at 75 lines, I get a 624x877 image and typically crop down to 624x822.
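The geometry string is just arithmetic, so it can be generated rather than worked out by hand. A small sketch, with crop_geometry being a made-up helper name:

```shell
# crop_geometry WIDTH HEIGHT TRIM (hypothetical helper):
# print a WxH+0+0 geometry that keeps the full width and removes
# TRIM lines from the bottom of the image.
crop_geometry() {
    width=$1
    height=$2
    trim=$3
    echo "${width}x$((height - trim))+0+0"
}
```

It would be used as mogrify -crop "$(crop_geometry 1256 1752 110)" *.jpg, which expands to the same 1256x1642+0+0 as above.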


$ jpegtran -rotate 90 -trim infile.jpg > outfile.jpg
...90 being clockwise; use 270 instead for 90 degrees ccw. "Trim" drops edge pixels which can't properly rotate. I haven't made a script yet to batch these rotations. Jpegtran takes a single input file per run rather than a list of files, so a script (or a shell loop) is necessary for a batch.
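A batch version can be sketched as a plain loop around the same command. rotate_all and the default "rotated" output folder are names I picked for illustration; the jpegtran call itself is the one shown above, run once per file.

```shell
# rotate_all ANGLE [OUTDIR] (hypothetical helper):
# losslessly rotate every .jpg in the current directory by ANGLE
# (90, 180 or 270) and write the results into OUTDIR.
rotate_all() {
    angle=$1
    outdir=${2:-rotated}
    mkdir -p "$outdir" || return 1
    for f in *.jpg; do
        [ -e "$f" ] || continue            # the glob matched nothing
        # One jpegtran run per file; the originals are not modified.
        jpegtran -rotate "$angle" -trim "$f" > "$outdir/$f"
    done
}
```

So rotate_all 270 would leave counter-clockwise copies of everything in ./rotated.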

pdf conversion

Edit - 2016: ImageMagick (like most apps) has become less and less intuitive to the average user over the years. 1) Users should now specify "compress" to avoid a PDF file 10 times larger than the sum of the JPGs being concatenated. 2) Users should match density to the scanned density. Unlike "compress", density does not change the size of the resulting PDF file in megabytes, but it does change the size of the page inside the reader. If your scans were at 200 DPI but your density is 100, the page will come out twice as wide and twice as tall in the PDF reader and will not match well if concatenated with other docs:
$ convert *.jpg -compress JPEG -density 200 somefile.pdf
A rough yardstick for the density number is to use what you'd want for, say, "dpi" on a printer. To change the page orientation to landscape, add: -rotate 90 . There's more on all this here. At the bottom of that page is a link to a yet more elaborate page, etc etc.
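The relation behind the density advice is simply page size in inches = pixels / density. A quick sketch for checking a value before converting; page_inches is a hypothetical helper, and the 1700x2200 scan is an illustrative number of mine (a letter page at 200 DPI), not one from my scanner.

```shell
# page_inches WIDTH_PX HEIGHT_PX DENSITY (hypothetical helper):
# print the page size a reader would show for an image of the
# given pixel dimensions at the given density.
page_inches() {
    LC_ALL=C awk -v w="$1" -v h="$2" -v d="$3" \
        'BEGIN { printf "%.2f x %.2f in\n", w / d, h / d }'
}

page_inches 1700 2200 200    # 8.50 x 11.00 in
page_inches 1700 2200 100    # 17.00 x 22.00 in
```

The second call shows the doubling described above: same pixels, half the density, twice the page dimensions.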

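In 2019, ImageMagick's PDF conversion was de-authorized: the policy file began denying the PDF and PS coders, apparently in response to a security vulnerability (reportedly in Ghostscript, which ImageMagick calls for these formats). To fix: change "none" to "read|write", or "all", in the following file, e.g.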
# nano /etc/ImageMagick-6/policy.xml
<policy domain="coder" rights="all" pattern="PDF,PS" />
If this does not work, check whether ImageMagick has been upgraded to, say, version 7, because the version 6 XML file will still exist but will have zero effect. In such a case, update /etc/ImageMagick-7/policy.xml, or whatever the latest version is, to allow PDF creation. The effect is immediate and requires no restart or logout.
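To see which policy files exist and what they currently say about PDF, something like the following sketch can help. show_pdf_policy is a made-up name, and it is only a text search, so it will also print policy lines that are commented out.

```shell
# show_pdf_policy FILE... (hypothetical helper):
# for each readable policy file, print its name followed by any
# lines whose pattern attribute mentions PDF.
show_pdf_policy() {
    for f in "$@"; do
        [ -r "$f" ] || continue            # skip missing or unreadable files
        printf '%s:\n' "$f"
        grep -i 'pattern="[^"]*PDF' "$f"
    done
}
```

A typical call would be show_pdf_policy /etc/ImageMagick-*/policy.xml, which covers both the version 6 and version 7 locations mentioned above. ImageMagick can also report the policy it actually loaded with convert -list policy.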

Prior to 2013, ImageMagick's "convert" automatically read the density from the JPG and auto-detected the compression format. A much simpler command sufficed in those days:
$ convert *.jpg somefile.pdf

Concatenate/extract PDF pages

Combining several PDFs into a larger PDF (common with letter attachments) is best done with Ghostscript. Imagine the first input file is in1, second in2, etc...
$ gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=out.pdf in1.pdf in2.pdf

To cleanly extract one or more pages from a PDF, use Ghostscript to avoid the rasterization of extracting as a JPG:
$ gs -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -dSAFER -dFirstPage=22 -dLastPage=36 -sOutputFile=outfile.pdf 100p-inputfile.pdf
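Since the page numbers are the only part that changes from one extraction to the next, the command wraps up neatly. A sketch, with extract_pages being a hypothetical name; the Ghostscript options are exactly the ones above plus -q to keep it quiet.

```shell
# extract_pages INPUT FIRST LAST OUTPUT (hypothetical helper):
# copy pages FIRST..LAST of INPUT into OUTPUT without rasterizing.
extract_pages() {
    src=$1
    first=$2
    last=$3
    out=$4
    if [ "$first" -gt "$last" ]; then
        echo "first page ($first) is after last page ($last)" >&2
        return 1
    fi
    gs -q -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -dSAFER \
       -dFirstPage="$first" -dLastPage="$last" \
       -sOutputFile="$out" "$src"
}
```

The earlier example then becomes extract_pages 100p-inputfile.pdf 22 36 outfile.pdf.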

Once a PDF page is isolated, make it into a reasonably clean JPG for editing.
$ convert -colorspace RGB -resize 800 -interlace none -density 300 -quality 90 input.pdf someoutput.jpg

evince pdf reader

Evince is a reliable GNOME-based reader; however, it has an incomprehensibly stupid flaw: Evince requires that gvfs be installed to display a sidebar index in the PDF. Nothing related to file systems and volume management should have been made a dependency for an application that displays PDFs. PDF readers are not file managers and can easily index themselves internally for a sidebar without an external indexer, especially one as intrusive to configure and as memory-hungry as gvfs.