Tuesday, October 23, 2007

Adding an annotative text file to a pdf image

Earlier this week (10/21/2007), I described how you can add textual key/value image descriptors to the header of a pdf file.

You can also attach entire documents to a pdf file using the pdftk utility.

I described the pdftk utility in an earlier post this week; it is available as a free download from the pdftk site.


We'll work in Windows, with pdftk installed in the c:\pdftk\ subdirectory, and we'll use the pdf image out.pdf.

As an attachment, we'll add the neo2.dot file, which happens to contain the GraphViz specification for the out.pdf image (any file would do).

We enter the command line:

C:\pdftk>pdftk out.pdf attach_files neo2.dot to_page 1 output out_att.pdf

The pdftk utility will attach the text file, neo2.dot, to page 1 of the pdf image file, out.pdf, creating a new file, out_att.pdf. When we view the out_att.pdf file, it shows the same image as the out.pdf file, but the file contains a hidden attachment file, neo2.dot.
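
If you attach files often, the same command can be driven from a script. The following is only a minimal sketch, assuming pdftk is installed and on the PATH; the function names attach_command and attach are hypothetical helpers made up for this illustration, not part of pdftk.

```python
import shutil
import subprocess


def attach_command(pdf_in, attachment, page, pdf_out, pdftk="pdftk"):
    """Build the argument list for a page-level pdftk attachment."""
    return [pdftk, pdf_in, "attach_files", attachment,
            "to_page", str(page), "output", pdf_out]


def attach(pdf_in, attachment, page, pdf_out):
    """Run pdftk; raises CalledProcessError if pdftk reports a failure."""
    # Assumption: pdftk is reachable through the PATH.
    if shutil.which("pdftk") is None:
        raise FileNotFoundError("pdftk was not found on the PATH")
    subprocess.run(attach_command(pdf_in, attachment, page, pdf_out),
                   check=True)
```

Calling attach("out.pdf", "neo2.dot", 1, "out_att.pdf") from the c:\pdftk\ subdirectory should be equivalent to the command line shown above.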

We can now delete the neo2.dot file:

C:\pdftk>del neo2.dot

We can extract the neo2.dot file from the out_att.pdf file using the following pdftk command line.

C:\pdftk>pdftk out_att.pdf unpack_files output

By default, pdftk unpacks the attachment into the current folder, and the extracted file keeps its original name, neo2.dot.
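
pdftk also accepts a directory name after the output keyword of unpack_files, which is handy when you do not want extracted files to land beside (or overwrite) files in the working folder. Here is a rough sketch of that variant, again assuming pdftk is on the PATH; unpack_command and unpack_to_temp are hypothetical names used only for this example.

```python
import os
import subprocess
import tempfile


def unpack_command(pdf_in, out_dir, pdftk="pdftk"):
    """Build the argument list that unpacks all attachments into out_dir."""
    return [pdftk, pdf_in, "unpack_files", "output", out_dir]


def unpack_to_temp(pdf_in):
    """Unpack into a fresh temporary folder; return (folder, file names)."""
    out_dir = tempfile.mkdtemp(prefix="pdf_attachments_")
    subprocess.run(unpack_command(pdf_in, out_dir), check=True)
    return out_dir, sorted(os.listdir(out_dir))
```

For the file built above, unpack_to_temp("out_att.pdf") would be expected to return a folder containing neo2.dot.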

We can verify by printing out the contents of neo2.dot:

C:\pdftk>type neo2.dot
digraph G {
node [style=filled color=gray65];
Neoplasm [label="Neoplasm"];
node [style=filled color=lightgray];
NeuralCrest [label="Neural Crest"];
GermCell [label="Germ cell"];
Neoplasm -> EndodermEctoderm;
Neoplasm -> Mesoderm;
Neoplasm -> GermCell;
Neoplasm -> Trophectoderm;
Neoplasm -> Neuroectoderm;
Neoplasm -> NeuralCrest;
and so on
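
Reading the listing by eye works for a short file. For a longer attachment, a checksum comparison is a more dependable check that the round trip preserved the file. The sketch below assumes you kept a copy of the original attachment somewhere (instead of deleting it outright) so that there is something to compare against; sha256_of and same_content are hypothetical helper names.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True when both files hold byte-for-byte identical content."""
    return sha256_of(path_a) == sha256_of(path_b)
```

If the kept copy and the extracted neo2.dot give the same digest, the attachment came back unchanged.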

In summary, any image can be converted to a PDF file, and the descriptive text for the image can be added as an attachment to the file. You can send the file to a colleague knowing that the image file carries the textual descriptors you have provided. At any time, the textual descriptors can be extracted from the pdf file.

- Jules Berman
