How to extract images from a PDF

PDF files can have images embedded in them. These images can be extracted easily using the pdfimages tool from the poppler-utils package.

  • To install it on Ubuntu:
$ sudo apt install poppler-utils
  • To list the details of images embedded in a PDF file:
$ pdfimages -list foo.pdf
page   num  type   width height color comp bpc  enc interp  object ID
---------------------------------------------------------------------
   1     0 image      80   100  icc     3   8  image  yes       11  0
   1     1 smask      80   100  gray    1   8  image  yes       11  0
   1     2 image      80   100  icc     3   8  image  yes       13  0
   1     3 smask      80   100  gray    1   8  image  yes       13  0
   1     4 image      80   100  icc     3   8  image  yes       15  0
   1     5 smask      80   100  gray    1   8  image  yes       15  0
   1     6 image      80   100  icc     3   8  image  yes       17  0
  • To extract the images embedded in a PDF file, provide a prefix for the extracted image filenames:
$ pdfimages foo.pdf foo_img

$ ls foo_img*
foo_img-000.ppm
foo_img-001.ppm
foo_img-002.ppm
foo_img-003.ppm
foo_img-004.ppm
foo_img-005.ppm
foo_img-006.ppm
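
The -list output is also handy in scripts. In the sample listing above, each smask row is a soft mask (transparency) and carries the same object ID as the image row before it. pdfimages extracts both kinds of row, which is why seven files came out for four pictures. Below is a minimal sketch, assuming a POSIX shell with awk. The function names count_pdf_images and extract_all_pdfs and the images/ output directory are hypothetical and were picked for illustration:

```shell
#!/bin/sh
# Hypothetical helpers around pdfimages (poppler-utils); names are illustrative.

# Count rows of a given type ("image" by default, or e.g. "smask") in
# `pdfimages -list` output read from stdin. Lines 1 and 2 are the column
# header and the dashed separator, so they are skipped. The type is field 3.
count_pdf_images() {
    awk -v want="${1:-image}" 'NR > 2 && $3 == want { n++ } END { print n + 0 }'
}

# Extract the images of every PDF in the current directory into ./images/,
# using each file's base name as the filename prefix (an assumed layout).
extract_all_pdfs() {
    mkdir -p images
    for pdf in *.pdf; do
        [ -e "$pdf" ] || continue    # no PDFs: the glob stays unexpanded
        pdfimages "$pdf" "images/$(basename "$pdf" .pdf)"
    done
}
```

For the sample listing above, $ pdfimages -list foo.pdf | count_pdf_images should print 4. Passing smask as the argument should print 3.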

Tried with: PopplerUtils 0.24.5 and Ubuntu 14.04
