How to extract images from PDF

PDF files can contain embedded images. Since PDF is the de facto format for research papers, it is sometimes necessary to extract these images. This can be done easily with the pdfimages tool from the poppler-utils package.

  • Installation on Ubuntu is easy:
$ sudo apt install poppler-utils
  • To list the details of images embedded in a PDF file:
$ pdfimages -list foo.pdf
page   num  type   width height color comp bpc  enc interp  object ID
---------------------------------------------------------------------
   1     0 image      80   100  icc     3   8  image  yes       11  0
   1     1 smask      80   100  gray    1   8  image  yes       11  0
   1     2 image      80   100  icc     3   8  image  yes       13  0
   1     3 smask      80   100  gray    1   8  image  yes       13  0
   1     4 image      80   100  icc     3   8  image  yes       15  0
   1     5 smask      80   100  gray    1   8  image  yes       15  0
   1     6 image      80   100  icc     3   8  image  yes       17  0
  • To extract the images embedded in a PDF file, pass the PDF file followed by a prefix for the extracted image filenames:
$ pdfimages foo.pdf foo_img

$ ls foo_img*
foo_img-000.ppm
foo_img-001.ppm
foo_img-002.ppm
foo_img-003.ppm
foo_img-004.ppm
foo_img-005.ppm
foo_img-006.ppm
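
The steps above can be wrapped in a small shell script. This is only a rough sketch: the function names summarize_pdfimages_list and extract_pdf_images are my own and not part of poppler-utils. It also assumes that your pdfimages build supports the -png option, which writes PNG files instead of the default PPM files. If it does not, drop that flag.

```shell
#!/bin/sh
# Hypothetical helpers around pdfimages; not part of poppler-utils.

# Reads `pdfimages -list` output on stdin and prints one "type count"
# line per image type (for example: image, smask), sorted by type name.
summarize_pdfimages_list() {
    # Skip the two header lines; column 3 is the "type" field.
    awk 'NR > 2 && NF >= 3 { count[$3]++ }
         END { for (t in count) print t, count[t] }' | sort
}

# Extracts the images of a PDF into a directory.
# Usage: extract_pdf_images foo.pdf [output_dir]
extract_pdf_images() {
    pdf=$1
    out_dir=${2:-.}

    if [ ! -f "$pdf" ]; then
        echo "extract_pdf_images: no such file: $pdf" >&2
        return 1
    fi
    if ! command -v pdfimages >/dev/null 2>&1; then
        echo "extract_pdf_images: pdfimages not found, install poppler-utils" >&2
        return 127
    fi

    mkdir -p "$out_dir" || return 1
    # Use the PDF's base name (without .pdf) plus "_img" as the prefix.
    prefix=$(basename "$pdf" .pdf)_img
    # Assumption: this pdfimages build supports -png (PNG instead of PPM).
    pdfimages -png "$pdf" "$out_dir/$prefix"
}

# Example usage (assumes foo.pdf exists in the current directory):
#   pdfimages -list foo.pdf | summarize_pdfimages_list
#   extract_pdf_images foo.pdf extracted
```

The summary is handy because, as the listing above shows, soft masks (smask) are listed and extracted as separate images alongside the pictures they belong to.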

Tried with: PopplerUtils 0.24.5 and Ubuntu 14.04
