Code Yarns ‍👨‍💻

How to extract images from PDF

📅 2015-Apr-07 ⬩ ✍️ Ashwin Nanjappa ⬩ 🏷️ image, pdfimages, poppler-utils

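PDF files can have images embedded in them. Since PDF is the de facto format for research papers, it is sometimes necessary to extract these images. This is easy to do with the pdfimages tool from the poppler-utils package. The -list option prints a table of the embedded images without extracting anything. Running pdfimages with the PDF file and an output prefix writes each image to its own numbered file.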

$ sudo apt install poppler-utils
$ pdfimages -list foo.pdf
page   num  type   width height color comp bpc  enc interp  object ID
---------------------------------------------------------------------
   1     0 image      80   100  icc     3   8  image  yes       11  0
   1     1 smask      80   100  gray    1   8  image  yes       11  0
   1     2 image      80   100  icc     3   8  image  yes       13  0
   1     3 smask      80   100  gray    1   8  image  yes       13  0
   1     4 image      80   100  icc     3   8  image  yes       15  0
   1     5 smask      80   100  gray    1   8  image  yes       15  0
   1     6 image      80   100  icc     3   8  image  yes       17  0
$ pdfimages foo.pdf foo_img

$ ls foo_img*
foo_img-000.ppm
foo_img-001.ppm
foo_img-002.ppm
foo_img-003.ppm
foo_img-004.ppm
foo_img-005.ppm
foo_img-006.ppm
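
In the listing above, the rows of type smask are soft masks (transparency data) that accompany an image, so only four of the seven extracted files are pictures in their own right. The following is a minimal sketch, not from the original setup, of three small shell helpers. only_images filters the -list table down to rows of type image, and extract_png runs pdfimages with the -png option, assuming your poppler-utils build supports it. The function names and the _img prefix convention are hypothetical, chosen to match the example above.

```shell
# Hypothetical helpers (names are illustrative, not part of poppler-utils).

# Derive an output prefix from a PDF path: /some/dir/foo.pdf -> foo_img
img_prefix() {
    printf '%s_img\n' "$(basename "$1" .pdf)"
}

# Read "pdfimages -list" output on stdin and keep only rows whose
# type column (third field) is "image". The first two lines of the
# table are the header and the dashed separator, so they are skipped.
only_images() {
    awk 'NR > 2 && $3 == "image"'
}

# Extract images as PNG instead of PPM.
# Assumption: the installed pdfimages accepts the -png option.
extract_png() {
    pdf=$1
    prefix=$(img_prefix "$pdf")
    pdfimages -png "$pdf" "$prefix"
}
```

For example, `pdfimages -list foo.pdf | only_images` should print only the four image rows of the table above, and `extract_png foo.pdf` should write files named with the foo_img prefix and a .png extension.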

Tried with: PopplerUtils 0.24.5 and Ubuntu 14.04


© 2022 Ashwin Nanjappa • All writing under CC BY-SA license • 🐘 @codeyarns@hachyderm.io