How to convert DjVu to PDF using Foxit Reader

There are several ways to convert a DjVu file to PDF. All of the methods I tried produce quite large PDF files, running into hundreds of MBs. I have found that converting with Foxit Reader results in the smallest PDF files.

To do this:

  • Install the free Foxit Reader from here. This also installs a PDF Printer which we will use for conversion.

  • Install any DjVu viewer. I use WinDjView from here.

  • Open the DjVu file in WinDjView. Choose to print it. In the printer list, choose Foxit Reader PDF Printer. This printer has settings that you can modify. For the print settings, choose a page size (Letter) and remember to choose Scale to fit media. This will use the Foxit PDF Printer to write a PDF file for you.

  • Note that the settings of the Foxit Reader PDF Printer did not seem to have much effect on the file size. The default is 600 DPI, and I did not see much reduction in file size at 300 or 200 DPI.

  • Note that the resulting PDF does not have OCR. You would need to run it through an OCR tool to get text embedded in the PDF.

Tried with: Foxit Reader, WinDjView 2.1 and Windows 10

How to tile PDF using PDFPoster


I have a PDF file with a single page that I want to print as a large poster (say size A1). However, I do not have an A1 printer. I only have the common A4 printer.

So, I would like to cut up the PDF page into multiple smaller pages so that I can print them out on my A4 printer and paste them together into an A1 poster.


This seems like a messy problem requiring grunt work. Thankfully, there is a simple tool that solves this problem: PDFPoster!

  • Installing it is easy:
$ sudo apt install pdfposter
  • Just specify the size of your printer and the intended size for your PDF and it does the job. For example, say I have an A4 printer but I need to print an A1 size poster:
$ pdfposter -mA4 -pA1 in.pdf out.pdf

The output PDF contains 9 pages that I can print on A4 pages, arrange as a 3×3 grid to get my A1 poster! 😄
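
The 9-page result can be sanity-checked with some back-of-the-envelope arithmetic. The sketch below is only an illustration, not how PDFPoster itself computes the layout: it assumes all sheets are in portrait orientation with no margins or overlap, and the function names are made up for this example.

```shell
# Sheets needed along one axis: ceiling of poster_mm / media_mm.
sheets_along() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Sheets for a poster of $1 x $2 mm tiled with media of $3 x $4 mm.
# Assumption: same orientation for every sheet, no margins, no overlap.
sheets_total() {
    cols=$(sheets_along "$1" "$3")
    rows=$(sheets_along "$2" "$4")
    echo "$cols x $rows = $(( cols * rows ))"
}

# A1 is 594 x 841 mm, A4 is 210 x 297 mm.
sheets_total 594 841 210 297
```

This prints 3 x 3 = 9, which matches the grid I got.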

Tried with: PDFPoster 0.6.0-1 and Ubuntu 14.04

How to search your documents using Recoll

Google Desktop was a good search tool for finding documents on your local computer that contained a particular text. It was useful because it indexed not just text files, but also other types of documents like MS Office, Open Office and PDF.

Since Google Desktop is discontinued, I have found that a good replacement is Recoll. It works similarly and supports many more document formats, including DjVu. It can be a good tool to search your library of documents and papers.

  • Installation is easy:
$ sudo apt install recoll
  • When you start Recoll for the first time, you can set the directory containing your documents and the daily time at which a Cron job indexes new document files. You will then need to wait until it has finished the initial indexing of your documents before you can search.

  • Usage is straightforward: type in the text or phrase you are looking for and it shows the documents which have it and some excerpts from them to make it easy to pick what you want.

  • Recoll learns how to index a type of file by using a helper. You can see which helpers are needed for the documents in your directory, by clicking File -> Show missing helpers.

  • If you have MS Word documents, install Antiword:

$ sudo apt install antiword
  • If you have RTF documents, install UnRTF:
$ sudo apt install unrtf
  • If you have EPub documents, install the Python EPub module:
$ sudo pip install epub
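
If you prefer the terminal, a quick check like the one below can tell you which helper programs are not installed yet. This is only a rough sketch: the function name is made up, it assumes the helpers are the antiword and unrtf commands installed above, and it cannot check for Python modules such as epub.

```shell
# Print each named command that is not found on PATH.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Prints nothing if both helpers are already installed.
missing_commands antiword unrtf
```

The list shown by Recoll itself is still the authoritative one, since it depends on the document types actually found in your directory.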

Tried with: Recoll 1.17.3 and Ubuntu 14.04

How to annotate documents using Okular

Okular is not only a great viewer for documents, but it can also be used to annotate and take notes on these documents. This is typically used with PDF documents.

  • If you want to work with DjVu and other formats, remember to install those backends:
$ sudo apt install okular-extra-backends
  • To annotate, open the document and choose Tools -> Review. The keyboard shortcut for this is F6.

  • A sidebar appears with buttons to add popup notes, freehand, highlight and other operations.

  • Single-click on any of these buttons, to be able to perform that operation once. To perform it again, you will need to click the button again.

  • Double-click on any of these buttons, to be able to perform that operation multiple times. Press Esc to disable the operation after you are done.

  • To edit the color, thickness or any other property of these annotation tools, right-click anywhere in the sidebar and choose Configure Annotations.

  • To remove any annotation (even from a PDF file), right-click on it and choose Delete.

  • To move annotations such as popup notes, hold Ctrl key while you click and drag them with the mouse.

  • By default, the annotations are saved locally in a hidden file in the home directory.

  • To save the annotations along with the file, save it as an Okular document archive. To do this, choose File -> Export as -> Document Archive. This is typically saved with the file extension .okular.

Reference: Annotations documentation page about Okular

Tried with: Okular 4.13.3 and Ubuntu 14.04


How to view fonts in a PDF using PDFFonts

PDFFonts is a useful tool to view information about the fonts in a PDF file. Typically, you need to bother with this only if you are having problems with submitting a PDF online or printing it.

This tool ships along with Poppler. To install it:

$ sudo apt install poppler-utils

Usage is straightforward. It is illustrated here with a sample PDF file:

$ pdffonts foo.pdf 
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
MMFXPR+NimbusSanL-Bold               Type 1            Custom           yes yes no      25  0
KPMIUD+NimbusRomNo9L-ReguItal        Type 1            Custom           yes yes no      31  0
Arial,Bold                           TrueType          WinAnsi          no  no  no      77  0
Times New Roman,Bold                 TrueType          WinAnsi          no  no  no      78  0
ABCDEE+Calibri                       TrueType          WinAnsi          yes yes no      79  0
NHFWQZ+CMMI10                        Type 1            Builtin          yes yes no     160  0
COBEQK+CMMI7                         Type 1            Builtin          yes yes no     166  0
FOCRQR+CMR12                         Type 1            Custom           yes yes no     175  0
UJTTFG+CMR8                          Type 1            Builtin          yes yes no     266  0
DejaVuSans                           Type 3            Custom           yes no  no     283  0
SBITTL+CMMI8                         Type 1            Builtin          yes yes no     331  0
Cmr10                                Type 3            Custom           yes no  no     475  0
DejaVuSans                           Type 3            Custom           yes no  no     958  0

It displays various attributes of each font:

  • Name of the font
  • Type of the font (Type 1, Type 3 or TrueType)
  • The encoding (Builtin, WinAnsi or Custom)
  • Whether the font is embedded in the PDF (emb)
  • Whether the font is a subset (sub)
  • Whether the font has a Unicode mapping (uni)
  • ID of the font object in the PDF (object number and generation)
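
When a PDF is rejected online or prints badly, fonts that are not embedded are a common culprit. Here is a small sketch that filters the pdffonts output down to those rows. The function name is made up, and it assumes the column layout shown in the sample output above.

```shell
# Read pdffonts output on stdin and print the rows whose "emb" column is "no".
# Font names and types can contain spaces, so count columns from the right:
# the last two fields are the object ID, preceded by uni, sub and emb.
not_embedded() {
    awk 'NR > 2 && NF >= 8 && $(NF-4) == "no"'
}

# Usage:
#   pdffonts foo.pdf | not_embedded
```

For the sample file above, this would list the Arial,Bold and Times New Roman,Bold rows.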

Tried with: Poppler-Utils 0.24.5 and Ubuntu 14.04

How to view PDF in terminal using FBGS

Framebuffer Ghostscript Viewer (fbgs) can be used to view PostScript (PS) and PDF files at the terminal. However, it only works with real terminals (/dev/tty) and not with pseudo terminals (/dev/pts).

fbgs ships along with the fbi package. So, to install it:

$ sudo apt install fbi

To enable a user to run the program, that username must be added to the video group:

$ sudo adduser joe video

To view a PDF file:

$ fbgs foo.pdf
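
Since fbgs only works on a real terminal, it can help to check where you are before running it. The sketch below is a rough check based on the device name only. The function name is made up, and the Ctrl+Alt+F2 hint assumes the usual Linux virtual console shortcuts.

```shell
# Succeed if the given device path looks like a Linux virtual console
# (/dev/tty1, /dev/tty2, ...) rather than a pseudo terminal (/dev/pts/N).
is_virtual_console() {
    case "$1" in
        /dev/tty[0-9]*) return 0 ;;
        *) return 1 ;;
    esac
}

if is_virtual_console "$(tty)"; then
    echo "virtual console: fbgs should be able to draw here"
else
    echo "not a virtual console: switch to one (e.g. Ctrl+Alt+F2) first"
fi
```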

Tried with: FBI 2.07-11 and Ubuntu 14.04

How to convert PDF to image

You may sometimes want to convert a PDF file into an image format like PNG or JPG. Doing this is easy using the convert application from ImageMagick.

If you do not have it, first install ImageMagick:

$ sudo apt install imagemagick

To convert a PDF with a single page into a JPG:

$ convert foo.pdf foo.jpg
$ ls

If the PDF has multiple pages, one image file is produced per page, numbered from zero (foo-0.png, foo-1.png and so on):

$ convert foo.pdf foo.png
$ ls

By default, the PDF is rendered at 72 DPI. If you need a higher resolution, use the -density option. For example, to render at 300 DPI:

$ convert -density 300 foo.pdf foo.png

Note: The -density option has to come before the input filename. If you place it after the input, the program runs silently without complaining, but the output image will be at the default DPI.
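
To pick a density, it helps to estimate how many pixels you will get. PDF page sizes are measured in points, with 72 points to an inch, so the pixel size is roughly points × DPI / 72. The sketch below works this out for a US Letter page. The function name is made up, and the integer arithmetic truncates, so ImageMagick's actual output could differ by a pixel.

```shell
# Pixels along one side = length in points * DPI / 72 (72 points per inch).
pixels() {
    echo $(( $1 * $2 / 72 ))
}

# US Letter is 612 x 792 points.
echo "$(pixels 612 300) x $(pixels 792 300)"
```

This prints 2550 x 3300, which is the expected size of a Letter page at 300 DPI.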

Tried with: ImageMagick 6.7.7-10 and Ubuntu 14.04

How to shrink a scanned PDF

Documents that are generated from a scanner typically are in the PDF format. Sometimes the scanned PDF given by others can be really huge, running into hundreds of MBs for just a few pages. Ghostscript is the most common program used to shrink the PDF file. There is a long invocation to Ghostscript with multiple input parameters that is required to do this correctly. Thankfully, we have easier-to-use programs that invoke Ghostscript correctly for us.

Install Ghostscript and ImageMagick if you do not have them:

$ sudo apt install ghostscript imagemagick

ImageMagick’s convert utility can be used to compress the PDF. By default, it tries to use a huge DPI, which makes Ghostscript occupy all the RAM and brings the computer to its knees. Instead, tell convert what DPI to use. Start from a small DPI and work your way up until you are satisfied with the output quality.

For example:

$ convert -density 20 original.pdf out.pdf
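
Working your way up by hand gets tedious, so here is a small sketch that tries several densities in one go and reports the size of each result. The function name and the output file naming are made up for illustration. It uses the same convert invocation as above.

```shell
# Convert $1 at each density given in the remaining arguments and report
# the size in bytes of each result, so the outputs can be compared by eye.
try_densities() {
    src=$1
    shift
    for dpi in "$@"; do
        out="out-${dpi}dpi.pdf"
        convert -density "$dpi" "$src" "$out" || return 1
        echo "$dpi DPI: $(wc -c < "$out") bytes"
    done
}

# Usage:
#   try_densities original.pdf 20 50 100 150
```

Open each output file and keep the smallest one that is still readable.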

If you are curious about the Ghostscript invocation it is using to perform the conversion, ask it to be verbose:

$ convert -density 20 -verbose original.pdf out.pdf

Note: Another way to shrink a PDF is to convert it to PS (pdf2ps) and then back to PDF (ps2pdf). I would not recommend this since it does double the work, takes longer, creates a ginormous intermediate PS file and gives you no control over the DPI or compression ratio.

Tried with: Ubuntu 14.04

How to add OCR to PDF using PDFOCR

PDFOCR is a Ruby script that can be used to add OCR text to a scanned PDF file.

  • Install the OCR engines that it depends on:
$ sudo apt install tesseract-ocr tesseract-ocr-eng exactimage
  • Get the PDFOCR script:
$ git clone
  • Use it to add OCR to a scanned PDF:
$ pdfocr.rb -i foo.pdf -o out.pdf

Tried with: Ubuntu 14.04