There are several ways to convert a DjVu file to PDF. All these methods result in PDF files that are quite large, in the hundreds of MBs. I have found that converting with Foxit Reader results in the smallest PDF files.
To do this:
- Install the free Foxit Reader from here. This also installs a PDF printer, which we will use for the conversion.
- Install any DjVu viewer. I use WinDjView from here.
- Open the DjVu file in WinDjView and choose to print it. In the printer list, choose Foxit Reader PDF Printer. This printer has settings that you can modify. For the print settings, choose a page size (Letter) and remember to choose Scale to fit media. The Foxit PDF Printer then writes a PDF file for you.
Note that the settings of the Foxit Reader PDF Printer did not seem to have much effect on the file size. The default is 600 DPI. I did not see much reduction in file size by using 300 or 200 DPI.
Note that the resulting PDF does not have OCR. You would need to run it through an OCR tool to get text embedded in the PDF.
Tried with: Foxit Reader 18.104.22.1684, WinDjView 2.1 and Windows 10
I have a PDF file with a single page that I want to print as a large poster (say size A1). However, I do not have an A1 printer. I only have the common A4 printer.
So, I would like to cut up the PDF page into multiple smaller pages so that I can print them out on my A4 printer and paste them together into an A1 poster.
This seems like a messy problem requiring grunt work. Thankfully, there is a simple tool that solves this problem: PDFPoster!
$ sudo apt install pdfposter
Just specify the paper size of your printer and the intended size of the poster and it does the job. For example, say I have an A4 printer but I need to print an A1 size poster:
$ pdfposter -mA4 -pA1 in.pdf out.pdf
The output PDF contains 9 pages that I can print on A4 pages, arrange as a 3×3 grid to get my A1 poster! 😄
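The 3×3 grid can be checked with a little arithmetic on the ISO paper sizes. This is only a rough sketch: the function name tiles_needed is made up for illustration, and it assumes the page is neither rotated nor overlapped and that the printer has no margins, which the real tool may handle differently.

```shell
# Hypothetical helper: how many sheets are needed to tile a poster,
# assuming no rotation, no overlap and no printer margins.
# Arguments: poster width, poster height, sheet width, sheet height (all in mm).
tiles_needed() {
    cols=$(( ($1 + $3 - 1) / $3 ))   # ceiling of poster width / sheet width
    rows=$(( ($2 + $4 - 1) / $4 ))   # ceiling of poster height / sheet height
    echo "$cols $rows $(( cols * rows ))"
}

# A1 is 594 x 841 mm, A4 is 210 x 297 mm
tiles_needed 594 841 210 297   # prints: 3 3 9
```

The three numbers are columns, rows and total sheets.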
Tried with: PDFPoster 0.6.0-1 and Ubuntu 14.04
Google Desktop was a good search tool for finding documents on your local computer that contained a particular text. It was useful because it indexed all types of documents, such as MS Office, Open Office and PDF, not just text files.
Since Google Desktop is discontinued, I have found that a good replacement is Recoll. It works similarly and supports many more document formats, including DjVu. It can be a good tool to search your library of documents and papers.
$ sudo apt install recoll
When you start Recoll for the first time, you can set the directory containing your documents and the time of day at which a Cron job indexes new document files. You will need to wait until this initial indexing has finished before you can search.
Usage is straightforward: type in the text or phrase you are looking for and it shows the documents which have it and some excerpts from them to make it easy to pick what you want.
Recoll learns how to index a type of file by using a helper. You can see which helpers are needed for the documents in your directory, by clicking File -> Show missing helpers.
If you have MS Word documents, install Antiword:
$ sudo apt install antiword
If you have RTF documents, install UnRTF:
$ sudo apt install unrtf
If you have EPub documents, install the Python EPub module:
$ sudo pip install epub
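To see from the terminal which of the command-line helpers are already installed, something like the following could be used. It is only a sketch: check_helpers is a made-up name, the list of helpers is just the two mentioned above, and it does not cover the EPub helper since that is a Python module and not a program.

```shell
# Report whether each helper program named on the command line is on the PATH.
# The names passed in are only examples; Recoll itself decides what it needs.
check_helpers() {
    for helper in "$@"; do
        if command -v "$helper" >/dev/null 2>&1; then
            echo "$helper: found"
        else
            echo "$helper: missing"
        fi
    done
}

check_helpers antiword unrtf
```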
Tried with: Recoll 1.17.3 and Ubuntu 14.04
Okular is not only a great viewer for documents, but it can also be used to annotate and take notes on these documents. This is typically used with PDF documents.
If you want to work with DjVu and other formats, remember to install those backends:
$ sudo apt install okular-extra-backends
- To annotate, open the document and choose Tools -> Review. The keyboard shortcut for this is
A sidebar appears with buttons to add popup notes, freehand, highlight and other operations.
Single-click any of these buttons to perform that operation once. To perform it again, click the button again.
Double-click any of these buttons to perform that operation multiple times. Press Esc to disable the operation when you are done.
To edit the color, thickness or any other property of these annotation tools, right-click anywhere in the sidebar and choose Configure Annotations.
To remove any annotation (even from a PDF file), right-click on it and choose Delete.
To move annotations such as popup notes, hold the Ctrl key while you click and drag them with the mouse.
By default, the annotations are saved locally in a hidden file in the home directory.
To save the annotations along with the file, save it as an Okular document archive. To do this, choose File -> Export as -> Document Archive. This is typically saved with the file extension
Reference: Annotations documentation page about Okular
Tried with: Okular 4.13.3 and Ubuntu 14.04
PDFFonts is a useful tool to view information about the fonts in a PDF file. Typically, you need to bother with this only if you are having problems with submitting a PDF online or printing it.
This tool ships along with Poppler. To install it:
$ sudo apt install poppler-utils
Usage is straightforward. It is illustrated here with a sample PDF file:
$ pdffonts foo.pdf
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
MMFXPR+NimbusSanL-Bold               Type 1            Custom           yes yes no      25  0
KPMIUD+NimbusRomNo9L-ReguItal        Type 1            Custom           yes yes no      31  0
Arial,Bold                           TrueType          WinAnsi          no  no  no      77  0
Times New Roman,Bold                 TrueType          WinAnsi          no  no  no      78  0
ABCDEE+Calibri                       TrueType          WinAnsi          yes yes no      79  0
NHFWQZ+CMMI10                        Type 1            Builtin          yes yes no     160  0
COBEQK+CMMI7                         Type 1            Builtin          yes yes no     166  0
FOCRQR+CMR12                         Type 1            Custom           yes yes no     175  0
UJTTFG+CMR8                          Type 1            Builtin          yes yes no     266  0
DejaVuSans                           Type 3            Custom           yes no  no     283  0
SBITTL+CMMI8                         Type 1            Builtin          yes yes no     331  0
Cmr10                                Type 3            Custom           yes no  no     475  0
DejaVuSans                           Type 3            Custom           yes no  no     958  0
It displays various attributes of the font:
- Name of the font
- Type of the font (Type 1, Type 3 or TrueType)
- The encoding (Builtin, WinAnsi or Custom)
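- Whether the font is embedded in the PDF (emb)
- Whether the font is a subset (sub)
- Whether the PDF has a ToUnicode map for the font (uni)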
- ID of the object in the PDF which uses this font
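When a PDF is rejected by an online submission system, the fonts that are not embedded are usually the ones to look at. A small filter like the one below can pick them out. This is a sketch that assumes the column layout shown in the sample output (emb, sub, uni followed by a two-number object ID); other Poppler versions may print different columns, and not_embedded is a made-up name.

```shell
# Print the rows of pdffonts output (read from standard input) whose "emb"
# column is "no". Font names and types can contain spaces, so the column is
# counted from the end of the line: emb sub uni are followed by two numbers.
not_embedded() {
    awk 'NR > 2 && $(NF - 4) == "no"'
}

# Usage: pdffonts foo.pdf | not_embedded
```

For the sample file above, this would list the Arial,Bold and Times New Roman,Bold rows.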
Tried with: Poppler-Utils 0.24.5 and Ubuntu 14.04
Framebuffer Ghostscript Viewer (fbgs) can be used to view PostScript (PS) and PDF files at the terminal. However, it only works with real terminals (/dev/tty) and not with pseudo terminals.
fbgs ships along with the fbi package. So, to install it:
$ sudo apt install fbi
To enable a user to run the program, add the username to the video group:
$ sudo adduser joe video
To view a PDF file:
$ fbgs foo.pdf
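Since fbgs needs a real terminal, it can help to check what kind of terminal you are on before launching it. The following is a rough sketch, not part of fbgs itself: terminal_kind is a made-up name, and it assumes the usual Linux naming where virtual consoles look like /dev/tty1 and pseudo terminals look like /dev/pts/0.

```shell
# Hypothetical pre-flight check before launching fbgs.
# Classify a terminal device path as reported by the tty command.
# Assumption: virtual consoles are named /dev/tty1, /dev/tty2, ... and
# pseudo terminals /dev/pts/0, /dev/pts/1, ...
terminal_kind() {
    case "$1" in
        /dev/tty[0-9]*) echo "real" ;;
        /dev/pts/*)     echo "pseudo" ;;
        *)              echo "unknown" ;;
    esac
}

# tty prints "not a tty" when standard input is not a terminal
terminal_kind "$(tty)"
```

If this prints pseudo, you are probably in a terminal emulator or an SSH session and would need to switch to a virtual console first.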
Tried with: FBI 2.07-11 and Ubuntu 14.04
You may sometimes want to convert a PDF file into an image format like PNG or JPG. Doing this is easy using the convert application from ImageMagick.
If you do not have it, first install ImageMagick:
$ sudo apt install imagemagick
To convert a PDF with a single page into a JPG:
$ convert foo.pdf foo.jpg
If the PDF has multiple pages, one image file is produced per page (named foo-0.png, foo-1.png and so on):
$ convert foo.pdf foo.png
By default, the PDF is rendered at 72 DPI. If you need a higher resolution, use the -density option. For example, to generate the image at 300 DPI:
$ convert -density 300 foo.pdf foo.png
Note that the -density option has to come before the input file name. If you place it anywhere else, the program runs silently without complaining, but the output image will be at the default DPI.
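To get a feel for how large the image will be at a given density, the pixel size can be estimated from the page size. This is a back-of-the-envelope sketch with a made-up function name; the size convert actually produces depends on the page dimensions stored in the PDF and may differ by a pixel or so.

```shell
# Pixel length of a page side rendered at a given density.
# Arguments: length in mm, density in DPI. One inch is 25.4 mm; the
# arithmetic is scaled by 10 to stay in integers and rounds to nearest.
pixels() {
    echo $(( ($1 * $2 * 10 + 127) / 254 ))
}

# An A4 page is 210 x 297 mm
echo "$(pixels 210 300) x $(pixels 297 300)"   # prints: 2480 x 3508
```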
Tried with: ImageMagick 6.7.7-10 and Ubuntu 14.04
Documents that are generated from a scanner typically are in the PDF format. Sometimes the scanned PDF given by others can be really huge, running into hundreds of MBs for just a few pages. Ghostscript is the most common program used to shrink the PDF file. There is a long invocation to Ghostscript with multiple input parameters that is required to do this correctly. Thankfully, we have easier-to-use programs that invoke Ghostscript correctly for us.
Install Ghostscript and ImageMagick if you do not have them:
$ sudo apt install ghostscript imagemagick
ImageMagick’s convert utility can be used to compress the PDF. By default, it tries to use a huge DPI, which makes Ghostscript occupy all the RAM and brings the computer to its knees. Instead, tell convert what DPI to use. Start from a small DPI and work your way up until you are satisfied with the output quality.
$ convert -density 20 original.pdf out.pdf
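Working up through the densities by hand gets tedious, so the attempts can be scripted. The sketch below only prints the commands unless RUN=1 is set in the environment. The function name, the RUN switch and the out-20.pdf style output names are all made up for this example.

```shell
# Print (or, with RUN=1, execute) one convert command per density so the
# results can be compared. Output names like out-20.pdf are made up here.
# Arguments: input PDF, then the densities to try, smallest first.
density_sweep() {
    input=$1
    shift
    for dpi in "$@"; do
        if [ "${RUN:-0}" = "1" ]; then
            convert -density "$dpi" "$input" "out-$dpi.pdf"
            ls -l "out-$dpi.pdf"    # show the resulting file size
        else
            echo "convert -density $dpi $input out-$dpi.pdf"
        fi
    done
}

density_sweep original.pdf 20 40 80
```

Printing the commands first makes it easy to review them before letting Ghostscript loose on a large file.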
If you are curious about the Ghostscript invocation it is using to perform the conversion, ask it to be verbose:
$ convert -density 20 -verbose original.pdf out.pdf
Note: Another way to shrink a PDF is to convert it to PS (pdf2ps) and then back to PDF (ps2pdf). I would not recommend this: it does double the work, takes longer, creates a huge intermediate PS file and gives you no control over the DPI or compression ratio.
Tried with: Ubuntu 14.04
PDFOCR is a Ruby script that can be used to add OCR text to a scanned PDF file.
Install the OCR engines that it depends on:
$ sudo apt install tesseract-ocr tesseract-ocr-eng exactimage
Download the script:
$ git clone https://github.com/gkovacs/pdfocr
Use it to add OCR to a scanned PDF:
$ pdfocr.rb -i foo.pdf -o out.pdf
Tried with: Ubuntu 14.04