PDF files created by scanning pages, or by printing or exporting from other file formats, may not have a text layer. Without a text layer, every page is just an image in which you cannot search or highlight text. The OCRmyPDF tool can add an OCR text layer to such a PDF easily.
Installing is easy:
$ sudo apt install ocrmypdf
Usage is straightforward:
$ ocrmypdf in.pdf out.pdf
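If you have a whole directory of scanned PDFs, the same command can be wrapped in a small loop. The following is only a sketch: the `_ocr.pdf` naming scheme, the function names and the `OCRMYPDF` override variable are my own assumptions, not part of the tool. The `--skip-text` option tells ocrmypdf to leave alone any pages that already contain text.

```shell
#!/bin/sh
# Derive the output name: report.pdf -> report_ocr.pdf
# (this naming scheme is an assumption made for this sketch).
ocr_output_name() {
    printf '%s_ocr.pdf\n' "${1%.pdf}"
}

# Run OCR on every PDF in the given directory.
# OCRMYPDF is a hypothetical override variable; it defaults to ocrmypdf.
ocr_dir() {
    dir=$1
    for in_pdf in "$dir"/*.pdf; do
        [ -e "$in_pdf" ] || continue              # the glob matched nothing
        case $in_pdf in *_ocr.pdf) continue ;; esac   # skip earlier outputs
        out_pdf=$(ocr_output_name "$in_pdf")
        # --skip-text: do not OCR pages that already have a text layer
        "${OCRMYPDF:-ocrmypdf}" --skip-text "$in_pdf" "$out_pdf"
    done
}
```

After pasting the functions into a shell, `ocr_dir .` processes the current directory and writes each result next to its input.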
I noticed that adding an OCR text layer increased the PDF file size by 1.5x! The tool itself also reports at the end of the run that this increase in file size is surprising.
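To put a number on the growth for your own files, a tiny helper can compare the two sizes. This is a sketch; the function name `size_ratio` is made up for illustration.

```shell
# Print how many times larger the second file is than the first,
# e.g. "1.50" when the output is one and a half times the input size.
size_ratio() {
    before=$(wc -c < "$1")
    after=$(wc -c < "$2")
    awk -v b="$before" -v a="$after" 'BEGIN {
        if (b + 0 == 0) { print "input file is empty"; exit 1 }
        printf "%.2f\n", a / b
    }'
}
```

Run it as `size_ratio in.pdf out.pdf`. If the increase bothers you, ocrmypdf has an `--optimize` option taking levels 0 to 3 that may be worth experimenting with. I have not measured its effect here, and the higher levels may need extra helper programs installed.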
A DjVu document typically contains both a layer with the scanned image and a layer with the text in that image. Sometimes a DjVu document is produced without the text layer, which makes it hard to search for text in the document.
Recognising the text in a DjVu document using OCR and adding it as a text layer to the document is easy:
You have scanned a paper or a section of a book and converted it to a PDF. What next? The next thing to do is to run OCR on the PDF.
I use Adobe Acrobat for this. At this point, the converted PDF is only a container for the scanned bitmap images. By running OCR on it, Acrobat can recognize the text in the images and embed it along with them. This way you can select and copy text in the PDF, and also search for text in the document.
To do OCR:
Choose Document → OCR Text Recognition → Recognize text using OCR …
In the Recognize Text dialog that pops up, the default options should be fine. Click OK.
The OCR then runs on each page serially and may take some time on long documents. You may also notice that the scanned images in the document get straightened a bit and may also get downsampled. After the OCR is complete, save the PDF.
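Once the PDF is saved, you can check from a terminal that it really has a text layer now. The sketch below assumes the `pdftotext` program from poppler-utils is installed, which is a tool of my choosing and has nothing to do with Acrobat. The function name and the `PDFTOTEXT` override variable are hypothetical.

```shell
# Succeed if the PDF yields any non-whitespace text, i.e. has a text layer.
# PDFTOTEXT is a hypothetical override variable; it defaults to pdftotext.
has_text_layer() {
    # "-" makes pdftotext write the extracted text to standard output
    "${PDFTOTEXT:-pdftotext}" "$1" - 2>/dev/null | grep -q '[^[:space:]]'
}
```

For example, `has_text_layer scan.pdf && echo "searchable"` prints the message only when some text could be extracted.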