📅 2009-Dec-29 ⬩ ✍️ Ashwin Nanjappa ⬩ 🏷️ adobe acrobat, ocr ⬩ 📚 Archive
You have scanned a paper or a section of a book and converted it to a PDF. What next? The best next step is to run OCR (optical character recognition) on the PDF.
I use Adobe Acrobat for this. Right now, the converted PDF document acts only as a container for the scanned bitmap images. When you run OCR on it, Acrobat recognizes the text in each image and embeds that text alongside the image. You can then select and copy text in the PDF, and also search for text in the document.
To do OCR:
Choose Document → OCR Text Recognition → Recognize text using OCR …
In the Recognize Text dialog that pops up, the default options should be fine. Click OK.
The OCR then runs on each page in turn, which can take a while on long documents. You may also notice that the scanned images in the document get straightened slightly and possibly downsampled. After the OCR is complete, save the PDF.