OCRmyPDF is a free, open source OCR (Optical Character Recognition) application for Linux. It is written in Python and released under the Mozilla Public License 2.0. It adds an OCR text layer to scanned PDF files, turning them into text-searchable PDFs so that you can search the text or copy and paste it. Its features include keeping the original images at their correct resolution in the output and validating the input and output PDF files. It uses the Tesseract OCR engine to recognize text and supports more than 100 languages.
Install OCRmyPDF on Ubuntu
To install OCRmyPDF on Ubuntu, open a terminal (Ctrl+Alt+T) and run the following commands:
sudo apt update
sudo apt install ocrmypdf
If prompted, enter your Ubuntu user password.
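To confirm that the installation worked, a small check like the one below can help. It is a minimal bash sketch: check_ocr_tools is a hypothetical helper name, and it only assumes that the Ubuntu package puts the ocrmypdf command and the tesseract engine it depends on in your PATH.

```shell
# Hypothetical helper (bash): report whether ocrmypdf and the Tesseract
# engine it relies on can be found on the PATH.
# Returns non-zero if either command is missing.
check_ocr_tools() {
  local tool missing=0
  for tool in ocrmypdf tesseract; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

After defining it, run check_ocr_tools && ocrmypdf --version to see both checks and the installed OCRmyPDF version.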
Install additional language packs
To see a list of all available Tesseract language packs, run this command in a terminal:
apt-cache search tesseract-ocr
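The search prints one package per line in the form "package-name - description". If you only want the language suffixes (for example tam for Tamil), a filter like the following sketch can pull them out. tesseract_lang_suffixes is a hypothetical helper name; it assumes Ubuntu's tesseract-ocr-<suffix> package naming. For most languages the suffix matches the Tesseract language code, but variants differ (the package chi-sim corresponds to the code chi_sim).

```shell
# Hypothetical helper (bash): read "apt-cache search" output on stdin and
# print the part of each tesseract-ocr-<suffix> package name after the prefix.
# The first expression drops the tesseract-ocr-all metapackage if it is listed;
# lines that do not start with "tesseract-ocr-" are ignored.
tesseract_lang_suffixes() {
  sed -n \
    -e '/^tesseract-ocr-all /d' \
    -e 's/^tesseract-ocr-\([A-Za-z0-9_-]*\) - .*/\1/p'
}
```

Usage: apt-cache search tesseract-ocr | tesseract_lang_suffixes. To see which languages are already installed, tesseract --list-langs prints them.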
For example, to install the Tamil language pack from that list, run:
sudo apt install tesseract-ocr-tam
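Installing a pack only makes the language data available. You still tell OCRmyPDF which language the document uses with its -l (--language) option, for example ocrmypdf -l tam input.pdf output.pdf, and several languages can be joined with +, as in eng+tam. The bash sketch below wraps this and first checks, with tesseract --list-langs, that each requested language is installed; ocr_with_lang is a hypothetical helper name, not part of OCRmyPDF.

```shell
# Hypothetical helper (bash): OCR a PDF in the given language(s) after
# checking that Tesseract has the matching language data installed.
# Usage: ocr_with_lang eng+tam input.pdf output.pdf
ocr_with_lang() {
  local lang="$1" input="$2" output="$3" code
  # Split "eng+tam" into "eng" and "tam" and check each one.
  for code in ${lang//+/ }; do
    # --list-langs prints a header line, then one language code per line.
    # Some Tesseract versions write this list to stderr, so merge both streams.
    if ! tesseract --list-langs 2>&1 | grep -qx -- "$code"; then
      echo "Tesseract language '$code' is not installed" >&2
      return 1
    fi
  done
  ocrmypdf -l "$lang" "$input" "$output"
}
```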
Convert a scanned PDF to a text-searchable PDF
ocrmypdf input.pdf output.pdf
Replace input.pdf with the name of your scanned file and output.pdf with the name you want for the new, searchable file.
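If you have several scanned files, you can run the same command on each of them in a loop. The following is a minimal bash sketch, not a feature of OCRmyPDF itself: ocr_folder is a hypothetical helper, and the _ocr.pdf suffix for output files is an arbitrary naming choice so that the originals are never overwritten.

```shell
# Hypothetical helper (bash): run ocrmypdf on every *.pdf file in a folder,
# writing name_ocr.pdf next to each original. Files that already end in
# _ocr.pdf are skipped. Returns non-zero if any file failed.
ocr_folder() {
  local dir="${1:-.}" input output failed=0
  for input in "$dir"/*.pdf; do
    [ -e "$input" ] || continue   # no match: the glob stays unexpanded
    case "$input" in
      *_ocr.pdf) continue ;;      # output of an earlier run
    esac
    output="${input%.pdf}_ocr.pdf"
    echo "OCR: $input -> $output"
    if ! ocrmypdf "$input" "$output"; then
      echo "failed: $input" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

Usage: ocr_folder ~/Downloads processes every PDF in that folder; with no argument it uses the current folder.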
For example, suppose the scanned file is in the Downloads folder. In the terminal, first change to that folder, then run OCRmyPDF:
cd ~/Downloads
ocrmypdf scanned.pdf newgeneratedfilename.pdf
Here, scanned.pdf is the name of my PDF file in the Downloads folder. After the command finishes, a new file named newgeneratedfilename.pdf appears in the same folder. When you open it, the text is searchable and you can copy and paste it.
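To check from the terminal that the new file really has a text layer, you can try to extract its text. This sketch assumes pdftotext is installed; it is not part of OCRmyPDF but comes from the poppler-utils package. has_text_layer is a hypothetical helper name.

```shell
# Hypothetical helper (bash): succeed if the PDF contains extractable text.
# Requires pdftotext from poppler-utils (an assumption, not part of OCRmyPDF).
# Returns 0 if text was found, 1 if none, 2 if pdftotext failed.
has_text_layer() {
  local text
  # "-" tells pdftotext to write the extracted text to standard output.
  text=$(pdftotext "$1" - 2>/dev/null) || return 2
  # Ignore whitespace and page breaks; an image-only scan yields nothing else.
  [ -n "$(printf '%s' "$text" | tr -d '[:space:]')" ]
}
```

Usage: has_text_layer newgeneratedfilename.pdf && echo "searchable"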
For full usage details and the list of options, run ocrmypdf --help in a terminal.
To be continued.