How to Convert Scanned PDF to Text Searchable PDF on Ubuntu – Install OCRmyPDF on Ubuntu

OCRmyPDF is a free, open source OCR (Optical Character Recognition) application for Linux, written in Python. Older releases were licensed under the GNU General Public License v3.0; current releases use the Mozilla Public License 2.0. It adds an OCR text layer to scanned PDF files, turning them into text-searchable PDFs from which you can also copy and paste text. Among its features, it keeps the original images at their correct resolution in the output and validates the input and output PDF files. Recognition is done by the Tesseract OCR engine, which supports over 100 languages.

Install OCRmyPDF on Ubuntu

To install OCRmyPDF on Ubuntu, open a terminal (Ctrl + Alt + T) and run these commands.

sudo apt update
sudo apt install ocrmypdf
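
To confirm that the installation worked, a quick check along these lines may help. It is only a sketch: check_tool is a hypothetical helper name, and the script assumes that the ocrmypdf package also pulled in the tesseract command.

```shell
# Report whether a command is available on the PATH.
# check_tool is a hypothetical helper, not part of OCRmyPDF.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

check_tool ocrmypdf
check_tool tesseract

# Print the installed version and the installed OCR languages,
# but only when the tools are actually present.
if command -v ocrmypdf >/dev/null 2>&1; then
    ocrmypdf --version
fi
if command -v tesseract >/dev/null 2>&1; then
    tesseract --list-langs
fi
```

If either tool is reported as missing, run the install commands again and look for errors in the apt output.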

If prompted, enter your Ubuntu user password.

Install additional language packs.

Run this command in a terminal to see a list of all available tesseract language packs.

apt-cache search tesseract-ocr

For example, to install the Tamil language pack from that list, run this command.

sudo apt install tesseract-ocr-tam
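
Installing a language pack does not by itself make OCRmyPDF use it. The language is selected with the -l option, and several languages can be joined with "+". The sketch below assumes a scanned file called input.pdf in the current folder; join_langs is a hypothetical helper added only for illustration.

```shell
# Join language codes with "+", the form the -l option expects.
# join_langs is a hypothetical helper, not part of OCRmyPDF.
join_langs() {
    local IFS='+'
    echo "$*"
}

input="input.pdf"     # assumed file name, replace with your scan
output="output.pdf"   # assumed file name for the result

# "tam" alone for Tamil-only scans, or English plus Tamil for mixed text.
langs="$(join_langs eng tam)"

if command -v ocrmypdf >/dev/null 2>&1 && [ -f "$input" ]; then
    ocrmypdf -l "$langs" "$input" "$output"
else
    echo "skipped: ocrmypdf or $input is not available"
fi
```

The three-letter codes match the suffix of the package name, so tesseract-ocr-tam provides the tam language.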

Convert scanned PDF to text-searchable PDF:


ocrmypdf input.pdf output.pdf

Replace input.pdf with the scanned filename and output.pdf with the new filename.
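
Real scans are often crooked or rotated, and some PDFs already contain text on a few pages, in which case OCRmyPDF may stop with an error instead of converting. A possible variant of the basic command is sketched below. The choice of options is an assumption about what a typical scan needs, and run_ocr is a hypothetical wrapper name.

```shell
# Run OCRmyPDF with a few commonly useful options.
# run_ocr is a hypothetical wrapper; $1 is the input, $2 the output.
run_ocr() {
    # --deskew        straighten pages that were scanned at a slight angle
    # --rotate-pages  turn pages that were scanned sideways or upside down
    # --skip-text     leave pages that already contain text untouched
    ocrmypdf --deskew --rotate-pages --skip-text "$1" "$2"
}

if command -v ocrmypdf >/dev/null 2>&1 && [ -f input.pdf ]; then
    run_ocr input.pdf output.pdf
else
    echo "skipped: ocrmypdf or input.pdf is not available"
fi
```

Drop any option you do not need; the plain command with only the two file names is enough for clean, upright scans.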


As an example, first change to the folder that contains the scanned PDF. Here the scanned file is in the Downloads folder, so run this in the terminal.

cd Downloads

Then run

ocrmypdf scanned.pdf newgeneratedfilename.pdf

Here "scanned.pdf" is the name of my PDF file in the Downloads folder. After the command finishes, a new file named "newgeneratedfilename.pdf" appears in the same Downloads folder. When you open the new file, its text is searchable and you can also copy and paste it.
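
The result can also be checked from the terminal without opening a PDF viewer. This sketch assumes the pdftotext tool from the poppler-utils package (sudo apt install poppler-utils), which is not part of OCRmyPDF. pdf_has_word is a hypothetical helper, and the search word is a placeholder.

```shell
# Succeed when the PDF's extractable text contains the given word.
# pdf_has_word is a hypothetical helper; it relies on pdftotext.
pdf_has_word() {
    # $1 = PDF file, $2 = word to look for (case-insensitive)
    pdftotext "$1" - | grep -qi -- "$2"
}

pdf="newgeneratedfilename.pdf"
word="the"   # placeholder, pick a word you know appears in your scan

if command -v pdftotext >/dev/null 2>&1 && [ -f "$pdf" ]; then
    if pdf_has_word "$pdf" "$word"; then
        echo "text layer found"
    else
        echo "word not found"
    fi
else
    echo "skipped: pdftotext or $pdf is not available"
fi
```

Running the same check on the original scanned.pdf should report that the word was not found, since a pure image scan has no text to extract.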

Using OCRmyPDF:

Run this command in a terminal for usage details.

ocrmypdf -h
