rga: search text in PDF, eBooks, Office documents, archives, etc. (a ripgrep wrapper)

rga (or ripgrep-all) is a command-line tool for recursively searching all files in a directory for regular expression patterns, and it runs on Linux, macOS, and Windows. It is a wrapper around ripgrep, a line-oriented recursive search program, and extends it to search many more file types, such as PDF, DOCX, ODT, EPUB, SQLite databases, movie subtitles embedded in MKV or MP4 files, ZIP or GZ archives, and more.

rga is well suited to searching for text in a folder that contains many files of many different types, even when some of those files are inside archives.
Because it is multithreaded, it is fast even on the first run, and subsequent runs are faster still (close to searching plain text files) thanks to its cache. If needed, you can disable caching with --rga-no-cache.
rga uses ripgrep (rg) for the actual search and sets some of its options. For some file types it relies on external programs: for example, it uses ffmpeg to read subtitles from MKV or MP4 files and pandoc to convert documents such as EPUB, ODT, DOCX, FB2, or IPYNB to plain Markdown text, while the zip and tar adapters read the contents of archives. Besides searching text in documents, archives, and subtitles embedded in MKV or MP4 files, rga can also perform OCR (using tesseract). This feature is disabled by default because it is slow and rarely useful, but you can enable it with --rga-adapters=+pdfpages,tesseract.
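As a minimal sketch of how the cache and OCR options above might be used day to day, the shell functions below wrap them so they can be dropped into ~/.bashrc. The function names rga_nocache and rga_ocr are hypothetical; -j1 is ripgrep's short form of --threads=1, which rga passes through to ripgrep.

```shell
# Hypothetical helper: search without using rga's cache.
rga_nocache() {
  rga --rga-no-cache "$@"
}

# Hypothetical helper: also OCR scanned PDFs and images.
# pdfpages renders PDF pages to PNG images and tesseract extracts their text;
# -j1 (ripgrep's --threads=1) runs one job at a time so OCR does not
# overload the machine.
rga_ocr() {
  rga --rga-adapters=+pdfpages,tesseract -j1 "$@"
}
```

For example, rga_ocr "invoice number" ~/Scans would search a folder of scanned documents, assuming tesseract is installed.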
This is a list of rga (ripgrep-all) adapters and supported file types:

  • ffmpeg:
    • Extract video metadata / chapters and subtitles using ffmpeg
    • Extensions: .mkv, .mp4, .avi
  • pandoc:
    • Use pandoc to convert binary / unreadable text documents to markdown-like plain text
    • Extensions: .epub, .odt, .docx, .fb2, .ipynb
  • poppler:
    • Use pdftotext (from poppler-utils) to extract plain text from a PDF file
    • Extension: .pdf
  • zip:
    • Read a zip file as a stream and then recurse into its contents
    • Extension: .zip
    • MIME type: application/zip
  • decompress:
    • Read a compressed file as a stream and then run a different extractor on the decompressed contents
    • Extensions: .tgz, .tbz, .tbz2, .gz, .bz2, .xz, .zst
    • MIME types: application/gzip, application/x-bzip, application/x-xz, application/zstd
  • tar:
    • Read a tar file as a stream and then recurse into its contents
    • Extension: .tar
  • sqlite:
    • Convert a SQLite database to a simple plain-text format using SQLite bindings
    • Extensions: .db, .db3, .sqlite, .sqlite3
    • MIME type: application/x-sqlite3
  • pdfpages (disabled by default):
    • Convert each page of a PDF to a PNG image. Only useful in combination with tesseract
    • Extension: .pdf
  • tesseract (disabled by default):
    • Use tesseract to run OCR on images to make them searchable. You may need -j1 to prevent overloading the system. Make sure tesseract is installed.
    • Extensions: .jpg, .png

Download rga (ripgrep-all)

The rga GitHub project page has instructions for installing the tool on Linux, Windows, and macOS.
Remember to install the programs the rga adapters depend on (as well as ripgrep itself) so that rga can search all the file types it supports: ripgrep, pandoc, poppler (the poppler-utils package on Debian/Ubuntu; the package name varies between Linux distributions), ffmpeg, and cargo.
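To see which of these programs are already available, a small sketch like the one below can help. The function name check_rga_deps is hypothetical, and the tool list is an assumption based on the adapters described above: rg (ripgrep), pandoc, pdftotext (from poppler), and ffmpeg. It only reports what is on your PATH and does not install anything.

```shell
# Hypothetical helper: report which external programs used by rga are on PATH.
check_rga_deps() {
  local tool
  for tool in rg pandoc pdftotext ffmpeg; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
    fi
  done
}

check_rga_deps
```

Any line starting with "missing:" points to a package you still need to install with your distribution's package manager.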
To install rga, download the Linux x86_64 .tar.gz archive, extract it, and then install the rga and rga-preproc binaries to /usr/local/bin by running the following command in the folder where the two binaries were extracted:

                        sudo install rga rga-preproc /usr/local/bin/

After installation, run rga followed by the search query and the folder to search in. For example:

                        rga "text to find" ~/Documents

Also check out rga's help information (rga --help).
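Two more options are worth knowing once the basics work. The sketch below wraps them as shell functions with hypothetical names: --rga-list-adapters prints the adapters your rga build supports, and --rga-adapters given a plain comma-separated list (no leading + or -) restricts rga to only those adapters. -i is ripgrep's case-insensitive flag, passed through unchanged.

```shell
# Hypothetical helper: list the adapters this rga build knows about.
rga_adapters() {
  rga --rga-list-adapters
}

# Hypothetical helper: search using only the poppler (PDF) and pandoc
# adapters, ignoring case; all remaining arguments go to rga/ripgrep.
rga_docs() {
  rga --rga-adapters=poppler,pandoc -i "$@"
}
```

For example, rga_docs "quarterly report" ~/Documents would skip the archive, video, and SQLite adapters entirely.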
