Save web pages as a single HTML file for offline use with Monolith (console)

Monolith is a command-line tool for saving any web page as a single HTML file that contains everything needed to render the page locally, without a working Internet connection. Use it to save web pages containing documentation, wiki articles, and anything else that interests you for local / offline use. Because the pages are saved as plain HTML, you can use any tool that searches inside files to quickly find the page you are looking for.
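As a rough sketch of that kind of search, assuming the saved pages are kept together in one directory, grep can list the files that mention a phrase. The ~/saved-pages path and the find_saved_pages helper name are made up for this illustration.

```shell
# Hypothetical helper: list saved files that contain a phrase.
# Usage: find_saved_pages "phrase" [directory]
find_saved_pages() {
    phrase=$1
    dir=${2:-"$HOME/saved-pages"}   # assumed location, adjust as needed
    # -r: recurse, -l: print only file names, -i: ignore case,
    # -F: treat the phrase as a literal string, not a regular expression
    grep -rliF -- "$phrase" "$dir"
}

# Example call (not run here):
#   find_saved_pages "rust ownership" ~/saved-pages
```

Only the page text is searchable this way, since embedded assets are stored in encoded form.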
Unlike the regular "Save Page As" (Ctrl + S) option that web browsers provide, which stores a page's assets in a folder next to the saved page, this command-line tool retrieves the assets and converts them into base64 data URLs, which it uses in the document in place of the regular URLs. As a result, the page assets (such as JavaScript, CSS, or images) are embedded in the page HTML itself, so all you need to open a locally saved page is a web browser. The tool also provides 2 useful options: -i removes images from the saved web page, and -j excludes JavaScript.
Monolith originally used Node.js, but was recently (about 11 hours before this article was published) rewritten in Rust. For now it works for basic pages, but some things still need work. For example, embedding CSS imports and web fonts did not seem to be supported when this article was first written, although the developer appeared to be planning it. Update: as of Monolith 2.1.0, CSS imports and web fonts are supported, so these elements are embedded in the saved HTML file.
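To make the base64 data URL idea more concrete, here is a minimal sketch that builds one for a local file with the standard base64 utility. It is not Monolith's own code, and the to_data_url helper name is invented for this example.

```shell
# Hypothetical helper: print a base64 data URL for a local file.
# Usage: to_data_url MIME_TYPE FILE
to_data_url() {
    mime=$1
    file=$2
    # base64 may wrap long output across lines, so strip the newlines
    encoded=$(base64 < "$file" | tr -d '\n')
    printf 'data:%s;base64,%s\n' "$mime" "$encoded"
}

# Example call (not run here):
#   to_data_url image/png logo.png
```

A URL of this form can stand in for a regular URL in, for example, an img src attribute, which is why the saved page needs no separate asset folder.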

Saving web pages that require authentication does not currently work. Embedded videos are not saved either, though doing so would hardly be practical anyway: embedding videos as data URLs can produce very large HTML files, which are painful to work with if you ever edit the HTML.
It is also worth noting that Monolith saves the content of a web page as it is when first loaded, so it is not well suited to websites that use infinite scrolling, especially since the result depends on how the website implements it (in my tests, only the initially loaded articles were saved). Web pages that use lazy loading do not seem to be handled well either.
The idea of saving any web page as a single file with all assets embedded is nothing new, and there are many alternatives. For example, with the Safari web browser, you can save a single web page for offline viewing by storing all elements of the page in a web archive (.webarchive file extension). There is also MHTML, a web archive format that similarly saves a web page in a single file.
But these have some limitations, such as requiring a specific browser or third-party client to save and view them. For example, you can only save and view .webarchive files using the Safari web browser and certain third-party solutions. As for MHTML, Firefox no longer supports it, and Google Chrome recently removed the #save-page-as-mhtml flag, which previously allowed it to save a web page as MHTML (there may be some extensions that restore the functionality, but I didn't check).
Since Monolith saves web pages as regular HTML files, you can view them with any web browser. This means that you don't need to rely on any third-party solution, nor do you need a web browser to keep supporting a specific web archive format, so your locally saved web pages are future-proof.

Installing and using Monolith on Linux

To install Monolith, we will use Cargo, Rust's build system and package manager. You also need the OpenSSL development files to build Monolith. Install them on Linux using the command for your distribution:

  • Debian, Ubuntu, and other distributions based on them:
sudo apt install cargo libssl-dev
  • Fedora:
sudo dnf install rust-cargo openssl-devel
  • Arch Linux, Manjaro:
sudo pacman -S rust openssl
  • openSUSE:
sudo zypper install cargo libopenssl-devel
  • Solus OS:
sudo eopkg install cargo openssl-devel
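
After installing the packages, a quick check like the following can confirm that the command-line tools used in the next steps are available. This is only a sketch: the missing_tools helper name is made up here, and it does not verify that the OpenSSL development files are present.

```shell
# Hypothetical helper: print each named command that is not on the PATH.
# Usage: missing_tools COMMAND...
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example call (not run here); prints nothing when both are installed:
#   missing_tools git cargo
```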

Now you can get the Monolith source from Git and install it:

git clone https://github.com/Y2Z/monolith
cd monolith
cargo install --path .

The Monolith binary is installed in ~/.cargo/bin, which is not in your $PATH by default. You can add it to your PATH (so you can type "monolith" without its full path) by adding export PATH="$PATH:$HOME/.cargo/bin" to your ~/.bashrc or ~/.zshrc file (depending on the shell you use). You can do this, and then source ~/.bashrc / ~/.zshrc, using:

  • For Bash:
echo 'export PATH="$PATH:$HOME/.cargo/bin"' >> ~/.bashrc
. ~/.bashrc
  • For Zsh:
echo 'export PATH="$PATH:$HOME/.cargo/bin"' >> ~/.zshrc
. ~/.zshrc

Make sure to run the "echo" command only once, because each run appends export PATH="$PATH:$HOME/.cargo/bin" to ~/.bashrc / ~/.zshrc again.
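If you would rather not have to remember that, one possible approach is to append the line only when it is not already present. The append_line_once helper below is a hypothetical name used for illustration, not part of Monolith or Cargo.

```shell
# Hypothetical helper: append LINE to FILE unless an identical line exists.
# Usage: append_line_once LINE FILE
append_line_once() {
    line=$1
    file=$2
    # -q: quiet, -x: match the whole line, -F: literal string
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example call (not run here):
#   append_line_once 'export PATH="$PATH:$HOME/.cargo/bin"' ~/.bashrc
```

Running it several times leaves a single copy of the line in the file.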
Now you can start using Monolith to save web pages with their resources embedded in a single HTML file. For example, let's save the Monolith GitHub page (https://github.com/Y2Z/monolith) to a local file named monolith.html:

monolith https://github.com/Y2Z/monolith > monolith.html
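
To get a rough idea of how many assets ended up embedded in the saved file, you could count the base64 data URLs it contains. This sketch assumes the simple data:<type>;base64, form, so URLs carrying extra parameters would be missed; count_embedded_assets is a made-up helper name.

```shell
# Hypothetical helper: count base64 data URLs in a saved HTML file.
# Usage: count_embedded_assets FILE
count_embedded_assets() {
    # -o prints each match on its own line, so wc -l counts the matches;
    # tr removes the padding some wc implementations add
    grep -o 'data:[^;,]*;base64,' "$1" | wc -l | tr -d ' '
}

# Example call (not run here):
#   count_embedded_assets monolith.html
```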

Do you want to remove JavaScript from the page? Add -j, like this:

monolith -j https://github.com/Y2Z/monolith > monolith.html

Similarly, use -i to remove images from the saved web page.
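
For saving several pages in one go, a small wrapper along these lines might help. It is a sketch under a few assumptions: that -i and -j can be passed together in one invocation, and that deriving the file name from the URL suits your needs. The url_to_filename and save_pages names are invented for this example.

```shell
# Hypothetical helper: turn a URL into a safe file name.
url_to_filename() {
    # drop the scheme, then replace anything that is not a letter,
    # digit, dot, or hyphen with an underscore
    name=$(printf '%s\n' "$1" | sed -e 's|^[a-zA-Z]*://||' -e 's|[^A-Za-z0-9.-]|_|g')
    printf '%s.html\n' "$name"
}

# Hypothetical helper: save each URL into OUTDIR without images or JavaScript.
# Usage: save_pages OUTDIR URL...
save_pages() {
    outdir=$1
    shift
    mkdir -p "$outdir"
    for url in "$@"; do
        # -i and -j are the two options described in this article;
        # combining them in one call is an assumption
        monolith -i -j "$url" > "$outdir/$(url_to_filename "$url")"
    done
}

# Example call (not run here):
#   save_pages ~/saved-pages https://github.com/Y2Z/monolith
```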
