The Linux cat command: basic and advanced examples

Continuing our tour of Linux commands, today we will look at the cat command.

The name cat stands for catenate. The main purpose of this command is to concatenate multiple input files by sequentially sending their contents to standard output:

# Let's first fetch some sample data files:
curl -so - dict://'d:andreyex:gcide' | unexpand -a -t 3 |
  sed -Ee '/^151/,/^[.]/!d;/^[.0-9]/s/.*//' > andreyex.txt
curl -so - dict://'d:ubuntu:gcide' | unexpand -a -t 3 |
  sed -Ee '/^151/,/^[.]/!d;/^[.0-9]/s/.*//' > ubuntu.txt

# Concatenate the files
cat andreyex.txt ubuntu.txt

If you want to store the result of this concatenation in a file, you need to use shell redirection:

cat andreyex.txt ubuntu.txt > result.txt
cat result.txt

Even though its main purpose is to combine files, cat is also often used with a single argument to display the contents of that file on the screen, just as we did with cat result.txt in the example above.
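To try this without downloading the dictionary files, here is a minimal sketch that uses two throwaway files. The names a.txt and b.txt and the scratch directory are assumptions made purely for illustration:

```shell
# Work in a scratch directory so nothing in the current directory is touched
tmpdir=$(mktemp -d)
printf 'first file\n'  > "$tmpdir/a.txt"
printf 'second file\n' > "$tmpdir/b.txt"

# cat writes the files to stdout in argument order; > saves that stream
cat "$tmpdir/a.txt" "$tmpdir/b.txt" > "$tmpdir/result.txt"

# With a single argument, cat simply displays the file
joined=$(cat "$tmpdir/result.txt")
printf '%s\n' "$joined"
rm -rf "$tmpdir"
```

The order of the arguments is the order of the output, so swapping a.txt and b.txt would swap the two lines.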

A. Using the cat command with standard input

When used without any arguments, the cat command reads data from its standard input and writes it to its standard output, which is mostly useless … unless you use some option to transform the data. We will talk about a few interesting options later.

In addition to file paths, the cat command also understands the special filename - as an alias for standard input. Thus, you can insert data read from standard input between the files specified on the command line:

# Insert a separator between the two concatenated files
echo '----' | cat ubuntu.txt - andreyex.txt
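The same idea can be sketched with self-contained data. The file names top.txt and bottom.txt are hypothetical stand-ins for the dictionary files:

```shell
tmpdir=$(mktemp -d)
printf 'top\n'    > "$tmpdir/top.txt"
printf 'bottom\n' > "$tmpdir/bottom.txt"

# "-" marks the position where standard input is spliced into the output
out=$(echo '----' | cat "$tmpdir/top.txt" - "$tmpdir/bottom.txt")
printf '%s\n' "$out"
rm -rf "$tmpdir"
```

The separator lands exactly where the - appears in the argument list, between the two files.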

B. Using cat with binaries

1. Combining split files

The cat command makes no assumptions about the contents of a file, so it works happily with binary data. This can be useful for rejoining files that were split with the split or csplit commands, or for joining partial downloads, as we will do now:

# Image by AndreyEx (CC-SA 3.0)
# Optimize bandwidth usage by splitting the download into two parts
# (on our system we observe a 10% gain compared to a "full" download)
curl -s -r 0-50000 
    -o first-half &
curl -s -r 50001- 
    -o second-half &

We now have two halves of the image. You can open the first half and see that it is “broken” using ImageMagick's display, gimp, or any other software capable of reading image files:

display first-half
# -or-
gimp first-half
# -or-
firefox first-half

If you study the curl commands we used, you will see that the two parts complement each other perfectly. The first half runs from byte 0 to byte 50000, and the second half from byte 50001 to the end of the file. There should be no missing data in between. Therefore, we only need to concatenate the two parts (in the correct order) to get the complete file back:

cat first-half second-half > image.jpg
display image.jpg
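The same round trip can be checked offline with split, mentioned earlier. This is only a sketch: the 100 KiB random file and the part. prefix are assumptions, and head -c, although widely available, is not required by POSIX:

```shell
tmpdir=$(mktemp -d)
# Build a small binary test file: 100 KiB of random bytes
head -c 102400 /dev/urandom > "$tmpdir/original.bin"

# Cut it into 40 KiB pieces named part.aa, part.ab, part.ac
split -b 40960 "$tmpdir/original.bin" "$tmpdir/part."

# The shell expands the glob in sorted order, which is the order split used
cat "$tmpdir"/part.* > "$tmpdir/rejoined.bin"

# cmp exits with status 0 only when both files are byte-for-byte identical
if cmp -s "$tmpdir/original.bin" "$tmpdir/rejoined.bin"; then
    status=identical
else
    status=different
fi
echo "$status"
rm -rf "$tmpdir"
```

Comparing with cmp is a cheap way to confirm that no byte was lost or reordered when the parts were joined.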

2. Working with streaming file formats

Not only can you use the cat command to “reunite” binary files that have been split into multiple parts, but in some cases you can also create new files this way. This works especially well with “headerless” or “streaming” file formats such as MPEG transport stream video files (.TS files):

# Let's make a still video file from our image
ffmpeg -y -loop 1 -i cat.jpg -t 3 \
    -c:v libx264 -vf scale=w=800:h=-1 \
    still.ts

# Let's make a fade-in from the same image
ffmpeg -y -loop 1 -i cat.jpg -t 3 \
    -c:v libx264 -vf scale=w=800:h=-1,fade=in:0:75 \
    fadein.ts

# Let's make a fade-out from the same image
ffmpeg -y -loop 1 -i cat.jpg -t 3 \
    -c:v libx264 -vf scale=w=800:h=-1,fade=out:0:75 \
    fadeout.ts

We can now combine all these transport stream video files using the cat command, obtaining a perfectly valid TS file as output:

cat fadein.ts still.ts fadeout.ts > video.ts
mplayer video.ts

Thanks to the TS file format, you can combine these files in any order you want, and you can even use the same file multiple times in the argument list to create loops or repeats in the output video. Obviously it would be more fun with real moving images, but you can do that yourself: many consumer devices record TS files, and if yours does not, you can still use ffmpeg to convert almost any video file to a transport stream file. Feel free to share your creations in the comments section!

3. Hacking cpio archives

As a final example, let’s see how we can use the cat command to combine multiple cpio archives. This time it will not be quite so easy, as it requires a little knowledge of the cpio archive file format.

A cpio archive stores metadata and file content sequentially, which makes it suitable for file-level concatenation with the cat utility. Unfortunately, a cpio archive also contains a trailer used to mark the end of the archive:

# Create two genuine cpio `bin' archives:
$ find ubuntu.txt andreyex.txt | cpio -o > part1.cpio
2 blocks
$ echo cat.jpg | cpio -o > part2.cpio
238 blocks

$ hexdump -C part1.cpio | tail -7
000002d0  2e 0d 0a 09 09 20 20 5b  57 6f 72 64 4e 65 74 20  |.....  [WordNet |
000002e0  31 2e 35 5d 0d 0a 0a 00  c7 71 00 00 00 00 00 00  |1.5].....q......|
000002f0  00 00 00 00 01 00 00 00  00 00 00 00 0b 00 00 00  |................|
00000300  00 00 54 52 41 49 4c 45  52 21 21 21 00 00 00 00  |..TRAILER!!!....|
00000310  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
$ hexdump -C part2.cpio | tail -7
0001da40  46 96 ab f8 ad 11 23 90  32 79 ac 1f 8f ff d9 00  |F.....#.2y......|
0001da50  c7 71 00 00 00 00 00 00  00 00 00 00 01 00 00 00  |.q..............|
0001da60  00 00 00 00 0b 00 00 00  00 00 54 52 41 49 4c 45  |..........TRAILE|
0001da70  52 21 21 21 00 00 00 00  00 00 00 00 00 00 00 00  |R!!!............|
0001da80  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

The good news is that, in the cpio binary format, this trailer has a fixed length of 280 bytes. So, using the standard head command, we have an easy way to remove it:

# Each archive ends with a 280-byte trailer.
# To catenate both archives, just remove the trailer
# at the end of the first part:
$ head -c-280 part1.cpio | cat - part2.cpio > cat.cpio
$ cpio -it < cat.cpio
239 blocks
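Note that the negative byte count in head -c-280 is a GNU extension. Where it is not supported, the number of bytes to keep can be computed explicitly. The sketch below uses a hypothetical stand-in file (fake.cpio, filled with zero bytes) rather than a real archive, simply to show the arithmetic:

```shell
tmpdir=$(mktemp -d)
# Stand-in for an archive: 1000 payload bytes followed by a 280-byte "trailer"
head -c 1000 /dev/zero  > "$tmpdir/fake.cpio"
head -c 280  /dev/zero >> "$tmpdir/fake.cpio"

# Compute how many bytes to keep, then take that many from the start
size=$(wc -c < "$tmpdir/fake.cpio")
keep=$((size - 280))
head -c "$keep" "$tmpdir/fake.cpio" > "$tmpdir/stripped"

stripped_size=$(wc -c < "$tmpdir/stripped")
echo "kept $((stripped_size)) of $((size)) bytes"
rm -rf "$tmpdir"
```

With a real archive, the stripped output would then be piped into cat - part2.cpio exactly as in the command above.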

C. Basic options for the cat command

Now that you have played around with different binary file formats, let’s go back to plain old text files by examining a few options specifically designed to work with them. Although they are not part of the POSIX standard, these options are portable across the BSD and GNU versions of cat. Please note that we do not claim to cover every option here, so check the man page for a complete list of the options supported by your system's cat command!

-n: number lines

With the -n option, the cat command prefixes each output line with its line number:

cat -n andreyex.txt
     2    andreyex andreyex n.
     3       a natural family of lithe-bodied round-headed fissiped
     4       mammals, including the cats; wildcats; lions; leopards;
     5       cheetahs; and saber-toothed tigers.
     7       Syn: family {andreyex}.
     8            [WordNet 1.5]

The -n option numbers output lines. This means that the counter is not reset when moving from one input file to the next, as you will see if you try the following command yourself:

cat -n andreyex.txt ubuntu.txt
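A minimal sketch of that continuous numbering, assuming the GNU coreutils version of cat found on most Linux systems (other implementations may restart the counter for each file). The two-line files a.txt and b.txt are hypothetical:

```shell
tmpdir=$(mktemp -d)
printf 'a1\na2\n' > "$tmpdir/a.txt"
printf 'b1\nb2\n' > "$tmpdir/b.txt"

# One counter for the whole output: with GNU cat, b.txt continues at 3
numbered=$(cat -n "$tmpdir/a.txt" "$tmpdir/b.txt")
printf '%s\n' "$numbered"

# Extract just the line numbers to make the sequence obvious
numbers=$(printf '%s\n' "$numbered" |
    awk '{ printf "%s%s", (NR > 1 ? " " : ""), $1 }')
echo "$numbers"
rm -rf "$tmpdir"
```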

-s: suppress duplicate blank output lines

With the -s option, the cat command squeezes multiple consecutive blank lines into a single one:

$ cat -n ubuntu.txt andreyex.txt | sed -n 8,13p
     8       lynx ({ubuntu lynx}) is also called {Lynx lynx}.
     9       [1913 Webster +PJC]
    12    andreyex andreyex n.
    13       a natural family of lithe-bodied round-headed fissiped
$ cat -ns ubuntu.txt andreyex.txt | sed -n 8,13p
     8       lynx ({ubuntu lynx}) is also called {Lynx lynx}.
     9       [1913 Webster +PJC]
    11    andreyex andreyex n.
    12       a natural family of lithe-bodied round-headed fissiped
    13       mammals, including the cats; wildcats; lions; leopards;

In the example above, you can see that in the default output, lines 10 and 11 were blank. When the -s option was added, the second blank line was discarded.

-b: number only non-empty lines

Somewhat related to the two previous options, the -b option numbers lines while ignoring empty ones:
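The effect is easy to reproduce on a tiny input. In this sketch the input is made up: three blank lines sit between a and b, and -s keeps exactly one of them:

```shell
# Three blank lines between "a" and "b" on input
squeezed=$(printf 'a\n\n\n\nb\n' | cat -s)
printf '%s\n' "$squeezed"

# Count the lines that survive: a, one blank line, b
count=$(printf '%s\n' "$squeezed" | wc -l)
echo "$((count))"
```

Note that -s does not remove blank lines altogether; it only collapses runs of them.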

$ cat -b andreyex.txt | cat -n
     2         1    andreyex andreyex n.
     3         2        a natural family of lithe-bodied round-headed fissiped
     4         3        mammals, including the cats; wildcats; lions; leopards;
     5         4        cheetahs; and saber-toothed tigers.
     6         5
     7         6        Syn: family {andreyex}.
     8         7              [WordNet 1.5]

The above example uses two instances of the cat command with different options in a pipeline. The inner numbering comes from the -b option used with the first cat command. The outer numbering comes from the -n option used with the second cat command.

As you can see, the first and last lines were not numbered by the -b option because they are empty. But what about the 6th line? Why is it still numbered with the -b option? Well, because it is a line that contains whitespace rather than being truly empty, as we will see in the next section.
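The distinction between an empty line and a line that merely looks empty can be sketched with a made-up four-line input, where the second line is empty and the third holds a single space:

```shell
# Line 2 is truly empty; line 3 contains a single space
out=$(printf 'a\n\n \nb\n' | cat -b)
printf '%s\n' "$out"

# Count how many lines received a number
numbered=$(printf '%s\n' "$out" | grep -c '^ *[0-9]')
echo "$numbered"
```

Three of the four lines are numbered: any character at all before the newline, visible or not, is enough for -b to count the line.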

-v, -e, -t: display non-printable characters

The three options -v, -e, and -t are used to display various sets of invisible characters. Even though the sets overlap, there is no catch-all option, so you have to combine them if you want to display all invisible characters.

-v: view invisible characters

The -v option displays all non-printable characters using caret and meta notation, except for line feeds and tabs.

With this option, control characters are displayed as a caret (^) followed by the corresponding ASCII character (e.g. carriage return, byte 13, is displayed as ^M because M in ASCII is 64 + 13 = 77), and characters with the high-order bit set appear in “meta” notation: M- followed by the representation of the 7 least significant bits (for example, byte 141 is displayed as M-^M because 141 = 128 + 13).
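Both notations can be checked on single bytes. This sketch feeds cat -v the two byte values discussed above, written as octal escapes (13 is octal 015, 141 is octal 215):

```shell
# Byte 13 (octal 015, carriage return): 64 + 13 = 77, the ASCII code of "M"
cr=$(printf '\015' | cat -v)

# Byte 141 (octal 215): high bit set, and 141 - 128 = 13, hence "M-" then "^M"
meta=$(printf '\215' | cat -v)

echo "$cr $meta"
```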

While seemingly esoteric, this feature can be useful when working with binaries, for example if you want to examine the raw information embedded in a JPEG file:

$ cat -v cat.jpg | fold -75 | head -10
M-^?M-XM-^?M-`^@^PJFIF^@^A^A^A^@H^@H^@^@M-^?M-~^@QFile source: http://commo^?M-b^LXICC_PROFILE
^@^A^A^@^@^LHLino^B^P^@^@mntrRGB XYZ ^GM-N^@^B^@    ^@^F^@1^@^@acspMSFT
^@^@^@^@IEC sRGB^@^@^@^@^@^@^@^@^@^@^@^@^@^@M-vM-V^@^A^@^@^@^@M-S-HP  ^@^@^
Tbkpt^@^@^B^D^@^@^@^TrXYZ^@^@^B^X^@^@^@^TgXYZ^@^@^B,^@^@^@^TbXYZ^@^@^[email protected]^@^@
@^@^D<^@^@^H^LgTRC^@^@^D<^@^@^H^LbTRC^@^@^D<^@^@^H^Ltext^@^@^@^@Copyright (

Another use case for the -v option is to look for control characters that may have leaked into a text file. If you remember, we had that strange problem above with the -b option numbering the 6th input line even though it looked empty. So let’s investigate:

$ cat -v andreyex.txt
andreyex andreyex n.^M
    a natural family of lithe-bodied round-headed fissiped^M
    mammals, including the cats; wildcats; lions; leopards;^M
    cheetahs; and saber-toothed tigers.^M
    Syn: family {andreyex}.^M
          [WordNet 1.5]^M

Ah ha! Do you see those ^M marks? They stand in for the otherwise invisible carriage return character. Where does it come from? Well, the dict protocol, like many other Internet protocols, uses CRLF as its line terminator, so we downloaded the carriage returns as part of our sample files. For now, this explains why cat considers the 6th line not to be empty.
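If the carriage returns are unwanted, one common remedy (an addition here, not a step the sample files went through) is to delete them with tr before further processing. A minimal sketch on made-up CRLF input:

```shell
# A CRLF-terminated line followed by a line that holds only a carriage return
raw=$(printf 'text\r\n\r\n' | cat -v)

# tr -d deletes every carriage return byte before cat sees the data
clean=$(printf 'text\r\n\r\n' | tr -d '\r' | cat -v)

printf 'before: %s\nafter:  %s\n' "$raw" "$clean"
```

Once the carriage returns are gone, the second line is truly empty, so options such as -b and -s treat it as blank.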

-e: display invisible characters including end of line

The -e option works like the -v option, but also adds a dollar sign ($) before each newline character, thus clearly showing the end of each line:

$ cat -e andreyex.txt
andreyex andreyex n.^M$
    a natural family of lithe-bodied round-headed fissiped^M$
    mammals, including the cats; wildcats; lions; leopards;^M$
    cheetahs; and saber-toothed tigers.^M$
    Syn: family {andreyex}.^M$
          [WordNet 1.5]^M$

-t: display invisible characters, including tabs

The -t option works like the -v option, except that it also displays tabs using the caret notation ^I (a tab is stored as a byte holding the value 9, and I in ASCII is 64 + 9 = 73):

$ cat -t andreyex.txt

andreyex andreyex n.^M
^Ia natural family of lithe-bodied round-headed fissiped^M
^Imammals, including the cats; wildcats; lions; leopards;^M
^Icheetahs; and saber-toothed tigers.^M
^ISyn: family {andreyex}.^M
^I^I  [WordNet 1.5]^M

-et: show all hidden characters

As we mentioned briefly, if you want to display all non-printable characters, including tabs and end-of-line markers, you need to use both the -e and -t options:

$ cat -et andreyex.txt
andreyex andreyex n.^M$
^Ia natural family of lithe-bodied round-headed fissiped^M$
^Imammals, including the cats; wildcats; lions; leopards;^M$
^Icheetahs; and saber-toothed tigers.^M$
^ISyn: family {andreyex}.^M$
^I^I  [WordNet 1.5]^M$
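A one-line sketch shows all three markers at once, using a made-up input line that contains a tab and ends in CRLF:

```shell
# One line containing a tab and ending in CRLF
shown=$(printf 'a\tb\r\n' | cat -et)
echo "$shown"
```

The tab appears as ^I, the carriage return as ^M, and the end of the line as $.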

Bonus: the useless use of cat

No article about the cat command would be complete without mentioning the “Useless Use of Cat” anti-pattern.

This occurs when you use cat for the sole purpose of sending the contents of a file to the standard input of another command. Such use of the cat command is called “useless” because a simple redirection or filename argument would do the job, and do it better. But an example is worth a thousand words:

$ curl -so - dict://'d:uuoc:jargon' |    sed -Ee '/^151/,/^[.]/!d;/^[.0-9]/s/.*//'  > uuoc.txt
$ cat uuoc.txt | less


    [from the group on Usenet] Stands for Useless Use of {cat};
    the reference is to the Unix command cat(1), not the feline animal. As
    received wisdom on observes, “The purpose of cat is to
    concatenate (or ‘catenate’) files. If it's only one file, concatenating it
    with nothing at all is a waste of time, and costs you a process.”
    Nevertheless one sees people doing

    cat file | some_command and its args ...

    instead of the equivalent and cheaper

    <file some_command and its args ...

    or (equivalently and more classically)

    some_command and its args ... <file

In the above example, we used a pipeline to display the content of the uuoc.txt file with the less command:

cat uuoc.txt | less

So the only purpose of the cat command was to feed the standard input of the less command with the content of the uuoc.txt file. We would get the same behavior using shell redirection:

less < uuoc.txt

In fact, the less command, like many commands, also accepts a filename as an argument. So we could simply write this instead:

less uuoc.txt
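The three forms can be compared side by side in a script. Because less is interactive, this sketch substitutes wc -l as the consuming command; sample.txt is a hypothetical three-line file:

```shell
tmpdir=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$tmpdir/sample.txt"

# 1. Useless use of cat: an extra process and an extra pipe
with_cat=$(cat "$tmpdir/sample.txt" | wc -l)

# 2. Redirection: the shell opens the file on wc's standard input
with_redirect=$(wc -l < "$tmpdir/sample.txt")

# 3. Filename argument: wc opens the file itself and also prints its name
with_arg=$(wc -l "$tmpdir/sample.txt" | awk '{ print $1 }')

echo "$((with_cat)) $((with_redirect)) $((with_arg))"
rm -rf "$tmpdir"
```

All three report the same line count. The only visible difference is that the third form also prints the filename, which is why awk is used to keep just the number.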

As you can see, there is no need for the cat command here. We mention the “Useless Use of Cat” anti-pattern because if you use it publicly in a forum or elsewhere, someone will no doubt point it out with the argument that you are creating “an extra process for nothing”.

We must admit that for a long time we were quite dismissive of such comments. After all, on modern hardware, spawning one extra process for a one-time operation cannot cause much overhead.

But while writing this article, we ran a quick experiment comparing the time required, with and without UUOC, by a test awk script to process 500 MB of data coming from slow media.

To our surprise, the difference was far from insignificant.

However, the reason is not the creation of an additional process. Rather, it is the extra reads and writes and the context switching that the UUOC incurs (as you can infer from the time spent executing system code). So indeed, when you are working with large datasets, the extra cat command has a significant cost. As for us, we will try to be more vigilant about this from now on! And you? If you have any examples of useless uses of cat, feel free to share them with us!