CUT(1) General Commands Manual CUT(1)

cut - extract bytes, characters, or fields

cut -b range[,range]… [-Czn] [file]…
cut -c range[,range]… [-Cz] [file]…
cut -f range[,range]… [-Czs] [-d delimiter] [-O out-delimiter] [file]…

Copies bytes, characters, or fields specified by ranges from each line of the input files (standard input stream if "-", the default), to the standard output stream.

ranges can be separated by commas or spaces, and each can be in the format:

number     {number}
from-      [from, )
from-to    [from, to]
-to        [1, to]
Indices are 1-based, and a union is taken of all ranges. Empty ranges (from > to) are invalid.
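As a minimal sketch of the range rules above: overlapping and out-of-order ranges collapse into a single union, and the selected bytes are written in input order. The variable names are illustrative only, and the input is plain ASCII so that bytes and characters coincide.

```shell
# 3-5, 1 and 4-6 together select positions 1,3,4,5,6;
# each is written once, in the order it appears in the line.
union_out=$(printf 'abcdefgh\n' | cut -b 3-5,1,4-6)
printf '%s\n' "$union_out"    # prints: acdef

# from- is open-ended: position 7 to the end of the line.
open_out=$(printf 'abcdefgh\n' | cut -b 7-)
printf '%s\n' "$open_out"     # prints: gh

# -to starts at index 1.
prefix_out=$(printf 'abcdefgh\n' | cut -b -2)
printf '%s\n' "$prefix_out"   # prints: ab
```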

With -b, bytes are extracted; with -n, characters are never interrupted mid-sequence, with rounding preferred down (see EXAMPLES). With complementary ranges (like -20 and 21-), each character is guaranteed to only be output once.

With -c, characters in the current locale are extracted, and invalid sequences are ignored.

With -f, -d-delimited fields are extracted. If more than one field is matched in a line, they are joined with the -O output delimiter. Lines that don't contain a delimiter are passed through verbatim, unless -s is given, in which case they're removed.
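The pass-through rule for undelimited lines can be sketched as follows. The sample input and variable names are made up for illustration; only -d, -f, and -s are used, and the output delimiter is left at its default (the input delimiter).

```shell
# Three lines; the middle one contains no ':' at all.
input=$(printf 'a:b:c\nno delimiter here\nx:y:z\n')

# Without -s the undelimited line is copied through unchanged.
kept=$(printf '%s\n' "$input" | cut -d: -f1,3)
printf '%s\n' "$kept"

# With -s the undelimited line is dropped entirely.
suppressed=$(printf '%s\n' "$input" | cut -d: -f1,3 -s)
printf '%s\n' "$suppressed"
```

The first command writes three lines (a:c, the untouched middle line, x:z); the second writes only a:c and x:z.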

The newline (NUL with -z) is never matched and always written (unless the entire line was removed with -fs).

-b, --bytes=range[,range]…
Extract bytes.
-n
Don't interrupt multi-byte character sequences.

-c, --characters=range[,range]…
Extract characters in the current locale.

-f, --fields=range[,range]…
Extract delimited fields.
-d, --delimiter=delim
Split fields on delim. Default: tab.
-O, --output-delimiter=out-delim
When merging fields for output, use out-delim. Default: delim.
-s, --only-delimited
Remove lines that don't contain delim instead of passing them through.

-C, --complement
Invert ranges: select everything except what they match ([1, ) - Σrange). For the purposes of -n, the minimal set of ranges is constructed.
-z, --zero-terminated
Line separator is NUL, not newline.
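A small sketch of the inversion described for --complement: selecting fields 2-3 and complementing should be the same as selecting 1,4- directly. This assumes a cut that accepts the --complement long option (this implementation and the GNU system do; others may not), and the variable names are illustrative.

```shell
# Drop fields 2 and 3 by inverting the selection.
inverted=$(printf 'a:b:c:d\n' | cut --complement -d: -f2,3)
printf '%s\n' "$inverted"   # prints: a:d

# The same result, written out as the explicit inverse range list.
explicit=$(printf 'a:b:c:d\n' | cut -d: -f1,4-)
printf '%s\n' "$explicit"   # prints: a:d
```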

Exits with status 1 if a file couldn't be opened.

$ printf '\x01\x02\x03\x04\0\x05\x06\x07' | cut -zb 1,3- | hexdump -C
00000000  01 03 04 00 05 07 00                              |.......|
00000007

$ for i in $(seq 10); do
>   echo "-$i;$((i+1))-" | paste - \
>     <(printf 'яйцо\nЯЙЦО' | cut -nb -$i) \
>     <(printf 'яйцо\nЯЙЦО' | cut -nb $((i+1))-)
> done
-1;2-           яйцо
                ЯЙЦО
-2;3-   я       йцо
        Я       ЙЦО
-3;4-   я       йцо
        Я       ЙЦО
-4;5-   яй      цо
        ЯЙ      ЦО
-5;6-   яй      цо
        ЯЙ      ЦО
-6;7-   яйц     о
        ЯЙЦ     О
-7;8-   яйц     о
        ЯЙЦ     О
-8;9-   яйцо
        ЯЙЦО
-9;10-  яйцо
        ЯЙЦО
-10;11- яйцо
        ЯЙЦО

$ printf 'яйцо\nЯЙЦО' | cut -c 1,3-
яцо
ЯЦО

# name, IDs, homedir, shell, ...
$ cut -f 1,3-4,6- -d: -O"$(printf '\t')" /etc/passwd
root    0       0       /root   /bin/bash
bin     2       2       /bin    /usr/sbin/nologin
irc     39      39      /var/run/ircd   /usr/sbin/nologin
cicada  1000    100     /home/cicada    /bin/bash
nobody  65534   65534   /nonexistent    /usr/sbin/nologin
# Everything else: password and GECOS
$ cut -Cf 1,3-4,6- -d: -O"$(printf '\t')" /etc/passwd
x       root
x       bin
x       ircd
x       Cicadum,,,
x       nobody

paste(1), mbrlen(3)

Conforms to IEEE Std 1003.1-2008 (“POSIX.1”); -z, --complement, --output-delimiter are extensions, originating from the GNU system; -CO are extensions. Allowing -d longer than one character is an extension — some nonconformant implementations only allow a single byte (the GNU system) or only use the first byte of the delim (NetBSD, OpenBSD).

This implementation allows all formats allowed by strtoull(3) in ranges, but some others (the BSD, the GNU system) only allow decimal digits.

Written by Gottfried W. R. Luderer alongside the corresponding paste(1), cut appeared mostly fully formed in CB-UNIX at or before 2.1 as cut(1):

cut - cut out selected fields of each line of a file
Supporting -c and -fds, with -c handling a single ASCII backspace (\b) "as produced by nroff" by not counting it and treating the next character as being in the position of the previous one (i.e. printf 'abc\bdef\bgh\n' | cut -c3 is ‘c\bd’, | cut -c4 is ‘e’, &c.). With the single-byte characters of the time, -c is equivalent to today's -b. It also notes that the ranges are "as in the -o option to nroff/troff for page ranges". CB-UNIX was, among others, the basis for AT&T System III UNIX, where it first saw light outside of AT&T.

A bug in AT&T System V Release 2 UNIX, caused by a transition to reading in blocks, breaks the backspace behaviour.

AT&T System V Release 3 UNIX fixes that, and allows "-" as file to mean the standard input stream.

IEEE Std 1003.2-1992 (“POSIX.2”) created -b and -n, as part of support for multi-byte character encodings.

A CB-UNIX-compatible cut appears in 4.3BSD-Reno.

June 5, 2023 voreutils pre-v0.0.0-latest