NAME
wc — line, word, character, byte, and line length count
SYNOPSIS
wc [-lwmcL] [file]…
wc [-lwmcL] --files0-from=files
DESCRIPTION
For each file (the standard input stream if "-", the default), writes the selected counts (lines, words, characters, bytes, or the maximum line width), followed by the filename, if any, separated by spaces, to the standard output stream. If more than one file was specified, a total is produced as well.
With --files0-from, the contents of files (the standard input stream if "-") are used as a NUL-separated list of filenames instead.
If none of -lwmcL is specified, the default is -lwc. The output is columnated and always in -lwmcL order. -wmL operate according to the current locale; conversion errors are silently ignored.
If only -c is selected, the size is taken from fstat(2) for regular files, and from the appropriate ioctl(2) for devices, if supported.
OPTIONS
-l, --lines
    Number of newline (0x0A) bytes.
-w, --words
    Number of words, defined as runs of non-whitespace separated by whitespace, as classified by the current locale.
-m, --chars
    Number of characters in the current locale.
-c, --bytes
    Number of bytes read.
-L, --max-line-length
    Width of the widest line according to the current locale. Lines are delimited by newlines, carriage returns, and form feeds; tab stops are every 8 columns; non-printable characters are ignored.
--files0-from=files
    Read filenames from the NUL-separated list in files instead of from the arguments.
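A few sketched invocations of the options above follow, assuming a wc that also accepts the -L and --files0-from extensions (as this one and GNU's do); the function names are hypothetical. The first shows that -l counts newline bytes rather than visible lines, so an unterminated last line is not counted; the second shows the 8-column tab stops of -L; the third feeds wc a NUL-separated list from find(1), which keeps filenames containing spaces or newlines intact.

```shell
#!/bin/sh
# Hypothetical helper: "a\nb" contains exactly one newline byte, so -l
# reports 1 even though the text visibly spans two lines.
newline_count_demo() {
    printf 'a\nb' | wc -l
}

# Hypothetical helper: "a", a tab, then "b". After 'a' the position is 1,
# the tab advances it to the next stop at 8, and 'b' brings it to 9,
# so -L reports 9.
tab_width_demo() {
    printf 'a\tb\n' | wc -L
}

# Hypothetical helper: line counts for every regular file under $1.
# find -print0 writes NUL-terminated names to the pipe, and
# --files0-from=- reads that list from the standard input stream;
# a total line follows when more than one file is named.
count_tree_lines() {
    find "$1" -type f -print0 | wc -l --files0-from=-
}
```

For instance, count_tree_lines . prints one count per file under the current directory, then a total.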
EXIT STATUS
1 if a file couldn't be opened or read. Diagnostics are withheld and mixed in with their corresponding output line, if any.
EXAMPLES
Assuming a default UTF-8 encoding:
$ cat form
Groceries for February: Bananas 3.5kg $4.51 Kiwis 2kg $3.19 Call §iegfried to explain short! Bread $20.21
$ cat data
147.312$ 12$ 12 2.3% 11 520$ 320$ 30 20%
$ wc form data
4 16 111 form
2  9  41 data
6 25 152 total
$ wc -lwmcL form data
4 16 110 111 64 form
2  9  41  41 36 data
6 25 151 152 64 total
$ printf '\x88' | wc -lwmcL    # Invalid UTF-8: one byte, no characters
0 0 0 1 0
SEE ALSO
STANDARDS
Conforms to IEEE Std 1003.1-2008 (“POSIX.1”), except that additional spaces are inserted between numbers, if need be, to achieve columnation. POSIX specifies only -lwmc, with -m and -c mutually exclusive; accepting them together, -L, and --files0-from are extensions originating from the GNU system, whose column order this implementation follows.
Be wary of assuming anything about the spacing of the numbers: the standard forbids extra spacing, but all known implementations ignore that in one form or another. Some columnate, usually inconsistently; others output fixed-width, space-padded numbers regardless of the fields selected. Tokenising the output is required to achieve consistent results.
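A minimal sketch of such tokenisation in the shell, assuming only the POSIX -lwc options; count_lwc is a hypothetical name. Redirecting the file to the standard input stream suppresses the filename, and an unquoted command substitution lets field splitting discard whatever padding the local wc produced.

```shell
#!/bin/sh
# Hypothetical helper: print the line, word, and byte counts of one file
# separated by single spaces, whatever padding the local wc applies.
count_lwc() {
    # Reading via < keeps wc from printing a filename. The unquoted $(...)
    # is split on whitespace, which drops leading and inter-column padding.
    # The field order is fixed: lines, words, bytes.
    set -- $(wc -lwc < "$1")
    printf '%s %s %s\n' "$1" "$2" "$3"
}
```

For a file containing "a b", a newline, "c", and a newline, count_lwc prints 2 3 6 regardless of how the underlying wc spaces its columns.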
NetBSD and FreeBSD support -L, but FreeBSD measures it in bytes unless -m is also given. Under NetBSD, -m and -c override each other. Under OpenBSD, -m overrides -c.
HISTORY
Appears in the first edition of the UNIX Programmer's Manual as wc(I), counting words, text lines, and "roff control lines": those beginning with a ‘.’. Words are separated by newlines, spaces, or tabs, and are counted only on text lines. The output format is much more verbose, too:
@ cat wc-test
.ft B
wc test
@ wc wc-test
File: wc-test
text 1 lines
control 1 lines
words 2
Version 3 AT&T UNIX extends control lines to also include those starting with ‘!’ or ‘'’, and outputs a total if at least two files were specified.
Version 4 AT&T UNIX reads the standard input stream if no files were specified.
Version 5 AT&T UNIX sees a new implementation in C with no total, but a familiar output format:
lines words filename
A Version 5 AT&T UNIX manual page dated the 10th of March, 1974 paints a much more complex picture, though one clearly derived from the above implementation:
-r ignores roff control lines, this time defined as those starting with ‘.’ or ‘'’. Unless any of -lwapc is given, the default is -lw, which output the expected counts.
-a counts the number of words consisting of alphanumeric-or-underscore characters. -p counts the number of words consisting of punctuation (printing, non-alphanumeric characters). It is not clear whether the underscore counts as alphanumeric in this case.
-c counts the number of roff control lines, regardless of -r.
Version 6 AT&T UNIX inherits the former implementation, with the total count reappearing.
Version 7 AT&T UNIX sports a familiar -lwc. The fields are written in the order specified (or rather, in the order of the supported characters after the -: -llejować, for example, is equivalent to llw). All three columns remain space-padded to seven characters, and the space before the filename is only output if a filename exists, except that w always has an additional space after its number.
The BSD
4BSD sees with the same parsing for -lwcpt.
-p adds a field defined as ceil(l / pagesize), with a default pagesize of 60.
-t adds a field defined as c / (baud / 10), with an exception for baud=100 yielding 10 characters per second, and a default baud of 300.
-u (for UUCP) divides that rate by a further 10/9, for "27cps at 300 baud".
Numbers are space-padded to seven characters. Time is expressed in the first of "hr" or "mi" in which it is at least 1, or in "se" otherwise, and is written space-padded to two digits plus one digit of precision, followed by a space, the unit, and a tab.
-v selects lwcpt, replaces number padding with a tab after each number, and starts off with a tab-separated heading with the columns lines, words, chars, pages, and time@baud. The filename, if any, is still preceded by a space, however. An output example follows:
$ ./wc3 -v -u wc*
lines words chars pages time@300
11 76 18200 1 11.2 mi wc
8 65 16936 1 10.4 mi wc2
86 196 1356 2 50.0 se wc2.c
15 79 17464 1 10.8 mi wc3
195 463 5586 3 3.4 mi wc3.c
52 101 732 1 27.0 se wc.c
367 980 60274 9 37.2 mi total
4.2BSD removes -pt and -svub, and adds a space before each number.
4.3BSD-Tahoe uses normal flag processing for -lwc, allows a - file as the standard input stream (but also omits the filename, as in the no-filename case), and reverts to delimiting words only by newline, tab, and space. If only -c was specified, the files are stat(2)ed and, if they are regular files or directories, their size is used as the byte count.
4.4BSD notes, as it processes -c byte-by-byte, that:
System V
The Programmer's Workbench (PWB/UNIX), otherwise inheriting from Version 6 AT&T UNIX, sees -l, which suppresses the word count, leaving only the line count, the filename, if any, and no total.
AT&T System III UNIX, inheriting from Version 7 AT&T UNIX, fixes the extraneous space after the word count, refuses non-lwc columns, and exits 2 on open failure or invalid columns.
AT&T System V Release 4 UNIX becomes locale-aware and adds spaces after all numbers, preventing them from melding together.
Standards
IEEE Std 1003.2-1992 (“POSIX.2”) specifies -clw, but otherwise matches the present-day standard. X/Open Portability Guide Issue 4 (“XPG4”) adds -m.