The GNU Text Utilities are the basic text-manipulation utilities of the GNU operating system. The tools supplied with this package are:
  • cat - concatenate files and print to the standard output
  • cksum - checksum and count the bytes in a file
  • comm - compare two sorted files line by line
  • csplit - split a file into sections determined by context lines
  • cut - remove sections from each line of files
  • expand - convert tabs to spaces
  • fmt - simple optimal text formatter
  • fold - wrap each input line to fit in specified width
  • head - output the first part of files
  • join - join lines of two files on a common field
  • md5sum - compute and check MD5 message digest
  • nl - number lines of files
  • od - dump files in octal and other formats
  • paste - merge lines of files
  • pr - convert text files for printing
  • ptx - produce a permuted index of file contents
  • sort - sort lines of text files
  • split - split a file into pieces
  • sum - checksum and count the blocks in a file
  • tac - concatenate and print files in reverse
  • tail - output the last part of files
  • tr - translate or delete characters
  • tsort - perform topological sort
  • unexpand - convert spaces to tabs
  • uniq - remove duplicate lines from a sorted file
  • wc - print the number of bytes, words, and lines in files
The current version of textutils is 2.1, which will also be the last version. All new development is going into coreutils, which combines textutils, fileutils and sh-utils (the shell utilities) into one convenient bundle.

For more and better functionality, go to http://alexautils.sourceforge.net, which holds the Alexa extensions to the GNU textutils. Some of the extensions include:

  • If an input file is in gzip format, it is decompressed on the fly
  • cut is not only much faster, but also has a much broader range of options.
  • uniq, comm and join can now take the same "-k key" parameters as sort, so they work much better together.
  • join is much faster and no longer crashes on large data sets.
  • New tool: cw, which compresses whitespace.
  • New tool: ununiq, which turns a single line like
    A B 1,2,3 C D
    into three lines like
    A B 1 C D
    A B 2 C D
    A B 3 C D
  • All field-based tools have these options:
    • --delimiter=CHAR (input delimiter = CHAR)
    • --output-delimiter=STRING (output delimiter = STRING; the null string means a one-byte null character)
    • --ds alias for --delimiter=" " (input delimiter is space)
    • --dt alias for --delimiter="\t" (input delimiter is tab)
    • --dz alias for --delimiter="" (input delimiter is the zero byte)
    • --dw has no --delimiter equivalent (input delimiter is whitespace)
    • --Ds alias for --output-delimiter=" " (output delimiter is space)
    • --Dt alias for --output-delimiter="\t" (output delimiter is tab)
    • --Dz alias for --output-delimiter="" (output delimiter is the zero byte)
    • --De has no --output-delimiter equivalent (output delimiter is empty or skipped)
    Escaped characters are interpreted in both delimiters, so you can use \t, \9 and the like.
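
To make the ununiq transformation described above concrete, here is a rough Python sketch of the idea. It is not the actual ununiq source. The function name, the comma as list separator, and the rule "expand the first field that contains a comma" are assumptions made for illustration only; the real tool may choose the field differently.

```python
def ununiq(line, delimiter=" ", list_separator=","):
    """Expand one list-valued field of a line into one output line per value.

    Assumption: the field to expand is the first one that contains
    list_separator. A line with no such field is returned unchanged.
    """
    fields = line.split(delimiter)
    for i, field in enumerate(fields):
        if list_separator in field:
            # Emit one copy of the line per value, substituting the value
            # for the list-valued field.
            return [
                delimiter.join(fields[:i] + [value] + fields[i + 1:])
                for value in field.split(list_separator)
            ]
    return [line]


if __name__ == "__main__":
    for out in ununiq("A B 1,2,3 C D"):
        print(out)
```

Run on the line from the example, this prints the same three lines shown in the list above.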

Anyone who would like other extensions to GNU textutils, please contact me.
