sort

sort, merge, or check files

sort [-c|-m] [-bfirdMn] [infile…]

sort has three modes of operation: Sort (the default), Check, and Merge.

--check
-c
Check whether the input file is sorted; nothing is written to standard output.
If it is not, GNU sort displays "sort: filename:lll: disorder: uuuu" and exits with a status of 1.
lll is the number of the first out-of-order line and uuuu is the contents of that line.
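
A minimal sketch of check mode, assuming GNU sort. The two sample files (sorted.txt and unsorted.txt) are made up for illustration:

```shell
# Hypothetical sample files in a throwaway directory.
tmp=$(mktemp -d)
printf '%s\n' a b c > "$tmp/sorted.txt"
printf '%s\n' b a c > "$tmp/unsorted.txt"

# In order: no output, exit status 0.
sorted_status=0
LC_ALL=C sort -c "$tmp/sorted.txt" || sorted_status=$?

# Out of order: a diagnostic goes to standard error, exit status 1.
unsorted_status=0
LC_ALL=C sort -c "$tmp/unsorted.txt" 2>/dev/null || unsorted_status=$?

echo "sorted=$sorted_status unsorted=$unsorted_status"
rm -rf "$tmp"
```

The exit status makes -c convenient as a guard in scripts before a merge.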
 
--merge
-m
Merge the input files without sorting them. The files should already be sorted; if they are not, the output will not be sorted either.

Sort always works; merging is faster when the inputs are already sorted.
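
As a rough illustration of merge mode, the sketch below merges two already-sorted files. The file names odd.txt and even.txt are hypothetical:

```shell
# Two small pre-sorted inputs in a throwaway directory.
tmp=$(mktemp -d)
printf '%s\n' 1 3 5 > "$tmp/odd.txt"
printf '%s\n' 2 4 6 > "$tmp/even.txt"

# -m interleaves the inputs without re-sorting them.
merged=$(LC_ALL=C sort -m "$tmp/odd.txt" "$tmp/even.txt" | paste -sd ' ' -)
echo "$merged"
rm -rf "$tmp"
```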

examples

Any of bfirdMn may be specified globally and/or appended to a key.
If no key fields are specified, global options apply to the comparison of entire lines.

Global options are inherited by keys that do not specify options.
The --ignore-leading-blanks (-b), --dictionary-order (-d), --ignore-case (-f) and --ignore-nonprinting (-i) options depend on the LC_CTYPE locale.

The collating sequence depends on the LC_COLLATE locale.

Using fields (not absolute columns)
--field-separator=sep
-t sep
Use sep as the field separator, for example -t ' ' (usually with --ignore-leading-blanks) or -t \' .

Default: Fields are separated by the empty string between a non-whitespace character and a whitespace character.
The field separator is not part of either field, so the whitespace itself belongs to the field that follows it.

The separator is shown in this example as |

111| bbb
The fields are "111" and " bbb".

With -t :

111:bbb
The fields are "111" and "bbb".

When using a space as the separator, consider using --ignore-leading-blanks (-b).

To use ' as the delimiter, write \' because the shell treats ' specially.
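
The effect of the leading blank belonging to the following field can be sketched as below. The two input lines are invented, and the C locale is assumed so that a space sorts before any letter:

```shell
# Two lines whose second fields are "  z" and " y" under default splitting.
sample() { printf 'a  z\nb y\n'; }

# Without b, the blanks are compared too: "  z" sorts before " y".
without_b=$(sample | LC_ALL=C sort -k 2,2 | paste -sd '|' -)

# With b on the key, leading blanks are skipped: "y" sorts before "z".
with_b=$(sample | LC_ALL=C sort -k 2b,2 | paste -sd '|' -)

echo "$without_b"
echo "$with_b"
```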

--key B.b[,E.e]
-k B.b[,E.e]
B is the beginning field number and b the beginning column in that field;
E is the ending field and e the ending column in that field, inclusive.
Field and column are separated by a dot, not a comma. If ,E.e is omitted the key extends to the end of the line; if only .e is omitted the key ends at the end of field E.
Fields and character positions are numbered starting with 1.
    To sort
  • only on the second field, use -k 2,2 (i.e. start field 2, end field 2)
  • on column 48 through the end of the record, use -k 1.48
  • on field 5 beginning at the second character, use -k 5.2
More examples are in the Examples section.
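
A small sketch of the two key forms, using made-up input and assuming the C locale:

```shell
# Key limited to field 2, compared numerically.
by_field2=$(printf '%s\n' 'x 3' 'y 1' 'z 2' | LC_ALL=C sort -k 2,2n | paste -sd '|' -)

# Key starting at the 2nd character of field 1 and running to the end of the line.
by_char2=$(printf '%s\n' xb ya zc | LC_ALL=C sort -k 1.2 | paste -sd '|' -)

echo "$by_field2"
echo "$by_char2"
```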

Keys may span multiple fields.
Any of bfirdMn may be appended to a key.

-b
--ignore-leading-blanks
Blanks at the beginning of keys are ignored.

-f
--ignore-case
Fold lowercase characters to uppercase before comparing.
-i
--ignore-nonprinting
Ignore unprintable characters.
-r
--reverse
Reverse the comparison (descending order).
A key that carries its own ordering options does not inherit the global -r: with a global -r, -k 2,2n sorts field 2 ascending and -k 2,2nr sorts it descending.
-d
--dictionary-order
"Phone directory" order: consider only letters, digits and blanks when sorting.
-M
--month-sort
Use month name abbreviations, folded to UPPER case and compared in the order
junk < JAN < FEB < ... < DEC, i.e. JUN comes before JUL.
Invalid names compare low to valid names.
The LC_TIME locale determines the month spellings.
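
To illustrate month ordering, here is a short sketch with invented input, assuming the C locale (English month abbreviations):

```shell
months() { printf '%s\n' JUL JAN xyz JUN; }

# Plain sort compares bytes: JUL comes before JUN, and the lowercase junk comes last.
plain=$(months | LC_ALL=C sort | paste -sd ' ' -)

# -M compares calendar order, with the invalid name first.
by_month=$(months | LC_ALL=C sort -M | paste -sd ' ' -)

echo "$plain"
echo "$by_month"
```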
-n
--numeric-sort
Compare numerically. A key consists of optional leading whitespace, an optional - sign, and digits, which may include thousands separators and a radix character. A leading + is not recognized.
For example:
input   sort   sort -n
12      1      1
9       12     9
1       9      12
Treating keys spanning more than one field as numeric will usually not do what you expect.

The LC_NUMERIC locale specifies the radix character and thousands separator.

Leading + and exponential notation are not recognized; use -g for those.

sort aligns the radix characters in the two strings and compares the strings a character at a time.

-g
--general-numeric-sort
Compare numerically by converting a prefix of each key to a double-precision floating-point number, for example 1.0e-34 and 10e100.
Collating sequence:
  1. Lines that do not start with numbers (all considered to be equal).
  2. NaNs ("Not a Number" values), in a consistent but machine-dependent order.
  3. Minus infinity.
  4. Finite numbers in ascending numeric order (with -0 and +0 equal).
  5. Plus infinity.

Use this option only if there is no alternative; it can lose information when converting to floating point.
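
The difference between -n and -g on exponential notation can be sketched as follows, assuming GNU sort and invented input:

```shell
numbers() { printf '%s\n' 1e3 20 3; }

# -n stops reading at the "e", so 1e3 is treated as 1.
numeric=$(numbers | LC_ALL=C sort -n | paste -sd ' ' -)

# -g parses the exponent, so 1e3 is treated as 1000.
general=$(numbers | LC_ALL=C sort -g | paste -sd ' ' -)

echo "$numeric"
echo "$general"
```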

-u
--unique
For sorting or merging, only output the first line of a sequence of lines that compare equal.
in    sort -u out
1     1
2     2
3     3
2
For checking, check that no pair of consecutive lines compares equal.
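
Both uses of -u can be sketched with a few invented lines (C locale assumed):

```shell
# Sorting: duplicates collapse to one line.
unique=$(printf '%s\n' 1 2 3 2 | LC_ALL=C sort -u | paste -sd ' ' -)

# Checking: -c alone accepts repeated lines, -c -u requires strictly increasing input.
check_status=0
printf '%s\n' 1 2 2 | LC_ALL=C sort -c 2>/dev/null || check_status=$?
check_unique_status=0
printf '%s\n' 1 2 2 | LC_ALL=C sort -c -u 2>/dev/null || check_unique_status=$?

echo "$unique / -c: $check_status / -c -u: $check_unique_status"
```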
-o file
--output=file
Write output to file instead of standard output. Useful with --merge to concatenate files.
Unfortunately, this reads the entire, potentially very large, first file and rewrites it to the --output file; it does not really append.

If file is one of the input files, it is copied to a temporary file before sorting.
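
Because the input is read before the output is written, a file can be sorted in place with -o. A minimal sketch, using a hypothetical data.txt:

```shell
tmp=$(mktemp -d)
printf '%s\n' c a b > "$tmp/data.txt"

# Safe: the same name is both input and -o target.
# (By contrast, "sort data.txt > data.txt" lets the shell truncate the file first.)
LC_ALL=C sort -o "$tmp/data.txt" "$tmp/data.txt"

in_place=$(paste -sd '|' "$tmp/data.txt")
echo "$in_place"
rm -rf "$tmp"
```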

-z
--zero-terminated
Input lines are terminated by a zero byte (NUL), not an ASCII LF (Line Feed, x'0A').
Useful with perl -0 or find -print0 and xargs -0, which do the same.
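
A small sketch of NUL-terminated records, assuming a sort that supports -z (GNU sort does). The second record deliberately contains a line feed, and tr makes the result visible by mapping NUL to | and LF to _:

```shell
# Two records: "b" and "a<LF>x", each terminated by a NUL byte.
zero_sorted=$(printf 'b\0a\nx\0' | LC_ALL=C sort -z | tr '\0\n' '|_')
echo "$zero_sorted"
```

The embedded line feed stays inside its record instead of splitting it in two.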
+b [-e]
Obsolete key syntax. A field consists of the line between b and up to but _not including_ e (or the end of the line if e is omitted).
Fields and character positions are numbered starting with 0.

-s
--stable
stabilize sort by disabling last-resort comparison
-S
--buffer-size=size
use size for memory buffer
-R
--random-sort
sort by random hash of keys
--random-source=file
get random bytes from file (default /dev/urandom) for -R, --random-sort
--prog  
-T dir
--temporary-directory=dir
Overrides $TMPDIR as the directory for temporary files. Default /tmp.
Useful when /tmp has insufficient space.
--help
--version


How lines are compared

Unless otherwise specified, sort compares each pair of fields using the collating sequence specified by the LC_COLLATE locale.

If any of the global options --month-sort, --ignore-leading-blanks, --dictionary-order, --ignore-case, --ignore-nonprinting, --numeric-sort or --reverse are given without key fields, entire lines are compared according to the global options.

As a last resort, when all keys compare equal, sort compares the entire lines as if no ordering options other than --reverse were specified.
With -s (--stable), this last-resort comparison is disabled, so lines in which all fields compare equal are left in their original relative order. If no fields or global options are specified, -s has no effect.
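
The last-resort comparison and its suppression by -s can be sketched with three invented lines (C locale assumed):

```shell
rows() { printf '%s\n' 'b 1' 'a 1' 'c 0'; }

# Keys tie on "1", so the whole lines are compared: "a 1" before "b 1".
last_resort=$(rows | LC_ALL=C sort -k 2,2n | paste -sd '|' -)

# With -s the tied lines keep their input order: "b 1" before "a 1".
stable=$(rows | LC_ALL=C sort -s -k 2,2n | paste -sd '|' -)

echo "$last_resort"
echo "$stable"
```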

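If the final byte of an input file is not a newline, one is added.
Older documentation treats a line's trailing newline as part of the line for comparison purposes, so that, with no options in an ASCII locale,
a line starting with tab sorts before an empty line because tab precedes newline.
Current GNU sort does not count the trailing newline as part of the line, so there the empty line sorts first.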

Upon error, sort exits with a status of 2.

Notes

With --ignore-leading-blanks, the column part of a field specification is counted from the first nonblank character
of the field (for +POS)
AAAAA BBBBB
      12345
or following the previous field (for -POS)
AAAAA BBBBB
     12345

A key may have any of Mbdfinr appended to it, which overrides the global options for this key.
-b (ignore leading blanks) may be attached to either or both of the +POS and -POS parts of a field specification. If inherited from the global options, it is attached to both. Keys may span multiple fields.

Historical implementations of sort differ in how they treat some options, particularly -b, -f, and -n. GNU sort follows the POSIX behavior, in which -n does not imply -b.
-M has been changed in the same way.
This may affect the meaning of character positions in field specifications in obscure cases. If in doubt, add an explicit -b.

Examples

  1. Sort alphabetically using a key which begins at the 3rd field and extends to the end of the line.

    sort -k3

  2. Sort on two keys, with : as the field delimiter:
    1. The first key is the 2nd field, treating numbers as values.
    2. Ties are resolved alphabetically by the second key, the 5th field, considering only its 3rd through 5th characters.

    sort -t : -k 2,2n -k 5.3,5.5

-k 2 (instead of -k 2,2) would use the characters from the beginning of the 2nd field to the end of the line as the primary key.

Sorting multiple log files by month abbreviation (reverse order), then day (numeric, reverse order), showing the most recent entries:

> sort   --key=2Mr --key=3nr error_logs.ln/* | head
 
Dec 09 03:30:21 2016] [12156627] [cgi:error] [client 45.79.197.44:48718] script
Dec 09 15:39:11 2016] [12156627] [cgi:error] [client 74.105.216.249:53677] AH01215:
Dec 09 15:39:11 2016] [12156627] [cgi:error] [client 74.105.216.249:53677] AH01215:
Dec 08 00:42:00 2016] [12156627] [core:error] [client 89.248.167.131:44947] AH00135:
Dec 08 03:29:45 2016] [12156627] [cgi:error] [client 45.79.197.44:41226] script
Dec 07 23:46:48 2016] [12156627] [log_config:warn] (28)No space left
Dec 06 03:14:20 2016] [12156627] [log_config:warn] (28)No space left

Modifiers (except b) apply to the whole key, whether attached to the field-start, the field-end, or both.
-k 2n,2 is the same as -k 2n,2n.

 /etc/passwd

    1 :2:3:4 : 5
  root:x:0:0:   Boss:/root:/bin/bash
  mail:x:1:1:   maildaemon:/mail:/sbin/nologin
  realge:x:32:8:MrG:/home/realge:/usr/local/cpanel/bin/jailshell
  saturn:x:33:8:planet:/home/realge:/usr/local/cpanel/bin/jailshell
  uranus:x:135:8:planet:/home/realge:/usr/local/cpanel/bin/jailshell 
Sort /etc/passwd on the 4th field (group ID).
Lines with equal values in field 4 are then sorted on field 3 (numeric user ID).

sort -t : -k 4,4n -k 3,3n /etc/passwd

An alternative is to use the global numeric modifier -n.

sort -t : -n -k 4,4 -k 3,3 /etc/passwd
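
Both forms can be tried without touching the real /etc/passwd. The sketch below feeds the sample entries shown above (shuffled, leading blanks in field 5 dropped) through a hypothetical helper function, sample_passwd, and prints only the login names:

```shell
# Sample entries taken from the listing above, in shuffled order.
sample_passwd() {
  cat <<'EOF'
uranus:x:135:8:planet:/home/realge:/usr/local/cpanel/bin/jailshell
root:x:0:0:Boss:/root:/bin/bash
saturn:x:33:8:planet:/home/realge:/usr/local/cpanel/bin/jailshell
mail:x:1:1:maildaemon:/mail:/sbin/nologin
realge:x:32:8:MrG:/home/realge:/usr/local/cpanel/bin/jailshell
EOF
}

# n attached to each key.
per_key=$(sample_passwd | LC_ALL=C sort -t : -k 4,4n -k 3,3n | cut -d : -f 1 | paste -sd ' ' -)

# Global -n inherited by both keys.
global=$(sample_passwd | LC_ALL=C sort -t : -n -k 4,4 -k 3,3 | cut -d : -f 1 | paste -sd ' ' -)

echo "$per_key"
echo "$global"
```

Within group 8 the user IDs 32, 33 and 135 come out in numeric order, which a plain character comparison would not give.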

  • Generate a tags file in case insensitive sorted order.
    find src -type f -print0 | sort --field-separator=/ --zero-terminated --ignore-case | xargs -0 etags --append

    The use of -print0, -z, and -0 in this case means that pathnames that contain Line Feed characters will not get broken up by the sort operation.

    To sort /etc/passwd on the 5th field while ignoring both leading and trailing white space, apply the b modifier to the field-start and field-end specifiers of the first key:

     sort -t : -n -k 5b,5b -k 3,3 /etc/passwd

    or use the global -b modifier instead of -n, with an explicit n on the second key specifier:

     sort -t : -b -k 5,5 -k 3,3n /etc/passwd

    Sort processes by CPU time used

      /bin/ps -eafA|sort -k 1.40  |tail -20
    1234567891123456789212345678931234567890
    root      3532     1  0 Sep04 ?        00:08:13 /usr/sbin/snmpd -Lsd -Lf /dev/null -p /var/run/snmpd.pid -a
    dovecot  27893 27869  0 Dec08 ?        00:08:38 pop3-login
    root     21100     1  0 Dec15 ?        00:09:57 /usr/local/apache/bin/httpd -k start -DSSL
    root        11     1  0 Sep04 ?        00:10:08 [migration/3]
    root      7737  7559  0 Sep04 ?        00:10:11 hald-addon-storage: polling /dev/scd0
    root       564    35  0 Sep04 ?        00:10:46 [usb-storage]
    mailman   7499  7486  0 Sep04 ?        00:10:59 /usr/bin/python2.4 /usr/local/cpanel/3rdparty/mailman/bin/qrun
    root      3337    35  0 Sep04 ?        00:18:08 [rpciod/7] 
    Sort .htaccess lines of the form "deny from ip.ip.ip.ip".
    All the keys are numeric (i.e. 2 comes before 10), with fields separated by a dot.
    The first key is the 1st field beginning at its 10th character (skipping the "deny from"). The other 3 keys begin and end with fields 2, 3 and 4.
    sort --numeric-sort --field-separator=. -k1.10,1 -k2,2 -k3,3 -k4,4   .htaccess > 0 
    1234567890   k2 k3 k4
    deny from 23.96.208.191
    deny from 23.254.113.26
    deny from 23.254.138.210
    deny from 23.254.201.89
    deny from 31.15.10.37
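
The same command can be tried on the sample lines above without a real .htaccess. In this sketch the lines are shuffled and supplied by a hypothetical helper, deny_lines, and the "deny from " prefix is stripped from the result for readability:

```shell
# The five sample lines from above, in shuffled order.
deny_lines() {
  printf '%s\n' \
    'deny from 31.15.10.37' \
    'deny from 23.254.201.89' \
    'deny from 23.96.208.191' \
    'deny from 23.254.138.210' \
    'deny from 23.254.113.26'
}

# Global -n is inherited by all four keys; the blank at character 10 is skipped.
by_octet=$(deny_lines \
  | LC_ALL=C sort --numeric-sort --field-separator=. -k1.10,1 -k2,2 -k3,3 -k4,4 \
  | sed 's/^deny from //' | paste -sd ' ' -)

echo "$by_octet"
```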
    
    A file name of - means standard input.

    By default, sort writes the results to the standard output.

    Sort exits with a return code of 2 if an input file is not accessible, with a message such as:
    sort: No such file or directory

    The error message "sort: memory exhausted" may occur on very large files.

    Sort's command line is:
            sort … infile… > outfile.
    NOT sort … infile    outfile.

    A poor delimiter given to -t produces one of these errors:
    sort: multi-character tab
    sort: empty tab