BSD

tr

translate (or delete) characters

tr [-dCcsu] matchString replaceString < infile

tr matchString replaceString < file
The first character in matchString is translated into the first character in replaceString, the second into the second, and so on.
If matchString is longer than replaceString, the last character in replaceString is repeated until matchString is exhausted.

tr "aeiou" "_" implies tr "aeiou" "_____"
such that:
       echo abcdefghijklmnopqrstuvwxyz1234 | tr "aeiou" "_"
produces:   _bcd_fgh_jklmn_pqrst_vwxyz1234

-d matchString delete the characters specified in matchString.
-s matchString squeeze repeated occurrences of the characters listed in the last operand (either matchString or replaceString) into a single occurrence. Squeezing happens after deletion and translation.

-C complement the set of characters in matchString.
For example, -C aeiou selects every character except a, e, i, o and u.
-c complement the set of byte values in matchString.
-u unbuffered output.
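
The options above can be tried on a fixed string. This is only a sketch; the sample text and the variable names are made up for illustration.

```shell
# Sample input (hypothetical): repeated letters and a run of spaces.
sample='hello    wooorld 123'

# -d: delete every 'l'.
deleted=$(printf '%s\n' "$sample" | tr -d 'l')

# -s: squeeze runs of spaces and runs of 'o' down to one character each.
squeezed=$(printf '%s\n' "$sample" | tr -s ' o')

# -c with -d: delete everything that is NOT a digit or a newline.
digits=$(printf '%s\n' "$sample" | tr -cd '0-9\n')

printf '%s\n' "$deleted" "$squeezed" "$digits"
```

Note that -s only squeezes the characters listed, so the "ll" in hello is left alone.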

These characters have special meaning to the shell, so enclose the string in quotes or escape each of them with \

< > & ` $ ( ) " ; ' | * = [ ] # ? ~

\a alert character (usually BELL, a beep), \b backspace, \f form feed, \n newline,
\r carriage return, \t tab, \v vertical tab.

A backslash followed by any other character maps to that character (so \\ specifies a backslash).

c1-c2 range of characters, inclusively.
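
A short sketch of the escapes and a range in use; the input strings and variable names are illustrative only.

```shell
# Replace each tab with a space using the \t escape.
untabbed=$(printf 'a\tb\tc\n' | tr '\t' ' ')

# Map the range a-f onto A-F; other characters pass through unchanged.
ranged=$(printf 'deadbeef cafe\n' | tr 'a-f' 'A-F')

# \\ stands for one literal backslash: turn backslashes into slashes.
slashed=$(printf '%s\n' 'C:\tmp\x' | tr '\\' '/')

printf '%s\n' "$untabbed" "$ranged" "$slashed"
```

The single quotes keep the shell from touching the backslashes, so tr sees them as written.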

To specify a byte by its hex value, use the shell's $'\xHH' quoting, for example $'\xc3'.

Using the mouse to highlight and copy characters in the x'C0' - … range, then pasting into linux, causes 2 bytes to be inserted for each character received! This is the UTF-8 encoding of the character:
a x'C3', and then the character received ANDed with x'BF' (b'1011 1111').
For example: pasting an uppercase A with a grave accent ( x'C0' ) inserts x'C380' into the input stream!
Using the following translate to delete the x'C3' and change the x'80' back to x'C0' works BUT …

Fix up characters x'C0' thru x'CF' inserted from console that came in as x'80' thru x'8F'.

tr -d $'\xc3' < 0 | \
tr $'\x80'-$'\x8f' $'\xC0'-$'\xCF' > 0fixed
> hexdump -C 0
43 30 20 c3 80 20 09 43 31 20 c3 81
> hexdump -C 0fixed
43 30 20 c0 20 09 43 31 20 c1
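
The two-byte pattern can be reproduced without a mouse. A minimal sketch, assuming a tr that works byte by byte under LC_ALL=C. Octal escapes (\303 is x'C3', \200 is x'80') stand in for $'\x..' so the lines also run in a plain POSIX shell, and od is used here in place of hexdump; the variable names are illustrative.

```shell
# x'C0' AND x'BF' gives the byte that follows the x'C3' prefix.
second=$(printf '%X' $(( 0xC0 & 0xBF )))

# Feed "C0 " followed by the bytes C3 80 through the same two-step fix-up,
# then dump the result as hex with all whitespace removed.
fixed=$(printf 'C0 \303\200' \
  | LC_ALL=C tr -d '\303' \
  | LC_ALL=C tr '\200-\217' '\300-\317' \
  | od -An -tx1 | tr -d ' \n')

echo "$second $fixed"
```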

Using the mouse to copy and paste from some PDF files may insert EF 82 B7 for a dot, E2 80 99 for apostrophes,
EF BF BC for table verticals,
or CC 82 for quotes.

The following translates EF BF BC to "++, which can then be edited with vi:

tr $'\xef' \" < calliope | tr $'\xbf' + |tr $'\xbc' + >calliope2

[:class:] all characters belonging to the character class.

When translating, the only character classes that may appear in replaceString are `upper' and `lower' (linux).

upper (upper case), lower (lower case), alpha (alphabetic), digit, alnum (alphanumeric),
punct (punctuation), blank, space, print, graph.
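
A small sketch of classes in matchString and of the upper/lower pairing; the sample line and variable names are hypothetical.

```shell
# Delete punctuation and digits from a sample line.
letters=$(printf 'Hello, World! 42\n' | tr -d '[:punct:][:digit:]')

# upper and lower can be paired for case conversion.
shout=$(printf 'Hello, World! 42\n' | tr '[:lower:]' '[:upper:]')

printf '[%s]\n' "$letters" "$shout"
```

The brackets in the printf format make the trailing space left in the first result visible.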

tr "[:cntrl:]" "[:lower:]"
  carriage return is shown as n,
  line feed is shown as k,
  TAB is shown as j

  tr "[:cntrl:]" "[:lower:]" < cedar2.txt 
 nkIf  RIDGEjcould tell its story it would
 start with a vast ice-cap that stretched down from thenkNorth Pole to here. When 
 a wide river had previously cut deep channels into the 
 sandstone and shale bedrock,nkThe melting glacier left 
 millions of cubic yards of sand, gravel and stone.nkCedar Ridge is a 
 pile of such glacial till.nkThe rounded boulders that

[:xdigit:] hexadecimal
  carriage return is shown as D (x'0D'),
  line feed is shown as A (x'0A'),
  TAB is shown as 9 (x'09')

tr "[:cntrl:]" "[:xdigit:]" < cedar2.txt
 DAIf9CEDAR RIDGE could tell its story it would start 
 with a vast ice-cap that stretched down from theDANorth 

[:print:]
! " ` # ' $ % & ( ) * + , - _ . / \ | : ; < = > ? @
{ } ~ [ ] ^ 0-9, space, A-Z, a-z
Only upper and lower are ordered.

See the ctype(3) manual pages for details as to which characters are included in these classes.

[c*n] c repeated n times in replaceString.
If n is omitted or 0, it is interpreted as large enough to extend replaceString to the length of matchString.
If n has a leading 0, it is interpreted as octal; otherwise it is decimal.
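
A sketch of both forms of the repeat construct; the inputs and variable names are illustrative.

```shell
# [_*] pads replaceString with '_' to the length of matchString,
# so nothing depends on the last character being repeated automatically.
padded=$(echo education | tr 'aeiou' '[_*]')

# [x*3] stands for "xxx": a, b and c become x, then d->y and e->z.
counted=$(echo abcdefg | tr 'abcde' '[x*3]yz')

printf '%s\n' "$padded" "$counted"
```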

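\ooo the character with octal value ooo (1, 2 or 3 octal digits).
To follow an octal sequence with a digit as a literal character, left zero-pad the octal sequence to the full 3 octal digits, e.g. \007.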
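
A sketch of octal escapes, including the zero-padding rule; the variable names are illustrative.

```shell
# \102 is octal for 'B'.
letter=$(echo A | tr 'A' '\102')

# '\0071' is read as \007 (BEL) followed by the digit 1,
# so a maps to BEL and b maps to 1.
digit=$(echo b | tr 'ab' '\0071')

printf '%s\n' "$letter" "$digit"
```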

[=equiv=] Represents all characters belonging to the same equivalence class as equiv, ordered by their encoded values.

tr exits 0 on success, and >0 if an error occurs.

The Mac OS X (Darwin) BSD version, as of 10.5.6, exits
if it encounters a copyright symbol (x'A9', © ), a left double quote (x'93', “), or bytes such as
B8, D1, C0, CF, C2, D8, D4, with the message:
tr: Illegal byte sequence
and an exit status of 1.

This is easily corrected with:

export LC_ALL=C
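
The C locale can also be set for a single command instead of being exported. A minimal sketch; the input text and variable names are illustrative.

```shell
# \251 is x'A9', the Latin-1 copyright sign, which is not valid UTF-8 by itself.
# Setting LC_ALL=C for this one command makes tr treat its input as plain bytes.
cleaned=$(printf 'Copyright \251 1997\n' | LC_ALL=C tr -d '\251')
status=$?

printf '%s (exit %s)\n' "$cleaned" "$status"
```

This leaves the locale of the rest of the session untouched.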

EXAMPLES

  1. Convert lower case to upper case: tr "[:lower:]" "[:upper:]" < file
  2. Remove all NUL (x'00') characters: tr -d '\000' < file
  3. Convert strings of spaces to single spaces: tr -s ' ' < file
  4. Delete non-printable characters, keeping newlines: tr -cd "\n[:print:]" < file
  5. Display doubled occurrences of words in a document.
    People often write "the the" with the duplicated words separated by a newline.
    1. translate each sequence of punctuation and space characters to a newline
      (this puts each word on a separate line),
    2. translate all uppercase characters to lower case,
    3. finally, uniq -d prints the words that were adjacent duplicates.

    tr -s '[:punct:][:space:]' '\n' < file | \
    tr '[:upper:]' '[:lower:]'| \
    uniq -d

  6. Mac to Unix or DOS/Windows to Unix line endings: N.B. ftp will do the conversion if the transfer is done in ascii mode.

  7. clear the high bit: tr "\200-\377" "\000-\177"
    (translate characters with the high bit to the corresponding character without the high bit. Some programs use the high bit as a flag.)

  8. Create a list of the words, one per line. This translates all non-alphabetic characters to newlines, then squeezes repeated newlines into a single newline:
    tr -cs "[:alpha:]" "\n" < file

  9. Remove diacritical marks from all accented variants of the letter e:
    tr "[=e=]" "e" < file
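
Examples 5 to 8 can be checked on small inline inputs. This is a sketch: the sample text and variable names are made up, and the two line-ending commands for example 6 are one common way to do the conversion with tr, not something the list above spells out.

```shell
# Example 5: words repeated across a line break ("the the") or in place ("is is").
doubled=$(printf 'This is the\nthe best. It is is good.\n' \
  | tr -s '[:punct:][:space:]' '\n' \
  | tr '[:upper:]' '[:lower:]' \
  | uniq -d)

# Example 6, DOS/Windows to Unix: delete the carriage return of each CR LF.
dos=$(printf 'one\r\ntwo\r\n' | tr -d '\r')

# Example 6, classic Mac to Unix: turn each lone CR into a newline.
mac=$(printf 'one\rtwo\r' | tr '\r' '\n')

# Example 7: clear the high bit; \301 \302 become \101 \102, i.e. "AB".
low=$(printf '\301\302\n' | LC_ALL=C tr '\200-\377' '\000-\177')

# Example 8: one word per line.
words=$(printf 'Hello, world! 42 times\n' | tr -cs '[:alpha:]' '\n')

printf '%s\n' "$doubled" "$dos" "$mac" "$low" "$words"
```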

perl extension -U

  1. Latin-1 to Unicode: tr -CU "\0-\xFF"   "" < file
  2. Unicode to Latin-1: tr -UC "\0-\x{FF}" "" < file


The LANG, LC_ALL, LC_CTYPE and LC_COLLATE environment variables affect the execution of tr as described in environ(7).

COMPATIBILITY

System V historically wrote character ranges as [c-c] instead of the c-c used by BSD and standardized by POSIX. The command tr [a-z] [A-Z] still works, because it maps the [ character in matchString to the [ character in replaceString. However, if a shell script is deleting or squeezing characters, as in the command tr -d [a-z], the characters [ and ] will be included in the deletion or compression list.
To have a-z represent the three characters a, - and z, use a\-z.
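
Both points can be seen on short inputs. A sketch only; the strings and variable names are illustrative.

```shell
# The brackets are ordinary characters here, so they are deleted too.
bracketed=$(echo '[abc] XYZ' | tr -d '[a-z]')

# a\-z means the three characters a, - and z; the b is left alone.
three=$(echo 'a-z b' | tr -d 'a\-z')

printf '[%s]\n' "$bracketed" "$three"
```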

The feature wherein the last character of replaceString is duplicated if replaceString has fewer characters than matchString is permitted by POSIX but is not required. Shell scripts attempting to be portable to other POSIX systems should use the [#*] convention instead of relying on this behavior.
The -u option is an extension to the POSIX standard.


[:cntrl:] (octal)
 000 NUL  001 SOH  002 STX  003 ETX  004 EOT  005 ENQ  006 ACK  007 BEL
 010 BS   011 HT   012 NL   013 VT   014 NP   015 CR   016 SO   017 SI
 020 DLE  021 DC1  022 DC2  023 DC3  024 DC4  025 NAK  026 SYN  027 ETB
 030 CAN  031 EM   032 SUB  033 ESC  034 FS   035 GS   036 RS   037 US
 177 DEL
[:punct:] (octal)
          041 !    042 "    043 #    044 $    045 %    046 &    047 '
 050 (    051 )    052 *    053 +    054 ,    055 -    056 .    057 /
                   072 :    073 ;    074 <    075 =    076 >    077 ?
 100 @
                            133 [    134 \    135 ]    136 ^    137 _
 140 `
                            173 {    174 |    175 }    176 ~

BUG:

Case conversion fails in a locale with differing numbers of lower case and upper case characters,
with the message: tr '[:upper:]' '[:lower:]' -- misaligned construct

Example:
LC_CTYPE=en_US.iso88591
tr '[:upper:]' '[:lower:]'

See sed: stream editor for multiple character string manipulation.

BSD October 11, 1997