BSD
tr [-dCcsu] matchString replaceString < infile †
tr matchString replaceString < file
The first character in matchString is translated into the first character in replaceString, ….
If matchString is longer than replaceString, the last character found in replaceString is duplicated until matchString is exhausted.
tr "aeiou" "_" implies tr "aeiou" "_____" such that:
echo abcdefghijklmnopqrstuvwxyz1234 | tr "aeiou" "_"
_bcd_fgh_jklmn_pqrst_vwxyz1234
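The padding rule can be checked directly. The sketch below assumes a tr that pads replaceString as described above (BSD and GNU tr both do) and compares the short form with the written-out form. The variable names are illustrative only.

```shell
# Input line taken from the example above.
input=abcdefghijklmnopqrstuvwxyz1234

# Short replaceString: relies on the last character being duplicated.
padded=$(echo "$input" | tr "aeiou" "_")

# Explicit replaceString: one underscore per character of matchString.
explicit=$(echo "$input" | tr "aeiou" "_____")

echo "$padded"
echo "$explicit"
```

Both commands should print the same line.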
-d matchString   Delete characters specified in matchString.
-s matchString   Squeeze multiple characters specified in matchString to a single occurrence
                 (either matchString or replaceString) after deletion and translation.
-C               Complement the set of characters in matchString. For example, -C aeiou includes every character except for aeiou.
-c               Complement the set of byte values in matchString.
-u               Unbuffered output.
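A minimal sketch of the delete, squeeze and complement options, using made-up input strings (they are not from the original page). Only -d, -s and -c are used, since those behave the same in BSD and GNU tr.

```shell
# -d: delete every l and o.
deleted=$(printf 'hello world\n' | tr -d 'lo')

# -s: squeeze runs of spaces and runs of a down to one character each.
squeezed=$(printf 'aaa   bbb\n' | tr -s ' a')

# -c with -d: delete everything that is NOT a digit or a newline.
digits=$(printf 'abc123\n' | tr -cd '0-9\n')

echo "$deleted"    # he wrd
echo "$squeezed"   # a bbb
echo "$digits"     # 123
```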
These characters have special meaning to the shell, so enclose the string in quotes or escape them with \ :
< > & ` $ ( ) " ; ' | * = [ ] # ? ~
\a  alert character (usually BELL, beep)
\b  BackSpace
\f  Form-Feed
\n  NewLine
\r  CarriageReturn
\t  Tab
\v  VerticalTab
In a backslash followed by any other character, the backslash is ignored and the character stands for itself (so \\ specifies a backslash).
c1-c2  range of characters, inclusively.
To specify a hex string use $'\xxx' †.
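The escape and range notation can be tried with a few throwaway strings. This is only a sketch with invented sample input. The operands are single-quoted so that the shell passes the backslashes through to tr unchanged.

```shell
# \t is expanded by tr itself: every TAB becomes a space.
tabs=$(printf 'a\tb\tc\n' | tr '\t' ' ')

# c1-c2 is inclusive at both ends: a through f map to A through F.
range=$(printf 'deadbeef-xyz\n' | tr 'a-f' 'A-F')

# \\ specifies one literal backslash.
slash=$(printf 'C:\\tmp\n' | tr '\\' '/')

echo "$tabs"    # a b c
echo "$range"   # DEADBEEF-xyz
echo "$slash"   # C:/tmp
```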
Using the mouse to highlight and copy characters in the x'C0' - … range,
then pasting into linux, causes 2 characters to be inserted for each character received:
a x'C3' and then the character received ANDed with x'BF' (b'1011 1111').
For example: pasting an uppercase A with
an overstruck grave accent (x'C0') inserts x'C380' into the input stream!
Using the following translate to delete the x'C3' and change the x'80' to x'C0' works BUT …
Fix up characters x'C0' thru x'CF' inserted from console that came in as x'80' thru x'8F':
tr -d $'\xc3' < 0 | \
tr $'\x80'-$'\x8f' $'\xC0'-$'\xCF' > 0fixed
> hexdump -C 0
43 30 20 c3 80 20 09 43 31 20 c3 81
> hexdump -C 0fixed
43 30 20 c0 20 09 43 31 20 c1
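The same fix-up can be reproduced without a file. The sketch below feeds the bytes 43 30 20 c3 80 through the two tr steps and dumps the result as hex. It uses octal escapes instead of $'\x..' so that it does not depend on the shell. The LC_ALL=C prefix and the od formatting step are assumptions added here, not part of the original commands.

```shell
# "C0 " followed by the pasted pair x'C3' x'80' (octal 303 200).
# Step 1 deletes x'C3'; step 2 maps x'80'-x'8F' (200-217) to x'C0'-x'CF' (300-317).
fixed_hex=$(printf 'C0 \303\200' |
    LC_ALL=C tr -d '\303' |
    LC_ALL=C tr '\200-\217' '\300-\317' |
    od -An -tx1 | tr -d ' \n')

echo "$fixed_hex"   # 433020c0
```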
Using the mouse to copy and paste from some pdf files may cause the inclusion of
EF 82 B7, E2 80 99, EF BF BC or CC 82 as quotes.
The following translates EF BF BC to "++ which can be edited with vi:
tr $'\xef' \" < calliope | tr $'\xbf' + | tr $'\xbc' + > calliope2
[:class:] all characters belonging to the character class.
When translating, the only character classes that may appear in replaceString
are `upper' and `lower' (linux).
upper   UPPER-CASE
lower
alpha   alphabetic
digit
alnum   alphanumeric
punct   punctuation †
blank   †
space   †
print   †
graph   †
With tr "[:cntrl:]" "[:lower:]", carriage return is shown as n, line feed is shown as k, and TAB is shown as j:
tr "[:cntrl:]" "[:lower:]" < cedar2.txt
nkIf RIDGEjcould tell its story it would
start with a vast ice-cap that stretched down from thenkNorth Pole to here. When
a wide river had previously cut deep channels into the
sandstone and shale bedrock,nkThe melting glacier left
millions of cubic yards of sand, gravel and stone.nkCedar Ridge is a
pile of such glacial till.nkThe rounded boulders that
[:xdigit:] hexadecimal
With tr "[:cntrl:]" "[:xdigit:]", carriage return is shown as D (x'0D'), line feed is shown as A (x'0A'), and TAB is shown as 9 (x'09'):
tr "[:cntrl:]" "[:xdigit:]" < cedar2.txt
DAIf9CEDAR RIDGE could tell its story it would start
with a vast ice-cap that stretched down from theDANorth
See ctype(3) manual pages for details as to which characters are included in these classes.
Printable ASCII characters:
! " ` # ' $ % & ( ) * + , - _ . / \ | : ; < = > ? @
{ } ~ [ ] ^ 0-9, space, A-Z, a-z
Only upper and lower are ordered.
[c*n] c repeated n times in replaceString.
If n is omitted or 0, it is interpreted as large enough to extend replaceString
to the length of matchString.
If n has a leading 0, it is interpreted as octal.
\000 octal. To follow 0 with a digit as a character, left 0-pad the 0n to 3 octal digits, ex: 007.
The feature wherein the last character of replaceString
is duplicated if replaceString has fewer characters than matchString is permitted by
POSIX but is not required. Shell scripts attempting to be portable to
other POSIX systems should use the [#*] convention instead of relying on this behavior.
[=equiv=] Represents all characters belonging to the same equivalence
class as equiv, ordered by their encoded values.
tr exits 0 on success, and >0 if an error occurs.
The Mac OSX darwin BSD version as of 10.5.6 exits
if a copyright symbol (x'A9', ©), left double quote (x'93', “) etc.
(B8, D1, C0, CF, C2, D8, D4)
is encountered, with the message:
tr: Illegal byte sequence
and an exit status of 1.
This is easily corrected with:
export LC_ALL=C
See sed: stream editor for multiple character string manipulation.
BSD October 11, 1997
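A short sketch of the LC_ALL=C workaround and of the [c*n] repeat notation. The sample bytes and words are invented for illustration. Setting LC_ALL=C only on the tr command (rather than exporting it) is a choice made here to keep the rest of the session unaffected.

```shell
# The byte x'A9' (octal 251) between two ASCII letters. With LC_ALL=C,
# tr treats the input as single bytes instead of multibyte sequences.
stripped=$(printf 'a\251b\n' | LC_ALL=C tr -d '\251')

# [c*n]: [x*2] contributes two copies of x to replaceString.
repeat=$(echo abc | tr 'abc' '[x*2]y')

# [c*]: pad replaceString explicitly up to the length of matchString.
pad=$(echo education | tr 'aeiou' '[_*]')

echo "$stripped"   # ab
echo "$repeat"     # xxy
echo "$pad"        # _d_c_t__n
```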
EXAMPLES
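Translate the contents of file to upper case:
tr "[:lower:]" "[:upper:]" < file
† (not linux)
Delete NUL characters:
tr -d '\000' < file
Squeeze runs of spaces to a single space:
tr -s ' ' < file
Delete non-printable characters, keeping newlines (the deletion includes ␍):
tr -cd "\n[:print:]" < file †
…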
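The one-liners above can be tried on inline data instead of a file. This is a sketch with made-up input. printf is used so that control characters can be written as escapes.

```shell
# Upper-case conversion through the paired classes.
upper=$(printf 'Hello\n' | tr '[:lower:]' '[:upper:]')

# Remove an embedded NUL byte.
nonul=$(printf 'a\000b\n' | tr -d '\000')

# Collapse runs of spaces.
single=$(printf 'a   b\n' | tr -s ' ')

# Keep only printable characters and newlines (CR and BEL are dropped).
clean=$(printf 'a\rb\007c\n' | tr -cd '\n[:print:]')

echo "$upper"    # HELLO
echo "$nonul"    # ab
echo "$single"   # a b
echo "$clean"    # abc
```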
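People often write "the the"
with the duplicated words separated by a newline.
The following converts each run of punctuation and white space to a single newline
(this causes each word to be on a separate line) and folds upper case to lower case;
uniq -d then prints the words that were adjacent duplicates.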
tr -s '[:punct:][:space:]' '\n' < file | \
tr '[:upper:]' '[:lower:]'| \
uniq -d
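To see the pipeline work without a file, the sketch below runs it on two invented lines containing a doubled "the" split across a newline and a doubled "is" that differs only in case.

```shell
# Sample text is hypothetical; the pipeline is the one shown above.
dups=$(printf 'This is the\nthe test. Is is ok\n' |
    tr -s '[:punct:][:space:]' '\n' |
    tr '[:upper:]' '[:lower:]' |
    uniq -d)

echo "$dups"
```

Each repeated word is printed once, in the order the duplicates occur.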
Mac files use CR
(carriage return, x'0D') as the line terminator.
For Unix convert to LF
(line feed, x'0A'):
tr '\015' '\012' < macfile > Unixfile
DOS files have CR
and LF
characters at the end of each line and have a ^Z
at the end of the file. For unix delete CR
and ^Z
(x'1A') leaving the LF:
tr -d '\015\032' < DOSfile > Unixfile
N.B. ftp will do the conversion if transfer is done in ascii mode.
tr "\200-\377" "\000-\177"
(Translate characters with the high bit set to the corresponding character without the high bit. Some programs use the high bit as a flag.)
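The three conversions can be exercised on inline byte strings rather than real files. In this sketch the sample data is invented, and LC_ALL=C is added on the high-bit example as an assumption so that tr treats the input as single bytes.

```shell
# CR-terminated lines: turn each CR (015) into LF (012).
mac=$(printf 'one\rtwo\r' | tr '\015' '\012')

# CR LF lines with a trailing ^Z (032): delete CR and ^Z, keep LF.
dos=$(printf 'one\r\ntwo\r\n\032' | tr -d '\015\032')

# Strip the high bit: x'C1' (octal 301) becomes x'41', the letter A.
low=$(printf '\301\n' | LC_ALL=C tr '\200-\377' '\000-\177')

expected=$(printf 'one\ntwo')
echo "$low"   # A
```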
Create a list of the words in file, one per line:
tr -cs "[:alpha:]" "\n" < file
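A quick check of the word-list idiom on an invented sentence: every run of non-alphabetic characters (punctuation, digits, spaces, the newline) collapses to a single newline.

```shell
# -c complements [:alpha:], so all non-letters map to newline;
# -s then squeezes each run of newlines to one.
words=$(printf 'Hello, world! 42 times\n' | tr -cs '[:alpha:]' '\n')

echo "$words"
```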
Remove diacritical marks from all accented variants of the letter e:
tr "[=e=]" "e" < file
-U (perl extension)
tr -CU "\0-\xFF" "" < file
tr -UC "\0-\x{FF}" "" < file
The LANG, LC_ALL, LC_CTYPE and LC_COLLATE environment variables affect
the execution of tr as described in environ(7).
COMPATIBILITY
System V historically wrote character ranges as [c-c] instead of c-c.
Such shell scripts should work as long as the range is intended to map into another range:
the command tr [a-z] [A-Z]
will work, as it will map the [
character in matchString to the [
character in replaceString. However, if the
shell script is deleting or squeezing characters, as in the command tr -d [a-z],
the characters [ and ]
will be included in the deletion or compression list.
To have a-z represent the three characters a, - and z, use a\-z.
-u is an extension to the POSIX standard.
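The difference between the two range styles shows up as soon as -d is used. A small sketch on an invented string, with the operands quoted so the shell does not treat the brackets as a glob pattern:

```shell
# System V style: the brackets are part of the set, so they are deleted too.
sysv=$(printf '[abc]-XYZ\n' | tr -d '[a-z]')

# POSIX/BSD style: only the letters a through z are deleted.
posix=$(printf '[abc]-XYZ\n' | tr -d 'a-z')

# Escaped hyphen: exactly the three characters a, - and z are deleted.
three=$(printf '[abc]-xyz\n' | tr -d 'a\-z')

echo "$sysv"    # -XYZ
echo "$posix"   # []-XYZ
echo "$three"   # [bc]xy
```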
[:cntrl:] (octal)
000 NUL 001 SOH 002 STX 003 ETX 004 EOT 005 ENQ 006 ACK 007 BEL
010 BS 011 HT† 012 NL 013 VT 014 NP 015 CR 016 SO 017 SI
020 DLE 021 DC1 022 DC2 023 DC3 024 DC4 025 NAK 026 SYN 027 ETB
030 CAN 031 EM 032 SUB 033 ESC 034 FS 035 GS 036 RS 037 US
177 DEL
[:punct:] (octal)
041 ! 042 " 043 # 044 $ 045 % 046 & 047 '
050 ( 051 ) 052 * 053 + 054 , 055 - 056 . 057 /
072 : 073 ; 074 < 075 = 076 > 077 ?
100 @
133 [ 134 \ 135 ] 136 ^ 137 _
140 `
173 { 174 | 175 } 176 ~
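The [:punct:] table can be cross-checked against tr itself. The sketch below generates the ASCII codes 041 through 176 (octal) with awk, which is an assumption about the available tools, and keeps only what tr classifies as punctuation in the C locale.

```shell
# All ASCII graphic characters, decimal 33 through 126.
all=$(awk 'BEGIN { for (i = 33; i <= 126; i++) printf "%c", i }')

# Delete everything that is not in [:punct:].
punct=$(printf '%s' "$all" | LC_ALL=C tr -cd '[:punct:]')

echo "$punct"
echo "${#punct}"   # 32, matching the 32 entries in the table
```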
BUG:
Case conversion fails in a locale with differing numbers of lower case and upper case characters
with message:
LC_CTYPE=en_US.iso88591
tr '[:upper:]' '[:lower:]'