4.1d
http://www.gnu.org/software/sed/manual/html_node/index.html
SED is a stream editor used to perform basic text transformations on an input stream (a file or input from a pipeline), making only one pass over the input(s). It is SED's ability to filter text in a pipeline that particularly distinguishes it from other types of editors.
With the -n option, SED produces output only when explicitly told to via the p command.
If no -e, -f, --expression, or --file options are given on the command line, then the first non-option argument on the command line is taken to be the script to be executed.
If any command-line parameters remain after processing the above, these parameters are interpreted as the names of input files to be processed. A file name of - refers to the standard input stream. The standard input will be processed if no file names are specified.
A SED program consists of one or more SED commands, passed in by one or more of the -e, -f, --expression, and --file options, or by the first non-option argument if none of these options are used. This document will refer to "the" SED script; this will be understood to mean the in-order concatenation of all of the scripts and script-files passed in.
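As a rough illustration of the invocation rules above, the following shell sketch passes a script in three ways. The sample input and the variable names are made up for the example.

```shell
# Script given as the first non-option argument; input comes from standard input.
out_arg=$(printf 'hello\n' | sed 's/hello/goodbye/')

# The same script passed with -e; the explicit "-" file name also means standard input.
out_e=$(printf 'hello\n' | sed -e 's/hello/goodbye/' -)

# Two -e scripts are applied in order, as if concatenated into one script.
out_two=$(printf 'hello\n' | sed -e 's/hello/goodbye/' -e 's/goodbye/farewell/')

printf '%s\n' "$out_arg" "$out_e" "$out_two"
```

The third call shows why the order of -e options matters: the second script sees the result of the first.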
Each SED command consists of an optional address or address range, followed by a one-character command name and any additional command-specific code.
Addresses in a script can take any of several forms, such as a line number or a regexp.
If no addresses are given, then all lines are matched; if one address is given, then only lines matching that address are matched.
An address range can be specified by giving two addresses separated by a comma (,). An address range matches lines starting from where the first address matches, and continues until the second address matches (inclusively). If the second address is a regexp, then checking for the ending match will start with the line following the line which matched the first address. If the second address is a number less than (or equal to) the line matching the first address, then only the one line is matched.
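The address rules above can be tried out with a short shell sketch. The five-line sample input and the variable names are assumptions made for the example; the p command and the -n option are used only to make the selected lines visible.

```shell
input=$(printf 'one\ntwo\nthree\nfour\nfive\n')

# Single numeric address: only line 2 is selected.
by_number=$(printf '%s\n' "$input" | sed -n '2p')

# Regexp address: every line containing "f" is selected.
by_regexp=$(printf '%s\n' "$input" | sed -n '/f/p')

# Range from line 2 through the first later line matching /four/ (inclusive).
by_range=$(printf '%s\n' "$input" | sed -n '2,/four/p')

# Second address is a number smaller than the first match: one line only.
short_range=$(printf '%s\n' "$input" | sed -n '4,2p')

printf '%s\n' "$by_number" "$by_regexp" "$by_range" "$short_range"
```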
Appending the ! character to the end of an address specification will negate the sense of the match. That is, if the ! character follows an address range, then only lines which do not match the address range will be selected. This also works for singleton addresses, and, perhaps perversely, for the null address.
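A minimal sketch of negated addresses, assuming a four-line sample input and using the d (delete) command named in the command index:

```shell
input=$(printf 'one\ntwo\nthree\nfour\n')

# Delete every line that is NOT line 2.
only_two=$(printf '%s\n' "$input" | sed '2!d')

# Delete every line that is NOT inside the range 2,3.
middle=$(printf '%s\n' "$input" | sed '2,3!d')

printf '%s\n' "$only_two" "$middle"
```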
[[I may add a brief overview of regular expressions at a later date; for now see any of the various other documentations for regular expressions, such as the AWK info page.]]
SED maintains two data buffers: the active pattern space, and the auxiliary hold space. In "normal" operation, SED reads in one line from the input stream and places it in the pattern space. This pattern space is where text manipulations occur. The hold space is initially empty, but there are commands for moving data between the pattern and hold spaces.
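To make the interplay of the two buffers concrete, here is a small sketch that prints its input in reverse line order. It relies on the h (hold) and G (appending Get) commands, which are named in the command index but not described in this section: h copies the pattern space into the hold space, and G appends the hold space to the pattern space after a newline. The sample input is an assumption for the example.

```shell
# For every line but the first, append the lines saved so far (G);
# then save the accumulated text (h); on the last line, print it ($p).
reversed=$(printf 'a\nb\nc\n' | sed -n '1!G;h;$p')

printf '%s\n' "$reversed"
```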
If you use SED at all, you will quite likely want to know these commands.
The # "command" begins a comment; the comment continues until the next newline. If you are concerned about portability, be aware that some implementations of SED (which are not POSIX.2 conformant) may only support a single one-line comment, and then only when the very first character of the script is a #. Warning: if the first two characters of the SED script are #n, then the -n (no-autodisplay) option is forced. If you want to put a comment in the first line of your script and that comment begins with the letter `n' and you do not want this behavior, then be sure to either use a capital `N', or place at least one space before the `n'.
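As a small sketch of a commented script, the following passes a two-line script whose first line is a comment. The comment starts with "# " (a # followed by a space), so it cannot be mistaken for the #n form described above; -n is given explicitly instead. The input text is made up for the example.

```shell
second=$(printf 'one\ntwo\n' | sed -n '# print only the second line
2p')

printf '%s\n' "$second"
```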
s (substitute): The command has the form s/regexp/replacement/flags. (The / characters may be uniformly replaced by any other single character within any given s command.) The / character (or whatever other character is used in its stead) can appear in the regexp or replacement only if it is preceded by a \ character. Also, newlines may appear in the regexp using the two-character sequence \n. The s command attempts to match the pattern space against the supplied regexp. If the match is successful, then that portion of the pattern space which was matched is replaced with replacement. The replacement can contain \n (n being a number from 1 to 9, inclusive) references, which refer to the portion of the match which is contained between the nth \( and its matching \). Also, the replacement can contain unescaped & characters, which reference the whole matched portion of the pattern space. To include a literal \, &, or newline in the final replacement, be sure to precede the desired \, &, or newline in the replacement with a \. The s command can be followed by zero or more flags, such as the p and w flags mentioned elsewhere in this document.
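The substitution rules above can be sketched with a few one-liners. The input strings and variable names are assumptions chosen for the example.

```shell
# A comma replaces / as the delimiter, so the slashes in the path need no escaping.
new_path=$(printf '/usr/local/bin\n' | sed 's,/usr/local,/opt,')

# \1 and \2 refer to the first and second \( ... \) groups.
swapped=$(printf 'hello world\n' | sed 's/\(hello\) \(world\)/\2 \1/')

# An unescaped & in the replacement stands for the whole match.
bracketed=$(printf 'cat\n' | sed 's/cat/[&]/')

# An escaped \& in the replacement is a literal ampersand.
spaced=$(printf 'R&D\n' | sed 's/&/ \& /')

printf '%s\n' "$new_path" "$swapped" "$bracketed" "$spaced"
```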
p (print): Print the pattern space to the standard output. This command is usually only used in conjunction with the -n command-line option. Note: some implementations of SED, such as this one, will double-display lines when auto-display is not disabled and the p command is given. Other implementations will only display the line once. Both ways conform with the POSIX.2 standard, and so neither way can be considered to be in error. Portable SED scripts should thus avoid relying on either behavior; either use the -n option and explicitly display what you want, or avoid use of the p command (and also the p flag to the s command).
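Following the portability advice above, this sketch always combines p with -n so that each selected line is shown exactly once. The sample input is an assumption for the example.

```shell
input=$(printf 'alpha\nbeta\ngamma\n')

# With -n, only lines explicitly printed by the p command appear.
only_beta=$(printf '%s\n' "$input" | sed -n '/beta/p')

# The p flag of s prints only the lines on which a substitution was made.
changed=$(printf '%s\n' "$input" | sed -n 's/^g/G/p')

printf '%s\n' "$only_beta" "$changed"
```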
{} grouping: A group of commands may be enclosed between { and } characters. (The } must appear in a zero-address command context.) This is particularly useful when you want a group of commands to be triggered by a single address (or address-range) match.
Though perhaps less frequently used than those in the previous section, some very small yet useful SED scripts can be built with these commands.
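To illustrate the { } grouping of commands under a single address range, here is a minimal sketch. The BEGIN and END marker lines and the rest of the input are made up for the example; each command inside the braces sits on its own line, which keeps the script portable.

```shell
input=$(printf 'keep\nBEGIN\nfoo\nEND\nfoo\n')

# Inside the BEGIN..END range, substitute and then print; other lines are ignored.
grouped=$(printf '%s\n' "$input" | sed -n '/BEGIN/,/END/{
s/foo/bar/
p
}')

printf '%s\n' "$grouped"
```

Note that the final "foo" line is outside the range, so neither grouped command is applied to it.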
y (transliterate): The command has the form y/source-chars/dest-chars/. (The / characters may be uniformly replaced by any other single character within any given y command.) Transliterate any characters in the pattern space which match any of the source-chars with the corresponding character in dest-chars. Instances of the / (or whatever other character is used in its stead), \, or newlines can appear in the source-chars or dest-chars lists, provided that each instance is escaped by a \. The source-chars and dest-chars lists must contain the same number of characters (after de-escaping).
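A short sketch of transliteration, with input strings assumed for the example:

```shell
# Map each lowercase letter to the uppercase letter in the same position.
upper=$(printf 'hello sed\n' | sed 'y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/')

# A comma is used as the delimiter so that / can appear unescaped in source-chars.
dashed=$(printf 'a/b/c\n' | sed 'y,/,-,')

printf '%s\n' "$upper" "$dashed"
```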
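a (append text lines): Queue the lines of text which follow this command (each but the last ending with a \, which will be removed from the output) to be output at the end of the current cycle, or when the next input line is read.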
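i (insert text lines): Immediately output the lines of text which follow this command (each but the last ending with a \, which will be removed from the output).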
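c (change to text lines): Delete the lines matching the address or address range, and output the lines of text which follow this command (each but the last ending with a \, which will be removed from the output) in place of the last line (or in place of each line, if no addresses were specified). A new cycle is started after this command is done, since the pattern space will have been deleted.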
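The three text commands can be compared side by side in a small sketch. It uses the traditional form in which the command letter is followed by a backslash and the text starts on the next line; the two-line input and the inserted text are assumptions for the example.

```shell
input=$(printf 'one\ntwo\n')

# a: the text is output after line 1.
appended=$(printf '%s\n' "$input" | sed '1a\
after one')

# i: the text is output before line 2.
inserted=$(printf '%s\n' "$input" | sed '2i\
before two')

# c: line 2 is deleted and the text is output in its place.
changed=$(printf '%s\n' "$input" | sed '2c\
TWO')

printf '%s\n' "$appended" "$inserted" "$changed"
```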
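l (list unambiguously): Print the pattern space in an unambiguous form: non-printable characters (and the \ character) are displayed in C-style escaped form; long lines are split, with a trailing \ character to indicate the split; the end of each line is marked with a $.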
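A minimal sketch of the l command on a line containing a tab and a backslash. The input is an assumption for the example, and -n is used so that only the listing appears.

```shell
# The input line is: a, TAB, b, backslash, c
listed=$(printf 'a\tb\\c\n' | sed -n 'l')

printf '%s\n' "$listed"
```

The tab is shown as the two characters \t, the backslash is doubled, and the end of the line is marked with a $.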
w (write file): Write the pattern space to the named file. All w commands (including instances of the w flag on successful s commands) which refer to the same filename are output through the same FILE stream.
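The shared output stream can be observed with a sketch in which a w command and the w flag of an s command both name the same file. The temporary file created with mktemp and the sample input are assumptions for the example.

```shell
tmp=$(mktemp)

# Lines starting with "b" are written by the w command; lines starting with
# "a" are written, after substitution, by the w flag of the s command.
printf 'apple\nbanana\navocado\n' |
  sed -n -e "/^b/w $tmp" -e "s/^a/A/w $tmp"

written=$(cat "$tmp")
rm -f "$tmp"

printf '%s\n' "$written"
```

Because both writes go through one stream, the lines land in the file in input order.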
In most cases, use of these commands indicates that you are probably better off programming in something like PERL. But occasionally one is committed to sticking with SED, and these commands can enable one to write quite convoluted scripts.
: (label): Specify the location of the label for the b and t commands. In all other respects, a no-op.
t (conditional branch): Branch to the label only if there has been a successful substitution since the last input line was read or a t branch was taken. The label may be omitted, in which case the next cycle is started.
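As a sketch of a loop built from a label and the t command, the following inserts thousands separators into a number. The label name a, the regexp, and the input are assumptions chosen for the example; the label is given in its own -e option so that it ends cleanly.

```shell
# Each pass inserts one comma before the last run of three digits that is
# still preceded by a digit; t jumps back to :a for as long as the
# substitution keeps succeeding.
grouped=$(printf '1234567\n' |
  sed -e ':a' -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/' -e 'ta')

printf '%s\n' "$grouped"
```

The loop stops by itself: once no digit is followed by three more digits, the substitution fails and t falls through.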
For those who want to write portable SED scripts, be aware that some implementations have been known to limit line lengths (for the pattern and hold spaces) to be no more than 4000 bytes. The POSIX.2 standard specifies that conforming SED implementations shall support at least 8192 byte line lengths. GNU SED has no built-in limit on line length; as long as SED can malloc() more (virtual) memory, it will allow lines as long as you care to feed it (or construct within it).
In addition to several books that have been written about SED (either specifically or as chapters in books which discuss shell programming), one can find out more about SED (including suggestions of a few books) from the FAQ for the seders mailing list, available from any of:
http://www.dbnet.ece.ntua.gr/~george/sed/sedfaq.html
http://www.ptug.org/sed/sedfaq.htm
http://www.wollery.demon.co.uk/sedtut10.txt
There is an informal "seders" mailing list manually maintained by Al Aab. To subscribe, send e-mail to af137@torfree.net with a brief description of your interest.
Email bug reports to bug-gnu-utils@gnu.org. Be sure to include the word "sed" somewhere in the "Subject:" field.
-n, forcing from within a script
# (comment)
: (label)
= (display line number)
a (append text lines)
b (branch)
c (change to text lines)
D (delete first line)
d (delete)
G (appending Get)
g (get)
H (append Hold)
h (hold)
i (insert text lines)
l (list unambiguously)
N (append Next line)
n (next-line)
P (display first line)
p (print)
q (quit)
r (read file)
s (substitute)
s, option flags
t (conditional branch)
w (write file)
x (eXchange)
y (transliterate)
{} grouping
This document was generated on 28 October 1999 using the texi2html translator version 1.54, and converted to real HTML by Dennis German.