sed

13/10/2024

A Guide to the sed Stream Editor

Function Overview:
sed is a stream editor that reads text from files or input streams line by line, edits the text according to user-specified patterns or commands, and then outputs the result to the screen or a file. When used in conjunction with regular expressions, it is incredibly powerful.

Syntax:

sed [options] 'command' file(s)
sed [options] -f scriptfile file(s)

Explanation:
sed first stores each line of the text in a temporary buffer called the “pattern space.” It then processes the content of this buffer according to the given sed commands. Once the processing is complete, the result is output to the terminal, and sed moves on to the next line. The content of the file itself is not altered unless the -i option is used. sed is mainly used to edit one or more text files, simplify repeated text file operations, or create text transformation scripts. Its functionality is similar to awk, but sed is simpler and less capable of handling column-specific operations, while awk is more powerful in that regard.
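
The cycle described above can be observed directly. The following is a minimal sketch, assuming a POSIX shell and a scratch file created with mktemp (the data and variable names are illustrative, not from the original): by default every line is printed once, -n turns that off so only explicit p output remains, and the file on disk is left untouched.

```shell
# Create a scratch file with three lines (illustrative data).
tmp=$(mktemp)
printf 'one\ntwo\nthree\n' > "$tmp"

# Default: each pattern space is printed automatically at the end of its cycle.
default_out=$(sed 's/two/TWO/' "$tmp")

# With -n, nothing is printed unless a command (here the p flag of s) asks for it.
quiet_out=$(sed -n 's/two/TWO/p' "$tmp")

# The file itself has not changed, because -i was not used.
file_after=$(cat "$tmp")

printf '%s\n---\n%s\n---\n%s\n' "$default_out" "$quiet_out" "$file_after"
rm -f "$tmp"
```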

Options:

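-e: Add the given commands to the set of commands run on the input; may be repeated.
-f: Read the commands from the given script file.
-n: Suppress automatic output; only output requested explicitly (for example with the p command) is printed.
-E: Use extended regular expressions (ERE) instead of basic ones.
-i: Edit the file in place instead of writing the result to standard output.
-h: Display help information (current GNU sed documents the long form --help).
-V: Display version information (current GNU sed documents the long form --version).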

Parameters:

command: The command to be executed.
file(s): One or more text files to be processed.
scriptfile: A file containing a list of commands to execute.
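
The scriptfile form is not shown elsewhere in this guide, so here is a small sketch of it, assuming a POSIX shell. The script contents and the temporary file name are illustrative. Each line of the script file holds one command, and lines starting with # are comments.

```shell
# Write a two-command sed script to a scratch file (illustrative contents).
script=$(mktemp)
cat > "$script" <<'EOF'
# greetings: one sed command per line
s/Hello/Hi/
s/Goodbye/Farewell/
EOF

# Run the script with -f instead of passing the commands on the command line.
result=$(printf 'Hello World\nGoodbye World\n' | sed -f "$script")
printf '%s\n' "$result"
rm -f "$script"
```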

Common Actions:

a: Append text after the current line.
i: Insert text before the current line.
c: Replace the selected lines with new text.
d: Delete the pattern space and start the next cycle.
D: Delete the pattern space up to and including the first newline, then restart the cycle without reading new input.
s: Substitute text that matches a regular expression.
h: Copy the pattern space to the hold space (an auxiliary buffer).
H: Append a newline and the pattern space to the hold space.
g: Replace the pattern space with the contents of the hold space.
G: Append a newline and the contents of the hold space to the pattern space.
l: Print the pattern space in an unambiguous form, showing non-printable characters as escape sequences and marking the end of the line with $.
L: Not a variant of l. In older GNU sed it was an experimental extension that refilled lines much like fmt; recent GNU sed releases no longer provide it.
n: Print the pattern space (unless -n is in effect), then replace it with the next input line and continue with the next command rather than restarting from the first.
N: Append a newline and the next input line to the pattern space, advancing the current line number.
p: Print the pattern space.
P: Print the pattern space up to the first newline.
q: Quit sed.
b label: Branch to the location marked by label in the script; if no label is given, branch to the end of the script.
r file: Read file and output its contents after the current line.
t label: Conditional branch. Jump to label if a substitution has succeeded since the last input line was read or the last t was taken; with no label, jump to the end of the script.
T label: The opposite of t (a GNU extension). Jump to label only if no substitution has succeeded since the last input line was read.
w file: Write the pattern space to file.
W file: Write the pattern space up to the first newline to file (a GNU extension).
!: Apply the following command to all lines not selected by the address.
=: Print the current line number.
#: Start a comment that runs to the end of the line.
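
The hold-space and branch commands are easier to follow in a concrete case. The sketch below, assuming a POSIX shell, shows two well-known idioms (they are illustrations, not examples from the original text): reversing the order of lines with G and h, and looping with a label and t until a substitution stops matching.

```shell
# Reverse the order of lines using the hold space:
#   1!G  on every line but the first, append the hold space to the pattern space
#   h    copy the result back into the hold space
#   $p   on the last line, print the accumulated text
reversed=$(printf '1\n2\n3\n' | sed -n '1!G;h;$p')

# Collapse runs of spaces with a loop: t branches back to label a
# for as long as the substitution keeps succeeding.
squeezed=$(echo "a  b   c" | sed -e ':a' -e 's/  / /' -e 'ta')

printf '%s\n%s\n' "$reversed" "$squeezed"
```

The label and the branch are passed as separate -e arguments because POSIX expects a label to end at a newline; GNU sed also accepts them joined with semicolons.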

Replacement Commands:

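g: Flag for s. Replace every match on the line, not just the first.
p: Flag for s. Print the line if a substitution was made.
w file: Flag for s. Write the line to file if a substitution was made.
x: Exchange the contents of the pattern space and the hold space.
y: Translate each character in one set to the corresponding character in another (no regular expressions involved).
\1 to \9: In the replacement, refer to the text matched by the corresponding \(..\) group.
&: In the replacement, refer to the whole matched string.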
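
A short sketch of how these pieces behave, assuming a POSIX shell (the sample strings and variable names are illustrative): without g only the first match is replaced, & reuses the matched text, and y maps characters one for one.

```shell
# Without g, only the first match on the line is replaced.
first=$(echo "a-b-c" | sed 's/-/+/')

# With g, every match on the line is replaced.
all=$(echo "a-b-c" | sed 's/-/+/g')

# & in the replacement stands for whatever the pattern matched.
wrapped=$(echo "hello world" | sed 's/[a-z][a-z]*/<&>/g')

# y translates characters position by position; no regex is involved.
upper=$(echo "abc" | sed 'y/abc/ABC/')

printf '%s\n%s\n%s\n%s\n' "$first" "$all" "$wrapped" "$upper"
```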

Basic Regular Expression (BRE) Syntax in `sed`:

^: Match the beginning of a line.
$: Match the end of a line.
.: Match any single character except a newline.
*: Match zero or more occurrences of the preceding item.
[]: Match a single character from the specified set or range.
[^]: Match a single character not in the specified set or range.
\(..\): Capture a substring so it can be reused as \1 to \9.
&: Used in the replacement rather than the pattern; stands for the whole matched text.
\<: Match the start of a word (a GNU extension).
\>: Match the end of a word (a GNU extension).
x\{m\}: Match exactly m occurrences of x.
x\{m,\}: Match at least m occurrences of x.
x\{m,n\}: Match between m and n occurrences of x.

In BRE the grouping parentheses, the interval braces, and the word anchors must be escaped with a backslash, as shown above. Unescaped, (, ), {, }, < and > match themselves literally.
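
The escaped BRE forms can be tried out as follows. This is a minimal sketch assuming GNU sed, since \< and \> are GNU extensions; the sample strings are illustrative.

```shell
# \{2,\}: a run of at least two digits; the single digit 7 is left alone.
numbers=$(echo "id 7, id 42, id 123" | sed 's/[0-9]\{2,\}/N/g')

# \(..\) captures text that the replacement reuses as \1 and \2.
swapped=$(echo "key=value" | sed 's/\(.*\)=\(.*\)/\2=\1/')

# \< and \> anchor the match to word boundaries (GNU extension),
# so the "cat" inside "concat" is not touched.
words=$(echo "cat concat cat" | sed 's/\<cat\>/dog/g')

printf '%s\n%s\n%s\n' "$numbers" "$swapped" "$words"
```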

Extended Regular Expression (ERE) Syntax in `sed`:

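\b: Match a word boundary. This is a GNU extension rather than part of POSIX, so it is not available in every sed.
+: Match one or more occurrences of the preceding item. Requires the -E option.
(..) and {m,n}: In ERE, grouping and intervals are written without backslashes.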
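
To compare the two syntaxes side by side, here is a small sketch assuming a sed that supports -E (GNU and BSD sed both do). The sample strings are illustrative.

```shell
# ERE: + means "one or more", so a run of spaces collapses to one.
ere=$(echo "hello   world" | sed -E 's/ +/ /g')

# BRE equivalent: write the item once, then follow it with *.
bre=$(echo "hello   world" | sed 's/  */ /g')

# ERE groups need no backslashes; | is used as the s delimiter
# so the slashes in the replacement need no escaping.
date_swapped=$(echo "2024-10-13" | sed -E 's|([0-9]+)-([0-9]+)-([0-9]+)|\3/\2/\1|')

printf '%s\n%s\n%s\n' "$ere" "$bre" "$date_swapped"
```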

Practical Examples:

1 Print specific lines:
To print only the first line and the last line:

sed -n '1p;$p' test.txt

2 Delete lines:
To delete the second line:

sed '2d' filename

3 Basic match and replace:
Replace spaces with hyphens:

echo "hello world" | sed 's/ /-/g'

4 Advanced match and replace:
Reverse the order of three space-separated words:

echo "abc def ghi" | sed 's/\([a-zA-Z]*\) \([a-zA-Z]*\) \([a-zA-Z]*\)/\3 \2 \1/'

5 Multiple edits:
Replace “Hello” with “Hi” and “Goodbye” with “Farewell” in one command:

sed 's/Hello/Hi/; s/Goodbye/Farewell/' example.txt

6 Read a file:
Insert content from an external file after lines matching a pattern:

sed '/Line 2/r extra.txt' data.txt

7 Write to a file:
Save processed content into a new file:

sed 's/World/Everyone/' input.txt > output.txt

In summary, sed is a versatile and efficient tool for editing text in a stream, offering powerful pattern matching and text transformation capabilities when combined with regular expressions. From basic line printing to advanced text manipulation, sed serves a wide range of text processing needs.

13/10/2024 10/12/2024

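The sed Stream Editor

Function overview: sed is a stream editor. It reads text line by line from files or an input stream, edits it according to the patterns or commands the user specifies, and writes the result to the screen or a file. Combined with regular expressions, it is very powerful.

Syntax:

sed [options] 'command' file(s)

sed [options] -f scriptfile file(s)

Explanation: sed first stores the line being processed in a temporary buffer called the "pattern space", then applies the sed commands to the contents of that buffer, writes the result to standard output, and moves on to the next line. The file itself is not changed unless the -i option is used. sed is mainly used to edit one or more text files, to simplify repetitive operations on text files, or to write text-conversion programs. Its functionality is similar to awk's; the difference is that sed is simpler and weaker at column-oriented processing, whereas awk is more complex and stronger at it.

Options:

-e: Add the given commands to the set of commands run on the input; may be repeated.
-n: Suppress the default output; only output requested explicitly (for example with the p command) is printed.
-h: Display help information (current GNU sed documents the long form --help).
-V: Display version information (current GNU sed documents the long form --version).

Parameters:

command: The command to run.
file(s): One or more text files.
scriptfile: A script file containing the commands.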

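Actions:

a: Append text after the current line.
i: Insert text before the current line.
c: Replace the selected lines with new text.
d: Delete the pattern space and start the next cycle.
D: Delete the pattern space up to and including the first newline, then restart the cycle without reading new input.
s: Substitute text that matches a regular expression.
h: Copy the pattern space to the hold space (an auxiliary buffer).
H: Append a newline and the pattern space to the hold space.
g: Replace the pattern space with the contents of the hold space.
G: Append a newline and the contents of the hold space to the pattern space.
l: Print the pattern space in an unambiguous form, showing non-printable characters as escape sequences.
L: Not a variant of l. In older GNU sed it was an experimental extension that refilled lines much like fmt; recent GNU sed releases no longer provide it.
n: Print the pattern space (unless -n is in effect), then replace it with the next input line and continue with the next command rather than restarting from the first.
N: Append a newline and the next input line to the pattern space, advancing the current line number.
p: Print the pattern space.
P: Print the pattern space up to the first newline.
q: Quit sed.
b label: Branch to the labeled location in the script; if no label is given, branch to the end of the script.
r file: Read file and output its contents after the current line.
t label: Conditional branch. Jump to label if a substitution has succeeded since the last input line was read or the last t was taken; with no label, jump to the end of the script.
T label: The opposite of t (a GNU extension). Jump to label only if no substitution has succeeded since the last input line was read.
w file: Write the pattern space to file.
W file: Write the pattern space up to the first newline to file (a GNU extension).
!: Apply the following command to all lines not selected by the address.
=: Print the current line number.
#: Start a comment that runs to the end of the line.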

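Replacement commands:

g: Flag for s. Replace every match on the line, not just the first.
p: Flag for s. Print the line if a substitution was made.
w file: Flag for s. Write the line to file if a substitution was made.
x: Exchange the contents of the pattern space and the hold space.
y: Translate each character in one set to the corresponding character in another (no regular expressions involved).
\1: In the replacement, refer to the text matched by the first \(..\) group (likewise \2 to \9).
&: In the replacement, refer to the whole matched string.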

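Basic Regular Expression (BRE) Syntax in sed:

^: Match the beginning of a line.
$: Match the end of a line.
.: Match any single character except a newline.
*: Match zero or more occurrences of the preceding item.
[]: Match a single character from the specified set or range.
[^]: Match a single character not in the specified set or range.
\(..\): Capture a substring.
&: Used in the replacement rather than the pattern; stands for the whole matched text.
\<: Match the start of a word (a GNU extension).
\>: Match the end of a word (a GNU extension).
x\{m\}: Match exactly m occurrences of x.
x\{m,\}: Match at least m occurrences of x.
x\{m,n\}: Match at least m and at most n occurrences of x.

In BRE the grouping parentheses, the interval braces, and the word anchors must be escaped with a backslash, as shown above.

Extended Regular Expression (ERE) Syntax in sed:

\b: Match a word boundary. This is a GNU extension rather than part of POSIX, so it is not available in every sed.
+: Match one or more occurrences of the preceding item. Requires the -E option.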

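Examples:

1 Printing

Print only the lines with the given line numbers:

$ cat test.txt

abcd 12345

Print the first line and the last line:

$ sed -n '1p;$p' test.txt

abcd 12345

Print lines 2 and 3:

$ sed -n '2p;3p' test.txt

Print lines 2, 3, and 4:

$ sed -n '2p;3p;4p' test.txt

Here the -n option suppresses the default output, so the p command prints only the lines whose numbers are given.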

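Print only the odd-numbered lines:

$ sed -n 'p;n' test.txt

abcd 12345

Print only the even-numbered lines:

$ sed -n 'n;p' test.txt

Print every other line starting from line 1 (the first~step address is a GNU extension):

$ sed -n '1~2p' test.txt

abcd 12345

Print every other line starting from line 2:

$ sed -n '2~2p' test.txt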
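
Because the sample file above has so little content, the difference between these commands is hard to see. The sketch below, assuming GNU sed (for the first~step addresses) and five illustrative numbered lines in place of test.txt, checks that both spellings select the same lines.

```shell
# Five numbered lines stand in for test.txt (illustrative data).
input=$(printf '1\n2\n3\n4\n5\n')

# p;n prints a line, then n loads the next one so it is skipped.
odd_pn=$(printf '%s\n' "$input" | sed -n 'p;n')

# n;p loads the next line first, then prints it.
even_np=$(printf '%s\n' "$input" | sed -n 'n;p')

# GNU sed's first~step address selects the same lines directly.
odd_step=$(printf '%s\n' "$input" | sed -n '1~2p')
even_step=$(printf '%s\n' "$input" | sed -n '2~2p')

printf 'odd:\n%s\neven:\n%s\n' "$odd_pn" "$even_np"
```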

Print the line that follows each matching line:

$ sed -n '/^b/{n;p}' test.txt

or

$ awk '/^b/{getline; print}' test.txt
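
To see which lines this selects, here is a minimal sketch assuming a POSIX shell and illustrative input in place of test.txt. The closing brace is preceded by a semicolon, which POSIX requires and GNU sed also accepts.

```shell
# Lines b1 and b2 match /^b/; the lines after them are c and d.
matches=$(printf 'a\nb1\nc\nb2\nd\n' | sed -n '/^b/{n;p;}')
printf '%s\n' "$matches"
```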

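The l action prints each line in a form that makes control characters visible:

l: Prints the line, showing non-printable characters (such as tabs) as escape sequences and marking the end of each line with $.
L: Not a multibyte-aware version of l. In older GNU sed it was an experimental extension that refilled lines much like fmt; recent GNU sed releases no longer provide it.

Example 1

$ cat test.txt

ab

c d 12345

With the l action, sed prints each line and shows non-printable characters as visible escape sequences:

$ sed -n 'l' test.txt

ab$

c\td 12345$

Here

\t stands for a tab character; a newline inside the pattern space would be shown as \n.
$ marks the end of each line.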
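
The same listing can be reproduced without a file. This is a small sketch assuming a POSIX shell; printf builds the two lines, including a real tab character, and the variable name is illustrative.

```shell
# The second line contains a real tab between "c" and "d".
listed=$(printf 'ab\nc\td 12345\n' | sed -n 'l')

# printf '%s\n' prints the result verbatim, backslashes included.
printf '%s\n' "$listed"
```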

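Example 2

$ cat test1.txt

Hello 世界

l works on bytes, so GNU sed typically shows the bytes of a non-ASCII character as octal escapes rather than as the character itself:

$ sed -n 'l' test1.txt

Hello \344\270\226\347\225\214$

Each group of three octal escapes is the UTF-8 encoding of one character (世 and 界).

Note that L does not provide a multibyte-aware listing. It was a GNU-only experimental command, and recent GNU sed releases no longer provide it.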

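2 Deleting

Delete empty lines:

sed '/^$/d' filename

Delete the second line:

sed '2d' filename

Delete from the second line through the end of the file:

sed '2,$d' filename

Delete the last line:

sed '$d' filename

Delete lines that begin with test:

sed '/^test/d' filename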
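
The delete commands above can be checked against known input. This is a minimal sketch assuming a POSIX shell; the five-line sample text and the variable names are illustrative.

```shell
# Five lines, the second of which is empty (illustrative data).
input=$(printf 'test one\n\nkeep me\ntest two\nlast line\n')

no_blank=$(printf '%s\n' "$input" | sed '/^$/d')     # drop empty lines
first_only=$(printf '%s\n' "$input" | sed '2,$d')    # drop line 2 through the end
no_last=$(printf '%s\n' "$input" | sed '$d')         # drop the last line
no_test=$(printf '%s\n' "$input" | sed '/^test/d')   # drop lines starting with test

printf '%s\n---\n%s\n' "$no_blank" "$first_only"
```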

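3 Simple matching and replacing

echo "hello world" | sed 's/ /-/g'

hello-world

The g flag replaces every space on the line with a '-' character; the text "hello world" happens to contain only one space.

Match whole words and replace them:

$ echo "hello world" | sed 's/[a-zA-Z0-9_][a-zA-Z0-9_]*/replacement/g'

replacement replacement

Here

[a-zA-Z0-9_] matches one letter, digit, or underscore.
[a-zA-Z0-9_]* matches zero or more further letters, digits, or underscores.

With tools that support extended regular expressions (such as sed -E or grep -E), you can use + directly to mean one or more occurrences:

$ echo "hello world" | sed -E 's/[a-zA-Z0-9_]+/replacement/g'

replacement replacement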

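4 Advanced matching and replacing

$ echo "hello world" | sed 's/[a-zA-Z0-9_][a-zA-Z0-9_]*/[&]/g'

[hello] [world]

Here & stands for the matched substring.

Use regular-expression groups to reverse the order of the space-separated substrings in a string:

$ echo "abc def ghi" | sed 's/\([a-zA-Z]*\) \([a-zA-Z]*\) \([a-zA-Z]*\)/\3 \2 \1/'

ghi def abc

With more substrings, reversing them by hand in sed becomes very cumbersome, because sed supports only nine capture groups (\1 through \9). To reverse more substrings, a more capable text-processing tool such as awk or perl is a better fit. For example:

$ echo "abc def ghi jkl mno" | awk '{ for (i=NF; i>0; i--) printf("%s ", $i); print "" }'

mno jkl ghi def abc

Here

NF is the number of fields, and $i is the i-th field.
for (i=NF; i>0; i--) walks from the last field back to the first, printing each one.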

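5 Multiple edits

Multiple edits can be made with the -e option, which lets you run several editing operations in a single sed command. Each editing command is passed with its own -e, so one invocation can apply several edits to a file or input stream without calling sed repeatedly.

Basic syntax:

sed -e 'command1' -e 'command2' … filename

Alternatively, combine the commands into a single script separated by semicolons (no -e needed):

sed 'command1; command2' filename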

Example 1: Two substitutions in one pass

$ cat example.txt

Hello World

This is a test

Goodbye World

$ sed -e 's/Hello/Hi/' -e 's/Goodbye/Farewell/' example.txt

Hi World

This is a test

Farewell World

You can also separate the commands with semicolons instead of repeating -e:

sed 's/Hello/Hi/; s/Goodbye/Farewell/' example.txt

Example 2: Combining deletion and substitution

Suppose you want to delete line 2 of the file and replace "World" with "Everyone". The following command does both:

$ cat example.txt

Hello World

This is a test

Goodbye World

$ sed -e '2d' -e 's/World/Everyone/' example.txt

Hello Everyone

Goodbye Everyone

6 Reading a text file

Reading a text file and processing its contents is what sed does by default.

Example 1: Read and print a file

$ cat input.txt

Hello World

This is a test

Goodbye World

$ sed '' input.txt

Hello World

This is a test

Goodbye World

Example 2: Read a file and substitute text

Suppose you want to replace World with Everyone:

$ sed 's/World/Everyone/' input.txt

Hello Everyone

This is a test

Goodbye Everyone

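7 Using the r action to read a file and insert its contents

The r action reads the contents of an external file and inserts them into the text being processed: sed outputs the contents of the named file after each matching line. Syntax:

sed '/pattern/r file_to_read' input_file

Here

/pattern/: the address that selects where the file contents are inserted (optional; without it, the contents are inserted after every line).
file_to_read: the file to read.
input_file: the original file that sed processes.

Example 1

Suppose the file data.txt contains: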

Line 1

Line 2

Line 3

and another file, extra.txt, contains:

Extra content 1

Extra content 2

To insert the contents of extra.txt after every line of data.txt that matches Line 2, use:

sed '/Line 2/r extra.txt' data.txt

Output:

Line 1

Line 2

Extra content 1

Extra content 2

Line 3

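8 Writing a text file

To save sed's output to a new file, or to overwrite an existing one, use output redirection or the -i option (which modifies the file directly).

Example 1: Write to a file with output redirection

Suppose you want to write the substituted text to a new file, output.txt: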

$ cat input.txt

Hello World

This is a test

Goodbye World

$ sed 's/World/Everyone/' input.txt > output.txt

$ cat output.txt

Hello Everyone

This is a test

Goodbye Everyone

This redirects sed's output to output.txt and leaves the original file input.txt unchanged.

Example 2: Modify the file itself with -i

To modify input.txt directly, use the -i option:

$ sed -i 's/World/Everyone/' input.txt

$ cat input.txt

Hello Everyone

This is a test

Goodbye Everyone
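
Because -i overwrites the file with no undo, it can help to keep a copy. The sketch below assumes a sed that accepts a backup suffix attached directly to -i (GNU and BSD sed both accept this spelling); the scratch directory and the .bak suffix are illustrative choices.

```shell
# Work in a scratch directory so no real file is touched.
work=$(mktemp -d)
printf 'Hello World\nGoodbye World\n' > "$work/input.txt"

# -i.bak edits input.txt in place and saves the original as input.txt.bak.
sed -i.bak 's/World/Everyone/' "$work/input.txt"

edited=$(cat "$work/input.txt")
backup=$(cat "$work/input.txt.bak")
printf 'edited:\n%s\nbackup:\n%s\n' "$edited" "$backup"
rm -rf "$work"
```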

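Example 3: Add content to a file

You can also use sed to insert or append content and save it to the file. Suppose you want to insert a new line of text, "ID: 1234", before line 1 of input.txt and save the result back to the original file: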

$ cat input.txt

Hello Everyone

This is a test

Goodbye Everyone

$ sed -i '1i ID: 1234' input.txt

$ cat input.txt

ID: 1234

Hello Everyone

This is a test

Goodbye Everyone

Here 1i inserts a new line of text before line 1.
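
The a and c actions follow the same pattern as i. The sketch below assumes GNU sed for the one-line forms, in which the text follows the command on the same line; the last command shows the portable spelling, where the command is followed by a backslash and the text starts on the next line. The sample input and variable names are illustrative.

```shell
input=$(printf 'Hello\nWorld\n')

# One-line GNU forms: the text follows the command on the same line.
inserted=$(printf '%s\n' "$input" | sed '1i ID: 1234')    # before line 1
appended=$(printf '%s\n' "$input" | sed '$a -- end --')   # after the last line
changed=$(printf '%s\n' "$input" | sed '2c Everyone')     # replace line 2

# Portable POSIX form: a backslash, a newline, then the text.
portable=$(printf '%s\n' "$input" | sed '1i\
ID: 1234')

printf '%s\n---\n%s\n---\n%s\n' "$inserted" "$appended" "$changed"
```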

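9 Using the w action to write to a file

The w action writes matching lines, as they stand after any earlier edits, to a named file. It is typically used to save selected content to a separate file rather than to modify the original. Syntax:

sed '/pattern/w output_file' input_file

Here

/pattern/: the address; lines that match it are written to the named file.
output_file: the file to write to; sed creates it if it does not exist and overwrites it if it does.
input_file: the original file that sed processes.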

Example 1

Suppose the file data.txt contains:

Line 1

Line 2

Line 3

To write the lines matching Line 2 to the file output.txt, use:

sed '/Line 2/w output.txt' data.txt

After the command runs, output.txt contains:

Line 2

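Example 2: Combining r and w

Suppose you want to insert the contents of an external file after a pattern and, at the same time, write the matching lines to another file. A file name given to r or w runs to the end of the command, so a semicolon would become part of the name; pass each command with its own -e instead:

sed -e '/Line 2/r extra.txt' -e '/Line 2/w output.txt' data.txt

Here

/Line 2/r extra.txt: inserts the contents of extra.txt after each line that matches Line 2.
/Line 2/w output.txt: writes each line that matches Line 2 to output.txt.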
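
The combined r and w command can be tried end to end. This is a minimal sketch assuming a POSIX shell, run in a scratch directory so that data.txt, extra.txt, and output.txt do not clash with real files; the variable names are illustrative.

```shell
# Recreate the sample files in a scratch directory.
work=$(mktemp -d)
printf 'Line 1\nLine 2\nLine 3\n' > "$work/data.txt"
printf 'Extra content 1\nExtra content 2\n' > "$work/extra.txt"

# Each file-name command gets its own -e so the name ends where intended.
stdout=$(cd "$work" && sed -e '/Line 2/r extra.txt' -e '/Line 2/w output.txt' data.txt)
written=$(cat "$work/output.txt")

printf 'stdout:\n%s\noutput.txt:\n%s\n' "$stdout" "$written"
rm -rf "$work"
```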
