What’s the Difference Between TSV and CSV Files?

Both TSV and CSV files are plain text formats for representing tabular data.

TSV (Tab-separated values): In a TSV file, each row uses a tab character (\t) as the delimiter between fields. You can think of it as pressing the “Tab” key to separate data values.

CSV (Comma-separated values): In a CSV file, each row uses a comma (,) as the delimiter between fields. Each value is separated by a half-width comma.

Though both formats serve similar purposes, the choice between TSV and CSV often depends on the data and the specific application you’re using.

mpstat: Displaying CPU Statistics for Each Available CPU

Functionality:
The mpstat command is used to display statistics for each available CPU.

Syntax:
mpstat [ options ]

Overview:
mpstat (Multi-Processor Statistics) is a tool for displaying statistics on the performance of individual CPUs, functioning as a real-time monitoring utility. While similar to vmstat, mpstat focuses solely on CPU performance statistics. To install mpstat on Ubuntu or CentOS, you can use the following commands:

For Ubuntu:

sudo apt install sysstat

For CentOS:

sudo yum install sysstat

Options:

  • -P: Specify a CPU number or use ALL to display statistics for all CPUs.

Examples:

Running mpstat without any options will display overall performance statistics for all CPUs:

$ mpstat
Linux 6.8.0-45-generic (Ubuntu22-VirtualBox)   2024年10月15日   _x86_64_   (2 CPU)

13:48:03     CPU   %usr   %nice   %sys   %iowait   %irq   %soft   %steal   %guest   %gnice   %idle
13:48:03     all   0.11   0.02    0.10     0.01    0.00    0.17     0.00     0.00     0.00   99.60

Using the -P ALL option provides both overall CPU performance statistics as well as detailed statistics for each individual CPU:

$ mpstat -P ALL
Linux 6.8.0-45-generic (Ubuntu22-VirtualBox)   2024年10月15日   _x86_64_   (2 CPU)

13:59:40     CPU   %usr   %nice   %sys   %iowait   %irq   %soft   %steal   %guest   %gnice   %idle
13:59:40     all   0.11   0.02    0.10     0.01    0.00    0.17     0.00     0.00     0.00   99.60
13:59:40       0   0.11   0.02    0.10     0.01    0.00    0.31     0.00     0.00     0.00   99.45
13:59:40       1   0.10   0.02    0.10     0.01    0.00    0.03     0.00     0.00     0.00   99.75

You can also specify a particular CPU by using the -P n option, where n represents the CPU number starting from 0:

$ mpstat -P 0
Linux 6.8.0-45-generic (Ubuntu22-VirtualBox)   2024年10月15日   _x86_64_   (2 CPU)

14:18:14     CPU   %usr   %nice   %sys   %iowait   %irq   %soft   %steal   %guest   %gnice   %idle
14:18:14       0   0.11   0.02    0.10     0.01    0.00    0.31     0.00     0.00     0.00   99.45

$ mpstat -P 1
Linux 6.8.0-45-generic (Ubuntu22-VirtualBox)   2024年10月15日   _x86_64_   (2 CPU)

14:18:17     CPU   %usr   %nice   %sys   %iowait   %irq   %soft   %steal   %guest   %gnice   %idle
14:18:17       1   0.10   0.02    0.10     0.01    0.00    0.03     0.00     0.00     0.00   99.75

Field Descriptions:

  • %usr: Percentage of CPU time spent in user mode (excluding processes with a negative nice value). Calculation: (usr/total)*100
  • %nice: Percentage of CPU time spent on processes with a negative nice value. Calculation: (nice/total)*100
  • %sys: Percentage of CPU time spent in kernel mode. Calculation: (system/total)*100
  • %iowait: Percentage of time spent waiting for I/O operations. Calculation: (iowait/total)*100
  • %irq: Percentage of CPU time spent handling hardware interrupts. Calculation: (irq/total)*100
  • %soft: Percentage of CPU time spent handling software interrupts. Calculation: (softirq/total)*100
  • %idle: Percentage of CPU time spent idle, excluding time waiting for I/O operations. Calculation: (idle/total)*100

mpstat显示各个可用CPU的状态统计

功能说明:mpstat显示各个可用CPU的状态统计

语  法:mpstat [ options ]

补充说明:mpstat(Multi-Processor Statistics)工具软件用于显示各个可用CPU的状态统计,是一个实时监控工具,与vmstat类似,但只能监控CPU的整体性能状态。Ubuntu系统可以通过以下命令安装mpstat:

sudo apt install sysstat

CentOS系统可以通过以下命令安装mpstat:

sudo yum install sysstat

   项:

-P           指定CPU编号或ALL值表示统计所有CPU的整体性能信息

参  数:

   例:

不加任何选项直接运行mpstat统计的是所有CPU的整体性能信息:

$ mpstat

Linux 6.8.0-45-generic (Ubuntu22-VirtualBox)       2024年10月15日 _x86_64_     (2 CPU)

13时48分03秒  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle

13时48分03秒  all    0.11    0.02    0.10    0.01    0.00    0.17    0.00    0.00    0.00   99.60

使用-P ALL选项既能统计所有CPU的整体性能信息,又能单独统计每一个CPU的性能信息:

$ mpstat -P ALL

Linux 6.8.0-45-generic (Ubuntu22-VirtualBox)       2024年10月15日 _x86_64_     (2 CPU)

13时59分40秒  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle

13时59分40秒  all    0.11    0.02    0.10    0.01    0.00    0.17    0.00    0.00    0.00   99.60

13时59分40秒    0    0.11    0.02    0.10    0.01    0.00    0.31    0.00    0.00    0.00   99.45

13时59分40秒    1    0.10    0.02    0.10    0.01    0.00    0.03    0.00    0.00    0.00   99.75

使用-P n选项指定CPU编号n单独统计某一个CPU的性能信息,其中n从0开始:

$ mpstat -P 0

Linux 6.8.0-45-generic (Ubuntu22-VirtualBox)       2024年10月15日 _x86_64_     (2 CPU)

14时18分14秒  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle

14时18分14秒    0    0.11    0.02    0.10    0.01    0.00    0.31    0.00    0.00    0.00   99.45

$ mpstat -P 1

Linux 6.8.0-45-generic (Ubuntu22-VirtualBox)       2024年10月15日 _x86_64_     (2 CPU)

14时18分17秒  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle

14时18分17秒    1    0.10    0.02    0.10    0.01    0.00    0.03    0.00    0.00    0.00   99.75

统计字段说明:

%user    在internal时间段里,用户态的CPU时间(%),不包含nice值为负的进程  (usr/total)*100

%nice    在internal时间段里,nice值为负进程的CPU时间(%)(nice/total)*100

%sys      在internal时间段里,内核时间(%)(system/total)*100

%iowait 在internal时间段里,硬盘IO等待时间(%) (iowait/total)*100
%irq      在internal时间段里,硬中断时间(%)(irq/total)*100
%soft     在internal时间段里,软中断时间(%)(softirq/total)*100
%idle     在internal时间段里,CPU除去等待磁盘IO操作外的因为任何原因而空闲的时间闲置时间(%) (idle/total)*100

AWK: A Powerful Text Processing Tool and Programming Language

Overview:
AWK is a robust text processing tool and programming language, primarily used for formatting, analyzing, and processing text on Unix and Linux systems. It excels at handling structured text like tables, CSV files, and logs.

Syntax:

awk -f 'scripts' -v var=value filename
awk 'BEGIN{ print "start" } pattern{ commands } END{ print "end" }' filename

Explanation:
AWK reads files or input streams (including stdin) line by line, processing text data based on user-specified patterns and actions. It is particularly useful for structured text. While AWK can be used directly from the command line, it is more often employed through scripts. As a programming language, AWK shares many features with C, such as arrays and functions.

Options:

  • -F: Specifies the field separator (can be a string or regular expression).
  • -f 'scripts': Reads AWK commands from the script file 'scripts'.
  • -v var=value: Assigns a value to a variable, passing external variables to AWK.

AWK Script Structure:

  • pattern: Matches specific lines.
  • { commands }: Executes actions on matching lines.
  • filename: The file to be processed by AWK.

An AWK script typically consists of three optional parts: a BEGIN block, pattern matching, and an END block. The workflow proceeds as follows:

  1. Execute the BEGIN statement.
  2. Process each line from the file or standard input, executing the pattern matching.
  3. Execute the END statement.

AWK Built-in Variables:

  • $0: The current record (line).
  • $n: The nth field (column) of the current record, where $1 is the first column and $n is the nth column.
  • FS: The field separator (default is a space or tab), can be customized with the -F option.
  • OFS: Output field separator (used for formatted output).
  • RS: Record separator (default is newline).
  • ORS: Output record separator (default is newline).
  • NR: Current line number (starting from 1).
  • NF: Number of fields (columns) in the current line.

AWK Operators:

  • Arithmetic Operators:
    +, -, *, /, %, ^
    Increment and decrement operators (++, --) can be used as prefixes or suffixes.
  • Assignment Operators:
    =, +=, -=, *=, /=, %=, ^=
  • Regular Expression Operators:
    ~: Matches regular expression.
    !~: Does not match regular expression.
  • Logical Operators:
    ||: Logical OR
    &&: Logical AND
  • Relational Operators:
    <, <=, >, >=, !=, ==
  • Other Operators:
    $: Refers to a field by its number.
    Space: Concatenates strings.
    ?:: Ternary operator.
    in: Checks if a key exists in an array.

AWK Regular Expression Syntax:

  • ^: Matches the start of a line.
  • $: Matches the end of a line.
  • .: Matches any single character.
  • *: Matches zero or more occurrences of the preceding character.
  • +: Matches one or more occurrences of the preceding character.
  • ?: Matches zero or one occurrence of the preceding character.
  • []: Matches any character in the specified range.
  • [^]: Matches any character not in the specified range.
  • () and |: Subexpressions and alternations.
  • \: Escape character.
  • {m}: Matches exactly m occurrences of a character.
  • {m,}: Matches at least m occurrences.
  • {m,n}: Matches between m and n occurrences.

AWK Built-in Functions:

  • toupper(): Converts all lowercase letters to uppercase.
  • length(): Returns the length of a string.

Custom Functions in AWK: AWK scripts can include user-defined functions. For example:

function square(x) {
  return x * x;
}

To use the function:

awk '{ print square($1) }' file.txt

AWK Control Flow Statements:

  • if-else: Conditional statements.
  • while and do-while: Loops.
  • for: Standard loops, including array traversal with for-in.
  • break and continue: Loop control.
  • exit: Terminates the script execution.
  • next: Skips the remaining commands for the current line.
  • return: Returns a value from a function.
  • ?:: Ternary operator for conditional expressions.

AWK Arrays: AWK supports associative arrays, meaning array indexes can be strings as well as numbers. Arrays in AWK don’t need to be declared or sized; they are created as soon as you assign a value to an index.

Examples:

1. Basic Example:

$ echo "hello" | awk 'BEGIN{ print "start" } END{ print "end" }'
start
end

2. Using Built-in Variables: To print the first and third columns of a file:

awk '{ print $1, $3 }' test.txt

3. Using External Variables:

$ a=100
$ b=100
$ echo | awk '{ print v1 * v2 }' v1=$a v2=$b
10000

4. Using Regular Expressions: To print the second column of lines starting with “a”:

awk '/^a/ { print $2 }' test.txt

5. Using Built-in Functions: Convert all lowercase letters to uppercase:

awk '{ print toupper($0) }' test.txt

6. Handling Different Delimiters: For CSV files with comma-separated values:

awk -F ',' '{ print $1, $2 }' test.csv

7. Writing and Running AWK Scripts: Save an AWK script to a file (e.g., script.awk):

BEGIN { FS=","; OFS=" - " }
{ print $1, $3 }

Run the script:

awk -f script.awk test.csv

Conclusion:

AWK is a versatile and powerful tool for text processing, offering rich features like pattern matching, regular expressions, and scripting capabilities. From simple one-liners to complex data analysis scripts, AWK excels at processing structured text efficiently and flexibly.

awk文本处理工具和编程语言

功能说明:awk是一个强大的文本处理工具和编程语言,主要用于在 Unix 和 Linux 系统中对文本进行格式化、分析和处理。

语  法:awk -f ‘scripts’ -v var=value filename

awk ‘BEGIN{ print “start” } pattern{ commands } END{ print “end” }’ filename

补充说明:awk 可以逐行读取文件或者输入流(包括stdin),按照用户指定的模式和操作来处理文本数据,特别适用于结构化的文本(如表格、CSV、日志等结构化数据)。awk可在命令行中使用,但更多是作为脚本来使用。awk作为一门编程语言有很多内建的功能,比如数组、函数等,这是它和C语言的相似之处。

   项:

-F                         指定分隔符(可以是字符串或正则表达式)

-f ‘scripts’             从脚本文件’scripts’中读取awk命令

-v var=value       赋值变量,将外部变量传递给awk

awk脚本基本结构:

pattern                 用于匹配特定的行

{ commands }     用于对匹配的行执行操作

filename                     要被awk处理的文件

一个awk脚本通常由BEGIN语句+模式匹配+END语句三部分组成,这三部分都是可选项。工作步骤:

第一步,执行BEGIN语句

第二步,从文件或标准输入读取一行,然后再执行pattern语句,以此类推,逐行扫描文件到文件全部被读取

第三步,执行END语句

awk内置变量:

awk默认将每行文本按照空格或特定分隔符分成多个字段(列),每个字段可以通过 $ 符号访问:

$0          当前记录(行)

$n          当前记录(行)的第n个字段(列),$1代表第1列,$n代表第n列

FS          字段(列)的分隔符(默认是空格或制表符),可以使用-F选项自定义分隔符

OFS       输出字段(列)分隔符(用于格式化输出)

RS         记录(行)的分隔符,默认是换行符

ORS      输出记录(行)的分隔符,默认是换行符

NR         当前处理的行号,默认从1开始

NF         当前行的字段(列)数

awk运算符:

算术运算符:

+     加

–      减

*     乘

/      除

%    求余

^     求幂

++         自增,作为前缀或后缀

—           自减,作为前缀或后缀

注意,非数值的变量在使用算术运算符时会被自动转换为0

赋值运算符:

=

+=

-=

*=

/=

%=

^=

正则运算符:

~     匹配正则表达式

!~    不匹配正则表达式

逻辑运算符:

||     逻辑或 

&& 逻辑与

关系运算符:

<=

>=

!=

== 

其它运算符:

$            通过序号引用字段(列)

空格      字符串链接符

?:           三目运算符

ln           数组中是否存在某键值

awk正则表达式语法:

^            行首定位符

$            行尾定位符

.             匹配任意单个字符

*            匹配0个或多个前导字符(包括回车)

+            匹配1个或多个前导字符

?            匹配0个或1个前导字符

[]           匹配指定字符组内的任意一个字符/^[ab]

[^]          匹配不在指定字符组内的任意一个字符

()           子表达式

|             或

\             转义符

~            匹配条件语句

!~           不匹配条件语句

x{m}     x字符重复m次

x{m,}    x字符至少重复m次

X{m,n} x字符至少重复m次但不起过n次(需指定选项-posix或–re-interval)

awk打印(输出)命令:

print简单输出,用于输出字段(列)或文本,自动在每个字段(列)间插入空格,并在行末自动换行

printf提供了更多格式控制,可以指定字段的输出格式、宽度、对齐方式等。它类似于 C 语言的 printf 函数。若要输出换行,需要手动添加换行符\n

awk读取输入命令:

getline用于读取文件或命令的输出

awk内置函数:

toupper函数将所有小写字母转换成大写字母

length函数返回字符个数

system函数可以用来调用系统命令,虽然它不直接打印到终端,但可以执行其他命令并返回执行结果

awk自定义函数:

awk脚本中可以定义自己的函数,例如

function square(x) {

  return x * x

}

awk ‘{ print square($1) }’ file.txt

awk流程控制语句:

if-else 条件判断

while 和 do-while 循环

for 循环,包括 for-in 数组遍历

break 和 continue 控制循环的执行

exit 终止脚本执行

next 跳过当前行

return 函数返回值

三元条件表达式 ? :

awk数组:

awk 支持关联数组,这意味着数组的索引不仅可以是整数,也可以是字符串。awk 中的数组无需声明,也不需要定义大小,直接通过索引赋值即可使用。

    例:

1 输入和输出

$ echo “hello ” | awk ‘BEGIN{ print “start” } END{ print “end” }’

start

end

打印读取到的文本:

$ echo “hello ” | awk ‘BEGIN{ print “start” } {print} END{ print “end” }’

start

hello

end

打印整行:

$ echo “Hello World” | awk ‘{print $0}’

Hello World

打印特定字段(列):

$ echo “Alice 30” | awk ‘{print $1}’ # 输出第一列,awk默认以空白符分隔列

Alice

打印多个字段:

$ echo “Alice 30” | awk ‘{print $1, $2}’

Alice 30

简单格式化输出

$ echo “Alice 30” | awk ‘{printf “%s is %d years old\n”, $1, $2}’

Alice is 30 years old

指定输出列的宽度:

$ echo “Alice 30” | awk ‘{printf “%-10s %-5d\n”, $1, $2}’ # %-10s 表示左对齐,宽度为10;%-5d 表示左对齐,宽度为5

Alice      30

控制小数点位数:

$ echo “3.14159” | awk ‘{printf “%.2f\n”, $1}’

3.14

输出到文件或追加到文件:

awk 的 print 和 printf 命令都可以配合 >, >> 操作符,将内容输出到文件中:

  • 输出到文件:使用 > 将输出重定向到文件(会覆盖文件内容)。
  • 追加到文件:使用 >> 将输出追加到文件(不会覆盖已有内容)。

将输出写入文件:

$ echo “Alice 30” | awk ‘{print $1, $2 > “output.txt”}’ # 结果输出到 output.txt 文件中

将输出追加到文件:

$ echo “Bob 25” | awk ‘{print $1, $2 >> “output.txt”}’ # 输出会追加到 output.txt 文件末尾

重定向输出到标准错误stderr:

$ echo “Error message” | awk ‘{print $0 > “/dev/stderr”}’

输出多列时自定义列的分隔符:

$ echo -e “Alice 30\nBob 25″ | awk ‘BEGIN { OFS=” | ” } {print $1, $2}’

Alice | 30

Bob | 25

使用 getline 读取文件或命令的输出:

$ awk ‘BEGIN { while ((getline line < “input.txt”) > 0) print line }’ # 从 input.txt 文件中逐行读取并输出内容

2 普通变量的定义和使用

定义多个变量,然后打印它们:

$ echo|awk ‘{ a=”aa”; b=”bb”; c=”cc”; print a,b,c; }’

aa bb cc

print语句中的””起到拼接字符串的作用:

$ echo|awk ‘{ a=”aa”; b=”bb”; c=”cc”; print a” is “b” or “c; }’

aa is bb or cc

3 内置变量的使用

打印文件中的第一列和第三列:

$ cat test.txt

aa bb cc dd ee

11 22 33 44 55

xyz yui tt

$ awk ‘{ print $1, $3 }’ test.txt

aa cc

11 33

xyz tt

$ awk ‘{ print $n }’ test.txt

aa bb cc dd ee

11 22 33 44 55

xyz yui tt

打印第二列第一列并以,符号分隔:

$ awk ‘{print $2″,”$1}’  test.txt

bb,aa

22,11

yui,xyz

打印每行的行号和字段(列)数:

$ awk ‘{ print NR, NF }’ test.txt

1 5

2 5

3 3

4 外部变量的使用

$ a=100

$ b=100

$ echo |awk ‘{print v1*v2 }’ v1=$a v2=$b

10000

$ e=eee

$ echo |awk ‘{print e }’ e=$e

eee

在shell中,awk可以直接使用shell的环境变量。

5 awk运算符的使用

$ awk ‘BEGIN{a=”b”;print a,a++,a–,++a;}’

b 0 1 1

$ awk ‘BEGIN{a=”0″;print a,a++,a–,++a;}’

0 0 1 1

$ awk ‘BEGIN{a=7;b=2;print a/b;}’

3.5

$ awk ‘BEGIN{a=7;b=2;print a%b;}’

1

$ awk ‘BEGIN{a=7;b=2;print a^b;}’

49

$ awk ‘BEGIN{a=7;b=2;print a==b;}’

0

$ awk ‘BEGIN{a=7;b=2;print a=b;}’

2

打印第二列是22的行第一列:

$ cat test.txt

aa bb cc dd ee

11 22 33 44 55

xyz yui tt

$ awk ‘($2 == 22) {print $1}’ test.txt

11

打印第3列的值大于66的行:

$ awk ‘$3 > 66’ test.txt

aa bb cc dd ee

xyz yui tt

? :三目运算符的使用示例:

$ awk ‘BEGIN{a=”b”;print a==”b”?”yes”:”no”}’

yes

6 正则表达式的使用

$ cat test.txt

aa bb cc dd ee

11 22 33 44 55

xyz yui tt

打印以a开头的行的第2列:

$ awk ‘/^a/{print $2}’ test.txt

bb

打印以a开头的行的第2列,并在第2列前加上aaa子串:

$ awk ‘/^a/{print “aaa”$2}’ test.txt

aaabb

打印第1列匹配xyz的行的第3列的值:

$ awk ‘$1~/xyz/ {print $3}’ test.txt

tt

如果变量a中包含test子串,那么打印yes:

$ echo|awk ‘BEGIN{a=”100testaaa”}a~/test/{print “yes”}’

yes

输出所有包含子串root的行:

awk ‘/root/{print $0}’ test.txt

7 awk内置函数的使用

toupper函数将所有小写字母转换成大写字母:

$ cat test.txt

aa bb cc dd ee

11 22 33 44 55

xyz yui tt

$ awk ‘{print toupper($0)}’ test.txt

AA BB CC DD EE

11 22 33 44 55

XYZ YUI TT

length函数返回字符个数。打印长度为3个字符的第一列的内容:

$ awk ‘{if(length($1) == 3) print $1}’ test.txt

xyz

8 处理不同分隔符的文件

通过设置 -F 选项来指定输入的字段的分隔符。例如,对于以逗号分隔的 CSV 文件:

$ cat test.csv

ttc,yui,layui,tailwind

c,c++,go,java,python,php,js

apple,google,facebook,reddit,twitter,amazon

john@Ubuntu22-VirtualBox:~/test$ awk -F ‘,’ ‘{ print $1, $2 }’ test.csv

ttc yui

c c++

apple google

如果不使用-F ‘,’选项,就无法正确处理CSV文件:

$ awk ‘{ print $1, $2 }’ test.csv

ttc,yui,layui,tailwind

c,c++,go,java,python,php,js

apple,google,facebook,reddit,twitter,amazon

9 awk脚本的编写和使用

除了命令行上使用,我们也可以编写awk脚本文件,适合复杂的数据处理任务。例如,保存以下代码到 script.awk 文件中:

BEGIN { FS=”,”; OFS=” – ” }

{ print $1, $3 }

然后用 awk -f script.awk test.csv 来执行这个脚本:

$ awk -f script.awk test.csv

ttc – layui

c – go

apple – facebook

10 控制流程的使用

10.1 if-else 语句(条件判断)

$ cat test.txt

aa bb cc dd ee

11 22 33 44 55

xyz yui tt

$ awk ‘{ if ($3 > 66) print $1, $3; else print $3, “not greater” }’ test.txt

aa cc

33 not greater

xyz tt

可以嵌套多个 if-else 语句:

awk ‘{ if ($1 > 10) print “Greater”; else if ($1 == 10) print “Equal”; else print “Smaller” }’ file.txt

10.2 while 语句(循环)

awk ‘{ i = 1; while (i <= NF) { print $i; i++ } }’ file.txt

10.3 do-while 语句(后测试循环)

do-while 循环会先执行一次循环体,然后检查条件是否为真:

awk ‘{ i = 1; do { print $i; i++ } while (i <= NF) }’ file.txt

10.4 for 语句(循环)

awk ‘{ for (i = 1; i <= NF; i++) print $i }’ file.txt

10.5 for-in遍历数组的所有索引

awk ‘{ for (i in arr) print arr[i] }’

10.6 break 语句(跳出循环)

awk ‘{ for (i = 1; i <= NF; i++) { if ($i == “stop”) break; print $i } }’ file.txt

10.7 continue 语句(继续下一次循环)

continue 语句用于跳过本次循环的剩余部分,并继续执行下一次循环:

awk ‘{ for (i = 1; i <= NF; i++) { if ($i == “skip”) continue; print $i } }’ file.txt

10.8 exit 语句(退出程序)

exit 语句用于终止 awk 脚本的执行,退出脚本时可以指定退出状态码:

awk ‘{ if ($1 == “exit”) exit; print $1 }’ file.txt

立即终止脚本的执行,剩余的行不会被处理。

可以在 END 块中使用 exit 返回状态码:

awk ‘END { if (NR == 0) exit 1 }’ file.txt

10.9 next 语句(跳到下一行)

next 语句用于跳过当前行的剩余操作,直接处理下一行:

awk ‘{ if ($1 == “skip”) next; print $1 }’ file.txt

10.10 return 语句(函数中使用)

return 语句用于在函数中返回值并退出函数:

function square(x) {

  return x * x

}

awk ‘{ print square($1) }’ file.txt

10.11 条件表达式(三元操作符)

awk ‘{ print ($1 > 10 ? “Greater” : “Smaller”) }’ file.txt

11 数组

11.1 定义和使用数组

awk 中通过赋值操作可以直接定义数组。例如:

awk ‘BEGIN { arr[1] = “apple”; arr[2] = “banana”; print arr[1], arr[2] }’

在这个例子中,arr[1] 和 arr[2] 定义了两个元素,分别存储 “apple” 和 “banana”。

使用字符串作为数组的索引:

awk ‘BEGIN { arr[“fruit”] = “apple”; print arr[“fruit”] }’

11.2 遍历数组

你可以使用 for-in 循环遍历数组中的所有元素,数组中的索引会被 for 循环访问:

awk ‘BEGIN {

    arr[1] = “apple”;

    arr[2] = “banana”;

    arr[3] = “cherry”;

    for (i in arr) {

        print i, arr[i];

    }

}’

这个例子会遍历数组 arr,输出数组的索引和对应的值。值得注意的是,awk 中 for-in 循环的遍历顺序并不保证是按索引的顺序,具体顺序取决于 awk 的实现。

11.3 删除数组元素

可以使用 delete 语句删除数组中的某个元素:

awk ‘BEGIN {

    arr[1] = “apple”;

    arr[2] = “banana”;

    delete arr[1];

    print arr[1];  # 输出为空,因为arr[1]已被删除

}’

delete 会将指定的数组元素完全移除,后续对该索引的访问将返回空值。

11.4 计算数组长度

awk 本身没有直接的函数来计算数组的长度,但你可以通过遍历数组来计算元素个数。例如:

awk ‘BEGIN {

    arr[1] = “apple”;

    arr[2] = “banana”;

    arr[3] = “cherry”;

    count = 0;

    for (i in arr) {

        count++;

    }

    print “Array length:”, count;

}’

11.5 多维数组

虽然 awk 本质上是支持一维数组,但可以通过组合索引来实现多维数组的效果。你可以使用多个索引作为键,例如:

awk ‘BEGIN {

    arr[1,1] = “apple”;

    arr[1,2] = “banana”;

    arr[2,1] = “cherry”;

    print arr[1,1];  # 输出apple

    print arr[1,2];  # 输出banana

    print arr[2,1];  # 输出cherry

}’

这里 arr[1,1] 和 arr[1,2] 类似于二维数组的定义,逗号 , 将多个索引组合在一起。

11.6 数组的默认值

在 awk 中,未被初始化的数组元素会默认返回空字符串或 0,具体取决于如何使用。例如:

awk ‘BEGIN {

    print arr[1];  # 输出为空,因为arr[1]未被初始化

}’

如果试图对一个未初始化的数组元素进行运算操作,它将被视为 0。

12 综合实例

获取enp0s3网卡的IP地址:

$ ifconfig enp0s3|awk ‘BEGIN{FS=”[[:space:]:]+”} NR==2{print $3}’

10.0.2.15

打印当前目录下的子目录和文件的总字节数,以MB为单位:

$ ls -alh|awk ‘BEGIN{size=0;} {size=size+$5;} END{print “total size is “,size/1024/1024,”MB”}’

total size is  0.000601768 MB

A Guide to the sed Stream Editor

Function Overview:
sed is a stream editor that reads text from files or input streams line by line, edits the text according to user-specified patterns or commands, and then outputs the result to the screen or a file. When used in conjunction with regular expressions, it is incredibly powerful.

Syntax:

sed [options] 'command' file(s)
sed [options] -f scriptfile file(s)

Explanation:
sed first stores each line of the text in a temporary buffer called the “pattern space.” It then processes the content of this buffer according to the given sed commands. Once the processing is complete, the result is output to the terminal, and sed moves on to the next line. The content of the file itself is not altered unless the -i option is used. sed is mainly used to edit one or more text files, simplify repeated text file operations, or create text transformation scripts. Its functionality is similar to awk, but sed is simpler and less capable of handling column-specific operations, while awk is more powerful in that regard.

Options:

  • -e: Use the specified commands to process the input text file.
  • -n: Suppress automatic output (only prints lines modified when used with the p command).
  • -h: Display help information.
  • -V: Display version information.

Parameters:

  • command: The command to be executed.
  • file(s): One or more text files to be processed.
  • scriptfile: A file containing a list of commands to execute.

Common Actions:

  • a: Append text after the current line.
  • i: Insert text before the current line.
  • c: Replace the selected lines with new text.
  • d: Delete the selected lines.
  • D: Delete the first line of the pattern block.
  • s: Replace specified characters.
  • h: Copy the pattern block’s content to an internal buffer.
  • H: Append the pattern block’s content to the internal buffer.
  • g: Retrieve content from the internal buffer and replace the text in the current pattern block.
  • G: Retrieve content from the internal buffer and append it to the current pattern block.
  • l: List non-printable characters in the text.
  • L: Similar to l, but specifically for handling non-ASCII characters.
  • n: Read the next input line and apply the next command to it instead of reapplying the first command.
  • N: Append the next input line to the current pattern block and insert a new line between them, changing the current line number.
  • p: Print the matching lines.
  • P: Print the first line of the pattern block.
  • q: Quit sed.
  • b label: Branch to the location marked by label in the script; if the label doesn’t exist, the branch goes to the end of the script.
  • r file: Read lines from a file.
  • t label: Conditional branch to a marked location, starting from the last line. If the condition is met, or a T/t command is used, the branch jumps to the specified label or the end of the script.
  • T label: Error branch. If an error occurs, this branches to the labeled command or the end of the script.
  • w file: Write the processed block of the pattern space to the end of a file.
  • W file: Write the first line of the pattern space to the end of a file.
  • !: Execute the following commands on all lines not selected by the current pattern.
  • =: Print the current line number.
  • #: Extend comments to the next newline character.

Replacement Commands:

  • g: Global replacement within a line (used with the s command).
  • p: Print the line.
  • w: Write the line to a file.
  • x: Exchange the text in the pattern block with the text in the internal buffer.
  • y: Translate one character to another (not used with regular expressions).
  • &: Reference to the matched string.

Basic Regular Expression (BRE) Syntax in sed:

  • ^: Match the beginning of a line.
  • $: Match the end of a line.
  • .: Match any single character except a newline.
  • *: Match zero or more of the preceding characters.
  • []: Match a single character from a specified range.
  • [^]: Match a single character not in the specified range.
  • (..): Capture a substring.
  • &: Save the matched text for later use in replacements.
  • <: Match the start of a word.
  • >: Match the end of a word.
  • x{m}: Match exactly m occurrences of x.
  • x{m,}: Match at least m occurrences of x.
  • x{m,n}: Match between m and n occurrences of x.

To match the start of a word, use \<. To match the end of a word, use \>.

Extended Regular Expression (ERE) Syntax in sed:

  • \b: Match a word boundary (not supported by default in sed regular expressions).
  • +: Match one or more occurrences of the preceding character.

Practical Examples:

1 Print specific lines:
To print only lines 1 and the last line:

sed -n '1p;$p' test.txt

2 Delete lines:
To delete the second line:

sed '2d' filename

3 Basic match and replace:
Replace spaces with hyphens:

echo "hello world" | sed 's/ /-/g'

4 Advanced match and replace:
Reverse words in a string:

echo "abc def ghi" | sed 's/\([a-zA-Z]*\) \([a-zA-Z]*\) \([a-zA-Z]*\)/\3 \2 \1/'

5 Multiple edits:
Replace “Hello” with “Hi” and “Goodbye” with “Farewell” in one command:

sed 's/Hello/Hi/; s/Goodbye/Farewell/' example.txt

6 Read a file:
Insert content from an external file after lines matching a pattern:

sed '/Line 2/r extra.txt' data.txt

7 Write to a file:
Save processed content into a new file:

sed 's/World/Everyone/' input.txt > output.txt

In summary, sed is a versatile and efficient tool for editing text in a stream, offering powerful pattern matching and text transformation capabilities when combined with regular expressions. From basic line printing to advanced text manipulation, sed serves a wide range of text processing needs.

sed流编辑器

功能说明:sed是一种流编辑器,能够从文件或输入流中逐行读取文本,并根据用户指定的模式或命令对文本进行编辑,之后将结果输出到屏幕或文件中。配合正则表达式使用功能强大。

语  法:

sed [options] ‘command’ file(s)

sed [options] -f scriptfile file(s)

补充说明:sed先把当前处理的一行文本存储在临时缓冲区中,称为“模式空间”,接着用sed命令处理缓冲区的内容,完成后输出到终端,接着处理下一行文本。文件内容并没有被改变,除非使用-i选项。sed主要用来编辑一个或多个文本文件,简化对文本文件的反复操作或者用来编写文本转换程序等。sed功能同awk类似,差别在于sed更加简单,对列处理的功能要差一些,awk功能复杂,对列处理的功能比较强大。

   项:

-e    以指定的指令来处理输入的文本文件

-n    取消默认输出(如果和p命令同时使用只会打印发生改变的行)

-h    显示帮助信息

-V   显示版本信息

参  数:

command     命令

file(s)           一个或多个文本文件

scriptfile       存放了命令的脚本文件

   作:

a     在当前行下面插入文本

i      在当前行上面插入文本

c     把选定的行改为新的文本

d     删除选择的行

D    删除模板块的第一行

s      替换指定字符

h     拷贝模板块的内容到内存中的缓冲区

H    追加模板块的内容到内存中的缓冲区

g     获得内存缓冲区的内容,并替代当前模板块中的文本

G    获得内存缓冲区的内容,并追加到当前模板块文本的后面

l      列出不能打印字符的清单

L  列出不能打印字符的清单,该选项用于非ASCII字符

n     读取下一个输入行,用下一个命令处理新的行而不是用第一个命令

N    追加下一个输入行到模板块后面并在二者间嵌入一个新行,改变当前行号码

p     打印匹配的行

P     打印模板的第一行

q     退出sed

b     lable 分支到脚本中带有标记的地方,如果分支不存在则分支到脚本的末尾

r      file 从文件中读行

t      label if分支,从最后一行开始,条件一旦满足或者T,t命令,将导致分支到带有标号的命令处,或者到脚本的末尾

T     label 错误分支,从最后一行开始,一旦发生错误或者T,t命令,将导致分支到带有标号的命令处,或者到脚本的末尾

w    file 写并追加模板块到文件的末尾

W   file 写并追加模板块的第一行到file末尾

!      表示后面的命令对所有没有被选定的行发生作用

=     打印当前行号码

#     把注释扩展到下一个换行符以前

替换命令:

g     表示行内全面替换(全局替换配合s命令使用)

p     表示打印行

w    表示把行写入一个文件

x     表示互换模板块中的文本和缓冲区中的文本

y     表示把一个字符翻译为另外的字符(但是不用于正则表达式)

1     子串匹配的标记 

&    已匹配字符串的标记

sed的基本正则表达式(BREBasic Regular Expression)语法:

^     匹配行开始

$     匹配行结束

.      匹配一个非换行符的任意字符

*     匹配0个或多个字符

[]    匹配指定范围内的一个字符

[^]   匹配不在指定范围内的一个字符

(..)  匹配子串

&    保存搜索字符用来替换其他字符

<     匹配单词的开始

>     匹配单词的结束

x{m}     重复字符x,m次

x{m,}    重复字符x,至少m次

x{m,n}  重复字符x,至少m次,不多于n次

使用 \< 来匹配单词开头,\> 来匹配单词结尾

sed的扩展正则表达式(EREExtended Regular Expression)语法:

\b    匹配单词边界,但默认的sed正则表达式语法不支持 \b

+     匹配一个或多个字符

    例:

1 打印输出

只输出指定行号的行:

$ cat test.txt

abcd 12345

b

c

d

e

输出第1行和最后一行:

$ sed -n ‘1p;$p’ test.txt

abcd 12345

e

输出第2行和第3行:

$ sed -n ‘2p;3p’ test.txt

b

c

输出第2行、第3行和第4行:

$ sed -n ‘2p;3p;4p’ test.txt

b

c

d

其中-n选项取消默认输出,p命令只打印输出指定行号的行。

只输出奇数行号的行:

$ sed -n ‘p;n’ test.txt

abcd 12345

c

e

只输出偶数行号的行:

$ sed -n ‘n;p’ test.txt

b

d

从第1行开始隔行输出:

$ sed -n ‘1~2p’ test.txt

abcd 12345

c

e

从第2行开始隔行输出:

$ sed -n ‘2~2p’ test.txt

b

d

打印匹配字符串行的下一行:

$ sed -n ‘/^b/{n;p}’ test.txt

c

$ awk ‘/^b/{getline; print}’ test.txt

c

使用l 和 L 动作打印输出行内容,并以不同的方式显示控制字符(如不可打印字符、换行符等):

  • l 动作:显示行内容,并将非打印字符(如制表符、换行符)以可视化符号显示,适用于处理 ASCII 文本。
  • L 动作:类似于 l,但专为处理多字节字符(如 UTF-8)设计,适合包含国际化字符的文本。

示例1

$ cat test.txt

ab

c     d 12345

b

c

d

e

执行 l 动作后,sed 会将每一行的内容打印出来,并将非打印字符(如换行、制表符等)显示为可视符号:

$ sed -n ‘l’ test.txt

ab$

c\td 12345$

b$

c$

d$

e$

其中

  • \t 表示制表符,\n 表示换行符。
  • $ 表示行的结尾,通常被 sed 用来可视化显示每行的结束。

示例2

$ cat test1.txt

Hello 世界

与 l 动作不同,L 更适用于处理多字节字符,特别是在显示非 ASCII 字符时:

$ sed -n ‘L’ test1.txt

Hello 世$

界$

其中多字节字符(如中文字符“世界”)会被正确显示为两行,其中 世 和 界 分别占用一行,这在某些编辑场景下可能是期望的效果。

注意,低版本sed不支持L选项!

2 删除

删除空行:

sed ‘/^$/d’ filename

删除第二行:

sed ‘2d’ filename

删除第二直到未尾所有行:

sed ‘2, $d’ filename

删除最后一行:

sed ‘$d’ filename

删除以test开头行:

sed ‘/^test/’d filename

3 简单的匹配和替换

echo “hello world” |sed ‘s/ /-/g’

hello-world 

从第一个空格开始把空格符号全局替换成’-‘符号,只不过”hello world”文本中只有一个空格。

匹配一个完整的单词并替换:

$ echo “hello world” | sed ‘s/[a-zA-Z0-9_][a-zA-Z0-9_]*/replacement/g’

replacement replacement

其中

  • [a-zA-Z0-9_] 匹配一个字母、数字或下划线。
  • [a-zA-Z0-9_]* 匹配零个或多个后续的字母、数字或下划线

在某些支持扩展正则表达式的工具(如 sed -E 或 grep -E),你可以直接使用 + 来表示一个或多个字符:

$ echo “hello world” | sed -E ‘s/[a-zA-Z0-9_]+/replacement/g’

replacement replacement

4 进阶的匹配和替换

$ echo “hello world” | sed ‘s/[a-zA-Z0-9_][a-zA-Z0-9_]*/[&]/g’

[hello] [world]

其中&表示匹配到的子串。

通过正则表达式分组和替换实现反转输出一个字符串中的空格分隔的子串:

$ echo “abc def ghi” | sed ‘s/\([a-zA-Z]*\) \([a-zA-Z]*\) \([a-zA-Z]*\)/\3 \2 \1/’

ghi def abc

如果有更多的子串,使用 sed 进行手动反转就会变得非常复杂,因为 sed 的捕获组数量有限(通常只能捕获到9个组,即 \1 到 \9)。如果需要反转更多子串,建议使用更强大的文本处理工具,如 awk 或 perl。例如:

$ echo “abc def ghi jkl mno” | awk ‘{ for (i=NF; i>0; i–) printf(“%s “, $i); print “” }’

mno jkl ghi def abc

其中

  • NF 表示字段数量,$i 表示第 i 个字段。
  • for (i=NF; i>0; i–) 从最后一个字段开始向前输出,直到第一个字段。

5 多点编辑功能

多点编辑功能可以通过 -e 选项来实现。-e 选项允许你在同一个 sed 命令中执行多个编辑操作。每个编辑命令都可以通过 -e 传递,这样你可以在一次执行中对文件或输入流进行多种编辑操作,而不需要多次调用 sed。

基本语法:

sed -e ‘command1’ -e ‘command2’ … filename

或者将多个 -e 选项合并为一个(不使用 -e 的情况下也可以):

sed ‘command1; command2’ filename

示例1 一次完成两个替换操作

$ cat example.txt

Hello World

This is a test

Goodbye World

$ sed -e ‘s/Hello/Hi/’ -e ‘s/Goodbye/Farewell/’ example.txt

Hi World

This is a test

Farewell World

你也可以不用多次使用 -e,而是通过分号分隔多个命令:

sed ‘s/Hello/Hi/; s/Goodbye/Farewell/’ example.txt

示例2 删除和替换操作的组合

假设你想要删除文件中的第 2 行,并将 “World” 替换为 “Everyone”。你可以通过以下命令来实现:

$ cat example.txt

Hello World

This is a test

Goodbye World

$ sed -e ‘2d’ -e ‘s/World/Everyone/’ example.txt

Hello Everyone

Goodbye Everyone

6 读一个文本文件

sed默认操作就是读取文本文件内容并对其进行处理。

示例1 读取并打印文件内容

$ cat input.txt

Hello World

This is a test

Goodbye World

$ sed ” input.txt

Hello World

This is a test

Goodbye World

示例2 读取并替换文件内容

假设你想将 World 替换为 Everyone,可以这样做:

$ sed ‘s/World/Everyone/’ input.txt

Hello Everyone

This is a test

Goodbye Everyone

7 使用r动作读取文件并插入内容

r 动作用于将外部文件的内容读入并插入到当前处理的文本中。指定一个文件,sed 会将该文件的内容插入到匹配的行之后。语法:

sed ‘/pattern/r file_to_read’ input_file

其中

  • /pattern/:匹配模式行(可选),即插入文件内容的位置。
  • file_to_read:你想要读取的文件。
  • input_file:原始文件,sed 将对其进行处理。

示例1

假设有一个文件 data.txt,内容如下:

Line 1

Line 2

Line 3

还有另一个文件 extra.txt,内容如下:

Extra content 1

Extra content 2

如果你想在 data.txt 的匹配 Line 2的每一行后插入 extra.txt 的内容,可以使用以下命令:

sed ‘/Line 2/r extra.txt’ data.txt

输出结果:

Line 1

Line 2

Extra content 1

Extra content 2

Line 3

8 写一个文本文件

为了将 sed 的输出保存到一个新的文件,或者覆盖现有的文件,可以使用输出重定向或 -i 选项(用于直接修改文件)。

示例1 使用输出重定向写入文件

假设你想将替换后的内容写入到一个新文件 output.txt:

$ cat input.txt

Hello World

This is a test

Goodbye World

$ sed ‘s/World/Everyone/’ input.txt > output.txt

$ cat output.txt

Hello Everyone

This is a test

Goodbye Everyone

以上将 sed 的输出结果重定向到 output.txt,不会改变原始文件 input.txt 的内容。

示例2 使用 -i 选项直接修改文件本身

如果你想直接修改 input.txt 文件本身,可以使用 -i 选项:

$ sed -i ‘s/World/Everyone/’ input.txt

$ cat input.txt

Hello Everyone

This is a test

Goodbye Everyone

示例3 在文件中添加内容

你也可以通过 sed 来插入或添加内容,并保存到文件中。假设你想在input.txt文件的第 1 行之前插入一行新文本 “ID: 1234″,并将其保存到原文件中:

$ cat input.txt

Hello Everyone

This is a test

Goodbye Everyone

$ sed -i ‘1i ID: 1234’ input.txt

$ cat input.txt

ID: 1234

Hello Everyone

This is a test

Goodbye Everyone

其中1i表示在第 1 行之前插入一行新文本。

9 使用w 动作写入文件

w 动作用于将匹配的行或处理后的内容写入到一个指定的文件。它通常用于保存处理过的内容到新的文件,而不是修改原文件。语法:

sed ‘/pattern/w output_file’ input_file

其中

  • /pattern/:匹配模式行,符合该模式的行会被写入指定的文件。
  • output_file:写入的目标文件,如果文件不存在,sed 会自动创建它。
  • input_file:原始文件,sed 将对其进行处理。

示例1

假设有一个文件 data.txt,内容如下:

Line 1

Line 2

Line 3

你想将匹配 Line 2 的行写入到文件 output.txt 中,可以使用以下命令:

sed ‘/Line 2/w output.txt’ data.txt

执行该命令后,output.txt 文件将包含以下内容:

Line 2

示例2 结合 r 和 w

假设你想读取外部文件的内容并插入到某个模式之后,同时将匹配的行写入到另一个文件中,可以这样做:

sed ‘/Line 2/r extra.txt; /Line 2/w output.txt’ data.txt

其中

  • /Line 2/r extra.txt:在匹配到 Line 2 的地方插入 extra.txt 文件的内容。
  • /Line 2/w output.txt:将匹配的 Line 2 行写入到 output.txt。

What is the od Command and How to Use It?

The od (octal dump) command is a versatile tool that outputs the contents of a specified file in various formats such as octal, decimal, hexadecimal, floating-point numbers, or ASCII characters. It displays the content to the standard output (usually the terminal), with the leftmost column showing the byte offset, starting from 0.

Function:

The od command outputs file content in various formats like octal, decimal, hexadecimal, floating-point, or ASCII, with the byte offset displayed in the leftmost column. It can handle both text and binary files and is typically used to view file data that cannot be directly displayed in the terminal, such as binary data. The command can interpret the file content and output its values in various formats, whether they are IEEE754 floating-point numbers or ASCII codes. You might also want to check out the hexdump command, which by default outputs data in hexadecimal format but isn’t as powerful as od.

Syntax:

od [OPTION…] [FILE…]

Key Options:

  • -A RADIX or --address-radix=RADIX: Specifies the radix (base) for the byte offset. By default, the offset is displayed in octal.
  • -j BYTES or --skip-bytes=BYTES: Skips the specified number of bytes before displaying the file content.
  • -N BYTES or --read-bytes=BYTES: Outputs only the specified number of bytes.
  • -S [BYTES] or --strings[=BYTES]: Outputs strings at least BYTES bytes long (default is 3).
  • -v or --output-duplicates: Ensures that duplicate data is not omitted in the output.
  • -w [BYTES] or --width[=BYTES]: Sets the number of bytes to display per line (default is 32 bytes).
  • -t TYPE or --format=TYPE: Specifies the format of the output. Options include:
    • a: Named characters (e.g., newline is shown as “nl”).
    • c: Printable characters or escaped sequences (e.g., newline is shown as “\n”).
    • d[SIZE]: Signed decimal integers of SIZE bytes (default is sizeof(int)).
    • f[SIZE]: Floating-point numbers of SIZE bytes (default is sizeof(double)).
    • o[SIZE]: Octal integers of SIZE bytes (default is sizeof(int)).
    • u[SIZE]: Unsigned decimal integers of SIZE bytes (default is sizeof(int)).
    • x[SIZE]: Hexadecimal integers of SIZE bytes (default is sizeof(int)).
    The SIZE can be specified as 1 (byte), or as uppercase letters like C (char), S (short), I (int), and L (long). For floating-point numbers, SIZE can be F (float), D (double), or L (long double).
  • --help: Displays help information.
  • --version: Displays version information.

Parameters:

  • FILE…: One or more files whose content will be displayed.

Examples:

Example 1: Basic Output

$ cat test.txt
abcd 12345
$ od test.txt 
0000000 061141 062143 030440 031462 032464 000012
0000013

In this output, the first column shows the byte offset (default in octal).

Example 2: Show Byte Offset in Decimal

$ od -Ad test.txt 
0000000 061141 062143 030440 031462 032464 000012
0000011

Example 3: Hide Byte Offset

$ od -An test.txt 
 061141 062143 030440 031462 032464 000012

Example 4: Output in Hexadecimal (4 Bytes per Group)

$ od -tx test.txt 
0000000 64636261 33323120 000a3534
0000013

Example 5: Output in Hexadecimal (1 Byte per Group)

$ od -tx1 test.txt
0000000 61 62 63 64 20 31 32 33 34 35 0a
0000013

Example 6: Display Named ASCII Characters

$ od -ta test.txt
0000000   a   b   c   d  sp   1   2   3   4   5  nl
0000013

Or display printable characters and escape sequences:

$ od -tc test.txt
0000000   a   b   c   d       1   2   3   4   5  \n
0000013

Example 7: Hexadecimal with Original Characters

$ od -tcx1 test.txt
0000000   a   b   c   d       1   2   3   4   5  \n
         61  62  63  64  20  31  32  33  34  35  0a
0000013

Example 8: Specify Bytes per Line

$ od -w8 -tc test.txt
0000000   a   b   c   d       1   2   3
0000010   4   5  \n
0000013

Example 9: Remove Spaces Between Columns

To remove spaces between columns during od output:

  1. Use -An to hide the offset.
  2. Use -v to avoid omitting duplicate data.
  3. Use -tx1 to output one byte per group in hexadecimal format, and -w1 to display one byte per line.
  4. Finally, pipe the output to awk to concatenate it into a single line.
$ od -An -w1 -tx1 test.txt | awk '{for(i=1;i<=NF;++i){printf "%s",$i}}'
616263642031323334350a

od将指定文件内容以八进制数、十进制数、十六进制数、浮点数或ASCII字符的方式输出到标准输出显示

功能说明:od将指定文件内容以八进制数、十进制数、十六进制数、浮点数或ASCII字符方式输出到标准输出显示,并且最左边一列显示字节地址偏移量,从0开始

语  法:od [OPTION…] [FILE…]

补充说明:od命令默认的显示方式是八进制数。常见的文件为文本文件和二进制文件。od命令通常用于显示或查看文件中不能直接显示在终端的字符,主要用来查看保存在二进制文件中的数据,按照指定格式解释文件中的数据并输出,不管是IEEE754格式的浮点数还是ASCII码,od命令都能按照需求输出它们的值。大家也可以了解一下hexdump命令,默认以十六进制数输出数据,但感觉hexdump命令没有od命令强大。

          项:

-A RADIX或–address-radix=RADIX        选择以何种基数表示字节地址偏移量。默认以八进制数显示

-j BYTES或–skip-bytes=BYTES               跳过指定数目的字节

-N BYTES或–read-bytes=BYTES             输出指定字节个数

-S [BYTES]或–strings[=BYTES]               输出长度不小于指定字节数的字符串,BYTES 缺省值为 3

-v或–output-duplicates                               输出时不省略重复的数据

-w [BYTES]或–width[=BYTES]                设置每行最多显示的字节个数,BYTES 缺省为 32 字节

-t TYPE或–format=TYPE                          指定输出格式,格式包括 a、c、d、f、o、u 和 x,各含义如下:

  • a:具名字符。比如换行符显示为 nl
  • c:可打印字符或反斜杠表示的转义字符。比如换行符显示为 \n
  • d[SIZE]:SIZE 字节组成一个有符号十进制整数。SIZE 缺省值为 sizeof(int)
  • f[SIZE]:SIZE 字节组成一个浮点数。SIZE 缺省为 sizeof(double)
  • o[SIZE]:SIZE 字节组成一个八进制整数。SIZE 缺省为 sizeof(int)
  • u[SIZE]:SIZE 字节组成一个无符号十进制整数。SIZE 缺省为 sizeof(int)
  • x[SIZE]:SIZE 字节组成一个十六进制整数。SIZE 缺省为 sizeof(int)

SIZE可以1数字,也可以是大写字母。如果 TYPE 是 [doux] 中的一个,那么SIZE 可以是C = sizeof(char),S = sizeof(short),I = sizeof(int),L = sizeof(long)。如果 TYPE 是 f,那么 SIZE 可以是 F = sizeof(float),D = sizeof(double) ,L = sizeof(long double)

–help           显示帮助信息

–version       显示版本信息

参  数:

FILE…         要显示内容数据的一个或多个文件

   例:

实例1
$ cat test.txt
abcd 12345
$ od test.txt 
0000000 061141 062143 030440 031462 032464 000012
0000013
输出中的第一列是字节地址偏移量,默认以八进制数显示。

实例2
设置第一列的字节偏移地址以十进制显示:
$ od -Ad test.txt 
0000000 061141 062143 030440 031462 032464 000012
0000011

实例3
不显示第一列偏移地址:
$ od -An test.txt 
 061141 062143 030440 031462 032464 000012

实例4
以十六进制数输出,默认以四字节为一组(一列)显示:
$ od -tx test.txt 
0000000 64636261 33323120 000a3534
0000013

实例5
以十六进制数输出,每列输出1个字节:
$ od -tx1 test.txt
0000000 61 62 63 64 20 31 32 33 34 35 0a
0000013

实例6
以具名字符显示ASCII字符:
$ od -ta test.txt
0000000   a   b   c   d  sp   1   2   3   4   5  nl
0000013
以可打印字符或反斜杠表示的转义字符显示ASCII字符:
$ od -tc test.txt
0000000   a   b   c   d       1   2   3   4   5  \n
0000013

实例7
以十六进制数显示的同时显示原字符:
$ od -tcx1 test.txt
0000000   a   b   c   d       1   2   3   4   5  \n
         61  62  63  64  20  31  32  33  34  35  0a
0000013

实例8
指定每行显示512字节:
$ od -w8 -tc test.txt
0000000   a   b   c   d       1   2   3
0000010   4   5  \n
0000013

实例9
实现od命令输出时去除列与列之间的空格符的方法:
1	使用-An不输出偏移地址;
2 使用-v输出时不省略重复的数据;
3 使用-tx1以单个字节为一组按照十六进制输出,-w1每列输出一个字节;
4 最后通过管道传递给 awk 的标准输入,通过awk不换行输出所有行,拼接为一行输出。
$ od -An -w1 -tx1 test.txt|awk '{for(i=1;i<=NF;++i){printf "%s",$i}}'
616263642031323334350a

What are Modules, Components, and Services? What Are Their Differences?

Components are also known as building blocks. But what exactly is a component? A component is a software unit that can be independently replaced and upgraded. It has the following characteristics:

  1. It performs a specific function or provides certain services.
  2. It cannot operate independently and must function as part of a system.
  3. It is a physical concept, not a logical one.
  4. It can be maintained, upgraded, or replaced independently without affecting the entire system.

A component is a physically independent entity that can be maintained, upgraded, or replaced on its own. The purpose of creating a component diagram is to perform component-based design for the system, thinking about the physical division of the system, identifying which existing components can be reused, and determining which parts can be turned into components for reuse in future projects.

Question 1: When designing software, we often mention the term “module.” Is a module the same as a component?

Not necessarily. Everyone has different standards for what constitutes a “module.” Sometimes modules are divided based on business logic, and other times they are divided from a technical perspective. Modules are simply a way to divide software into parts for ease of explanation. You can refer to the characteristics of a component listed above to determine whether a “module” qualifies as a component.

Question 2: Software often uses layered design. Is each layer a component?

In most cases, each layer in a layered design is just a logical division and is not physically represented as separate files. In such cases, the layers are not components. However, the actual design may vary, and you can refer to the characteristics of a component to make a judgment.

Question 3: How do we distinguish between a “service” and a “component”?

A “component” refers to a software unit that will be used by other applications beyond the author’s control, but these applications cannot modify the component. In other words, an application using a component cannot alter the component’s source code, but it can extend the component in a predefined way to change its behavior.

Services and components share some similarities: both are used by external applications. In my view, the biggest difference between them lies in the fact that components are libraries used locally, such as JAR files, assemblies, DLLs, or source code imports. Services, on the other hand, are components external to the process, accessed by applications through mechanisms such as synchronous or asynchronous inter-process communication or remote interface calls (e.g., web services, messaging systems, RPC, or sockets).

Services can also call other services since a service is, in itself, an application.

You could map each service to a runtime process, though this is only an approximation. A service could consist of multiple processes, such as the main service application process and a database process used exclusively by that service.

Services can be deployed independently. If an application system is composed of multiple libraries within a single process, any modification to one component requires redeployment of the entire application. However, if the system is divided into multiple services, only the modified service needs to be redeployed—unless changes were made to its exposed interface.

Another outcome of implementing components as services is the availability of more explicit component interfaces.

Compared to in-process calls, remote service calls are more expensive in terms of performance.

References:

  • “Microservices” by Martin Fowler
  • “Inversion of Control Containers and the Dependency Injection pattern” by Martin Fowler
  • “Fireball: UML and the War for Requirements Analysis”