A Guide to the sed Stream Editor

Function Overview:
sed is a stream editor that reads text from files or input streams line by line, edits the text according to user-specified patterns or commands, and then writes the result to the screen or a file. Combined with regular expressions, it can perform complex search-and-replace and filtering tasks in a single pass over the input.

Syntax:

sed [options] 'command' file(s)
sed [options] -f scriptfile file(s)

Explanation:
sed copies each input line into a temporary buffer called the “pattern space,” applies the given commands to that buffer, writes the result to standard output, and then moves on to the next line. The file itself is not altered unless the -i option is used. sed is mainly used to edit one or more text files, automate repetitive edits, or build text transformation scripts. Its functionality overlaps with awk, but sed is simpler and poorly suited to column-oriented processing, where awk is stronger.
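
The cycle described above can be sketched as follows. This is a minimal illustration, assuming a POSIX shell; the scratch file and its contents are made up for the example. It shows that the edited text goes to standard output while the file on disk stays unchanged when -i is not given.

```shell
# Create a scratch file with two lines (contents are illustrative only).
tmp=$(mktemp)
printf 'one\ntwo\n' > "$tmp"

# Each line is copied into the pattern space, edited, then printed.
edited=$(sed 's/o/0/' "$tmp")

# The file on disk is unchanged because -i was not used.
original=$(cat "$tmp")

printf 'edited:\n%s\n' "$edited"
printf 'on disk:\n%s\n' "$original"
rm -f "$tmp"
```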

Options:

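  • -e: Add the commands that follow to the set of commands to run on the input; the option may be given more than once.
  • -n: Suppress automatic printing of the pattern space, so that lines are output only when explicitly printed (for example, with the p command).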
  • --help: Display help information (GNU sed).
  • --version: Display version information (GNU sed).

Parameters:

  • command: The command to be executed.
  • file(s): One or more text files to be processed.
  • scriptfile: A file containing a list of commands to execute.

Common Actions:

  • a: Append text after the current line.
  • i: Insert text before the current line.
  • c: Replace the selected lines with new text.
  • d: Delete the selected lines.
  • D: Delete the pattern space up to the first newline and restart the cycle with what remains, without reading a new input line.
  • s: Substitute text that matches a regular expression.
  • h: Copy the pattern space to the hold space (an auxiliary buffer).
  • H: Append a newline and the pattern space to the hold space.
  • g: Replace the pattern space with the contents of the hold space.
  • G: Append a newline and the contents of the hold space to the pattern space.
  • l: Print the pattern space in an unambiguous form, showing non-printable characters as escapes.
  • L: A deprecated GNU extension that filled and joined lines in the manner of fmt; avoid it in new scripts.
  • n: If automatic printing is enabled, print the pattern space; then replace it with the next input line and continue with the following command instead of restarting the script.
  • N: Append a newline and the next input line to the pattern space, advancing the current line number.
  • p: Print the pattern space.
  • P: Print the pattern space up to the first newline.
  • q: Quit sed.
  • b label: Branch to the location marked by label in the script; if label is omitted, branch to the end of the script.
  • r file: Queue the contents of file to be written to the output at the end of the current cycle.
  • t label: Branch to label only if a substitution has succeeded since the last input line was read or the last t/T branch was taken; if label is omitted, branch to the end of the script.
  • T label: Branch to label only if no substitution has succeeded since the last input line was read or the last t/T branch was taken (GNU extension).
  • w file: Write the pattern space to file.
  • W file: Write the pattern space up to the first newline to file (GNU extension).
  • !: Execute the following commands on all lines not selected by the current pattern.
  • =: Print the current line number.
  • #: Begin a comment that extends to the next newline character.
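
A few of the less obvious actions are easier to follow with input in hand. The sketch below assumes a POSIX shell and made-up sample input; it uses the hold space (the internal buffer behind h, H, g, and G), the N command, and a t loop.

```shell
# Reverse the line order: on every line but the first, G appends the hold
# space to the pattern space; h then saves the result; $p prints at the end.
reversed=$(printf 'a\nb\nc\n' | sed -n '1!G;h;$p')
printf '%s\n' "$reversed"

# Join pairs of lines: N appends the next line, s replaces the embedded newline.
joined=$(printf '1\n2\n3\n4\n' | sed 'N;s/\n/,/')
printf '%s\n' "$joined"

# t branches only after a successful s, so this loops until no double space remains.
squeezed=$(printf 'a    b\n' | sed -e ':again' -e 's/  / /' -e 't again')
printf '%s\n' "$squeezed"
```

The label is given in its own -e expression because some sed implementations read a label up to the end of the line.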

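Substitution Flags and Related Commands:

  • g: Flag for s: replace every match on the line, not only the first.
  • p: Flag for s: print the pattern space if a substitution was made.
  • w file: Flag for s: write the pattern space to file if a substitution was made.
  • x: Command: exchange the contents of the pattern space and the hold space.
  • y: Command: translate characters one for one, as in y/abc/xyz/ (regular expressions are not used).
  • &: In the replacement part of s, stands for the whole matched string.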
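
As a small illustration of the g flag, the & reference, and the y command, assuming a POSIX shell and made-up input strings:

```shell
# Without g, only the first match on the line is replaced.
first=$(echo 'cat bat' | sed 's/[a-z]at/[&]/')

# With g, every match is replaced; & stands for the matched text.
all=$(echo 'cat bat' | sed 's/[a-z]at/[&]/g')

# y maps characters one for one: a->x, b->y, c->z.
mapped=$(echo 'abc' | sed 'y/abc/xyz/')

printf '%s\n%s\n%s\n' "$first" "$all" "$mapped"
```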

Basic Regular Expression (BRE) Syntax in sed:

  • ^: Match the beginning of a line.
  • $: Match the end of a line.
  • .: Match any single character (in GNU sed this includes a newline embedded in the pattern space).
  • *: Match zero or more occurrences of the preceding item.
  • []: Match a single character from the listed set or range.
  • [^]: Match a single character not in the listed set or range.
  • \(..\): Capture a substring, which can be referenced later as \1 through \9.
  • &: In the replacement text, refer to the whole matched string.
  • \<: Match the start of a word (GNU extension).
  • \>: Match the end of a word (GNU extension).
  • x\{m\}: Match exactly m occurrences of x.
  • x\{m,\}: Match at least m occurrences of x.
  • x\{m,n\}: Match between m and n occurrences of x.

In BRE the backslashes are part of the syntax: unescaped ( ) and { } match the literal characters.
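
The following sketch exercises capture groups, interval expressions, and word anchors in BRE. It assumes GNU sed (the word anchors \< and \> are GNU extensions) and uses made-up input strings.

```shell
# Capture groups: swap the text on either side of "=".
swapped=$(echo 'key=value' | sed 's/\(.*\)=\(.*\)/\2=\1/')

# Interval plus word anchors: replace only the word that is exactly "aa".
exact=$(echo 'a aa aaa' | sed 's/\<a\{2\}\>/X/g')

printf '%s\n%s\n' "$swapped" "$exact"
```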

Extended Regular Expression (ERE) Syntax in sed:

  • \b: Match a word boundary (a GNU extension, not part of POSIX regular expressions).
  • +: Match one or more occurrences of the preceding item. Extended syntax is enabled with the -E option; in GNU BRE the equivalent is \+.
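
A brief comparison of the two syntaxes, assuming GNU sed (for \b) and a sed that accepts -E; the input strings are made up for illustration.

```shell
# ERE: + and unescaped parentheses are available with -E.
ere=$(echo 'aaa bbb' | sed -E 's/a+/X/')
regrouped=$(echo 'aaa bbb' | sed -E 's/(a+) (b+)/\2 \1/')

# Portable BRE spelling of "one or more a": aa*
bre=$(echo 'aaa bbb' | sed 's/aa*/X/')

# \b matches only at word boundaries, so "concat" is left alone.
boundary=$(echo 'cat concat' | sed 's/\bcat\b/dog/g')

printf '%s\n%s\n%s\n%s\n' "$ere" "$regrouped" "$bre" "$boundary"
```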

Practical Examples:

1 Print specific lines:
To print only the first and last lines:

sed -n '1p;$p' test.txt

2 Delete lines:
To delete the second line:

sed '2d' filename

3 Basic match and replace:
Replace spaces with hyphens:

echo "hello world" | sed 's/ /-/g'

4 Advanced match and replace:
Reverse the order of three space-separated words using capture groups:

echo "abc def ghi" | sed 's/\([a-zA-Z]*\) \([a-zA-Z]*\) \([a-zA-Z]*\)/\3 \2 \1/'

5 Multiple edits:
Replace “Hello” with “Hi” and “Goodbye” with “Farewell” in one command:

sed 's/Hello/Hi/; s/Goodbye/Farewell/' example.txt

6 Read a file:
Insert content from an external file after lines matching a pattern:

sed '/Line 2/r extra.txt' data.txt

7 Write to a file:
Save processed content into a new file:

sed 's/World/Everyone/' input.txt > output.txt

In summary, sed is a versatile and efficient tool for editing text in a stream, offering powerful pattern matching and text transformation capabilities when combined with regular expressions. From basic line printing to advanced text manipulation, sed serves a wide range of text processing needs.

What is the od Command and How to Use It?

The od (octal dump) command is a versatile tool that outputs the contents of a specified file in various formats such as octal, decimal, hexadecimal, floating-point numbers, or ASCII characters. It displays the content to the standard output (usually the terminal), with the leftmost column showing the byte offset, starting from 0.

Function:

od handles both text and binary files and is typically used to inspect data that cannot be displayed directly in the terminal, such as binary data. It interprets the bytes it reads according to the requested format, for example as IEEE 754 floating-point numbers or as ASCII codes. The related hexdump command outputs hexadecimal by default but is generally considered less flexible than od.

Syntax:

od [OPTION…] [FILE…]

Key Options:

  • -A RADIX or --address-radix=RADIX: Specifies the radix (base) for the byte offset: d (decimal), o (octal), x (hexadecimal), or n (none). By default, the offset is displayed in octal.
  • -j BYTES or --skip-bytes=BYTES: Skips the specified number of bytes before displaying the file content.
  • -N BYTES or --read-bytes=BYTES: Outputs only the specified number of bytes.
  • -S BYTES or --strings[=BYTES]: Outputs strings of at least BYTES printable characters (3 if BYTES is omitted with the long form).
  • -v or --output-duplicates: Prints every line in full; without it, runs of identical lines are collapsed into a single * line.
  • -w [BYTES] or --width[=BYTES]: Sets the number of bytes to display per line (32 if BYTES is omitted; without -w, od prints 16 bytes per line).
  • -t TYPE or --format=TYPE: Specifies the format of the output. Options include:
    • a: Named characters (e.g., newline is shown as “nl”).
    • c: Printable characters or escaped sequences (e.g., newline is shown as “\n”).
    • d[SIZE]: Signed decimal integers of SIZE bytes (default is sizeof(int)).
    • f[SIZE]: Floating-point numbers of SIZE bytes (default is sizeof(double)).
    • o[SIZE]: Octal integers of SIZE bytes (default is sizeof(int)).
    • u[SIZE]: Unsigned decimal integers of SIZE bytes (default is sizeof(int)).
    • x[SIZE]: Hexadecimal integers of SIZE bytes (default is sizeof(int)).
    SIZE can be given as a number of bytes (for example, 1), or, for integer types, as an uppercase letter: C (char), S (short), I (int), or L (long). For floating-point numbers, SIZE can be F (float), D (double), or L (long double).
  • --help: Displays help information.
  • --version: Displays version information.

Parameters:

  • FILE…: One or more files whose content will be displayed.
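
To make the -j, -N, and -A options concrete, here is a minimal sketch assuming GNU od and the same sample bytes used in the examples that follow (abcd 12345 plus a newline), fed through a pipe instead of a file.

```shell
# Skip the first 5 bytes ("abcd "), read the next 5, and show them as
# characters with no offset column; tr strips od's column padding.
digits=$(printf 'abcd 12345\n' | od -An -j5 -N5 -tc | tr -d ' \n')

# With -Ad the final line reports the total length in decimal (11 bytes).
total=$(printf 'abcd 12345\n' | od -Ad -tx1 | tail -n 1)

printf '%s\n%s\n' "$digits" "$total"
```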

Examples:

Example 1: Basic Output

$ cat test.txt
abcd 12345
$ od test.txt 
0000000 061141 062143 030440 031462 032464 000012
0000013

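In this output, the first column shows the byte offset in octal (the default), and the data is shown as 2-byte octal words. The final line, 0000013, is the total length: 11 bytes.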

Example 2: Show Byte Offset in Decimal

$ od -Ad test.txt 
0000000 061141 062143 030440 031462 032464 000012
0000011

Example 3: Hide Byte Offset

$ od -An test.txt 
 061141 062143 030440 031462 032464 000012

Example 4: Output in Hexadecimal (4 Bytes per Group)

$ od -tx test.txt 
0000000 64636261 33323120 000a3534
0000013

Example 5: Output in Hexadecimal (1 Byte per Group)

$ od -tx1 test.txt
0000000 61 62 63 64 20 31 32 33 34 35 0a
0000013

Example 6: Display Named ASCII Characters

$ od -ta test.txt
0000000   a   b   c   d  sp   1   2   3   4   5  nl
0000013

Or display printable characters and escape sequences:

$ od -tc test.txt
0000000   a   b   c   d       1   2   3   4   5  \n
0000013

Example 7: Hexadecimal with Original Characters

$ od -tcx1 test.txt
0000000   a   b   c   d       1   2   3   4   5  \n
         61  62  63  64  20  31  32  33  34  35  0a
0000013

Example 8: Specify Bytes per Line

$ od -w8 -tc test.txt
0000000   a   b   c   d       1   2   3
0000010   4   5  \n
0000013

Example 9: Remove Spaces Between Columns

To remove spaces between columns during od output:

  1. Use -An to hide the offset.
  2. Use -v to avoid omitting duplicate data.
  3. Use -tx1 to output one byte per group in hexadecimal format, and -w1 to display one byte per line.
  4. Finally, pipe the output to awk to concatenate it into a single line.

$ od -An -v -w1 -tx1 test.txt | awk '{for(i=1;i<=NF;++i){printf "%s",$i}}'
616263642031323334350a
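
The same result can also be had without awk. The sketch below, which assumes a POSIX tr and the same sample bytes, deletes the spaces and newlines from od's output instead of printing one byte per line.

```shell
# -An hides offsets, -v keeps repeated data, -tx1 prints one byte per group;
# tr then removes the separators so the hex digits run together.
hex=$(printf 'abcd 12345\n' | od -An -v -tx1 | tr -d ' \n')
printf '%s\n' "$hex"
```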

The Linux xargs Command: Passing Arguments to Other Commands

Description: xargs is used to pass arguments to other commands and is an essential component for building one-liner commands.

Syntax:
xargs [OPTIONS] [COMMAND]

Overview:
xargs reads items from stdin, separated by spaces or newline characters, and passes them as arguments to another command (echo if none is given). Be careful when filenames or strings contain spaces: by default xargs splits them into separate arguments.
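
The whitespace pitfall can be seen with a single made-up file name. This sketch assumes a POSIX shell and an xargs that supports -0 (GNU and BSD both do); it counts how many arguments xargs produces in each mode.

```shell
# Default splitting: the name is broken at the space into two arguments.
naive=$(printf 'my file.txt\n' | xargs -n 1 echo | wc -l | tr -d ' ')

# NUL-terminated input keeps the name intact as one argument.
safe=$(printf 'my file.txt\0' | xargs -0 -n 1 echo | wc -l | tr -d ' ')

printf 'default: %s argument(s), with -0: %s argument(s)\n' "$naive" "$safe"
```

In practice the NUL-terminated list usually comes from find with -print0.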

Options:

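  • -0, --null: Input items are terminated by a null character instead of whitespace, and quotes, backslashes, and backticks are taken literally. This is not the default; it is typically paired with find -print0 to handle names that contain spaces or special characters.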
  • -a, --arg-file=FILE: Reads input from the specified file instead of stdin.
  • -d, --delimiter=DEL: Specifies the delimiter to separate input. By default, xargs uses spaces and newlines, outputting arguments separated by spaces.
  • -E EOF_STR: Sets a logical end-of-file string: when EOF_STR appears in the input, the rest of the input is ignored. If none is specified, input has no terminator. EOF_STR must appear as a separate field (i.e., separated by spaces or newlines).
  • -e, --eof[=EOF_STR]: Same as -E, but not POSIX compliant. Use -E if available.
  • -I REPLACE_STR: Replaces each occurrence of the placeholder (e.g., {}, $, @) in the command with the input item, running the command once per input line. Useful for positioning arguments when the command takes several parameters. For example: find . -name "*.txt" | xargs -I {} cp {} /tmp/{}.bak
  • -i, --replace[=REPLACE_STR]: Same as -I, but REPLACE_STR is optional and defaults to {}. This form is deprecated; use -I for POSIX compliance.
  • -L MAX_LINES: Limits the number of input lines per execution, implying the -x option.
  • -l, --max-lines[=MAX_LINES]: Same as -L. Defaults to 1 line. Use -L for POSIX compliance.
  • -n, --max-args=MAX_ARGS: Specifies the maximum number of arguments to pass to the command at once.
  • -o, --open-tty: Reopens stdin as /dev/tty in the child process before running the command, which is useful for interactive applications.
  • -P, --max-procs=MAX_PROCS: Sets the maximum number of parallel processes. Default is 1. Use with -n or -L for batch processing.
  • -p, --interactive: Prompts the user for confirmation before executing each command.
  • --process-slot-var=NAME: Sets an environment variable with a unique value for each running subprocess. Once a process finishes, the value is reused.
  • -r, --no-run-if-empty: Does not run the command if the input contains no items. Without this option, GNU xargs runs the command once even when there is no input.
  • -s, --max-chars=MAX_CHARS: Limits the maximum number of characters (including command, spaces, and newlines) in the command.
  • --show-limits: Displays the system’s command-line length limitations.
  • -t, --verbose: Prints the command to stderr before executing it.
  • -x, --exit: Exits if the command line exceeds the specified character limit (-s).
  • --help: Displays help information.
  • --version: Displays version information.

Parameters:

  • COMMAND: The command string to execute.
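
A short sketch of three of the options above (-n, -I, and -r), assuming GNU xargs and made-up input; echo stands in for whatever command would normally be run.

```shell
# -n 2: pass at most two arguments per invocation of echo.
pairs=$(printf '1 2 3 4 5\n' | xargs -n 2 echo)

# -I {}: run the command once per input line, substituting the line for {}.
labeled=$(printf 'a\nb\n' | xargs -I {} echo "item: {}")

# -r: with empty input the command is not run at all, so nothing is printed.
empty=$(printf '' | xargs -r echo hello)

printf '%s\n%s\n[%s]\n' "$pairs" "$labeled" "$empty"
```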

Examples:

Example 1
Some commands don’t accept piped arguments directly. Use xargs to pass them:

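# Incorrect: `ls` ignores stdin, so the piped file list is never used
find /sbin -perm /700 | ls -l

# Correct: use `xargs` to turn the piped list into arguments for `ls`
# (-perm /700 replaces the obsolete -perm +700 syntax, which current GNU find rejects)
find /sbin -perm /700 | xargs ls -l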

Example 2
Show system command-line length limitations:

$ xargs --show-limits

Example 3
Pass shell special characters such as backticks through literally (xargs does not invoke a shell, so they are not expanded):

$ echo '`0123`4 56789' | xargs -t echo

Example 4
Stop reading input at an end-of-file marker; here the marker is a comma, so the arguments after it are ignored:

$ echo 01234 , 56789 | xargs -E ","
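
Note that -E only marks where input ends. The option that changes the input delimiter is -d, as in this minimal sketch, which assumes GNU xargs (-d is not in POSIX) and uses printf without a trailing newline so that the last field stays clean.

```shell
# Split the input on commas and hand each field to echo separately.
fields=$(printf '01234,56789' | xargs -d ',' -n 1 echo)
printf '%s\n' "$fields"
```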

Example 5
Run a command once per input item, which also avoids the “argument list too long” errors that a shell glob over many files can trigger:

# Add a suffix to all files in the current directory
ls | xargs -t -I {} mv {} {}.bak

Example 6
Set how many lines to pass as arguments at a time:

$ echo -e "01234\n56789\n01234" | xargs -t -L 2 echo

Example 7
Merge multi-line input into a single line:

$ cat test.txt | xargs

Example 8
Kill processes by combining ps, grep, awk, and kill (the grep process itself may also match the pattern, so its PID can appear in the list):

$ ps -ef | grep spp | awk '{printf "%s ",$2}' | xargs kill -9