Standard input/output
A process uses file descriptors to read and write data from files or other processes. A file descriptor is just a number that identifies an open stream of process data. The input, output and errors of each process are handled through file descriptors, and they can be used to redirect data to other processes. The POSIX functions are open(2), read(2) and write(2). Each process has three standard file descriptors:
0 - standard input (stdin)
1 - standard output (stdout)
2 - standard error (stderr)
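A minimal sketch of the three standard descriptors, assuming bash. The function name emit and the variable names are hypothetical. It captures stdout and stderr separately by redirecting descriptors 1 and 2.

```shell
#!/bin/bash
# emit is a hypothetical helper: one line to stdout (fd 1), one to stderr (fd 2).
emit() {
  echo "to stdout"
  echo "to stderr" >&2
}

# Keep fd 1 and discard fd 2.
out=$(emit 2>/dev/null)
# Point fd 2 at the capture pipe first, then discard fd 1.
err=$(emit 2>&1 >/dev/null)

printf 'stdout: %s\nstderr: %s\n' "$out" "$err"
```

The order of the redirections matters: 2>&1 copies whatever fd 1 refers to at that moment.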
A pipe is a pair of file descriptors that form a unidirectional data channel. It connects the output of one process to the input of another process. A pipe can only connect an output file descriptor to an input file descriptor. Example: the pipe connects stdout of the date process to stdin of the dd process.
date 1>&1 | 0<&0 dd
The same example can be written as
date | dd
Child processes inherit copies of the file descriptors of their parent process. The POSIX functions are pipe(2) and dup(2).
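The inheritance can be observed with a sketch like the following, assuming bash and mktemp are available. The descriptor number 3 and the variable names are arbitrary choices for illustration.

```shell
#!/bin/bash
tmp=$(mktemp)                            # scratch file, removed at the end
exec 3>"$tmp"                            # open fd 3 in the parent shell
bash -c 'echo "written by child" >&3'    # the child inherits fd 3
exec 3>&-                                # close fd 3 in the parent
inherited=$(cat "$tmp")
rm -f "$tmp"
echo "$inherited"
```

The child never opens the file itself; it only writes to the descriptor it received from the parent.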
The type of a file descriptor is determined by its source. If this is ambiguous, the less-than or greater-than sign defines the direction. The less-than sign should be used only for redirection of input and the greater-than sign only for redirection of output. Both of the following examples are an equal redirection of stderr to stdout. The parentheses create subshells, so that the redirection stops after the subshell is finished.
( exec 2>&1 )
( exec 2<&1 )
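A small sketch, assuming bash, of how the subshell limits the lifetime of the redirection. The messages and the variable name captured are made up for illustration.

```shell
#!/bin/bash
captured=$(
  {
    ( exec 2>&1; echo "inside subshell" >&2 )  # fd 2 now points at fd 1
    echo "after subshell" >&2                  # fd 2 is the outer stderr again
  } 2>/dev/null
)
echo "$captured"
```

Only the line written inside the subshell reaches the captured stdout; the second line goes to the outer stderr, which is discarded here.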
Redirection
File descriptors can be used to redirect data to other processes. Child processes inherit copies of the file descriptors of their parent process. The POSIX functions are pipe(2) and dup(2).
Named pipe or fifo
An unnamed pipe connects stdout of one process to stdin of another. The POSIX functions are pipe(2) and dup(2). The pipe is unidirectional: one process sends data and another process receives data. FIFOs are named pipes, which are also unidirectional. A pipe combined with a FIFO works like a network socket. Two dd processes can be connected with a pipe and a FIFO.
 -----< fifo <-----
 v                ^
 dd              dd
 v                ^
 -----> pipe >-----
In the next example two dd processes are connected with each other. It is a quite useless but working example in the shell. The commands are passed to bash as standard input through a here document.
bash <<-'EOF'
mkfifo fifo
<fifo dd | dd >fifo
EOF
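For comparison, a sketch of a FIFO transfer that terminates, assuming bash, mkfifo and mktemp. The temporary directory and the variable names are hypothetical.

```shell
#!/bin/bash
dir=$(mktemp -d)
mkfifo "$dir/fifo"
# Opening a FIFO blocks until both ends are open,
# so the writer is started in the background.
printf 'hello\n' >"$dir/fifo" &
received=$(cat <"$dir/fifo")
wait
rm -r "$dir"
echo "$received"
```

The reader sees end of file as soon as the only writer has closed its end.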
Another example is on http://en.wikipedia.org/wiki/Netcat. Two netcat processes redirect data from a webserver to localhost. One process listens on localhost:12345 and sends data to webserver:80. The other process receives data from the webserver.
Besides FIFOs there are other ways to connect output file descriptors with input file descriptors. The Bash coproc can replace a FIFO, and the yash shell has the operator >>| for pipeline redirection. Another shell can be invoked from bash with input from a here document.
yash <<-'EOF'
exec 3>>|4
<&4 dd | dd >&3
EOF
Here documents
Here documents are connected to file descriptors in a script.
0<<-'BASH' \
3<<-'CAT' \
4<<-'TEXT' \
bash
bash -s <&3 4
BASH
exec <&$1
cat
CAT
text
TEXT
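A simpler sketch of the same idea, assuming bash: two here documents are attached to descriptors 3 and 4 of one compound command. The delimiter names FIRST and SECOND are arbitrary.

```shell
#!/bin/bash
joined=$(
  { cat <&3; cat <&4; } 3<<'FIRST' 4<<'SECOND'
first
FIRST
second
SECOND
)
echo "$joined"
```

The bodies of the here documents follow the command line in the same order as the operators appear on it.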
The next example uses a here document in a bash script to pass a program to perl.
#!/bin/bash
# https://perldoc.perl.org/open.html
# Print each message of an mbox folder, separated by form feeds.
while read -d $'\0' || EOF=$? && [[ $REPLY ]]
do
  <<<$REPLY makeref 2>&- | w3m -dump
  [[ ! $EOF ]] && printf \\x0C\\x0A
done < <( perl -- - \
<<'PERL' ${BASH_ARGV[*]:${MAIL}}
use open ':std', ':utf8';
use Mail::Box::Mbox;
print join "\x00",
  map { $_->head->get('Date') . "\t" . $_->head->get('From') . "\r" . ($_->isMultipart ? $_->body->part(0) : $_)->body->decoded }
  Mail::Box::Mbox->new( folder => "$ARGV[0]" )->messages;
PERL
)
GNU Bash
This chapter is about Bash-only features. Bash has a lot of additional features. A limited set of variables reduces complexity. The names of the current and previous working directories are stored in $PWD and $OLDPWD. A subshell does not change the variables of its parent, e.g. $PWD and $OLDPWD. A problem is chdir with an empty string: cd "" is equal to cd "$PWD". As a result cd "" || exit does not exit, which is a bug in POSIX and bash!
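One way to guard against the empty argument is sketched below. The wrapper name safe_cd is hypothetical and not part of bash. The sketch also shows that a cd inside a subshell leaves $PWD of the parent untouched.

```shell
#!/bin/bash
# safe_cd is a hypothetical wrapper that refuses an empty directory name.
safe_cd() {
  [ -n "$1" ] || return 1
  cd -- "$1"
}

before=$PWD
( cd / )                 # the subshell changes only its own $PWD
after=$PWD

if safe_cd ""; then refused=no; else refused=yes; fi
echo "$refused"
```

With the wrapper, safe_cd "$dir" || exit stops the script when $dir is empty.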
Process substitution
Process substitution is a convenient feature in bash. The previous example can be written with process substitution:
bash <(echo cat) <<<text
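A further sketch, assuming bash and the comm program from coreutils: each process substitution is passed to the command as a file name, so two command outputs can be compared without temporary files. The sample data is made up.

```shell
#!/bin/bash
# Lines common to two command outputs; comm expects sorted input.
common=$(comm -12 <(printf 'a\nb\nc\n') <(printf 'b\nc\nd\n'))
echo "$common"
```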
Some examples with file descriptors and process substitution follow. The first is an educational example that uses a subshell with input from a here document.
(
# Create file descriptors through process substitution
exec >&-             # close stdout
exec 3> >(tac >&2)   # output to stderr
exec 4< <(date)      # input from date
exec 5< <(seq 3)     # input sequence of numbers
# Join all input streams and redirect to stderr
cat \
  <(cat <&4) \
  <(cat <&5) \
  | tee >(cat >&3) >&3
)
There are examples to insert commands into a build environment with file descriptors.
Coprocess
In Bash, coproc returns a pair of file descriptors for input and output. Similar to named pipes, the coproc command can be used to redirect data streams. This example uses the file descriptors of a coproc. Closing the file descriptor requires eval; otherwise there is the error exec: 60: not found. See https://www.linuxjournal.com/content/bash-co-processes
bash <<-'EOF'
coproc { tac ; }
date >&${COPROC[1]}
eval "exec ${COPROC[1]}>&-"
cat <&${COPROC[0]}
EOF
There are some problems with bash’s coproc. Array variables are not exported the way file descriptors are. This applies to bash 4.4 (man bash | grep -A16 BUGS).
The following does not work because of the pipe between date and tee: each command of a pipeline runs in a subshell, where the file descriptors of the coproc are not available. It returns Bad file descriptor.
bash -e <<-'ERROR'
coproc BAD { head | nl; }
date | tee >&${BAD[1]}
ERROR
The bad file descriptor can be prevented with a compound command, because its redirection is performed by the current shell.
bash -e <<-'OK'
coproc { head | nl; }
{ date | tee ; } >&${COPROC[1]}
OK
The file descriptors can be duplicated. In this way they are exported and closing them is easier.
bash -e <<-'OK'
seq 2 6 | {
  coproc { nl; }
  exec 3<&${COPROC[0]}-
  exec 4<&${COPROC[1]}-
  cat > >(cat >&4)
  exec 4>&-
  cat <&3
}
OK
A problem with bash and coreutils is how to do a self join with data from standard input. As mentioned earlier, coproc is one method for pipeline redirection, although it is not required by the POSIX standard. A coproc can join two streams, but at some point the input must be closed before the output is read. A second coproc is therefore necessary to join two streams.
bash <<-'EOF'
seq 4 | {
  coproc { cat; }
  exec 3<&${COPROC[0]}- 4<&${COPROC[1]}-
  coproc { cat; }
  exec 5<&${COPROC[0]}- 6<&${COPROC[1]}-
  tee >(cat >&4) > >(cat >&6) &
  exec 4>&- 6>&-
  join -j2 <(sort <&3) <(sort <&5) &
  wait
}
EOF
Example
bash <<-'EXAMPLE'
# Find DocumentRoot of enabled apache2 sites
find /etc/apache2/sites-enabled/ -type l |
xargs -I@ file @ |
tee >(
  cut -f1 -d: |
  xargs -I@ grep -H DocumentRoot @ |
  tr ' ' : |
  tr : \\\t
) > >(
  tr -d :
) | {
  coproc { cat; }
  exec 3<&${COPROC[0]}- 4<&${COPROC[1]}-
  coproc { cat; }
  exec 5<&${COPROC[0]}- 6<&${COPROC[1]}-
  tee >(cat >&4) > >(cat >&6) &
  exec 4>&- 6>&-
  join -j100 <(<&3 cat) <(<&5 cat)
} |
grep '^\( [[:graph:]]\+ \).\+\1' |
grep link.\\+DocumentRoot
EXAMPLE
The tee process can be used to duplicate input, but multiple processes may conflict when writing to stdout. The output of tee can be redirected through coproc to align it. Redirection with coproc is also used in the self join example.
#!/bin/bash
# https://stackoverflow.com/a/43906764
# https://lists.gnu.org/archive/html/coreutils/2019-10/msg00021.html
# Do you think the following coproc solves the problem?
# The join or paste print expected results.
# Curiously cat or tac will block the process.
# Can you explain why join/paste finish the process whereas cat/tac block it?
# ii bash 4.4-5 amd64 GNU Bourne Again SHell
# ii coreutils 8.26-3 amd64 GNU core utilities
seq 100000 | {
  coproc { cat; } && exec 3<&${COPROC[0]}- 4<&${COPROC[1]}-
  coproc { cat; } && exec 5<&${COPROC[0]}- 6<&${COPROC[1]}-
  tee >(cat >&4) > >(cat >&6) &
  exec 4>&- 6>&-
  join <(cat <&3) <(cat<&5)                  # GOOD
  # Alternatives to the join line above:
  # paste <(cat <&3) <(cat<&5)               # GOOD
  # cat <(cat <&3) <(cat <&5)                # BAD
  # tac <(cat <&3) <(cat<&5)                 # BAD
  # cat <(cat <&3) & cat <(cat <&5) &        # GOOD
  wait
}
#!/bin/bash
# This command prints unpredictable lines
seq 1000 | tee >(nl) > >(nl) | grep ' 1'
: <<SIC
1 1 755 1 1
SIC
The join is also possible with a single coproc. It requires a wait command after the filedescriptor is closed.
bash -e <<-'EOF'
seq 10000 | {
  coproc { cat ; }
  exec 3<&${COPROC[0]}-
  exec 4<&${COPROC[1]}-
  tee >(cat >&4) | join - <(sort <&3 &) &
  exec 4>&-
  wait
}
EOF
An output file descriptor cannot connect directly to an input file descriptor. Joining the results of different processes requires coproc or named pipes. Coproc is not a standard feature, and there is a warning about multiple coprocs in version 4.4 of bash.
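A sketch of the named pipe variant, assuming bash and the GNU coreutils versions of tee and paste. The FIFO names a and b and the use of paste instead of join are choices made for this illustration only.

```shell
#!/bin/bash
dir=$(mktemp -d)
mkfifo "$dir/a" "$dir/b"
# tee opens a and then b; paste opens them in the same order,
# so neither side waits forever on an open.
seq 3 | tee "$dir/a" "$dir/b" >/dev/null &
pasted=$(paste "$dir/a" "$dir/b")
wait
rm -r "$dir"
echo "$pasted"
```

The order in which the FIFOs are opened matters here: a shell redirection to the second FIFO would be opened before tee even starts and could block both sides.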
The coproc can be replaced with a FIFO or with yash pipeline redirection. The yash shell can be invoked from bash, and its >>| operator creates a pipe in a similar way to coproc. See https://unix.stackexchange.com/questions/86270/how-do-you-use-the-command-coproc-in-various-shells
The two pipes (4>>|5 and 6>>|7) join the head and tail of the input. Each pipe creates two file descriptors, like the bash coproc.
bash <<-'EOF'
exec yash -c 'exec bash 3>>|4'
EOF
Shells can be invoked with input from a here document. The syntax of yash pipeline redirection is shorter than coproc. Unfortunately yash has no process substitution. Example: join head and tail with yash pipeline redirection.
yash <<-'YASH'
seq 123 | {
exec 3<&0 4>>|5 6>>|7
bash <<-'BASH'
tee <&3 >(head | nl >&4) > >(tail | nl >&6)
exec 4>&- 6>&-
join <(cat <&5) <(cat <&7)
ls /proc/$BASHPID/fd
BASH
}
YASH
The xargs and parallel programs can execute multiple processes at the same time. Joining the results of the different jobs is difficult. The next example executes a bash command string returned from parallel.
bash -exc "$(
cat <<-'EOF'
# wait for tmux terminal to start
sleep 3
# execute 6 parallel sleep processes in tmux
coproc {
  # parallel starts tmux and returns an attach command
  seq 6 | tac | \
  parallel --tmux --max-procs 4 sleep 2>&1 | \
  head -n1 | \
  cut -d: -f2
}
# closing a file descriptor held in a variable requires eval
eval "exec ${COPROC[1]}>&-"
cat <&${COPROC[0]}
wait
EOF
)"
Coreutils
Many commands in coreutils are tools for text processing. Taken together they are like the SQL of shell programming. The join command is one of the most important commands in both SQL and coreutils. There is no self join in coreutils. In coreutils the joined column is moved to the left. Empty values are equal.
# Column with two rows
seq 2
# Additional column with two rows
seq 2 | nl
# Inner join with three rows
seq 2 | nl | join - <(seq 3)
# Outer join with three rows
seq 2 | nl | join - <(seq 3) -a2
# Cross join with the empty columns
seq 2 | nl | join - <(seq 3) -j3
# Additional column with a cross join of a single row
seq 2 | nl | join - <(seq 1) -j3
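The difference between an inner and a full outer join can be sketched with two small tables, assuming bash and the join program from coreutils. The sample rows and the variable names are invented for illustration.

```shell
#!/bin/bash
left=$'1 a\n2 b'     # key and one value, sorted by key
right=$'2 x\n3 y'

# Inner join: only keys present in both inputs.
inner=$(join <(echo "$left") <(echo "$right"))
# Full outer join: -a1 and -a2 also print the unpaired lines.
outer=$(join -a1 -a2 <(echo "$left") <(echo "$right"))

printf 'inner:\n%s\nouter:\n%s\n' "$inner" "$outer"
```

Both inputs must be sorted on the join field, which is the first field by default.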
An additional key column can be added before ordering with the join program. In the example the key is used to sort the output of the tee process.
seq 6 | sed s/^/\\\\x4/ | xargs -0 printf %b | nl \
| tee \
>(head -n4 | join -j3 <(seq 1) -) \
> \
>(tail -n4 | join -j3 <(seq 2 2) -) \
| sort -k2 | join -11 -22 <(seq 3 4) -
The join command moves the key column to the left. There is no easy way to reorder columns with coreutils: cut always prints the fields in the order in which they are read. This is where awk becomes a useful tool. See https://lists.gnu.org/archive/html/bug-coreutils/2012-12/msg00117.html
This example replaces column 2 with 0 where column 1 is not 0.
bash <<-'EOF'
paste <( echo $'a\n0\nb\n0\nc' ) <( seq -5 -1 ) |
awk '{if($1) print $1,0; else print $1,$2} '
EOF
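Reordering columns, which cut cannot do, is a one-liner in awk. A minimal sketch with made-up input:

```shell
#!/bin/bash
# Print the third field first, then the first and second.
reordered=$(printf 'a b c\nd e f\n' | awk '{ print $3, $1, $2 }')
echo "$reordered"
```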
The script command records the input and output of a terminal in a typescript file. After recording, the file can be replayed with scriptreplay. Keyboard input is handled wrongly when it comes from a different file descriptor. Please see man script | less +/^BUGS about issues with input redirection.
script 3<&0 0<<-'SCRIPT' -c "
exec bash <&0
"
date
exec bash 0<&3
SCRIPT
The same example with grep and coreutils to sort the stdout of subprocesses.
bash <<-'EOF'
join <(
  join <(
    paste <( echo $'a\n0\nb\n0\nc' ) <( seq -5 -1 ) | nl | sort -k2
  ) <(
    seq 0 0 | nl
  ) -j2 -a1 | sort -k4
) <(
  seq 0 1
) -e0 -21 -14 -a1 |
sort -k3 |
nl -nln |
tee >(
  grep -P \\t0 | cut -d' ' -f1- | join - <( seq 0 0 ) -j3 -a1 | cut -d' ' -f1-3
) > >(
  grep -P \\t1 | cut -d' ' -f1- | join - <( seq 0 0 ) -j3 -a1 | cut -d' ' -f1,2,5
) |
sort -k2 |
cut -d' ' -f1,3
EOF
Stream editor sed
Streams can be manipulated with the sed command. It is one of the simple but powerful commands. For the GNU evaluate command see info sed --index-search=evaluate
# Pattern space and hold space, branches and evaluate.
# The example swaps every other line of the input.
# There are GNU specific commands like evaluate.
# It is not documented in the man page.
seq 8 | sed '1~2{h;d};0~2{G}'
seq 6 | sed -e's/^/seq /' -e 's/$/ | paste -s/e'
printf ' 1\n 2\n 3\n4\n5\n 6\n 7\n' | sed '1s/^ //;:a;$!N;s/\n / /;ta;P;D;'
# info sed shows additional documentation
info sed --index-search='evaluate'
# A process pipeline that greps upgrades from debian package log file:
bash <<-'EOF'
cat /var/log/dpkg.log{.1,} |
grep -P '.{20}upgrade' |
cut -d' ' -f4,5,7 |
sed 's/:/ /' |
grep -o '[^[:space:]]*' |
sed -n -e 'p;n;x;n;p;x;p;' |
grep xserver -A+2 |
paste - - - -d_ |
xargs urlencode |
tr '[[:upper:]]' '[[:lower:]]' |
sed 's/$/.deb/'
EOF
# Monitor a file with tail and execute commands with sed
stdbuf -o0 bash <<-'EOF'
tail -F fifo |
xxd -p -c1 |
sed s/.*/date/e |
xargs -I@ printf %q\\n @
EOF
# Examples: Pattern space and hold space, branches and evaluate.
seq 8 | sed '1~2{h;d};0~2{G}'
# sed P;D commands
seq 6 | sed 'l;N;P;D'
printf ' 1\n 2\n 3\n4\n5\n 6\n 7\n' | sed '1s/^ //;:a;$!N;s/\n / /;ta;P;D;'
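The line swap can also be sketched without the GNU-only first~step addresses, using only the hold space commands h, n and G. This variant assumes an even number of input lines; the variable name swapped is arbitrary.

```shell
#!/bin/bash
# h: copy the odd line to the hold space
# n: replace the pattern space with the next (even) line
# G: append the held odd line after the even line
# p: print both lines (automatic printing is off because of -n)
swapped=$(seq 4 | sed -n 'h;n;G;p')
echo "$swapped"
```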
xargs
# Execute commands with xargs
seq 3 | xargs -0 --arg-file <(yes "wc -l" | head) bash -c
Escaping interpretation of special characters
The shell needs special characters to evaluate its commands. These special characters need escaping to avoid interpretation by the shell. How to escape depends on where the special character occurs. A backslash escape works everywhere except inside single quotes. In a single-quoted word no special character needs escaping. A single quote itself can only be escaped with a backslash or with double quotes; it is not possible to use a single quote inside single quotes.
printf %q\\n echo\ \\\'\\\\\\\'\\\' | xargs bash -c
printf %q\\n "echo \"'\\'\"\\'" | xargs bash -c
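The two ways to escape a single quote can be compared in a short sketch, assuming bash for printf %q. The sample string and the variable names are made up.

```shell
#!/bin/bash
single='it'\''s'      # close the quotes, add an escaped quote, reopen the quotes
double="it's"         # inside double quotes the single quote needs no escape

# printf %q produces a form that the shell can read back.
quoted=$(printf %q "$single")
eval "roundtrip=$quoted"

printf '%s\n%s\n%s\n' "$single" "$quoted" "$roundtrip"
```

Both assignments produce the same string, and the %q output survives a second round of shell evaluation.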