Standard input/output

A process uses file descriptors to read and write data from files or other processes. Input, output and errors of each process are handled through file descriptors. A file descriptor is just a number that represents a data stream of the process, and it can be redirected to files or to other processes. The underlying POSIX calls are read(2), write(2) and open(2). Each process starts with three standard file descriptors:

0 - standard input (stdin)
1 - standard output (stdout)
2 - standard error (stderr)
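
A minimal illustration of the standard descriptors, assuming a Linux system with the /proc filesystem:

bash <<-'EOF'
        # list the open file descriptors of the current shell (Linux /proc)
        ls -l /proc/$$/fd
        # address stdout and stderr explicitly by number
        echo to-stdout >&1
        echo to-stderr >&2
EOF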

A pipe is a pair of file descriptors forming a unidirectional data channel. It connects the output of one process to the input of another process; a pipe can only connect an output file descriptor to an input file descriptor. Example: the pipe connects stdout of the date process with stdin of the dd process.

        date 1>&1 | 0<&0 dd

The same example can be written as

        date | dd

Child processes inherit copies of the file descriptors of their parent process. The underlying POSIX calls are pipe(2) and dup(2).
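
A small sketch showing that a child process inherits a file descriptor opened by its parent (the descriptor number 3 is arbitrary):

bash <<-'EOF'
        # open file descriptor 3 in the parent shell
        exec 3<<<'inherited from the parent'
        # the child bash gets a copy of descriptor 3 and can read it
        bash -c 'cat <&3'
        exec 3<&-
EOF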

The type of a file descriptor is determined by its source. Where this is ambiguous, the less-than and greater-than signs define the direction: less-than should be used only for redirection of input and greater-than only for output. Both examples below are equal redirections of stderr to stdout. The parentheses create subshells, so the redirection ends after the process has finished.

( exec 2>&1 )
( exec 2<&1 )

Redirection

Redirection changes the source or destination of a file descriptor before a command runs, so data can be sent to files or to other processes. Child processes inherit the redirected file descriptors of their parent process; the underlying POSIX calls are pipe(2) and dup(2).
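
A short sketch of the common redirection operators (the file names list.txt, /etc/hostname and /missing are only illustrative):

bash <<-'EOF'
        # stdout goes to a file, stderr is duplicated onto stdout
        ls /etc/hostname /missing > list.txt 2>&1
        # stdin comes from the file, stdout is appended to it
        wc -l < list.txt >> list.txt
        cat list.txt
        rm list.txt
EOF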

Named pipe or fifo

An unnamed pipe connects stdin and stdout of two processes; the POSIX calls are pipe(2) and dup(2). The pipe is unidirectional: one process sends data and the other process receives data. Fifos are named pipes and are unidirectional as well. A pipe combined with a fifo works like a network socket. Two dd processes can be connected with a pipe and a fifo.

      -----< fifo <-----
      v                 ^
      dd                dd
      v                 ^
      -----> pipe >-----

In the next example two dd processes are connected with each other. It is a rather useless but working example in the shell. The commands are passed to bash as a here document on standard input.

bash <<-'EOF'
        mkfifo fifo
        <fifo dd | dd >fifo
EOF

Another example is on http://en.wikipedia.org/wiki/Netcat. Two netcat processes relay data between localhost and a webserver: one process listens on localhost:12345 and sends the data to webserver:80, the other process carries the data coming back from the webserver.
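
A hedged sketch of that relay, assuming the traditional netcat option syntax (BSD netcat omits -p) and a reachable host named webserver:

mkfifo backpipe
# requests arriving on localhost:12345 are forwarded to webserver:80,
# the responses travel back through the named pipe
nc -l -p 12345 < backpipe | nc webserver 80 > backpipe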

Besides fifos there are other ways to connect output file descriptors with input file descriptors. Bash coproc can replace a fifo, and the yash shell has the >>| operator for pipeline redirection. Another shell can be invoked from bash with input from a here document.

yash <<-'EOF'
        exec 3>>|4
        <&4 dd | dd >&3
EOF

Here documents

Here documents can be connected to any file descriptor of a script, not only standard input. The next example attaches here documents to file descriptors 0, 3 and 4 of bash.

0<<-'BASH' \
3<<-'CAT' \
4<<-'TEXT' \
bash
        bash -s <&3 4
BASH
        exec <&$1
        cat
CAT
        text
TEXT

The next example uses a here document in a bash script together with perl.

#!/bin/bash
# https://perldoc.perl.org/open.html
while read -d $'\0' || EOF=$? && [[ $REPLY ]]
do
        # makeref is an external helper, not defined in this snippet
        <<<$REPLY makeref 2>&- | w3m -dump
        [[ ! $EOF ]] && printf \\x0C\\x0A
done < <(
perl -- - \
<<'PERL' ${BASH_ARGV[*]:${MAIL}}
        use open ':std', ':utf8';
        use Mail::Box::Mbox;
        print join "\x00", \
                map { $_->head->get('Date') . "\t" . $_->head->get('From') . "\r" . ($_->isMultipart ? $_->body->part(0) : $_)->body->decoded } Mail::Box::Mbox->new(
                        folder => "$ARGV[0]"
                )->messages;
PERL
)

GNU Bash

This chapter is about Bash-only features. Bash has a lot of additional features; using a limited set of variables reduces complexity. The names of the current and previous working directory are stored in $PWD and $OLDPWD. A cd inside a subshell does not change $PWD and $OLDPWD of the parent shell. A problem is chdir with an empty string: cd "" is equal to cd "$PWD", so cd "" || exit does not exit. This is a bug in POSIX and bash!
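
A short demonstration of the behaviour described above (the exact output depends on the bash version):

bash <<-'EOF'
        cd /tmp
        cd ""                   # behaves like cd "$PWD" (as described above)
        echo "$? $PWD"          # expected: 0 /tmp
        cd "" || exit 1         # therefore the script does not exit here
        echo still running
EOF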

Process substitution

Process substitution is a convenient feature of bash. The here document example above can be written with process substitution:

bash <(echo cat) <<<text

Some educational examples with file descriptors and process substitution. The next example runs in a subshell so that the redirections end when the subshell finishes.

(
        # Create file descriptors through process substitution
        exec >&-                   # close stdout
        exec 3> >(tac >&2) # output to stderr
        exec 4< <(date)    # input from date
        exec 5< <(seq 3)   # input sequence of numbers

        # Join all input streams and redirect to stderr
        cat \
                <(cat <&4) \
                <(cat <&5) \
                | tee >(cat >&3) >&3
)

There are examples that use file descriptors to insert commands into a build environment.

Coprocess

In Bash, coproc starts a command asynchronously and returns a pair of file descriptors for its input and output. Similar to named pipes, the coproc command can be used to redirect data streams. The next example uses the file descriptors of a coproc. For closing the file descriptor, eval is required: the descriptor number in a redirection must be a literal at parse time, so without eval the expanded number is taken as a command name and bash reports an error like exec: 60: not found. See https://www.linuxjournal.com/content/bash-co-processes

bash <<-'EOF'
        coproc { tac ; }
        date >&${COPROC[1]}
        eval "exec ${COPROC[1]}>&-"
        cat <&${COPROC[0]}
EOF

There are some problems with bash's coproc. Array variables like COPROC are not exported, unlike file descriptors, which are inherited by child processes. This is documented for bash 4.4 (man bash | grep -A16 BUGS).
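
A small sketch of that limitation: the COPROC array is visible in the shell that created it, but not in a child process, even with export.

bash <<-'EOF'
        coproc { cat; }
        declare -p COPROC              # the array exists in this shell
        export COPROC
        # arrays are not exported, so a child process does not see it
        bash -c 'declare -p COPROC' || echo "COPROC is not visible in the child"
        eval "exec ${COPROC[1]}>&-"    # close the write end so the coproc exits
EOF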

This does not work because of the pipe between date and tee: tee runs in a pipeline subshell and the coproc file descriptors are not available there. It fails with Bad file descriptor.

bash -e <<-'ERROR'
        coproc BAD { head | nl; }
        date | tee >&${BAD[1]}
ERROR

The Bad file descriptor error can be prevented with a compound command: the redirection is applied to the whole group by the main shell, which still has the coproc descriptors.

bash -e <<-'OK'
        coproc { head | nl; }
        { date | tee ; } >&${COPROC[1]}
OK

The coproc file descriptors can be duplicated with exec. The duplicates are inherited like ordinary file descriptors and closing them is easier.

bash -e <<-'OK'
    seq 2 6 | {
        coproc { nl; }
        exec 3<&${COPROC[0]}-
        exec 4<&${COPROC[1]}-
        cat > >(cat >&4)
        exec 4>&-
        cat <&3
    }
OK

A problem with bash and coreutils is how to do a self join on data coming from standard input. As mentioned earlier, coproc is one method of pipeline redirection; it is not required by the POSIX standard. A coproc can be used to join two streams, but at some point its input must be closed before the output can be read to the end. In the next example a second coproc is used so that the two copies of the stream can be joined.

bash <<-'EOF'
    seq 4 | {
        coproc { cat; }
        exec 3<&${COPROC[0]}- 4<&${COPROC[1]}-
        coproc { cat; }
        exec 5<&${COPROC[0]}- 6<&${COPROC[1]}-
        tee >(cat >&4) > >(cat >&6) &
        exec 4>&- 6>&-
        join -j2 <(sort <&3) <(sort <&5) &
        wait
    }
EOF

Example

bash <<-'EXAMPLE'
        # Find DocumentRoot of enabled apache2 sites
        find /etc/apache2/sites-enabled/ -type l |
        xargs -I@ file @ |
        tee >(
                cut -f1 -d: |
                xargs -I@ grep -H DocumentRoot @ |
                tr \  : | tr : \\\t
        ) > >(
                tr -d :
        ) | {
                coproc { cat; }
                exec 3<&${COPROC[0]}- 4<&${COPROC[1]}-
                coproc { cat; }
                exec 5<&${COPROC[0]}- 6<&${COPROC[1]}-
                tee >(cat >&4) > >(cat >&6) &
                exec 4>&- 6>&-
                join -j100 <(<&3 cat) <(<&5 cat)
        } |
        grep '^\( [[:graph:]]\+ \).\+\1' |
        grep link.\\+DocumentRoot
EXAMPLE

The tee process can be used to duplicate input, but multiple processes writing to stdout at the same time may interleave their output unpredictably. Redirecting tee through a coproc keeps the output in order. Redirection with coproc is also used in the self join example.

# https://stackoverflow.com/a/43906764
# https://lists.gnu.org/archive/html/coreutils/2019-10/msg00021.html
#!/bin/bash
# Do you think the following coproc solves the problem?
# The join or paste print expected results.
# Curiously cat or tac will block the process.
# Can you explain why join/paste finish the process whereas cat/tac block it?
# ii  bash           4.4-5        amd64        GNU Bourne Again SHell
# ii  coreutils      8.26-3       amd64        GNU core utilities
seq 100000 |
{
        coproc { cat; } && exec 3<&${COPROC[0]}- 4<&${COPROC[1]}-
        coproc { cat; } && exec 5<&${COPROC[0]}- 6<&${COPROC[1]}-
        tee >(cat >&4) > >(cat >&6) & exec 4>&- 6>&-

        join <(cat <&3) <(cat<&5)    # GOOD
        : paste <(cat <&3) <(cat<&5) # GOOD
        : cat <(cat <&3) <(cat <&5)   # BAD
        : tac <(cat <&3) <(cat<&5)   # BAD
        : cat <(cat <&3) & cat <(cat <&5) &  # GOOD
        wait
}
#!/bin/bash
# This command prints unpredictable lines
seq 1000 | tee >(nl) > >(nl) | grep '     1'
: <<SIC
     1  1
   755     1    1
SIC

The join is also possible with a single coproc. It requires a wait command after the file descriptor is closed.

bash -e <<-'EOF'
    seq 10000 | {
        coproc { cat ; };
        exec 3<&${COPROC[0]}-
        exec 4<&${COPROC[1]}-
        tee >(cat >&4) |
        join - <(sort <&3 &) &
        exec 4>&-
        wait
    }
EOF

An output file descriptor cannot be connected directly to an input file descriptor with plain redirection. Joining the results of different processes therefore requires coproc or named pipes. coproc is not a standard feature and bash 4.4 warns when more than one coproc is active.

A coproc can be replaced with a fifo or with yash pipeline redirection. The yash shell has a >>| operator which creates a pipe in a similar way to coproc, and yash can be invoked from bash. See https://unix.stackexchange.com/questions/86270/how-do-you-use-the-command-coproc-in-various-shells
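
A minimal sketch of the fifo variant of the self join (the fifo names are only illustrative):

bash <<-'EOF'
        mkfifo left right
        # duplicate the stream into two named pipes ...
        seq 4 | tee left > right &
        # ... and join the two copies
        join <(sort left) <(sort right)
        rm left right
EOF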

The >>| operator creates a pipe between two file descriptors, comparable to the pair returned by coproc in bash. In the next example yash creates a pipe between file descriptors 3 and 4 and then replaces itself with bash, which can use both descriptors.

bash <<-'EOF'
        exec yash -c 'exec bash 3>>|4'
EOF

Shells can be invoked with input from a here document. The syntax of yash pipeline redirection is shorter than coproc, but unfortunately yash does not support process substitution, so bash is invoked for that part. Example: join head and tail of the input with yash pipeline redirection. The two pipes (4>>|5 and 6>>|7) carry the head and the tail and provide two pairs of file descriptors like two coprocs in bash.

yash <<-'YASH'
        seq 123 | {
                exec 3<&0 4>>|5 6>>|7
                bash <<-'BASH'
                        tee <&3 >(head | nl >&4) > >(tail | nl >&6)
                        exec 4>&- 6>&-
                        join <(cat <&5) <(cat <&7)
                        ls /proc/$BASHPID/fd
                BASH
        }
YASH

The xargs and parallel programs can execute multiple processes at the same time. Joining the results of the different jobs is difficult. The next example executes the bash command string returned by parallel.

bash -exc "$( cat <<-'EOF'
        # wait for tmux terminal to start
        sleep 3
        # execute 6 parallel sleep processes in tmux
        coproc {
                # parallel starts tmux and returns an attach command
                seq 6 | tac | \
                parallel --tmux --max-procs 4 sleep 2>&1 | \
                head -n1 | \
                cut -d: -f2
        }
        eval "exec ${COPROC[1]}>&-"
        cat <&${COPROC[0]}
        wait
EOF
)"

Coreutils

This package contains the basic file, shell and text manipulation utilities which are expected to exist on every operating system.
— Debian

Many commands in coreutils are tools for text processing. Together they are like the SQL of shell programming. The join command is one of the most important commands in SQL and in coreutils, but there is no self join in coreutils. The join field is printed as the first output column, and empty join fields compare equal.

join
# Column with two rows
seq 2

# Additional column with two rows
seq 2 | nl

# Inner join with three rows
seq 2 | nl | join - <(seq 3)

# Outer join with three rows
seq 2 | nl | join - <(seq 3) -a2

# Cross join with the empty columns
seq 2 | nl | join - <(seq 3) -j3

# Additional column with a cross join of a single row
seq 2 | nl | join - <(seq 1) -j3

An additional key column can be added before ordering with the join program. In the example the key is used to sort the output of the tee process.

seq 6 | sed s/^/\\\\x4/ | xargs -0 printf %b | nl \
        | tee \
                >(head -n4 | join -j3 <(seq 1) -) \
                        > \
                >(tail -n4 | join -j3 <(seq 2 2) -) \
        | sort -k2 | join -11 -22 <(seq 3 4) -

The join command moves the key column to the left. There is no easy way to reorder columns with coreutils; cut always outputs fields in the order they are read. This is where awk becomes a useful tool. See https://lists.gnu.org/archive/html/bug-coreutils/2012-12/msg00117.html

This example replaces column 2 with 0 where column 1 is not 0.

bash <<-'EOF'
        paste <(
                echo $'a\n0\nb\n0\nc'
        ) <(
                seq -5 -1
        ) | awk '{if($1) print $1,0; else print $1,$2} '
EOF

The script command records the input and output of a terminal in a typescript file. After recording, the file can be replayed with scriptreplay. Keyboard input is recorded incorrectly when it comes from a different file descriptor; see man script | less +/^BUGS about issues with input redirection.

typescript
script 3<&0 0<<-'SCRIPT' -c "
                exec bash <&0
        "
        date
        exec bash 0<&3
SCRIPT

The same example implemented with grep and coreutils, sorting the stdout of the subprocesses.

bash <<-'EOF'
        join <(
                join <(
                        paste <(
                                echo $'a\n0\nb\n0\nc'
                        ) <(
                                seq -5 -1
                        ) | nl | sort -k2
                        ) <(
                                seq 0 0 | nl
                        ) -j2 -a1 | sort -k4
                ) <(
                        seq 0 1
                ) -e0 -21 -14 -a1 | sort -k3 | nl -nln | tee >(
                        grep -P \\t0 | cut -d\  -f1- | join - <(
                                seq 0 0
                        ) -j3 -a1 | cut -d\  -f1-3
                ) > >(
                        grep -P \\t1 | cut -d\  -f1- | join - <(
                                seq 0 0
                        ) -j3 -a1 | cut -d\  -f1,2,5
                ) | sort -k2 | cut -d\  -f1,3
EOF

Stream editor sed

Streams can be manipulated with the sed command. It is one of the simple but powerful commands. GNU sed has extensions like the evaluate command, documented in the info manual (info sed --index-search=evaluate).

sed
# Pattern space and hold space, branches and evaluate.
# The example swaps every other line of the input.
# There are GNU specific commands like evaluate,
# which are not documented in the man page.
seq 8 |
sed '1~2{h;d};0~2{G}'
# build a command from every input line and evaluate it (GNU e flag)
seq 6  |
sed -e's/^/seq /' -e 's/$/ |
paste -s/e'

# info sed shows additional documentation
info sed --index-search='evaluate'
# A process pipeline that greps upgrades from debian package log file:
bash <<-'EOF'
        cat /var/log/dpkg.log{.1,} |
        grep -P '.{20}upgrade' |
        cut -d' ' -f4,5,7 |
        sed 's/:/ /' |
        grep -o '[^[:space:]]*' |
        sed  -n -e 'p;n;x;n;p;x;p;' |
        grep xserver -A+2 |
        paste - - - -d_ |
        xargs urlencode |
        tr '[[:upper:]]' '[[:lower:]]' |
        sed 's/$/.deb/'
EOF
# With sed monitor file and execute commands with sed
stdbuf -o0 bash <<-'EOF'
        tail -F fifo | xxd -p -c1 | sed s/.*/date/e | xargs -I@ printf %q\\n @
EOF

sed P;D commands
seq 6  | sed 'l;N;P;D'
printf ' 1\n 2\n 3\n4\n5\n 6\n 7\n' | sed '1s/^ //;:a;$!N;s/\n / /;ta;P;D;'

xargs

# Execute commands with xargs; --arg-file leaves stdin free for the executed commands
seq 3 | xargs -0 --arg-file <(yes "wc -l" | head) bash -c

Escaping interpretation of special characters

The shell uses special characters to evaluate its commands. These special characters need escaping to avoid shell interpretation, and the escaping differs depending on where the special character occurs. Backslash escaping works everywhere except inside single quotes. Inside a single quoted word no special character needs escaping, but a single quote itself can only be escaped with a backslash or with double quotes: it is not possible to put a single quote inside single quotes.
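
A plain illustration of the single quote rules, before the printf %q examples below:

# three ways to produce a literal single quote
echo \'
echo "'"
echo 'single quotes end here, '\'' and continue here'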

printf
printf %q\\n echo\ \\\'\\\\\\\'\\\' | xargs bash -c
printf %q\\n "echo \"'\\'\"\\'" | xargs bash -c