• Regular expressions are text strings used to match a specific pattern or to locate a specific position, such as the start or end of a line or a word. They can contain both normal characters and so-called meta-characters, such as * and $.


  • Many text editors and utilities such as vi, sed, awk, find and grep work extensively with regular expressions. Some of the popular computer languages that use regular expressions include Perl, Python and Ruby.


  • Regular expressions can get rather complicated, and whole books have been written about them; thus, we will do no more than skim the surface here.


  • These regular expressions are different from the wildcards (or meta-characters) used in filename matching in command shells such as bash. The table below lists search patterns and their usage.


  • Search Patterns    Usage
    . (dot)            Match any single character
    a|z                Match a or z
    $                  Match end of string
    ^                  Match beginning of string
    *                  Match preceding item 0 or more times


  • For example, consider the following sentence: The quick brown fox jumped over the lazy dog.


  • Some of the patterns that can be applied to this sentence are as follows:


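  • Pattern    Usage
    a..        matches azy
    b.|j.      matches both br and ju
    ..$        matches og (the last two characters, not counting the final period)
    l.*        matches lazy dog
    l.*y       matches lazy
    the.*      matches the lazy dog; matching is case-sensitive, so the capitalized The is skipped (a case-insensitive match would cover the whole sentence)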
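
  • The patterns above can be tried directly from the command line. The following is a minimal sketch, assuming a grep that supports the -o option (print only the matched text, available in GNU and BSD grep) and using the sample sentence without its trailing period. The variable names are made up for illustration.

```shell
# The sample sentence from the text, without its trailing period.
sentence='The quick brown fox jumped over the lazy dog'

# grep -o prints only the matched text, one match per line.
first_a=$(printf '%s\n' "$sentence" | grep -o 'a..')
# The | alternation operator needs extended syntax, enabled with -E.
alternation=$(printf '%s\n' "$sentence" | grep -oE 'b.|j.')
last_two=$(printf '%s\n' "$sentence" | grep -o '..$')
greedy=$(printf '%s\n' "$sentence" | grep -o 'l.*')
bounded=$(printf '%s\n' "$sentence" | grep -o 'l.*y')
# Matching is case-sensitive by default; -i ignores case.
from_the=$(printf '%s\n' "$sentence" | grep -o 'the.*')
from_The=$(printf '%s\n' "$sentence" | grep -oi 'the.*')

printf '%s\n' "$first_a" "$alternation" "$last_two" "$greedy" \
  "$bounded" "$from_the" "$from_The"
```

  • Quoting each pattern in single quotes keeps the shell from interpreting characters such as *, $ and | before grep sees them.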


  • grep is extensively used as a primary text searching tool.


  • It scans files for specified patterns and can be used with regular expressions, as well as simple strings, as shown in the table:


  • Command                            Usage
    grep [pattern] <filename>          Search for a pattern in a file and print all matching lines
    grep -v [pattern] <filename>       Print all lines that do not match the pattern
    grep '[0-9]' <filename>            Print the lines that contain any of the digits 0 through 9
    grep -C 3 [pattern] <filename>     Print context around each match (the specified number of lines above and below the matching line); here, the number of lines is 3
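
  • As a rough illustration of the options in the table, the sketch below runs grep against a small temporary file. The file contents and variable names are invented for this example, and -C 1 is used instead of -C 3 only because the sample file is short.

```shell
# Build a small throwaway file to search (contents are made up).
sample=$(mktemp)
printf '%s\n' 'alpha' 'beta 42' 'gamma' 'delta' 'epsilon 7' > "$sample"

# Lines that contain at least one digit.
with_digits=$(grep '[0-9]' "$sample")
# -v inverts the match: lines that contain no digit.
without_digits=$(grep -v '[0-9]' "$sample")
# -C 1 prints one line of context above and below each match.
context=$(grep -C 1 'gamma' "$sample")

printf '%s\n' "$with_digits" '--' "$without_digits" '--' "$context"
rm -f "$sample"
```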
  • strings is used to extract all printable character strings found in the file or files given as arguments.


  • It is useful in locating human-readable content embedded in binary files; for text files, you can just use grep.


  • For example, to search for the string my_string in a spreadsheet:

     
    strings book1.xls | grep my_string
    
    
  • The tr utility is used to translate specified characters into other characters or to delete them. The general syntax is as follows:


  •  
    tr [options] set1 [set2]
    
    
  • The items in the square brackets are optional.


  • tr requires at least one argument and accepts a maximum of two. The first, designated set1 in the example, lists the characters in the text to be replaced or removed.


  • The second, set2, lists the characters that are to be substituted for the characters listed in the first argument.


  • Sometimes, these sets need to be surrounded by single quotes (') so that the shell does not interpret characters that have a special meaning to it.


  • It is usually safe (and may be required) to use the single-quotes around each of the sets, as you will see in the examples below.


  • For example, suppose you have a file named city containing several lines of text in mixed case.


  • To translate all lower case characters to upper case, at the command prompt type cat city | tr a-z A-Z and press the Enter key.


  • Command                                                        Usage
    $ tr abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ     Convert lower case to upper case
    $ tr '{}' '()' < inputfile > outputfile                        Translate braces into parentheses
    $ echo "This is for testing" | tr '[:space:]' '\t'             Translate white-space to tabs
    $ echo "This   is   for    testing" | tr -s '[:space:]'        Squeeze repeated white-space characters using -s
    $ echo "the geek stuff" | tr -d 't'                            Delete specified characters using -d option
    $ echo "my username is 432234" | tr -cd '[:digit:]'            Delete everything except digits by complementing the set with -c
    $ tr -cd '[:print:]' < file.txt                                Remove all non-printable characters from a file
    $ tr -s '\n' ' ' < file.txt                                    Join all the lines in a file into a single line
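
  • To see what several of these tr invocations produce, here is a short sketch that captures the results in variables. The input strings and variable names are chosen only for illustration.

```shell
# Translate lower case to upper case using character ranges.
upper=$(printf 'hello world\n' | tr 'a-z' 'A-Z')
# Translate braces into parentheses, character by character.
parens=$(printf '{a}{b}\n' | tr '{}' '()')
# -s squeezes each run of repeated spaces into a single space.
squeezed=$(printf 'This   is   for    testing\n' | tr -s ' ')
# -d deletes every occurrence of the listed characters.
deleted=$(printf 'the geek stuff\n' | tr -d 't')
# -c complements the set, so -cd deletes everything that is not a digit.
digits=$(printf 'my username is 432234\n' | tr -cd '[:digit:]')

printf '%s\n' "$upper" "$parens" "$squeezed" "$deleted" "$digits"
```

  • The sets are quoted throughout so that the shell passes them to tr unchanged.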
  • tee takes the output from any command, and, while sending it to standard output, it also saves it to a file.


  • In other words, it "tees" the output stream from the command: one stream is displayed on the standard output and the other is saved to a file.


  • For example, to list the contents of a directory on the screen and save the output to a file, at the command prompt type ls -l | tee newfile and press the Enter key.


  • Typing cat newfile will then display the output of ls -l.
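
  • The same behavior can be checked in a script. The sketch below assumes a temporary file as the destination (the variable names are invented here) and also shows the -a option, which appends to the file instead of overwriting it.

```shell
# Temporary file that will receive the copy of the stream.
log=$(mktemp)

# tee passes its input through to standard output and writes it to the file.
shown=$(printf 'first\nsecond\n' | tee "$log")
saved=$(cat "$log")

# With -a, tee appends to the file rather than truncating it.
printf 'third\n' | tee -a "$log" > /dev/null
appended=$(cat "$log")

printf '%s\n' "$appended"
rm -f "$log"
```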


  • wc (word count) counts the number of lines, words, and characters in a file or list of files. Options are given in the table below:


  • Option    Description
    -l        Displays the number of lines
    -c        Displays the number of bytes
    -w        Displays the number of words
  • By default, all three of these options are active.


  • For example, to print only the number of lines contained in a file, type wc -l filename and press the Enter key.
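
  • As a small sketch of the three options, the following counts the contents of a temporary file created just for this example. Reading from standard input (with <) keeps wc from printing the file name, and tr -d ' ' strips the padding that some wc implementations add.

```shell
# Two lines, three words, fourteen bytes (including both newlines).
sample=$(mktemp)
printf 'one two\nthree\n' > "$sample"

lines=$(wc -l < "$sample" | tr -d ' ')
words=$(wc -w < "$sample" | tr -d ' ')
bytes=$(wc -c < "$sample" | tr -d ' ')

printf 'lines=%s words=%s bytes=%s\n' "$lines" "$words" "$bytes"
rm -f "$sample"
```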


  • cut is used for manipulating column-based files and is designed to extract specific columns. The default column separator is the tab character.


  • A different delimiter can be given as a command option.


  • For example, to display the third column delimited by a blank space, at the command prompt type ls -l | cut -d" " -f3 and press the Enter key.
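
  • Here is a brief sketch of cut with an explicit delimiter, using a made-up record in the colon-separated style of /etc/passwd (the user alice is hypothetical). It also shows that cut treats every single delimiter character as a field boundary, so output padded with several spaces, such as that of ls -l, can yield empty fields.

```shell
# A hypothetical colon-separated record with seven fields.
record='alice:x:1000:1000:Alice Example:/home/alice:/bin/bash'

# -d sets the delimiter, -f selects the field(s) to print.
user=$(printf '%s\n' "$record" | cut -d: -f1)
login_shell=$(printf '%s\n' "$record" | cut -d: -f7)
user_and_shell=$(printf '%s\n' "$record" | cut -d: -f1,7)

# Two consecutive spaces delimit an empty field, so "b" is field 3, not 2.
empty_field=$(printf 'a  b\n' | cut -d' ' -f2)
third_field=$(printf 'a  b\n' | cut -d' ' -f3)

printf '%s\n' "$user" "$login_shell" "$user_and_shell" "$third_field"
```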


  • Search for all instances of the user command interpreter (shell) equal to /sbin/nologin in /etc/passwd and replace them with /bin/bash. (Do not overwrite /etc/passwd.)


  • Solution You can see a solution for this exercise here:


  • To get the output on standard out (terminal screen):


  •  
    sed s/'\/sbin\/nologin'/'\/bin\/bash'/g /etc/passwd
    
    
    
  • or to direct to a file:


  •  
    sed s/'\/sbin\/nologin'/'\/bin\/bash'/g /etc/passwd > passwd_new
    
    
    
  • Note this is kind of painful and obscure, because we are trying to use the forward slash (/) as both a string and a delimiter between fields. Instead, you can do:


  •  
    
    sed s:'/sbin/nologin':'/bin/bash':g /etc/passwd
    
    
  • where we have used the colon (:) as the delimiter instead (you are free to choose your delimiting character!). In fact, when doing this, we don’t even need the single quotes:


  •  
    sed s:/sbin/nologin:/bin/bash:g /etc/passwd
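
  • To experiment without touching the real /etc/passwd, the substitution can be tried on a scratch file. This sketch uses two invented records and hypothetical variable names, and checks that the slash-delimited and colon-delimited forms give the same result while leaving the input file unchanged.

```shell
# Scratch copy with two made-up records in /etc/passwd format.
sample=$(mktemp)
printf '%s\n' \
  'daemon:x:2:2:daemon:/sbin:/sbin/nologin' \
  'alice:x:1000:1000::/home/alice:/bin/bash' > "$sample"

# Slash as delimiter: every / inside the pattern must be escaped.
with_slash=$(sed 's/\/sbin\/nologin/\/bin\/bash/g' "$sample")
# Colon as delimiter: no escaping needed.
with_colon=$(sed 's:/sbin/nologin:/bin/bash:g' "$sample")

# sed writes to standard output; the input file itself is not modified.
original=$(cat "$sample")

printf '%s\n' "$with_colon"
rm -f "$sample"
```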
    
    
    
  • Generate a column containing a unique list of all the shells used for users in /etc/passwd.


  • You may need to consult the manual page for /etc/passwd as in:


  •  
    man 5 passwd
    
    
    
    
  • Which field in /etc/passwd holds the account’s default shell (user command interpreter)?


  • How do you make a list of unique entries (with no repeats)?


  • Solution You can see a solution for this exercise here:


  • The field in /etc/passwd that holds the shell is #7. To display the field holding the shell in /etc/passwd using awk and produce a unique list:


  •  
    awk -F: '{print $7}' /etc/passwd | sort -u
    
    
    
    
  • or


  •  
    awk -F: '{print $7}' /etc/passwd | sort | uniq
    
    
    
    
  • For example:


  •  
    awk -F: '{print $7}' /etc/passwd | sort -u
    
    /bin/bash
    /bin/sync
    /sbin/halt
    /sbin/nologin
    /sbin/shutdown
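
  • The same pipeline can be tested on known input. The sketch below uses three invented records (so the result is predictable) and, as an alternative not taken from the solution above, extracts the seventh field with cut instead of awk.

```shell
# Three made-up records; two of them share the same shell.
sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:2:2:daemon:/sbin:/sbin/nologin' \
  'alice:x:1000:1000::/home/alice:/bin/bash' > "$sample"

# awk splits on ":" and prints field 7; sort -u removes duplicates.
shells_awk=$(awk -F: '{print $7}' "$sample" | sort -u)
# cut can extract the same field.
shells_cut=$(cut -d: -f7 "$sample" | sort -u)

printf '%s\n' "$shells_awk"
rm -f "$sample"
```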
    
    
  • In the following, we give some examples of things you can do with the grep command; your task is to experiment with these examples and extend them.


  • Search for your username in file /etc/passwd.


  • Find all entries in /etc/services that include the string ftp:


  • Restrict to those that use the tcp protocol.


  • Now, restrict to those that do not use the tcp protocol, while printing out the line number.


  • Get all strings that start with ts or end with st.


  • Solution


  • You can see a solution for this exercise here:


  • Search for your username in file /etc/passwd:


  •  
    grep your-username /etc/passwd
    
    
  • Find all entries in /etc/services that include the string ftp:


  •  
    grep ftp /etc/services
    
    
  • Restrict to those that use the tcp protocol:


  •  
    grep ftp /etc/services | grep tcp
    
    
  • Now, restrict to those that do not use the tcp protocol, while printing out the line number:


  •  
    grep -n ftp /etc/services | grep -v tcp
    
    
  • Get all strings that start with ts or end with st:


  •  
    grep '^ts' /etc/services
    grep 'st$' /etc/services
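
  • Since the contents of /etc/services vary between systems, the following sketch repeats the steps on a small invented file so the results are predictable. The entry ts-service and the last line are made up, and the final command combines both anchored patterns into one search with -E, which is one possible extension of the exercise.

```shell
# A tiny stand-in for /etc/services (some entries are invented).
sample=$(mktemp)
printf '%s\n' 'ftp 21/tcp' 'tftp 69/udp' 'ts-service 5000/tcp' \
  'this line ends with st' > "$sample"

# Lines that mention ftp and use tcp.
ftp_tcp=$(grep ftp "$sample" | grep tcp)
# Lines that mention ftp but not tcp, with line numbers from -n.
ftp_not_tcp=$(grep -n ftp "$sample" | grep -v tcp)
# Lines that start with ts or end with st, in a single extended pattern.
ts_or_st=$(grep -E '^ts|st$' "$sample")

printf '%s\n' "$ftp_tcp" "$ftp_not_tcp" "$ts_or_st"
rm -f "$sample"
```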
       
    
  • The tee utility is very useful for saving a copy of your output while you are watching it being generated.


  • Execute a command such as doing a directory listing of the /etc directory:


  •  
    ls -l /etc
     
    
  • while both saving the output in a file and displaying it at your terminal.


  • Solution


  • You can see a solution for this exercise here:


  •  
    ls -l /etc | tee /tmp/ls-output
    less /tmp/ls-output
     
    
  • Using wc (word count), find out how many lines, words, and characters there are in all the files in /var/log that have the .log extension.


  • Solution


  • You can see a solution for this exercise here:


  •  
    wc /var/log/*.log
     
    
  • Note that you would have to do this with sudo to get every file counted, since some files in /var/log may be readable only by root.