A workshop on Unix Programming Principles using tools such as grep, sed, awk, shell programming and regular expressions
- Unix History
- Unix Software Philosophy
- Self Contained Shell Scripts
- Basics of Shell Programming
- Introduction to Text Processing
- Text Searching
- Text Substitution
- Filename expansions and globbing
- Working with Fields
- Text Sorting
- Arithmetic Operations and variables
- Decision Making and Exit Status
- Looping
- Input and Output
- Command Process Substitution
- History Substitution
- Subshells
- Shell Functions
- Signal Handling
- Working with Files
- Building Command line applications
- Shell Scripting was developed in the context of the UNIX Operating System from Bell Labs
- Early UNIX systems packed incredible power into very small machines
- 64 KB "virtual" address space for the code and for data
- This was often less than the physical memory of the early PDP-11s
- Source code made it easy to experiment and change the system
- Unix was heavily shaped at AT&T Bell Labs by the likes of Ken Thompson, Dennis Ritchie, and others
Dennis Ritchie on the vision for Unix:
What we wanted to preserve was not just a good environment in which to do programming, but a system around which a fellowship could form. We knew from experience that the essence of communal computing, as supplied by remote-access, time-shared machines, is not just to type programs into a terminal instead of a keypunch, but to encourage close communication.
- Unix Developers were the users of the system and they developed tools to solve their own problems
- Unix Developers were given freedom to experiment and rewrite Unix as needed
- Unix was designed in a quest for elegance
Books: Software Tools and Software Tools in Pascal
- Programs should be like specialized tools in a carpenter's toolbox
- Avoid creating programs to rule them all
- Don't create programs that are like a Swiss Army Knife, meaning they do too much
- Tools can be combined using pipelines and the shell to get your work done
- This philosophy became popular through the Kernighan & Plauger books
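As a rough sketch of combining small tools with a pipeline, the following counts the most frequent words in a piece of text. The sample text and the function name `top_words` are made up for illustration; each stage does one job and hands plain text to the next.

```shell
# Sample text, invented for this sketch.
text='the quick brown fox jumps over the lazy dog and the fox'

# top_words N: read text on standard input, print the N most frequent words.
top_words() {
    tr -cs '[:alpha:]' '\n' |          # split into one word per line
        tr '[:upper:]' '[:lower:]' |   # fold everything to lower case
        sort |                         # bring identical words together
        uniq -c |                      # count each run of identical words
        sort -k1,1nr -k2 |             # highest count first, ties alphabetical
        head -n "${1:-3}"              # keep the top N (default 3)
}

printf '%s\n' "$text" | top_words 2
```

None of these tools knows anything about word frequencies; the behavior comes only from how they are connected.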
Programs are easier to:
- Write and to correct
- Document
- Understand and use
- The cat command originally only concatenated files
- The cp command copies files
- The mv command moves and renames files
- Avoid
Using text as the main data format has advantages:
- Text is easy to process with existing and new tools
- Text can be edited with any text editor
- Text is portable across networks and machine architectures
For example, to list some popular baby names and sort them:
cat data/top-10-baby-names-2016.txt | awk '{print $2 }' | sort
- Regular Expressions provide powerful text matching and substitution
POSIX standardizes two flavors of regular expressions:
- Basic Regular Expressions (BREs)
- grep, sed, ...
- Extended Regular Expressions (EREs)
- grep -E (egrep), awk, ...
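A small sketch of how the two flavors differ in practice, using made-up sample lines. In a POSIX BRE the `?` character is an ordinary character, while in an ERE it means "zero or one of the preceding item"; a BRE can express the same thing with the interval `\{0,1\}`. The variable names are hypothetical.

```shell
# Sample lines, invented for this sketch.
sample='color
colour
colr
gray
grey'

# BRE: "?" is literal, so this looks for the text "colou?r" and finds nothing.
# grep exits non-zero when nothing matches, hence the "|| true".
bre_literal=$(printf '%s\n' "$sample" | grep -c 'colou?r' || true)

# ERE: "?" makes the "u" optional, so both spellings match.
ere_optional=$(printf '%s\n' "$sample" | grep -E -c 'colou?r' || true)

# BRE equivalent using an interval expression.
bre_interval=$(printf '%s\n' "$sample" | grep -c 'colou\{0,1\}r' || true)

# ERE alternation with grouping.
ere_alternation=$(printf '%s\n' "$sample" | grep -E -c 'gr(a|e)y' || true)

echo "BRE 'colou?r':        $bre_literal"
echo "ERE 'colou?r':        $ere_optional"
echo "BRE 'colou\{0,1\}r':  $bre_interval"
echo "ERE 'gr(a|e)y':       $ere_alternation"
```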
Use Standard Input/Output (I/O) when there are no files named on the command line:
- Helps simplify writing programs
- Helps you hook programs together with pipelines
- Helps encourage programs to do one thing well
- Status messages that are mixed with standard output confuse programs downstream
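One way to keep status messages out of the data stream is to send them to standard error. The sketch below uses a hypothetical filter named `shout`; it reads the files named as arguments, or standard input when none are given, and writes only data to standard output.

```shell
# shout [FILE...]: upper-case the input; the name is invented for this sketch.
shout() {
    echo "shout: starting" >&2                  # status goes to standard error
    cat -- "$@" | tr '[:lower:]' '[:upper:]'    # data goes to standard output
    echo "shout: done" >&2
}

# Downstream tools see only the data. Standard error is discarded here
# just to keep the demonstration output clean.
printf 'hello\nworld\n' | shout 2>/dev/null | sort -r
```

Because `cat` with no file arguments reads standard input, the same function works both in the middle of a pipeline and with file names.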
- If you ask for it, you get it. Don't prompt with 'Are you sure'
- Know what you are doing before running a command like this:
rm -rf /
- This will delete everything starting from the root directory
- We have version control systems such as Git; use them
- If your input text is structured, write the output in the same format after processing
- Standard output should use the same format as standard input
- Doing this lets you build specialized tools that work together
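A minimal sketch of a format-preserving filter, assuming colon-separated records similar to /etc/passwd. The function name `min_uid` and the sample records are invented. Because the output has the same layout as the input, other tools that understand that layout can follow it in a pipeline.

```shell
# min_uid N: keep colon-separated records whose third field is at least N.
# Records are written back out in the same colon-separated format.
min_uid() {
    awk -F: -v OFS=: -v min="$1" '$3 >= min { print }'
}

# Sample records, invented for this sketch.
records='root:x:0:0
alice:x:1000:1000
bob:x:1001:1001'

# The filtered records can be passed straight to sort, which is told
# to use the same field separator.
printf '%s\n' "$records" | min_uid 1000 | sort -t: -k3,3nr
```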
- At times a tool does not exist; that is when you need to write the tool
- Can the tool be useful to other people?
- Can the tool be generalized?
If the answer to either of these questions is yes:
- Then write a general-purpose tool
- Scripting languages can often be used to write a software tool:
- Awk
- Perl
- Python
- Ruby
- Shell
- You can also use other languages, for example Golang, as we will see
- Using the software tools approach helps provide a framework and a mindset for programming and scripting
- You can combine software tools to solve software problems
- This strategy in turn gives you flexibility and helps promote innovation
- Knowing your tools and thinking in terms of the Software Tools philosophy will improve your scripting
In computing, executable code or an executable file or executable program, sometimes simply referred to as an executable or binary, causes a computer "to perform indicated tasks according to encoded instructions,"[1] as opposed to a data file that must be parsed by a program to be meaningful.
Typically, such programs are written in a high-level language that compiles to executable machine code files
- Executable scripts typically start with a shebang line such as #! /bin/bash or the like
- An optional argument can be provided
- Some Unix systems have small limits on the length of the interpreter path name
Shell Scripts can be simple executable text files that contain shell commands.
- Keep in mind that this only works if the shell script is written in the same language as the interactive shell
- For example, do not expect a zsh script to run correctly in a bash environment
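A short sketch of a self-contained script: the shebang line names the interpreter, so the file runs the same way regardless of which interactive shell launches it. The script contents are made up, and /bin/sh is used here as an assumption for portability; a script that relies on bash features would name /bin/bash instead.

```shell
# Write a small script to a temporary file.
script=$(mktemp)
cat > "$script" <<'EOF'
#!/bin/sh
# The first line tells the kernel which interpreter should run this file.
echo "got $# arguments"
EOF

# Mark the file executable, then run it directly by path.
chmod +x "$script"
greeting=$("$script" one two)
echo "$greeting"

# Clean up the temporary file.
rm -f "$script"
```

Without the shebang line, the file would be interpreted by whatever shell happens to run it, which is where the mismatch between shell languages shows up.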