Intro to the Command Line
Goal of the class
Our goal today is to get everyone to a point where they feel comfortable opening the command line, navigating files, and learning something new. There are a lot of command line tools you can learn to use at NICAR, but those can be super intimidating if you have never used the command line before. My goal is that everyone leaves class today ready, willing, and able to use the command line and learn new command line tools.
In the class today, we are going to learn about the command line, receive some leaked data, make a home for our reporting on our computer, and inspect the data. We'll learn some command line tools, and learn how to learn new tools as well.
Data (If you are in-person at NICAR22, you don't need to download the data; it's been downloaded for you)
Command Cheatsheet
Here are the commands we'll use in class and what they do. Consult this if you get lost.
Commands
pwd
- print working directory. This command tells you where you are on your computer: the location of the folder you are currently in. If you get lost and are unsure where you are, type pwd and it will tell you where you are.
ls
- list. The ls command lists the contents of a directory, and you can use it in a few different ways. Typing ls with no other arguments lists the contents of the directory you are currently in. You can also specify which folder you would like to see the contents of. Usage: ls $DIR ($DIR is shorthand for a specific directory you specify on the command line). For example, ls Downloads will show you all the files and folders in Downloads.
cd
- change directory. You use cd to say which directory you want to move into. When you give just a folder name, that folder needs to be inside the directory you are currently in. Usage: cd $DIR moves you into the directory you specify ($DIR is shorthand for a specific directory you specify on the command line). For example, cd Downloads moves you into the Downloads folder.
mv
- move. mv is used to move a file from one location to another. Usage: mv $FILE $LOCATION, which moves the file you specify ($FILE is shorthand for a specific file name) to the location you specify.
mkdir
- make directory. mkdir is the command to make a directory. Usage: mkdir $DIRNAME, which means you write the new directory name after the command.
cat
- cat $FILE prints what is inside the file you specify to the output of the command line.
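If you want to try every command on the cheatsheet without touching your real files, here is a minimal practice sketch. It assumes a bash-like shell on a Mac or Linux machine. The SANDBOX folder, the investigation folder, and notes.csv are made up for practice and are not part of the class data. The mktemp and printf commands are only there to build the practice area.

```shell
#!/usr/bin/env bash
set -eu

# Build a throwaway practice folder (SANDBOX is a made-up name for this sketch).
SANDBOX="$(mktemp -d)"
cd "$SANDBOX"

pwd                          # print working directory: where am I?

mkdir Downloads              # make a directory called Downloads
mkdir investigation          # make a second directory to move things into

# Create a tiny made-up file so there is something to move and read.
printf 'name,amount\nWhiskers,100\n' > Downloads/notes.csv

ls                           # list the directory we are in
ls Downloads                 # list a specific directory

mv Downloads/notes.csv investigation/   # move the file to a new location

cd investigation             # change directory into investigation
cat notes.csv                # print what is inside the file
```

When you are done, you can simply close the terminal. The practice folder lives in your system's temporary area, not in your home folder.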
Shortcuts
~
- The tilde, when used on the command line or in a file path, is short for your home folder. This is usually the folder that you are in when you open the command line. You can use ~ to quickly get back to your home folder by running cd ~, which is the command for change directory to the home folder.
cd ..
- This is the command for going back up into the directory above the one you are in. Let's say that you have my_folder/Downloads/investigation and you are in Downloads. Running cd .. will put you in my_folder.
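Here is a small sketch of both shortcuts, assuming a bash-like shell. It rebuilds the my_folder/Downloads/investigation example inside a temporary practice folder (SANDBOX and UP_ONE are made-up names). The -p flag on mkdir is not covered in class; it tells mkdir to create the parent folders too.

```shell
#!/usr/bin/env bash
set -eu

# Rebuild the example folder structure in a throwaway location.
SANDBOX="$(mktemp -d)"
mkdir -p "$SANDBOX/my_folder/Downloads/investigation"

cd "$SANDBOX/my_folder/Downloads"   # start out inside Downloads
cd ..                               # go up one directory

UP_ONE="$(basename "$PWD")"         # keep just the last part of the path
echo "$UP_ONE"                      # prints: my_folder

echo ~                              # show what the tilde stands for on this computer
cd ~                                # jump back to the home folder
pwd                                 # confirm where we landed
```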
Terminology
GUI - Graphical user interface
CLI - Command line interface, or the command line.
STDIN - Computer jargon for where you are typing stuff into the command line
STDOUT - Computer jargon for where the command line prints stuff out to.
Directory - Another word for folder. When you open Finder and look at some files, you are viewing a directory.
Utility - Another word for a command line tool
Argument - The thing you pass to a command line tool so it knows exactly what to work on, like a filename or a directory. When you write cd Downloads to move into Downloads, the directory name Downloads is the argument.
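To make the jargon a little more concrete, here is a hedged sketch, assuming a bash-like shell. The SANDBOX folder and the LISTING and ECHOED variables are made up for illustration. The $( ) capture and the | pipe are not covered in class; they are only here to show that STDOUT and STDIN are real places text flows through.

```shell
#!/usr/bin/env bash
set -eu

# Throwaway practice folder with one directory inside another.
SANDBOX="$(mktemp -d)"
cd "$SANDBOX"
mkdir Downloads
mkdir Downloads/investigation

# ls is the utility, Downloads is the argument it works on.
# $( ) captures whatever ls printed to STDOUT.
LISTING="$(ls Downloads)"
echo "$LISTING"

# cat with no argument reads from STDIN. Here the text is piped in
# with | instead of being typed by hand.
ECHOED="$(printf 'typed text\n' | cat)"
echo "$ECHOED"
```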
Class Notes
Class will follow the slides. Here are the notes for the session.
-
Intro
- Will Craft, data reporter for APM Reports and the podcast In The Dark
- Goal of the class: get you familiar with the command line. I am NOT going to teach you everything you ever need to know, but you’ll leave the class knowing how to use the command line to learn other things. There are thousands of tools you can use from the command line
- examples: removing duplicates from a dataset, moving thousands of files with a few keystrokes, uploading data to databases.
-
Agenda
- Why learn the command line?
- What is the command line?
- How to read and write on the command line, plus some definitions
- Someone leaked us some data!
- Building a home for our project
- Learning a new tool
-
Why Learn the Command Line?
- The command line is an incredibly powerful tool that allows you to do a whole lot on your computer. Here are some classes at NICAR that I want to prepare you to go learn
- Finding needles in haystacks with fuzzy matching
- Fuzzy matching: matching bits of data when they are spelled slightly differently, like matching "LA Dodgers" and "Los Angeles Dodgers". It's literally measuring the similarity between two different pieces of text.
- Advanced PDF processing with OCR and command-line tools
- We've all been there: getting documents that have our valuable, valuable data stuck inside a PDF. There are plenty of command line tools that help you open up the PDF and pull the data out of the document.
-
Now we'll do the same using the command line! But first... What is the command line?
- Short answer: it's a way to directly give your computer commands. You are telling your computer to do stuff, and it can be VERY powerful.
-
Text-based video games
- Maybe you've played an old video game where you typed commands to your character and they did something. This is basically what we do with the command line.
-
How to read and write on the command line 1
- screenshot of the terminal, diagrammed. Also called the shell. The program that allows you to type commands to the computer
- prompt - The prompt is a bit of information that is displayed on the command line, literally prompting you to input information. It usually separates that info from the space where you type with a $. On my computer, it just shows the user and the directory I am in (me, wcraft, and the home folder ~, which is Users/wcraft). It will be different for you.
- command line, where you type
-
How to read and write on the command line 2
- screenshot of the terminal, with a command
- utility - utility is the proper name for the tools & commands that we use on the command line. This one is ls, which lists the contents of a directory.
- flag - flags change how the utility operates. There are a lot of different flags for pretty much every utility. On a Mac, ls with -G displays in color, and ls without -G displays in the normal text color.
- arguments - arguments are the things that you want the utility to operate on. Not every utility needs arguments, but most do. This command is saying: list the stuff that is inside the folder apm_reports/training_and_guides
-
Make a home for the project using Terminal
- Steps:
- Open the command line - Terminal
- Where are we on the computer? What's there?
pwd
ls
- Go into the Downloads folder and look around for the secret data folder. What's there?
cd Downloads
ls
ls SECRET_DATA_DO_NOT_SHARE
- Make a folder for the project and make folders for data
cd ~
mkdir secret-data-project
cd secret-data-project
mkdir data
mkdir data/source
mkdir data/processed
- Move the data from downloads into our project folders
mv ~/Downloads/SECRET_DATA_DO_NOT_SHARE/ (then hit tab twice to see files)
mv ~/Downloads/SECRET_DATA_DO_NOT_SHARE/Janmeownary_Bribes.csv data/source/
ls data/source/
mv ~/Downloads/SECRET_DATA_DO_NOT_SHARE/Febmeownary_Bribes.csv data/source/
ls data/source/
- Inspect the file locations (ls & cat/head)
cat data/source/Janmeownary_Bribes.csv
- Learn a new utility: csvkit
- Using cat on the data sucks; it does not make inspecting spreadsheets on the command line easy at all. So let's learn a new utility, csvlook, which is part of the csvkit set of tools for working with CSVs.
- csvlook
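If you want to rehearse the whole walkthrough later without the class laptop, here is a practice sketch under a few assumptions. PRACTICE_HOME is a made-up temporary folder that stands in for ~, and the rows inside the two CSV files are placeholders, not the real leaked data. csvlook only runs if csvkit is installed (commonly done with pip install csvkit); otherwise the sketch falls back to cat.

```shell
#!/usr/bin/env bash
set -eu

# PRACTICE_HOME stands in for ~ so your real home folder is untouched.
PRACTICE_HOME="$(mktemp -d)"
LEAK="$PRACTICE_HOME/Downloads/SECRET_DATA_DO_NOT_SHARE"
mkdir -p "$LEAK"             # -p also creates the Downloads parent folder

# Placeholder rows only; the real class files have their own contents.
printf 'recipient,amount\nexample_cat,1\n' > "$LEAK/Janmeownary_Bribes.csv"
printf 'recipient,amount\nexample_cat,2\n' > "$LEAK/Febmeownary_Bribes.csv"

# Make a home for the project, with folders for data.
cd "$PRACTICE_HOME"
mkdir secret-data-project
cd secret-data-project
mkdir data
mkdir data/source
mkdir data/processed

# Move the data from Downloads into the project folders.
mv "$LEAK/Janmeownary_Bribes.csv" data/source/
mv "$LEAK/Febmeownary_Bribes.csv" data/source/
ls data/source/

# Inspect the data: use csvlook if csvkit is installed, cat otherwise.
if command -v csvlook >/dev/null 2>&1; then
  csvlook data/source/Janmeownary_Bribes.csv
else
  cat data/source/Janmeownary_Bribes.csv
fi
```

On your own computer you would type ~ wherever this sketch uses PRACTICE_HOME, exactly as in the steps above.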
Other Classes here at NICAR22
- Finding needles in haystacks with fuzzy matching
- Advanced PDF processing with OCR and command-line tools
- Intro to R or Intro to Python