
Materials for a command-line workshop at IRE 2023 in Orlando.

Primary language: JavaScript. License: MIT.

IRE 2023: Don't be afraid of the command line (Macs)

This repo contains materials for a one-hour workshop at the IRE 2023 conference in Orlando on using the command line on a Mac.

The session is scheduled for Saturday, June 24, from 3-4 p.m. in room Coral A on the first floor.

Also check out "Finding needles in haystacks," an introduction to using the command-line software csvmatch, immediately following this class in the same room.

Course outline

  • What, would you say, are we doing here? Some command-line workflow pros and cons
  • What if you work on a PC?
  • Get yourself a good text editor like Sublime Text or VS Code
  • Introduction to Terminal
  • Navigating the file system starts at home: ~
  • Where am I? pwd
  • Printing text with echo
  • Moving around with cd
  • Using . and .. as you move around
  • Using tab completion
  • Listing a directory's contents with ls
  • Using command flags
  • Copy, move and rename files: cp and mv
  • Editing your command
    • Ctrl + a: Move to the beginning of the line
    • Ctrl + e: Move to the end of the line
    • Ctrl + c: Stop a running process
  • Reading files with cat
  • Making directories with mkdir
  • Creating files with touch
  • Opening files or directories with open
  • Getting a sneak peek at files with head and tail
  • Searching inside files with grep
  • Piping, redirecting and appending: |, > and >> (plus < to read input from a file and << for here-documents)
  • Reading and writing to your clipboard: pbcopy and pbpaste
  • Issuing multiple commands with &&
  • Counting lines with wc (with the -l flag to count rows)
  • Unzipping archives with unzip
  • Getting help with man
  • Using CLI software (installed separately)
    • curl (HTTP client)
    • pdftotext and other xpdf tools (working with PDFs)
      • pdftotext -table kristi-noem-campfin.pdf
    • youtube-dl (archiving YouTube videos)
    • csvkit (working with tabular data files)
      • in2csv MLB2018.xlsx | head
      • in2csv MLB2018.xlsx > mlb2018.csv
      • in2csv MLB2018.xlsx | csvcut -c TEAM | sort | uniq
    • csvmatch (fuzzy matching between tabular data files)
      • csvmatch data1.csv data2.csv --fields1 name --fields2 'Person Name'
    • ffmpeg (audio and video editing)
    • imagemagick (image manipulation)
    • git (version control)
    • exiftool (image metadata)
  • Running scripts
    • Bash/Zsh (cf. changing file permissions with chmod)
    • Python
    • Node.js
    • ... etc.
  • Working with databases (PostgreSQL, SQLite, etc.)
  • A more complicated example that puts a few things together, downloading the 2023 congressional financial disclosure data and getting some stats (-t tells csvstat the file is tab-delimited): curl https://disclosures-clerk.house.gov/public_disc/financial-pdfs/2023FD.ZIP > congress_disclosures_2023.zip && unzip congress_disclosures_2023.zip && csvstat -t 2023FD.txt
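
To try most of the basics from the outline in one go, here's a sketch you can paste into Terminal. It assumes a practice folder named ire23-practice in your home directory (the name is just an example), builds a small made-up CSV, and leans on one command the outline doesn't cover, cut, to pull out a column.

```shell
# Make a practice folder in your home directory (~) and move into it.
# The folder name ire23-practice is only an example.
mkdir -p ~/ire23-practice && cd ~/ire23-practice
pwd                            # where am I? prints the full path

mkdir -p notes/archive         # -p also creates the parent folder "notes"
cd notes && cd ..              # step into notes, then back up with ..
touch notes/todo.txt           # create an empty file

# Build a tiny made-up CSV: > creates (or overwrites), >> appends
echo "name,team" > players.csv
echo "Ana,Rays" >> players.csv
echo "Ben,Marlins" >> players.csv
echo "Cal,Rays" >> players.csv

cat players.csv                # print the whole file
head -n 2 players.csv          # first 2 lines: the header plus one row
tail -n 1 players.csv          # just the last line
grep "Rays" players.csv        # only the lines that mention Rays
wc -l players.csv              # 4 lines, header included

# Pipe it together: skip the header, keep column 2, count each team
# (Marlins 1, Rays 2). cut is not in the outline; csvcut from csvkit
# is safer when fields contain quoted commas.
tail -n +2 players.csv | cut -d, -f2 | sort | uniq -c

# Back up, then rename; && runs mv only if cp succeeded
cp players.csv notes/archive/players-backup.csv && mv players.csv roster.csv
ls notes/archive               # should list players-backup.csv
ls                             # should list notes and roster.csv
```

When you're done, open . will show the practice folder in Finder.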
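
For the "Running scripts" item, a minimal sketch of writing a shell script, making it executable with chmod and running it, using a here-document (<<) to create the file. Run it from any scratch folder. The script name count-rows.sh and the sample file are hypothetical; it uses #!/bin/sh so it runs on any Mac or Linux machine, and you could swap in #!/bin/zsh or #!/bin/bash if you need features of those shells.

```shell
# Everything up to the EOF line goes into count-rows.sh. Quoting 'EOF'
# keeps $@ and $(...) from expanding now, so they run when the script does.
cat > count-rows.sh << 'EOF'
#!/bin/sh
# count-rows.sh (hypothetical): print the line count and name of each file
for f in "$@"; do
  printf '%s\t%s\n' "$(wc -l < "$f" | tr -d ' ')" "$f"
done
EOF

chmod +x count-rows.sh           # add execute permission
ls -l count-rows.sh              # look for x in the permissions, e.g. -rwxr-xr-x

printf 'a\nb\nc\n' > sample.txt  # a made-up three-line file to test with
./count-rows.sh sample.txt       # prints 3, a tab, then sample.txt
```

Without chmod you can still run it as sh count-rows.sh sample.txt; the execute bit is what lets ./count-rows.sh work on its own.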

Practice

  • Use youtube-dl to download a video, but pipe the output to ffmpeg to cut all but the first 10 seconds and redirect the output to a file
  • Use curl and csvkit to download a file from an open data registry of your choice and cut out just the columns of interest, then write the result to a file (here's a good collection)
  • Use pdftotext and a good text editor to clean up messy data tables in a PDF (bonus: Use regular expressions to help clean it up!)
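
The regular-expression cleanup in the last exercise can also happen without leaving Terminal. A sketch, assuming the text dump pads its columns with runs of two or more spaces and that no cell contains a double space; it uses sed, which the outline doesn't cover, and the table below is made up.

```shell
# Made-up sample: columns padded with spaces, like a text dump of a PDF table
printf '%s\n' \
  'DATE        DONOR              AMOUNT' \
  '01/05/2022  Jane Q. Public     $500.00' \
  '02/11/2022  Acme Widgets LLC   $1,000.00' > messy.txt

# Regex: replace every run of 2+ spaces with one tab. The tab comes from
# printf so the same command works with macOS (BSD) sed and GNU sed.
tab=$(printf '\t')
sed -E "s/ {2,}/${tab}/g" messy.txt > clean.tsv

cat clean.tsv                 # single spaces inside names survive
```

Because the result is tab-delimited, csvkit tools can read it with -t, as in csvstat -t clean.tsv.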

Highly recommend