This is a quick refresher of INFS2822 (Programming for Data Analytics) for UNSW students.
These notes will reacquaint you with some concepts that you have previously studied in depth, but these notes will not fully explain these concepts for you. For that, please refer to your notes from when you studied INFS2822.
Contents:
- π»β1. Command Line
- πβ2. Python Essentials
- πΌβ3. Project Management
- πβ4. Web Publishing
- πΊοΈβ5. Geographical Visualisations
- π₯£β6. Web Scraping
- π€β7. Machine Learning
- βοΈβ8. Social, Legal and Ethical Issues
- π‘β9. Tips and Tricks
(Itemised based on concepts, not teaching weeks.)
Authors: Blair Wang (UNSW Business School). See also the GitHub repo.
People by and large are used to graphical user interfaces (GUIs), which are easy to use. However, for precision and power, it is beneficial to use command line interfaces (CLIs). There are two families of CLIs - the DOS-based family (DOS, cmd.exe
on Windows, PowerShell) and the UNIX-based family (UNIX, macOS, Linux). We use the UNIX-based family in INFS2822 because it's pretty much the standard for science and data work.
At the command line, there are various shells in which you can execute commands. The most common shell for UNIX-based work is based on Bash. In this course, we use zsh
which is an upgrade from Bash. Note that from one shell you can move to another shell, e.g. from zsh
we can move to python3
.
Getting started at the UNIX command line, there are a lot of useful commands:
date
(not to be confused withtime
)ls -lah
find
cat
/head
/tail
/less
/nano
/grep
A sequence of commands can be stored in a script. For example SQL comamnds can be stored in a .sql
file. Likewise, Bash/zsh commands can be stored in a .sh
file.
This section of INFS2822 is based on the Software Carpentry Programming with Python text.
This section is based on Python Software Carpentry set 1. If you are also taking INFS1609, you may wish to compare with INFS1609 Elementary Programming for the equivalent in Java.
- Variable assignment, e.g.
weight_kg = 60
(integer),weight_kg = 60.0
(float),weight_kg = 'sixty'
(String) - Printing to the command line with
print(..)
This section is based on Python Software Carpentry set 2.
import numpy
csvdata = numpy.loadtxt(fname='file.csv', delimiter=',')
# -- Data Types --
print(type(csvdata)) # will be <class 'numpy.ndarray'>
print(type(csvdata.dtype)) # will be float64
print(type(csvdata.shape)) # will be (rowcount, colcount)
# -- Rows and Columns --
print(csvdata[0,0]) # will be value at row 0, col 0
# python numbering starts at 0
print(csvdata[31,41]) # will be value at row 31, col 41
print(csvdata[31, :] ) # will be all of row 31
print(csvdata[: ,41] ) # will be all of col 41
# -- Quick Stats Examples --
print(numpy.mean(csvdata)) # mean across all rows and cols
print(numpy.max(csvdata[31, :])) # maximum value across row 31
print(numpy.min(csvdata[: ,41])) # minimum value across col 31
print(numpy.std(csvdata)) # standard deviation across all rows and cols
# -- Array of Statistics --
print(numpy.mean(csvdata, axis=1)) # array of averages for each row
print(numpy.mean(csvdata, axis=0)) # array of averages for each column
print(numpy.diff(csvdata[31, :]) # cell minus previous cell for row 31
This section is based on Python Software Carpentry set 3.
import numpy
csvdata = numpy.loadtxt(fname='file.csv', delimiter=',')
import matplotlib.pyplot as pp
# heat map
pp.imshow(data)
pp.show()
# line graph
pp.plot(numpy.mean(csvdata, axis=0))
pp.set_xlabel('x axis label goes here')
pp.set_ylabel('y axis label goes here')
pp.show()
# figure with mulitple plots
fig = pp.figure(figsize = 10.0, 3.0)
part1 = fig.add_subplot(1, 3, 1) # }
part2 = fig.add_subplot(1, 3, 2) # } each of these begins with 1,3 because the grid is 1x3
part3 = fig.add_subplot(1, 3, 3) # } then the 3rd one is just the index (starts at 1 here)
part1.plot(..)
part2.plot(..) # same syntax as above
part3.plot(..)
fig.tight_layout()
pp.savefig('file.png')
pp.show()
This section is based on Python Software Carpentry set 4 and Python Software Carpentry set 5. If you are also taking INFS1609, you may wish to compare with INFS1609 Loops and Arrays for the equivalent in Java.
# incrementing
for number in range(1, 6):
print(number)
# will give you 1, 2, 3, 4, 5
# arrays (lists)
foods = ["Fish Fingers", "Custard", "Grapes"]
print(len(foods)) # will give you 2
print(foods[1]) # will give you Custard
print(foods[-1]) # will give you last element = Grapes
print(foods[-2]) # will give you 2nd last element = Custard
for this_food in foods:
print(this_food)
This section is based on Python Software Carpentry set 6.
import glob
import numpy
filenames = sorted(glob.glob('dataset-2020-08-*.csv'))
for filename in filenames:
print(filename)
This section is based on Python Software Carpentry set 7. If you are also taking INFS1609, you may wish to compare with INFS1609 Selections for the equivalent in Java.
# using else-if (elif)
if x > y:
print('x is bigger than y')
elif x > z:
print('x is not bigger than y, but it is bigger than z')
else:
print('x is less than y and z')
# using logical operator (and, or, etc)
if (x > y) and (x > z):
print('z is bigger than y and z')
This section is based on Python Software Carpentry set 8. If you are also taking INFS1609, you may wish to compare with INFS1609 Methods for the equivalent in Java.
def bmi(mass_kg, height_m):
numerator = mass_kg
denominator = height_m ** 2
return numerator / denominator
CRISP-DM methodology:
- Business Understanding
- Data Understanding
- Data Preparation
- Modeling
- Evaluation
- Deployment
Web documents are written in Hypertext Markup Language (HTML). This involves tags (e.g. <p>
) that must later be closed (e.g. </p
>). These tags are nested inside one another in a hierarchy known as the Document Object Model (DOM). The visual style of the DOM when rendered in a web browser can be designed using Cascading Style Sheets (CSS).
HTML example β index.html
<!DOCTYPE html>
<html>
<head>
<title>Hello World</title>
<meta charset="UTF-8">
<link rel="stylesheet" type="text/css" href="styles.css" />
</head>
<body>
<h1>Hello World!</h1>
<p>The quick brown fox jumps over the lazy dog.</p>
</body>
</html>
Corresponding CSS example β styles.css
body {
max-width: 900px;
margin: 0 auto;
font-family: 'Verdana';
}
h1 {
color: blue;
}