- Learn
- Impress, easy to present
- Social impact
- product/company
- Demonstrate skills
- value/impact
- original
- data availability
- demonstrate visually
- supervised learning, has standards
- No 3rd party
- Why Python: packaging, quick for prototyping, big community; R is specific to statistics
- Object-oriented programming: everything can be defined as an object (class), but Python is not fully object-oriented; Java is
- Learning Python, O'Reilly
- Jupyter Notebook, good for visualization
- Python comprehensions are quicker (a one-line for loop)
PEP = Python Enhancement Proposal
Code Style - The Hitchhiker’s Guide to Python!
Chapter 2 of Effective Python - Brett Slatkin
Imports at the top of the .py file (each section in alphabetical order)
- standard library
- third party modules
- your own modules
Limits line length to 79 characters
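A minimal sketch of that import grouping; numpy and requests stand in for third-party modules and the own-module import is just a placeholder:
```python
# standard library (alphabetical)
import json
import os

# third party modules (alphabetical)
import numpy as np
import requests

# your own modules (placeholder name)
# import my_module
```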
Learning outcomes
- difference between API & web scraping
- what JSON is (and why it's like a Python dict)
- how to properly handle files in Python
- what a REST API is
- how to use the requests library
Both are ways to sample data from the internet
API
- structured
- provided as a service (you are talking to a server via a REST API)
- limited data / rate limits / paid / require auth (sometimes)
- most will give back JSON (maybe XML or CSV)
Web scraping
- less structured
- parsing HTML meant for your browser
Neither is better than the other
- API developer can limit what data is accessible through the API
- API developer may stop maintaining the API
- website page can change HTML structure
- website page can have dynamic (Javascript) content that requires execution (usually done by the browser) before the correct HTML is available
Much of the work in using an API is figuring out how to properly construct URLs for GET requests
- requires looking at their documentation (& ideally a Python example!)
- ProgrammableWeb - a collection of available API's
- For the Developer or For Developers documentation on your favourite website
- public-apis/public-apis
Most APIs require authentication
- so the API developer knows who you are
- can charge you
- can limit access
- commonly via key or OAuth (both of which may be free)
All the APIs we use here are unauthenticated - this is to avoid the time of you all signing up
If your app requires authentication, it's usually done by passing your credentials into the request (e.g. as a header)
response = requests.get(url, auth=auth)
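A hedged sketch of the two common patterns; the URL, the X-API-KEY header name and the token are made-up placeholders, real APIs document their own scheme:
```python
import requests

url = "https://api.example.com/endpoint"  # placeholder

# basic auth (username / password)
response = requests.get(url, auth=("my_user", "my_password"))

# key-based auth passed as a header (header name depends on the API)
response = requests.get(url, headers={"X-API-KEY": "my_token"})
```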
JSON (JavaScript Object Notation) is a:
- lightweight data-interchange format (text)
- easy for humans to read and write
- easy for machines to parse and generate
- based on key, value pairs
You can think of the Python dict as JSON-like:
- dict to json string: json.dumps(data)
- json string to dict: json.loads(data)
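A quick round trip between the two calls above (the dict content is arbitrary):
```python
import json

data = {"name": "Ada", "year": 1815}
s = json.dumps(data)      # dict -> JSON string
back = json.loads(s)      # JSON string -> dict
assert back == data
```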
open(path, mode)
- use encoding = 'UTF-8'
Common values for the mode:
- r: read
- rb: read binary
- w+: write (the + creates the file if it doesn't exist)
- a: append
Read files with a context manager (with block) or remember to call close()
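A small example of the modes above using a context manager (the filename is arbitrary); the with block closes the file automatically, even if an error occurs:
```python
with open("notes.txt", mode="w+", encoding="UTF-8") as f:
    f.write("hello\n")

with open("notes.txt", mode="r", encoding="UTF-8") as f:
    content = f.read()
```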
REST is a set of constraints that allow stateless communication of text data on the internet
- REST = REpresentational State Transfer
- API = Application Programming Interface
REST
- communication of resources (located at URLs / URIs)
- requests for a resource are responded to with a text payload (HTML, JSON etc)
- these requests are made using HTTP (determines how messages are formatted, what actions (methods) can be taken)
- common HTTP methods are GET and POST
HTTP methods
- GET - retrieve information about the REST API resource
- POST - create a REST API resource
- PUT - update a REST API resource
- DELETE - delete a REST API resource or related component
RESTful APIs enable you to develop any kind of web application having all possible CRUD (create, retrieve, update, delete) operations
- can do anything we would want to do with a database
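A hedged sketch of the two most common methods with the requests library; httpbin.org is used only as a generic echo endpoint here, not one of the course APIs:
```python
import requests

# GET - retrieve a resource
res = requests.get("https://httpbin.org/get", params={"q": "sunrise"})
print(res.status_code, res.json()["args"])

# POST - create a resource (httpbin just echoes the payload back)
res = requests.post("https://httpbin.org/post", json={"name": "new resource"})
print(res.status_code, res.json()["json"])
```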
Further reading
- Web Architecture 101 for more detail on how the web works
- CDN = Content Delivery Network
- DNS = domain name system
- H vs V scaling: horizontal scaling means that you scale by adding more machines into your pool of resources whereas “vertical” scaling means that you scale by adding more power (e.g., CPU, RAM) to an existing machine.
- In web development, you (almost) always want to scale horizontally because, to keep it simple, stuff breaks
- your app is “fault tolerant.”
- minimally couple different parts of your application backend
- load balancers = They’re the magic sauce that makes scaling horizontally possible.
Docs - https://sunrise-sunset.org/api
First we need to form the URL
- use ? to separate the API server name from the parameters for our request
- use & to separate the parameters from each other
- use + instead of a space in the parameter
res = requests.get("https://api.sunrise-sunset.org/json?lat=52.5200&lng=13.4050")
data = res.json()
data
[item for item in dir(res) if '__' not in item]
from collections.abc import Iterable
# iterate over the returned dict (assuming the 'results' dict from the API response above)
for k, v in data['results'].items():
    if isinstance(v, Iterable) and len(v) < 100:
        print(f'{k}: {v}')
Here use strptime to convert the returned value into a proper datetime:
- (Python's strftime directives reference is very useful!)
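A hedged example, assuming the API returns times as strings like '7:27:02 AM' (check the actual response for the exact format):
```python
from datetime import datetime

sunrise = datetime.strptime("7:27:02 AM", "%I:%M:%S %p")
print(sunrise.time())  # 07:27:02
```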
url = 'https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_272x92dp.png'
res = requests.get(url)
res.text[:100]
with open('./data/google-logo.png', 'wb') as fi:
fi.write(res.content)
- why NumPy: a Python list can hold any data type (each element carries a header, length and type), while a NumPy array constrains the type and stores the values together in memory
- broadcasting (see the sketch below)
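A minimal broadcasting sketch, showing how an operation is automatically extended over all cells:
```python
import numpy as np

a = np.arange(3)            # shape (3,)
b = np.arange(3)[:, None]   # shape (3, 1)
print(a + b)                # broadcast to a (3, 3) array
print(a * 10)               # the scalar is broadcast to every element
```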
hint: size, itemsize
Z = np.zeros((10,10))
print("%d bytes" % (Z.size * Z.itemsize))
hint: arange
Z = np.arange(10,50)
print(Z)
hint: array[::-1]
hint: reshape
hint: np.eye
hint: np.random.random
hint: min, max (np.amin works along an axis)
- np.full, np.full((3,5), 3.14)
- np.linspace(0, 1, 5)
- np.random.random((3,3)) # uniform distribution
- np.random.normal(0, 1, (3, 3)) # normal distribution
- np.random.randint(0, 10, (3,3))
- np.zeros()
- np.ones()
- np.eye()
- np.empty()
- np.ones_like()
hint: array[1:-1, 1:-1]
Z = np.ones((10,10))
Z[1:-1,1:-1] = 0
print(Z)
Z[:, [0, -1]] = 0
Z[[0, -1], :] = 0
print(Z)
#### 16. How to add a border (filled with 0's) around an existing array? (★☆☆)
`hint: np.pad`
```python
Z = np.ones((5,5))
Z = np.pad(Z, pad_width=1, mode='constant', constant_values=0)
print(Z)
```
hint: np.diag
Z = np.diag(1+np.arange(4),k=-1)
print(Z)
hint: np.unravel_index
print(np.unravel_index(99,(6,7,8)))
hint: np.tile
Z = np.tile( np.array([[0,1],[1,0]]), (4,4))
print(Z)
hint:
Z = np.dot(np.ones((5,3)), np.ones((3,2)))
print(Z)
# Author: Jake VanderPlas
print(sum(range(5),-1))   # builtin sum: -1 is the start value, so 0+1+2+3+4-1 = 9
from numpy import *
print(sum(range(5),-1))   # numpy sum: -1 is the axis argument, so the result is 10
np.array(0) / np.array(0)                      # nan (with a RuntimeWarning)
np.array(0) // np.array(0)                     # 0 (with a RuntimeWarning)
np.array([np.nan]).astype(int).astype(float)   # [-9.22337204e+18] (nan cast to int is undefined)
hint: np.intersect1d
Z1 = np.random.randint(0,10,10)
Z2 = np.random.randint(0,10,10)
print(np.intersect1d(Z1,Z2))
np.sqrt(-1) == np.emath.sqrt(-1)
- For negative input elements, a complex value is returned (unlike numpy.sqrt which returns NaN).
hint: np.datetime64, np.timedelta64
hint: np.arange(dtype=datetime64['D'])
Z = np.arange('2016-07', '2016-08', dtype='datetime64[D]')
print(Z)
- D3.js, low level
- Dash, Streamlit
- Live Server extension for VS Code
- Probability vs likelihood: the presence of conditioning, link
- Random variables: the outputs depend on random phenomena > probability theory
- Probability distribution:
  - continuous variable: density function (distribution function)
  - discrete variable: mass function
- Marginal probability (you can't go back to each single probability) vs conditional probability
- Joint distribution: can't be reconstructed from the single (marginal) distributions alone
- Chain rule
- Probability dependence/independence vs conditional independence
- Expectation, expected value > sum of the values weighted by their probabilities > for a normal distribution it equals the mean
- Variance > how close we are to the expected value, i.e. the spread
- Covariance > how much two variables vary together
- Binomial distribution
- Bernoulli distribution
- Multinoulli / categorical distribution
- Gaussian distribution
- von Neumann's random number generator
- Dirac distribution
- Mixture of distributions
- Bayes' rule
- Structured probabilistic models: directed vs undirected
- directed vs undirected probability chain
- Monte Carlo: maximize the probability with structured probability
- Markov chain: you don't need to know the path, you are always in a state and have probabilities of moving to the next state
- The median is outlier resistant
- Why population variance vs sample variance: the (n-1) in the sample version corrects for the bias (degrees of freedom)
- Skewness and kurtosis of the distribution
- Permutation, n!
- K-permutation, n!/(n-k)!
- combination, C(n,k) = n! / ((n-k)! k!)
- Pascal's triangle
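A quick check of the counting formulas above with the standard library:
```python
from math import comb, factorial, perm

n, k = 5, 2
assert perm(n) == factorial(n)                          # permutations: n!
assert perm(n, k) == factorial(n) // factorial(n - k)   # k-permutations: n!/(n-k)!
assert comb(n, k) == perm(n, k) // factorial(k)         # combinations: n!/((n-k)! k!)
print(perm(n), perm(n, k), comb(n, k))                  # 120 20 10
```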
- Binomial distribution: needs exactly two outcomes; for unfairness it works with a coin but not a die, since a die has more than two outcomes
- Poisson distribution: for binomial-like situations where the number of occurrences is small; good for rare/extreme events
- BIONOMIAL, scipy.stats.binom.pmf(k, n, p)
- POISSON, scipy.stats.poisson.pmf(k, mu)
- NORMAL, scipy.stats.norm.cdf(x, mu, sigma)
- T-DIST, scipy.stats.t.cdf(t_score, df), t_score = (x - mu) / (s / (df + 1) ** 0.5)
- CHI-SQUARED, scipy.stats.chi2.cdf(x, df)
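A tiny usage example of the scipy.stats calls listed above (the numbers are arbitrary):
```python
from scipy import stats

print(stats.binom.pmf(k=3, n=10, p=0.5))   # P(X=3) in 10 fair coin flips
print(stats.poisson.pmf(k=2, mu=1.5))      # P(X=2) with rate 1.5
print(stats.norm.cdf(1.96, 0, 1))          # ~0.975 for the standard normal
print(stats.chi2.cdf(3.84, df=1))          # ~0.95
```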
- central limit theorem
- Confidence interval: rule of thumb of at least 30 samples
- One-sided vs two-sided hypothesis testing: "men are taller than women" vs "men have a different height than women"
- P-value: the tail area (integral) of the distribution beyond the observed statistic
- White noise
- Error estimation: BLUE, best linear unbiased estimator
- Cleaning data: quality, quantity, diversity, cardinality (number of unique values), dimensionality, sparsity
- Data character: stationarity (new data arriving, changing environment, the model's own effect on the data), duplicates, class imbalance, biased sampling
- Train/test/validation; k-fold cross-validation: the validation fold rotates
- Bivariate analysis: how a variable correlates with another variable or with the target
- Visualization:
  - correlation matrix
  - plot the target
- Data encoding, sklearn categorical encoders
  - one-hot encoding: each category gets its own 0/1 column; memory and sparsity issues
  - category (label) encoding: assign 1, 2, 3, ... to each category
  - ordinal encoding: like category encoding, but the ordering takes the target value prediction into account
  - frequency encoding
  - binary encoding
  - mean encoding: directly use the mean of the target value for each category (target encoding)
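A small one-hot vs ordinal encoding sketch with pandas/sklearn; the column and category names are made up:
```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame({"size": ["S", "M", "L", "M"]})

# one-hot encoding: one 0/1 column per category
print(pd.get_dummies(df["size"], prefix="size"))

# ordinal encoding with an explicit, meaningful order
enc = OrdinalEncoder(categories=[["S", "M", "L"]])
print(enc.fit_transform(df[["size"]]))
```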
- NLP model and data, link
  - Tokenize the data
  - Lemmatize the data (NLTK, spaCy)
  - Get n-grams
  - Visualize: histogram, word cloud
  - Repeat
  - TF-IDF vectorization of text features: Term Frequency - Inverse Document Frequency
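A minimal TF-IDF sketch with scikit-learn (the two-sentence corpus is made up):
```python
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
]
vectorizer = TfidfVectorizer(ngram_range=(1, 2))  # unigrams and bigrams
X = vectorizer.fit_transform(corpus)
print(vectorizer.get_feature_names_out())
print(X.toarray().round(2))
```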
- Sound data building, link
- MLflow
- pandas-profiling, link
- practice SQL: https://sqlbolt.com/
- interview questions: https://leetcode.com/problemset/all/, Pramp, levels.fyi
- Data warehouse vs data lake > the data warehouse is supposed to be cleaner
- ACID (atomicity, consistency, isolation, durability) is needed for a data warehouse, not for a data lake
- XOR, exclusive or: a place for only one (true for exactly one)
- The left table is the first table, the right table is the second one in a JOIN; some SQL engines don't have a RIGHT JOIN
- Building a schema: snowflake (finer grained, more normalized, good for operational use) vs star (simpler, more duplicated data)
- Kafka: a service for streaming huge amounts of data; can be connected directly to a warehouse, to Spark (Hadoop), or just dump into data storage (e.g. MongoDB)
- Spark for graphs: GraphX; Kafka also works with it
- Spark replacements: Snowflake, AWS Redshift, Azure Synapse, Google BigQuery
- Schema: the relations of the tables via foreign keys, plus data types
- Primary key vs foreign key: the primary key is a unique id, the foreign key is for connecting tables
- Normalization: a technique for reducing redundant and duplicate data
  - important for insert, update, delete (anomalies)
  - important to think about how we can split the data into different tables (and when to denormalize)
- UUID, universally unique identifier, 128 bit > hashing > SHA hashing is common (longer hashes are more secure)
- Computational complexity, link, course recommended (first lectures), e.g. why ORDER BY makes a request slow (n rows * log(n rows))
- Index optimization
- Assessing tools: ease of use, scalability, security, documentation and support, advanced features, cost
- ETL vs ELT (extract, transform, load); ELT is newer, for smaller data with fewer security constraints
- OLAP, online analytical processing > optimized for reading
- OLTP, business use, online transactional processing > optimized for write, update, edit
- OLTP + ETL > OLAP
  - E: extracted from OLTP or an RDBMS
  - old ETL: hand-coded, e.g. in Python
  - new ETL: automated integration, e.g. integrate.io
  - high-powered processing offered by modern, cloud-based data warehousing solutions
- Corpus data: text data
- Data sources like Kaggle, Reddit, Google Dataset Search, or the University of California Irvine machine learning repository
- Scaling data: normalization, standardization (mean=0, sd=1 > more Gaussian), binning
- All of DS uses cross-entropy; entropy is a measure of the randomness or unpredictability in a set of data
- Entropy, a measure of disorder:
  H(X) = - sum(p(x) * log(p(x)))
- Cross-entropy:
  H(P, Q) = - sum(p(x) * log(q(x)))
  log2(1) = 0
- The depth of a (binary) tree is log2 of the number of branches (leaves)
- Sometimes we use log base e, which behaves more smoothly
- Equal probability across the options gives the maximum entropy
- Cross-entropy is a measure of the difference between two probability distributions. It is commonly used in machine learning to measure the dissimilarity between the predicted and actual distributions. The cross-entropy H(P, Q) between two probability distributions P and Q is: H(P, Q) = - sum over all i of P(xi) * log2(Q(xi))
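A small numeric check of the two formulas above:
```python
import numpy as np

def entropy(p):
    # H(P) = -sum(p_i * log2(p_i))
    p = np.asarray(p)
    return -np.sum(p * np.log2(p))

def cross_entropy(p, q):
    # H(P, Q) = -sum(p_i * log2(q_i))
    p, q = np.asarray(p), np.asarray(q)
    return -np.sum(p * np.log2(q))

p = [0.5, 0.5]              # true distribution (max entropy for two options)
q = [0.9, 0.1]              # predicted distribution
print(entropy(p))           # 1.0 bit
print(cross_entropy(p, q))  # larger than H(P): Q differs from P
print(cross_entropy(p, p))  # equals H(P)
```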
- Negative log likelihood
- Confusion matrix:
  accuracy = (TP + TN) / all
  precision = TP / (TP + FP)
  recall = TP / (TP + FN)
  f1_score = 2 * (precision * recall) / (precision + recall)
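A sketch of these metrics with scikit-learn on made-up labels:
```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)                    # 3 1 1 3
print(accuracy_score(y_true, y_pred))    # (TP + TN) / all
print(precision_score(y_true, y_pred))   # TP / (TP + FP)
print(recall_score(y_true, y_pred))      # TP / (TP + FN)
print(f1_score(y_true, y_pred))          # harmonic mean of precision and recall
```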
- The information: statistical mechanics
- Seth Lloyd, information theory and complexity
- Softmax is used for multi-class classification in a logistic regression model (multivariate), whereas sigmoid is used for binary classification in a logistic regression model
- Covariate: same as feature
- The loss function needs to be differentiable
- use cross validation set for hyper parameter training
- R2 score shows how good our model is compared to just using the mean value, close to 1 better
- if R2 is much smaller on test than on train > overfitting
- Bayesian methods work with a prior belief and need less data > Statistical Rethinking, online course; book: the rule that never dies
Lecture 8: Troubleshooting Deep Neural Networks - Full Stack Deep Learning - March 2019:
The parts of regularization:
- lasso, L1: penalty on the sum of absolute values
- ridge, L2: penalty on the squared sum
- elastic net: has both L1 + L2
- example, https://cs231n.github.io/neural-networks-case-study/
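A hedged sklearn sketch of the three penalties on synthetic data; the alpha values are arbitrary:
```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.5, 0.0, 0.0, 2.0, 0.0]) + rng.normal(scale=0.1, size=100)

for model in (Lasso(alpha=0.1), Ridge(alpha=1.0), ElasticNet(alpha=0.1, l1_ratio=0.5)):
    model.fit(X, y)
    # L1 tends to zero coefficients out; L2 only shrinks them
    print(type(model).__name__, model.coef_.round(2))
```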
- Use LIME for explainability, tutorial
- Gradient boosting
- Normalization: sklearn StandardScaler
- Use df.sample(5) instead of df.head()
- Changing the number of PCA components from 2 to 3 keeps the first 2 components the same (the computation is not stochastic)
- sns.pairplot() gives a good view for comparing variables:
sns.pairplot(penguins, hue="species")
- sklearn.metrics.classification_report() returns F1 score, recall and precision
- Avoid leakage: split the data into train, test, cv; normalize the train set, save its transformer and use it on test and cv to avoid leakage
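A minimal sketch of that fit-on-train / transform-on-test pattern (split sizes are arbitrary):
```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = np.random.default_rng(0).normal(loc=5, scale=2, size=(200, 3))
X_train, X_test = train_test_split(X, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit ONLY on the training data
X_test_scaled = scaler.transform(X_test)        # reuse the same transformer, no refitting
```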
- pd.crosstab()
- pip install scikit-learn-extra
- k-medoids
- run command
mlflow ui --backend-store-uri sqlite:///mlflow.db
- The lower the variance, the better the split will be
- Pruning, controlled with a hyperparameter
- sklearn.tree.plot_tree()
- The entropy computation is only used for classification, never for regression
- Gini impurity: 1 - sum(p_i^2); Gini is more efficient to compute than entropy; sum(p_i^2) is the probability of two items being in the same class
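A small comparison of the two impurity measures for a node's class proportions:
```python
import numpy as np

def gini(p):
    p = np.asarray(p)
    return 1 - np.sum(p ** 2)

def entropy(p):
    p = np.asarray(p)
    return -np.sum(p * np.log2(p))

for p in ([0.5, 0.5], [0.9, 0.1], [0.99, 0.01]):
    # both shrink as the node gets purer; gini avoids the log
    print(p, round(gini(p), 3), round(entropy(p), 3))
```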
Error = bias + variance + noise
- noise = unmanageable
- variance = fitting to noise
- bias = missing signal
Bias is the difference between the average prediction of our model and the correct value which we are trying to predict. A model with high bias pays very little attention to the training data and oversimplifies the model (underfitting).
Bias refers to the simplifying assumptions made by a model to make the target function easier to learn. Generally, linear algorithms have a high bias, making them fast to learn and easier to understand but generally less flexible. Examples of high-bias machine learning algorithms include: Linear Regression, Logistic Regression.
Variance is the variability of model prediction for a given data point, or a value which tells us the spread of our data. A model with high variance pays a lot of attention to the training data and does not generalize to data it hasn't seen before. As a result, such models perform very well on training data but have high error rates on test data (overfitting).
Variance is the amount that the estimate of the target function will change if different training data was used.
High variance may result from an algorithm modeling the random noise in the training data
The bias-variance tradeoff is a central problem in supervised learning.
Ideally, one wants to choose a model that both accurately captures the regularities in its training data, but also generalizes well to unseen data.
Unfortunately, it is typically impossible to do both simultaneously. High-variance learning methods may be able to represent their training set well but are at risk of overfitting to noisy or unrepresentative training data.
In contrast, algorithms with high bias typically produce simpler models that may fail to capture important regularities (i.e. underfit) in the data. (Wikipedia)
These different ensemble methods tackle the tradeoff in different ways
- forests = high variance, low bias base learners
- boosting = low variance, high bias base learners
** The individual learners of the ensemble, which are combined strategically, are referred to as base learners.
Further Reading:
- https://bit.ly/3Oi3cmH (Overfitting and Underfitting With Machine Learning Algorithms)
- https://bit.ly/3aLv4Su (Understanding the Bias-Variance Tradeoff)
- Command cheat sheet, link
- Simple command to run a Flask app:
FLASK_APP=myapp:app flask run --host 0.0.0.0
- Then write the Dockerfile, build it and run it:
docker build -t myflaskapp .
docker run -it --rm -p 8989:8989 myflaskapp
- For production it is better to use gunicorn, uWSGI or uvicorn (e.g. with FastAPI) as the server instead of the Flask development server
- Running several servers: use a Dockerfile for each and wire them together with docker-compose; see the dsr-db/databases/6_Redis_Exercise example
- nginx, a reverse proxy, used for security between the public internet and the app
- docker-compose is a wrapper around several Docker containers
- Kubernetes, for advanced setups with many users; like docker-compose but handles much more complex orchestration of containers and images
- Cool coloring: use zsh and oh-my-zsh, link; to check it, $SHELL should return /bin/zsh > robbyrussell > defines the color theme
Unstructured DB, NoSQL
- Apache Avro (for Hadoop, large datasets), still used with Kafka; Python lib fastavro; .avro files are about 10 times smaller than CSV
  - needs a schema, stored in the file as metadata
- .npy depends on numpy (and its version)
- pickle is also Python-dependent, not good for long-term data storage
- orjson, faster than the standard json reader
- Apache Parquet, 2-3 times smaller than Avro and can be read with pandas / fastparquet > best go-to format
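A quick pandas round trip (assumes a parquet engine such as pyarrow or fastparquet is installed):
```python
import pandas as pd

df = pd.DataFrame({"city": ["Berlin", "Paris"], "temp": [21.5, 23.1]})
df.to_parquet("weather.parquet")             # columnar, compressed on disk
df_back = pd.read_parquet("weather.parquet")
print(df_back.equals(df))
```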
- NoSQL means not only SQL DB
- relational db
- document DB: MongoDB, CouchDB, TerminusDB (a bunch of JSON files)
- CAP theorem: Consistency, Availability, Partition tolerance > you can't have all three together
- Stochastic gradient descent
- ReLU vs sigmoid: computational efficiency, plus the vanishing gradient issue: computers don't have enough precision and small numbers become zero in the gradient computation
- Leaky ReLU: when the data has a lot of noise or outliers, Leaky ReLU gives a non-zero output for negative input values, which helps avoid discarding potentially important information, and can thus perform better than ReLU in those scenarios
- GELU, Gaussian Error Linear Unit: differentiable at zero, better for complex learning, with the in-practice disadvantage of being much, much more complex to compute; it distinguishes between negative values that are close to zero
- Hyperparameter tuning, link
- Shattering a dataset: the ability to perfectly classify the data
- Most (PyTorch tensor) functions are the same as numpy, and broadcasting works (extending an operation to all cells)
from torch import nn
import torch.nn.functional as F

class Network(nn.Module):
    def __init__(self):
        super().__init__()
        # Defining the layers, 128, 64, 10 units each
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 64)
        # Output layer, 10 units - one for each digit
        self.fc3 = nn.Linear(64, 10)

    def forward(self, x):
        ''' Forward pass through the network, returns class probabilities '''
        import pdb; pdb.set_trace()  # breakpoint for the debugger commands below
        x = self.fc1(x)
        x = F.relu(x)
        x = self.fc2(x)
        x = F.relu(x)
        x = self.fc3(x)
        x = F.softmax(x, dim=1)
        return x

model = Network()
model
Commands for the python debugger:
ll - shows context
n - goes to the next line
c - runs to the next breakpoint
q - quits the debugger
more: [link](https://www.youtube.com/watch?v=P0pIW5tJrRM)
- 2012, AlexNet, last big award
- 1989, MNIST (digits)
- ResNet has shortcut/skip connections to avoid vanishing gradients, by adding x to f(x); the vanishing gradient comes from multiplying small values in the gradient calculation
- In transfer learning: remove the last layer > freeze the parameters > replace the last layer, part 8
- Image generation
- Image segmentation
- Colorization
- Denoising
- Super sampling
- Image captioning, image2text
- Image2vec
- Object identification
- Object detection
- Object recognition
- Object classification
- Pose detection: use an LSTM for classification, or with more data use transformers
- GANs, two networks: 1) adversarial/discriminator (fake or not fake), 2) generator
- Diffusion models: a denoising autoencoder (data: image + added noise, image)
- Variational autoencoder (encoder-decoder: input and output are the same image); generates a normal distribution from the latent space representation
- WaveNet, first used for audio processing
- CNN: the filter is also learned
- ViT, vision transformer
- Few-shot learning: a handful of examples
- Zero-shot learning: the NN sees no examples
Architecture: filters, kernel size, sequence, dropout, FC layer size; data: input image size, augmentation, new data
- Augmentation: yes/no
- Dropout: yes/no, how much
- FC size
- Which pretrained network; other models: link
- Dropout: doesn't freeze weights and biases, some of the output values are set to zero
- Increasing the image shape adds computation due to the change of FC layer sizes
- Number of parameters: number of filters, kernel width and height, 3rd dimension (channels) of the input (see the sketch below)
- Limit of architectures: vanishing gradients; add residual connections
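A quick check of that parameter count with PyTorch; the layer sizes are arbitrary:
```python
from torch import nn

# 3 input channels (e.g. RGB), 16 filters, 3x3 kernel
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)

expected = 16 * (3 * 3 * 3) + 16   # filters * (kernel_w * kernel_h * in_channels) + biases
actual = sum(p.numel() for p in conv.parameters())
print(expected, actual)            # both 448
```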
Digital Signal processing, audio or picture
# Freeze parameters so we don't backprop through them
# (model is assumed to be a pretrained torchvision ResNet with a 512-dim final feature vector)
from torch import nn
from collections import OrderedDict

for param in model.parameters():
    param.requires_grad = False

fc = nn.Sequential(OrderedDict([
    ('fc1', nn.Linear(512, 500)),
    ('relu', nn.ReLU()),
    ('fc2', nn.Linear(500, 2)),
    ('output', nn.LogSoftmax(dim=1))
]))
model.fc = fc
- Difference between segmentation and classification: labeling is more costly and includes localization
- 1x1 convolution: resizing in the z (channel) direction and adding non-linearity
- One-cycle policy: varying the learning rate across batches
- 60-20-20 train-validation-test split
- To make the ratio of classes equal in the datasets, use stratified splitting, link
- If the number of images is too small to stratify, you can calibrate the model, link
- Transfer learning: freezing the parameters
learn = vision_learner(dls, resnet34, metrics=error_rate)
- Lambda: do the computation
- CloudWatch: monitor the computation
- Identity and Access Management (IAM): who or what can access services and resources
- CloudFormation: model, provision, and manage AWS and third-party resources
- Management Console
- Polly: uses deep learning technologies to synthesize natural-sounding human speech
- Command Line Interface (CLI)
- Budgets
- Tools and SDKs
- Simple Storage Service (S3): images, ...
- make a user group > administrator
- make a user
- make an SSH key / access key > CLI command
- KG + LLM (knowledge graphs with LLMs), link
- Terminology:
  - what is the situation: state
  - what are the possible options: action space
  - what are the consequences of each action: environment
  - how rewarding / costly: reward
  - where do you end up after: next state
  - policy: what we are learning, a mapping from state to action
- Autoregressive models (same as in NLP) work best, but the differences are: 1) the pattern is not catchable for a human, so the result is hard to judge (hard to describe); 2) it is not the same pattern, the pattern can totally change
- Hardest to beat: the null baseline, predicting tomorrow based on today
- Model overview, May 2023, slide 7
- Autocorrelation = correlation of the series with itself in the past
- A flat line (or a sine curve) has no trend; the ARIMA method takes away the trend to make the series stationary; a series is trended if the mean changes over time
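A small pandas sketch of removing a trend by differencing and checking autocorrelation; the series is synthetic:
```python
import numpy as np
import pandas as pd

t = np.arange(100)
series = pd.Series(0.5 * t + np.random.default_rng(0).normal(size=100))  # linear trend + noise

print(series.autocorr(lag=1))    # high: the trend makes neighbours similar
diffed = series.diff().dropna()  # differencing removes the linear trend
print(diffed.autocorr(lag=1))    # much closer to zero
```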
- Metrics: slide 84
- Review questions: slides 100-104
- Anomaly detection
- Forecasting
- Classification
- Keywords: Trend, Seasonality, Residual aka Noise, Stationarity, Autoregressive, Autocorrelation and Partial Autocorrelation, Differencing, Backtesting, Exogenous variable, Look-ahead problem, Multivariate vs univariate, Recursive forecasting, Exponential moving average, Exponential smoothing, LSTM, ARIMA
- NaNs: always fill using lagged values, never look into the future; the future must not influence the past, only the past affects the future
- Anomalies: define a percentage or a window in the past and, if a value is above that, use the window values to replace it
- Scipy overview, link
- kaggle, link
- numer.ai, an anonymized Kaggle for the stock market
- hugging face, link
- DARTS
- online course, link
- PROPHET, link
- KATS, link
- Look into classical computer science problem
- Give 10-20% time to analyse the problem before solving it
- Don't share unnecessary input (start day, different interests and so on)
- Don't get stuck in the emotional challenges of waiting to hear back.
- Do research on the type of their problem, question what you assume you know