tinyurl.com/gc-ai-24
Model Architecture:
2012 - 2017
Function written in code
If file name is cat its a cat
if 50% of the pixels are orange in a cat
take in an image a parameter and create an output
Input data then a series of parameterized data to create an output
Take inputs and model arch to make ideal params:
For each parameter I want to fit them to the known outputs
model inference code
Takes in ideal params, new inputs and model arch
a neural network arch takes in a matrix of floats which can represent any kind of parameter you want
It can generalize to many problems and also complex problems like image classification
from cat image to label cat is also a data transformation
2017 - 2022
Pre-training
Google takes a large corpus of data to create initial parameters
then people can take the initial params to fine tune the params with theyre own data (fine tuning transfer learning)
we can piggy back on object detection models for general recognition to help identify specific options.
we can leverage the initial recognition parameters to fine tune our own model
The bulk of the model arch goes into extraction of sub features to the end of the network to do a task specific classification
If we cut it off just before convergence to a prediction we get a feature vector/embedding
the initial params are the weights of the nodes
were just taking a matrix or tensor and transforming it into a different size matrix or tensor
image in -> one of 7 labels out
text in label out
2022
We don't need pre-trained models for image classification blah blah blah, we can make the more general
take image, chop a bit out, see if you can fill it in.
take sentence see if you can fill it in.
autocomplete, fill in the rest
These are meta tasks that no one rly cares about but we could start with them to do task specific things like spam detections
buut if we use the whole internets data we could auto complete whatever we want
Fill in the rest of : " a great blog about go is: " and then it "fills in" a whole blog
2022- On Gen AI
Pre training to params which are used to fine tune, now we take a fine tuned model and give it a prompt and infer an output ( chatgpt )
Training is compute heavy, prompting is just a function call
Pre training will happen on a meta task like auto complete
then fine tuning has a curated set of input and outputs (alignment data set) special format for the auto complete
awdawdawd wadiawda awdaw
train the autocomplete to complete in a certain format
Our prompt needs to use the same format
They take our general prompt, shove it into their format, then call the inference code with that format then poop out the output
Theres some pre-processing that happens on your prompt ( censorship re formatting) then output then post process ( more censoring formatting )
using your own model you can strip this pre and post processing ( dodge moderation )
the string prompt in translated into a matrix of vocabulary indicies (tokenized), the output is also a matrix of vocab that is translated back to a string
for a prompt it has a series of next possible tokens and fills in the most likely one
then it repeats with the extended prompt
it is somewhat unexpected that a model doing this dumb generation of next probable tokens.
when does the model know to stop
when you get to a special token stop generating new tokens
EOS token id, special token that is the end of generation.
Pure chance that the stop character is generated.
Keep in mind
It predicts token
It requires pre-processing and formatting
one new line character in the prompt can change the weights and result in different probabilities for next token
- Iterative generation, one token generation after another
we can stream part of the iterative process to give early output to the user
Potentially high cost or time
The outputs is based upon what people have written about on the internet, the context of the prompt
the training and prompt drive the probabilities of output
false information in the training set can affect the output they don't really have grounding in reality, just grounding on the training data
When asked about the latest presidential debate, the latest events aren't part of the training set unless you include that in the prompt RAG
If you always pick the most likely word you don't get much interesting
you can change a temp parameter to shuffle the probabilities to get more random/interesting outputs
For most closed models you don't have access the pre and post processing you just access by api
for open models you can download the params
Closed Proprietary
Open, Restricted (Licensing can't use in enterprise product etc)
Open, Permissive (Use it for whatever you want)
Hugging face (github for models and data)
Is a model data or code who knows (people use both license types or make their own)
Gemma 2 has its own license
Gemma-2-9b (gemma 2 Size: 9Billion) Pre-trained
Gemma-2-9b-it (gemma 2 Size: 9Billion) Pre-trained and Fine tuned
Use the fine tuned one generally
There are fine tunes on fine tunes in a family of models
The best model you can use is a fine tune of the fine tune made by the publishers of model (Third party fine tunes)
Without proper naming its hard to understand the "lineage" of a model
Look up representation hacking ( changing parameters of a model manually )
Most problems can be solved with the top four with careful usage
Simple Easy
Basic Prompting
-
Propmt Engineering (few shot, CoT, templates, parameters)
-
Augmentation, Retrieval
-
~Agents
-
Fine-tuning via a close API
-
Fine-tuning an open model
-
Training a model from scratch
Complicated Difficult
Open LLM Leader board Ranking of LLMs
Look for the one that works best for you use case though, the leader boards are generalized
The fine tuning of the fine tuning uses special tokens to indicate when a turn in the conversation has ended (prompt format)
When we don't give the model that format, it could go on and on until it magically generates a stop token
By using the correct prompt format we're more likely to hit those end of turn tokens to be generated