General notes taken from GopherCon 2024

Intro to Generative AI with Go

tinyurl.com/gc-ai-24

Model Architecture:

2012 - 2017

Function written in code

If the file name is "cat", it's a cat

If 50% of the pixels are orange, it's a cat

Take in an image and parameters and create an output

Input data goes through a series of parameterized transformations to create an output

Model training code: takes inputs and the model architecture and produces the ideal params:

For each parameter, fit it so the model reproduces the known outputs

Model inference code:

Takes in ideal params, new inputs and model arch
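
A minimal Go sketch of that training/inference split, using a toy one-parameter-plus-bias linear model (all names and numbers are made up for illustration, not from the talk):

```go
package main

import "fmt"

// predict is the "model inference code": ideal params + new input -> output.
func predict(weight, bias, x float64) float64 {
	return weight*x + bias
}

// fit is the "training code": known inputs/outputs + the model architecture
// (here, a linear model) -> ideal params. A few rounds of gradient descent
// on squared error stand in for real training.
func fit(xs, ys []float64) (weight, bias float64) {
	const lr = 0.05
	for step := 0; step < 1000; step++ {
		var gw, gb float64
		for i := range xs {
			err := predict(weight, bias, xs[i]) - ys[i]
			gw += err * xs[i]
			gb += err
		}
		weight -= lr * gw / float64(len(xs))
		bias -= lr * gb / float64(len(xs))
	}
	return weight, bias
}

func main() {
	xs := []float64{1, 2, 3, 4}
	ys := []float64{3, 5, 7, 9} // roughly y = 2x + 1
	w, b := fit(xs, ys)
	fmt.Printf("learned params: w=%.2f b=%.2f, predict(5)=%.2f\n", w, b, predict(w, b, 5))
}
```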

A neural network architecture takes in a matrix of floats, which can represent any kind of input you want

It can generalize to many problems, including complex problems like image classification

Going from a cat image to the label "cat" is also just a data transformation

2017 - 2022

Pre-training

Google takes a large corpus of data to create initial parameters

Then people can take the initial params and fine-tune them with their own data (fine-tuning / transfer learning)

We can piggyback on object detection models trained for general recognition to help identify specific objects

We can leverage the initial recognition parameters to fine-tune our own model

The bulk of the model architecture goes into extracting sub-features; the end of the network does the task-specific classification

If we cut the network off just before it converges to a prediction, we get a feature vector / embedding
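
A rough Go sketch of the idea, with made-up functions: `embed` stands in for the bulk of the pre-trained network, and `classify` is the small task-specific head you would swap out or fine-tune:

```go
package main

import "fmt"

// embed stands in for the bulk of a pre-trained network: raw input
// (e.g. image pixels) in, fixed-size feature vector / embedding out.
// The real thing is many layers; this is a single made-up projection.
func embed(pixels []float64) []float64 {
	features := make([]float64, 4)
	for i, p := range pixels {
		features[i%4] += p
	}
	return features
}

// classify is the small task-specific head bolted onto the end: it turns
// the embedding into scores for each label. Fine-tuning mostly adjusts this.
func classify(embedding []float64, headWeights [][]float64) []float64 {
	scores := make([]float64, len(headWeights))
	for label, w := range headWeights {
		for i, e := range embedding {
			scores[label] += w[i] * e
		}
	}
	return scores
}

func main() {
	pixels := []float64{0.1, 0.9, 0.4, 0.2, 0.7, 0.3, 0.8, 0.5}
	v := embed(pixels) // cut the network here and you have a reusable embedding
	head := [][]float64{{1, 0, 0, 0}, {0, 1, 1, 0}} // made-up weights for 2 labels
	fmt.Println("embedding:", v)
	fmt.Println("label scores:", classify(v, head))
}
```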

the initial params are the weights of the nodes

We're just taking a matrix or tensor and transforming it into a different-size matrix or tensor

image in -> one of 7 labels out

text in -> label out

2022

We don't need pre-trained models just for image classification and so on; we can make them more general

Take an image, chop a bit out, see if you can fill it in.

Take a sentence, mask part of it, see if you can fill it in.

autocomplete, fill in the rest

These are meta-tasks that no one really cares about on their own, but we can start with them and then do task-specific things like spam detection

But if we use the whole internet's data we can autocomplete whatever we want

Fill in the rest of: "a great blog about Go is: " and it "fills in" a whole blog post

2022 onward - Gen AI

Pre-training produces params which are used for fine-tuning; now we take a fine-tuned model, give it a prompt, and infer an output (ChatGPT)

Training is compute heavy, prompting is just a function call

Pre-training happens on a meta-task like autocomplete

Then fine-tuning uses a curated set of inputs and outputs (an alignment data set), a special format for the autocomplete

train the autocomplete to complete in a certain format

Our prompt needs to use the same format

They take our general prompt, shove it into their format, call the inference code with that format, and spit out the output

There's some pre-processing that happens on your prompt (censorship, re-formatting), then the output, then post-processing (more censoring/formatting)

Using your own model you can strip this pre- and post-processing (dodge moderation)

The string prompt is translated into a matrix of vocabulary indices (tokenized); the output is also a matrix of vocabulary indices that is translated back into a string
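
A toy word-level illustration in Go; real tokenizers use subword vocabularies (BPE, SentencePiece) with tens of thousands of entries, and these IDs are invented:

```go
package main

import (
	"fmt"
	"strings"
)

// Made-up word-level vocabulary; real models use subword tokenizers.
var vocab = map[string]int{"a": 0, "great": 1, "blog": 2, "about": 3, "go": 4, "is": 5}

var inverse = map[int]string{}

func init() {
	for word, id := range vocab {
		inverse[id] = word
	}
}

// encode: prompt string -> vocabulary indices (what the model actually sees).
func encode(prompt string) []int {
	var ids []int
	for _, w := range strings.Fields(strings.ToLower(prompt)) {
		if id, ok := vocab[w]; ok {
			ids = append(ids, id)
		}
	}
	return ids
}

// decode: model output indices -> string shown back to the user.
func decode(ids []int) string {
	words := make([]string, len(ids))
	for i, id := range ids {
		words[i] = inverse[id]
	}
	return strings.Join(words, " ")
}

func main() {
	ids := encode("a great blog about Go is")
	fmt.Println(ids)         // [0 1 2 3 4 5]
	fmt.Println(decode(ids)) // a great blog about go is
}
```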

For a prompt, the model has probabilities over a series of possible next tokens and fills in the most likely one

then it repeats with the extended prompt

It is somewhat unexpected that a model doing this dumb generation of next probable tokens can produce such useful output

When does the model know to stop?

When you get to a special token, stop generating new tokens

EOS token ID: a special token that marks the end of generation.

It's pure chance when the stop token gets generated.
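
Sketch of that generation loop in Go; `nextTokenProbs` is a stand-in for the actual model call, and the token IDs and probabilities are made up:

```go
package main

import "fmt"

const eosToken = 2 // made-up ID for the special end-of-sequence token

// nextTokenProbs stands in for the real model call: given the tokens so far,
// return a probability for every token in the (tiny, fake) vocabulary.
func nextTokenProbs(tokens []int) []float64 {
	if len(tokens) >= 8 {
		return []float64{0.1, 0.1, 0.8} // eventually EOS becomes most likely
	}
	return []float64{0.4, 0.5, 0.1}
}

// argmax is greedy decoding: always take the single most likely token.
func argmax(probs []float64) int {
	best := 0
	for i, p := range probs {
		if p > probs[best] {
			best = i
		}
	}
	return best
}

func generate(prompt []int, maxTokens int) []int {
	tokens := append([]int{}, prompt...)
	for i := 0; i < maxTokens; i++ {
		next := argmax(nextTokenProbs(tokens))
		if next == eosToken {
			break // special token reached: stop generating
		}
		tokens = append(tokens, next) // extend the prompt and repeat
	}
	return tokens
}

func main() {
	fmt.Println(generate([]int{0, 1}, 32))
}
```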

Keep in mind

It predicts tokens

It requires pre-processing and formatting

One newline character in the prompt can change the tokenized input and result in different probabilities for the next token

Iterative generation, one token after another

We can stream the iterative process to give early output to the user (see the channel sketch after this list)

Potentially high cost or time
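
A natural way to model that streaming in Go is a channel that yields tokens as they are produced; everything below is illustrative, with a sleep standing in for per-token model latency:

```go
package main

import (
	"fmt"
	"time"
)

// streamTokens pretends to be the iterative generation loop, sending each
// token to the caller as soon as it's produced instead of waiting for the
// whole response. The prompt is ignored and the tokens are hard-coded.
func streamTokens(prompt string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for _, tok := range []string{"Go", " is", " a", " great", " language", "."} {
			time.Sleep(100 * time.Millisecond) // stand-in for per-token model latency
			out <- tok
		}
	}()
	return out
}

func main() {
	// The user sees output almost immediately instead of waiting for the
	// full generation to finish.
	for tok := range streamTokens("Tell me about Go") {
		fmt.Print(tok)
	}
	fmt.Println()
}
```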

The output is based upon what people have written on the internet and the context of the prompt

the training and prompt drive the probabilities of output

False information in the training set can affect the output; models don't really have grounding in reality, just grounding in the training data

When asked about the latest presidential debate, the latest events aren't part of the training set unless you include them in the prompt (RAG, retrieval-augmented generation)
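
The core of RAG is just prompt construction: fetch relevant, current text that the training data never saw and paste it in front of the question. A Go sketch with a faked retrieval step (all names and documents here are hypothetical):

```go
package main

import "fmt"

// retrieve stands in for a real retrieval step (vector search, keyword search,
// a news API, ...). The model's training data stops at some cutoff, so recent
// facts have to be supplied through the prompt like this.
func retrieve(question string) []string {
	return []string{
		"Doc 1: summary of last night's debate ...",
		"Doc 2: transcript excerpt ...",
	}
}

// buildPrompt stitches the retrieved context and the user's question into one prompt.
func buildPrompt(question string) string {
	prompt := "Answer using only the context below.\n\nContext:\n"
	for _, doc := range retrieve(question) {
		prompt += doc + "\n"
	}
	return prompt + "\nQuestion: " + question + "\nAnswer:"
}

func main() {
	fmt.Println(buildPrompt("Who won the latest presidential debate?"))
}
```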

If you always pick the most likely word you don't get much that's interesting

You can change a temperature parameter to reshape the probabilities and get more random/interesting outputs
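
Sketch of how temperature is typically applied to the model's raw scores (logits) before sampling: dividing by a low temperature sharpens the distribution toward the top token, a high temperature flattens it. The numbers are made up:

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// softmaxWithTemperature turns the model's raw scores (logits) into
// probabilities. Temperature < 1 sharpens the distribution toward the top
// token; temperature > 1 flattens it, making rarer tokens more likely.
func softmaxWithTemperature(logits []float64, temperature float64) []float64 {
	probs := make([]float64, len(logits))
	var sum float64
	for i, l := range logits {
		probs[i] = math.Exp(l / temperature)
		sum += probs[i]
	}
	for i := range probs {
		probs[i] /= sum
	}
	return probs
}

// sample picks a token index at random according to the probabilities.
func sample(probs []float64) int {
	r := rand.Float64()
	for i, p := range probs {
		if r < p {
			return i
		}
		r -= p
	}
	return len(probs) - 1
}

func main() {
	logits := []float64{2.0, 1.0, 0.5, 0.1}                    // made-up scores for 4 tokens
	fmt.Println("T=0.5:", softmaxWithTemperature(logits, 0.5)) // peaky, predictable
	fmt.Println("T=1.5:", softmaxWithTemperature(logits, 1.5)) // flatter, more variety
	fmt.Println("sampled token:", sample(softmaxWithTemperature(logits, 1.5)))
}
```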

For most closed models you don't have access to the pre- and post-processing; you just access the model via an API

for open models you can download the params

Closed, Proprietary

Open, Restricted (licensing restrictions: can't use in an enterprise product, etc.)

Open, Permissive (Use it for whatever you want)

Hugging Face (GitHub for models and data)

Is a model data or code? Who knows (people use both license types or make their own)

Gemma 2 has its own license

Gemma-2-9b (Gemma 2, size: 9 billion params): pre-trained

Gemma-2-9b-it (Gemma 2, size: 9 billion params): pre-trained and fine-tuned (instruction-tuned)

Use the fine-tuned one, generally

There are fine-tunes on fine-tunes within a family of models

The best model you can use may be a fine-tune of the fine-tune made by the model's publishers (third-party fine-tunes)

Without proper naming it's hard to understand the "lineage" of a model

Look up representation hacking (changing a model's parameters manually)

Most problems can be solved with the top four options below, with careful usage

Simple / Easy

  • Basic prompting

  • Prompt engineering (few-shot, CoT, templates, parameters); see the few-shot template sketch after this list

  • Augmentation, Retrieval

  • ~Agents

  • Fine-tuning via a closed API

  • Fine-tuning an open model

  • Training a model from scratch

Complicated / Difficult
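
As a concrete example of the prompt engineering rung above, a few-shot template is mostly string assembly: a couple of worked examples in front of the real input so the model can pattern-match the task and the output format (the spam example and function names are invented):

```go
package main

import (
	"fmt"
	"strings"
)

// fewShotPrompt builds a prompt from a handful of labelled examples plus the
// new input, so the model can pattern-match the task and the output format.
func fewShotPrompt(examples [][2]string, input string) string {
	var b strings.Builder
	b.WriteString("Classify each message as spam or not-spam.\n\n")
	for _, ex := range examples {
		fmt.Fprintf(&b, "Message: %s\nLabel: %s\n\n", ex[0], ex[1])
	}
	fmt.Fprintf(&b, "Message: %s\nLabel:", input)
	return b.String()
}

func main() {
	examples := [][2]string{
		{"You won a free cruise, click here!", "spam"},
		{"Are we still on for lunch tomorrow?", "not-spam"},
	}
	fmt.Println(fewShotPrompt(examples, "Limited time offer, act now!!!"))
}
```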

Open LLM Leaderboard: a ranking of LLMs

Look for the one that works best for your use case though; the leaderboards are generalized

The fine-tuning of the fine-tuning uses special tokens to indicate when a turn in the conversation has ended (the prompt format)

When we don't give the model that format, it could go on and on until it magically generates a stop token

By using the correct prompt format we're more likely to get those end-of-turn tokens generated
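
As an illustration, the instruction-tuned Gemma models document a turn-based prompt format with special markers roughly along these lines (simplified and from memory; check the model card for the exact template):

```go
package main

import "fmt"

// formatGemmaTurn wraps a user message in the turn markers the instruction-tuned
// Gemma models were fine-tuned on (simplified; check the model card for the
// exact template). Matching this format makes it much more likely the model
// emits the end-of-turn token and stops cleanly.
func formatGemmaTurn(userMessage string) string {
	return "<start_of_turn>user\n" +
		userMessage + "<end_of_turn>\n" +
		"<start_of_turn>model\n"
}

func main() {
	prompt := formatGemmaTurn("Write a haiku about Go.")
	fmt.Print(prompt)
	// Generation then runs until the model produces <end_of_turn> (or EOS).
}
```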