This repo contains the notebooks and slides for the Large Language Models: Application through Production course on edX & Databricks Academy.
Notebooks
- First, add your Git credentials to Databricks. Refer to the Databricks documentation for details.
- Click Repos in the sidebar. Click Add Repo on the top right.
- Copy the "HTTPS" clone URL from GitHub, or copy https://github.com/databricks-academy/large-language-models.git, and paste it into the Git repository URL box. The rest of the fields, i.e. Git provider and Repository name, will be populated automatically. Click Create Repo on the bottom right.
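To illustrate why only the URL has to be entered, here is a minimal Python sketch of how the remaining fields could be inferred from it. The helper `infer_repo_fields`, the `PROVIDERS` mapping, and the dictionary keys are hypothetical names for illustration; this is not how Databricks implements the dialog.

```python
from urllib.parse import urlparse

# Assumed mapping from Git host to the provider label shown in the dialog.
PROVIDERS = {"github.com": "GitHub"}


def infer_repo_fields(git_url: str) -> dict:
    """Hypothetical helper: guess provider and repository name from a clone URL."""
    parsed = urlparse(git_url)
    # The last path segment is the repository, usually with a ".git" suffix.
    name = parsed.path.rstrip("/").rsplit("/", 1)[-1]
    if name.endswith(".git"):
        name = name[: -len(".git")]
    return {
        "git_repository_url": git_url,
        "git_provider": PROVIDERS.get(parsed.hostname, "unknown"),
        "repository_name": name,
    }


fields = infer_repo_fields(
    "https://github.com/databricks-academy/large-language-models.git"
)
print(fields["git_provider"], fields["repository_name"])
```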
- You can download the notebooks from a release by navigating to the releases section on the GitHub page.
- From the releases page, download the .dbc file. This contains all of the course notebooks, with their structure and metadata.
- In your Databricks workspace, navigate to the Workspace menu, click on Home and select Import.
- Using the import tool, navigate to the location on your computer where you downloaded the .dbc file. Once you select the file, click Import, and the files will be loaded and extracted to your workspace.
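If you would rather script the import than use the UI, the sketch below shows how a request body for a .dbc upload might be assembled. It assumes the Databricks Workspace import API accepts `path`, `format`, and base64-encoded `content` fields; verify this against the current API reference before relying on it. The function name, the target path, and the placeholder bytes are hypothetical, and no request is actually sent.

```python
import base64


def build_dbc_import_payload(dbc_bytes: bytes, target_path: str) -> dict:
    """Hypothetical helper: build a request body for importing a .dbc archive."""
    return {
        "path": target_path,  # workspace folder to create (assumed example)
        "format": "DBC",      # archive format, per the assumed API contract
        "content": base64.b64encode(dbc_bytes).decode("ascii"),
    }


# Placeholder bytes stand in for the real downloaded .dbc file contents.
payload = build_dbc_import_payload(
    b"placeholder-archive-bytes",
    "/Users/someone@example.com/large-language-models",
)
print(payload["format"], len(payload["content"]))
```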
Cluster settings
- First, select Single Node.
- This courseware has been tested on Databricks Runtime 13.3 LTS for Machine Learning. If you do not have access to a 13.3 LTS ML Runtime cluster, you will need to install many additional libraries (as the ML Runtime pre-installs many commonly used machine learning packages), and this courseware is not guaranteed to run.
All of the notebooks except LLM 04a - Fine-tuning LLMs and LLM04L - Fine-tuning LLMs Lab run fine on a CPU. We recommend either i3.xlarge or i3.2xlarge (i3.2xlarge will be slightly faster).
For LLM 04a - Fine-tuning LLMs and LLM04L - Fine-tuning LLMs Lab, you will need the Databricks Runtime 13.3 LTS for Machine Learning with GPU. Select the g5.2xlarge GPU instance type.
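The cluster recommendations above can be summarized as a small lookup. This is only a sketch: the function name and dictionary keys are hypothetical, the values are human-readable labels rather than Databricks API identifiers, and the non-GPU notebook name in the usage line is a made-up example.

```python
# Notebooks that require a GPU cluster, as named in the text above.
GPU_NOTEBOOKS = {
    "LLM 04a - Fine-tuning LLMs",
    "LLM04L - Fine-tuning LLMs Lab",
}


def recommended_cluster(notebook: str) -> dict:
    """Hypothetical helper: return the suggested cluster settings for a notebook."""
    needs_gpu = notebook in GPU_NOTEBOOKS
    return {
        "mode": "Single Node",
        "runtime": "13.3 LTS ML (GPU)" if needs_gpu else "13.3 LTS ML",
        # i3.2xlarge is also fine for CPU notebooks and slightly faster.
        "node_type": "g5.2xlarge" if needs_gpu else "i3.xlarge",
    }


print(recommended_cluster("LLM 04a - Fine-tuning LLMs"))
print(recommended_cluster("Some other course notebook"))
```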