Session | Topic | Approximate Start Time (PST) | Resources |
---|---|---|---|
1 | Welcome! | 1:00 | |
2 | Setting up your computing environment | 1:05 | |
3 | Storytime - the summer project that took 50 years | 1:15 | |
4 | Getting image classification results fast with fastai | 1:25 | |
5 | Bounding box detection | 1:50 | |
| | Break | 2:10 | |
6 | Semantic segmentation | 2:20 | |
7 | Deploying models with docker and render | 2:40 | |
8 | Q&A | 3:10 | |
9 | (Optional, if time) GANs | | Big ole notebook |
10 | (Optional, if time) Reflection on studying and working in AI in 2020 | | |
Train and deploy deep learning computer vision models, and have some fun along the way :)
Installing the software you need to train deep learning models can be difficult. For the purposes of this workshop, we're offering 3 recommended methods of setting up your computing environment. Your level of experience and access to machines should help you determine which approach is right for you.
No. | Option | Pros | Cons | Cost | Instructions |
---|---|---|---|---|---|
1 | Google Colab | Virtually no setup required, start coding right away! | GPUs not always available, limited session times, limited RAM | Free! There's also a paid tier at $10/month | Colab Setup |
2 | Your Own Linux GPU Machine | No recurring cost, complete control over hardware. | High up-front cost, takes time to configure. | $1000+ fixed up front cost | Linux Setup |
3 | Virtual Machine | Highly configurable & flexible, pay for the performance level you need | Can be difficult to configure, terminal-only interface | Starts at ~$1/hour | VM Setup |
Google Colab is delightfully easy to set up. All you really need is a Google account. Clicking one of the "Open in Colab" links above should take you directly to that notebook in Google Colab, ready to run. The only configuration change you'll need to make is your runtime type: click the "Runtime" menu at the top of your notebook, select "Change runtime type", and choose "GPU" as your hardware accelerator.
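Once the GPU runtime is selected, you can confirm Colab actually assigned you a GPU by running a quick check in a notebook cell (a minimal sanity check, assuming the notebooks use PyTorch/fastai as their backend):

```python
import torch

# Should print True, plus the name of the GPU Colab assigned you
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```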
After doing this for a while, my preferred configuration is training models on my own Linux GPU machine. This can require some up front investment, but if you're going to be training a lot of models, having your own machine really makes your life easier.
2.2.1 Install Anaconda Python
2.2.2 Clone this repository
git clone https://github.com/stephencwelch/dsgo-dl-workshop-summer-2020
2.2.3 (Optional) Create conda environment
conda create -n dsgo-cv python=3.7
conda activate dsgo-cv
2.2.4 Install packages
cd dsgo-dl-workshop-summer-2020
pip install -r requirements.txt
2.2.5 Launch Jupyter
jupyter notebook
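If Jupyter launches but you'd like to double-check the install before training, a quick sanity check (a sketch, assuming requirements.txt installs fastai and a CUDA-enabled PyTorch) is:

```python
import fastai
import torch

print(fastai.__version__)         # confirms fastai imported from the workshop environment
print(torch.cuda.is_available())  # should be True if your NVIDIA driver and CUDA setup are working
```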
Virtual machines provide a nice, highly configurable platform for data science tasks. I recommend following the fastai server setup guide for the cloud platform of your choice.
Computer Vision has a very interesting history. Its roots go all the way back to the beginning of computing and Artificial Intelligence. In those early days, it was unknown just how easy or difficult it would be to recreate the function of the human visual system. A great example of this is the 1966 MIT Summer Vision Project. Marvin Minsky and Seymour Papert, co-directors of the MIT AI Laboratory, began the summer with some ambitious goals:
Minsky and Papert assigned Gerald Sussman, an MIT undergraduate student, as project lead, and set up specific goals for the group around recognizing specific objects in images and separating these objects from their backgrounds.
Just how hard is it to achieve the goals Minsky and Papert laid out? How has the field of computer vision advanced since that summer? Are these tasks trivial now, 50+ years later? Do we understand how the human visual system works? Just how hard is computer vision, and how far have we come?
Interestingly, computer vision turns out to be significantly harder than people first expected. Part of the challenge here is that vision, like other processes that involve the brain, can be a bit hard to pin down.
What exactly does it mean to see? To have vision?
You’re probably using your own vision system right now to read these words, but if I asked you to break down piece by piece how exactly your brain is processing the light that hits your retina into meaningful information, you would have a really tough time.
The vision researchers Peter Hart and Richard Duda had a really nice way of putting this when they wrote one of the first computer vision books:
"Paradoxically, we are all expert at perception, but none of us knows much about it."
Now, as you may know, it’s taken longer than we thought, but we have made some good progress in computer vision. Today, the computer vision systems we’ve built are even better than humans at certain tasks.
In fact, in today's workshop, we'll achieve exactly what Minsky and Papert set out to do. And what I think makes this really interesting is that just 10 years ago, this would not have been possible. You see, it really took around 50 years to achieve the goals of the MIT summer project.
Now, just because it took 50 years to achieve these goals, this does not mean that we should only pay attention to recent breakthroughs. Computer vision has a rich and detailed history that deeply informs the work we see today. One aspect we would be remiss not to briefly discuss is the tradeoff between analytical and empirical techniques.
Throughout the history of computer vision (and computation and philosophy), we've seen a natural oscillation and competition between techniques that are grounded in reason (analytical) and techniques that are grounded in observation or data (empirical). Today we live in a time that is very much dominated by empiricism. Decisions must be data driven. In machine learning and computer vision, we see huge breakthroughs from empirical techniques that learn from data, such as deep learning.
So as we dive into building and training our own deep learning models, just remember that this is only one approach. We've seen huge performance increases from empirical approaches recently, and that's what we'll be spending our time on here.
To solve the original computer vision problem using an empirical approach, we're going to need some data. We'll be using a fun little dataset called BBC-1k. This dataset was collected in the spirit of the original MIT summer project, and contains 1000 images of bricks, balls, and cylinders against cluttered backgrounds.
You can download the dataset here, or with the download script in the util directory of this repo:
python util/get_and_unpack.py -url http://www.welchlabs.io/unccv/deep_learning/bbc_train.zip
The BBC-1k dataset includes ~1000 images with classification, bounding box, and segmentation labels. Importantly, each image contains only one brick, ball, or cylinder.
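Once the archive is downloaded and unpacked, it can be worth a quick sanity check that everything made it to disk. This is only a sketch; the directory name below (`bbc_train`) and the `.jpg` extension are assumptions about how the download script unpacks the archive, so adjust them to match what you actually see:

```python
from pathlib import Path
from PIL import Image

data_path = Path("bbc_train")              # assumed unpack location; adjust to your setup
images = sorted(data_path.rglob("*.jpg"))  # grab every image, regardless of subfolder layout

print(f"Found {len(images)} images")       # should be roughly 1000 for BBC-1k
Image.open(images[0]).show()               # open the first image to confirm it loads correctly
```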
Instructions for sections 4-6 are contained inside the image classification, bounding box detection, and semantic segmentation notebooks. You can quickly run these notebooks in Google Colab using the links above, or you can run them on your own machine by launching your own notebook server:
cd dsgo-dl-workshop-summer-2020
jupyter notebook
Alright, enough training, let's get something into production! We'll be borrowing some deployment code from fastai, and deploying our model as a web service using docker and render.
7.1 On the github website, create a fork of this repository. We'll be directly integrating render with github to pull in the code for deployment.
7.2 (Optional) Clone your forked repository to your machine.
7.3 We'll be deploying the brick/ball/cylinder classifier we trained earlier. You'll need to point your web server at the correct URL to download the model you trained. To do this, you'll modify `export_file_url` in `app/server.py`. The application is set up to pull from Google Drive, so if you ran the classification notebook from Google Colab, you should be all set. If you ran in a VM or locally, you may need to manually upload your model weights `.pkl` file to Google Drive. Once you've done that, create a share-able link to the file in the Google Drive web app, and modify the permissions to allow anyone with the link to view the file. You can then modify `export_file_url` in `app/server.py`. Be careful not to completely overwrite `export_file_url` with your copied link; instead, only replace the id portion with the id from the share-able link. Your id should start after `file/d/` and end before `/view` in your link (see the sketch below).
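For example, if your share-able link looks like `https://drive.google.com/file/d/<FILE_ID>/view?usp=sharing`, the only piece you need is `<FILE_ID>`. A minimal sketch of pulling it out (the link below is a made-up placeholder, not a real file):

```python
# Placeholder share link -- substitute the one you copied from Google Drive
share_link = "https://drive.google.com/file/d/1AbCdEfGhIjKlMnOpQrStUv/view?usp=sharing"

# The id is the path segment between "file/d/" and the next "/"
file_id = share_link.split("file/d/")[1].split("/")[0]
print(file_id)  # paste this id into export_file_url in app/server.py
```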
7.4 (Optional) If you cloned your repository locally, you can test your web-service locally by running:
cd dsgo-dl-workshop-summer-2020
python app/server.py serve
and navigating your browser to http://0.0.0.0:5000. If your app is working correctly, you should see a website like the one shown below. You can upload an image (there are some example images in /graphics), and analyze it!
7.5 (Optional) If you cloned your repository locally, you can also build the docker container that render will use to deploy your application. I find that testing docker containers locally can be really helpful for smoking out potential issues before you get too close to production. If you have docker installed, build the container with `docker build`:
cd dsgo-dl-workshop-summer-2020
docker build -t bbc-classifier .
Run the container with:
docker run --rm -it -p 5000:5000 bbc-classifier
Navigate your browser to http://0.0.0.0:5000 and see if your app is working!
7.6 Deploy on render. Set up a render account (no credit card required). Create a new web service, and point it at your forked repository. Watch the logs as your container builds - so cool! Once your build is complete, point your browser to https://YOUR_APP_NAME.onrender.com/. If everything is working properly, you should have a deep learning inference app ready to go!