Template for Data Science, Machine Learning and Deep Learning projects.
Clone the repository into the desired folder:
- With SSH

  ```bash
  git clone git@github.com:LucasVmigotto/.datascience.git
  ```

- With HTTPS

  ```bash
  git clone https://github.com/LucasVmigotto/.datascience.git
  ```

- With GitHub CLI

  ```bash
  gh repo clone LucasVmigotto/.datascience
  ```
This template project aims to help bootstrap the development of data science projects by creating an environment with the tools commonly required, such as a Linux operating system, Jupyter Notebooks, and LLM models.
Although you can clone the repository and get started with Docker alone, it is highly recommended that you take advantage of Visual Studio Code's excellent support for container-based development through the Dev Containers extension.
Inside the `.devcontainer` folder there is a `devcontainer.json` specification file that takes care of providing all the tools listed above for a data science project. If needed, you can disable individual services that are not relevant to your scenario: just comment out, under the `runServices` key, the services you do not want to be started with the development container, as in the sketch below.
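For illustration only, a trimmed `runServices` list might look like the following sketch; the service names here are assumptions and should match the ones defined in this repository's Docker Compose file:

```jsonc
{
  // Other devcontainer.json keys omitted for brevity
  "runServices": [
    "jupyter"          // keep the Jupyter service running
    // "ollama",       // commented out: Ollama will not be started
    // "open-webui"    // commented out: Open WebUI will not be started
  ]
}
```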
Copy and rename the `.env.example` file to `.env`. Customize the values if necessary.
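A minimal way to do that from the repository root:

```bash
# Create your local environment file from the provided example
cp .env.example .env
```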
- Create a Docker volume named `ollama`:

  ```bash
  docker volume create ollama
  ```

  By creating a main volume, you will be able to share the downloaded models among other Ollama containers.
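As a sketch of how that sharing works, assuming the official `ollama/ollama` image (which stores models under `/root/.ollama` and serves on port 11434), another container could mount the same volume:

```bash
# Hypothetical extra Ollama container reusing the shared "ollama" volume;
# it sees the same set of pulled models as the Compose service
docker run -d --name ollama-extra \
  -v ollama:/root/.ollama \
  -p 11435:11434 \
  ollama/ollama
```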
This is a basic environment prepared for starting application development. It comes with Python, `git`, and `zsh` with Oh My Zsh!
Jupyter
With this service, you can connect to a Jupyter environment and use it to test ideas in Jupyter Notebooks. When editing an `.ipynb` file inside Visual Studio Code, you can connect to the Jupyter server simply by informing the connection URL `http://jupyter:8888/tree`.
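If you want to confirm that the Jupyter server is reachable from inside the development container before connecting, a quick probe against the same URL (hostname and port taken from above) could be:

```bash
# Should print 200 if the Jupyter server is up (the tree or login page responds)
curl -sL -o /dev/null -w "%{http_code}\n" http://jupyter:8888/tree
```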
This service has little direct use on its own, aside from being consumed from a notebook (for example, a notebook querying a model). However, it is possible to interact with the service directly through its CLI with the following command:
```bash
# Or any other model that has already been pulled
docker compose exec ollama ollama run llama3
```
The first time it starts, the service has no model downloaded. To download one, you can make a request to Ollama's API to pull the desired model. The following example shows how to pull Llama 3 8B:
```bash
curl http://localhost:11434/api/pull \
  -d '{ "model": "llama3" }'
```
This example assumes that the command is executed in a terminal on the host. If you want to execute it in a terminal inside Visual Studio Code (that is, inside the development container), change the request URL to `http://ollama:11434/api/pull`; in this case, it is necessary to use the service's hostname inside the Docker network that binds all the services together, as sketched below.
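From a terminal inside the development container, the same pull would look like this (only the hostname changes; the port stays Ollama's default, 11434):

```bash
# Run from inside the dev container, where "ollama" resolves on the Compose network
curl http://ollama:11434/api/pull \
  -d '{ "model": "llama3" }'
```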
You can access `localhost:8080` to open the Open WebUI visual interface and test the models pulled with Ollama.
You can access `localhost:7474` to use the graphical interface and try some queries with the Cypher query language.
You can access `localhost:9200` to verify that the service is running.
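Assuming the service listening on port 9200 is Elasticsearch (9200 is its default REST port; adjust if this project maps it differently), a quick check from the host could be:

```bash
# Should return a small JSON document with the cluster name and version
curl http://localhost:9200
```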
List all Docker containers:

```bash
docker ps -a
```
Remove Docker Compose containers:

```bash
docker compose rm --stop -f
```
Prune containers:

```bash
docker container prune --force
```
List all Docker images:

```bash
docker image ls -a
```
Remove dangling Docker images:

```bash
docker image rm -f $(docker image ls --filter "dangling=true" -aq)
```
List all Docker volumes:

```bash
docker volume ls
```
Prune Docker volumes:

```bash
docker volume prune --force
```
WARNING: If you want to remove ALL Docker images, just remove the `--filter` flag and its argument:

```bash
docker image rm -f $(docker image ls -aq)
```