The dstack package and command line tool must be installed with either pip or Conda:
pip install dstack
or
conda install -c dstack.ai dstack
Note that only Python 3 is supported. If you use pip, it is highly recommended to use virtualenv to manage your local environment.
Before you can use the dstack package in your code, you must run the dstack command line tool to configure a dstack profile, where you specify your dstack.ai username and token.
Configuring dstack profiles separately from your code keeps plain-text secret tokens out of the code.
To configure a dstack profile, run the following command:
dstack config add --token <TOKEN> --user <USER>
or simply
dstack config add
In this case, the dstack profile name will be default. You can change it by using the extended syntax of the command:
dstack config add <PROFILE_NAME>
This allows you to configure multiple profiles and refer to them from your code by their names.
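To keep the token out of scripts as well, the command above can be assembled from environment variables. The following is a minimal sketch, not part of dstack: the variable names DSTACK_USER and DSTACK_TOKEN and both helper functions are assumptions made for illustration, and only the --token and --user flags shown above are used.

```python
import os
import subprocess


def build_config_command(env=None):
    """Assemble the `dstack config add` call from environment variables."""
    env = os.environ if env is None else env
    # DSTACK_USER and DSTACK_TOKEN are hypothetical names chosen for this sketch.
    user = env.get("DSTACK_USER")
    token = env.get("DSTACK_TOKEN")
    if not user or not token:
        raise RuntimeError("set DSTACK_USER and DSTACK_TOKEN first")
    return ["dstack", "config", "add", "--token", token, "--user", user]


def configure_default_profile(env=None):
    """Run the assembled command; requires the dstack CLI to be on PATH."""
    subprocess.run(build_config_command(env), check=True)
```

Calling configure_default_profile() then configures the default profile without the token ever appearing in source code.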
By default, the configuration profile is stored in your home directory: $HOME/.dstack/config.yaml.
NOTE
Before version 0.4.2, the config was stored in the working directory. If you are upgrading from an earlier version, do not forget to move the local config into your home directory.
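The move can be scripted. The sketch below assumes you know where the old config file lives in your working directory; the text does not state its exact path, so it is passed in as an argument. Both function names are hypothetical helpers, not part of dstack.

```python
import shutil
from pathlib import Path


def default_config_path(home=None):
    """Return <home>/.dstack/config.yaml, the default location named above."""
    home = Path.home() if home is None else Path(home)
    return home / ".dstack" / "config.yaml"


def migrate_config(old_path, home=None):
    """Move a pre-0.4.2 config file into the home directory.

    Returns the new path, or None if there was nothing to move or a
    config already exists at the destination (it is never overwritten).
    """
    old_path = Path(old_path)
    new_path = default_config_path(home)
    if not old_path.is_file() or new_path.exists():
        return None
    new_path.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(old_path), str(new_path))
    return new_path
```

The function refuses to overwrite an existing home-directory config, so running it twice is harmless.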
If you are behind a proxy, it may be useful to disable the SSL certificate check. To do that, pass the --no-verify option for the selected profile on the command line.
See the CLI Reference for more information about the command line tools, or type dstack config --help.
From version 0.4 it is possible to use a local version of dstack server.
To start it, use the following command:
dstack server start
This command installs the latest version of the server (if it is not already installed) and starts it. If the JAVA_HOME environment variable is set and the JDK version it points to is compatible with the server, that JDK is used. If the installer cannot find JAVA_HOME, or the JDK version is incompatible with the current server version, it downloads a compatible version itself. To update the server, use dstack server update.
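To see in advance whether the installer is likely to find a JDK through JAVA_HOME, you can check the variable yourself. This is only an illustrative sketch, not the installer's actual logic: it tests whether a java executable exists under JAVA_HOME and does not check version compatibility, because the compatible versions are not stated here. The function name is hypothetical.

```python
import os
from pathlib import Path


def java_from_java_home(env=None):
    """Return the java executable under JAVA_HOME, or None if it is not usable."""
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if not java_home:
        return None
    # Look for the launcher under its usual POSIX and Windows names.
    for name in ("java", "java.exe"):
        candidate = Path(java_home) / "bin" / name
        if candidate.is_file():
            return candidate
    return None
```

A result of None suggests the installer will fall back to downloading a JDK on its own.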
Follow the instructions that the server prints in the terminal.
Use dstack server --help for more information.
Once the dstack profile is configured, you can publish plots from your Python program or Jupyter notebook. Consider the simplest example, a line plot made with the matplotlib library. Bokeh and plotly plots can be published in the same way:
import matplotlib.pyplot as plt
from dstack import push_frame
fig = plt.figure()
plt.plot([1, 2, 3, 4], [1, 4, 9, 16])
push_frame("simple", fig, "My first plot")
In some cases, you want plots that are interactive and change when the user changes their parameters. Suppose you want to publish a line plot that depends on the value of the parameter Coefficient (the slope):
import matplotlib.pyplot as plt
from dstack import create_frame
def line_plot(a):
    # draw y = a * x for x in 0..20 on fixed axes
    xs = range(0, 21)
    ys = [a * x for x in xs]
    fig = plt.figure()
    plt.axis([0, 20, 0, 20])
    plt.plot(xs, ys)
    return fig
frame = create_frame("line_plot")
coeff = [0.5, 1.0, 1.5, 2.0]
# commit one plot per coefficient value
for c in coeff:
    frame.commit(line_plot(c), f"Line plot with the coefficient of {c}", Coefficient=c)
# publish the frame with all committed plots
frame.push()
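When a plot depends on more than one parameter, the same commit loop can run over every combination of values. The sketch below only builds those combinations and does not call dstack. The helper parameter_grid, the second parameter Intercept and the function make_plot mentioned in the comment are hypothetical names introduced for illustration.

```python
from itertools import product


def parameter_grid(**values):
    """Yield one dict per combination of the given parameter values."""
    names = list(values)
    for combo in product(*(values[name] for name in names)):
        yield dict(zip(names, combo))


grid = list(parameter_grid(Coefficient=[0.5, 1.0, 1.5, 2.0], Intercept=[0, 5]))
for params in grid:
    description = ", ".join(f"{name}={value}" for name, value in params.items())
    # With dstack installed, each combination would become one commit, e.g.:
    # frame.commit(make_plot(**params), description, **params)
    print(description)
```

Each dictionary maps directly onto the named arguments of frame.commit, so the number of commits equals the number of combinations.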
If a parameter's name contains space characters, the params dictionary argument must be used, e.g.:
frame.commit(my_plot, "My plot description", params={"My parameter": 0.02})
You can also combine the two approaches, which is especially useful for complex frames with multiple parameters. In this case, parameters passed as named arguments are merged into the params dictionary. So, the following line
frame.commit(my_plot, "My plot description", params={"My parameter": 0.02}, other=True)
produces the same result as this one:
frame.commit(my_plot, "My plot description", params={"My parameter": 0.02, "other": True})
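The merge can be pictured with a few lines of plain Python. This is a sketch of the behaviour described above, not dstack's implementation, and merge_params is a hypothetical name. How dstack resolves a key that appears both in params and as a named argument is not stated here. In this sketch the named argument wins, which is an assumption.

```python
def merge_params(params=None, **named):
    """Combine a params dict with named arguments into one dictionary."""
    merged = dict(params or {})
    # Assumption of this sketch: named arguments override duplicate keys.
    merged.update(named)
    return merged


combined = merge_params(params={"My parameter": 0.02}, other=True)
explicit = merge_params(params={"My parameter": 0.02, "other": True})
print(combined == explicit)
```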
You can call push with a message to add information related to this particular revision: push("Fix log scale"). The push_frame function accepts a message as well.
The dstack package can be used not only to publish plots from popular visualization packages, but also to publish pandas data frames. It works the same way as with plots: pass a pandas data frame object instead of a plot. Here is an example:
import pandas as pd
from dstack import push_frame
raw_data = {"first_name": ["John", "Donald", "Maryam", "Don", "Andrey"],
            "last_name": ["Milnor", "Knuth", "Mirzakhani", "Zagier", "Okunkov"],
            "birth_year": [1931, 1938, 1977, 1951, 1969],
            "school": ["Princeton", "Stanford", "Stanford", "MPIM", "Princeton"]}
df = pd.DataFrame(raw_data, columns=["first_name", "last_name", "birth_year", "school"])
push_frame("my_data", df, "DataFrame example")
In some cases you want not only to store a dataset but also to retrieve it. You can pull a data frame object from the stack:
import pandas as pd
from dstack import pull
df = pull("my_data")
As with plots, you can use parameters for data frames too. You can also use data frames and plots in the same frame (with certain parameters). This works with Series as well.
You can also push and pull GeoDataFrame from GeoPandas:
import geopandas
import pandas as pd
from dstack import push_frame, pull
df = pd.DataFrame({'City': ['Buenos Aires', 'Brasilia', 'Santiago', 'Bogota', 'Caracas'],
                   'Country': ['Argentina', 'Brazil', 'Chile', 'Colombia', 'Venezuela'],
                   'Latitude': [-34.58, -15.78, -33.45, 4.60, 10.48],
                   'Longitude': [-58.66, -47.91, -70.66, -74.08, -66.86]})
gdf = geopandas.GeoDataFrame(
    df, geometry=geopandas.points_from_xy(df.Longitude, df.Latitude))
push_frame("my_first_geo", gdf)
To pull the GeoDataFrame object, just call my_gdf = pull("my_first_geo").
It is also possible to store ML models using push and pull. Currently, popular ML frameworks and libraries such as PyTorch, TensorFlow and scikit-learn are supported.
Suppose you have a PyTorch model, for example a linear one:
import torch
from dstack import push_frame
from dstack.torch.handlers import TorchModelEncoder
# define a new model
class LinearRegression(torch.nn.Module):
    def __init__(self, input_size, output_size):
        super(LinearRegression, self).__init__()
        self.linear = torch.nn.Linear(input_size, output_size)
    def forward(self, x):
        out = self.linear(x)
        return out
model = LinearRegression(1, 1)
# here you are training the model
for epoch in range(100):
    ...
# to avoid compatibility issues we will store only model weights
TorchModelEncoder.STORE_WHOLE_MODEL = False
# and finally push the model
push_frame("my_torch_model", model, "My first PyTorch model")
We stored only the model weights, so to pull the model we must provide the model class to the decoder, because the pull method cannot guess which particular class to use. The following example shows a common pattern for using pull in this case:
from dstack.torch.handlers import TorchModelWeightsDecoder
from dstack import pull
my_model = pull("my_torch_model", decoder=TorchModelWeightsDecoder(LinearRegression(1, 1)))
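Why the decoder needs a model instance can be seen in a framework-free analogy. The sketch below uses neither PyTorch nor dstack. TinyLinear is a hypothetical class that mimics the state_dict and load_state_dict pattern of PyTorch modules: the stored weights are plain values with no record of the class that produced them, so the caller has to supply an object to load them into.

```python
class TinyLinear:
    """A stand-in for a model class: y = weight * x + bias."""

    def __init__(self, weight=0.0, bias=0.0):
        self.weight = weight
        self.bias = bias

    def state_dict(self):
        # Only the numeric parameters are exported, not the class itself.
        return {"weight": self.weight, "bias": self.bias}

    def load_state_dict(self, state):
        self.weight = state["weight"]
        self.bias = state["bias"]

    def predict(self, x):
        return self.weight * x + self.bias


trained = TinyLinear(weight=2.0, bias=1.0)
stored = trained.state_dict()   # what a weights-only store would keep

restored = TinyLinear()         # the caller must supply the instance
restored.load_state_dict(stored)
print(restored.predict(3.0))
```

Passing LinearRegression(1, 1) to TorchModelWeightsDecoder plays the same role as constructing TinyLinear() here.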
In the case of TensorFlow (only version 2 is supported), let's use predefined models to show how to work with them (for custom models, the technique is the same as for PyTorch, described above):
from dstack import push_frame
import tensorflow as tf
d = 30
model = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=(d,)),
    tf.keras.layers.Dense(1, activation="sigmoid")
])
# train the model here
# push the model
push_frame("my_tf_model", model, "My first TF model")
To pull the model, simply call pull. Because the model is a standard one, no additional information is required:
from dstack import pull
model1 = pull("my_tf_model")
In the case of scikit-learn, everything is as simple as in the TensorFlow case:
from sklearn.linear_model import LinearRegression
from dstack import push_frame
# create a simple linear regression model
model = LinearRegression()
# train the model as usual
# push it
push_frame("my_linear_model", model, "My first linear model")
To pull the model in this case, call pull("my_linear_model").
For more details on the API and code samples, check out the docs.