How to train models?
superchargez opened this issue · 1 comment
Excited to see a new product coming out to compete with expensive GPUs that are not particularly designed for deep learning tasks. However, it would be nice to see examples of how these cards are utilized and how they compare to GPUs (both other 8 GB VRAM cards and high-end ones like the 3090 and AMD GPUs).
What I really want to know, however, is how to train models from scratch or fine-tune them -- not only the ones in the supported list [BERT, ResNet, Whisper, and UNet] but arbitrary architectures (other transformers, RNNs, dense connections, and combinations of them). If these cards (e75, e150) can be used just like GPUs -- plug one in, install the drivers, and you are good to go -- then example notebooks would not be necessary. Though I'd still want to see comparisons.
Thanks for your feedback @superchargez -- we're excited too!
We will have some benchmark comparisons soon, along with the public release of our benchmarking framework, which will let users test models for themselves and even contribute to optimization efforts.
W.r.t. training -- we have some basic functionality and APIs available in the docs. There is still work to do on the backend to enable additional model architectures and training optimizations; however, these APIs will give you a good sense of how the top-level user interface works.
As an example, the simplest API is a single-line call in which all forward, backward, and optimizer updates happen under the hood:
```python
pybuda.run_training(epochs=1, steps=1, checkpoint_queue=checkpoint_q, loss_queue=loss_q)
```
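To make that one-liner concrete, here is a minimal sketch of the surrounding setup. This is not official sample code: the device and module placement calls (`TTDevice`, `CPUDevice`, `place_module`, `place_loss_module`), the `pybuda.optimizers.SGD` constructor, and the toy model and loss are my assumptions based on the PyBuda docs, and the exact signatures may differ in your version.

```python
import queue
import torch
import pybuda

# Hypothetical toy model and loss -- stand-ins for a real network.
model = torch.nn.Linear(32, 32)
loss_fn = torch.nn.L1Loss()

# Assumed layout, loosely following the PyBuda docs: the model runs on a
# Tenstorrent device, the loss module on the host CPU.
tt0 = pybuda.TTDevice("tt0", optimizer=pybuda.optimizers.SGD(learning_rate=0.1))
tt0.place_module(pybuda.PyTorchModule("model", model))
cpu0 = pybuda.CPUDevice("cpu0")
cpu0.place_loss_module(pybuda.PyTorchModule("loss", loss_fn))

# Stage one micro-batch of activations and matching targets.
tt0.push_to_inputs(torch.rand(1, 32))
cpu0.push_to_target_inputs(torch.rand(1, 32))

# Host-side queues that receive weight checkpoints and per-step losses.
checkpoint_q = queue.Queue()
loss_q = queue.Queue()

# The one-liner from above: forward, backward, and optimizer updates
# all happen under the hood.
pybuda.run_training(epochs=1, steps=1, checkpoint_queue=checkpoint_q, loss_queue=loss_q)

print("loss:", loss_q.get())
```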
We also have APIs that give the user more control over the training loop, with functions such as `run_forward()`, `run_backward()`, `run_optimizer()`, etc.
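A sketch of what that finer-grained loop might look like, reusing the devices and queues from the previous snippet; the `input_count`, `zero_grad`, and `checkpoint` keyword arguments are assumptions drawn from the PyBuda docs rather than guarantees:

```python
# Manual training loop using the finer-grained APIs. Reuses tt0, cpu0, and
# loss_q from the run_training sketch above; keyword arguments are assumed
# from the PyBuda docs and may differ in your version.
num_steps = 4  # arbitrary choice for this sketch

for step in range(num_steps):
    # Stage one micro-batch of activations and targets.
    tt0.push_to_inputs(torch.rand(1, 32))
    cpu0.push_to_target_inputs(torch.rand(1, 32))

    pybuda.run_forward(input_count=1)                   # forward pass on the device
    pybuda.run_backward(input_count=1, zero_grad=True)  # backward pass, gradients zeroed first
    pybuda.run_optimizer(checkpoint=True)               # optimizer step, emit a checkpoint

    print(f"step {step}: loss = {loss_q.get()}")
```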
Here are some model examples (inference only for now) that we will update regularly as new features and models become available: tt-buda-demos