/towhee

Open source platform for embedding tasks

Primary LanguagePythonApache License 2.0Apache-2.0

https://towhee.io

X2Vec, Towhee is all you need!

Slack License Language Github Actions Coverage

What is Towhee?

Towhee is a flexible machine learning framework currently focused on computing deep learning embeddings over unstructured data. Built on top of PyTorch and Tensorflow (coming soon™), Towhee provides a unified framework for running machine learning pipelines locally, on a multi-GPU/TPU/FPGA machine (coming soon™), or in the cloud (coming soon™). Towhee aims to make democratize machine learning, allowing everyone - from beginner developers to AI/ML research groups to large organizations - to train and deploy machine learning models.

Key features

  • Easy embedding for everyone: Transform your data into vectors with less than five lines of code.

  • Standardized pipeline: Keep your pipeline interface consistent across projects and teams.

  • Rich operators and models: No more reinventing the wheel! Collaborate and share models with the open source community.

  • Support for fine-tuning models: Feed your dataset into our trainer and get a new model in just a few easy steps.

Getting started

Towhee can be installed as follows:

% pip install -U pip
% pip cache purge
% pip install towhee

Towhee provides pre-built computer vision models which can be used to generate embeddings:

>>> from towhee import pipeline
>>> from PIL import Image

# Use our in-built embedding pipeline
>>> img = Image.open('towhee_logo.png')
>>> embedding_pipeline = pipeline('image-embedding')
>>> embedding = embedding_pipeline(img)

Your image embedding is now stored in embedding. It's that simple.

Custom machine learning pipelines can be defined in a YAML file and uploaded to the Towhee hub (coming soon™). Pipelines which already exist in the local Towhee cache (/$HOME/.towhee/pipelines) will be automatically loaded:

# This will load the pipeline defined at $HOME/.towhee/pipelines/fzliu/resnet50_embedding.yaml
>>> embedding_pipeline = pipeline('fzliu/resnet50_embedding')
>>> embedding = embedding_pipeline(img)

Dive deeper

Towhee architecture

  • Pipeline: A Pipeline is a single machine learning task that is composed of several operators. Operators are connected together internally via a directed acyclic graph.

  • Operator: An Operator is a single node within a pipeline. It contains files (e.g. code, configs, models, etc...) and works for reusable operations (e.g., preprocessing an image, inference with a pretrained model).

  • Engine: The Engine sits at Towhee's core, and drives communication between individual operators, acquires and schedules tasks, and maintains CPU/GPU/FPGA/etc executors.

Design concepts

  • Flexible: A Towhee pipeline can be created to implement any machine learning task you can think of.

  • Extensible: Individual operators within each pipeline can be reconfigured and reused in different pipelines. A pipeline can be deployed anywhere you want - on your local machine, on a server with 4 GPUs, or in the cloud (coming soon™)

  • Convenient: Operators can be defined as a single function; new pipelines can be constructed by looking at input and output annotations for those functions. Towhee provides a high-level interface for creating new graphs by stringing together functions in Python code.