alibaba/pipcook

PipApp: the application framework for machine learning

yorkie opened this issue · 4 comments

The vision of Pipcook is to bring JavaScript developers and engineers into the world of machine learning quickly and seamlessly, so we are responsible for providing APIs that are easy enough to use.

In the Pipcook stack, pipcook-app is where the ML application is defined; it abstracts away duplicated boilerplate and hides the low-level algorithm implementations that impose a learning curve on every ML newcomer.

APIs

Every module represents a type of dataset, and each provides a set of methods for developers.

module ml

This module creates machine learning functions; it provides the core abilities to represent your machine learning application in an intuitive way.

interface ml.Function

To hide the ML details as much as possible, Pipcook lets you declare your machine-learning functions with a specific type, ml.Function, which you can create via the create() function below.

Internally, the Pipcook compiler parses the application, generates the training code from the ml.Function instances, and replaces these slots with model-generated inference.
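
Conceptually, the transformation looks like this (an illustration of the idea only, not the actual compiler output):

// Before compilation: the function body below is only a declaration (a "slot").
const fn = ml.create((img: data.ImageType) => vision.classify(img));

// After compilation: the slot is backed by a trained model, so calling
// fn(img) runs the model's inference instead of the declared body.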

interface ml.FunctionImpl(arg: data.MLType)

This interface describes the machine learning internals of an application. It accepts an argument of type data.MLType as input; the output type is not constrained.

create(fn: ml.FunctionImpl): ml.Function

This creates the above ml.Function from an ml.FunctionImpl object.

const mlfunc: ml.Function = ml.create((input: data.ImageType) => {
  // call other ML Application APIs here and return
});

// ...
mlfunc(new data.ImageType(...)); // call this function anywhere.

module data

This module is to declare all types for your application's I/O.

interface data.MLType

It's the base interface that tells the Pipcook compiler that a type is used for ML.

interface data.ImageType extends data.MLType

It represents the image type for a given ml.Function's I/O.

interface data.TextType extends data.MLType

It represents the text type for a given ml.Function's I/O.
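
For instance, an application can specialize these types for its own inputs. A minimal sketch, assuming the data.ImageType constructor used in the example at the end of this issue (x, y, buffer, width, height):

import { data } from '@pipcook/pipcook-app';

// Hypothetical subtype that pins the image size to 100x100 pixels.
class Thumbnail extends data.ImageType {
  constructor(x: number, y: number, buffer: Buffer) {
    super(x, y, buffer, 100, 100);
  }
}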

module vision

This module provides vision-related functions like image classification and object detection.

interface vision.Position2D

It represents a 2D position for object detection, with the following fields (a declaration sketch follows the list):

  • label {string} the label string representing the object's type.
  • left {number} the left offset of the detected object, in pixels.
  • top {number} the top offset of the detected object, in pixels.
  • height {number} the height of the detected object, in pixels.
  • width {number} the width of the detected object, in pixels.
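
Rendered as a declaration, the interface would look roughly like this (a sketch derived from the field list above):

interface Position2D {
  label: string;  // the detected object's type
  left: number;   // left offset, in pixels
  top: number;    // top offset, in pixels
  height: number; // height of the detected object, in pixels
  width: number;  // width of the detected object, in pixels
}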

classify(img: data.ImageType): string

It recognizes the type of the image and returns the type string.

ml.create((img: data.ImageType) => {
  const label = vision.classify(img);
  return label; // returns the label
});

detect(img: data.ImageType): vision.Position2D[]

It detects targets in a single image and returns the positions and labels of the detected objects.

ml.create((img: data.ImageType) => {
  const objects = vision.detect(img);
  objects.forEach((o) => {
    console.log(o.label, o.left, o.top); // prints the label, left and top.
  });
});

module nlp

This module provides NLP-related functions like text classification and clustering.

interface nlp.Cluster
  • label {string} the label for this cluster.
  • items {string[]} the strings in this cluster.

interface nlp.ClusteringResult
  • clusters {nlp.Cluster[]} all grouped clusters, and each is an object of nlp.Cluster.
  • noises {string[]} all strings labeled as noise.

classify(input: string): string

It recognizes the type of the text and returns the type string.

clustering(inputs: string[]): nlp.ClusteringResult

It clusters the given inputs and returns the result as an nlp.ClusteringResult.
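
A minimal usage sketch (the sample strings and the function name analyzeTexts are illustrative only):

import { ml, nlp, data } from '@pipcook/pipcook-app';

const analyzeTexts: ml.Function = ml.create((input: data.TextType) => {
  // classify() returns the label string for a single text.
  const label = nlp.classify('the screen cracked on arrival');

  // clustering() groups the inputs and separates out the noise.
  const result = nlp.clustering(['slow shipping', 'late delivery', 'love the color']);
  result.clusters.forEach((c) => console.log(c.label, c.items));
  console.log('noise:', result.noises);

  return label;
});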

Anti-APIs

An anti-API is an API that must be hidden from the application user; the list follows:

  • hide the training workflow: the interfaces to train and predict should be invisible.
  • hide the dataset workflow: in the future, developers will use a dedicated tool for dataset processing and validation.
  • hide the model-related APIs: graph structure, parameters, and model validation.
  • hide the serving implementation: every ML application should be serve-able in pipcook-app, so we don't need any extra APIs for serving models.

Example

// example.ts
import { ml, vision, data } from '@pipcook/pipcook-app';
import express from 'express';

class MyImage extends data.ImageType {
  constructor(x: number, y: number, buffer: Buffer) {
    super(x, y, buffer, 100, 100);
  }
}

const listAvatars: ml.Function = ml.create((img: MyImage) => {
  const components = vision.recognizeComponent(img);
  if (!components)
    return false;

  return components.map((item: UIView) => {
    const img = item.toImage() as UIImage;
    return vision.detectFace(img);
  }).filter((avatar: data.FaceType) => {
    return avatar !== null;
  });
});

// use the listAvatars function for your use
const app = express();
app.get('/', (req, res) => {
  const img = new MyImage(req.body.x, req.body.y, req.body.buffer);
  res.json(listAvatars(img).toJSON());
});

Then run the following commands to train:

$ pipcook train example.ts --epoch=5 --no-validation
generated the model at example.ts.im

And run your ML application:

$ pipcook try example.ts
$ pipcook deploy example.ts --eas=xxx

@utkobe I have simplified the modules and APIs, and now we are focusing more on vision and nlp.

To implement the PipApp, the following is what we have to achieve:

  • PipApp Compiler @yorkie
    • compiles the PipApp source (TypeScript) to find the ml.create slots.
    • generates pipelines from the slots found above.
    • generates application source code.
    • generates TypeScript declaration (.d.ts) files for the current environment.
  • PipDaemon @FeelyChau
    • integrates the above compiler and exposes PipApp interfaces for Pipboard and Pipcook Tools.
  • Pipboard @utkobe
    • adds a tab page for PipApp.
      • view pipelines created by PipApp.
      • label and view datasets.
  • Pipcook Tools @FeelyChau
    • supports training, running, and deploying PipApp applications.
  • Costa @yorkie
    • adds the pipcook.method support for plugins.

The plugin matching algorithm of the pipeline generator (a sketch of a generated config follows the list):

  • find the slots via the calls to ml.create.
  • iterate over all slots in each block,
    • find the model signature by the function call to {datatype}.{method}.
    • find the data processing signature by the function call to {datatype}.{method}.
    • find the input and output types of each found model.
    • generate a pipeline config,
      • data collect: chosen by {datatype} and {method}.
      • data access: chosen by the selected model define plugin.
      • data process: chosen by {datatype} and {method} (the data processing signature).
      • model define: chosen by {datatype} and {method}.
      • model train: chosen by the selected model define plugin and {datatype}.
      • model evaluate: chosen by the selected model define plugin and {datatype}.
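
For illustration, here is a config the generator might emit for a slot that calls vision.classify; every plugin name below is a hypothetical placeholder, not a real Pipcook plugin:

// Hypothetical generator output; each stage is chosen by the rules above.
const pipeline = {
  dataCollect:   'image-collector-for-classify',   // by {datatype} and {method}
  dataAccess:    'access-for-chosen-model',        // by the selected model define plugin
  dataProcess:   'image-process-for-classify',     // by the data processing signature
  modelDefine:   'image-classify-model',           // by {datatype} and {method}
  modelTrain:    'train-for-chosen-model',         // by the model define plugin and {datatype}
  modelEvaluate: 'evaluate-for-chosen-model',      // by the model define plugin and {datatype}
};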

Awesome job~~

This is finished at #241.