PipApp: the application framework for machine learning
yorkie opened this issue · 4 comments
The vision of Pipcook is to bring JavaScript developers and engineers into the world of machine learning quickly and seamlessly, which makes us responsible for providing APIs that are easy enough.
In the Pipcook stack, the `pipcook-app` defines the ML application: it abstracts away duplicated boilerplate and hides the low-level algorithm implementations that would otherwise impose a learning curve on every ML newcomer.
APIs
Every module represents a type of dataset, and each provides a set of methods for developers.
module ml
This module creates machine learning functions; it provides the core abilities to represent your machine learning application in an intuitive way.
interface ml.Function
To hide as many ML details as possible, Pipcook lets you declare your machine-learning functions with the specific type `ml.Function`. You can create an `ml.Function` via the `create()` function below.
Internally, the Pipcook compiler parses the application, generates the training code from the `ml.Function` instances, and replaces these slots with model-generated inference calls.
interface ml.FunctionImpl(arg: data.MLType)
This interface describes the machine-learning internals of an application. It accepts an argument of type `data.MLType` as input; the output type is not required.
create(fn: ml.FunctionImpl): ml.Function
This creates the above `ml.Function` from an `ml.FunctionImpl` object.
const mlfunc: ml.Function = ml.create((input: data.ImageType) => {
// call other ML Application APIs here and return
});
// ...
mlfunc(new data.ImageType(...)); // call this function anywhere.
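To make the slot mechanism concrete, here is a hypothetical, self-contained sketch of how `create()` could record the declared implementation and later have it swapped for model-generated inference. This is not the real compiler; the `__replaceImpl` hook is made up purely for illustration:

```typescript
// Hypothetical sketch of an ml.create "slot": create() records the
// developer's implementation, and the compiler would later swap the
// slot's body for a model-generated inference function.
type MLFunction<I, O> = ((input: I) => O) & {
  // hypothetical hook the compiler would use to inject the trained model
  __replaceImpl(impl: (input: I) => O): void;
};

function create<I, O>(impl: (input: I) => O): MLFunction<I, O> {
  let current = impl;
  const slot = ((input: I) => current(input)) as MLFunction<I, O>;
  slot.__replaceImpl = (next) => { current = next; };
  return slot;
}

// Before compilation the slot runs the declared implementation...
const classify = create((text: string) => `declared:${text}`);
console.log(classify('cat')); // declared:cat

// ...after compilation it would run the model-generated inference.
classify.__replaceImpl((text: string) => `inferred:${text}`);
console.log(classify('cat')); // inferred:cat
```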
module data
This module declares all the types for your application's I/O.
interface data.MLType
It's the base interface to tell the Pipcook compiler a type for ML.
interface data.ImageType extends data.MLType
It represents the image type for a given `ml.Function`'s I/O.
interface data.TextType extends data.MLType
It represents the text type for a given `ml.Function`'s I/O.
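The proposal does not spell out the fields of these types. Below is a hypothetical, self-contained sketch of how the hierarchy could look; the field names and constructor parameters are assumptions, loosely inferred from the `MyImage` example later in this issue:

```typescript
// Hypothetical sketch of the data module's type hierarchy.
// The `kind` marker and constructor shapes are assumptions.
interface MLType {
  // marker telling the Pipcook compiler this is an ML I/O type
  readonly kind: string;
}

class ImageType implements MLType {
  readonly kind = 'image';
  constructor(
    public x: number,
    public y: number,
    public buffer: Uint8Array,
    public width: number,
    public height: number,
  ) {}
}

class TextType implements MLType {
  readonly kind = 'text';
  constructor(public text: string) {}
}

const img = new ImageType(0, 0, new Uint8Array(4), 2, 2);
console.log(img.kind); // image
```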
module vision
This module provides vision-related functions like image classification and object detection.
interface vision.Position2D
It represents the 2D position of a detected object:
- `label` {string} the label string representing the object's type.
- `left` {number} the left offset of the detected object, in pixels.
- `top` {number} the top offset of the detected object, in pixels.
- `height` {number} the height of the detected object.
- `width` {number} the width of the detected object.
classify(img: data.ImageType): string
It recognizes the type of an image and returns the type string.
ml.create((img: data.ImageType) => {
const label = vision.classify(img); // returns the label
});
detect(img: data.ImageType): vision.Position2D[]
It detects targets in a single image and returns the position and label of each detected object.
ml.create((img: data.ImageType) => {
const objects = vision.detect(img);
objects.forEach((o) => {
console.log(o.label, o.left, o.top); // prints the label, left and top.
});
});
module nlp
This module provides NLP-related functions like text classification and clustering.
interface nlp.Cluster
- `label` {string} the label for this cluster.
- `items` {string[]} the strings in this cluster.
interface nlp.ClusteringResult
- `clusters` {nlp.Cluster[]} all grouped clusters, each an object of `nlp.Cluster`.
- `noises` {string[]} all strings labeled as noise.
classify(input: string): string
It recognizes the type of a text and returns the type string.
clustering(inputs: string[]): nlp.ClusteringResult
It clusters the given inputs and returns the result as an `nlp.ClusteringResult`.
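Unlike the vision APIs, neither nlp function has an example above, so here is a self-contained sketch of the `nlp.ClusteringResult` shape. The `clustering` body is a toy stand-in (grouping strings by first character, with singletons as noise) purely to illustrate the result structure, not the real model-backed implementation:

```typescript
// Shapes from the proposal above.
interface Cluster {
  label: string;     // the label for this cluster
  items: string[];   // the strings in this cluster
}
interface ClusteringResult {
  clusters: Cluster[]; // all grouped clusters
  noises: string[];    // all strings labeled as noise
}

// Toy stand-in for nlp.clustering(): group by first character,
// treating singleton groups as noise.
function clustering(inputs: string[]): ClusteringResult {
  const groups = new Map<string, string[]>();
  for (const s of inputs) {
    const key = s.charAt(0);
    const bucket = groups.get(key) ?? [];
    bucket.push(s);
    groups.set(key, bucket);
  }
  const result: ClusteringResult = { clusters: [], noises: [] };
  groups.forEach((items, label) => {
    if (items.length > 1) result.clusters.push({ label, items });
    else result.noises.push(items[0]);
  });
  return result;
}

const r = clustering(['apple', 'avocado', 'banana']);
console.log(r.clusters); // [{ label: 'a', items: ['apple', 'avocado'] }]
console.log(r.noises);   // ['banana']
```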
Anti-APIs
An anti-API is an API that must be hidden from the application user. Here is the list:
- hide the training workflow, so the interfaces to train and predict should be invisible.
- hide the dataset workflow; in the future, developers will use a dedicated tool for dataset processing and validation.
- hide the model-related APIs: graph structure, parameters, and model validation.
- hide the serving implementation; every ML application should be serve-able in `pipcook-app`, so we don't need any special APIs for serving models.
Example
// example.ts
import { ml, vision, data } from '@pipcook/pipcook-app';
class MyImage extends data.ImageType {
constructor(x, y, buffer) {
super(x, y, buffer, 100, 100);
}
}
const listAvatars: ml.Function = ml.create((img: MyImage) => {
const components = vision.recognizeComponent(img);
if (!components)
return false;
return components.map((item: UIView) => {
const img = item.toImage() as UIImage;
return vision.detectFace(img);
}).filter((avatar: data.FaceType) => {
return avatar !== null;
});
});
// use the listAvatars function in your application
const app = express();
app.get('/', (req, res) => {
const img = new MyImage(req.body.x, req.body.y, req.body.buffer);
res.json(listAvatars(img).toJSON());
});
Then run the following commands to train:
$ pipcook train example.ts --epoch=5 --no-validation
generated the model at example.ts.im
And run your ML application:
$ pipcook try example.ts
$ pipcook deploy example.ts --eas=xxx
@utkobe I have simplified the modules and APIs; we are now focusing more on vision and NLP.
To implement PipApp, here is what we have to achieve:
- PipApp Compiler @yorkie
  - compiles the PipApp source (TypeScript) to get `ml.create` slots.
  - generates pipelines from the slots above.
  - generates application source code.
  - generates TypeScript declaration (`.d.ts`) files for the current environment.
- PipDaemon @FeelyChau
  - integrates the above compiler and exposes PipApp interfaces for Pipboard and Pipcook Tools.
- Pipboard @utkobe
  - adds a tab page for PipApp.
    - view pipelines by PipApp.
    - labeling and viewing datasets.
- Pipcook Tools @FeelyChau
  - supports training, running, and deployment of PipApp.
- Costa @yorkie
  - adds `pipcook.method` support for plugins.
The plugin matching algorithm of the pipeline generator:
- find the slots by the function calls to `ml.create`.
- iterate all slots in each block,
  - find the model signature by the function call to `{datatype}.{method}`.
  - find the data processing signature by the function call to `{datatype}.{method}`.
  - find the input and output types of each found model.
- generate a pipeline config,
  - data collect: chosen by `{datatype}` and `{method}`.
  - data access: chosen by the selected model-define plugin.
  - data process: chosen by `{datatype}` and `{method}` (data processing signature).
  - model define: chosen by `{datatype}` and `{method}`.
  - model train: chosen by the selected model-define plugin and `{datatype}`.
  - model evaluate: chosen by the selected model-define plugin and `{datatype}`.
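The matching steps above could be sketched as a pure mapping from a slot's `{datatype}.{method}` signature to a pipeline config. All plugin names below are hypothetical, chosen only to show which inputs each stage's choice depends on:

```typescript
// Hypothetical sketch of the pipeline generator's plugin matching:
// map a slot's {datatype}.{method} signature to a pipeline config.
// Every plugin name here is made up for illustration.
interface PipelineConfig {
  dataCollect: string;
  dataAccess: string;
  dataProcess: string;
  modelDefine: string;
  modelTrain: string;
  modelEvaluate: string;
}

function generatePipeline(datatype: string, method: string): PipelineConfig {
  // model define: chosen by {datatype} and {method}
  const modelDefine = `${datatype}-${method}-model-define`;
  return {
    // data collect: chosen by {datatype} and {method}
    dataCollect: `${datatype}-${method}-data-collect`,
    // data access: chosen by the selected model-define plugin
    dataAccess: `${modelDefine}-data-access`,
    // data process: chosen by {datatype} and {method}
    dataProcess: `${datatype}-${method}-data-process`,
    modelDefine,
    // model train/evaluate: chosen by the model-define plugin and {datatype}
    modelTrain: `${modelDefine}-${datatype}-train`,
    modelEvaluate: `${modelDefine}-${datatype}-evaluate`,
  };
}

console.log(generatePipeline('vision', 'classify').modelDefine);
// vision-classify-model-define
```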
Awesome job~~