xenova/transformers.js

Bug in latest version? 2.16.1

Closed this issue · 4 comments

System Info

Using the latest @xenova/transformers version, 2.16.1, installed from npm.

Running a sample file with ts-node v10.9.2.

Both module and target in tsconfig.json are set to esnext.

Environment/Platform

  • Website/web-app
  • Browser extension
  • Server-side (e.g., Node.js, Deno, Bun)
  • Desktop app (e.g., Electron)
  • Other (e.g., VSCode extension)

Description

Code where error occurs:

import { Pipeline, PreTrainedModel } from "@xenova/transformers";

const newModel = await PreTrainedModel.from_pretrained(pretrainedModelName)
const pipeline = new Pipeline({
  task: "embeddings",
  model: newModel,
});

It looks like the Pipeline class in pipelines.d.ts does not have a _call method.

This is code that comes up in my node_modules:
export class Pipeline extends Pipeline_base {
    /**
     * Create a new Pipeline.
     * @param {Object} options An object containing the following properties:
     * @param {string} [options.task] The task of the pipeline. Useful for specifying subtasks.
     * @param {PreTrainedModel} [options.model] The model used by the pipeline.
     * @param {PreTrainedTokenizer} [options.tokenizer=null] The tokenizer used by the pipeline (if any).
     * @param {Processor} [options.processor=null] The processor used by the pipeline (if any).
     */
    constructor({ task, model, tokenizer, processor }: {
        task?: string;
        model?: PreTrainedModel;
        tokenizer?: PreTrainedTokenizer;
        processor?: Processor;
    });
    task: string;
    model: PreTrainedModel;
    tokenizer: PreTrainedTokenizer;
    processor: Processor;
    dispose(): Promise<void>;
}

Error:

➜ dumper git:(new-flow) ✗ node --loader ts-node/esm src/github/github.sample.ts

(node:78794) ExperimentalWarning: --experimental-loader may be removed in the future; instead use register():
--import 'data:text/javascript,import { register } from "node:module"; import { pathToFileURL } from "node:url"; register("ts-node/esm", pathToFileURL("./"));'
(Use node --trace-warnings ... to show where the warning was created)
Model type for 'undefined' not found, assuming encoder-only architecture. Please report this at https://github.com/xenova/transformers.js/issues/new/choose.
1
pipeline: [Function: closure] Pipeline {
task: 'embeddings',
model: undefined,
tokenizer: null,
processor: null
}
2
Error: Must implement _call method in subclass
at Function._call (file:///Users/kateyeh/DeepUnit/mono/mono/dumper/node_modules/@xenova/transformers/src/utils/core.js:75:15)
at closure (file:///Users/kateyeh/DeepUnit/mono/mono/dumper/node_modules/@xenova/transformers/src/utils/core.js:62:28)
at UsePinecone.embedStuff (file:///Users/kateyeh/DeepUnit/mono/mono/dumper/src/github/github.sample.ts:36:43)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async main (file:///Users/kateyeh/DeepUnit/mono/mono/dumper/src/github/github.sample.ts:200:26)
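
For context, the message in the trace appears to come from a callable base class: constructing an instance returns a closure that forwards every call to `_call`, and the base implementation of `_call` only throws. The sketch below is a simplified illustration of that pattern, not the library's source; `Callable`, `BarePipeline`, and `UppercasePipeline` are names chosen for this example.

```typescript
// Simplified sketch of a "callable class" pattern (illustrative, not library code).
class Callable {
  constructor() {
    // The closure forwards every call to `_call`, looked up on the closure itself.
    const closure: any = function (...args: unknown[]) {
      return closure._call(...args);
    };
    // Returning an object from a constructor replaces `this`,
    // so `new Subclass()` evaluates to the closure with the subclass prototype.
    return Object.setPrototypeOf(closure, new.target.prototype);
  }

  // Base implementation: subclasses are expected to override this.
  _call(..._args: unknown[]): unknown {
    throw new Error("Must implement _call method in subclass");
  }
}

// No `_call` override, comparable to constructing a base class directly.
class BarePipeline extends Callable {}

// A subclass that provides `_call`, so calling the instance works.
class UppercasePipeline extends Callable {
  _call(...args: unknown[]): unknown {
    return String(args[0]).toUpperCase();
  }
}

// TypeScript does not know the instances are callable, hence the casts.
const bare = new BarePipeline() as unknown as (text: string) => unknown;
const upper = new UppercasePipeline() as unknown as (text: string) => unknown;

let message = "";
try {
  bare("hello");
} catch (err) {
  message = (err as Error).message;
}
console.log(message);        // Must implement _call method in subclass
console.log(upper("hello")); // HELLO
```

Under this reading, the error says that the object being called has no task-specific `_call`, rather than that the type declarations are incomplete.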

Reproduction

import { Pipeline, PreTrainedModel } from "@xenova/transformers";

public async embedStuff() {
  const newModel = await PreTrainedModel.from_pretrained(pretrainedModelName)
  const query = 'generateTest(testDescription: TestDescription, request?: Request, skipGeneratingCode: boolean = false)'
  const pipeline = new Pipeline({
    task: "embeddings",
    model: newModel,
  });
  const result = pipeline && (await pipeline(query));
  return {
    id: 'test-id-1234',
    metadata: {
      query: query,
    },
    values: Array.from(result.data)
  }
}

Your usage appears to be incorrect. Can you instead use the pipeline API to create your pipeline? Example code:

import { pipeline } from '@xenova/transformers';

// Create a feature-extraction pipeline
const extractor = await pipeline('feature-extraction', 'Xenova/gte-small');

// Compute sentence embeddings
const sentences = ['That is a happy person', 'That is a very happy person'];
const output = await extractor(sentences, { pooling: 'mean', normalize: true });
console.log(output);
// Tensor {
//   dims: [ 2, 384 ],
//   type: 'float32',
//   data: Float32Array(768) [ -0.053555335849523544, 0.00843878649175167, ... ],
//   size: 768
// }

// Compute cosine similarity
import { cos_sim } from '@xenova/transformers';
console.log(cos_sim(output[0].data, output[1].data))
// 0.9798319649182318
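
To make the `pooling: 'mean'`, `normalize: true`, and `cos_sim` steps in the example above more concrete, here is a plain-TypeScript sketch of the arithmetic they refer to, using tiny hand-written vectors. The function names `meanPool`, `l2Normalize`, and `cosineSimilarity` are hypothetical and this is not the library's implementation; a real implementation would typically also skip padding tokens when averaging.

```typescript
/** Average per-token vectors (rows) into a single sentence vector. */
function meanPool(tokens: number[][]): number[] {
  const out: number[] = tokens[0].map(() => 0);
  for (const row of tokens) {
    for (let i = 0; i < out.length; i++) {
      out[i] += row[i] / tokens.length;
    }
  }
  return out;
}

/** Scale a vector to unit length (L2 norm of 1). Assumes a non-zero vector. */
function l2Normalize(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return v.map((x) => x / norm);
}

/** Cosine of the angle between two equal-length, non-zero vectors. */
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Two "tokens" with 2-dimensional embeddings, pooled into one vector.
const sentenceVector = meanPool([[1, 2], [3, 4]]);
console.log(sentenceVector); // [2, 3]

// Vectors pointing the same way score close to 1; orthogonal ones score 0.
console.log(cosineSimilarity(l2Normalize([1, 2]), l2Normalize([2, 4])));
console.log(cosineSimilarity([1, 0], [0, 1])); // 0
```

For vectors that are already normalized, the cosine similarity reduces to a plain dot product.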

Also, note that every call to pipeline(...) allocates new memory, so it is recommended to create the pipeline once, outside your embedding function, and reuse it.
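
One way to follow that recommendation is to cache the pipeline behind a small helper so it is created on first use and reused afterwards. This is a minimal sketch: `once`, `getExtractor`, `embed`, and `loads` are hypothetical names, and a dummy factory stands in for the real `pipeline(...)` call so the snippet runs without downloading a model.

```typescript
// Hypothetical helper: runs an async factory at most once and reuses its result.
function once<T>(factory: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) {
      cached = factory();
    }
    return cached;
  };
}

// Counts how many times the expensive setup actually runs.
let loads = 0;

// Stand-in factory. With the real library this would instead be something like:
//   once(() => pipeline('feature-extraction', 'Xenova/gte-small'))
const getExtractor = once(async () => {
  loads += 1;
  return (text: string) => text.length; // dummy "extractor"
});

// The embedding function only asks for the cached instance.
async function embed(text: string): Promise<number> {
  const extractor = await getExtractor();
  return extractor(text);
}

embed("first call")
  .then(() => embed("second call"))
  .then(() => console.log(loads)); // 1
```

Because the promise itself is cached, concurrent callers share a single load. A failed load is cached too, so a production version might reset `cached` on rejection.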

Additionally, I'm not sure where your code is adapted from (as it is quite non-standard) 😅 Please feel free to share!

Thank you! Honestly, I'm a little embarrassed. I was working on this late, alongside Pinecone, and I think I just swapped the lower-case p of pipeline for the upper-case one from that, then followed the error trail to where I ended up above.

No worries! 🤗 Happy to help :)

Thanks Josh❤️😅