SciSharp/TensorFlow.NET

[Question]: Does the TextVectorization method work yet?

Kyokanyou opened this issue · 1 comment

Description

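I tried to use this method to create word vectors:
using Tensorflow;                 // for KerasApi
using static Tensorflow.Binding;  // for tf and print
var text_dataset = tf.constant("quz foo tak");
print(text_dataset);
// vocabulary of at most 1000 tokens; each output padded/truncated to 4 tokens
var vectorizer = KerasApi.keras.preprocessing.TextVectorization(max_tokens: 1000, output_sequence_length: 4);
vectorizer.adapt(text_dataset);   // expected to build the vocabulary from the dataset
print(vectorizer.Apply(tf.constant("quz")));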

The output is:
tf.Tensor: shape=(), dtype=string, numpy='quz foo tak'
tf.Tensor: shape=(), dtype=string, numpy='quz'

It seems nothing happened: the input string comes back unchanged instead of being converted to token indices. There are also no examples or test code in the docs.
Could you please show me the right way to use this method? Thanks.

I checked the decompiled code and found what may be a bug. In the class CombinerPreprocessingLayer : Layer, the adapt method is declared with the virtual modifier (rather than override), but this class inherits from the abstract class Layer. So vectorizer.adapt(text_dataset) in my code seems to execute the adapt method of the abstract Layer class, which is why it does nothing. My guess is that there is no connection between the adapt method in the TextVectorization class and the one in ILayer.
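The suspected bug comes down to C# method hiding versus overriding. The sketch below uses hypothetical classes (LayerBase, HidingLayer, OverridingLayer), not the real TensorFlow.NET types, to show the effect described above: a derived method declared virtual or new virtual instead of override starts a new slot, so a call made through a base-typed reference still runs the base implementation. Whether CombinerPreprocessingLayer really has this shape is an assumption taken from the report, not something verified here.

```csharp
using System;

// Hypothetical stand-in for the abstract base class; not TensorFlow.NET's Layer.
public abstract class LayerBase
{
    // Base implementation that does nothing useful, like the no-op observed in the issue.
    public virtual string adapt(string data) => "base: no-op";
}

public class HidingLayer : LayerBase
{
    // 'new virtual' HIDES LayerBase.adapt instead of overriding it.
    // Writing plain 'virtual' here behaves the same but triggers compiler warning CS0114.
    public new virtual string adapt(string data) => "derived: adapted " + data;
}

public class OverridingLayer : LayerBase
{
    // 'override' replaces the base slot, so calls through a LayerBase reference reach this code.
    public override string adapt(string data) => "derived: adapted " + data;
}
```

With these types, LayerBase layer = new HidingLayer(); layer.adapt("quz") returns "base: no-op", while the same call on an OverridingLayer returns "derived: adapted quz". Only casting back to HidingLayer reaches the hiding method.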

I'm not sure about this, since I'm not good at reading code and it gave me a headache. But I'd like to get help, even if only to learn.

Alternatives

No response

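using static Tensorflow.KerasApi; // provides keras
string[] allTextArr = { "您 好 啊", "how are you" }; // the whole corpus, words separated by spaces
string[] oneTextArr = { "how are you" };              // the sentence(s) to encode
Tensorflow.Keras.Text.Tokenizer tok = keras.preprocessing.text.Tokenizer(10000, filters: "!"); // create an instance
tok.fit_on_texts(allTextArr); // build the vocabulary from the full corpus. allTextArr contains all of the data; words must be separated by spaces, e.g. {"您 好 啊", "how are you"}
//-----------------
var sequencesX = tok.texts_to_sequences(oneTextArr); // encode each sentence as a sequence of word indices. Remember that the sentences in oneTextArr must also be space-separated words
var x_train = keras.preprocessing.sequence.pad_sequences(sequencesX, maxlen: 100); // pad/truncate to length 100 to get fixed-size vectors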
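To make the Tokenizer workaround easier to follow, here is a self-contained sketch, independent of TensorFlow.NET, of what the three calls conceptually compute under Keras semantics: fit_on_texts assigns 1-based indices by descending word frequency (0 is left for padding), texts_to_sequences keeps only indices below num_words, and pad_sequences left-pads and left-truncates by default. SimpleTokenizer and its methods are hypothetical names; character filters and out-of-vocabulary tokens are deliberately ignored.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical conceptual re-implementation; not the TensorFlow.NET Tokenizer.
public class SimpleTokenizer
{
    // word -> index, 1-based, most frequent word first (0 is reserved for padding)
    public Dictionary<string, int> WordIndex { get; } = new Dictionary<string, int>();

    public void FitOnTexts(IEnumerable<string> texts)
    {
        var counts = new Dictionary<string, int>();
        var firstSeen = new List<string>(); // first-seen order, used to break frequency ties
        foreach (var text in texts)
            foreach (var w in Split(text))
            {
                if (!counts.ContainsKey(w)) { counts[w] = 0; firstSeen.Add(w); }
                counts[w]++;
            }
        // OrderByDescending is a stable sort, so equally frequent words keep first-seen order
        int next = 1;
        foreach (var w in firstSeen.OrderByDescending(word => counts[word]))
            WordIndex[w] = next++;
    }

    // Maps each text to its word indices, dropping unknown words and indices >= numWords.
    public List<int[]> TextsToSequences(IEnumerable<string> texts, int numWords) =>
        texts.Select(t => Split(t)
                .Where(w => WordIndex.TryGetValue(w, out var idx) && idx < numWords)
                .Select(w => WordIndex[w])
                .ToArray())
             .ToList();

    // Left-pads with 0 (or keeps only the last maxLen items), like pad_sequences' default 'pre' mode.
    public static int[,] PadSequences(List<int[]> seqs, int maxLen)
    {
        var result = new int[seqs.Count, maxLen];
        for (int r = 0; r < seqs.Count; r++)
        {
            int[] s = seqs[r].Length > maxLen ? seqs[r].Skip(seqs[r].Length - maxLen).ToArray() : seqs[r];
            int offset = maxLen - s.Length;
            for (int c = 0; c < s.Length; c++) result[r, offset + c] = s[c];
        }
        return result;
    }

    // Lowercase, then split on spaces, matching the space-separated input described above.
    private static IEnumerable<string> Split(string text) =>
        text.ToLowerInvariant().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
}
```

For example, fitting on {"您 好 啊", "how are you", "how old are you"} maps how to 1, are to 2, you to 3 and 您 to 4, and padding the sequence for "how are you" to length 5 gives [0, 0, 1, 2, 3].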