mooz/node-icu-charset-detector

Streaming mode?

Closed this issue · 3 comments

I need to detect the character set of potentially huge text files. Would it be possible to do this in a streaming manner somehow, or does the underlying library really need all the data at once?

If it's possible to do this, the ideal interface would IMO be a writable stream that you could keep feeding data into and then call detectCharset() on once you think you've fed it enough data.
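To make the request concrete, here is a minimal sketch of such a writable stream. `CharsetSampler`, `detectFn`, and `limit` are hypothetical names and not part of the module; the sketch assumes `detectFn` is some buffer-based detection function (for example the module's `detectCharset`) and that a bounded sample from the start of the data is enough for a useful guess.

```javascript
const { Writable } = require("stream");

// Hypothetical helper, not part of node-icu-charset-detector.
// Buffers at most `limit` bytes and runs a caller-supplied detection
// function on whatever has been fed in so far.
class CharsetSampler extends Writable {
  constructor(detectFn, limit = 64 * 1024) {
    super();
    this.detectFn = detectFn; // assumed: takes a Buffer, returns a charset
    this.limit = limit;       // maximum number of bytes to keep
    this.chunks = [];
    this.size = 0;
  }

  _write(chunk, encoding, callback) {
    const room = this.limit - this.size;
    if (room > 0) {
      // Keep only as much of the chunk as still fits under the limit.
      const piece = chunk.subarray(0, room);
      this.chunks.push(piece);
      this.size += piece.length;
    }
    // Keep accepting data; bytes past the limit are simply dropped.
    callback();
  }

  // Call this once enough data has been fed in.
  detectCharset() {
    return this.detectFn(Buffer.concat(this.chunks, this.size));
  }
}
```

Memory use stays bounded by `limit` no matter how large the file is, which is the main point of the streaming interface.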

mooz commented

Sorry for the late response. I've added a method, detectCharsetStream(stream, onDetectionFinish), to the module, although I'm not sure the interface matches what you had in mind.

Here is a simple example.

var detector = require("node-icu-charset-detector");
var fs = require("fs");

// Open the file as a readable stream.
var fileStream = fs.createReadStream("/usr/share/dict/british-english");

// The onDetectionFinish callback receives the detected charset.
detector.detectCharsetStream(fileStream, function (charset) {
  console.log("charset: " + charset);
});

mooz commented

Hmm, it looks like the current detectCharsetStream() is somewhat problematic because it consumes the given stream, so the caller can no longer read the data from it afterwards. I'm looking for a good solution...
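One possible workaround when the input is a file, sketched here rather than taken from the module: read a bounded sample from the start of the file through a separate file descriptor, run detection on that sample, and only then open a fresh stream for the real processing. `readHead`, `detectFile`, `detectFn`, and `limit` are hypothetical names; `detectFn` is assumed to be a buffer-based detection function.

```javascript
const fs = require("fs");

// Hypothetical helper: returns up to `limit` bytes from the start of the file.
// A single readSync call is assumed to be enough for a regular file.
function readHead(path, limit) {
  const fd = fs.openSync(path, "r");
  try {
    const buf = Buffer.alloc(limit);
    const bytesRead = fs.readSync(fd, buf, 0, limit, 0);
    return buf.subarray(0, bytesRead);
  } finally {
    fs.closeSync(fd);
  }
}

// Hypothetical helper: runs the caller-supplied detection function on the sample.
// No stream is involved, so nothing the caller wants to read later is consumed.
function detectFile(path, detectFn, limit = 64 * 1024) {
  return detectFn(readHead(path, limit));
}
```

After detection, the caller can open the file with fs.createReadStream(path) and still get every byte. This does not help with streams that cannot be reopened, such as sockets or stdin.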