/vader-sentiment-cpython

Updated replacement for vader-sentiment (vaderSentiment-js) that runs original vaderSentiment natively, using CPython.

Primary LanguageC++

vader-sentiment-cpython

Updated replacement for vader-sentiment (vaderSentiment-js) that runs original vaderSentiment natively, using CPython.

Motivation

vader-sentiment is outdated and doesn't support emoji - vital element of comments nowadays.

Also, spawning python process from nodejs is slow. This module can analyze ~60,000 strings per second by re-using the same python instance, and doesn't leak memory.

Getting Started for dev

  1. Run nvm i
  2. Run npm ci
  3. Install dependencies. On Ubuntu, run ./deps.sh
  4. Build python and boost statically (see build_boost.sh and build_python.sh)
  5. Generate prebuild: npm run prebuild (might need to update hardcoded libs)
  6. Run it: node test.js "your awesome text goes here"

Getting Started for end user

Note: target machine should have same python (and glibc?) version, as dev machine

Note: currently is was tested on Ubuntu 20, and on Heroku (with Ubuntu 20). Check mac branch for Mac OS

Note: if you'd like to use it on another linux distro, at least install Python 3.8.5 and make sure that python3 points to it

Note: requires Python 3.8.5 (installed on Ubuntu 20 by default)

  1. Run npm install Maxim-Mazurok/vader-sentiment-cpython
  2. Add install_vader.sh script to your project, and run it before using module. You can include it in start script, like so: "start": "bash ./node_modules/vader-sentiment-cpython/install_vader.sh && node start.js",

Current stage

It only has prebuild for Ubuntu 20 and Python 3.8.5.

I have to figure out how to make it more portable and independent from installed Python interpreter.

As far as I understand, I have to create portable Python interpreter, similar to how pyinstaller works.

How pyinstaller embeds python?

Disclaimer: this is my (very limited) understanding.

So... Looks like I have to use python's main.c in order to create portable python interpreter that I can ship with this package. But instead of running in console mode (and requiring to be ran using spawn child process), it'll be possible to embed NAPI into it, so it'll be a NodeJS package.

The most recent attempt to create such a portable python lives in my cpython fork's [vader-branch] Currently it will run vader if you put vaderSentiment-master folder in the cpython root folder.

PyRun_SimpleString is the key function under the hood.

What to do?

There are two options:

  1. Migrate vader branch of cpython to gyp building system

    • Copy all cpython code into this project (or clone and patch)
    • Either separate node-gyp configure step from the build step (in the prebuildify, somehow), so that we can change/merge cpython's Makefile and NodeJS's native module Makefile
    • Or migrate cpython's Makefile to binding.gyp
  2. Move cpython code into this project

    Essentially, copy only the required bits of code from cpython to this project. Probably will require copying a whole lot and diving very deep into cpython. Also, will probably require the same migration of cpython's Makefile to binding.gyp.

So, I think that I should try to migrate cpython's Makefile to binding.gyp. Then include and call cpython's main function in binding.cc?