sense2vec: Semantic Analysis of the Reddit Hivemind

This project demonstrates a powerful and scalable approach to text mining, using our open-source library spaCy. We used spaCy to tag and parse every comment posted to Reddit in 2015, and fed the results to Gensim's word2vec implementation. Using the search, you can get a lot of interesting insights into the Reddit hivemind. See what a syntax-sensitive distributional similarity model thinks Reddit thinks about almost anything.

Run the demo

This demo is implemented in Jade (aka Pug), an extensible templating language that compiles to HTML, and is built or served by Harp. To serve it locally on http://localhost:9000, simply run:

sudo npm install --global harp
git clone https://github.com/explosion/sense2vec-demo
cd sense2vec-demo
harp server

The demo is written in ECMAScript 6. For full, cross-browser compatibility, make sure to use a compiler like Babel. For more info, see this compatibility table.

Using sense2vec.js

Include sense2vec.js and initialize a new instance specifying the API and settings, then use the find() method.

const demo = new sense2vec('http://localhost:8000', {
    container: '#sense2vec',
    defaultWord: 'natural language processing',
    defaultSense: 'noun'
});

demo.find('duck', 'verb');

Our service that produces the input data is open source, too. You can find it at spacy-services.

The following settings are available:

Setting	Description	Default
container	element to display results in, can be any query selector	`#displacy`
defaultText	text used if sense2vec is run without text specified	`'natural language processing'`
defaultModel	model used if run without model specified	`'en'`
defaultSense	part-of-speech tag or "auto" for automatic detection	`'auto'`
onStart	function to be executed on start of server request	`false`
onSuccess	callback function to be executed on successful server response	`false`
onRender	callback function to be executed when results have rendered	`false`
onError	function to be executed if request fails	`false`

SaijoGeorge/sense2vec-demo

sense2vec: Semantic Analysis of the Reddit Hivemind

Run the demo

Using sense2vec.js