PageRank of the French Wikipedia, with a search engine built on Elasticsearch

Primary language: Java

TDLE

Project for the TDLE class at ENSIIE. This project should use Elasticsearch, Hadoop, and MapReduce.

Useful links

Elasticsearch

Download it here: https://www.elastic.co/downloads/elasticsearch

Then use ParsePageRank.java to convert the PageRank file into the JSON format that the Elasticsearch bulk API expects.
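
For illustration, here is a minimal sketch of what such a conversion can look like. It is not the actual ParsePageRank.java: the input layout (one `title<TAB>score` pair per line), the index name `pagerank`, and the field names `title` and `pagerank` are all assumptions. The bulk format itself is newline-delimited JSON, where each document is preceded by an action line.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: turn "title<TAB>score" lines into Elasticsearch bulk (NDJSON) lines. */
final class PageRankToBulk {

    // Hypothetical index name; the original project may use a different one.
    static final String INDEX = "pagerank";

    /** Escapes quotes, backslashes, and control characters for a JSON string value. */
    static String escapeJson(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Returns the two NDJSON lines (action, then document) for one input line. */
    static List<String> toBulkLines(String line) {
        // Assumed input layout: page title, a tab, then the PageRank score.
        String[] parts = line.split("\t");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected title<TAB>score: " + line);
        }
        double score = Double.parseDouble(parts[1].trim());
        List<String> out = new ArrayList<>();
        out.add("{\"index\":{\"_index\":\"" + INDEX + "\"}}");
        out.add("{\"title\":\"" + escapeJson(parts[0]) + "\",\"pagerank\":" + score + "}");
        return out;
    }

    public static void main(String[] args) {
        // Prints the action line followed by the document line.
        for (String l : toBulkLines("Paris\t0.5")) {
            System.out.println(l);
        }
    }
}
```

The bulk body must end with a newline after the last document line. Depending on the Elasticsearch version, the action line may also need a `_type` field.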

To insert everything in a single request, increase the maximum HTTP request size (100mb by default) in the file config/elasticsearch.yml:

http.max_content_length: 500mb

You also need to increase the Elasticsearch heap size in the file config/jvm.options:

-Xms4G
-Xmx4G

Finally, type the following command in a shell:
curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/_bulk --data-binary "@pagerank.json"
This may take a while, so be patient. When the request completes, the bulk API response is printed to the shell; it is long because it contains one result entry per indexed document.
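
Because the response is too long to read by eye, it can help to check it programmatically. The bulk API response carries a top-level `errors` boolean that is `true` when at least one item failed. The sketch below looks for that flag with a regular expression rather than a JSON library, to stay dependency-free; the class and method names are hypothetical, and it assumes the top-level `errors` field appears before the `items` array, as it does in typical responses.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: detect failures in an Elasticsearch bulk API response body. */
final class BulkResponseCheck {

    // Matches the first "errors": true|false pair in the response text.
    private static final Pattern ERRORS =
            Pattern.compile("\"errors\"\\s*:\\s*(true|false)");

    /** Returns true if the response reports at least one failed item. */
    static boolean hasErrors(String responseBody) {
        Matcher m = ERRORS.matcher(responseBody);
        if (!m.find()) {
            throw new IllegalArgumentException("No \"errors\" field found in response");
        }
        return Boolean.parseBoolean(m.group(1));
    }

    public static void main(String[] args) {
        // Shortened sample response; a real one has one item per document.
        String sample = "{\"took\":30,\"errors\":false,\"items\":[]}";
        System.out.println(hasErrors(sample)
                ? "Some documents failed to index"
                : "All documents indexed");
    }
}
```

To use it on a real run, redirect the curl output to a file and pass the file's contents to `hasErrors`.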