johnsonleee's Stars
kubernetes/kubernetes
Production-Grade Container Scheduling and Management
fxsjy/jieba
Jieba Chinese word segmentation
binux/pyspider
A Powerful Spider (Web Crawler) System in Python.
openedx/edx-platform
The Open edX LMS & Studio, powering education sites around the world!
docker-archive/docker-registry
This is **DEPRECATED**! Please go to https://github.com/docker/distribution
linux-test-project/ltp
Linux Test Project (mailing list: https://lists.linux.it/listinfo/ltp)
OryxProject/oryx
Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning
memect/hao
A portal to good stuff
sequenceiq/hadoop-docker
Hadoop Docker image
apache/incubator-stormcrawler
A scalable, mature and versatile web crawler based on Apache Storm
openshift/origin-server
OpenShift 2 (deprecated)
sequenceiq/docker-spark
XiaoMi/minos
Minos is more than a Hadoop deployment system.
commoncrawl/commoncrawl
Common Crawl support library to access 2008-2012 crawl archives (ARC files)
hortonworks/cloudbreak
CDP Public Cloud is an integrated analytics and data management platform deployed on cloud services. It offers broad data analytics and artificial intelligence functionality along with secure user access and data governance features.
DigitalPebble/behemoth
Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.
jhorey/ferry
Ferry lets you define, run, and deploy big data applications on AWS, OpenStack, and your local machine using Docker
commoncrawl/commoncrawl-crawler
The Common Crawl Crawler Engine and Related MapReduce code (2008-2012)
xautlx/nutch-htmlunit
A plugin that extends Apache Nutch with Htmlunit to crawl, fetch, and parse AJAX pages
myrrix/myrrix-recommender
Stand-alone recommender system from Myrrix
BayanGroup/nutch-custom-search
mcsrainbow/ansible-playbook-cdh5
Ansible playbook of CDH5
commoncrawl/nutch
Common Crawl fork of Apache Nutch
momer/nutch-selenium
momer/nutch-selenium-grid-plugin
A Nutch 2.2.1 plugin that lets users hand off page retrieval to a Selenium hub/node system. Nutch relies on Selenium/Firefox to fetch and load JavaScript content, while staying in charge of what it does best: crawling and further parsing.
jatrost/accumulo-pig
AccumuloStorage module for Pig
kantone/nutch-jsoup
Precise data extraction with Nutch and Jsoup CSS selectors
vmware-serengeti/doc
informera/nutchManager
A GUI for Nutch configuration
v5tech/Nutch1.0
Modified version of Nutch 1.0 with Chinese word segmentation integrated: source changes, compiled and packaged.