karanjeets
Data Scientist | Committer, PMC member at @apache | Member of @USCDataScience
Apple Inc., San Francisco
Pinned Repositories
drat
A distributed, parallelized (Map Reduce) wrapper around Apache RAT™ to allow it to complete on large code repositories of multiple file types where Apache RAT™ hangs forever.
banana
Banana for Solr - A Port of Kibana
drat-ontosoft
DRAT on OntoSoft code repositories
PCF-Nutch-on-Wrangler
A repository for Nutch crawl evaluation
Political-Inclination-NLP
Project for CSCI-544 - Applied Natural Language Processing
project-mango
Weapons Dashboard with D3 Visuals - CSCI572 - Assignment 3
SolrMerge
An open source project to merge Solr cores in an extremely customizable way.
AerosolDelta
Quantifying aerosol presence and composition over Earth's ice sheets and glaciers - mapping anthropogenic and natural aerosol patterns and estimating changes over time
nutch-analytics
Nutch Crawl Analysis - Spark based project
sparkler
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
karanjeets's Repositories
karanjeets/Political-Inclination-NLP
Project for CSCI-544 - Applied Natural Language Processing
karanjeets/SolrMerge
An open source project to merge Solr cores in an extremely customizable way.
karanjeets/banana
Banana for Solr - A Port of Kibana
karanjeets/drat-ontosoft
DRAT on OntoSoft code repositories
karanjeets/PCF-Nutch-on-Wrangler
A repository for Nutch crawl evaluation
karanjeets/project-mango
Weapons Dashboard with D3 Visuals - CSCI572 - Assignment 3
karanjeets/Applied-NLP
CSCI-544 (Applied Natural Language Processing) homework assignments
karanjeets/cdr-pipeline
Extraction of Nutch segments, followed by post-crawl analysis
karanjeets/counterfeit
Pilot for CE domain.
karanjeets/dd-eval
Domain Discovery Evaluation
karanjeets/drat
A distributed, parallelized (Map Reduce) wrapper around Apache™ RAT to allow it to complete on large code repositories of multiple file types where Apache™ RAT hangs forever.
karanjeets/essential-scala-code
Exercises for Inner Product's Essential Scala Course
karanjeets/felix
Mirror of Apache Felix
karanjeets/FocusedCrawl-Weapons
Nutch Protocol Interactive-Selenium handlers to fetch focused results from Weapons URL. This also includes some extractor scripts to fetch relevant seeds from the website.
karanjeets/karanjeet.github.io
karanjeets/memex-cdr
This repository hosts code and schema information related to the Memex Crawl Data Repository (CDR)
karanjeets/nba-analysis
Simple data engineering task
karanjeets/nutch
Mirror of Apache Nutch
karanjeets/NutchPlugin-HtmlUnit
An HtmlUnit plugin for Apache Nutch that leverages headless browsing.
karanjeets/oodt
Mirror of Apache OODT
karanjeets/polar-domain-discovery
Domain Discovery on Polar Domain
karanjeets/sbt-release-test
Test SBT Release
karanjeets/SnapWorld
Capture your memories with SnapWorld and help others!
karanjeets/solrpy
Automatically exported from code.google.com/p/solrpy
karanjeets/sparkler
Spark-Crawler: Evolving Apache Nutch to run on Spark.
karanjeets/startbootstrap-sb-admin-2
A free, open source, Bootstrap admin theme created by Start Bootstrap
karanjeets/startbootstrap-scrolling-nav
An unstyled Bootstrap HTML template for creating smooth scrolling, one page websites - created by Start Bootstrap
karanjeets/tika
Mirror of Apache Tika
karanjeets/tika-python
karanjeets/youtube-privacy