Pinned Repositories
bigdata-vandy.github.io
The blog home of bigdata-vandy
cm_csds
Cloudera Manager Custom Service Descriptors
data-getters
A collection of simple scripts for pulling data from various and sundry sources.
download-stack-dump
Python code to download the archived Stack Exchange data dumps from https://archive.org/details/stackexchange
HBase-Standalone
HBase Standalone Tutorial
hdfs
Simple HDFS Demos
pyspark-notebook-example
spark-corenlp-demo
A demonstration of the Spark CoreNLP library from Databricks
spark-wordcount
A brief demonstration of Spark functionality.
spark-xml-parse
Demonstration of XML parsing using the Stack Overflow data dump.
bigdata-vandy's Repositories
bigdata-vandy/bigdata-vandy.github.io
The blog home of bigdata-vandy
bigdata-vandy/cm_csds
Cloudera Manager Custom Service Descriptors
bigdata-vandy/download-stack-dump
Python code to download the archived Stack Exchange data dumps from https://archive.org/details/stackexchange
bigdata-vandy/pyspark-notebook-example
bigdata-vandy/spark-corenlp-demo
A demonstration of the Spark CoreNLP library from Databricks
bigdata-vandy/spark-wordcount
A brief demonstration of Spark functionality.
bigdata-vandy/spark-xml-parse
Demonstration of XML parsing using the Stack Overflow data dump.
bigdata-vandy/data-getters
A collection of simple scripts for pulling data from various and sundry sources.
bigdata-vandy/HBase-Standalone
HBase Standalone Tutorial
bigdata-vandy/hdfs
Simple HDFS Demos
bigdata-vandy/akka-demo
A basic demo of web scraping using Akka (Scala-flavor!)
bigdata-vandy/csvToRDD
bigdata-vandy/mapreduce-wc
Wordcount with MapReduce, written in native Java
bigdata-vandy/password-cracker
A demonstration of distributed computation in Spark.
bigdata-vandy/pyspark_intro_vish
A brief introduction to PySpark.
bigdata-vandy/scp-data-to-hdfs
Bash scripts for copying data to the Big Data cluster with SLURM
bigdata-vandy/spark-sem-classify
Classify SEM data using Spark-ML
bigdata-vandy/spark-taxi
Analyze NYC-TLC taxi trip data
bigdata-vandy/spark-wiki-learn
bigdata-vandy/stack-ex
Parse Stack Exchange data dump
bigdata-vandy/tweet-count
Count a batch of tweet records using a Java implementation of MapReduce.