Tutorials and demos on Hadoop, Spark, and related tools, mostly in the form of Jupyter notebooks.
- mapreduce_with_bash.ipynb An introduction to MapReduce using Hadoop Streaming, with the mapper and reducer written in bash
- simplest_mapreduce_bash_wordcount.ipynb A very basic MapReduce wordcount example
- mrjob_wordcount.ipynb A simple MapReduce job with mrjob
- Hadoop_spilling.ipynb Hadoop spilling explained
- TestDFSio.ipynb Demo of TestDFSIO for benchmarking Hadoop clusters
- docker_for_beginners.md Docker for beginners: an introduction to the world of containers
- demoSparkSQLPython.ipynb Basic PySpark demo
- ngrams_with_pyspark.ipynb Basic example of n-gram generation with PySpark
- Encoding+dataframe+columns.ipynb Encoding Spark DataFrame columns
- Unicode.ipynb Exploring Unicode categories
- polynomial_regression.ipynb Worked example of polynomial regression with NumPy
- generate_data_with_Faker.ipynb Generate fake data with the Faker Python library
- online_resources.md Online resources for learning Big Data