coursera-big-data

Repo for the UC San Diego Big Data Specialization on Coursera

Primary language: Jupyter Notebook
License: GNU General Public License v2.0 (GPL-2.0)

UC San Diego: Big Data, Version 1 Specialization

Certification here. Specialization description here.

I gained an understanding of what insights big data can provide through hands-on experience with the tools and systems used by big data scientists and engineers.

During this specialization I learned:

  • The basics of using Hadoop with MapReduce, Spark, Pig, and Hive.
  • How to perform predictive modeling and use graph analytics to model problems.
  • How to ask the right questions about data, communicate effectively with data scientists, and do basic exploration of large, complex datasets.
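
To illustrate the MapReduce pattern mentioned in the list above, here is a minimal word-count sketch in plain Python. It is not code from the course or from this repo, and it does not use Hadoop or Spark. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from functools import reduce


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the grouped counts for each word."""
    return {
        word: reduce(lambda a, b: a + b, counts)
        for word, counts in groups.items()
    }


def word_count(lines):
    """Run the three phases in order on an iterable of text lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


# Hypothetical sample input, not data from the specialization.
sample = ["big data is big", "data tools"]
counts = word_count(sample)
print(counts)
```

In a real Hadoop or Spark job the framework handles the shuffle step and distributes the work across machines. This sketch runs everything in one process to show only the shape of the computation.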

In the final capstone project, I applied these skills to basic analyses of big data. The work for the capstone project is available here: https://github.com/carian2996/big_data/tree/master/capstone_project.

Major technologies used:

  1. PySpark
  2. Pandas
  3. Neo4j
  4. Splunk