This is the code repository for The Ultimate Hands-on Hadoop [Video], published by Packt. It contains all the supporting project files necessary to work through the video course from start to finish.
The world of Hadoop and "Big Data" can be intimidating - hundreds of different technologies with cryptic names form the Hadoop ecosystem. With this course, you'll not only understand what those systems are and how they fit together - but you'll go hands-on and learn how to use them to solve real business problems!

This course is comprehensive, covering over 25 different technologies in over 14 hours of video lectures. It's filled with hands-on activities and exercises, so you get some real experience in using Hadoop - it's not just theory.

You'll find a range of activities in this course for people at every level. If you're a project manager who just wants to learn the buzzwords, there are web UIs for many of the activities in the course that require no programming knowledge. If you're comfortable with command lines, we'll show you how to work with them too. And if you're a programmer, I'll challenge you with writing real scripts on a Hadoop system using Scala, Pig Latin, and Python.
- Improve your understanding of descriptive statistics and apply them over a dataset.
- Learn how to deal with missing data and outliers to resolve data inconsistencies.
- Explore various visualization techniques for bivariate and multivariate analysis.
- Enhance your programming skills and master data exploration and visualization in Python.
- Learn multidimensional analysis and reduction techniques.
- Master advanced visualization techniques (such as heatmaps) for better analysis and rapidly broaden your understanding.
This course is intended for:
- Software engineers and programmers who want to understand the larger Hadoop ecosystem, and use it to store, analyze, and vend "big data" at scale.
- Project, program, or product managers who want to understand the lingo and high-level architecture of Hadoop.
- Data analysts and database administrators who are curious about Hadoop and how it relates to their work.
- System architects who need to understand the components available in the Hadoop ecosystem, and how they fit together.
This course has the following software requirements:
NA