Data_Analytics_Experience_Program

This repository will contain all my work and tasks that I'll be doing in the Data Analytics Experience Program By Entry Level. For starters let me introduce you to the business problem and datasets

TASK 1

READ THE BUSINESS PROBLEM BELOW.

You’re a Data Analyst working for the education tech company Udemy. You have been asked by your manager, Head of Curriculum at Udemy, to present the data on course revenue, and you have been provided with data on courses from different topics to understand where opportunities to increase revenue may lie, and track the performance of courses.

Your manager has suggested encouraging Web Development courses to charge more because she believes that these are the most popular courses. She needs to send a report to the CEO in the next three weeks on how they will increase their next quarterly earnings.

What is the business problem?

The business problem as stated by the stakeholder is how can they increase their quarterly sales and what category of courses should be promoted in order to boost their earnings.
The manager has also added that we should charge more of the Web Development courses because according to her these were more popular among the learners.

What data should be collected to understand this problem? How should it be presented?

  • Data related to every category of courses should be collected.
  • Data regarding the number of people who sign up for the courses and pattern do they follow in enrolling for the courses.
  • Data should be presented in both a fact and dimension table for better clarity and understanding.

What questions would you ask to better understand the business problem?

  • What would be the targeted quarterly revenue.
  • What type of courses are people most likely to enroll.
  • If a person is signing up for a certain course, what according to them would be a follow up course.
  • Which course category has the most number of subscribers.
  • Which course charges the most and how many subscribers does it have?

DATA SOURCES

Data comes from multiple sources, and data analysts often gather data from multiple sources and combine these for data analysis. The process of gathering data from these sources and presenting it is called data consolidation. Data consolidation is a crucial step, as the accuracy of the insights from your data analysis depends heavily on the quality of data used.

DATA CLEANING

Good data is essential when using data to derive insights and make business decisions.

Garbage in, garbage out, is a concept common to computer science and mathematics that can be applied in Data Analysis - the quality of output (your insights generated) are determined by the quality of the input.

This is where Data cleaning comes in - Data cleaning is the process of preparing data for analysis by removing or modifying data that is incorrect, incomplete, irrelevant, duplicated, or improperly formatted.

Data cleaning plays an important role in the analytical process and making sure that the answers we uncover are reliable and of high quality.

We will be going through a few functions to clean our data in excel.

Use the below functions to clean your data.

  • Remove duplicates

Select the entire data sheet data to remove duplicates from. Data > Remove Duplicates

  • Removing blank cells

Select the entire sheet or dataset and go to DataCreate a filter Click on the Filter icon at the top of any column, then click on Filter by condition and select ‘is empty’. Blank cells will arise to the top of the sheet and can be removed.

  • Headers

Ensure you have clear and concise names for headers and use dashes or underscores in between words to make it easier to parse later on.

  • Find and replace

If you examine the data, you will see that the Web Development subject title is not the same as other subject titles. Use the Find and Replace function to make the Web Development subject consistent with other subjects. Edit > Find and Replace.

Click here to view the updated and cleaned datasets for reference.


TASK 2

With your cleaned Google sheet, we’re going to perform some functions to get additional data on the Udemy courses to consolidate and present this clearly. To do this, we will be using the Right/Left Function, IF function, VLookup and we will create Pivot Tables and Charts.

Right/Left Function

While there is a published time available for each course, we want to know the date that each course was published without the time stamp. Use the Right or Left function to create a ‘Date’ Column.

IF Function

We want to know how many courses are free and how many are paid. Use the IF Function to create a column that indicates whether a course is free or paid.

Pivot Tables and Charts

Create the following Pivot Tables and Charts:

  • Total number of subscribers for each subject (Pie Chart)
  • Average number of subscribers for each subject (Bar Chart)
  • Average cost per subject at each level (Bar Chart)
  • Average content duration for each subject (Bar Chart)
  • Average rating per subject for each level (Column Chart)
  • Any other data that you feel will be important to include in your analysis

Click here to view the final dataset


TASK 3

To Create a tableau dashboard.

Link to the dashboard


TASK 4

Understanding the Problem

Once you have framed the problem and gathered initial insights from the data, you can ask the following questions as you dig deeper into your analysis.

What do you see happening? What are the specific symptoms of the problem? What is your hypothesis for the cause of the problem? Five 'Whys' approach

Once you have the problem better defined, you can use Root Cause Analysis to get deeper into the ‘why’ of the business problem. We do this with the 5 Why’s approach, by simply asking ‘why’ five times.

While asking Why is easy, what we're interested in is the answer. Each time you answer why the next time gets more difficult as you have to think deeper behind the reasons for this. As you ask why, you may find that you have multiple answers for the same question.