- Demo
- Overview
- Motivation
- Problem Solving Steps
- Source of Dataset
- Exploratory Data Analysis
- Recommendation System
- Future scope of project
course-recommendation-06-10_12-29-54.mp4
This project recommends Udemy courses similar to what the user is searching for. The user enters the name of a subject or a phrase of interest, and related courses are recommended and displayed. There is also an analysis dashboard that presents the analysis of the Udemy courses available in the dataset.
- Import the dataset
- Perform Text Preprocessing
- Perform Exploratory Data Analysis (EDA) and generate insights.
- Convert text to numeric values and calculate the cosine similarity score.
- After finding the similarity scores, sort the courses by score and recommend those most similar to the query.
- Integrate the Recommendation System with Flask Framework.
- Deploy the web Application on a cloud platform
This dataset contains 3683 records of courses from 4 subjects (Business Finance, Graphic Design, Musical Instruments and Web Design) taken from Udemy.
Udemy is a massive open online course (MOOC) platform that offers both free and paid courses. Anybody can create a course, a business model that has allowed Udemy to host hundreds of thousands of courses. This version of the dataset modifies column names, removes empty columns and aggregates everything into a single CSV file for ease of use.
For the exploratory data analysis, we explored the dataset by trying to answer the following questions to get a better understanding of the data:
- Course Title
- What are the most frequent words in course titles?
- Longest/Shortest course title?
- How can we build recommendation systems via title using similarity?
- Most famous courses by number of subscribers?
- Subjects/Categories
- What is the distribution of subjects?
- How many courses per subject?
- Distribution of subjects per year?
- How many people purchase a particular subject?
- Which subject is the most popular?
- Published Year
- Number of courses per year?
- Which year has the highest number of courses?
- What is the trend of courses per year?
- Levels
- How many levels do we have?
- What is the distribution of courses per level?
- Which subject has the most levels?
- How many subscribers per level?
- How many courses per level?
- Duration of Course
- Which courses have the highest duration (paid and free)?
- Which courses have higher durations?
- Duration vs number of subscribers?
- Subscribers
- Which course has the highest number of subscribers?
- Average number of subscribers?
- Number of subscribers per subject?
- Number of subscribers per year?
- Price
- What is the average price of a course?
- What is the minimum and maximum price?
- How much does Udemy earn?
- The most profitable courses?
- Correlation
- Does the number of subscribers depend on:
- Number of reviews?
- Price?
- Number of lectures?
- Content duration?
- Question on Time
- Published Year
- Number of courses per year?
- Distribution of subjects per year?
- Which year has the highest number of courses?
- What is the trend of courses per year?
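A few of the questions above (most frequent title words, courses per subject, subscribers per subject) can be sketched with the standard library alone. This is only an illustration: the three rows are made-up sample data, and `subject` and `num_subscribers` are assumed column names, since only `course_title` is named in this README. On the full dataset a dataframe library would be the more usual choice.

```python
from collections import Counter, defaultdict

# Made-up sample rows for illustration only; the real dataset has 3683 records.
# "subject" and "num_subscribers" are assumed column names.
courses = [
    {"course_title": "Learn Piano Basics", "subject": "Musical Instruments", "num_subscribers": 1200},
    {"course_title": "Learn Web Design", "subject": "Web Design", "num_subscribers": 3400},
    {"course_title": "Piano for Beginners", "subject": "Musical Instruments", "num_subscribers": 800},
]


def most_frequent_words(rows, top_n=3):
    """Count lowercase words across all course titles."""
    counts = Counter()
    for row in rows:
        counts.update(row["course_title"].lower().split())
    return counts.most_common(top_n)


def courses_per_subject(rows):
    """Number of courses in each subject."""
    return Counter(row["subject"] for row in rows)


def subscribers_per_subject(rows):
    """Total subscribers summed over the courses of each subject."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["subject"]] += row["num_subscribers"]
    return dict(totals)


if __name__ == "__main__":
    print(most_frequent_words(courses))
    print(courses_per_subject(courses))
    print(subscribers_per_subject(courses))
```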
Algorithms
- Cosine Similarity
- Linear Similarity
Workflow
- Import Dataset
- Vectorize Dataset
- Cosine Similarity Matrix
- ID Score
- Recommend
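The workflow above can be sketched as follows. This is a minimal standard-library illustration, not the project's actual code: it uses plain bag-of-words counts as the vectorization step, and the function names and sample titles are hypothetical. A real implementation would more likely rely on a vectorizer and similarity function from a machine-learning library.

```python
import math
from collections import Counter


def vectorize(title):
    """Turn a title into a bag-of-words count vector (word -> count)."""
    return Counter(title.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(titles):
    """Pairwise cosine similarity between all titles."""
    vectors = [vectorize(t) for t in titles]
    return [[cosine_similarity(a, b) for b in vectors] for a in vectors]


def recommend(query, titles, top_n=3):
    """Score every title against the query and return the best matches."""
    query_vec = vectorize(query)
    scored = [(cosine_similarity(query_vec, vectorize(t)), t) for t in titles]
    scored = [pair for pair in scored if pair[0] > 0]  # drop unrelated titles
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


# Hypothetical titles, not taken from the dataset.
titles = [
    "Learn Piano Basics",
    "Piano for Beginners",
    "Web Design with HTML",
    "Graphic Design Fundamentals",
]

if __name__ == "__main__":
    for score, title in recommend("web design", titles):
        print(f"{score:.3f}  {title}")
```

Scoring the query directly against each title keeps the sketch short; precomputing the full similarity matrix, as in the workflow, pays off when recommendations are looked up for courses already in the dataset.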
For building the course recommendation system, we work with the course_title column only. We start by cleaning the course_title column using neattext.functions, as it is text data.
neattext is a simple Natural Language Processing package for cleaning and pre-processing text data. It can be used to clean sentences and to extract emails, phone numbers, web links, and emojis from sentences.
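The exact neattext calls used by the project are not shown here, so the following is a standard-library stand-in that illustrates the kind of cleaning applied to a course title before vectorization: lowercasing, removing special characters, and dropping stopwords. The `clean_title` function and the small `STOPWORDS` set are assumptions made for this sketch, not part of neattext.

```python
import re

# A deliberately small, hypothetical stopword list; a real NLP package
# would ship a much larger one.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}


def clean_title(title):
    """Lowercase a title, strip special characters, and remove stopwords."""
    text = title.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # replace punctuation/symbols with spaces
    words = [w for w in text.split() if w not in STOPWORDS]
    return " ".join(words)


if __name__ == "__main__":
    print(clean_title("The Complete Guide to Web Design!"))
```

Cleaning titles this way keeps punctuation and very common words from inflating or diluting the similarity scores.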
The dataset for this project includes only 4 categories of courses (Business Finance, Graphic Design, Musical Instruments and Web Design). More categories of courses could be added.