This practical examination covers data wrangling, descriptive statistics, data analytics, text analytics, and data visualization in Python and R, plus a short Apache Spark program in Scala. The problem statements and their instructions are listed below:
Description: Perform data wrangling operations using Python on an open-source dataset. Tasks include importing the necessary libraries, locating an open-source dataset, loading it into a pandas DataFrame, checking for missing values, preprocessing and formatting the data, and normalizing it.
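To illustrate the missing-value check and the normalization step, here is a minimal sketch written with the standard library only, so it runs without any installation. The records, the column names `age` and `fare`, and the helper names are hypothetical; in the examination itself the same steps would normally be carried out on a pandas DataFrame.

```python
# Hypothetical records standing in for rows of an open-source dataset.
# None marks a missing value.
rows = [
    {"age": 22.0, "fare": 7.25},
    {"age": None, "fare": 71.28},
    {"age": 35.0, "fare": 8.05},
    {"age": 54.0, "fare": None},
]


def count_missing(records):
    """Count the None entries in each column."""
    counts = {}
    for record in records:
        for column, value in record.items():
            counts[column] = counts.get(column, 0) + (1 if value is None else 0)
    return counts


def min_max_normalize(values):
    """Scale non-missing values to the range [0, 1]; missing values stay None."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    span = hi - lo
    scaled = []
    for v in values:
        if v is None:
            scaled.append(None)
        elif span == 0:
            scaled.append(0.0)  # all values equal, so there is nothing to scale
        else:
            scaled.append((v - lo) / span)
    return scaled


missing = count_missing(rows)
ages_scaled = min_max_normalize([r["age"] for r in rows])
print(missing)
print(ages_scaled)
```

Min-max scaling is only one possible choice of normalization; z-score standardization is an equally common alternative.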
Description: Create an "Academic performance" dataset of students and perform data cleaning operations. Tasks involve handling missing values, inconsistencies, outliers, and applying data transformations.
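The cleaning steps can be sketched as follows, again with the standard library only. The scores are invented for illustration, median imputation is just one way to handle missing values, and the 1.5 x IQR rule is one common (not the only) convention for flagging outliers.

```python
import statistics

# Hypothetical math scores for an "Academic performance" dataset.
# None marks a missing entry; 190 is an implausible value on a 0-100 scale.
math_scores = [72, 65, None, 80, 58, 190, 77, None, 69]


def impute_median(values):
    """Replace missing entries with the median of the observed values."""
    present = [v for v in values if v is not None]
    median = statistics.median(present)
    return [median if v is None else v for v in values]


def iqr_outliers(values):
    """Return the values lying outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lower or v > upper]


filled = impute_median(math_scores)
outliers = iqr_outliers(filled)
print(filled)
print(outliers)
```

Whether a flagged value is dropped, capped, or corrected depends on the dataset and should be justified in the write-up.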
Description: Compute summary statistics and measures of central tendency and variability on an open-source dataset. Tasks include grouping numeric variables by qualitative variables, providing summary statistics, and displaying basic statistical details of specific categories.
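Grouping a numeric variable by a qualitative one might look like the sketch below. The group labels, the values, and the function name `summarize_by_group` are assumptions made for illustration; with pandas the same result would typically come from a group-by followed by a describe step.

```python
import statistics
from collections import defaultdict

# Hypothetical (category, value) pairs, for example income grouped by age group.
records = [
    ("young", 30.0), ("young", 34.0), ("young", 38.0),
    ("adult", 50.0), ("adult", 60.0), ("adult", 70.0), ("adult", 60.0),
]


def summarize_by_group(pairs):
    """Compute basic summary statistics of the numeric value for each category."""
    groups = defaultdict(list)
    for category, value in pairs:
        groups[category].append(value)
    return {
        category: {
            "count": len(values),
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "min": min(values),
            "max": max(values),
            "stdev": statistics.stdev(values),  # sample standard deviation
        }
        for category, values in groups.items()
    }


summary = summarize_by_group(records)
for category, stats in summary.items():
    print(category, stats)
```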
Description: Build a linear regression model using Python/R to predict home prices from the Boston Housing Dataset. The dataset describes houses in Boston through a set of features, and the objective is to predict each house's price from those features.
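As a reduced illustration of the idea, the sketch below fits ordinary least squares with a single feature and reports the root mean squared error. The room counts and prices are made up and are not taken from the Boston Housing Dataset; the real task uses many features and would normally rely on a library implementation.

```python
import math

# Hypothetical data: average number of rooms and price (in thousands).
rooms = [4.0, 5.0, 6.0, 7.0, 8.0]
prices = [10.0, 16.0, 20.0, 24.0, 30.0]


def fit_simple_ols(xs, ys):
    """Closed-form least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def rmse(xs, ys, slope, intercept):
    """Root mean squared error of the fitted line on the given points."""
    squared_errors = [(y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


slope, intercept = fit_simple_ols(rooms, prices)
error = rmse(rooms, prices, slope, intercept)
print(slope, intercept, error)
```

In practice the error should be measured on held-out test data rather than on the points used for fitting, as is done here for brevity.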
Description: Implement logistic regression using Python/R for classification on the Social Network Ads dataset. Compute the confusion matrix to evaluate the model's performance.
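The evaluation step can be sketched independently of the classifier. The label lists below are hypothetical stand-ins for the true and predicted classes of a test split; the function simply tallies the four cells of a binary confusion matrix and derives the usual metrics from them.

```python
# Hypothetical true labels and model predictions (1 = purchased, 0 = not purchased).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]


def confusion_counts(actual, predicted):
    """Return (tn, fp, fn, tp) for binary labels encoded as 0 and 1."""
    tn = fp = fn = tp = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1
        elif a == 0 and p == 0:
            tn += 1
        elif a == 0 and p == 1:
            fp += 1
        else:
            fn += 1
    return tn, fp, fn, tp


tn, fp, fn, tp = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print([[tn, fp], [fn, tp]])
print(accuracy, precision, recall)
```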
Description: Implement the simple Naïve Bayes classification algorithm using Python/R on the iris dataset. Compute the confusion matrix to evaluate the model's performance.
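A minimal Gaussian Naïve Bayes sketch is given below to show how the classifier works. The measurements are invented values that merely resemble two iris species, only two features are used, and the small constant added to each variance is a smoothing assumption; none of this is taken from the actual dataset.

```python
import math
from collections import defaultdict

# Hypothetical (petal length, petal width) measurements in cm.
train_X = [(1.4, 0.2), (1.3, 0.2), (1.5, 0.3), (4.7, 1.4), (4.5, 1.5), (4.9, 1.3)]
train_y = ["setosa", "setosa", "setosa", "versicolor", "versicolor", "versicolor"]


def fit_gaussian_nb(X, y):
    """Estimate a prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)
    model = {}
    for label, rows in by_class.items():
        columns = list(zip(*rows))
        means = [sum(col) / len(col) for col in columns]
        # The 1e-6 term keeps variances strictly positive (assumed smoothing).
        variances = [
            sum((v - m) ** 2 for v in col) / len(col) + 1e-6
            for col, m in zip(columns, means)
        ]
        model[label] = (len(rows) / len(X), means, variances)
    return model


def predict(model, features):
    """Return the class with the highest log posterior, up to a shared constant."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for x, m, var in zip(features, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = fit_gaussian_nb(train_X, train_y)
print(predict(model, (1.6, 0.25)))
print(predict(model, (4.4, 1.4)))
```

The "naïve" part is the assumption that features are independent within a class, which is why the per-feature log densities are simply summed.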
Description: Perform text analytics on a sample document. Tasks involve document preprocessing methods such as tokenization, POS tagging, stop words removal, stemming, and lemmatization. Create representations of documents using Term Frequency and Inverse Document Frequency.
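The Term Frequency and Inverse Document Frequency representation can be sketched as below. The three documents and the tiny stop-word set are hypothetical, tokenization is a plain whitespace split, and the IDF uses the unsmoothed form log(N / df), which is only one of several variants in use. POS tagging, stemming, and lemmatization are left out because they require an NLP library.

```python
import math
from collections import Counter

# Hypothetical mini-corpus and stop-word list.
documents = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats are pets",
]
STOP_WORDS = {"the", "on", "and", "are"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [token for token in text.lower().split() if token not in STOP_WORDS]


def tf_idf(docs):
    """Return one {term: weight} dictionary per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


weights = tf_idf(documents)
for doc_weights in weights:
    print(doc_weights)
```

Terms that occur in fewer documents, such as "sat", receive a higher weight than terms shared across documents, such as "cat".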
Description: Visualize the Titanic dataset using the Seaborn library. Plot a histogram to analyze the distribution of ticket prices (fares) paid by passengers.
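To make clear what a histogram computes before it is drawn, the sketch below bins a handful of fares and prints a text version of the plot. The fares and the bin width of 20 are assumptions chosen for illustration; the examination task itself calls for Seaborn, which performs the binning and drawing in one step.

```python
# Hypothetical ticket prices; the real values come from the Titanic dataset.
fares = [7.25, 71.28, 7.92, 53.10, 8.05, 8.46, 51.86, 21.08, 11.13, 30.07]


def histogram_counts(values, bin_width):
    """Count the values falling in each bin [k * width, (k + 1) * width)."""
    counts = {}
    for v in values:
        start = int(v // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))


counts = histogram_counts(fares, 20)
for start, count in counts.items():
    # One '#' per passenger in the bin.
    print(f"{start:>3}-{start + 20:<3} {'#' * count}")
```

A long right tail in such a plot indicates that most passengers paid low fares while a few paid much more.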
Description: Visualize the Titanic dataset using box plots to analyze the distribution of age with respect to gender and survival status. Provide observations based on the statistics.
Description: Download and analyze the Iris flower dataset. Tasks include listing the features and their types, creating a histogram and a box plot for each feature to show its distribution, and identifying outliers.
Description: Write a simple program in Scala using the Apache Spark framework.
Note: Each problem statement includes specific instructions and tasks to be performed. Ensure proper documentation and explanation of the code along with the outputs for evaluation.
This README file provides an overview of the practical examination tasks and serves as a guide for students participating in the examination. Good luck!