Savitribai Phule Pune University

Department of Artificial Intelligence and Data Science

TE AI&DS Software Laboratory III

Problem Statement Overview:

This practical examination covers data wrangling, descriptive statistics, data analytics, text analytics, and data visualization using the Python and R programming languages, plus a short Scala program with Apache Spark. Each problem statement is described below.

Problem Statement 1: Data Wrangling I

Description: Perform data wrangling operations using Python on an open-source dataset. Tasks include importing the necessary libraries, locating an open-source dataset, loading the data into a pandas DataFrame, checking for missing values, and preprocessing, formatting, and normalizing the data.
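
The missing-value check and normalization steps can be sketched as follows. This is a minimal illustration using only the Python standard library so that it runs without extra packages; in the lab itself a pandas DataFrame would be the natural choice. The records, the `height_cm` column, and the helper names are hypothetical, and mean imputation is only one of several reasonable ways to fill gaps.

```python
# Hypothetical records; None marks a missing value.
rows = [
    {"name": "a", "height_cm": 150.0},
    {"name": "b", "height_cm": None},
    {"name": "c", "height_cm": 170.0},
    {"name": "d", "height_cm": 190.0},
]


def count_missing(rows, column):
    """Count the records whose value in `column` is None."""
    return sum(1 for r in rows if r[column] is None)


def fill_with_mean(rows, column):
    """Return new records with missing values replaced by the column mean."""
    present = [r[column] for r in rows if r[column] is not None]
    mean = sum(present) / len(present)
    return [{**r, column: mean if r[column] is None else r[column]} for r in rows]


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


missing = count_missing(rows, "height_cm")
filled = fill_with_mean(rows, "height_cm")
normalized = min_max_normalize([r["height_cm"] for r in filled])
print(missing, normalized)
```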

Problem Statement 2: Data Wrangling II

Description: Create an "Academic performance" dataset of students and perform data cleaning operations. Tasks involve handling missing values, inconsistencies, outliers, and applying data transformations.
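
One common way to flag outliers is the 1.5 * IQR rule, sketched below with the standard library. The scores are made up for illustration (the value 300 stands in for a data-entry error), and the function name is hypothetical. Other rules, such as z-score thresholds, are equally valid for this task.

```python
from statistics import quantiles


def iqr_outliers(values):
    """Return (lower_fence, upper_fence, outliers) using the 1.5 * IQR rule."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles of the data
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers


# Hypothetical exam scores out of 100; 300 is an impossible value.
scores = [55, 60, 62, 65, 68, 70, 72, 75, 300]
lower, upper, outliers = iqr_outliers(scores)

# One possible cleaning step: keep only the values inside the fences.
cleaned = [s for s in scores if lower <= s <= upper]
print(outliers, cleaned)
```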

Problem Statement 3: Descriptive Statistics

Description: Compute summary statistics and measures of central tendency and variability on an open-source dataset. Tasks include grouping numeric variables by qualitative variables, providing summary statistics, and displaying basic statistical details of specific categories.
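
Grouping a numeric variable by a qualitative one can be sketched as below. The category labels and values are hypothetical, and the sketch uses only the standard library; with pandas the same idea is usually expressed through a group-by followed by an aggregation.

```python
from collections import defaultdict
from statistics import mean, median, stdev

# Hypothetical (category, numeric value) pairs.
records = [("A", 10.0), ("A", 20.0), ("A", 30.0), ("B", 5.0), ("B", 15.0)]


def summarize_by_group(records):
    """Compute summary statistics of the numeric value for each category."""
    groups = defaultdict(list)
    for category, value in records:
        groups[category].append(value)

    summary = {}
    for category, values in groups.items():
        summary[category] = {
            "count": len(values),
            "mean": mean(values),
            "median": median(values),
            "min": min(values),
            "max": max(values),
            # The sample standard deviation needs at least two values.
            "stdev": stdev(values) if len(values) > 1 else 0.0,
        }
    return summary


summary = summarize_by_group(records)
for category, stats in summary.items():
    print(category, stats)
```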

Problem Statement 4: Data Analytics I

Description: Build a Linear Regression Model using Python/R to predict home prices using the Boston Housing Dataset. The dataset contains various parameters related to houses in Boston, and the objective is to predict house prices based on given features.
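
To show the idea behind the model, here is a minimal sketch of ordinary least squares with a single feature, written with the standard library only. It is a simplification: the actual task uses several features and would normally rely on a library implementation. The data points (rooms against price) are invented for illustration and are not taken from the Boston Housing Dataset.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(xs, ys, slope, intercept):
    """Average squared difference between predictions and targets."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: number of rooms and price (arbitrary units).
rooms = [1.0, 2.0, 3.0, 4.0]
price = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_simple_linear_regression(rooms, price)
predicted = slope * 5.0 + intercept  # prediction for a 5-room home
mse = mean_squared_error(rooms, price, slope, intercept)
print(slope, intercept, predicted, mse)
```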

Problem Statement 5: Data Analytics II

Description: Implement logistic regression using Python/R for classification on the Social Network Ads dataset. Compute the confusion matrix to evaluate the model's performance.
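
The confusion matrix step can be sketched independently of the model. The labels below are hypothetical stand-ins for the true and predicted classes of a binary classifier, and the function name is illustrative.

```python
def confusion_matrix(y_true, y_pred):
    """Return counts (tp, fp, fn, tn) for binary labels 0 and 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


# Hypothetical true labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)  # assumes at least one positive prediction
recall = tp / (tp + fn)     # assumes at least one positive true label
print(tp, fp, fn, tn, accuracy, precision, recall)
```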

Problem Statement 6: Data Analytics III

Description: Implement the simple Naïve Bayes classification algorithm using Python/R on the Iris dataset. Compute the confusion matrix to evaluate the model's performance.
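
As a rough sketch of what the classifier does, the block below implements a Gaussian variant of Naïve Bayes from scratch with the standard library. The Gaussian likelihood is an assumption made here because the features are continuous; the two-feature measurements are made up and only loosely resemble petal sizes, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate the prior and the per-feature mean and variance of each class."""
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)

    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant keeps every variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict_gaussian_nb(model, features):
    """Return the class with the highest log-posterior for one sample."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, var in zip(features, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical (length, width) measurements for two classes.
X = [[1.4, 0.2], [1.3, 0.2], [1.5, 0.3], [5.5, 2.0], [6.0, 2.2], [5.8, 2.1]]
y = ["small", "small", "small", "large", "large", "large"]

model = fit_gaussian_nb(X, y)
predictions = [predict_gaussian_nb(model, s) for s in ([1.4, 0.25], [5.9, 2.1])]
print(predictions)
```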

Problem Statement 7: Text Analytics

Description: Perform text analytics on a sample document. Tasks involve document preprocessing methods such as tokenization, POS tagging, stop words removal, stemming, and lemmatization. Create representations of documents using Term Frequency and Inverse Document Frequency.
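
The tokenization, stop-word removal, and TF-IDF steps can be sketched as follows using only the standard library. The stop-word list is a tiny illustrative one, the documents are made up, and the IDF used here is the unsmoothed form log(N / df); libraries often use smoothed variants, so their numbers will differ. POS tagging, stemming, and lemmatization are not shown because they normally rely on an NLP library.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"the", "is", "a", "on", "and"}


def tokenize(text):
    """Lowercase the text, split it into words, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tf_idf(documents):
    """Return one {term: tf-idf score} dictionary per document."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


documents = ["The cat sat on the mat", "The dog sat on the log"]
scores = tf_idf(documents)
print(scores[0])
```

A term that appears in every document, such as "sat" here, gets a score of zero under this unsmoothed formula.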

Problem Statement 8: Data Visualization I

Description: Visualize the Titanic dataset using the Seaborn library. Plot a histogram to analyze the distribution of ticket prices (fares) paid by passengers.
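
What a histogram computes can be sketched without a plotting library: the values are counted into equal-width bins. The fares below are invented, not taken from the Titanic dataset, and the function name is hypothetical. In the lab the plot itself would be drawn with Seaborn.

```python
def histogram_counts(values, n_bins):
    """Count values in n_bins equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if width > 0:
            # The maximum value is placed in the last bin.
            index = min(int((v - lo) / width), n_bins - 1)
        else:
            index = 0
        counts[index] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts


# Hypothetical ticket fares.
fares = [0.0, 5.0, 7.5, 8.0, 10.0, 12.0, 26.0, 30.0, 80.0, 100.0]
edges, counts = histogram_counts(fares, 4)

# Text rendering of the histogram, one row per bin.
for i, count in enumerate(counts):
    print(f"{edges[i]:6.1f} to {edges[i + 1]:6.1f} | {'#' * count}")
```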

Problem Statement 9: Data Visualization II

Description: Visualize the Titanic dataset using box plots to analyze the distribution of age with respect to gender and survival status. Provide observations based on the statistics.
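
The statistics behind a box plot are the five-number summary of each group. The sketch below computes it per (sex, survived) pair with the standard library; the passenger tuples are hypothetical, and the code assumes every group has at least two known ages. These numbers can support the written observations that accompany the plot.

```python
from collections import defaultdict
from statistics import median, quantiles

# Hypothetical (sex, survived, age) tuples; None would mark a missing age.
passengers = [
    ("female", 1, 20.0), ("female", 1, 30.0), ("female", 1, 40.0),
    ("male", 0, 22.0), ("male", 0, 35.0), ("male", 0, 60.0),
]


def box_stats_by_group(rows):
    """Return the five-number summary of age for each (sex, survived) group."""
    groups = defaultdict(list)
    for sex, survived, age in rows:
        if age is not None:  # skip passengers with an unknown age
            groups[(sex, survived)].append(age)

    stats = {}
    for key, ages in groups.items():
        q1, _, q3 = quantiles(ages, n=4)  # needs at least two ages
        stats[key] = {
            "min": min(ages),
            "q1": q1,
            "median": median(ages),
            "q3": q3,
            "max": max(ages),
        }
    return stats


stats = box_stats_by_group(passengers)
for key, summary in stats.items():
    print(key, summary)
```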

Problem Statement 10: Data Visualization III

Description: Download and analyze the Iris flower dataset. Tasks include listing the features and their types, creating histograms and box plots of the feature distributions, and identifying outliers.

Problem Statement 11: Scala Program with Apache Spark

Description: Write a simple program in Scala using the Apache Spark framework.

Note: Each problem statement includes specific instructions and tasks to be performed. Ensure proper documentation and explanation of the code along with the outputs for evaluation.

This README file provides an overview of the practical examination tasks and serves as a guide for students participating in the examination. Good luck!
