Primary language: Jupyter Notebook

Mid-Test

Description


  • This repository contains three cases to be solved: the Titanic case, the Baltimore case, and the Victoria case.

Case Titanic


  • The Titanic case is in the Case titanic folder, which contains the code and the dataset.
  • In this case we are required to answer several questions as set out in the assignment.
  • The questions to be answered are:
    • I. What are the dimensions (col, row) of the data frame?
    • II. How do you find the data type of each variable?
    • III. How many passengers survived (Survived=1) and how many did not survive (Survived=0)?
    • IV. How do you drop the column ‘Name’ from the data frame?
    • V. Add one new column called ‘family’ to represent the number of family members aboard (hint: family = sibsp + parch).
    • VI. As shown, the column ‘Age’ contains missing values. Add a new column named ‘Age_miss’ to indicate whether Age is missing or not (Age_miss = ‘YES’ for a missing value and ‘NO’ for a non-missing value).
    • VII. Fill the missing Age values with the mean of the existing Age values.
    • VIII. What is the maximum Age among the passengers who survived the tragedy?
    • IX. How many passengers survived from each ‘PClass’?
    • X. How do you randomly split the data frame into 2 parts (titanic1 and titanic2) with a proportion of 0.7 for titanic1 and 0.3 for titanic2?
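
The questions above map fairly directly onto pandas operations. The following is a minimal sketch, not the repository's own solution: it runs on a small made-up data frame rather than the real dataset, and the column names (`Survived`, `PClass`, `Name`, `Age`, `sibsp`, `parch`) are taken from the wording of the questions and may be spelled differently in the actual file.

```python
import numpy as np
import pandas as pd

# Made-up sample standing in for the real dataset (illustrative values only).
# Column names follow the wording of the questions and are an assumption.
titanic = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0, 1, 0, 1, 0],
    "PClass":   [3, 1, 3, 1, 2, 3, 1, 3, 2, 3],
    "Name":     [f"Passenger {i}" for i in range(10)],
    "Age":      [22.0, 38.0, np.nan, 35.0, 14.0, np.nan, 54.0, 2.0, 27.0, 40.0],
    "sibsp":    [1, 1, 0, 1, 1, 0, 0, 3, 0, 0],
    "parch":    [0, 0, 0, 0, 0, 0, 0, 1, 2, 0],
})

# I. Dimensions: note that .shape returns (rows, columns), in that order.
n_rows, n_cols = titanic.shape

# II. Data type of each variable.
dtypes = titanic.dtypes

# III. Number of survivors (1) and non-survivors (0).
survival_counts = titanic["Survived"].value_counts()

# IV. Drop the 'Name' column.
titanic = titanic.drop(columns="Name")

# V. Family size = siblings/spouses + parents/children.
titanic["family"] = titanic["sibsp"] + titanic["parch"]

# VI. Flag missing ages before they are filled in.
titanic["Age_miss"] = np.where(titanic["Age"].isna(), "YES", "NO")

# VII. Fill missing ages with the mean of the observed ages.
titanic["Age"] = titanic["Age"].fillna(titanic["Age"].mean())

# VIII. Maximum age among survivors (after the fill in step VII).
max_age_survivor = titanic.loc[titanic["Survived"] == 1, "Age"].max()

# IX. Survivors per class: summing the 0/1 flag counts the survivors.
survived_by_class = titanic.groupby("PClass")["Survived"].sum()

# X. Random 70/30 split; random_state is fixed only for reproducibility.
titanic1 = titanic.sample(frac=0.7, random_state=42)
titanic2 = titanic.drop(titanic1.index)
```

With the real data, the sample frame would be replaced by something like `pd.read_csv(...)` pointing at the dataset in the case folder. Step VI is done before step VII on purpose, because the missing-value flag cannot be recovered once the ages are filled.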

Case Water


  • The Baltimore case is in the Case Water folder, which contains the code and the dataset.

  • This case involves a time series that must be solved with ARIMA.

  • Background: The dataset provides the annual water usage in Baltimore from 1885 to 1963, or 79 years of data. The values are in units of liters per capita per day, and there are 79 observations.

  • Objective: Create a Python script for the use case below and upload it when you are finished. The problem is to predict annual water usage.

Case Victoria


  • The Victoria case is in the Case Baltimore folder, which contains the code and the dataset.
  • It concerns predicting apartment prices.
  • Background: The dataset provides living area and conservation status. There are 218 observations and 16 variables.
  • Objective: Create a Python script for the use case below and upload it when you are finished. The problem is to determine the best model and give the reason for the choice. Tony, a broker, wants to predict apartment prices in Victoria based on the living-area environment and the apartment's conservation status. Determine the best model for the prediction and redefine the conservation variable with 3 levels: A = 1A, B = 2A, C = 2B and 3A.
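
The recoding of the conservation variable described above can be sketched with a simple mapping. This is only an illustration on made-up rows: the column names `conservation` and `conservation_level` are hypothetical and may not match the actual dataset.

```python
import pandas as pd

# Made-up rows; 'conservation' is a hypothetical column name.
apartments = pd.DataFrame(
    {"conservation": ["1A", "2A", "2B", "3A", "2A", "1A"]}
)

# Mapping from the objective: A = 1A, B = 2A, C = 2B and 3A.
level_map = {"1A": "A", "2A": "B", "2B": "C", "3A": "C"}

# Any value missing from the mapping becomes NaN, which makes
# unexpected categories easy to spot.
apartments["conservation_level"] = apartments["conservation"].map(level_map)

print(apartments["conservation_level"].value_counts())
```

Writing the result to a new column instead of overwriting the original keeps the four-level variable available, which can be useful when comparing candidate models.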