housing-data-analysis

Developing an algorithm that can forecast house prices.

Preliminary Analysis and Insights

• Insights: The means of Price and Lot size are greater than their medians, which suggests that both variables are positively skewed.
• A house with about three bedrooms typically has one bathroom, two stories, and at least one garage.
• A box plot of this dataset may not add much information, particularly for variables other than Price and Lot size, because quartile estimates are less reliable when the dataset is small.
• All these factors help us choose the algorithm for further analysis.
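The mean-versus-median comparison used in the first insight can be sketched as a small check. This is a rough heuristic rather than a formal skewness test, and the `prices` values are made-up illustrations, not rows from the dataset.

```python
from statistics import mean, median


def skew_direction(values):
    """Guess the skew direction by comparing the mean with the median."""
    avg, mid = mean(values), median(values)
    if avg > mid:
        return "positive"   # a long right tail pulls the mean upward
    if avg < mid:
        return "negative"   # a long left tail pulls the mean downward
    return "symmetric"


# Hypothetical prices (assumption, for illustration only).
prices = [38500, 42000, 49500, 60500, 61000, 66000, 66000, 69000, 83800, 175000]
print(skew_direction(prices))
```

A few very expensive houses are enough to move the mean above the median, which is why the comparison hints at a right-skewed distribution.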

Identify which independent variables will help forecast Price in order to minimize the number of incorrect predictions

• In my view, Lot size, Bedroom, and Bathroom are the main independent variables in the dataset that help us forecast Price.
• The line graph shows a spike around IDs 300 to 400 in both Price and Lot size, which suggests a strong correlation between the two variables.
• Along with Lot size, Bedroom and Stories also drive the valuation of a house for any buyer, and they significantly affect its price.
• Although it is not included in the dataset, the location of the house should, in my opinion, be one of the critical factors in forecasting prices. Proximity to supermarkets, hospitals, schools, bus stops, airports, businesses, and other essential places matters, because the value of a house rises sharply when it is located near this infrastructure.
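A visual spike in a line graph only hints at a relationship, so the strength of the link between Lot size and Price could be checked with a Pearson correlation coefficient. The sketch below is one way to compute it. The `lot_size` and `price` lists are invented sample values, not taken from the dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical sample values (assumption, for illustration only).
lot_size = [3000, 4000, 5000, 6000, 7000]
price = [40000, 52000, 61000, 75000, 82000]
print(round(pearson(lot_size, price), 3))
```

A value close to 1 would support using Lot size as a predictor. The same function can be applied to Bedroom, Bathroom, and Stories to compare candidates.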

Using the CRISP-DM model, create an outline of how you would develop an algorithm to generate insights

  1. Business Understanding
     • For a smartphone manufacturer like OnePlus, the goal is to become the sole premium flagship smartphone manufacturer in the market, which is a huge challenge.
     • With the OnePlus 7/Pro/7T, the company aims to compete head-to-head with Apple's iPhone, Samsung's Note/Galaxy, and Google's Pixel phones, which dominate the market.
     • Constraints: the brand name must become synonymous with quality smartphones. Apple, Google, and Samsung each have their own established audience, so OnePlus may struggle to grow at the pace it wants, especially across international markets.
     • Apple, Google, and Samsung all build products other than smartphones, so a trust factor is attached to their products. OnePlus builds only smartphones, so it may have to find other ways to gain consumers' trust. Brand loyalty is what it should address.

  2. Data Understanding
     • The data collected for this problem must relate to what the competitors have and OnePlus lacks in terms of trust.
     • It should help determine what would motivate a consumer who currently uses an Apple or another Android device to buy a OnePlus phone.
     • This kind of analysis may not have been done by any other company so far, because no one has come as close to the established giants as OnePlus has. The dynamics of the problem are therefore unique, and understanding what kind of data is required is crucial.

  3. Data Preparation
     • Data collection and data preparation may consume the most time, as they require a substantial investment of money, manpower, and time.
     • The acquired data needs to be cleaned properly, and null values must be handled carefully, because they can make a large difference to the later analysis.
     • Outliers should be treated at this stage so that they do not distort the results later.
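The null-value and outlier handling described in this phase could look roughly like the sketch below. Filling gaps with the median and flagging outliers with the 1.5 x IQR rule are assumptions chosen for illustration, and the `raw` values are invented. The source does not say which techniques should be used.

```python
from statistics import median, quantiles


def fill_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical numeric column with one gap and one extreme value.
raw = [10, 12, None, 11, 13, 12, 100]
cleaned = fill_missing(raw)
print(cleaned)
print(iqr_outliers(cleaned))
```

Whether a flagged value is removed, capped, or kept is a separate decision that depends on whether it is a data error or a genuine extreme.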

  4. Modeling
     • A suitable model is chosen with the type of data and the problem to be solved in mind, here improving smartphone sales.
     • Certain machine learning algorithms have prerequisites that must be satisfied before the model is chosen.
     • Ensemble modeling is also an option if more than one model seems suitable.
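One simple form of the ensemble idea mentioned in this phase is to average the predictions of several models. The sketch below assumes each model is just a function from an input to a number. The two example models are hypothetical stand-ins, not models proposed in the source.

```python
def average_ensemble(models):
    """Combine models by averaging their individual predictions."""
    def predict(x):
        predictions = [model(x) for model in models]
        return sum(predictions) / len(predictions)
    return predict


# Two hypothetical models (assumptions, for illustration only).
def model_a(x):
    return 2 * x


def model_b(x):
    return 2 * x + 2


ensemble = average_ensemble([model_a, model_b])
print(ensemble(3))
```

Averaging tends to help when the individual models make different kinds of errors. Weighted averaging or stacking are common alternatives.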

  5. Evaluation
     • Evaluation must be thorough, as it is the last step before the model is deployed.
     • Several methods should be considered to evaluate whether the model will provide the solution once it goes live.
     • The model needs to be tested and validated before deployment.
     • A final review must be done by the project managers. Any errors need to be resolved at this stage, or else the whole project may be put in jeopardy.
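Testing and validating a model before deployment usually means fitting it on one part of the data and measuring its error on a held-out part. The sketch below illustrates this with a one-variable least-squares line and mean absolute error. Both choices, the 8/2 split, and the sample data are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical data following y = 2x + 1 (assumption, for illustration only).
xs = list(range(1, 11))
ys = [2 * x + 1 for x in xs]

# Hold out the last two points for validation.
train_x, test_x = xs[:8], xs[8:]
train_y, test_y = ys[:8], ys[8:]

slope, intercept = fit_line(train_x, train_y)
predictions = [slope * x + intercept for x in test_x]
print(mean_absolute_error(test_y, predictions))
```

On real data the held-out error would not be zero. Comparing it with the training error is one way to spot overfitting before the final review.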

  6. Deployment
     • Once the model is deployed, the plan to monitor and maintain its accuracy should be given the utmost priority.
     • Support may be required at this stage, as responses need to be recorded and used to convert insights into decisions.
     • The findings are to be presented to the Board, and a collective decision needs to be taken based on the results the models produce.
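Monitoring accuracy after deployment can be as simple as comparing recent prediction errors with the error measured during evaluation. The sketch below flags the model for review when the recent average error exceeds the baseline by a tolerance. The function name, the 20% tolerance, and the sample numbers are all hypothetical.

```python
def needs_review(baseline_error, recent_errors, tolerance=0.2):
    """Return True when the recent mean error exceeds the baseline by more than `tolerance`."""
    recent_mean = sum(recent_errors) / len(recent_errors)
    return recent_mean > baseline_error * (1 + tolerance)


# Hypothetical error values (assumption, for illustration only).
print(needs_review(100.0, [105, 110, 115]))
print(needs_review(100.0, [130, 140, 150]))
```

A check like this would typically run on a schedule, with the recorded responses feeding the list of recent errors.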