This repository contains code for creating features from structured data using a convolutional neural network (CNN), along with the input and output data.
-
Traditionally, analysts and data scientists created features manually, drawing on domain or business knowledge. This is often called handcrafted feature engineering. While domain knowledge remains important in data science, this type of feature engineering has some drawbacks:
-
Tedious: Manual feature engineering can be a tedious process. How many new features can be created from a list of parent variables? For example, from a date variable one data scientist might create 4 new features (month, year, hour, and day of the week). Another might create 5 more (weekend indicator, festive season indicator, Christmas month indicator, seasonality index, week number in the month), and so on. Does the variable have any relationship or interaction with other variables? Manual feature engineering is therefore limited by both human time and imagination: we simply cannot conceive of every possible feature that will be useful.
-
Influence of Human Bias: More often than not, anyone working in a particular domain or on a modeling project builds up a strong bias toward certain features (especially ones that analyst created earlier!), irrespective of whether they add value to the model.
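
To make the date example above concrete, here is a minimal sketch of handcrafted date features using only the Python standard library. The function name `handcrafted_date_features`, the feature names, and the choice of festive months are assumptions made for illustration; they are not taken from this repository's code.

```python
from datetime import datetime

# Assumption: which months count as "festive" is domain-specific; these are placeholders.
FESTIVE_MONTHS = {11, 12}


def handcrafted_date_features(ts: datetime) -> dict:
    """Derive a few manually designed features from a single timestamp."""
    return {
        "month": ts.month,
        "year": ts.year,
        "hour": ts.hour,
        "day_of_week": ts.weekday(),            # Monday = 0, Sunday = 6
        "is_weekend": int(ts.weekday() >= 5),   # Saturday or Sunday
        "is_festive_season": int(ts.month in FESTIVE_MONTHS),
        "is_christmas_month": int(ts.month == 12),
        "week_of_month": (ts.day - 1) // 7 + 1,  # days 1-7 -> week 1, 8-14 -> week 2, ...
    }


features = handcrafted_date_features(datetime(2021, 12, 25, 14, 30))
print(features)
```

Every entry in the dictionary is a decision someone had to think of and code by hand, which is where both the tedium and the bias come from.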
-
-
This is where automated feature engineering comes in. The number of features that can be created is practically unlimited, and they are generated without human bias. It can also capture complex non-linear interactions among features. Of course, we can apply dimension reduction or feature selection techniques at any point to get rid of redundant or zero-importance features.
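
As a rough illustration of the idea (not the CNN used in this repository), the sketch below slides small filters over each row of a structured dataset, applies a ReLU non-linearity to produce new features, and then drops generated columns that are constant across rows. The function names, the toy data, and the filter weights are all assumptions; in a real CNN the filter weights would be learned during training rather than fixed by hand.

```python
def conv1d_relu(row, kernel):
    """Slide `kernel` over `row` (no padding, stride 1) and apply ReLU."""
    k = len(kernel)
    return [
        max(0.0, sum(w * x for w, x in zip(kernel, row[i:i + k])))
        for i in range(len(row) - k + 1)
    ]


def generate_features(rows, kernels):
    """Concatenate the outputs of every kernel for every row."""
    return [
        [value for kernel in kernels for value in conv1d_relu(row, kernel)]
        for row in rows
    ]


def drop_constant_columns(rows):
    """Remove columns that take the same value in every row (zero variance)."""
    keep = [j for j in range(len(rows[0])) if len({r[j] for r in rows}) > 1]
    return [[r[j] for j in keep] for r in rows], keep


# Toy structured data: 3 rows, 5 numeric columns (illustrative values only).
data = [
    [0.2, 0.5, 0.1, 0.9, 0.4],
    [0.8, 0.1, 0.6, 0.3, 0.7],
    [0.5, 0.5, 0.5, 0.5, 0.5],
]

# Hand-picked filters (assumption). The second one is always negative on
# positive inputs, so ReLU zeroes it out and its columns carry no information.
kernels = [[1.0, -1.0, 0.5], [-1.0, -1.0, -1.0]]

generated = generate_features(data, kernels)
selected, kept_columns = drop_constant_columns(generated)
print(len(generated[0]), "generated features;", len(selected[0]), "kept")
```

Each filter combines neighboring columns into a new feature, so adding filters grows the feature set without anyone designing features by hand. The constant-column filter stands in for the dimension reduction or feature selection step mentioned above.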
Full article link: