- Kaggle: https://www.kaggle.com/c/walmart-recruiting-trip-type-classification
- Use gradient boosting
- Key features: **Department > FinelineNumber > Upc**
- TripType - a categorical id representing the type of shopping trip the customer made. This is the ground truth that you are predicting. TripType_999 is an "other" category.
- VisitNumber - an id corresponding to a single trip by a single customer
- Weekday - the weekday of the trip
- Upc - the UPC number of the product purchased
- ScanCount - the number of the given item that was purchased. A negative value indicates a product return.
- DepartmentDescription - a high-level description of the item's department
- FinelineNumber - a more refined category for each of the products, created by Walmart
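Each row of the raw data describes one purchased item, while `TripType` is defined per `VisitNumber`, so the rows have to be aggregated to one row per visit before modelling. The sketch below shows one possible way to do that with pandas. The sample values are made up for illustration, and the names `rows`, `dept_counts`, and `labels` are hypothetical; this is not necessarily how the notebooks in this repository build their dataframes.

```python
import pandas as pd

# Toy item-level rows using the column names from the field list above.
# All values are invented for illustration only.
rows = pd.DataFrame({
    "TripType": [999, 999, 30, 30, 30],
    "VisitNumber": [5, 5, 7, 7, 7],
    "Weekday": ["Friday"] * 5,
    "Upc": [11111111111, 22222222222, 22222222222, 33333333333, 44444444444],
    "ScanCount": [-1, 1, 1, 2, 1],  # negative value = product return
    "DepartmentDescription": [
        "FINANCIAL SERVICES", "SHOES", "SHOES",
        "PERSONAL CARE", "PERSONAL CARE",
    ],
    "FinelineNumber": [1000, 8931, 8931, 4504, 3565],
})

# One row per visit, one column per department, net item count as the value.
dept_counts = rows.pivot_table(
    index="VisitNumber",
    columns="DepartmentDescription",
    values="ScanCount",
    aggfunc="sum",
    fill_value=0,
)

# One label per visit (TripType is constant within a VisitNumber).
labels = rows.groupby("VisitNumber")["TripType"].first()

print(dept_counts)
print(labels)
```

The same pivot could be repeated with `FinelineNumber` or `Upc` as the column key, at the cost of a much wider table.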
- PFA (Principal Feature Analysis) is a feature selection method (http://www.ifp.illinois.edu/~qitian/e_paper/icip02/icip02.pdf)
- Use each original feature's weights on the principal components (PCs) as its coordinates
- Run K-means clustering on these coordinates and find the centroids
- Calculate the Euclidean distance between each centroid and each feature's coordinates
- Select `n_features` features which are closest to each centroid
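The PFA steps listed above can be sketched with scikit-learn roughly as follows. This is a minimal illustration under several assumptions, not the implementation used in this repository: the function name `principal_feature_analysis` is hypothetical, the input is standardized before PCA, the number of clusters is set equal to `n_features`, and one feature is kept per cluster (the cluster member nearest to its centroid). The demo data is random noise.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def principal_feature_analysis(X, n_features, n_components=None, random_state=0):
    """Return the sorted column indices of the selected features (sketch)."""
    # Assumption: standardize so that no feature dominates the PCs by scale.
    X_scaled = StandardScaler().fit_transform(X)
    pca = PCA(n_components=n_components).fit(X_scaled)

    # components_ has shape (n_components, n_original_features).
    # After transposing, row i holds the weights of original feature i
    # on every retained PC, i.e. the coordinates of that feature.
    coords = pca.components_.T

    # Cluster the features (not the samples) in PC-weight space.
    kmeans = KMeans(n_clusters=n_features, n_init=10, random_state=random_state)
    cluster_ids = kmeans.fit_predict(coords)

    selected = []
    for k, centroid in enumerate(kmeans.cluster_centers_):
        members = np.flatnonzero(cluster_ids == k)
        # Euclidean distance from the centroid to each feature in the cluster.
        distances = np.linalg.norm(coords[members] - centroid, axis=1)
        selected.append(int(members[np.argmin(distances)]))
    return sorted(selected)


# Demo on random data: 200 samples, 6 features, keep 3 of them.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 6))
selected_demo = principal_feature_analysis(X_demo, n_features=3)
print(selected_demo)
```

Restricting the search to the members of each cluster guarantees that the selected indices are distinct, which a plain nearest-to-centroid search over all features would not.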
- `join_dataframe.ipynb`: overview of the dataframes that make up the training data set
- `make_dataframe.ipynb`: details on how to create the dataframes used in `join_dataframe.ipynb`