pip install -r requirements.txt
- A) It may not be feasible or meaningful to consider all 105542 articles.
- B) article_id may not be a suitable aggregation level (e.g., a black vs. a white 'Strap top' are different articles), since many articles are near-identical and show comparable purchase patterns; reporting at product_code level or a higher hierarchy may be more suitable (see the sketch after this list).
- C) Flexibility - in the choice of hierarchy, filtering conditions, and number of frequently and rarely sold products (e.g., for each item we can report any number of frequently co-purchased items at any hierarchy level).
- D) Individual vs. all - depending on the requirement, reports can cover all products (the total number varies with the filtering criteria) or be customized for a few products, and a particular product can be explored in detail.
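To make points B) and C) concrete, below is a minimal, stand-alone sketch of counting co-purchases after rolling article_id up to product_code with pandas. The file paths, column names (customer_id, t_dat, article_id, product_code) and the helper top_co_purchased are assumptions for illustration only; this is not the repository's implementation.

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Assumed file locations and column names (standard transactions/articles layout)
transactions = pd.read_csv('./data/transactions_train.csv')
articles = pd.read_csv('./data/articles.csv')

# Roll article_id up to a coarser hierarchy level, here product_code
tx = transactions.merge(articles[['article_id', 'product_code']], on='article_id')

# Treat one customer's purchases on one date as a single basket
baskets = tx.groupby(['customer_id', 't_dat'])['product_code'].apply(set)

# Count how often each pair of products appears in the same basket
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

def top_co_purchased(product, n=3):
    """Return the n products most frequently bought together with `product` (hypothetical helper)."""
    partners = Counter()
    for (a, b), cnt in pair_counts.items():
        if a == product:
            partners[b] += cnt
        elif b == product:
            partners[a] += cnt
    return partners.most_common(n)
```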
- Clone the repository
git clone git@github.com:nitsourish/New-Yorker-case-study.git
- All input files are uploaded except transactions_train.csv (a dataframe of each customer's purchases per date, with article_id) because of its size. Make sure it is placed under the ./data folder.
- To run and produce results, either use the notebook interactively or run the scripts:
- ./src/report_all.py
- ./src/individual_report.py
parser.add_argument('-m', '--min_cnt', type=int, metavar='', required=True, default=0, help='minimum order count for eligible article_id')
parser.add_argument('-p', '--penetration', type=float, metavar='', required=True, help='Fraction of unique customers for eligible article_id')
parser.add_argument('-ph', '--product_hierarchy', type=str, metavar='', required=True, help='product_hierarchy (article/product/product_type/product_group) of reporting')
parser.add_argument('-n', '--num_prod', type=int, metavar='', required=True, help='number of often/rarely purchased products for each product')
parser.add_argument('-all', '--all', type=bool, metavar='', required=False, default=True, help='boolean to indicate if report for all items in hierarchy')
parser.add_argument('-pl', '--prods', type=list, metavar='', required=False, default=[], help='list of items if all = False')
parser.add_argument('-prod', '--product_name', type=str, metavar='', required=True, help='name of the product for reporting')
- To explore the CMD argparse options of any script:
python report_all.py --help
- A) For all items
python report_all.py -p 0.01 -ph 'prod_name' -n 3 -m 100
- B) For list of items
With the same command-line instruction, make the following change in the script: a) pass the list of items in the function argument prods with all=False (the call is written out below); b) alternatively, the list can be provided as a CMD argparse argument in the defined format.
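For reference, the same call written out (function name and arguments as quoted in the script; treat this as a usage sketch rather than a verified snippet):

```python
# Report only for an explicit list of products instead of all items
prepare_all_products_report(
    df,
    product_hierarchy='prod_name',
    n=3,
    all=False,
    prods=['Jade HW Skinny Denim TRS', 'Tilda tank'],
)
```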
- C) For individual item detailed report
python individual_report.py -p 0.01 -prod 'Perrie Slim Mom Denim TRS' -n 3 -m 100 > output.log
- A) Product-specific models
- B) Unit of data - daily level
- C) With approx. 2 years of data, the forecast horizon is the last 2 weeks (14 days); for each product the model is trained on the rest of the data, using a window_length (lagged features) of 30 days.
- D) Used sktime to build a scikit-learn-compatible regression model for time series forecasting.
- E) Used the vanilla temporal_train_test_split to split the data, with 2 weeks (14 days) as test_size.
- F) For quick implementation, assumed quarterly, additive seasonality and a polynomial trend of degree 1.
- G) Used TransformedTargetForecaster from sktime to build the forecasting pipeline, with XGBoost (default settings) as the regressor; a sketch of this pipeline follows the list.
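A minimal sketch of the pipeline described in D) to G), using sktime's TransformedTargetForecaster, Deseasonalizer, Detrender, PolynomialTrendForecaster and make_reduction. The synthetic series and sp=90 (an approximation of quarterly seasonality on daily data) are assumptions here; the exact steps and hyperparameters in ./src/forecasting_train_validation.py may differ.

```python
import numpy as np
import pandas as pd
from xgboost import XGBRegressor
from sktime.forecasting.compose import TransformedTargetForecaster, make_reduction
from sktime.forecasting.model_selection import temporal_train_test_split
from sktime.forecasting.trend import PolynomialTrendForecaster
from sktime.transformations.series.detrend import Deseasonalizer, Detrender

# Stand-in for one product's daily sales over ~2 years (replace with the real series)
idx = pd.period_range("2018-09-20", periods=730, freq="D")
y = pd.Series(np.random.default_rng(0).poisson(20, size=len(idx)).astype(float), index=idx)

# Hold out the last 2 weeks (14 days) for validation
y_train, y_test = temporal_train_test_split(y, test_size=14)

forecaster = TransformedTargetForecaster(steps=[
    ("deseasonalize", Deseasonalizer(model="additive", sp=90)),               # ~quarterly, additive
    ("detrend", Detrender(forecaster=PolynomialTrendForecaster(degree=1))),   # polynomial trend, degree 1
    ("forecast", make_reduction(XGBRegressor(), window_length=30,             # 30-day lag window
                                strategy="recursive")),
])
forecaster.fit(y_train)
y_pred = forecaster.predict(fh=np.arange(1, 15))  # 14-day forecast horizon
```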
- A) Daily price will be available during the forecasting period.
- B) For practical implementation, next-day forecasting can be done based on the last day's price level.
- Clone the repository (git@github.com:nitsourish/New-Yorker-case-study.git)
- All input files are uploaded except transactions_train.csv (a dataframe of each customer's purchases per date, with article_id) because of its size. Make sure it is placed under the ./data folder.
- The product-wise trained models are also not uploaded.
- To run and produce results, either use the notebook interactively or run the scripts:
- ./src/forecasting_train_validation.py
- ./src/forecasting_inference.py
- A) For model data prep and model training and validation
python forecasting_train_validation.py > output.log
(To change the filtering criteria for eligible product selection, change the penetration and/or cum_sales_fraction arguments of the data_prep function. For example, to select more products, reduce the penetration value, e.g. penetration=0.001; the full call is written out below.)
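The same change written out (the data_prep call is quoted from the note above; a usage sketch, assuming the transactions and product table are already loaded as train and products inside forecasting_train_validation.py):

```python
# A lower penetration threshold makes more products eligible for modelling
data_prep(
    transaction=train,
    products=products,
    penetration=0.001,       # lower value -> more eligible products
    cum_sales_fraction=0.0,
)
```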
- B) For forecasting/inference
As the models are not uploaded, please run forecasting_train_validation.py first, then:
python forecasting_inference.py > output.log