New-Yorker-case-study

First, install the required packages, preferably inside a virtual environment:

  pip install -r requirements.txt

Problem 1: Reporting

For the detailed implementation, please refer to ./nbs/exploratory_report.ipynb

Report Design for Items Frequently and Rarely sold together

A) It may not be feasible or logical to consider all 105542 articles.
B) article_id may not be a suitable aggregation level (e.g. Strap top in black vs. Strap top in white count as different articles), because many articles are nearly identical and show comparable purchase patterns. A report at the product_code level or a higher level of the hierarchy might be more suitable.
C) Flexibility: the hierarchy level, the filtering conditions, and the number of frequently and rarely sold products can all be chosen (e.g. for each item we may report any number of frequently co-purchased items at any hierarchy level).
D) Individual vs. all: depending on the requirement, reports can be generated for all products (the total number varies with the filtering criteria) or customized for a few products. A particular product can also be explored in detail.
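
The design points above can be illustrated with a minimal sketch. It assumes that a basket is everything one customer bought on one date, and that each item has already been mapped to the chosen hierarchy level (for example prod_name). The function names and the input layout are hypothetical and are not taken from the repository.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_purchase_counts(rows):
    """Count how often two items appear in the same basket.

    rows: iterable of (customer_id, date, item) tuples, where item is
    already expressed at the chosen hierarchy level (assumption).
    """
    baskets = defaultdict(set)
    for customer_id, date, item in rows:
        baskets[(customer_id, date)].add(item)

    pair_counts = Counter()
    for items in baskets.values():
        # Sort so that each unordered pair has a single canonical key.
        for a, b in combinations(sorted(items), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def top_partners(pair_counts, item, n=3, rarely=False):
    """Return up to n (partner, count) tuples for the given item.

    rarely=False ranks the most frequent partners first,
    rarely=True ranks the least frequent partners first.
    """
    partners = Counter()
    for (a, b), count in pair_counts.items():
        if a == item:
            partners[b] += count
        elif b == item:
            partners[a] += count
    ranked = sorted(
        partners.items(),
        key=lambda kv: (kv[1], kv[0]),
        reverse=not rarely,
    )
    return ranked[:n]
```

Note that in this sketch the "rarely" ranking only covers pairs that were observed together at least once; items that were never co-purchased would need separate handling.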

Instructions to generate reports (both for all items and customized)

Clone the repository

      git clone git@github.com:nitsourish/New-Yorker-case-study.git

All input files are uploaded except for transactions_train.csv (the dataframe of each customer's purchases for each date, with article id) because of its size. Make sure to place it under the ./data folder.
Results can be produced interactively using the notebook or by running the scripts.

scripts

./src/report_all.py
./src/individual_report.py

List of command-line arguments (argparse):

parser.add_argument('-m','--min_cnt', type = int, metavar = '',required=True,default=0,help='minimum order count for eligible article_id')
parser.add_argument('-p','--penetration', type = float, metavar = '',required=True,help='Fraction of unique customers for eligible article_id')
parser.add_argument('-ph','--product_hierarchy', type = str, metavar = '',required=True,help='product_hierarchy(article/product/product_type/product_group) of reporting')
parser.add_argument('-n','--num_prod', type = int, metavar = '',required=True, help='number of often/rarely purchased products for each products')
parser.add_argument('-all','--all', type = bool, metavar = '',required=False,default=True,help = 'boolean to indicate if report for all items in hierarchy')
parser.add_argument('-pl','--prods', type = list, metavar = '',required=False, default=[], help='list of items if all = False')
parser.add_argument('-prod','--product_name', type = str, metavar = '',required=True, help='name of the product for reporting')
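
One caveat with definitions like these: argparse applies `type` to the raw string, so `type = bool` turns any non-empty value (including 'False') into True, and `type = list` splits a string into single characters. A minimal sketch of one possible alternative follows. The helper str2bool and the function build_parser are hypothetical and are not part of the repository scripts.

```python
import argparse


def str2bool(value):
    """Convert common textual booleans to bool (hypothetical helper)."""
    lowered = value.lower()
    if lowered in ("true", "t", "yes", "y", "1"):
        return True
    if lowered in ("false", "f", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    """Build a parser mirroring the options above (illustrative only)."""
    parser = argparse.ArgumentParser(description="Co-purchase report (sketch)")
    # An optional argument with a default should not also be required.
    parser.add_argument("-m", "--min_cnt", type=int, default=0,
                        help="minimum order count for eligible article_id")
    parser.add_argument("-p", "--penetration", type=float, required=True,
                        help="fraction of unique customers for eligible article_id")
    parser.add_argument("-ph", "--product_hierarchy", type=str, required=True,
                        help="product hierarchy level of the report")
    parser.add_argument("-n", "--num_prod", type=int, required=True,
                        help="number of often/rarely purchased products per product")
    # str2bool parses 'False' as False, unlike type=bool.
    parser.add_argument("-all", "--all", type=str2bool, default=True,
                        help="whether to report all items in the hierarchy")
    # nargs='*' collects zero or more values into a list of strings.
    parser.add_argument("-pl", "--prods", nargs="*", default=[],
                        help="items to report when --all is False")
    return parser
```

With this sketch, a call such as `-p 0.01 -ph prod_name -n 3 -all False -pl 'Jade HW Skinny Denim TRS' 'Tilda tank'` would yield `all=False` and a two-element `prods` list.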

Example script runs from the command line

To list the command-line arguments of any script:

python report_all.py --help

A) For all items

      python report_all.py -p 0.01 -ph 'prod_name' -n 3 -m 100

B) For list of items

With the same command line, make one of the following changes:
a) In the script, pass the list of items through the function argument prods with all = False, e.g. prepare_all_products_report(df, product_hierarchy='prod_name', n=3, all=False, prods=['Jade HW Skinny Denim TRS', 'Tilda tank'])
b) Alternatively, provide the list as a command-line argument in the defined format

C) For individual item detailed report

    python individual_report.py -p 0.01 -prod 'Perrie Slim Mom Denim TRS' -n 3 -m 100 > output.log

Problem 2: Interesting Problem - Product-level Forecasting (of filtered products)

For the detailed implementation, please refer to ./nbs/time_series.ipynb

Design

A) Product-specific models
B) Unit of data: daily level
C) With approx. 2 years of data, the forecast covers the last 2 weeks (14 days); the model for each product is trained on the rest of the data with a window_len (lagged features) of 30 days.
D) Used sktime to build a scikit-learn compatible regression model for time series forecasting
E) Used vanilla temporal_train_test_split to split the data, with 2 weeks (14 days) as test_size.
F) For a quick implementation, assumed quarterly, additive seasonality and a polynomial trend of degree 1
G) Used TransformedTargetForecaster from sktime to build the forecasting pipeline, with default XGBoost as the regressor
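
Points C) and E) can be pictured with a plain-Python sketch of a temporal hold-out split and of the sliding-window (lagged feature) tabulation that allows a tabular regressor to be trained on a series. This only illustrates the idea under simplifying assumptions (a single series, no seasonality or trend handling). It is not sktime's implementation, and the function names are hypothetical.

```python
def temporal_split(series, test_size=14):
    """Hold out the last test_size observations, preserving time order."""
    if not 0 < test_size < len(series):
        raise ValueError("test_size must be between 1 and len(series) - 1")
    return series[:-test_size], series[-test_size:]


def make_lag_windows(series, window_len=30):
    """Turn a series into (features, targets) rows for a regressor.

    Each row uses the previous window_len values as features and the
    value that immediately follows them as the target.
    """
    features, targets = [], []
    for end in range(window_len, len(series)):
        features.append(series[end - window_len:end])
        targets.append(series[end])
    return features, targets
```

For a daily series, `temporal_split(sales, 14)` followed by `make_lag_windows(train, 30)` gives one training row per day after the first 30 days of the training part.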

Assumptions

A) The daily price will be available during the forecasting period
B) For practical implementation, next-day forecasting can be based on the last day's price level

Instructions to generate results

Clone the repository (git@github.com:nitsourish/New-Yorker-case-study.git)
All input files are uploaded except for transactions_train.csv (the dataframe of each customer's purchases for each date, with article id) because of its size. Make sure to place it under the ./data folder.
The product-wise trained models are also not uploaded.
Results can be produced interactively using the notebook or by running the scripts.

scripts

./src/forecasting_train_validation.py
./src/forecasting_inference.py

Example script run from CMD

A) For data preparation and model training/validation

   python forecasting_train_validation.py > output.log

(To change the filtering criteria for eligible product selection, change the penetration and/or cum_sales_fraction arguments of the function data_prep. For example, to select more products, reduce the penetration value to penetration=0.001, i.e. data_prep(transaction = train,products=products,penetration=0.001, cum_sales_fraction=0.0))
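
As a rough illustration of the penetration criterion, the sketch below keeps products bought by at least a given fraction of all unique customers. It assumes that penetration means exactly that fraction, as in the argument description for the reporting scripts. The function name and input layout are hypothetical, and the repository's data_prep may work differently, for example in how it applies cum_sales_fraction.

```python
from collections import defaultdict


def eligible_products(rows, penetration=0.01, min_cnt=0):
    """Select products by customer penetration and order count.

    rows: iterable of (customer_id, product) purchase records (assumption).
    Returns a sorted list of products whose share of unique customers is
    at least penetration and whose order count is at least min_cnt.
    """
    customers_by_product = defaultdict(set)
    order_count = defaultdict(int)
    all_customers = set()
    for customer_id, product in rows:
        customers_by_product[product].add(customer_id)
        order_count[product] += 1
        all_customers.add(customer_id)

    if not all_customers:
        return []

    total = len(all_customers)
    return sorted(
        product
        for product, customers in customers_by_product.items()
        if len(customers) / total >= penetration
        and order_count[product] >= min_cnt
    )
```

Lowering the threshold, for example from 0.01 to 0.001, can only keep the same products or add more, which matches the note above about selecting more products.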

B) For forecasting/inference

As the models are not uploaded, please run forecasting_train_validation.py first:

   python forecasting_inference.py > output.log