A curated list of open datasets for studying e-commerce, drawn from the fields of Marketing, Economics, Operations, and Computer Science.
- Brazilian eCommerce - https://www.kaggle.com/datasets/olistbr/brazilian-ecommerce
- Open CDP - https://rees46.com/en/datasets
- JD.com 2014 - https://www.yongfeng.me/dataset/
- JD.com 2020 (MSOM-20) - https://connect.informs.org/msom/events/datadriven2020
- Alibaba Ads (IJCAI-18) - https://tianchi.aliyun.com/dataset/147588
- Alibaba Mobile (6GB) - https://www.yongfeng.me/dataset/
- Coveo Shopping (SIGIR-21) - https://github.com/coveooss/SIGIR-ecom-data-challenge#how-to-start
- Retail Rocket - https://www.kaggle.com/datasets/retailrocket/ecommerce-dataset?select=events.csv
- Google Merchandise - https://support.google.com/analytics/answer/7586738#
- Shopee - https://www.kaggle.com/datasets/davydev/shopee-code-league-20
- Flipkart - https://www.kaggle.com/datasets/iyumrahul/flipkartsalesdataset?select=Sales.csv
- Pakistan e-commerce - https://www.kaggle.com/datasets/zusmani/pakistans-largest-ecommerce-dataset
Browsing and search logs that may not have prices, purchases, or both.
- Amazon Sessions - https://www.aicrowd.com/challenges/amazon-kdd-cup-23-multilingual-recommendation-challenge
- JD.com Search - https://github.com/rucliujn/JDsearch
- BestBuy - https://www.kaggle.com/c/acm-sf-chapter-hackathon-big/data
- Rakuten - https://sigir-ecom.github.io/ecom2018/data-task.html
- Wayfair Search - https://github.com/wayfair/WANDS
- Criteo Display Advertising - https://ailab.criteo.com/ressources/
- Avazu - https://www.kaggle.com/competitions/avazu-ctr-prediction/overview
- Yoyi - https://apex.sjtu.edu.cn/datasets/7
- Ele Search - https://tianchi.aliyun.com/dataset/120281
- Ele Clickstream - https://tianchi.aliyun.com/dataset/131047
- Alibaba Industrial Dump (150GB) - https://tianchi.aliyun.com/dataset/81505
- Alibaba Fashion Combo - https://tianchi.aliyun.com/dataset/131519
- Alibaba Brick and Mortar (IJCAI-16) - https://tianchi.aliyun.com/dataset/53
- Alibaba Mobile 2021 - https://tianchi.aliyun.com/dataset/109858
- Alibaba Clickstream 2018 - https://tianchi.aliyun.com/dataset/56
- Alibaba Cloud Theme - https://tianchi.aliyun.com/dataset/9716
- Alibaba Ads - https://tianchi.aliyun.com/dataset/148347
- Alibaba User Behavior 2018 - https://tianchi.aliyun.com/dataset/649
- Online Shopping - https://www.kaggle.com/datasets/henrysue/online-shoppers-intention
Product characteristics include images, descriptions, and reviews, but no user activity or purchase data.
- Amazon Reviews - https://cseweb.ucsd.edu/~jmcauley/datasets.html#amazon_reviews
- Video, Audio, Text - https://xiaodongsuper.github.io/M5Product_dataset/index.html
- Tmall reviews - https://tianchi.aliyun.com/dataset/140281
- Home Depot - https://www.kaggle.com/datasets/thedevastator/the-home-depot-products-dataset
- Innerwear - https://www.kaggle.com/datasets/PromptCloudHQ/innerwear-data-from-victorias-secret-and-others
- Flipkart products - https://www.kaggle.com/datasets/PromptCloudHQ/flipkart-products
- Stanford Datasets (Amazon and Beer Reviews) - https://snap.stanford.edu/data/#amazon
- Metacritic Video Games - https://tianchi.aliyun.com/dataset/144719
- Goodreads - https://mengtingwan.github.io/data/goodreads.html#datasets
Individual or aggregate sales, typically for time-series forecasting.
- Walmart (M5) - https://www.kaggle.com/competitions/m5-forecasting-accuracy/
- Ecuador Grocery - https://www.kaggle.com/competitions/favorita-grocery-sales-forecasting/data
- Ukraine ecommerce - https://www.kaggle.com/datasets/picklenik/fozzy-group-hack4retail/data
- Office Supplies - https://sites.google.com/view/dmdaworkshop2023/data-challenge
- Brazilian Drugs - https://www.kaggle.com/datasets/tiagoacardoso/venda-medicamentos-controlados-anvisa
- Indian Sales - https://www.kaggle.com/datasets/girishvutukuri/sales-forecasting-for-small-basket?select=train.csv
- Walmart Sales - https://www.kaggle.com/datasets/yogesh174/wallmart-sales/data
- Montgomery Liquor - https://data.montgomerycountymd.gov/Community-Recreation/Warehouse-and-Retail-Sales/v76h-r7br
- Iowa Liquor - https://data.iowa.gov/Sales-Distribution/Iowa-Liquor-Sales/m3tr-qhgy/data
- Brazil Medical - https://www.kaggle.com/datasets/tgomesjuliana/brazil-medicine-sales?select=EDA_Industrializados_202002.csv
- Store Item - https://www.kaggle.com/c/demand-forecasting-kernels-only/
- Italian Grocers - https://data.mendeley.com/datasets/s8dgbs3rng/1
Prices and purchases from supermarkets, smaller stores, or grocers, often with limited details.
- Tesco Data - https://figshare.com/collections/Tesco_Grocery_1_0/4769354
- Rossmann Store - https://www.kaggle.com/c/rossmann-store-sales
- Polish Grocery - https://www.kaggle.com/datasets/agatii/total-sale-2018-yearly-data-of-grocery-shop/data
- UK Gift Shop - http://archive.ics.uci.edu/dataset/352/online+retail
- Turkish Drugs - https://www.kaggle.com/datasets/emrahaydemr/drug-sales-data
- NYC Shopping - https://www.kaggle.com/datasets/pigment/big-sales-data
- Mexican Grocery - https://www.kaggle.com/datasets/martinezjosegpe/grocery-store/data
- Vietnam Supermarket - https://www.kaggle.com/datasets/tienanh2003/sales-and-inventory-snapshot-data
- Indian Grocery - https://www.kaggle.com/datasets/aryansingh95/flipkart-grocery-transaction-and-product-details?select=fact_sales_apr1.csv
- Instacart - https://www.kaggle.com/competitions/instacart-market-basket-analysis/data
- Israeli Grocery - https://www.kaggle.com/datasets/arielpazsawicki/kimonaim?select=shufersalist.db
- Brazilian store - https://www.kaggle.com/datasets/marcio486/sales-data-for-a-chain-of-brazilian-stores
- Dominick's Soft Drinks - https://www.chicagobooth.edu/research/kilts/research-data/dominicks
- Indonesian Fashion - https://www.kaggle.com/datasets/latifahhukma/fashion-campus
- Diginetica Fashion - https://competitions.codalab.org/competitions/11161
- Dressipi Fashion - http://www.recsyschallenge.com/2022/dataset.html
- Fashion-MNIST - https://github.com/zalandoresearch/fashion-mnist
Market-level or transaction data.
- BLP US Car data - https://pyblp.readthedocs.io/en/stable/_notebooks/tutorial/blp.html
- European Car Market - https://sites.google.com/site/frankverbo/data-and-software/data-set-on-the-european-car-market?authuser=0
- Russian Car Market - https://www.kaggle.com/datasets/ekibee/car-sales-information
- Georgian Used Cars - https://www.kaggle.com/datasets/gogotchuri/myautogecardetails
- Indian Automobiles - https://www.kaggle.com/datasets/zubairatha/revving-up-telangana-vehicle-sales-2023
Travel bookings and transactions.
- Fliggy Travel - https://tianchi.aliyun.com/dataset/113649
- Fliggy Transfers - https://tianchi.aliyun.com/dataset/140721
- Expedia - https://www.kaggle.com/datasets/vijeetnigam26/expedia-hotel
- Trivago Travel - https://recsys.trivago.cloud/challenge/dataset/
- NetEase Music - https://connect.informs.org/rmp/awards/data-competition
- Spotify - https://research.atspotify.com/datasets/
- Bandcamp Music sales - https://components.one/datasets/bandcamp-sales
- Yahoo Music Reviews - https://webscope.sandbox.yahoo.com/catalog.php?datatype=c
- Online Auctions - https://www.modelingonlineauctions.com/datasets
- Crypto Art - https://www.kaggle.com/datasets/franceschet/superrare?select=bids.csv
- Ukraine Procurement - https://www.kaggle.com/datasets/oleksastepaniuk/prozorro-public-procurement-dataset
- Romania Tenders - https://www.kaggle.com/datasets/gpreda/public-tenders-romania-20072016
- Art Auction - https://www.kaggle.com/datasets/quillen/artists-for-lahaina-2023
- Used Car Auction - https://www.kaggle.com/datasets/asimzahid/pakistans-largest-pakwheels-automobiles-listings
- Bidoo - https://www.kaggle.com/datasets/federicominutoli/bidoo-closed-auctions
Bidding logs, with ad-clickstream if available.
- iPinYou - http://contest.ipinyou.com
- Alibaba - https://tianchi.aliyun.com/dataset/148347
- Adform - https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/TADBY7
- Tencent - https://algo.qq.com/index.html
- Outbrain - https://www.kaggle.com/c/outbrain-click-prediction
- Soso - https://www.kaggle.com/competitions/kddcup2012-track2/overview
- Yahoo - https://webscope.sandbox.yahoo.com/catalog.php?datatype=a
- Display Advertising - https://www.kaggle.com/datasets/saurav9786/real-time-advertisers-auction
- ICPSR - https://www.openicpsr.org/openicpsr/search/studies?start=0&ARCHIVE=openicpsr&sort=score%20desc%2CDATEUPDATED%20desc&rows=25&q=auction
- Replication Data (Harvard) - https://dataverse.harvard.edu/dataverse/harvard?q=auction
Supply-side data on delivery, logistics, freight, etc.
- MSOM18 - https://tianchi.aliyun.com/competition/entrance/231623/information
- MSOM20 - https://connect.informs.org/msom/events/datadriven2020
- MSOM21 - https://pubsonline.informs.org/page/msom/data-driven-challenge
- Amazon Last Mile - https://registry.opendata.aws/amazon-last-mile-challenges/
- DataCo - https://tianchi.aliyun.com/dataset/89959
- Drone Delivery - https://tianchi.aliyun.com/dataset/89726
- Brewery Operations - https://www.kaggle.com/datasets/ankurnapa/brewery-operations-and-market-analysis-dataset
Data from property assessment agencies, typically containing house prices and transactions or rental bookings.
- Airbnb - http://insideairbnb.com/explore
- Chicago - https://datacatalog.cookcountyil.gov/
- New York - https://data.cityofnewyork.us/Housing-Development/NYC-Calendar-Sales-Archive-/uzf5-f8n2/about_data
Open-source data released by firms and researchers.
- Yahoo - https://webscope.sandbox.yahoo.com/
- Yelp - https://www.yelp.com/dataset
- Yandex - https://research.yandex.com/datasets
- Facebook - https://fort.fb.com/researcher-datasets
- Microsoft - https://www.microsoft.com/en-us/research/tools/?
- Amazon AWS - https://registry.opendata.aws/
- Netflix - https://www.kaggle.com/datasets/netflix-inc/netflix-prize-data
- JD.com - https://datascience.jd.com/page/opendataset.html
- Rakuten - https://rit.rakuten.com/data_release/#access
- IBM - https://developer.ibm.com/technologies/artificial-intelligence/data/
- Baidu - https://ai.baidu.com/broad/download
- Airbnb - http://insideairbnb.com/get-the-data/
- Yongfeng - https://www.yongfeng.me/dataset/
- Julian McAuley - https://cseweb.ucsd.edu/~jmcauley/datasets.html
- Makridakis - https://forecasters.org/resources/time-series-data/
Competitions and Conferences related to eCommerce and Recommendation Systems.
- RecSys - https://github.com/RUCAIBox/RecSysDatasets
- NeurIPS - https://nips.cc/
- IJCAI - https://www.ijcai.org/
- MSOM - https://pubsonline.informs.org/journal/msom
- Data Mining Cup - https://www.data-mining-cup.com/reviews/
- KDD Cup - https://kdd.org/kdd-cup
- Marketing Science - https://pubsonline.informs.org/page/mksc/online-databases
- Driven Data - https://www.drivendata.org/
- CodaLab - https://codalab.lisn.upsaclay.fr/
- OpenML - https://www.openml.org/
To add a dataset, email me.