
This is a solution accelerator for creating personalized content recommendations based on user activity.

Primary language: Jupyter Notebook. License: MIT.

page_type: sample
languages: sql, python
products: azure-synapse-analytics, power-bi

Azure Synapse Content Recommendations Solution Accelerator

About this repository

This accelerator provides a simplified solution for creating personalized content recommendations based on user activity. Companies across industries publish content and gather user activity data. As content stakeholders, we need to maintain and grow customer engagement. Personalized recommendations help our audience cope with information overload, discover new content, and have a better overall experience. Targeting customers with personalized content and advertising can also increase monetization opportunities within our product experiences.

Prerequisites

To use this solution accelerator, you will need access to an Azure subscription. Prior familiarity with data science, Azure, and Azure Synapse is helpful but not required.

Getting Started

Deploy to Azure

  1. Clone or download this repository and navigate to the project's root directory
  2. Follow the Deployment guide to deploy this solution

Key concepts

This solution accelerator focuses on the data insights that a simple AI model and dataset can deliver quickly. By analyzing content metadata and user activity, we can personalize recommendations based on each user's own history and on the consumption patterns of similar users.

The AI model takes a single user and multiple candidate content items as input and produces a click probability score for each item. A Power BI report is provided for exploring the personalized recommendations of different users.
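To make that input/output contract concrete, here is a minimal sketch of a scorer with the same shape: one user's click history and a list of candidate items go in, and one probability per item comes out. It is not the accelerator's trained model. The scoring rule is a hypothetical stand-in: a logistic function over the share of the user's clicks that fall in each candidate's category. The names `score_candidates`, `category_of`, `weight`, and `bias` are illustrative only.

```python
import math
from collections import Counter


def score_candidates(history, candidates, category_of, weight=2.0, bias=-1.0):
    """Return {item_id: click probability} for a single user.

    Hypothetical rule (NOT the accelerator's model): the fraction of the
    user's clicked items sharing the candidate's category, mapped through
    a sigmoid so every score lies strictly between 0 and 1.
    """
    # Count the user's past clicks per category (unknown items are skipped).
    counts = Counter(category_of[item] for item in history if item in category_of)
    total = sum(counts.values())
    scores = {}
    for item in candidates:
        # Share of the history in this candidate's category; 0 if no history.
        share = counts[category_of.get(item)] / total if total else 0.0
        # Logistic function turns the linear score into a probability.
        scores[item] = 1.0 / (1.0 + math.exp(-(weight * share + bias)))
    return scores


# Illustrative data: item IDs and categories are made up for the example.
categories = {"N1": "sports", "N2": "sports", "N3": "news", "N4": "sports", "N5": "finance"}
probs = score_candidates(["N1", "N2", "N3"], ["N4", "N5"], categories)
# Rank candidates by descending click probability.
ranked = sorted(probs, key=probs.get, reverse=True)
```

A real recommender would learn these weights from the impression logs. The point here is only the interface: score every candidate for one user, then rank.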

Reference Architecture

This solution accelerator focuses on the data science work needed to create personalized recommendations. We use a pre-prepared sample dataset and visualize the insights in an admin view for exploring individual users. See the next steps section below for integrating your data sources, preparing the data, and deploying an API.


Sample Report

We explore insights and validate the personalized recommendations using a Power BI report.

Next Level

To get content recommendations for your own users, use Azure Synapse pipelines to integrate your data sources and prepare the data in the same format as the sample dataset. Then update the file path variables in the Spark notebooks and rerun them to see updated results for your audience.
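As a rough illustration of "the same format as the sample dataset", the sketch below renders one impression from your own event data as a tab-separated line in the behaviors.tsv column order. It assumes your pipeline already provides an impression ID, a user ID, a timestamp, the user's prior clicks, and the items shown with click flags. The function names are hypothetical. The timestamp layout is inferred from the sample value `11/15/2019 10:22:32 AM`, and the lack of zero padding on month, day, and hour is an assumption.

```python
from datetime import datetime


def format_time(ts):
    """Render a datetime like the sample data, e.g. '11/15/2019 10:22:32 AM'.

    Assumption: month, day and hour are not zero-padded.
    """
    hour12 = ts.hour % 12 or 12  # 0 -> 12 AM, 13 -> 1 PM
    suffix = "AM" if ts.hour < 12 else "PM"
    return f"{ts.month}/{ts.day}/{ts.year} {hour12}:{ts.minute:02d}:{ts.second:02d} {suffix}"


def to_behaviors_row(impression_id, user_id, ts, history, impressions):
    """Build one behaviors.tsv-style line (hypothetical helper).

    history: item IDs the user clicked before this impression.
    impressions: (item_id, clicked) pairs shown in this impression.
    """
    history_field = " ".join(history)
    # Each shown item becomes "<item_id>-1" if clicked, "<item_id>-0" otherwise.
    impressions_field = " ".join(f"{item}-{int(clicked)}" for item, clicked in impressions)
    return "\t".join(
        [str(impression_id), user_id, format_time(ts), history_field, impressions_field]
    )


row = to_behaviors_row(
    91, "U397059", datetime(2019, 11, 15, 10, 22, 32),
    ["N106403", "N71977"], [("N129416", False), ("N26703", True)],
)
```

Writing rows like this to a file and pointing the notebooks' file path variables at it is one possible approach. Check how the notebooks actually read their inputs before relying on this layout.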

Sample Dataset

The MIND: MIcrosoft News Dataset contains about 160k English news articles and more than 15 million impression logs generated by 1 million users. Every news article contains rich textual content, including title, abstract, body, category, and entities. Each impression log contains the click events, the non-clicked events, and the user's historical news clicks before that impression. To protect user privacy, each user was de-linked from the production system when securely hashed into an anonymized ID.

| File Name | Description |
|---|---|
| behaviors.tsv | Click history and impression logs of users |
| news.tsv | Metadata about content items |

behaviors.tsv

| Column | Description | Data Sample |
|---|---|---|
| Impression ID | ID of an impression | 91 |
| User ID | Anonymous ID of user | U397059 |
| Time | Impression time | 11/15/2019 10:22:32 AM |
| History | ID list of clicked items | N106403 N71977 N97080 N102132 N97212 N121652 |
| Impressions | List of items displayed in the impression and the user's click behavior (1 for click, 0 for non-click) | N129416-0 N26703-1 N120089-1 N53018-0 N89764-0 N91737-0 N29160-0 |
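The History and Impressions columns pack several values into one field. The sketch below unpacks one behaviors.tsv line into Python values. It assumes the file is tab-separated with no header row, as in the public MIND release. The `Impression` dataclass and `parse_behaviors_line` are illustrative names, not part of the accelerator's notebooks.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Impression:
    impression_id: str
    user_id: str
    time: str
    history: List[str]                   # items clicked before this impression
    impressions: List[Tuple[str, bool]]  # (news_id, clicked) for each shown item


def parse_behaviors_line(line):
    """Parse one tab-separated behaviors.tsv line (hypothetical helper)."""
    impression_id, user_id, time, history, impressions = line.rstrip("\r\n").split("\t")
    # History may be empty; str.split() with no arguments then returns [].
    clicked_before = history.split()
    shown = []
    for token in impressions.split():
        # Each token is "<news_id>-<label>": label 1 means clicked, 0 means not clicked.
        news_id, label = token.rsplit("-", 1)
        shown.append((news_id, label == "1"))
    return Impression(impression_id, user_id, time, clicked_before, shown)


# The sample row from the table above.
sample = ("91\tU397059\t11/15/2019 10:22:32 AM\t"
          "N106403 N71977 N97080 N102132 N97212 N121652\t"
          "N129416-0 N26703-1 N120089-1 N53018-0 N89764-0 N91737-0 N29160-0")
rec = parse_behaviors_line(sample)
clicked = [news_id for news_id, was_clicked in rec.impressions if was_clicked]
```

Each (news_id, clicked) pair can serve as a labeled example for a click-prediction model, with the history supplying the user's context.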

news.tsv

| Column | Data Sample |
|---|---|
| News ID | N37378 |
| Category | sports |
| SubCategory | golf |
| Title | PGA Tour winners |
| Abstract | A gallery of recent winners on the PGA Tour. |
| URL | https://www.msn.com/en-us/sports/golf/pga-tour-winners/ss-AAjnQjj?ocid=chopendata |
| Title Entities | [{"Label": "PGA Tour", "Type": "O", "WikidataId": "Q910409", "Confidence": 1.0, "OccurrenceOffsets": [0], "SurfaceForms": ["PGA Tour"]}] |
| Abstract Entities | [{"Label": "PGA Tour", "Type": "O", "WikidataId": "Q910409", "Confidence": 1.0, "OccurrenceOffsets": [35], "SurfaceForms": ["PGA Tour"]}] |
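The two entity columns hold JSON arrays. The sample suggests that `OccurrenceOffsets` are character offsets into the title or abstract: "PGA Tour" starts at offset 0 of the title and at offset 35 of the abstract. The sketch below parses one news.tsv line and decodes those columns. It assumes a tab-separated file with no header row. The snake_case column names and `parse_news_line` are hypothetical.

```python
import json

# Hypothetical snake_case names for the eight news.tsv columns, in file order.
NEWS_COLUMNS = ["news_id", "category", "subcategory", "title", "abstract", "url",
                "title_entities", "abstract_entities"]


def parse_news_line(line):
    """Parse one tab-separated news.tsv line into a dict (hypothetical helper)."""
    fields = line.rstrip("\r\n").split("\t")
    record = dict(zip(NEWS_COLUMNS, fields))
    for key in ("title_entities", "abstract_entities"):
        # Entity columns are JSON arrays; treat a missing or empty field as no entities.
        raw = record.get(key, "")
        record[key] = json.loads(raw) if raw else []
    return record


# The sample row from the table above.
sample = "\t".join([
    "N37378", "sports", "golf", "PGA Tour winners",
    "A gallery of recent winners on the PGA Tour.",
    "https://www.msn.com/en-us/sports/golf/pga-tour-winners/ss-AAjnQjj?ocid=chopendata",
    '[{"Label": "PGA Tour", "Type": "O", "WikidataId": "Q910409", "Confidence": 1.0, '
    '"OccurrenceOffsets": [0], "SurfaceForms": ["PGA Tour"]}]',
    '[{"Label": "PGA Tour", "Type": "O", "WikidataId": "Q910409", "Confidence": 1.0, '
    '"OccurrenceOffsets": [35], "SurfaceForms": ["PGA Tour"]}]',
])
record = parse_news_line(sample)
entity = record["abstract_entities"][0]
start = entity["OccurrenceOffsets"][0]
# Slicing the abstract at the offset recovers the entity's surface form.
surface = record["abstract"][start:start + len(entity["SurfaceForms"][0])]
```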

For more detailed information, see the MIND dataset description and the MIND paper (Wu et al., ACL 2020).

Further Reading

Microsoft Learn

More Documentation

For additional training and support, please see:

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.

Data Collection

The software may collect information about you and your use of the software and send it to Microsoft. Microsoft may use this information to provide services and improve our products and services. You may turn off the telemetry as described in the repository. There are also some features in the software that may enable you and Microsoft to collect data from users of your applications. If you use these features, you must comply with applicable law, including providing appropriate notices to users of your applications together with a copy of Microsoft's privacy statement. Our privacy statement is located at https://go.microsoft.com/fwlink/?LinkID=824704. You can learn more about data collection and use in the help documentation and our privacy statement. Your use of the software operates as your consent to these practices.