Azure-Synapse-Content-Recommendations-Solution-Accelerator: A Jupyter Notebook repository from iamalexmang

page_type	languages	products
sample	sql, python	azure-synapse-analytics, power-bi

About this repository

This accelerator provides a simplified solution for creating personalized content recommendations based on user activity. Companies across industries publish content and gather user activity data. As content stakeholders, we need to maintain and increase customer engagement. Personalized recommendations can help our audience cope with information overload, discover new content, and improve their overall experience. Targeting customers with personalized content and advertising can also increase monetization opportunities within our product experiences.

Prerequisites

To use this solution accelerator, you will need access to an Azure subscription. While not required, a prior understanding of data science, Azure, and Synapse will be helpful.

Getting Started

Clone or download this repository and navigate to the project's root directory.
Go to the Deployment guide for instructions on deploying this solution.

Key concepts

This solution accelerator focuses on the data insights that can be quickly achieved using a simple AI model and dataset. By analyzing content metadata and user activity, we can personalize recommendations based on each user's history and the similarity of other users' consumption patterns.

The AI model accepts a single user and multiple content items as input to produce a click probability score for each content item. A visualization tool is presented in Power BI to explore the personalized content recommendations for various users.
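The input and output shape described above (one user, several candidate items, one score per item) can be illustrated with a toy sketch. The scorer below is a deliberately simple stand-in and not the accelerator's model: it scores each candidate by the share of the user's click history that falls in the same category. The function names `toy_click_scores` and `top_k`, the category-based heuristic, and the sample item IDs are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple


def toy_click_scores(history_categories: List[str],
                     candidates: Dict[str, str]) -> Dict[str, float]:
    """Return one score in [0, 1] per candidate item for a single user.

    Hypothetical stand-in for a trained model: the score is the fraction of
    the user's clicked history that falls in the candidate's category.
    """
    counts = Counter(history_categories)
    total = sum(counts.values())
    return {
        item_id: (counts[category] / total if total else 0.0)
        for item_id, category in candidates.items()
    }


def top_k(scores: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Rank items by descending score, breaking ties by item ID."""
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# One user (represented by the categories of previously clicked items)
# and several candidate items (item ID -> category). Illustrative values.
history = ["sports", "sports", "news", "finance"]
candidates = {"N1": "sports", "N2": "finance", "N3": "travel"}

scores = toy_click_scores(history, candidates)
recommendations = top_k(scores, k=2)
print(recommendations)
```

A real model would replace `toy_click_scores` with learned predictions; the ranking step would stay the same.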

Reference Architecture

This solution accelerator focuses on the data science work needed to create personalized recommendations. We use a pre-prepared sample dataset and visualize the insights in an admin view that lets you explore various users. See the Next Level section below for integrating your data sources, preparing the data, and deploying an API.

Sample Report

We will explore insights and validate the results of the personalized recommendations using a Power BI report.

Next Level

To get content recommendations for your own users, use Azure Synapse pipelines to integrate your data sources and prepare the data in a format similar to the sample dataset. Then update the file path variables in the Spark notebooks and rerun them to see updated results for your audience.

Sample Dataset

The MIND: MIcrosoft News Dataset contains about 160k English news articles and more than 15 million impression logs generated by 1 million users. Every news article contains rich textual content including title, abstract, body, category and entities. Each impression log contains the click events, non-clicked events and historical news click behaviors of this user before this impression. To protect user privacy, each user was de-linked from the production system when securely hashed into an anonymized ID.

File Name	Description
behaviors.tsv	Click history and impression logs of users
news.tsv	Metadata about content items

behaviors.tsv

Column	Description	Data Sample
Impression ID	ID of an impression	91
User ID	Anonymous ID of user	U397059
Time	Impression time	11/15/2019 10:22:32 AM
History	ID list of clicked items	N106403 N71977 N97080 N102132 N97212 N121652
Impressions	List of items displayed in impression and user's click behaviors (1 for click and 0 for non-click)	N129416-0 N26703-1 N120089-1 N53018-0 N89764-0 N91737-0 N29160-0
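As a minimal sketch of how one row of this file could be read, the snippet below splits a line into the five columns listed above and expands the History and Impressions fields. It assumes the columns are tab-separated, appear in the order shown in the table, and that each Impressions token has the form `<news ID>-<0 or 1>`. The `Impression` class and `parse_behaviors_line` function are hypothetical names, not part of the accelerator.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Impression:
    impression_id: str
    user_id: str
    time: str
    history: List[str]                  # IDs of previously clicked items
    impressions: List[Tuple[str, int]]  # (item ID, 1 for click / 0 for non-click)


def parse_behaviors_line(line: str) -> Impression:
    """Parse one tab-separated row (assumed column order as in the table)."""
    impression_id, user_id, time, history, impressions = (
        line.rstrip("\n").split("\t")
    )
    pairs = []
    for token in impressions.split():
        # Split on the last hyphen: "N26703-1" -> ("N26703", 1)
        news_id, _, label = token.rpartition("-")
        pairs.append((news_id, int(label)))
    return Impression(impression_id, user_id, time, history.split(), pairs)


# Row assembled from the data samples in the table above.
sample_line = "\t".join([
    "91",
    "U397059",
    "11/15/2019 10:22:32 AM",
    "N106403 N71977 N97080 N102132 N97212 N121652",
    "N129416-0 N26703-1 N120089-1 N53018-0 N89764-0 N91737-0 N29160-0",
])

row = parse_behaviors_line(sample_line)
clicked = [news_id for news_id, label in row.impressions if label == 1]
print(row.user_id, clicked)
```

Keeping clicks and non-clicks as labeled pairs makes it straightforward to turn each impression into training examples for a click-probability model.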

news.tsv

Column	Data Sample
News ID	N37378
Category	sports
SubCategory	golf
Title	PGA Tour winners
Abstract	A gallery of recent winners on the PGA Tour.
URL	https://www.msn.com/en-us/sports/golf/pga-tour-winners/ss-AAjnQjj?ocid=chopendata
Title Entities	[{"Label": "PGA Tour", "Type": "O", "WikidataId": "Q910409", "Confidence": 1.0, "OccurrenceOffsets": [0], "SurfaceForms": ["PGA Tour"]}]
Abstract Entities	[{"Label": "PGA Tour", "Type": "O", "WikidataId": "Q910409", "Confidence": 1.0, "OccurrenceOffsets": [35], "SurfaceForms": ["PGA Tour"]}]
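The entity columns hold JSON arrays, so they can be decoded with a standard JSON parser. The sketch below uses Python's `json` module on the Title Entities sample from the table. The helper name `entity_labels` is hypothetical, and treating an empty field as "no entities" is an assumption about how missing values might appear.

```python
import json
from typing import List, Tuple


def entity_labels(entities_json: str) -> List[Tuple[str, str]]:
    """Return (Label, WikidataId) pairs from one entity column value."""
    if not entities_json.strip():
        # Assumption: an empty field means the item has no linked entities.
        return []
    return [(e["Label"], e["WikidataId"]) for e in json.loads(entities_json)]


# Title Entities sample from the table above.
title_entities = (
    '[{"Label": "PGA Tour", "Type": "O", "WikidataId": "Q910409", '
    '"Confidence": 1.0, "OccurrenceOffsets": [0], '
    '"SurfaceForms": ["PGA Tour"]}]'
)

labels = entity_labels(title_entities)
print(labels)
```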

For more detailed information, see the MIND dataset description and the MIND paper (Wu et al., ACL 2020).

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.

Data Collection

The software may collect information about you and your use of the software and send it to Microsoft. Microsoft may use this information to provide services and improve our products and services. You may turn off the telemetry as described in the repository. There are also some features in the software that may enable you and Microsoft to collect data from users of your applications. If you use these features, you must comply with applicable law, including providing appropriate notices to users of your applications together with a copy of Microsoft's privacy statement. Our privacy statement is located at https://go.microsoft.com/fwlink/?LinkID=824704. You can learn more about data collection and use in the help documentation and our privacy statement. Your use of the software operates as your consent to these practices.
