LeagueOfLegendsDataModeling

This is a DSC80 final project at UCSD that applies data science skills to analyze professional League of Legends match data and build predictive models from it.

Introduction

The data that I am using is League of Legends (LOL) match information from the years 2014 to 2024. There are 922080 rows in the data, each representing a player or team in some LOL game. There are always 12 rows per match: one for each of the 2 teams and one for each of the 5 players on each team. From this, we also know that there are 922080 / 12 = 76840 different matches in the entire dataset. There are 131 columns, most of them describing a team or player during the match. Some columns hold descriptive information about the match, such as the date and the names of the team, tournament, or player; the many remaining columns hold game statistics recorded during the match.
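The row structure described above can be checked with a short pandas sketch. This is only an illustration on toy data: the column names `gameid` and `position`, and the use of a `"team"` value to mark team rows, are assumptions and may not match the real dataset.

```python
import pandas as pd

# Toy stand-in for the real data: 2 matches, each with 10 player rows
# and 2 team rows. `gameid` and `position` are assumed column names.
positions = ["top", "jng", "mid", "bot", "sup"] * 2 + ["team"] * 2
toy = pd.DataFrame({
    "gameid": ["g1"] * 12 + ["g2"] * 12,
    "position": positions * 2,
})

# Every match should contribute exactly 12 rows.
rows_per_match = toy.groupby("gameid").size()
all_twelve = bool((rows_per_match == 12).all())

# Number of distinct matches in the toy data.
n_matches = toy["gameid"].nunique()

# The same arithmetic applied to the row count reported for the full dataset.
total_rows = 922080
expected_matches = total_rows // 12

# Split the two kinds of rows so team and player stats can be analyzed separately.
team_rows = toy[toy["position"] == "team"]
player_rows = toy[toy["position"] != "team"]
```

Running the same `groupby` check on the full data would be a quick way to confirm that no match is missing rows before any per-match aggregation.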

Because this is competition data, the natural questions all relate to performance. Who is the best-performing player? Which teams performed best, and in which years? Which statistics matter most for defining performance in each of these categories? How can we use data from historical League of Legends matches to predict match outcomes?

LOL Overview

There is a lot of interesting data in this dataset, so it is important that we first understand what we have. The data describes three different groupings: the match, the teams, and the individual players. There is both categorical and quantitative data about what happened in each match.

Here is an overview of important game values:

gold: Gold for purchasing items and powerups
xp: Experience Points unlock new/improved abilities
cs: Creep score is number of minions/monsters killed
vs: Vision Score reflects how much vision a player provides or denies, mainly by placing wards (items that reveal part of the map) and clearing enemy wards

Here is an overview of the different game statistics in this dataset:

Time-based statistics: We have access to kills, deaths, assists, gold, xp, and cs at the 10- and 15-minute marks of each game. These can be used to show which players or teams have made the most progress by that point in the game.
Other stats:
- kills, deaths, assists
- more detailed gold, xp, cs, vs stats
- damage received/dealt stats
Monsters: Neutral monsters (dragons, also called drakes, barons, etc.) in the game that can be killed for gold/xp/powerups. Inhibitors are included here as well, although they are structures rather than monsters.
Match/player information: The league, player/team name, type of game, position, champion

Data Preparation

Data Cleaning

There is a lot of missing data. For the 'pick' and 'ban' columns, we can fill missing values with 'none': these columns hold the selected character as a string, and a missing entry means no selection was made. For the 'first xyz' columns and the columns counting kills of a certain monster, we fill missing values with 0, on the assumption that a value that was unavailable or not collected for the match means 0 kills. We can also form some aggregations to reduce the number of columns while preserving the quality of the data. The multiple-kill categories can be aggregated into a single 'multikill' column, and we can create a total neutral monsters killed column as well.
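The cleaning steps above can be sketched in pandas as follows. This is a minimal illustration on toy rows, not the project's actual code: every column name here (`ban1`, `pick1`, `dragons`, `barons`, `firstdragon`, `doublekills`, and so on), as well as the new `monsterkills` name, is an assumption chosen to resemble the description.

```python
import numpy as np
import pandas as pd

# Toy rows with missing values; all column names are assumptions.
df = pd.DataFrame({
    "ban1": ["Ahri", np.nan, "Zed"],
    "pick1": [np.nan, "Jinx", "Lux"],
    "dragons": [2.0, np.nan, 1.0],
    "barons": [np.nan, 1.0, 0.0],
    "firstdragon": [1.0, np.nan, 0.0],
    "doublekills": [1.0, 0.0, np.nan],
    "triplekills": [0.0, 1.0, np.nan],
    "quadrakills": [0.0, 0.0, np.nan],
    "pentakills": [0.0, 0.0, np.nan],
})

text_cols = ["ban1", "pick1"]
multikill_cols = ["doublekills", "triplekills", "quadrakills", "pentakills"]
monster_cols = ["dragons", "barons"]
count_cols = monster_cols + ["firstdragon"] + multikill_cols

cleaned = df.copy()

# A missing pick/ban means no selection was made, so fill with 'none'.
cleaned[text_cols] = cleaned[text_cols].fillna("none")

# Missing counts and 'first ...' flags are treated as 0.
cleaned[count_cols] = cleaned[count_cols].fillna(0)

# Collapse the multi-kill categories into a single column,
# then drop the originals to reduce the column count.
cleaned["multikill"] = cleaned[multikill_cols].sum(axis=1)
cleaned = cleaned.drop(columns=multikill_cols)

# Total neutral monsters killed (the individual columns are kept here).
cleaned["monsterkills"] = cleaned[monster_cols].sum(axis=1)
```

Filling with 0 is only safe if a missing value really means "did not happen"; if some leagues simply never recorded a statistic, keeping those values missing may be the better choice.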

Exploratory Analysis

Here are two boxplots showing the distributions of the xpat10 and xpat15 columns. They show that the distribution of xp shifts upward as game time increases.
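The quartiles that a boxplot draws can also be computed directly, which makes the upward shift easy to quantify. The sketch below uses made-up toy values in place of the real `xpat10` and `xpat15` columns, so the numbers are illustrative only.

```python
import pandas as pd

# Toy values standing in for the real xpat10 and xpat15 columns.
toy = pd.DataFrame({
    "xpat10": [3000, 3500, 4000, 4500, 5000],
    "xpat15": [5000, 5500, 6000, 7000, 8000],
})

# First quartile, median, and third quartile: the box in each boxplot.
summary = toy[["xpat10", "xpat15"]].quantile([0.25, 0.5, 0.75])

# How far the median moves between the 10- and 15-minute marks.
median_shift = summary.loc[0.5, "xpat15"] - summary.loc[0.5, "xpat10"]
```

With matplotlib installed, `toy[["xpat10", "xpat15"]].plot(kind="box")` would draw the two boxplots side by side from the same frame.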