Degrees of separation applied to IMDb film dataset.
This repository implements the idea of Six degrees of separation. The idea is implemented with IMDb's datasets
- The dataset is processed through a MapReduce algorithm to have a meaningful data.
- IMDb_adjustment/trim_dataset.sh was ran to get the data needed for MapReduce steps.
- Next, we processed the dataset in two steps of MapReduce, /HADOOP/try.jar and /HADOOP/try2.jar.
- Having obtained and output from second MapReduce algorithm, to remove duplicated (if by chance any) IMDb_adjustment/rm_duplicates.py was ran. This step gives us a graph that has every actors' friends that has took a role in the same movie.
- Lastly, resulting graph is inserted into an ANF Algorithm to find the average degrees of separation among all actors.
- Hadoop for MapReduce
- Python 3.x for duplicate removal
- Gephi - graph visuals
Reach out to us!
- @mertaytore
- @gokhansim
- @ebsenol
- @orhca