/pkg-analysis

Package Analysis through Network Analysis Methods

Python 3.7

What's this?

The purpose of this repository is to apply some network analysis methods, which are based on graph theory, to sample data. This repository does not contain:

  • In-depth academic or technical explanations
  • Meaningful background (I just played with the data)
  • Insightful results

Setup

Assumptions

$ pyenv --version
pyenv 1.2.18

$ python --version 
Python 3.7.6

Installation

local

$ python -m venv .venv 
$ source .venv/bin/activate
$ pip install -r requirements.txt
$ jupyter lab 

docker

$ docker build -t <image-name> .
$ docker run -it -p 8888:8888 <image-name> 

# Optional: also start a PlantUML server
# $ docker run -d --rm -p 8080:8080 plantuml/plantuml-server:jetty

Then open localhost:8888 in your browser. JupyterLab will probably ask for an access token, which is printed to the console.

Data

The data was extracted from collections.abc.
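As a rough illustration of how inheritance edges can be read out of `collections.abc`, here is a minimal sketch using only the standard library. The helper name `abc_edges` is hypothetical, and this is not necessarily how the data in this repository was produced. The exact set of classes depends on the Python version.

```python
import collections.abc
import inspect


def abc_edges():
    """Return sorted (subclass, base) name pairs among the public classes of collections.abc."""
    # Keep only the public names that resolve to classes in this Python version.
    names = [
        name
        for name in collections.abc.__all__
        if inspect.isclass(getattr(collections.abc, name, None))
    ]
    known = set(names)
    edges = []
    for name in names:
        cls = getattr(collections.abc, name)
        # __bases__ holds the direct parents only, which maps to "extends" edges.
        for base in cls.__bases__:
            if base.__name__ in known:
                edges.append((name, base.__name__))
    return sorted(edges)


if __name__ == "__main__":
    for child, parent in abc_edges():
        print(f"{child} extends {parent}")
```

Each pair corresponds to one "extends" relation between two abstract base classes.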

UML Diagram

In the diagram only, I listed some methods to see which ones are declared as abstract methods and which ones are added by each class. The source is here.

Notes

  • @abstractmethod methods are separated from ordinary ones by a horizontal line
  • Return types are omitted (I'm not confident about them)
  • cls means @classmethod

Diagram

CSV

The data follows the Gremlin style because I aim to load it into AWS Neptune. I expect to work with graph databases in real business situations, so I chose to store the data in Gremlin style to make later improvements easier.

$ head -n 5 data/vetices.csv
~id,name:String
v0,"Container"
v1,"Hashable"
v2,"Iterable"
v3,"Iterator"

$ head -n 5 data/edges.csv
~id,~from,~to,~label
e0,v3,v2,extends
e1,v4,v2,extends
e2,v5,v3,extends
e3,v8,v6,extends
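The two files can be read with the standard `csv` module. The sketch below is a minimal example that parses the rows shown above from inline strings, so it runs without the repository. The function names are hypothetical, and ids that are missing from the vertex sample fall back to the raw id.

```python
import csv
import io

# The first rows of each file, copied from the listings above.
VERTICES_CSV = """~id,name:String
v0,"Container"
v1,"Hashable"
v2,"Iterable"
v3,"Iterator"
"""

EDGES_CSV = """~id,~from,~to,~label
e0,v3,v2,extends
e1,v4,v2,extends
e2,v5,v3,extends
e3,v8,v6,extends
"""


def load_vertices(text):
    """Map each vertex id (~id) to its name (name:String)."""
    return {row["~id"]: row["name:String"] for row in csv.DictReader(io.StringIO(text))}


def load_edges(text):
    """Return (from, to, label) tuples in file order."""
    return [
        (row["~from"], row["~to"], row["~label"])
        for row in csv.DictReader(io.StringIO(text))
    ]


def display_name(vertex_id, names):
    """Look up a readable name, falling back to the id when it is unknown."""
    return names.get(vertex_id, vertex_id)


if __name__ == "__main__":
    names = load_vertices(VERTICES_CSV)
    for src, dst, label in load_edges(EDGES_CSV):
        print(display_name(src, names), label, display_name(dst, names))
```

To read the real files, pass an open file object to `csv.DictReader` instead of `io.StringIO`.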

If you are interested in AWS Neptune, see here.

Analysis

I compute the three centralities below to analyze this network in detail. Centrality expresses how important a node is in a network, so the results show which nodes (= classes) are important.
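To make the three measures concrete, here is a standard-library sketch of each one. It is not the notebook's code, which may well use a graph library instead. The assumptions are mine: degree counts incoming plus outgoing edges divided by n - 1, betweenness is unnormalized and ignores edge direction, and PageRank uses a damping factor of 0.85. The four-class graph at the bottom is only an illustrative slice.

```python
from collections import deque


def degree_centrality(nodes, edges):
    """Total degree (in + out) of each node, divided by n - 1."""
    degree = {v: 0 for v in nodes}
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    scale = max(len(nodes) - 1, 1)
    return {v: d / scale for v, d in degree.items()}


def betweenness_centrality(nodes, edges):
    """Unnormalized shortest-path betweenness on the undirected graph (Brandes' algorithm)."""
    neighbors = {v: set() for v in nodes}
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)
    score = {v: 0.0 for v in nodes}
    for source in nodes:
        # BFS from source: count shortest paths (sigma) and record predecessors.
        dist = {source: 0}
        sigma = {v: 0 for v in nodes}
        sigma[source] = 1
        preds = {v: [] for v in nodes}
        order = []
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in neighbors[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies from the farthest nodes back to the source.
        delta = {v: 0.0 for v in nodes}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Every undirected pair was visited from both of its endpoints.
    return {v: s / 2 for v, s in score.items()}


def pagerank(nodes, edges, damping=0.85, iterations=100):
    """PageRank by power iteration; rank of dangling nodes is spread evenly."""
    out = {v: [] for v in nodes}
    for src, dst in edges:
        out[src].append(dst)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out[v]:
                new[w] += damping * rank[v] / len(out[v])
        rank = new
    return rank


# Illustrative slice: (subclass, base) pairs among four collections.abc classes.
NODES = ["Iterable", "Iterator", "Reversible", "Generator"]
EDGES = [("Iterator", "Iterable"), ("Reversible", "Iterable"), ("Generator", "Iterator")]

if __name__ == "__main__":
    print(degree_centrality(NODES, EDGES))
    print(betweenness_centrality(NODES, EDGES))
    print(pagerank(NODES, EDGES))
```

Because edges point from subclass to base class, PageRank tends to reward widely inherited base classes, while degree and betweenness here treat the graph as undirected.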

Notebook: here

degree

betweenness

pagerank

Ranking

By combining these three results, I made a ranking table. As you know, Set, Mapping and Sequence are related to set, dict and list. Since these types are essential for understanding and using Python in development, this result seems reasonable and matches our intuition and experience.

name:String    Sum of rank value
Collection     3
Set            8
Sequence       12.5
MappingView    16.5
Mapping        23.5
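One way to build such a table is to rank the nodes within each centrality and add up the ranks, so that a lower sum means a more central node. The sketch below assumes that tied nodes share the average of their positions, which would explain half-integer sums such as 12.5. That is my assumption, not something confirmed by the notebook. The function names and the scores in the example are made up.

```python
def average_ranks(scores):
    """Rank names by descending score; tied names share the mean of their positions."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        # Extend j to the last name that has the same score as position i.
        j = i
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        mean_rank = ((i + 1) + (j + 1)) / 2
        for name in ordered[i:j + 1]:
            ranks[name] = mean_rank
        i = j + 1
    return ranks


def rank_sum(*metrics):
    """Sum the per-metric ranks of each name, sorted from best (lowest) to worst."""
    totals = {}
    for scores in metrics:
        for name, rank in average_ranks(scores).items():
            totals[name] = totals.get(name, 0) + rank
    return dict(sorted(totals.items(), key=lambda item: item[1]))


if __name__ == "__main__":
    # Hypothetical scores for four nodes, not taken from the notebook.
    degree = {"A": 3, "B": 2, "C": 2, "D": 1}
    betweenness = {"A": 5, "B": 1, "C": 0, "D": 0}
    print(rank_sum(degree, betweenness))
```

With three centralities, a node ranked first in all of them gets a sum of 3, which matches the value shown for Collection.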

Thanks

These were very helpful for this work. Arigatou!

Tools

Extensions

Libraries

References

UML

Python

Network Analysis & Graph Theory