Focusing on the School of Informatics, University of Edinburgh, a collaboration network was created using information from Edinburgh Research Explorer, the University's collection of research publications. More details can be found in infnet-scrapper.
Using the publications scraped from the research explorer, topic models were inferred and topic-similarity networks [1] were generated. A collaboration network was also created, visualised, and analysed.
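To illustrate the idea behind the topic-similarity networks, the sketch below links authors whose topic distributions are similar under cosine similarity. It is a minimal sketch only: the author names, topic vectors, and the 0.8 threshold are all hypothetical, and the actual construction lives in the embedding notebooks.

    import numpy as np
    import networkx as nx

    # Hypothetical topic distributions for three authors (each row sums to 1).
    topics = {
        "author_a": np.array([0.7, 0.2, 0.1]),
        "author_b": np.array([0.6, 0.3, 0.1]),
        "author_c": np.array([0.1, 0.1, 0.8]),
    }

    def cosine(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    # Connect two authors when their topic distributions are similar enough.
    G = nx.Graph()
    G.add_nodes_from(topics)
    names = list(topics)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            sim = cosine(topics[a], topics[b])
            if sim > 0.8:  # hypothetical similarity threshold
                G.add_edge(a, b, weight=sim)

    print(G.edges(data=True))  # only author_a -- author_b end up linked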
Data
- bin
  - scrapy : scripts for scraping using Scrapy
  - pdfminer : contains the binary from pdfminer.six, plus scripts used to process PDFs with pdfminer
- data_dblp : DBLP dataset; the publication metadata is not stored due to the size of the dataset, only the tokenised pickled files and the dictionary
- data_schoolofinf : Informatics dataset retrieved in Jan 2018
- notebooks : steps taken to process the data and generate lookup tables for the remaining steps
infnet-analysis
- notebooks : contains the Jupyter notebooks used to generate each Informatics network; community detection and the homophily test are carried out in analysis.ipynb
embedding
- notebooks : creation of topic-similarity networks
topicModel
- notebooks : generate topic models using Gensim's implementation of LDA and explore the performance of each model
- src : contains scripts to generate each topic model
The project is still under development. To use the datasets and run the notebooks on your system, follow these instructions:
- The project is developed in Python 3.6. Using Anaconda to set up the virtual environment is the easiest option. You can get a copy of Miniconda by issuing the following commands:
$ curl -O https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh # For MacOSX
$ curl -O https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86.sh # For linux/ubuntu
$ bash Miniconda3-latest-MacOSX-x86_64.sh # Install Miniconda (run the script matching your OS)
$ echo "export PATH=\"\$PATH:$HOME/miniconda3/bin\"" >> ~/.benv
$ source ~/.benv
*** NOTE: the project uses Python 3, not Python 2 ***
# Create conda environment (name infnet3) for project:
# Also install essential packages across all modules:
$ conda create -n infnet3 python=3 pandas matplotlib jupyter ipython ipykernel
$ source activate infnet3 # Activates the environment
(infnet3) $ <--- this prompt prefix shows the successful activation of the environment.
Now we have to install the required Python packages. This list is updated as the project progresses:
- For data pre-processing, additional packages are installed:
(infnet3) $ conda install scrapy # for scraping the research explorer
(infnet3) $ conda install nltk # used for creating tokens for topic modelling
To configure NLTK, execute the following in a new terminal with infnet3 activated:
(infnet3) $ python # launches a Python 3 shell
> import nltk
> nltk.download('stopwords') # confirm the download if prompted
> nltk.download('wordnet')
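For context, here is a minimal sketch of the kind of tokenisation these resources enable, assuming stopword removal plus WordNet lemmatisation; the sample sentence is hypothetical, and the real pipeline runs over the scraped publications.

    from nltk.corpus import stopwords
    from nltk.stem import WordNetLemmatizer

    # Hypothetical abstract text standing in for a scraped publication.
    text = "Graph neural networks are applied to citation networks"

    stops = set(stopwords.words("english"))
    lemmatizer = WordNetLemmatizer()

    # Lowercase, drop stopwords and non-alphabetic tokens, lemmatise the rest.
    tokens = [
        lemmatizer.lemmatize(tok)
        for tok in text.lower().split()
        if tok.isalpha() and tok not in stops
    ]
    print(tokens)  # ['graph', 'neural', 'network', 'applied', 'citation', 'network']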
- For infnet-analysis:
(infnet3) $ conda install networkx numpy
(infnet3) $ pip install python-louvain # community detection package
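As a sketch of how these two packages fit together (assuming networkx 2.x), the snippet below runs Louvain community detection on a toy graph and computes attribute assortativity as a simple homophily-style check. The toy graph is only illustrative; the actual analysis is in analysis.ipynb.

    import networkx as nx
    import community  # the import name of the python-louvain package

    # Toy graph standing in for a collaboration network.
    G = nx.karate_club_graph()

    # Louvain community detection: maps each node to a community id.
    partition = community.best_partition(G)

    # Homophily-style check: assortativity over the detected communities.
    nx.set_node_attributes(G, partition, "community")
    print(nx.attribute_assortativity_coefficient(G, "community"))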
- For topic modelling using Latent Dirichlet Allocation (LDA):
(infnet3) $ conda install gensim # to generate the LDA models
(infnet3) $ pip install pyldavis # for visualisation of the LDA models
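A minimal sketch of training an LDA model with Gensim, assuming tokenised documents like those produced in the pre-processing step; the toy corpus and num_topics value are hypothetical.

    from gensim.corpora import Dictionary
    from gensim.models import LdaModel

    # Hypothetical tokenised documents; the real input is the pickled tokens under Data.
    docs = [
        ["graph", "network", "community"],
        ["topic", "model", "lda", "corpus"],
        ["network", "topic", "similarity"],
    ]

    dictionary = Dictionary(docs)
    corpus = [dictionary.doc2bow(doc) for doc in docs]

    # Train a small LDA model; num_topics=2 is a hypothetical choice.
    lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, passes=10)
    for topic_id, words in lda.print_topics():
        print(topic_id, words)

The trained model can then be inspected interactively; in pyLDAvis 2.x the helper is pyLDAvis.gensim.prepare(lda, corpus, dictionary) (renamed pyLDAvis.gensim_models in later releases).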
- For data exploration, visualisation, and clustering:
(infnet3) $ conda install scikit-learn # for k-means, manifold, dbscan...
(infnet3) $ conda install -c conda-forge hdbscan
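For illustration, the sketch below clusters hypothetical per-document topic vectors with k-means; HDBSCAN offers a density-based alternative with a similar fit interface. Both the data and the n_clusters value are made up.

    import numpy as np
    from sklearn.cluster import KMeans

    # Hypothetical per-document topic vectors (rows: documents, columns: topic weights).
    X = np.array([
        [0.9, 0.1],
        [0.8, 0.2],
        [0.1, 0.9],
        [0.2, 0.8],
    ])

    # Cluster the documents into two groups; n_clusters=2 is a hypothetical choice.
    kmeans = KMeans(n_clusters=2, random_state=0).fit(X)
    print(kmeans.labels_)  # e.g. [0 0 1 1]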