Focusing on the School of Informatics, University of Edinburgh, a collaboration network was created using information from Edinburgh Research Explorer, the University's collection of research publications. More details can be found in infnet-scrapper.
Using the publications scraped from the research explorer, topic models were inferred and topic-similarity networks [1] were generated. A collaboration network was also created, visualised, and analysed.
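To illustrate the idea behind the topic-similarity networks, the sketch below links authors whose topic distributions are similar under cosine similarity. It is a minimal sketch only: the author names, topic vectors, and the 0.8 threshold are all hypothetical, and the actual construction lives in the embedding notebooks.

    import numpy as np
    import networkx as nx

    # Hypothetical topic distributions for three authors (each row sums to 1).
    topics = {
        "author_a": np.array([0.7, 0.2, 0.1]),
        "author_b": np.array([0.6, 0.3, 0.1]),
        "author_c": np.array([0.1, 0.1, 0.8]),
    }

    def cosine(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    # Connect two authors when their topic distributions are similar enough.
    G = nx.Graph()
    G.add_nodes_from(topics)
    names = list(topics)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            sim = cosine(topics[a], topics[b])
            if sim > 0.8:  # hypothetical similarity threshold
                G.add_edge(a, b, weight=sim)

    print(G.edges(data=True))  # only author_a -- author_b end up linked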
Data
- bin
  - scrapy : scripts for scraping using Scrapy
  - pdfminer : contains the binary from pdfminer.six, plus scripts used to process PDFs with pdfminer
- data_dblp : DBLP dataset; the publication metadata is not stored due to the size of the dataset, only the tokenised pickled files and the dictionary
- data_schoolofinf : Informatics dataset retrieved in Jan 2018
- notebooks : steps taken to process the data and generate lookup tables for the remaining steps
infnet-analysis
- notebooks : contains the Jupyter notebooks used to generate each Informatics network; community detection and the homophily test are carried out in analysis.ipynb
embedding
- notebooks : creation of topic-similarity networks
topicModel
- notebooks : generate topic models using Gensim's implementation of LDA and explore the performance of each model
- src : contains scripts to generate each topic model
The project is still under development. To use the datasets and run the notebooks on your system, follow these instructions:
- The project is developed in Python 3.6. Using Anaconda to set up the virtual environment is the easiest option. You can get a copy of Miniconda by issuing the following commands:
$ curl -O https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh # For MacOSX
$ curl -O https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86.sh # For linux/ubuntu
$ bash Miniconda3-latest-MacOSX-x86_64.sh # Install Miniconda (run the script matching your OS)
$ echo "export PATH=\"\$PATH:$HOME/miniconda3/bin\"" >> ~/.benv
$ source ~/.benv
*** NOTE: the project uses Python 3, not Python 2 ***
# Create conda environment (name infnet3) for project:
# Also install essential packages across all modules:
$ conda create -n infnet3 python=3 pandas matplotlib jupyter ipython ipykernel
$ source activate infnet3 # Activates the environment
(infnet3) $ <--- this prompt prefix shows the successful activation of the environment.
Now we have to install the required Python packages. This list is updated as the project progresses:
- For data pre-processing, additional packages are installed:
(infnet3) $ conda install scrapy # for scraping the research explorer
(infnet3) $ conda install nltk # used for creating tokens for topic modelling
To configure NLTK, execute the following in a new terminal with infnet3 activated:
(infnet3) $ python # launches a Python 3 shell
> import nltk
> nltk.download('stopwords') # confirm the download if prompted
> nltk.download('wordnet')
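For context, here is a minimal sketch of the kind of tokenisation these resources enable, assuming stopword removal plus WordNet lemmatisation; the sample sentence is hypothetical, and the real pipeline runs over the scraped publications.

    from nltk.corpus import stopwords
    from nltk.stem import WordNetLemmatizer

    # Hypothetical abstract text standing in for a scraped publication.
    text = "Graph neural networks are applied to citation networks"

    stops = set(stopwords.words("english"))
    lemmatizer = WordNetLemmatizer()

    # Lowercase, drop stopwords and non-alphabetic tokens, lemmatise the rest.
    tokens = [
        lemmatizer.lemmatize(tok)
        for tok in text.lower().split()
        if tok.isalpha() and tok not in stops
    ]
    print(tokens)  # ['graph', 'neural', 'network', 'applied', 'citation', 'network']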
- For infnet-analysis:
(infnet3) $ conda install networkx numpy
(infnet3) $ pip install python-louvain # community detection package
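As a sketch of how these two packages fit together (assuming networkx 2.x), the snippet below runs Louvain community detection on a toy graph and computes attribute assortativity as a simple homophily-style check. The toy graph is only illustrative; the actual analysis is in analysis.ipynb.

    import networkx as nx
    import community  # the import name of the python-louvain package

    # Toy graph standing in for a collaboration network.
    G = nx.karate_club_graph()

    # Louvain community detection: maps each node to a community id.
    partition = community.best_partition(G)

    # Homophily-style check: assortativity over the detected communities.
    nx.set_node_attributes(G, partition, "community")
    print(nx.attribute_assortativity_coefficient(G, "community"))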
- For topic modelling using Latent Dirichlet Allocation (LDA):
(infnet3) $ conda install gensim # to generate the LDA models
(infnet3) $ pip install pyldavis # for visualisation of the LDA models
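A minimal sketch of training an LDA model with Gensim, assuming tokenised documents like those produced in the pre-processing step; the toy corpus and num_topics value are hypothetical.

    from gensim.corpora import Dictionary
    from gensim.models import LdaModel

    # Hypothetical tokenised documents; the real input is the pickled tokens under Data.
    docs = [
        ["graph", "network", "community"],
        ["topic", "model", "lda", "corpus"],
        ["network", "topic", "similarity"],
    ]

    dictionary = Dictionary(docs)
    corpus = [dictionary.doc2bow(doc) for doc in docs]

    # Train a small LDA model; num_topics=2 is a hypothetical choice.
    lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, passes=10)
    for topic_id, words in lda.print_topics():
        print(topic_id, words)

The trained model can then be inspected interactively; in pyLDAvis 2.x the helper is pyLDAvis.gensim.prepare(lda, corpus, dictionary) (renamed pyLDAvis.gensim_models in later releases).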
- For data exploration, visualisation, and clustering:
(infnet3) $ conda install scikit-learn # for k-means, manifold, dbscan...
(infnet3) $ conda install -c conda-forge hdbscan
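For illustration, the sketch below clusters hypothetical per-document topic vectors with k-means; HDBSCAN offers a density-based alternative with a similar fit interface. Both the data and the n_clusters value are made up.

    import numpy as np
    from sklearn.cluster import KMeans

    # Hypothetical per-document topic vectors (rows: documents, columns: topic weights).
    X = np.array([
        [0.9, 0.1],
        [0.8, 0.2],
        [0.1, 0.9],
        [0.2, 0.8],
    ])

    # Cluster the documents into two groups; n_clusters=2 is a hypothetical choice.
    kmeans = KMeans(n_clusters=2, random_state=0).fit(X)
    print(kmeans.labels_)  # e.g. [0 0 1 1]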