Closeness centrality running without interruption
Hi,
I am using Teneto to calculate closeness centrality. My code has been running for more than two days without stopping. What is the problem?
Resources
- Processor(s): 32
- Threads per core: 2
- Cores per socket: 8
- Model: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
- Memory: 130 GB
Code:
import pandas as pd
import teneto as tn
import numpy as np
from teneto import TemporalNetwork
filename = 'data/Sociopatterns/primaryschool.csv'
df_PS = pd.read_csv(filename, names = ['t', 'i', 'j', 'Ci', 'Cj'], header=None, delimiter='\t')
df_PS_Day = df_PS[ df_PS['t'] < 63000]
df_PS_final = df_PS_Day[[ 't', 'i', 'j']]
df_PS_final
 | t | i | j
---|---|---|---
0 | 31220 | 1558 | 1567
1 | 31220 | 1560 | 1570
2 | 31220 | 1567 | 1574
3 | 31220 | 1632 | 1818
4 | 31220 | 1632 | 1866
... | ... | ... | ...
60618 | 62300 | 1719 | 1859
60619 | 62300 | 1720 | 1843
60620 | 62300 | 1739 | 1872
60621 | 62300 | 1767 | 1787
60622 | 62300 | 1800 | 1820
60623 rows × 3 columns
# The time resolution between two signals is 20 seconds.
# This function relabels the nodes and converts the time stamps to consecutive indices at that resolution.
def contact_graph(data, cfg=None, resolution=0):
    # Collect the unique node ids appearing in either column
    nodes = pd.concat([data['i'], data['j']])
    nodes = nodes.sort_values()
    nodes = nodes.unique()
    nb_nodes = len(nodes)
    # Relabel each node with its index in the sorted node array
    keys = nodes
    values = list(range(nb_nodes))
    nodeslabels = dict(zip(keys, values))
    # Time points: either the observed time stamps (resolution=0)
    # or a regular grid with the given step in seconds
    if resolution == 0:
        times = data['t']
        times = times.sort_values()
        times = times.unique()
        n_times = len(times)
    else:
        times = range(min(data['t']), max(data['t']) + resolution, resolution)
        n_times = len(times)
    # Map the raw time stamps to consecutive indices 0..n_times-1
    keys_times = times
    items_times = list(range(n_times))
    timesCounter = dict(zip(keys_times, items_times))
    df = data.copy()
    frame = pd.DataFrame()
    frame['i'] = df['i'].map(nodeslabels)
    frame['j'] = df['j'].map(nodeslabels)
    frame['t'] = df['t'].map(timesCounter)
    # Build (i, j, t) contact tuples and sort them by time
    contacts = [tuple([frame.iloc[i, 0], frame.iloc[i, 1], frame.iloc[i, 2]]) for i in range(len(frame))]
    contacts = np.array(contacts)
    contacts = contacts[contacts[:, 2].argsort()]
    return contacts
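# Aside: a hedged sketch of a vectorized equivalent of the iloc loop inside
# contact_graph, which builds the same contacts array much faster on ~60k rows:
#     contacts = frame[['i', 'j', 't']].to_numpy()
#     contacts = contacts[contacts[:, 2].argsort()]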
contacts_PS = contact_graph(df_PS_final, resolution=20)
contacts_PS[:50]
array([[ 58, 63, 0],
[ 59, 64, 0],
[ 63, 66, 0],
[ 85, 185, 0],
[ 85, 209, 0],
[101, 114, 0],
[186, 194, 0],
[186, 209, 0],
[186, 194, 1],
[183, 189, 1],
[141, 187, 1],
[101, 114, 1],
[ 85, 185, 1],
[ 63, 66, 1],
[ 58, 63, 1],
[ 85, 209, 1],
[101, 114, 2],
[179, 191, 2],
[179, 181, 2],
[159, 168, 2],
[141, 187, 2],
[ 85, 209, 2],
[ 58, 62, 2],
[ 62, 66, 2],
[ 62, 63, 2],
[ 59, 64, 2],
[ 58, 63, 2],
[ 63, 66, 2],
[ 85, 186, 3],
[179, 181, 3],
[150, 152, 3],
[101, 114, 3],
[ 85, 185, 3],
[ 85, 209, 3],
[ 62, 66, 3],
[ 62, 63, 3],
[ 58, 63, 3],
[ 58, 62, 3],
[ 36, 51, 3],
[ 63, 66, 3],
[183, 189, 4],
[150, 153, 4],
[101, 114, 4],
[ 85, 209, 4],
[ 63, 66, 4],
[ 58, 63, 4],
[ 17, 39, 4],
[ 80, 87, 4],
[162, 164, 5],
[158, 169, 5]])
tnetHS = TemporalNetwork(from_edgelist=list(contacts_PS))
tnetHS.network.head()
 | i | j | t
---|---|---|---
0 | 58 | 63 | 0
1 | 59 | 64 | 0
2 | 63 | 66 | 0
3 | 85 | 185 | 0
4 | 85 | 209 | 0
closeness = tn.networkmeasures.shortest_temporal_path(tnetHS)
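Presumably the paths result then feeds the closeness computation itself; a hedged sketch of that last step, assuming the installed teneto version exposes temporal_closeness_centrality and that it accepts a precomputed paths result (the paths keyword is an assumption; check the signature of the installed version):
# assumed API: pass the precomputed shortest temporal paths to the closeness measure
closeness_values = tn.networkmeasures.temporal_closeness_centrality(paths=closeness)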
Hi,
Unfortunately the current implementation of shortest_temporal_path is quite slow, especially for larger networks (the runtime grows exponentially with size). I've only really used it on networks of about 200 nodes and 400 time points. Increasing its speed is on the todo list.
So the code should finish eventually, but I can't say how long it will take.
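As a rough way to gauge feasibility before committing to the full run, a minimal sketch (reusing the imports and the contacts_PS array from the code above; the 200-step cutoff is an arbitrary choice) that times the function on a small time slice first:
import time
# keep only the contacts from the first 200 time steps
contacts_small = contacts_PS[contacts_PS[:, 2] < 200]
tnet_small = TemporalNetwork(from_edgelist=list(contacts_small))
start = time.time()
paths_small = tn.networkmeasures.shortest_temporal_path(tnet_small)
print('elapsed seconds:', time.time() - start)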
Thank you for your response.
Can we hope that you will improve the performance of the shortest-path code for large temporal networks soon?
I think other measures depend on it, like betweenness.
> Can we hope that you will improve the performance of the shortest-path code for large temporal networks soon?
I'll boost it up the priority queue because it is something I've wanted to fix for some time.
Have you solved your problem? At present, I am running into the same problem with this calculation.
In your article, I saw that you used more than 200 nodes and more than 400 time points. How long did it take you to calculate the closeness centrality? I think this could give me a useful time reference for my current calculations, because I am calculating 60 nodes, 51 networks, and 10% sparsity. My code has been running for more than four days without stopping.
Development on the toolbox stalled this year for other reasons.
The original article used an older version of this function, which was quick but could not handle larger networks. The speed issue was introduced to the shortest temporal paths function when the rest of the toolbox was modified to handle larger networks.
You can use an older version of the code which should be quicker:
I think the quicker version can be found here: https://github.com/wiheto/teneto/releases/tag/0.3.5
Until then, despite this being a high priority when developing the toolbox, it is currently only something I work on in my spare time due to other obligations. I cannot give a time estimate for when I will be able to complete the next version, in which the shortest-path estimation will be substantially quicker.
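If you want to try that release, installing it directly from the tag should work (a hedged example; it may also be available on PyPI as teneto 0.3.5):
pip install git+https://github.com/wiheto/teneto.git@0.3.5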