keyword_graph

To correlate keywords in a document, I split the document into smaller pieces that I call paragraphs, obtained with the function .splitlines(). I then identify which keywords each paragraph contains and choose an arbitrary distance between paragraphs, an integer that I call 'k'. If two keywords are found in two paragraphs that are less than 'k' apart, I consider them linked. There may be many links if the keywords are frequent enough, so the number of links by itself is not really representative of the correlation between keywords. I therefore normalize the number of links with respect to a value that ensures the correlation is meaningful.
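The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names (`keyword_positions`, `count_links`, `normalize`) are hypothetical, keywords are matched as case-insensitive substrings, and the distance between two paragraphs is taken to be the difference of their indices. The text does not say which value is used for normalization, so dividing by the number of possible paragraph pairs is purely an assumption.

```python
from collections import defaultdict
from itertools import combinations


def keyword_positions(text, keywords):
    """Map each keyword to the indices of the paragraphs that contain it."""
    paragraphs = text.splitlines()
    positions = defaultdict(list)
    for index, paragraph in enumerate(paragraphs):
        lowered = paragraph.lower()
        for keyword in keywords:
            # Assumption: a plain case-insensitive substring match.
            if keyword.lower() in lowered:
                positions[keyword].append(index)
    return positions


def count_links(positions, k):
    """Count, for every pair of keywords, the paragraph pairs closer than k."""
    links = {}
    for a, b in combinations(sorted(positions), 2):
        links[(a, b)] = sum(
            1
            for i in positions[a]
            for j in positions[b]
            if abs(i - j) < k
        )
    return links


def normalize(links, positions):
    """Assumed normalization: links divided by all possible paragraph pairs."""
    return {
        (a, b): n / (len(positions[a]) * len(positions[b]))
        for (a, b), n in links.items()
    }


if __name__ == "__main__":
    sample = "apple banana\ncherry\napple\n\nbanana"
    found = keyword_positions(sample, ["apple", "banana", "cherry"])
    print(normalize(count_links(found, k=2), found))
```

With this choice of normalization the score stays between 0 and 1, so a pair of very frequent keywords does not score high only because it has many occurrences. Any other reference value could be swapped into `normalize` without touching the link counting.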