Primary language: Java. License: GNU General Public License v3.0 (GPL-3.0).

KnowledgeMap

Motivation

KnowledgeMap is a project that tries to answer the following questions:

  1. What do I know? Gain objective awareness of the subjects in which I have deeper knowledge. This would make it possible to identify one's own areas of expertise. E.g. Do I know more about History or Science? Do I know more about Biology or Physics?

  2. What don't I know? Gain awareness of the subjects in which I have very little knowledge. This would make it possible to discover further subjects to explore. E.g. "Baseball in the US" is a large subject with tons of data and trivia of which I'm completely unaware (and intend to stay that way).

  3. Do I know more about a subject than another particular person? Objectively compare the knowledge of two different people using a randomly generated quiz.

  4. Which books or sources can expand my knowledge? Whenever I read a book, I need it to be neither too trivial (I already know most of its content) nor too technical (I lack the basis to understand large portions of it). By generating the KnowledgeMap of a given book and overlaying it on one's own KnowledgeMap, it would be possible to determine whether the book sits at the boundary of my own knowledge and may help to expand it, without my losing interest midway.

The name "KnowledgeMap" borrows the metaphor of a cartographic map. If we represent all the different areas of knowledge as a two-dimensional map, there will be shadowy unknown areas (fog of war) representing "ignorance" and some bright zones representing "knowledge".

The first problem arises: what is "all knowledge"? For this purpose, we settle for a simplified answer: Wikipedia.

The second problem follows: knowledge is NOT two-dimensional, but multidimensional! There are many ways to classify knowledge, and the same content can be classified under several unrelated categories at the same time. We therefore make another simplification here:

  • We take "Articles" as the top category and everything follows a hierarchy downwards from there.
  • We only take the shortest path within the Wikipedia categories from a given article to that top category "Articles".
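As a rough illustration of the second simplification, the sketch below runs a breadth-first search over an in-memory category graph. The class name, the method name and the sample categories are hypothetical and not taken from the project, which delegates this step to Neo4j's shortestPath (see the Cypher query further down).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical sketch: not part of the KnowledgeMap code base.
class CategoryPathSketch {

    /**
     * Breadth-first search from a start node upwards through its parent categories.
     * Returns the shortest chain start -> ... -> root, or an empty list if the
     * root cannot be reached.
     */
    public static List<String> shortestPathToRoot(Map<String, List<String>> parents,
                                                  String start, String root) {
        Map<String, String> cameFrom = new HashMap<>(); // node -> node it was reached from
        Queue<String> queue = new ArrayDeque<>();
        cameFrom.put(start, null);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            if (current.equals(root)) {
                // Walk the predecessor links back to the start, then reverse.
                List<String> path = new ArrayList<>();
                for (String n = current; n != null; n = cameFrom.get(n)) {
                    path.add(n);
                }
                Collections.reverse(path);
                return path;
            }
            for (String parent : parents.getOrDefault(current, Collections.<String>emptyList())) {
                if (!cameFrom.containsKey(parent)) { // visit every node at most once
                    cameFrom.put(parent, current);
                    queue.add(parent);
                }
            }
        }
        return Collections.emptyList();
    }

    /** Made-up sample data: each key maps to the categories it belongs to. */
    public static Map<String, List<String>> sampleGraph() {
        Map<String, List<String>> parents = new HashMap<>();
        parents.put("Albert Einstein", Arrays.asList("Physicists", "German people"));
        parents.put("Physicists", Arrays.asList("Scientists"));
        parents.put("Scientists", Arrays.asList("People by occupation"));
        parents.put("People by occupation", Arrays.asList("People"));
        parents.put("German people", Arrays.asList("People"));
        parents.put("People", Arrays.asList("Articles"));
        return parents;
    }

    public static void main(String[] args) {
        // Prints [Albert Einstein, German people, People, Articles]
        System.out.println(shortestPathToRoot(sampleGraph(), "Albert Einstein", "Articles"));
    }
}
```

In this made-up graph the article also reaches "Articles" through "Physicists", but that route is longer and is therefore ignored.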

Briefly, these are the main ideas:

  1. We take a Wikipedia dump and load it into a graph database.
  2. We generate quiz questions from Wikipedia articles.
  3. The user answers those questions, with either a positive or negative result.
  4. Parent categories (following the shortest path to Wikipedia category "Articles") inherit those results.
  5. The system generates a hierarchical heat-map visualization, with white areas representing known categories and black areas representing unknown categories.
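Steps 3 to 5 can be pictured with a small in-memory sketch. It assumes that each category simply counts the correct and total answers of the pages below it, and that the share of correct answers is mapped linearly to a grey level from black (unknown) to white (known). Both the aggregation rule and every name in the code are assumptions made for illustration; the project's actual calculation may differ.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: not part of the KnowledgeMap code base.
class ScoreInheritanceSketch {

    /** Correct and total answer counts for one category. */
    public static final class Tally {
        int correct;
        int total;

        /** Share of correct answers, 0.0 when nothing was asked yet. */
        public double ratio() {
            return total == 0 ? 0.0 : (double) correct / total;
        }

        /** Grey level from 0 (black, unknown) to 255 (white, known). Assumed linear. */
        public int greyLevel() {
            return (int) Math.round(255 * ratio());
        }
    }

    private final Map<String, Tally> tallies = new LinkedHashMap<>();

    /** Credits one answer to every category on the page's path up to the top category. */
    public void record(List<String> categoryPath, boolean answeredCorrectly) {
        for (String category : categoryPath) {
            Tally t = tallies.computeIfAbsent(category, k -> new Tally());
            t.total++;
            if (answeredCorrectly) {
                t.correct++;
            }
        }
    }

    public Tally tallyFor(String category) {
        return tallies.get(category);
    }

    public static void main(String[] args) {
        ScoreInheritanceSketch map = new ScoreInheritanceSketch();
        // Made-up category paths, ordered from the most specific category up to "Articles".
        map.record(Arrays.asList("Physics", "Science", "Articles"), true);
        map.record(Arrays.asList("Biology", "Science", "Articles"), false);
        map.record(Arrays.asList("Physics", "Science", "Articles"), true);

        System.out.println(map.tallyFor("Physics").greyLevel()); // 255: both answers correct
        System.out.println(map.tallyFor("Biology").greyLevel()); // 0: the only answer was wrong
        System.out.println(map.tallyFor("Science").greyLevel()); // 170: two of three correct
    }
}
```

Counting along the whole path means a broad category such as "Science" ends up with a shade between those of its subcategories.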

Questions are generated by removing one of the Wikipedia links in the article, showing some sentences around that link to provide context and asking the user to fill in the gap. The quiz interface looks like this:

[Figure: Quiz interface]
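The link-removal idea could look roughly like the following sketch. It assumes the article text is available as wikitext with [[Target]] and [[Target|label]] links, and that the link target is the expected answer. The class, the method and the sample sentence are hypothetical, and selecting the surrounding context sentences is left out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: not part of the KnowledgeMap code base.
class ClozeQuestionSketch {

    // Matches [[Target]] and [[Target|label]] wiki links.
    private static final Pattern LINK =
            Pattern.compile("\\[\\[([^\\]|]+)(?:\\|([^\\]]+))?\\]\\]");

    /** A fill-in-the-gap question and its expected answer. */
    public static final class Question {
        public final String text;
        public final String answer;

        Question(String text, String answer) {
            this.text = text;
            this.answer = answer;
        }
    }

    /**
     * Replaces the link at position linkIndex (0-based) with a gap and renders
     * all other links as plain text using their visible label.
     */
    public static Question blankOutLink(String wikitext, int linkIndex) {
        Matcher m = LINK.matcher(wikitext);
        StringBuffer out = new StringBuffer();
        String answer = null;
        int i = 0;
        while (m.find()) {
            String target = m.group(1);
            String label = m.group(2) != null ? m.group(2) : target;
            String replacement;
            if (i == linkIndex) {
                answer = target;       // assumed: the link target is the answer
                replacement = "_____"; // the gap shown to the user
            } else {
                replacement = label;
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
            i++;
        }
        m.appendTail(out);
        if (answer == null) {
            throw new IllegalArgumentException("No link at index " + linkIndex);
        }
        return new Question(out.toString(), answer);
    }

    public static void main(String[] args) {
        String wikitext = "[[Paris]] is the capital of [[France]]. "
                + "It lies on the [[Seine|river Seine]].";
        Question q = blankOutLink(wikitext, 1);
        System.out.println(q.text);   // Paris is the capital of _____. It lies on the river Seine.
        System.out.println(q.answer); // France
    }
}
```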

A KnowledgeMap looks like this:

[Figure: KnowledgeMap]

An interactive demo visualization is available at cotrino.com.

There is also another visualization showing the individual pages about which questions were asked:

[Figure: Page visualization]

The link between the user and the known (or unknown) categories is calculated in Neo4j with a Cypher query such as:

MATCH (u:User)-[k:Knows]->(n:Page)
WHERE id(u)=193773
WITH n, k, u
MATCH path=shortestPath((a:Category)<-[r:In_Category*]-(n))
WHERE a.title='Articles'
RETURN path, u
LIMIT 1

[Figure: User to Page to Articles]

To sum up: with this approach, we can so far answer question 1 above ("What do I know?"), but not yet the others.

As the old saying goes, now at least I know that I know nothing.

Building

This is a Java project built with Maven.

Fetch the libraries and compile the executable JAR with mvn package.

This will generate a package including all dependencies under target/KnowledgeMap.jar.

Importing Data

A patched version of Mirko Nasato's Graphipedia is used to create a Neo4j database from a Wikipedia database dump.

See Wikipedia:Database_download for instructions on getting a Wikipedia database dump. The current implementation has been successfully tested with the Simple English Wikipedia.

  1. Extract simplewiki-latest-pages-articles.xml to the folder ./data/.

  2. Download and extract Neo4j to ./database/. The code has been tested with Neo4j 2.3.

  3. Download GraphAware NodeRank and install it as a plug-in in this Neo4j installation.

  4. Run KnowledgeImporter to create a Neo4j database with nodes and relationships in the ./database/data/graphipedia.db directory.

java -Xmx3G -classpath ./target/KnowledgeMap.jar com.cotrino.knowledgemap.KnowledgeImporter

  5. Once this is finished, you should be able to start the Neo4j server with ./database/bin/neo4j start and access the Neo4j web-based interface at http://localhost:7474/

Quiz and Visualization

  1. Start the quiz web service.

java -classpath ./target/KnowledgeMap.jar com.cotrino.knowledgemap.KnowledgeWeb

  2. Start answering questions at http://localhost:8080/

Once you have answered a couple of questions, you can click on the respective buttons to access the visualizations.

References

This project uses parts of, or is based on: