Building a graph database on Neo4j using the MovieLens 10M dataset.
In our tests, `neo4j-admin import` was the most efficient way to insert data into the database. We also tried inserting the data with the Cypher `CREATE` clause. The speed comparison is shown below.
Method | Speed |
---|---|
`neo4j-admin import` | 10M / 10 s |
`calcu_time_2` | 10K / 70 s |
`calculate_time.py` | 10K / 1 min |
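
To make the gap easier to compare, the figures in the table can be normalized to records per second. The sketch below assumes that "10M" and "10K" are record counts (10,000,000 and 10,000) and that "1min" means 60 seconds; the function names are illustrative only.

```python
# Figures copied from the table above. Reading "10M"/"10K" as record
# counts and "1min" as 60 seconds is an assumption.
MEASUREMENTS = {
    "neo4j-admin import": (10_000_000, 10.0),
    "calcu_time_2": (10_000, 70.0),
    "calculate_time.py": (10_000, 60.0),
}


def records_per_second(records, seconds):
    """Return throughput in records per second."""
    return records / seconds


def speedup_over(fast, slow):
    """Return how many times faster `fast` is than `slow`."""
    fast_rate = records_per_second(*MEASUREMENTS[fast])
    slow_rate = records_per_second(*MEASUREMENTS[slow])
    return fast_rate / slow_rate


if __name__ == "__main__":
    for name, (records, seconds) in MEASUREMENTS.items():
        rate = records_per_second(records, seconds)
        print(f"{name}: {rate:,.0f} records/s")
```

Under these assumptions the bulk import is several thousand times faster than either `CREATE`-based script.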
`neo4j-admin import` requires a new database. You need to pass a value to `--database`; the default (`graph.db`) may not work.
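
As a rough illustration, the command line could be assembled as in the sketch below. It assumes the Neo4j 3.x flag syntax (`--nodes:Label=file`, `--relationships=file`); later versions use a different syntax. The database name, the `Movie` and `User` labels, and the CSV file names are hypothetical placeholders, not taken from this project.

```python
import shlex


def build_import_command(database, node_files, relationship_files):
    """Build an argv list for `neo4j-admin import` (Neo4j 3.x syntax assumed).

    node_files maps a node label to its CSV path; relationship_files is a
    list of CSV paths whose rows carry their own :TYPE column.
    """
    argv = ["neo4j-admin", "import", f"--database={database}"]
    for label, path in node_files.items():
        argv.append(f"--nodes:{label}={path}")
    for path in relationship_files:
        argv.append(f"--relationships={path}")
    return argv


if __name__ == "__main__":
    # All names below are placeholders chosen for illustration.
    argv = build_import_command(
        "movielens.db",
        {"Movie": "movies.csv", "User": "users.csv"},
        ["ratings.csv"],
    )
    print(shlex.join(argv))
```

The resulting list can be passed to `subprocess.run` or pasted into a shell after checking it against the installed Neo4j version.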
After creating the new database, switch to it by setting `dbms.active_database=some_database.db` in the `./conf/neo4j.conf` file.
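
The configuration change can also be scripted. The following is a minimal sketch that rewrites the `dbms.active_database` line in the text of a `neo4j.conf` file, assuming the setting is either absent or present as a (possibly commented-out) `key=value` line. The helper name `set_active_database` is hypothetical.

```python
def set_active_database(conf_text, database):
    """Return conf_text with dbms.active_database set to `database`.

    An existing line for the key (commented out or not) is replaced;
    if none exists, the setting is appended at the end.
    """
    key = "dbms.active_database"
    new_line = f"{key}={database}"
    out = []
    replaced = False
    for line in conf_text.splitlines():
        # Treat "#dbms.active_database=..." the same as the live setting.
        bare = line.lstrip("#").strip()
        if bare.startswith(key + "="):
            if not replaced:
                out.append(new_line)
                replaced = True
            continue  # drop duplicate occurrences of the key
        out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "#dbms.active_database=graph.db\ndbms.directories.import=import\n"
    print(set_active_database(sample, "some_database.db"), end="")
```

Neo4j has to be restarted for the new setting to take effect.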
For the `import` command to work, the node data (movies and users) should look like this:
id:ID(movie-id) | name |
---|---|
1 | Toy Story |
2 | Jumanji |
3 | Grumpier Old Men |
4 | Waiting to Exhale |
5 | Father of the Bride Part II |
6 | Heat |
7 | Sabrina |

id:ID(user-id) |
---|
1 |
2 |
3 |
4 |
5 |
6 |
7 |
Because `movie_id` and `user_id` are both sequential identifiers, many of them share the same value, so separate ID spaces such as `(user-id)` and `(movie-id)` are needed to tell them apart.
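
One way to produce node files with these headers is sketched below. It assumes the raw MovieLens 10M files use the `MovieID::Title::Genres` layout for `movies.dat` and `UserID::MovieID::Rating::Timestamp` for `ratings.dat`, and that user nodes are derived from the distinct user IDs in the ratings. Stripping the trailing release year from titles is an assumption made to match the sample above; the function names are hypothetical.

```python
import csv
import io
import re

# Matches a trailing release year such as " (1995)".
YEAR_SUFFIX = re.compile(r"\s*\(\d{4}\)$")


def movie_rows(lines):
    """Yield (movie_id, name) pairs from movies.dat-style lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        movie_id, title, _genres = line.split("::", 2)
        yield movie_id, YEAR_SUFFIX.sub("", title)


def user_ids(rating_lines):
    """Return the sorted distinct user IDs found in ratings.dat-style lines."""
    seen = set()
    for line in rating_lines:
        if line.strip():
            seen.add(int(line.split("::", 1)[0]))
    return sorted(seen)


def write_nodes(out, header, rows):
    """Write a header row and data rows as comma-separated CSV."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


if __name__ == "__main__":
    movies = ["1::Toy Story (1995)::Adventure|Animation\n"]
    ratings = ["1::122::5::838985046\n", "2::110::5::868245777\n"]
    buf = io.StringIO()
    write_nodes(buf, ["id:ID(movie-id)", "name"], movie_rows(movies))
    write_nodes(buf, ["id:ID(user-id)"], [[u] for u in user_ids(ratings)])
    print(buf.getvalue(), end="")
```

In practice each node type would be written to its own file; a single buffer is used here only to keep the demonstration short.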
The relationship CSV file should look like this:
:START_ID(user-id) | ratings | :END_ID(movie-id) | :TYPE |
---|---|---|---|
1 | 5 | 122 | RATINGS |
1 | 5 | 185 | RATINGS |
1 | 5 | 231 | RATINGS |
1 | 5 | 292 | RATINGS |
1 | 5 | 316 | RATINGS |
1 | 5 | 329 | RATINGS |
1 | 5 | 355 | RATINGS |
1 | 5 | 356 | RATINGS |
1 | 5 | 362 | RATINGS |
The relationship between a user and a movie has the type `RATINGS`, and the score, stored as a property of `RATINGS`, is given by the `ratings` field.
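
A relationship file in this layout could be generated from the raw ratings roughly as follows. This is a sketch that assumes the MovieLens `ratings.dat` layout `UserID::MovieID::Rating::Timestamp` and drops the timestamp; the function names are hypothetical.

```python
import csv
import io

HEADER = [":START_ID(user-id)", "ratings", ":END_ID(movie-id)", ":TYPE"]


def rating_rows(lines, rel_type="RATINGS"):
    """Yield (user_id, rating, movie_id, rel_type) from ratings.dat-style lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        user_id, movie_id, rating, _timestamp = line.split("::")
        yield user_id, rating, movie_id, rel_type


def write_relationships(out, lines):
    """Write the relationship header and one CSV row per rating."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rating_rows(lines))


if __name__ == "__main__":
    sample = ["1::122::5::838985046\n", "1::185::5::838983525\n"]
    buf = io.StringIO()
    write_relationships(buf, sample)
    print(buf.getvalue(), end="")
```

Because every row carries its own `:TYPE` value, the relationship type does not have to be repeated on the import command line.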