This repository is outdated and deprecated. Please check the latest version here: https://github.com/AngusGLChen/DelftX-Daily-Database
-
This manual is tested on Mac OSX.
- It should also work well on similar LINUX systems.
- Please report us if you meet system-related problems.
-
This manual is tested on course data of EX101x-3T2015 and FP101x-3T2015.
-
Python 2.7 should be well installed in the machine.
- It will be updated for both python 2 and 3.
-
Downloading MySQL Community Server
-
Installing and recording the initial root password
-
(MAC OSX) Checking if MySQL installed in
/usr/local/mysql/bin
-
(MAC OSX) Setting PATH
- Editting bash profile
vim ~/.bash_profile
- Setting the path in the bash profile
PATH=$PATH:/usr/local/mysql/bin
- Adding the following code in the bash profile:
alias mysql=/usr/local/mysql/bin/mysql alias mysqladmin=/usr/local/mysql/bin/mysqladmin
- Saving the setting
esc
+:wq
- Sourcing bash profile
source ~/.bash_profile
- Editting bash profile
-
Running MySQL Server
- (MAC OSX) users can start it on System Preferences
-
Setting the password of root
- Using
mysql -uroot -p
with password to login MySQL - Inputting
SET PASSWORD FOR 'root'@'localhost' = PASSWORD('newpass');
- Using
-
Installing MySQL Workbench for connecting MySQL Server
-
Installing Connector/Python
- Please check the documents MySQL Connector/Python Developer Guide
- Checking if the package installed by using
import mysql.connector
- Checking if the package version correct by using
print mysql.connector.__version__
. It should be >= 2.1.3, since we useutf8mb4
in encoding, which may not be supported by some earlier versions.
- Open the file
DelftX.sql
by MySQL Workbench and Run the SQL script.
-
Making a root folder for storing all your course data. For example, the folder can be named as
course_log
. -
Preprocessing daily log files:
- Making a folder named
Daily_Logs
under the root foldercourse_log
- Put all the gzip files of course daily logs (e.g.
delftx-edx-events-201X-MM-DD.log.gz
) into the folderDaily_Logs
- Making a folder named
-
For each course, build a folder under the root folder (e.g.
course_log
) of course data. For example, the course folder name can beFP101x-3T2015
- Uncompressing all the metadata of the courses (e.g. the file
FP101x-3T2015.zip
) - making a folder
metadata
under the course folder (e.g.FP101x-3T2015
) - put all the uncompressed course metadata into the folder
metadata
- Uncompressing all the metadata of the courses (e.g. the file
-
After above steps, the structure of course data should be as following.
-- course_log -- translated_course_list -- Daily_Logs -- delftx-edx-events-201X-MM-DD.log.gz -- ... -- FP101x-3T2015 -- metadata -- DelftX-FP101x-3T2015-auth_user-prod-analytics.sql -- ... -- EX101x-3T2015 -- metadata -- DelftX-EX101x-3T2015-auth_user-prod-analytics.sql -- ...
translated_course_list
is the file which contains all names of transfered courses. It should be empty if a new database is building. -
Editing config file
config
[mysqld] user = root password = 123456 host = 127.0.0.1 database = DelftX [data] path = /Volumes/XXXX/XXXX/ remove_filtered_logs = 0
[mysqld]
is the section of database related configuresuser
is the user id of the databasepassword
is the password of the databasehost
is the host ip of the databasedatabase
is the name of the database
[data]
is the section of data related configurespath
is the path of the root folder of all course data (e.g$PATH$/course_log/
)remove_filtered_logs
is the configure of removing the filtered daily logs of each course.remove_filtered_logs=0
means that in course transfer, fildered daily logs of each course will be kept unless you delete them manully.remove_filtered_logs=1
means that fildered daily logs of each course will be removed automatically after the course transfered into databse.- If you want to keep filtered daily logs for further work, please set it as
0
. - If you want to save space on your machine and do not need filtered daily logs, please set it as
1
.
-
Running the code by
python main.py config
-
After the above code, the structure of course data should be as following.
-- course_log -- translated_course_list -- Daily_Logs -- delftx-edx-events-201X-MM-DD.log.gz -- ... -- FP101x-3T2015 -- filter_folder -- Delftx-edx-events-201X-MM-DD.log -- ... -- metadata -- DelftX-FP101x-3T2015-auth_user-prod-analytics.sql -- ... -- EX101x-3T2015 -- filter_folder -- Delftx-edx-events-201X-MM-DD.log -- ... -- metadata -- DelftX-EX101x-3T2015-auth_user-prod-analytics.sql -- ...
filter_folder
contains filtered daily logs for the specific course. It can be removed after course data loaded into the database (by seeting in the config file).
The MOOCdb Project is an open source framework, which sets a shared data model standard for organzing data generated from MOOCs.
The initial schema of moocdb consists of four modules, which are Observing, Submitting, Collaborating and Feedback.
Our current schema is mainly based on the moocdb project. It consists of four modules, which are named as Observations, Submissions, Collaborations and UserModes.
As shown in Figure 1, each module in our schema has several tables of information. The differences between our current schema and the initial moocdb schema are discussed in the following sections.
In original Moocdb schema, Observing mode has five tables, which are observed_event, resources, resources_urls, resources_types and urls.
In our current schema, we merge them into one tables, named video_interaction. This table represent the video observing events of students and relevant interactives of events.
In original Moocdb schema, Submitting mode has four tables, which are problem_type, problems, submissions, and assessments.
In our current schema of quiz mode, we merge the two problem related table into one table named quiz_questions. After that, a table named quiz_sessions is added. quiz_sessions is leveraged to represent how users answer sessions of quiz.
In our current schema, two tables are leveraged to store learners; interactions in discussion forum.
In our schema, we have a mode named learners modes, which contains six tables named courses, course_elements, learner_index, course_learner, learner_demographic and sessions. This mode is used to store information about courses, learners, how well learners perform in each course, etc.
Table courses contains the metainfo of courses. Table global_user represent the relations between users and courses. Table course_user represent users' status and grade in courses. Table user_pii represent course users' demographic data.
The survey mode stores learners’ responses to survey questions.
For example, the only 2 point question in course Functional Programming (FP101x-3T2015) do not have a explicit weight, so we treated its weight as 1.0 at the beginning. It also happens in course_structure data of course Data Analysis(EX101x-3T2015).
Solution: We manually correct the weight of each problems in courses Fuctional Programming and Data Analysis based on the points students got and the current course settings on edx.
At the beginning, we filtered all the submissions which submitted later than the due. However, we found that those submissions are counted by manually checking the submission time and the corresponding grades students got.
- In the records of student submissions and assessment of each problem in daily logs, there are two fields named grade and max_grade. max_grade means the number of blanks need to be filled. grade means the number of correct blanks students submitted.
- In the metadata, each problem has their own weights.
- real_grade = weight * ( grade / max_grade )
In the course Functional Programming (FP101x_3T2015), 8/1143 passing students have less than 170 points in the final week. In the course Data Analysis (EX101x_3T2015), 11/1156 passing students have less than 105 points in the final week.
By manually checking daily logs of corresponding courses, all the grades of those students in daily logs are correctly loaded into the database.