/MOOC-Radar

The data and source code for the paper "MoocRadar: A Fine-grained and Multi-aspect Knowledge Repository for Improving Cognitive Student Modeling in MOOCs"


MoocRadar

MoocRadar is maintained by the Knowledge Engineering Group of Tsinghua University with the assistance of the Institute of Education, Tsinghua University. This repository consists of 2,513 exercises, 14,226 students, over 12 million behavioral records, and 5,600 fine-grained concepts, and is intended to support the development of cognitive student modeling in MOOCs. The raw data is from XuetangX (https://www.xuetangx.com/).

We summarize the features of MoocRadar as:

  • Abundant Learning Context: MoocRadar provides the learning resources, structures, and contents related to the students' exercise behaviors, which broadens the set of inputs that modeling methods can draw on.
  • Fine-grained Knowledge Concepts: All the fine-grained concepts have been manually annotated and checked by experts, which ensures the quality of this specific knowledge.
  • Cognitive Level Labels: We apply Bloom's cognitive taxonomy to construct "Cognitive Level" tags for the exercises, which can be further explored in subsequent research.

We are continuing to extend and annotate this repository.

Based on MoocRadar, developers can attempt to build a more informative profile for each student, as introduced in our paper.


News !!

  • The number of exercises has been extended to 9,384 !!

  • Our paper has been submitted to the SIGIR resource track !!

  • Updated the annotation guidance for fine-grained concepts and cognitive labels.

Data Access

The data is available at several levels of granularity:

Dataset          | Description                                             | Download Link
MoocRadar_Raw    | The raw data from MOOCCubeX (after data filtering).     | Raw link
MoocRadar_Coarse | Exercises and behaviors with coarse-grained concepts.   | Coarse link
MoocRadar_Middle | Exercises and behaviors with middle-grained concepts.   | Middle link
MoocRadar_Fine   | Exercises and behaviors with fine-grained concepts.     | Fine link
External_Data    | Other additional data of MoocRadar.                     | External link

Reproduction Model

Researchers can set up the presented models with EduKTM and EduCDM.

We provide demos of several basic models, including:

We also report how the performance of DKVMN and NCDM improves with side information (i.e., cognitive and video information).

Data for reproducing the baselines:

  1. Set --mode (options: Coarse/Middle/Fine) to the granularity you want.

  2. Point --data_dir to the data of the corresponding granularity from the table above.

    For example, for the --mode Middle setting, prepare the following files:

    • ./data/student-problem-middle.json
    • ./data/problem.json
  3. Then generate the train/test datasets by setting --data_process in the scripts.
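
Before running the scripts, it can help to confirm that the expected input files are in place. The following is a minimal sketch, not part of the repository's own scripts. It assumes that the Coarse and Fine files follow the same naming pattern as the Middle example (student-problem-<mode>.json), and that --data_process is a boolean switch; the helper names expected_files, missing_files, and build_parser are hypothetical.

```python
import argparse
from pathlib import Path
from typing import List

MODES = ("Coarse", "Middle", "Fine")


def expected_files(mode: str, data_dir: str = "./data") -> List[Path]:
    """Return the input files assumed to be needed for one granularity mode."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}, got {mode!r}")
    root = Path(data_dir)
    # Assumption: the file name follows the pattern of the Middle example.
    return [root / f"student-problem-{mode.lower()}.json", root / "problem.json"]


def missing_files(mode: str, data_dir: str = "./data") -> List[Path]:
    """List the expected files that are not present on disk."""
    return [p for p in expected_files(mode, data_dir) if not p.is_file()]


def build_parser() -> argparse.ArgumentParser:
    """Build an illustrative parser mirroring the flags named in the steps above."""
    parser = argparse.ArgumentParser(description="Check MoocRadar input files.")
    parser.add_argument("--mode", choices=MODES, default="Middle")
    parser.add_argument("--data_dir", default="./data")
    # Assumption: --data_process is an on/off switch in the real scripts.
    parser.add_argument("--data_process", action="store_true")
    return parser


# Example: check the files for the Middle setting under ./data.
args = build_parser().parse_args(["--mode", "Middle", "--data_dir", "./data"])
for path in missing_files(args.mode, args.data_dir):
    print(f"missing: {path}")
```

If nothing is printed, both files for the chosen mode were found and the data processing step can be started.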

Data for reproducing the improvements with cognitive and video side information:

Option 1: generate it by setting --data_process in the scripts

Option 2: download from there

Toolkit & Guidance

Several tools and guides are also available for extending and using the data.

For extending the data from the MOOCCubeX knowledge base:

For further data annotation:

For more information:

Feature

The distribution of students' exercise behaviors, accuracy rates, and concept-linked exercises.

Reference

 @article{MOOCRadar,
  title={MoocRadar: A Fine-grained and Multi-aspect Knowledge Repository for Improving Cognitive Student Modeling in MOOCs},
  author={Jifan Yu and Mengying Lu and Qingyang Zhong and Zijun Yao and Shangqing Tu and Zhengshan Liao and Xiaoya Li and Manli Li and Lei Hou and Haitao Zheng and Juanzi Li and Jie Tang},
  year={2023}
 }