jilljenn/ktm

some errors

Closed this issue · 18 comments

When running your code and reading the details of the repository, I found some issues:
1: The file doc/tuto.md renders with some errors. Maybe the syntax is incompatible?
2: For the Assistments 2009 dataset, I downloaded the data from https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/8SWHNO. Is it the same data you used?
I compared it with data/dummy/data.csv, but the formats are different.
Thanks a lot!

Hi again, is jilljenn/qna working now?
For tuto.md, usually tuto.pdf will be enough. Do you really need the command to recompile the slides?
For Assistments, maybe there is a link to the dataset in the slides. If not, I will add it tomorrow :)
Good luck. And we can talk over Hangouts/appear.in about sequence-based recommendations if you're interested!

Hi, thanks for your kind help. jilljenn/qna works normally now. I will read your code to get more details later. I have downloaded the Assistments dataset from http://jiji.cat/weasel2018/data.csv.
Thanks so much for your reply!

What do the values in config.yml mean?

What values should I set in config.yml for the Assistments 2009 dataset?

I hope it's not for a submission to AAAI, otherwise you're in a hurry 😄

For data/assistments09/config.yml:

nb_items: 196457
nb_skills: 124
nb_users: 4163

Thank you for your prompt reply. If I have a new dataset, how should I set nb_items, nb_skills and nb_users?

You can guess, right?

Just make sure:

  • your user IDs are within 0..N - 1
  • your item IDs are within 0..M - 1
  • your skill IDs (if any; the knowledge components of items) are within 0..S - 1

Then:

nb_users: <N>
nb_items: <M>
nb_skills: <S>

Each dataset should have its own config.yml descriptor.
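
The steps above can be sketched in a few lines of Python. This is only an illustration, not code from the repository: the column names `user`, `item`, `skill` and `correct`, the helper names `reindex` and `build_config`, and the toy rows are all assumptions, and it supposes one skill per row.

```python
def reindex(values):
    """Map arbitrary IDs to contiguous integers 0..K-1, in order of first appearance."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping)
    return mapping


def build_config(rows):
    """Return (re-indexed copies of the rows, config dict with the three counts).

    Hypothetical helper: assumes each row is a dict with "user", "item"
    and "skill" keys, which may not match your dataset's real columns.
    """
    users = reindex(row["user"] for row in rows)
    items = reindex(row["item"] for row in rows)
    skills = reindex(row["skill"] for row in rows)
    new_rows = [
        {
            **row,
            "user": users[row["user"]],
            "item": items[row["item"]],
            "skill": skills[row["skill"]],
        }
        for row in rows
    ]
    # After re-indexing, the number of distinct IDs is exactly N, M and S.
    config = {
        "nb_users": len(users),
        "nb_items": len(items),
        "nb_skills": len(skills),
    }
    return new_rows, config


# Toy data (made up) with non-contiguous, non-integer IDs.
rows = [
    {"user": "alice", "item": 1042, "skill": "fractions", "correct": 1},
    {"user": "bob", "item": 1042, "skill": "fractions", "correct": 0},
    {"user": "alice", "item": 77, "skill": "decimals", "correct": 1},
]
new_rows, config = build_config(rows)

# Text in the same "key: value" shape as the config.yml shown above.
config_text = "\n".join(f"{key}: {value}" for key, value in config.items())
print(config_text)
```

If you go this way, compute the mappings over the whole dataset (not only a training split), so that no ID seen later falls outside the declared range.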

Yes, I needed your confirmation. Many thanks!

It seems that you only uploaded two kinds of models: Logistic Regression (LR), which is the baseline model, and Factorization Machines (FM). I can learn their details from your code in "lr.py" and "fm.py".
From your "dfm-kt-poster.pdf" and the article "Deep Factorization Machines for Knowledge Tracing" (https://arxiv.org/abs/1805.00356), I can see that you have some other models:
Deep Factorization Machines (DeepFM), Bayesian Factorization Machines (Bayesian FM) and DeepFM*.
Could you send me the code for the models above? I am trying to reproduce the experimental results mentioned in the uploaded materials.
Best regards!

Yes! It was run here: https://github.com/jilljenn/tensorflow-DeepFM
This bash script is an example about how to run them: https://github.com/jilljenn/tensorflow-DeepFM/blob/master/template.sh

Good luck! Hope it's not for AAAI!

@jilljenn Thanks a lot. Haha, I just do individual research, and I would like to learn more about educational data mining by coding instead of writing papers. Thanks again!

Thanks for dropping by then. I feel it's hard to find open source implementations of EDM models (but if my colleagues read this, they may hate me).

(⊙o⊙)… Maybe they share them in different ways. It will make the world better ^_^.

Hi @jilljenn, I looked at the code you provided in https://github.com/jilljenn/tensorflow-DeepFM earlier. Here are some things I do not understand:
(1) I know you use the Duolingo dataset (http://sharedtask.duolingo.com/) for the DFM model. In your "dfm.py" you load these files directly: "Xi_train.npy, Xv_train.npy, y_train.npy, Xi_valid.npy, Xv_valid.npy, y_valid.npy, Xi_test.npy, Xv_test.npy".
I think you have a script that converts the raw Duolingo dataset to the "*.npy" format that can be fed to DeepFM. Could you upload it or send it to me? My e-mail is "liujiepeng@hust.edu.cn".
(2) I will add it later.......

(2) The Bayesian Factorization Machines (Bayesian FM) model is compared to DFM in your "dfm-kt-poster.pdf". However, the code for Bayesian FM is not provided in your repository. Could you share it?
(3) Also, I do not know how to get the three kinds of datasets: "first", "last" and "pfa". I think it is a similar data conversion problem. Maybe a script can help solve the problems above.
Thanks a lot!

Hi! My paper was submitted, I can finally address your questions.

I feel like every time you point out something, I give you another link 😄 Sorry. Here is another iteration: https://github.com/jilljenn/slam2018/blob/master/starter_code/baseline.py
This code creates the Xi_train.npy, etc. from the raw Duolingo dataset.

Bayesian Factorization Machines actually means fm.py (a wrapper for libFM with the mcmc method, which is the default).

first means only the first features (line 121)
last means, including the noisy ones (the commented line above)
pfa means, including wins and fails (this line).

It will be slightly clearer if you read the paper:
https://arxiv.org/abs/1805.00356

But anyway the best model was: just the first features. I'd be happy if you can share your findings with me!

Our paper is available as PDF and will be presented at AAAI 2019:

@inproceedings{Vie2019,
  Author = {{Vie}, Jill-J{\^e}nn and {Kashima}, Hisashi},
  Booktitle = {Proceedings of the 33rd {AAAI} Conference on Artificial Intelligence},
  Title = {{Knowledge Tracing Machines: Factorization Machines for Knowledge Tracing}},
  Pages = {to appear},
  Url = {https://arxiv.org/abs/1811.03388},
  Year = 2019}

If you have other questions, please let me know!