mp2893/gram

comparison between med2vec and gram

2g-XzenG opened this issue · 2 comments

Hello Ed,

Nice work!
I didn't pay much attention to this paper at first, since you mention in the paper that this method works well when the dataset is small. So I thought Med2vec would give us better performance when we have a large dataset.

However, now that I look at the paper more closely, it seems that GRAM performs better than Med2vec and non-negative skip-gram, as the t-SNE scatterplot for GRAM looks much better (the dots are well separated) compared to the other 2 methods.

On the other hand, the medical vectors trained by GRAM are aligned well with the given knowledge DAG, which is made by humans and might not be good. As you mentioned in the Med2vec paper: "the degree of conformity of the code representations to the groupers does not necessarily indicate how well the code representations capture the hidden relationships"

I wonder how you would compare these 2 (or 3, if you count non-negative skip-gram) vector learning methods given a large enough dataset?

Thanks!
xianlong

Med2vec and GRAM are quite different actually. Med2vec is an unsupervised representation learning method. GRAM is typically used for improving the performance of supervised classifiers.

And you can actually combine med2vec and GRAM. In GRAM, you can achieve better performance if you pre-train the basic embeddings. In the paper, I trained those basic embeddings using GloVe, but you can use med2vec (you can use any representation learning technique actually).
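As a rough illustration of what initializing the basic embeddings from pre-trained vectors could look like, here is a minimal NumPy sketch. It is not the code from this repository: the function name `init_basic_embeddings`, the `{node_index: vector}` input format, and the small random fallback for nodes without a pre-trained vector are all assumptions made for the example.

```python
import numpy as np

def init_basic_embeddings(pretrained, num_nodes, dim, seed=0):
    """Build a (num_nodes, dim) matrix of initial basic embeddings.

    `pretrained` is a hypothetical dict {node_index: vector} produced by any
    representation learning method (GloVe, med2vec, ...). Nodes that have no
    pre-trained vector keep a small random initialization (an assumption).
    """
    rng = np.random.default_rng(seed)
    emb = rng.normal(0.0, 0.01, size=(num_nodes, dim))
    for idx, vec in pretrained.items():
        vec = np.asarray(vec, dtype=emb.dtype)
        if vec.shape != (dim,):
            raise ValueError(f"vector for node {idx} has shape {vec.shape}, expected ({dim},)")
        emb[idx] = vec  # copy the pre-trained vector into its row
    return emb

# Toy example: 4 nodes, 3 dimensions, pre-trained vectors for nodes 0 and 2 only.
pretrained = {0: [1.0, 2.0, 3.0], 2: [4.0, 5.0, 6.0]}
emb = init_basic_embeddings(pretrained, num_nodes=4, dim=3)
```

The resulting matrix would then be used as the starting value of the embedding parameters, which are still updated during supervised training.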

But you ask an interesting question. I actually asked myself the very same question.
How much can we rely on hand-engineered domain knowledge?
The correct answer is, of course, that if we have infinite data, we don't need any hand-engineered features. But in reality, you cannot always collect sufficient data for some medical codes (e.g. rare diseases). Then the best you can do is rely on expert knowledge. Honestly, what else can we do?

When I said "the degree of conformity of the code representations to the groupers does not necessarily indicate how well the code representations capture the hidden relationships", I was assuming we had enough data. If we had enough data, then is it really a good idea to use the grouper as an evaluation metric? I was simply pointing this out.
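To make "conformity to the groupers" concrete, here is one simple way it could be measured: the fraction of each code's nearest neighbors (by cosine similarity) that fall in the same grouper category. This is only a sketch for illustration, not the metric used in either paper; the function name `neighbor_group_agreement` and the toy data are assumptions.

```python
import numpy as np

def neighbor_group_agreement(emb, groups, k=1):
    """Fraction of k-nearest neighbors (cosine similarity) that share
    the query code's grouper label. Higher means the embedding space
    conforms more closely to the grouper."""
    emb = np.asarray(emb, dtype=float)
    groups = np.asarray(groups)
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    unit = emb / np.clip(norms, 1e-12, None)   # avoid division by zero
    sim = unit @ unit.T                        # pairwise cosine similarity
    np.fill_diagonal(sim, -np.inf)             # a code is not its own neighbor
    nn = np.argsort(-sim, axis=1)[:, :k]       # indices of the k most similar codes
    same = groups[nn] == groups[:, None]       # does each neighbor share the label?
    return float(same.mean())

# Toy example: two well-separated groups of two codes each.
emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
groups = [0, 0, 1, 1]
score = neighbor_group_agreement(emb, groups, k=1)
```

A high score here only says the embeddings agree with the grouper; with enough data, a lower score could just as well mean the embeddings have picked up relationships the grouper does not encode.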

Hope this helps,
Ed

Hello Ed,

Thanks! It took me a while to understand this paper, and your response is very helpful!