Training data distribution
pluiez opened this issue · 1 comment
pluiez commented
Hi, the paper is very detailed in most aspects, but it describes the training data in much less detail.
Specifically, I am interested in the following:
- The composition of the training dataset, including the types of data (e.g., text, code, images) and the sources of the data.
- How the sampling ratio for each subset is determined, e.g., which principle is followed.
luofuli commented
The details we can reveal so far are already in the paper.