Stack Overflow Question Quality Improvement
This dataset extends Jie Yang's work by adding the answer count for each revision.
The data is in target.csv. The columns in target.csv are as follows:
Column Name | Meaning |
---|---|
version | The version of the question. It starts from 1. |
Text | The text of the current version of the question. |
change type | The revision type of the question. |
answer_count | The number of answers the question received after the revision. |
answer_label | The label of the answer count. |
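A minimal sketch of reading the file with Python's standard `csv` module is shown here. It assumes the header row of target.csv uses exactly the column names listed above and that the numeric columns hold plain integers; `load_rows` is a hypothetical helper name, not part of the dataset.

```python
import csv


def load_rows(fileobj):
    """Parse rows from an open target.csv-style file into dictionaries."""
    rows = []
    for row in csv.DictReader(fileobj):
        # Assumption: header names match the column table exactly,
        # and these three columns contain integer strings.
        row["version"] = int(row["version"])
        row["answer_count"] = int(row["answer_count"])
        row["answer_label"] = int(row["answer_label"])
        rows.append(row)
    return rows


# Usage (assumes target.csv is in the working directory):
# with open("target.csv", newline="", encoding="utf-8") as f:
#     rows = load_rows(f)
```

The helper takes an open file object rather than a path so it can be tried on a small in-memory sample before pointing it at the real file.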
Note that the numeric change type maps to the following meanings:
Value | Meaning |
---|---|
-1 | Spell Problem |
0 | Explain Code Usage |
1 | Add Example/URL |
2 | Code Intervention |
3 | Add Error Info Stack Trace |
4 | Add Context information (OS system) |
5 | Attempt |
6 | Solution |
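For decoding the numeric change type in code, the table above can be kept as a lookup dictionary. This is only a sketch; `CHANGE_TYPES` and `describe_change_type` are hypothetical names, and the input is assumed to be an integer or an integer string.

```python
# Numeric change type -> meaning, copied from the table above.
CHANGE_TYPES = {
    -1: "Spell Problem",
    0: "Explain Code Usage",
    1: "Add Example/URL",
    2: "Code Intervention",
    3: "Add Error Info Stack Trace",
    4: "Add Context information (OS system)",
    5: "Attempt",
    6: "Solution",
}


def describe_change_type(value):
    """Return the meaning of a numeric change type (int or numeric string)."""
    # Raises KeyError for values that are not in the table.
    return CHANGE_TYPES[int(value)]
```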
The answer label is derived from the answer count:
- If answer count is 0, the answer label is -1;
- If answer count is 1, the answer label is 0;
- Otherwise, the answer label is 1.
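The labeling rule can be sketched as a small function. `answer_label` is a hypothetical name, and rejecting negative counts is an assumption added here, since the rule itself only covers counts of 0, 1, and more.

```python
def answer_label(answer_count: int) -> int:
    """Map an answer count to its label: 0 -> -1, 1 -> 0, otherwise 1."""
    if answer_count < 0:
        # Assumption: a negative count is treated as invalid input.
        raise ValueError("answer_count must be non-negative")
    if answer_count == 0:
        return -1
    if answer_count == 1:
        return 0
    return 1
```

For example, `answer_label(0)` gives -1, `answer_label(1)` gives 0, and `answer_label(7)` gives 1.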
You can run search queries on Stack Exchange to collect more data samples. Here are several examples:
- Find the raw text and edit text for a question, here
- Find attempt-related question interventions, here
- Find the answer count before a revision, here
- Include more data samples.