Paper -> CoT pipeline: Algorithm for scoring a paper based on rubric data
thehunmonkgroup commented
The pipeline for grading a paper produces a series of yes/no answers to rubric questions meant to assess how suitable the paper is for chain-of-thought (CoT) extraction.
Requirements:
- An algorithm that converts the simple yes/no rubric answers into an overall suitability score
Deliverables:
- The algorithm itself
- One of:
  - An SQL query, suitable for SQLite, that applies the algorithm dynamically to the individual rubric answers
  - A script that runs the algorithm and writes the suitability score to the database
Example:
Simplest algorithm, SQL-based: the sum of all ten criteria (1 for yes, 0 for no; COALESCE counts an unanswered criterion as no):
```sql
SELECT
    id,
    paper_url,
    paper_category,
    (COALESCE(criteria_clear_question, 0) +
     COALESCE(criteria_definitive_answer, 0) +
     COALESCE(criteria_complex_reasoning, 0) +
     COALESCE(criteria_coherent_structure, 0) +
     COALESCE(criteria_layperson_comprehensible, 0) +
     COALESCE(criteria_minimal_jargon, 0) +
     COALESCE(criteria_illustrative_examples, 0) +
     COALESCE(criteria_significant_insights, 0) +
     COALESCE(criteria_verifiable_steps, 0) +
     COALESCE(criteria_overall_suitability, 0)) AS total_criteria_score
FROM
    papers;
```
```
+----+----------------------------------------------+----------------+----------------------+
| id | paper_url                                    | paper_category | total_criteria_score |
+----+----------------------------------------------+----------------+----------------------+
|  8 | https://export.arxiv.org/pdf/0704.3252v1.pdf | astro-ph.EP    |                   10 |
|  9 | https://export.arxiv.org/pdf/0710.0317v1.pdf | astro-ph.EP    |                   10 |
+----+----------------------------------------------+----------------+----------------------+
```
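For the script option in the deliverables, the same sum can be run from Python's built-in `sqlite3` module. This is a minimal sketch, assuming the rubric columns hold INTEGER 1/0, or NULL when a question was not answered; the helper name `total_scores` and the demo rows are hypothetical, not part of the pipeline.

```python
import sqlite3

# The ten rubric columns from the query above. Answers are assumed to be
# stored as INTEGER 1 (yes), 0 (no) or NULL (not answered).
CRITERIA = [
    "criteria_clear_question",
    "criteria_definitive_answer",
    "criteria_complex_reasoning",
    "criteria_coherent_structure",
    "criteria_layperson_comprehensible",
    "criteria_minimal_jargon",
    "criteria_illustrative_examples",
    "criteria_significant_insights",
    "criteria_verifiable_steps",
    "criteria_overall_suitability",
]

# Same expression as the hand-written query: COALESCE turns NULL into 0 ("no").
TOTAL_EXPR = " + ".join(f"COALESCE({c}, 0)" for c in CRITERIA)


def total_scores(conn: sqlite3.Connection) -> list[tuple[int, int]]:
    """Return (id, total_criteria_score) for every paper, ordered by id."""
    sql = f"SELECT id, ({TOTAL_EXPR}) AS total_criteria_score FROM papers ORDER BY id"
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    # Hypothetical in-memory table containing only the columns this sketch reads.
    conn = sqlite3.connect(":memory:")
    columns = ", ".join(f"{c} INTEGER" for c in CRITERIA)
    conn.execute(f"CREATE TABLE papers (id INTEGER PRIMARY KEY, {columns})")
    # Paper 1 answers yes to everything; paper 2 has one yes, one no, eight NULLs.
    all_yes = ", ".join(["1"] * len(CRITERIA))
    conn.execute(f"INSERT INTO papers (id, {', '.join(CRITERIA)}) VALUES (1, {all_yes})")
    conn.execute(
        "INSERT INTO papers (id, criteria_clear_question, criteria_minimal_jargon) "
        "VALUES (2, 1, 0)"
    )
    print(total_scores(conn))  # [(1, 10), (2, 1)]
```

Generating the expression from one list keeps the ten criteria in a single place, so adding or renaming a rubric question is a one-line change rather than an edit to every query.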
thehunmonkgroup commented
We're at least getting a spread of scores when running over a larger set of papers:
```sql
SELECT
    id,
    paper_url,
    paper_category,
    (COALESCE(criteria_clear_question, 0) +
     COALESCE(criteria_definitive_answer, 0) +
     COALESCE(criteria_complex_reasoning, 0) +
     COALESCE(criteria_coherent_structure, 0) +
     COALESCE(criteria_layperson_comprehensible, 0) +
     COALESCE(criteria_minimal_jargon, 0) +
     COALESCE(criteria_illustrative_examples, 0) +
     COALESCE(criteria_significant_insights, 0) +
     COALESCE(criteria_verifiable_steps, 0) +
     COALESCE(criteria_overall_suitability, 0)) AS total_criteria_score
FROM
    papers
WHERE
    id IN (9, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31);
```
```
+----+----------------------------------------------+----------------+----------------------+
| id | paper_url                                    | paper_category | total_criteria_score |
+----+----------------------------------------------+----------------+----------------------+
|  9 | https://export.arxiv.org/pdf/0710.0317v1.pdf | astro-ph.EP    |                   10 |
| 13 | https://export.arxiv.org/pdf/0805.1116v1.pdf | astro-ph.EP    |                    9 |
| 14 | https://export.arxiv.org/pdf/0807.0527v1.pdf | astro-ph.EP    |                    7 |
| 15 | https://export.arxiv.org/pdf/0807.1873v1.pdf | astro-ph.EP    |                   10 |
| 16 | https://export.arxiv.org/pdf/0809.4042v1.pdf | astro-ph.EP    |                    3 |
| 17 | https://export.arxiv.org/pdf/0809.4562v1.pdf | astro-ph.EP    |                    3 |
| 18 | https://export.arxiv.org/pdf/0810.5138v1.pdf | astro-ph.EP    |                   10 |
| 19 | https://export.arxiv.org/pdf/0901.0304v1.pdf | astro-ph.EP    |                    9 |
| 20 | https://export.arxiv.org/pdf/0901.0343v1.pdf | astro-ph.EP    |                    4 |
| 21 | https://export.arxiv.org/pdf/0901.0482v1.pdf | astro-ph.EP    |                   10 |
| 22 | https://export.arxiv.org/pdf/0901.0515v1.pdf | astro-ph.EP    |                   10 |
| 23 | https://export.arxiv.org/pdf/0901.0532v1.pdf | astro-ph.EP    |                   10 |
| 24 | https://export.arxiv.org/pdf/0901.0554v1.pdf | astro-ph.EP    |                   10 |
| 25 | https://export.arxiv.org/pdf/0901.0625v1.pdf | astro-ph.EP    |                    2 |
| 26 | https://export.arxiv.org/pdf/0901.0828v1.pdf | astro-ph.EP    |                   10 |
| 27 | https://export.arxiv.org/pdf/0901.0846v1.pdf | astro-ph.EP    |                   10 |
| 28 | https://export.arxiv.org/pdf/0901.0735v1.pdf | astro-ph.EP    |                    9 |
| 29 | https://export.arxiv.org/pdf/0901.1214v1.pdf | astro-ph.EP    |                   10 |
| 30 | https://export.arxiv.org/pdf/0901.1217v1.pdf | astro-ph.EP    |                   10 |
| 31 | https://export.arxiv.org/pdf/0901.1547v1.pdf | astro-ph.EP    |                   10 |
+----+----------------------------------------------+----------------+----------------------+
```
thehunmonkgroup commented
Updated the scoring to be a tad smarter:
- There are now three required rubric questions (clear question, definitive answer, complex reasoning); if any of these is answered 'no', the score is zero
- Otherwise, the score is the sum of all ten rubric answers (1 for yes, 0 for no)
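The gated rule above fits in a single SQL CASE expression, which makes the "script that writes the suitability score to the database" option short. This is a minimal sketch using Python's `sqlite3`, assuming the same 1/0/NULL encoding as the earlier queries and treating an unanswered required question as a 'no'. It also marks every updated row as `processing_status = 'scored'`; that is an assumption inferred from the filter in the summary query, not confirmed pipeline behaviour. `write_scores`, `REQUIRED` and `OPTIONAL` are illustrative names, and a real run would probably restrict the UPDATE to papers whose rubric has actually been answered.

```python
import sqlite3

# Rubric columns from the queries in this thread. Answers are assumed to be
# INTEGER 1 (yes), 0 (no) or NULL (not answered).
REQUIRED = [
    "criteria_clear_question",
    "criteria_definitive_answer",
    "criteria_complex_reasoning",
]
OPTIONAL = [
    "criteria_coherent_structure",
    "criteria_layperson_comprehensible",
    "criteria_minimal_jargon",
    "criteria_illustrative_examples",
    "criteria_significant_insights",
    "criteria_verifiable_steps",
    "criteria_overall_suitability",
]

# Gate: every required answer must be a yes; NULL is treated as "no".
GATE = " AND ".join(f"COALESCE({c}, 0) = 1" for c in REQUIRED)
# Score when the gate passes: sum of all ten answers, as in the earlier query.
TOTAL = " + ".join(f"COALESCE({c}, 0)" for c in REQUIRED + OPTIONAL)
SCORE_EXPR = f"CASE WHEN {GATE} THEN {TOTAL} ELSE 0 END"


def write_scores(conn: sqlite3.Connection) -> int:
    """Store suitability_score for every paper and return the rows updated.

    Assumption: rows are also marked 'scored' so the summary query picks them up.
    """
    with conn:  # commits on success, rolls back on an exception
        cur = conn.execute(
            f"UPDATE papers SET suitability_score = {SCORE_EXPR}, "
            "processing_status = 'scored'"
        )
    return cur.rowcount


if __name__ == "__main__":
    # Hypothetical in-memory table with just the columns this sketch touches.
    conn = sqlite3.connect(":memory:")
    rubric = REQUIRED + OPTIONAL
    columns = ", ".join(f"{c} INTEGER" for c in rubric)
    conn.execute(
        f"CREATE TABLE papers (id INTEGER PRIMARY KEY, {columns}, "
        "suitability_score INTEGER, processing_status TEXT)"
    )
    names = ", ".join(rubric)
    # Paper 1: only minimal_jargon is "no" -> 9. Paper 2: definitive_answer is "no" -> 0.
    conn.execute(f"INSERT INTO papers (id, {names}) VALUES (1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1)")
    conn.execute(f"INSERT INTO papers (id, {names}) VALUES (2, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1)")
    write_scores(conn)
    print(conn.execute("SELECT id, suitability_score FROM papers ORDER BY id").fetchall())
    # [(1, 9), (2, 0)]
```

Dropping the `UPDATE` and selecting `SCORE_EXPR AS suitability_score` instead gives the other deliverable option: a query that computes the score on the fly without storing it.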
Here's a scoring summary across 100 profiled papers:
```
sqlite> select paper_url, paper_category, suitability_score from papers where processing_status = 'scored' order by suitability_score desc;
+-----------------------------------------------+--------------------+-------------------+
| paper_url                                     | paper_category     | suitability_score |
+-----------------------------------------------+--------------------+-------------------+
| https://export.arxiv.org/pdf/0901.0735v1.pdf  | astro-ph.EP        |                10 |
| https://export.arxiv.org/pdf/1801.05595v1.pdf | astro-ph.EP        |                10 |
| https://export.arxiv.org/pdf/1507.03327v1.pdf | astro-ph.GA        |                10 |
| https://export.arxiv.org/pdf/1901.07266v1.pdf | astro-ph.GA        |                10 |
| https://export.arxiv.org/pdf/1910.09121v1.pdf | astro-ph.GA        |                10 |
| https://export.arxiv.org/pdf/1110.2656v2.pdf  | astro-ph.HE        |                10 |
| https://export.arxiv.org/pdf/1310.7588v1.pdf  | astro-ph.HE        |                10 |
| https://export.arxiv.org/pdf/1611.08508v1.pdf | astro-ph.HE        |                10 |
| https://export.arxiv.org/pdf/1710.09893v1.pdf | astro-ph.HE        |                10 |
| https://export.arxiv.org/pdf/1810.04324v3.pdf | astro-ph.HE        |                10 |
| https://export.arxiv.org/pdf/1307.3576v1.pdf  | cond-mat.mtrl-sci  |                10 |
| https://export.arxiv.org/pdf/1310.6949v1.pdf  | cond-mat.mtrl-sci  |                10 |
| https://export.arxiv.org/pdf/1409.0959v1.pdf  | cond-mat.mtrl-sci  |                10 |
| https://export.arxiv.org/pdf/1509.04762v1.pdf | cond-mat.mtrl-sci  |                10 |
| https://export.arxiv.org/pdf/1901.06620v2.pdf | cs.AI              |                10 |
| https://export.arxiv.org/pdf/1604.04372v2.pdf | cs.CV              |                10 |
| https://export.arxiv.org/pdf/1806.09158v1.pdf | cs.CV              |                10 |
| https://export.arxiv.org/pdf/1910.13340v1.pdf | cs.CV              |                10 |
| https://export.arxiv.org/pdf/1207.1387v1.pdf  | cs.LG              |                10 |
| https://export.arxiv.org/pdf/1307.3964v1.pdf  | cs.LG              |                10 |
| https://export.arxiv.org/pdf/1905.09538v2.pdf | cs.LG              |                10 |
| https://export.arxiv.org/pdf/1811.08973v1.pdf | cs.SE              |                10 |
| https://export.arxiv.org/pdf/1104.2747v1.pdf  | hep-ex             |                10 |
| https://export.arxiv.org/pdf/1407.6211v2.pdf  | hep-ex             |                10 |
| https://export.arxiv.org/pdf/1808.03987v3.pdf | hep-ex             |                10 |
| https://export.arxiv.org/pdf/1211.3270v1.pdf  | math.CA            |                10 |
| https://export.arxiv.org/pdf/1709.00705v2.pdf | math.CA            |                10 |
| https://export.arxiv.org/pdf/1805.00990v2.pdf | math.CO            |                10 |
| https://export.arxiv.org/pdf/1706.08709v1.pdf | math.IT            |                10 |
| https://export.arxiv.org/pdf/1804.02217v1.pdf | math.IT            |                10 |
| https://export.arxiv.org/pdf/1206.4819v2.pdf  | math.MP            |                10 |
| https://export.arxiv.org/pdf/1201.0101v1.pdf  | math.NA            |                10 |
| https://export.arxiv.org/pdf/1012.2726v1.pdf  | nlin.AO            |                10 |
| https://export.arxiv.org/pdf/1706.04252v1.pdf | nlin.AO            |                10 |
| https://export.arxiv.org/pdf/1310.4490v1.pdf  | nlin.PS            |                10 |
| https://export.arxiv.org/pdf/1405.7920v2.pdf  | nlin.PS            |                10 |
| https://export.arxiv.org/pdf/1806.04399v1.pdf | nlin.PS            |                10 |
| https://export.arxiv.org/pdf/0805.2603v1.pdf  | nucl-th            |                10 |
| https://export.arxiv.org/pdf/1208.3888v2.pdf  | nucl-th            |                10 |
| https://export.arxiv.org/pdf/1512.02771v1.pdf | nucl-th            |                10 |
| https://export.arxiv.org/pdf/1905.06163v1.pdf | physics.app-ph     |                10 |
| https://export.arxiv.org/pdf/1306.4661v7.pdf  | physics.atom-ph    |                10 |
| https://export.arxiv.org/pdf/1706.07114v2.pdf | physics.atom-ph    |                10 |
| https://export.arxiv.org/pdf/1906.00474v1.pdf | physics.atom-ph    |                10 |
| https://export.arxiv.org/pdf/0803.3901v1.pdf  | physics.chem-ph    |                10 |
| https://export.arxiv.org/pdf/1706.07534v1.pdf | physics.class-ph   |                10 |
| https://export.arxiv.org/pdf/1904.00493v1.pdf | physics.data-an    |                10 |
| https://export.arxiv.org/pdf/1802.06590v1.pdf | physics.space-ph   |                10 |
| https://export.arxiv.org/pdf/1103.0286v2.pdf  | quant-ph           |                10 |
| https://export.arxiv.org/pdf/1207.2485v3.pdf  | quant-ph           |                10 |
| https://export.arxiv.org/pdf/1601.07931v3.pdf | stat.AP            |                10 |
| https://export.arxiv.org/pdf/1304.4203v2.pdf  | astro-ph.HE        |                 9 |
| https://export.arxiv.org/pdf/1506.03177v2.pdf | cond-mat.mtrl-sci  |                 9 |
| https://export.arxiv.org/pdf/1702.08515v3.pdf | cond-mat.mtrl-sci  |                 9 |
| https://export.arxiv.org/pdf/1508.05025v4.pdf | cond-mat.stat-mech |                 9 |
| https://export.arxiv.org/pdf/1706.09347v2.pdf | cs.AI              |                 9 |
| https://export.arxiv.org/pdf/1711.09952v2.pdf | cs.CV              |                 9 |
| https://export.arxiv.org/pdf/1205.3773v3.pdf  | gr-qc              |                 9 |
| https://export.arxiv.org/pdf/0909.2753v2.pdf  | math.MP            |                 9 |
| https://export.arxiv.org/pdf/1404.0651v1.pdf  | math.NA            |                 9 |
| https://export.arxiv.org/pdf/1110.2527v1.pdf  | math.OC            |                 9 |
| https://export.arxiv.org/pdf/1403.5318v3.pdf  | nlin.PS            |                 9 |
| https://export.arxiv.org/pdf/1709.03402v4.pdf | physics.chem-ph    |                 9 |
| https://export.arxiv.org/pdf/1806.02251v1.pdf | physics.data-an    |                 9 |
| https://export.arxiv.org/pdf/1211.6462v3.pdf  | cs.SI              |                 8 |
| https://export.arxiv.org/pdf/1404.6585v1.pdf  | math.IT            |                 8 |
| https://export.arxiv.org/pdf/1604.05771v1.pdf | math.OC            |                 8 |
| https://export.arxiv.org/pdf/0708.0048v1.pdf  | math.NT            |                 8 |
| https://export.arxiv.org/pdf/1506.04980v1.pdf | math.NT            |                 8 |
| https://export.arxiv.org/pdf/1811.08906v1.pdf | astro-ph.HE        |                 7 |
| https://export.arxiv.org/pdf/1310.1622v2.pdf  | cs.LO              |                 7 |
| https://export.arxiv.org/pdf/0810.4634v1.pdf  | math.CO            |                 7 |
| https://export.arxiv.org/pdf/1101.5924v3.pdf  | math.CT            |                 7 |
| https://export.arxiv.org/pdf/1106.3102v4.pdf  | math.OC            |                 7 |
| https://export.arxiv.org/pdf/1009.1736v1.pdf  | nlin.SI            |                 7 |
| https://export.arxiv.org/pdf/1908.01260v1.pdf | stat.ME            |                 7 |
| https://export.arxiv.org/pdf/1609.03875v1.pdf | astro-ph.HE        |                 0 |
| https://export.arxiv.org/pdf/1407.4035v1.pdf  | cond-mat.dis-nn    |                 0 |
| https://export.arxiv.org/pdf/1706.00372v1.pdf | cond-mat.mtrl-sci  |                 0 |
| https://export.arxiv.org/pdf/0704.1394v1.pdf  | cs.AI              |                 0 |
| https://export.arxiv.org/pdf/1004.1230v1.pdf  | cs.AI              |                 0 |
| https://export.arxiv.org/pdf/1902.11114v2.pdf | cs.CV              |                 0 |
| https://export.arxiv.org/pdf/1312.0940v1.pdf  | cs.CY              |                 0 |
| https://export.arxiv.org/pdf/1605.01580v1.pdf | cs.CY              |                 0 |
| https://export.arxiv.org/pdf/1806.06230v1.pdf | cs.GT              |                 0 |
| https://export.arxiv.org/pdf/1901.11499v2.pdf | cs.SE              |                 0 |
| https://export.arxiv.org/pdf/1910.08359v1.pdf | eess.SP            |                 0 |
| https://export.arxiv.org/pdf/1204.1077v1.pdf  | gr-qc              |                 0 |
| https://export.arxiv.org/pdf/0902.4798v1.pdf  | hep-ex             |                 0 |
| https://export.arxiv.org/pdf/1912.07355v1.pdf | hep-ex             |                 0 |
| https://export.arxiv.org/pdf/1901.06292v1.pdf | math.CO            |                 0 |
| https://export.arxiv.org/pdf/1411.6503v2.pdf  | math.MP            |                 0 |
| https://export.arxiv.org/pdf/1610.03664v1.pdf | math.MP            |                 0 |
| https://export.arxiv.org/pdf/0711.1635v1.pdf  | nucl-th            |                 0 |
| https://export.arxiv.org/pdf/0906.4909v1.pdf  | nucl-th            |                 0 |
| https://export.arxiv.org/pdf/1603.09057v1.pdf | nucl-th            |                 0 |
| https://export.arxiv.org/pdf/1012.0862v1.pdf  | physics.gen-ph     |                 0 |
| https://export.arxiv.org/pdf/1504.03161v1.pdf | physics.soc-ph     |                 0 |
| https://export.arxiv.org/pdf/1506.06091v1.pdf | q-bio.NC           |                 0 |
| https://export.arxiv.org/pdf/0906.2684v2.pdf  | quant-ph           |                 0 |
+-----------------------------------------------+--------------------+-------------------+
```
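To summarise a run like this without scanning the full listing, a grouped count of scores is enough. This is a sketch assuming the same `papers` table and the `processing_status = 'scored'` value used in the query above; `score_distribution` is a hypothetical helper. Counted by hand from the table above, the 100 papers break down as 51 at 10, 13 at 9, 5 at 8, 7 at 7 and 24 at 0, with no scores between 1 and 6.

```python
import sqlite3


def score_distribution(conn: sqlite3.Connection) -> list[tuple[int, int]]:
    """Return (suitability_score, paper_count) pairs, highest score first."""
    return conn.execute(
        "SELECT suitability_score, COUNT(*) "
        "FROM papers "
        "WHERE processing_status = 'scored' "
        "GROUP BY suitability_score "
        "ORDER BY suitability_score DESC"
    ).fetchall()


if __name__ == "__main__":
    # Hypothetical demo rows; only the two columns the query reads are created.
    # 'pending' is an illustrative status, not one taken from the pipeline.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE papers (suitability_score INTEGER, processing_status TEXT)")
    conn.executemany(
        "INSERT INTO papers VALUES (?, ?)",
        [(10, "scored"), (10, "scored"), (7, "scored"), (0, "scored"), (9, "pending")],
    )
    print(score_distribution(conn))  # [(10, 2), (7, 1), (0, 1)]
```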
thehunmonkgroup commented
We've decided the current algorithm is sufficient until we need to score a larger number of papers; we'll need funding for that.