daveshap/Raspberry

Paper -> CoT pipeline: Algorithm for scoring a paper based on rubric data


The paper-grading pipeline produces a series of yes/no answers to rubric questions that assess how suitable a paper is for chain-of-thought (CoT) extraction.

Requirements:

  • An algorithm that converts the simple yes/no rubric answers into an overall suitability score

Deliverable:

  • The algorithm itself
  • Either
    • An SQL query suitable for SQLite that can apply the algorithm dynamically to the individual rubric answers
    • A script that runs the algorithm and writes the suitability score to the database
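For the script option, here is a minimal sketch using Python's standard-library sqlite3 module. It assumes the rubric answers are stored as 1 (yes), 0 (no) or NULL in the criteria_* columns of papers (the names used in the example query) and that papers has an integer suitability_score column. score() is a hypothetical placeholder that simply sums the answers; whichever rule is chosen would replace it.

```python
import sqlite3

# Rubric columns, as named in the example SQL in this issue. Values are assumed
# to be 1 (yes), 0 (no) or NULL (unanswered).
CRITERIA = [
    "criteria_clear_question",
    "criteria_definitive_answer",
    "criteria_complex_reasoning",
    "criteria_coherent_structure",
    "criteria_layperson_comprehensible",
    "criteria_minimal_jargon",
    "criteria_illustrative_examples",
    "criteria_significant_insights",
    "criteria_verifiable_steps",
    "criteria_overall_suitability",
]


def score(answers: dict) -> int:
    """Hypothetical placeholder rule: plain sum, with NULL (None) counted as 'no'."""
    return sum(answers[name] or 0 for name in CRITERIA)


def write_scores(conn: sqlite3.Connection) -> int:
    """Score every row in papers, store it in suitability_score, return the row count."""
    rows = conn.execute(f"SELECT id, {', '.join(CRITERIA)} FROM papers").fetchall()
    updates = []
    for paper_id, *values in rows:
        answers = dict(zip(CRITERIA, values))
        updates.append((score(answers), paper_id))
    conn.executemany("UPDATE papers SET suitability_score = ? WHERE id = ?", updates)
    conn.commit()
    return len(updates)
```

Against the real database this would be called as write_scores(sqlite3.connect(path)), with path pointing at the project's SQLite file. Computing the score in Python keeps the rule easy to change, at the cost of a read-then-write round trip that a single SQL UPDATE avoids.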

Example:

The simplest algorithm, in SQL: sum all criteria (1 for yes, 0 for no, with unanswered NULLs counted as 0):

SELECT
    id,
    paper_url,
    paper_category,
    (COALESCE(criteria_clear_question, 0) +
     COALESCE(criteria_definitive_answer, 0) +
     COALESCE(criteria_complex_reasoning, 0) +
     COALESCE(criteria_coherent_structure, 0) +
     COALESCE(criteria_layperson_comprehensible, 0) +
     COALESCE(criteria_minimal_jargon, 0) +
     COALESCE(criteria_illustrative_examples, 0) +
     COALESCE(criteria_significant_insights, 0) +
     COALESCE(criteria_verifiable_steps, 0) +
     COALESCE(criteria_overall_suitability, 0)) AS total_criteria_score
FROM
    papers;
+----+----------------------------------------------+----------------+----------------------+
| id |                  paper_url                   | paper_category | total_criteria_score |
+----+----------------------------------------------+----------------+----------------------+
| 8  | https://export.arxiv.org/pdf/0704.3252v1.pdf | astro-ph.EP    | 10                   |
| 9  | https://export.arxiv.org/pdf/0710.0317v1.pdf | astro-ph.EP    | 10                   |
+----+----------------------------------------------+----------------+----------------------+

Run over a larger set of papers, the sum does at least produce some variation in grades:

SELECT
    id,
    paper_url,
    paper_category,
    (COALESCE(criteria_clear_question, 0) +
     COALESCE(criteria_definitive_answer, 0) +
     COALESCE(criteria_complex_reasoning, 0) +
     COALESCE(criteria_coherent_structure, 0) +
     COALESCE(criteria_layperson_comprehensible, 0) +
     COALESCE(criteria_minimal_jargon, 0) +
     COALESCE(criteria_illustrative_examples, 0) +
     COALESCE(criteria_significant_insights, 0) +
     COALESCE(criteria_verifiable_steps, 0) +
     COALESCE(criteria_overall_suitability, 0)) AS total_criteria_score
FROM
    papers
WHERE
    id IN (9, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31);
+----+----------------------------------------------+----------------+----------------------+
| id |                  paper_url                   | paper_category | total_criteria_score |
+----+----------------------------------------------+----------------+----------------------+
| 9  | https://export.arxiv.org/pdf/0710.0317v1.pdf | astro-ph.EP    | 10                   |
| 13 | https://export.arxiv.org/pdf/0805.1116v1.pdf | astro-ph.EP    | 9                    |
| 14 | https://export.arxiv.org/pdf/0807.0527v1.pdf | astro-ph.EP    | 7                    |
| 15 | https://export.arxiv.org/pdf/0807.1873v1.pdf | astro-ph.EP    | 10                   |
| 16 | https://export.arxiv.org/pdf/0809.4042v1.pdf | astro-ph.EP    | 3                    |
| 17 | https://export.arxiv.org/pdf/0809.4562v1.pdf | astro-ph.EP    | 3                    |
| 18 | https://export.arxiv.org/pdf/0810.5138v1.pdf | astro-ph.EP    | 10                   |
| 19 | https://export.arxiv.org/pdf/0901.0304v1.pdf | astro-ph.EP    | 9                    |
| 20 | https://export.arxiv.org/pdf/0901.0343v1.pdf | astro-ph.EP    | 4                    |
| 21 | https://export.arxiv.org/pdf/0901.0482v1.pdf | astro-ph.EP    | 10                   |
| 22 | https://export.arxiv.org/pdf/0901.0515v1.pdf | astro-ph.EP    | 10                   |
| 23 | https://export.arxiv.org/pdf/0901.0532v1.pdf | astro-ph.EP    | 10                   |
| 24 | https://export.arxiv.org/pdf/0901.0554v1.pdf | astro-ph.EP    | 10                   |
| 25 | https://export.arxiv.org/pdf/0901.0625v1.pdf | astro-ph.EP    | 2                    |
| 26 | https://export.arxiv.org/pdf/0901.0828v1.pdf | astro-ph.EP    | 10                   |
| 27 | https://export.arxiv.org/pdf/0901.0846v1.pdf | astro-ph.EP    | 10                   |
| 28 | https://export.arxiv.org/pdf/0901.0735v1.pdf | astro-ph.EP    | 9                    |
| 29 | https://export.arxiv.org/pdf/0901.1214v1.pdf | astro-ph.EP    | 10                   |
| 30 | https://export.arxiv.org/pdf/0901.1217v1.pdf | astro-ph.EP    | 10                   |
| 31 | https://export.arxiv.org/pdf/0901.1547v1.pdf | astro-ph.EP    | 10                   |
+----+----------------------------------------------+----------------+----------------------+
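To see how much spread the plain sum gives, the scores can be tallied with a GROUP BY. This sketch loads the (id, score) pairs from the table above into a throwaway in-memory SQLite table, since the full schema isn't shown here; the table name scores is hypothetical. Against the real database, the same GROUP BY could wrap the query above.

```python
import sqlite3

# (id, total_criteria_score) pairs copied from the 20-paper run above.
SCORES = [
    (9, 10), (13, 9), (14, 7), (15, 10), (16, 3), (17, 3), (18, 10),
    (19, 9), (20, 4), (21, 10), (22, 10), (23, 10), (24, 10), (25, 2),
    (26, 10), (27, 10), (28, 9), (29, 10), (30, 10), (31, 10),
]


def score_histogram(conn: sqlite3.Connection) -> list[tuple[int, int]]:
    """Return (score, number of papers) pairs, highest score first."""
    return conn.execute(
        """
        SELECT total_criteria_score, COUNT(*) AS n_papers
        FROM scores
        GROUP BY total_criteria_score
        ORDER BY total_criteria_score DESC
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (id INTEGER PRIMARY KEY, total_criteria_score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?)", SCORES)
for score, n in score_histogram(conn):
    print(f"{score:>2}: {n}")
```

For this run it reports 12 papers at 10, 3 at 9, and one or two each at 7, 4, 3 and 2, so more than half of the sample sits at the maximum score.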

Updated the scoring to be a tad smarter:

  • There are now three required rubric questions (clear question, definitive answer, complex reasoning); if any of them is answered 'no', the score is zero
  • Otherwise, the score is a sum of all ten rubric questions (1 for yes, 0 for no)
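Both rules fit in one SQLite CASE expression. The sketch below wraps it in Python's sqlite3 module to cover both deliverable options: a view that computes the score dynamically on every read, and an UPDATE that stores it in suitability_score. Treating an unanswered (NULL) required question as 'no', and the view name paper_scores, are assumptions of this sketch rather than details from the issue.

```python
import sqlite3

# Gated score as one SQLite expression: zero if any required criterion
# (clear question, definitive answer, complex reasoning) is 'no' or unanswered
# (treating NULL as 'no' is an assumption), otherwise the sum of all ten answers.
GATED_SCORE_SQL = """
CASE
    WHEN COALESCE(criteria_clear_question, 0) = 0
      OR COALESCE(criteria_definitive_answer, 0) = 0
      OR COALESCE(criteria_complex_reasoning, 0) = 0
    THEN 0
    ELSE COALESCE(criteria_clear_question, 0) +
         COALESCE(criteria_definitive_answer, 0) +
         COALESCE(criteria_complex_reasoning, 0) +
         COALESCE(criteria_coherent_structure, 0) +
         COALESCE(criteria_layperson_comprehensible, 0) +
         COALESCE(criteria_minimal_jargon, 0) +
         COALESCE(criteria_illustrative_examples, 0) +
         COALESCE(criteria_significant_insights, 0) +
         COALESCE(criteria_verifiable_steps, 0) +
         COALESCE(criteria_overall_suitability, 0)
END
"""


def create_score_view(conn: sqlite3.Connection) -> None:
    """Dynamic option: a view (hypothetical name paper_scores) recomputed on every read."""
    conn.execute(
        "CREATE VIEW IF NOT EXISTS paper_scores AS "
        f"SELECT id, paper_url, paper_category, {GATED_SCORE_SQL} AS suitability_score "
        "FROM papers"
    )


def store_scores(conn: sqlite3.Connection) -> None:
    """Persisted option: write the gated score into papers.suitability_score."""
    conn.execute(f"UPDATE papers SET suitability_score = {GATED_SCORE_SQL}")
    conn.commit()
```

The view needs no maintenance as rubric answers change; the stored column is cheaper to sort and filter on but has to be refreshed after each grading run.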

Here's a scoring summary across 100 profiled papers:

sqlite> select paper_url, paper_category, suitability_score from papers where processing_status = 'scored' order by suitability_score desc;
+-----------------------------------------------+--------------------+-------------------+
|                   paper_url                   |   paper_category   | suitability_score |
+-----------------------------------------------+--------------------+-------------------+
| https://export.arxiv.org/pdf/0901.0735v1.pdf  | astro-ph.EP        | 10                |
| https://export.arxiv.org/pdf/1801.05595v1.pdf | astro-ph.EP        | 10                |
| https://export.arxiv.org/pdf/1507.03327v1.pdf | astro-ph.GA        | 10                |
| https://export.arxiv.org/pdf/1901.07266v1.pdf | astro-ph.GA        | 10                |
| https://export.arxiv.org/pdf/1910.09121v1.pdf | astro-ph.GA        | 10                |
| https://export.arxiv.org/pdf/1110.2656v2.pdf  | astro-ph.HE        | 10                |
| https://export.arxiv.org/pdf/1310.7588v1.pdf  | astro-ph.HE        | 10                |
| https://export.arxiv.org/pdf/1611.08508v1.pdf | astro-ph.HE        | 10                |
| https://export.arxiv.org/pdf/1710.09893v1.pdf | astro-ph.HE        | 10                |
| https://export.arxiv.org/pdf/1810.04324v3.pdf | astro-ph.HE        | 10                |
| https://export.arxiv.org/pdf/1307.3576v1.pdf  | cond-mat.mtrl-sci  | 10                |
| https://export.arxiv.org/pdf/1310.6949v1.pdf  | cond-mat.mtrl-sci  | 10                |
| https://export.arxiv.org/pdf/1409.0959v1.pdf  | cond-mat.mtrl-sci  | 10                |
| https://export.arxiv.org/pdf/1509.04762v1.pdf | cond-mat.mtrl-sci  | 10                |
| https://export.arxiv.org/pdf/1901.06620v2.pdf | cs.AI              | 10                |
| https://export.arxiv.org/pdf/1604.04372v2.pdf | cs.CV              | 10                |
| https://export.arxiv.org/pdf/1806.09158v1.pdf | cs.CV              | 10                |
| https://export.arxiv.org/pdf/1910.13340v1.pdf | cs.CV              | 10                |
| https://export.arxiv.org/pdf/1207.1387v1.pdf  | cs.LG              | 10                |
| https://export.arxiv.org/pdf/1307.3964v1.pdf  | cs.LG              | 10                |
| https://export.arxiv.org/pdf/1905.09538v2.pdf | cs.LG              | 10                |
| https://export.arxiv.org/pdf/1811.08973v1.pdf | cs.SE              | 10                |
| https://export.arxiv.org/pdf/1104.2747v1.pdf  | hep-ex             | 10                |
| https://export.arxiv.org/pdf/1407.6211v2.pdf  | hep-ex             | 10                |
| https://export.arxiv.org/pdf/1808.03987v3.pdf | hep-ex             | 10                |
| https://export.arxiv.org/pdf/1211.3270v1.pdf  | math.CA            | 10                |
| https://export.arxiv.org/pdf/1709.00705v2.pdf | math.CA            | 10                |
| https://export.arxiv.org/pdf/1805.00990v2.pdf | math.CO            | 10                |
| https://export.arxiv.org/pdf/1706.08709v1.pdf | math.IT            | 10                |
| https://export.arxiv.org/pdf/1804.02217v1.pdf | math.IT            | 10                |
| https://export.arxiv.org/pdf/1206.4819v2.pdf  | math.MP            | 10                |
| https://export.arxiv.org/pdf/1201.0101v1.pdf  | math.NA            | 10                |
| https://export.arxiv.org/pdf/1012.2726v1.pdf  | nlin.AO            | 10                |
| https://export.arxiv.org/pdf/1706.04252v1.pdf | nlin.AO            | 10                |
| https://export.arxiv.org/pdf/1310.4490v1.pdf  | nlin.PS            | 10                |
| https://export.arxiv.org/pdf/1405.7920v2.pdf  | nlin.PS            | 10                |
| https://export.arxiv.org/pdf/1806.04399v1.pdf | nlin.PS            | 10                |
| https://export.arxiv.org/pdf/0805.2603v1.pdf  | nucl-th            | 10                |
| https://export.arxiv.org/pdf/1208.3888v2.pdf  | nucl-th            | 10                |
| https://export.arxiv.org/pdf/1512.02771v1.pdf | nucl-th            | 10                |
| https://export.arxiv.org/pdf/1905.06163v1.pdf | physics.app-ph     | 10                |
| https://export.arxiv.org/pdf/1306.4661v7.pdf  | physics.atom-ph    | 10                |
| https://export.arxiv.org/pdf/1706.07114v2.pdf | physics.atom-ph    | 10                |
| https://export.arxiv.org/pdf/1906.00474v1.pdf | physics.atom-ph    | 10                |
| https://export.arxiv.org/pdf/0803.3901v1.pdf  | physics.chem-ph    | 10                |
| https://export.arxiv.org/pdf/1706.07534v1.pdf | physics.class-ph   | 10                |
| https://export.arxiv.org/pdf/1904.00493v1.pdf | physics.data-an    | 10                |
| https://export.arxiv.org/pdf/1802.06590v1.pdf | physics.space-ph   | 10                |
| https://export.arxiv.org/pdf/1103.0286v2.pdf  | quant-ph           | 10                |
| https://export.arxiv.org/pdf/1207.2485v3.pdf  | quant-ph           | 10                |
| https://export.arxiv.org/pdf/1601.07931v3.pdf | stat.AP            | 10                |
| https://export.arxiv.org/pdf/1304.4203v2.pdf  | astro-ph.HE        | 9                 |
| https://export.arxiv.org/pdf/1506.03177v2.pdf | cond-mat.mtrl-sci  | 9                 |
| https://export.arxiv.org/pdf/1702.08515v3.pdf | cond-mat.mtrl-sci  | 9                 |
| https://export.arxiv.org/pdf/1508.05025v4.pdf | cond-mat.stat-mech | 9                 |
| https://export.arxiv.org/pdf/1706.09347v2.pdf | cs.AI              | 9                 |
| https://export.arxiv.org/pdf/1711.09952v2.pdf | cs.CV              | 9                 |
| https://export.arxiv.org/pdf/1205.3773v3.pdf  | gr-qc              | 9                 |
| https://export.arxiv.org/pdf/0909.2753v2.pdf  | math.MP            | 9                 |
| https://export.arxiv.org/pdf/1404.0651v1.pdf  | math.NA            | 9                 |
| https://export.arxiv.org/pdf/1110.2527v1.pdf  | math.OC            | 9                 |
| https://export.arxiv.org/pdf/1403.5318v3.pdf  | nlin.PS            | 9                 |
| https://export.arxiv.org/pdf/1709.03402v4.pdf | physics.chem-ph    | 9                 |
| https://export.arxiv.org/pdf/1806.02251v1.pdf | physics.data-an    | 9                 |
| https://export.arxiv.org/pdf/1211.6462v3.pdf  | cs.SI              | 8                 |
| https://export.arxiv.org/pdf/1404.6585v1.pdf  | math.IT            | 8                 |
| https://export.arxiv.org/pdf/1604.05771v1.pdf | math.OC            | 8                 |
| https://export.arxiv.org/pdf/0708.0048v1.pdf  | math.NT            | 8                 |
| https://export.arxiv.org/pdf/1506.04980v1.pdf | math.NT            | 8                 |
| https://export.arxiv.org/pdf/1811.08906v1.pdf | astro-ph.HE        | 7                 |
| https://export.arxiv.org/pdf/1310.1622v2.pdf  | cs.LO              | 7                 |
| https://export.arxiv.org/pdf/0810.4634v1.pdf  | math.CO            | 7                 |
| https://export.arxiv.org/pdf/1101.5924v3.pdf  | math.CT            | 7                 |
| https://export.arxiv.org/pdf/1106.3102v4.pdf  | math.OC            | 7                 |
| https://export.arxiv.org/pdf/1009.1736v1.pdf  | nlin.SI            | 7                 |
| https://export.arxiv.org/pdf/1908.01260v1.pdf | stat.ME            | 7                 |
| https://export.arxiv.org/pdf/1609.03875v1.pdf | astro-ph.HE        | 0                 |
| https://export.arxiv.org/pdf/1407.4035v1.pdf  | cond-mat.dis-nn    | 0                 |
| https://export.arxiv.org/pdf/1706.00372v1.pdf | cond-mat.mtrl-sci  | 0                 |
| https://export.arxiv.org/pdf/0704.1394v1.pdf  | cs.AI              | 0                 |
| https://export.arxiv.org/pdf/1004.1230v1.pdf  | cs.AI              | 0                 |
| https://export.arxiv.org/pdf/1902.11114v2.pdf | cs.CV              | 0                 |
| https://export.arxiv.org/pdf/1312.0940v1.pdf  | cs.CY              | 0                 |
| https://export.arxiv.org/pdf/1605.01580v1.pdf | cs.CY              | 0                 |
| https://export.arxiv.org/pdf/1806.06230v1.pdf | cs.GT              | 0                 |
| https://export.arxiv.org/pdf/1901.11499v2.pdf | cs.SE              | 0                 |
| https://export.arxiv.org/pdf/1910.08359v1.pdf | eess.SP            | 0                 |
| https://export.arxiv.org/pdf/1204.1077v1.pdf  | gr-qc              | 0                 |
| https://export.arxiv.org/pdf/0902.4798v1.pdf  | hep-ex             | 0                 |
| https://export.arxiv.org/pdf/1912.07355v1.pdf | hep-ex             | 0                 |
| https://export.arxiv.org/pdf/1901.06292v1.pdf | math.CO            | 0                 |
| https://export.arxiv.org/pdf/1411.6503v2.pdf  | math.MP            | 0                 |
| https://export.arxiv.org/pdf/1610.03664v1.pdf | math.MP            | 0                 |
| https://export.arxiv.org/pdf/0711.1635v1.pdf  | nucl-th            | 0                 |
| https://export.arxiv.org/pdf/0906.4909v1.pdf  | nucl-th            | 0                 |
| https://export.arxiv.org/pdf/1603.09057v1.pdf | nucl-th            | 0                 |
| https://export.arxiv.org/pdf/1012.0862v1.pdf  | physics.gen-ph     | 0                 |
| https://export.arxiv.org/pdf/1504.03161v1.pdf | physics.soc-ph     | 0                 |
| https://export.arxiv.org/pdf/1506.06091v1.pdf | q-bio.NC           | 0                 |
| https://export.arxiv.org/pdf/0906.2684v2.pdf  | quant-ph           | 0                 |
+-----------------------------------------------+--------------------+-------------------+
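One property of the gated rule is easy to check by brute force: enumerate all 2^10 yes/no combinations and collect the scores the rule can produce. This Python sketch re-implements the rule as described, with the three required questions placed first in the tuple (an arbitrary convention of the sketch); unanswered questions are not modeled.

```python
from itertools import product

REQUIRED = 3  # clear question, definitive answer, complex reasoning
TOTAL = 10    # all rubric questions


def gated_score(answers: tuple[int, ...]) -> int:
    """Zero if any of the first REQUIRED answers is 'no' (0), else the sum of all answers."""
    if not all(answers[:REQUIRED]):
        return 0
    return sum(answers)


# Every possible combination of yes (1) / no (0) answers to the ten questions.
achievable = sorted({gated_score(a) for a in product((0, 1), repeat=TOTAL)})
print(achievable)  # [0, 3, 4, 5, 6, 7, 8, 9, 10]
```

Scores of 1 and 2 are impossible, so a 0 always means at least one required question was answered 'no'. Scores 3 through 6 are possible but do not occur among these 100 papers, which split into 51 at 10, 13 at 9, 5 at 8, 7 at 7 and 24 at 0.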

We've decided the current algorithm is sufficient until we need to process a larger number of papers, which will require funding.