allenai/entailment_bank

Ground truth predictions cannot achieve 100% on some metrics?

yangky11 opened this issue · 1 comment

Hi,

Thanks for releasing the code! I ran it on prediction files, and the results look correct. There is one minor issue I'm curious about. I ran the evaluation script on the ground-truth validation proofs with

python eval/run_scorer.py --task "task_2" --split dev --prediction_file gts_val.tsv --output_dir ./ --bleurt_checkpoint bleurt-large-512

where gts_val.tsv contains the ground-truth validation proofs:

$proof$ = sent10 & sent12 & sent4 -> hypothesis;
$proof$ = sent1 & sent13 -> int1: a star is a source of light; int1 & sent21 & sent23 -> hypothesis;
$proof$ = sent24 & sent5 -> hypothesis;
$proof$ = sent14 & sent5 -> int1: new york state is located in the northern hemisphere; int1 & sent23 -> int2: december is during the winter for new york state; int2 & sent15 -> hypothesis;
$proof$ = sent23 & sent8 -> int1: earth is a planet that rotates on its tilted axis; int1 & sent12 & sent13 -> hypothesis;
$proof$ = sent17 & sent7 -> int1: the plants in the gardens are located outside; int1 & sent24 -> int2: the plants in the gardens will receive sunlight during the day; int2 & sent16 -> int3: the plants in the gardens will receive sunlight to grow during the day; int3 & sent25 -> hypothesis;
$proof$ = sent18 & sent2 & sent24 -> hypothesis;
$proof$ = sent18 & sent25 & sent3 -> int1: earth revolving the sun is an example of a planet revolving around its star; int1 & sent17 -> hypothesis;
$proof$ = sent16 & sent20 -> int1: the earth is a planet that rotates on its tilted axis once per day; int1 & sent24 -> hypothesis;
$proof$ = sent22 & sent7 -> int1: mercury is a planet in the solar system; int1 & sent25 -> int2: mercury is a planet orbits the sun in the solar system; int2 & sent20 -> int3: a complete orbit of mercury around the sun takes one mercury year; int3 & sent13 -> hypothesis;
$proof$ = sent7 & sent9 -> int1: the new moon is when the moon could block the earth from the sun; int1 & sent2 -> hypothesis;
$proof$ = sent13 & sent19 -> hypothesis;
$proof$ = sent1 & sent14 & sent16 -> hypothesis;
$proof$ = sent11 & sent6 -> hypothesis;
$proof$ = sent13 & sent4 -> hypothesis;
$proof$ = sent12 & sent13 -> hypothesis;
$proof$ = sent19 & sent25 -> hypothesis;
$proof$ = sent10 & sent6 -> hypothesis;
$proof$ = sent18 & sent19 & sent24 -> int1: earth is a planet that has a greater mass than the planet mars; int1 & sent2 -> int2: the force of gravity on earth will be greater than on mars; int2 & sent11 -> int3: objects will weigh less on mars than on earth because the force of gravity is less on mars; int2 & sent25 & sent6 -> int4: objects will have the same mass on earth and on mars; int3 & int4 -> int5: objects will weigh less and have the same mass on mars than on earth; int5 & sent13 -> hypothesis;
$proof$ = sent12 & sent25 -> int1: moons orbit jupiter; int1 & sent14 -> int2: a telescope can be used to observe the moons that orbit around jupiter; int2 & sent18 -> hypothesis;
$proof$ = sent13 & sent19 -> int1: a new moon will occur 28 days after the last time it occurs; int1 & sent20 -> int2: the next new moon will occur 28 days after june 2; int2 & sent22 -> hypothesis;
$proof$ = sent18 & sent4 -> hypothesis;
$proof$ = sent10 & sent18 & sent19 -> int1: if fossils of ocean animals are found in a place then that place used to be covered in ocean in the past; int1 & sent17 & sent25 -> hypothesis;
$proof$ = sent2 & sent8 -> int1: if fossils of ocean plants are found in a place then that place used to be covered by water in the past; int1 & sent11 & sent18 -> hypothesis;
$proof$ = sent16 & sent23 -> int1: marine fossils are fossils of water animals; int1 & sent22 -> int2: fossils of water animals are found in mountains; int2 & sent1 -> int3: the mountains used to be covered by water in the past; int3 & sent20 & sent6 & sent7 -> hypothesis;
$proof$ = sent10 & sent21 & sent8 -> hypothesis;
$proof$ = sent4 & sent8 -> int1: studying rock formations can mean studying the history and processes of earth; int1 & sent15 -> int2: structural geologists study the history and processes of earth; sent14 & sent5 -> int3: studying fossils in rock formations can mean studying the history and processes of earth; int3 & sent7 -> int4: paleontologists study the history and processes of earth; int2 & int4 -> hypothesis;
$proof$ = sent17 & sent4 -> hypothesis;
$proof$ = sent19 & sent7 -> int1: increasing something causes that something to become extreme; int1 & sent24 -> int2: exposure to increased pressure metamorphoses sedimentary rock into metamorphic rock; int2 & sent25 -> int3: exposure to increased pressure metamorphoses shale into metamorphic rock; int3 & sent23 -> hypothesis;
$proof$ = sent10 & sent4 -> int1: classifying is when one sorts something by class; int1 & sent6 -> int2: rocks can be classified by how the rock was formed; int2 & sent24 -> hypothesis;
$proof$ = sent12 & sent19 -> int1: classifying is when one sorts something by class; int1 & sent1 -> int2: rocks can be classified by how they are formed; int2 & sent3 -> hypothesis;
$proof$ = sent1 & sent10 -> int1: if something is formed from a material then that something contains that material; int1 & sent25 -> int2: if something is formed from organic material then that something contains organic material; int2 & sent19 -> int3: a sedimentary deposit formed from organic material contains organic material; sent18 & sent24 -> int4: organic material is made of organic compounds; int4 & sent8 -> int5: organic material is mainly made of carbon; int5 & sent14 -> int6: organic material contains high amounts of carbon; int3 & int6 -> int7: a sedimentary deposit formed from organic material contains high amounts of carbon; int7 & sent9 -> hypothesis;
$proof$ = sent19 & sent3 -> int1: sedimentary rocks are a kind of structure; sent15 & sent20 -> int2: sediment is formed by erosion; int2 & sent7 -> int3: in the formation of sedimentary rock, erosion forms sediment; int3 & sent11 -> int4: in the formation of sedimentary rock, sediment formed by erosion is deposited; int4 & sent24 -> int5: in the formation of sedimentary rock, sediment is deposited then buried; int5 & sent22 -> int6: in the formation of sedimentary rock, sediment is deposited then buried then cemented together; int1 & int6 -> int7: sedimentary rocks are a kind of structure that is formed by sediments being deposited then buried then cemented together; int7 & sent4 -> hypothesis;
$proof$ = sent10 & sent2 -> int1: to increase something in the atmosphere means to add that something into the atmosphere; int1 & sent4 -> hypothesis;
$proof$ = sent1 & sent3 -> hypothesis;
$proof$ = sent5 & sent9 -> int1: if producing renewable electric energy requires something then that something may be able to produce renewable electric energy; int1 & sent15 -> int2: renewable resources may be able to produce renewable electric energy; int2 & sent18 -> int3: renewable resources have the potential to produce renewable electric energy; int3 & sent19 -> hypothesis;
$proof$ = sent18 & sent3 -> int1: resources being available is a kind of advantage; int1 & sent11 -> hypothesis;
$proof$ = sent11 & sent17 -> int1: the supply of solar energy will not change over a long period of time; sent21 & sent9 -> int2: to a human, a time period of billions of years is considered a very long time; int1 & int2 -> hypothesis;
$proof$ = sent22 & sent24 -> int1: energy is used for heating a home; sent4 & sent6 -> int2: solar energy is a kind of renewable resource and a kind of energy; int1 & int2 -> hypothesis;
$proof$ = sent13 & sent23 -> int1: roots are a part of a tree that anchor it into soil; int1 & sent24 -> int2: planting trees increases the amount of roots in the soil; sent5 & sent6 -> int3: tree roots prevent soil erosion; int3 & sent8 -> int4: tree roots prevent moving water from moving soil from environments; int4 & sent25 -> int5: tree roots prevent soil from washing away; int2 & int5 -> hypothesis;
$proof$ = sent20 & sent5 -> int1: solar energy is a kind of light energy; int1 & sent9 -> int2: photovoltaic cells convert solar energy into electricity; sent17 & sent24 -> int3: if photovoltaic cells convert something into something else then those photovoltaic cells use that something to produce that something else; int2 & int3 -> int4: photovoltaic cells use solar energy to produce electricity; int4 & sent13 -> hypothesis;
$proof$ = sent13 & sent19 -> int1: if something enters something else then the things that something contains will enter that something else; int1 & sent7 -> int2: if runoff enters a lake then the things that runoff contains will enter that lake; int2 & sent18 -> int3: if runoff enters a lake then fertilizer will enter that lake; sent1 & sent15 -> int4: algae is found in a lake; int4 & sent2 -> int5: fertilizers have a positve impact on algae growth in a lake; int3 & int5 -> int6: runoff entering a lake has a positive impact on algae growth in that lake; int6 & sent23 -> int7: runoff entering a lake causes algae to grow in that lake; int7 & sent8 -> hypothesis;
$proof$ = sent10 & sent20 -> int1: ph of a body of water has an impact on the water quality of a body of water; int1 & sent24 -> int2: acid rain has an impact on the quality of a body of water by changing the ph of the water; int2 & sent14 -> int3: acid rain has a negative impact on water quality by changing the ph of water; sent19 & sent23 -> int4: tadpoles are animals live in water; int4 & sent17 -> int5: changing the ph of water can cause tadpoles not be able to survive; int3 & int5 -> int6: acid rain has a negative impact on a tadpole's ability to survive by chaning the ph of water; sent2 & sent23 -> int7: a tadpole is a kind of organism; int6 & int7 & sent4 -> hypothesis;
$proof$ = sent1 & sent2 -> int1: many vehicles increase the amount of pollution in air; int1 & sent23 -> hypothesis;
$proof$ = sent3 & sent9 -> int1: if fish decrease in amount then animals that eat fish will lack food; sent12 & sent23 -> int2: if the population of fish decreases then the amount of fish will decrease; int2 & sent19 -> int3: too much fishing causes the amount of fish in an area to decrease; int1 & int3 -> int4: too much fishing can cause some animals that eat fish to lack food; int4 & sent25 -> int5: too much fishing can cause starvation for some animals that eat fish; int5 & sent18 -> int6: starvation can decrease the population of animals that eat fish; int6 & sent1 & sent20 -> hypothesis;
$proof$ = sent20 & sent7 -> int1: notebook paper is recyclable; int1 & sent16 -> hypothesis;
$proof$ = sent11 & sent18 & sent7 -> int1: glass bottles are made of recyclable material; int1 & sent2 -> hypothesis;
$proof$ = sent22 & sent25 -> int1: an earthquake usually occurs in a short amount of time; sent14 & sent9 -> int2: an earthquake can change earth's surface by shaking the ground; int1 & int2 -> int3: an earthquake can change earth's surface in a short amount of time; int3 & sent20 -> hypothesis;
$proof$ = sent17 & sent19 & sent5 -> hypothesis;
$proof$ = sent13 & sent15 & sent6 -> int1: lichens can cause rocks to break down by chemical weathering; sent14 & sent2 -> int2: soil can be formed by chemical weathering; int1 & int2 -> hypothesis;
$proof$ = sent18 & sent24 -> int1: sierra nevada contains mountains; int1 & sent23 -> int2: sierra nevada mountains can limit the water vapor reaching a location; int2 & sent3 -> hypothesis;
$proof$ = sent21 & sent4 -> hypothesis;
$proof$ = sent12 & sent21 & sent23 -> hypothesis;
$proof$ = sent11 & sent25 -> int1: pouring water onto soil causes soil erosion; int1 & sent14 -> hypothesis;
$proof$ = sent13 & sent25 -> int1: if water freezes in the crack of the rock, then water will expand in the crack; int1 & sent21 -> hypothesis;
$proof$ = sent2 & sent4 -> int1: water freezing means water changes from a liquid into a solid; int1 & sent3 -> int2: water freezing means water changes from a liquid to ice; int2 & sent16 -> hypothesis;
$proof$ = sent10 & sent14 -> hypothesis;
$proof$ = sent2 & sent20 -> hypothesis;
$proof$ = sent11 & sent17 -> hypothesis;
$proof$ = sent2 & sent24 & sent8 -> int1: water absorbing solar energy will increase in temperature; int1 & sent15 -> int2: water absorbing solar energy will increase in heat energy; sent17 & sent2 -> int3: evaporation of water is when water changes from a liquid into a gas by increasing heat energy; int2 & int3 -> int4: water absorbing solar energy will cause the evaporation of water; int4 & sent6 -> hypothesis;
$proof$ = sent10 & sent16 -> int1: when a person moves, chemical energy is converted to mechanical energy; sent5 & sent9 -> int2: shivering is a kind of moving; int2 & sent12 -> int3: the person moves when he is shivering; int1 & int3 -> hypothesis;
$proof$ = sent1 & sent2 -> int1: a hand dryer converts electrical energy into other forms of energy; int1 & sent16 & sent9 -> hypothesis;
$proof$ = sent22 & sent6 -> hypothesis;
$proof$ = sent14 & sent16 -> int1: electrons flow in one direction in the wires when an electrical circuit is working properly; int1 & sent18 -> int2: electrons flow in one direction through the atoms in wires when an electrical circuit is working properly; sent22 & sent4 -> int3: if something flows through something else then that something will collide with that something else; int2 & int3 -> int4: electrons will collide with the atoms in wires when flowing in one direction; int4 & sent12 -> int5: electrons flowing in wires increases the kinetic energy of the wires; int5 & sent23 -> int6: electrical current running through a wire causes the temperature of the wire to increase; sent1 & sent17 -> int7: warm / becoming warm means heat increases; int7 & sent2 -> int8: warm / becoming warm means temperature increases; int6 & int8 -> hypothesis;
$proof$ = sent20 & sent23 -> hypothesis;
$proof$ = sent15 & sent3 & sent9 -> hypothesis;
$proof$ = sent22 & sent3 -> int1: a block of ice is touching a hot sidewalk; sent11 & sent19 -> int2: ice is a kind of cold object; int1 & int2 & sent2 -> int3: a block of ice on a hot sidewalk is an example of cold object touching warmer object; sent17 & sent9 -> int4: when a cooler object touches a warmer object, heat will flow from the warmer object to the cooler object; int3 & int4 -> int5: heat will flow from the hot sidewalk to the block of ice; int5 & sent13 -> int6: the hot sidewalk increase the heat energy in the block of ice; int6 & sent24 -> hypothesis;
$proof$ = sent2 & sent24 -> int1: a blue block is an object with blue color; int1 & sent19 -> hypothesis;
$proof$ = sent14 & sent23 -> int1: a leaf is a green object; int1 & sent16 -> hypothesis;
$proof$ = sent11 & sent17 -> int1: the light will be reflected from the reflector; int1 & sent22 -> hypothesis;
$proof$ = sent14 & sent9 -> int1: an image in a mirror is formed by reflecting light; int1 & sent21 -> hypothesis;
$proof$ = sent2 & sent24 -> int1: the person in the dark room will not reflect enough light to be seen; int1 & sent3 -> hypothesis;
$proof$ = sent25 & sent4 -> hypothesis;
$proof$ = sent22 & sent5 -> hypothesis;
$proof$ = sent21 & sent5 -> hypothesis;
$proof$ = sent11 & sent18 & sent6 -> int1: a hot boiled egg in cold water is an example of hot object in a colder substance; int1 & sent3 -> int2: thermal conduction will occur between the hot egg and the cold water; int2 & sent12 -> int3: the heat will transfer from the hot egg to the cold water; int3 & sent20 -> int4: the hot egg will decrease in temperature; int3 & sent25 -> int5: the cold water will increase in temperature; int4 & int5 -> hypothesis;
$proof$ = sent15 & sent16 -> int1: pot is a kind of thermal conductor; int1 & sent22 -> int2: if a pot is exposed to a source of heat, then the pot will become hot; sent8 & sent9 -> int3: a stove is a source of heat for cooking; int2 & int3 -> int4: if a pot is exposed to a stove, then the pot will become hot; int4 & sent23 -> hypothesis;
$proof$ = sent17 & sent7 -> int1: a burner of a stove generates heat for cooking usually; int1 & sent13 -> int2: a burner of a stove will be hot in temperature; sent11 & sent9 -> int3: a pan and a burner are kinds of objects; int2 & int3 & sent21 & sent25 -> int4: a frying pan on the burner of the stove is an example of cooler object touching hot object; int4 & sent4 -> int5: thermal conduction will occur between the frying pan and the burner; sent11 & sent9 -> int6: a pan and a burner are kinds of objects; int5 & int6 & sent23 -> hypothesis;
$proof$ = sent19 & sent20 -> int1: longitudinal waves are also called compression waves; int1 & sent17 -> hypothesis;
$proof$ = sent24 & sent3 -> int1: a soccer ball and ground are kinds of objects; int1 & sent25 & sent5 -> int2: friction will occur between the rolling soccer ball and the ground; int2 & sent24 & sent7 -> hypothesis;
$proof$ = sent12 & sent24 -> int1: grease is used to decrease the friction among objects; sent11 & sent13 & sent19 & sent21 -> int2: when wheels and gears move against other surfaces, frictions will occur; int1 & int2 & sent11 & sent21 -> hypothesis;
$proof$ = sent25 & sent4 & sent6 -> hypothesis;
$proof$ = sent13 & sent21 -> hypothesis;
$proof$ = sent2 & sent22 -> hypothesis;
$proof$ = sent1 & sent14 -> int1: a plant cell contains a nucleus; int1 & sent15 -> int2: plant cells contain a nucleus and chloroplasts; int2 & sent17 -> int3: plant cells contain a nucleus and chloroplasts and a cell wall; int3 & sent22 -> int4: plants are made up of cells that contain a nucleus and chloroplasts and a cell wall; sent10 & sent23 -> int5: plants are kinds of organisms that are part of the biological kingdom plantae; int4 & int5 -> int6: organisms in the biological kingdom plantae are made up of cells that contain a nucleus and chloroplasts and a cell wall; int6 & sent21 -> hypothesis;
$proof$ = sent22 & sent25 -> int1: an insect has changed from an immature form to an adult form without being a pupa; sent11 & sent9 -> int2: if something undergoes metamorphosis without going through a certain stage of metamorphosis, then that metamorphosis was incomplete; int2 & sent19 -> int3: if something undergoes metamorphosis without going through the pupa stage, then that metamorphosis was incomplete; int3 & sent13 -> int4: incomplete metamorphosis is when an insect reaches the adult stage without being a pupa; int1 & int4 -> hypothesis;
$proof$ = sent10 & sent20 -> int1: the cones of a jack pine tree are sealed with a resin that prevents seed dispersal; int1 & sent21 -> int2: the resin on the cones of a jack pine tree that prevent seed dispersal can be melted with heat; sent1 & sent19 -> int3: burning a forest can produce great amounts of heat; int3 & sent5 -> int4: a wildfire produces great amounts of heat; int2 & int4 -> int5: a wildfire can melt the resing that prevents seed dispersal off of the cones of a jack pine tree; int5 & sent3 -> int6: a wildfire can remove the resin that prevents seed dispersal from the cones of a jack pine tree; int6 & sent17 -> int7: a wildfire causes jack pine tree cones to be able to disperse seeds; sent14 & sent9 -> int8: a tree will disperse seeds after something else causes that tree to be able to disperse seeds; int7 & int8 -> hypothesis;
$proof$ = sent14 & sent7 -> int1: a forest environment is dark in color; sent16 & sent20 -> int2: a bear's fur is dark in color; int1 & int2 -> int3: a bear's fur and a forest environment are both dark in color; sent1 & sent23 -> int4: a bear is a kind of organism; int4 & sent19 -> int5: an example of camouflage is a bear having the same color as its environment; int3 & int5 -> int6: an example of camouflage is a bear's fur being the same dark color as a forest; int6 & sent11 -> int7: a bear having dark fur is a kind of adaptation for hiding in a forest; sent10 & sent5 -> int8: protecting an animal has a positive impact on that animal's survival; int8 & sent12 -> int9: hiding can have a positive impact on an animal's survival; int9 & sent1 -> int10: hiding can have a positive impact on a bear's survival; int10 & int7 -> hypothesis;
$proof$ = sent23 & sent25 -> int1: claws are used by some predators to catch food; sent1 & sent17 -> int2: some birds are predators; int1 & int2 -> hypothesis;
$proof$ = sent18 & sent22 -> int1: water has a positive impact on a living thing's survival; int1 & sent16 -> int2: increasing the availability of water for a living thing has a positive impact on that living thing's survival; int2 & sent17 -> int3: storing water has a positive impact on a living thing's survival by increasing the availability of water; sent1 & sent4 -> int4: a dry environment is low in availability of water; int3 & int4 -> int5: storing water increases the water available to organisms in a dry environment; int5 & sent19 -> hypothesis;
$proof$ = sent4 & sent5 -> int1: increasing the size of a leaf has a positive impact on that leaf's ability to absorb sunlight; int1 & sent3 -> int2: increasing the size of a plant' leaves has a positive impact on a plant's ability to perform photosynthesis; int2 & sent23 -> int3: increasing the size of a plant's leaves have a positive impact on that plant's survival; sent25 & sent6 -> int4: fast growth increases the speed of growth; int4 & sent19 -> int5: fast growth causes the leaves of a plant to increase in size more quickly; int3 & int5 -> int6: fast leaf growth has a positive impact on a plant's survival; sent16 & sent2 -> int7: if being fast has a positive impact on something, then being slow will have a negative impact on that something; int6 & int7 -> hypothesis;
$proof$ = sent19 & sent6 -> int1: as water increases in an environment , the population of aquatic animals will increase; int1 & sent20 -> int2: as water increases in an environment, the population of salamanders may increase; sent11 & sent17 -> int3: a flood is a result of a large increase of water in a body of water; int3 & sent24 -> int4: a flood is a result of a large increase of water in an environment; int2 & int4 -> hypothesis;
$proof$ = sent1 & sent19 -> int1: the decrease of something required by an organism in water has a negative impact on that organism's survival; sent17 & sent5 -> int2: if a substance with a certain temperature contains more / less of something than a substance with a different temperature, then that temperature positively / negatively affects how much of that something that substance contains; int2 & sent23 -> int3: if water with a certain temperature contains more / less of something than water with a different temperature, then that temperature positively / negatively affects how much of that something that water contains; int3 & sent22 -> int4: temperature negatively affects how much dissolved oxygen water contains; int4 & sent2 -> int5: as the temperature of water increases, the amount of dissolved oxygen in the water decreases; int1 & int5 -> int6: the increase of water tempreature has a negative impact on the survival of organisms in the water that require oxygen; sent13 & sent24 -> int7: a fish requires oxygen to survive; int7 & sent8 -> int8: a fish lives in water and requires oxygen to survive; int8 & sent10 -> int9: a fish is a kind of organism that lives in water and requires oxygen to survive; int6 & int9 -> int10: the increase of water tempreature has a negative impact on the survival of fish; sent21 & sent4 -> int11: hot weather means high temperatures; int11 & sent11 -> int12: hot weather causes the temperature of water in an environment to increase; int10 & int12 -> hypothesis;
$proof$ = sent10 & sent23 -> int1: a plant requires nutrients in soil for growth; int1 & sent18 -> int2: nutrients in soil positively impacts the plant growth process; sent20 & sent25 -> int3: different types of soil contain different amounts of nutrients; int2 & int3 -> int4: different amounts of nutrients in different types of soil impact plant growth; sent2 & sent7 -> int5: growth is when a plant grows; int5 & sent9 -> int6: a plant requires water for growth; int6 & sent14 -> int7: a plant requires water absorbed from soil for growth; sent11 & sent18 -> int8: if something is required for growth then that something positively impacts growth; int7 & int8 -> int9: water absorbed from soil positively impacts plant growth; sent17 & sent5 -> int10: sandy soil has less available water than heavy soil; sent1 & sent12 -> int11: sandy soil and heavy soil are different types of soil; int10 & int11 -> int12: different types of soil have different availability of water; int12 & int9 -> int13: different availabilty of water in different types of soil impact plant growth; int13 & int4 -> int14: type of soil impacts plant growth through amount of nutrients and availability of water; int14 & sent4 -> hypothesis;
$proof$ = sent23 & sent3 -> int1: lungs perform the function of breathing in birds; sent23 & sent25 -> int2: skin performs the function of breathing in frogs; int1 & int2 -> int3: lungs in birds and skin on frogs both perform the function of breathing; int3 & sent8 -> hypothesis;
$proof$ = sent16 & sent21 -> int1: the lungs bring in oxygen from the air; int1 & sent2 -> int2: the lungs perform the function of bringing in oxygen from the air; sent1 & sent8 -> int3: if the lungs perform a function then the respiratory system performs that function through the lungs; int2 & int3 -> int4: the respiratory system takes in oxygen from the air through the lungs; sent4 & sent5 -> int5: the respiratory system brings oxygen to the circulatory system; int4 & int5 -> int6: the respiratory system brings oxygen from the air to the circulatory system; sent10 & sent19 -> int7: the circulatory system brings oxygen from the lungs to the rest of the body; int6 & int7 -> int8: the respiratory system and the circulatory system both work to bring oxygen from the air to the rest of the body; sent12 & sent2 -> int9: if two things both bring oxygen to the body then those two things are similar; int8 & int9 -> hypothesis;
$proof$ = sent19 & sent9 -> int1: digestion of food occurs in the small intestine; int1 & sent18 -> int2: digestion of proteins occurs in the small intestine; int2 & sent6 -> int3: digestion of nutrients occurs in the small intestine; sent5 & sent8 -> int4: digestion is when an organism absorbs nutrients from food into itself; int3 & int4 -> hypothesis;
$proof$ = sent3 & sent9 -> hypothesis;
$proof$ = sent4 & sent9 -> int1: waste must be eliminated from the parts of the body before it can be eliminated from the entirety of the body; int1 & sent24 -> int2: waste must be eliminated from the blood before it can be eliminated from the body; int2 & sent11 -> int3: removing waste from blood is a step in removing waste from the body; int3 & sent14 -> int4: the kidneys perform a step in removing waste from the body; int4 & sent19 -> hypothesis;
$proof$ = sent22 & sent9 -> int1: the purpose of the vomiting reflex is to remove toxic material from the stomach before it is absorbed; int1 & sent12 -> hypothesis;
$proof$ = sent16 & sent8 -> int1: rheumatoid arthritis is a kind of autoimmune disease; int1 & sent10 & sent18 -> int2: rheumatoid arthritis is caused by a disordered immune system; int2 -> hypothesis;
$proof$ = sent17 & sent20 -> int1: wings and feathers are used to fly by birds; int1 & sent12 -> int2: wing and feathers have a positive impact on a bird's ability to fly; sent14 & sent23 -> int3: hollow bones have a positive impact on an animal's ability to fly; int2 & int3 -> int4: wings and feathers and hollow bones have a positive impact on a bird's ability to fly; sent18 & sent8 -> int5: some birds use flight for survival by avoiding predators; sent11 & sent3 -> int6: some birds use flight for survival by finding food; int5 & int6 -> int7: many birds use flight for survival; int4 & int7 -> int8: birds use wings and feathers and hollow bones for survival by positivly impacting their ability to fly; int8 & sent4 -> hypothesis;
$proof$ = sent17 & sent9 -> int1: muscular movement moves a part of animals to help them move; int1 & sent21 -> int2: muscular movement moves bones to help animals move; int2 & sent12 -> int3: muscles pull bones to move the bones; int3 & sent7 -> hypothesis;
$proof$ = sent12 & sent4 -> int1: nerve cells conduct messages in the body in the form of electrical signals; int1 & sent16 -> int2: nerve cells carry messages in the body in the form of electrical signals; int2 & sent6 -> int3: nerves carry messages in the body; sent14 & sent20 -> int4: sensory organs and the brain are parts of a body; int3 & int4 -> int5: nerves carry messages from sensory organs to the brain; sent10 & sent2 -> int6: some sensory organs are used for seeing; int6 & sent3 -> int7: eyes are a kind of sensory organ; int5 & int7 -> hypothesis;
$proof$ = sent22 & sent8 -> int1: bones can be used to provide support; int1 & sent9 -> int2: the skeletal system is used to provide support for animals; sent13 & sent3 -> int3: bones can be used for protection; int3 & sent9 -> int4: the skeletal system can be used for protection; int2 & int4 -> int5: the skeletal system can be used for protection and support; int5 & sent23 -> hypothesis;
$proof$ = sent1 & sent8 -> int1: fish is a kind of scaled animal with scales covering around the body; int1 & sent12 -> hypothesis;
$proof$ = sent12 & sent2 -> hypothesis;
$proof$ = sent13 & sent17 -> int1: cell division causes the baby elephant to grow into adult elephant; int1 & sent8 -> hypothesis;
$proof$ = sent10 & sent12 -> hypothesis;
$proof$ = sent10 & sent4 & sent5 -> hypothesis;
$proof$ = sent10 & sent15 & sent18 & sent21 & sent9 -> int1: the heart, blood vessels, kidneys, and bladder working together is an example of different parts of body working together; int1 & sent23 -> hypothesis;
$proof$ = sent5 & sent6 -> int1: chloroplast can be found in a plant cell; int1 & sent2 -> hypothesis;
$proof$ = sent1 & sent22 -> int1: bark is used to protect the tree; sent1 & sent14 -> int2: a cell wall is used to protect a plant cell; int1 & int2 -> hypothesis;
$proof$ = sent16 & sent3 -> int1: the scientist is comparing two somatic cells of a multicellular organism; int1 & sent13 -> hypothesis;
$proof$ = sent16 & sent24 -> hypothesis;
$proof$ = sent18 & sent20 -> int1: a potato is made of plant cells; int1 & sent22 -> int2: a potato is made of potato cells; sent19 & sent5 -> int3: vacuoles are the organelles that are used for storing water and food for cells; int2 & int3 -> int4: vacuoles are the organelles that store water and food for potato cells; int4 & sent14 -> hypothesis;
$proof$ = sent1 & sent5 -> hypothesis;
$proof$ = sent2 & sent9 -> int1: photosynthesis makes food for the plant by converting sunlight into carbohydrates; int1 & sent11 -> hypothesis;
$proof$ = sent24 & sent6 -> int1: photosynthesis means plants convert carbon dioxide and water and sunlight into carbohydrates and food and oxygen; sent25 & sent3 -> int2: if a plant uses a process to convert something into something else then that something is used for that process; int2 & sent23 -> int3: if a plant uses photosynthesis to convert something into something else then that something is used for photosynthesis; int1 & int3 -> int4: a plant uses sunlight for photosynthesis; int4 & sent1 -> int5: a plant uses the raw material sunlight for photosynthesis; int5 & sent15 -> int6: a plant absorbs sunlight to perform photosynthesis; sent20 & sent5 -> int7: a leaf is a part of a plant that absorbs sunlight to perform photosynthesis; int7 & sent10 -> int8: chlorophyll is found in the cells of parts of plants that absorb sunlight to perform photosynthesis; int6 & int8 -> int9: chlorophyll is used for absorbing sunlight by plants; sent14 & sent4 -> int10: sunlight is a kind of light energy; int10 & int9 -> hypothesis;
$proof$ = sent1 & sent2 -> int1: a seed / young plant sprouting causes that seed / young plant to grow; sent18 & sent7 -> int2: a plant may grow if it soil is provided; int2 & sent14 -> int3: a plant may grow if buried in soil; int1 & int3 -> int4: seeds may sprout when buried in soil; int4 & sent12 -> hypothesis;
$proof$ = sent21 & sent5 -> int1: food is a source of energy for animals / plants; int1 & sent4 -> int2: energy comes from food; int2 & sent6 -> hypothesis;
$proof$ = sent14 & sent18 -> int1: nutrients have a positive impact on an living thing's health; int1 & sent5 -> int2: getting nutrients from eating has a positive impact on a living thing's health; int2 & sent22 -> int3: getting minerals from eating has a positive impact on a living thing's health; sent1 & sent24 -> int4: eating leafy vegetables has a positive impact on human health; int3 & int4 -> hypothesis;
$proof$ = sent12 & sent2 -> int1: the swelling of bodily tissues can result from infection; int1 & sent25 -> int2: the swelling of bodily tissues is a kind of condition that can result from infection; int2 & sent17 -> hypothesis;
$proof$ = sent16 & sent5 -> int1: exercise has a positive impact on a human 's health; int1 & sent7 -> hypothesis;
$proof$ = sent1 & sent14 -> int1: a tundra is a kind of area; int1 & sent21 -> hypothesis;
$proof$ = sent10 & sent4 -> int1: humans eat parts of plants; int1 & sent15 -> int2: humans eat organisms; int2 & sent16 & sent20 -> hypothesis;
$proof$ = sent11 & sent7 -> int1: eagles eat animals; int1 & sent17 -> int2: eagles only eat animals; int2 & sent4 -> hypothesis;
$proof$ = sent10 & sent2 -> int1: as the amount of organic matter in soil increases, the soil will become more fertile; sent20 & sent9 -> int2: bacteria break down dead organisms; sent15 & sent19 -> int3: a decomposer returns parts of dead organisms to the soil; int2 & int3 -> int4: bacteria break down dead organisms and returns parts of dead organisms to the soil; int4 & sent24 -> int5: bacteria breaking down dead organisms turns those dead organisms into the things they are made of and returns it to the soil; int5 & sent4 -> int6: bacteria breaking down dead organisms turns those dead organisms into organic matter and returns it to the soil; sent16 & sent6 -> int7: when something is returned to a place the amount of that something in that place increases; int6 & int7 -> int8: bacteria increases the amount of organic matter in the soil by breaking down dead organisms; int1 & int8 -> int9: bacteria causes soil to more become fertile by breaking down dead organisms; sent21 & sent5 -> int10: plants and animals are kinds of organisms; int10 & int9 -> hypothesis;
$proof$ = sent4 & sent6 -> int1: fertile soil has a high amount of nutrients / nitrogen; int1 & sent14 -> int2: fertile soil is made of high amounts of nutrients / nitrogen; int2 & sent12 -> int3: high amounts of nutrients / nitrogen are required for the formation of fertile soil; int3 & sent8 -> int4: high amounts of nutrients / nitrogen are important for the formation of fertile soil; sent11 & sent16 -> int5: if something is returned to a place then the amount of that something in that place increases; int5 & sent3 -> int6: decomposition of dead organisms increases the amount of nutrients / nitrogen in soil; int4 & int6 -> int7: decomposition of dead organisms is important for the formation of fertile soil; int7 & sent24 -> hypothesis;
$proof$ = sent25 & sent3 -> int1: humans changing a meadow causes animal populations to decrease in that meadow; sent6 & sent7 -> int2: a rabbit is a kind of animal that may live in a meadow; int1 & int2 -> int3: humans changing a meadow causes rabbit populations to decrease in that meadow; sent19 & sent21 -> int4: humans / farmers plant fruit trees; int4 & sent17 -> int5: humans / farmers plant fruit tree crops; int5 & sent12 -> int6: an example of farming is humans / farmers planting fruit trees; int6 & sent1 -> int7: humans / farmers planting fruit trees decreases animal habitats; int7 & sent2 -> int8: humans / farmers planting fruit trees changes animal habitats; int8 & sent23 -> int9: humans / farmers planting fruit trees changes an animals' environemnt; int3 & int9 -> hypothesis;
$proof$ = sent23 & sent25 -> int1: a robin and a cricket are kinds of animals; int1 & sent2 & sent3 -> int2: a robin is a predator to the cricket; sent23 & sent25 -> int3: a robin and a cricket are kinds of animals; int3 & sent3 & sent9 -> int4: the cricket is a prey to the robin; int2 & int4 -> hypothesis;
$proof$ = sent14 & sent15 -> int1: earthworms creating tunnels in soil can loosen the soil; int1 & sent10 -> hypothesis;
$proof$ = sent13 & sent6 -> int1: sunlight is a nonliving thing; int1 & sent12 & sent18 -> hypothesis;
$proof$ = sent12 & sent19 & sent9 -> hypothesis;
$proof$ = sent17 & sent19 -> hypothesis;
$proof$ = sent11 & sent8 -> hypothesis;
$proof$ = sent18 & sent9 -> int1: a dog laying down on command is an example of dog following order; int1 & sent8 -> hypothesis;
$proof$ = sent1 & sent12 -> int1: hunting is a kind of learned behaviors; sent10 & sent23 -> int2: instinctive behaviors are opposite to learned characteristics; int1 & int2 -> hypothesis;
$proof$ = sent1 & sent21 -> int1: food preference is a kind of learned characteristics; sent10 & sent20 -> int2: learned characteristics are not inherited from parents; int1 & int2 -> hypothesis;
$proof$ = sent22 & sent5 -> hypothesis;
$proof$ = sent2 & sent24 -> int1: offspring will have similar dna to their parents; sent19 & sent22 -> int2: if two organisms share similar dna, then those two organisms will resemble each other; int1 & int2 & sent4 -> hypothesis;
$proof$ = sent15 & sent3 & sent7 -> int1: if thymine is found in a strand of nucleic acid, then the nucleic acid is dna; int1 & sent5 -> hypothesis;
$proof$ = sent12 & sent19 & sent25 & sent3 -> int1: zebras are on lions' diet; int1 & sent14 & sent3 -> hypothesis;
$proof$ = sent14 & sent20 -> hypothesis;
$proof$ = sent16 & sent7 -> int1: combining baking soda and vinegar is combining two substance together; int1 & sent2 & sent24 -> hypothesis;
$proof$ = sent2 & sent21 -> hypothesis;
$proof$ = sent24 & sent4 -> int1: an acid is corrosive and can change the color of litmus paper from blue to red; sent15 & sent3 -> int2: lemon juice is corrosive and can change the color of litmus paper from blue to red; int1 & int2 -> int3: lemon juice is a kind of acid; int3 & sent25 -> hypothesis;
$proof$ = sent17 & sent18 -> int1: the core of an atom is made of protons and neutrons; sent18 & sent8 -> int2: electrons surround the core of an atom; int1 & int2 -> hypothesis;
$proof$ = sent22 & sent8 -> hypothesis;
$proof$ = sent11 & sent14 & sent21 -> int1: only protons and neutrons add mass to the atom; int1 & sent16 -> hypothesis;
$proof$ = sent16 & sent22 -> int1: protons and neutrons together will be positive in charge; int1 & sent9 -> int2: the nucleus of an atom is positive in charge; int2 & sent15 -> int3: an atom has a positve charged core; sent13 & sent15 -> int4: the core of an atom is surrounded by almost empty space; int3 & int4 -> hypothesis;
$proof$ = sent10 & sent14 -> int1: an electron is a kind of negtively charged particle; int1 & sent12 -> int2: electron is the smallest, negtively-charged particle in an atom; int2 & sent2 -> hypothesis;
$proof$ = sent3 & sent4 -> hypothesis;
$proof$ = sent17 & sent2 -> hypothesis;
$proof$ = sent13 & sent15 -> int1: a piece of wood contains chemical energy; int1 & sent16 -> hypothesis;
$proof$ = sent22 & sent25 -> hypothesis;
$proof$ = sent12 & sent2 -> int1: aluminum is made of aluminum atoms; int1 & sent11 & sent2 -> hypothesis;
$proof$ = sent13 & sent16 -> hypothesis;
$proof$ = sent1 & sent22 -> int1: the temperature of 120 c is above 100 c; int1 & sent3 -> hypothesis;
$proof$ = sent15 & sent7 -> int1: the lit candle left out in the open is exposed to unlimited oxygen; int1 & sent5 & sent9 -> int2: the candle that is left out in the open will keep burning; sent19 & sent8 -> int3: the lit candle covered in a large jar is exposed to a limited amount of oxygen; int3 & sent5 & sent9 -> int4: the candle that is covered in a large jar will burn for a limited amount of time; sent10 & sent19 -> int5: the lit candle covered in a small jar is exposed to a limited amount of oxygen; int5 & sent5 & sent9 -> int6: the candle that is covered in a small jar will burn for a limited amount of time; sent11 & sent24 -> int7: a vaccum has no oxygen in it; int7 & sent21 -> int8: a fire cannot burn in a vaccum; int8 & sent16 & sent9 -> int9: the fourth candle will stop burning when it's placed in the vaccum; int2 & int4 & int6 & int9 -> hypothesis;
$proof$ = sent16 & sent20 -> int1: a glass thermometer is made of alcohol liquid in a tube; int1 & sent14 -> int2: if the alcohol liquid in the tube expands, then the height of the alcohol liquid will increase; int2 & sent5 -> hypothesis;
$proof$ = sent22 & sent4 -> int1: a balance can be used for measure the mass of sand; int1 & sent15 -> hypothesis;
$proof$ = sent10 & sent17 -> int1: a graduated cylinder can be used to measure the volume of liquid water; sent1 & sent13 & sent24 -> int2: freezing means changing water from liquid water to solid water ice by decreasing heat energy; int2 & sent9 -> int3: a graduated cylinder can be used to measure the volume of solid water ice; int1 & int3 -> hypothesis;
$proof$ = sent13 & sent24 -> hypothesis;
$proof$ = sent17 & sent19 -> int1: magnifying glass can be used to see an insect by making it appear bigger; int1 & sent11 -> hypothesis;
$proof$ = sent2 & sent6 -> int1: magnifying glass can be used to see a leaf by making it appear bigger; int1 & sent11 -> hypothesis;
$proof$ = sent13 & sent6 -> int1: the height of a door is a measure of length from the top of the door to the bottom of the door; int1 & sent7 -> hypothesis;
$proof$ = sent6 & sent8 -> int1: centimeter is a metric unit used to measure the length; int1 & sent25 -> hypothesis;
$proof$ = sent16 & sent2 -> int1: yard and meter are both unit used for measuring length; int1 & sent1 -> hypothesis;
$proof$ = sent16 & sent8 -> int1: rice cereal and marshmallows are two substances; int1 & sent20 -> int2: a student combined two substances together; int2 & sent18 -> hypothesis;
$proof$ = sent11 & sent17 -> int1: salt and water are two substances; int1 & sent15 -> int2: salt dissolving in water is an example of one substance being dissolved in another substance; int2 & sent14 -> int3: salt dissolving in water will form a salt water solution; int3 & sent4 -> int4: salt dissolving in water is an example of two substances being combined physically; int4 & sent22 -> hypothesis;
$proof$ = sent12 & sent7 -> int1: copper and zinc are two metals; int1 & sent2 -> int2: brass is made of two metals; int2 & sent4 -> hypothesis;
$proof$ = sent13 & sent17 -> int1: a solution is a mixture; sent11 & sent19 & sent7 -> int2: sugar dissolving in water is an example of one substance dissolving in another substance; int2 & sent17 -> int3: sugar water is a kind of solution; int1 & int3 -> hypothesis;
$proof$ = sent24 & sent3 -> int1: different solids will have the same physical properties; int1 & sent21 -> hypothesis;
$proof$ = sent4 & sent5 -> hypothesis;
$proof$ = sent3 & sent9 -> int1: the intensive properties of two samples are the same; int1 & sent21 -> hypothesis;
$proof$ = sent24 & sent4 -> int1: copper is an electrical conductor; sent18 & sent24 -> int2: copper is a kind of material; int1 & int2 -> hypothesis;
$proof$ = sent18 & sent25 -> int1: sugar dissolves in water when they are combined; int1 & sent12 -> hypothesis;
$proof$ = sent20 & sent24 -> int1: shape is a property of a leaf; int1 & sent11 -> int2: students are grouping leaves by their properties; int2 & sent13 -> hypothesis;
$proof$ = sent1 & sent15 -> hypothesis;
$proof$ = sent3 & sent7 -> int1: the ice undergoes a phase change; int1 & sent17 -> hypothesis;
$proof$ = sent17 & sent19 -> hypothesis;
$proof$ = sent14 & sent5 -> int1: aluminum is usually solid at room temperature; int1 & sent20 & sent3 -> hypothesis;
$proof$ = sent3 & sent5 -> int1: plant is a source of paper; int1 & sent12 & sent25 -> hypothesis;
$proof$ = sent5 & sent6 -> int1: a relationship exists between simple machines and energy; int1 & sent23 -> int2: a physicist studies energy and its relationship to simple machines; int2 & sent12 -> hypothesis;
$proof$ = sent1 & sent18 -> int1: an airplane moves people fast; int1 & sent20 -> hypothesis;
$proof$ = sent14 & sent8 -> hypothesis;

However, the scores are not 100% on some metrics:

=================
Percentage recall per gold proof depth
Gold_proof_depth	#Gold answers	#Correct predictions	%accuracy (recall)	%Gold answers	%Correct Predictions
=========================
num_dev_answers:187
num_dev_answers_seen_in_train_context:0
num_dev_answers_seen_in_train_answers:0
INFO:__main__:    Aggregated metrics:
INFO:__main__:       QAHC->P: {'counter': 187, 'proof-leaves': {'acc': 1.0, 'F1': 1.0, 'P': 1.0, 'R': 1.0}, 'proof-steps': {'acc': 0.983957219251337, 'F1': 0.9945505474917239, 'P': 0.9945505474917239, 'R': 0.9945505474917239}, 'proof-intermediates': {'ROUGE_L_F': 0.998698752228164, 'ROUGE_L_F_perfect_align': 0.998698752228164, 'BLEURT': 1.0598104268476716, 'BLEURT_P': 0.9963840081487141, 'BLEURT_R': 0.9963840081487141, 'BLEURT_F1': 0.9963840081487141, 'BLEURT_perfect_align': 1.0598104268476716, 'BLEURT_acc': 0.9946524064171123, 'BLEURT_acc_perfect_align': 0.9946524064171123, 'fraction_perfect_align': 1.0}, 'proof-overall': {'acc': 0.983957219251337, 'acc_perfect_align': 0.983957219251337}}

======================
collated:{'QAHC->P': {'counter': 187, 'proof-leaves': {'acc': 1.0, 'F1': 1.0, 'P': 1.0, 'R': 1.0}, 'proof-steps': {'acc': 0.983957219251337, 'F1': 0.9945505474917239, 'P': 0.9945505474917239, 'R': 0.9945505474917239}, 'proof-intermediates': {'ROUGE_L_F': 0.998698752228164, 'ROUGE_L_F_perfect_align': 0.998698752228164, 'BLEURT': 1.0598104268476716, 'BLEURT_P': 0.9963840081487141, 'BLEURT_R': 0.9963840081487141, 'BLEURT_F1': 0.9963840081487141, 'BLEURT_perfect_align': 1.0598104268476716, 'BLEURT_acc': 0.9946524064171123, 'BLEURT_acc_perfect_align': 0.9946524064171123, 'fraction_perfect_align': 1.0}, 'proof-overall': {'acc': 0.983957219251337, 'acc_perfect_align': 0.983957219251337}}}
leave-F1	leaves-Acc	steps-F1	steps-Acc	int-BLEURT-F1	int-BLEURT-Acc	overall-Acc
100.0	100.0	99.46	98.4	99.64	99.47	98.4
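Assuming each acc value is the fraction of the 187 dev proofs that were scored as fully correct (an assumption about how the scorer aggregates, not something checked against its code), the fractions can be turned back into whole proofs. A quick sketch:

```python
NUM_DEV_PROOFS = 187  # the "counter" value reported by the scorer above

# Per-proof accuracy fractions copied from the aggregated metrics above.
ACCURACIES = {
    "proof-steps acc": 0.983957219251337,
    "proof-intermediates BLEURT_acc": 0.9946524064171123,
    "proof-overall acc": 0.983957219251337,
}


def correct_count(fraction: float, total: int = NUM_DEV_PROOFS) -> int:
    """Recover how many proofs were counted as correct from an accuracy fraction."""
    return round(fraction * total)


for name, fraction in ACCURACIES.items():
    correct = correct_count(fraction)
    print(f"{name}: {correct}/{NUM_DEV_PROOFS} correct, {NUM_DEV_PROOFS - correct} missed")
```

Under that reading, 3 proofs fail the step and overall accuracy checks, and 1 proof fails the intermediate BLEURT accuracy check.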

I haven't looked into the code in detail yet, but I guess the issue is related either to the alignment process or to how BLEURT similarity is calculated: either some trees don't align perfectly with themselves, or some sentences have a BLEURT similarity below 0.28 with themselves. If you have any thoughts, I would be very interested to hear them. Thank you!

Hi,

I found a potential bug that may explain the issue. Some proofs are not strictly trees. For example, the proof below contains both sent11 & sent9 -> int3 and sent11 & sent9 -> int6.

sent17 & sent7 -> int1: a burner of a stove generates heat for cooking usually; int1 & sent13 -> int2: a burner of a stove will be hot in temperature; sent11 & sent9 -> int3: a pan and a burner are kinds of objects; int2 & int3 & sent21 & sent25 -> int4: a frying pan on the burner of the stove is an example of cooler object touching hot object; int4 & sent4 -> int5: thermal conduction will occur between the frying pan and the burner; sent11 & sent9 -> int6: a pan and a burner are kinds of objects; int5 & int6 & sent23 -> hypothesis
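Such repeated steps can be found by grouping the steps of a proof by their premise set. The sketch below assumes the step syntax shown above (premises joined by &, then ->, then the conclusion id with an optional ": text", steps separated by ;); parse_steps and repeated_steps are hypothetical helpers, not part of the entailment_bank code.

```python
from collections import defaultdict

# The proof quoted above (used as both prediction and ground truth).
PROOF = (
    "sent17 & sent7 -> int1: a burner of a stove generates heat for cooking usually; "
    "int1 & sent13 -> int2: a burner of a stove will be hot in temperature; "
    "sent11 & sent9 -> int3: a pan and a burner are kinds of objects; "
    "int2 & int3 & sent21 & sent25 -> int4: a frying pan on the burner of the stove "
    "is an example of cooler object touching hot object; "
    "int4 & sent4 -> int5: thermal conduction will occur between the frying pan and the burner; "
    "sent11 & sent9 -> int6: a pan and a burner are kinds of objects; "
    "int5 & int6 & sent23 -> hypothesis"
)


def parse_steps(proof: str) -> list[tuple[frozenset[str], str]]:
    """Split a proof string into (premise set, conclusion id) pairs, in order."""
    if proof.startswith("$proof$"):
        proof = proof.split("=", 1)[1]  # drop the "$proof$ =" prefix used in the .tsv
    steps = []
    for raw in proof.split(";"):
        raw = raw.strip()
        if not raw:
            continue  # a trailing ";" leaves an empty chunk
        lhs, rhs = raw.split("->", 1)
        premises = frozenset(p.strip() for p in lhs.split("&"))
        conclusion = rhs.split(":", 1)[0].strip()  # "int3: text" -> "int3"
        steps.append((premises, conclusion))
    return steps


def repeated_steps(proof: str) -> dict[tuple[str, ...], list[str]]:
    """Return premise sets that are used to derive more than one conclusion."""
    conclusions_by_premises = defaultdict(list)
    for premises, conclusion in parse_steps(proof):
        conclusions_by_premises[premises].append(conclusion)
    return {
        tuple(sorted(premises)): conclusions
        for premises, conclusions in conclusions_by_premises.items()
        if len(conclusions) > 1
    }


print(repeated_steps(PROOF))  # {('sent11', 'sent9'): ['int3', 'int6']}
```

Applied to the validation list above, the same check would also flag the robin/cricket proof, which derives both int1 and int3 from sent23 & sent25.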

The align_conclusions_across_proofs function does not handle such proofs well. Given the proof above as both the prediction and the ground truth, it produces a wrong mapping between intermediate sentences: {'int1': 'int1', 'int2': 'int2', 'int3': 'int3', 'int4': 'int4', 'int5': 'int5', 'int6': 'int3', 'hypothesis': 'hypothesis'} (int6 is mapped to int3 instead of to itself).

Hmm, I just realized this is not necessarily an implementation bug but a conceptual limitation of how the alignment works. If we only aggregate the ancestors of each intermediate and compute the Jaccard similarity (Appendix B), int3 and int6 have exactly the same ancestors (sent11 and sent9), so they cannot be disambiguated in the case above.
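The tie can be reproduced with a simplified version of the idea in Appendix B. This is a sketch, not the repository's align_conclusions_across_proofs: it assumes each intermediate is represented by the set of leaf sentences (sentN) among its ancestors, and that each predicted intermediate is greedily mapped to the gold intermediate with the highest Jaccard similarity, with ties going to the first gold intermediate in proof order. Only the proof structure matters here, so the intermediate texts are omitted.

```python
# Structure of the proof quoted above; intermediate texts are not needed.
PROOF = (
    "sent17 & sent7 -> int1; int1 & sent13 -> int2; sent11 & sent9 -> int3; "
    "int2 & int3 & sent21 & sent25 -> int4; int4 & sent4 -> int5; "
    "sent11 & sent9 -> int6; int5 & int6 & sent23 -> hypothesis"
)


def leaf_ancestors(proof: str) -> dict[str, frozenset[str]]:
    """Map each conclusion id to the set of sentN leaves it is derived from."""
    leaves: dict[str, frozenset[str]] = {}
    for raw in proof.split(";"):
        raw = raw.strip()
        if not raw:
            continue
        lhs, rhs = raw.split("->", 1)
        conclusion = rhs.split(":", 1)[0].strip()
        acc: set[str] = set()
        for premise in (p.strip() for p in lhs.split("&")):
            # A sentN is its own leaf; an intN contributes the leaves it was built from.
            acc |= leaves.get(premise, frozenset({premise}))
        leaves[conclusion] = frozenset(acc)
    return leaves


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    return len(a & b) / len(a | b)


def align(pred: dict[str, frozenset[str]], gold: dict[str, frozenset[str]]) -> dict[str, str]:
    """Greedy argmax alignment; max() keeps the first gold id among equal scores."""
    return {p: max(gold, key=lambda g: jaccard(pred[p], gold[g])) for p in pred}


ancestors = leaf_ancestors(PROOF)
print(ancestors["int3"] == ancestors["int6"])  # True: both are {sent11, sent9}
print(align(ancestors, ancestors))
# {'int1': 'int1', 'int2': 'int2', 'int3': 'int3', 'int4': 'int4',
#  'int5': 'int5', 'int6': 'int3', 'hypothesis': 'hypothesis'}
```

Because int3 and int6 carry identical ancestor sets, any score computed only from those sets ties, whatever the tie-breaking rule. With int6 mapped to int3, the final predicted step int5 & int6 & sent23 -> hypothesis becomes int5 & int3 & sent23 -> hypothesis, which no longer matches the gold step; that is presumably how an identical proof loses step accuracy.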