jank-lang/clojure-test-suite

Test all the things! :tada:

Opened this issue · 42 comments

jeaye commented

General process

Anyone is welcome to join in and write tests. The process goes like this:

  1. Pick a function from one of the milestones:
    a. clojure.core: https://github.com/jank-lang/clojure-test-suite/milestone/1
  2. Leave a comment on the function's ticket to claim that function
  3. Add a new test file using bb (documented here)
  4. Work through the testing questions, implementing tests for each
  5. Add any additional tests for all the edge cases you can think of. Try to write tests that will challenge the runtime as much as possible. Put some thought into it.
  6. Keep your tests dialect-independent (i.e. wrap any Java interop in a reader conditional and provide CLJS equivalents, avoid using :default for reader conditionals)
  7. Make a PR to add your new tests!
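As a rough illustration of step 6, here is a minimal sketch of a dialect-independent test file. It assumes a `.cljc` file so that reader conditionals are allowed. The namespace `example.inc-test` and the helper `max-int` are hypothetical names chosen for this sketch, not names from the repo. Only `:clj` and `:cljs` branches are shown; other dialects would need their own branches, since `:default` is avoided.

```clojure
(ns example.inc-test
  (:require [clojure.test :refer [deftest is testing]]))

;; Hypothetical helper: a large integer that each dialect can represent exactly.
;; Host interop is wrapped in a reader conditional, with no :default branch.
(def max-int
  #?(:clj  Long/MAX_VALUE
     :cljs js/Number.MAX_SAFE_INTEGER))

(deftest test-inc
  (testing "common cases"
    (is (= 1 (inc 0)))
    (is (= max-int (inc (dec max-int)))))
  ;; JVM-only behaviour lives in its own branch instead of leaking into shared code.
  #?(:clj
     (testing "overflow throws on the JVM"
       (is (thrown? ArithmeticException (inc Long/MAX_VALUE))))))
```

Keeping the shared assertions outside the conditional and isolating host-specific ones makes it clear which behaviour every dialect is expected to match.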

Testing questions

Common cases

  • What happens when the input is nil? (apply to all inputs)
  • What happens if it's given all valid inputs? (this will require some manual work to identify edge cases)
  • Are there any special cases for inputs?
  • What happens when the transducer arity is called?
  • Is metadata preserved through the function?
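To make the common-case questions concrete, here is a small sketch using `take` and `conj` as the vars under test. The namespace name is hypothetical, and in the real suite each var would live in its own file. The transducer case needs `into` to run the transducer, which is one of the few places where a second var is hard to avoid.

```clojure
(ns example.common-cases-test
  (:require [clojure.test :refer [deftest is testing]]))

(deftest test-take-common-cases
  (testing "nil input is treated as an empty sequence"
    (is (= () (take 1 nil))))
  (testing "valid input"
    (is (= [1 2] (take 2 [1 2 3]))))
  (testing "transducer arity"
    (is (= [1 2] (into [] (take 2) [1 2 3])))))

(deftest test-conj-preserves-metadata
  ;; conj on a vector carries the original metadata over to the result.
  (is (= {:tag :example}
         (meta (conj (with-meta [1] {:tag :example}) 2)))))
```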

Edge cases

  • What happens when the input is an incorrect shape (e.g. a number instead of a sequence)? (apply to all inputs)
  • If the input accepts a sequence, what happens when it's an infinite sequence?
  • If the input accepts a map, what happens with both array maps and hash maps?
  • If the input accepts a set, what happens with both sorted sets and hash sets?
  • If the function accepts unboxed inputs/outputs, what happens with different combinations?
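The edge-case questions could be sketched like this for `first`. This is an illustration under assumptions, not a file from the repo: the namespace name is hypothetical, the file is assumed to be `.cljc`, and only `:clj` and `:cljs` exception types are listed, so other dialects would need their own branches.

```clojure
(ns example.edge-cases-test
  (:require [clojure.test :refer [deftest is testing]]))

(deftest test-first-edge-cases
  (testing "incorrect shape: a number instead of a sequence"
    (is (thrown? #?(:clj  IllegalArgumentException
                    :cljs js/Error)
                 (first 1))))
  (testing "infinite sequence: only the head is realized"
    (is (= 0 (first (range)))))
  (testing "array maps and hash maps"
    (is (= [:a 1] (first (array-map :a 1))))
    (is (= [:a 1] (first (hash-map :a 1)))))
  (testing "sorted sets and hash sets"
    (is (= 1 (first (sorted-set 3 1 2))))
    (is (= 1 (first (hash-set 1))))))
```

Single-entry maps and sets are used for the hash-based collections so that the assertions do not depend on iteration order.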

Things we don't need to test

  • Invalid arity (too many/too few args; the runtime checks this for us, not the fn itself)

Keep your tests small

Try to only use the var you're testing and the testing framework. If you can avoid using other vars in your test, try to do so. This will keep each test focused and make it easier for new Clojure dialects to test what they have without implementing all of clojure.core.

Generative and property-based testing

We're not looking to add these types of tests right now. We want to get good coverage of all core functions using intentional, manually written tests. This will make the test suite easier to run for new Clojure dialects which may not have a ton of functionality yet.

quoll commented

Claiming * and *'

quoll commented

Claimed add-tap, remove-tap and tap>

Note: Included some short calls to Thread/sleep since taps respond asynchronously and code can race.
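For readers unfamiliar with why taps can race: `tap>` only enqueues the value, and tap functions run later on another thread. The sketch below is not what the submitted tests do (they use `Thread/sleep`). It shows one possible JVM-only alternative that waits on a promise with a timeout instead of sleeping for a fixed time. The namespace name and the 1000 ms timeout are assumptions made for this sketch.

```clojure
(ns example.tap-test
  (:require [clojure.test :refer [deftest is]]))

#?(:clj
   (deftest test-tap-delivery
     (let [received (promise)
           ;; The tap fn delivers whatever it receives to the promise.
           tap-fn   (fn [x] (deliver received x))]
       (add-tap tap-fn)
       (try
         ;; tap> returns true when the value was accepted onto the queue.
         (is (true? (tap> :hello)))
         ;; Block until the tap fires, or give up after the timeout.
         (is (= :hello (deref received 1000 ::timed-out)))
         (finally
           ;; Always remove the tap so it cannot leak into other tests.
           (remove-tap tap-fn))))))
```

Waiting on a promise returns as soon as the tap fires, so the test does not depend on guessing a sleep duration, although a timeout that is too short can still fail on a slow machine.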

Claiming or: #8

Claiming some? and not: #11

rob-3 commented

Claiming even?, odd?, and nil?

rob-3 commented

Claiming inc and dec

rob-3 commented

Claiming = (this might take a bit to finish)

dgr commented

Claiming:

  • identical?
  • zero?
  • pos?
  • neg?
  • number?
  • ratio?
  • rational?
  • integer?
  • int?
  • pos-int?
  • nat-int?
  • decimal?
  • float?
  • double?
  • true?
  • false?
  • boolean

These are all fairly easy to do as a group since the tests are similar.

dgr commented

Claiming:

  • keyword
  • symbol
  • name
  • intern
  • namespace
  • keyword?
  • symbol?
  • ident?
  • simple-keyword?
  • simple-symbol?
  • simple-ident?
  • qualified-keyword?
  • qualified-symbol?
  • qualified-ident?

dgr commented

Claiming:

  • char
  • char?
  • format
  • pr-str
  • print-str
  • println-str
  • prn-str
  • str
  • string?
  • subs
  • with-out-str

dgr commented

Claiming:

  • byte
  • short
  • int
  • long
  • float
  • double
  • bigint
  • bigdec
  • num
  • rationalize

dgr commented

Claiming:

  • -
  • /
  • quot
  • rem
  • mod
  • inc
  • dec
  • max
  • min
  • with-precision
  • numerator
  • denominator
  • rand
  • rand-int

Claiming

  • compare

Claiming

Claiming fnil: #29

Claiming partial: #30

Claiming binding: #33

Claiming: bound-fn: #42

dgr commented

Claiming:

  • drop
  • drop-last
  • drop-while
  • take
  • take-last
  • take-while

dgr commented

Claiming:

  • first
  • second
  • rest
  • next
  • nth
  • nthrest
  • nthnext

I'll take zipmap.

dgr commented

Claiming:

  • count
  • get
  • butlast
  • sequential?
  • associative?
  • sorted?
  • counted?
  • reversible?
  • seqable?
  • coll?
  • seq?
  • vector?
  • list?
  • map?
  • set?

Claiming sort #67

Claiming:

  • interleave
  • interpose

Claiming shuffle

dgr commented

Claiming:

  • ==
  • <
  • >
  • <=
  • >=

dgr commented

@quoll, any chance you could look into the tap code and fix the intermittent failure there? I think you originally wrote that, right? I got the intermittent failure again last night when running tests locally.

cddr commented

Claiming reduce

I've disabled the taps tests for now, due to the intermittent failures.

The list in this ticket has been updated. We're currently sitting at 20% coverage of all Clojure vars. Let's get that to 80%! 🚀

claiming boolean?

Claiming empty #84

Claiming #86

  • ffirst
  • fnext
  • last
  • nfirst
  • nnext

Claiming #87

  • hash-map
  • hash-set
  • set

Claiming #89

  • empty?
  • get-in
  • find
  • contains?

Claiming #90

  • parse-boolean
  • parse-long
  • parse-double
  • parse-uuid

I claim update #92
NB: coll? is already added

Claiming realized?

claiming atom, constantly, fn?, ifn?

We need a bot for what's claimed lol

GitHub docs say markdown task lists are "retired" and that we should use sub tasks instead. So I tried creating sub tasks for all of these, but I had to do it one by one. That was annoying, but I was stubborn enough to do it. Until I hit cycle, which was the 100th sub task. Now any more sub tasks fail, since GitHub has a limit of 100 sub tasks per issue. 😑

So I'll use normal issues instead of sub tasks, but I don't think there's a way to convert the 100 sub tasks I have into normal issues. 🤔

Ex-post conditionally (on AI acceptance) claiming all missing clojure.core functions starting with 'v' or 'w'.

jeaye commented

Overview

Ex-post conditionally (on AI acceptance) claiming all missing clojure.core functions starting with 'v' or 'w'.

Thanks for the interest and for the PR: #770. I appreciate that you'd like to help fill out these tests, and I'm glad that you made clear that AI was used so that we can discuss it.

In short, I am not open to accepting AI-written tests at this point. The primary practical reason is over-testing, alongside the more controversial philosophical reasoning. Yes, it takes effort to come up with good, concise coverage of a function. Most things worth doing well take effort. This is worth doing well.

Good unit tests have minimal to no redundancy. Each test is specifically chosen to handle one case, which is clear from reading it. With AI-generated tests, this is rarely the case. To demonstrate my point, let's analyze the tests in your PR.

Numbers

Hypothetically, if we were to finish the remaining 455 tests using AI, let's take a look at the amount of code we'd be dealing with.

  • Based on your PR, containing 23 test files, the median file has 102 lines of code.
  • Based on the existing 180 test files currently in the repo, the median file has 30 lines of code.

I don't think that your PR contains three times the test coverage per file, but I would grant that an AI may come up with some cases which we might miss. The rest is likely to be over-testing. So, using those numbers, that means, for the remaining 455 untested functions, we'd end up with:

  • The AI way: 46,410 lines of new code (grand total of 54,808)
  • The manual way: 13,650 lines of new code (grand total of 22,048)

The manual way will take longer, and will involve more effort, but we'll end up with less than a third of the lines of new code. When we're talking about a difference of 33k lines of new code, this is serious business. I need to maintain this code, not you, and not the AI you used.
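The arithmetic behind these figures can be checked with a few lines of Clojure. The var names are made up for this sketch, and the existing line count of 8,398 is not stated in the comment; it is inferred here as the grand total minus the new lines (54,808 − 46,410 = 22,048 − 13,650).

```clojure
;; Figures quoted in the comment above.
(def remaining-fns 455)
(def ai-median-loc 102)
(def manual-median-loc 30)

;; Inferred, not stated: grand total minus new lines.
(def existing-loc 8398)

(def ai-new-loc (* remaining-fns ai-median-loc))
(def manual-new-loc (* remaining-fns manual-median-loc))

(println "AI new lines:      " ai-new-loc)
(println "Manual new lines:  " manual-new-loc)
(println "AI grand total:    " (+ existing-loc ai-new-loc))
(println "Manual grand total:" (+ existing-loc manual-new-loc))
(println "Difference:        " (- ai-new-loc manual-new-loc))

;; "Less than a third": three times the manual lines is still below the AI lines.
(println "Manual < 1/3 of AI?" (< (* 3 manual-new-loc) ai-new-loc))
```

The difference comes out to 32,760 lines, which is the "33k" quoted above after rounding.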

Dialects

Furthermore, comparing your submitted code to what we have in main, I suspect we'll need a great many more reader conditionals in order to properly handle ClojureScript, Clojure CLR, and babashka. Your submitted tests take CLJS into account, but they are likely missing nuances between the dialects, which are more likely to be found through experimentation. This will only increase the median line count per AI test file, which makes the whole thing even less appealing.