instrumentkit/InstrumentKit

Testing and coverage - Instruments

trappitsch opened this issue · 15 comments

With the recent preparations by @scasagrande in PR #243 to replace quantities with pint, tests and what they cover have moved to the forefront of my thinking. I have been running some coverage analysis of the whole instruments package and looked through some classes. Some conclusions:

  • The core of IK is generally well covered by tests, with a few exceptions. For example, the visa_communicator.py file does not seem to have any tests at all. Most parts are, however, fairly well covered, and some improvements could surely be made with more test cases.
  • Test coverage of instruments varies widely. There are four rough categories that I noticed when looking at instruments:
    • Error raising is frequently untested, even in instruments that are otherwise well covered. This could in some cases cause issues when parent class behavior changes. A good example for this category is glassmanfr.py.
    • Some instruments are well tested, but are missing tests for some sends and routines. If something changes upstream, I would expect other tests to fail as well. One example is agilent34410a.py.
    • Some instruments are poorly tested, e.g., newportesp301.py. While a test file exists, it seems to contain only a single test; most of the functionality is not covered.
    • Some instruments, e.g., all Lakeshore and some Keithley instruments, are completely untested.

How to improve the test coverage is tricky, but it can be done. The core is probably the most important part and should be checked with a coverage analysis.

Improving tests for instruments: Such tests can be written without the physical instrument being available, provided the instrument works as expected with the current implementation of IK. The latter, however, is important to verify before writing a test, in order not to end up with a test suite that passes for failing / non-working sections. Are there any instruments that are currently known to be "not working" for some reason? The source code often contains todo and fixme comments indicating that additional checking is needed and hinting at possible problems. Tests should be written with this in mind.
Surely, having the physical device available and testing the code while writing the tests would be the ideal scenario. However, that is not likely to be feasible for most.

I thought I'd put this up here for discussion. Maybe we can come up with a way to improve feature implementations and ongoing enhancements and changes (#244, #243, ...), as well as future ones.

Your comments line up pretty much with reality. The last time I made a huge test coverage push I was working my way through the coveralls.io reports.

And here's the history: A lot of the instrument drivers were written before we had test coverage requirements. Then I did what you already suggested; the tests were written assuming the drivers work 😆 and were not written to exhaustively test all the code paths. They should, but don't. Probably for my sanity at the time, when I was trying to add as much coverage as I could and was already spending dozens of hours just writing tests!

I've learned a lot about testing over the years, and of course if I could do things differently I would. For the core, I've been leaning on hypothesis more and more to avoid having to hardcode for edge cases constantly. I use it for a lot of other projects as well.
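
For illustration, here is a minimal sketch of that hypothesis style; the two temperature helpers are hypothetical stand-ins for any pure function in the IK core:

```python
# Property-based test sketch: hypothesis generates the inputs instead of a
# hand-picked list of edge cases. The helpers below are made up for this
# example and do not exist in InstrumentKit.
import pytest
from hypothesis import given, strategies as st


def celsius_to_kelvin(value):
    """Hypothetical helper: convert degrees Celsius to kelvin."""
    return value + 273.15


def kelvin_to_celsius(value):
    """Hypothetical helper: convert kelvin to degrees Celsius."""
    return value - 273.15


@given(st.floats(min_value=-273.15, max_value=1e4, allow_nan=False))
def test_temperature_roundtrip(celsius):
    # The property under test: converting to kelvin and back recovers the
    # original value (within floating-point tolerance) for any valid input.
    assert kelvin_to_celsius(celsius_to_kelvin(celsius)) == pytest.approx(celsius)
```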

One thing I'd ideally like is a tool that can run every (relevant) property of a driver against actual hardware, record the IO, and save all the relevant info in a blob of some form. Then, unit tests could pick up those files and use them to populate a test suite covering all the generic positive cases, like we do right now. Then we'd only have to write tests by hand for edge and negative cases.
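
Purely as an illustration of what such a blob could look like (none of these names or the file format exist in IK; it is just one possible shape for the recorded data):

```python
# Hypothetical layout for a recorded-IO blob: one JSON file per driver, keyed
# by property/method name, holding the raw traffic in both directions so a
# replay tool can feed it back into the unit tests later.
import json

recorded_io = {
    "driver": "ExampleMultimeter",  # made-up driver name
    "properties": {
        "voltage_dc": {
            "host_to_instrument": ["MEAS:VOLT:DC?"],
            "instrument_to_host": ["+4.99912E+00"],
        },
    },
}

with open("example_multimeter_io.json", "w") as handle:
    json.dump(recorded_io, handle, indent=2)
```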

Thanks for the history! Was it in fact helpful for future development to go through individual instruments, assume they work, and write tests without the hardware? If so, it might be worth putting together a simple todo list and checking instruments off one by one. For my part, I'd be happy to contribute tests if it's useful for the project. I'm by far less experienced with this, so it would also be a learning opportunity. The question remains (assuming the answer to the usefulness question is yes): what would the order from most helpful to least helpful be?

For the core: Using hypothesis sounds pretty interesting. For me, the IK core is still something I discover new things in all the time, so it's more difficult for me to help optimize its testing.

Finally, your idea of a tool that runs cases against hardware and automatically builds a test suite for all the generic cases sounds really interesting. Do you have an idea of how the initial cases would be generated? One way might be, perhaps a bit too naively, to use the docstring examples directly. For example, xdoctest runs the examples in your docstrings against your package to ensure that they actually work. A similar approach that also captures the messages sent to and received from the instrument could create the test suite, and at the same time make sure that every contributor writes examples 😁
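
To illustrate the docstring idea, a made-up driver with an executable example; xdoctest (or the stdlib doctest module) could run the `>>>` lines:

```python
# Hypothetical driver with a doctest-style example in its docstring. Nothing
# here exists in IK; it only shows the kind of example that xdoctest or
# doctest could execute automatically.
class ExampleFunctionGenerator:
    """Made-up driver used only to demonstrate an executable docstring example.

    Example:
        >>> fg = ExampleFunctionGenerator()
        >>> fg.frequency = 1000.0
        >>> fg.frequency
        1000.0
    """

    def __init__(self):
        self.frequency = 0.0


if __name__ == "__main__":
    import doctest

    doctest.testmod()  # executes the example in the docstring above
```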

Was it in fact helpful for future development to go through individual instruments, assume they work, and write tests without the hardware?

Oh yes, it's caught lots of problems. The biggest ones were the current effort to move to pint (!) and the addition of prompt and ACK support. Because you know, some instruments need to pretend they're an actual terminal instead of just supporting a programmatic interface.

I'm not 100% sure a doctest would work for this, just because you would need to store in the docstring the exact data that the instrument sends to the computer, even if it's not in its final human-readable form. I think doctests are great and might be helpful for IK to have, just not a solution to this particular problem.

My initial thoughts on how to do this involve a new metaclass attached to Instrument to allow the classes to be registered at import time. Then, with a special decorator attached to every property & function that we want to "auto test", the tool can scan through all the classes that have registered, discover these calls, and generate stubs for each one to be saved in a file.
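
A very rough sketch of that registration idea; every name here is invented for illustration and nothing like it exists in InstrumentKit yet:

```python
# Sketch of the metaclass + decorator registration idea described above.
AUTOTEST_REGISTRY = {}


def autotest(func):
    """Mark a method (or a property getter, before @property wraps it) for stub generation."""
    func._autotest = True
    return func


class AutoTestMeta(type):
    """Record every class and its marked members at import time."""

    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        marked = []
        for attr, value in namespace.items():
            # Properties carry the marker on their getter function.
            target = value.fget if isinstance(value, property) else value
            if getattr(target, "_autotest", False):
                marked.append(attr)
        if marked:
            AUTOTEST_REGISTRY[cls] = marked
        return cls


class Instrument(metaclass=AutoTestMeta):
    """Stand-in for the real Instrument base class."""


class ExampleMultimeter(Instrument):
    @property
    @autotest
    def voltage(self):
        # A real driver would query the hardware here.
        return 0.0

    @autotest
    def reset(self):
        # A real driver would send a reset command here.
        pass


# A stub-generation tool could now iterate the registry and write one stub
# entry per (class, member) pair:
for cls, members in AUTOTEST_REGISTRY.items():
    print(cls.__name__, members)
```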

Then another tool (or tools) can be used to consume those files and either populate them with content (by running against actual hardware) or run the tests using the captured data. We already support loading drivers based on entries in a yaml file, so that's at least one part checked off.

Could pyvisa-sim be useful here for simulated drivers?
Also, there's https://github.com/kiwicom/pytest-recording for recording and playing back network traffic - maybe there is something similar available for generic I/O, or the former can be adapted to your needs?

RE: pytest-recording, that could be useful for more complicated instruments. We actually already have a spot where you can (optionally, with some missing documentation) send all the IO at the boundary of IK to the logger (https://github.com/Galvant/InstrumentKit/blob/master/instruments/abstract_instruments/comm/abstract_comm.py#L217). I'm sure this could be adapted to capture the IO. Then, with that, we can parse it to match what we already do for our unit tests with expected_protocol.
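
For reference, this is roughly how captured traffic maps onto an expected_protocol test; the import path reflects the package layout at the time of writing and may move, and the example assumes that SCPIInstrument.name simply queries *IDN? and returns the reply:

```python
# Sketch of replaying recorded traffic through the existing expected_protocol
# helper: the first list is what the host is expected to send, the second is
# the (recorded) instrument reply that gets played back.
import instruments as ik
from instruments.tests import expected_protocol


def test_name_from_recorded_io():
    with expected_protocol(
        ik.generic_scpi.SCPIInstrument,
        ["*IDN?"],
        ["Example Corp,Model 1,12345,1.0"],
    ) as inst:
        assert inst.name == "Example Corp,Model 1,12345,1.0"
```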

Oh yes, it's caught lots of problems.

That's good to hear and should probably justify another push to increase coverage and get more testing in. I ran a coverage analysis over all instruments and put together the following task list. It looks pretty intense; however, a lot of instruments have only very minor issues that would not influence something like the pint integration. Even the minor-issue instruments should mostly be okay. There remain seven instruments without tests altogether and eight instruments with major issues.

Let me know what you think of the list; I'd be happy to spend some time on integrating more instrument tests under the assumption that it all works. These would be required either way, even if a recording system or similar were in place, since that would likely require the hardware to be present, correct?

Here's the list:

Instruments without tests

  • Keithley195 - PR #262
  • Keithley580 - PR #263
  • Lakeshore340 - PR #249
  • Lakeshore370 - PR #249
  • Lakeshore475 - PR #243, #249
  • SRS CTC100 - PR #257
  • Tektronix AWG 2000 - PR #252
  • Tektronix DPO 4104 - PR #253
  • Tektronix DPO 70000 - PR #256
  • Tektronix DS 5xx - PR #261
  • Thorlabs PM100 USB - PR #260
  • RigolDS1000 - PR #250
  • Yokogawa7651 - PR #251

Utilities files without full testing

  • Newport errors.py - PR #270
  • Thorlabs _abstract.py - PR #274
  • Thorlabs _packets.py - PR #274

Instruments with major test issues
Several major routines untested

  • NewportESP301 - PR #280
  • Thorlabs APT - PR #265

Instruments with minor test issues
Includes all issues in the subcategories below, plus some untested communication.

  • Agilent34410a - PR #264
  • Fluke3000 - PR #269, #292
  • Newport Agilis - PR #268
  • Qubitek MC1 - PR #243
  • Tektronix DS 224 - PR #266
  • Yokogawa 6370 - PR #251

Instruments with very minor test issues
Instruments with untested error raising, untested user input checks, or other very minor issues.

  • Agilent33220a - PR #281
  • GlassmanFR - PR #282
  • HP3456a - PR #284
  • HP6624a - PR #284
  • HP6632b - PR #284
  • HP6652a - PR #284
  • HPe3631a - PR #284
  • Keithley485 - PR #285
  • Keithley2182 - PR #286
  • Keithley6220 - PR #286
  • Minghe MHS5200a - PR #283
  • Phasematrix FSW0020 - PR #291
  • Qubitek CC1 - PR #290
  • SRS830 - PR #293
  • SRS DG645 - PR #293

Love it!! Thanks @trappitsch

When I get a moment I'll fix #248 which will help with this

Sounds good, thanks! I'll start working on new classes then and update the list accordingly, i.e., add my name to what I'm currently working on and a checkmark once it's completed and merged. I assume there won't be so many volunteers grabbing cases to write tests for that we'd have to move the management of who's doing what out of this thread. Does that sound reasonable?

Priorities: I guess top to bottom makes the most sense in this case.

I don't think any particular order is important. It's not like any one instrument is more important than another for this. I originally just went in alphanumeric order and skipped over ones that were going to be a pita at the time. Getting in the groove and crunching it out without demotivation is more important.

I'll be adding some coverage for the MC1 in my migration to pint branch, since coveralls seems to think that the coverage has decreased from my changes

Just looked through the coverage reports again and indeed, all physical instruments seem to be fully tested now. This should hopefully help a bit with developments of further milestones and changes.

That being said: there is still more to be done 😃 Looks like the next part would be the abstract instruments and the generic_scpi instruments. This is closer to the core now though... what do you think @scasagrande? What would be the logical next step and what are you looking for - what is giving you the biggest headache in your current developments?

That's a good question. Other than the remaining test coverage that you mentioned, another todo is to take a look at the docs. The badge says passing, but the link also says it hasn't been built in a few years. Whoops. Something clearly broke in the autobuild pipeline a while ago. Maybe we should build something into CI to test that the docs build?

Oh, I never noticed, but yes, that's a good point, the docs might deserve an update :) They still build; I built them recently offline just to make sure I got the reST syntax right for including the Blu class. I suspect it's a setting on readthedocs, could you have a look there to see if there are some failed builds, etc.?
CI testing of the docs is for sure a good idea though. I'll have to look it up, but we could probably either include it with tox (with an adjustment of the dev-requirements to include sphinx) or run it separately in a workflow so that it is only triggered on Travis. It might also be good to read through the docs and correct them where things have changed, and maybe put some updates into the dev guide? That might be worth its own issue now that I think about it. Let's keep this issue open for now, and I'll add a new ToDo list for abstract instruments to see what needs some more work. Let me know what you think :)
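
A docs build check could, for example, be wired in as an ordinary test so that either tox or a workflow picks it up; this is just a sketch and assumes the Sphinx sources live in doc/:

```python
# Minimal sketch of a docs-build smoke test that tox (or any CI workflow)
# could run as part of the normal pytest suite. It assumes the Sphinx sources
# live in doc/ (adjust to the actual layout) and uses -W so that warnings
# fail the build.
from sphinx.cmd.build import build_main


def test_docs_build(tmp_path):
    exit_code = build_main(["-W", "-b", "html", "doc", str(tmp_path / "html")])
    assert exit_code == 0
```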

Abstract instruments that are missing tests (according to https://coveralls.io/github/Galvant/InstrumentKit?branch=master):

Abstract Instruments - Signal Generator

  • channel.py - PR #302
  • signal_generator.py - PR #302
  • single_channel_sg.py - PR #302

Abstract instruments

  • electrometer.py - PR #301
  • function_generator.py - PR #301
  • instrument.py - PR #309
  • multimeter.py - PR #301
  • oscilloscope.py - PR #301
  • optical_spectrum_analyzer.py - PR #301
  • power_supply.py - PR #301

Generic SCPI Instruments

  • scpi_instrument.py - PR #297
  • scpi_function_generator.py - PR #297
  • scpi_multimeter.py - PR #297

Looks like with #309 we seem to have covered all instruments and abstract instruments with tests. 🎉
Communicators and some other files remain; see coveralls. I need to think a bit more about this and will start a new thread / issue specifically about it.