Function-level fixture parametrization (or, some pytest magic)
15 Nov 2017

In my more recent work with de Bruijn graphs, I’ve been making heavy use of py.test fixtures and parametrization to generate sequence graphs of a particular structure. Fixtures are a wonderful bit of magic for performing test setup and dependency injection, and their cascading nature (fixtures using fixtures!) means a few can be recombined in myriad ways. This post will assume you’ve already bowed to the wonder of fixtures and have some close familiarity with them; if not, it will appear to you as cosmic horror – which maybe it is, but cosmic horror never felt so good.
The Problem
I’ve got a bunch of fixtures, heavily parametrized, which are all composed. For example, I have one for generating varying flavors of our de Bruijn Graph (dBG) objects:
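In sketch form it looks like the following, where `ExactDBG` and `CompactDBG` are made-up stand-ins for the real graph classes:

```python
import pytest


class ExactDBG:
    '''Stand-in for one flavor of our dBG objects.'''
    def __init__(self, ksize):
        self.ksize = ksize
        self.kmers = set()

    def add(self, sequence):
        # index every length-K substring (k-mer) as a node
        for i in range(len(sequence) - self.ksize + 1):
            self.kmers.add(sequence[i:i + self.ksize])


class CompactDBG(ExactDBG):
    '''Stand-in for a second flavor.'''


@pytest.fixture(params=[ExactDBG, CompactDBG])
def graph(request, ksize):
    # each test using `graph` runs once per dBG flavor; note the
    # dependency on a `ksize` fixture, which doesn't exist yet...
    return request.param(ksize)
```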
For those not familiar with the dBG, for our purposes it is a graph where the nodes are sequences of length \(K\) for some alphabet \(\Sigma\) (in our case, \(\Sigma = \{A, C, G, T\}\)). We draw an edge \(e_i = v_j \rightarrow v_k\) if the length \(K-1\) suffix of \(v_j\) matches the length \(K-1\) prefix of \(v_k\). This turns out to be highly useful when we want to take a pile of highly redundant short random samples of an underlying sequence and try to extract something close to the underlying sequence. A more in-depth discussion of dBGs is, uh, left as an exercise to the reader – what we really care about here is \(K\). It seems to be showing up often: as an argument to our dBG objects, as a way to prevent loops and overlaps in our randomly generated sequences, and, as it turns out, all over our tests for various indexing operations.
And so, there’s also a fixture for generating random nucleotide sequences that don’t overlap in a dBG of order \(K\):
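Roughly like so – this is a sketch, and the real retry logic is more careful:

```python
import random

import pytest


@pytest.fixture
def random_sequence(ksize):
    # every k-mer handed out by this fixture so far: shared state,
    # so no two generated sequences can collide in the dBG
    seen = set()

    def get(length=100):
        seq = random.choices('ACGT', k=ksize)
        seen.add(''.join(seq))
        while len(seq) < length:
            base = random.choice('ACGT')
            kmer = ''.join(seq[-(ksize - 1):]) + base
            if kmer in seen:
                continue  # resample; a real version would bound retries
            seen.add(kmer)
            seq.append(base)
        return ''.join(seq)

    return get
```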
…and naturally, one to compose them:
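Along these lines (the name `graph_with_sequence` is just for illustration):

```python
@pytest.fixture
def graph_with_sequence(graph, random_sequence):
    # compose the two: a dBG of order K preloaded with one
    # random, non-overlapping sequence
    sequence = random_sequence()
    graph.add(sequence)
    return graph, sequence
```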
You might notice a few things about these fixtures:
- OMG you’re testing with random data. Yes I am! But, the space for that data is highly constrained, it seems to be doing the job well, and I can always introspect unexpected failures.
- The `random_sequence` fixture returns a function! This is a trick to share some state at function scope: we keep track of the global set of seen k-mers, and the resulting function can generate many sequences.
- There’s an undefined parameter or fixture: `ksize`.
The last bit is the interesting part.
So, of course, it seems we should just write a fixture for \(K\)! The simplest approach might be:
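Perhaps just a constant (the value 21 here is an assumed default):

```python
@pytest.fixture
def ksize():
    return 21
```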
This sets one default \(K\) for each fixture and test using it. This kinda sucks though: we should be testing different \(K\) sizes! So…
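…we parametrize the fixture (the particular values are illustrative):

```python
@pytest.fixture(params=[21, 25, 31])
def ksize(request):
    return request.param
```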
Slightly better! We test three values for \(K\) instead of one. Unfortunately, it still doesn’t quite cut it: for some tests we want a more trivial dBG (say, with \(K=4\)), or we might not want or need three instances of every single test. We need individual tests to be able to set their own \(K\), and importantly, it still needs to trickle down to all the fixtures the test depends on. I’d also like this to be somewhat clear and concise: it turns out that what I’m about to show you can more or less be achieved with indirect parametrization, but I find that interface clunky (and not very well documented), and besides, this taught me a bit more about pytest.
The Solution
My first thought was that it’d be nice to just set a variable within a test function and reach through the `request` object to pull it out with `getattr`, which would produce tests something like:
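A sketch of the doomed idea (all names invented):

```python
@pytest.fixture
def ksize(request):
    # reach into the requesting test and hope to find a K
    return getattr(request.function, 'ksize', 21)


def test_something(graph, ksize):
    ksize = 31  # a local variable, created only when the test body
                # runs; fixtures were resolved long before this point
    assert graph.ksize == 31
```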
Turns out this doesn’t work properly with test collection and Python’s scoping rules, and just feels icky to boot. We need a way to pass some information to the fixture, while also making it clear that it’s a property of the test itself and not some detail of the test’s implementation. Then I realized: decorators!
So, I came up with this:
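Give or take the exact spelling, the decorator is just:

```python
def using_ksize(K):
    '''Tag a test function with the K value(s) it wants.'''
    def wrap(func):
        setattr(func, '_ksize', K)
        return func
    return wrap
```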
Pretty straightforward: all it does is add an attribute called `_ksize` to the test function. However, we need to tell pytest and our fixtures about it. Turns out that the pytest API already has a hook for more granular control over parametrization, called `pytest_generate_tests`. This lets us grab the fixtures being used by whatever function pytest is currently setting up and poke at their generation in various ways. For example, in my case…
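…in `conftest.py`, something like the following, where the fallback of 21 is the assumed default from before:

```python
def pytest_generate_tests(metafunc):
    if 'ksize' in metafunc.fixturenames:
        ksize = getattr(metafunc.function, '_ksize', None)
        if ksize is None:
            ksize = [21]  # no decorator: fall back to the default
        if isinstance(ksize, int):
            ksize = [ksize]  # allow @using_ksize(4) as well as lists
        metafunc.parametrize('ksize', ksize)
```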
So what is this nonsense? We look at the `metafunc`, which contains the requesting context, and into its list of fixture names. If we find one called `ksize`, we check the calling function in `metafunc.function` for the `_ksize` attribute; if we don’t find it, we set a default value, and if we do, we just use it.
Now, we can write a couple different sorts of tests:
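For instance (the assertions are contrived, but they show the collection behavior):

```python
@using_ksize(4)
def test_trivial_graph(graph, ksize):
    # collected with exactly the K we asked for
    assert ksize == 4


@using_ksize([21, 31])
def test_two_ksizes(graph, ksize):
    # collected once per requested K
    assert ksize in (21, 31)


def test_default_ksize(graph, ksize):
    # no decorator: pytest_generate_tests supplies the default
    assert ksize == 21
```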
I rather like this approach: it’s quite clear and retains all the pytest fixture goodness, while also giving more granular control. This is a simple parametrization case which admittedly could be accomplished with indirect parametrization, but one could imagine scenarios where the indirect method would be insufficient. Curiously, you don’t even have to write an actual fixture function with this approach, as it’s implied by the function argument lists.
In my case, I eventually get tests like…
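…for example, in terms of the stand-in graph above (the real assertions are about indexing behavior, but the shape is the same):

```python
@using_ksize([21, 51])
def test_add_sequence(graph, random_sequence, ksize):
    # every k-mer of a non-overlapping sequence should
    # become a node in the dBG
    sequence = random_sequence(length=200)
    graph.add(sequence)
    assert len(graph.kmers) == len(sequence) - ksize + 1
```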
In the end, only mild cosmic horror.