some notes on py.test, travis-ci, and SciPy 2016
15 Jul 2016

I’ve been in Austin since Tuesday for SciPy 2016, and after a couple weeks in Brazil and some time off the grid in the Sierras, I can now say that I’ve been officially bludgeoned back into my science and my Python. Aside from attending talks and meeting new people, I’ve been working on getting a little package of mine up to scratch with tests and continuous integration, with the eventual goal of submitting it to the Journal of Open Source Software. I had never used travis-ci before, nor had I used py.test in an actual project, and as expected, there were some hiccups – learn from mine to avoid your own :)
Note: this blog post is not beginner friendly. For a simple intro to continuous integration, check out our pycon tutorial, travis ci’s intro docs, or do further googling. Otherwise, to quote Worf: ramming speed!
travis
Having used drone.io in the past, I had a good idea of where to start here. travis is much more feature rich than drone though, and as such, requires a bit more configuration. My package, shmlast, is not large, but it has some external dependencies which need to be installed and relies on the numpy-scipy-pandas stack. drone’s limited configuration options and short maximum run time quickly make it intractable for projects with non-trivial dependencies, and this was where travis stepped in.
getting your scientific python packages
The first stumbling block here was deciding on a python distribution. Using virtualenv and PyPI is burdensome with numpy, scipy, and pandas – they almost always want to compile, which takes much too long. Being an impatient page-refreshing fiend, I simply could not abide the wait. The alternative is to use anaconda, which does us the favor of compiling them ahead of time (while also being a little smarter about managing dependencies). The default distribution is quite large though, so instead, I suggest using the stripped-down miniconda and installing the packages you need explicitly. Detailed instructions are available here, and I’ll run through my setup.
The miniconda setup goes under the install directive in your .travis.yml:
install:
- sudo apt-get update
- if [[ "$TRAVIS_PYTHON_VERSION" == "2.7" ]]; then
wget https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh -O miniconda.sh;
else
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh;
fi
- bash miniconda.sh -b -p $HOME/miniconda
- export PATH="$HOME/miniconda/bin:$PATH"
- hash -r
- conda config --set always_yes yes --set changeps1 no
- conda update -q conda
- conda info -a
- conda create -q -n test python=$TRAVIS_PYTHON_VERSION numpy scipy pandas=0.17.0 matplotlib pytest pytest-cov coverage sphinx nose
- source activate test
- pip install -U codecov
- python setup.py install
Woah! Let’s break it down. Firstly, there’s a check of travis’s python environment variable to grab the correct miniconda distribution. Then we install it, add it to PATH, and configure it to work without interaction. The conda info -a is just a convenience for debugging. Finally, we go ahead and create the environment. I do specify a version for Pandas; if I were more organized, I might write out a conda environment.yml and use that instead. After creating the environment and installing a non-conda dependency with pip, I install the package. This gets us ready for testing.
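For the curious, such an environment.yml would just mirror the conda create line above. A rough sketch (not an actual file from shmlast, and with the python version pinned by hand) might look like:

name: test
dependencies:
  - python=3.5
  - numpy
  - scipy
  - pandas=0.17.0
  - matplotlib
  - pytest
  - pytest-cov
  - coverage
  - sphinx
  - nose
  - pip:
    - codecov

You could then swap the conda create and pip install lines for a single conda env create -f environment.yml, though pinning the python version in the file means giving up the $TRAVIS_PYTHON_VERSION switching shown above.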
After a lot of fiddling around, I believe this is the fastest way to get your Python environment up and running with numpy, scipy, and pandas. You can probably safely use virtualenv and pip if you don’t need to compile massive libraries. The downside is that this essentially locks your users into the conda ecosystem, unless they’re willing to risk going it alone re: platform testing.
non-python stuff
Bioinformatics software (or more accurately, its users…) often has to grind its way through the Nine Circles (or perhaps orders of magnitude) of Dependency Hell to get installed, and if you want CI for your project, you’ll have to automate this devilish journey. Luckily, travis has extensive support for this. For example, I was easily able to install the LAST aligner from source by adding some commands under before_script:
before_script:
- curl -LO http://last.cbrc.jp/last-658.zip
- unzip last-658.zip
- pushd last-658 && make && sudo make install && popd
The source is first downloaded and unpacked. We need to avoid mucking up our current location when compiling, so we use pushd to save our directory and move to the folder, then make and install before using popd to jump back out.
Software from Ubuntu repos is even simpler. We can add these commands to before_install:
before_install:
- sudo apt-get -qq update
- sudo apt-get install -y emboss parallel
This grabs emboss (which includes transeq, for 6-frame DNA translation) and gnu-parallel. These commands could probably just as easily go in the install section, but the travis docs recommend they go here and I didn’t feel like arguing.
py.test and the import file mismatch
I’ve used nose in my past projects, but I’m told the cool kids (and the less-cool kids who just don’t like deprecated software) are using py.test these days. Getting some basic tests up and running was easy enough, as the patterns are similar to nose, but getting everything integrated was more difficult. Pretty soon, after running a python setup.py test or even a simple py.test, I was running into a nice collection of these errors:
import file mismatch:
imported module 'shmlast.tests.test_script' has this __file__ attribute:
/work/shmlast/shmlast/tests/test_script.py
which is not the same as the test file we want to collect:
/work/shmlast/build/lib/shmlast/tests/test_script.py
HINT: remove __pycache__ / .pyc files and/or use a unique basename for your test file modules
All the google results for this pointed to threads with devs and other benevolent folks patiently explaining that you need to have unique basenames for your test modules (I mean, it’s right there in the error, duh), or that I needed to delete __pycache__. My basenames were unique and my caches clean, so something else was afoot. An astute reader might have noticed that one of the paths given is under the build/ directory, while the other is in the root of the repo. Sure enough, deleting the build/ directory fixes the problem. This seemed terribly inelegant though, and quite silly for such a common use-case.
Well, it turns out that this problem is indirectly addressed in the docs. Unfortunately, it’s 1) under the obligatory “good practices” section, and who goes there? and 2) it doesn’t warn that this error can result (instead there’s a somewhat confusing warning telling you not to use an __init__.py in your tests subdirectory, but also that you need one if you want to inline your tests and distribute them with your package). The problem is that py.test happily slurps up the tests in the build/ directory as well as those in the repo, which triggers the expected unique basename error. The solution is to be a bit more explicit about where to find tests. Instead of running a plain old py.test, you run py.test --pyargs <pkg>, which, in the clear and totally obvious language of the help, makes py.test “try to interpret all arguments as python packages.” Clarity aside, it fixes the problem! To be extra double clear, you can also add a pytest.ini to your root directory with a line telling py.test where the tests live:
[pytest]
testpaths = path/to/tests
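For shmlast, whose tests live under shmlast/tests (as the error message above shows), that works out to something like py.test --pyargs shmlast, or testpaths = shmlast/tests in the pytest.ini.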
organizing test data
Other than documentation gripes, py.test is a solid library. Particularly nifty are fixtures, which make it easy to abstract away more boilerplate. For example, in the past I’ve used the structure of our lab’s khmer project for grabbing test data and copying it into temp directories, but it involves a fair amount of code and bookkeeping. With a fixture, I can easily access test data in any test, while cleaning up the garbage:
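A minimal sketch of such a fixture, assuming the test data sits in a data/ directory next to the test modules (the datadir name is mine, not necessarily shmlast’s):

import os
import shutil

import pytest


@pytest.fixture
def datadir(tmpdir, request):
    '''Copy the test data into a temp directory and return a function
    that maps a filename to its path inside that directory.'''
    # the data is assumed to live in a data/ folder next to the test module
    tests_dir = os.path.dirname(request.module.__file__)
    data_dir = os.path.join(tests_dir, 'data')
    for filename in os.listdir(data_dir):
        shutil.copy(os.path.join(data_dir, filename), str(tmpdir))

    def getter(filename):
        # a better version would check that the requested file exists
        return str(tmpdir.join(filename))

    return getter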
Deep in my heart of hearts I must be a functional programmer, because I’m really pleased with this. Here, we get the path to the tests directory, and then the data directory which it contains. The test data is then all copied to a temp directory, and by the awesome raw power of closures, we return a function which will join the temp dir with a requested filename. A better version would handle a nonexistent file, but I said raw power, not refined and domesticated power. Best of all, this fixture uses another fixture, the builtin tmpdir, which makes sure the files get blown away when you’re done with them.
Use it as a fixture in a test in the canonical way:
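Something like this, with a made-up test and data file just for illustration:

import os


def test_with_data(datadir):
    # ask the fixture for a data file by name; it hands back the path
    # to the copy sitting in the temp directory
    input_fn = datadir('test-transcript.fa')
    assert os.path.exists(input_fn)
    # ...run whatever is being tested against input_fn...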
Next up: thoughts on SciPy 2016!