Subtasks
Learning Objectives
- Introduce subtasks.
- Add a document templating task.
We’ve defined tasks to download our data, gunzip it, and plot it. However, our original goal was to create a publication pipeline, so we should get on with it!
The file actually defining our paper will be written in markdown. For the uninitiated, markdown is an extremely simple markup language (like HTML) designed with the goal of being human readable. Eventually we’ll be using pandoc to compile the markdown document into our format of choice. However, before we do any of that, we need to add our image file and other information to the markdown document. We could just include these things directly in the file, but we’re cooler than that: we want to do it dynamically. We’ll use the jinja2 templating library to build our markdown document dynamically.
Given that this is a course on pydoit and not jinja2, it would be best if we downloaded the jinja2 template rather than writing it ourselves. Conveniently, we already have a task for downloading things, but currently it only downloads a single file. “Surely,” you say, “this task must be capable of downloading other files!” And you would be correct! We can use subtasks to generate multiple tasks from the same task function.
from doit.task import clean_targets
from doit.tools import run_once
import os

DATA_URLS = ['https://s3.amazonaws.com/pydoit-intermediate/Melee_data.csv.document.md.tpl',
             'https://s3.amazonaws.com/pydoit-intermediate/Melee_data.csv.gz']

def task_download_data():

    # Extra clean action: report where the removed file originally came from.
    def print_url(URL):
        print('File was retrieved from: {0}'.format(URL))

    # Yield one subtask per URL; each subtask needs its own unique name.
    for URL in DATA_URLS:
        target = os.path.basename(URL)
        yield {'name': 'download:{0}'.format(target),
               'actions': ['curl -OL {0}'.format(URL)],
               'targets': [target],
               'uptodate': [run_once],
               'clean': [clean_targets, (print_url, [URL])]}
We’ve made a number of changes here, but the most important is that the task function is now a generator function instead of a normal function. For those of you not familiar with generators, a generator function is signified by the yield keyword, which takes the place of the return keyword. Because this function contains a yield, calling it produces a generator that can be iterated over, for example with a for loop. For pydoit specifically, this means the function can yield multiple tasks, one for each URL in the DATA_URLS list. We’ve also included a name entry in each task dictionary; this is necessary because pydoit needs to be able to identify each subtask uniquely in order to resolve dependencies.
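To see the generator behaviour on its own, outside of pydoit, here is a minimal sketch. The URLs and the function name task_download_example are made up for illustration and are not part of the lesson's dodo.py; the point is only that iterating over the function's result produces one task dictionary per URL.

```python
import os

# Hypothetical URLs, used only for illustration.
EXAMPLE_URLS = ['https://example.com/data/first.csv',
                'https://example.com/data/second.csv.gz']

def task_download_example():
    """Generator function: yields one task dictionary per URL."""
    for url in EXAMPLE_URLS:
        target = os.path.basename(url)
        yield {'name': target,
               'actions': ['curl -OL {0}'.format(url)],
               'targets': [target]}

# Calling the function returns a generator; list() iterates over it
# and collects every dictionary that was yielded.
subtasks = list(task_download_example())
for task in subtasks:
    print(task['name'], '->', task['targets'])
```

Nothing is downloaded here, since the curl commands are only strings inside the dictionaries. Each dictionary carries a distinct name, which is what lets the subtasks be told apart.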
Now that we have the task to download the template file, we’ll add one to compile the template into a markdown file. This is another Python task, which will include much of what we’ve gone over already.
import jinja2

# ... the other tasks ...

def task_build_markdown_file():

    def do_build(targets):
        # The template sits next to the target, with '.tpl' appended.
        with open(targets[0] + '.tpl') as fp:
            template = jinja2.Template(fp.read())
        # render() returns text, so open the target in text mode.
        with open(targets[0], 'w') as fp:
            fp.write(template.render(author='Your Name',
                                     affiliation='Your Institution',
                                     date='Jan 20, 2016',
                                     heatmap_filename='Melee_data.csv.heatmap.pdf'))

    return {'actions': [do_build],
            'file_dep': ['Melee_data.csv.heatmap.pdf',
                         'Melee_data.csv.document.md.tpl'],
            'targets': ['Melee_data.csv.document.md'],
            'clean': [clean_targets]}
This task creates a jinja2 Template object from the template file we downloaded, then renders it into its final form by passing in a number of keyword arguments.
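The rendering step can be tried in isolation with a small sketch like the one below. The template text and the title field are assumptions made up for this example; the real .tpl file downloaded above may use different fields and layout.

```python
import jinja2

# Hypothetical template text, standing in for the downloaded .tpl file.
template_text = ('% {{ title }}\n'
                 '% {{ author }}\n'
                 '\n'
                 '![Heatmap]({{ heatmap_filename }})\n')

template = jinja2.Template(template_text)

# Each keyword argument fills the placeholder of the same name.
rendered = template.render(title='My Paper',
                           author='Your Name',
                           heatmap_filename='Melee_data.csv.heatmap.pdf')
print(rendered)
```

With jinja2's default settings, a placeholder that receives no matching keyword argument is rendered as an empty string rather than raising an error, so a misspelled field name tends to show up as missing text in the output.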
Fun with Templates
Although templating isn’t specific to pydoit, you may find jinja2 quite useful. Try playing around with the .tpl template file and adding your own content to it. Can you see how to add additional fields?