CleaningΒΆ
Learning Objectives
- Learn how to use the
clean
attribute.- Define a custom clean function.
In the last section, we changed the heatmap plotting function and had to manually remove the target to have it regenerated. Wouldn’t it be nice to not have to do that? Yes, yes it would!
The best way to accomplish this is via the clean
attribute. As one
might guess, the clean
attribute should be a function which cleans
up after the task. Conveniently, the project creator has included an
extremely useful function for this, called clean_targets
, which just
goes ahead and rm
‘s the target for that task. Here’s what it looks
like included in our humble download task.
from doit.task import clean_targets
def task_download_data():
return {'actions': ['curl -OL https://s3.amazonaws.com/pydoit-intermediate/Melee_data.csv.gz'],
'targets': ['Melee_data.csv.gz'],
'uptodate': [run_once],
'clean': [clean_targets]}
Now, if we run doit clean
, the data file will be removed; if we run
doit
, all the tasks will be rerun, because they all depend directly
or indirectly on the downloaded data file.
Pydoit’s creator clearly saw that removing the targets would be an extremely common task. As such, one does not actually need to import theclean_targets
function; just passing[True]
toclean
will suffice. However, if you would like both remove the targets and do something else to revert your task, the list should includeclean_targets
and whatever other callable or string you’d like run.
clean
can also be a shell command or function that the user defines.
For example, we might want to inform the user where the pipeline got the
file they just blew away.
DATA_URL = 'https://s3.amazonaws.com/pydoit-intermediate/Melee_data.csv.gz'
def task_download_data():
def print_url():
print 'File was retrieved from: {0}'.format(DATA_URL)
return {'actions': ['curl -OL {0}'.format(DATA_URL)],
'targets': ['Melee_data.csv.gz'],
'uptodate': [run_once],
'clean': [clean_targets, print_url]}
The clean
attribute takes the same form as actions
or even
uptodate
, and so can also be a string to be executed in the shell.
Clean all the things
Add
clean
attributes to all the currently defined tasks. If you’d like, get creative with your attributes. Include two clean actions in at least one of the tasks (the second doesn’t have to do anything groundbreaking).