Backfillable Jobs

What is backfilling?

Backfilling is basically a crontabber app that receives a date to its run() function. For example:

import datetime
from crontabber.base import BaseCronApp
from crontabber.mixins import as_backfill_cron_app

@as_backfill_cron_app
class MyBackfillApp(BaseCronApp):
    app_name = 'my-backfill-app'

    def run(self, date):
        with open(self.app_name + '.log', 'a') as f:
            f.write('Date supplied: %s\n' % date)

The date parameter is a Python <datetime.datetime> instance variable with timezone information.

What crontabber guarantees is that that method will never be called with the same date value twice.

The point of all this is if the app was to fail, it will be retried automatically by crontabber and when it does that needs to know exactly what dates have been tried before.

An example explains it

Suppose that you have a stored procedure in a PostgreSQL database. It needs to be called exactly once every day. Internally the stored procedure is programmed to raise an exception if the same day is supplied twice. For

example it might do something like this:
CREATE OR REPLACE FUNCTION cleanup(report_date DATE)
    RETURNS boolean
    LANGUAGE plpgsql
AS $$
BEGIN

SELECT 1 FROM reports_clean
WHERE report_date = report_date;
IF FOUND THEN
    RAISE ERROR 'Already run for %.',report_date;
    RETURN FALSE;
END IF;

INSERT INTO reports_clean (
    name, sex, dob, report_date
)
SELECT
    name, sex, dob, report_date
FROM ( SELECT
           TRIM(both ' ' from full_name)
           gender,
           date_of_birth::DATE
       FROM data_collection
       WHERE
           collection_date = report_date
           AND
           gender = 'male' OR gender = 'female'
);

RETURN TRUE;
END;
$$;

The example is not a real-world example but it demonstrates the importance of really making sure the same date isn’t passed into the function twice. If it was, you’d have duplicates for a particular date and that would be bad.

When does the magic kick in?

When things go wrong. If for example, you have some network outtage or a bug in your code or something then the triggering will cause an error. That’s OK because crontabber will catch that and take note of exactly what date it tried to pass in.

Then, the next time crontabber runs it will re-attempt to execute the job app with the same date, even if the wall clock says it’s the next day. It will also know which other days it has not been able to execute and re-attempt those too.

Suppose you have a daily app that is configured to be backfillable. The app depends on presence of some external third party service which unfortunately goes offline for three days. It’s not a problem, crontabber will try and try till it works and will accordinly pass in the correct dates.

A caveat about backfillable jobs

Because the integrity of which apps have been passed with which dates is important, it means you can’t use crontabber to run an individual job as a “one off”. That means that if you try:

crontabber --admin.conf=crontabber.ini --job=my-backfill-app

It will deliberately ignore that since there’s a risk it then “disrupts” its predictable rythem. Otherwise it could potentially be calling the same app with the same date twice.