Pipeline Design

This section presents an overview of the fundamental algorithms used by, and data flow through, the TraP. It is intended to give everyday users a full understanding of how their data is being processed. Note that the top-level logic is defined in tkp.main; further implementation details for specific sub-sections may be found in the Developer’s Reference Guide.

As images flow through the TraP, they are processed by a series of distinct pipeline components, or “stages”. Each stage consists of Python logic, often interfacing with the pipeline database.

A complete description of the logical design of the TraP is beyond the scope of this document. Instead, the reader is referred to an upcoming publication by Swinbank et al. Here, we sketch only an outline of the various pipeline stages.

Pipeline topology and code re-use

An early design goal of the TraP was that the various stages should be easily re-usable in different pipeline topologies. That is, rather than simply relying on “the” TraP, users should be able to mix and match pipeline components to pursue their own individual science goals. This mode of operation is not well supported by the current TraP, but some effort is made to ensure that stages can operate as independent entities.

Image ordering and reproducibility

The material below describes each of the stages an image goes through as it is processed by the pipeline. Note, though, that the order in which images are processed matters because of the way in which lightcurves are generated within the database: see the material on the Source association stage for details. Reproducibility of pipeline results is of paramount importance: the TraP guarantees that results will be reproducible provided that images are always processed in order of time. That is, an image from time \(t_n\) must always be processed before an image from time \(t_{n+1}\). To satisfy this condition, the TraP internally re-orders the images provided to it in the images_to_process.py file so that they are in time order. If multiple TraP runs are to be combined in a single dataset, the user must ensure that the runs are in an appropriate sequence, so that the time-ordering condition still holds across runs.
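The time-ordering condition can be illustrated with a minimal Python sketch. This is not the TraP's own implementation: the ImageRecord type, its obs_time field and the order_by_time function are hypothetical names introduced here, and the sketch assumes that each image's observation time has already been read from its metadata.

```python
from datetime import datetime
from typing import List, NamedTuple


class ImageRecord(NamedTuple):
    """Hypothetical record pairing an image path with its observation time."""
    path: str
    obs_time: datetime


def order_by_time(images: List[ImageRecord]) -> List[ImageRecord]:
    """Return the images sorted so that t_n always precedes t_{n+1}."""
    # sorted() is stable, so images sharing a timestamp keep their input order.
    return sorted(images, key=lambda image: image.obs_time)


# Illustrative input, deliberately supplied out of time order.
unordered = [
    ImageRecord("img_c.fits", datetime(2013, 1, 1, 0, 20)),
    ImageRecord("img_a.fits", datetime(2013, 1, 1, 0, 0)),
    ImageRecord("img_b.fits", datetime(2013, 1, 1, 0, 10)),
]
ordered = order_by_time(unordered)
print([image.path for image in ordered])
```

A sort of this kind only orders images within a single list. It does nothing for images spread across separate runs, which is why the sequencing of multiple runs is left to the user.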

Configuration and startup

The pipeline configuration and job management system is described under Pipeline Configuration.