pipeline.cfg - Project Configuration File

The project configuration file provides a common configuration for all pipeline runs which are part of a particular project. Through this file, it is possible to configure the database used for pipeline runs, the location in which jobs are stored, and how much logging is produced and where it is stored.

The default pipeline.cfg file is as follows:

[DEFAULT]
runtime_directory = %(cwd)s
job_directory = %(runtime_directory)s/%(job_name)s

[logging]
#log_dir contains output log, plus a copy of the config files used.
log_dir = %(job_directory)s/logs/%(start_time)s
debug = False

[database]
engine = ;(monetdb or postgresql)
database = "" ; e.g. '{% user_name %}'
user =  "" ; e.g. '{% user_name %}', or 'postgres'
password = "" ; e.g. '{% user_name %}'
host = "localhost"
port =
passphrase =
dump_backup_copy = False

[image_cache]
copy_images = True
mongo_host = "localhost"
mongo_port = 27017
mongo_db = "tkp"


[parallelise]
method = "multiproc"  ; or celery, or serial
cores = 0  ; the number of cores to use. Set to 0 for autodetect

The file follows the standard ConfigParser syntax. The TraP provides three special variables which may be used in expansions: cwd, the current working directory; start_time, the time at which the current pipeline job is started; and job_name, the name of the job currently being executed.
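The expansion mechanism can be illustrated with Python's standard configparser module. This is a minimal sketch, not the TraP's own loading code: the load_config helper, the example paths and the timestamp format are assumptions made for illustration. The special variables are supplied as parser defaults, so that %(cwd)s, %(job_name)s and %(start_time)s resolve when a value is read.

```python
import configparser
import os
from datetime import datetime

# A trimmed copy of the default file, enough to show interpolation.
CONFIG_TEXT = """
[DEFAULT]
runtime_directory = %(cwd)s
job_directory = %(runtime_directory)s/%(job_name)s

[logging]
log_dir = %(job_directory)s/logs/%(start_time)s
debug = False
"""


def load_config(text, job_name, cwd=None, start_time=None):
    """Hypothetical helper: parse `text` with the three special variables set."""
    defaults = {
        "cwd": cwd or os.getcwd(),
        "job_name": job_name,
        # The timestamp format here is an assumption, not the TraP's.
        "start_time": start_time or datetime.now().strftime("%Y-%m-%dT%H:%M:%S"),
    }
    # inline_comment_prefixes lets "; ..." comments follow a value on the same line.
    parser = configparser.ConfigParser(
        defaults=defaults, inline_comment_prefixes=(";",)
    )
    parser.read_string(text)
    return parser


config = load_config(
    CONFIG_TEXT, "myjob", cwd="/data/project", start_time="2024-01-01T00:00:00"
)
log_dir = config.get("logging", "log_dir")
print(log_dir)  # /data/project/myjob/logs/2024-01-01T00:00:00
```

Because runtime_directory and job_directory live in the DEFAULT section, every other section can refer to them, which is how log_dir picks up the job directory here.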

DEFAULT Section

The DEFAULT section provides a location for defining parameters which may be referred to by other sections. The following parameters may be defined:

runtime_directory
This is the root directory for the project. The default value, %(cwd)s, means that the pipeline.cfg refers to the project in the directory in which it is stored: this is almost always correct.
job_directory
This is the directory under which new jobs will be created. The default is to create a directory named after the job as a subdirectory of the project directory. This is almost always correct.

logging Section

log_dir
The full path to a directory into which the pipeline will write logging information as it progresses, and also make a record of the parameters used for a job. The log file provides a record of pipeline activity, and, in particular, any errors or problems encountered, while the parameter files record the configuration that produced these results. This folder is therefore important for reproducibility and debugging purposes.
debug
A boolean (True or False) value. If True, extra information will be written to the log file, which might be helpful in diagnosing hard-to-find problems.

database Section

Note

The database config settings can be overridden using environment variables, e.g. for configuring a unit-testing environment. See tkp.config.get_database_config() for details.
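The general idea of an environment override can be sketched as follows. The variable names used here (the EXAMPLE_DB_ prefix) and the apply_env_overrides helper are hypothetical and chosen only for illustration; the names the TraP actually reads are defined by tkp.config.get_database_config() and should be taken from there.

```python
import os

# Hypothetical prefix, for illustration only; not the TraP's variable names.
ENV_PREFIX = "EXAMPLE_DB_"
DB_KEYS = ("engine", "database", "user", "password", "host", "port", "passphrase")


def apply_env_overrides(file_settings, environ=None):
    """Return a copy of `file_settings` with any environment overrides applied."""
    environ = os.environ if environ is None else environ
    merged = dict(file_settings)
    for key in DB_KEYS:
        value = environ.get(ENV_PREFIX + key.upper())
        if value is not None:
            # A value set in the environment wins over the value from the file.
            merged[key] = value
    return merged


# Settings as they might be read from the [database] section.
file_settings = {
    "engine": "postgresql",
    "database": "trap",
    "host": "localhost",
    "port": "5432",
}
# A stand-in for os.environ in a unit-testing environment.
test_env = {"EXAMPLE_DB_DATABASE": "trap_unittest", "EXAMPLE_DB_HOST": "testhost"}

merged = apply_env_overrides(file_settings, test_env)
print(merged["database"], merged["host"], merged["engine"])
```

Settings that have no corresponding environment variable keep the value given in pipeline.cfg.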

engine
The database engine to use. Two engines are supported: postgresql and monetdb. See the introductory material on databases for details.
host, port
The host and port on the network at which the database server is listening.
database, user, password
The name of the database to use, and the username and password required to connect to it.
passphrase
A passphrase which provides administrative access to the database server. Only applicable to the monetdb engine. This is not required for normal operation, but enables the user to (for example) create and destroy databases.
dump_backup_copy
A boolean value. If True, a copy of the configured database will be dumped to disk at the beginning of each pipeline run. This is not recommended in regular use, but can be useful if encountering intermittent database errors, both for recovering a working database, and diagnosing how errors occur. The dump is made to the job directory in a file named according to the pattern <database host>_<database name>_<current time>.dump.
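As an illustration of the naming pattern for the backup dump, the following sketch builds such a path. The backup_dump_path helper is hypothetical, and the timestamp format is an assumption: the documentation above only specifies that the current time forms the last component of the name.

```python
import os
from datetime import datetime


def backup_dump_path(job_directory, host, database, now=None):
    """Hypothetical helper: build <host>_<database>_<time>.dump in the job directory."""
    now = now or datetime.now()
    # Assumed timestamp format; the TraP may format the time differently.
    timestamp = now.strftime("%Y-%m-%dT%H-%M-%S")
    filename = "{}_{}_{}.dump".format(host, database, timestamp)
    return os.path.join(job_directory, filename)


path = backup_dump_path(
    "/data/project/myjob", "localhost", "trap", datetime(2024, 1, 1, 12, 30, 0)
)
print(os.path.basename(path))  # localhost_trap_2024-01-01T12-30-00.dump
```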

image_cache Section

This section configures the image caching or ‘pixel store’ functionality.

See also: the ‘optional dependencies’ section of your relevant installation guide.

copy_images
Boolean. If True, image pixel data will be stored to a MongoDB database.
mongo_host, mongo_port
String, integer. Network hostname and port to use to connect to MongoDB. Only used if copy_images is True.
mongo_db
String. Name of MongoDB database in which to store image pixel data. Only used if copy_images is True.

parallelise Section

method
Determines whether the TraP is run in single-process, multi-process, or distributed mode. "multiproc" should be suitable for most users.
cores
Determines the number of cores to use in multi-process mode. 0 will attempt to autodetect (and use all available cores).
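The handling of these two settings might look like the sketch below. It is not the TraP's implementation: the read_parallelise helper is hypothetical, and the use of multiprocessing.cpu_count() for autodetection is an assumption about how "all available cores" could be determined. The surrounding quotes are stripped from the method value because ConfigParser returns them as part of the string.

```python
import configparser
import multiprocessing

PARALLELISE_TEXT = '''
[parallelise]
method = "multiproc"  ; or celery, or serial
cores = 0  ; the number of cores to use. Set to 0 for autodetect
'''

VALID_METHODS = ("multiproc", "celery", "serial")


def read_parallelise(text):
    """Hypothetical helper: return (method, cores) with cores=0 autodetected."""
    parser = configparser.ConfigParser(inline_comment_prefixes=(";",))
    parser.read_string(text)
    # Strip the quotes that surround the value in the file.
    method = parser.get("parallelise", "method").strip("\"'")
    if method not in VALID_METHODS:
        raise ValueError("unknown parallelise method: {!r}".format(method))
    cores = parser.getint("parallelise", "cores")
    if cores < 0:
        raise ValueError("cores must be 0 (autodetect) or a positive integer")
    if cores == 0:
        # Assumed autodetection: use every core the machine reports.
        cores = multiprocessing.cpu_count()
    return method, cores


method, cores = read_parallelise(PARALLELISE_TEXT)
print(method, cores)
```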