tkp.db.associations – source association

A collection of back-end subroutines (mostly SQL queries). This module deals with source association.

tkp.db.associations._check_meridian_wrap(image_id)[source]

Checks whether an image is close to the meridian ra = 0 or ra = 360.

If so, the association query needs to be rewritten to take into account sources on both sides of the 0/360 meridian.

The query returns:

q_across: true, if the extraction region of the image crosses
the ra=0/360 border
ra_min: the min value of the ra-between for the normal case,
when the image is outside the ra=0/360 meridian, otherwise NULL
ra_max: the max value of the ra-between for the normal case,
when the image is outside the ra=0/360 meridian, otherwise NULL

ra_min1/max1 and ra_min2/max2 are the values which may be used for the case of a cross-meridian image. For example, with a search radius of 5 degrees and a source at ra = 359.99, the ra-betweens 1 and 2 are:

… AND (ra BETWEEN ra_min1 AND ra_max1 OR ra BETWEEN ra_min2 AND ra_max2) …
… AND (ra BETWEEN 354.99 AND 360 OR ra BETWEEN 0 AND 4.99) …

ra_min1: the min value of the high-end ra-between, if the
extraction region of the image crosses the ra=0/360 border, otherwise NULL
ra_max1: the max value of the high-end ra-between, if the
extraction region of the image crosses the ra=0/360 border, otherwise NULL

ra_min2, ra_max2: As ra_min1/max1, but for the low-end ra values.
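The returned values can be illustrated with a minimal Python sketch. It is not the SQL used by this module: `ra_search_ranges` is a hypothetical helper, it takes a single centre position and radius rather than an image's extraction region, and, like the example above, it ignores the widening of the ra range with declination.

```python
def ra_search_ranges(ra_centre, radius):
    """Return ra search ranges (degrees) around ra_centre.

    Hypothetical helper for illustration only; assumes radius < 180.
    """
    lo = ra_centre - radius
    hi = ra_centre + radius
    if lo >= 0.0 and hi <= 360.0:
        # Normal case: a single BETWEEN range is enough.
        return {"q_across": False, "ra_min": lo, "ra_max": hi,
                "ra_min1": None, "ra_max1": None,
                "ra_min2": None, "ra_max2": None}
    # Cross-meridian case: one high-end range and one low-end range.
    return {"q_across": True, "ra_min": None, "ra_max": None,
            "ra_min1": lo % 360.0, "ra_max1": 360.0,
            "ra_min2": 0.0, "ra_max2": hi % 360.0}


ranges = ra_search_ranges(359.99, 5.0)
print(ranges["q_across"], ranges["ra_min1"], ranges["ra_max2"])
```

For a centre at 359.99 and a radius of 5 degrees this yields the high-end range 354.99 to 360 and the low-end range 0 to 4.99, up to floating-point rounding.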

These values are not used in the cross-meridian association query itself; they are reported only to indicate the search area. The cross-meridian association query uses the Cartesian dot product to determine the search area.
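A dot-product test of this kind can be sketched as follows. This assumes positions are converted to unit vectors on the sphere and compared against the cosine of the search radius; the function names are illustrative and the actual SQL may be organised differently.

```python
import math


def unit_vector(ra_deg, decl_deg):
    """Convert (ra, decl) in degrees to a Cartesian unit vector."""
    ra = math.radians(ra_deg)
    decl = math.radians(decl_deg)
    return (math.cos(decl) * math.cos(ra),
            math.cos(decl) * math.sin(ra),
            math.sin(decl))


def within_radius(ra1, decl1, ra2, decl2, radius_deg):
    """True if the two positions are closer than radius_deg on the sky.

    The dot product of two unit vectors equals the cosine of the angle
    between them, so no special handling of ra = 0/360 is needed.
    """
    a = unit_vector(ra1, decl1)
    b = unit_vector(ra2, decl2)
    dot = sum(p * q for p, q in zip(a, b))
    return dot > math.cos(math.radians(radius_deg))


print(within_radius(359.99, 0.0, 0.5, 0.0, 5.0))
```

Because the comparison is made in Cartesian space, a source at ra = 359.99 and one at ra = 0.5 are treated as neighbours without any range splitting.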

tkp.db.associations._delete_1_to_many_inactive_assocskyrgn()[source]

Delete the assocskyrgn links of the old runcat

Since we replaced this runcat.id with multiple new ones, we now delete the old links.

tkp.db.associations._delete_1_to_many_inactive_assocxtrsource()[source]

Delete the association pairs of the old runcat from assocxtrsource

NOTE: It might sound confusing, but those are not qualified as inactive in tempruncat (read below). Since we replaced this runcat.id with multiple new ones, we first flag it as inactive, after which we delete it from the runningcatalog.

The subselect selects those valid “old” runcat ids (i.e., the ones that were not set to inactive for the many-to-many associations).

NOTE: We do not have to flag these rows as inactive;
no further processing depends on these in the assoc run.

tkp.db.associations._delete_1_to_many_inactive_newsource()[source]

Delete the newsource sources of the old runcat

Since we replaced this runcat.id with multiple new ones, we now delete the old one.

tkp.db.associations._delete_1_to_many_inactive_runcat_flux()[source]

Flag the old runcat ids in the runningcatalog to inactive

Since we replaced this runcat.id with multiple new ones, we first flag it as inactive, after which we delete it from the runningcatalog.

tkp.db.associations._delete_bad_blind_extractions(image_id)[source]

Remove blind extractions centred outside designated extract region.

These occur sometimes due to highly elliptical fits on noisy data, creating a best fit centred outside the original pixel region. The source-extraction code has been modified to (probably) prevent this, but we check for them anyway.

NB. We currently only delete blind extractions. We expect that occasionally forced fits to sources just inside the extraction radius might converge just outside, but these should be restricted to a very small additional margin. By not deleting these edge cases, the data allows us to construct proper lightcurves, and (I think) does not contribute to their weighted mean positions (so sources cannot ‘migrate’ across the border). TODO(TS): Check this.

Only extractions from the specified image are checked for deletion.

Returns: Number of extractedsource rows deleted.

tkp.db.associations._delete_inactive_runcat()[source]

Delete the one-to-many associations from temprunningcatalog, and delete the inactive rows from runningcatalog.

After the one-to-many associations have been processed, they can be deleted from the temporary table and the runningcatalog.

tkp.db.associations._determine_newsource_previous_limits(image_id, new_source_sigma_margin)[source]

Determines which new-runcat sources are also probably transient.

Looks up previous images relevant to this source-position, using the following criteria - images must:

  • overlap the new-source position, according to the skyregion information;
  • be in the same dataset;
  • be in the same frequency band;
  • have an earlier timestamp than the current image;
  • have not been rejected.

For those images we calculate the per-previous-image detection-thresholds, which are defined as follows.

A new source is ‘possibly transient’ (type 0) if it passes the following tests:

  • Was not detected in a skyregion being surveyed for the first time.

  • Has a flux-value such that:

    flux > MIN_OVER_I [ rms_min_I * (det_I + new_source_sigma_margin) ]

(where I indexes the images), i.e. if it were a steady source, it should already have been detected had it been in the low-RMS area of the previous image with the best detection threshold, even allowing for noise fluctuations.

Furthermore, a new source is ‘likely transient’ (type 1) if it is additionally bright enough that, if it were a steady source, it should have been detected even if it was in the high-RMS area of the aforementioned ‘low rms_min’ image, i.e.

flux > (rms_max_I*(det_I + new_source_sigma_margin))

Note that, once we have located the image with best ‘low rms threshold’, we then use that image to also generate the ‘high rms threshold’. Strictly speaking, this is non-optimal - we should run a fresh search against all images to find the best ‘high rms threshold’. However, I’m working on the assumption that most of the time the image with best low-threshold will also have best high-threshold, and even when that is not the case we won’t lose too much accuracy. The benefits of this assumption are simplicity, and possibly faster performance, but this might need to be re-examined in future, especially if we start ingesting images of wildly differing sizes and noise non-uniformity characteristics (e.g. single pointings vs mosaics) etc.

We use peak flux (f_peak) as the flux value here, since that is likely to be the deciding factor in whether a source gets blindly extracted or not. (NB This is a hunch, rigorous investigation welcome.)
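The two thresholds can be sketched in Python as below. This is an illustration under stated assumptions, not the SQL in this function: `classify_new_source` and its tuple layout are hypothetical, and the list of previous images is assumed to have been filtered already by the criteria listed above.

```python
def classify_new_source(f_peak, previous_images, new_source_sigma_margin):
    """Classify a new source from its peak flux.

    previous_images: list of (rms_min, rms_max, det_sigma) tuples, one per
    relevant earlier image (hypothetical layout).
    Returns 1 ('likely transient'), 0 ('possibly transient') or None.
    """
    if not previous_images:
        # Sky region surveyed for the first time: not flagged as transient.
        return None

    def low_threshold(img):
        rms_min, _, det_sigma = img
        return rms_min * (det_sigma + new_source_sigma_margin)

    # Image with the best (lowest) low-RMS detection threshold.
    best = min(previous_images, key=low_threshold)
    _, rms_max, det_sigma = best
    low = low_threshold(best)
    # The high-RMS threshold is taken from the same image.
    high = rms_max * (det_sigma + new_source_sigma_margin)

    if f_peak > high:
        return 1
    if f_peak > low:
        return 0
    return None


images = [(1.0, 2.0, 5.0), (0.5, 3.0, 5.0)]
print(classify_new_source(10.0, images, 3.0))
```

With these example values the second image gives the lowest low-RMS threshold (4.0), and its high-RMS threshold (24.0) is then reused, mirroring the simplification described in the note above.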

tkp.db.associations._empty_temprunningcatalog()[source]

Initialize the temporary storage table

Initialize the temporary table temprunningcatalog which contains the current observed sources.

tkp.db.associations._flag_1_to_many_inactive_runcat()[source]

Flag the old runcat ids in the runningcatalog to inactive

We do not delete them yet, because we still need to clear up all the superseded entries in assocskyrgn, etc.

tkp.db.associations._flag_1_to_many_inactive_tempruncat()[source]

Flag the one-to-many associations from temprunningcatalog.

(Since we are done processing them, now.)

We do not delete them yet; if we did, we would not be able to cross-match extractedsources to determine which sources did not have a match in temprunningcatalog (‘new’ sources).

tkp.db.associations._flag_many_to_many_tempruncat()[source]

Select the many-to-many association pairs in temprunningcatalog.

By flagging the many-to-many associations, we reduce the processing to one-to-many and many-to-one (identical to one-to-one) relationships.
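The relationship types can be told apart by counting how often each id occurs among the candidate pairs. The sketch below assumes that a pair counts as many-to-many when its runcat source has several extractedsource candidates and its extractedsource has several runcat candidates; that definition and the function name are assumptions, not taken from the SQL.

```python
from collections import Counter


def classify_pairs(pairs):
    """Label each (runcat_id, xtrsrc_id) pair by its association type."""
    per_runcat = Counter(runcat for runcat, _ in pairs)
    per_xtrsrc = Counter(xtrsrc for _, xtrsrc in pairs)
    labels = {}
    for runcat, xtrsrc in pairs:
        several_xtrsrcs = per_runcat[runcat] > 1
        several_runcats = per_xtrsrc[xtrsrc] > 1
        if several_xtrsrcs and several_runcats:
            labels[(runcat, xtrsrc)] = "many-to-many"
        elif several_xtrsrcs:
            labels[(runcat, xtrsrc)] = "one-to-many"
        elif several_runcats:
            labels[(runcat, xtrsrc)] = "many-to-one"
        else:
            labels[(runcat, xtrsrc)] = "one-to-one"
    return labels


pairs = [(1, 10), (1, 11), (2, 11), (3, 12)]
for pair, label in classify_pairs(pairs).items():
    print(pair, label)
```

Once the many-to-many pairs are set aside, every remaining pair falls into one of the other three categories.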

tkp.db.associations._insert_1_to_1_assoc()[source]

Insert remaining associations from temprunningcatalog into assocxtrsource.

We also calculate the variability indices at the timestamp of the current image.
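The text does not spell out how the variability indices are defined. As a hedged illustration, the sketch below assumes two commonly used definitions computed from running averages: V, the sample standard deviation divided by the mean flux, and eta, the reduced chi-squared of the fluxes against their weighted mean. Whether the SQL uses exactly these expressions is an assumption, and all names are illustrative.

```python
import math


def variability_indices(n, avg_f, avg_f_sq, avg_f_weight,
                        avg_weighted_f, avg_weighted_f_sq):
    """Return (V, eta) from running averages over n flux measurements.

    avg_f_weight      = mean(1/error^2)
    avg_weighted_f    = mean(f/error^2)
    avg_weighted_f_sq = mean(f^2/error^2)
    """
    if n < 2:
        # Both indices are undefined for a single datapoint.
        return None, None
    factor = n / (n - 1.0)
    # Clamp at zero to guard against tiny negative rounding errors.
    variance = max(factor * (avg_f_sq - avg_f ** 2), 0.0)
    v = math.sqrt(variance) / avg_f
    eta = factor * (avg_weighted_f_sq - avg_weighted_f ** 2 / avg_f_weight)
    return v, eta


# Fluxes 1, 2, 3 with unit errors.
v, eta = variability_indices(3, 2.0, 14.0 / 3.0, 1.0, 2.0, 14.0 / 3.0)
print(v, eta)
```

Working from averages rather than individual measurements means the indices can be evaluated at each timestamp without re-reading earlier fluxes.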

tkp.db.associations._insert_1_to_1_runcat_flux()[source]

Insert the fluxes in runningcatalog_flux of a new band for an existing runcat source.

If the runcat, band, stokes entry does not exist (yet) in runcat_flux, we need to insert the new values from tempruncat. This might be the case if a source has been observed at other frequencies, but not in the current band, so there does not exist an entry for this band.

tkp.db.associations._insert_1_to_many_assocskyrgn()[source]

Copy skyregion associations from old runcat entries for new one-to-many runningcatalog entries.

tkp.db.associations._insert_1_to_many_basepoint_assocxtrsource()[source]

Insert ‘base points’ for one-to-many associations

Before continuing, we have to insert the ‘base points’ of the associations, i.e. the links between the new runningcatalog entries and their associated (new) extractedsources.

We also calculate the variability indices at the timestamp of the current image.

tkp.db.associations._insert_1_to_many_newsource()[source]

Update the runcat id for the one-to-many associations, and delete the newsource entries of the old runcat id (the new ones have been added earlier).

In this case, new entries in the runningcatalog and runningcatalog_flux were already added (for every extractedsource one), which will replace the existing ones in the runningcatalog. Therefore, we have to update the references to these new ids as well.

tkp.db.associations._insert_1_to_many_replacement_assocxtrsource()[source]

Insert links into the association table between the new runcat entries and the old extractedsources. (New to New (‘basepoint’) links have been added earlier).

In this case, new entries in the runningcatalog and runningcatalog_flux were already added (for every extractedsource one), which will replace the existing ones in the runningcatalog. Therefore, we have to update the references to these new ids as well. So, we will append to assocxtrsource and delete the entries from runningcatalog_flux.

NOTE: 1. We do not update the distance_arcsec and r values of the pairs.

TODO: 1. Why not?

tkp.db.associations._insert_1_to_many_runcat()[source]

Insert the extracted sources that belong to one-to-many associations in the runningcatalog.

Since for the one-to-many associations (i.e. one runcat source associated with multiple extracted sources) we cannot a priori decide which counterpart pair is the correct one, or whether all are correct (in the case of a higher-resolution image), all extracted sources are added as a new source to the runningcatalog, and they will replace the (old; lower resolution) runcat source of the association.
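The replacement step can be pictured with a small in-memory sketch. It stands in for the database tables with plain Python containers; `replace_one_to_many` and the id generator are hypothetical, and real runcat ids are assigned by the database.

```python
from collections import defaultdict
from itertools import count


def replace_one_to_many(runcat_ids, pairs, id_source):
    """Replace each runcat source that has several extractedsource matches.

    Returns (new list of runcat ids, old id -> new ids, new id -> xtrsrc id).
    """
    by_runcat = defaultdict(list)
    for runcat, xtrsrc in pairs:
        by_runcat[runcat].append(xtrsrc)

    replaced = {}      # old runcat id -> list of new runcat ids
    new_entries = {}   # new runcat id -> extractedsource it was built from
    for runcat, xtrsrcs in by_runcat.items():
        if len(xtrsrcs) > 1:
            for xtrsrc in xtrsrcs:
                new_id = next(id_source)
                new_entries[new_id] = xtrsrc
                replaced.setdefault(runcat, []).append(new_id)

    remaining = [r for r in runcat_ids if r not in replaced]
    return remaining + list(new_entries), replaced, new_entries


catalog, replaced, new_entries = replace_one_to_many(
    [1, 2], [(1, 10), (1, 11), (2, 12)], count(100))
print(catalog, replaced, new_entries)
```

Here runcat source 1 matched two extracted sources, so it is superseded by two new entries, while source 2 keeps its id.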

As a consequence of this, the resolution of the runningcatalog is increasing over time.

tkp.db.associations._insert_1_to_many_runcat_flux()[source]

Insert the fluxes of the extracted sources that belong to a one-to-many association in the runningcatalog.

Analogous to the runningcatalog, extracted source properties are added to the runningcatalog_flux table.

tkp.db.associations._insert_new_assocxtrsource(image_id)[source]

Insert new associations for previously unknown sources.

tkp.db.associations._insert_new_runcat(image_id)[source]

Insert previously unknown sources into the runningcatalog table.

Extractedsources for which no counterpart was found in the runningcatalog (i.e. no pair exists in tempruncat) will be added as new sources to the assocxtrsource, runningcatalog and runningcatalog_flux tables.

tkp.db.associations._insert_new_runcat_flux(image_id)[source]

Insert previously unknown sources into the runningcatalog_flux table.

(i.e. those without any previous runcat-counterpart)

tkp.db.associations._insert_new_runcat_skyrgn_assocs(image_id)[source]

Process newly created entries from the runningcatalog, determine which skyregions they lie within.

Upon creation of a new runningcatalog entry, we need to determine which previous fields of view (skyrgns) we expect to see it in. This knowledge helps us to make accurate guesses as to whether a new source is really transient or simply being surveyed for the first time.

tkp.db.associations._insert_temprunningcatalog(image_id, deRuiter_r, beamwidths_limit, meridian_wrap)[source]

Select matched sources

Here we select the extractedsources that have a positional match with the sources in the running catalogue table (runningcatalog). Those sources which do have a potential match will be inserted into the temporary running catalogue table (temprunningcatalog).

See also: http://docs.transientskp.org/tkp/database/schema.html#temprunningcatalog

Explanation of some column name prefixes/suffixes used in the SQL query:

  • avg_X := average of X
  • avg_X_sq := average of X^2
  • avg_weight_X := average of weight of X, i.e. mean( 1/error^2 )
  • avg_weighted_X := average of weighted X,
    i.e. mean(X/error^2)
  • avg_weighted_X_sq := average of weighted X^2,
    i.e. mean(X^2/error^2)
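The prefixes listed above can be illustrated with a short Python sketch. The function names are hypothetical, and the incremental update shown is an assumption about how such averages are typically maintained rather than a transcription of the SQL.

```python
def running_averages(values, errors):
    """Compute the averaged quantities for a list of measurements of X."""
    n = len(values)
    weights = [1.0 / e ** 2 for e in errors]
    return {
        "datapoints": n,
        "avg_X": sum(values) / n,
        "avg_X_sq": sum(v ** 2 for v in values) / n,
        "avg_weight_X": sum(weights) / n,
        "avg_weighted_X": sum(w * v for w, v in zip(weights, values)) / n,
        "avg_weighted_X_sq": sum(w * v ** 2
                                 for w, v in zip(weights, values)) / n,
    }


def weighted_mean(avgs):
    """Weighted mean of X recovered from the stored averages."""
    return avgs["avg_weighted_X"] / avgs["avg_weight_X"]


def add_datapoint(avg, n, new_value):
    """Update an average over n datapoints with one new value."""
    return (n * avg + new_value) / (n + 1)


avgs = running_averages([1.0, 3.0], [1.0, 0.5])
print(weighted_mean(avgs))
print(add_datapoint(avgs["avg_X"], avgs["datapoints"], 5.0))
```

Storing averages together with the number of datapoints lets each quantity be updated from a single new measurement, without revisiting earlier ones.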

This result set might contain multiple associations (1-n,n-1) for a single known source in runningcatalog.

The n-1 associations will be treated in the same way as n 1-1 associations.

NOTE: Beware of the extra condition on x0.image in the WHERE clause, which prevents the query's response time from growing exponentially.

tkp.db.associations._update_1_to_1_runcat()[source]

Update the running catalog with the values in temprunningcatalog

tkp.db.associations._update_1_to_1_runcat_flux()[source]

Updates the fluxes in runningcatalog_flux of an existing band for an existing runcat source.

If the runcat, band, stokes entry does exist in runcat_flux, it will be updated with the values from tempruncat.

tkp.db.associations._update_ff_runcat_extractedsource()[source]

We are about to delete the runcats that are inactivated, and therefore have to set the ff_runcat reference in extractedsource to NULL.

tkp.db.associations.associate_extracted_sources(image_id, deRuiter_r, beamwidths_limit=1, new_source_sigma_margin=3)[source]

Associate extracted sources with sources detected in the running catalog.

See the “developer’s reference” section of the docs for a step-by-step breakdown of the logic encapsulated here.

The dimensionless distance between two sources is given by the “De Ruiter radius”, see Chapters 2 & 3 of Scheers’ thesis.
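As a rough illustration, the De Ruiter radius is usually written as the positional offset in each coordinate divided by the combined positional uncertainty, added in quadrature. The sketch below assumes that form, with all inputs in degrees; the function name and argument layout are illustrative and do not reflect the units or column names used in the database.

```python
import math


def de_ruiter_radius(ra1, decl1, ra_err1, decl_err1,
                     ra2, decl2, ra_err2, decl_err2):
    """Dimensionless distance between two positions (inputs in degrees)."""
    # Wrap the ra difference into [-180, 180) so pairs straddling
    # ra = 0/360 are handled correctly.
    d_ra = (ra1 - ra2 + 180.0) % 360.0 - 180.0
    # Scale the ra offset by cos(decl) to get an on-sky distance.
    mean_decl = math.radians((decl1 + decl2) / 2.0)
    d_ra *= math.cos(mean_decl)
    d_decl = decl1 - decl2
    return math.sqrt(d_ra ** 2 / (ra_err1 ** 2 + ra_err2 ** 2)
                     + d_decl ** 2 / (decl_err1 ** 2 + decl_err2 ** 2))


r = de_ruiter_radius(0.003, 0.0, 0.001, 0.001,
                     0.0, 0.0, 0.001, 0.001)
print(r)
```

A pair would then be accepted as an association candidate when this value is below the `deRuiter_r` limit passed to `associate_extracted_sources`, assuming the limit is applied as a simple upper bound.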