Storing hits with HitDatabase
Hyperseti implements a simple HDF5-backed database for storing hits from multiple observations. These files can be accessed via the hyperseti.io.hit_db.HitDatabase class.
The HitDatabase class provides:
hit_db.list_obs() - List all observations within the database file.
hit_db.add_obs() - Add a new observation to the database.
hit_db.get_obs() - Retrieve the hit data table from the database.
hit_db.get_obs_schema() - Retrieve a description of the columns, based on the hit_db schema.
hit_db.get_obs_config() - Retrieve the observation pipeline config YAML (if present).
hit_db.get_obs_metadata() - Retrieve metadata about an observation.
hit_db.browse_obs() - Return a HitBrowser object for a given obs_id.
The last of these, browse_obs(), is probably the most useful method for day-to-day data analysis.
Data model notes:
Within the HDF5 file, a new group is created in the root for each observation. Table data are a column store, one dataset per column, i.e.:
GROUP "/" {
    GROUP "proxima_cen" {
        DATASET "beam_idx"
        DATASET "boxcar_size"
        DATASET "snr"
        DATASET ...
    }
    GROUP "alpha_cen" {
        DATASET "beam_idx"
        DATASET "boxcar_size"
        DATASET "snr"
        DATASET ...
    }
    GROUP ... {
        DATASET ...
    }
}
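The column-store layout can be pictured with plain Python containers. The sketch below is an illustration only: nested dicts stand in for HDF5 groups and datasets, the values are invented, and `column_store_to_rows` is a hypothetical helper that is not part of hyperseti.

```python
# Illustrative stand-in for the HDF5 layout: one dict per observation group,
# one list per column dataset. All numbers are invented for the example.
db = {
    "proxima_cen": {
        "beam_idx": [0, 0, 1],
        "boxcar_size": [1, 4, 1],
        "snr": [12.5, 31.0, 10.2],
    },
    "alpha_cen": {
        "beam_idx": [0],
        "boxcar_size": [2],
        "snr": [18.7],
    },
}


def column_store_to_rows(columns):
    """Turn {column: [values]} into a list of per-hit dicts (hypothetical helper)."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same number of entries")
    names = list(columns)
    return [dict(zip(names, row)) for row in zip(*columns.values())]


# Each row is one hit; entry i of every column belongs to the same hit.
rows = column_store_to_rows(db["proxima_cen"])
strong_hits = [row for row in rows if row["snr"] > 12.0]
```

Because each column is a separate dataset, reading a single quantity such as snr does not require loading the other columns.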
The file format can be identified by the following attributes in the root group:
CLASS = 'HYPERSETI_DB'
VERSION = 'X.Y.Z'
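A check for these attributes might look like the following sketch. The function names are hypothetical, and the code works on any dict-like mapping of root attributes; a plain dict is used here so the example runs without an HDF5 library. It also assumes that VERSION holds dot-separated integers, which the 'X.Y.Z' pattern suggests but does not guarantee.

```python
def is_hyperseti_db(root_attrs):
    """Return True if the root-group attributes carry the HYPERSETI_DB marker."""
    value = root_attrs.get("CLASS")
    if isinstance(value, bytes):  # string attributes may be returned as bytes
        value = value.decode("utf-8")
    return value == "HYPERSETI_DB"


def parse_version(root_attrs):
    """Split a VERSION attribute such as '1.0.0' into a tuple of ints.

    Assumes dot-separated integer components.
    """
    value = root_attrs["VERSION"]
    if isinstance(value, bytes):
        value = value.decode("utf-8")
    return tuple(int(part) for part in value.split("."))


# Example attribute values; a real file supplies its own VERSION string.
attrs = {"CLASS": "HYPERSETI_DB", "VERSION": "1.0.0"}

if is_hyperseti_db(attrs):
    major, minor, patch = parse_version(attrs)
```

Returning the version as a tuple allows simple comparisons such as `parse_version(attrs) >= (1, 0, 0)`.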
The database may optionally provide paths to data files for each observation, and a copy of the YAML pipeline config used when running the pipeline. Any additional metadata are stored as attributes.
Database schema
A description of the hit database columns is provided in hyperseti.io.hit_db_schema.yml, reproduced below:
# Version 1.0.0
# Hit details - primary
snr:
  description: Signal-to-noise ratio of detected hit
  dtype: float32
  mandatory: true
f_start:
  description: Frequency of detected hit at first timestep (MHz)
  dtype: float64
  mandatory: true
  units: MHz
drift_rate:
  description: Drift rate trial for detected hit (Hz/s)
  dtype: float64
  units: Hz/s
  mandatory: true
# Primary data indexes
channel_idx:
  description: Index of hit on channel axis
  dtype: int32
  mandatory: true
gulp_channel_idx:
  description: Index of hit within gulp, on channel axis
  dtype: int32
  mandatory: true
beam_idx:
  description: Index of hit within gulp, on beam/pol axis
  dtype: int32
  mandatory: true
driftrate_idx:
  description: Index of hit within gulp, on drift rate axis
  dtype: int32
  mandatory: true
# Hit details - secondary
extent_lower:
  description: Extent of detected signal in doppler space, lower extent (# channels)
  dtype: int32
  mandatory: true
extent_upper:
  description: Extent of detected signal in doppler space, upper extent (# channels)
  dtype: int32
  mandatory: true
ddsk:
  description: Spectral Kurtosis computed along dedoppler path (DDSK)
  dtype: float32
  mandatory: false
# Gulp information
gulp_idx:
  description: Index of gulp from within the data array
  dtype: int64
  mandatory: true
gulp_size:
  description: Size of each gulp
  dtype: int64
  mandatory: true
hits_in_gulp:
  description: Number of hits detected in gulp
  dtype: int32
  mandatory: false
# Pipeline information
n_integration:
  description: Number of timestep within the file
  dtype: int32
  mandatory: true
n_blank:
  description: Number of signal blanking iterations in search pipeline
  dtype: int32
  mandatory: false
blank_count:
  description: Count (index) for signal blanking iteration
  dtype: int32
  mandatory: false
boxcar_size:
  description: Size of boxcar mean filter used before hit search
  dtype: int32
  mandatory: true
# Preprocessing information
# Placehold vars: X runs from [0, N_beam], Y runs from [0, N_poly_coeffs]
n_poly:
  description: Number of coefficients in polynomial bandpass fit
  dtype: int32
  mandatory: false
bX_gulp_mean:
  description: Mean computed during preprocessing normalization for beam X (bX)
  dtype: float32
  mandatory: false
bX_gulp_std:
  description: Stdev computed during preprocessing normalization for beam X (bX)
  dtype: float32
  mandatory: false
bX_gulp_flag_frac:
  description: Fraction of data flagged with SK flagger during normalization for beam X (bX)
  dtype: float32
  mandatory: false
bX_gulp_poly_cY:
  description: Polynomial fit coefficient Y (cY), beam X (bX)
  dtype: float32
  mandatory: false
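One use of the schema is to check that a hit table contains every mandatory column. The sketch below works on a hand-copied subset of the schema held in a plain dict, so it needs no YAML parser. The helper names are hypothetical and not part of hyperseti, and the concrete column names (for example `b1_gulp_poly_c2`) assume that the X and Y placeholders are replaced by integers.

```python
import re

# Hand-copied subset of the schema above, reduced to dtype and mandatory flag.
SCHEMA_SUBSET = {
    "snr": {"dtype": "float32", "mandatory": True},
    "f_start": {"dtype": "float64", "mandatory": True},
    "drift_rate": {"dtype": "float64", "mandatory": True},
    "ddsk": {"dtype": "float32", "mandatory": False},
    "bX_gulp_mean": {"dtype": "float32", "mandatory": False},
    "bX_gulp_poly_cY": {"dtype": "float32", "mandatory": False},
}


def missing_mandatory(column_names, schema):
    """List mandatory schema columns that are absent from column_names."""
    present = set(column_names)
    return sorted(
        name for name, spec in schema.items()
        if spec["mandatory"] and name not in present
    )


def schema_key_for(column_name):
    """Map a concrete name such as 'b1_gulp_poly_c2' to its placeholder key.

    Assumes X and Y are replaced by integers in real column names.
    """
    key = re.sub(r"^b\d+_", "bX_", column_name)
    return re.sub(r"_c\d+$", "_cY", key)


# Column names as they might appear in one observation group (invented).
columns = ["snr", "f_start", "b0_gulp_mean", "b1_gulp_poly_c2"]
missing = missing_mandatory(columns, SCHEMA_SUBSET)
unknown = [c for c in columns if schema_key_for(c) not in SCHEMA_SUBSET]
```

Here `missing` reports the mandatory columns that are absent, and `unknown` collects any column that has no schema entry once placeholders are taken into account.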