setigen API reference

setigen.frame module

class setigen.frame.Frame(waterfall=None, fchans=None, tchans=None, df=None, dt=None, fch1=<Quantity 8. GHz>, ascending=False, data=None)[source]

Bases: object

Facilitate the creation of entirely synthetic radio data (narrowband signals + background noise) as well as signal injection into existing observations.

__init__(waterfall=None, fchans=None, tchans=None, df=None, dt=None, fch1=<Quantity 8. GHz>, ascending=False, data=None)[source]

Initialize a Frame object either from an existing .fil/.h5 file or from frame resolution / size.

If you are initializing based on a .fil or .h5, pass in either the filename or the Waterfall object into the waterfall keyword.

Otherwise, you can initialize a frame by specifying the parameters fchans, tchans, df, dt, and potentially fch1, if it’s important to specify frequencies (8*u.GHz is an arbitrary but reasonable choice otherwise). The data keyword is only necessary if you are also preloading data that matches your specified frame dimensions and resolutions.

Parameters:
  • waterfall (str or Waterfall, optional) – Name of filterbank file or Waterfall object for preloading data
  • fchans (int, optional) – Number of frequency samples
  • tchans (int, optional) – Number of time samples
  • df (astropy.Quantity, optional) – Frequency resolution (e.g. in u.Hz)
  • dt (astropy.Quantity, optional) – Time resolution (e.g. in u.s)
  • fch1 (astropy.Quantity, optional) – Frequency of channel 1, as in filterbank file headers (e.g. in u.Hz). If ascending=True, fch1 is the minimum frequency; if ascending=False (default), fch1 is the maximum frequency.
  • ascending (bool, optional) – Specify whether frequencies should be in ascending order, so that fch1 is the minimum frequency. Default is False, for which fch1 is the maximum frequency. This is overwritten if a waterfall object is provided, where ascending will be automatically determined by observational parameters.
  • data (ndarray, optional) – 2D array of intensities to preload into frame
classmethod from_data(df, dt, fch1, ascending, data)[source]
classmethod from_waterfall(waterfall)[source]
zero_data()[source]

Resets data to a numpy array of zeros.

mean()[source]
std()[source]
get_total_stats()[source]
get_noise_stats()[source]
add_noise(x_mean, x_std=None, x_min=None, noise_type='chi2')[source]

By default, synthesizes radiometer noise based on a chi-squared distribution. Alternately, can generate pure Gaussian noise.

Specifying noise_type=’chi2’ will only use x_mean, and ignore other parameters. Specifying noise_type=’normal’ or ‘gaussian’ will use all arguments (if provided).

When adding Gaussian noise to the frame, the minimum is simply a lower bound for intensities in the data (e.g. it may make sense to cap intensities at 0), but this is optional.

add_noise_from_obs(x_mean_array=None, x_std_array=None, x_min_array=None, share_index=True, noise_type='chi2')[source]

By default, synthesizes radiometer noise based on a chi-squared distribution. Alternately, can generate pure Gaussian noise.

If no arrays are specified from which to sample, noise samples will be drawn from saved GBT C-Band observations at (dt, df) = (1.4 s, 1.4 Hz) resolution, from frames of shape (tchans, fchans) = (32, 1024). These sample noise parameters consist of 126500 samples for mean, std, and min of each observation.

Specifying noise_type=’chi2’ will only use x_mean_array (if provided), and ignore other parameters. Specifying noise_type=’normal’ will use all arrays (if provided).

Note: this method will attempt to scale the noise parameters to match self.dt and self.df. This assumes that the observation data products are not normalized by the FFT length used to contstruct them.

Parameters:
  • x_mean_array (ndarray) – Array of potential means
  • x_std_array (ndarray) – Array of potential standard deviations
  • x_min_array (ndarray, optional) – Array of potential minimum values
  • share_index (bool) – Whether to select noise parameters from the same index across each provided array. If share_index is True, then each array must be the same length.
  • noise_type (string) – Distribution to use for synthetic noise; ‘chi2’, ‘normal’, ‘gaussian’
add_signal(path, t_profile, f_profile, bp_profile, bounding_f_range=None, integrate_path=False, integrate_t_profile=False, integrate_f_profile=False, t_subsamples=10, f_subsamples=10)[source]

Generates synthetic signal.

Adds a synethic signal using given path in time-frequency domain and brightness profiles in time and frequency directions.

Parameters:
  • path (function, np.ndarray, list, float) – Function in time that returns frequencies, or provided array or single value of frequencies for the center of the signal at each time sample
  • t_profile (function, np.ndarray, list, float) – Time profile: function in time that returns an intensity (scalar), or provided array or single value of intensities at each time sample
  • f_profile (function) – Frequency profile: function in frequency that returns an intensity (scalar), relative to the signal frequency within a time sample. Note that unlike the other parameters, this must be a function
  • bp_profile (function, np.ndarray, list, float) – Bandpass profile: function in frequency that returns a relative intensity (scalar, between 0 and 1), or provided array or single value of relative intensities at each frequency sample
  • bounding_f_range (tuple) – Tuple (bounding_min, bounding_max) that constrains the computation of the signal to only a range in frequencies
  • integrate_path (bool, optional) – Option to average path along time to get a more accurate frequency position in t-f space. Note that this option only makes sense if the provided path can be evaluated at the sub frequency sample level (e.g. as opposed to returning a pre-computed array of frequencies of length tchans). Makes t_subsamples calculations per time sample.
  • integrate_t_profile (bool, optional) – Option to integrate t_profile in the time direction. Note that this option only makes sense if the provided t_profile can be evaluated at the sub time sample level (e.g. as opposed to returning an array of intensities of length tchans). Makes t_subsamples calculations per time sample.
  • integrate_f_profile (bool, optional) – Option to integrate f_profile in the frequency direction. Makes f_subsamples calculations per time sample.
  • t_subsamples (int, optional) – Number of bins for integration in the time direction, using Riemann sums
  • f_subsamples (int, optional) – Number of bins for integration in the frequency direction, using Riemann sums
Returns:

signal – Two-dimensional NumPy array containing synthetic signal data

Return type:

ndarray

Examples

Here’s an example that creates a linear Doppler-drifted signal with chi-squared noise with sampled parameters:

>>> from astropy import units as u
>>> import setigen as stg
>>> fchans = 1024
>>> tchans = 32
>>> df = 2.7939677238464355*u.Hz
>>> dt = tsamp = 18.253611008*u.s
>>> fch1 = 6095.214842353016*u.MHz
>>> frame = stg.Frame(fchans=fchans,
                      tchans=tchans,
                      df=df,
                      dt=dt,
                      fch1=fch1)
>>> noise = frame.add_noise(x_mean=10)
>>> signal = frame.add_signal(stg.constant_path(f_start=frame.get_frequency(200),
                                                drift_rate=2*u.Hz/u.s),
                              stg.constant_t_profile(level=frame.get_intensity(snr=30)),
                              stg.gaussian_f_profile(width=40*u.Hz),
                              stg.constant_bp_profile(level=1))

Saving the noise and signals individually may be useful depending on the application, but the combined data can be accessed via frame.get_data(). The synthetic signal can then be visualized and saved within a Jupyter notebook using:

>>> %matplotlib inline
>>> import matplotlib.pyplot as plt
>>> fig = plt.figure(figsize=(10, 6))
>>> frame.render()
>>> plt.savefig('image.png', bbox_inches='tight')
>>> plt.show()

To run within a script, simply exclude the first line: %matplotlib inline.

add_constant_signal(f_start, drift_rate, level, width, f_profile_type='gaussian')[source]

A wrapper around add_signal() that injects a constant intensity, constant drift_rate signal into the frame.

Parameters:
  • f_start (astropy.Quantity) – Starting signal frequency
  • drift_rate (astropy.Quantity) – Signal drift rate, in units of frequency per time
  • level (float) – Signal intensity
  • width (astropy.Quantity) – Signal width in frequency units
  • f_profile_type (str) – Can be ‘box’, ‘sinc2’, ‘gaussian’, ‘lorentzian’, or ‘voigt’, based on the desired spectral profile
Returns:

signal – Two-dimensional NumPy array containing synthetic signal data

Return type:

ndarray

get_index(frequency)[source]

Convert frequency to closest index in frame.

get_frequency(index)[source]

Convert index to frequency.

get_intensity(snr)[source]

Calculates intensity from SNR, based on estimates of the noise in the frame.

Note that there must be noise present in the frame for this to make sense.

get_snr(intensity)[source]

Calculate SNR from intensity.

Note that there must be noise present in the frame for this to make sense.

get_drift_rate(start_index, end_index)[source]
get_info()[source]
get_data(use_db=False)[source]
get_metadata()[source]
set_metadata(new_metadata)[source]

Set custom metadata using a dictionary new_metadata.

update_metadata(new_metadata)[source]

Append custom metadata using a dictionary new_metadata.

add_metadata(new_metadata)[source]
render(use_db=False)[source]
bl_render(use_db=True)[source]
get_waterfall()[source]

Return current frame as a Waterfall object. Note: some filterbank metadata may not be accurate anymore, depending on prior frame manipulations.

save_fil(filename, max_load=1)[source]

Save frame data as a filterbank file (.fil).

save_hdf5(filename, max_load=1)[source]

Save frame data as an HDF5 file.

save_h5(filename, max_load=1)[source]

Save frame data as an HDF5 file.

save_npy(filename)[source]

Save frame data as an .npy file.

load_npy(filename)[source]

Load frame data from a .npy file.

save_pickle(filename)[source]

Save entire frame as a pickled file (.pickle).

classmethod load_pickle(filename)[source]

Load Frame object from a pickled file (.pickle), created with Frame.save_pickle.

setigen.split_utils module

setigen.split_utils.split_waterfall_generator(waterfall_fn, fchans, tchans=None, f_shift=None)[source]

Creates a generator that returns smaller Waterfall objects by ‘splitting’ an input filterbank file according to the number of frequency samples.

Since this function only loads in data in chunks according to fchans, it handles very large observations well. Specifically, it will not attempt to load all the data into memory before splitting, which won’t work when the data is very large anyway.

Parameters:
  • waterfall_fn (str) – Filterbank filename with .fil extension
  • fchans (int) – Number of frequency samples per new filterbank file
  • tchans (int, optional) – Number of time samples to select - will default from start of observation. If None, just uses the entire integration time
  • f_shift (int, optional) – Number of samples to shift when splitting filterbank. If None, defaults to f_shift=fchans so that there is no overlap between new filterbank files
Returns:

split – A blimpy Waterfall object containing a smaller section of the data

Return type:

Waterfall

setigen.split_utils.split_fil(waterfall_fn, output_dir, fchans, tchans=None, f_shift=None)[source]

Creates a set of new filterbank files by ‘splitting’ an input filterbank file according to the number of frequency samples.

Parameters:
  • waterfall_fn (str) – Filterbank filename with .fil extension
  • output_dir (str) – Directory for new filterbank files
  • fchans (int) – Number of frequency samples per new filterbank file
  • tchans (int, optional) – Number of time samples to select - will default from start of observation. If None, just uses the entire integration time
  • f_shift (int, optional) – Number of samples to shift when splitting filterbank. If None, defaults to f_shift=fchans so that there is no overlap between new filterbank files
Returns:

split_fns – List of new filenames

Return type:

list of str

setigen.split_utils.split_array(data, f_sample_num=None, t_sample_num=None, f_shift=None, t_shift=None, f_trim=False, t_trim=False)[source]

Splits NumPy arrays into a list of smaller arrays according to limits in frequency and time. This doesn’t reduce/combine data, it simply cuts the data into smaller chunks.

Parameters:data (ndarray) – Time-frequency data
Returns:split_data – List of new time-frequency data frames
Return type:list of ndarray

setigen.distributions module

setigen.distributions.gaussian(x_mean, x_std, shape)[source]
setigen.distributions.truncated_gaussian(x_mean, x_std, x_min, shape)[source]

Samples from a normal distribution, but enforces a minimum value.

setigen.distributions.chi2(x_mean, chi2_df, shape)[source]

Chi-squared distribution centered at a specific mean.

Parameters:
  • x_mean (float) –
  • chi2_df (int) – Degrees of freedom for chi-squared
  • shape (list) – Shape of output noise array
Returns:

dist – Array of chi-squared noise

Return type:

ndarray

setigen.waterfall_utils module

setigen.waterfall_utils.max_freq(waterfall)[source]

Returns central frequency of the highest-frequency bin in a .fil file.

Parameters:waterfall (str or Waterfall) – Name of filterbank file or Waterfall object
Returns:fmax – Maximum frequency in data
Return type:float
setigen.waterfall_utils.min_freq(waterfall)[source]

Returns central frequency of the lowest-frequency bin in a .fil file.

Parameters:waterfall (str or Waterfall) – Name of filterbank file or Waterfall object
Returns:fmin – Minimum frequency in data
Return type:float
setigen.waterfall_utils.get_data(waterfall, use_db=False)[source]

Gets time-frequency data from filterbank file as a 2d NumPy array.

Note: when multiple Stokes parameters are supported, this will have to be expanded.

Parameters:waterfall (str or Waterfall) – Name of filterbank file or Waterfall object
Returns:data – Time-frequency data
Return type:ndarray
setigen.waterfall_utils.get_fs(waterfall)[source]

Gets frequency values from filterbank file.

Parameters:waterfall (str or Waterfall) – Name of filterbank file or Waterfall object
Returns:fs – Frequency values
Return type:ndarray
setigen.waterfall_utils.get_ts(waterfall)[source]

Gets time values from filterbank file.

Parameters:waterfall (str or Waterfall) – Name of filterbank file or Waterfall object
Returns:ts – Time values
Return type:ndarray

setigen.sample_from_obs module

setigen.sample_from_obs.sample_gaussian_params(x_mean_array, x_std_array, x_min_array=None)[source]

Sample Gaussian parameters (mean, std, min) from provided arrays.

Typical usage would be for select Gaussian noise properties for injection into data frames.

Parameters:
  • x_mean_array (ndarray) – Array of potential means
  • x_std_array (ndarray) – Array of potential standard deviations
  • x_min_array (ndarray, optional) – Array of potential minimum values
Returns:

  • x_mean – Selected mean of distribution
  • x_std – Selected standard deviation of distribution
  • x_min – If x_min_array provided, selected minimum of distribution

setigen.sample_from_obs.get_parameter_distributions(waterfall_fn, fchans, tchans=None, f_shift=None)[source]

Calculate parameter distributions for the mean, standard deviation, and minimum of split filterbank data from real observations.

Parameters:
  • waterfall_fn (str) – Filterbank filename with .fil extension
  • fchans (int) – Number of frequency samples per new filterbank file
  • tchans (int, optional) – Number of time samples to select - will default from start of observation. If None, just uses the entire integration time
  • f_shift (int, optional) – Number of samples to shift when splitting filterbank. If None, defaults to f_shift=f_window so that there is no overlap between new filterbank files
Returns:

  • x_mean_array – Distribution of means calculated from observations
  • x_std_array – Distribution of standard deviations calculated from observations
  • x_min_array – Distribution of minimums calculated from observations

setigen.sample_from_obs.get_mean_distribution(waterfall_fn, fchans, tchans=None, f_shift=None)[source]

Calculate parameter distributions for the mean of split filterbank frames from real observations.

Parameters:
  • waterfall_fn (str) – Filterbank filename with .fil extension
  • fchans (int) – Number of frequency samples per new filterbank file
  • tchans (int, optional) – Number of time samples to select - will default from start of observation. If None, just uses the entire integration time
  • f_shift (int, optional) – Number of samples to shift when splitting filterbank. If None, defaults to f_shift=f_window so that there is no overlap between new filterbank files
Returns:

Distribution of means calculated from observations

Return type:

x_mean_array

setigen.stats module

setigen.stats.exclude_and_flatten(data, exclude=0)[source]
setigen.stats.get_mean(data, exclude=0)[source]
setigen.stats.get_std(data, exclude=0)[source]
setigen.stats.get_min(data, exclude=0)[source]
setigen.stats.compute_frame_stats(data, exclude=0)[source]

setigen.time_freq_utils module

setigen.time_freq_utils.db(x)[source]

Converts to dB.

setigen.time_freq_utils.integrate_frame(frame, normalize=False)[source]

Integrate over time using mean (not sum).

setigen.time_freq_utils.integrate_frame_subdata(data, frame=None, normalize=False)[source]

Integrate a chunk of data assuming frame statistics using mean (not sum).

setigen.time_freq_utils.normalize(data, cols=0, exclude=0.0, to_db=False, use_median=False)[source]

Normalize data per frequency channel so that the noise level in data is controlled; using mean or median filter.

Uses a sliding window to calculate mean and standard deviation to preserve non-drifted signals. Excludes a fraction of brightest pixels to better isolate noise.

Parameters:
  • data (ndarray) – Time-frequency data
  • cols (int) – Number of columns on either side of the current frequency bin. The width of the sliding window is thus 2 * cols + 1
  • exclude (float, optional) – Fraction of brightest samples in each frequency bin to exclude in calculating mean and standard deviation
  • to_db (bool, optional) – Convert values to decibel equivalents before normalization
  • use_median (bool, optional) – Use median and median absolute deviation instead of mean and standard deviation
Returns:

normalized_data – Normalized data

Return type:

ndarray

setigen.time_freq_utils.normalize_by_max(data)[source]

Simple normalization by dividing out by the brightest pixel.

setigen.unit_utils module

This module contains a couple unit conversion utilities used in frame.Frame.

In general, we rely on astropy units for conversions, and note that float values are assumed to be in SI units (e.g. Hz, s).

setigen.unit_utils.cast_value(value, unit)[source]

If value is already an astropy Quantity, then cast it into the desired unit. Otherwise, value is assumed to be a float and converted directly to the desired unit.

setigen.unit_utils.get_value(value, unit=None)[source]

This function converts a value, which may be a float or astropy Quantity, into a float (in terms of a desired unit).

If we know that value is an astropy Quantity, then grabbing the value is simple (and we can cast this to a desired unit, if we need to change this.

If value is already a float, it simply returns value.