setigen API reference

setigen.frame module

class setigen.frame.Frame(waterfall=None, fchans=None, tchans=None, df=None, dt=None, fch1=<Quantity 8. GHz>, data=None)[source]

Bases: object

Facilitates the creation of entirely synthetic radio data (narrowband signals + Gaussian noise) as well as signal injection into existing observations.

__init__(waterfall=None, fchans=None, tchans=None, df=None, dt=None, fch1=<Quantity 8. GHz>, data=None)[source]

Initializes a Frame object either from an existing .fil/.h5 file or from frame resolution / size.

If you are initializing based on a .fil or .h5, pass in either the filename or the Waterfall object into the waterfall keyword.

Otherwise, you can initialize a frame by specifying the parameters fchans, tchans, df, dt, and potentially fch1, if it’s important to specify frequencies (8*u.GHz is an arbitrary but reasonable choice otherwise). The data keyword is only necessary if you are also preloading data that matches your specified frame dimensions and resolutions.

Parameters:
  • waterfall (str or Waterfall, optional) – Name of filterbank file or Waterfall object for preloading data.
  • fchans (int, optional) – Number of frequency samples
  • tchans (int, optional) – Number of time samples
  • df (astropy.Quantity, optional) – Frequency resolution (e.g. in u.Hz)
  • dt (astropy.Quantity, optional) – Time resolution (e.g. in u.s)
  • fch1 (astropy.Quantity, optional (fmax)) – Maximum frequency, as in filterbank file headers (e.g. in u.Hz)
  • data (ndarray, optional) – 2D array of intensities to preload into frame
add_constant_signal(f_start, drift_rate, level, width, f_profile_type='gaussian')[source]

A wrapper around add_signal() that injects a constant intensity, constant drift_rate signal into the frame.

Parameters:
  • f_start (astropy.Quantity) – Starting signal frequency
  • drift_rate (astropy.Quantity) – Signal drift rate, in units of frequency per time
  • level (float) – Signal intensity
  • width (astropy.Quantity) – Signal width in frequency units
  • f_profile_type (str) – Can be ‘box’, ‘gaussian’, ‘lorentzian’, or ‘voigt’, based on the desired spectral profile
Returns:

signal – Two-dimensional NumPy array containing synthetic signal data

Return type:

ndarray

add_metadata(new_metadata)[source]

Append custom metadata using a dictionary new_metadata.

add_noise(x_mean, x_std, x_min=None)[source]

Adds Gaussian noise to the frame, from the specified mean and standard deviation (and minimum if desired). The minimum is simply a lower bound for intensities in the data (e.g. it may make sense to cap intensities at 0), but this is optional.

add_noise_from_obs(x_mean_array=None, x_std_array=None, x_min_array=None, share_index=True)[source]

If no arrays are specified to sample Gaussian parameters from, noise samples will be drawn from saved GBT C-Band observations at (dt, df) = (1.4 s, 1.4 Hz) resolution, from frames of shape (tchans, fchans) = (32, 1024). These sample noise parameters consists of 126500 samples for mean, std, and min of each observation.

Note: this method will attempt to scale the noise parameters to match self.dt and self.df. This assumes that the observation data products are not normalized by the FFT length used to contstruct them.

Parameters:
  • x_mean_array (ndarray) – Array of potential means
  • x_std_array (ndarray) – Array of potential standard deviations
  • x_min_array (ndarray, optional) – Array of potential minimum values
  • share_index (bool) – Whether to select noise parameters from the same index across each provided array. If share_index is True, then each array must be the same length.
add_signal(path, t_profile, f_profile, bp_profile, bounding_f_range=None, integrate_path=False, integrate_t_profile=False, integrate_f_profile=False, t_subsamples=10, f_subsamples=10)[source]

Generates synthetic signal.

Adds a synethic signal using given path in time-frequency domain and brightness profiles in time and frequency directions.

Parameters:
  • path (function, np.ndarray, list, float) – Function in time that returns frequencies, or provided array or single value of frequencies for the center of the signal at each time sample
  • t_profile (function, np.ndarray, list, float) – Time profile: function in time that returns an intensity (scalar), or provided array or single value of intensities at each time sample
  • f_profile (function) – Frequency profile: function in frequency that returns an intensity (scalar), relative to the signal frequency within a time sample. Note that unlike the other parameters, this must be a function
  • bp_profile (function, np.ndarray, list, float) – Bandpass profile: function in frequency that returns a relative intensity (scalar, between 0 and 1), or provided array or single value of relative intensities at each frequency sample
  • bounding_f_range (tuple) – Tuple (bounding_min, bounding_max) that constrains the computation of the signal to only a range in frequencies
  • integrate_path (bool, optional) – Option to average path along time to get a more accurate frequency position in t-f space. Note that this option only makes sense if the provided path can be evaluated at the sub frequency sample level (e.g. as opposed to returning a pre-computed array of frequencies of length tchans). Makes t_subsamples calculations per time sample.
  • integrate_t_profile (bool, optional) – Option to integrate t_profile in the time direction. Note that this option only makes sense if the provided t_profile can be evaluated at the sub time sample level (e.g. as opposed to returning an array of intensities of length tchans). Makes t_subsamples calculations per time sample.
  • integrate_f_profile (bool, optional) – Option to integrate f_profile in the frequency direction. Makes f_subsamples calculations per time sample.
  • t_subsamples (int, optional) – Number of bins for integration in the time direction, using Riemann sums
  • f_subsamples (int, optional) – Number of bins for integration in the frequency direction, using Riemann sums
Returns:

signal – Two-dimensional NumPy array containing synthetic signal data

Return type:

ndarray

Examples

Here’s an example that creates a linear Doppler-drifted signal with Gaussian noise with sampled parameters:

>>> from astropy import units as u
>>> import setigen as stg
>>> fchans = 1024
>>> tchans = 32
>>> df = 2.7939677238464355*u.Hz
>>> dt = tsamp = 18.25361108*u.s
>>> fch1 = 6095.214842353016*u.MHz
>>> frame = stg.Frame(fchans=fchans,
                      tchans=tchans,
                      df=df,
                      dt=dt,
                      fch1=fch1)
>>> noise = frame.add_noise(x_mean=5, x_std=2, x_min=0)
>>> signal = frame.add_signal(stg.constant_path(f_start=frame.get_frequency(200),
                                                drift_rate=2*u.Hz/u.s),
                              stg.constant_t_profile(level=frame.get_intensity(snr=30)),
                              stg.gaussian_f_profile(width=40*u.Hz),
                              stg.constant_bp_profile(level=1))

Saving the noise and signals individually may be useful depending on the application, but the combined data can be accessed via frame.get_data(). The synthetic signal can then be visualized and saved within a Jupyter notebook using:

>>> %matplotlib inline
>>> import matplotlib.pyplot as plt
>>> fig = plt.figure(figsize=(10, 6))
>>> frame.render()
>>> plt.savefig('image.png', bbox_inches='tight')
>>> plt.show()

To run within a script, simply exclude the first line: %matplotlib inline.

bl_render(use_db=True)[source]
get_data(use_db=False)[source]
get_drift_rate(start_index, end_index)[source]
get_frequency(index)[source]

Convert index to frequency

get_index(frequency)[source]

Convert frequency to closest index in frame.

get_info()[source]
get_intensity(snr)[source]

Calculates intensity from SNR, based on estimates of the noise in the frame.

Note that there must be noise present in the frame for this to make sense.

get_metadata()[source]
get_noise_stats()[source]
get_snr(intensity)[source]

Calculates SNR from intensity.

Note that there must be noise present in the frame for this to make sense.

get_total_stats()[source]
get_waterfall()[source]

Return current frame as a Waterfall object. Note: some filterbank metadata may not be accurate anymore, depending on prior frame manipulations.

load_npy(filename)[source]

Load frame data from a .npy file.

classmethod load_pickle(filename)[source]

Load Frame object from a pickled file (.pickle), created with Frame.save_pickle.

mean()[source]
render(use_db=False)[source]
save_fil(filename)[source]

Save frame data as a filterbank file (.fil).

save_h5(filename)[source]

Save frame data as an HDF5 file.

save_hdf5(filename)[source]

Save frame data as an HDF5 file.

save_npy(filename)[source]

Save frame data as an .npy file.

save_pickle(filename)[source]

Save entire frame as a pickled file (.pickle).

set_data(data)[source]
set_df(df)[source]
set_dt(dt)[source]
set_metadata(new_metadata)[source]

Set custom metadata using a dictionary new_metadata.

std()[source]
zero_data()[source]

Resets data to a numpy array of zeros.

setigen.split_utils module

setigen.split_utils.split_array(data, f_sample_num=None, t_sample_num=None, f_shift=None, t_shift=None, f_trim=False, t_trim=False)[source]

Splits NumPy arrays into a list of smaller arrays according to limits in frequency and time. This doesn’t reduce/combine data, it simply cuts the data into smaller chunks.

Parameters:data (ndarray) – Time-frequency data
Returns:split_data – List of new time-frequency data frames
Return type:list of ndarray
setigen.split_utils.split_fil(waterfall_fn, output_dir, fchans, tchans=None, f_shift=None)[source]

Creates a set of new filterbank files by ‘splitting’ an input filterbank file according to the number of frequency samples.

Parameters:
  • waterfall_fn (str) – Filterbank filename with .fil extension
  • output_dir (str) – Directory for new filterbank files
  • fchans (int) – Number of frequency samples per new filterbank file
  • tchans (int, optional) – Number of time samples to select - will default from start of observation. If None, just uses the entire integration time
  • f_shift (int, optional) – Number of samples to shift when splitting filterbank. If None, defaults to f_shift=fchans so that there is no overlap between new filterbank files
Returns:

split_fns – List of new filenames

Return type:

list of str

setigen.split_utils.split_waterfall_generator(waterfall_fn, fchans, tchans=None, f_shift=None)[source]

Creates a generator that returns smaller Waterfall objects by ‘splitting’ an input filterbank file according to the number of frequency samples.

Since this function only loads in data in chunks according to fchans, it handles very large observations well. Specifically, it will not attempt to load all the data into memory before splitting, which won’t work when the data is very large anyway.

Parameters:
  • waterfall_fn (str) – Filterbank filename with .fil extension
  • fchans (int) – Number of frequency samples per new filterbank file
  • tchans (int, optional) – Number of time samples to select - will default from start of observation. If None, just uses the entire integration time
  • f_shift (int, optional) – Number of samples to shift when splitting filterbank. If None, defaults to f_shift=fchans so that there is no overlap between new filterbank files
Returns:

split – A blimpy Waterfall object containing a smaller section of the data

Return type:

Waterfall

setigen.distributions module

setigen.distributions.gaussian(x_mean, x_std, shape)[source]
setigen.distributions.truncated_gaussian(x_mean, x_std, x_min, shape)[source]

Samples from a normal distribution, but enforces a minimum value.

setigen.waterfall_utils module

setigen.waterfall_utils.get_data(waterfall, use_db=False)[source]

Gets time-frequency data from filterbank file as a 2d NumPy array.

Note: when multiple Stokes parameters are supported, this will have to be expanded.

Parameters:waterfall (str or Waterfall) – Name of filterbank file or Waterfall object
Returns:data – Time-frequency data
Return type:ndarray
setigen.waterfall_utils.get_fs(waterfall)[source]

Gets frequency values from filterbank file.

Parameters:waterfall (str or Waterfall) – Name of filterbank file or Waterfall object
Returns:fs – Frequency values
Return type:ndarray
setigen.waterfall_utils.get_ts(waterfall)[source]

Gets time values from filterbank file.

Parameters:waterfall (str or Waterfall) – Name of filterbank file or Waterfall object
Returns:ts – Time values
Return type:ndarray
setigen.waterfall_utils.max_freq(waterfall)[source]

Returns central frequency of the highest-frequency bin in a .fil file.

Parameters:waterfall (str or Waterfall) – Name of filterbank file or Waterfall object
Returns:fmax – Maximum frequency in data
Return type:float
setigen.waterfall_utils.min_freq(waterfall)[source]

Returns central frequency of the lowest-frequency bin in a .fil file.

Parameters:waterfall (str or Waterfall) – Name of filterbank file or Waterfall object
Returns:fmin – Minimum frequency in data
Return type:float

setigen.sample_from_obs module

setigen.sample_from_obs.get_parameter_distributions(waterfall_fn, fchans, tchans=None, f_shift=None)[source]

Calculate parameter distributions for the mean, standard deviation, and minimum of split filterbank data from real observations.

Parameters:
  • waterfall_fn (str) – Filterbank filename with .fil extension
  • fchans (int) – Number of frequency samples per new filterbank file
  • tchans (int, optional) – Number of time samples to select - will default from start of observation. If None, just uses the entire integration time
  • f_shift (int, optional) – Number of samples to shift when splitting filterbank. If None, defaults to f_shift=f_window so that there is no overlap between new filterbank files
Returns:

  • x_mean_array – Distribution of means calculated from observations
  • x_std_array – Distribution of standard deviations calculated from observations
  • x_min_array – Distribution of minimums calculated from observations

setigen.sample_from_obs.sample_gaussian_params(x_mean_array, x_std_array, x_min_array=None)[source]

Sample Gaussian parameters (mean, std, min) from provided arrays.

Typical usage would be for select Gaussian noise properties for injection into data frames.

Parameters:
  • x_mean_array (ndarray) – Array of potential means
  • x_std_array (ndarray) – Array of potential standard deviations
  • x_min_array (ndarray, optional) – Array of potential minimum values
Returns:

  • x_mean – Selected mean of distribution
  • x_std – Selected standard deviation of distribution
  • x_min – If x_min_array provided, selected minimum of distribution

setigen.stats module

setigen.stats.compute_frame_stats(data, exclude=0)[source]
setigen.stats.exclude_and_flatten(data, exclude=0)[source]
setigen.stats.get_mean(data, exclude=0)[source]
setigen.stats.get_min(data, exclude=0)[source]
setigen.stats.get_std(data, exclude=0)[source]

setigen.time_freq_utils module

setigen.time_freq_utils.db(x)[source]

Converts to dB.

setigen.time_freq_utils.normalize(data, cols=0, exclude=0.0, to_db=False, use_median=False)[source]

Normalize data per frequency channel so that the noise level in data is controlled; using mean or median filter.

Uses a sliding window to calculate mean and standard deviation to preserve non-drifted signals. Excludes a fraction of brightest pixels to better isolate noise.

Parameters:
  • data (ndarray) – Time-frequency data
  • cols (int) – Number of columns on either side of the current frequency bin. The width of the sliding window is thus 2 * cols + 1
  • exclude (float, optional) – Fraction of brightest samples in each frequency bin to exclude in calculating mean and standard deviation
  • to_db (bool, optional) – Convert values to decibel equivalents before normalization
  • use_median (bool, optional) – Use median and median absolute deviation instead of mean and standard deviation
Returns:

normalized_data – Normalized data

Return type:

ndarray

setigen.time_freq_utils.normalize_by_max(data)[source]

Simple normalization by dividing out by the brightest pixel.

setigen.unit_utils module

This module contains a couple unit conversion utilities used in frame.Frame.

In general, we rely on astropy units for conversions, and note that float values are assumed to be in SI units (e.g. Hz, s).

setigen.unit_utils.cast_value(value, unit)[source]

If value is already an astropy Quantity, then cast it into the desired unit. Otherwise, value is assumed to be a float and converted directly to the desired unit.

setigen.unit_utils.get_value(value, unit=None)[source]

This function converts a value, which may be a float or astropy Quantity, into a float (in terms of a desired unit).

If we know that value is an astropy Quantity, then grabbing the value is simple (and we can cast this to a desired unit, if we need to change this.

If value is already a float, it simply returns value.