setigen package

Submodules

setigen.distributions module

setigen.distributions.gaussian(x_mean, x_std, shape)[source]
setigen.distributions.truncated_gaussian(x_mean, x_std, x_min, shape)[source]

Samples from a normal distribution, but enforces a minimum value.
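
One way to enforce a minimum is to redraw any sample that falls below it. The sketch below uses NumPy only; the helper name truncated_gaussian_sketch and the redraw strategy are assumptions for illustration, and setigen's own implementation may differ (for instance, it could rely on a dedicated truncated-normal sampler).

```python
import numpy as np

def truncated_gaussian_sketch(x_mean, x_std, x_min, shape, rng=None):
    """Draw normal samples and redraw any that fall below x_min."""
    rng = np.random.default_rng() if rng is None else rng
    samples = rng.normal(x_mean, x_std, size=shape)
    below = samples < x_min
    while below.any():
        # Replace only the offending entries with fresh draws.
        samples[below] = rng.normal(x_mean, x_std, size=int(below.sum()))
        below = samples < x_min
    return samples

samples = truncated_gaussian_sketch(5.0, 2.0, 0.0, (32, 1024),
                                    rng=np.random.default_rng(0))
```

Redrawing keeps the shape of the distribution above x_min, whereas simply clipping would pile samples up exactly at the minimum.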

setigen.fil_utils module

setigen.fil_utils.get_data(fil, db=False)[source]

Gets time-frequency data from filterbank file as a 2d NumPy array.

Note: when multiple Stokes parameters are supported, this will have to be expanded.

Parameters:fil (str or Waterfall) – Name of filterbank file or Waterfall object
Returns:data – Time-frequency data
Return type:ndarray
setigen.fil_utils.get_fs(fil)[source]

Gets frequency values from filterbank file.

Parameters:fil (str or Waterfall) – Name of filterbank file or Waterfall object
Returns:fs – Frequency values
Return type:ndarray
setigen.fil_utils.get_ts(fil)[source]

Gets time values from filterbank file.

Parameters:fil (str or Waterfall) – Name of filterbank file or Waterfall object
Returns:ts – Time values
Return type:ndarray
setigen.fil_utils.maxfreq(fil)[source]

Returns central frequency of the highest-frequency bin in a .fil file.

Parameters:fil (str or Waterfall) – Name of filterbank file or Waterfall object
Returns:fmax – Maximum frequency in data
Return type:float
setigen.fil_utils.minfreq(fil)[source]

Returns central frequency of the lowest-frequency bin in a .fil file.

Parameters:fil (str or Waterfall) – Name of filterbank file or Waterfall object
Returns:fmin – Minimum frequency in data
Return type:float
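
The frequency helpers above can be pictured in terms of the usual filterbank header values. The following is a minimal sketch, assuming fch1 is the center frequency of the first channel, foff is the (possibly negative) channel width, and nchans is the channel count; the function name and the numbers are hypothetical, and the real helpers read these values from the file or Waterfall object.

```python
import numpy as np

def channel_frequencies(fch1, foff, nchans):
    """Center frequency of every channel, in the same unit as fch1 and foff."""
    return fch1 + foff * np.arange(nchans)

# Hypothetical header values in MHz; foff is negative when channels
# are stored in descending frequency order.
fs = channel_frequencies(fch1=6095.0, foff=-0.5, nchans=4)
fmin, fmax = fs.min(), fs.max()
```

Taking the minimum and maximum of the channel centers gives the same answer regardless of the sign of foff.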

setigen.frame module

class setigen.frame.Frame(fchans=None, tchans=None, df=None, dt=None, fch1=<Quantity 8. GHz>, data=None, fil=None)[source]

Bases: object

Facilitates the creation of entirely synthetic radio data (narrowband signals + Gaussian noise) as well as signal injection into existing observations.

add_constant_signal(f_start, drift_rate, level, width, f_profile_type='gaussian')[source]

A wrapper around add_signal() that injects a constant intensity, constant drift_rate signal into the frame.

Parameters:
  • f_start (astropy.Quantity) – Starting signal frequency
  • drift_rate (astropy.Quantity) – Signal drift rate, in units of frequency per time
  • level (float) – Signal intensity
  • width (astropy.Quantity) – Signal width in frequency units
  • f_profile_type (str) – Either ‘box’ or ‘gaussian’, based on the desired spectral profile
Returns:

signal – Two-dimensional NumPy array containing synthetic signal data

Return type:

ndarray

add_noise(x_mean, x_std, x_min=None)[source]

Adds Gaussian noise to the frame, from the specified mean and standard deviation (and minimum if desired). The minimum is simply a lower bound for intensities in the data (e.g. it may make sense to keep intensities at or above 0), but this is optional.

add_noise_from_obs(x_mean_array=None, x_std_array=None, x_min_array=None, share_index=True)[source]

If no arrays are specified to sample Gaussian parameters from, noise samples will be drawn from saved GBT C-Band observations at (dt, df) = (1.4 s, 1.4 Hz) resolution, from frames of shape (tchans, fchans) = (32, 1024). These sample noise parameters consist of 126500 samples for mean, std, and min of each observation.

Note: this method will attempt to scale the noise parameters to match self.dt and self.df. This assumes that the observation data products are not normalized by the FFT length used to construct them.

Parameters:
  • x_mean_array (ndarray) – Array of potential means
  • x_std_array (ndarray) – Array of potential standard deviations
  • x_min_array (ndarray, optional) – Array of potential minimum values
  • share_index (bool) – Whether to select noise parameters from the same index across each provided array. If share_index is True, then each array must be the same length.
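
To illustrate what share_index controls, here is a small sketch of drawing noise parameters from arrays either at one shared index or at independent indices. The function name sample_params_sketch and the toy arrays are hypothetical; this is not the library's code, only an illustration of the behavior described above.

```python
import numpy as np

def sample_params_sketch(x_mean_array, x_std_array, x_min_array=None,
                         share_index=True, rng=None):
    """Pick (mean, std[, min]) from arrays of candidate values."""
    rng = np.random.default_rng() if rng is None else rng
    arrays = [x_mean_array, x_std_array]
    if x_min_array is not None:
        arrays.append(x_min_array)
    if share_index:
        # One index for all arrays, so they must be the same length.
        if len({len(a) for a in arrays}) != 1:
            raise ValueError("arrays must have equal length when share_index is True")
        i = rng.integers(len(arrays[0]))
        return tuple(a[i] for a in arrays)
    # Otherwise draw an independent index for each array.
    return tuple(a[rng.integers(len(a))] for a in arrays)

means = np.array([1.0, 2.0, 3.0])
stds = np.array([0.1, 0.2, 0.3])
mins = np.array([0.5, 1.5, 2.5])
x_mean, x_std, x_min = sample_params_sketch(means, stds, mins,
                                            rng=np.random.default_rng(1))
```

Sharing the index keeps the mean, standard deviation, and minimum from the same observation together, which preserves any correlation between them.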
add_signal(path, t_profile, f_profile, bp_profile, bounding_f_range=None, integrate_time=False, mean_f_position=False, time_subsamples=10)[source]

Generates synthetic signal.

Adds a synthetic signal using the given path in the time-frequency domain and brightness profiles in the time and frequency directions.

Parameters:
  • path (function, np.ndarray, list, float) – Function in time that returns frequencies, or provided array or single value of frequencies for the center of the signal at each time sample
  • t_profile (function, np.ndarray, list, float) – Time profile: function in time that returns an intensity (scalar), or provided array or single value of intensities at each time sample
  • f_profile (function) – Frequency profile: function in frequency that returns an intensity (scalar), relative to the signal frequency within a time sample. Note that unlike the other parameters, this must be a function
  • bp_profile (function, np.ndarray, list, float) – Bandpass profile: function in frequency that returns a relative intensity (scalar, between 0 and 1), or provided array or single value of relative intensities at each frequency sample
  • bounding_f_range (tuple) – Tuple (bounding_min, bounding_max) that constrains the computation of the signal to only a range in frequencies
  • integrate_time (bool, optional) – Option to integrate t_profile in the time direction. Note that this option only makes sense if the provided t_profile can be evaluated at the sub time sample level (e.g. as opposed to returning an array of intensities of length tchans).
  • mean_f_position (bool, optional) – Option to average path along frequency to get a more accurate position in t-f space. Note that this option only makes sense if the provided path can be evaluated at the sub frequency sample level (e.g. as opposed to returning a pre-computed array of frequencies of length tchans).
  • time_subsamples (int, optional) – Number of bins for integration in the time direction, using Riemann sums
Returns:

signal – Two-dimensional NumPy array containing synthetic signal data

Return type:

ndarray

Examples

Here’s an example that creates a linear Doppler-drifted signal with Gaussian noise with sampled parameters:

>>> from astropy import units as u
>>> import setigen as stg
>>> fchans = 1024
>>> tchans = 32
>>> df = 2.7939677238464355*u.Hz
>>> dt = tsamp = 18.25361108*u.s
>>> fch1 = 6095.214842353016*u.MHz
>>> frame = stg.Frame(fchans, tchans, df, dt, fch1)
>>> noise = frame.add_noise(x_mean=5, x_std=2, x_min=0)
>>> signal = frame.add_signal(stg.constant_path(f_start=frame.fs[200],
...                                             drift_rate=2*u.Hz/u.s),
...                           stg.constant_t_profile(level=frame.compute_intensity(snr=30)),
...                           stg.gaussian_f_profile(width=40*u.Hz),
...                           stg.constant_bp_profile(level=1))

Saving the noise and signals individually may be useful depending on the application, but the combined data can be accessed via frame.get_data(). The synthetic signal can then be visualized and saved within a Jupyter notebook using:

>>> %matplotlib inline
>>> import matplotlib.pyplot as plt
>>> fig = plt.figure(figsize=(10, 6))
>>> plt.imshow(frame.get_data(), aspect='auto')
>>> plt.xlabel('Frequency')
>>> plt.ylabel('Time')
>>> plt.colorbar()
>>> plt.savefig('image.png', bbox_inches='tight')
>>> plt.show()

To run within a script, simply exclude the first line: %matplotlib inline.
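
To make the roles of path, t_profile, f_profile, and bp_profile more concrete, the following NumPy-only sketch multiplies the four pieces on a (time, frequency) grid. It is an illustration under assumptions, not setigen's implementation: the function build_signal_sketch, the two-argument form of f_profile (frequency, signal center), and the plain float units are all hypothetical, and it ignores integrate_time and mean_f_position.

```python
import numpy as np

def build_signal_sketch(ts, fs, path, t_profile, f_profile, bp_profile):
    """Evaluate the four profiles on a (time, frequency) grid and multiply them."""
    signal = np.zeros((len(ts), len(fs)))
    for i, t in enumerate(ts):
        f_center = path(t)  # where the signal sits at this time sample
        signal[i] = t_profile(t) * f_profile(fs, f_center) * bp_profile(fs)
    return signal

ts = np.arange(4) * 1.0    # seconds
fs = np.arange(16) * 1.0   # Hz, relative to the first channel
signal = build_signal_sketch(
    ts, fs,
    path=lambda t: 4.0 + 2.0 * t,                          # linear drift of 2 Hz/s
    t_profile=lambda t: 1.0,                               # constant intensity
    f_profile=lambda f, f0: np.exp(-0.5 * (f - f0) ** 2),  # Gaussian with sigma = 1 Hz
    bp_profile=lambda f: np.ones_like(f),                  # flat bandpass
)
```

Each row of the result peaks at the channel nearest path(t), so the drifting signal traces a diagonal line across the array.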

compute_SNR(intensity)[source]

Calculates SNR from intensity.

Note that there must be noise present in the frame for this to make sense.

compute_intensity(snr)[source]

Calculates intensity from SNR, based on estimates of the noise in the frame.

Note that there must be noise present in the frame for this to make sense.
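
The formula relating SNR and intensity is not stated here. As an assumption for illustration only, the sketch below uses a common convention in which the signal is integrated over all tchans time samples, so the noise contribution shrinks by sqrt(tchans) relative to the per-sample noise standard deviation. The function names and arguments are hypothetical.

```python
import math

def intensity_from_snr(snr, noise_std, tchans):
    """Per-sample intensity that yields the requested integrated SNR (assumed convention)."""
    return snr * noise_std / math.sqrt(tchans)

def snr_from_intensity(intensity, noise_std, tchans):
    """Inverse of intensity_from_snr under the same assumption."""
    return intensity * math.sqrt(tchans) / noise_std

level = intensity_from_snr(snr=30, noise_std=2.0, tchans=16)
```

Either direction divides by or depends on the noise standard deviation, which is why noise must already be present in the frame.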

freq_to_index(freq)[source]

Convert frequency to closest index in frame.
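
A minimal sketch of a closest-index lookup, assuming the frame's channel center frequencies are available as an array; the helper name and the sample frequencies are hypothetical, and the method itself may compute the index differently.

```python
import numpy as np

def freq_to_index_sketch(fs, freq):
    """Index of the channel whose center frequency is nearest to freq."""
    return int(np.argmin(np.abs(np.asarray(fs) - freq)))

fs = 6095.0 + 0.5 * np.arange(8)   # hypothetical channel centers in MHz
index = freq_to_index_sketch(fs, 6096.2)
```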

get_data(db=False)[source]
get_info()[source]
get_noise_stats()[source]
get_total_stats()[source]
load_data(file)[source]

file can be a filename or a file handle for a .npy file.

mean()[source]
save_data(file)[source]

file can be a filename or a file handle for a .npy file.
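
NumPy's own np.save and np.load accept either a filename or an open file object, which matches the description of file above. The round trip below is a sketch using an in-memory buffer in place of a file on disk; whether save_data and load_data call these functions directly is an assumption.

```python
import io
import numpy as np

data = np.arange(6, dtype=float).reshape(2, 3)

buffer = io.BytesIO()       # stands in for an open .npy file handle
np.save(buffer, data)
buffer.seek(0)              # rewind before reading back
restored = np.load(buffer)
```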

save_fil(filename)[source]
save_h5(filename)[source]
save_hdf5(filename)[source]
set_data(data)[source]
set_df(df)[source]
set_dt(dt)[source]
std()[source]
zero_data()[source]

Resets data to a numpy array of zeros.

setigen.sample_from_obs module

setigen.sample_from_obs.get_parameter_distributions(fil_fn, f_window, f_shift=None, exclude=0)[source]

Calculate parameter distributions for the mean, standard deviation, and minimum of split filterbank data from real observations.

Parameters:
  • fil_fn (str) – Filterbank filename with .fil extension
  • f_window (int) – Number of frequency samples per new filterbank file
  • f_shift (int, optional) – Number of samples to shift when splitting filterbank. If None, defaults to f_shift=f_window so that there is no overlap between new filterbank files
Returns:

  • x_mean_array – Distribution of means calculated from observations.
  • x_std_array – Distribution of standard deviations calculated from observations.
  • x_min_array – Distribution of minimums calculated from observations.
setigen.sample_from_obs.sample_gaussian_params(x_mean_array, x_std_array, x_min_array=None)[source]

Sample Gaussian parameters (mean, std, min) from provided arrays.

Typical usage would be to select Gaussian noise properties for injection into data frames.

Parameters:
  • x_mean_array (ndarray) – Array of potential means
  • x_std_array (ndarray) – Array of potential standard deviations
  • x_min_array (ndarray, optional) – Array of potential minimum values
Returns:

  • x_mean – Selected mean of distribution
  • x_std – Selected standard deviation of distribution
  • x_min – If x_min_array provided, selected minimum of distribution

setigen.split_utils module

setigen.split_utils.split_array(data, f_sample_num=None, t_sample_num=None, f_shift=None, t_shift=None, f_trim=False, t_trim=False)[source]

Splits NumPy arrays into a list of smaller arrays according to limits in frequency and time. This doesn’t reduce/combine data, it simply cuts the data into smaller chunks.

Parameters:data (ndarray) – Time-frequency data
Returns:split_data – List of new time-frequency data frames
Return type:list of ndarray
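
As a rough sketch of the splitting idea, the function below cuts a (time, frequency) array into windows along the frequency axis only. The helper name is hypothetical, and dropping an incomplete final window is an assumption; the behavior of f_trim and t_trim is not described here and is not modeled.

```python
import numpy as np

def split_along_frequency(data, f_sample_num, f_shift=None):
    """Cut a (time, frequency) array into windows of f_sample_num channels."""
    f_shift = f_sample_num if f_shift is None else f_shift
    n_f = data.shape[1]
    # Only start positions that leave room for a full window are used.
    starts = range(0, n_f - f_sample_num + 1, f_shift)
    return [data[:, s:s + f_sample_num] for s in starts]

data = np.arange(2 * 10).reshape(2, 10)
chunks = split_along_frequency(data, f_sample_num=4)
overlapping = split_along_frequency(data, f_sample_num=4, f_shift=2)
```

With f_shift equal to the window size the chunks do not overlap; a smaller f_shift produces overlapping chunks.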
setigen.split_utils.split_fil(fil_fn, output_dir, f_window, f_shift=None)[source]

Creates a set of new filterbank files by ‘splitting’ an input filterbank file according to the number of frequency samples.

Parameters:
  • fil_fn (str) – Filterbank filename with .fil extension
  • output_dir (str) – Directory for new filterbank files
  • f_window (int) – Number of frequency samples per new filterbank file
  • f_shift (int, optional) – Number of samples to shift when splitting filterbank. If None, defaults to f_shift=f_window so that there is no overlap between new filterbank files
Returns:

split_fns – List of new filenames

Return type:

list of str

setigen.split_utils.split_fil_generator(fil_fn, f_window, f_shift=None)[source]

Creates a generator that returns smaller Waterfall objects by ‘splitting’ an input filterbank file according to the number of frequency samples.

Since this function only loads in data in chunks according to f_window, it handles very large observations well. Specifically, it will not attempt to load all the data into memory before splitting, which won’t work when the data is very large anyway.

Parameters:
  • fil_fn (str) – Filterbank filename with .fil extension
  • f_window (int) – Number of frequency samples per new filterbank file
  • f_shift (int, optional) – Number of samples to shift when splitting filterbank. If None, defaults to f_shift=f_window so that there is no overlap between new filterbank files
Returns:

split – A blimpy Waterfall object containing a smaller section of the data

Return type:

Waterfall
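
The chunked-loading idea can be sketched with a plain Python generator that produces one channel range at a time, so that a reader only needs to load the slice it is currently working on. The function name and the use of channel indices rather than frequencies are assumptions for illustration; the actual generator yields Waterfall objects.

```python
def frequency_windows(n_chans, f_window, f_shift=None):
    """Lazily yield (start, stop) channel ranges covering the band."""
    f_shift = f_window if f_shift is None else f_shift
    start = 0
    while start + f_window <= n_chans:
        yield start, start + f_window
        start += f_shift

windows = list(frequency_windows(n_chans=10, f_window=4))
```

Because nothing is computed until the next range is requested, memory use depends on f_window rather than on the size of the whole observation.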

setigen.stats module

setigen.stats.compute_frame_stats(data, exclude=0)[source]
setigen.stats.exclude_and_flatten(data, exclude=0)[source]
setigen.stats.get_mean(data, exclude=0)[source]
setigen.stats.get_min(data, exclude=0)[source]
setigen.stats.get_std(data, exclude=0)[source]
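
These functions are not described here. Assuming exclude means the fraction of brightest samples to drop before computing statistics, as it is described for normalize() in setigen.time_freq_utils, the idea might be sketched as follows; the helper name and data are hypothetical.

```python
import numpy as np

def exclude_and_flatten_sketch(data, exclude=0):
    """Flatten data and drop the brightest `exclude` fraction of samples."""
    flat = np.sort(np.asarray(data).ravel())
    keep = len(flat) - int(exclude * len(flat))
    return flat[:keep]

data = np.array([[1.0, 2.0, 3.0, 4.0],
                 [5.0, 6.0, 7.0, 100.0]])
trimmed = exclude_and_flatten_sketch(data, exclude=0.25)
stats = (trimmed.mean(), trimmed.std(), trimmed.min())
```

Dropping the brightest samples keeps a strong signal (the 100.0 above) from inflating the estimated noise mean and standard deviation.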

setigen.time_freq_utils module

setigen.time_freq_utils.db(x)[source]

Converts to dB.
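
For power-like quantities such as filterbank intensities, the usual decibel conversion is 10 * log10(x). The sketch below assumes that convention; the helper name is hypothetical.

```python
import numpy as np

def db_sketch(x):
    """Convert a power ratio to decibels (10 * log10 convention)."""
    return 10 * np.log10(x)

values = db_sketch(np.array([1.0, 10.0, 100.0]))
```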

setigen.time_freq_utils.normalize(data, cols=0, exclude=0.0, to_db=False, use_median=False)[source]

Normalizes data per frequency channel, using a mean or median filter, so that the noise level in the data is controlled.

Uses a sliding window to calculate mean and standard deviation to preserve non-drifted signals. Excludes a fraction of brightest pixels to better isolate noise.

Parameters:
  • data (ndarray) – Time-frequency data
  • cols (int) – Number of columns on either side of the current frequency bin. The width of the sliding window is thus 2 * cols + 1
  • exclude (float, optional) – Fraction of brightest samples in each frequency bin to exclude in calculating mean and standard deviation
  • to_db (bool, optional) – Convert values to decibel equivalents before normalization
  • use_median (bool, optional) – Use median and median absolute deviation instead of mean and standard deviation
Returns:

normalized_data – Normalized data

Return type:

ndarray
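
The sliding-window normalization can be sketched roughly as below, covering only the mean and standard deviation case. This is an illustration under assumptions rather than the library's code: the helper name is hypothetical, the window is simply clipped at the band edges, and the brightest samples are excluded from the pooled window rather than per frequency bin.

```python
import numpy as np

def normalize_sketch(data, cols=0, exclude=0.0):
    """Normalize each frequency channel using statistics from a sliding window."""
    data = np.asarray(data, dtype=float)
    n_f = data.shape[1]
    out = np.empty_like(data)
    for i in range(n_f):
        # Window of up to 2 * cols + 1 channels, clipped at the band edges.
        lo, hi = max(0, i - cols), min(n_f, i + cols + 1)
        window = np.sort(data[:, lo:hi].ravel())
        # Drop the brightest fraction so signals do not bias the noise estimate.
        keep = len(window) - int(exclude * len(window))
        noise = window[:keep]
        out[:, i] = (data[:, i] - noise.mean()) / noise.std()
    return out

rng = np.random.default_rng(0)
data = rng.normal(10.0, 2.0, size=(64, 8))
normalized = normalize_sketch(data, cols=0, exclude=0.0)
```

With cols=0 and exclude=0.0, each channel is normalized by its own statistics and ends up with zero mean and unit standard deviation.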

setigen.time_freq_utils.normalize_by_max(data)[source]

Simple normalization by dividing out by the brightest pixel.

setigen.unit_utils module

This module contains a couple of unit conversion utilities used in frame.Frame.

In general, we rely on astropy units for conversions, and note that float values are assumed to be in SI units (e.g. Hz, s).

setigen.unit_utils.cast_value(value, unit)[source]

If value is already an astropy Quantity, then cast it into the desired unit. Otherwise, value is assumed to be a float and converted directly to the desired unit.

setigen.unit_utils.get_value(value, unit=None)[source]

This function converts a value, which may be a float or astropy Quantity, into a float (in terms of a desired unit).

If we know that value is an astropy Quantity, then grabbing the value is simple (and we can cast it to a desired unit first, if needed).

If value is already a float, it simply returns value.

Module contents