.. |setigen| replace:: :mod:`setigen` .. _setigen.funcs: https://setigen.readthedocs.io/en/main/setigen.funcs.html Advanced frame topics ===================== Advanced signal creation ------------------------ Behind the scenes, :func:`~setigen.frame.Frame.Frame.add_signal` uses signal parameter functions to compute intensity for each time, frequency pair in the data. This is kept quite general to allow for the creation of complex signals. In this section, we explore some of the flexibility behind :func:`~setigen.frame.Frame.Frame.add_signal`. Writing custom signal functions ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ You can go beyond |setigen|'s pre-written signal functions by writing your own. For each :func:`~setigen.frame.Frame.Frame.add_signal` input parameter (``path``, ``t_profile``, ``f_profile``, and ``bp_profile``), you can pass in your own custom functions. Note that these inputs are themselves functions. It's important that the functions you pass into each parameter have the correct inputs and outputs. Specifically: ``path`` Function that takes in time [array] ``t`` and outputs a frequency [array] ``t_profile`` Function that takes in time [array] ``t`` and outputs an intensity [array] ``f_profile`` Function that takes in frequency [array] ``f`` and a reference central frequency [array] ``f_center``, and outputs an intensity [array] ``bp_profile`` Function that takes in frequency [array] ``f`` and outputs an intensity [array] For example, here's the code behind the sine path shape: .. code-block:: Python def my_sine_path(f_start, drift_rate, period, amplitude): def path(t): return f_start + amplitude * np.sin(2 * np.pi * t / period) + drift_rate * t return path Alternately, you can use the lambda operator: .. code-block:: Python def my_sine_path(f_start, drift_rate, period, amplitude): return lambda t: return f_start + amplitude * np.sin(2 * np.pi * t / period) + drift_rate * t These can then be incorporated as: .. code-block:: Python signal = frame.add_signal(my_sine_path(f_start=frame.get_frequency(200), drift_rate=2*u.Hz/u.s, period=100*u.s, amplitude=100*u.Hz), stg.constant_t_profile(level=1), stg.box_f_profile(width=20*u.Hz), stg.constant_bp_profile(level=1)) To see more examples on how you might write your own parameter functions, check out the source code behind the pre-written functions (setigen.funcs_). Using arrays as signal parameters ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Sometimes it can be difficult to wrap up a desired signal property into a separate function, or perhaps there is external existing code that calculates desired properties. In these cases, we can also use arrays to describe these signals, instead of functions. .. code-block:: python from astropy import units as u import numpy as np import setigen as stg import matplotlib.pyplot as plt frame = stg.Frame(fchans=1024*u.pixel, tchans=32*u.pixel, df=2.7939677238464355*u.Hz, dt=18.253611008*u.s, fch1=6095.214842353016*u.MHz) frame.add_noise(x_mean=10) rng = np.random.default_rng() path_array = rng.uniform(frame.get_frequency(200), frame.get_frequency(400), 32) t_profile_array = rng.uniform(frame.get_intensity(snr=20), frame.get_intensity(snr=40), 32) frame.add_signal(path_array, t_profile_array, stg.gaussian_f_profile(width=40*u.Hz), stg.constant_bp_profile(level=1)) fig = plt.figure(figsize=(10, 6)) frame.plot(ftype="px", db=False) plt.savefig('frame.png', bbox_inches='tight') plt.show() .. image:: images/advanced_array_synth.png Optimization and accuracy ^^^^^^^^^^^^^^^^^^^^^^^^^ By default, :func:`~setigen.frame.Frame.add_signal` calculates an intensity value for every time, frequency pairing. Depending on the situation, this might not be the best behavior. For example, if you are injecting synthetic narrowband signals into a very large frame of data, it can be inefficient and unnecessary to calculate intensity values for every pixel in the frame. In other cases, perhaps calculating intensity values at only every (dt, df) offset would be too inaccurate. Optimization ~~~~~~~~~~~~ To limit the range of signal computation, you can use the ``bounding_f_range`` parameter of ``add_signal``. This takes in a tuple of frequencies ``(bounding_min, bounding_max)``, between which the signal will be computed. .. code-block:: Python signal = frame.add_signal(stg.constant_path(f_start=frame.get_frequency(200), drift_rate=2*u.Hz/u.s), stg.constant_t_profile(level=1), stg.box_f_profile(width=20*u.Hz), stg.constant_bp_profile(level=1), bounding_f_range=(frame.get_frequency(100), frame.get_frequency(700))) As an example of how this can reduce needless computation, we can time different frame manipulations for a large frame: .. code-block:: Python import time times = [] times.append(time.time()) frame = stg.Frame(fchans=2**20, tchans=32, df=2.7939677238464355*u.Hz, dt=18.253611008*u.s, fch1=6095.214842353016*u.MHz) times.append(time.time()) frame.add_noise(x_mean=10) times.append(time.time()) # Normal add_signal frame.add_signal(stg.constant_path(f_start=frame.get_frequency(200), drift_rate=2*u.Hz/u.s), stg.constant_t_profile(level=frame.get_intensity(snr=30)), stg.gaussian_f_profile(width=40*u.Hz), stg.constant_bp_profile(level=1)) times.append(time.time()) # Limiting computation with bounding_f_range frame.add_signal(stg.constant_path(f_start=frame.get_frequency(200), drift_rate=2*u.Hz/u.s), stg.constant_t_profile(level=frame.get_intensity(snr=30)), stg.gaussian_f_profile(width=40*u.Hz), stg.constant_bp_profile(level=1), bounding_f_range=(frame.get_frequency(100), frame.get_frequency(700))) times.append(time.time()) x = np.array(times) print(x[1:] - x[:-1]) >>> [1.14681625 1.4038794 1.6308465 0.02862048] Depending on the type of signal, you should be cautious when defining a bounding frequency range. For signals with constant drift rate and small spectral width, it isn't too hard to define a range. For example, :func:`~setigen.frame.Frame.add_constant_signal` uses bounding ranges automatically to optimize signal creation. However, for signals with large or stochastic frequency variation, or with long spectral tails, it can be difficult to define a bounding range without cutting off parts of these signals. To illustrate this, using the above example that takes arrays as signal parameters, setting too small of a bounding frequency range can look like: .. code-block:: Python frame = stg.Frame(fchans=1024*u.pixel, tchans=32*u.pixel, df=2.7939677238464355*u.Hz, dt=18.253611008*u.s, fch1=6095.214842353016*u.MHz) frame.add_noise(x_mean=10) rng = np.random.default_rng() path_array = rng.uniform(frame.get_frequency(200), frame.get_frequency(400), 32) t_profile_array = rng.uniform(frame.get_intensity(snr=20), frame.get_intensity(snr=40), 32) frame.add_signal(path_array, t_profile_array, stg.gaussian_f_profile(width=40*u.Hz), stg.constant_bp_profile(level=1), bounding_f_range=(frame.get_frequency(200), frame.get_frequency(300))) .. image:: images/advanced_array_synth_trunc.png Accuracy ~~~~~~~~ To improve accuracy a bit, we can integrate signal computations over subsamples in time and frequency. The function :func:`~setigen.frame.Frame.add_signal` has three boolean parameters: ``integrate_path``, ``integrate_t_profile``, and ``integrate_f_profile``, which control whether various integrations are turned on (by default, they are False). The former two depend on the ``t_subsamples`` parameter, which is the number of bins per time sample (e.g. per dt) over which to integrate; likewise, ``integrate_f_profile`` depends on the ``f_subsamples`` parameter. ``integrate_path`` controls integration of the signal's center frequency with respect to time, ``path``. If your ``path`` varies on timescales shorter than the time resolution dt, then it could make sense to integrate to get more appropriate frequency positions. ``integrate_t_profile`` controls integration of the intensity variation with respect to time, ``t_profile``. If your ``t_profile`` varies on timescales shorter than the time resolution dt, then it could make sense to integrate to get more appropriate intensities. ``integrate_f_profile`` controls integration of the intensity variation with respect to frequency, ``f_profile``. If your ``f_profile`` varies on spectral scales shorter than the frequency resolution df, then it could make sense to integrate to get more appropriate intensities. Note that since integration requires make multiple calculations per pixel, it can increase signal computation time significantly. Be sure to evaluate whether it's actually necessary to integrate, or whether the default add_signal computation is sufficient for your use cases. Here is an example of integration in action: .. code-block:: Python frame = stg.Frame(fchans=1024*u.pixel, tchans=32*u.pixel, df=2.7939677238464355*u.Hz, dt=18.253611008*u.s, fch1=6095.214842353016*u.MHz) frame.add_noise(x_mean=10) frame.add_signal(stg.constant_path(f_start=frame.get_frequency(200), drift_rate=2*u.Hz/u.s), stg.constant_t_profile(level=frame.get_intensity(snr=30)), stg.gaussian_f_profile(width=40*u.Hz), stg.constant_bp_profile(level=1), integrate_path=True, integrate_t_profile=True, integrate_f_profile=True, t_subsamples=10, f_subsamples=10) .. image:: images/advanced_accuracy.png Creating custom observational noise distributions ------------------------------------------------- If you are interested in simulating observations of different resolutions and frequency bands, the underlying noise statistics may certainly differ from the included C-band distributions used by :func:`~setigen.frame.Frame.add_noise_from_obs`. In these cases, it may be best to generate your own parameter distribution arrays from your own observations, and feed those into :func:`~setigen.frame.Frame.add_noise_from_obs` yourself. It is worth mentioning that while you can just inject signals into observational frames directly, real observations may contain real signals as well. By estimating noise parameter distributions from observations, you can generate synthetic chi-squared or Gaussian noise with similar noise statistics as real observations, thereby resembling real data while excluding real signals. To do this, we can use :func:`~setigen.sample_from_obs.get_parameter_distributions`: .. code-block:: Python import setigen as stg waterfall_fn = 'path/to/data.fil' # Number of frequency channels per frame fchans = 1024 # Number of time samples per frame; optional tchans = 32 x_mean_array, x_std_array, x_min_array = stg.get_parameter_distributions(waterfall_fn, fchans, tchans=tchans, f_shift=None) This will iterate over an entire filterbank file, estimating the noise statistics and returning them as numpy arrays. Creating an injected synthetic signal dataset using observations ---------------------------------------------------------------- We can create a dataset based on observations using the :mod:`~setigen.split_utils` module. We can use :func:`~setigen.split_utils.split_waterfall_generator` to create a Python generator that returns ``blimpy`` Waterfall objects, from which we can create |setigen| Frames. The function :func:`~setigen.sample_from_obs.get_parameter_distributions` actually uses this behind the scenes to iterate through observational data. .. code-block:: Python import setigen as stg waterfall_fn = 'path/to/data.fil' fchans = 1024 tchans = 32 waterfall_itr = stg.split_waterfall_generator(waterfall_fn, fchans, tchans=tchans, f_shift=None) waterfall = next(waterfall_itr) frame = stg.Frame(waterfall) Here, ``f_shift`` is the number of indices in the frequency direction to shift before making another slice or split of size ``fchans``. If ``f_shift=None``, it defaults to shifting over by ``fchans``, so that there is no overlap. To construct a full dataset, we can then use the generator to iterate over slices of data and save out frames. As a simple example: .. code-block:: Python for i, waterfall in enumerate(waterfall_itr): frame = stg.Frame(waterfall=waterfall) rng = np.random.default_rng() start_index = rng.integers(0, fchans) stop_index = rng.integers(0, fchans) drift_rate = frame.get_drift_rate(start_index, stop_index) signal = frame.add_constant_signal(f_start=frame.get_frequency(start_index), drift_rate=drift_rate, level=frame.get_intensity(snr=10), width=40, f_profile_type='gaussian') signal_props = { 'start_index': start_index, 'stop_index': stop_index, 'snr': 10, } frame.add_metadata(signal_props) frame.save_pickle('save/path/frame{:06d}.pickle'.format(i)) Depending on the application, it can also pay to save metadata out to a CSV file that tracks filenames / indices with corresponding properties.