## DeepSig’s 2018 Data Set: 2018.01.OSC.0001_1024x2M.h5.tar.gz

DeepSig’s data sets are popular in the machine-learning modulation-recognition community, and in that community there are many claims that the deep neural networks are vastly outperforming any expertly hand-crafted tired old conventional method you care to name (none are usually named though). So I’ve been looking under the hood at these data sets to see what the machine learners think of as high-quality inputs that lead to disruptive upending of the sclerotic mod-rec establishment. In previous posts, I’ve looked at two of the most popular DeepSig data sets from 2016 (here and here). In this post, we’ll look at one more and I will then try to get back to the CSP posts.

Let’s take a look at one more DeepSig data set: 2018.01.OSC.0001_1024x2M.h5.tar.gz.

Continue reading “DeepSig’s 2018 Data Set: 2018.01.OSC.0001_1024x2M.h5.tar.gz”

## All BPSK Signals

Update June 2020

I’ll be adding new papers to this post as I find them. At the end of the original post there is a sequence of date-labeled updates that briefly describe the relevant aspects of the newly found papers.

## Symmetries of Higher-Order Temporal Probabilistic Parameters in CSP

In this post, we continue our study of the symmetries of CSP parameters. The second-order parameters–spectral correlation and cyclic correlation–are covered in detail in the companion post, including the symmetries for ‘auto’ and ‘cross’ versions of those parameters.

Here we tackle the generalizations of cyclic correlation: cyclic temporal moments and cumulants. We’ll deal with the generalization of the spectral correlation function, the  cyclic polyspectra, in a subsequent post. It is reasonable to me to focus first on the higher-order temporal parameters, because I consider the temporal parameters to be much more useful in practice than the spectral parameters.

This topic is somewhat harder and more abstract than the second-order topic, but perhaps there are bigger payoffs in algorithm development for exploiting symmetries in higher-order parameters than in second-order parameters because the parameters are multidimensional. So it could be worthwhile to sally forth.

## SPTK: The Fourier Series

This installment of the Signal Processing Toolkit shows how the Fourier series arises from a consideration of representing arbitrary signals as vectors in a signal space. We also provide several examples of Fourier series calculations, interpret the Fourier series, and discuss its relevance to cyclostationary signal processing.

## A Gallery of Cyclic Correlations

There are some situations in which the spectral correlation function is not the preferred measure of (second-order) cyclostationarity. In these situations, the cyclic autocorrelation (non-conjugate and conjugate versions) may be much simpler to estimate and work with in terms of detector, classifier, and estimator structures. So in this post, I’m going to provide plots of the cyclic autocorrelation for each of the signals in the spectral correlation gallery post. The exceptions are those signals I called feature-rich in the spectral correlation gallery post, such as LTE and radar. Recall that such signals possess a large number of cycle frequencies, and plotting their three-dimensional spectral correlation surface is not helpful as it is difficult to interpret with the human eye. So for the cycle-frequency patterns of feature-rich signals, we’ll rely on the stem-style (cyclic-domain profile) plots in the gallery post.

## Simple Synchronization Using CSP

In this post I discuss the use of cyclostationary signal processing applied to communication-signal synchronization problems. First, just what are synchronization problems? Synchronize and synchronization have multiple meanings, but the meaning of synchronize that is relevant here is something like:

syn·chro·nize: To cause to occur or operate with exact coincidence in time or rate

If we have an analog amplitude-modulated (AM) signal (such as voice AM used in the AM broadcast bands) at a receiver we want to remove the effects of the carrier sine wave, resulting in an output that is only the original voice or music message. If we have a digital signal such as binary phase-shift keying (BPSK), we want to remove the effects of the carrier but also sample the message signal at the correct instants to optimally recover the transmitted bit sequence.

## Can a Machine Learn a Power Spectrum Estimator?

I continue with my foray into machine learning (ML) by considering whether we can use widely available ML tools to create a machine that can output accurate power spectrum estimates. Previously we considered the perhaps simpler problem of learning the Fourier transform. See here and here.

Along the way I’ll expose my ignorance of the intricacies of machine learning and my apparent inability to find the correct hyperparameter settings for any problem I look at. But, that’s where you come in, dear reader. Let me know what to do!

## Data Set for the Machine-Learning Challenge

Update September 2020. I made a mistake when I created the signal-parameter “truth” files signal_record.txt and signal_record_first_20000.txt. Like the DeepSig RML data sets that I analyzed on the CSP Blog here and here, the SNR parameter in the truth files did not match the actual SNR of the signals in the data files. I’ve updated the truth files and the links below. You can still use the original files for all other signal parameters, but the SNR parameter was in error.

Update July 2020. I originally posted $20,000$ signals in the posted data set. I’ve now added another $92,000$ for a total of $112,000$ signals. The original signals are contained in Batches 1-5, the additional signals in Batches 6-28. I’ve placed these additional Batches at the end of the post to preserve the original post’s content.

I’ve posted $20000$ PSK/QAM signals to the CSP Blog. These are the signals I refer to in the post I wrote challenging the machine-learners. In this brief post, I provide links to the data and describe how to interpret the text file containing the signal-type labels and signal parameters.

### Overview of Data Set

The $20,000$ signals are stored in five zip files, each containing $4000$ individual signal files:

Batch 1

Batch 2

Batch 3

Batch 4

Batch 5

The zip files are each about 1 GB in size.

The modulation-type labels for the signals, such as “BPSK” or “MSK,” are contained in the text file:

signal_record_first_20000.txt

Each signal file is stored in a binary format involving interleaved real and imaginary parts, which I call ‘.tim’ files. You can read a .tim file into MATLAB using read_binary.m. Or use the code inside read_binary.m to write your own data-reader; the format is quite simple.

### The Label and Parameter File

Let’s look at the format of the truth/label file. The first line of signal_record_first_20000.txt is

1 bpsk  11  -7.4433467080e-04  9.8977795076e-01  10  9  5.4532617590e+00  0.0

which comprises $9$ fields. All temporal and spectral parameters (times and frequencies) are normalized with respect to the sampling rate. In other words, the sampling rate can be taken to be unity in this data set. These fields are described in the following list:

1. Signal index. In the case above this is `1′ and that means the file containing the signal is called signal_1.tim. In general, the $n$th signal is contained in the file signal_n.tim. The Batch 1 zip file contains signal_1.tim through signal_4000.tim.
2. Signal type. A string indicating the modulation format of the signal in the file. For this data set, I’ve only got eight modulation types: BPSK, QPSK, 8PSK, $\pi/4$-DQPSK, 16QAM, 64QAM, 256QAM, and MSK. These are denoted by the strings bpsk, qpsk, 8psk, dqpsk, 16qam, 64qam, 256qam, and msk, respectively.
3. Base symbol period. In the example above (line one of the truth file), the base symbol period is $T_0 = 11$.
4. Carrier offset. In this case, it is $-7.4433467080\times 10^{-4}$.
5. Excess bandwidth. The excess bandwidth parameter, or square-root raised-cosine roll-off parameter, applies to all of the signal types except MSK. Here it is $9.8977795076\times 10^{-1}$. It can be any real number between $0.1$ and $1.0$.
6. Upsample factor. The sixth field is an upsampling parameter U.
7. Downsample factor. The seventh field is a downsampling parameter D. The actual symbol rate of the signal in the file is computed from the base symbol period, upsample factor, and downsample factor: $\displaystyle f_{sym} = (1/T_0)*(D/U)$. So the BPSK signal in signal_1.tim has rate $0.08181818$. If the downsample factor is zero in the truth-parameters file, no resampling was done to the signal.
8. Inband SNR (dB). The ratio of the signal power to the noise power within the signal’s bandwidth, taking into account the signal type and the excess bandwidth parameter.
9. Noise spectral density (dB). It is always $0$ dB. So the various SNRs are generated by varying the signal power.

To ensure that you have correctly downloaded and interpreted my data files, I’m going to provide some PSD plots and a couple of the actual sample values for a couple of the files.

### signal_1.tim

The line from the truth file is:

1 bpsk  11  -7.4433467080e-04  9.8977795076e-01  10  9  5.4532617590e+00  0.0

The first ten samples of the file are:

-5.703014e-02   -6.163056e-01
-1.285231e-01   -6.318392e-01
6.664069e-01    -7.007506e-02
7.731103e-01    -1.164615e+00
3.502680e-01    -1.097872e+00
7.825349e-01    -3.721564e-01
1.094809e+00    -3.123962e-01
4.146149e-01    -5.890701e-01
1.444665e+00    7.358724e-01
-2.217039e-01   -1.305001e+00

An FSM-based PSD estimate for signal_1.tim is:

And the blindly estimated cycle frequencies (using the SSCA) are:

The previous plot corresponds to the numerical values:

Non-conjugate $(\alpha, C, S)$:

8.181762695e-02  7.480e-01  5.406e+00

Conjugate $(\alpha, C, S)$:

8.032470942e-02  7.800e-01  4.978e+00
-1.493096002e-03  8.576e-01  1.098e+01
-8.331298083e-02  7.090e-01  5.039e+00

### signal_4000.tim

The line from the truth file is

4000 256qam  9  8.3914849139e-04  7.2367959637e-01  9  8  1.0566301192e+01  0.0

which means the symbol rate is given by $(1/9)*(8/9) = 0.09876543209$. The carrier offset is $0.000839$ and the excess bandwidth is $0.723$. Because the signal type is 256QAM, it has a single (non-zero) non-conjugate cycle frequency of $0.098765$ and no conjugate cycle frequencies. But the square of the signal has cycle frequencies related to the quadrupled carrier:

### Final Thoughts

Is $20000$ waveforms a large enough data set? Maybe not. I have generated tens of thousands more, but will not post until there is a good reason to do so. And that, my friends, is up to you!

That’s about it. I think that gives you enough information to ensure that you’ve interpreted the data and the labels correctly. What remains is experimentation, machine-learning or otherwise I suppose. Please get back to me and the readers of the CSP Blog with any interesting results using the Comments section of this post or the Challenge post.

For my analysis of a commonly used machine-learning modulation-recognition data set (RML), see the All BPSK Signals post.

Batch 6

Batch 7

Batch 8

Batch 9

Batch 10

Batch 11

Batch 12

Batch 13

Batch 14

Batch 15

Batch 16

Batch 17

Batch 18

Batch 19

Batch 20

Batch 21

Batch 22

Batch 23

Batch 24

Batch 25

Batch 26

Batch 28

Signal parameters text file

## Comments on “Detection of Almost-Cyclostationarity: An Approach Based on a Multiple Hypothesis Test” by S. Horstmann et al

I recently came across the conference paper in the post title (The Literature [R101]). Let’s take a look.

The paper is concerned with “detect[ing] the presence of ACS signals with unknown cycle period.” In other words, blind cyclostationary-signal detection and cycle-frequency estimation. Of particular importance to the authors is the case in which the “period of cyclostationarity” is not equal to an integer number of samples. They seem to think this is a new and difficult problem. By my lights, it isn’t. But maybe I’m missing something. Let me know in the Comments.

## CSP Estimators: The FFT Accumulation Method

Let’s look at another spectral correlation function estimator: the FFT Accumulation Method (FAM). This estimator is in the time-smoothing category, is exhaustive in that it is designed to compute estimates of the spectral correlation function over its entire principal domain, and is efficient, so that it is a competitor to the Strip Spectral Correlation Analyzer (SSCA) method. I implemented my version of the FAM by using the paper by Roberts et al (The Literature [R4]). If you follow the equations closely, you can successfully implement the estimator from that paper. The tricky part, as with the SSCA, is correctly associating the outputs of the coded equations to their proper $\displaystyle (f, \alpha)$ values.

## Resolution in Time, Frequency, and Cycle Frequency for CSP Estimators

In this post, we look at the ability of various CSP estimators to distinguish cycle frequencies, temporal changes in cyclostationarity, and spectral features. These abilities are quantified by the resolution properties of CSP estimators.

### Resolution Parameters in CSP: Preview

Consider performing some CSP estimation task, such as using the frequency-smoothing method, time-smoothing method, or strip spectral correlation analyzer method of estimating the spectral correlation function. The estimate employs $T$ seconds of data.

Then the temporal resolution $\Delta t$ of the estimate is approximately $T$, the cycle-frequency resolution $\Delta \alpha$ is about $1/T$, and the spectral resolution $\Delta f$ depends strongly on the particular estimator and its parameters. The resolution product $\Delta f \Delta t$ was discussed in this post. The fundamental result for the resolution product is that it must be very much larger than unity in order to obtain an SCF estimate with low variance.

## CSP Estimators: Cyclic Temporal Moments and Cumulants

In this post we discuss ways of estimating $n$-th order cyclic temporal moment and cumulant functions. Recall that for $n=2$, cyclic moments and cyclic cumulants are usually identical. They differ when the signal contains one or more finite-strength additive sine-wave components. In the common case when such components are absent (as in our recurring numerical example involving rectangular-pulse BPSK), they are equal and they are also equal to the conventional cyclic autocorrelation function provided the delay vector is chosen appropriately.

The more interesting case is when the order $n$ is greater than $2$. Most communication signal models possess odd-order moments and cumulants that are identically zero, so the first non-trivial order $n$ greater than $2$ is $4$. Our estimation task is to estimate $n$-th order temporal moment and cumulant functions for $n \ge 4$ using a sampled-data record of length $T$.

## More on Pure and Impure Sine Waves

Remember when we derived the cumulant as the solution to the pure $n$th-order sine-wave problem? It sounded good at the time, I hope. But here I describe a curious special case where the interpretation of the cumulant as the pure component of a nonlinearly generated sine wave seems to break down.

## Cyclostationarity of Direct-Sequence Spread-Spectrum Signals

In this post we look at direct-sequence spread-spectrum (DSSS) signals, which can be usefully modeled as a kind of PSK signal. DSSS signals are used in a variety of real-world situations, including the familiar CDMA and WCDMA signals, covert signaling, and GPS. My colleague Antonio Napolitano has done some work on a large class of DSSS signals (The Literature [R11, R17, R95]), resulting in formulas for their spectral correlation functions, and I’ve made some remarks about their cyclostationary properties myself here and there (My Papers [16]).

A good thing, from the point of view of modulation recognition, about DSSS signals is that they are easily distinguished from other PSK and QAM signals by their spectral correlation functions. Whereas most PSK/QAM signals have only a single non-conjugate cycle frequency, and no conjugate cycle frequencies, DSSS signals have many non-conjugate cycle frequencies and in some cases also have many conjugate cycle frequencies.

## Cumulant (4, 2) is a Good Discriminator? Comments on “Energy-Efficient Processor for Blind Signal Classification in Cognitive Radio Networks,” by E. Rebeiz et al.

Let’s talk about another published paper on signal detection involving cyclostationarity and/or cumulants. This one is called “Energy-Efficient Processor for Blind Signal Classification in Cognitive Radio Networks,” (The Literature [R69]), and is authored by UCLA researchers E. Rebeiz and four colleagues.

My focus on this paper it its idea that broad signal-type classes, such as direct-sequence spread-spectrum (DSSS), QAM, and OFDM can be reliably distinguished by the use of a single number: the fourth-order cumulant with two conjugated terms. This kind of cumulant is referred to as the $(4, 2)$ cumulant here at the CSP Blog, and in the paper, because the order is $n=4$ and the number of conjugated terms is $m=2$.

## Comments on “Blind Cyclostationary Spectrum Sensing in Cognitive Radios” by W. M. Jang

I recently came across the 2014 paper in the title of this post. I mentioned it briefly in the post on the periodogram. But I’m going to talk about it a bit more here because this is the kind of thing that makes things a bit harder for people trying to learn about cyclostationarity, which eventually leads to the need for something like the CSP Blog.

The idea behind the paper is that it would be nice to avoid the need for prior knowledge of cycle frequencies when using cycle detectors or the like. If you could just compute the entire spectral correlation function, then collapse it by integrating (summing) over frequency $f$, then you’d have a one-dimensional function of cycle frequency $\alpha$ and you could then process that function inexpensively to perform detection and classification tasks.