More on DeepSig’s RML Data Sets

I presented an analysis of one of DeepSig’s earlier modulation-recognition data sets (RML2016.10a.tar.bz2) in the post on All BPSK Signals. There we saw several flaws in the data set as well as curiosities. Most notably, the signals in the data set labeled as analog amplitude-modulated single sideband (AM-SSB) were absent: these signals were only noise. DeepSig has several other data sets on offer at the time of this writing:

In this post, I’ll present a few thoughts and results for the “Larger Version” of RML2016.10a.tar.bz2, which is called RML2016.10b.tar.bz2. This is a good post to offer because it is coherent with the first RML post, but also because more papers are being published that use the RML 10b data set, and of course more such papers are in review. Maybe the offered analysis here will help reviewers to better understand and critique the machine-learning papers. The latter do not ever contain any side analysis or validation of the RML data sets (let me know if you find one that does in the Comments below), so we can’t rely on the machine learners to assess their inputs.

Continue reading “More on DeepSig’s RML Data Sets”

All BPSK Signals

Update June 2020

I’ll be adding new papers to this post as I find them. At the end of the original post there is a sequence of date-labeled updates that briefly describe the relevant aspects of the newly found papers.

Continue reading “All BPSK Signals”

A Gallery of Cyclic Correlations

There are some situations in which the spectral correlation function is not the preferred measure of (second-order) cyclostationarity. In these situations, the cyclic autocorrelation (non-conjugate and conjugate versions) may be much simpler to estimate and work with in terms of detector, classifier, and estimator structures. So in this post, I’m going to provide plots of the cyclic autocorrelation for each of the signals in the spectral correlation gallery post. The exceptions are those signals I called feature-rich in the spectral correlation gallery post, such as LTE and radar. Recall that such signals possess a large number of cycle frequencies, and plotting their three-dimensional spectral correlation surface is not helpful as it is difficult to interpret with the human eye. So for the cycle-frequency patterns of feature-rich signals, we’ll rely on the stem-style (cyclic-domain profile) plots in the gallery post.

Continue reading “A Gallery of Cyclic Correlations”

Data Set for the Machine-Learning Challenge

Update September 2020. I made a mistake when I created the signal-parameter “truth” files signal_record.txt and signal_record_first_20000.txt. Like the DeepSig RML data sets that I analyzed on the CSP Blog here and here, the SNR parameter in the truth files did not match the actual SNR of the signals in the data files. I’ve updated the truth files and the links below. You can still use the original files for all other signal parameters, but the SNR parameter was in error.

Update July 2020. I originally posted 20,000 signals in the posted data set. I’ve now added another 92,000 for a total of 112,000 signals. The original signals are contained in Batches 1-5, the additional signals in Batches 6-28. I’ve placed these additional Batches at the end of the post to preserve the original post’s content.

I’ve posted 20000 PSK/QAM signals to the CSP Blog. These are the signals I refer to in the post I wrote challenging the machine-learners. In this brief post, I provide links to the data and describe how to interpret the text file containing the signal-type labels and signal parameters.

Overview of Data Set

The 20,000 signals are stored in five zip files, each containing 4000 individual signal files:

Batch 1

Batch 2

Batch 3

Batch 4

Batch 5

The zip files are each about 1 GB in size.

The modulation-type labels for the signals, such as “BPSK” or “MSK,” are contained in the text file:


Each signal file is stored in a binary format involving interleaved real and imaginary parts, which I call ‘.tim’ files. You can read a .tim file into MATLAB using read_binary.m. Or use the code inside read_binary.m to write your own data-reader; the format is quite simple.

The Label and Parameter File

Let’s look at the format of the truth/label file. The first line of signal_record_first_20000.txt is

1 bpsk  11  -7.4433467080e-04  9.8977795076e-01  10  9  5.4532617590e+00  0.0

which comprises 9 fields. All temporal and spectral parameters (times and frequencies) are normalized with respect to the sampling rate. In other words, the sampling rate can be taken to be unity in this data set. These fields are described in the following list:

  1. Signal index. In the case above this is `1′ and that means the file containing the signal is called signal_1.tim. In general, the nth signal is contained in the file signal_n.tim. The Batch 1 zip file contains signal_1.tim through signal_4000.tim.
  2. Signal type. A string indicating the modulation format of the signal in the file. For this data set, I’ve only got eight modulation types: BPSK, QPSK, 8PSK, \pi/4-DQPSK, 16QAM, 64QAM, 256QAM, and MSK. These are denoted by the strings bpsk, qpsk, 8psk, dqpsk, 16qam, 64qam, 256qam, and msk, respectively.
  3. Base symbol period. In the example above (line one of the truth file), the base symbol period is T_0 = 11.
  4. Carrier offset. In this case, it is -7.4433467080\times 10^{-4}.
  5. Excess bandwidth. The excess bandwidth parameter, or square-root raised-cosine roll-off parameter, applies to all of the signal types except MSK. Here it is 9.8977795076\times 10^{-1}. It can be any real number between 0.1 and 1.0.
  6. Upsample factor. The sixth field is an upsampling parameter U.
  7. Downsample factor. The seventh field is a downsampling parameter D. The actual symbol rate of the signal in the file is computed from the base symbol period, upsample factor, and downsample factor: \displaystyle f_{sym} = (1/T_0)*(D/U). So the BPSK signal in signal_1.tim has rate 0.08181818. If the downsample factor is zero in the truth-parameters file, no resampling was done to the signal.
  8. Inband SNR (dB). The ratio of the signal power to the noise power within the signal’s bandwidth, taking into account the signal type and the excess bandwidth parameter.
  9. Noise spectral density (dB). It is always 0 dB. So the various SNRs are generated by varying the signal power.

To ensure that you have correctly downloaded and interpreted my data files, I’m going to provide some PSD plots and a couple of the actual sample values for a couple of the files.


The line from the truth file is:

1 bpsk  11  -7.4433467080e-04  9.8977795076e-01  10  9  5.4532617590e+00  0.0

The first ten samples of the file are:

-5.703014e-02   -6.163056e-01
-1.285231e-01   -6.318392e-01
6.664069e-01    -7.007506e-02
7.731103e-01    -1.164615e+00
3.502680e-01    -1.097872e+00
7.825349e-01    -3.721564e-01
1.094809e+00    -3.123962e-01
4.146149e-01    -5.890701e-01
1.444665e+00    7.358724e-01
-2.217039e-01   -1.305001e+00

An FSM-based PSD estimate for signal_1.tim is:


And the blindly estimated cycle frequencies (using the SSCA) are:


The previous plot corresponds to the numerical values:

Non-conjugate (\alpha, C, S):

8.181762695e-02  7.480e-01  5.406e+00

Conjugate (\alpha, C, S):

8.032470942e-02  7.800e-01  4.978e+00
-1.493096002e-03  8.576e-01  1.098e+01
-8.331298083e-02  7.090e-01  5.039e+00


The line from the truth file is

4000 256qam  9  8.3914849139e-04  7.2367959637e-01  9  8  1.0566301192e+01  0.0

which means the symbol rate is given by (1/9)*(8/9) = 0.09876543209. The carrier offset is 0.000839 and the excess bandwidth is 0.723. Because the signal type is 256QAM, it has a single (non-zero) non-conjugate cycle frequency of 0.098765 and no conjugate cycle frequencies. But the square of the signal has cycle frequencies related to the quadrupled carrier:


Final Thoughts

Is 20000 waveforms a large enough data set? Maybe not. I have generated tens of thousands more, but will not post until there is a good reason to do so. And that, my friends, is up to you!

That’s about it. I think that gives you enough information to ensure that you’ve interpreted the data and the labels correctly. What remains is experimentation, machine-learning or otherwise I suppose. Please get back to me and the readers of the CSP Blog with any interesting results using the Comments section of this post or the Challenge post.

For my analysis of a commonly used machine-learning modulation-recognition data set (RML), see the All BPSK Signals post.

Additional Batches of Signals:

Batch 6

Batch 7

Batch 8

Batch 9

Batch 10

Batch 11

Batch 12

Batch 13

Batch 14

Batch 15

Batch 16

Batch 17

Batch 18

Batch 19

Batch 20

Batch 21

Batch 22

Batch 23

Batch 24

Batch 25

Batch 26

Batch 27

Batch 28

Signal parameters text file

MATLAB’s SSCA: commP25ssca.m

In this short post, I describe some errors that are produced by MATLAB’s strip spectral correlation analyzer function commP25ssca.m. I don’t recommend that you use it; far better to create your own function.

Continue reading “MATLAB’s SSCA: commP25ssca.m”

A Challenge for the Machine Learners


I’ve decided to post the data set I discuss here to the CSP Blog for all interested parties to use. See the new post on the Data Set. If you do use it, please let me and the CSP Blog readers know how you fared with your experiments in the Comments section of either post. Thanks!

Continue reading “A Challenge for the Machine Learners”

CSP Estimators: The FFT Accumulation Method

Let’s look at another spectral correlation function estimator: the FFT Accumulation Method (FAM). This estimator is in the time-smoothing category, is exhaustive in that it is designed to compute estimates of the spectral correlation function over its entire principal domain, and is efficient, so that it is a competitor to the Strip Spectral Correlation Analyzer (SSCA) method. I implemented my version of the FAM by using the paper by Roberts et al (The Literature [R4]). If you follow the equations closely, you can successfully implement the estimator from that paper. The tricky part, as with the SSCA, is correctly associating the outputs of the coded equations to their proper \displaystyle (f, \alpha) values.

Continue reading “CSP Estimators: The FFT Accumulation Method”

‘Can a Machine Learn the Fourier Transform?’ Redux, Plus Relevant Comments on a Machine-Learning Paper by M. Kulin et al.

I first considered whether a machine (neural network) could learn the (64-point, complex-valued)  Fourier transform in this post. I used MATLAB’s Neural Network Toolbox and I failed to get good learning results because I did not properly set the machine’s hyperparameters. A kind reader named Vito Dantona provided a comment to that original post that contained good hyperparameter selections, and I’m going to report the new results here in this post.

Since the Fourier transform is linear, the machine should be set up to do linear processing. It can’t just figure that out for itself. Once I used Vito’s suggested hyperparameters to force the machine to be linear, the results became much better:

Continue reading “‘Can a Machine Learn the Fourier Transform?’ Redux, Plus Relevant Comments on a Machine-Learning Paper by M. Kulin et al.”

CSP Patent: Tunneling

My colleague Dr. Apurva Mody (of BAE Systems, IEEE 802.22, and the WhiteSpace Alliance) and I have received a patent on a CSP-related invention we call tunneling. The US Patent is 9,755,869 and you can read it here or download it here. We’ve got a journal paper in review and a 2013 MILCOM conference paper (My Papers [38]) that discuss and illustrate the involved ideas. I’m also working on a CSP Blog post on the topic.

Update December 28, 2017: Our Tunneling journal paper has been accepted for publication in the journal IEEE Transactions on Cognitive Communications and Networking. You can download the pre-publication version here.

Continue reading “CSP Patent: Tunneling”

CSP Estimators: Cyclic Temporal Moments and Cumulants

In this post we discuss ways of estimating n-th order cyclic temporal moment and cumulant functions. Recall that for n=2, cyclic moments and cyclic cumulants are usually identical. They differ when the signal contains one or more finite-strength additive sine-wave components. In the common case when such components are absent (as in our recurring numerical example involving rectangular-pulse BPSK), they are equal and they are also equal to the conventional cyclic autocorrelation function provided the delay vector is chosen appropriately.

The more interesting case is when the order n is greater than 2. Most communication signal models possess odd-order moments and cumulants that are identically zero, so the first non-trivial order n greater than 2 is 4. Our estimation task is to estimate n-th order temporal moment and cumulant functions for n \ge 4 using a sampled-data record of length T.

Continue reading “CSP Estimators: Cyclic Temporal Moments and Cumulants”

Automatic Spectral Segmentation

In this post, I discuss a signal-processing algorithm that has almost nothing to do with cyclostationary signal processing. Almost. The topic is automated spectral segmentation, which I also call band-of-interest (BOI) detection. When attempting to perform automatic radio-frequency scene analysis (RFSA), we may be confronted with a data block that contains multiple signals in a large number of distinct frequency subbands. Moreover, these signals may be turning on an off within the data block. To apply our cyclostationary signal processing tools effectively, we would like to isolate these signals in time and frequency to the greatest extent possible using linear time-invariant filtering (for separating in the frequency dimension) and time-gating (for separating in the time dimension). Then the isolated signal components can be processed serially.

It is very important to remember that even perfect spectral and temporal segmentation will not solve the cochannel-signal problem. It is perfectly possible that an isolated subband will contain more that one cochannel signal.

The basics of my BOI-detection approach are published in a 2007 conference paper (My Papers [32]). I’ll describe this basic approach, illustrate it with examples relevant to RFSA, and also provide a few extensions of interest, including one that relates to cyclostationary signal processing.

Continue reading “Automatic Spectral Segmentation”

Cyclostationarity of Direct-Sequence Spread-Spectrum Signals

In this post we look at direct-sequence spread-spectrum (DSSS) signals, which can be usefully modeled as a kind of PSK signal. DSSS signals are used in a variety of real-world situations, including the familiar CDMA and WCDMA signals, covert signaling, and GPS. My colleague Antonio Napolitano has done some work on a large class of DSSS signals (The Literature [R11, R17, R95]), resulting in formulas for their spectral correlation functions, and I’ve made some remarks about their cyclostationary properties myself here and there (My Papers [16]).

A good thing, from the point of view of modulation recognition, about DSSS signals is that they are easily distinguished from other PSK and QAM signals by their spectral correlation functions. Whereas most PSK/QAM signals have only a single non-conjugate cycle frequency, and no conjugate cycle frequencies, DSSS signals have many non-conjugate cycle frequencies and in some cases also have many conjugate cycle frequencies.

Continue reading “Cyclostationarity of Direct-Sequence Spread-Spectrum Signals”

Cumulant (4, 2) is a Good Discriminator? Comments on “Energy-Efficient Processor for Blind Signal Classification in Cognitive Radio Networks,” by E. Rebeiz et al.

Let’s talk about another published paper on signal detection involving cyclostationarity and/or cumulants. This one is called “Energy-Efficient Processor for Blind Signal Classification in Cognitive Radio Networks,” (The Literature [R69]), and is authored by UCLA researchers E. Rebeiz and four colleagues.

My focus on this paper it its idea that broad signal-type classes, such as direct-sequence spread-spectrum (DSSS), QAM, and OFDM can be reliably distinguished by the use of a single number: the fourth-order cumulant with two conjugated terms. This kind of cumulant is referred to as the (4, 2) cumulant here at the CSP Blog, and in the paper, because the order is n=4 and the number of conjugated terms is m=2.

Continue reading “Cumulant (4, 2) is a Good Discriminator? Comments on “Energy-Efficient Processor for Blind Signal Classification in Cognitive Radio Networks,” by E. Rebeiz et al.”

Machine Learning and Modulation Recognition: Comments on “Convolutional Radio Modulation Recognition Networks” by T. O’Shea, J. Corgan, and T. Clancy

In this post I provide some comments on another paper I’ve seen on (I have also received copies of it through email) that relates to modulation classification and cyclostationary signal processing. The paper is by O’Shea et al and is called “Convolutional Radio Modulation Recognition Networks.” (The Literature [R138]) You can find it at this link.

Continue reading “Machine Learning and Modulation Recognition: Comments on “Convolutional Radio Modulation Recognition Networks” by T. O’Shea, J. Corgan, and T. Clancy”

Modulation Recognition Using Cyclic Cumulants, Part I: Problem Description and Variants

In this post, we start a discussion of what I consider the ultimate application of the theory of cyclostationary signals: Automatic Modulation Recognition. My relevant papers are My Papers [16,17,25,26,28,30,32,33,38,43,44].

Continue reading “Modulation Recognition Using Cyclic Cumulants, Part I: Problem Description and Variants”

Comments on “Blind Cyclostationary Spectrum Sensing in Cognitive Radios” by W. M. Jang

I recently came across the 2014 paper in the title of this post. I mentioned it briefly in the post on the periodogram. But I’m going to talk about it a bit more here because this is the kind of thing that makes things a bit harder for people trying to learn about cyclostationarity, which eventually leads to the need for something like the CSP Blog.

The idea behind the paper is that it would be nice to avoid the need for prior knowledge of cycle frequencies when using cycle detectors or the like. If you could just compute the entire spectral correlation function, then collapse it by integrating (summing) over frequency f, then you’d have a one-dimensional function of cycle frequency \alpha and you could then process that function inexpensively to perform detection and classification tasks.

Continue reading “Comments on “Blind Cyclostationary Spectrum Sensing in Cognitive Radios” by W. M. Jang”

Comments on “Cyclostationary Correntropy: Definition and Application” by Fontes et al

I recently came across a published paper with the title Cyclostationary Correntropy: Definition and Application, by Aluisio Fontes et al. It is published in a journal called Expert Systems with Applications (Elsevier). Actually, it wasn’t the first time I’d seen this work by these authors. I had reviewed a similar paper in 2015 for a different journal.

I was surprised to see the paper published because I had a lot of criticisms of the original paper, and the other reviewers agreed since the paper was rejected. So I did my job, as did the other reviewers, and we tried to keep a flawed paper from entering the literature, where it would stay forever causing problems for readers.

The editor(s) of the journal Expert Systems with Applications did not ask me to review the paper, so I couldn’t give them the benefit of the work I already put into the manuscript, and apparently the editor(s) did not themselves see sufficient flaws in the paper to merit rejection.

It stings, of course, when you submit a paper that you think is good, and it is rejected. But it also stings when a paper you’ve carefully reviewed, and rejected, is published anyway.

Fortunately I have the CSP Blog, so I’m going on another rant. After all, I already did this the conventional rant-free way.

Continue reading “Comments on “Cyclostationary Correntropy: Definition and Application” by Fontes et al”

100-MHz Amplitude Modulation? Comments on “Sub-Nyquist Cyclostationary Detection for Cognitive Radio” by Cohen and Eldar

I came across a paper by Cohen and Eldar, researchers at the Technion in Israel. You can get the paper on the Arxiv site here. The title is “Sub-Nyquist Cyclostationary Detection for Cognitive Radio,” and the setting is spectrum sensing for cognitive radio. I have a question about the paper that I’ll ask below.

Continue reading “100-MHz Amplitude Modulation? Comments on “Sub-Nyquist Cyclostationary Detection for Cognitive Radio” by Cohen and Eldar”

The Cycle Detectors

Let’s take a look at a class of signal-presence detectors that exploit cyclostationarity and in doing so illustrate the good things that can happen with CSP whenever cochannel interference is present, or noise models deviate from simple additive white Gaussian noise (AWGN). I’m referring to the cycle detectors, the first CSP algorithms I ever studied.

Continue reading “The Cycle Detectors”

Radio-Frequency Scene Analysis

So why do I obsess over cyclostationary signals and cyclostationary signal processing? What’s the big deal, in the end? In this post I discuss my view of the ultimate use of cyclostationary signal processing (CSP): Radio-Frequency Scene Analysis (RFSA). Eventually, I hope to create a kind of Star Trek Tricorder for RFSA.

Continue reading “Radio-Frequency Scene Analysis”