DeepSig’s data sets are popular in the machine-learning modulation-recognition community, and in that community there are many claims that the deep neural networks are vastly outperforming any expertly hand-crafted tired old conventional method you care to name (none are usually named though). So I’ve been looking under the hood at these data sets to see what the machine learners think of as high-quality inputs that lead to disruptive upending of the sclerotic mod-rec establishment. In previous posts, I’ve looked at two of the most popular DeepSig data sets from 2016 (here and here). In this post, we’ll look at one more and I will then try to get back to the CSP posts.
Let’s take a look at one more DeepSig data set: 2018.01.OSC.0001_1024x2M.h5.tar.gz.
I presented an analysis of one of DeepSig’s earlier modulation-recognition data sets (RML2016.10a.tar.bz2) in the post on All BPSK Signals. There we saw several flaws in the data set as well as curiosities. Most notably, the signals in the data set labeled as analog amplitude-modulated single sideband (AM-SSB) were absent: these signals were only noise. DeepSig has several other data sets on offer at the time of this writing:
In this post, I’ll present a few thoughts and results for the “Larger Version” of RML2016.10a.tar.bz2, which is called RML2016.10b.tar.bz2. This is a good post to offer because it is coherent with the first RML post, but also because more papers are being published that use the RML 10b data set, and of course more such papers are in review. Maybe the offered analysis here will help reviewers to better understand and critique the machine-learning papers. The latter do not ever contain any side analysis or validation of the RML data sets (let me know if you find one that does in the Comments below), so we can’t rely on the machine learners to assess their inputs.
I’ve seen several published and pre-published (arXiv.org) technical papers over the past couple of years on the topic of cyclic correntropy (The Literature [R123-R127]). I first criticized such a paper ([R123]) here, but the substance of that review was about my problems with the presented mathematics, not impulsive noise and its effects on CSP. Since the papers keep coming, apparently, I’m going to put down some thoughts on impulsive noise and some evidence regarding simple means of mitigation in the context of CSP. Preview: I don’t think we need to go to the trouble of investigating cyclic correntropy as a means of salvaging CSP from the clutches of impulsive noise.
There are some situations in which the spectral correlation function is not the preferred measure of (second-order) cyclostationarity. In these situations, the cyclic autocorrelation (non-conjugate and conjugate versions) may be much simpler to estimate and work with in terms of detector, classifier, and estimator structures. So in this post, I’m going to provide plots of the cyclic autocorrelation for each of the signals in the spectral correlation gallery post. The exceptions are those signals I called feature-rich in the spectral correlation gallery post, such as LTE and radar. Recall that such signals possess a large number of cycle frequencies, and plotting their three-dimensional spectral correlation surface is not helpful as it is difficult to interpret with the human eye. So for the cycle-frequency patterns of feature-rich signals, we’ll rely on the stem-style (cyclic-domain profile) plots in the gallery post.
Update September 2020. I made a mistake when I created the signal-parameter “truth” files signal_record.txt and signal_record_first_20000.txt. Like the DeepSig RML data sets that I analyzed on the CSP Blog here and here, the SNR parameter in the truth files did not match the actual SNR of the signals in the data files. I’ve updated the truth files and the links below. You can still use the original files for all other signal parameters, but the SNR parameter was in error.
Update July 2020. I originally posted signals in the posted data set. I’ve now added another for a total of signals. The original signals are contained in Batches 1-5, the additional signals in Batches 6-28. I’ve placed these additional Batches at the end of the post to preserve the original post’s content.
I’ve posted PSK/QAM signals to the CSP Blog. These are the signals I refer to in the post I wrote challenging the machine-learners. In this brief post, I provide links to the data and describe how to interpret the text file containing the signal-type labels and signal parameters.
Overview of Data Set
The signals are stored in five zip files, each containing individual signal files:
Each signal file is stored in a binary format involving interleaved real and imaginary parts, which I call ‘.tim’ files. You can read a .tim file into MATLAB using read_binary.m. Or use the code inside read_binary.m to write your own data-reader; the format is quite simple.
The Label and Parameter File
Let’s look at the format of the truth/label file. The first line of signal_record_first_20000.txt is
which comprises fields. All temporal and spectral parameters (times and frequencies) are normalized with respect to the sampling rate. In other words, the sampling rate can be taken to be unity in this data set. These fields are described in the following list:
Signal index. In the case above this is `1′ and that means the file containing the signal is called signal_1.tim. In general, the th signal is contained in the file signal_n.tim. The Batch 1 zip file contains signal_1.tim through signal_4000.tim.
Signal type. A string indicating the modulation format of the signal in the file. For this data set, I’ve only got eight modulation types: BPSK, QPSK, 8PSK, -DQPSK, 16QAM, 64QAM, 256QAM, and MSK. These are denoted by the strings bpsk, qpsk, 8psk, dqpsk, 16qam, 64qam, 256qam, and msk, respectively.
Base symbol period. In the example above (line one of the truth file), the base symbol period is .
Carrier offset. In this case, it is .
Excess bandwidth. The excess bandwidth parameter, or square-root raised-cosine roll-off parameter, applies to all of the signal types except MSK. Here it is . It can be any real number between and .
Upsample factor. The sixth field is an upsampling parameter U.
Downsample factor. The seventh field is a downsampling parameter D. The actual symbol rate of the signal in the file is computed from the base symbol period, upsample factor, and downsample factor: . So the BPSK signal in signal_1.tim has rate . If the downsample factor is zero in the truth-parameters file, no resampling was done to the signal.
Inband SNR (dB). The ratio of the signal power to the noise power within the signal’s bandwidth, taking into account the signal type and the excess bandwidth parameter.
Noise spectral density (dB). It is always dB. So the various SNRs are generated by varying the signal power.
To ensure that you have correctly downloaded and interpreted my data files, I’m going to provide some PSD plots and a couple of the actual sample values for a couple of the files.
which means the symbol rate is given by . The carrier offset is and the excess bandwidth is . Because the signal type is 256QAM, it has a single (non-zero) non-conjugate cycle frequency of and no conjugate cycle frequencies. But the square of the signal has cycle frequencies related to the quadrupled carrier:
Is waveforms a large enough data set? Maybe not. I have generated tens of thousands more, but will not post until there is a good reason to do so. And that, my friends, is up to you!
That’s about it. I think that gives you enough information to ensure that you’ve interpreted the data and the labels correctly. What remains is experimentation, machine-learning or otherwise I suppose. Please get back to me and the readers of the CSP Blog with any interesting results using the Comments section of this post or the Challenge post.
For my analysis of a commonly used machine-learning modulation-recognition data set (RML), see the All BPSK Signals post.
So why do I obsess over cyclostationary signals and cyclostationary signal processing? What’s the big deal, in the end? In this post I discuss my view of the ultimate use of cyclostationary signal processing (CSP): Radio-Frequency Scene Analysis (RFSA). Eventually, I hope to create a kind of Star Trek Tricorder for RFSA.