Update May 20, 2022: Here is the arxiv.org link.
Back in 2018 I posted a dataset consisting of 112,000 I/Q data files, each 32,768 samples in length, as part of a challenge to machine learners who had been making strong claims of superiority over signal processing in the area of automatic modulation recognition. One part of the challenge was modulation recognition involving eight digital modulation types, and the other was estimation of the carrier frequency offset. That dataset is described here, and I refer to it as CSPB.ML.2018.
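For readers who want to poke at the files programmatically, here is a minimal loader sketch. The interleaved-float32 format and the filename are assumptions on my part, so check the CSPB.ML.2018 description page for the actual file format and naming convention.

```python
import numpy as np

def load_iq_file(path, num_samples=32768):
    """Read interleaved float32 I/Q samples and return a complex64 vector.
    ASSUMPTION: storage order is I0, Q0, I1, Q1, ...; verify against the dataset page."""
    raw = np.fromfile(path, dtype=np.float32, count=2 * num_samples)
    return (raw[0::2] + 1j * raw[1::2]).astype(np.complex64)

# Hypothetical usage (filename is made up):
# x = load_iq_file("signal_00001.tim")
# print(x.shape)   # (32768,)
```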
Then in 2022 I posted a companion dataset to CSPB.ML.2018 called CSPB.ML.2022. This new dataset uses the same eight modulation types and similar ranges of SNR, pulse type, and symbol rate, but the random variable governing the carrier frequency offset differs from the one used in CSPB.ML.2018. The purpose of CSPB.ML.2022 is to facilitate studies of the dataset-shift, or generalization, problem in machine learning.
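To make the role of that random variable concrete, here is a small sketch of how a carrier frequency offset acts on complex baseband samples. The particular offset value is illustrative only; the actual probability distributions used in the two datasets are specified on their description pages.

```python
import numpy as np

def apply_cfo(x, f0):
    """Shift the spectrum of x by f0 cycles/sample: y[n] = x[n] * exp(j*2*pi*f0*n)."""
    n = np.arange(len(x))
    return x * np.exp(2j * np.pi * f0 * n)

# Illustrative only: a toy QPSK-like sequence shifted by a normalized CFO of 0.01
# x = (np.sign(np.random.randn(32768)) + 1j*np.sign(np.random.randn(32768))) / np.sqrt(2)
# y = apply_cfo(x, 0.01)
```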
Over the past couple of years I've been working with some graduate students and a professor at Old Dominion University on merging machine learning and signal processing for problems involving RF signal analysis, such as modulation recognition. We are starting to publish a sequence of papers describing our efforts, and in this post I briefly summarize the results of one of them, My Papers [51].
We’ve been working not only with the two CSPB datasets but also with DeepSig’s datasets (RML 2016a, 2016b, 2016c, and 2018), so stay tuned for further papers that include some non-CSPB data.
The two key results in [51] are (1) capsule networks can provide modulation-recognition performance that exceeds the CSP-based performance I have presented for the Challenge (CSPB.ML.2018) and Shifted Challenge (CSPB.ML.2022) datasets, and (2) none of the considered networks generalizes from one of the datasets to the other when I/Q data is used as the network input. This means that the many claims of superiority of deep neural networks for modulation recognition are ill-founded: the superiority, if it exists at all, is confined to the particular (and always narrow) training/testing dataset. DNNs with I/Q data at the input have not been shown to solve any general modulation-recognition problem.
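For concreteness, the generalization (dataset-shift) test in question amounts to training on one dataset and then testing both on held-out signals from that same dataset and on the other dataset. The sketch below shows that protocol with a generic scikit-learn classifier operating on hypothetical feature/label arrays; it is not the capsule-network setup of [51].

```python
# Sketch of the dataset-shift protocol only, not the capsule networks of [51].
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

def dataset_shift_check(X_a, y_a, X_b, y_b):
    """Train on dataset A; report within-dataset and cross-dataset accuracy."""
    X_tr, X_te, y_tr, y_te = train_test_split(X_a, y_a, test_size=0.3, random_state=0)
    clf = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=300).fit(X_tr, y_tr)
    return clf.score(X_te, y_te), clf.score(X_b, y_b)

# Hypothetical usage with feature/label arrays built from the two CSPB datasets:
# within, cross = dataset_shift_check(X_2018, y_2018, X_2022, y_2022)
# A large gap between 'within' and 'cross' is the failure to generalize.
```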
In papers to appear in the near term, we demonstrate that all of the considered networks generalize very well when cyclic temporal cumulants are used as the network input.
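As a taste of what such features look like, here is a sketch of a second-order cyclic-moment (cyclic autocorrelation) estimate. The cyclic temporal cumulants used in our work are higher-order generalizations of this kind of quantity; the exact feature set, orders, and normalizations are described in the paper, not here.

```python
import numpy as np

def cyclic_autocorrelation(x, alpha, tau):
    """Estimate R_x^alpha(tau) = (1/N) * sum_n x[n+tau] * conj(x[n]) * exp(-j*2*pi*alpha*n)
    for a nonnegative integer lag tau and cycle frequency alpha in cycles/sample."""
    N = len(x) - tau
    n = np.arange(N)
    return np.mean(x[n + tau] * np.conj(x[n]) * np.exp(-2j * np.pi * alpha * n))

# Illustrative usage (loader and filename are the hypothetical ones from above):
# x = load_iq_file("signal_00001.tim")
# value = cyclic_autocorrelation(x, alpha=0.1, tau=0)
```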
We’ve not yet tried using neural networks to address the second part of the Challenge, which involves estimation of the carrier frequency offset.
Here are some highlights.


At this point in the post, I had intended to say “Hey, go get the paper at this arxiv.org link and see all the details for yourself!” But I cannot post the paper, because it appears that arxiv.org is heavily biased against non-academic users. I do not have a current .edu email address or affiliation, so arxiv.org requires an “endorsement” before I can post a paper in the electrical-engineering category, signal-processing subcategory. However, according to arxiv.org itself, all the people I checked on the site, including academic colleagues of mine, some of them famous and prolific, are ineligible to endorse.
So my co-authors will have to post the paper. I’ll update this post with a link when I get it.
***
This bias against non-university researchers becomes more problematic the more I ponder it. I’ve just had an academic colleague tell me she has posted multiple papers without ever needing an endorsement, so the bias is verified. This means that academics can effectively shut out non-academics from the site (in principle, I hasten to add), since only non-academics require endorsements, which in turn means that non-academic researchers who are critical of academic work could find themselves unable to use arxiv.org.
***