My Old Dominion colleagues and I have published an extended version of our 2022 MILCOM paper (see My Papers) in the journal MDPI Sensors. The first author is John Snoap, one of those rare people who is an expert in both signal processing and machine learning. Bright future there! Dimitrie Popescu, James Latshaw, and I provided analysis, programming, writing, and research-direction support.
The new paper is titled “Deep-Learning-Based Classification of Digitally Modulated Signals Using Capsule Networks and Cyclic Cumulants.” If you go to the My Papers page, you can download a pdf of the new paper using the link in its citation.
In the extended paper, we provide additional details of cyclic-cumulant estimation and direct comparisons to a CSP-based blind modulation-recognition algorithm (My Papers [25,26,28]). The discussions concerning motivations, processing approaches, and future directions are also extended relative to the MILCOM paper.
Like the MILCOM paper, the extended paper focuses on the generalization problem associated with trained neural networks. In our application area, modulation recognition, as in many other areas, a major drawback of using trained neural networks (convolutional neural networks, residual networks, capsule networks, etc.) is that their performance is highly sensitive to slight changes in the probability density functions that describe the random variables influencing the input data. This brittleness goes by several names, including the generalization problem, dataset shift, data drift, data shift, and concept shift.
We find, perhaps unsurprisingly, that there is no dataset-shift (generalization) problem for simple modulation-recognition problems if the input is a principled extracted data feature rather than I/Q samples. The principled feature here is a matrix of cyclic-cumulant magnitudes of various orders (such as the features depicted in the CSP Blog banner). By principled I simply mean that the feature is directly related to the fundamental mathematical characterization of the data, which is the set of all joint probability density functions for the samples. Such features contrast with data-mining features obtained by rooting around in some giant dataset looking for correlations (and you’ll always find some, principles be damned).
The excellent generalization obtained by our networks when using cyclic-cumulant inputs can be explained by realizing that the (properly estimated and normalized) cyclic cumulants for a BPSK signal with symbol rate $R_1$, carrier offset of $f_1$, and square-root raised-cosine pulse rolloff of $\beta$ are exactly the same as those for a BPSK signal with rate $R_2$, offset $f_2$, and rolloff $\beta$. All BPSK signals (with a fixed rolloff) are characterized by the same feature matrix, so the distribution of the bit rates and/or the carrier offsets is immaterial. This is not the case for I/Q input data.
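To make the invariance concrete, here is a minimal numerical sketch, not the paper's feature extractor: it assumes rectangular pulses rather than square-root raised-cosine ones, and it uses only the simplest second-order cyclic-moment magnitude rather than the full matrix of higher-order cyclic-cumulant magnitudes. The carrier offsets and signal parameters are illustrative choices. Still, it shows the key point: shifting only the carrier offset moves the feature to a different cycle frequency but leaves its magnitude unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4096   # total samples
sps = 8    # samples per symbol (rectangular pulse shaping)

def bpsk(f0):
    """Baseband BPSK with rectangular pulses at normalized carrier offset f0."""
    bits = rng.integers(0, 2, N // sps) * 2 - 1          # symbols in {-1, +1}
    s = np.repeat(bits, sps).astype(complex)             # rectangular pulses
    n = np.arange(N)
    return s * np.exp(2j * np.pi * f0 * n)               # apply carrier offset

def c20_peak(x):
    """Peak magnitude over cycle frequency of the (n=2, m=0) cyclic-moment
    estimate: max_alpha (1/N) |sum_n x(n)^2 exp(-j 2 pi alpha n)|."""
    return np.abs(np.fft.fft(x * x)).max() / len(x)

# Two BPSK signals that differ only in carrier offset (offsets chosen so the
# doubled-carrier cycle frequency lands on an FFT bin, avoiding leakage)...
m1 = c20_peak(bpsk(0.0625))
m2 = c20_peak(bpsk(0.03125))

# ...yield the same feature magnitude; both are 1.0 to numerical precision.
print(m1, m2)
```

The peak of the FFT of $x^2(n)$ occurs at twice the carrier offset, so its *location* depends on $f_0$, but its *magnitude* does not. A classifier fed the magnitude sees identical features for both signals, which is the essence of why cyclic-cumulant inputs sidestep the dataset-shift problem that plagues I/Q inputs.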
The drawback of the cyclic-cumulant-input approach to training neural networks is that, well, you have to estimate, blindly, the cyclic-cumulant matrix. If only we could stick with I/Q inputs and get both the high performance and the excellent generalization that come with using cyclic cumulants as inputs… Well, we can. We’ve done some work to show exactly that, and we have a couple of MILCOM papers in submission. I’m looking forward to seeing you all again at MILCOM 2023 if we can get those papers accepted.
The crucial point, which I’ve made before and so am in danger of belaboring, is that to obtain simultaneous good performance and good generalization in machine-learning modulation recognition, one needs a machine that is designed with the modulation-recognition problem in mind. Therefore, we have explicitly rejected the wholesale copying of successful image-recognition neural networks to the RF domain in favor of designing network layers that have the chance to extract the very features that we know work best. The modulation-recognition problem is not the same, in terms of the probabilistic description of the input data, as the image-recognition problem, and convolutions alone won’t cut it. The original motivation for including all the different two-dimensional convolutions in the network was to mimic the known good performance of biological image-recognition systems (the human eye-brain system). That system is terrible at modulation recognition by staring at plots of I/Q data, but great at finding the cat in the photo.
There is no universal classifier that provides good performance AND good generalization across multiple disparate domains.
Here is a figure extracted from the paper to motivate you to go read the whole thing. We used the CSP Blog datasets CSPB.ML.2018 and CSPB.ML.2022 to assess performance and generalization differences between networks with different kinds of inputs.