My Old Dominion colleagues and I have published an extended version of the 2022 MILCOM paper (My Papers [52]) in the journal MDPI Sensors. The first author is John Snoap, who is one of those rare people who is an expert in signal processing *and* in machine learning. Bright future there! Dimitrie Popescu, James Latshaw, and I provided analysis, programming, writing, and research-direction support.

The new paper is titled “Deep-Learning-Based Classification of Digitally Modulated Signals Using Capsule Networks and Cyclic Cumulants,” and is My Papers [54]. If you go to the My Papers page, you can download a pdf of the new paper using a link in the citation for [54].

In the extended paper [54], we provide additional details of cyclic-cumulant estimation and direct comparisons to a CSP-based blind modulation-recognition algorithm (My Papers [25,26,28]). The discussions concerning motivations, processing approaches, and future directions are also extended relative to [52].

Like [52], the focus of [54] is on the generalization problem associated with trained neural networks. In our application area, modulation recognition, and in many other areas, a major drawback of using trained neural networks (convolutional neural networks, residual networks, capsule networks, etc.) is that their performance is highly sensitive to slight changes in the probability density functions that describe the random variables influencing the input data. This brittleness goes by several names, including the generalization problem, dataset shift, data drift, data shift, and concept shift.

We find, perhaps unsurprisingly, that there is no dataset-shift (generalization) problem for simple modulation-recognition problems if the input is a principled extracted data feature rather than I/Q samples. The principled feature here is a matrix of cyclic-cumulant magnitudes of various orders (such as the features depicted in the CSP Blog banner). By *principled* I simply mean that the feature is directly related to the fundamental mathematical characterization of the data, which is the set of all joint probability density functions for the samples. Such features contrast with data-mining features obtained by rooting around in some giant dataset looking for correlations (and you’ll always find some, principles be damned).

The excellent generalization of our networks when using cyclic-cumulant inputs can be explained by realizing that the (properly estimated and normalized) cyclic cumulants for a BPSK signal with rate $1/T_1$, carrier offset of $f_1$, and square-root raised-cosine pulse rolloff of $\beta$ are exactly the same as those for a BPSK signal with rate $1/T_2 \neq 1/T_1$, offset $f_2 \neq f_1$, and rolloff $\beta$. All BPSK signals (with a fixed rolloff) are characterized by the same feature matrix. So the distribution of the bit rates and/or the carrier offsets is immaterial. This is not the case for I/Q input data.
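To make the invariance concrete, here is a minimal numerical sketch. It is not the estimator from the paper. It assumes rectangular pulses rather than square-root raised-cosine pulses, it assumes the carrier offset is known rather than blindly estimated, and it computes only one element of the feature matrix: the magnitude of the second-order, no-conjugate cyclic moment at the doubled-carrier cycle frequency, normalized by signal power. The function names `rect_bpsk`, `cyclic_moment_20`, and `normalized_feature` are made up for this illustration.

```python
import numpy as np

def rect_bpsk(num_bits, samples_per_bit, carrier_offset, seed=0):
    """Rectangular-pulse BPSK at complex baseband (illustrative assumption)."""
    rng = np.random.default_rng(seed)
    bits = rng.choice([-1.0, 1.0], size=num_bits)
    symbols = np.repeat(bits, samples_per_bit)      # hold each bit for one bit interval
    n = np.arange(symbols.size)
    return symbols * np.exp(2j * np.pi * carrier_offset * n)

def cyclic_moment_20(x, alpha):
    """Second-order, no-conjugate cyclic moment at cycle frequency alpha, zero delays."""
    n = np.arange(x.size)
    return np.mean(x**2 * np.exp(-2j * np.pi * alpha * n))

def normalized_feature(x, carrier_offset):
    """Power-normalized magnitude at the doubled-carrier cycle frequency.

    The carrier offset is assumed known here; a blind system must estimate it.
    """
    power = np.mean(np.abs(x)**2)
    return np.abs(cyclic_moment_20(x, 2.0 * carrier_offset)) / power

# Two BPSK signals with different bit rates and carrier offsets (normalized frequencies).
x1 = rect_bpsk(num_bits=512, samples_per_bit=8, carrier_offset=0.05, seed=1)
x2 = rect_bpsk(num_bits=1024, samples_per_bit=4, carrier_offset=0.013, seed=2)

f1 = normalized_feature(x1, 0.05)
f2 = normalized_feature(x2, 0.013)

# Both feature values are 1.0 up to floating-point rounding,
# even though the I/Q sample vectors themselves are very different.
print(f1, f2)
print(np.allclose(x1, x2))   # False
```

The feature is identical for the two signals because squaring removes the random bits and the complex exponential removes the carrier, whereas the raw I/Q vectors share nothing sample-by-sample. The same idea, extended to higher orders, to more cycle frequencies, and to realistic pulse shapes, is what the cyclic-cumulant feature matrix exploits.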

The drawback of the cyclic-cumulant-input approach to training neural networks is that, well, you have to estimate, blindly, the cyclic-cumulant matrix. If only we could stick with I/Q inputs and get both the high performance and the excellent generalization that come with using cyclic cumulants as inputs… Well, we can. We’ve done some work to show that and have a couple of MILCOM papers in submission. I’m looking forward to seeing you all again at MILCOM 2023 if we can get those papers accepted.

The crucial point, which I’ve made before and so am in danger of belaboring, is that to obtain simultaneous good performance and good generalization in machine-learning modulation recognition, one needs a machine that is designed with the modulation-recognition problem in mind. Therefore, we have explicitly rejected the wholesale copying of successful image-recognition neural networks to the RF domain in favor of designing network layers that have a chance to extract the very features that we *know* work best. The modulation-recognition problem is not the same, in terms of the probabilistic description of the input data, as the image-recognition problem, and convolutions won’t cut it. The original motivation for including all the different two-dimensional convolutions in the network was to mimic known good performance of biological image-recognition systems (human eye-brain system). That system is terrible at modulation recognition by staring at plots of I/Q data, but great at finding the cat in the photo.

There is no universal classifier that provides good performance AND good generalization across multiple disparate domains.

Here is an extracted figure from the paper to motivate you to go read the whole thing. We used the CSP Blog datasets CSPB.ML.2018 and CSPB.ML.2022 to assess performance and generalization differences between networks with different kinds of inputs.

Hi Chad, been loving your work. For a gem of a blog post (though a bit advertising-ish of course) from Renesas that I think is saying the same message, check out https://www.renesas.com/tw/en/blogs/ffts-and-stupid-deep-learning-tricks

Hey Marty! Thanks much for the link and note. Comments like this are great because they strengthen the ties between useful websites. Sometimes it is hard to find information that you can relate to and trust–you’re making things even better!

I do think that Stuart Feffer’s remarks at that link are consistent with my views on the topics of feature engineering, machine learning, and signal processing. Overall, he does support the idea that maybe just throwing the data at the network isn’t the best idea every time.

His example of the Fourier transform in the context of fault diagnosis for rotating machinery is interesting and relevant to the CSP Blog in a couple ways.

First, I myself wondered about whether a machine could learn the Fourier transform. I conclude “not really, no, but kind of, approximately, yes.” More generally, I believe the research program that I’m working on with ODU (John Snoap) is consistent with the idea that you probably should feed the network principled features based on the physics or mathematical structure of the problem at hand–there is no practical universal classifier. You still need some expertise.

But also, secondly, there is a large body of research devoted to using spectral correlation to do early fault diagnosis for rotating machinery, rather than the Fourier transform directly. That is what animates J. Antoni et al. Unfortunately, the data I have from some of those researchers cannot be shared on the CSP Blog yet.

* * *

Feffer says this:

But I’m not so sure. That is, I’m not sure “enough” is achievable in the real world. Nobody has been able to show me they can obtain a DNN for modulation recognition that has both high performance and high generalization when IQ data is at the input. Maybe we don’t have enough time left to do it before the sun goes nova. I suppose I am stuck, then, on what Feffer means, exactly, by “reasonable” here.

Here is where we really really agree: