Update October 2020:
Since I wrote the paper review in this post, I’ve analyzed three of O’Shea’s data sets (O’Shea is with the company DeepSig, so I’ve been referring to the data sets as DeepSig’s in other posts): All BPSK Signals, More on DeepSig’s Data Sets, and DeepSig’s 2018 Data Set. The data set relating to this paper is analyzed in All BPSK Signals. Preview: It is heavily flawed.
In this post I provide some comments on another paper I’ve seen on arxiv.org (I have also received copies of it through email) that relates to modulation classification and cyclostationary signal processing. The paper is by O’Shea et al. and is called “Convolutional Radio Modulation Recognition Networks” (The Literature [R138]). You can find it at this link.
My main interest in commenting on this paper is that it makes reference to cyclic moments as good features for modulation recognition. Although I think cyclic cumulants are a much better way to go, we do know that for order two, cyclic moments and cyclic cumulants are equal (provided there are no finite-strength additive sine-wave components in the data). So the modulation-recognition algorithms that use the spectral correlation function or cyclic autocorrelation function can be said to be using cyclic moments. That is, for order two, we can say we are using either cyclic moments or cyclic cumulants as we prefer.
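Because the order-two cyclic moment and cyclic cumulant coincide, a second-order feature like the conjugate cyclic autocorrelation can be estimated directly from data. Here is a minimal sketch, assuming a rectangular-pulse baseband BPSK signal at 8 samples per symbol (as in the paper); the function name and parameter choices are mine, not the paper’s.

```python
import numpy as np

rng = np.random.default_rng(1)
sps, nsym = 8, 4096                            # 8 samples/symbol, as in the paper
sym = 2.0 * rng.integers(0, 2, nsym) - 1.0     # +/-1 BPSK symbols
x = np.repeat(sym.astype(complex), sps)        # rectangular-pulse baseband BPSK

def cyclic_moment2(x, alpha, tau=0):
    """Estimate the order-two (conjugate) cyclic moment
    R_x^alpha(tau) = <x(t) conj(x(t+tau)) exp(-i 2 pi alpha t)>."""
    n = len(x) - tau
    t = np.arange(n)
    return np.mean(x[:n] * np.conj(x[tau:tau + n]) * np.exp(-2j * np.pi * alpha * t))

print(abs(cyclic_moment2(x, 0.0, 0)))        # alpha = 0, tau = 0: the average power
print(abs(cyclic_moment2(x, 1.0 / sps, 4)))  # symbol-rate cycle frequency: nonzero
print(abs(cyclic_moment2(x, 0.0321, 4)))     # not a cycle frequency: near zero
```

The key point of the sketch is the third argument to the exponential: without the cycle frequency alpha there is no *cyclic* moment at all, which is exactly the quantity missing from the paper’s Equation (3).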
Let’s start with Section 2.1, titled “Expert Cyclic-Moment Features.” We have the quote:
Integrated cyclic-moment based features are currently widely popular in performing modulation recognition and for forming analytically derived decision trees to sort modulations into different classes.
In general, they take the form given in equation 3
By computing the mth order statistic on the nth power of the instantaneous or time delayed received signal, we may obtain a set of statistics which uniquely separate it from other modulations given a decision process on the features. For our expert feature set, we compute 32 features. These consist of cyclic time lags of 0 and 8 samples. And the first 2 moments of the first 2 powers of the complex received signal, the amplitude, the phase, and the absolute value of the phase for each of these lags.
And that’s all that they say about “expert cyclic-moment features.”
I’m not at all sure what they mean, but I think I am an expert in cyclic moments. So I’m going to take the quote from Section 2.1 seriously for a moment to see if there is a charitable interpretation.
In the quote, the cited reference is my paper with Gardner (see My Papers), which is titled “Signal Interception: Performance Advantages of Cyclic Feature Detectors.” That paper is all about the cycle detectors, not modulation classification, and it says nothing about moments with orders higher than two except that we intended to study their effectiveness in the future. So the first part of the quote doesn’t make much sense.
Now let’s look at Equation (3). I’ve looked through the paper several times, but I cannot find a definition of the operator that appears in it. Perhaps it is an infinite-time average, but if so, the subscript doesn’t fit. Perhaps it is simply the homogeneous nth-order transformation of the signal, that is, its nth power. But then the result isn’t a moment, it is just a nonlinearly transformed signal, and so isn’t a feature at all.
On the right side of (3), what is the remaining variable? Perhaps it is the “cyclic time lag” mentioned below the equation. But does it appear in any of the terms represented by the ellipsis in (3)? And why specify the feature in terms of absolute numbers like 0 and 8 samples?
Perhaps the operator is both the nth-order nonlinearity and the infinite-time averaging operation, rolled into one functional? But then where is the cycle frequency? (Recall the section starts out by talking about “cyclic moment features.”)
I couldn’t find any other mention of the operator or its arguments in the remainder of the paper.
So let’s now ignore (3) and focus on the words that follow it: “And the first 2 moments of the first 2 powers of the complex received signal …” Let’s let the received data be denoted by x(t) and look at what this phrase might mean. The first two powers of x(t) are x(t) itself and x^2(t).
The first moments of the first two powers are then E[x(t)] and E[x^2(t)].
The moment E[x(t)] is typically zero. Exceptions are signals like OOK and AM with a transmitted carrier.
The second moments of the first two powers are E[x^2(t)] and E[x^4(t)].
And I suppose all of these with one or more of the factors delayed by the quoted lag (0 or 8 samples).
Out of the four quantities, two are redundant, one is typically zero for the signals of interest to the authors, and the last one is the expected value of the fourth power of the signal, E[x^4(t)]. This is what we call here on the CSP Blog the “(4,0) moment,” because the order is n = 4 and the number of applied conjugations is m = 0. This is zero for MPSK, and for CPM/CPFSK with a few exceptions like MSK. But it is a good feature for digital QAM in general, especially if you break out the Fourier-series components (the cyclic moments).
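To make the (4,0) distinction concrete, here is a quick check of constellation moments. This is my own illustrative computation, not the paper’s: the fourth-power moment vanishes for 8PSK but not for square 16QAM.

```python
import numpy as np

# Unit-power constellations (illustrative choices, not the paper's signal set)
psk8 = np.exp(2j * np.pi * np.arange(8) / 8)
levels = np.array([-3.0, -1.0, 1.0, 3.0])
qam16 = (levels[:, None] + 1j * levels[None, :]).ravel()
qam16 /= np.sqrt(np.mean(np.abs(qam16) ** 2))   # normalize to unit average power

def moment_nm(c, n, m):
    """(n, m) constellation moment: E[x^(n-m) conj(x)^m] over equiprobable points."""
    return np.mean(c ** (n - m) * np.conj(c) ** m)

print(abs(moment_nm(psk8, 1, 0)))   # first moment: zero (no transmitted carrier)
print(abs(moment_nm(psk8, 2, 0)))   # (2,0) moment: zero for 8PSK
print(abs(moment_nm(psk8, 4, 0)))   # (4,0) moment: zero for 8PSK
print(abs(moment_nm(qam16, 4, 0)))  # (4,0) moment: 0.68 for unit-power 16QAM
```

The 8PSK fourth power lands on e^{i*pi*k}, which alternates between +1 and -1 around the constellation and averages to zero, while the square-QAM fourth power does not average away.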
Where are the conjugations?
Later in the paper, the authors describe their simulated signals, and they say
“Data is modulated at a rate of roughly 8 samples per symbol with a normalized average transmit power of 0 dB.”
Now if they use a lag of 8 samples in true cyclic moments, they will be using cyclic moments that are small or zero (depending on how “rough” the 8 samples per symbol is).
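A quick numerical check illustrates the problem (my own sketch, assuming rectangular-pulse BPSK at exactly 8 samples per symbol): a lag of one full symbol (8 samples) kills the symbol-rate cyclic feature, while a half-symbol lag does not.

```python
import numpy as np

rng = np.random.default_rng(2)
sps, nsym = 8, 4096
x = np.repeat((2.0 * rng.integers(0, 2, nsym) - 1.0).astype(complex), sps)

def cm2(x, alpha, tau):
    """Order-two conjugate cyclic-moment estimate at cycle frequency alpha, lag tau."""
    n = len(x) - tau
    t = np.arange(n)
    return np.mean(x[:n] * np.conj(x[tau:tau + n]) * np.exp(-2j * np.pi * alpha * t))

print(abs(cm2(x, 1.0 / sps, 4)))  # half-symbol lag: strong symbol-rate feature
print(abs(cm2(x, 1.0 / sps, 8)))  # full-symbol lag: feature is essentially zero
```

At a full-symbol lag the lag product involves only adjacent independent symbols, so the symbol-rate Fourier component of the product averages away; for “roughly” 8 samples per symbol the cancellation is inexact but the feature is still small.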
So, in the end, I can’t see how the section on Expert Cyclic-Moment Features lives up to its name.
Much of the rest of the paper is devoted to applying machine learning tools to a large simulated data set, and there are confusing issues there too, but I digress. There are a few more instances of strange comments relating to signals and their properties:
“We treat the complex valued input as an input dimension of 2 real valued inputs and use as a set of vectors into a narrow 2D Convolutional Network where the orthogonal synchronously sampled In-Phase and Quadrature (I & Q) samples make up this 2-wide dimension.”
In Figure 2 (which is quite tiny), the authors show high-SNR PSDs for some of their generated signals. BPSK appears to contain a tone (is it really OOK?), AM-SSB appears to span the entire sampling bandwidth, and WBFM looks like a sine wave.
In explaining how 8PSK might be confused for QPSK,
“An 8PSK symbol containing the specific bits is indiscernible from QPSK since the QPSK constellation points are spanned by 8PSK points.”
But QPSK is not confused for BPSK here, and yet the BPSK constellation points are spanned by the QPSK points.
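The counterpoint is easy to check numerically. Under the paper’s spanning argument, the BPSK points sit inside the QPSK constellation just as the QPSK points sit inside 8PSK; this sketch uses idealized, unrotated MPSK constellations of my own choosing.

```python
import numpy as np

def mpsk(M):
    """Idealized unrotated M-PSK constellation on the unit circle."""
    return np.exp(2j * np.pi * np.arange(M) / M)

def spanned_by(small, big, tol=1e-9):
    """True if every point of `small` is also a point of `big`."""
    return all(np.min(np.abs(big - p)) < tol for p in small)

print(spanned_by(mpsk(4), mpsk(8)))  # QPSK points spanned by 8PSK points: True
print(spanned_by(mpsk(2), mpsk(4)))  # BPSK points spanned by QPSK points: True
```

So if constellation spanning alone explained the 8PSK-to-QPSK confusion, the same logic should produce a QPSK-to-BPSK confusion, which the authors do not report.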
I think we get to the real motivation for the paper in the last sentence of the section on Future Work:
“This application domain is ripe for a wide array of further investigation and applications which will significantly impact the state of the art in wireless signal processing and cognitive radio domains, shifting them more towards machine learning and data driven approaches.”
I would welcome this conclusion, and the new research avenues, if it were based on a study that accurately represented what has come before.