Machine Learning and Modulation Recognition: Comments on “Convolutional Radio Modulation Recognition Networks” by T. O’Shea, J. Corgan, and T. Clancy

In this post I provide some comments on another paper I’ve seen on (I have also received copies of it through email) that relates to modulation classification and cyclostationary signal processing. The paper is by O’Shea et al and is called “Convolutional Radio Modulation Recognition Networks.” You can find it at this link.

My main interest in commenting on this paper is that it makes reference to cyclic moments as good features for modulation recognition. Although I think cyclic cumulants are a much better way to go, we do know that for order n=2, cyclic moments and cyclic cumulants are equal (provided there are no finite-strength additive sine-wave components in the data). So, the modulation recognition algorithms that use the spectral correlation function or cyclic autocorrelation function can be said to be using cyclic moments. That is, for order two, we can say we are using either cyclic moments or cyclic cumulants as we prefer.
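
To spell out why the two coincide at order two, here is a brief sketch in generic notation. The symbol C_x and the symmetric-lag convention are assumptions made for illustration and are not taken from the paper under discussion. The second-order cumulant is the second-order moment minus the product of the first-order moments:

\displaystyle C_x(t, \tau) = E\left[x(t+\tau/2)\, x^*(t-\tau/2)\right] - E\left[x(t+\tau/2)\right] E\left[x^*(t-\tau/2)\right]

When the data contain no additive sine-wave components, the first-order moments are zero and the product term vanishes. The Fourier-series coefficients of the cumulant (the cyclic cumulants) then equal those of the moment (the cyclic moments).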

Let’s start with Section 2.1, titled “Expert Cyclic-Moment Features.” We have the quote:

Integrated cyclic-moment based features [1] are currently widely popular in performing modulation recognition and for forming analytically derived decision trees to sort modulations into different classes.

In general, they take the form given in equation 3

\displaystyle s_{nm} = f_m(x^n(t)\ldots x^n(t+T)) \hfill (3)

By computing the mth order statistic on the nth power of the instantaneous or time delayed received signal r(t), we may obtain a set of statistics which uniquely separate it from other modulations given a decision process on the features. For our expert feature set, we compute 32 features. These consist of cyclic time lags of 0 and 8 samples. And the first 2 moments of the first 2 powers of the complex received signal, the amplitude, the phase, and the absolute value of the phase for each of these lags.

And that’s all that they say about “expert cyclic-moment features.”

I’m not at all sure what they mean, but I think I am an expert in cyclic moments. So I’m going to take the quote from Section 2.1 seriously for a moment to see if there is an interpretation that is charitable.

In the quote, reference [1] is my paper with Gardner (My Papers [1]), titled “Signal Interception: Performance Advantages of Cyclic Feature Detectors,” which is all about the cycle detectors, not modulation classification. And it says nothing about moments with orders higher than two except that we intend to study their effectiveness in the future. So the first part of the quote doesn’t make much sense.

Now let’s look at Equation (3). I’ve looked through the paper several times, but I cannot find a definition of f_m(\cdot). Perhaps it is an infinite-time average, but if so, the subscript m doesn’t fit. Perhaps it is the homogeneous mth-order transformation

\displaystyle f_m(x(t)) = x^m(t) \hfill (A)

But then s_{nm} isn’t a moment, it is just a nonlinearly transformed signal, and so isn’t a feature at all.

On the right side of (3), what is T? Perhaps it is the “cyclic time lag” mentioned below the equation. But does it appear in any of the terms represented by the ellipsis \ldots in (3)? Why specify the feature in terms of absolute numbers like 8?

Perhaps f_m(\cdot) is both the mth-order nonlinearity and the infinite time averaging operation, rolled into one functional? But where is the cycle frequency? (Recall the section starts out by talking about “cyclic moment features.”)

I couldn’t find any other mentions of s_{nm} nor f_m(\cdot) in the remainder of the paper.

So let’s now ignore (3) and focus on the words that follow it: “And the first 2 moments of the first 2 powers of the complex received signal …” Let’s let the received data be denoted by r(t) and look at what this phrase might mean. The first two powers of r(t) are

\displaystyle y_1(t) = r(t) \hfill (B)


\displaystyle y_2(t) = r^2(t) \hfill (C)

The first moments of the first two powers are then

\displaystyle z_1(t) = E[y_1(t)] = E[r(t)] \hfill (D)


\displaystyle z_2(t) = E[y_2(t)] = E[r^2(t)] \hfill (E)

The moment z_1(t) is typically zero. Exceptions are signals like OOK and AM with a transmitted carrier.

The second moments of the first two powers are

\displaystyle z_3(t) = E[y_1^2(t)] = z_2(t) \hfill (F)


\displaystyle z_4(t) = E[y_2^2(t)] = E[r^4(t)] \hfill (G)

And I suppose we should include all of these with one or more of the factors delayed by T = 8.

Out of the four quantities, two are redundant, one is typically zero for the signals of interest to the authors, and the last one is the expected value of the fourth power of the signal. This is what we call here on the CSP Blog the “(4,0) moment”, because the order n is four and there are no applied conjugations, m=0. It is zero for MPSK with M > 4 and for CPM/CPFSK, with a few exceptions like MSK. But it is a good feature for digital QAM in general, especially if you break out the Fourier-series components (the cyclic moments).
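
The behavior of these quantities can be checked with sample averages. The following is a minimal sketch under assumptions that are not from the paper: ideal noise-free constellations, one sample per symbol, no pulse shaping, and a QPSK constellation rotated by 45 degrees. The names `constellations` and `sample_moments` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
num_symbols = 20000

# Ideal constellations, one sample per symbol (no pulse shaping, no noise).
constellations = {
    "OOK": np.array([0.0, 1.0], dtype=complex),
    "BPSK": np.array([1.0, -1.0], dtype=complex),
    "QPSK": np.exp(1j * (np.pi / 4 + np.pi / 2 * np.arange(4))),
    "8PSK": np.exp(2j * np.pi * np.arange(8) / 8),
}


def sample_moments(r):
    """Sample-average stand-ins for z_1 = E[r], z_2 = z_3 = E[r^2], z_4 = E[r^4]."""
    return {"z1": np.mean(r), "z2": np.mean(r ** 2), "z4": np.mean(r ** 4)}


results = {}
for name, points in constellations.items():
    r = rng.choice(points, size=num_symbols)  # independent, equally likely symbols
    results[name] = sample_moments(r)
    magnitudes = {key: round(abs(value), 3) for key, value in results[name].items()}
    print(name, magnitudes)
```

In this idealized setting the first moment is clearly nonzero only for OOK, and the fourth-power average is near zero for 8PSK while it has unit magnitude for QPSK and BPSK.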

Where are the conjugations?
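
To make the question concrete, here is a minimal sketch of what an estimator looks like when the conjugations, the lags, and the cycle frequency are all explicit. It is not the paper's feature extractor. The function name `cyclic_moment`, the nonnegative-lag convention, and the test signal (noise-free rectangular-pulse BPSK with a carrier offset of 0.05 cycles per sample) are all assumptions made for illustration.

```python
import numpy as np


def cyclic_moment(x, alpha, lags, conjugations):
    """Average of prod_j x^(*)[n + lags[j]] * exp(-j 2 pi alpha n) over usable n.

    lags: nonnegative integer delays, one per factor.
    conjugations: one bool per factor, True means that factor is conjugated.
    """
    n = np.arange(len(x) - max(lags))
    product = np.ones(len(n), dtype=complex)
    for lag, conjugate in zip(lags, conjugations):
        factor = x[n + lag]
        product *= np.conj(factor) if conjugate else factor
    return np.mean(product * np.exp(-2j * np.pi * alpha * n))


rng = np.random.default_rng(0)
sps, f0 = 8, 0.05  # samples per symbol and carrier offset (both assumed)
symbols = rng.choice([-1.0, 1.0], size=1024)
n_all = np.arange(len(symbols) * sps)
x = np.repeat(symbols, sps) * np.exp(2j * np.pi * f0 * n_all)

# Order two, no conjugations: the cycle frequency sits at twice the carrier offset.
no_conj_at_2f0 = cyclic_moment(x, 2 * f0, (0, 0), (False, False))
# Order two, one conjugation: the carrier cancels and alpha = 0 is a cycle frequency.
one_conj_at_0 = cyclic_moment(x, 0.0, (0, 0), (False, True))
# Order two, no conjugations, evaluated at alpha = 0: nothing there.
no_conj_at_0 = cyclic_moment(x, 0.0, (0, 0), (False, False))

print(abs(no_conj_at_2f0), abs(one_conj_at_0), abs(no_conj_at_0))
```

The choice of conjugations determines which cycle frequencies carry the feature, so a feature description that omits them leaves the feature undefined.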

Later in the paper, the authors describe their simulated signals, and they say

“Data is modulated at a rate of roughly 8 samples per symbol with a normalized average transmit power of 0 dB.”

Now if they use T=8 in true cyclic moments, they will be using cyclic moments that are small or zero (depending on how “rough” the 8 samples per symbol is).
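
How small depends on the pulse shape and the exact symbol rate, which the quoted description does not pin down. The following minimal sketch uses assumptions that are not from the paper: noise-free rectangular-pulse BPSK at exactly 8 samples per symbol, and the asymmetric-lag estimator defined in the code (`cyclic_autocorrelation` is an illustrative name).

```python
import numpy as np


def cyclic_autocorrelation(x, alpha, lag):
    """Average of x[n + lag] * conj(x[n]) * exp(-j 2 pi alpha n), for lag >= 0."""
    n = np.arange(len(x) - lag)
    lag_product = x[lag:] * np.conj(x[:len(x) - lag])
    return np.mean(lag_product * np.exp(-2j * np.pi * alpha * n))


rng = np.random.default_rng(0)
sps = 8  # assumed: exactly 8 samples per symbol
symbols = rng.choice([-1.0, 1.0], size=4096)
x = np.repeat(symbols, sps).astype(complex)  # rectangular pulses, no noise

alpha = 1.0 / sps  # symbol-rate cycle frequency
r_lag0 = cyclic_autocorrelation(x, alpha, 0)
r_lag4 = cyclic_autocorrelation(x, alpha, 4)
r_lag8 = cyclic_autocorrelation(x, alpha, 8)

# Lag 0: the lag product is constant, so it has no symbol-rate component.
# Lag 8: the two factors never fall in the same symbol, so nothing survives.
# Lag 4: the factors share a symbol half the time, giving a clear feature.
print(abs(r_lag0), abs(r_lag4), abs(r_lag8))
```

Under these assumptions the two lags named in the paper, 0 and 8, are the two lags at which the symbol-rate feature disappears, while a lag of half a symbol gives the strongest one. Other pulse shapes change the numbers but not the basic concern.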

So, in the end, I can’t see how the section on Expert Cyclic-Moment Features lives up to its name.

Much of the rest of the paper is devoted to applying machine learning tools to a large simulated data set, and there are confusing issues there too, but I digress. There are a few more instances of strange comments relating to signals and their properties:

“We treat the complex valued input as an input dimension of 2 real valued inputs and use r(t) as a set of 2xN vectors into a narrow 2D Convolutional Network where the orthogonal synchronously sampled In-Phase and Quadrature (I & Q) samples make up this 2-wide dimension.”

In Figure 2 (which is quite tiny), the authors show high-SNR PSDs for some of their generated signals. BPSK appears to contain a tone (is it really OOK?), AM-SSB appears to span the entire sampling bandwidth, and WBFM looks like a sine wave.


In explaining how 8PSK might be confused for QPSK,

“An 8PSK symbol containing the specific bits is indiscernible from QPSK since the QPSK constellation points are spanned by 8PSK points.”

But QPSK is not confused for BPSK here, and yet the BPSK constellation points are spanned by the QPSK points.
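
The nesting is easy to check numerically. This is a small sketch, assuming ideal unit-circle PSK constellations that share a common phase reference. The helper names `psk` and `is_subset` are illustrative only.

```python
import numpy as np


def psk(order):
    """Ideal PSK constellation points on the unit circle, first point at angle zero."""
    return np.exp(2j * np.pi * np.arange(order) / order)


def is_subset(small, large, tol=1e-9):
    """True if every point of `small` coincides with some point of `large`."""
    return all(np.min(np.abs(large - point)) < tol for point in small)


bpsk_in_qpsk = is_subset(psk(2), psk(4))
qpsk_in_8psk = is_subset(psk(4), psk(8))
psk8_in_qpsk = is_subset(psk(8), psk(4))

print(bpsk_in_qpsk, qpsk_in_8psk, psk8_in_qpsk)
```

BPSK nests inside QPSK in exactly the way QPSK nests inside 8PSK, so the quoted explanation would predict QPSK/BPSK confusion as well.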

I think we get to the real motivation for the paper in the last sentence of the section on Future Work:

“This application domain is ripe for a wide array of further investigation and applications which will significantly impact the state of the art in wireless signal processing and cognitive radio domains, shifting them more towards machine learning and data driven approaches.”

I would welcome this conclusion, and the new research avenues, if it were based on a study that accurately represented what has come before.

11 thoughts on “Machine Learning and Modulation Recognition: Comments on “Convolutional Radio Modulation Recognition Networks” by T. O’Shea, J. Corgan, and T. Clancy

  1. Tim says:

    We’ve implemented a much more rigorous version of your excellent prior work as a baseline in this more recent paper on the topic Including all possible combinations of conjugations for the orders considers and a very strong boosted tree classifier operating on them. I appreciate your harping on semantics and minutia within a preliminary conference paper (the term “Expert … Features” is quite, the feature extractors are derived by an expert to do one specific thing) — we make an effort to leverage your work but do not wish to make it the main focus of discussion in this paper (as you’ve shown it takes a lot of time and detail to explain the careful manual feature engineering required which would take up the whole paper) as we are simply exploring a new approach here — if there were better open source tools and implementations of your work and the datasets, perhaps we could advance the community as a whole, and more easily baseline against your best practice, recall we are not all extreme experts in high order moment engineering. Rather than critiquing the dataset, feel free to send pull requests to the github repository and help improve it so we can all have a good baseline to compare with, if you feel I’ve not done justice to your work, please release quantitative results and a reference implementation we can compare against. I’ve personally been reading and appreciate your blog here, but I have not had great success in executing the limited number of matlab scripts you’ve provided —
    Best Regards

    • I’ll take a look at your new paper.

      a preliminary conference paper

      I wouldn’t have done the post if I knew that. The paper I commented on is found on and has a revision history of posts on Feb 12, Apr 24, and Jun 10 of 2016. It is 15 pages long and I can’t find any indication that it is a conference paper draft, much less a preliminary one. My experience with is that people post submitted journal papers in order to get them out while the lengthy review process proceeds. So I figured after three drafts were uploaded, this was something you were standing by as solid. Is there a way I can tell on that the paper is a preliminary draft of a conference paper? I admit I am not an expert on that site.

      I appreciate your harping on semantics and minutia

      Is that called for?

      (the term “Expert … Features” is quite, the feature extractors are derived by an expert to do one specific thing)

      I don’t get the grammar of that parenthetical remark, but I would very much like to know what you intended.

      I admit my interest is narrow at the CSP Blog. I didn’t even try to critique the ML setup or all the parameters for the algorithm you put forth. I just saw that the cyclic moment feature part was so confusing that I didn’t think any reader would be able to understand what your Machine was doing, and that my name was explicitly connected to that confusion. Was I wrong?

  2. Nice writing style. Looking forward to reading more from you. I finally decided to write a comment on your blog. I just wanted to say good job. I really enjoy reading your post, Thanks for your great information,

  3. mohamad reza says:

    Hi Dr. Spooner,
    Thanks for your useful blog. I learned a lot about cyclostationarity and implemented second-order and higher-order cyclostationary analysis by reading your blog posts.
    I am working on automatic modulation recognition (non-linear modulations, e.g., CPFSK, MSK, …) for my master of science dissertation, and I was wondering if you could tell me which cyclic features would give better results in the non-linear vs. non-linear and non-linear vs. linear cases.

    • Mohammad:

      Thanks for your comment and for using the CSP Blog! It sounds like you have made a lot of progress.

      First, I do have a paper with Antonio Napolitano (My Papers [8]) on the cyclostationarity of CPM signals, which includes CPFSK, MSK, and GMSK as special cases. You might want to check that out for some basics on the features you can expect.

      A second point is that MSK is actually exactly equal to a SQPSK signal, and so is a linear modulation from that point of view. It is also exactly a CPM signal with full response rectangular pulses, a binary alphabet, and a modulation index of h=1/2, so it falls in the class of CPM signals too.

      More generally, many CPM signals of interest are approximately linear. I think you can use Laurent’s approximation to find an approximation to a CPM signal that is the sum of several PAM signals, and usually one of those constituent PAM signals has most of the signal’s energy.

      Finally, in my own work I don’t make much of a distinction between linear and nonlinear modulation types. I just try to find out what the spectral correlation function (and, maybe, the cyclic cumulants) looks like. What are the features for the signal, and what is the pattern of exhibited cycle frequencies? Sometimes this is most easily done empirically if the modulation is too crazy, like LTE.

      Hope this helps…

      • mohamad reza says:

        Thanks Dr. Spooner,
        I just read your paper [8] and learned about Laurent’s approximation. I think it would be very helpful extracting higher order features of linear signals e.g. some PAM modulated signals instead of extracting those directly from CPMs. My next challenge is to simulate this approximation.
        Is there any source I could use to simulate this approximation and get some PAM signals from a CPM modulated signal?
        I really appreciate your help

        • I’m not sure I understand your approach. If you approximate a CPM signal by its dominant PAM signal using the approximation in the referenced paper, and then compute the cyclic moments and cumulants of that dominant PAM signal, those moments and cumulants will not match the moments and cumulants extracted from the CPM signal itself. The approximation is a useful way to get at the basic cycle-frequency pattern exhibited by the CPM signals. It turns out that the cycle-frequency pattern depends on the modulation index. For most modulation indices, the cycle-frequency pattern is that exhibited by high-alphabet PSK, such as 16PSK, provided you don’t consider moment and cumulant orders greater than eight. For some special cases of the modulation index (multiples of 0.5), the cycle-frequency pattern is much different.

          So I would try to learn the cycle-frequency patterns for the particular kinds of CPM signals of interest to you, then simulate those signals, and extract the cyclic cumulants to form classification features.

          Does that make sense? Perhaps I’ve missed something or misinterpreted your remarks…
