Park the car at the side of the road
You should know
Time’s tide will smother you…
And I will too
“That Joke Isn’t Funny Anymore” by The Smiths
I applaud the intent behind the paper in this post’s title, which is The Literature [R183], apparently accepted in 2022 for publication in IEEE Access, a peer-reviewed journal. That intent is to list all the found ways in which researchers preprocess radio-frequency data (complex sampled data) prior to applying some sort of modulation classification (recognition) algorithm or system.
The problem is that this attempt at gathering up all of the ‘representations’ gets a lot of the math wrong, and so has a high potential to confuse rather than illuminate.
There’s only one thing to do: correct the record.
First let’s just dwell on the word representation. It is commonly used in mathematics and signal processing to mean an alternative description of some mathematical object, typically a function or time-series. That is, it is a way of expressing the mathematical object that has all the information of the original object and no additional information. In this way of using representation, the new object description is invertible–you can get back to the original object expression from the representation.
We looked at several representations in the Signal Processing ToolKit post on signal representations. There we saw how to represent–write down–signals (functions) in terms of weighted sums of simple functions drawn from familiar function sets, such as harmonically related sine waves (Fourier series), piecewise constant binary-valued functions (Walsh series), and impulses.
In The Literature [R183], though, representation is used to mean any kind of mathematical operation applied to sampled data. We’ve seen representation used in various machine-learning papers, such as The Literature [R135], where the performance of a convolutional neural network for modulation recognition is assessed for three versions of the input data: rectangular, polar, and Fourier. So those latter representations are consistent with the way I use representation here at the CSP Blog, but not with the way the authors of [R183] use representation. For example, the spectral correlation function estimate for a given received data block is one of their representations. We can simply interpret representation in [R183] to mean feature.
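To make the distinction between a representation and a feature concrete, here is a minimal Python sketch. It assumes NumPy, and the variable names and the random test block are my own choices for illustration, not anything taken from [R135] or [R183]. It checks that the rectangular, polar, and Fourier descriptions of a block of complex samples can each be inverted to recover the samples, whereas a feature (here the magnitude-squared DFT) cannot distinguish two different signals.

```python
import numpy as np

rng = np.random.default_rng(0)
# A short block of complex-valued (IQ) samples standing in for received data.
x = rng.standard_normal(64) + 1j * rng.standard_normal(64)

# Three invertible descriptions of the same block.
rect = np.stack([x.real, x.imag])            # rectangular: I and Q
polar = np.stack([np.abs(x), np.angle(x)])   # polar: magnitude and phase
fourier = np.fft.fft(x)                      # Fourier: DFT coefficients

# Each description can be mapped back to the original samples.
rect_ok = np.allclose(rect[0] + 1j * rect[1], x)
polar_ok = np.allclose(polar[0] * np.exp(1j * polar[1]), x)
fourier_ok = np.allclose(np.fft.ifft(fourier), x)

# A feature, by contrast, discards information: a phase-rotated copy of x
# is a different signal but has exactly the same magnitude-squared DFT.
y = x * np.exp(1j * 0.7)
same_feature = np.allclose(np.abs(np.fft.fft(y)) ** 2, np.abs(fourier) ** 2)
different_signal = not np.allclose(x, y)

print(rect_ok, polar_ok, fourier_ok, same_feature, different_signal)
```

All five flags should come out true: the three representations round-trip, while the feature maps two distinct signals to the same value.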
The paper attempts to catalog all the features researchers use to develop modulation recognition algorithms. These features could be used as inputs to a neural network, or used in more traditional ways to develop modulation-recognition signal-processing algorithms. The authors then use many of their defined features (representations) in a simple linear classifier that is applied to the DeepSig 2018 dataset. Which means, yes, they try to apply cyclic cumulants and the spectral correlation function to very short data records with uncertain cycle frequencies.
Representations (Features) Considered in [R183]
Helpfully, the authors provide a table of the representations they aim to cover in their Figure 1 and Table 1, which are reproduced here, together, in my Figure 1.
The signal class for which the features are to be extracted is really the class of digital signals in the RML 2018 dataset, but the authors also decide to try to define the signal class mathematically. So we see Equation (1) is the bare-bones linear pulse-amplitude-modulated signal,
and a better, more realistic, version that includes a carrier frequency offset, symbol-clock offset, noise, and carrier phase in Equation (2). However, this model rules out several of the signals that are considered at various points in the text, including staggered modulations like MSK and GMSK, and frequency modulations like FSK. But, OK, not a terrible start.
In summary, the authors consider many features, and their focus is on simple digital signals, not stuff like OFDM, DSSS, or FH.
Needed Corrections to the Authors’ Mathematics
Now let’s look at some of the equations that the authors use to define all the representations that they think are important for modulation classification. For each one I present, I correct the record.
Section II.B: Preprocessing and Cyclostationary Features
Let’s start with Equation (3),
Here the delay product is defined by examples in Table 2,
The first thing to notice is that the average in (3) is biased: the sum is over items but the sum is divided by . So it should be
In the table, the correntropy kernel is listed as , but in the text there is only , and in the text, the correntropy kernel takes only real-valued arguments. The quantity is generally complex-valued. Correntropy researchers can overcome this limitation, as we’ve discussed before, but that requires a non-conjugate correntropy and a conjugate correntropy. Similarly, the hyperbolic tangent can take complex arguments, but I think that is not desirable here. So there is a lot of confusion right off the bat regarding the authors’ desire to unify all the representations through this function. For the record, I also try to unify higher-order cyclostationarity with higher-order stationarity by considering general complex-valued delay (lag) products using a similar notation:
where there are optional conjugations among the factors in the product. This notation is not entirely satisfactory, because it does not capture the relationship between the delays and the conjugations : which particular factors are conjugated? I do address that shortcoming in the post on symmetries of higher-order temporal parameters.
Turning to the last row in the column, the higher-order cumulants, we have to assume that the underbar notation means a vector for some . But then that vector of delays appears inside the argument to the signal itself, but the signals aren’t functions of multiple delays. And if , then in any case the dimensions of the two shouldn’t be the same. So this is major confusion at the notation level.
And then that last row is different in character from the others. However you define a cumulant (of a random variable, of a set of random variables, of a random process), it is a limit parameter, not a simple function of the data like the other three rows. In fact, if it is a cumulant of a random variable or a stationary process, it will be independent of , and the average in (3) is superfluous. Moreover, the cumulant is a function of a set of variables , not their product. You have to have access to a lot of different products of subsets of that set to properly compute a cumulant.
Also, just below (3) is defined as delays , yet in the last line of Table 2 we’re talking about th order cumulants, so I guess and are the same?
Finally, notice that in (3) the optional conjugation is applied to the entire delay product, whereas in the examples in Table 2, different factors possess a conjugation or not. The latter is appropriate, the former is superfluous: you don’t get any extra or different information from using the single optional conjugation applied to the delay product. You just negate the cycle frequencies for which the limiting sum is nonzero.
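The claim that a single conjugation applied to the whole delay product only negates the cycle frequencies is easy to check numerically. The sketch below is a hedged illustration, assuming NumPy: the helper name `cyclic_average`, the sign convention of the complex exponential, and the random stand-in for the delay product are all my assumptions, not the exact form of (3) in [R183].

```python
import numpy as np

def cyclic_average(z, alpha):
    """Finite-time average of z[n] * exp(-i 2 pi alpha n) over the block."""
    n = np.arange(len(z))
    return np.mean(z * np.exp(-2j * np.pi * alpha * n))

rng = np.random.default_rng(1)
# Any complex-valued sequence will do as a stand-in for a delay product.
z = rng.standard_normal(256) + 1j * rng.standard_normal(256)
alpha = 0.125

# Conjugating the whole sequence and measuring at +alpha ...
lhs = cyclic_average(np.conj(z), alpha)
# ... gives the conjugate of the unconjugated measurement at -alpha.
rhs = np.conj(cyclic_average(z, -alpha))

identity_holds = np.allclose(lhs, rhs)
print(identity_holds)
```

Because the two quantities are conjugates of each other at mirrored cycle frequencies, their magnitudes carry the same information, which is why the outer optional conjugation adds nothing.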
Section III.A: Fourier Transform
The discrete Fourier transform is introduced, correctly, in Equation (4), which is followed by an attempt at defining the short-time Fourier transform (STFT) in Equation (5),
But notice the sloppiness creeping in here–the window slides along with the time variable , but the data doesn’t! The same set of samples is processed independently of the choice of . That STFT definition should be something like my 
We then see the sentence “Since the modulation techniques MPSK, MQAM, and MFSK have different numbers of frequency peaks, the peak count of the magnitude of the DFT can be used as a feature.” Well, I don’t think the number of frequency peaks is different between MPSK and MQAM–they have a single peak because they are unimodal (assuming typical filtering like square-root raised-cosine). I suppose Manchester-encoded PSK signals are a special case because they have bimodal spectra. Some forms of MFSK do have multiple peaks–their spectra are truly multi-modal. But MFSK is not included in the authors’ signal model (1).
Section III.B: Wavelet Transform
I have some direct experience with wavelets (My Papers [27,36]). They are useful in some situations, but are not particularly well-suited for ‘Using and understanding the statistics of communication signals,’ our raison d’etre here at the CSP Blog. In defining the continuous wavelet transform in Equation 9,
The ‘defined as’ symbol is used strangely–the two quantities are not necessarily equal (how is derived from , for instance), and the symbol is either defined as the top quantity (continuous time) or the bottom one (discrete time).
The scaled and shifted wavelet generating function in the upper member of Equation (10) is missing a conjugation.
The wavelets are ostensibly computed for ASK, FSK, PSK, and QAM, and the result is the following:
which is strange because none of the wavelets here are functions of the delay , but they are a function of the symbol index , I guess, through something called . I think this is what people mean by the phrase ‘not even wrong.’
So far the mathematical errors and issues are relatively minor. Things get rougher from here on out.
Section IV.A: Estimation of Moments and Cumulants
Just before Section IV.A begins, we have the shout-out:
Their reference  (My Papers ) considers, like Marchand, features that combine cyclic cumulants of various orders. It builds on the more important and fundamental algorithmic work in My Papers [25,26].
In Section IV.A, we have a brief discussion of how moments arise from the characteristic function, then the paper focuses on moments of a cyclostationary process with equation (20),
Since the process is supposedly cyclostationary, it will have (almost) periodic moments and cumulants, which means should be a function of ! Note also here we have yet another parameterization of the conjugated and unconjugated factors: we started with total factors, switched to , and now we have . Whew!
Equation (20) is followed by a rather garbled description of the moment-cumulant relationship and the mathematical idea of a partition of a finite set. Then we arrive at a long list of cumulants that are expressed in terms of moments in Equation (23):
These formulas do not apply when the random variable is associated with a cyclostationary process, and here it explicitly is. So we have to deal with the rather complex and difficult subject of stationary versus cyclostationary models, and what happens when you try to apply formulas like (23) to variables that arise from a cyclostationary signal.
The sentence under (23) is false. The cumulants in Table 4 are cumulants of the symbol random variable associated with digital modulations, not the cumulants of the signal samples themselves, except in the very special and impractical case where the digital signal uses rectangular pulses. And let’s talk about Table 4:
This is a slightly modified version of Table 4 relative to [R183] in that I added small red ovals to indicate each stated cumulant that is in error. For 8PSK, the correct values for those in the red ovals are, from left to right, , , , . The corrected values for 16PSK are and . The corrected value for 128QAM is .
Remember that the cumulants in Table 4 are for the symbol random variable in the general model for all signals in the table given by
where is the symbol random variable, is the symbol rate, is the carrier frequency offset, is the symbol-clock phase, is the carrier phase, and is the pulse-shaping function. So it is relatively easy to see why some of the circled cumulants can’t be right.
Let’s take 8PSK as an example. For this signal, the symbol random variable is a discrete random variable that can take on one of eight values , for . So those are just eight points on the unit circle in the complex plane, and they are evenly spaced in the sense that nearest neighbors are separated by an angle equal to . When we consider the second-order moment for the set , we look at
where is a discrete integer-valued random variable on . No matter which of those values you choose, when you compute the square of the symbol, you get a point in the set because the angle of the complex exponential is just a multiple of . And when you average over that set (remember, the symbols are equally likely here), you get zero. This is why the non-BPSK PSK and QAM signals have zero-valued conjugate spectral correlation–their symbol random variables never lead to a non-zero expected value for the case of no (or two) conjugations.
The same simple analysis holds for 8PSK and the set , which would be . It is also zero. When we get to , we see that raising any of the eight symbols to the eighth power collapses the value to . So here, finally, the moment is non-zero. Since the cumulant is made up of that moment modified by a sum of products of various subsets of , we can immediately see that . Not .
Think of it this way. For , we want to compute
because is always an integer. And the lower (than eight) order moments are equal to zero for similar reasons. Now consider another line in the table–16PSK. Raising the 16PSK symbol random variable to the eighth power will result in zero, not because
because for odd, the value of the exponential is and for even, it is , and there are just as many even as odd.
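The moment arguments above can be checked with a few lines of Python. This is a minimal sketch, assuming NumPy, equally likely symbols, and unit-modulus PSK constellations with one point at angle zero; the helper names `psk_symbols` and `moment` are hypothetical and exist only for this illustration.

```python
import numpy as np

def psk_symbols(M):
    """M equally spaced points on the unit circle, one of them at angle zero."""
    return np.exp(2j * np.pi * np.arange(M) / M)

def moment(symbols, n, m):
    """E[a^(n-m) * conj(a)^m] for equally likely symbols a."""
    return np.mean(symbols ** (n - m) * np.conj(symbols) ** m)

# 8PSK with no conjugations: orders two and four vanish, order eight does not.
m20_8psk = moment(psk_symbols(8), 2, 0)
m40_8psk = moment(psk_symbols(8), 4, 0)
m80_8psk = moment(psk_symbols(8), 8, 0)

# 16PSK: the eighth power alternates between +1 and -1, so it averages to zero.
m80_16psk = moment(psk_symbols(16), 8, 0)

print(abs(m20_8psk), abs(m40_8psk), m80_8psk.real, abs(m80_16psk))
```

Up to floating-point rounding, the 8PSK moments of orders two and four are zero, the 8PSK eighth-order moment is one, and the 16PSK eighth-order moment is zero, in line with the argument in the text.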
My identification of both the correct and erroneous entries in Table 4 is confirmed by a random-variable cumulant computer I constructed long ago that operates on an arbitrary input probability mass function. Those values are, in turn, validated by connection to theoretical cumulant formulas. Those formulas are also validated by application of simulated PSK/QAM signals to a cyclic-cumulant estimator algorithm. It all hangs together.
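For readers who want to try this kind of check themselves, here is a minimal sketch of a cumulant calculation for a single discrete complex random variable with an arbitrary probability mass function. It is an illustrative stand-in written under my own assumptions (NumPy, the standard partition-based moment-to-cumulant formula, hypothetical function names), not the cumulant computer referred to above.

```python
import math
import numpy as np

def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or give it a block of its own.
        yield [[first]] + partition

def symbol_cumulant(values, probs, conj_flags):
    """Cumulant of factors a or conj(a); conj_flags[j] is True if factor j is conjugated."""
    values = np.asarray(values, dtype=complex)
    probs = np.asarray(probs, dtype=float)
    cache = {}

    def block_moment(block):
        n_conj = sum(bool(conj_flags[j]) for j in block)
        key = (len(block) - n_conj, n_conj)
        if key not in cache:
            cache[key] = np.sum(probs * values ** key[0] * np.conj(values) ** key[1])
        return cache[key]

    total = 0j
    for partition in set_partitions(list(range(len(conj_flags)))):
        p = len(partition)
        term = (-1) ** (p - 1) * math.factorial(p - 1)
        for block in partition:
            term = term * block_moment(block)
        total += term
    return total

def psk_pmf(M):
    """Equally likely unit-modulus PSK symbols with one point at angle zero."""
    return np.exp(2j * np.pi * np.arange(M) / M), np.full(M, 1.0 / M)

c40_bpsk = symbol_cumulant(*psk_pmf(2), [False] * 4)
c80_8psk = symbol_cumulant(*psk_pmf(8), [False] * 8)
c80_16psk = symbol_cumulant(*psk_pmf(16), [False] * 8)
print(c40_bpsk.real, c80_8psk.real, abs(c80_16psk))
```

With these constellations the sketch gives -2 for the fourth-order BPSK cumulant with no conjugations, 1 for the eighth-order 8PSK cumulant with no conjugations, and 0 for the corresponding 16PSK cumulant. Other constellations, probabilities, and conjugation patterns can be tried by changing the inputs.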
So the simplest case of a cumulant of a discrete random variable is not handled correctly by these authors, which will cause confusion in any reader that tries to replicate the results. More importantly, though, it augurs poorly for the upcoming section on cyclic cumulants, which are complicated functions of time , order , number of conjugations , and the -dimensional delay vector .
Section IV.B: HOS as Preprocessing Features
So we’re still in Section IV, which is called Signal Statistics, and not yet in Section V, which is called Cyclostationary Analysis. But the authors here, in Section IV, are still throwing around expectations applied to signals, so like it or not, cyclostationarity must be attended to.
We arrive at Equation (24),
There are three lines to (24). The first line specifies a limit parameter (stochastic expectation or fraction-of-time expectation, for that is what a cumulant relates to). Note that in the final items in the comma-separated list the signal value is conjugated, but the carrier-offset component is not. So that’s a typo. But we know that that cumulant limit parameter is a function of if the signal is cyclostationary (and all the signals considered in the paper except noise are definitely cyclostationary). Yet the second and third lines are finite averages over , meaning the cumulant limit parameter in the first line is a function of some finite parameter , presumably the number of samples you want to consider in an estimator.
That’s not the bad part of (24) though.
The second line is an estimate of the moment for .
The third line is just confused and confusing. What are the limits on the sum over ? What are the limits on the sum over ? Let’s suppose the former is (with good reason) and the latter is , which may be infinite, who knows. For rectangular-pulse signals, then, we have for all and , and for , the third line then becomes . Which can’t be right except, possibly, for , and diverges if in fact the sum over is infinite. Again, I’m straining here because this sequence of relations is not even wrong:
- The cumulant is a limit parameter and is a function of time .
- The cumulant is not equal to the moment except in some special cases.
- The cumulant is a limit parameter and is not equal to a finite-time estimate of a moment.
- The cumulant is a limit parameter dependent on time and is not equal to the vague (no limits on sums) final line.
Section V.A: Second-Order Cyclostationary: CAF and SCD
We start off with incorrect definitions of the autocorrelation and cyclic autocorrelation functions in Equations (26) and (27),
Since neither factor involving is conjugated, these either relate to the conjugate cyclic autocorrelation, there is a typo, or we’ve reverted to real-valued signals in spite of the use of complex-valued signals throughout the paper.
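The difference between conjugating one factor and conjugating none is easy to see numerically. The following is a rough sketch, assuming NumPy and a rectangular-pulse QPSK signal with eight samples per symbol; the function `caf_estimate`, its argument names, and the signal parameters are my own choices for illustration and do not reproduce (26) or (27).

```python
import numpy as np

def caf_estimate(x, alpha, tau, conjugate_second=True):
    """Finite-time estimate (1/N) * sum_n x[n+tau] * y[n] * exp(-i 2 pi alpha n).

    y is conj(x) when conjugate_second is True (non-conjugate cyclic
    autocorrelation) and x itself otherwise (conjugate cyclic autocorrelation).
    tau is a non-negative integer delay in samples.
    """
    N = len(x) - tau
    n = np.arange(N)
    second = np.conj(x[:N]) if conjugate_second else x[:N]
    return np.mean(x[tau:tau + N] * second * np.exp(-2j * np.pi * alpha * n))

rng = np.random.default_rng(2)
samples_per_symbol = 8
symbols = np.exp(1j * (np.pi / 2) * rng.integers(0, 4, size=2000))  # QPSK
x = np.repeat(symbols, samples_per_symbol)                          # rectangular pulses

# One conjugated factor, alpha = 0, tau = 0: the signal power.
power = caf_estimate(x, 0.0, 0, conjugate_second=True)
# No conjugated factors at the same point: averages toward zero for QPSK.
no_conj = caf_estimate(x, 0.0, 0, conjugate_second=False)
# One conjugated factor at the symbol rate and half a symbol of delay.
symbol_rate_feature = caf_estimate(x, 1.0 / samples_per_symbol, 4)

print(abs(power), abs(no_conj), abs(symbol_rate_feature))
```

For this complex-valued QPSK signal the version with no conjugated factors is close to zero where the version with one conjugated factor equals the power, so the two definitions are not interchangeable. The symbol-rate value shows the kind of feature the conjugated-factor version picks up.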
In (26), there has to be a limiting operation, or else the quantity on the left is an estimate, not a limit parameter. And (26) is missing a factor relating to the number of samples considered in the average, probably because the authors have infinite limits on the sum. So (26) is a mess. In (27), the factor out in front of the summation must be since there are items in the sum.
Below (27) we find this tangle of verbiage:
- The delay product is not the autocorrelation function.
- A first-order cyclostationary function is a function that contains one or more finite-strength additive sine-wave components. Generally we avoid processing such signals with cyclic autocorrelation or spectral correlation estimators because the first-order sine waves interact to produce second-order sine waves, but there is no additional information in these particular second-order sine waves. Plus, the signals we’ve been focusing on in the paper don’t even typically have finite-strength additive sine-wave components (except OOK and maybe the MFSK signals, depending on their nature). So that last sentence is gibberish.
The exposition on second-order cyclostationarity ends with Figure 4, which purports to show salient aspects of the spectral correlation function for OOK, BPSK, QPSK, and M-QAM:
Because we can clearly see a symmetry that is always evident in the PSD of a real-valued PSK/QAM signal, it must be the case that the cycle frequency and spectral frequency axes are mislabeled. Moreover, they are unitless. Taking QPSK as an example, the cycle frequency ranges from to (after swapping the axes labels!), and the spectral frequency ranges from to . But the entirety of the PSD is captured. That means, for consistency with , the cycle frequency should range from to . Or to enforce consistency with , the spectral frequency should range from to . The shapes of the functions in the surface are tiny and cramped and there are weird peaks near the edges of the slices. Three of the signals have a non-zero carrier (required for proper representation of the signal with real numbers), but the OOK does not. I don’t think anyone can learn anything true from looking at Figure 4.
Section V.C: High Order Cyclostationary: Cyclic Moments and Cumulants
The delay product (3) we looked at early in the post is revisited and revised in Equation (30),
Here the optional conjugations are applied to each factor, as they should be, and in contradiction to (3). Good start to the section! But then it is rapidly downhill.
Equation (31) is
and the factor of in front of the sum should be .
This definition of the CTMF is NOT the discrete Fourier transform of the delay product. That’s exactly the point of including the and the limiting operation–this is an infinite time average. The CTCF CANNOT be obtained from the CTMF using (23). The CTCF is a function of the CTMF and all lower-order CTMFs that possess cycle frequencies that add to the targeted cycle frequency .
Section V.D: RD-CTCF
The authors then move on to the cyclic cumulant itself. We see this claim:
That is not the definition of the reduced-dimension cyclic temporal cumulant function, and I should know because I coined that term. In general, the th-order CTCF (or just cyclic cumulant) is a function of all delay variables in (31). That function is typically not absolutely integrable, which in this case just means that along some manifolds in that -dimensional delay space, the CTCF does not decay (become arbitrarily small). This is intimately related to the concept of impure sine waves. The RD-CTCF is just the regular CTCF with one of the delay variables set to zero (typically ). This function is much better behaved and its transform is the cyclic polyspectrum.
Then we come to the mathematical definition of the cyclic cumulant proffered by the authors in (32)-(33):
Please be aware that these are NOT the correct formulas for cyclic cumulants, reduced-dimension or otherwise. This is what happens when you take the relatively simple formulas that relate the moments of a single random variable to the cumulants for that single random variable and apply them directly to a random process ( is either a random process or a random signal, comprised of an infinite number, indexed by , of random variables).
Moreover, these are actually finite-time estimates of … something, because there are finite sums of length here, so the left sides are limit parameters (not estimates) and the right sides are finite-time estimates. Is the left side equal to the right side for all ? How about ?
It is probably worth reproducing the correct formula for a cyclic cumulant in terms of products of lower-order cyclic moments here so we can really appreciate the wrongness of (32)-(33). From the post on cyclic cumulants (Equation (32) there),
The sum is over all the distinct partitions of the index set , where each partition is a set of non-intersecting subsets whose joint union is exactly the index set.
The important thing to notice about  is that the cyclic cumulant for cycle frequency depends on cyclic moments with cycle frequencies that can be different than . Now compare  with (32)-(33). The definitions there completely ignore all the lower-order cyclic moments that don’t have the same cycle frequency as the cyclic cumulant. Those functions in (32)-(33) will almost never equal actual, true, cyclic cumulants.
Take for example SRRC BPSK, which has conjugate cycle frequencies (cycle frequencies for ) of , where is the carrier frequency (offset) and is the bit rate. To implement , for and , you’ll need to form a product of two cyclic moments, each with cycle frequency because . But also you have to worry about cyclic moments with cycle frequencies and because . And so on.
These incorrect formulas are then used to produce the cyclic-cumulant magnitude plots in Figure 5, although there is no mention of ,
The lack of symmetry in some of the plots should tip you off right away that something is amiss. For you, dear reader, I’ve created the true cyclic cumulant plots that should have appeared in Figure 5. For these cyclic-cumulant estimates, I used a bit rate of , a carrier of , a high SNR, rolloff of , and samples. The result is shown here for and ,
and here for and to replace Figure 6,
How do I know that the values in my plots are correct and the values in the plots of Figure 5 are in error? Because I checked my work. I have an independent piece of software that numerically evaluates the theoretical formula for the cyclic cumulant of a PSK or QAM signal, and I can check those numerical values against the values I obtain in the plots. And they check out. Moreover, I can cross-check those results with known theoretical (math) formulas for these textbook signals. And that checks out.
Section V.F: Kernel-Based Cyclostationary
There isn’t much to say about the authors’ take on kernel-based cyclostationarity, which is more commonly called cyclic correntropy, other than they are quite credulous concerning the material, which I believe is dubious. You can see my evidence here, here, and here.
The DeepSig RML 2018 Dataset
The authors are clearly enamored with the DeepSig RML datasets, focusing on the 2018 dataset I analyzed in a previous post (see also here, here, and here for analyses of other DeepSig datasets on the CSP Blog).
Have they, really, though? I think every paper I’ve analyzed that used one or more of the DeepSig datasets has failed to either mention or to adequately address the generalization (dataset shift) problem. The papers merely show that you can do an automated optimization process (also known as training a neural network) and that that optimization process does well on the highly constrained training dataset. That’s it. Moreover, we’ve recently developed strong evidence that these automated optimization processes are quite delicate (brittle) and suffer greatly when the input is IQ data and the probability density functions of the involved random variables change even slightly.
Then there’s this:
I don’t have an issue with ‘objective.’ I can’t detect any biases here, but the work is clearly not ‘careful.’
But the variation in parameters of the 2018 dataset is minimal. This is just more highly constrained dataset processing, showing that optimizers can optimize.
This is where I usually cry out in anguish about the state of peer review. But not so much this time.
Why do the authors feel this paper is even ready for review, much less ready for publication? Why do the reviewers accept it? Because they don’t cross-check their work. And that is the most valuable mental habit for research engineers: how can I verify this result? How can I see if this result is consistent with other results I already trust?
The dilemma is that checking, cross-checking, re-checking, verifying, and validating are time-consuming and can be tedious. The time consumption aspect is a primary driver in the modern jettisoning of the checking mentality. ‘Just get that paper complete and submitted’–we need to publish right now and over and over again quickly.
The incentive structure of academia seems to tend toward paper weight rather than paper quality. It pays more to have many pages to count than to have far fewer pages that contain valid, novel, high-quality results and information.
But I say to the younger readers here: Check your work. It will pay off for you and for everyone that accesses your work in the long run. Don’t quickly believe anything you read in a technical paper. Find ways to cross-check the claims and results. Adopt a skeptical stance.
And now the logical conclusion of my exhortation: check my work here in this post and on the entire CSP Blog.