The Signal-Processing Equivalent of Resume-Padding? Comments on “A Robust Modulation Classification Method Using Convolutional Neural Networks” by S. Zhou et al.

Does the use of ‘total SNR’ mislead when the fractional bandwidth is very small? What constitutes ‘weak-signal processing?’

Or maybe “Comments on” here should be “Questions on.”

In a recent paper in EURASIP Journal on Advances in Signal Processing (The Literature [R165]), the authors tackle the problem of machine-learning-based modulation recognition for highly oversampled rectangular-pulse digital signals. They don’t use the DeepSig data sets, but their data-set description and their use of ‘signal-to-noise ratio’ leave a lot to be desired. Let’s take a brief look, and see if you agree with me that touting their results as evidence that they can reliably classify signals with ‘SNRs of -10 dB’ is unwarranted and misleading.

First, let me give credit where credit is due. These authors put the results of O’Shea et al (The Literature [R138]) in proper perspective:

O’Shea et al [18] trained the CNN with the received base banded [sic] signals directly, and the classification accuracy was higher than those trained by HOC features.

All other summaries of O’Shea et al say that they compared CNNs trained on I/Q samples with conventional modulation-recognition techniques, and that the latter came out poorly indeed. However, neither O’Shea et al nor any other machine learners that I know about ever even attempted to compare the performance of a trained neural network with an actual signal-processing algorithm for modulation recognition. I’ll leave you to your own thoughts about why that may be.

But on to the main issue at hand: SNR. In Section 2 of [R165], the authors lay out their signal models, including a single model that covers MASK, MFSK, and MPSK.

That model doesn’t actually work for MFSK, because its $\cos(\cdot)$ term is not a function of $n$. But notice the $\mathrm{rect}(\cdot)$ function in it, which the authors call $g(\cdot)$. Yes, all the PSK, QAM (authors’ Eq. (4)), ASK, and FSK signals use the rectangular pulse-shaping function. That will be useful in contemplating the SNRs used by the authors.

The OFDM signal is incorrectly defined as an indefinite sum of sine waves (there is no machinery to produce OFDM symbols). Later we are told there are 20 subcarriers.

So, not exactly realistic signal types here. But I think we readers get the idea, and I am hopeful that the simulation of the various signals is of higher quality than the provided mathematical descriptions, although I have no way to check.

Unlike many signal-processing papers, this one explicitly defines its SNR measure, and that measure is what we at the CSP Blog call ‘total SNR’, in contrast to our favored definition, ‘inband SNR.’ The latter is the ratio of the signal power to the power of the noise that falls within the signal band, whereas the former is the ratio of the signal power to the power of the noise that falls within the sampling band (also called the analysis band).
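For concreteness, here is one way to write the two definitions side by side. It assumes complex-valued samples, additive white noise with spectral density $N_0$ across the whole sampling band, signal power $P_s$, occupied signal bandwidth $B$, and sampling rate $f_s$; these symbols are my own shorthand, not the paper's notation:

```latex
\begin{aligned}
\mathrm{SNR}_{\mathrm{total}} &= \frac{P_s}{N_0 f_s}, &
\mathrm{SNR}_{\mathrm{inband}} &= \frac{P_s}{N_0 B}, \\
\eta &= \frac{B}{f_s}, &
\mathrm{SNR}_{\mathrm{inband}}\,[\mathrm{dB}] &= \mathrm{SNR}_{\mathrm{total}}\,[\mathrm{dB}] - 10\log_{10}\eta .
\end{aligned}
```

When the fractional bandwidth $\eta$ is small, the correction term $-10\log_{10}\eta$ is large and positive.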

For sampled signals with large fractional bandwidth, meaning the signal bandwidth is a large fraction of the sampling rate, the two measures are not far apart. But when the fractional bandwidth is small, they can be very different. What’s the case here? The authors tabulate the signal parameter values for the data set they generate for input to the neural network; the key entries are a sampling rate of 400 MHz and a ‘Code rate’ $f_d$ of 2 MHz.

Unfortunately, the symbol used to define the symbol rate in the signal-model section, $T_s$, does not appear in the table, and the table’s ‘Code rate’ symbol, $f_d$, does not appear in the signal-model section. I’m going to assume that ‘Code rate’ means symbol rate. Agree? Then, for the rectangular-pulse signals, the main spectral lobe has a null-to-null width of twice the 2-MHz symbol rate, or 4 MHz. That is, most of the signal power falls within a bandwidth of 4 MHz.
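How much is "most"? Here is a quick numerical check, a minimal sketch assuming an ideal rectangular pulse, whose power spectrum has the shape $T_s\,\mathrm{sinc}^2(fT_s)$. The function names are hypothetical. Frequency is measured in units of the symbol rate, so the main lobe is $|fT_s| \le 1$ (4 MHz total at a 2-MHz symbol rate), and a midpoint rule does the integral:

```python
import math

def sinc_squared(x: float) -> float:
    """Normalized sinc^2: (sin(pi x) / (pi x))^2, the PSD shape of a rectangular pulse."""
    if x == 0.0:
        return 1.0
    return (math.sin(math.pi * x) / (math.pi * x)) ** 2

def main_lobe_power_fraction(half_width: float = 1.0, steps: int = 100_000) -> float:
    """Fraction of a rectangular-pulse signal's power within |f T_s| <= half_width.

    With frequency in units of the symbol rate 1/T_s, sinc^2(x) integrates to 1
    over the whole real line, so the midpoint-rule integral over
    [-half_width, half_width] is directly the captured power fraction.
    """
    h = 2.0 * half_width / steps
    total = sum(sinc_squared(-half_width + (k + 0.5) * h) for k in range(steps))
    return total * h

# Main lobe: |f| <= 2 MHz for a 2-MHz symbol rate, i.e. a 4-MHz null-to-null band
print(f"power in main lobe: {main_lobe_power_fraction():.3f}")
```

The result should come out near 0.90, the commonly quoted figure of roughly 90% of the power in the main lobe, which is why 4 MHz is a sensible stand-in for the signal bandwidth here.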

The ratio of the signal bandwidth to the sampling rate, the fractional bandwidth, is then $\eta = 4/400 = 1/100$. For white noise, only the fraction $\eta$ of the total noise power falls inside the signal band, so the inband SNR is $10\log_{10}(100) = 20$ dB greater than the total SNR. The authors are using total SNR, so everywhere we see an SNR value in the paper, we need to add 20 dB to convert it to inband SNR.
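The conversion is simple enough to write down. Here is a minimal sketch, assuming white noise (so noise power scales linearly with bandwidth) and taking the 4-MHz main-lobe width as the signal bandwidth; the function name `inband_snr_db` is hypothetical:

```python
import math

def inband_snr_db(total_snr_db: float, signal_bw_hz: float, sample_rate_hz: float) -> float:
    """Convert total SNR (noise over the whole sampling band) to inband SNR
    (noise over the signal band only), assuming white noise."""
    eta = signal_bw_hz / sample_rate_hz  # fractional bandwidth
    return total_snr_db - 10.0 * math.log10(eta)

# Parameters as read above: ~4-MHz main lobe, 400-MHz sampling rate
B = 4e6
FS = 400e6
for total in (-10.0, 0.0, 10.0):
    print(f"total {total:+.0f} dB -> inband {inband_snr_db(total, B, FS):+.1f} dB")
```

With $\eta = 1/100$, every total-SNR value shifts up by exactly 20 dB, so a total SNR of -10 dB maps to an inband SNR of +10 dB.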

Yes, you say, but the processor acts on the entire waveform, and so ‘sees’ the total SNR. Sure. But what matters, I argue (and here is the crux of the question), is the inband SNR. You can easily detect the presence of a signal with a 10-dB inband SNR using simple, low-cost energy-detection methods, and then use simple, low-cost filtering to remove the vast majority of the noise. Since the neural networks apply many, many, many convolutions to the input I/Q samples, this should be easy for the machines. Figuring out where the signal is in frequency when the SNR is moderate to high is not hard using the output of a bunch of filters.
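Here is a rough sketch of that argument, assuming rectangular-pulse BPSK at a 2-MHz symbol rate, a 400-MHz sampling rate, a total SNR of -10 dB, and a hypothetical 37-MHz carrier offset that the receiver does not know. The ‘energy detector’ is just an averaged periodogram (one simple way to get the output of a bunch of filters), and the ‘filter’ is an ideal FFT mask; neither is the authors' method or a tuned design, and the names `bandpass`, `f0_hat`, and so on are mine:

```python
import numpy as np

rng = np.random.default_rng(1)

fs = 400e6            # sampling rate, as read from the paper's parameter table
rs = 2e6              # symbol rate, assuming 'Code rate' means symbol rate
sps = int(fs / rs)    # 200 samples per symbol
n_sym = 2048
n = n_sym * sps       # 409,600 samples
f0_true = 37e6        # hypothetical carrier offset, unknown to the receiver
total_snr_db = -10.0  # total SNR: noise measured over the whole 400-MHz band

# Unit-power rectangular-pulse BPSK at frequency offset f0_true
sym = 2.0 * rng.integers(0, 2, n_sym) - 1.0
s = np.repeat(sym, sps) * np.exp(2j * np.pi * f0_true * np.arange(n) / fs)

# Complex white Gaussian noise whose total power is set by the total SNR
noise_power = 10 ** (-total_snr_db / 10)
w = np.sqrt(noise_power / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
x = s + w

# Step 1 (detect and locate): average the periodograms of non-overlapping
# blocks, a crude bank of FFT filters, and take the strongest bin as the carrier.
nfft = 1024
psd = np.mean(np.abs(np.fft.fft(x.reshape(-1, nfft), axis=1)) ** 2, axis=0)
f0_hat = np.fft.fftfreq(nfft, d=1 / fs)[np.argmax(psd)]

# Step 2 (remove noise): ideal FFT-mask bandpass filter over the 4-MHz main lobe
def bandpass(v, center, width):
    f = np.fft.fftfreq(len(v), d=1 / fs)
    return np.fft.ifft(np.fft.fft(v) * (np.abs(f - center) <= width / 2))

# In simulation we can filter signal and noise separately and measure the new SNR
s_f = bandpass(s, f0_hat, 2 * rs)
w_f = bandpass(w, f0_hat, 2 * rs)
post_snr_db = 10 * np.log10(np.mean(np.abs(s_f) ** 2) / np.mean(np.abs(w_f) ** 2))

print(f"carrier estimate {f0_hat / 1e6:.2f} MHz (true {f0_true / 1e6:.2f} MHz)")
print(f"SNR after filtering {post_snr_db:.1f} dB (total SNR was {total_snr_db:.1f} dB)")
```

Under these assumptions the post-filter SNR should land a little under the +10 dB inband figure, because the 4-MHz mask keeps only the main lobe (about 90% of the signal power). The point is not that this is a good classifier front end, only that getting from a -10 dB total SNR to roughly +10 dB takes little more than a few FFTs and a mask.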

The authors provide classification results plotted against SNR, which seem to indicate good performance at low SNR: after all, the left side of the plot is labeled with a negative SNR, -10 dB. But that total SNR actually corresponds to an inband SNR of 10 dB, which is the SNR of a signal easily detected by spectral analysis or filterbank processing.

If we increased the sampling rate to 4 GHz and kept the signal power and noise spectral density unchanged, we could then say we are processing a signal with an SNR of -20 dB, instead of -10 dB, at the left edge of the authors’ graph. Yet such a signal would be just as easily processed by conventional means as in the case of a sampling rate of 400 MHz: the inband SNR is the same either way.
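The thought experiment is easy to tabulate. A minimal sketch, assuming white noise with a fixed spectral density and a 4-MHz signal bandwidth; the signal power and $N_0$ are hypothetical values chosen so that the total SNR is -10 dB at 400 MHz, and the function name is mine:

```python
import math

def total_and_inband_snr_db(p_s, n0, bw, fs):
    """Return (total SNR, inband SNR) in dB for white noise of spectral density n0.

    Total SNR counts the noise over the whole sampling band (n0 * fs);
    inband SNR counts only the noise inside the signal bandwidth (n0 * bw).
    """
    total = 10 * math.log10(p_s / (n0 * fs))
    inband = 10 * math.log10(p_s / (n0 * bw))
    return total, inband

# Hypothetical values: unit signal power, N0 chosen so total SNR is -10 dB at 400 MHz
P_S, N0, BW = 1.0, 1.0 / 40e6, 4e6
for fs in (400e6, 4e9):
    total, inband = total_and_inband_snr_db(P_S, N0, BW, fs)
    print(f"fs = {fs / 1e6:6.0f} MHz: total {total:+.1f} dB, inband {inband:+.1f} dB")
```

Only the total SNR moves with the sampling rate (-10 dB to -20 dB); the inband SNR, which governs how hard the signal is to find and filter, stays at +10 dB.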

I read the authors’ graph in light of their signal and SNR definitions and think: OK, cool, good performance can be had once the inband SNR gets to about 20 dB. Not particularly impressive. Not successful low-SNR processing.

In other words, the joint use of very high oversampling and total SNR allows the appearance of processing signals at low SNR, without the actual processing of weak signals.

Resumé padding? Or is this fair?

Author: Chad Spooner

I'm a signal processing researcher specializing in cyclostationary signal processing (CSP) for communication signals. I hope to use this blog to help others with their cyclo-projects and to learn more about how CSP is being used and extended worldwide.

2 thoughts on “The Signal-Processing Equivalent of Resume-Padding? Comments on “A Robust Modulation Classification Method Using Convolutional Neural Networks” by S. Zhou et al.”

  1. Talking about the “total SNR” is quite common. E.g. in amateur radio it is said that the JT weak-signal modes work at -25 dB SNR and below. This is of course only true because the signal bandwidth (a few tens of Hz) is a small fraction of the typically used 3 kHz bandwidth.
    But isn’t this perfectly fine for a practical system? Regarding the discussed paper: A 400 MHz ADC typically has a much higher noise figure than a 4 MHz ADC. So the question is always how and where in the system you define the SNR: Before the ADC, after the ADC, after reducing the sample rate to the desired signal? What if the sample rate cannot perfectly be reduced to the actual signal?
    So I would think that total vs inband SNR is perhaps a more theory-related issue.

    1. Talking about the “total SNR” is quite common.

      So is resume-padding.

      But isn’t this perfectly fine for a practical system?

      Sure, although here we are talking about a digital signal processing subsystem.

      So the question is always how and where in the system you define the SNR: Before the ADC, after the ADC, after reducing the sample rate to the desired signal?

      Well, since we’re talking about processing a discrete-time signal (after the antenna, receiver, and ADC), I’d say it is pretty clear that the SNR should be defined at the input to the signal-processing algorithm. DSP Engineer 1: “Hey, I can successfully process these signals at -10 dB SNR!” DSP Engineer 2: “Uh, do you mean the SNR at the input to the ADC?” Not a likely conversation.

      What if the sample rate cannot perfectly be reduced to the actual signal?

      I’m not clear on what this means. Do you mean that a system operator cannot select a sampling rate that is exactly equal to the occupied bandwidth of the signal of interest? No problem! That’s why inband SNRs are so helpful. All that matters is specifying the amount of noise power that falls within the signal bandwidth, irrespective of the sampling rate.

      So I would think that total vs inband SNR is perhaps a more theory-related issue.

      I think it is a communication issue: an issue of writing and speaking with the intent to clearly convey technical information. The issue applies equally to theoretical analyses and to practical experiments, and to the write-ups of both.

      Suppose we consider the parameters of the paper, but change the symbol rate from 2 MHz to 20 kHz, reducing it by a factor of 100. The total SNR is unchanged, but the inband SNR is now 20 dB higher. Is it a good practice to label these two situations with the same name: SNR = X dB?
