Or maybe “Comments on” here should be “Questions on.”
In a recent paper in EURASIP Journal on Advances in Signal Processing (The Literature [R165]), the authors tackle the problem of machine-learning-based modulation recognition for highly oversampled rectangular-pulse digital signals. They don’t use the DeepSig data sets, but their data-set description and their use of ‘signal-to-noise ratio’ leave a lot to be desired. Let’s take a brief look. See if you agree with me that the touting of their results as evidence that they can reliably classify signals at strongly negative SNRs (in dB) is unwarranted and misleading.
First, let me give credit where credit is due. These authors put the results of O’Shea et al (The Literature [R138]) in proper perspective:
O’Shea et al  trained the CNN with the received base banded [sic] signals directly, and the classification accuracy was higher than those trained by HOC features.
All other summaries of O’Shea et al say that they compared the I/Q-sample-trained CNNs with conventional modulation-recognition techniques, and the latter came out poorly indeed. However, O’Shea et al (and no machine learners that I know about) never even attempted to compare the performance of a trained neural network with an actual signal-processing algorithm for modulation recognition. I’ll leave you to your own thoughts about why that may be.
But on to the main issue at hand: SNR. In Section 2 of [R165], the authors lay out their signal models, including this one for MASK, MFSK, and MPSK:
(Which doesn’t actually work for MFSK, because the carrier frequency in the model is not a function of the symbol index.) But notice the pulse-shaping function. Yes, all the PSK, QAM (authors’ Eq. (4)), ASK, and FSK signals use the rectangular pulse-shaping function. That will be useful in contemplating the SNRs used by the authors.
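To make the rectangular-pulse point concrete, here is a minimal sketch (my own construction, not the authors’ code; all parameter values are hypothetical) of a rectangular-pulse M-PSK generator. With a rectangular pulse, ‘pulse shaping’ reduces to simply repeating each symbol for one symbol interval:

```python
import numpy as np

def rect_pulse_psk(num_symbols, sps, M, rng):
    """Baseband M-PSK with a rectangular pulse-shaping function.

    sps is the number of samples per symbol (the oversampling factor).
    The rectangular pulse is 1 over one symbol interval and 0 elsewhere,
    so shaping is just repetition of each symbol sps times.
    """
    symbols = np.exp(2j * np.pi * rng.integers(0, M, num_symbols) / M)
    return np.repeat(symbols, sps)

rng = np.random.default_rng(0)
x = rect_pulse_psk(1000, 50, 4, rng)     # QPSK, heavily oversampled
print(len(x), np.mean(np.abs(x) ** 2))   # 50000 samples, unit power
```

The heavy oversampling (many samples per symbol) is the detail to keep in mind for the SNR discussion below.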
The OFDM signal is incorrectly defined by some indefinite sum of sine waves (there is no machinery to produce OFDM symbols). Later we are told the number of subcarriers.
So, not exactly realistic signal types here. But I think we readers get the idea, and I am hopeful that the simulation of the various signals has a higher quality than the provided mathematical descriptions. Although I have no way to check.
Unlike the authors of many signal-processing papers, these authors do explicitly define their SNR measure,
which is what we at the CSP Blog call ‘total SNR’, in contrast to our favored definition, which is ‘inband SNR.’ The latter is the ratio of the signal power to the power of the noise that falls within the signal band, whereas the former is the ratio of the signal power to the power of the noise that falls within the sampling band (also called the analysis band).
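The distinction is easy to make concrete. Here is a small sketch with hypothetical numbers (mine, not the paper’s): a unit-power signal in white noise with flat spectral density N0, sampled at rate fs, occupying bandwidth B:

```python
import numpy as np

# Hypothetical values (not taken from the paper), chosen only to
# illustrate the two SNR definitions.
Ps = 1.0     # signal power (linear units)
N0 = 1e-8    # noise power spectral density (W/Hz), flat over the band
fs = 100e6   # sampling rate: total noise power is N0 * fs
B = 2e6      # signal bandwidth: inband noise power is N0 * B

total_snr_db = 10 * np.log10(Ps / (N0 * fs))   # noise over sampling band
inband_snr_db = 10 * np.log10(Ps / (N0 * B))   # noise over signal band

print(round(total_snr_db, 1))    # 0.0
print(round(inband_snr_db, 1))   # 17.0  (= total + 10*log10(fs/B))
```

The gap between the two numbers is just 10log10(fs/B): the more oversampled the signal, the bigger the gap.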
For sampled signals with large fractional bandwidth, which means the signal bandwidth is a large fraction of the sampling rate, the two measures are not too far off from each other. But when the fractional bandwidth is small, they can be very different. What’s the case here? Well, here are the stated signal parameter values for the data set the authors generate for input to the neural network:
Unfortunately, the symbol used to denote the symbol rate in the signal-model section does not appear here, and the ‘Code rate’ symbol in the table does not appear in the signal-model section. I’m going to assume that ‘Code rate’ here means symbol rate. Agree? For the rectangular-pulse signals, the main (null-to-null) spectral lobe will then have a width of twice the symbol rate. That is, most of the signal power falls within that bandwidth.
The ratio of the signal bandwidth to the sampling rate is then quite small, which means the inband SNR exceeds the total SNR by 10log10 of the sampling rate divided by the signal bandwidth, in dB. The authors are using total SNR, so everywhere we see an SNR value in the paper, if we want to convert to inband SNR, we need to add that substantial number of dB.
Yes, you say, but the processor acts on the entire waveform, and so ‘sees’ the total SNR. Sure. But what matters, I argue (and here is the crux of the question), is the inband SNR. For you can easily detect the presence of a signal with a high inband SNR using simple low-cost energy-detection methods, and then use simple, low-cost filtering to remove the vast majority of the noise. Since the neural networks apply many, many convolutions to the input I/Q samples, this should be easy for the machines. Figuring out where the signal is in frequency when the inband SNR is moderate-to-high is not hard using the output of a bunch of filters.
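Here is a sketch of that filter-then-detect argument (again my own construction with made-up parameters, not anything from the paper): a rectangular-pulse QPSK signal at a total SNR of -10 dB, heavily oversampled, handled by brick-wall filtering to the main spectral lobe followed by energy detection:

```python
import numpy as np
from numpy.fft import fft, ifft, fftfreq

rng = np.random.default_rng(2)
sps = 100        # samples per symbol: heavy oversampling
n_sym = 500
syms = np.exp(2j * np.pi * rng.integers(0, 4, n_sym) / 4)
s = np.repeat(syms, sps)          # unit-power rectangular-pulse QPSK
N = len(s)

noise_power = 10.0                # total SNR = -10 dB
w = np.sqrt(noise_power / 2) * (rng.standard_normal(N)
                                + 1j * rng.standard_normal(N))
x = s + w

# Ideal lowpass filter covering the main spectral lobe, |f| <= 1/sps
# in normalized frequency (about 2% of the sampling band here).
f = fftfreq(N)
mask = np.abs(f) <= 1.0 / sps
y = ifft(fft(x) * mask)

measured = np.mean(np.abs(y) ** 2)        # energy-detector statistic
noise_only = noise_power * np.mean(mask)  # expectation with no signal
print(measured > 2 * noise_only)          # True: easy detection
```

The filtering cuts the noise power by roughly the oversampling factor while most of the signal power survives, so the detection statistic sits far above its noise-only expectation despite the negative total SNR.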
The authors provide results like this:
These seem to indicate good performance at low SNR; after all, the left side of the plot is labeled with a negative SNR. But that total SNR actually corresponds to a much higher inband SNR, the SNR of a signal easily detected by spectral analysis or filterbank processing.
If we increased the sampling rate into the GHz range, and kept the signal power and noise spectral density values unchanged, we could then say we are processing a signal with an even lower total SNR at the left edge of the graph above. Yet such a signal would be just as easily processed by conventional means as in the original MHz-rate case: same inband SNR either way.
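A little arithmetic sketch of that thought experiment (hypothetical numbers again, not the paper’s): fix the signal power, the noise spectral density, and the bandwidth, and change only the sampling rate:

```python
import numpy as np

# Fixed, hypothetical signal parameters; only fs changes.
Ps, N0, B = 1.0, 1e-8, 2e6

for fs in (100e6, 1e9):
    total = 10 * np.log10(Ps / (N0 * fs))    # moves with fs
    inband = 10 * np.log10(Ps / (N0 * B))    # does not move
    print(f"fs = {fs:.0e}: total = {total:.1f} dB, inband = {inband:.1f} dB")
# fs = 1e+08: total = 0.0 dB, inband = 17.0 dB
# fs = 1e+09: total = -10.0 dB, inband = 17.0 dB
```

Raising the sampling rate buys a scarier-looking total SNR while leaving the actual difficulty of the problem, captured by the inband SNR, untouched.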
I read the graph above in light of the signal and SNR definitions and think: OK, cool, good performance can be had once the inband SNR gets to a comfortably large value. Not particularly impressive. Not successful low-SNR processing.
In other words, the joint use of very high oversampling and total SNR allows the appearance of processing signals at low SNR, without the actual processing of weak signals.
Résumé padding? Or is this fair?