Shifted Dataset for the Machine-Learning Challenge: How Well Does a Modulation-Recognition DNN Generalize? [Dataset CSPB.ML.2022]

Another RF-signal dataset to help push along our R&D on modulation recognition.

Update February 2023: A third dataset has been posted to the CSP Blog: CSPB.ML.2023. It features cochannel signals.

Update January 2023: I’m going to put Challenger results in the Comments. I’ve received a Challenger’s decisions and scored them in January 2023. See below.

In this post I provide a second dataset for the Machine-Learning Challenge I issued in 2018 (CSPB.ML.2018). This dataset is similar to the original, with one key difference: the probability distribution of the carrier-frequency offset parameter, viewed as a random variable, has changed, although it remains realistic.

Blog Note: By WordPress’ count, this is the 100th post on the CSP Blog. Together with a handful of pages (like My Papers and The Literature), these hundred posts have resulted in about 250,000 page views. That’s an average of 2,500 page views per post. However, the variance of the per-post pageviews is quite large. The most popular is The Spectral Correlation Function (> 16,000) while the post More on Pure and Impure Sinewaves, from the same era, has only 316 views. A big Thanks to all my readers!!

When I process the generalized challenge (GC) dataset, I obtain results that are nearly the same as when I process the challenge (C) dataset. This is because there is no training in my cyclostationary signal processing approach to modulation recognition and parameter estimation, and there is no prior information that is supplied to the algorithm, so it doesn’t matter that things have changed with the signals’ non-constellation probability distributions. The BPSK signals in the GC dataset are just like the BPSK signals in the C dataset as far as spectral correlation, spectral coherence, cyclic temporal cumulants, etc., are concerned. The signal-processing approach can recognize all BPSK signals. A BPSK signal just needs to conform to the (textbook) BPSK signal model to be recognized; particular numerical values of the modulation parameters are not important, and they will be accurately estimated along the way.
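To make that concrete, here is a minimal NumPy sketch, my illustration for this post rather than the exact processing behind the figures, of why a squaring-based CFO estimator doesn't care which distribution the offset came from: a textbook BPSK signal exhibits a spectral line at exactly twice the carrier offset after squaring, whatever the offset's value. The block length matches the datasets, but the samples-per-symbol, noise level, and the two offset ranges are illustrative stand-ins, not the datasets' actual parameters.

```python
import numpy as np

rng = np.random.default_rng(1)

def bpsk(num_symbols, sps, f0):
    """Textbook BPSK: rectangular pulses, carrier offset f0 in cycles/sample."""
    symbols = rng.choice([-1.0, 1.0], size=num_symbols)
    baseband = np.repeat(symbols, sps)              # rectangular pulse shaping
    n = np.arange(baseband.size)
    return baseband * np.exp(2j * np.pi * f0 * n)

N = 32768    # block length, as in the Challenge datasets
sps = 8      # samples per symbol (illustrative value)

# Stand-ins for the C and GC carrier-offset distributions (not the real ones).
for f0 in [rng.uniform(-0.001, 0.001), rng.uniform(-0.1, 0.1)]:
    x = bpsk(N // sps, sps, f0)
    x += 0.5 * (rng.normal(size=N) + 1j * rng.normal(size=N)) / np.sqrt(2)
    # Squaring BPSK yields a sinewave at the cycle frequency 2*f0 ...
    spectrum = np.abs(np.fft.fft(x**2, 4 * N))
    freqs = np.fft.fftfreq(4 * N)
    # ... so the peak of the squared signal's spectrum gives the CFO estimate.
    f0_hat = freqs[np.argmax(spectrum)] / 2.0
    print(f"true f0 = {f0:+.5f}  estimate = {f0_hat:+.5f}  "
          f"|error| = {abs(f0 - f0_hat):.2e}  (1/N = {1/N:.2e})")
```

The same peak-picking works whether the offset is a tenth of a percent or ten percent of the sampling rate, which is the heart of the generalizability claim, and the error lands comfortably below the 1/N Fourier limit discussed in the Figure 1 caption below.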

Carrier-Frequency Offset Results

The primary focus of the original Challenge was carrier-frequency-offset (CFO) estimation. So let’s start with the CFO results for the GC dataset in Figure 1.

Figure 1. CFO estimation error results for the Generalized Challenge (GC) dataset. Once the processing block length exceeds 16,384 samples, the average error in the CFO dips below the Fourier limit of approximately the reciprocal of the block length. The exception is 8PSK, for which the CSP-based method I use does not produce a cycle frequency related to the carrier; in that case the CFO estimate is essentially an estimate of the center of the signal's noisy PSD.

Compare this result with Figure 1 from the Challenge post, reproduced here for your convenience:

Figure 2. This is Figure 1 in the original Challenge post, and so corresponds to the processing I did a couple years back using the Challenge (C) dataset.

So the CSP-based approach to modulation recognition (My Papers [25,26,28]) possesses a high degree of generalizability with regard to estimation of the carrier-frequency offset parameter. Let’s now turn to the modulation-classification results.

Modulation Classification Results

First, the full confusion matrices for the eight-class problem embodied by the Challenge and Generalized Challenge datasets. Figures 3-5 show the results for three processing block lengths: 32,768, 16,384, and 8,192 samples, respectively.

Figure 3. Confusion matrix over the entire Generalized Challenge dataset (112,000 signal files) using the maximum available number of samples per signal file of 32,768. Overall probability of correct classification is 0.82.
Figure 4. Confusion matrix over the entire Generalized Challenge dataset (112,000 signal files) using half the available number of samples per signal file, or 16,384. Overall probability of correct classification is 0.78.
Figure 5. Confusion matrix over the entire Generalized Challenge dataset (112,000 signal files) using a quarter of the available number of samples per signal file, or 8,192. Overall probability of correct classification is 0.68.

The conclusion is that the performance of the cyclic-cumulant-based modulation-classification algorithm I outline in My Papers [25,26,28] is insensitive to the change in carrier-frequency offset. The question is: Can we construct a neural network that is trained on the Challenge dataset but performs equally well (and well in absolute terms) on the Generalized Challenge dataset? So far, when training a convolutional neural network on the Challenge dataset I/Q samples, my colleagues and I have not been able to obtain good generalization performance: the network fails on the Generalized Challenge dataset.

I hasten to add that this isn’t some big revelation coming out of the CSP Blog. The dataset-shift problem, also known as the generalization problem and other names, is well-known in machine learning.

One thing that has occurred to me over and over while pondering all this generalization stuff with neural networks is: Why do we continue to stick with convolutional layers? I get that convolutional layers are appropriate for image processing (at least I think I get that because linear processing of images can get you a long way), but here, in RF signal processing, the characteristics of the class are embedded throughout the waveform samples through the structure of many involved random variables. We’re not attempting to identify and locate a picture of a cat no matter where it is in an image. The ‘cat’ (modulation type) is distributed throughout the entire ‘image’ (data sample vector). We don’t need to localize anything. We don’t need to grasp edges or boundaries or shapes. (See the Comments below for more on this line of thinking.)

The fundamental probability structure of an RF signal can be brought to light by subjecting the data samples to nonlinearities–that’s all I’m doing when I apply the arithmetic of CSP, whether it is the traditional second-order cyclic autocorrelation or the higher-order cyclic temporal cumulants. Why not have explicit nonlinear layers like squarers in our networks?
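To be concrete about what such a layer might look like, here is a hedged PyTorch sketch; the choice of powers and the averaging scheme are mine, offered as one possibility rather than a vetted architecture:

```python
import torch
import torch.nn as nn

class HomogeneousNonlinearity(nn.Module):
    """Raise the complex input to selected powers, then apply a learnable
    sliding average, so later layers see moment-like quantities."""
    def __init__(self, window=64):
        super().__init__()
        # A depthwise convolution plays the role of the time average.
        self.avg = nn.Conv1d(6, 6, kernel_size=window, groups=6, bias=False)

    def forward(self, iq):                    # iq: (batch, 2, N) I/Q tensor
        z = torch.complex(iq[:, 0], iq[:, 1])
        powers = torch.stack([z, z**2, z**4], dim=1)      # (batch, 3, N)
        feats = torch.cat([powers.real, powers.imag], 1)  # (batch, 6, N)
        return self.avg(feats)
```

Downstream layers would then operate on moment-like quantities instead of having to synthesize the nonlinearity out of stacks of linear filters and pointwise activations.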

The Dataset

The format of the signal files is the same as in the Challenge dataset–go look over there for details. The difference here is that I’m not providing the true parameters and signal-type labels. The idea is that a neural network is trained on some other dataset (perhaps the Challenge dataset, perhaps some other dataset, whatever), and then applied to the Generalized Challenge dataset. So if you take me up on this part of the challenge, you’ll have to submit your machine’s answers and trust me to give you an accurate score.
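For what it's worth, a reader might look like the following sketch. I am assuming a headerless binary file of 32,768 complex samples stored as interleaved little-endian 32-bit floats; that assumption, and the file name, are mine, so verify the layout against the Challenge post before relying on this.

```python
import numpy as np

def read_signal_file(path, num_samples=32768):
    """Hypothetical reader: assumes a headerless binary file of interleaved
    little-endian 32-bit-float I/Q samples. Verify the actual layout against
    the Challenge (CSPB.ML.2018) post before relying on this."""
    raw = np.fromfile(path, dtype="<f4", count=2 * num_samples)
    return raw[0::2] + 1j * raw[1::2]

x = read_signal_file("signal_1.tim")  # hypothetical file name
```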

Since there is no possibility of training here, I don’t think I need to provide as many examples of each class as I did in the Challenge dataset. I created the same number (112,000), but for now I’m just posting 20,000 in five batches of 4,000 files each.

Batch_1

Batch_2

Batch_3

Batch_4

Batch_5

Author: Chad Spooner

I'm a signal processing researcher specializing in cyclostationary signal processing (CSP) for communication signals. I hope to use this blog to help others with their cyclo-projects and to learn more about how CSP is being used and extended worldwide.

10 thoughts on “Shifted Dataset for the Machine-Learning Challenge: How Well Does a Modulation-Recognition DNN Generalize? [Dataset CSPB.ML.2022]”

  1. Congratulations, Dr. Chad, on reaching the 100th post here! Thanks for all your efforts and the great posts!

    I just read your question “Why do we continue to stick with convolutional layers?” and the insight after it, and I really liked it, because I have always had the same question.
    The issue now is that we are importing techniques from the vision and language domains and just trying to apply them to RF data. Many of them seem to work well “in most cases,” but I still think we should have something different, since our data and domain are different. Could you please elaborate on the special characteristics of RF data and some potential directions/resources for RFML researchers?

    Many thanks!

    1. Thanks for stopping by the CSP Blog, Abdurrahman, and leaving a thoughtful comment. I appreciate that.

      Since I believe that the key mathematical difference between different modulation types lies in the distinct sets of nth-order probability density functions (PDFs), I tend to think that a high-performing machine should be able to learn the PDFs or their easiest-to-estimate ‘components.’ Since the collection of all possible nth-order moment functions is equivalent to the collection of all possible nth-order joint PDFs (in the sense that you can compute one from the other), I would think a machine would do well to learn moments. But moments are highly nonlinear functions of the input to the machine–you need several explicit homogeneous nonlinear operations (e.g., a squarer) followed by averaging. In the case of a cyclostationary signal, the PDFs are periodic–they have Fourier series representations. So, a lot going on. But it does seem to suggest that straightforward NN layers that apply homogeneous nonlinearities, followed by convolutions of some sort (“averaging”), may very well allow an NN to synthesize moments or cumulants. I know from my own research that training a CNN on IQ data does not result in significant generalizability, but training one on extracted cyclic cumulants does.
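      As a toy illustration of “homogeneous nonlinearity followed by averaging” (my throwaway example here, not the machinery of My Papers [25,26,28], and with the cycle frequencies assumed known rather than blindly estimated), the averaged second power separates BPSK from QPSK while the fourth power responds to both:

```python
import numpy as np

rng = np.random.default_rng(7)
N, f0 = 32768, 0.02          # illustrative block length and carrier offset
n = np.arange(N)

bpsk = rng.choice([1, -1], N) * np.exp(2j * np.pi * f0 * n)
qpsk = rng.choice([1, 1j, -1, -1j], N) * np.exp(2j * np.pi * f0 * n)

for name, z in [("BPSK", bpsk), ("QPSK", qpsk)]:
    # Homogeneous nonlinearity (z**p) then averaging: a crude cyclic moment.
    # The demodulating exponentials use the known cycle frequencies 2*f0 and
    # 4*f0; blind processing would have to estimate them first.
    m2 = np.abs(np.mean(z**2 * np.exp(-2j * np.pi * (2 * f0) * n)))
    m4 = np.abs(np.mean(z**4 * np.exp(-2j * np.pi * (4 * f0) * n)))
    print(f"{name}: |second-order| = {m2:.3f}, |fourth-order| = {m4:.3f}")
```

      A machine that can form these products and averages internally has, in effect, learned a moment.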

      When I think about image processing (more specifically, image recognition) or natural-language processing (more specifically, speech recognition), my thoughts swirl around the idea of additive representation. That is, the picture of the cat in the image is to be recognized, and the image is equal to the cat part of the image plus everything else. You need to find the cat. So it makes sense that you want an NN to find the cat in “cat plus stuff” no matter how the cat is oriented or scaled or colored, but it is still “cat plus stuff.” If you take away the cat from the image, you leave a cat-shaped hole. So things like edge detection, matched filtering, etc. seem appropriate, and the NN can learn these things using all the little convolutional kernels–simple edge detectors ARE convolutions, matched-filters ARE convolutions…

      When I want to classify a modulation type, though, there isn’t a simple representation of the “BPSK” part of the complex-valued sequence of numbers I have to work with. I can’t say the sequence is the BPSK part plus the other stuff. The BPSK part of the sequence is distributed throughout the sequence. The representation is different. I don’t see how linear operations like convolution can hope to be successful against all BPSK signals we can encounter. If I take away the BPSK signal from the data sequence, I don’t leave behind a BPSK-shaped hole.

      Perhaps it is a bit like the difference between recognizing an object and recognizing a scene. Consider a scene consisting of a room with typical furniture, empty wine bottles tossed around, various articles of clothing on the floor, bowls of half-eaten popcorn, lampshades missing off lamps, dirty plates with unfinished food and condiments, like catsup. And a still body on a couch with closed eyes. What is the scene? Well, there is no one item that tells you the scene–it is a gestalt, a holistic idea. We might say: The Aftermath of a Party. Change the catsup to blood, twist a limb awkwardly, dim the lighting, and maybe we have The Scene of a Murder. Throw in a clapperboard in the corner, and now it is a Movie Set. The point is all the pieces of the puzzle have to somehow align. Maybe the set of all joint nth-order PDFs for my complex-valued IQ sequence is like that. And we’re trying to use a technique that is optimized for recognizing the wine bottles, which won’t get us to the scene label reliably.

      Pressing the analogy too far, perhaps, let the BPSK symbol rate be like the wine bottles. We train a machine to find the wine bottles (consider a single rate), and it pops out the ‘BPSK’ label. Great! Then we change to beer cans (the rate is now different). No wine bottles, no BPSK label, but the scene is still Aftermath of a Party. We need to recognize the Aftermath of a Party in spite of the changes to the particular aspects of the Party, such as the furniture, the bottles, the popcorn, whatever. We need to recognize the BPSK signal in spite of the changes to the carrier offset, the pulse shape, the symbol rate, whatever.

      1. This is one of the most thoughtful comments regarding the RFML domain that I have ever seen!

        I face the same issue in RF fingerprinting! And I assume that the fingerprint (if it actually exists) is spread all over the signal as well.

        When the testing data is drawn from a slightly shifted distribution (even just a few hours in between, or using a different receiver), I see a huge drop in performance. This is still an unsolved problem!

        Do you think the fingerprint of a device is also related to the nth-order PDFs?

        Thank you so much!

        1. Thanks much, Abdurrahman, and good to see you comment again on the CSP Blog.

          Do you think the fingerprint of a device is also related to the nth-order PDFs?

          I can’t see how it could not. That is, yes, I think it is. If you use time-domain (inphase/quadrature or I/Q) data as the input to your neural network, and it tries to find features that give you reliable output labels over a dataset, the mathematical structure of that dataset that gives rise to that reliable feature must matter. And whether the features are isolated time-gated chunks of the data (every once in a while it emits a subtle pulse) or are nonlinear functions of the data (CSP), or are averages, or whatever, the character of those features must be traceable to the basic probability structure of the data, whether we know that or not. And there is nothing more basic, I claim, for a probabilistic description than the cumulative distribution function and its derivative, the probability density function.

          1. Thank you so much, Dr. Chad, for your reply! I really appreciate it!

            Following your stream of thoughts, which made sense to me, I found the paper “Non-linear Convolution Filters for CNN-Based Learning,” in which the authors propose a second-order convolution operation for CNN-based learning via Volterra kernels, for image processing. I thought that this type of convolution might be a better fit for RF data than linear convolution.

            I trained my Volterra-based CNN on RF data (I/Q values) to do device classification (RF fingerprinting). It gives results very comparable to normal CNNs in the normal scenario, but it also fails to generalize well across different days (the training data was captured on one day while the testing data was captured on another).
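            In case other readers want to experiment, my quadratic layer is similar in spirit to the following sketch (a learned quadratic form over the taps in each window; this is my own simplification, not the paper's exact formulation):

```python
import torch
import torch.nn as nn

class QuadraticConv1d(nn.Module):
    """Second-order (Volterra-style) 1-D convolution sketch: each output is
    a linear term plus a quadratic form over the taps in the window."""
    def __init__(self, in_ch, out_ch, k):
        super().__init__()
        self.k = k
        self.linear = nn.Conv1d(in_ch, out_ch, k)
        # Zero-initialized quadratic weights: the layer starts out linear.
        self.w2 = nn.Parameter(torch.zeros(out_ch, in_ch * k, in_ch * k))

    def forward(self, x):                         # x: (batch, in_ch, N)
        patches = x.unfold(2, self.k, 1)          # (batch, in_ch, L, k)
        p = patches.permute(0, 2, 1, 3).flatten(2)    # (batch, L, in_ch*k)
        quad = torch.einsum('blm,omn,bln->bol', p, self.w2, p)
        return self.linear(x) + quad
```

            Note that the quadratic weight count grows as (channels × kernel size)² per output channel, so in practice one would want to constrain or factor w2.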

            Any thoughts or pointers are very appreciated!

  2. Thanks for the very helpful blog.

    So you mean that in the I/Q modulation signal there is no common pattern that lets a CNN distinguish between different types of modulation, or between noise and man-made signals? And for this reason we need to use pre-processing techniques, like extracting cyclostationary features, which give common patterns?

    Thanks

    1. you mean that in the I/Q modulation signal there is no common pattern that lets a CNN distinguish between different types of modulation

      No, I don’t think that is true. I think you can use I/Q data as the input to a trained neural network for modulation recognition. The network will find patterns in the data (inscrutable to us) that allow some degree of correct classification. The trouble is that the features the network generates are not useful when the input modulated signals deviate even slightly from the original signals used to train and test the network.

      The point of the ‘Shifted Dataset’ (otherwise known as CSPB.ML.2022) is to use the same modulation types as in the original dataset (otherwise known as CSPB.ML.2018) except the carrier frequency offset is governed by a slightly different random variable. See also My Papers [51].

      This inability of the trained network to generalize, that is to successfully process input signals that are slightly different in terms of the distributions of the underlying random variables, is a recurring weakness in neural-network-based machine learning for classification problems. I’ve supplied a new dataset here in order to facilitate the study of this problem.

      When we switch the input from I/Q samples to estimated features like cyclic cumulants, the failure to generalize disappears. (Again, see My Papers [51] and the results presented in the current post.) The price we pay is the up-front blind estimation cost of cyclic cumulants (in this case anyway) which requires CPU/GPU cycles and domain expertise.
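      Schematically, the features-first pipeline is just “blindly estimate cyclic cumulants, then classify the feature vectors.” Here is a toy sketch of that plumbing with stand-in features (random clusters rather than actual cumulant estimates; all names are mine), just to show where the real estimator slots in:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Stand-ins for cyclic-cumulant feature vectors: 8 classes, 11 features.
# In the real pipeline each row comes from blind CSP estimation of a file.
centers = rng.normal(size=(8, 11))

def featurize(n_per_class=200):
    X = np.vstack([c + 0.1 * rng.normal(size=(n_per_class, 11))
                   for c in centers])
    y = np.repeat(np.arange(8), n_per_class)
    return X, y

X_c, y_c = featurize()    # features from the Challenge (training) set
X_gc, y_gc = featurize()  # features from the Generalized Challenge (test)
                          # set: same distribution here, mimicking the
                          # cumulants' insensitivity to the CFO shift
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_c, y_c)
print(f"accuracy on the shifted dataset: {clf.score(X_gc, y_gc):.2f}")
```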

      1. Thank you very much, got it.
        But when I try to train a network on a dataset that contains different modulation signals at different SNRs, and that also contains one noise-only class, and then try to do binary classification (signal/noise), the accuracy is very low compared to training the network on a dataset without the noise-only class.

        Could you interpret this, please?

        1. I find it hard to believe that the two-class problem (‘Signal plus Noise’ vs. ‘Noise Only’) would produce a poorly performing trained neural network.

          What are the characteristics of the involved dataset(s)?

          Is there any dataset shift issue here?

          Difficult for me to interpret without knowing many more details …

  3. I’ve received a Challenger’s decision set for the 20,000 posted signal files in the Shifted Dataset Challenge.

    Comparing the decisions with the truth yields a probability of correct classification of about 0.125. [The confusion matrix, in graphical form and with its numerical values, appeared here as two images.]

    The researcher used a technique called time-distributed convolutional neural networks. The researcher acknowledges that the scoring I applied is likely correct–it is clear that the first 4000 decisions contain no ‘BPSK’ entries, for example.
