Shifted Dataset for the Machine-Learning Challenge: How Well Does a Modulation-Recognition DNN Generalize? [Dataset CSPB.ML.2022]

Another RF-signal dataset to help push along our R&D on modulation recognition.

Update February 2023: A third dataset has been posted to the CSP Blog: CSPB.ML.2023. It features cochannel signals.

Update January 2023: I’m going to put Challenger results in the Comments. I’ve received a Challenger’s decisions and scored them in January 2023. See below.

In this post I provide a second dataset for the Machine-Learning Challenge I issued in 2018 (CSPB.ML.2018). This dataset is similar to the original, with one key difference: the probability distribution of the carrier-frequency offset parameter, viewed as a random variable, has changed, although it remains realistic.

Blog Note: By WordPress’ count, this is the 100th post on the CSP Blog. Together with a handful of pages (like My Papers and The Literature), these hundred posts have resulted in about 250,000 page views. That’s an average of 2,500 page views per post. However, the variance of the per-post pageviews is quite large. The most popular is The Spectral Correlation Function (> 16,000) while the post More on Pure and Impure Sinewaves, from the same era, has only 316 views. A big Thanks to all my readers!!

When I process the generalized challenge (GC) dataset, I obtain results that are nearly the same as when I process the challenge (C) dataset. This is because there is no training in my cyclostationary signal processing approach to modulation recognition and parameter estimation, and there is no prior information that is supplied to the algorithm, so it doesn’t matter that things have changed with the signals’ non-constellation probability distributions. The BPSK signals in the GC dataset are just like the BPSK signals in the C dataset as far as spectral correlation, spectral coherence, cyclic temporal cumulants, etc., are concerned. The signal-processing approach can recognize all BPSK signals. A BPSK signal just needs to conform to the (textbook) BPSK signal model to be recognized; particular numerical values of the modulation parameters are not important, and they will be accurately estimated along the way.
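To make that concrete, here is a minimal NumPy sketch, my illustration for this post rather than the exact processing behind the figures, of why a squaring-based CFO estimator doesn't care which distribution the offset came from: a textbook BPSK signal exhibits a spectral line at exactly twice the carrier offset after squaring, whatever the offset's value. The block length matches the datasets, but the samples-per-symbol, noise level, and the two offset ranges are illustrative stand-ins, not the datasets' actual parameters.

```python
import numpy as np

rng = np.random.default_rng(1)

def bpsk(num_symbols, sps, f0):
    """Textbook BPSK: rectangular pulses, carrier offset f0 in cycles/sample."""
    symbols = rng.choice([-1.0, 1.0], size=num_symbols)
    baseband = np.repeat(symbols, sps)              # rectangular pulse shaping
    n = np.arange(baseband.size)
    return baseband * np.exp(2j * np.pi * f0 * n)

N = 32768    # block length, as in the Challenge datasets
sps = 8      # samples per symbol (illustrative value)

# Stand-ins for the C and GC carrier-offset distributions (not the real ones).
for f0 in [rng.uniform(-0.001, 0.001), rng.uniform(-0.1, 0.1)]:
    x = bpsk(N // sps, sps, f0)
    x += 0.5 * (rng.normal(size=N) + 1j * rng.normal(size=N)) / np.sqrt(2)
    # Squaring BPSK yields a sinewave at the cycle frequency 2*f0 ...
    spectrum = np.abs(np.fft.fft(x**2, 4 * N))
    freqs = np.fft.fftfreq(4 * N)
    # ... so the peak of the squared signal's spectrum gives the CFO estimate.
    f0_hat = freqs[np.argmax(spectrum)] / 2.0
    print(f"true f0 = {f0:+.5f}  estimate = {f0_hat:+.5f}  "
          f"|error| = {abs(f0 - f0_hat):.2e}  (1/N = {1/N:.2e})")
```

The same peak-picking works whether the offset is a tenth of a percent or ten percent of the sampling rate, which is the heart of the generalizability claim, and the error lands comfortably below the 1/N Fourier limit discussed in the Figure 1 caption below.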

Carrier-Frequency Offset Results

The primary focus of the original Challenge was carrier-frequency-offset (CFO) estimation. So let’s start with the CFO results for the GC dataset in Figure 1.

Figure 1. CFO estimation error results for the Generalized Challenge (GC) dataset. Once the processing block length exceeds 16,384 samples, the average error in the CFO dips below the Fourier limit of approximately the reciprocal of the block length. The exception is 8PSK, for which the CSP-based method I use does not produce a cycle frequency related to the carrier; in that case the CFO estimate is essentially an estimate of the center of the signal's noisy PSD.

Compare this result with Figure 1 from the Challenge post, reproduced here for your convenience:

Figure 2. This is Figure 1 in the original Challenge post, and so corresponds to the processing I did a couple years back using the Challenge (C) dataset.

So the CSP-based approach to modulation recognition (My Papers [25,26,28]) possesses a high degree of generalizability with regard to estimation of the carrier-frequency offset parameter. Let’s now turn to the modulation-classification results.

Modulation Classification Results

First, the full confusion matrices for the eight-class problem embodied by the Challenge and Generalized Challenge datasets. Figures 3-5 show the results for three processing block lengths: 32,768, 16,384, and 8,192 samples, respectively.

Figure 3. Confusion matrix over the entire Generalized Challenge dataset (112,000 signal files) using the maximum available number of samples per signal file of 32,768. Overall probability of correct classification is 0.82.
Figure 4. Confusion matrix over the entire Generalized Challenge dataset (112,000 signal files) using half the available number of samples per signal file, or 16,384. Overall probability of correct classification is 0.78.
Figure 5. Confusion matrix over the entire Generalized Challenge dataset (112,000 signal files) using a quarter of the available number of samples per signal file, or 8,192. Overall probability of correct classification is 0.68.

The conclusion is that the performance of the cyclic-cumulant-based modulation-classification algorithm I outline in My Papers [25,26,28] is insensitive to the change in carrier-frequency offset. The question is: Can we construct a neural network that is trained on the Challenge dataset but performs equally well (and well in absolute terms) on the Generalized Challenge dataset? So far, when training a convolutional neural network on the Challenge dataset I/Q samples, my colleagues and I have not been able to obtain good generalization performance: the network fails on the Generalized Challenge dataset.

I hasten to add that this isn’t some big revelation coming out of the CSP Blog. The dataset-shift problem, also known as the generalization problem and other names, is well-known in machine learning.

One thing that has occurred to me over and over while pondering all this generalization stuff with neural networks is: Why do we continue to stick with convolutional layers? I get that convolutional layers are appropriate for image processing (at least I think I get that because linear processing of images can get you a long way), but here, in RF signal processing, the characteristics of the class are embedded throughout the waveform samples through the structure of many involved random variables. We’re not attempting to identify and locate a picture of a cat no matter where it is in an image. The ‘cat’ (modulation type) is distributed throughout the entire ‘image’ (data sample vector). We don’t need to localize anything. We don’t need to grasp edges or boundaries or shapes. (See the Comments below for more on this line of thinking.)

The fundamental probability structure of an RF signal can be brought to light by subjecting the data samples to nonlinearities–that’s all I’m doing when I apply the arithmetic of CSP, whether it is the traditional second-order cyclic autocorrelation or the higher-order cyclic temporal cumulants. Why not have explicit nonlinear layers like squarers in our networks?
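To be concrete about what such a layer might look like, here is a hedged PyTorch sketch; the choice of powers and the averaging scheme are mine, offered as one possibility rather than a vetted architecture:

```python
import torch
import torch.nn as nn

class HomogeneousNonlinearity(nn.Module):
    """Raise the complex input to selected powers, then apply a learnable
    sliding average, so later layers see moment-like quantities."""
    def __init__(self, window=64):
        super().__init__()
        # A depthwise convolution plays the role of the time average.
        self.avg = nn.Conv1d(6, 6, kernel_size=window, groups=6, bias=False)

    def forward(self, iq):                    # iq: (batch, 2, N) I/Q tensor
        z = torch.complex(iq[:, 0], iq[:, 1])
        powers = torch.stack([z, z**2, z**4], dim=1)      # (batch, 3, N)
        feats = torch.cat([powers.real, powers.imag], 1)  # (batch, 6, N)
        return self.avg(feats)
```

Downstream layers would then operate on moment-like quantities instead of having to synthesize the nonlinearity out of stacks of linear filters and pointwise activations.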

The Dataset

The format of the signal files is the same as in the Challenge dataset–go look over there for details. The difference here is that I’m not providing the true parameters and signal-type labels. The idea is that a neural network is trained on some other dataset (perhaps the Challenge dataset, perhaps some other dataset, whatever), and then applied to the Generalized Challenge dataset. So if you take me up on this part of the challenge, you’ll have to submit your machine’s answers and trust me to give you an accurate score.
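For what it's worth, a reader might look like the following sketch. I am assuming a headerless binary file of 32,768 complex samples stored as interleaved little-endian 32-bit floats; that assumption, and the file name, are mine, so verify the layout against the Challenge post before relying on this.

```python
import numpy as np

def read_signal_file(path, num_samples=32768):
    """Hypothetical reader: assumes a headerless binary file of interleaved
    little-endian 32-bit-float I/Q samples. Verify the actual layout against
    the Challenge (CSPB.ML.2018) post before relying on this."""
    raw = np.fromfile(path, dtype="<f4", count=2 * num_samples)
    return raw[0::2] + 1j * raw[1::2]

x = read_signal_file("signal_1.tim")  # hypothetical file name
```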

Since there is no possibility of training here, I don’t think I need to provide as many examples of each class as I did in the Challenge dataset. I created the same number (112,000), but for now I’m just posting 20,000 in five batches of 4,000 files each.

Batch_1

Batch_2

Batch_3

Batch_4

Batch_5

Author: Chad Spooner

I'm a signal processing researcher specializing in cyclostationary signal processing (CSP) for communication signals. I hope to use this blog to help others with their cyclo-projects and to learn more about how CSP is being used and extended worldwide.

10 thoughts on “Shifted Dataset for the Machine-Learning Challenge: How Well Does a Modulation-Recognition DNN Generalize? [Dataset CSPB.ML.2022]”

  1. Congratulations, Dr. Chad, on reaching the 100th post here! Thanks for all your efforts and the great posts!

    I just read your question “Why do we continue to stick with convolutional layers?” and the insight after it, and I really liked it, because I have always had the same question.
    The issue now is that we are importing techniques from the vision and language domains and just trying to apply them to RF data. Many of them seem to work well “in most cases,” but I still think we should have something different, since our data and domain are different. Could you please elaborate on the special characteristics of RF data and some potential directions/resources for RFML researchers?

    Many thanks!

    1. Thanks for stopping by the CSP Blog, Abdurrahman, and leaving a thoughtful comment. I appreciate that.

      Since I believe that the key mathematical difference between different modulation types lies in the distinct sets of nth-order probability density functions (PDFs), I tend to think that a high-performing machine should be able to learn the PDFs or their easiest-to-estimate ‘components.’ Since the collection of all possible nth-order moment functions is equivalent to the collection of all possible nth-order joint PDFs (in the sense that you can compute one from the other), I would think a machine would do well to learn moments. But moments are highly nonlinear functions of the input to the machine–you need several explicit homogeneous nonlinear operations (e.g., a squarer) followed by averaging. In the case of a cyclostationary signal, the PDFs are periodic–they have Fourier series representations. So, a lot going on. But it does seem to suggest that straightforward NN layers that apply homogeneous nonlinearities, followed by convolutions of some sort (“averaging”), may very well allow an NN to synthesize moments or cumulants. I know from my own research that training a CNN on IQ data does not result in significant generalizability, but training one on extracted cyclic cumulants does.
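      As a toy illustration of “homogeneous nonlinearity followed by averaging” (my throwaway example here, not the machinery of My Papers [25,26,28], and with the cycle frequencies assumed known rather than blindly estimated), the averaged second power separates BPSK from QPSK while the fourth power responds to both:

```python
import numpy as np

rng = np.random.default_rng(7)
N, f0 = 32768, 0.02          # illustrative block length and carrier offset
n = np.arange(N)

bpsk = rng.choice([1, -1], N) * np.exp(2j * np.pi * f0 * n)
qpsk = rng.choice([1, 1j, -1, -1j], N) * np.exp(2j * np.pi * f0 * n)

for name, z in [("BPSK", bpsk), ("QPSK", qpsk)]:
    # Homogeneous nonlinearity (z**p) then averaging: a crude cyclic moment.
    # The demodulating exponentials use the known cycle frequencies 2*f0 and
    # 4*f0; blind processing would have to estimate them first.
    m2 = np.abs(np.mean(z**2 * np.exp(-2j * np.pi * (2 * f0) * n)))
    m4 = np.abs(np.mean(z**4 * np.exp(-2j * np.pi * (4 * f0) * n)))
    print(f"{name}: |second-order| = {m2:.3f}, |fourth-order| = {m4:.3f}")
```

      A machine that can form these products and averages internally has, in effect, learned a moment.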

      When I think about image processing (more specifically, image recognition) or natural-language processing (more specifically, speech recognition), my thoughts swirl around the idea of additive representation. That is, the picture of the cat in the image is to be recognized, and the image is equal to the cat part of the image plus everything else. You need to find the cat. So it makes sense that you want an NN to find the cat in “cat plus stuff” no matter how the cat is oriented or scaled or colored, but it is still “cat plus stuff.” If you take away the cat from the image, you leave a cat-shaped hole. So things like edge detection, matched filtering, etc. seem appropriate, and the NN can learn these things using all the little convolutional kernels–simple edge detectors ARE convolutions, matched-filters ARE convolutions…

      When I want to classify a modulation type, though, there isn’t a simple representation of the “BPSK” part of the complex-valued sequence of numbers I have to work with. I can’t say the sequence is the BPSK part plus the other stuff. The BPSK part of the sequence is distributed throughout the sequence. The representation is different. I don’t see how linear operations like convolution can hope to be successful against all BPSK signals we can encounter. If I take away the BPSK signal from the data sequence, I don’t leave behind a BPSK-shaped hole.

      Perhaps it is a bit like the difference between recognizing an object and recognizing a scene. Consider a scene consisting of a room with typical furniture, empty wine bottles tossed around, various articles of clothing on the floor, bowls of half-eaten popcorn, lampshades missing off lamps, dirty plates with unfinished food and condiments, like catsup. And a still body on a couch with closed eyes. What is the scene? Well, there is no one item that tells you the scene–it is a gestalt, a holistic idea. We might say: The Aftermath of a Party. Change the catsup to blood, twist a limb awkwardly, dim the lighting, and maybe we have The Scene of a Murder. Throw in a clapperboard in the corner, and now it is a Movie Set. The point is all the pieces of the puzzle have to somehow align. Maybe the set of all joint nth-order PDFs for my complex-valued IQ sequence is like that. And we’re trying to use a technique that is optimized for recognizing the wine bottles, which won’t get us to the scene label reliably.

      Pressing the analogy too far, perhaps, let the BPSK symbol rate be like the wine bottles. We train a machine to find the wine bottles (consider a single rate), and it pops out the ‘BPSK’ label. Great! Then we change to beer cans (the rate is now different). No wine bottles, no BPSK label, but the scene is still Aftermath of a Party. We need to recognize the Aftermath of a Party in spite of the changes to the particular aspects of the Party, such as the furniture, the bottles, the popcorn, whatever. We need to recognize the BPSK signal in spite of the changes to the carrier offset, the pulse shape, the symbol rate, whatever.

      1. This is one of the most thoughtful comments regarding the RFML domain that I have ever seen!

        I face the same issue in RF fingerprinting! And I assume that the fingerprint (if it actually exists) is spread all over the signal as well.

        When the testing data is drawn from a slightly shifted distribution (even just a few hours in between, or using a different receiver), I see a huge drop in performance. This is still an unsolved problem!

        Do you think the fingerprint of a device is also related to the nth-order PDFs?

        Thank you so much!

        1. Thanks much, Abdurrahman, and good to see you comment again on the CSP Blog.

          Do you think the fingerprint of a device is also related to the nth-order PDFs?

          I can’t see how it could not. That is, yes, I think it is. If you use time-domain (inphase/quadrature or I/Q) data as the input to your neural network, and it tries to find features that give you reliable output labels over a dataset, the mathematical structure of that dataset that gives rise to that reliable feature must matter. And whether the features are isolated time-gated chunks of the data (every once in a while it emits a subtle pulse) or are nonlinear functions of the data (CSP), or are averages, or whatever, the character of those features must be traceable to the basic probability structure of the data, whether we know that or not. And there is nothing more basic, I claim, for a probabilistic description than the cumulative distribution function and its derivative, the probability density function.

          1. Thank you so much, Dr. Chad, for your reply! I really appreciate it!

            Following your stream of thoughts, which made sense to me, I found the paper “Non-linear Convolution Filters for CNN-Based Learning,” in which the authors propose a second-order convolution operation for CNN-based learning via Volterra kernels, for image processing. I thought that this type of convolution might be a better fit for RF data than linear convolution.

            I trained my Volterra-based CNN on RF data (I/Q values) to do device classification (RF fingerprinting). It gives results very comparable to normal CNNs in the normal scenario, but it also fails to generalize well across different days (the training data was captured on one day while the testing data was captured on another).
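            In case other readers want to experiment, my quadratic layer is similar in spirit to the following sketch (a learned quadratic form over the taps in each window; this is my own simplification, not the paper's exact formulation):

```python
import torch
import torch.nn as nn

class QuadraticConv1d(nn.Module):
    """Second-order (Volterra-style) 1-D convolution sketch: each output is
    a linear term plus a quadratic form over the taps in the window."""
    def __init__(self, in_ch, out_ch, k):
        super().__init__()
        self.k = k
        self.linear = nn.Conv1d(in_ch, out_ch, k)
        # Zero-initialized quadratic weights: the layer starts out linear.
        self.w2 = nn.Parameter(torch.zeros(out_ch, in_ch * k, in_ch * k))

    def forward(self, x):                         # x: (batch, in_ch, N)
        patches = x.unfold(2, self.k, 1)          # (batch, in_ch, L, k)
        p = patches.permute(0, 2, 1, 3).flatten(2)    # (batch, L, in_ch*k)
        quad = torch.einsum('blm,omn,bln->bol', p, self.w2, p)
        return self.linear(x) + quad
```

            Note that the quadratic weight count grows as (channels × kernel size)² per output channel, so in practice one would want to constrain or factor w2.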

            Any thoughts or pointers are very appreciated!

  2. Thanks for the very helpful blog.

    So you mean that in the I/Q modulation signal there is no common pattern that lets a CNN distinguish between different types of modulation, or between noise and man-made signals? And for this reason we need to use pre-processing techniques, like extracting cyclostationary features, which give common patterns?

    Thanks

    1. you mean that in the I/Q modulation signal there is no common pattern that lets a CNN distinguish between different types of modulation

      No, I don’t think that is true. I think you can use I/Q data as the input to a trained neural network for modulation recognition. The network will find patterns in the data (inscrutable to us) that allow some degree of correct classification. The trouble is that the features the network generates are not useful when the input modulated signals deviate even slightly from the original signals used to train and test the network.

      The point of the ‘Shifted Dataset’ (otherwise known as CSPB.ML.2022) is to use the same modulation types as in the original dataset (otherwise known as CSPB.ML.2018) except the carrier frequency offset is governed by a slightly different random variable. See also My Papers [51].

      This inability of the trained network to generalize, that is to successfully process input signals that are slightly different in terms of the distributions of the underlying random variables, is a recurring weakness in neural-network-based machine learning for classification problems. I’ve supplied a new dataset here in order to facilitate the study of this problem.

      When we switch the input from I/Q samples to estimated features like cyclic cumulants, the failure to generalize disappears. (Again, see My Papers [51] and the results presented in the current post.) The price we pay is the up-front blind estimation cost of cyclic cumulants (in this case anyway) which requires CPU/GPU cycles and domain expertise.
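      Schematically, the features-first pipeline is just “blindly estimate cyclic cumulants, then classify the feature vectors.” Here is a toy sketch of that plumbing with stand-in features (random clusters rather than actual cumulant estimates; all names are mine), just to show where the real estimator slots in:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Stand-ins for cyclic-cumulant feature vectors: 8 classes, 11 features.
# In the real pipeline each row comes from blind CSP estimation of a file.
centers = rng.normal(size=(8, 11))

def featurize(n_per_class=200):
    X = np.vstack([c + 0.1 * rng.normal(size=(n_per_class, 11))
                   for c in centers])
    y = np.repeat(np.arange(8), n_per_class)
    return X, y

X_c, y_c = featurize()    # features from the Challenge (training) set
X_gc, y_gc = featurize()  # features from the Generalized Challenge (test)
                          # set: same distribution here, mimicking the
                          # cumulants' insensitivity to the CFO shift
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_c, y_c)
print(f"accuracy on the shifted dataset: {clf.score(X_gc, y_gc):.2f}")
```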

      1. Thank you very much, got it.
        But when I try to train a network on a dataset that contains different modulation signals at different SNRs, and that also contains one noise-only class, and then try to do binary classification (signal/noise), the accuracy is very low compared to training the network on a dataset without the noise-only class.

        Could you interpret this, please?

        1. I find it hard to believe that the two-class problem (‘Signal plus Noise’ vs. ‘Noise Only’) would produce a poorly performing trained neural network.

          What are the characteristics of the involved dataset(s)?

          Is there any dataset shift issue here?

          Difficult for me to interpret without knowing many more details …

  3. I’ve received a Challenger’s decision set for the 20,000 posted signal files in the Shifted Dataset Challenge.

    Comparing the decisions with the truth yields a probability of correct classification of about 0.125. [The confusion matrix, in graphical form and with its numerical values, appeared here as two images.]

    The researcher used a technique called time-distributed convolutional neural networks. The researcher acknowledges that the scoring I applied is likely correct–it is clear that the first 4000 decisions contain no ‘BPSK’ entries, for example.
