CSPB.ML.2022R2: Correcting an RNG Flaw in CSPB.ML.2022

For completeness, I also correct the CSPB.ML.2022 dataset, which is aimed at facilitating neural-network generalization studies.

The same random-number-generator (RNG) error that plagued CSPB.ML.2018 also corrupts CSPB.ML.2022, so some of the files in the dataset correspond to identical signal parameters. This kind of duplication is what makes the CSPB.ML.2018 dataset potentially problematic for training a neural network using supervised learning.
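For readers who want to check a metadata table for this kind of duplication, here is a minimal sketch. It is not the procedure used to find or fix the RNG error; it only groups file indices by parameter vector and reports the vectors that occur more than once. The record layout (modulation label, symbol period, carrier-frequency offset, excess bandwidth) and the function name are assumptions made for illustration, and the example values are made up. The check requires metadata, so it applies only to a dataset for which the true parameters are available.

```python
def find_duplicate_parameter_vectors(param_vectors):
    """Map each repeated parameter vector to the list of file indices that use it."""
    groups = {}
    for index, vec in enumerate(param_vectors):
        # Tuples are hashable, so each distinct vector becomes one dictionary key.
        groups.setdefault(tuple(vec), []).append(index)
    # Keep only the vectors shared by more than one file.
    return {vec: idxs for vec, idxs in groups.items() if len(idxs) > 1}


# Hypothetical records (assumed layout):
# (modulation label, symbol period, carrier-frequency offset, excess bandwidth)
example = [
    ("bpsk", 8, 0.001, 0.35),
    ("qpsk", 5, -0.0004, 0.5),
    ("bpsk", 8, 0.001, 0.35),
]

duplicates = find_duplicate_parameter_vectors(example)
for vec, idxs in duplicates.items():
    print(f"files {idxs} share parameters {vec}")
```

Exact tuple equality is a reasonable test here because duplicates produced by a repeated RNG state would be bit-identical; parameters that had been rounded or re-saved in a different format might need a tolerance-based comparison instead.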

In a recent post, I remedied the error and provided an updated version of the CSPB.ML.2018 dataset, called CSPB.ML.2018R2. Both the original and the updated dataset are still available on the CSP Blog.

In this post, I provide an update to CSPB.ML.2022, called CSPB.ML.2022R2.

The CSPB.ML.2022 dataset is aimed at understanding the generalization properties of a modulation-recognition algorithm or neural network. As such, when I posted the original data, I withheld the metadata (true parameter values and modulation-type labels). Researchers were encouraged to process the provided data and submit their decisions or estimates to me, and I’d provide a performance evaluation. One researcher has done so.

So the fact that some parameter vectors are duplicated in the 2022 dataset is of much less importance than it was for the 2018 dataset, where all metadata is provided to facilitate network training. Nevertheless, I’m going to post the first 20,000 signal files for the 2022R2 dataset, just as I posted 20,000 signal files for the original 2022 dataset. Both remain useful as inputs to a study of the generalization property.

The histograms for the key parameters in CSPB.ML.2022R2 are shown in Figure 1.

Figure 1. Histograms for the entire 112,000-file CSPB.ML.2022R2 dataset. The major difference between CSPB.ML.2018R2 and CSPB.ML.2022R2 is the distribution of the carrier-frequency offset parameter. The inband SNR distribution is also slightly different, with more high-SNR signals in 2022R2 than in 2018R2.

The first 20,000 files:

Batch_1

Batch_2

Batch_3

Batch_4

Batch_5

Author: Chad Spooner

I'm a signal processing researcher specializing in cyclostationary signal processing (CSP) for communication signals. I hope to use this blog to help others with their cyclo-projects and to learn more about how CSP is being used and extended worldwide.

