The same random-number-generator (RNG) error that plagued CSPB.ML.2018 also corrupts CSPB.ML.2022: some of the files in the dataset correspond to identical signal parameters. In CSPB.ML.2018, this duplication made the dataset potentially problematic for training a neural network using supervised learning.
In a recent post, I remedied the error and provided an updated CSPB.ML.2018 dataset called CSPB.ML.2018R2. Both the original and the corrected datasets remain available on the CSP Blog.
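If you have a dataset's metadata, you can check it for this kind of duplication by grouping file indices by their parameter vectors. The sketch below is a minimal illustration, not the procedure I used. It assumes a hypothetical record of (modulation type, symbol rate, carrier offset, in-band SNR) per file and 1-based file numbering. Both are assumptions for illustration rather than the actual CSPB.ML metadata format.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical per-file parameter record: (modulation type, symbol rate,
# carrier offset, in-band SNR in dB). This field set is an assumption for
# illustration, not the dataset's actual metadata layout.
ParamVector = Tuple[str, float, float, float]


def find_duplicate_parameters(
    params: Sequence[ParamVector], decimals: int = 6
) -> Dict[ParamVector, List[int]]:
    """Group file indices by parameter vector; keep only groups with 2+ files."""
    groups: Dict[ParamVector, List[int]] = defaultdict(list)
    # Indices start at 1 (an assumed file-numbering convention).
    for idx, (mod, rate, offset, snr) in enumerate(params, start=1):
        # Round the floating-point fields so values that agree to
        # `decimals` places compare equal as dictionary keys.
        key = (mod, round(rate, decimals), round(offset, decimals), round(snr, decimals))
        groups[key].append(idx)
    return {key: files for key, files in groups.items() if len(files) > 1}


if __name__ == "__main__":
    example = [
        ("bpsk", 0.1, 0.01, 10.0),
        ("qpsk", 0.2, -0.02, 5.0),
        ("bpsk", 0.1, 0.01, 10.0),  # same parameters as file 1
        ("msk", 0.05, 0.0, 12.0),
    ]
    for key, files in find_duplicate_parameters(example).items():
        print(key, "->", files)
```

An empty result means no two files share a parameter vector at the chosen rounding precision.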
In this post, I provide an update to CSPB.ML.2022, called CSPB.ML.2022R2.
The CSPB.ML.2022 dataset is aimed at understanding the generalization properties of a modulation-recognition algorithm or neural network. As such, when I posted the original data, I withheld the metadata (true parameter values and modulation-type labels). Researchers were encouraged to process the provided data and submit their decisions or estimates to me, and I’d provide a performance evaluation. One researcher has done so.
Because the 2022 metadata is withheld, the duplication of some parameter vectors in that dataset matters much less than it did for the 2018 dataset, where all metadata is provided to facilitate network training. Nevertheless, I’m going to post the first 20,000 signal files for the 2022R2 dataset, just as I posted 20,000 signal files for the original 2022 dataset. Both remain useful as inputs to a study of generalization.
The histograms for the key parameters in CSPB.ML.2022R2 are shown in Figure 1.

The first 20,000 files: