CSPB.ML – Cyclostationary Signal Processing

CSPB.ML.2018R2.NF

A noise-free version of the 2018 CSP Blog dataset CSPB.ML.2018R2 is posted here. This allows researchers to correctly apply propagation-channel effects to the generated signals, and to easily add their own noise at whatever level they wish.

The format of the files is the same as CSPB.ML.2018R2, and the truth parameters for each file are the same as the truth parameters for the corresponding file in CSPB.ML.2018R2, except for SNR, which is infinite.
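As a rough illustration of the "add your own noise" step, here is a minimal sketch that adds complex white Gaussian noise to a noise-free complex-valued signal. It assumes NumPy and a signal already loaded into an array; the function name `add_awgn` is hypothetical. Note that it defines SNR as total signal power over total noise power across the whole sampled band, which is not the same thing as the inband SNR reported in the truth files, so the noise power would need rescaling to match that definition.

```python
import numpy as np

def add_awgn(signal, snr_db, rng=None):
    """Return signal plus complex white Gaussian noise at a target SNR in dB.

    Assumption: SNR here is total signal power divided by total noise power
    over the full sampled bandwidth (not the dataset's inband SNR).
    """
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=np.complex128)
    signal_power = np.mean(np.abs(signal) ** 2)
    noise_power = signal_power / 10 ** (snr_db / 10)
    # Split the noise power evenly between the real and imaginary parts.
    scale = np.sqrt(noise_power / 2)
    noise = scale * (rng.standard_normal(signal.shape)
                     + 1j * rng.standard_normal(signal.shape))
    return signal + noise
```

Any channel model (filtering, fading, and so on) would be applied to the noise-free signal first, with the noise added as the last step.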

CSPB.ML.2023G1

Another dataset aimed at the continuing problem of generalization in machine-learning-based modulation recognition. This one is a companion to CSPB.ML.2023, which features cochannel situations.

Quality datasets containing digital signals with varied parameters and lengths sufficient to permit many kinds of validation checks by signal-processing experts remain in short supply. In this post, we continue our efforts to provide such datasets by offering a companion unlabeled dataset to CSPB.ML.2023.

CSPB.ML.2022R2: Correcting an RNG Flaw in CSPB.ML.2022

For completeness, I also correct the CSPB.ML.2022 dataset, which is aimed at facilitating neural-network generalization studies.

The same random-number-generator (RNG) error that plagued CSPB.ML.2018 corrupts CSPB.ML.2022, so that some of the files in the dataset correspond to identical signal parameters. This makes the CSPB.ML.2022 dataset potentially problematic for training a neural network using supervised learning.
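One way to see this kind of problem in a truth file is to group files by their parameter sets and look for repeats. The sketch below is only an illustration: the `records` structure (a file index paired with a tuple of parameters) and the function name are hypothetical and do not reflect the actual truth-file layout, which would have to be parsed first.

```python
from collections import defaultdict

def find_duplicate_parameter_sets(records):
    """Map each repeated parameter tuple to the file indices that share it.

    `records` is an iterable of (file_index, params) pairs, where `params`
    is a sequence of already-parsed truth parameters (hypothetical layout).
    """
    groups = defaultdict(list)
    for file_index, params in records:
        groups[tuple(params)].append(file_index)
    # Keep only the parameter sets that occur in more than one file.
    return {params: files for params, files in groups.items() if len(files) > 1}

# Made-up example values, for illustration only.
example = [(1, ("bpsk", 0.25)), (2, ("qpsk", 0.1)), (3, ("bpsk", 0.25))]
duplicates = find_duplicate_parameter_sets(example)
```

Exact tuple matching only catches bit-identical parameters; values that were rounded when written to a text file may need a tolerance instead.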

In a recent post, I remedied the error and provided an updated CSPB.ML.2018 dataset and called it CSPB.ML.2018R2. Both are still available on the CSP Blog.

In this post, I provide an update to CSPB.ML.2022, called CSPB.ML.2022R2.

CSPB.ML.2018R2: Correcting an RNG Flaw in CSPB.ML.2018

KIRK: Everything that is in error must be sterilised.
NOMAD: There are no exceptions.
KIRK: Nomad, I made an error in creating you.
NOMAD: The creation of perfection is no error.
KIRK: I did not create perfection. I created error.

I’ve had to update the original Challenge for the Machine Learners post, and the associated dataset post, a couple times due to flaws in my metadata (truth) files. Those were fairly minor, so I just updated the original posts.

But a new flaw in CSPB.ML.2018 and CSPB.ML.2022 has come to light due to the work of the estimable research engineers at Expedition Technology. The problem is not with labeling or the fundamental correctness of the modulation types, pulse functions, etc., but with the way a random-number generator was applied in my multi-threaded dataset-generation technique.

I’ll explain after the fold, and this post will provide links to an updated version of the dataset, CSPB.ML.2018R2. I’ll keep the original up for continuity and also place a link to this post there. Moreover, the descriptions of the truth files over at CSPB.ML.2018 are still valid; the truth file posted here has the same format as the truth files available on the CSPB.ML.2018 and CSPB.ML.2022 posts.
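The specifics of the flaw are in the full post and are not reproduced here. As a general illustration only, and not a description of how the dataset was actually generated, the sketch below shows one common way to give each parallel worker its own statistically independent random stream, using NumPy's `SeedSequence.spawn`. The function name and the seed value are hypothetical.

```python
import numpy as np

def make_worker_rngs(root_seed, n_workers):
    """Create one independent NumPy Generator per worker from a single root seed."""
    # spawn() derives child seed sequences that are designed not to collide,
    # so workers do not replay one another's random draws.
    children = np.random.SeedSequence(root_seed).spawn(n_workers)
    return [np.random.default_rng(child) for child in children]

# Each worker draws its signal parameters from its own generator.
worker_rngs = make_worker_rngs(root_seed=2023, n_workers=4)
draws = [tuple(rng.random(3)) for rng in worker_rngs]
```

The failure mode this guards against is several workers starting from the same generator state, which yields repeated parameter draws across files.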

PSK/QAM Cochannel Dataset for Modulation Recognition Researchers [CSPB.ML.2023]

The next step in dataset complexity at the CSP Blog: cochannel signals.

I’ve developed another dataset for use in assessing modulation-recognition algorithms (machine-learning-based or otherwise) that is more complex than the original sets I posted for the ML Challenge (CSPB.ML.2018 and CSPB.ML.2022). Half of the new dataset consists of one signal in noise and the other half consists of two signals in noise. In most cases the two signals overlap spectrally, which is a signal condition called cochannel interference.

We’ll call it CSPB.ML.2023.
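To make "overlap spectrally" concrete, here is a minimal sketch that tests whether two signals occupy overlapping frequency bands. It assumes each signal's occupied bandwidth is the symbol rate times one plus the pulse roll-off, as for a square-root raised-cosine pulse; that pulse shape, the function names, and the example numbers are assumptions made for illustration, with all frequencies in normalized units.

```python
def occupied_band(carrier_offset, symbol_rate, rolloff):
    """Return the (low, high) band edges, assuming bandwidth = symbol_rate * (1 + rolloff)."""
    half_width = 0.5 * symbol_rate * (1.0 + rolloff)
    return carrier_offset - half_width, carrier_offset + half_width

def bands_overlap(band_a, band_b):
    """True when the two (low, high) intervals share a nonzero span of frequencies."""
    return max(band_a[0], band_b[0]) < min(band_a[1], band_b[1])

# Made-up example values, for illustration only.
first = occupied_band(carrier_offset=0.0, symbol_rate=0.1, rolloff=0.5)
second = occupied_band(carrier_offset=0.05, symbol_rate=0.1, rolloff=0.2)
third = occupied_band(carrier_offset=0.3, symbol_rate=0.1, rolloff=0.2)
cochannel = bands_overlap(first, second)
separated = not bands_overlap(first, third)
```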

Shifted Dataset for the Machine-Learning Challenge: How Well Does a Modulation-Recognition DNN Generalize? [Dataset CSPB.ML.2022]

Another RF-signal dataset to help push along our R&D on modulation recognition.

Update October 2023: A flaw in the way a random-number generator was used to create CSPB.ML.2022 (and CSPB.ML.2018) has led me to recreate the dataset and post it here. It is called CSPB.ML.2022R2.

Update February 2023: A third dataset has been posted to the CSP Blog: CSPB.ML.2023. It features cochannel signals.

Update January 2023: I’m going to put Challenger results in the Comments. I’ve received a Challenger’s decisions and scored them in January 2023. See below.

In this post I provide a second dataset for the Machine-Learning Challenge I issued in 2018 (CSPB.ML.2018). This dataset is similar to the original dataset, but possesses a key difference in that the probability distribution of the carrier-frequency offset parameter, viewed as a random variable, is not the same, but is still realistic.

Blog Note: By WordPress’ count, this is the 100th post on the CSP Blog. Together with a handful of pages (like My Papers and The Literature), these hundred posts have resulted in about 250,000 page views. That’s an average of 2,500 page views per post. However, the variance of the per-post pageviews is quite large. The most popular is The Spectral Correlation Function (> 16,000) while the post More on Pure and Impure Sinewaves, from the same era, has only 316 views. A big Thanks to all my readers!!

Dataset for the Machine-Learning Challenge [CSPB.ML.2018]

A PSK/QAM/SQPSK data set with randomized symbol rate, inband SNR, carrier-frequency offset, and pulse roll-off.

Update April 2025: All but the first five batch files have been removed. I needed to make space since WordPress has a hard limit on storage. Use CSPB.ML.2018R2 in any case.

Update September 2023: A randomization flaw has been found and fixed for CSPB.ML.2018, resulting in CSPB.ML.2018R2. Use that one going forward.

Update February 2023: I’ve posted a third challenge dataset here. It is CSPB.ML.2023 and features cochannel signals.

Update April 2022. I’ve also posted a second dataset here. This new dataset is similar to the original ML Challenge dataset except the random variable representing the carrier frequency offset has a slightly different distribution.

If you refer to either of the posted datasets in a published paper, please use the following designators, which I am also using in papers I’m attempting to publish:

Original ML Challenge Dataset: CSPB.ML.2018.

Shifted ML Challenge Dataset: CSPB.ML.2022.

Update September 2020. I made a mistake when I created the signal-parameter “truth” files signal_record.txt and signal_record_first_20000.txt. Like the DeepSig RML datasets that I analyzed on the CSP Blog here and here, the SNR parameter in the truth files did not match the actual SNR of the signals in the data files. I’ve updated the truth files and the links below. You can still use the original files for all other signal parameters, but the SNR parameter was in error.

Update July 2020. I originally posted 20,000 signals in the posted dataset. I’ve now added another 92,000 for a total of 112,000 signals. The original signals are contained in Batches 1-5, the additional signals in Batches 6-28. I’ve placed these additional Batches at the end of the post to preserve the original post’s content.

A Challenge for the Machine Learners

The machine-learning modulation-recognition community consistently claims vastly superior performance to anything that has come before. Let’s test that.

Update September 2023: A randomization flaw has been found and fixed for CSPB.ML.2018, resulting in CSPB.ML.2018R2. Use that one going forward.

Update February 2023: A third dataset has been posted here. This new dataset, CSPB.ML.2023, features cochannel signals.

Update April 2022: I’ve also posted a second dataset here. This new dataset is similar to the original ML Challenge dataset except the random variable representing the carrier frequency offset has a slightly different distribution.

If you refer to any of the posted datasets in a published paper, please use the following designators, which I am also using in papers I’m attempting to publish:

Original ML Challenge Dataset: CSPB.ML.2018.

Shifted ML Challenge Dataset: CSPB.ML.2022.

Cochannel ML Dataset: CSPB.ML.2023.

Update February 2019

I’ve decided to post the data set I discuss here to the CSP Blog for all interested parties to use. See the new post on the Data Set. If you do use it, please let me and the CSP Blog readers know how you fared with your experiments in the Comments section of either post. Thanks!
