For completeness, I also correct the CSPB.ML.2022 dataset, which is aimed at facilitating neural-network generalization studies.
The same random-number-generator (RNG) error that plagued CSPB.ML.2018 corrupts CSPB.ML.2022, so that some of the files in the dataset correspond to identical signal parameters. This makes the CSPB.ML.2018 dataset potentially problematic for training a neural network using supervised learning.
In a recent post, I remedied the error and provided an updated CSPB.ML.2018 dataset and called it CSPB.ML.2018R2. Both are still available on the CSP Blog.
In this post, I provide an update to CSPB.ML.2022, called CSPB.ML.2022R2.
KIRK: Everything that is in error must be sterilised.
NOMAD: There are no exceptions.
KIRK: Nomad, I made an error in creating you.
NOMAD: The creation of perfection is no error.
KIRK: I did not create perfection. I created error.
I’ve had to update the original Challenge for the Machine Learners post, and the associated dataset post, a couple times due to flaws in my metadata (truth) files. Those were fairly minor, so I just updated the original posts.
But a new flaw in CSPB.ML.2018 and CSPB.ML.2022 has come to light due to the work of the estimable research engineers at Expedition Technology. The problem is not with labeling or the fundamental correctness of the modulation types, pulse functions, etc., but with the way a random-number generator was applied in my multi-threaded dataset-generation technique.
I’ll explain after the fold, and this post will provide links to an updated version of the dataset, CSPB.ML.2018R2. I’ll keep the original up for continuity and also place a link to this post there. Moreover, the descriptions of the truth files over at CSPB.ML.2018 are still valid–the truth file posted here has the same format as the truth files available on the CSPB.ML.2018 and CSPB.ML.2022 posts.
We are attempting to force a neural network to learn the features that we have already shown deliver simultaneous good performance and good generalization.
ODU doctoral student John Snoap and I have a new paper on the convergence of cyclostationary signal processing, machine learning using trained neural networks, and RF modulation classification: My Papers  (arxiv.org link here).
Previously in My Papers [50-52, 54] we have shown that the (multitudinous!) neural networks in the literature that use I/Q data as input and perform modulation recognition (output a modulation-class label) are highly brittle. That is, they minimize the classification error, they converge, but they don’t generalize. A trained neural network generalizes well if it can maintain high classification performance even if some of the probability density functions for the data’s random variables differ from the training inputs (in the lab) relative to the application inputs (in the field). The problem is also called the dataset-shift problem or the domain-adaptation problem. Generalization is my preferred term because it is simpler and has a strong connection to the human equivalent: we can quite easily generalize our observations and conclusions from one dataset to another without massive retraining of our neural noggins. We can find the cat in the image even if it is upside-down and colored like a giraffe.
When we look at the spectral correlation or cyclic autocorrelation surfaces for a variety of communication signal types, we learn that the cycle-frequency patterns exhibited by modulated signals are many and varied, and we get a feeling for how those variations look (see also the Desultory CSP posts). Nevertheless, there are large equivalence classes in terms of spectral correlation. That simply means that a large number of distinct modulation types map to the exact same second-order statistics, and therefore to the exact same spectral correlation and cyclic autocorrelation surfaces. The gallery of cyclic cumulants will reveal, in an easy-to-view way, that many of these equivalence classes are removed once we consider, jointly, both second- and higher-order statistics.
The IEEE sent me their annual report for 2022. I was wondering how they were responding to the poor quality of many of their published papers, including faked papers and various paper retractions. Let’s take a quick look.
Another step forward in the merging of CSP and ML for modulation recognition, and another step away from the misstep of always relying on convolutional neural networks from image processing for RF-domain problem-solving.
My Old Dominion colleagues and I have published an extended version of the 2022 MILCOM paper My Papers  in the journal MDPI Sensors. The first author is John Snoap, who is one of those rare people that is an expert in signal processing and in machine learning. Bright future there! Dimitrie Popescu, James Latshaw, and I provided analysis, programming, writing, and research-direction support.
The cyclostationarity of frequency-shift-keyed signals depends strongly on the way the carrier phase evolves over time. Many distinct cycle-frequency patterns and spectral correlation shapes are possible.
Let’s get back to basics by looking at a large class of signals known as frequency-shift-keyed (FSK) signals. We will leave to the side, for the most part, the very large class of signals that goes by the name of continuous-phase modulation (CPM), which includes continuous-phase FSK (CPFSK), MSK, GMSK, and many more (The Literature [R188]-[R190]). Those are treated in My Papers , and in a future CSP Blog post.
Here we want to look at more conventional forms of FSK. These signal types don’t necessarily have a continuous phase function. They are generally easier to demodulate and are more robust to noise and interference than the more complicated CPM signal types, but generally have much lower spectral efficiency. They are like the rectangular-pulse PSK of the FSK/CPM world. But they are still used.
So among the CSP Blog readers that voted, I think the consensus is to produce more “on brand” posts on CSP and the Signal-Processing ToolKit. Also, there is significant interest in doing CSP with GNU Radio, which I have considerable experience with, and so I’ll likely be posting some flowgraph ideas and results at some point in 2023.
Thanks everybody! (But I’ll still rant and rave from time to time; sorry!)
Update June 25, 2023: When I said you can vote multiple times, I didn’t mean to ‘spam’ the poll (as my kids would say). Someone just voted for one of the responses ten times in a row (same IP address ten votes within one minute). I meant you can vote for several different items in the poll! So I did remove some of those identical votes. I’ll close the poll at the end of the day June 30.
Update May 11, 2023: Please vote in the Reader Poll below (multiple times if you’d like) soon! As of today, CSP Applications and Signal Processing ToolKit are in the lead, with Rants and Datasets at the bottom.
The CSP Blog is rolling along here in 2023!
March 2023 broke a record for pageviews in a calendar month with over 7,000 as of this writing early in the day on March 31.
Let’s note some other milestones and introduce a poll.
What a month! We’re at about 7,145 views right now, and the previous monthly record is 6,482.
About 84,000 visitors have been counted over the years since the CSP Blog launched in 2015, with 5,500 this year already. I believe this is just a count of the unique IP addresses that have accessed a page. But the number of subscribers is only 198! You can subscribe (“Follow”) to the CSP Blog by entering an email address in the “Follow Blog via Email” box on the right edge of any viewed page, near the top of the page. You’ll get notified through that email address whenever there is a new post. CSP Blog readers cannot see that email address, just as they cannot see the email address associated with any comment, unless there is an associated gravatar.
I’m planning to have more time available to devote to improving and extending the CSP Blog over the next few months. If you want to have input into that process, consider voting in the poll below.
In this post, we’ll switch gears a bit and look at the problem of waveform estimation. This comes up in two situations for me: single-sensor processing and array (multi-sensor) processing. At some point, I’ll write a post on array processing for waveform estimation (using, say, the SCORE algorithm The Literature [R102]), but here we restrict our attention to the case of waveform estimation using only a single sensor (a single antenna connected to a single receiver). We just have one observed sampled waveform to work with. There are also waveform estimation methods that are multi-sensor but not typically referred to as array processing, such as the blind source separation problem in acoustic scene analysis, which is often solved by principal component analysis (PCA), independent component analysis (ICA), and their variants.
The signal model consists of the noisy sum of two or more modulated waveforms that overlap in both time and frequency. If the signals do not overlap in time, then we can separate them by time gating, and if they do not overlap in frequency, we can separate them using linear time-invariant systems (filters).
The next step in dataset complexity at the CSP Blog: cochannel signals.
I’ve developed another dataset for use in assessing modulation-recognition algorithms (machine-learning-based or otherwise) that is more complex than the original sets I posted for the ML Challenge (CSPB.ML.2018 and CSPB.ML.2022). Half of the new dataset consists of one signal in noise and the other half consists of two signals in noise. In most cases the two signals overlap spectrally, which is a signal condition called cochannel interference.
How can we train a neural network to make use of both IQ data samples and CSP features in the context of weak-signal detection?
I’ve been working with some colleagues at Northeastern University (NEU) in Boston, MA, on ways to combine CSP with machine learning. The work I’m doing with Old Dominion University is focused on basic modulation recognition using neural networks and, in particular, the generalization (dataset-shift) problem that is pervasive in deep learning with convolution neural networks. In contrast, the NEU work is focused on specific signal detection and classification problems and looks at how to use multiple disparate data types as inputs to neural-networks; inputs such as complex-valued samples (IQ data) as well as carefully selected components of spectral correlation and spectral coherence surfaces.
My NEU colleagues and I will be publishing a rather lengthy conference paper on a new multi-input-data neural-network approach called ICARUS at InfoCom 2023 this May (My Papers ). You can get a copy of the pre-publication version here or on arxiv.org.
The CSP Blog recently received a comment from a signal processor that needed a small amount of debugging help with their python spectral correlation estimator code.
The code uses a form of the time-smoothing method and aims to compute and plot the spectral correlation estimate as well as the corresponding coherence estimate. What is cool about this code is that it is clear, well-organized, on github, and is written using Jupyter Notebook. Moreover, there is a Google Colab function so that anyone can run the code from a chrome browser and see the results, even a python newbie like me. Tres moderne.
It’s too close to home, and it’s too near the bone …
Park the car at the side of the road You should know Time’s tide will smother you… And I will too
“That Joke Isn’t Funny Anymore” by The Smiths
I applaud the intent behind the paper in this post’s title, which is The Literature [R183], apparently accepted in 2022 for publication in IEEE Access, a peer-reviewed journal. That intent is to list all the found ways in which researchers preprocess radio-frequency data (complex sampled data) prior to applying some sort of modulation classification (recognition) algorithm or system.
The problem is that this attempt at gathering up all of the ‘representations’ gets a lot of the math wrong, and so has a high potential to confuse rather than illuminate.
Neural networks with CSP-feature inputs DO generalize in the modulation-recognition problem setting.
In some recently published papers (My Papers [50,51]), my ODU colleagues and I showed that convolutional neural networks and capsule networks do not generalize well when their inputs are complex-valued data samples, commonly referred to as simply IQ samples, or as raw IQ samples by machine learners.(Unclear why the adjective ‘raw’ is often used as it adds nothing to the meaning. If I just say Hey, pass me those IQ samples, would ya?, do you think maybe he means the processed ones? How about raw-I-mean–seriously-man–I-did-not-touch-those-numbers-OK? IQ samples? All-natural vegan unprocessed no-GMO organic IQ samples?Uncooked IQ samples?) Moreover, the capsule networks typically outperform the convolutional networks.
In a new paper (MILCOM 2022: My Papers ; arxiv.org version), my colleagues and I continue this line of research by including cyclic cumulants as the inputs to convolutional and capsule networks. We find that capsule networks outperform convolutional networks and that convolutional networks trained on cyclic cumulants outperform convolutional networks trained on IQ samples. We also find that both convolutional and capsule networks trained on cyclic cumulants generalize perfectly well between datasets that have different (disjoint) probability density functions governing their carrier frequency offset parameters.
That is, convolutional networks do better recognition with cyclic cumulants and generalize very well with cyclic cumulants.
So why don’t neural networks ever ‘learn’ cyclic cumulants with IQ data at the input?
The majority of the software and analysis work is performed by the first author, John Snoap, with an assist on capsule networks by James Latshaw. I created the datasets we used (available here on the CSP Blog [see below]) and helped with the blind parameter estimation. Professor Popescu guided us all and contributed substantially to the writing.
Another brick in the wall, another drop in the bucket, another windmill on the horizon …
Let’s talk more about The Cult. No, I don’t mean She Sells Sanctuary, for which I do have considerable nostalgic fondness. I mean the Cult(ure) of Machine Learning in RF communications and signal processing. Or perhaps it is more of an epistemic bubble where there are The Things That Must Be Said and The Unmentionables in every paper and a style of research that is strictly adhered to but that, sadly, produces mostly error and promotes mostly hype. So we have shibboleths, taboos, and norms to deal with inside the bubble.
Time to get on my high horse. She’s a good horse named Ravager and she needs some exercise. So I’m going to strap on my claymore, mount Ravager, and go for a ride. Or am I merely tilting at windmills?
Let’s take a close look at another paper on machine learning for modulation recognition. It uses, uncritically, the DeepSig RML 2016 datasets. And the world and the world, the world drags me down…
Introducing swag for the best CSP-Blog commenters.
Update January 2023: You can find the list of winners on this page.
The comments that CSP Blog readers have made over the past six years are arguably the most helpful part of the Blog for do-it-yourself CSP practitioners. In those comments, my many errors have been revealed, which then has permitted me to attempt post corrections. Many unclear aspects of a post have been clarified after pondering a reader’s comment. At least one comment has been elevated to a post of its own.
The readership of the CSP Blog has been steadily growing since its inception in 2015, but the ratio of page views to comments remains huge–the vast majority of readers do not comment. This is understandable and perfectly acceptable. I rarely comment on any of the science and engineering blogs that I frequent. Nevertheless, I would like to encourage more commenting and also reward it.
Back in 2018 I posted a dataset consisting of 112,000 I/Q data files, 32,768 samples in length each, as a part of a challenge to machine learners who had been making strong claims of superiority over signal processing in the area of automatic modulation recognition. One part of the challenge was modulation recognition involving eight digital modulation types, and the other was estimating the carrier frequency offset. That dataset is described here, and I’d like to refer to it as CSPB.ML.2018.
Then in 2022 I posted a companion dataset to CSPB.ML.2018 called CSPB.ML.2022. This new dataset uses the same eight modulation types, similar ranges of SNR, pulse type, and symbol rate, but the random variable that governs the carrier frequency offset is different with respect to the random variable in CSPB.ML.2018. The purpose of the CSPB.ML.2022 dataset is to facilitate studies of the dataset-shift, or generalization, problem in machine learning.
Throughout the past couple of years I’ve been working with some graduate students and a professor at Old Dominion University on merging machine learning and signal processing for problems involving RF signal analysis, such as modulation recognition. We are starting to publish a sequence of papers that describe our efforts. I briefly describe the results of one such paper, My Papers , in this post.