Understanding and Using the Statistics of Communication Signals
Author: Chad Spooner
I'm a signal processing researcher specializing in cyclostationary signal processing (CSP) for communication signals. I hope to use this blog to help others with their cyclo-projects and to learn more about how CSP is being used and extended worldwide.
KIRK: Everything that is in error must be sterilised.
NOMAD: There are no exceptions.
KIRK: Nomad, I made an error in creating you.
NOMAD: The creation of perfection is no error.
KIRK: I did not create perfection. I created error.
I’ve had to update the original Challenge for the Machine Learners post, and the associated dataset post, a couple times due to flaws in my metadata (truth) files. Those were fairly minor, so I just updated the original posts.
But a new flaw in CSPB.ML.2018 and CSPB.ML.2022 has come to light due to the work of the estimable research engineers at Expedition Technology. The problem is not with labeling or the fundamental correctness of the modulation types, pulse functions, etc., but with the way a random-number generator was applied in my multi-threaded dataset-generation technique.
I’ll explain after the fold, and this post will provide links to an updated version of the dataset, CSPB.ML.2018R2. I’ll keep the original up for continuity and also place a link to this post there. Moreover, the descriptions of the truth files over at CSPB.ML.2018 are still valid–the truth file posted here has the same format as the truth files available on the CSPB.ML.2018 and CSPB.ML.2022 posts.
Before we translate the Laplace transform from continuous time to discrete time, deriving the Z transform, let’s take a step back and look at practical filters in continuous time. Practical here stands in opposition to ideal as in the ideal lowpass, highpass, and bandpass filters we studied earlier in the SPTK thread.
We are attempting to force a neural network to learn the features that we have already shown deliver simultaneous good performance and good generalization.
ODU doctoral student John Snoap and I have a new paper on the convergence of cyclostationary signal processing, machine learning using trained neural networks, and RF modulation classification: My Papers  (arxiv.org link here).
Previously in My Papers [50-52, 54] we have shown that the (multitudinous!) neural networks in the literature that use I/Q data as input and perform modulation recognition (output a modulation-class label) are highly brittle. That is, they minimize the classification error, they converge, but they don’t generalize. A trained neural network generalizes well if it can maintain high classification performance even if some of the probability density functions for the data’s random variables differ between the training inputs (in the lab) and the application inputs (in the field). The problem is also called the dataset-shift problem or the domain-adaptation problem. Generalization is my preferred term because it is simpler and has a strong connection to the human equivalent: we can quite easily generalize our observations and conclusions from one dataset to another without massive retraining of our neural noggins. We can find the cat in the image even if it is upside-down and colored like a giraffe.
When we look at the spectral correlation or cyclic autocorrelation surfaces for a variety of communication signal types, we learn that the cycle-frequency patterns exhibited by modulated signals are many and varied, and we get a feeling for how those variations look (see also the Desultory CSP posts). Nevertheless, there are large equivalence classes in terms of spectral correlation. That simply means that a large number of distinct modulation types map to the exact same second-order statistics, and therefore to the exact same spectral correlation and cyclic autocorrelation surfaces. The gallery of cyclic cumulants will reveal, in an easy-to-view way, that many of these equivalence classes are removed once we consider, jointly, both second- and higher-order statistics.
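To make the equivalence-class idea concrete, here is a minimal sketch that works at the symbol level only. It assumes textbook unit-power QPSK, 8PSK, and 16QAM constellations with independent, equally likely symbols (my choice of examples, not a list taken from the gallery), and it uses the standard zero-mean moment-to-cumulant formulas. For such signals, the second- and higher-order statistics of the modulated waveform scale with the corresponding statistics of the symbols, so matching symbol moments is a reasonable proxy for matching spectral correlation when the pulse and symbol rate are shared.

```python
import numpy as np

def unit_power(points):
    """Scale a constellation so that its average power is one."""
    points = np.asarray(points, dtype=complex)
    return points / np.sqrt(np.mean(np.abs(points) ** 2))

levels = np.array([-3, -1, 1, 3])
constellations = {
    "QPSK": unit_power([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]),
    "8PSK": unit_power(np.exp(2j * np.pi * np.arange(8) / 8)),
    "16QAM": unit_power([a + 1j * b for a in levels for b in levels]),
}

def symbol_statistics(a):
    """Second-order moments and fourth-order cumulants of zero-mean symbols."""
    m20 = np.mean(a ** 2)            # no conjugated factors
    m21 = np.mean(np.abs(a) ** 2)    # one conjugated factor (power)
    m40 = np.mean(a ** 4)
    m42 = np.mean(np.abs(a) ** 4)
    c40 = m40 - 3 * m20 ** 2
    c42 = m42 - np.abs(m20) ** 2 - 2 * m21 ** 2
    return m20, m21, c40, c42

stats = {name: symbol_statistics(a) for name, a in constellations.items()}
for name, (m20, m21, c40, c42) in stats.items():
    print(f"{name:6s} m20={abs(m20):.2f} m21={m21:.2f} "
          f"c40={c40.real:+.2f} c42={c42:+.2f}")
```

All three constellations share the same second-order moments (zero and one), so second-order statistics alone cannot tell them apart. The fourth-order cumulants differ: the (4, 0) cumulant is -1 for QPSK, 0 for 8PSK, and -0.68 for 16QAM, while the (4, 2) cumulant is -1 for both PSK types and -0.68 for 16QAM.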
What do practicing engineers think of using large-language models like ChatGPT in their research, development, and writing tasks? And is there a future for humans in signal processing?
Let’s switch things up a bit here at the CSP Blog by presenting an interview on a technical topic. I interview two characters you might recall from the post on the Domain Expertise Trap: Engineers Dan Peritum and Eunice Akamai.
With the splashy entrance of large-language models like ChatGPT into everyday life and into virtually all aspects of science, engineering, and education, we all want to know how our jobs and careers could be affected by widespread use of artificial intelligence constructs like ChatGPT, Dall-E, and Midjourney. In this interview with a couple of my favorite engineers, I get a feel for how non-AI researchers and developers think about the coming changes, and of course how they view the hype, distortions, and fabrications surrounding predictions of those changes. You can find photos of the interviewees and brief biographies at the end of the post.
The interview transcript is (carefully contrived) lightly edited for (believability) clarity.
The IEEE sent me their annual report for 2022. I was wondering how they were responding to the poor quality of many of their published papers, including faked papers and various paper retractions. Let’s take a quick look.
Another step forward in the merging of CSP and ML for modulation recognition, and another step away from the misstep of always relying on convolutional neural networks from image processing for RF-domain problem-solving.
My Old Dominion colleagues and I have published an extended version of the 2022 MILCOM paper My Papers  in the journal MDPI Sensors. The first author is John Snoap, who is one of those rare people who is an expert in both signal processing and machine learning. Bright future there! Dimitrie Popescu, James Latshaw, and I provided analysis, programming, writing, and research-direction support.
In this Signal Processing ToolKit post, we look at a generalization of the Fourier transform called the Laplace Transform. This is a stepping stone on the way to the Z Transform, which is widely used in discrete-time signal processing, especially in control theory.
‘Insufficient facts always invite danger.’
Spock in Star Trek TOS Episode “Space Seed”
As most CSP Blog readers likely know, I’ve performed detailed critical analyses (one, two, three, and four) of the modulation-recognition datasets put forth publicly by DeepSig in 2016-2018. These datasets are associated with some of their published or arxiv.org papers, such as The Literature [R138], which I also reviewed here.
My conclusion is that the DeepSig datasets are as flawed as the DeepSig papers–it was the highly flawed nature of the papers that got me started down the critical-review path in the first place.
A reader recently alerted me to a change in the Datasets page at deepsig.ai that may indicate they are listening to critics. Let’s take a look and see if there is anything more to say.
The cyclostationarity of frequency-shift-keyed signals depends strongly on the way the carrier phase evolves over time. Many distinct cycle-frequency patterns and spectral correlation shapes are possible.
Let’s get back to basics by looking at a large class of signals known as frequency-shift-keyed (FSK) signals. We will leave to the side, for the most part, the very large class of signals that goes by the name of continuous-phase modulation (CPM), which includes continuous-phase FSK (CPFSK), MSK, GMSK, and many more (The Literature [R188]-[R190]). Those are treated in My Papers , and in a future CSP Blog post.
Here we want to look at more conventional forms of FSK. These signal types don’t necessarily have a continuous phase function. They are generally easier to demodulate and are more robust to noise and interference than the more complicated CPM signal types, but generally have much lower spectral efficiency. They are like the rectangular-pulse PSK of the FSK/CPM world. But they are still used.
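As a rough illustration of what "the way the carrier phase evolves" can mean, the sketch below builds two binary FSK signals from the same bit sequence. One integrates the instantaneous frequency so the phase is continuous across symbol boundaries; the other restarts each symbol's tone at zero phase. The tone frequencies, samples per symbol, and the two phase rules are illustrative assumptions on my part, not the specific signal models analyzed in the post.

```python
import numpy as np

rng = np.random.default_rng(1)
sps = 10                          # samples per symbol (assumed)
tones = np.array([-0.07, 0.07])   # hypothetical tone frequencies, cycles/sample
bits = rng.integers(0, 2, size=200)
f_inst = np.repeat(tones[bits], sps)   # instantaneous frequency per sample

# Variant 1: phase is the running sum of frequency, so it never jumps.
x_cp = np.exp(2j * np.pi * np.cumsum(f_inst))

# Variant 2: every symbol is a fresh tone that starts at zero phase.
k = np.tile(np.arange(sps), bits.size)  # sample index within each symbol
x_reset = np.exp(2j * np.pi * f_inst * k)

def phase_steps(x):
    """Magnitude of the wrapped sample-to-sample phase increment."""
    return np.abs(np.angle(x[1:] * np.conj(x[:-1])))

max_step_cp = phase_steps(x_cp).max()
max_step_reset = phase_steps(x_reset).max()
print(f"largest phase step, continuous phase: {max_step_cp:.3f} rad")
print(f"largest phase step, phase reset:      {max_step_reset:.3f} rad")
```

Both signals have constant modulus and use the same two tones, but the phase-reset version shows large phase jumps at symbol boundaries, because 0.07 cycles/sample times 10 samples is not a whole number of cycles. Feeding the two variants to a spectral correlation estimator is one way to explore how the cycle-frequency pattern depends on the phase rule.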
So among the CSP Blog readers that voted, I think the consensus is to produce more “on brand” posts on CSP and the Signal-Processing ToolKit. Also, there is significant interest in doing CSP with GNU Radio, which I have considerable experience with, and so I’ll likely be posting some flowgraph ideas and results at some point in 2023.
Thanks everybody! (But I’ll still rant and rave from time to time; sorry!)
Update June 25, 2023: When I said you can vote multiple times, I didn’t mean to ‘spam’ the poll (as my kids would say). Someone just voted for one of the responses ten times in a row (same IP address, ten votes within one minute). I meant you can vote for several different items in the poll! So I did remove some of those identical votes. I’ll close the poll at the end of the day June 30.
Update May 11, 2023: Please vote in the Reader Poll below (multiple times if you’d like) soon! As of today, CSP Applications and Signal Processing ToolKit are in the lead, with Rants and Datasets at the bottom.
The CSP Blog is rolling along here in 2023!
March 2023 broke a record for pageviews in a calendar month with over 7,000 as of this writing early in the day on March 31.
Let’s note some other milestones and introduce a poll.
What a month! We’re at about 7,145 views right now, and the previous monthly record is 6,482.
About 84,000 visitors have been counted over the years since the CSP Blog launched in 2015, with 5,500 this year already. I believe this is just a count of the unique IP addresses that have accessed a page. But the number of subscribers is only 198! You can subscribe (“Follow”) to the CSP Blog by entering an email address in the “Follow Blog via Email” box on the right edge of any viewed page, near the top of the page. You’ll get notified through that email address whenever there is a new post. CSP Blog readers cannot see that email address, just as they cannot see the email address associated with any comment, unless there is an associated gravatar.
I’m planning to have more time available to devote to improving and extending the CSP Blog over the next few months. If you want to have input into that process, consider voting in the poll below.
Danger Will Robinson! Non-technical post approaching!
When I was a wee engineer, I’d sometimes clash with other engineers that sneered at technical approaches that didn’t set up a linear-algebraic optimization problem as the first step. Never mind that I’ve been relentlessly focused on single-sensor problems, rather than array-processing problems, and so the naturalness of the linear-algebraic mathematical setting was debatable–however there were still ways to fashion matrices and compute those lovely eigenvalues. The real issue wasn’t the dimensionality of the data model, it was that I didn’t have a handy crank I could turn and pop out a provably optimal solution to the posed problem. Therefore I could be safely ignored. And if nobody could actually write down an optimization problem for, say, general radio-frequency scene analysis, then that problem just wasn’t worth pursuing.
Those critical engineers worship at the altar of optimality. Time for another rant.
In this brief Signal Processing Toolkit note, I warn you about relying on resample.m to increase the sampling rate of your data. It works fine a lot of the time, but when the signal has significant energy near the band edges, it does not.
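The band-edge effect is easy to probe numerically. The sketch below is not MATLAB's resample.m; it uses SciPy's `scipy.signal.resample_poly` with its default anti-imaging filter as a stand-in polyphase resampler, and the details of the two implementations may differ. The test tones (0.10 and 0.49 cycles per input sample) and the factor-of-two rate increase are assumptions chosen only to contrast a mid-band signal with one near the band edge.

```python
import numpy as np
from scipy.signal import resample_poly

def upsample_error(f0, n=4096, up=2, trim=200):
    """RMS error between a polyphase-upsampled cosine and the same cosine
    generated directly at the higher rate (filter edge effects trimmed)."""
    t_lo = np.arange(n)                  # input sample times
    t_hi = np.arange(n * up) / up        # output sample times, input units
    x_lo = np.cos(2 * np.pi * f0 * t_lo)
    x_ref = np.cos(2 * np.pi * f0 * t_hi)
    y = resample_poly(x_lo, up, 1)       # default Kaiser-window FIR design
    core = slice(trim, n * up - trim)
    return np.sqrt(np.mean((y[core] - x_ref[core]) ** 2))

err_mid = upsample_error(0.10)    # tone well inside the band
err_edge = upsample_error(0.49)   # tone close to the band edge (0.5)
print(f"RMS error, mid-band tone:  {err_mid:.2e}")
print(f"RMS error, band-edge tone: {err_edge:.2e}")
```

The band-edge tone falls in the transition band of the interpolation filter, so it is attenuated and its spectral image is not fully rejected. A quick check like this, on a signal whose occupied bandwidth resembles your own data, is a cheap way to decide whether a custom interpolation filter is needed.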
By the pricking of my thumbs, something wicked this way comes …
I attended a conference on dynamic spectrum access in 2017 and participated in a session on automatic modulation recognition. The session was connected to a live competition within the conference where participants would attempt to apply their modulation-recognition system to signals transmitted in the conference center by the conference organizers. Like a grand modulation-recognition challenge but confined to the temporal, spectral, and spatial constraints imposed by the short-duration conference.
What I didn’t know going in was the level of frustration on the part of the machine-learner organizers regarding the seeming inability of signal-processing and machine-learning researchers to solve the radio-frequency scene analysis problem once and for all. The basic attitude was ‘if the image-processors can have the AlexNet image-recognition solution, and thereby abandon their decades-long attempt at developing serious mathematics-based image-processing theory and practice, why haven’t we solved the RFSA problem yet?’
In this post, we’ll switch gears a bit and look at the problem of waveform estimation. This comes up in two situations for me: single-sensor processing and array (multi-sensor) processing. At some point, I’ll write a post on array processing for waveform estimation (using, say, the SCORE algorithm The Literature [R102]), but here we restrict our attention to the case of waveform estimation using only a single sensor (a single antenna connected to a single receiver). We just have one observed sampled waveform to work with. There are also waveform estimation methods that are multi-sensor but not typically referred to as array processing, such as the blind source separation problem in acoustic scene analysis, which is often solved by principal component analysis (PCA), independent component analysis (ICA), and their variants.
The signal model consists of the noisy sum of two or more modulated waveforms that overlap in both time and frequency. If the signals do not overlap in time, then we can separate them by time gating, and if they do not overlap in frequency, we can separate them using linear time-invariant systems (filters).
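The easy case, where the signals do not overlap in frequency, can be sketched in a few lines. The example below assumes two hypothetical bandlimited signals in disjoint bands plus weak white noise, and it uses an ideal FFT mask (a circular brick-wall filter) as the linear time-invariant separator. The band edges and noise level are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4096
freqs = np.fft.fftfreq(n)   # normalized frequency, cycles/sample

def bandpass(x, f_lo, f_hi):
    """Ideal (brick-wall) filter: zero every FFT bin outside [f_lo, f_hi]."""
    spectrum = np.fft.fft(x)
    spectrum[(freqs < f_lo) | (freqs > f_hi)] = 0
    return np.fft.ifft(spectrum)

def bandlimited_signal(f_lo, f_hi):
    """Unit-power complex noise confined to [f_lo, f_hi]."""
    white = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    s = bandpass(white, f_lo, f_hi)
    return s / np.sqrt(np.mean(np.abs(s) ** 2))

band_1, band_2 = (-0.30, -0.10), (0.05, 0.25)   # disjoint bands (assumed)
s1 = bandlimited_signal(*band_1)
s2 = bandlimited_signal(*band_2)
noise = 0.01 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
x = s1 + s2 + noise                              # the observed waveform

s1_hat = bandpass(x, *band_1)
s2_hat = bandpass(x, *band_2)

def relative_error(estimate, truth):
    return np.sqrt(np.mean(np.abs(estimate - truth) ** 2)
                   / np.mean(np.abs(truth) ** 2))

rel_err_1 = relative_error(s1_hat, s1)
rel_err_2 = relative_error(s2_hat, s2)
print(f"relative error, signal 1: {rel_err_1:.4f}")
print(f"relative error, signal 2: {rel_err_2:.4f}")
```

Here the only residual error is the in-band noise. Once the two bands overlap, no choice of mask can keep one signal while rejecting the other, which is why the cochannel case calls for something beyond ordinary filtering.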
The next step in dataset complexity at the CSP Blog: cochannel signals.
I’ve developed another dataset for use in assessing modulation-recognition algorithms (machine-learning-based or otherwise) that is more complex than the original sets I posted for the ML Challenge (CSPB.ML.2018 and CSPB.ML.2022). Half of the new dataset consists of one signal in noise and the other half consists of two signals in noise. In most cases the two signals overlap spectrally, which is a signal condition called cochannel interference.
No, not that prisoner’s dilemma. The dilemma of a prisoner that claims, steadfastly, innocence. Even in the face of strong evidence and a fair jury trial.
In this Signal Processing ToolKit cul-de-sac of a post, we’ll look into a signal-processing adventure involving a digital sting recording and a claim of evidence tampering. We’ll be able to use some of our SPTK tools to investigate a real-world data record that might, just might, have been tampered with. (But most probably not!)
How can we train a neural network to make use of both IQ data samples and CSP features in the context of weak-signal detection?
I’ve been working with some colleagues at Northeastern University (NEU) in Boston, MA, on ways to combine CSP with machine learning. The work I’m doing with Old Dominion University is focused on basic modulation recognition using neural networks and, in particular, the generalization (dataset-shift) problem that is pervasive in deep learning with convolutional neural networks. In contrast, the NEU work is focused on specific signal detection and classification problems and looks at how to use multiple disparate data types as inputs to neural networks: inputs such as complex-valued samples (IQ data) as well as carefully selected components of spectral correlation and spectral coherence surfaces.
My NEU colleagues and I will be publishing a rather lengthy conference paper on a new multi-input-data neural-network approach called ICARUS at InfoCom 2023 this May (My Papers ). You can get a copy of the pre-publication version here or on arxiv.org.