Understanding and Using the Statistics of Communication Signals
Author: Chad Spooner
I'm a signal processing researcher specializing in cyclostationary signal processing (CSP) for communication signals. I hope to use this blog to help others with their cyclo-projects and to learn more about how CSP is being used and extended worldwide.
Back in 2018 I posted a dataset consisting of 112,000 I/Q data files, 32,768 samples in length each, as a part of a challenge to machine learners who had been making strong claims of superiority over signal processing in the area of automatic modulation recognition. One part of the challenge was modulation recognition involving eight digital modulation types, and the other was estimating the carrier frequency offset. That dataset is described here, and I’d like to refer to it as CSPB.ML.2018.
Then in 2022 I posted a companion dataset to CSPB.ML.2018 called CSPB.ML.2022. This new dataset uses the same eight modulation types and similar ranges of SNR, pulse type, and symbol rate, but the random variable that governs the carrier frequency offset differs from the one used in CSPB.ML.2018. The purpose of the CSPB.ML.2022 dataset is to facilitate studies of the dataset-shift, or generalization, problem in machine learning.
Throughout the past couple of years I’ve been working with some graduate students and a professor at Old Dominion University on merging machine learning and signal processing for problems involving RF signal analysis, such as modulation recognition. We are starting to publish a sequence of papers that describe our efforts. I briefly describe the results of one such paper, My Papers, in this post.
While reading a book on string theory for lay readers, I did a double take…
I don’t know why I haven’t read any of Lee Smolin’s physics books prior to this year, but I haven’t. Maybe blame my obsession with Sean Carroll. In any case, I’ve been reading The Trouble with Physics (The Literature [R175]), which is about string theory and string theorists. Smolin finds it troubling that the string theorist subculture in physics shows some signs of groupthink and authoritarianism. Perhaps elder worship too.
I came across this list of attributes, conceived by Smolin, of the ‘sociology’ of the string-theorist contingent:
The softwarization of engineering continues apace…
I keep seeing people write things like “a major disadvantage of the technique for X is that it requires substantial domain expertise.” Let’s look at a recent good paper that makes many such remarks and try to understand what they could mean, and whether having or getting domain expertise is actually a bad thing. Spoiler: It isn’t.
The paper under the spotlight is The Literature [R174], “Interference Suppression Using Deep Learning: Current Approaches and Open Challenges,” published for the nonce on arxiv.org. I’m not calling this post a “Comments On …” post, because once I extract the (many) quotes about domain expertise, I’m leaving the paper alone. The paper is a good paper and I expect it to be especially useful for current graduate students looking to make a contribution in the technical area where machine learning and RF signal processing overlap. I especially like Figure 1 and the various Tables.
Can we fix peer review in engineering by some form of payment to reviewers?
Let’s talk about another paper about cyclostationarity and correntropy. I’ve critically reviewed two previously, which you can find here and here. When you look at the correntropy as applied to a cyclostationary signal, you get something called cyclic correntropy, which is not particularly useful except if you don’t understand regular cyclostationarity and some aspects of garden-variety signal processing. Then it looks great.
But this isn’t a post that primarily takes the authors of a paper to task, although it does do that. I want to tell the tale to get us thinking about what ‘peer’ could mean, these days, in ‘peer-reviewed paper.’ How do we get the best peers to review our papers?
In this Signal Processing ToolKit post we take a close look at the basic sampling theorem used daily by signal-processing engineers. Application of the sampling theorem is a way to choose a sampling rate for converting an analog continuous-time signal to a digital discrete-time signal. The former is ubiquitous in the physical world–for example all the radio-frequency signals whizzing around in the air and through your body right now. The latter is ubiquitous in the computing-device world–for example all those digital-audio files on your DiscmanItunesIpodDVDSmartphoneCloudNeuralink Singularity.
So how are those physical real-world analog signals converted to convenient lists of finite-precision numbers that we can apply arithmetic to? For that’s all [digital or cyclostationary] signal processing is at bottom: arithmetic. You might know the basic rule-of-thumb for choosing a sampling rate: Make sure it is at least twice as big as the largest frequency component in the analog signal undergoing the sampling. But why, exactly, and what does ‘largest frequency component’ mean?
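To make the rule of thumb concrete, here is a minimal sketch in Python with NumPy (my choice for illustration, not code from the post). The helper `apparent_frequency` and the specific frequencies are hypothetical. It shows that a 900 Hz cosine sampled at 1000 Hz, which violates the rule, yields exactly the same samples as a 100 Hz cosine.

```python
import numpy as np

def apparent_frequency(f0, fs):
    """Map a sinusoid frequency f0 (Hz) into [-fs/2, fs/2), the band
    in which it appears after sampling at rate fs (Hz)."""
    return (f0 + fs / 2.0) % fs - fs / 2.0

fs = 1000.0          # sampling rate in Hz (illustrative value)
n = np.arange(64)    # sample indices

# 900 Hz exceeds fs/2 = 500 Hz, so the rule of thumb is violated.
x_high = np.cos(2 * np.pi * 900.0 * n / fs)
# 100 Hz is comfortably below fs/2.
x_low = np.cos(2 * np.pi * 100.0 * n / fs)

# The two sample sequences are indistinguishable: the 900 Hz tone has aliased.
samples_match = np.allclose(x_high, x_low)
alias_hz = abs(apparent_frequency(900.0, fs))
print(samples_match, alias_hz)
```

Once the samples are taken, no amount of arithmetic can tell the two tones apart, which is why the rate has to be chosen with the highest frequency in mind.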
Let’s take a look at an even faster spectral correlation function estimator. How useful is it for CSP applications in communications?
Reader Gideon pointed out that Antoni had published a paper a year after the paper that I considered in my first Antoni post. This newer paper, The Literature [R172], promises a faster fast spectral correlation estimator, and it delivers on that according to the analysis in the paper. However, I think the faster fast spectral correlation estimator is just as limited as the slower fast spectral correlation estimator when considered in the context of communication-signal processing.
And, to be fair, Antoni doesn’t often consider the context of communication-signal processing. His favored application is fault detection in mechanical systems with rotating parts. But I still don’t think the way he compares his fast and faster estimators to conventional estimators is fair. The reason is that his estimators are both severely limited in the maximum cycle frequency that can be processed, relative to the maximum cycle frequency that is possible.
Another RF-signal dataset to help push along our R&D on modulation recognition.
Update February 2023: A third dataset has been posted to the CSP Blog: CSPB.ML.2023. It features cochannel signals.
Update January 2023: I’m going to put Challenger results in the Comments. I’ve received a Challenger’s decisions and scored them in January 2023. See below.
In this post I provide a second dataset for the Machine-Learning Challenge I issued in 2018 (CSPB.ML.2018). This dataset is similar to the original dataset, but possesses a key difference in that the probability distribution of the carrier-frequency offset parameter, viewed as a random variable, is not the same, but is still realistic.
Blog Note: By WordPress’ count, this is the 100th post on the CSP Blog. Together with a handful of pages (like My Papers and The Literature), these hundred posts have resulted in about 250,000 page views. That’s an average of 2,500 page views per post. However, the variance of the per-post pageviews is quite large. The most popular is The Spectral Correlation Function (> 16,000) while the post More on Pure and Impure Sinewaves, from the same era, has only 316 views. A big Thanks to all my readers!!
We take a quick look at a fourth DeepSig dataset called 2016.04C.multisnr.tar.bz2 in the context of the data-shift problem in machine learning.
And if we get this right,
We’re gonna teach ’em how to say
You and I.
Lin-Manuel Miranda, “One Last Time,” Hamilton
I didn’t expect to have to do this, but I am going to analyze yet another DeepSig dataset. One last time. This one is called 2016.04C.multisnr.tar.bz2, and is described thusly on the DeepSig website:
I’ve analyzed the 2018 dataset here, the RML2016.10b.tar.bz2 dataset here, and the RML2016.10a.tar.bz2 dataset here.
Now I’ve come across a manuscript-in-review in which both the RML2016.10a and RML2016.04c datasets are used. The idea is that these two datasets are sufficiently distinct to be good candidates for use in a data-shift study involving trained neural-network modulation-recognition systems.
The data-shift problem is, as one researcher puts it:
Data shift or data drift, concept shift, changing environments, data fractures are all similar terms that describe the same phenomenon: the different distribution of data between train and test sets
An interesting paper on the true nature of the impulse function we use so much in signal processing.
The impulse function, also called the Dirac delta function, is commonly used in statistical signal processing, and on the CSP Blog (examples: representations and transforms). I think we’re a bit casual about this usage, and perhaps none of us understand impulses as well as we might.
A colleague has started up a website with lots of content on digital signal processing: Wave Walker DSP. This is, to me, a new kind of engineering blog in that it blends DSP mathematics and practice with philosophy. That’s an intriguing complement to my engineering blog, which I view as blending DSP mathematics with criticism.
What are the ranges of spectral frequency and cycle frequency that we need to consider in a discrete-time/discrete-frequency setting for CSP?
Let’s talk about that diamond-shaped region in the plane we so often see associated with CSP. I’m talking about the principal domain for the discrete-time/discrete-frequency spectral correlation function. Where does it come from? Why do we care? When does it come up?
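As a rough illustration of the diamond, here is a small Python sketch. It assumes the symmetric form of the spectral correlation function, in which the two frequency components sit at f + α/2 and f − α/2, and it assumes normalized frequencies (sampling rate of one). Requiring both components to lie within [−1/2, 1/2] gives |f| + |α|/2 ≤ 1/2. The function name is hypothetical.

```python
def in_principal_domain(f, alpha):
    """Return True if (f, alpha) lies in the diamond |f| + |alpha|/2 <= 1/2.

    Assumes normalized spectral frequency f and cycle frequency alpha
    (sampling rate equal to 1) and the symmetric convention in which the
    correlated components are at f + alpha/2 and f - alpha/2.
    """
    return abs(f) + abs(alpha) / 2.0 <= 0.5

# Vertices of the diamond lie on the boundary.
print(in_principal_domain(0.5, 0.0))   # corner on the f axis
print(in_principal_domain(0.0, 1.0))   # corner on the alpha axis
# A point inside the square |f| <= 1/2, |alpha| <= 1 but outside the diamond.
print(in_principal_domain(0.4, 0.4))
```

Under these assumptions the cycle frequency can range over [−1, 1] only at f = 0, and the usable range shrinks linearly as |f| grows toward 1/2.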
The Fast Spectral Correlation estimator is a quick way to find small cycle frequencies. However, its restrictions render it inferior to estimators like the SSCA and FAM.
In this post we take a look at an alternative CSP estimator created by J. Antoni et al. (The Literature [R152]). The paper describing the estimator can be found here, and you can get some corresponding MATLAB code, posted by the authors, here if you have a Mathworks account.
The merging of conventional probability theory with signal theory leads to random processes, also known as stochastic processes. The ideas involved with random processes are central to cyclostationary signal processing.
In this Signal Processing ToolKit post, I provide an introduction to the concept and use of random processes (also called stochastic processes). This is my perspective on random processes, so although I’ll introduce and use the conventional concepts of stationarity and ergodicity, I’ll end up focusing on the differences between stationary and cyclostationary random processes. The goal is to illustrate those differences with informative graphics and videos; to build intuition in the reader about how the cyclostationarity property comes about, and about how the property relates to the more abstract mathematical object of a random process on one hand and to the concrete data-centric signal on the other.
So … this is the first SPTK post that is also a CSP post.
Does the use of ‘total SNR’ mislead when the fractional bandwidth is very small? What constitutes ‘weak-signal processing?’
Or maybe “Comments on” here should be “Questions on.”
In a recent paper in EURASIP Journal on Advances in Signal Processing (The Literature [R165]), the authors tackle the problem of machine-learning-based modulation recognition for highly oversampled rectangular-pulse digital signals. They don’t use the DeepSig datasets (one, two, three, four), but their dataset description and use of ‘signal-to-noise ratio’ leaves a lot to be desired. Let’s take a brief look. See if you agree with me that the touting of their results as evidence that they can reliably classify signals with ‘SNRs of dB’ is unwarranted and misleading.
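To see why the distinction matters, here is a minimal Python sketch of the relation between the two SNR definitions. It assumes white noise across the full sampling bandwidth and a signal confined to its occupied bandwidth. The function name and the numbers are hypothetical and are not taken from the paper.

```python
import math

def inband_snr_db(total_snr_db, fs, bandwidth):
    """Convert total SNR (noise power measured over the full sampling
    bandwidth fs) to inband SNR (noise power measured only over the
    signal's occupied bandwidth), assuming white noise."""
    return total_snr_db + 10.0 * math.log10(fs / bandwidth)

# Hypothetical case: the signal occupies 1/100 of the sampled band.
snr_in = inband_snr_db(total_snr_db=-10.0, fs=100.0, bandwidth=1.0)
print(snr_in)
```

With a fractional bandwidth of 1/100, a total SNR of −10 dB corresponds to an inband SNR of about +10 dB under these assumptions, which is not a weak signal by the usual standard.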
In this Signal Processing ToolKit post, we continue our exploration of random variables. Here we look at specific examples of random variables, which means that we focus on concrete well-defined cumulative distribution functions (CDFs) and probability density functions (PDFs). Along the way, we show how to use some of MATLAB’s many random-number generators, which are functions that produce one or more instances of a random variable with a specified PDF.
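The post itself uses MATLAB. As a stand-in, here is a minimal sketch of the same idea in Python with NumPy, which is my substitution rather than the post's code. It draws instances of Gaussian, uniform, and exponential random variables and checks a few sample statistics against what the PDFs and CDFs predict. The parameter values and the helper `empirical_cdf` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # seeded for repeatability
n = 100_000                           # number of instances per variable

# Gaussian with mean 1 and standard deviation 2.
gauss = rng.normal(loc=1.0, scale=2.0, size=n)
# Uniform on [-1, 1).
unif = rng.uniform(low=-1.0, high=1.0, size=n)
# Exponential with mean 0.5.
expo = rng.exponential(scale=0.5, size=n)

def empirical_cdf(samples, x):
    """Fraction of samples less than or equal to x."""
    return np.mean(samples <= x)

# The uniform CDF on [-1, 1] evaluated at 0.5 is (0.5 + 1) / 2 = 0.75.
print(gauss.mean(), gauss.std())
print(empirical_cdf(unif, 0.5))
print(expo.mean())
```

The sample values only approximate the theoretical ones, and the approximation improves as `n` grows.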
Just a reminder that if you are getting some value out of the CSP Blog, I would appreciate it if you could make a donation to offset my costs: I do pay WordPress to keep ads off the site! I also pay extra for a class of service that allows me to post large data sets like the one for the Machine-Learner Challenge.
If everyone who derived value from the CSP Blog were to donate $5, I’d have enough left over for at least a couple cups of fancy coffee.
In signal processing, and in CSP, we often have to convert real-valued data into complex-valued data and vice versa. Real-valued data is what the real world provides, but complex-valued data is easier to process because it can be represented using a substantially lower sampling rate.
In this Signal-Processing Toolkit post, we review the signal-processing steps needed to convert a real-valued sampled-data bandpass signal to a complex-valued sampled-data lowpass signal. The former can arise from sampling a signal that has been downconverted from its radio-frequency spectral band to a much lower intermediate-frequency spectral band. So we want to convert such data to complex samples at zero frequency (‘complex baseband’) so we can decimate them and thereby match the sample rate to the signal’s baseband bandwidth. Subsequent signal-processing algorithms (including CSP of course) can then operate on the relatively low-rate complex-envelope data, which is beneficial because the same number of seconds of data can be processed using fewer samples, and computational cost is determined by the number of samples, not the number of seconds.
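The steps just described (shift to zero frequency, lowpass filter, decimate) can be sketched in Python with NumPy as follows. This is a minimal illustration under my own assumptions, not the post's implementation: frequencies are normalized to the input sampling rate, the lowpass filter is a simple Hamming-windowed sinc, and the function names, tap count, and cutoff choice are all hypothetical.

```python
import numpy as np

def lowpass_fir(num_taps, cutoff):
    """Hamming-windowed-sinc lowpass filter with unit gain at zero frequency.
    cutoff is in cycles per sample (0 < cutoff < 0.5)."""
    k = np.arange(num_taps) - (num_taps - 1) / 2.0
    h = 2.0 * cutoff * np.sinc(2.0 * cutoff * k) * np.hamming(num_taps)
    return h / np.sum(h)

def real_if_to_complex_baseband(x, f_if, decim, num_taps=129):
    """Convert real samples centered at normalized frequency f_if to
    decimated complex-baseband samples."""
    n = np.arange(len(x))
    # Step 1: shift the positive-frequency band down to zero frequency.
    mixed = x * np.exp(-2j * np.pi * f_if * n)
    # Step 2: remove the image near -2*f_if and limit the bandwidth
    # before decimation (cutoff chosen below the new Nyquist frequency).
    h = lowpass_fir(num_taps, cutoff=0.4 / decim)
    filtered = np.convolve(mixed, h, mode="same")
    # Step 3: keep every decim-th sample.
    return filtered[::decim]

# Test tone: a real cosine slightly above the intermediate frequency.
f_if, f_off, decim = 0.25, 0.01, 8
n = np.arange(4096)
x = np.cos(2 * np.pi * (f_if + f_off) * n)
y = real_if_to_complex_baseband(x, f_if, decim)
```

In this example the output `y` has one-eighth as many samples as `x` and is, away from the edges, a complex sinusoid at frequency `f_off * decim` relative to the new sampling rate. Its amplitude is one half because a real cosine splits its amplitude equally between positive and negative frequencies and only one of the two survives the filter.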
We continue our basic signal-processing posts with one on the moving-average, or smoothing, filter. The moving-average filter is a linear time-invariant operation that is widely used to mitigate the effects of additive noise and other random disturbances on a presumably well-behaved signal. For example, a physical phenomenon may be producing a signal that increases monotonically over time, but our measurement of that signal is corrupted by noise, interference, or flaws in the measurement process. The moving-average filter can reveal the sought-after trend by suppressing the effects of the unwanted disturbances.
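Here is a minimal Python sketch of that example, using NumPy. The linear trend, noise level, and filter length are illustrative assumptions. The filter is implemented as a convolution with a rectangular impulse response whose taps sum to one.

```python
import numpy as np

def moving_average(x, length):
    """Moving-average filter: convolve x with `length` equal taps of
    1/length. Only fully overlapped outputs are returned, so the result
    has len(x) - length + 1 samples."""
    h = np.ones(length) / length
    return np.convolve(x, h, mode="valid")

rng = np.random.default_rng(seed=1)
t = np.arange(2000)
trend = 0.01 * t                                    # monotonically increasing signal
noisy = trend + rng.normal(scale=1.0, size=t.size)  # corrupted measurement

length = 51
smoothed = moving_average(noisy, length)

# Each output averages `length` consecutive inputs, so it is aligned with
# the trend sample at the center of its window.
half = length // 2
aligned_trend = trend[half:-half]

error_before = np.std(noisy - trend)
error_after = np.std(smoothed - aligned_trend)
print(error_before, error_after)
```

For a linear trend the average over a symmetric window equals the trend value at the window center, so the remaining error is just the averaged noise, whose standard deviation drops by roughly the square root of the filter length.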