# A Challenge for the Machine Learners

## UPDATE

I’ve decided to post the data set I discuss here to the CSP Blog for all interested parties to use. See the new post on the Data Set. If you do use it, please let me and the CSP Blog readers know how you fared with your experiments in the Comments section of either post. Thanks!

A while back I was working with some machine-learning researchers on the problem of carrier-frequency-offset (CFO) estimation. The CFO is the residual carrier frequency exhibited by an imperfectly downconverted radio-frequency signal. I’ll describe it in more detail below. The idea behind the collaboration was to find the SNR, SINR, block-length, etc., ranges for which machine-learning algorithms outperform more traditional approaches, such as those involving exploitation of cyclostationarity. If we’re going to get rid of the feature-based approaches used by experts, then we’d better make sure that the machines can do at least as well as those approaches for the problems typically considered by the experts.

Well, the collaboration fizzled out after I produced my current best-effort results using CSP. I still don’t know how well my erstwhile collaborators can do on the CFO problem. But I’ve got a Blog with decent reach and readership, so maybe somebody else will pick up the challenge.

### Problem Definition

Given a complex-valued data sequence $x(t)$, $t = 0, 1, \ldots T-1$, that contains a single digitally modulated signal taken from a finite set of modulation types $C$, find the modulation type, symbol/bit rate, and carrier-frequency offset. That is, the data to be processed is given by

$\displaystyle x(t) = s(t) + w(t), \ \ t = 0, 1, \ldots T-1 \hfill (1)$

where $w(t)$ is white Gaussian noise that is independent of $s(t)$, which is a digital signal drawn from the modulation-type set $C = \{m_1, m_2, \ldots, m_M\}$. The signal $s(t)$ consists of the complex-valued baseband signal (having zero carrier-frequency offset) multiplied by a complex sine wave that represents the offset:

$\displaystyle s(t) = s_{cb}(t) e^{i 2 \pi f_0 t + i\phi_0} \hfill (2)$

For example, for the eight signals below except $\pi/4$-DQPSK and MSK, the complex-baseband signal is a complex-valued pulse-amplitude-modulated signal:

$\displaystyle s_{cb}(t) = A \sum_k a_k p(t - kT_0 - t_0) \hfill (3)$

where the random variables $a_k$ represent the transmitted symbols (taken from the modulation’s constellation set), $1/T_0$ is the symbol rate, $p(\cdot)$ is the square-root raised-cosine pulse with roll-off parameter $\beta \in [0.1, 1.0]$, $t_0$ is the symbol-clock phase, and $A$ controls the overall power level of the signal. The constellation-set values are chosen so that the expected value of $|a_k|^2$ is unity.

The modulation types determine the basic way that message bits are combined with pulse trains and frequency shifters to create a signal suitable for transmission over a propagation medium. To be more specific, let’s use $M = 8$ and the following set of modulation types:

$\displaystyle m_1$: BPSK, Square-Root Raised-Cosine Pulses

$\displaystyle m_2$: QPSK, Square-Root Raised-Cosine Pulses

$\displaystyle m_3$: 8PSK, Square-Root Raised-Cosine Pulses

$\displaystyle m_4$: $\displaystyle \pi/4$-DQPSK, Square-Root Raised-Cosine Pulses

$\displaystyle m_5$: Minimum-Shift Keying

$\displaystyle m_6$: 16QAM, Square Constellation, Square-Root Raised-Cosine Pulses

$\displaystyle m_7:$ 64QAM, Square Constellation, Square-Root Raised-Cosine Pulses

and

$\displaystyle m_8$: 256QAM, Square Constellation, Square-Root Raised-Cosine Pulses

The definitions of the constellations I use here can be found in my post on the cyclostationarity of digital QAM/PSK. The MSK signal is equivalent to staggered QPSK with half-period-cosine pulses. The $\pi/4$-DQPSK (also called $\pi/4$-QPSK) signal is a QPSK signal that changes its constellation each symbol interval, alternating between one four-point constellation and a version of that constellation shifted by $\pi/4$ radians.

Let’s agree to use normalized frequencies and times here, so the sampling rate is unity. To reflect a realistic situation, we’ll also assume that the carrier-frequency offset is small compared to the signal bandwidth, which is itself roughly equal to the symbol rate.

How well can any given algorithm or machine estimate these parameters and make modulation-type decisions? To answer that, I created a very large number of instances of the signals. The symbol rate, carrier-offset frequency, power level, message bits, and roll-off parameter $\beta$ are varied as the signals are generated and stored. This is described in a little more detail next.

#### Symbol Rates

The symbol interval $T_0$ varies from about $2$ (two samples per symbol) to about $50$. The values for $T_0$ are not constrained to the integers in that range. Through the use of resampling techniques I can create any $T_0$ I wish to. But do realize that in this experiment, and in all the work on the CSP Blog, there is never any requirement on the relationship between the sampling rate and the symbol rate.

#### Carrier-Frequency Offsets

The carrier-frequency offsets are applied to the complex-baseband signals by multiplying by a complex sine wave, as in (2) above. The value of the offset $f_0$ lies in the range $[-0.001, 0.001]$, so even for the smallest bandwidth of about $1/50$, the largest offset is still a small fraction of the bandwidth. The offset is generated as a random number with a uniform distribution on $[-0.001, 0.001]$, so no particular offset is favored.

#### Excess Bandwidths

I allow the roll-off parameter $\beta$ in the square-root raised-cosine pulse to be any number in the interval $[0.1, 1.0]$. Real-world signals typically use something in the range $[0.2, 0.5]$. This is important because my method of recognizing the modulation type depends on the roll-off (which controls the excess bandwidth of the observed signal). Although I’m allowing a huge number of possible roll-offs in the generation of the signal set, the recognizer only contains classification features for roll-offs equal to $k/10$ and also $0.35$. So, strictly speaking, there are a great many signals that are not perfectly recognizable by my CSP approach (more on that below).

#### Message Sequences

Each signal is generated using a randomly chosen message sequence that uses symbols that are identically distributed and independent.

#### Power Levels

The power level of the signal is randomly varied over the generated signal set. The effect is that the inband SNR varies from about $0$ dB to about $13$ dB, with the bulk of the inband SNR values near $10$ dB.

#### Signal Length

Each signal has a length of $2^{15} = 32768$ samples. I consider block lengths of $2^k$ for $k \in \{11,12, \ldots, 15\}$ when I apply my modulation recognition algorithm.

#### Number of Trials

I’ve generated over $10000$ realizations of each of the eight signal types, but in the results presented in this post, I analyze only $1000$ per type.

#### Propagation Channel

There are no channel effects applied to the generated signals except the presence of additive white Gaussian noise $w(t)$.

### The Multiple-Signal Scene Analyzer

I analyze each data file in a blind fashion using an algorithm I call the multiple-signal scene analyzer (MSSA). The core of this approach to modulation classification is described mathematically in My Papers [25,26,28]. The modulation classification part is generally accomplished using cyclic-cumulant matching, and parameters such as symbol rate and carrier-frequency offset are estimated using the spectral correlation function and cyclic cumulants (cyclic temporal cumulant functions). Simpler estimators of the symbol rate and carrier-frequency offset parameters involve a nonlinearity (squarer or quadrupler) followed by Fourier analysis (The Literature [R100]). The MSSA’s estimators are similar in nature, but since they can take advantage of multiple $n$th-order cycle frequencies that are related to the symbol rate and/or carrier-frequency offset, there is also an averaging gain to be had. So I’m asserting that the MSSA is a good example of an approach that uses “expertly crafted features” in the parlance of some of the Machine Learners.

I used the MSSA to do blind modulation classification in the DySPAN 2017 paper (My Papers [44]), but in that case I also used a blind constellation analyzer to boost performance for the hardest cases (for example, 64QAM vs 256QAM). In this post, I do not employ any constellation-analysis to boost performance.

The MSSA employs a default catalog of known signal types that is quite a bit larger than the set of eight modulation types listed above. (For example, it contains multiple amplitude-and-phase-shift-keyed signals, GMSK, a variety of continuous phase modulation types, AM, FM, DSSS, other non-square QAM, etc.) The catalog for this post is restricted to the eight types above. However, the MSSA is also allowed to output an FM, DSSS BPSK, DSSS QAM, and UNKNOWN decision types.

Key MSSA parameters for this experiment are the maximum cyclic-cumulant parameter $N = 6$ and the use of an automatic spectral segmentation algorithm (blind band-of-interest detector) to preprocess the data. The maximum order of $N=6$ means that the classification feature is a set of cyclic temporal cumulant magnitudes for orders $n = 2, 4, 6$ (My Papers [25,26,28]). The use of the blind spectral segmentation algorithm allows the cyclic-cumulant process to operate on data that has out-of-band noise removed.

### MSSA Results

Since the main goal here is to compare “expert” carrier-frequency offset estimators with Machine-Learning estimators, let’s start with the offset results. The following graph shows the mean absolute error (MAE) in the offset estimate as a function of modulation type and block length $T$:

The graph shows the MAE for each input signal type regardless of whether or not the signal is correctly classified. For each block length I also plot the value of the reciprocal of the processing block length ($1/T$) as a reference. This is because a basic FFT-based approach to sine-wave frequency estimation will have a resolution that is approximately $1/T$.

Once the block length $T$ exceeds a threshold, here about $8192$, the carrier-frequency offset MAE is better than $1/T$ except for 8PSK. For the other seven signals, the MSSA uses either second-order statistics (spectral correlation), fourth-order statistics (cyclic cumulants) or both to determine the offset. Continuing with that notion, 8PSK requires a minimum of an eighth-order nonlinearity prior to Fourier analysis to determine an estimate of the offset, and the MSSA doesn’t do that automatically. So the offset estimate for 8PSK is limited to the offset arising from spectral analysis and multi-resolution band-of-interest estimation, which is quite a coarse estimate relative to the cycle-frequency-based methods used for the other signals.

So the figure above is what the MSSA can do, today, against the data set I described. Remember that those curves subsume a large variety of symbol rates, carrier offsets, pulse roll-off factors, and SNRs!

Let’s look at the modulation-recognition performance for this data set. I want to show this so that any challengers know what to exceed, but also because the many roll-off factors used in the data set provides a way to see how the MSSA performance depends not only on the roll-off but on the combination of the signal type and the roll-off, something that we don’t usually attempt to quantify in the literature. That was a long-winded way of saying: A new look at the problem.

First, let’s look at the confusion matrices for the experiment as a whole. I’ll show one confusion matrix for each block length $T$, starting with the smallest and ending with the largest:

We see that for the smaller values of $T$, the MSSA simply doesn’t produce good features–the blind processing fails, and the most common decision is UNKNOWN. As the block length increases, the confusion between the signal types decreases. When we reach our (arbitrary) maximum of $32768$ samples, there is little confusion about BPSK, QPSK, 8PSK, $\pi/4$-DQPSK, and 16QAM. However, 64QAM and 256QAM are routinely confused for each other. This confusion is separate from the ability of the algorithm to accurately estimate the symbol rate and carrier-frequency offset–see the first figure in this post.

Note that a correct decision for the confusion matrices above does not require that the square-root raised-cosine pulse roll-off (excess bandwidth) parameter be accurately estimated. All that is required is that the basic modulation type of the decision matches that of the input. Also recall that the number of roll-off parameters represented in the inputs is much larger than the number represented in the MSSA’s catalog of known signal types. So these confusion matrices reveal that the MSSA is quite tolerant to mismatches in the roll-off parameter between reality and catalog.

But now let’s take a look at how well the MSSA does do at estimating the roll-off. In the following confusion matrices, I look at a single input type, say, BPSK or QPSK. I then separate the inputs into eleven classes, one for each of the roll-off parameters represented in the MSSA catalog. The input type label for each trial is determined by the minimum distance between the actual roll-off parameter used and the eleven roll-off parameters represented in the MSSA catalog.

First, let’s look at BPSK:

As we might expect (and hope), for the larger values of $T$, the confusions between the different BPSK signals are between those with roll-off parameters that are adjacent in the list $\{0.1, 0.2, 0.3, 0.35, 0.4, \ldots, 0.9, 1.0\}$. This indicates that the influence of the roll-off on the cyclic-cumulant features is fairly strong (at least for BPSK), and that no matter which roll-off is used, the extracted cyclic-cumulant features are strong relative to their estimation noise.

Now let’s look at QPSK:

For QPSK we see a little more confusion, especially for the smallest and largest values of the roll-off parameter.

Finally, let’s look at 256QAM, which has the weakest cyclic-cumulant features (weak in the sense that the cyclic-cumulant magnitudes are smallest when all eight signals are constrained to exhibit unit power):

Performance improves with increasing $T$, but confusion is common between any two values of roll-off for 256QAM for all the values of $T$ considered. I interpret this to mean that the cyclic-cumulant features are weak in general, and the influence of the roll-off parameter on the features is swamped by estimation variance due to the additive noise and self noise. Eventually, with large enough $T$, the performance would improve to near perfection, as is the case with the other modulations. It is just that the required value of $T$ for 256QAM will be larger than the required $T$ for the other modulations.

### Obtaining the Data Set

I’ve not uploaded any data to the CSP Blog yet. Let me know in the Comments if you’re interested in obtaining the data for your own Machine Learning modulation recognition (signal classification) experiments. Once that happens, I’ll update this section of the post with instructions on how to obtain the data.

As always, I appreciate your patience and diligence in getting to the end of a post, and I would also appreciate any comments, corrections, or suggestions that you might want to place in the Comments section below.

## 8 thoughts on “A Challenge for the Machine Learners”

1. Brian Hoenes says:

I am interested in obtaining the data set to evaluate an existing cyclostationary classifier and a Machine Learning adaption of that tool.

2. Sreeraj Rajendran says:

I am also interested in this dataset. Can you please provide me the links. Thanks in advance.

3. Mohamed says:

Hi,

I will appreciate that if you share the dataset.

Best regards,
Mohamed