In this Signal Processing ToolKit post, we look at a generalization of the Fourier transform called the *Laplace Transform.* This is a stepping stone on the way to the *Z Transform*, which is widely used in discrete-time signal processing, especially in control theory.

Jump straight to ‘Significance of the Laplace Transform in CSP‘ below.

Let’s motivate the upcoming Z transform by generalizing the Fourier transform. But why do we *need* to generalize something so pure, so good, so useful, and so perfect as the Fourier transform???

Consider the unit-ramp function shown in Figure 1. Recalling that the unit-step function $u(t)$ is zero for negative $t$, one for positive $t$, and variously defined as one or zero for $t = 0$ (let’s not worry about that), the unit-ramp can be expressed as $r(t) = t u(t)$.

What is the Fourier transform of $r(t) = t u(t)$? We can start by writing it down,

$$R(f) = \int_{-\infty}^{\infty} t u(t) e^{-i 2 \pi f t} \, dt.$$

An expression for this Fourier transform can be found, but it involves the derivative of the impulse function, so it doesn’t exist as a well-behaved function, and it is difficult to deal with even as a generalized function.

Consider also random functions like Gaussian noise and exponentials like $e^{at}$ with $a > 0$. They do not have Fourier transforms. For the exponentials and ramps, the basic problem is that the functions are increasing with $t$ (or whatever the independent variable is) and so the integral–which is the limit of a sum involving the values of that increasing function–cannot converge. For sample paths of random processes like the Gaussian process or a BPSK signal, the limit simply does not converge to any particular value, although the signal does not blow up like the ramp and exponentials do.

One way around this lack of convergence in the Fourier transform is to introduce a damping factor inside the transform’s integral to ensure that the signal does not increase with time so much that the integral diverges. For example, if we multiply the unit ramp by a decaying exponential $e^{-\sigma t}$ (with $\sigma > 0$) and integrate the result, we get a finite number. This exponential dampening is illustrated in Figure 2.

The exponential $e^{-\sigma t}$ tends to zero rapidly, and controllably with the magnitude of the positive number $\sigma$, and is never zero, so it is a good choice both to preserve the character of the signal it is multiplying (since it is never zero, no values of that function are discarded in the integration) and to ensure that no matter how fast the function under study increases with time, it can be brought down to earth.

So to enable a transform of a signal $x(t)$ that is not Fourier transformable, we enter a factor of $e^{-\sigma t}$, for some real number $\sigma$, into our Fourier transform as follows

$$X(f) = \int_{-\infty}^{\infty} x(t) e^{-\sigma t} e^{-i 2 \pi f t} \, dt.$$

Now let the variable $s$ be equal to $\sigma + i 2 \pi f$. Then our transform becomes

$$X(s) = \int_{-\infty}^{\infty} x(t) e^{-s t} \, dt,$$

which is the Laplace transform of $x(t)$. Most of the time we want to apply this transform to signals that are zero for negative times $t < 0$, so that the Laplace transform is one-sided and is usually written as

$$X(s) = \int_{0^-}^{\infty} x(t) e^{-s t} \, dt,$$

but the two-sided transform is also used. The one-sided transform, being applicable to signals that are zero for negative time, is particularly useful for transformation of causal impulse-response functions, which possess that exact property already.
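Since the one-sided transform is just an integral, any transform pair we derive can be sanity-checked numerically. Here is a small Python sketch (the function name `laplace_numeric` and all grid sizes and tolerances are our own choices, not anything from the post) that approximates the one-sided Laplace integral with a truncated trapezoid rule and checks it against the pair $e^{-at}u(t) \Leftrightarrow 1/(s+a)$ derived below:

```python
import numpy as np

def laplace_numeric(x, s, t_max=50.0, n=200_000):
    """Approximate the one-sided Laplace transform
    X(s) = integral over [0, inf) of x(t) exp(-s t) dt
    by a trapezoid rule on [0, t_max]. Accurate only when s lies
    well inside the region of convergence, so the truncated tail
    is negligible."""
    t = np.linspace(0.0, t_max, n)
    vals = x(t) * np.exp(-s * t)
    dt = t[1] - t[0]
    return (vals[0] / 2 + vals[1:-1].sum() + vals[-1] / 2) * dt

# Known pair: exp(-a t) u(t)  <=>  1 / (s + a), valid for Re(s) > -a.
a = 2.0
s = 1.0 + 0.5j
approx = laplace_numeric(lambda t: np.exp(-a * t), s)
exact = 1.0 / (s + a)
```

The same helper works for any complex $s$ in the region of convergence, which is why the check uses a complex value of $s$ rather than a purely real one.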

The advantage of the Laplace transform over the Fourier transform is that the functions to be transformed can be poorly behaved–they might correspond to systems that are unstable and so their outputs grow without bound. So you might be able to see why we’ve done without the Laplace transform at the CSP Blog lo these many years–we are almost always interested in communication signals that are perhaps not Fourier transformable but are not growing without bound. We got around the fact that random communication signals (that is, all useful communication signals) are not Fourier transformable by switching our focus from transforms to power spectra.

If the Fourier transform for $x(t)$ exists, then it is given by the Laplace transform with $\sigma = 0$ (that is, $s = i 2 \pi f$). Following our notation convention for the Fourier transform, the Laplace transform is denoted by the operator $\mathcal{L}[\cdot]$ and also the double-ended arrow as in

$$x(t) \Leftrightarrow X(s).$$

Since the transform is defined by an integral, and integration is itself linear, it follows that the Laplace transform, like the Fourier transform, is a linear transform. This simply means that the Laplace transform of the sum of scaled signals is the sum of scaled Laplace transforms,

$$\mathcal{L}[a x(t) + b y(t)] = a X(s) + b Y(s).$$

Linearity will help us compute the Laplace transform of complicated signals by permitting us to express them as the sum of simpler signals, find the Laplace transform of each of the summands, and finally add them up.

For what values of $s$ does a particular Laplace transform exist? This is typically expressed visually by considering the $s$-plane, which has vertical axis denoted by $i\omega$, which equals $i 2 \pi f$, or by $\omega$ itself, and horizontal axis denoted by $\sigma$. Let’s take a look at the region of convergence by taking our first Laplace transform: the transform of the exponential function $x(t) = e^{-at} u(t)$.

Let’s go through the math. Applying the definition of the transform,

$$X(s) = \int_{0^-}^{\infty} e^{-at} e^{-st} \, dt = \int_{0^-}^{\infty} e^{-(s+a)t} \, dt.$$

Formally, this integral equals

$$X(s) = \left. \frac{-e^{-(s+a)t}}{s+a} \right|_{0^-}^{\infty}.$$

If $\sigma > -a$, then $e^{-(s+a)t} \rightarrow 0$ as $t \rightarrow \infty$, and also we won’t divide by zero because $s + a \neq 0$ for any such $s$. With the condition $\sigma > -a$, then, the transform is

$$X(s) = \frac{1}{s + a}.$$

The convergence parameter $\sigma$ in $s = \sigma + i\omega$ must be greater than $-a$ for the integral to exist, which can be satisfied whether $a$ is positive or negative (whether the exponential decreases or increases as time increases).
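The region-of-convergence condition is easy to see numerically: truncate the Laplace integral at increasing upper limits and watch whether the partial results settle down. A sketch (all parameter values here are arbitrary choices for illustration):

```python
import numpy as np

def truncated_integral(a, sigma, t_max, n=100_000):
    """Trapezoid approximation of the integral of exp(-(a + sigma) t)
    over [0, t_max]; this is the Laplace integral of exp(-a t) u(t)
    evaluated at a real s = sigma."""
    t = np.linspace(0.0, t_max, n)
    vals = np.exp(-(a + sigma) * t)
    dt = t[1] - t[0]
    return (vals[0] / 2 + vals[1:-1].sum() + vals[-1] / 2) * dt

a = 1.0
# sigma > -a: the partial integrals settle near 1 / (sigma + a).
inside = [truncated_integral(a, 0.5, T) for T in (10.0, 20.0, 40.0)]
# sigma < -a: the partial integrals grow without bound as t_max grows.
outside = [truncated_integral(a, -2.0, T) for T in (10.0, 20.0, 40.0)]
```

Inside the region of convergence the three values are nearly identical; outside it, each doubling of the truncation point multiplies the result enormously, which is the divergence the damping factor is meant to prevent.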

When $a > 0$, the exponential is decreasing, and the region of convergence looks like the shaded area in Figure 3. Since $a$ is positive, $-a$ is negative, and the region of convergence includes a part of the $s$-plane where $\sigma < 0$ and all of the half plane for $\sigma > 0$. In particular, the region of convergence contains the $i\omega$ axis, where $\sigma = 0$. This means that the Laplace transform formula is valid if we substitute $s = i 2 \pi f$ into (12), which gives the Fourier transform of $e^{-at}u(t)$,

$$X(f) = \frac{1}{i 2 \pi f + a},$$

which indeed matches the Fourier transform for the decaying exponential obtained by direct computation of the Fourier transform.

The important point is that if the Laplace transform formula corresponds to a region of convergence that includes the $i\omega$ axis, the Fourier transform can easily be determined from the Laplace transform. If the region of convergence does not contain the $i\omega$ axis, then the Fourier transform cannot be determined from the Laplace transform. The Fourier transform in such cases does not exist in the normal sense of a function, but may exist if generalized functions such as impulses are permitted. We’ll see examples shortly.

If $a < 0$, then the region of convergence ($\sigma > -a > 0$) is wholly contained in the right-half plane as illustrated in Figure 4. The $i\omega$ axis is not contained in this region, so the Fourier transform of the increasing exponential does not exist.

What about when $a = 0$? The function under consideration is then $e^{0}u(t)$, or just the unit-step function itself. The condition on $\sigma$ remains, which is $\sigma > 0$, and under this condition

$$U(s) = \frac{1}{s}.$$

You might recall that the Fourier transform of the unit-step function is not a particularly friendly function,

$$U(f) = \frac{1}{2}\delta(f) + \frac{1}{i 2 \pi f},$$

which invites the question of what is going on at $f = 0$. Compare that expression to the Laplace transform expression $1/s$ with $\sigma > 0$. Better! Not your best friend (that is rectangular-pulse BPSK of course), to be sure, but friendly enough.

At this point in the development, we have the Laplace transforms of the unit-step function and the exponential function. We’d like to know a lot more if we want to try to apply the transform to problems involving signals and systems. To do that, we could apply the Laplace integral (5) to each of a number of signals we’ve encountered in the SPTK posts, but it is typically easier to try to be more clever. We’d like to understand how common mathematical operations, such as scaling, differentiation, integration, convolution, multiplication, etc., affect a signal’s transform. Then when we encounter a new signal, we try to express that signal in terms of one or more of these operations on a signal for which we already know the transform.

What is the Laplace transform of the signal $a x(t)$, for any complex constant $a$, given that we know $x(t) \Leftrightarrow X(s)$? Since we already know that the Laplace transform is linear, it follows easily that the transform of the scaled signal is the scaled transform,

$$a x(t) \Leftrightarrow a X(s).$$

Suppose we have a differentiable function $x(t)$ with Laplace transform $X(s)$. What is the transform of $\frac{d}{dt}x(t)$? The transform integral (5) is

$$\mathcal{L}\left[\frac{dx}{dt}\right] = \int_{0^-}^{\infty} \frac{dx(t)}{dt} e^{-st} \, dt.$$

We can proceed to evaluate this kind of integral by applying the technique called *integration by parts*.

The first step is crucial: Identify $u$ and $dv$ from the integrand components of the integral to be solved. We’ll make the choice

$$u = e^{-st}, \quad dv = \frac{dx(t)}{dt} \, dt.$$

With this choice for $u$ and $dv$, we can identify $du$ and $v$,

$$du = -s e^{-st} \, dt, \quad v = x(t).$$

With this choice, let’s carefully follow the integration-by-parts rule $\int u \, dv = uv - \int v \, du$,

$$\mathcal{L}\left[\frac{dx}{dt}\right] = \left. x(t) e^{-st} \right|_{0^-}^{\infty} + s \int_{0^-}^{\infty} x(t) e^{-st} \, dt.$$

If $\sigma > 0$, then $x(t)e^{-st} \rightarrow 0$ as $t \rightarrow \infty$, so that

$$\mathcal{L}\left[\frac{dx}{dt}\right] = s X(s) - x(0).$$

We adopt the convention of my beloved The Literature [R132], and interpret $x(0)$ to be $x(0^-)$, the value of the function just before zero, to get around certain technical issues involving discontinuities at $t = 0$, such as might occur for certain causal linear time-invariant systems‘ impulse-response functions. So the final answer for the transform of the derivative of $x(t)$ is

$$\frac{dx(t)}{dt} \Leftrightarrow s X(s) - x(0^-).$$

As a preview, since we know that $u(t) = \frac{d}{dt} r(t)$, and (as we’ll see) $R(s) = 1/s^2$, then $U(s) = s R(s) = 1/s$, which is consistent with what we’ve seen before.
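The derivative rule is also easy to corroborate numerically. Since $\cos t$ is the derivative of $\sin t$ and $\sin(0^-) = 0$, the transform of $\cos t$ should equal $s$ times the transform of $\sin t$. A sketch, assuming a truncated trapezoid integral is adequate deep inside the region of convergence (helper name and tolerances are our own):

```python
import numpy as np

def laplace_num(x, s, t_max=80.0, n=400_000):
    # Trapezoid approximation of the one-sided Laplace integral on [0, t_max].
    t = np.linspace(0.0, t_max, n)
    vals = x(t) * np.exp(-s * t)
    dt = t[1] - t[0]
    return (vals[0] / 2 + vals[1:-1].sum() + vals[-1] / 2) * dt

s = 0.7  # real s with sigma > 0, inside the region of convergence
X = laplace_num(np.sin, s)    # L[sin t] = 1 / (s^2 + 1)
Xp = laplace_num(np.cos, s)   # cos t = (d/dt) sin t, and sin(0^-) = 0
# Derivative rule: L[x'(t)] = s X(s) - x(0^-)
lhs, rhs = Xp, s * X - 0.0
```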

It follows immediately that the Laplace transform of the second derivative of $x(t)$ ($\frac{d^2 x}{dt^2}$) is

$$\frac{d^2 x(t)}{dt^2} \Leftrightarrow s^2 X(s) - s x(0^-) - x^\prime(0^-).$$

Next let’s look at differentiation’s inverse operation: integration. What is the Laplace transform of $y(t) = \int_{0^-}^{t} x(u) \, du$?

Let’s again reach for integration by parts. Since the integral of $e^{-st}$ is easy, let’s choose that for $dv$,

$$u = \int_{0^-}^{t} x(v) \, dv, \quad dv = e^{-st} \, dt \Longrightarrow du = x(t) \, dt, \quad v = \frac{-e^{-st}}{s}.$$

Our formula reduces to

$$\mathcal{L}[y(t)] = \left. \frac{-e^{-st}}{s} \int_{0^-}^{t} x(v) \, dv \right|_{0^-}^{\infty} + \frac{1}{s} \int_{0^-}^{\infty} x(t) e^{-st} \, dt.$$

If $\sigma > 0$, $\int_{0^-}^{\infty} x(v) \, dv$ is finite, and there is no impulse in $x(t)$ at the origin, then the first term on the right in (29) is zero. We are left with

$$\mathcal{L}[y(t)] = \frac{1}{s} X(s),$$

which is satisfying because the effect of differentiation (factor of $s$) undoes the effect of integration (factor of $1/s$). The final result is

$$\int_{0^-}^{t} x(u) \, du \Leftrightarrow \frac{X(s)}{s}.$$

As a preview, consider that the unit ramp is the integral of the unit step,

$$r(t) = \int_{0^-}^{t} u(v) \, dv.$$

What does that imply about $R(s)$?
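We can preview the answer numerically: transform the step and the ramp, and check that the ramp’s transform is the step’s transform divided by $s$ (a sketch; the helper name and tolerances are our own choices):

```python
import numpy as np

def laplace_num(x, s, t_max=100.0, n=400_000):
    # Trapezoid approximation of the one-sided Laplace integral on [0, t_max].
    t = np.linspace(0.0, t_max, n)
    vals = x(t) * np.exp(-s * t)
    dt = t[1] - t[0]
    return (vals[0] / 2 + vals[1:-1].sum() + vals[-1] / 2) * dt

s = 1.0
U = laplace_num(lambda t: np.ones_like(t), s)  # unit step -> 1/s
R = laplace_num(lambda t: t, s)                # unit ramp -> 1/s^2
# Integration rule: ramp = integral of step, so R(s) = U(s) / s.
```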

Suppose $y(t) = x(at)$. What is $Y(s)$?

First, let’s rule out $a = 0$ because then we don’t have a function of time anymore–we’d be asking about the Laplace transform of $x(0)$, which is the Laplace transform of a constant, which we already know is $x(0)/s$. But let’s also rule out $a < 0$, because those values of $a$ not only compress or expand the time axis, but they swap all the function values for negative time with those for positive time. Yet $X(s)$ is itself only a function of $x(t)$ for $t \geq 0$. So we wouldn’t be able to say anything about the relationship between $Y(s)$ and $X(s)$ if $a < 0$. That leaves the still-considerable set of $a$ that are real numbers greater than zero.

Let’s proceed by evaluating the Laplace integral (5),

$$Y(s) = \int_{0^-}^{\infty} x(at) e^{-st} \, dt.$$

Let’s do a substitution for the variable of integration: $v = at$, so that $t = v/a$ and $dt = dv/a$. This substitution leads to

$$Y(s) = \frac{1}{a} \int_{0^-}^{\infty} x(v) e^{-(s/a)v} \, dv.$$

The final result is, in our compact notation,

$$x(at) \Leftrightarrow \frac{1}{a} X\left(\frac{s}{a}\right).$$

We are interested here in the transform of $t x(t)$, but let’s guess at the answer first and then work backward to verify. We know that multiplying $X(s)$ by $s$ relates to the derivative of $x(t)$, so if there is any significant duality between time and complex frequency in the Laplace transform, we might guess that the transform of $t x(t)$ is related to the derivative of $X(s)$. And since the Laplace transform and Fourier transform are closely related, and the Fourier transform does possess duality, we have good reason to make this guess. Let’s check by differentiating $X(s)$,

$$\frac{d}{ds} X(s) = \int_{0^-}^{\infty} x(t) \frac{d}{ds} e^{-st} \, dt = -\int_{0^-}^{\infty} t x(t) e^{-st} \, dt.$$

Apart from the negative sign, the guess is verified,

$$t x(t) \Leftrightarrow -\frac{d}{ds} X(s).$$
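A quick numerical check of the multiplication-by-$t$ rule, comparing the transform of $t e^{-at}$ against both the closed form $1/(s+a)^2$ and a finite-difference estimate of $-\frac{d}{ds}X(s)$ (a sketch; the step sizes are arbitrary choices):

```python
import numpy as np

def laplace_num(x, s, t_max=60.0, n=300_000):
    # Trapezoid approximation of the one-sided Laplace integral on [0, t_max].
    t = np.linspace(0.0, t_max, n)
    vals = x(t) * np.exp(-s * t)
    dt = t[1] - t[0]
    return (vals[0] / 2 + vals[1:-1].sum() + vals[-1] / 2) * dt

a, s = 2.0, 1.0
# Left side: L[t exp(-a t)] computed directly.
lhs = laplace_num(lambda t: t * np.exp(-a * t), s)
# Right side: -dX/ds for X(s) = 1/(s + a), via a central finite difference.
h = 1e-4
rhs = -(1.0 / (s + h + a) - 1.0 / (s - h + a)) / (2 * h)
closed_form = 1.0 / (s + a) ** 2
```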

A delayed version of $x(t)$ is $x(t - t_0)$ with $t_0 > 0$. This delay pushes the signal forward in time (to the right along the time axis). What is the Laplace transform of the delayed signal in terms of the known transform $X(s)$ of the original signal $x(t)$? (You might guess based on the behavior of the Fourier series and transform for delayed signals.)

We have to be a little careful about delaying $x(t)$ here because it may be nonzero for negative time, and when $t_0 > 0$, some of the function defined for negative time shifts into positive time, yet none of that portion of $x(t)$ was used to find $X(s)$.

So what we want to consider is $x(t)u(t)$, ensuring that the function is zero for all negative time, and its delayed version $x(t - t_0)u(t - t_0)$. Otherwise, if we want to deal with the negative-time portion of $x(t)$, we can use the two-sided Laplace transform.

We’ll proceed directly to the definition (5),

$$\mathcal{L}[x(t - t_0)u(t - t_0)] = \int_{0^-}^{\infty} x(t - t_0) u(t - t_0) e^{-st} \, dt.$$

We require a change of variables, $v = t - t_0$, so that $t = v + t_0$ and $dt = dv$. Applying this change of variables leads to

$$\mathcal{L}[x(t - t_0)u(t - t_0)] = e^{-s t_0} \int_{-t_0}^{\infty} x(v) u(v) e^{-sv} \, dv.$$

Since $t_0 > 0$, and the integrand is zero for $v < 0$, we have

$$\mathcal{L}[x(t - t_0)u(t - t_0)] = e^{-s t_0} \int_{0^-}^{\infty} x(v) e^{-sv} \, dv,$$

or

$$x(t - t_0)u(t - t_0) \Leftrightarrow e^{-s t_0} X(s).$$
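The delay rule is easy to corroborate numerically with a causal exponential delayed by $t_0$ (a sketch; all parameter values are arbitrary choices):

```python
import numpy as np

def laplace_num(x, s, t_max=60.0, n=300_000):
    # Trapezoid approximation of the one-sided Laplace integral on [0, t_max].
    t = np.linspace(0.0, t_max, n)
    vals = x(t) * np.exp(-s * t)
    dt = t[1] - t[0]
    return (vals[0] / 2 + vals[1:-1].sum() + vals[-1] / 2) * dt

t0, s = 2.0, 0.8
# Delayed causal exponential: exp(-(t - t0)) u(t - t0).
delayed = lambda t: np.where(t >= t0, np.exp(-(t - t0)), 0.0)
lhs = laplace_num(delayed, s)
# Delay rule: e^{-s t0} X(s), with X(s) = 1/(s + 1) for exp(-t) u(t).
rhs = np.exp(-s * t0) / (s + 1.0)
```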

If we shift a Laplace transform by some amount $s_0$, as in $X(s - s_0)$, what is the corresponding time function? If we had an easy-to-evaluate inverse transform, we could apply it here. But we’ve avoided introducing the inverse Laplace transform so far (for good reason), so let’s once again take a guess, and see if that leads to easy analysis.

We know that a shift in frequency for the Fourier transform is a multiplication of the time waveform by a complex exponential,

$$x(t) e^{i 2 \pi f_0 t} \Leftrightarrow X(f - f_0),$$

so we can guess that multiplication of the time waveform by something like $e^{s_0 t}$ will produce a shifted version of the Laplace transform. Let’s work it out,

$$\mathcal{L}[x(t) e^{s_0 t}] = \int_{0^-}^{\infty} x(t) e^{s_0 t} e^{-st} \, dt = \int_{0^-}^{\infty} x(t) e^{-(s - s_0)t} \, dt,$$

which implies the desired result

$$x(t) e^{s_0 t} \Leftrightarrow X(s - s_0).$$

Suppose we have two causal signals $x(t)$ and $y(t)$. Then their (normal) convolution is also causal in that

$$z(t) = \int_{-\infty}^{\infty} x(u) y(t - u) \, du = \int_{0}^{t} x(u) y(t - u) \, du, \quad z(t) = 0 \textrm{ for } t < 0.$$

So all three signals are of the usual sort we are dealing with as we study the one-sided Laplace transform.

What is $Z(s)$? Again, we can make a very good guess by reflecting on the convolution theorem: $Z(s) = X(s) Y(s)$. I’ll leave the proof as an exercise for the interested reader.
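For the skeptical reader, here is a numerical corroboration of the convolution property using two causal exponentials and a discrete approximation of their convolution (a sketch; the grid sizes and tolerances are our own choices):

```python
import numpy as np

def laplace_num(vals, t, s):
    # Trapezoid approximation of the one-sided Laplace integral from samples.
    w = vals * np.exp(-s * t)
    dt = t[1] - t[0]
    return (w[0] / 2 + w[1:-1].sum() + w[-1] / 2) * dt

n, t_max = 8_000, 40.0
t = np.linspace(0.0, t_max, n)
dt = t[1] - t[0]
x = np.exp(-1.0 * t)              # X(s) = 1/(s + 1)
y = np.exp(-2.0 * t)              # Y(s) = 1/(s + 2)
z = np.convolve(x, y)[:n] * dt    # causal convolution, truncated to [0, t_max]

s = 0.5
X, Y, Z = laplace_num(x, t, s), laplace_num(y, t, s), laplace_num(z, t, s)
# Convolution theorem: Z(s) should match X(s) * Y(s).
```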

Let’s look at the case of a generic periodic signal. Let the period of the signal be $T_0$. Then the defining characteristic of the periodic signal $p(t)$ is that $p(t + T_0) = p(t)$ for all real numbers $t$.

Periodic signals can be written in a lot of equivalent ways because all that is required is that the function over some interval of length $T_0$ is replicated on every other such interval. Consider the periodic rectangular pulse train shown in Figure 5.

Recalling that the function $\textrm{rect}(t)$ is defined as being equal to one on $[-1/2, 1/2]$ and zero otherwise, it is natural to express the pulse train in Figure 5 as a sum of shifted and scaled $\textrm{rect}$ functions.

But for the Laplace transform we’re considering here, we only care about the function for non-negative times $t \geq 0$. We can express the signal in a Laplace-transform friendly way by using the Base Period $p_0(t)$ shown in Figure 5.

Then the entire signal can be expressed as

$$p(t) = \sum_{k=-\infty}^{\infty} p_0(t - k T_0),$$

and therefore the positive-time portion of $p(t)$ is easily expressed as the truncated sum

$$p_+(t) = \sum_{k=0}^{\infty} p_0(t - k T_0).$$

To find the transform here, we can invoke the established linearity and time-delay properties of the transform to yield

$$P_+(s) = \sum_{k=0}^{\infty} e^{-k T_0 s} P_0(s),$$

where $P_0(s) = \mathcal{L}[p_0(t)]$.

Now let’s derive some Laplace transforms for some simple signals that we frequently encounter in signal analysis, such as the unit-step function $u(t)$, the ramp function $r(t)$, the trigonometric functions, exponentials with real exponents, exponentials with imaginary exponents (sine waves), the rectangular pulse train, etc.

Let’s start with $x(t) = \delta(t)$, where $\delta(t)$ is the impulse function or Dirac’s delta function. Applying the Laplace transform definition directly gives the answer in short order, due to the sifting property of the impulse function and the fact that it integrates to unity,

$$\mathcal{L}[\delta(t)] = \int_{0^-}^{\infty} \delta(t) e^{-st} \, dt = 1.$$

The unit-step function $u(t)$ is zero for all negative time and one for positive time and is variously defined at $t = 0$,

$$u(t) = \begin{cases} 1, & t > 0, \\ 0, & t < 0. \end{cases}$$

For $x(t) = u(t)$, we have

$$U(s) = \int_{0^-}^{\infty} u(t) e^{-st} \, dt = \left. \frac{-e^{-st}}{s} \right|_{0^-}^{\infty}.$$

Now, if $\sigma > 0$, then $e^{-st} \rightarrow 0$ as $t \rightarrow \infty$ so that

$$u(t) \Leftrightarrow \frac{1}{s}.$$

Alternatively, we can observe that the unit-step function is the integral of the impulse function,

$$u(t) = \int_{0^-}^{t} \delta(v) \, dv,$$

and apply the integration formula to obtain

$$U(s) = \frac{\mathcal{L}[\delta(t)]}{s} = \frac{1}{s}.$$

The unit-slope ramp function is defined as

$$r(t) = t u(t),$$

which is also equal to

$$r(t) = \int_{0^-}^{t} u(v) \, dv.$$

So we can use the integration rule derived above to immediately find

$$R(s) = \frac{U(s)}{s} = \frac{1}{s^2}.$$

Alternatively, we can use the multiplication-by-$t$ rule above, since we have $r(t) = t u(t)$ and we know $U(s) = 1/s$,

$$R(s) = -\frac{d}{ds} U(s) = -\frac{d}{ds} \frac{1}{s},$$

and

$$R(s) = \frac{1}{s^2},$$

as before in (67).

Here $x(t) = e^{-at} u(t)$, where $a$ is a real number. If $a < 0$, the exponential grows without bound as $t$ increases. If $a > 0$, the exponential approaches zero from above as $t$ increases. If $a = 0$, we have the unit-step function again. Let’s plug this exponential into the Laplace integral and turn the crank,

$$X(s) = \int_{0^-}^{\infty} e^{-at} e^{-st} \, dt = \left. \frac{-e^{-(s+a)t}}{s+a} \right|_{0^-}^{\infty}.$$

If $\sigma > -a$, then $e^{-(s+a)t} \rightarrow 0$ as $t \rightarrow \infty$, which means we can evaluate the upper and lower limits as

$$X(s) = 0 - \frac{-1}{s+a},$$

so that

$$e^{-at} u(t) \Leftrightarrow \frac{1}{s+a}.$$

Next let’s consider the exponential with an imaginary exponent, $x(t) = e^{i 2 \pi f_0 t} u(t)$, which is a complex sine wave (use Euler’s Formula). Let’s go through it the same way as for the previous exponential,

$$X(s) = \int_{0^-}^{\infty} e^{i 2 \pi f_0 t} e^{-st} \, dt = \left. \frac{-e^{-(s - i 2 \pi f_0)t}}{s - i 2 \pi f_0} \right|_{0^-}^{\infty}.$$

If $\sigma > 0$, then $e^{-(s - i 2 \pi f_0)t} \rightarrow 0$ as $t \rightarrow \infty$, so that the evaluated integral is

$$e^{i 2 \pi f_0 t} u(t) \Leftrightarrow \frac{1}{s - i 2 \pi f_0}.$$

We can observe that the Laplace transform for the exponential $e^{\beta t} u(t)$ is $1/(s - \beta)$ whether $\beta$ is real or imaginary. Since the region of convergence here does not include the $i\omega$ axis, the formula (77) cannot be used to determine the formula for the Fourier transform of the complex exponential, which we know is an impulse function centered at $f = f_0$.

Let’s find the Laplace transforms of $\sin(2 \pi f_0 t) u(t)$ and $\cos(2 \pi f_0 t) u(t)$.

Here we consider $x(t) = \sin(2 \pi f_0 t) u(t)$, the real-valued sine wave with frequency $f_0$ (period of $1/f_0$, assuming $f_0 \neq 0$). Since we already know the Laplace transform for the complex sine wave and we know that the real sine wave is easily expressed as the sum of two complex sine waves,

$$\sin(2 \pi f_0 t) = \frac{e^{i 2 \pi f_0 t} - e^{-i 2 \pi f_0 t}}{2i},$$

we can apply the linearity property of the transform to quickly obtain the result. We have

$$X(s) = \frac{1}{2i}\left[\frac{1}{s - i 2 \pi f_0} - \frac{1}{s + i 2 \pi f_0}\right].$$

Therefore

$$\sin(2 \pi f_0 t) u(t) \Leftrightarrow \frac{2 \pi f_0}{s^2 + (2 \pi f_0)^2}.$$

For $x(t) = \cos(2 \pi f_0 t) u(t)$, we have at least three options for finding $X(s)$: (1) direct evaluation of the Laplace integral (as we did for the sine); (2) using the derivative rule, since the cosine is proportional to the derivative of the sine; (3) using the integration rule, since the sine is proportional to the running integral of the cosine.

To use the derivative rule, which is $\frac{dx}{dt} \Leftrightarrow s X(s) - x(0^-)$, we realize that

$$\cos(2 \pi f_0 t) = \frac{1}{2 \pi f_0} \frac{d}{dt} \sin(2 \pi f_0 t),$$

so

$$\mathcal{L}[\cos(2 \pi f_0 t) u(t)] = \frac{1}{2 \pi f_0}\left[ s \cdot \frac{2 \pi f_0}{s^2 + (2 \pi f_0)^2} - 0 \right].$$

We then have the desired result,

$$\cos(2 \pi f_0 t) u(t) \Leftrightarrow \frac{s}{s^2 + (2 \pi f_0)^2}.$$
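Both trigonometric transforms check out numerically (a sketch, with $2 \pi f_0$ folded into a single constant `w0`; all values are arbitrary choices):

```python
import numpy as np

def laplace_num(x, s, t_max=120.0, n=500_000):
    # Trapezoid approximation of the one-sided Laplace integral on [0, t_max].
    t = np.linspace(0.0, t_max, n)
    vals = x(t) * np.exp(-s * t)
    dt = t[1] - t[0]
    return (vals[0] / 2 + vals[1:-1].sum() + vals[-1] / 2) * dt

w0, s = 3.0, 0.6   # w0 plays the role of 2*pi*f0
S = laplace_num(lambda t: np.sin(w0 * t), s)
C = laplace_num(lambda t: np.cos(w0 * t), s)
# Expected: S = w0 / (s^2 + w0^2), C = s / (s^2 + w0^2).
```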

We use periodic pulse trains with various pulse shapes in different parts of signal processing and radio-frequency communication theory and practice. We’ve already encountered rectangular-pulse pulse trains in our study of signals, their representations, the Fourier series, and the Fourier transform. Closer to home, the rectangular-pulse BPSK signal can be viewed as a rectangular pulse train where each pulse is multiplied by, randomly, a $+1$ or a $-1$.

So let’s continue with that level of analysis. We’ll first want to know the Laplace transform of a simple positive-time rectangle, as seen in Figure 6.

The transform of the rectangle $x(t)$ in Figure 6 is straightforwardly computed by applying the Laplace integral, but it is convenient to use previously established results. In particular, this rectangle is easily expressed as the difference between two unit-step functions,

$$x(t) = u(t) - u(t - T).$$

Since $u(t) \Leftrightarrow 1/s$ and $u(t - T) \Leftrightarrow e^{-sT}/s$, we immediately obtain the result

$$x(t) \Leftrightarrow \frac{1 - e^{-sT}}{s}.$$

Each of the transforms of the two unit-step functions implies a region of convergence of $\sigma > 0$. But if we directly apply the transform definition we obtain

$$X(s) = \int_{0^-}^{T} e^{-st} \, dt = \frac{1 - e^{-sT}}{s},$$

and there is no restriction on $\sigma$ here, so that the region of convergence includes $\sigma = 0$. Therefore we can check whether this transform reduces to the known Fourier transform of the rectangle when $s = i 2 \pi f$ in (92) (or (89)). We obtain

$$X(f) = \frac{1 - e^{-i 2 \pi f T}}{i 2 \pi f} = T e^{-i \pi f T} \, \textrm{sinc}(fT),$$

which is indeed the Fourier transform of a $T/2$-shifted rectangle with width $T$ and height one.

Recall that the convolution of a rectangle with itself is a triangle. The triangle shown in Figure 7 is in fact the convolution of the rectangle in Figure 6 with itself if $T$ in Figure 6 is replaced by $T/2$.

We can write down the equations for the two lines making up the triangle and put that expression in the Laplace integral, or we can write it as the convolution of a rectangle with itself (and a scaling factor) and employ the convolution relation. We have the expression

$$y(t) = x(t) \otimes x(t).$$

Since $x(t) \Leftrightarrow (1 - e^{-sT/2})/s$, we have

$$Y(s) = \left(\frac{1 - e^{-sT/2}}{s}\right)^2.$$

Now let’s look at the asymmetrical rectangular pulse train shown in Figure 8. Note that this is a shifted (time-delayed) version of the symmetrical pulse train shown in Figure 5.

We can express this as an infinite sum of shifted rectangles,

$$y(t) = \sum_{k=0}^{\infty} x(t - k T_0),$$

where $x(t)$ is the single rectangle of Figure 6 and $T_0$ is the pulse-repetition period.

Now, we know the transform of each and every rectangle in that sum,

$$x(t - k T_0) \Leftrightarrow e^{-k T_0 s} \left(\frac{1 - e^{-sT}}{s}\right).$$

Adding them all up yields

$$Y(s) = \frac{1 - e^{-sT}}{s} \sum_{k=0}^{\infty} e^{-k T_0 s}.$$

What is the region of convergence for this Laplace transform? The region of convergence for each transformed rectangle is the entire complex plane (any value of $s$), but we are adding up an infinite number of phase-shifted transforms, so the convergence depends on that sum too.

We need to understand the condition on $s$ for the infinite sum

$$\sum_{k=0}^{\infty} e^{-k T_0 s}$$

to converge. Recall the geometric series formula

$$\sum_{k=0}^{\infty} r^k = \frac{1}{1 - r}, \quad |r| < 1.$$

Here $r = e^{-T_0 s}$. For $\sigma > 0$, we have $|r| = e^{-T_0 \sigma} < 1$, and in this case, the transform converges.
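The convergence condition is easy to see directly: with $\sigma > 0$ the ratio $r = e^{-T_0 s}$ has magnitude less than one, so the partial sums of the series approach the closed form. A sketch with arbitrary parameter choices:

```python
import numpy as np

T0 = 1.0
s = 0.4 + 2.0j            # sigma = 0.4 > 0, so |exp(-s T0)| = exp(-0.4) < 1
r = np.exp(-s * T0)
partial = sum(r ** k for k in range(200))   # truncated geometric series
closed = 1.0 / (1.0 - r)                    # closed form, valid for |r| < 1
```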

Finally, let’s look at the symmetric pulse train shown in Figure 5, and replicated here in Figure 9.

We need to represent the positive-time portion of this function. There are an infinite number of identical rectangles with centers $t = k T_0$ for $k \geq 1$, plus one rectangle with half the width of the others (the positive-time half of the rectangle centered at $t = 0$). We can use the $\textrm{rect}$ function here.

The transform follows easily,

The region of convergence is $\sigma > 0$ for the same reasons as outlined in the case of the asymmetric rectangular pulse train.

The inverse Laplace transform is not as simple as the inverse Fourier transform, which is itself scarcely different from the forward Fourier transform. Here we must undertake contour integration if we want to directly evaluate the inverse Laplace transform. The formula is

$$x(t) = \frac{1}{2 \pi i} \int_{c - i \infty}^{c + i \infty} X(s) e^{st} \, ds.$$

The constant $c$ is any real number in the region of convergence. In practice, such as in control theory or lumped-circuit analysis, the direct computation of the inverse Laplace transform is not common. Instead, the Laplace transform expression is manipulated into a form consisting of known transforms and the full inverse transform is then effectively determined by table lookup and combination due to linearity. We’ll see an example of that shortly.

The Laplace transform is most often used in control problems and in analysis of differential equations governing lumped-parameter circuits (resistor/capacitor/inductor) or other dynamical energetic systems. We will soon progress to the *Z transform* in the SPTK posts, which is essentially the Laplace transform for discrete time, and is commonly applied in digital (discrete-time) control and communication-system problems. In those cases, difference equations (rather than differential equations) are of interest and the Z transform is the right tool.

Let’s just give a taste of why the Laplace transform is an excellent tool for solving differential equations. The idea is that complicated differential equations are transformed into relatively simple sets of polynomial equations, which can be more readily solved. The desired time-domain solution can then be had by inverse Laplace transforming the $s$-domain solution.

Consider the second-order differential equation given by

$$a \frac{d^2 y(t)}{dt^2} + b \frac{dy(t)}{dt} + c y(t) = d, \quad t \geq 0.$$

What is $y(t)$, given that we know the four constants $\{a, b, c, d\}$ and the initial conditions $y(0^-)$ and $y^\prime(0^-)$? Transforming the equation, we obtain the following function of $s$,

$$a \left[ s^2 Y(s) - s y(0^-) - y^\prime(0^-) \right] + b \left[ s Y(s) - y(0^-) \right] + c Y(s) = \frac{d}{s}.$$

Gathering terms leads to

$$Y(s) \left[ a s^2 + b s + c \right] = \alpha s + \beta + \frac{\gamma}{s},$$

where $\alpha = a y(0^-)$, $\beta = a y^\prime(0^-) + b y(0^-)$, and $\gamma = d$. We can solve for $Y(s)$ easily using algebra,

$$Y(s) = \frac{\alpha s^2 + \beta s + \gamma}{s \left( a s^2 + b s + c \right)}.$$

We see that $Y(s)$ is a *rational function*–a fraction with polynomials in the numerator and denominator. We need to express this rational function in terms of the kinds of functions that we already know are Laplace transforms, such as $1/(s+a)$. Fortunately, such rational functions as (111) can be expressed as a sum of simpler rational functions. That is, we can factor the denominator and then express the function as the weighted sum of terms with each factor in the denominator:

$$Y(s) = \sum_{k=1}^{n} \frac{c_k}{s - p_k},$$

where $n$ is the degree of the denominator polynomial. Things get a bit complicated when the $p_k$ are not all unique–let’s assume they are though.

Returning to (111), we seek

For , consider ,

which must be true for , so we have . Similarly, by considering , and evaluating at , we obtain . Finally, .

We can evaluate the inverse transform because we can inverse transform each term in the new expression for $Y(s)$,

Not much!
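To make the table-lookup idea concrete, here is a hypothetical example (not the equation from this post): $Y(s) = 1/((s+1)(s+2))$ expands by partial fractions to $1/(s+1) - 1/(s+2)$, so the transform table gives $y(t) = e^{-t} - e^{-2t}$. The forward transform of that candidate, computed numerically, should then match $Y(s)$:

```python
import numpy as np

def laplace_num(x, s, t_max=60.0, n=300_000):
    # Trapezoid approximation of the one-sided Laplace integral on [0, t_max].
    t = np.linspace(0.0, t_max, n)
    vals = x(t) * np.exp(-s * t)
    dt = t[1] - t[0]
    return (vals[0] / 2 + vals[1:-1].sum() + vals[-1] / 2) * dt

# Hypothetical rational transform and its partial-fraction inverse.
s = 0.9
y = lambda t: np.exp(-t) - np.exp(-2.0 * t)  # table lookup of 1/(s+1) - 1/(s+2)
lhs = laplace_num(y, s)
rhs = 1.0 / ((s + 1.0) * (s + 2.0))
```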

Previous SPTK Post: MATLAB’s resample.m Next SPTK Post: TBD

My conclusion is that the DeepSig datasets are as flawed as the DeepSig papers–it was the highly flawed nature of the *papers* that got me started down the critical-review path in the first place.

A reader recently alerted me to a change in the Datasets page at deepsig.ai that may indicate they are listening to critics. Let’s take a look and see if there is anything more to say.

Here is the updated page at deepsig.ai/datasets:

We see that there are “known errata” but that the datasets are still available for download, as ever. However, each one is now called a “Historical” dataset. And it is true that those datasets (the final one includes the hoary string ‘2018’) are ancient, old news, superannuated. In fact, they all come from that distant, hazy, innocent era known as “Before GPT,” which we’ll just call BGPT. If there are any old-school researchers that care about BGPT material, DeepSig is kindly keeping the flame alive. Fine.

But … there is no mention of the nature of the errata (errors). Typically people use the word *errata* to denote errors such as omissions or typographical errors rather than major conceptual errors or massive programming errors, or at least they did BGPT. Those latter errors are more clearly referred to as *flaws* and *bugs*, respectively.

The main point is that we get the “mistakes were made” admission but the vibe is “here is the error-filled material anyway, find the mistakes yourself if you care about that sort of historical, merely academic, thing.” Caveat emptor! I wouldn’t, actually, care much about this, except for the fact that lots of people have used this data to make many many many grandiose claims about ML-based modulation-recognition performance as well as relative claims about “the signal-processing state of the art” (about which they know nothing). Remember, this is the sum total of the higher-order moment mathematics put forth in O’Shea’s The Literature [R138]:

Regarding all those learners and their claims, a simple Google Scholar search reveals The Literature [R138] is cited by at least 1078 papers. (I feel like I’ve had to slog through half of those myself.)

So does DeepSig care about those 1078 researchers (really a couple thousand, since hardly any papers are single-author papers)? What about all the other researchers, students, and practicing engineers who read *those* papers and came away with certain rosy conclusions about ML for MR?

Why not just tell us what the errors are?

Where is the link to the “known errata?”

(DeepSig: Feel free to use these: All BPSK Signals, More on DeepSig Datasets, 2018 RML, One Last Time.)

Why are DeepSig’s fellow machine learners being treated this way?

h/t Steve F.

Here we want to look at more conventional forms of FSK. These signal types don’t necessarily have a continuous phase function. They are generally easier to demodulate and are more robust to noise and interference than the more complicated CPM signal types, but generally have much lower spectral efficiency. They are like the rectangular-pulse PSK of the FSK/CPM world. But they are still used.

Three distinct types of frequency-shift-keyed (FSK) signals are analyzed in this post. The analysis is directed at finding the set of potential cycle frequencies for each type of FSK signal for all orders and conjugation patterns by examining the cyclic temporal moment functions.

The FSK signals analyzed here are not constrained to exhibit a continuous phase function. The three types of signals arise from distinct models for the sequence of phase variables in the generic complex-envelope FSK signal model given by The Literature [R1]

where the frequency sequence is a sequence of IID random variables drawn from the $M$-ary set of tone frequencies.

The first type of FSK signal corresponds to an independent and identically distributed (IID) phase-variable sequence,

where the distribution is uniform on the interval $[0, 2\pi)$. Such an FSK signal is known as *incoherent FSK* (IFSK). The second type of FSK signal is known as *carrier-phase-coherent FSK* (CaPC FSK). For CaPC FSK, the phase sequence is dependent on the symbol index only through the value of the transmitted frequency,

Thus, for CaPC FSK, the signal consists of bursts of randomly selected fixed-phase oscillator outputs. The third type of FSK signal is called *clock-phase coherent FSK* (ClPC FSK), and it is formed by setting the phase of the oscillator to a constant that depends on the transmitted frequency each time that frequency is selected for transmission. Thus, the phase variables are given by

We analyze the three types of FSK separately next.

The complex envelope of the IFSK signal is given by

where the phase sequence is an IID sequence of continuous random phase variables with uniform distribution on $[0, 2\pi)$, and the frequency sequence is an IID sequence of equiprobable frequencies drawn from the $M$-ary set of tone frequencies.

The IFSK signal can be represented as a *random-pulse* complex-valued PAM signal by simple manipulation,

where

The moments of the symbol sequence are nonzero only for $n = 2m$, a result that follows easily from the properties of the phase sequence. It is also relatively easy to show that the moments of the symbol sequence are identical to those for .

Because the pulse function and the symbols are both random, the formulas for digital QAM cumulants presented in the DQAM post do not apply. Let’s try to find the moment functions for the signal. The $n$th-order temporal moment function is given by

The $n$th-order moment of the symbols is a little tricky to evaluate. Let’s express the product as a product of products, each term of which involves one value of the index. To do this, we employ the notion of partitions once again,

where

is the common value of for each , and no two values of are equal. This notation includes all possible selections of indices for the symbols, from all equal to some index () to all distinct (, ).

Because the symbols are independent, the moment is given by

For each expectation to be nonzero, we require that the order $n$ be even and that $m$ be equal to $n/2$, where $m$ is the number of conjugated factors in the $n$th-order moment. Thus, we require that $n = 2m$. The moment is given, therefore, by the following expression

The remaining analysis does not depend heavily on the particular set of indices that are chosen; a reasonable choice to focus on is the set in which all indices are equal: . If there are ways to partition the indices so that the resulting moment component is nonzero, then the moment function can be represented by the sum over these components,

Let’s assume that corresponds to the case in which all indices are equal and find the corresponding moment component .

So, we are left with evaluating the moment function for the random pulse,

This moment function is relatively easy to evaluate since the number of conjugations is equal to $m = n/2$. The result is given by

where

Thus, the component of the moment function corresponding to identical indices is given by

Note that this component is periodic in $t$ with period $T_0$; all other components possess this property as well. Therefore, the moment function for IFSK is periodic with period $T_0$ and is nonzero only for $n = 2m$. It follows that the cumulant function is also periodic with period $T_0$ and is nonzero only for $n = 2m$. In conclusion, the cycle frequencies for IFSK are limited to harmonics of the symbol rate $1/T_0$ for all even orders $n = 2m$, which is the desired result of the analysis.

For carrier-phase coherent FSK (CaPC FSK), the carrier phase variable depends only on the value of the transmitted frequency and not explicitly on the symbol index,

where the phase variable is equal to a fixed constant whenever the corresponding tone is selected. Thus, this kind of FSK modulator transmits a burst of the output of one of $M$ continuously running oscillators during each signaling interval.

We use straightforward analysis to find the temporal moment function for the CaPC FSK signal, which will allow us to determine the largest possible set of moment and cumulant cycle frequencies for the signal. The $n$th-order temporal moment function is given by

which, after some algebraic manipulation, can be expressed as

where

The random quantities are those that involve the random symbols, so that the expectation can be moved inside the sums. However, as we saw in the case of IFSK, the value of the expectation depends on the nature of the indices. For distinct indices, the expectation simplifies to

Notice that the expectation results in the $n$-fold product of the sum of sine waves. At the other extreme, the values of the indices are all equal, and the expectation simplifies to

which is the sum of sine waves with frequencies given by . The other possibilities for the indices also result in the presence of additive sine-wave components. In fact, the notion of partitions is again of use here. The expectation yields sine-wave components with frequencies given by

where can be any of the frequencies , and denotes a partition of the index set with elements.

Since the function is periodic in $t$ with period $T_0$ for any choice of the indices, the actual set of moment cycle frequencies is given by

This is a large set of cycle frequencies. To demonstrate this, and to corroborate the cycle frequencies with those in The Literature [R1], let us compute the cycle frequencies for order $n = 2$ for $M$-ary CaPC FSK.

For and , for all partitions, and the cycle frequencies are given by

For , the general formula applies,

Table 1 provides the cycle frequencies as a function of the partitions for the two values of $m$. The derived cycle frequencies herein match those in The Literature [R1] (pgs. 450–451) for the special case in which the numbers are integers (which is the only case of CaPC FSK explicitly considered in [R1]).

Partition | Number of Elements | (n, m) | Cycle Frequencies
---|---|---|---
{1,2} | 1 | (2,0) |
{{1}, {2}} | 2 | (2,0) |
{1,2} | 1 | (2,1) |
{{1}, {2}} | 2 | (2,1) |

In the third and final type of FSK signal, clock-phase coherent FSK (ClPC FSK),

the phase variable in the generic FSK model,

is reset at the beginning of each signaling interval such that the carrier phase for each transmitted tone is the same whenever that tone is transmitted. In other words, a specific segment of the oscillator output is transmitted each time the symbol is encountered. So, we transmit one of the following functions each signaling interval

for . Our complex-envelope signal then takes the form

which implies that the phase variable in the generic model is given by

The general case provides a little insight. We consider generic ary signaling,

where

The moment function is given by

The value of the expectation will depend on how the indices are chosen, as we have seen in cases of the other two FSK models. Here, however, the conjugation pattern is irrelevant and any choice of indices that does not result in a moment function of zero results in one that is periodic with period . For example, when all the indices are distinct, the expectation is given by (assuming independent symbols)

Thus, the component of the moment function due to distinct values of is given by

which is periodic in with period . All other index conditions can be expressed in terms of partitions of the index set . For each condition, the product of functions can be expressed as a product involving terms associated with a single value of the index. The expectation associated with a particular partition element is given by a product of expectations,

where is the number of elements of , and .

As in the case of distinct indices, each of the expectations in the general case results in a function that is periodic in with period . Therefore, the moment function is a sum of periodic functions, each with period , and is therefore periodic itself with period . Thus, the cycle frequencies are given by

potentially for all orders (not just even orders). The signal will contain discrete components if the average pulse has nonzero mean,

FSK signals exhibit a variety of cycle frequency patterns, that is, a variety of types of cycle frequencies as a function of order and number of conjugations .

For the incoherent FSK (IFSK) signal, the carrier phase is chosen at random for each signaling interval, which results in a random-pulse PAM signal with random complex-valued symbols distributed on the unit circle. The random symbols result in a relative paucity of cycle frequencies: symbol-rate harmonics for .
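The IFSK construction described above is easy to sketch in code. The following pure-Python illustration (hypothetical parameter names; tone frequencies and sample counts are illustrative, not those of the post's simulations) transmits one of the tones in each signaling interval with an independent, uniformly distributed carrier phase:

```python
import cmath, random

def ifsk(num_symbols, samples_per_symbol, tone_freqs, seed=0):
    """Complex envelope of incoherent FSK: in each signaling interval,
    transmit a randomly chosen tone from tone_freqs (normalized Hz)
    with an independent random carrier phase."""
    rng = random.Random(seed)
    s = []
    for _ in range(num_symbols):
        f = rng.choice(tone_freqs)                 # random symbol (tone index)
        phi = rng.uniform(0.0, 2.0 * cmath.pi)     # incoherent carrier phase
        for n in range(samples_per_symbol):
            s.append(cmath.exp(1j * (2.0 * cmath.pi * f * n + phi)))
    return s

sig = ifsk(num_symbols=50, samples_per_symbol=10, tone_freqs=[-0.05, 0.05])
```

The per-interval random phase is exactly what turns this into a random-pulse PAM signal with complex symbols on the unit circle, and hence what destroys the conjugate cycle frequencies.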

For the carrier-phase-coherent FSK (CaPC FSK) signal, the carrier phase in each signaling interval is determined by the phase of the chosen oscillator, which is free-running. The cycle frequencies are numerous (even more than for BPSK) and are given by (33) and (34). Examples include multiples of each of the tones, sums and differences of the tones, and these frequencies plus harmonics of the symbol rate. Odd-order cumulants can be nonzero and the location of the maxima of the cyclic cumulant functions depends on the values of the oscillator phases.

For the clock-phase-coherent FSK (ClPC FSK) signal, the carrier phase is reset in each signaling interval such that only one waveform is transmitted per tone; no symbol-generating oscillators are needed to implement this signaling scheme, only stored waveforms are needed. This FSK signaling scheme produces cycle frequencies similar to those for BPSK, except odd-order cyclic cumulants can be nonzero. The general form of the cycle frequency is .

In summary, only the IFSK signal produces a familiar cycle frequency pattern (QPSK-like). The remaining two FSK signal types produce a great many cycle frequencies and, perhaps more importantly, can exhibit nonzero odd-order cumulants.

Here we simulate the three different classes of FSK signals, apply blind cycle-frequency estimation using the SSCA, use the blindly detected cycle frequencies to estimate the corresponding spectral correlation functions, and finally plot these obtained functions in the usual CSP-Blog three-dimensional surface format.

The carrier frequency for all simulated signals is 0.1 (normalized Hz), the symbol rate is for all binary FSK signals (2FSK, ), for all quaternary FSK signals (), and for all FSK signals. The decreasing symbol rate ensures that the signals are adequately sampled with our default sampling rate of one. The signal power is always unity, and the noise power is , or dB. The signal-to-noise ratio is therefore high, which is desired when we are trying to understand the basic cyclostationarity of the signals.

The obtained spectral correlation plots are arranged in videos for convenience.

The three basic types are treated in the following subsections–and there is a bonus movie of the spectral correlation functions for continuous-phase modulation (CPM) as a preview of a future post on CPM and to provide a contrast with the form of the spectral correlation functions for their closely related FSK kin.

For the IFSK signal type, we look at three values of , which is the number of individual frequencies that are “visited” as the incoming bits are turned into symbols and modulated onto a carrier. We also vary the common separation between the frequencies ; we’ll call that separation . The usual style of specifying the separation is to report its quotient with the symbol rate, which leads to the product . The signals are generated for values of in the range .

From the analysis above, we expect to detect non-conjugate cycle frequencies and no conjugate cycle frequencies for IFSK. The obtained spectral correlation surfaces are shown in Video 1.

Note that the basic cycle-frequency pattern of incoherent FSK is more like that for rectangular-pulse QPSK (or, more generally, MPSK with ) than it is like square-root raised-cosine QPSK, which has only a single non-trivial non-conjugate cycle frequency.

For the CaPC FSK signal, we again vary the frequency deviation product between 0.5 and 1.5 and record the results in a video of spectral correlation surfaces, which is shown in Video 2.

Continuing in the same vein, the video of blindly determined spectral correlation surfaces for clock-phase coherent (ClPC) FSK is shown in Video 3. Like the CaPC FSK signal, and unlike the incoherent FSK signal, the ClPC FSK signal possesses strong conjugate cyclostationarity. Unlike the CaPC FSK signal, however, the ClPC FSK signal has a BPSK-like conjugate cycle-frequency pattern (which is simpler in general).

To show, as a preview of a future post on CPM, how different the spectral correlation surfaces for CPM can be compared to the three FSK signal types considered in the bulk of this post, the blindly determined spectral correlation function surfaces for a variety of CPM signals are shown in Video 4.

Here the relevant parameters are the alphabet size of the underlying pulse-amplitude-modulated (PAM) signal (similar to for the FSK signals above), the modulation index , and the response parameter . See My Papers [8] for a precise mathematical definition of CPM (or await the upcoming post), but the modulation index influences how large the swings in frequency are in response to the randomly varying symbols, and the response parameter specifies the temporal duration of the pulse function for the underlying PAM signal, which modulates the phase of the carrier wave.

When the pulse function is a rectangle, the CPM signal is typically referred to as continuous-phase frequency-shift keying (CPFSK), and otherwise it is typically called CPM. However, for the special case of and rectangular pulses, the signal is exactly minimum-shift keying (MSK), and for and Gaussian pulses, the signal is Gaussian MSK (GMSK) as used in GSM for example. The string ‘LRC’ refers to a raised-cosine pulse function.
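To make the CPFSK/MSK relationship concrete, here is a minimal pure-Python sketch of the rectangular-pulse (full-response) case (illustrative parameter names; see My Papers [8] for the precise CPM definition): the phase accumulates continuously across symbol boundaries, and the per-symbol phase change is pi times the modulation index times the symbol value.

```python
import cmath, random

def cpfsk(symbols, sps, h):
    """Complex envelope of CPFSK: rectangular frequency pulse of one symbol
    duration, modulation index h, sps samples per symbol. The phase is a
    continuous running sum, which is what makes the modulation 'CPM'."""
    phase = 0.0
    s = []
    for a in symbols:
        dphi = cmath.pi * h * a / sps   # per-sample phase increment
        for _ in range(sps):
            phase += dphi
            s.append(cmath.exp(1j * phase))
    return s

rng = random.Random(1)
syms = [rng.choice([-1, 1]) for _ in range(64)]
sig = cpfsk(syms, sps=8, h=0.5)   # h = 1/2, rectangular pulse: MSK
```

With h = 1/2 and binary symbols, each interval advances the phase by plus or minus ninety degrees, which is the MSK special case named above; Gaussian-filtered pulses in place of the rectangle would give GMSK.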

Things look good, right? I mean, just by eyeballing the surfaces in the videos, and knowing the key parameters of and , we can see that the cycle frequencies are often just simple harmonics of (non-conjugate) or offset harmonics (conjugate). But some of the surfaces are more complex than that.

How can we check our work?

We have three elements that need to cohere. The first is the mathematical models and analysis results, the second is the signal-simulation code, and the third is the cycle-frequency and spectral-correlation estimators. To check things, we need to see evidence that the cycle-frequency formulas match the blindly obtained cycle frequencies for the set of CSP-Blog simulated FSK signals. To do that, I’m going to plot the blindly obtained cycle frequencies on the x-axis and the corresponding maximum spectral correlation magnitude on the y-axis. Then I’ll mark the points on the x-axis that correspond to a numerical evaluation of the obtained cycle-frequency formulas.

A typical example (I’m not going to show them all–too tedious) is shown in the following figures for all three values of for and . **What we look for is the occurrence of a (significant) detected cycle frequency that is not a predicted one.**
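The comparison itself is easy to automate. A small Python sketch (hypothetical helper name) flags any detected cycle frequency that has no predicted cycle frequency within a tolerance:

```python
def unpredicted_cycle_freqs(detected, predicted, tol=1e-4):
    """Return the detected cycle frequencies with no predicted match
    within tol -- these are the ones that would indicate a problem."""
    return [a for a in detected
            if all(abs(a - p) > tol for p in predicted)]

# Example: predicted symbol-rate harmonics plus one spurious detection.
predicted = [k * 0.1 for k in range(-3, 4)]   # harmonics of rate 0.1
detected  = [0.0, 0.1, 0.2, 0.033]
print(unpredicted_cycle_freqs(detected, predicted))   # -> [0.033]
```

An empty returned list is the desired outcome: every significant detected cycle frequency is accounted for by the formulas.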

In the DeepSig RML datasets (here, here, here, and here), we see a reference to “CPFSK” as one of the included signal types. In The Literature [R187] we see a reference to “2FSK” as one of the included signal types. There are many other examples of this kind of signal description in datasets and in published papers. “We set out to perform automatic modulation recognition of BPSK, QPSK, MSK, and FSK,” or the like. We’ve already criticized the idea that there is just ‘one BPSK signal’ in the All BPSK Signals post. Things appear to be worse with regard to FSK and CPM. There are many choices, and the temporal, spectral, and cyclic properties of the resulting signal depend heavily on these choices. Therefore the adjusted weights in a neural network must be influenced by those properties and choices. A neural network trained on one choice will likely fail when presented with input signals corresponding to a different choice, although both choices are FSK.

Just which FSK or CPM signal are you talking about in your mod-rec work, and why?

]]>The CSP Blog is rolling along here in 2023!

March 2023 broke a record for pageviews in a calendar month with over 7,000 as of this writing early in the day on March 31.

Let’s note some other milestones and introduce a poll.

What a month! We’re at about 7,145 views right now, and the previous monthly record is 6,482.

2023 was the year that a CSP Blog post crossed the 20,000-view milestone: The Spectral Correlation Function. The Cyclic Autocorrelation Function is not far behind.

About 84,000 visitors have been counted over the years since the CSP Blog launched in 2015, with 5,500 this year already. I believe this is just a count of the unique IP addresses that have accessed a page. But the number of subscribers is only 198! You can subscribe (“Follow”) to the CSP Blog by entering an email address in the “Follow Blog via Email” box on the right edge of any viewed page, near the top of the page. You’ll get notified through that email address whenever there is a new post. CSP Blog readers cannot see that email address, just as they cannot see the email address associated with any comment, unless there is an associated gravatar.

I’m planning to have more time available to devote to improving and extending the CSP Blog over the next few months. If you want to have input into that process, consider voting in the poll below.

Thanks so much to all my readers!

]]>When I was a wee engineer, I’d sometimes clash with other engineers who sneered at technical approaches that didn’t set up a linear-algebraic optimization problem as the first step. Never mind that I’ve been relentlessly focused on single-sensor problems, rather than array-processing problems, so that the naturalness of the linear-algebraic mathematical setting was debatable. However, there were still ways to fashion matrices and compute those lovely eigenvalues. The real issue wasn’t the dimensionality of the data model; it was that I didn’t have a handy crank I could turn to pop out a provably optimal solution to the posed problem. Therefore I could be safely ignored. And if nobody could actually write down an optimization problem for, say, general radio-frequency scene analysis, then that problem just wasn’t worth pursuing.

Those critical engineers worship at the *altar of optimality*. Time for another rant.

It starts early on, in calculus. We learn to find the extremal points of a differentiable function soon after we learn about differentiation. Find the derivative! Set it equal to zero! Use algebra to solve! Lather, rinse, repeat.

Then in engineering courses, we develop physical models that produce functions, and we then apply the calculus tools to find their extremal values. These functions, unlike most in calculus classes, have *meanings* connected to more-or-less real-world problems. Now when we find the maximum of some function, we’re finding the maximum velocity, or when we find the minimum of another function, we’re finding the minimum cost. The optimal cost!

And we keep doing this in school, over and over. What’s the optimal solution to some posed problem? Find the mathematical model, turn the calculus crank, draw a box around the solution.

And sometimes this is actually workable in engineering settings. We may have a subproblem involving, say, finding an optimal filter. That is, we have a good description of the input to the filter (say, a deterministic Fourier-series representation, or a stochastic-process model), some constraint on the form of the filter (say, finite-impulse response), and an idea of the desired output (say, maximum signal-to-noise ratio). Then we may be able to find the best (optimal) such filter by applying our mathematical tools.

Simple systems are often functions of a single variable. To find the best solution, you might need only find the extremal values of such a function, and calculus is your friend.

When the system is even a little bit complex, it is not a function of a single variable. It is often an unknown function of many variables, some of which are known. Then how do we find the optimum solution? In some cases it is possible to write down a cost function, or error expression, or SNR formula, etc., that is a function of all the identified variables. Then it can be extremized (maximized or minimized, as the situation warrants) using mathematics or by numerical evaluation/search. The function one writes down is often called the *objective function*.

In many cases, though, there isn’t a single objective function, or it can’t be determined. So in these cases, is there actually an optimal solution or design? ‘Optimal’ is relative to the chosen objective function, and if there isn’t one, then, well, no there isn’t an optimal answer. What is there, though? Trade-offs. That’s what there is, and that’s what engineering is all about, ‘making trades’ as engineers say.

For example, what is the objective function to maximize for designing an electric car? What is the objective function for a fighter jet? Or a C compiler? What is the optimal car, jet, or compiler?

I’ve always been amused by the BMW advertising tagline: “BMW: The Ultimate Driving Machine.” But when I first heard it, I had already seen BMW models change multiple times in the past, and I knew that for sure there would be another set of *different* models next year. At any given time, though, BMW professes to have created the ultimate car. Er … driving machine. Anyway, for what possible meaning of ‘ultimate’ could this even be remotely true? None. There is no ultimate driving machine. There is no optimal car. There are just a whole bunch of tradeoffs that car designers can make in pursuit of a product that is appealing to enough paying customers that the company can make a profit by selling lots of them.

If BMW succeeded in creating the ultimate (optimal) driving machine, that would be the last model they produced. Why create any more? You’ve got the ultimate. Ultimate *means* final.

For BMW, the tradeoffs skew toward high horsepower (large values of acceleration) and luxurious interiors (leather, wood paneling, great sound, etc.). But for the Toyota Prius, the tradeoffs are quite different. Gas mileage and low carbon emissions are at the top of this list, not zero-to-sixty in six seconds.

Over the past three decades, I’ve noticed supplicants worshiping at the altar, clutching their likelihood-ratio rosaries and painting the world in drab colors of Gaussianity ocher, statistical independence ecru, and textbook-signal gray. They’ll twist their signal models any which way they can to appease the Gods of Optimality, even if the model becomes so divorced from reality or from any actual problem needing solving that it should be discarded like so much dusty doctrine. But! But the Holy Crank *can be turned* and the Optimal Solution can thereby be found, praise be to the Seven.

So we ended up with papers that present some optimal parameter estimator for some textbook signal in white Gaussian noise, and then the next year a paper with the exact same signal setup except now the noise is exponential, and the optimal estimator (slightly different, like the predictions of the Second Coming by two slightly different numerological cranks) is presented with a flourish and deep bow. The following year, another sect emerges, redoing the same estimator but now for a mixture model of the noise. And on and on …

For us heathens out in the fields, the optimal estimators and detectors were mostly useless. Not all! The Viterbi Algorithm is not useless. But that’s coding, really, not so much statistical signal processing. Our problem out in the laity was that the data that we use as input to the holy optimals did not quite correspond to the assumptions that let that oh-so-blessed crank turn. So they didn’t work well. Commonly this is referred to as an algorithm being ‘brittle.’ To complicate matters, the optimals were often too computationally expensive to use in many applications (like some CSP, I am oath-bound to admit) and so had to be approximated, sometimes rather roughly.

In modern times, the form of worship has changed. We still want to minimize or maximize *something*, but we don’t want to do it mathematically. We want to get a computer to do it for us. Don’t bother me with mathematical models of signals and systems, for those things are of the world, and we seek otherworldly performance. We gather and present our oblations, dutifully and reverently (if not carefully) labeled and packaged. Then we Run the Machine to minimize the labeling error. And, lo! The error is indeed minimized.

By the munificence of the Gods, the error is minimized whether the labels and oblations are in correspondence or not! And we need not concern ourselves with the details, for the sanctitude of the Machine will prevail.

But, like the pious optimality worshipers of old, the new clergy keep producing Machines that minimize the error but fail when used by the laity out in the field. We have brittleness here too, and we know not how to soften it. We know not how to make general our all-too-specific optimal Machine.

A schism is at hand! On the one side are those that strive mightily to improve, polish, tweak, and tinker with the Machine’s holiest of parts. If only we can read the mind of God, they think, we will find the ultimate machine that will provide the final optimality. On the other are those that want to rend the Machine, tearing at its very fabric and structure, until it becomes general, useful, and tame. Although perhaps no longer provably optimal.

End of Rant.

]]>In this brief Signal Processing Toolkit note, I warn you about relying on resample.m to increase the sampling rate of your data. It works fine a lot of the time, but when the signal has significant energy near the band edges, it does not.

I use MATLAB a lot for the CSP Blog because I think it produces nice graphics. I can make all kinds of plots with fine control over the details, and I can easily make videos (see here and here for examples) that show how functions or estimates change over time or over selections from some parameter set.

I also use MATLAB for my non-CSP-Blog signal-processing day-job work on some projects, but mostly I use C and gnuradio these days.

Sometimes I need to resample a discrete-time signal. Perhaps the signal was provided to me by someone else, or I generated it, or more likely it is derived from some other signal through something like spectral segmentation or other kinds of filtering. In any case, I’ve got some signal sampled at some physical sampling rate and I want to resample it so that the sample rate of the new signal is . One common reason for this is that I am going to subject the signal to a nonlinear operation, which generally increases bandwidth, and also can produce finite-strength additive sine-wave components. I don’t want the frequencies of those sine waves (the cycle frequencies of the signal) to alias, so I have to keep them small by increasing the sampling rate. Physically, the cycle frequencies don’t change! I’m just talking about the smallness relative to the sampling rate.
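Here is a toy pure-Python illustration of why I upsample before applying a nonlinearity (a sketch, not CSP-Blog production code, with a naive DFT for self-containment): squaring a tone at normalized frequency 0.3 produces a tone at 0.6, which aliases to -0.4 at the original unit sampling rate; generated at twice the rate, the squared tone sits safely at normalized 0.3, i.e., physical 0.6.

```python
import cmath

def peak_freq(x):
    """Normalized frequency in [-0.5, 0.5) of the largest-magnitude DFT bin
    (naive O(N^2) DFT for clarity)."""
    N = len(x)
    best_q, best_mag = 0, -1.0
    for q in range(N):
        X = sum(x[n] * cmath.exp(-2j * cmath.pi * q * n / N) for n in range(N))
        if abs(X) > best_mag:
            best_q, best_mag = q, abs(X)
    f = best_q / N
    return f - 1.0 if f >= 0.5 else f

N = 100
tone     = [cmath.exp(2j * cmath.pi * 0.30 * n) for n in range(N)]      # fs = 1
tone_up2 = [cmath.exp(2j * cmath.pi * 0.15 * n) for n in range(2 * N)]  # fs = 2

print(peak_freq([v * v for v in tone]))      # aliased: near -0.4
print(peak_freq([v * v for v in tone_up2]))  # unaliased: near 0.3 (physical 0.6)
```

The same thing happens to cycle frequencies produced by quadratic and higher-order lag products, which is exactly the situation described above.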

In MATLAB, I have a variable that is a complex vector having, say, samples. I want a new variable that is also a complex vector, but corresponds to a resampling of such that there are now plausible signal values inserted between each adjacent pair of elements of . This is also called interpolation, which is the opposite of decimation, where we wish to decrease the sampling rate, not increase it.

MATLAB provides a function called resample.m to do both kinds of sample-rate conversion (interpolation and decimation). Here is the usage message you get when you type ‘help resample’ in MATLAB version R2022b Rev 3:

```
>> help resample
resample Resample uniform or nonuniform data to a new fixed rate.
Y = resample(X,P,Q) resamples the values, X, of a uniformly sampled
signal at P/Q times the original sample rate using a polyphase
antialiasing filter. If X is a matrix, then resample treats each
column as an independent channel.
In its filtering process, resample assumes the samples at times before
and after the given samples in X are equal to zero. Thus large
deviations from zero at the end points of the sequence X can cause
inaccuracies in Y at its end points.
[Y,Ty] = resample(X,Tx) resamples the values, X, of a signal sampled at
the instants specified in vector Tx. resample interpolates X linearly
onto a vector of uniformly spaced instants, Ty, with the same endpoints
and number of samples as Tx. Tx may either be a numeric vector
expressed in seconds or a datetime object. NaNs and NaTs (for datetime
objects) are treated as missing data and are ignored.
[Y,Ty] = resample(X,Tx,Fs) uses interpolation and an anti-aliasing
filter to resample the signal at a uniform sample rate, Fs, expressed
in hertz (Hz).
[Y,Ty] = resample(X,Tx,Fs,P,Q) interpolates X to an intermediate
uniform grid with sample rate equal Q*Fs/P and filters the result
using UPFIRDN to upsample by P and downsample by Q. Specify P and Q
so that Q*Fs/P is at least twice as large as the highest frequency in the
input signal.
[Y,Ty] = resample(X,Tx,...,METHOD) specifies the interpolation method.
The default is linear interpolation. Available methods are:
'linear' - linear interpolation
'pchip' - shape-preserving piecewise cubic interpolation
'spline' - piecewise cubic spline interpolation
Y = resample(...,P,Q,N) uses a weighted sum of 2*N*max(1,Q/P) samples
of X to compute each sample of Y. The length of the FIR filter
resample applies is proportional to N; by increasing N you will get
better accuracy at the expense of a longer computation time.
resample uses N = 10 by default. If N = 0, resample performs
nearest neighbor interpolation: the output Y(n) is
X(round((n-1)*Q/P)+1), with Y(n) = 0 for round((n-1)*Q/P)+1 > length(X)).
Y = resample(...,P,Q,N,BTA) uses BTA as the BETA design parameter for
the Kaiser window used to design the filter. resample uses BTA = 5 if
you don't specify a value.
Y = resample(...,P,Q,B) uses B to filter X (after upsampling) if B is a
vector of filter coefficients. resample assumes B has odd length and
linear phase when compensating for the filter's delay; for even length
filters, the delay is overcompensated by 1/2 sample. For non-linear
phase filters consider using UPFIRDN.
[Y,B] = resample(X,P,Q,...) returns in B the coefficients of the filter
applied to X during the resampling process (after upsampling).
[Y,Ty,B] = resample(X,Tx,...) returns in B the coefficients of the
filter applied to X during the resampling process (after upsampling).
Y = resample(...,'Dimension',DIM) specifies the dimension, DIM,
along which to resample an N-D input array. If DIM is not specified,
resample operates along the first array dimension with size greater
than 1.
[Y,B] = resample(TT,...) resamples the data in timetable TT and returns
a timetable Y. TT must contain double-precision data and must have at
least two rows. Each variable in TT is treated as an independent
signal. If TT has an N-D array as a variable, then resample operates
along the first dimension. In other words, it treats columns as
channels. The RowTimes in TT may either be a duration vector or a
datetime object with unique and finite values. If RowTimes in TT are
not sorted, then resample sorts them in an ascending order. Non-finite
time values (NaNs and NaTs) are treated as missing data and are
ignored. You can replace X,Tx with a nonuniformly sampled timetable TT
in the above syntaxes that use X and Tx as inputs. You can replace X
with a uniformly sampled timetable TT in all the other syntaxes. Use
isregular to check if TT is uniformly sampled or not.
```

So the basic usage is to select P and Q, the over- and undersampling parameters, such that P/Q equals the desired rate-change factor. If you want a sampling rate that is ten times the input-signal rate, pick P = 10 and Q = 1. But you could also pick P = 20 and Q = 2.
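For non-integer rate-change factors, a rational approximation does the job of picking P and Q. A small Python sketch using only the standard library (illustrative helper name, not tied to MATLAB's internals):

```python
from fractions import Fraction

def rate_change_to_pq(factor, max_q=1000):
    """Approximate a resampling factor by P/Q with a bounded denominator."""
    frac = Fraction(factor).limit_denominator(max_q)
    return frac.numerator, frac.denominator

print(rate_change_to_pq(10))             # -> (10, 1)
print(rate_change_to_pq(0.48))           # -> (12, 25)
print(rate_change_to_pq(48000 / 44100))  # -> (160, 147), a classic audio case
```

Keeping the denominator bound modest keeps the internal polyphase filter bank in resample.m from becoming enormous.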

In this usage message, there is one warning. It concerns the edges of the input . If the signal is large, relative to zero, at its start or end, the endpoints of the output might exhibit larger errors than interior points in . There is no warning about the arrangement of the signal’s power as a function of frequency; no warnings about acceptable or problematic functional forms for power spectral densities. But the output is not as desired if the power spectrum has significant values near the band edges.

An alternative to MATLAB’s resample.m is to use a form of zero padding to provide interpolated samples between each pair of samples in the input . The recipe is to Fourier transform to obtain for , then prepend and append zeros to to form , and finally to inverse Fourier transform to obtain the resampled signal .

Let’s choose , where the integer . A typical choice might be or . Then the vector is defined by

so that the vector has total length ; it is times as long as and .

You can work out the interpolation effect by writing down the inverse discrete Fourier transform of –but remember this is an inverse transform of length , not .
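Here is a pure-Python sketch of that recipe (naive O(N^2) DFT for clarity; hypothetical helper names). For an integer interpolation factor I, pad the centered length-N spectrum with N(I-1)/2 zeros on each side, inverse transform at length IN, and rescale by I; the original samples then reappear exactly at every I-th output point.

```python
import cmath

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * q * n / N) for n in range(N))
            for q in range(N)]

def idft(X):
    M = len(X)
    return [sum(X[q] * cmath.exp(2j * cmath.pi * q * m / M) for q in range(M)) / M
            for m in range(M)]

def fft_interpolate(x, I):
    """Interpolate by integer factor I via frequency-domain zero padding.
    Assumes len(x) and I*len(x) are even, so the half-swap shift below
    is its own inverse."""
    N = len(x)
    X = dft(x)
    Xc = X[N // 2:] + X[:N // 2]        # 'fftshift': center the spectrum
    pad = [0.0] * (N * (I - 1) // 2)    # N*(I-1)/2 zeros per band edge
    Y = pad + Xc + pad                  # length M = I*N
    M = len(Y)
    Ys = Y[M // 2:] + Y[:M // 2]        # undo the shift (even length)
    return [I * v for v in idft(Ys)]    # rescale so y[I*k] == x[k]

x = [complex(n % 3, -n) for n in range(8)]  # arbitrary complex test vector
y = fft_interpolate(x, 4)                   # 32 samples; y[4*k] matches x[k]
```

The exact reappearance of the original samples holds for any input, because the padded inverse transform simply re-evaluates the original DFT synthesis on a finer time grid.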

To see why the FFT/IFFT method works intuitively, consider what the Fourier transform of a narrowband signal looks like. To do that, you need to grasp what *narrowband* means in that last sentence. Narrow relative to what? Relative to the sampling rate. Figure 1 shows a typical narrowband signal. The bandwidth of the signal (however you define it, within reason) is much smaller than the sampling rate of unity.

So if we pretend that our signal is a narrowband signal, we can imagine its Fourier transform will be an energetic central portion, surrounded by zeros on each side. By using this form of zero padding, we’re essentially forcing the transform to have the appearance it would have if it really were narrowband.

I consider two examples. In the first example, both resample.m and CSP-Blog FFT/IFFT work as desired to increase the sampling rate without introducing significant distortion. In the second, only the FFT/IFFT method works well.

The scenario consists of two simulated square-root raised-cosine BPSK signals and a small amount of noise. In the first example, the center frequencies of the two signals are . The symbol rates are both and the excess bandwidths are , so that each signal has an occupied bandwidth of . The number of samples is . A power spectrum estimate of the scene is shown in the upper plot of Figure 2.

The resample.m command is

`outdata = resample(indata, P, Q);`

where and . The FFT/IFFT code is

```
% Frequency-domain interpolation by zero padding (N = length(indata)).
% Note: as written, this padding count appears to give an output of
% length P*N only when Q = 1 and P is even.
pad_len = (floor((P-Q)/2)*N + N/2);           % zeros per band edge
X = fftshift(fft(indata)).';                  % centered spectrum, row vector
Y = [zeros(1, pad_len) X zeros(1, pad_len)];  % zero-pad both band edges
y = ifft(fftshift(Y));                        % back to the time domain
```

and the results are shown in the lower two plots in Figure 2. Everything is great!

In the next example, all variables are the same as in the first except the carrier frequencies are now instead of . A power spectrum estimate for this scene is shown in the upper plot of Figure 3.

The two resampling attempts use exactly the same code as before, and the results are shown in the lower two plots of Figure 3. In this case, there is extra energy, from aliased components of the signals, spectrally adjacent to the properly resampled signals. Because the power of the signal at is twice that of the signal at , you can see that the energy near the resampled BPSK signal at must be due to the signal at and vice versa.

The resampled signal with the aliases (Figure 3) will not give the same results under CSP estimators and detectors as the original signal or the properly resampled signal (Figure 2). That is, the introduction of the aliased signal components, having significant energy relative to the original signals, will give rise to additional cycle frequencies.

So just be careful with resample.m. MATLAB has other functions that can be used to resample signals, which may or may not work better for the band-edge case. Let me know in the Comments if you have experience getting around the resample.m problem.

Previous SPTK Post: Echo Detection Next SPTK Post: The Laplace Transform

]]>What I didn’t know going in was the level of frustration on the part of the machine-learner organizers regarding the seeming inability of signal-processing and machine-learning researchers to solve the radio-frequency scene analysis problem once and for all. The basic attitude was ‘if the image-processors can have the AlexNet image-recognition solution, and thereby abandon their decades-long attempt at developing serious mathematics-based image-processing theory and practice, why haven’t we solved the RFSA problem yet?’

The chief organizer went on to provide a potted history of modulation recognition research, name-checking me as having written one of the earliest papers (My Papers [17]) they were able to find. *So, still at it, eh Spooner? Looks like we need some fresh ideas here, cuz you’re not getting it done.*

I didn’t know it beforehand, but at that moment I realized I had sauntered into the lion’s den.

Now, I’m conflating modulation recognition with RFSA here, a bit, because most of the papers and research work over the years have addressed modulation recognition, whereas the challenge at the conference really was RFSA–multiple time-varying signals were to be transmitted across a bandwidth much wider than the bandwidth of any individual transmitted signal (I still have the data). The vast majority of published papers deal with a subproblem in RFSA called modulation recognition. RFSA is more than modulation recognition.

Half the challenge participants were machine learners and half signal processors. The challenge was poorly thought out and run, and there were so many problems with the transmission of the signals that nobody came out looking good. In fact, at conference close, the organizers did not announce a winner, or describe what happened, or mention the challenge at all. The organizers had vastly underestimated the difficulty of even their toy version of RFSA–which was completely under their control! Which brings us to the topic of the post: Why is RF scene analysis so difficult? Is it a *wicked problem*?

We know what RF scene analysis is, sort of, but what is a wicked problem? It is an interesting question because if RFSA is recognized as a peculiar problem, or some member of a known difficult-problem class, it would help people like the conference organizers, as well as all those in the machine-learning cult(ure), understand *why* progress is so slow and painful. And we all might become better researchers and problem-solvers (even us ~~antiquated~~ superannuated signal processors).

So what is a wicked problem? Let’s start where I often start, Wikipedia.

A wicked problem isn’t an evil problem–there’s no moral judgement of the problem’s nature. It is more like it is ‘wicked hard’ as they might say in Boston. The origin of the term and idea is in the area of social problems, the big problems of the world, such as poverty, homelessness, climate change, nuclear proliferation, etc., and in attempts at social (group) solutions to such problems. But it applies to other problems too.

The origin of the term is attributed to a pair of researchers, Rittel and Webber (The Literature [R185]). The characteristics of the Rittel and Webber wicked problem are, according to Wikipedia, shown here in Figure 1.

Another researcher, J. Conklin, generalized the idea to problems outside of social and policy problems (The Literature [R186]). According to Wikipedia, Conklin’s defining characteristics are as shown in Figure 2.

The Wikipedia authors then go on to attribute further aspects to wicked problems, which are pertinent here at the CSP Blog. These are listed next:

- *The solution depends on how the problem is framed and vice versa (i.e., the problem definition depends on the solution).*
- *Stakeholders have radically different world views and different frames for understanding the problem.*
- *The constraints that the problem is subject to and the resources needed to solve it change over time.*
- *The problem is never solved definitively.*

These problem attributes are somewhat general and also confusing or subject to widely divergent interpretations. We won’t look at every one in detail. One of the key aspects seems to be the inability to clearly define the problem. Another is that there is no real way to figure out when you’ve solved the problem (no ‘stopping rule’). In the next section, we look at some of these key wicked-problem attributes in the context of the RF scene-analysis problem.

**There is no definitive formulation of a wicked problem. The solution depends on how the problem is framed and the problem definition depends on the solution. The problem is not understood until after the formulation of a solution.**

Is there a definitive statement of the RFSA problem? Well, there is my formulation, which is ‘the cocktail party problem for radios.’ This formulation says we need to detect, characterize, geolocate, and classify all man-made RF signals in some particular frequency range. But what does *characterize* mean? What does *all* mean in *all man-made signals*? Does that include innocuous transients like garage-door openers? How weak is too weak to care about? 0 dB? -10 dB? -100 dB? What exactly is the frequency range? If we say there is a BPSK signal in the scene, is that enough? What about ‘a BPSK signal with square-root raised-cosine pulse-shaping function?’ Or ‘a BPSK signal with SRRC pulses and symbol rate of 250 kHz?’ Or ‘a BPSK signal with SRRC pulses, symbol rate of 250 kHz, carrier frequency of 790 MHz, symbol-clock phase of 10 s, carrier phase of 23.4 degrees, Reed-Solomon error-control coding with and , periodically transmitted synchronization burst with OOK modulation, …?’

What *is* the set of all man-made signals? That is actually a moving target since new signals are invented and deployed nearly continuously, and the rate at which that happens has increased since the advent of commodity programmable radios like the Ettus SDRs.

Then there are more recent attempts at moving away from conventional formulations of the modulation-recognition problem and toward the modern problem of generic RF scene analysis such as The Literature [R187], which is rather weird due to the application of image-processing transformations to the RF domain. That is, their conception of RF scenes involves applying typical image-processing transformations to RF-scene spectrograms. The point is that what they think of as an RF scene is markedly different from what I do in my RFSA formulation and RF signal-processing projects.

That the solution modifies the problem is evident as well. ‘*I’m going to use CSP*.’ So I define my RFSA problem as one involving cyclostationary signals–I don’t care about analog single-sideband or analog FM (stationary signals) because I know my solution won’t work there.

Or ‘*I’m going to train a massive convolutional neural network*,’ so I define my RFSA problem as involving extremely short captured-data segments so that I can train in a reasonable time. This formulation completely rules out approaches like CSP and also severely limits the minimum SNR for successful processing because there is little averaging gain.

Or my RFSA problem is related to a fixed band of frequencies such as just the HF band 3-30 MHz (a hard problem in its own right).

So I think this aspect of a wicked problem is exhibited by RFSA.

**Wicked problems have no stopping rule.**

The interpretation of this wicked-problem aspect is that the wicked-problem solver doesn’t know when success has been achieved. For the RFSA problem, this is most clear in terms of minimum achievable SNR. OK, so you can successfully detect, recognize, and characterize signals down to -5 dB inband SNR. Are we done? Well, what about signals at -10 dB? OK, got those. Well, what about -20? Etc. How weak is weak enough? It never stops.

And when do we stop adding signal types to our feature catalogs or to our training datasets? Never. It doesn’t stop because new signals, new variants, and new parameter settings happen continuously as humans attempt to make better and better use of the RF spectrum and also to make heavier and heavier use thereof.

**Solutions to wicked problems are not right or wrong, just better or worse.**

This definitely fits RFSA and modulation-recognition where we must always deal with randomness, noise, and deviations from signal-model ideality. We are forced to evaluate performance in terms of probabilities, such as the probability of detection, probability of false alarm, and probability of correct classification. Moreover, the signal-parameter estimates are almost never ‘right’ (or ‘wrong’, really), they are simply more accurate (better) or less accurate (worse).

**The constraints that the problem is subject to and the resources needed to solve it change over time.**

There are many constraints on the RFSA problem. Some come from the poser, who demands, say, that any answer be produced not more than seconds after the data is supplied to the system. The poser might also put constraints on the minimum values of the performance probabilities.

The world puts time-varying constraints on the problem by constantly introducing new signal types, by crowding frequency bands (creating adjacent-channel interference), by shrinking reuse distances (creating cochannel interference), and by introducing new propagation channels due to new modes of transportation or new kinds of urban areas.

And the constraints imposed by available computing machinery change over time too, often rapidly. By the time you’ve got a solution based on multiple CPUs, signal-processing-ready GPUs are available that make previously discarded computationally costly approaches tenable. Random-access memory that was expensive becomes cheap. Adequate radio receivers that used to cost a fortune become mass-produced commodities.

**Every wicked problem is essentially novel and unique.**

This aspect is not as easy to analyze for the RFSA problem. A lot of misdirected effort has been expended due to the widespread notion that the object-recognition problem in image processing is highly analogous to the modulation-recognition problem in signal processing. It is hard for me to put my finger on exactly why the analogy isn’t strong, but it has to do, I believe, with how the label is represented in the data.

Consider the image in Figure 3. The image-recognition problem is: What objects are in this image? Or perhaps: Is there a guitar in this image and if so draw a tight box around it. The image can be decomposed into an image with just the guitar pixels having non-zero values and an image with just the non-guitar pixels having non-zero values. The Blue Guitar image is then the sum of these images, which is the sum of a backgroundless guitar image with a background having a guitar-shaped hole in it. So in this sense the object to be recognized is an additive component of the data to be processed.

Our modulation-recognition problem is more like: Here is a vector of complex numbers. What is the modulation type of the signal in the vector? Every element of the vector contributes to the modulation type–there is no subpart of the vector that is the BPSK signal (I always circle back to BPSK on the CSP Blog). In terms of the analogy with the Blue Guitar, the modulation-recognition question is more like: What style of art is represented by this image? Or perhaps: Who painted this image? (Cubism. Picasso.)

Turning to Still Life by Picasso in Figure 4, we see four objects: a pitcher, a candlestick, a pot, and a tabletop. An image-recognition question would be: what objects are in this image? Draw boxes around them and produce their labels. Or: Is there a pot in this image?

The analogous RFSA problem is: What is this scene? And a good answer might be: *Kitchen at night*. Every pixel contributes to the answer, and object-recognition can play a role in helping answer the scene question, but object recognition alone doesn’t answer the holistic question.

So I think the RFSA problem is unique, although perhaps there is some other scene-based problem that is similar. In other writing, I’ve analogized RFSA to alternative kinds of scene analyses, such as acoustic, dramatic, and crime.

I think this lack of a strong analogy between MR/RFSA on one hand and image-recognition on the other is why convolutional neural networks are having a hard time definitively solving the MR/RFSA problem. We need different networks (and I’m actively working on that).

**Stakeholders have radically different frames for understanding the problem.**

My typical frame is that of the Star Trek tricorder (Spock’s, not Bones’). “Tell me everything you can about all radio-frequency signals using a single antenna and radio receiver.” So this is a generic kind of application and the focus is really on technology–how do we supply the answer?

Consumers of modulation recognition and RFSA, however, are often not focused on the technology but on the utility of the answers. Even within the consumer set of stakeholders, there are wide differences in the frames. Cognitive-radio developers are looking for true spectrum holes and want badly to know about a relative few kinds of signals: Primaries primarily and Secondaries secondarily. Military types want to know about adversarial signals much more than they want to know about neutral or friendly signals, and their interest might dictate a different tradeoff between speed of answer and quality of answer relative to the cognitive-radio types. Other consumer stakeholders just want to know when something weird happens in the radio environment–so you have to rapidly understand the nominal RF scene, and then figure out if some subtle change happens after that, perhaps involving many GHz of bandwidth to search over and monitor closely.

I don’t know if these different frames for the RFSA problem qualify as radical, but they definitely differ and this causes a further splintering of the problem formulation and solution approach.

**The problem is never solved definitively.**

This aspect is somewhat redundant with some of the other aspects of a wicked problem. A completed modulation-recognition or RFSA system becomes obsolete quickly because new signals are deployed continuously and new, faster computational platforms are endlessly spun forth from the semiconductor industry. Additionally, if the deployed system can deal with signals down to, say, -10 dB SNR, well then what about those signals at -11 dB? The solution is never definitive because the signals are always evolving, the resources that can be brought to bear are always improving, there are always radio users that want to evade the latest system, and there are always edge cases that one wishes to do better at.

My answer to the titular question is yes. Fully general RF scene analysis is indeed a wicked problem. At the very least it has a lot of wickedness to it. Let me know if you disagree in the Comments.

In this post, we’ll switch gears a bit and look at the problem of *waveform estimation*. This comes up in two situations for me: single-sensor processing and array (multi-sensor) processing. At some point, I’ll write a post on array processing for waveform estimation (using, say, the SCORE algorithm The Literature [R102]), but here we restrict our attention to the case of waveform estimation using only a single sensor (a single antenna connected to a single receiver). We just have one observed sampled waveform to work with. There are also waveform estimation methods that are multi-sensor but not typically referred to as array processing, such as the *blind source separation problem* in acoustic scene analysis, which is often solved by principal component analysis (PCA), independent component analysis (ICA), and their variants.

The signal model consists of the noisy sum of two or more modulated waveforms that overlap in both time and frequency. If the signals do not overlap in time, then we can separate them by time gating, and if they do not overlap in frequency, we can separate them using linear time-invariant systems (filters).

Relevant FRESH filtering publications include My Papers [45, 46] and The Literature [R6].

Let’s first illustrate the problem using a concrete example.

Our signal model consists of a desired signal $s(t)$, noise $w(t)$, and interference $i(t)$,

$$x(t) = s(t) + w(t) + i(t), \tag{1}$$

where the interference may be zero (no interference) or the sum of multiple distinct statistically independent interference signals $i_k(t)$, for $k = 1, 2, \ldots, K$. In our motivating example, we use $K = 2$, and the signal of interest and the interferers are all BPSK signals. Estimated power spectra for the individual signals and their noisy sum are shown in Figure 1.

We’ll use the setup in Figure 1 throughout the post to unify the exposition. All frequencies and times in this post are normalized by the sampling rate, as is the custom at the CSP Blog, which is equivalent to using a sampling rate of unity. The parameters of each of the involved signals are shown in Table 1.

| Signal Name | Parameter Name | Parameter Value | Parameter Units |
|---|---|---|---|
| Signal of Interest | Modulation Type | BPSK | |
| | Power | 0 | dB |
| | Bit Rate | 1/5 | Hz |
| | Carrier Offset | -0.01 | Hz |
| | Pulse Type | SRRC | |
| | Excess Bandwidth | 0.5 | |
| Interferer 1 | Modulation Type | BPSK | |
| | Power | -10 | dB |
| | Bit Rate | 1/6 | Hz |
| | Carrier Offset | 0.13 | Hz |
| | Pulse Type | SRRC | |
| | Excess Bandwidth | 0.5 | |
| Interferer 2 | Modulation Type | BPSK | |
| | Power | 4 | |
| | Bit Rate | 1/7 | |
| | Carrier Offset | -0.08 | |
| | Pulse Type | SRRC | |
| | Excess Bandwidth | 0.5 | |
| Noise | Type | AWGN | |
| | Power | -20 | dB |

The goal of a FRESH filter is to process the received noisy interference-corrupted data and return a signal that has the interference attenuated as much as possible, while maintaining the amplitude of the signal of interest. (We can always reduce the interference to zero by just multiplying the whole input data signal by zero, but that also annihilates the signal of interest.) In particular, if the interference is so bad that we cannot successfully demodulate the signal of interest, can we use a FRESH filter to bring the interference level down enough to permit demodulation?

To measure how much interference and noise is present in the input or output of a FRESH filter, or any other signal-separation system, we will use the normalized mean-squared error (NMSE) expressed in decibels (dB). For the model (1), the NMSE is given by

$$\textrm{NMSE} = \frac{E\left[ \left| y(t) - s(t) \right|^2 \right]}{P_s}, \tag{2}$$

where $P_s$ is the power (variance) of $s(t)$. The quantity in the square brackets in (2) is the mean (average) squared error between $y(t)$ and the desired signal $s(t)$, and the NMSE expresses this as a fraction of the power of $s(t)$. Moreover, we convert that value to decibels by

$$\textrm{NMSE}_{\textrm{dB}} = 10 \log_{10}\left( \textrm{NMSE} \right). \tag{3}$$

For our motivating example in Figure 1, the NMSE is about 4.2 dB. That means the ratio of the total power of the interference and noise to the signal power is about 2.6. So think of the NMSE as the reciprocal of an SNR–higher is worse, lower is better. If the NMSE is -10 dB, then the total noise and interference power corrupting the signal is ten times less than the signal power–the SNR is about +10 dB.
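As a quick numerical illustration, here is a minimal numpy sketch of the NMSE-in-dB computation defined above. The function name and toy signals are mine, not the post’s software; the toy simply checks the reciprocal-SNR reading of the metric.

```python
import numpy as np

def nmse_db(y, s):
    """Normalized mean-squared error between an estimate y and the desired
    signal s, in decibels. (Hypothetical helper name, not the post's code.)"""
    p_s = np.mean(np.abs(s) ** 2)          # power of the desired signal
    mse = np.mean(np.abs(y - s) ** 2)      # mean squared error
    return 10.0 * np.log10(mse / p_s)

# Toy check: if y = s + w with total noise power one-tenth the signal power,
# the NMSE should come out close to -10 dB (the reciprocal-SNR reading).
rng = np.random.default_rng(0)
n = 10000
s = np.exp(2j * np.pi * 0.05 * np.arange(n))   # unit-power complex tone
w = np.sqrt(0.05) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
val = nmse_db(s + w, s)
```

Doubling the noise amplitude raises the NMSE, consistent with the higher-is-worse reading.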

That’s the motivating example. How do we separate the signal from the cochannel interference and noise? That depends on what kind of processing we are willing to apply. As usual in such situations, let’s start with a linear time-invariant system–a filter.

One way to approach the signal-separation problem is to specify the form of the processing and then attempt to find the optimal version of that form. When the form is restricted to a linear time-invariant system, or filter, the resulting optimal form is called the Wiener filter, when the criterion for optimality is to minimize the mean-squared error.

The Wiener filter has transfer function given by

$$H(f) = \frac{S_{sx}(f)}{S_x(f)}, \tag{4}$$

where $S_{sx}(f)$ is the cross spectral density between the desired signal $s(t)$ and the received signal $x(t)$ and $S_x(f)$ is the spectral density of the received signal.

Does the filter in (4) make physical sense? Let’s try to check. Suppose $i(t)$ is zero so that we just have $x(t) = s(t) + w(t)$. Then $S_{sx}(f) = S_s(f)$ because $s(t)$ and $w(t)$ are uncorrelated, and $S_x(f) = S_s(f) + S_w(f)$. The Wiener filter is

$$H(f) = \frac{S_s(f)}{S_s(f) + S_w(f)}.$$

For those frequencies for which $S_s(f) = 0$, the filter transfer function is zero, indicating that the filter should not pass any frequency components that do not contain any of the desired signal–the transfer function is ideally zero. For those frequency components for which $S_s(f) \ll S_w(f)$, the transfer function is approximately $S_s(f)/S_w(f) \ll 1$, so that these frequency components are attenuated. For those components such that $S_s(f) \gg S_w(f)$, the filter is approximately $1$, indicating that the filter should just pass those frequency components mostly unchanged. In other words, the filter attenuates the input in frequency bands where the noise is dominant (in terms of power) and passes unchanged those components where the signal is dominant. Makes sense, right?

Now add back in the interference. Suppose the interference has a bandwidth that is much narrower than the bandwidth of $s(t)$, but is cochannel with $s(t)$. The filter is, in general,

$$H(f) = \frac{S_s(f)}{S_s(f) + S_w(f) + S_i(f)}, \tag{5}$$

and notice that for those frequencies for which $S_i(f)$ is not zero, the transfer function is attenuated because $S_s(f) + S_w(f) + S_i(f) > S_s(f) + S_w(f)$. Suppose $S_i(f)$ is very large compared to $S_s(f)$ and $S_w(f)$ for some frequency $f_0$. Then for that frequency, $H(f_0) \approx S_s(f_0)/S_i(f_0) \ll 1$, and the filter severely attenuates the input signal. So we see that the Wiener filter has all-pass filter, notch filter, or bandpass filter characteristics over frequency as needed in response to the spectral characteristics of the noise and interference relative to those for the signal of interest.

But … what about when the functions $S_s(f)$ and $S_i(f)$ are similar? Let’s say that $S_i(f) = S_s(f)$. Then

$$H(f) = \frac{S_s(f)}{2 S_s(f) + S_w(f)}.$$

When the SNR is not low, $S_s(f) \gg S_w(f)$ over those frequencies for which $S_s(f)$ is not small, and we have

$$H(f) \approx \frac{1}{2}.$$

So all frequency components–signal and interferer–are passed without any relative attenuation. That is, the NMSE of the output is the same as the NMSE at the input, neglecting the slight decrease obtained due to filtering the out-of-band noise components.

**When the interferer(s) substantially spectrally overlaps the signal of interest, and the interferer and signal have similar bandwidths and power levels, the optimal linear time-invariant signal-separation system [Wiener filter] is ineffective.**
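The pass/notch/flat behavior described above is easy to sketch numerically. Here is a numpy toy with made-up power spectra (all shapes and levels are illustrative assumptions, not the post’s estimates):

```python
import numpy as np

# Wiener transfer function H(f) = S_s(f) / (S_s(f) + S_w(f) + S_i(f)) built
# from assumed toy power spectra (all shapes and levels are made up).
f = np.linspace(-0.5, 0.5, 1001)
S_s = np.where(np.abs(f) < 0.2, 1.0, 1e-6)              # wideband desired signal
S_w = np.full_like(f, 0.01)                              # white noise floor
S_i = np.where(np.abs(f - 0.05) < 0.01, 100.0, 0.0)      # strong narrowband interferer

H = S_s / (S_s + S_w + S_i)         # notches the interferer, passes the rest

# Equal-power, equal-bandwidth cochannel interferer instead: S_i = S_s gives
# a nearly flat H of about 1/2 in-band, i.e., no relative attenuation of the
# interferer -- the ineffective case called out above.
H_flat = S_s / (2.0 * S_s + S_w)

in_band = (np.abs(f) < 0.15) & (np.abs(f - 0.05) > 0.02)
```

The first filter passes signal-dominant bands nearly unchanged, notches the narrowband interferer, and rejects out-of-band noise; the second is essentially a constant gain that cannot separate anything.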

Let’s get quantitative and apply the Wiener filter to our motivating example in Figure 1. We use the FSM to estimate the power spectra needed to form the filter’s transfer function in (5). To form the impulse-response function for the filter, we simply inverse transform the obtained transfer function and retain only the 128 values in the energetic central portion near $\tau = 0$. The result is an impulse-response function for a finite impulse response (FIR) filter, which we apply to the data $x(t)$. We then synchronize $s(t)$ to the filter output, scale it (why?), and compute the NMSE. The result is an NMSE of -0.2 dB and the power spectra of the input and output are shown in Figure 2.
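The truncation step can be sketched in numpy as follows, with a made-up smooth transfer function standing in for the estimated Wiener transfer function (the shape and all parameters here are illustrative, not the post’s estimate):

```python
import numpy as np

# Sketch of the FIR extraction step: inverse-transform the transfer function
# and keep the energetic central taps near lag zero. The transfer function
# below is a made-up smooth lowpass stand-in, not the post's actual estimate.
N, n_taps = 1024, 128
f = np.fft.fftfreq(N)                           # normalized frequencies
H = 1.0 / (1.0 + (f / 0.1) ** 8)                # made-up smooth transfer function
h_full = np.fft.ifft(H)                         # impulse response (lag 0 first)
h_centered = np.fft.fftshift(h_full)            # move lag 0 to the middle
mid = N // 2
h_fir = h_centered[mid - n_taps // 2 : mid + n_taps // 2]   # keep 128 central taps

# Nearly all of the impulse-response energy survives the truncation.
kept = np.sum(np.abs(h_fir) ** 2) / np.sum(np.abs(h_full) ** 2)
```

Because the transfer function is smooth, the impulse response is concentrated near lag zero and the 128-tap FIR approximation loses very little energy.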

The Wiener filter computed by my software, applied to obtain the signal corresponding to the green line in Figure 2, has transfer function shown in Figure 3.

A typical desired output SINR needed for adequate demodulation performance is something around 6 dB, so we see that the Wiener filter cannot separate the signals adequately in this particular case. We need something else. Since the linear time-invariant system is ineffective, we can consider nonlinear time-invariant systems, nonlinear time-variant systems, or linear time-variant systems. In particular, we can consider linear time-variant systems where the time variation is periodic (surprised?): linear periodically time-variant filtering.

To fully understand FRESH filtering, I recommend starting with The Literature [R6] and W. A. Brown’s doctoral dissertation. I’ll sketch my understanding of linear periodically time-variant (LPTV) filtering and FRESH filtering here, following closely The Literature [R6].

As we know well, a linear time-invariant system is characterized by the impulse-response function $h(\tau)$, which relates the system input $x(t)$ to the system output $y(t)$ through the convolution

$$y(t) = \int h(\tau) x(t - \tau) \, d\tau.$$

For a linear time-varying system, the impulse response is now a general function of time and delay, so that $h(\tau)$ is replaced by $h(t, \tau)$,

$$y(t) = \int h(t, \tau) x(t - \tau) \, d\tau.$$

For a linear *periodically* (or *almost periodically*) time-varying system, the form of that general impulse response is constrained to be representable by a generalized Fourier series,

$$h(t, \tau) = \sum_{\sigma} g_\sigma(\tau) e^{i 2 \pi \sigma t}, \tag{10}$$

where the functions $g_\sigma(\tau)$ are obtained by an infinite-time version of the usual Fourier-series coefficient formula,

$$g_\sigma(\tau) = \lim_{T \rightarrow \infty} \frac{1}{T} \int_{-T/2}^{T/2} h(t, \tau) e^{-i 2 \pi \sigma t} \, dt. \tag{11}$$

Putting (11) together with (10), we have the input-output relation for a general linear periodically time-varying system

$$y(t) = \sum_{\sigma} \int g_\sigma(\tau) e^{i 2 \pi \sigma t} x(t - \tau) \, d\tau.$$

Now associate the complex exponential with the input, using $e^{i 2 \pi \sigma t} = e^{i 2 \pi \sigma \tau} e^{i 2 \pi \sigma (t - \tau)}$, to yield

$$y(t) = \sum_{\sigma} \int \left[ g_\sigma(\tau) e^{i 2 \pi \sigma \tau} \right] \left[ x(t - \tau) e^{i 2 \pi \sigma (t - \tau)} \right] \, d\tau,$$

which is a sum of filtered frequency-shifted versions of the input: FREquency-SHift (FRESH) filtering.
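This sum-of-filtered-frequency-shifted-copies structure can be sketched in a few lines of numpy. The function and variable names are mine, purely illustrative; as a sanity check, a single zero shift with a unit-impulse branch filter collapses the structure back to an ordinary LTI identity:

```python
import numpy as np

def fresh_filter(x, shifts, filters):
    """Sum of LTI-filtered, frequency-shifted copies of the input x: the
    FRESH structure. `shifts` are normalized frequency shifts and `filters`
    the matching branch impulse responses. (Names are mine, illustrative.)"""
    t = np.arange(len(x))
    y = np.zeros(len(x), dtype=complex)
    for sigma, g in zip(shifts, filters):
        shifted = x * np.exp(2j * np.pi * sigma * t)  # frequency-shift the input
        y += np.convolve(shifted, g, mode='same')     # then filter with the branch LTI filter
    return y

# Sanity check: one zero shift, unit-impulse branch filter -> identity system.
x = np.exp(2j * np.pi * 0.1 * np.arange(64))
y = fresh_filter(x, [0.0], [np.array([1.0])])
```

Adding more (shift, filter) branches gives the general LPTV structure discussed above.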

A final wrinkle involves whether the input is real-valued or complex-valued. It turns out we need to filter the complex envelope and its conjugate, unlike the case of linear time-invariant filtering, where we can always just apply a filter to the complex envelope, ignoring its conjugate. For the full story, see Brown’s dissertation. I’ll give a justification here.

Suppose we have real-valued signals $x(t)$ and $y(t)$ related by a convolution,

$$y(t) = \int h(\tau) x(t - \tau) \, d\tau = h(t) \otimes x(t).$$

How is the analytic signal of $y(t)$ related to the analytic signal of $x(t)$ and the filter $h(t)$? The analytic signal for $x(t)$ is the positive-frequency portion of the signal, $x_+(t)$. It can be obtained by convolving $x(t)$ with the following impulse response function

$$h_+(t) = \mathcal{F}^{-1}\left[ u(f) \right],$$

where $u(f)$ is the unit-step function in frequency. That is, using our normal CSP-Blog Fourier transform notation,

$$x_+(t) = h_+(t) \otimes x(t) \Longleftrightarrow X_+(f) = u(f) X(f).$$

Now, what is $y_+(t)$? Let’s take it step by step,

$$Y_+(f) = u(f) Y(f) = u(f) H(f) X(f) = H(f) \left[ u(f) X(f) \right] = H(f) X_+(f).$$

We see that the analytic signal for can be obtained by filtering the analytic signal for using the same filter as was applied to the real-valued signals. The complex envelope is a filtered and shifted version of the analytic signal, so we can conclude that filtering a bandpass real-valued signal is equivalent to filtering its complex envelope. You need to make sure you convert the bandpass filter for the real signal (and analytic signal) to a lowpass filter once you have obtained the complex envelope.
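The commutation claim can be checked numerically. Below is a numpy-only sketch that forms the analytic signal by FFT masking (a stand-in for a Hilbert-transform routine, using the conventional doubling of positive frequencies) and verifies that circularly filtering then taking the analytic signal matches taking the analytic signal then filtering with the same real filter:

```python
import numpy as np

def analytic(x):
    """Analytic signal of a real vector via FFT masking: keep DC and Nyquist,
    double positive frequencies, zero negative ones. (numpy-only stand-in for
    a Hilbert-transform routine; assumes len(x) is even.)"""
    N = len(x)
    X = np.fft.fft(x)
    mask = np.zeros(N)
    mask[0] = 1.0
    mask[1:N // 2] = 2.0
    mask[N // 2] = 1.0
    return np.fft.ifft(X * mask)

# Check: filtering then taking the analytic signal equals filtering the
# analytic signal with the same real filter (circular convolution via the
# FFT, so the identity is exact up to round-off).
N = 256
rng = np.random.default_rng(1)
x = rng.standard_normal(N)
h = np.zeros(N)
h[:9] = rng.standard_normal(9)               # short real filter, zero-padded
Hf = np.fft.fft(h)
y = np.fft.ifft(np.fft.fft(x) * Hf).real     # y = h circularly convolved with x
lhs = analytic(y)                            # analytic signal of the filter output
rhs = np.fft.ifft(np.fft.fft(analytic(x)) * Hf)   # filter the analytic signal
```

Both orderings multiply the spectrum by the same two factors, so the results agree.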

In the case of linear time-invariant filtering, then, filtering can be performed using the complex envelope and we’ll get what we would have gotten had we processed the real-valued bandpass signal.

What about when the processing is not linear time-invariant processing? In particular, we know we’re interested in linear time-varying processing. Let’s look at that case now.

The input-output relationship between real-valued $x(t)$ and $y(t)$ is now given by

$$y(t) = \int h(t, \tau) x(t - \tau) \, d\tau, \tag{27}$$

which is not a convolution unless $h(t, \tau) = h(\tau)$ as in the previous analysis. Proceeding as before, let’s look at the analytic signal for $y(t)$. What is it in terms of the analytic signal for $x(t)$ now that we have a time-varying system?

We need to express $x(t)$ in terms of its analytic signal $x_+(t)$. We start by writing what we know,

$$x_+(t) = \mathcal{F}^{-1}\left[ X(f) u(f) \right],$$

where

$$X(f) = \mathcal{F}\left[ x(t) \right]$$

and $u(f)$ is the unit-step function, which is equal to zero for $f < 0$ and one otherwise. Because $x(t)$ is real-valued, its Fourier transform possesses conjugate symmetry,

$$X(-f) = X^*(f),$$

and so we can express the transform of $x(t)$ as the sum of a positive-frequency term and a negative-frequency term as follows

$$X(f) = X(f) u(f) + X(f) u(-f) = X_+(f) + X_+^*(-f). \tag{31}$$

Here we are neglecting a complication that occurs when $X(0)$ is not zero; let’s assume it is zero, which is reasonable for $x(t)$ being an RF (bandpass) signal.

Equation (31) implies the following formulation of the real-valued signal in terms of its analytic signal

$$x(t) = x_+(t) + x_+^*(t)$$

because

$$\mathcal{F}\left[ x_+^*(t) \right] = X_+^*(-f).$$

Substituting our expression involving the analytic signal into (27) gives

$$y(t) = \int h(t, \tau) \left[ x_+(t - \tau) + x_+^*(t - \tau) \right] \, d\tau$$

or

$$y(t) = \int h(t, \tau) x_+(t - \tau) \, d\tau + \int h(t, \tau) x_+^*(t - \tau) \, d\tau,$$

which means to get the analytic signal for $y(t)$ requires that we filter *both* the analytic signal for $x(t)$ and its conjugate. It follows that if we are working with complex envelopes in a linear periodically time-varying system (FRESH filter), we need to separately filter the complex envelope and its conjugate and combine the results. The final structure, then, that we are discussing in this post is the FRESH filter for complex signals:

$$y(t) = \sum_{n=1}^{N} g_n(t) \otimes \left[ x(t) e^{i 2 \pi \alpha_n t} \right] + \sum_{m=1}^{M} h_m(t) \otimes \left[ x^*(t) e^{i 2 \pi \beta_m t} \right], \tag{36}$$

where $g_n(t)$ are the non-conjugate filters, $\alpha_n$ are the non-conjugate frequency shifts, $h_m(t)$ are the conjugate filters, $\beta_m$ are the conjugate frequency shifts, and $N$ and $M$ can be infinite.

The signal-processing structure that we want to investigate is (36), the FRESH filter, which is a linear (almost) periodically time-variant system. It accepts a single complex-valued input that contains two or more temporally and spectrally overlapping cyclostationary signals, and aims to produce a single complex-valued output that is a high-SINR version of just one of the signals in the input. That is, it performs signal separation through waveform estimation.

The engineering question is: How do we choose the frequency shifts $\alpha_n$ and $\beta_m$ and the associated impulse-response functions $g_n(t)$ and $h_m(t)$?

To proceed, The Literature [R6] keeps the FRESH structure generic–we don’t specify the shifts, which is equivalent to keeping the kernel generic, and attempt, a la Wiener, to minimize the error between the system output and the desired signal in the input. This leads to a set of constraints on the filters that involve the frequency shifts. I call these the FRESH-filtering design equations, and they are (16a) and (16b) in [R6], and (37)-(38) here:

where $n$ ranges from $1$ to $N$, and

where $m$ ranges from $1$ to $M$.

There are a couple things to notice about the design equations and some terminology to introduce to help us understand experimental results (outputs of a software system aimed at implementing (37)-(38)).

First, notice that on the left-hand sides, two kinds of non-conjugate spectral correlation function appear: one that involves differences between the non-conjugate frequency shifts and one that involves differences between the conjugate frequency shifts .

Second, notice that on the left-hand sides, two kinds of conjugate spectral correlation function appear. In both cases, the conjugate spectral correlation function is evaluated for a cycle frequency that is a mixture of non-conjugate and conjugate FRESH-filter frequency shifts. We’ll call the first set of cycle frequencies ‘*Mixture 1*‘ and the second ‘*Mixture 2*.’ Although they are really the same set, I want to be sure that we are enumerating them properly, and estimating the conjugate spectral correlation functions properly and separately, because of the complexity and trickiness of implementing the design equations.

The mathematical problem is to solve (37)-(38) for the $g_n(t)$ and $h_m(t)$ given two sets of frequency shifts $\{\alpha_n\}$ and $\{\beta_m\}$.

A secondary, and vexing, problem is how to choose the frequency-shift sets. The design equations, if solved, will provide the filters $g_n(t)$ and $h_m(t)$ that minimize the mean-squared error between FRESH output and the desired signal, but that minimum error will vary with the selection of the frequency shifts.

There is some optimal set of frequency shifts, possibly infinite in size, that provides the smallest minimum mean-squared error. For this set of frequency shifts, and the filters corresponding to the solution of the design equations for those shifts, the corresponding FRESH filter is called the *cyclic Wiener filter*. For all other sets of shifts, we just have some suboptimal (non-cyclic-Wiener) FRESH filter, which of course may be quite good.

In the remainder of this post, we’ll look at various selections of the frequency shifts for our motivating problem and variants, we’ll discuss how to solve the design equations, and we’ll introduce adaptive FRESH filtering.

The way I solve the design equations (37)-(38) is one frequency at a time using simple linear algebra. We can write (37) more explicitly in terms of the filter transfer functions $G_n(f)$ and $H_m(f)$, again thinking of the equations as involving only a single value of $f$, in the following way

And a similar expression exists for the long form of (38). We can write the two sets of equations as a single set of linear equations in the unknown filter transfer functions by defining the following vectors and matrix,

The vectors are column vectors of length $N + M$ and the matrix has dimensions $(N+M) \times (N+M)$. The design equations are therefore

Therefore we can obtain the filters through

for all frequencies for which the inverse exists. So that’s how we get our FRESH filter parameters: Specify the frequency shifts, solve the design equations (43) through (44) for all frequencies of interest, and then implement the filter using a set of frequency shifters and linear time-invariant filters.
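The frequency-by-frequency solve can be sketched as follows. The matrices here are random Hermitian stand-ins for the spectral-correlation estimates (the point is only the linear-algebra step, not real FRESH design), and all names are mine:

```python
import numpy as np

# At each frequency bin we solve a small linear system A(f) g(f) = d(f) for
# the stacked non-conjugate and conjugate filter values. The matrices below
# are random Hermitian stand-ins for the spectral-correlation estimates.
rng = np.random.default_rng(2)
n_bins = 64              # frequency bins of interest
n_branches = 7           # e.g., 4 non-conjugate + 3 conjugate shifts

G = np.zeros((n_bins, n_branches), dtype=complex)   # transfer-function values
for k in range(n_bins):
    B = rng.standard_normal((n_branches, n_branches)) \
        + 1j * rng.standard_normal((n_branches, n_branches))
    A = B @ B.conj().T + n_branches * np.eye(n_branches)  # Hermitian, well-conditioned
    d = rng.standard_normal(n_branches) + 1j * rng.standard_normal(n_branches)
    G[k] = np.linalg.solve(A, d)     # numerically safer than forming the inverse

# Row k of G holds all branch-filter transfer-function values at bin k; an
# inverse transform of each column would yield the branch impulse responses.
```

Using `solve` rather than an explicit matrix inverse is the usual numerical choice, and the per-bin systems are small even when many shifts are used.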

There are some nuances, though, relating to solving the design equations. The equations specify the filter transfer functions in terms of the spectral correlation functions–the limit parameters–rather than in terms of *estimates* of spectral correlation functions. *We will be using estimates here*. The main reason is that using estimates is closer to what we want to do in practice because specifying the limit parameters, while possible in our motivating example because we can numerically evaluate the theoretical spectral-correlation formulas for BPSK, requires that we know the symbol-clock and carrier phases exactly, as well as the pulse-shaping function, power levels, and of course the cycle frequencies.

Another nuance is whether to use the noise-free desired signal $s(t)$ when forming the spectral-correlation estimates needed to solve the design equations. Because the spectral correlation function is signal selective, we can estimate every spectral correlation function we need from the observed data except for the PSD $S_s(f)$. In my implementation I have an option to use $s(t)$ for all cross estimates or just for the PSD. Using $s(t)$ for the PSD only is practical in the sense that a FRESH-filter user would presumably know the PSD of the desired signal even in the common case where the exact values of $s(t)$ are not known (and they are not known because this is a communication signal we’re trying to receive).

One interesting performance metric is the frequency spectrum of the error signal $e(t) = s(t) - y(t)$, where $y(t)$ is the FRESH-filter output (intended to be a good estimate of the desired signal $s(t)$). From The Literature [R6], the error spectrum is given by

We’ll show plots of the error spectrum along with that of the filter output as we progress through some numerical examples. You can compare these error spectra with the filter-output spectra to see where, in frequency, the filter is doing a good or poor job producing the desired signal. For example, for the Wiener filtering shown in Figures 2-3, the error spectrum is shown in Figure 4.

Ok, that was a long setup with a lot of math, rather abstract ideas, and little in the way of graphical aids. But the good thing about having a blog is that you can put as many plots and diagrams and pictures and tables as you want. No reviewers complaining *you gotta make it shorter*! So let’s do some processing.

First let’s look at the motivating example, which has a BPSK desired signal and two cochannel BPSK interferers (see Figure 1 and Table 1 to refresh your memory).

We have to select the frequency shifts and , which will determine and . When we choose and , we get the Wiener filter, as we explained earlier. How should we choose these shifts in a FRESH filter? We want better performance than the Wiener filter can provide. But we don’t know the set of shifts that corresponds to the cyclic Wiener filter, which set may be infinite (especially for infinite-bandwidth signals like the rectangular-pulse PSK/QAM signals). And we don’t know the minimal set of shifts that gives us “very nearly” the same performance as the cyclic-Wiener shift set.

The basic guide to shift selection is to include the signals’ non-conjugate cycle frequencies in the non-conjugate shift set and the conjugate cycle frequencies in the conjugate shift set, because those choices lead to non-trivial design equations. But the shifts can also be linear combinations of the various signals’ cycle frequencies, so where to start?

I’m going to start with an intuitive choice: the three bit rates and zero for the non-conjugate shifts and the three doubled-carrier frequencies for the conjugate shifts. Considering that we could (and perhaps should?) include the negative of the bit rates, the negative doubled carriers, the doubled carriers plus-and-minus the bit rates, etc., this intuitive choice is rather sparse, and we’ll refer to it as Sparse 3A. To be perfectly clear, then, the shifts are

which are all cycle frequencies exhibited by the individual signals.

Let’s start by verifying that the input signal, which is the sum of the three BPSK signals and noise, shown in Figure 1, exhibits the cycle frequencies that we think it should. I apply the SSCA to each individual signal as it is generated and plot the results. The obtained cycle frequencies (again, blindly detected) are shown for the desired signal, interferer 1, and interferer 2 in Figures 5-10. All the cycle frequencies check out, which means that we can use them as frequency shifts in a FRESH filter with confidence.

We choose to solve the design equations using spectral correlation estimates in place of the theoretical spectral correlation functions and these estimates are formed using the frequency-smoothing method applied to 32,768 samples of the observable data and the reference signal . The smoothing window is rectangular and has width equal to 0.02 Hz (the sampling rate is unity here).
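For reference, a bare-bones frequency-smoothing-method estimator looks something like the following sketch. It assumes a unit sampling rate and a rectangular smoothing window, requires the cycle frequency to land near a half-bin grid point, and ignores circular-edge effects, so it is an illustration rather than a careful implementation.

```
import numpy as np

def fsm_scf(x, alpha, width):
    """Frequency-smoothing estimate of the non-conjugate spectral
    correlation function at cycle frequency alpha (fs = 1); 'width' is
    the rectangular smoothing-window width in Hz."""
    N = len(x)
    X = np.fft.fft(x)
    k = int(round(alpha * N / 2.0))
    # cyclic periodogram: (1/N) X(f + alpha/2) X*(f - alpha/2)
    cp = np.roll(X, -k) * np.conj(np.roll(X, k)) / N
    M = max(1, int(round(width * N)))
    return np.convolve(cp, np.ones(M) / M, mode='same')
```

At alpha = 0 this reduces to a smoothed periodogram, i.e., an FSM estimate of the PSD, which is the non-conjugate spectral correlation slice used throughout the post.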

The input NMSE is 4.2 dB, and for the Sparse 3A set of frequency shifts, the output NMSE is -8.1 dB, for a total gain of 12.3 dB. The final result, in terms of PSDs, is shown in Figure 11, and in terms of waveforms in Figure 12.
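The NMSE numbers quoted throughout can be computed with a one-liner. The post does not spell out its exact normalization, so the definition below (error power relative to desired-signal power, in dB) is an assumption, but it is a common one.

```
import numpy as np

def nmse_db(estimate, truth):
    """Normalized mean-squared error in dB: mean error power divided by
    the mean power of the desired (truth) signal."""
    err = estimate - truth
    return 10 * np.log10(np.mean(np.abs(err)**2) / np.mean(np.abs(truth)**2))
```

Under this definition, the "input NMSE" treats the received data itself as the estimate of the desired signal, and the "gain" is input NMSE minus output NMSE.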

Now let’s look at some of the internals of the process. First is the error spectrum shown in Figure 13. Compare with the error spectrum in Figure 4 for the Wiener filter.

Figures 14 and 15 show the solutions to the design equations in the frequency and time domains. That is, these are the filter transfer-function and impulse-response-function magnitudes obtained by solving the design equations using spectral correlation estimates in place of the theoretical spectral correlation functions.

In the next section of this post, we’ll show that an adaptive version of this filter outperforms the FRESH filter of Figures 14-15. How can that be?

Clearly the filters obtained by solving the FRESH design equations can provide much better signal-separation performance than the Wiener filter, even for frequency-shift selections that are not optimal (non-cyclic-Wiener FRESH filters). In envisaged applications, however, the involved signals may change over time, leading to a discrepancy between the design-equation filters for the original situation and the current situation. One way out of this is to periodically re-solve the design equations, possibly informed by a fresh blind spectral-correlation analysis to see if the cycle frequencies of the interferers have changed.

Another way to deal with changing interference/noise statistics is to make the filter adaptive in the usual least-mean-squares (LMS) sense. Normally one would adapt the filter’s impulse-response values in accordance with an error signal, but here we want to adapt a *set* of filters simultaneously. That is, we want to adapt the impulse responses for each of the filters in the branches of the FRESH filter. So in the usual complex-signal LMS (The Literature [R184]), we use an extended data vector and an extended impulse-response vector instead of the input and a single impulse response.

The extended data vector is simply the concatenation of frequency shifted data vectors with lengths equal to the branch-filter impulse-response functions, which we’ll call . So the input data vector for the LMS adapter is

where the vectors have elements given by

which are just the current frequency-shifted versions of the input data. Similarly, the filter to be adapted is the concatenation of the impulse-response functions from the branch filters, each of which is samples in length.
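The extended-vector complex LMS described above might be sketched as follows. The variable names, branch construction, and step size are my own placeholders; the key point is that one stacked weight vector, holding all branch impulse responses, is adapted against a single error signal.

```
import numpy as np

def fresh_lms(x, d, shifts, conj_shifts, L, mu):
    """Sketch of extended-vector complex LMS for a FRESH filter: each
    branch contributes an L-sample frequency-shifted data vector, and all
    branch impulse responses are adapted jointly."""
    t = np.arange(len(x))
    branches = [x * np.exp(2j * np.pi * a * t) for a in shifts]
    branches += [np.conj(x) * np.exp(2j * np.pi * a * t) for a in conj_shifts]
    w = np.zeros(len(branches) * L, dtype=complex)
    y = np.zeros(len(x), dtype=complex)
    for n in range(L, len(x)):
        # extended data vector: concatenated shifted-data segments
        u = np.concatenate([b[n - L + 1:n + 1][::-1] for b in branches])
        y[n] = np.vdot(w, u)             # filter output w^H u
        e = d[n] - y[n]                  # error against the reference
        w = w + mu * u * np.conj(e)      # standard complex LMS update
    return y, w
```

Initializing `w` from the design-equation solutions, rather than zeros, matches the procedure used for the adaptive results reported below.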

Let’s revisit the Sparse 3A FRESH filter and the motivating problem with two interferers, but this time after we solve the design equations, we’ll run the extended-vector complex-signal LMS algorithm to adapt the filters over time. The result is an output NMSE of -12.2 dB, for a gain of 4.2 – (-12.2) = 16.4 dB, a full 4 dB better than the non-adaptive FRESH filter! The adapted transfer functions and impulse-response functions are shown in Figures 16 and 17, and the output at the end of adaptation is shown in the frequency domain in Figure 18 and in the time domain in Figure 19.

It isn’t clear to me why the adaptive FRESH filter almost always outperforms the FRESH filter implied by the solution to the design equations (using spectral correlation estimates). I can show, however, that the performance of the design-equation FRESH filter depends on the quality of the spectral correlation estimates used to construct the branch filters, which is intuitively reasonable. We’ll have more to say about these aspects of the FRESH-filtering quest in subsequent sections.

The motivating problem is a hard one in that one of the interferers is quite a bit stronger than the desired signal, and all frequency components of the desired signal are corrupted by one or the other of the interferers. To show that excellent performance is possible in other problems–ones that are less difficult overall but still hard for the Wiener filter–let’s set the power level of interferer 1 to zero, find the Wiener filter, and look at a FRESH filter.

The spectral situation is now as shown in Figure 18–interferer 1 is gone.

The transfer-function magnitude for the computed Wiener filter is shown in Figure 19. This filter produces an output NMSE of -0.9 dB, which means that the input NMSE was reduced by a total of 4.9 dB. The input and output spectra are shown in Figure 20.

The adaptive Wiener filter for this situation only does slightly better, producing an output NMSE of -1.9 dB, for a total gain of 5.9 dB.

Turning to an adaptive FRESH filter, we use a sparse set of frequency shifts, as before, except we don’t include any of the cycle frequencies associated with interferer 1 since it is absent. The Sparse1_2 frequency shifts are:

For the adaptive Sparse1_2 FRESH filter, the output NMSE is -18.4 dB, for a NMSE gain of 22.4 dB. The input and output spectra are shown in Figure 21 and a segment of the time-domain input and output is shown in Figure 22.

The adapted branch transfer functions for the Sparse1_2 adaptive FRESH filter are shown in Figure 23.

So we conclude that we can, in principle, get very large gains in SNR by applying a FRESH filter when a signal of interest is subject to cochannel interference. But we must be able to find the right branch-filter impulse responses. This isn’t as easy as solving the design equations, as we’ve already seen–the adaptive filter often does better than the design-equation filter. Why? We’ll discuss that in the next section.

In this post, I’ve assumed that I know the cycle frequencies for each of the involved signals: the BPSK signal of interest and the two BPSK interferers. That may indeed be the case in some practical situations. In other situations, we will likely know the cycle frequencies, signal type, and cycle-frequency pattern for our desired signal, but not for the interference. In such cases, if we desired to use the interference cycle frequencies to form FRESH-filter frequency shifts (and the results here show that we do), then we need to blindly estimate the cycle frequencies from received data.

But blindly estimating cycle frequencies from noisy data, and identifying the cycle-frequency patterns exhibited by those cycle frequencies, are key steps in a common topic on the CSP Blog: blind cochannel modulation recognition. So we can do this step, in general, but it may be difficult when the involved signal-to-interference-and-noise ratios are small for some of the signals.

As an illustration, I applied the strip spectral correlation analyzer (SSCA) algorithm to the first 32,768 samples of the generated received data for the motivating example and plotted the results in the following figure:

This illustration, and many other results on the CSP Blog and in the literature, show that it is feasible to estimate the needed cycle frequencies from the data. These cycle frequencies can be used directly as FRESH-filter shifts, or parsed into subgroups with a cycle-frequency pattern recognizer so that the role, or nature, of each one can be determined. Once that is done, it is easier to select subsets of the detected cycle frequencies that are likely to produce good results. For example, if the doubled-carrier for each of the signals is identified from the items in the conjugate list, these three numbers can be used to good effect as frequency shifts in the conjugate branches of a FRESH filter.

Let’s return to the motivating example, where two BPSK signals are cochannel with the desired BPSK signal, as in Figure 1. Here we are going to use yet another set of frequency shifts in the FRESH structure. There will be a single non-conjugate branch with shift 0, and the conjugate branches have the following shifts:

When we go to solve the design equations using 32,768 samples of data, as before, we obtain the transfer functions shown in Figure 24.

We observe two very large spikes in the transfer functions of Figure 24. These translate into long tails of the various impulse-response functions, and the output NMSE obtained by using these filters is -0.4 dB, for a gain of 4.2 – (-0.4) = 4.6 dB. If these filters are used to initialize the LMS adaptive FRESH filter, the output NMSE improves to -7.6 dB for a total gain of 11.8 dB.

These spikes in the solutions to the design equations are fairly common, although I was able to avoid them in the various examples I provided up to this point. I also track the condition number of the matrix as the design equations are being solved. For the present case, a zoomed-in version of the plot of the condition number (versus frequency ) and the total energy in the matrix is shown in Figure 25.

So the matrix is close to being singular at/near these two frequencies. Why? Is this an inherent property of the spectral correlation functions and the design equations for this particular example? Is it due to poor estimates?

If we increase the number of samples used to estimate the spectral correlation functions needed to solve the design equations to 131,072, we eliminate the spikes and the resulting output NMSE is a much more reassuring -11.0 dB. The transfer functions are shown in Figure 26.

The spikes in the transfer functions cause the error spectrum to be much more energetic, which is consistent with the large NMSEs we see whenever we use such transfer functions in the FRESH filter. The error spectrum for the short-input case of Figure 24 is shown in Figure 27 and for the long-input case of Figure 26 in Figure 28.

The problem is that we would very much like to solve the design equations using a small number of samples, as this might render FRESH filtering more practical. One possible approach is to track the derivative of the condition number and when it exceeds a threshold, instead of accepting the solution to the design equations, perform interpolation using nearby transfer-function values that correspond to smaller condition numbers.
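That repair idea can be sketched as follows. For simplicity this version thresholds the condition number itself rather than its derivative, and the array names and shapes are hypothetical: `G` holds the per-frequency transfer-function solutions (one column per branch) and `conds` the per-frequency condition numbers.

```
import numpy as np

def repair_spiky_solutions(G, conds, thresh):
    """Replace transfer-function values at frequencies where the design-
    equation matrix was ill-conditioned with values linearly interpolated
    from well-conditioned neighbors. G: (n_freq, n_branch) complex array."""
    bad = conds > thresh
    bad_idx = np.flatnonzero(bad)
    good_idx = np.flatnonzero(~bad)
    out = G.copy()
    for b in range(G.shape[1]):
        # interpolate real and imaginary parts separately across the gap
        out[bad, b] = (np.interp(bad_idx, good_idx, G[good_idx, b].real)
                       + 1j * np.interp(bad_idx, good_idx, G[good_idx, b].imag))
    return out
```

Note that `np.interp` clamps at the band edges, so a spike at the very first or last frequency bin would simply be replaced by its nearest well-conditioned neighbor.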

I’ve talked with other researchers that have encountered the spikes in the design-equation-obtained transfer functions, and they have employed completely independent software to implement their system, so I’m confident that this isn’t a peculiar fluke of my own set of software tools.

We see that FRESH filtering is a complicated idea and a complex set of signal-processing software tools are needed to implement it. We have skipped over describing possible practical adaptation methods–perhaps that will appear in a future CSP Blog post. There are many other signal types and signal combinations to study as well.

Here are some questions that I think would be valuable to investigate as a signal-processing researcher.

- For a given signal and interference scenario, how do we pick good frequency shifts? We’d like an algorithm that can find a minimal set of shifts that provides output NMSE to within dB of the cyclic Wiener filter for the scenario.
- What is the best way to solve the design equations using data? Here we’ve used the FSM with a smoothing window width of 2 percent of the sampling rate. Might more sophisticated methods or more complex smoothing windows help reduce the prevalence of the spikes when forming estimates with relatively short input-data blocks?
- What are the fundamental reasons for the occurrence of the transfer-function spikes? Some choices of frequency shifts present no spikes–the condition number of is low and constant across the entire signal band. Others provide multiple spikes.
- Can the transfer-function spikes *always* be removed or attenuated by simply increasing the time-bandwidth product (resolution product) of the spectral correlation estimates used in the design equations? Are there situations where is singular simply due to the nature of the signal-separation problem and its parameters?
- In almost all cases I’ve run, the adaptive FRESH filter produces smaller NMSE than the corresponding design-equation-solution FRESH filter. Is this *always* due to the imperfect solution of the design equations implied by the use of spectral correlation estimates instead of theoretical spectral correlation function values?

If you’ve found significant value in this post, please consider donating to the CSP Blog to keep it ad-free and keep the posts and datasets coming.

We’ll call it CSPB.ML.2023.

Let’s start by describing the dataset in detail. Then I’ll present my current processing results as a performance target.

I created 60,000 IQ data files that contain a single signal in white Gaussian noise and 60,000 IQ data files that contain two signals in white Gaussian noise. For the single-signal files, the spectral density of the noise is always one (0 dB), and for the two-signal files, it is always two (3 dB). This latter fact arises because the way the two-signal files are created is by adding two of the single-signal files together. Since the noises are always independent file-to-file, the noises add on a power basis, and we get twice the noise spectral density. All data files have length 262,144 samples.
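The noise-doubling arithmetic is easy to check numerically. The quick sketch below assumes a unit sampling rate (so power equals spectral density) and unit-power complex white noise for each file, matching the 0 dB single-signal case.

```
import numpy as np

rng = np.random.default_rng(0)
N = 262144  # same length as the dataset files

# two independent complex white noises, each with unit power (0 dB)
n1 = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
n2 = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)

# independent noises add on a power basis: the sum has power ~2 (3 dB)
p_sum = np.mean(np.abs(n1 + n2)**2)
```

The measured power of the summed noise comes out very near 2.0, i.e., 3 dB, as claimed for the two-signal files.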

I’m not trying to make these datasets realistic in terms of some specific receiver noise floor. In fact, lots of realistic things are ignored. In my various machine-learner challenge datasets, the idea isn’t to solve the general modulation-recognition problem, it is to explore the strengths and weaknesses of the machine-learning optimization techniques because those strengths and weaknesses are not studied carefully by actual machine learners in favor of just pushing hype and false claims of superiority. (Sorry, but it’s true dudes.)

The dataset employs eight signals, like the original Challenge datasets, but here the excess bandwidth is fixed at 0.35 for the PSK and QAM signals, and is whatever it is for the MSK and GMSK signals. The eight signals are BPSK, QPSK, 8PSK, 16QAM, 64QAM, SQPSK, MSK, and GMSK. The randomized parameters are the symbol rate, carrier frequency offset, and SNR.

The symbol rates range from about 0.1 Hz to 0.6 Hz, the carrier frequency offsets range from -0.2 Hz to 0.2 Hz, and the signal powers range from 2 dB to 20 dB. The distributions of these parameters are not uniform.

The sampling rate is set equal to unity; we (I) don’t care about the physical setup. We are trying to see how well-trained neural networks perform on the dataset and how well they generalize–I’ll be posting a generalized (slightly) dataset eventually.

Internally at the CSP Blog, this dataset is called “PSK Mixtures,” and so the involved data files have names like ‘psk_mixture_1200.tim.’ And if you see the acronym ‘PM’ related to CSPB.ML.2023, it stands for PSK Mixtures. You can read these binary data files into MATLAB using read_binary.m. Even easier, perhaps, is to realize that the data-file format of these .tim files is nearly identical to the Ettus SDR binary float data-file format, with the exception that the first two records of a .tim file are two integers: a 1 or 2 to indicate real or complex data, and the number of samples in the data file.
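For readers without MATLAB, a Python reader consistent with the description above might look like the sketch below. The 32-bit little-endian header integers, 32-bit float samples, and interleaved I/Q ordering are my assumptions based on the Ettus-format comparison; read_binary.m is the authoritative reference.

```
import numpy as np

def read_tim(path):
    """Sketch of a .tim reader: two header integers (1 = real, 2 = complex;
    then the sample count), followed by float samples (interleaved I/Q
    when complex). Header/sample widths are assumed, not confirmed."""
    with open(path, 'rb') as fh:
        kind, nsamp = np.fromfile(fh, dtype='<i4', count=2)
        raw = np.fromfile(fh, dtype='<f4').astype(np.float64)
    if kind == 2:
        iq = raw[:2 * nsamp]
        return iq[0::2] + 1j * iq[1::2]  # interleave -> complex samples
    return raw[:nsamp]
```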

The original motivation for this dataset comes from a project I was working on a couple years back. In that project, there was an RF situation where multiple cochannel signals could be received and we desired to successfully demodulate one of them in the presence of the other(s). That is, we had to perform signal separation. But to do that, we needed to assess the situation: How many signals are present, what are their parameters, and what are their types? So this is a bit closer to RF Scene Analysis than a typical machine-learning modulation-recognition paper usually gets (they just output labels). But it isn’t a made-up problem or situation, it really happens sometimes.

The truth parameters for the single-signal files are contained in a simple text file called PM_single_truth_10000.txt. The first few entries are shown in Figure 1.

There is some redundancy in the file. The first three fields indicate the index of the data file. For Index_1, the corresponding data file in the dataset is psk_mixtures_1.tim. The number after Index_ in the truth files is always the index embedded in the data-file name. The fourth field is the symbol rate, the fifth is the carrier frequency offset, the sixth indicates the modulation variant, the seventh is the modulation type, and the last parameter is the signal power in dB.

The seventh field is the modulation-type parameter, which can be 1 for PSK, 2 for QAM, or 3 for staggered modulation (MSK, GMSK, SQPSK).

The sixth field, the modulation-variant parameter, indicates the variant within the modulation type. For modulation-type 1 (PSK), the value of the modulation-variant parameter is the number of bits per symbol. The same is true for modulation-type 2 (QAM). For modulation-type 3 (staggered), if the variant is 1, the signal is SRRC SQPSK; if the variant is 2, the signal is MSK; and if it is 3, the signal is GMSK.

| Mod-Type | Mod-Variant | Signal |
|---|---|---|
| 1 | 1 | BPSK |
| 1 | 2 | QPSK (4QAM) |
| 1 | 3 | 8PSK |
| 2 | 2 | 4QAM (QPSK) |
| 2 | 4 | 16QAM |
| 2 | 6 | 64QAM |
| 3 | 1 | SQPSK |
| 3 | 2 | MSK |
| 3 | 3 | GMSK |

The two-signal files are created by simply adding together each successive pair of single-signal files. The first two-signal file is for index 60001 (so it would be psk_mixtures_60001.tim) and is just the sum of the single-signal data files corresponding to Index_1 and Index_2 in Figure 1. The metadata for each of the involved signals is copied over to the corresponding line in the two-signal truth file PM_two_truth_10000.txt. See Figure 2 for a snippet of the full file.

To make concrete what kinds of signal scenarios we’re talking about in the PSK Mixtures dataset, I plotted the PSD for the first 100 single-signal files and also the first 100 two-signal files in Video 1. Of particular note is that sometimes the two signals completely overlap spectrally and sometimes they do not overlap at all.

I’ve applied my CSP-based modulation-recognition system (My Papers [25,26,28,43]) to a tenth of the dataset–I processed every tenth file of the 120,000 files. (Actually there are only 119,999 files because there are only 59,999 of the two-signal files.)

There are several ways to assess whether the signal-processing algorithm has detected and successfully processed the signal in the data file. The first is to determine whether the produced symbol rate matches the true one. The produced estimate must be within Hz of the true value, where is the number of processed samples. This is approximately the native cycle-frequency resolution of all cycle-frequency estimators. Another metric is how well the produced carrier frequency offset matches the truth. For all except 8PSK, the matching criterion is the same as for the rate. For 8PSK, we cannot hope to estimate the frequency offset that accurately, since we’d have to use at least 8th-order moments or cumulants, and here we restrict our attention to the joint use of cumulants of orders 2, 4, and 6. We can also compare, of course, the modulation-type labels and the produced SNR with the true SNR.
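A matching test consistent with this description might look like the sketch below. The exact tolerance is elided above; the bound here assumes the native cycle-frequency resolution is on the order of the reciprocal of the number of processed samples, with a hypothetical scale factor `k`.

```
def rate_match(est, truth, n_samples, k=1.0):
    """Check a symbol-rate (or CFO) estimate against truth to within the
    assumed native cycle-frequency resolution ~1/T, scaled by k."""
    return abs(est - truth) <= k / n_samples
```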

The single-signal results are shown in Figure 3. “Correct Const” means that the modulation-variant parameter estimate matches the truth, and “Correct Mod” means the modulation-type parameter estimate matches the truth. For the SNR to be correct requires it is within 2 dB of the truth.

The two-signal results are shown in Figure 4. Here the problem is much harder, but the performance monotonically improves with increasing processing block length. The achievable performance is not well understood. I made no attempt to limit the range of signal-to-interference (SIR) values experienced by either signal–I just add successive pairs of the single-signal data files together. So that means sometimes there will be a low SIR indeed, when one of the signals has power at the high end of the range (20 dB) and the other at the low end (2 dB).

It would be great to compare machine-learning and CSP processing on this (and all) datasets in a uniform way. In particular, compare the modulation-class label performance AND compare the parameter-estimation performance. That was, in fact, the driving motivation behind the original Challenge dataset, but nobody has yet provided me anything from a neural network except the modulation-class labels. In many real-world applications, though, the parameters that define the signal are also of high interest–not just the label.

I’ve put the first 10,000 of the 60,000 single-signal files on the CSP Blog in 10 batches of about 2 GB each. I’ve also put the first 10,000 two-signal files on the Blog. This chunking and partial-posting is necessary because WordPress.com limits an individual file to 2 GB, and also if I tried to upload all 120,000 files I’d run out of storage space. I’m at the maximum you can get from WordPress, and I asked specifically for more and said I’d pay, but no luck. Maybe 10,000 of each type of PSK mixture will be enough to get people started?

Comments, errors, compliments, and humor are welcome in the Comments section below. Especially errors!

Of course I have to see what ChatGPT has to say about CSP. Including definitions, which I don’t expect it to get too wrong, and code for estimators, which I expect it to get very wrong.

Let’s take a look.

A vexing problem I’m having is that every single time I log in to OpenAI on a computer, I get “ChatGPT is at capacity right now.” And every single time I log in on my phone, I can interact with it no problem. But capturing the output from the phone and bringing it to the CSP Blog is not as convenient as using Chrome for both.

Anyway, here is a transcribed interaction:

**CSP Blog**: What is the spectral correlation function?

**ChatGPT**: The spectral correlation function (SCF) is a mathematical function that describes the correlation between two signals or signal components in the frequency domain. It is often used in signal processing and communication systems to analyze the frequency-dependent characteristics of a signal, such as its bandwidth and power spectral density [1]. The SCF is typically computed by taking the Fourier transform of the cross-correlation function of the two signals, and can be used to identify the presence of certain frequency components in a signal, and their relative strengths [2].

**CSP Blog**: Define the cyclic temporal cumulant function

**ChatGPT**: The cyclic temporal cumulant function is a mathematical function that describes the cyclic temporal structure of a signal [3]. It is related to the cyclic autocorrelation function, which describes the cyclic correlation between different points in a signal, but instead of measuring correlation, it measures cumulants. Cumulants are a measure of the deviation of a signal from a normal distribution and are used to describe the signal’s statistical properties.

The cyclic temporal cumulant function is typically calculated by taking the Fourier transform of the cyclic autocorrelation function of the signal [4], and it can be used to identify the presence of certain cyclic patterns in a signal [5], and their relative strengths. It is often used in signal processing and communication systems to analyze the time-dependent characteristics of a signal, such as its periodicity [6] and frequency content [7].

**CSP Blog**: Define the cyclic autocorrelation function

**ChatGPT**: The cyclic autocorrelation function is a mathematical function that describes the cyclic correlation between different points in a signal. It is used to measure the similarity between a signal and a version of itself that has been shifted in time [8].

The cyclic autocorrelation function is calculated by taking the inner product of a signal with itself at different time lags [9], and it is often represented as a function of the time lag. It can be used to identify the presence of certain cyclic patterns in a signal, and their relative strengths. For example, a signal with a strong cyclic pattern will have a high cyclic autocorrelation at the time lag corresponding to the period of the pattern [10].

It is often used in signal processing and communication systems to analyze the time-dependent characteristics of a signal, such as its periodicity [11] and frequency content [12]. It is also used in cyclostationary signal processing for detecting the presence of cyclostationary signals and for estimating the parameters of the cyclostationary signals.

**CSP Blog**: Write code to estimate the spectral correlation function

**ChatGPT**: Here is some sample Python code to estimate the spectral correlation function of a signal using the Welch method [13]:

```
import numpy as np
from scipy.signal import welch
# Input signal
x = ...
# Window function
window = ...
# Number of samples in each segment
nperseg = ...
# Number of points to use in the FFT
nfft = ...
# Overlap between segments
noverlap = ...
# Estimate the power spectral density
f, Pxx = welch(x, window=window, nperseg=nperseg, nfft=nfft, noverlap=noverlap)
# Estimate the spectral correlation function
Rxx = np.real(np.fft.ifft(Pxx))
```

In this example, 'x' is the input signal, 'window' is the window function to use (e.g. 'hanning', 'hamming', etc.), 'nperseg' is the number of samples in each segment, 'nfft' is the number of points to use in the FFT, and 'noverlap' is the number of samples that the segments should overlap. The function 'welch' is used to estimate the power spectral density of the signal. The power spectral density estimate is then converted to the spectral correlation function using the inverse Fourier transform.

**CSP Blog**: Write code to estimate the cyclic temporal cumulant function

**ChatGPT**: I’m sorry, I am a text-based AI and I cannot execute code. However, I can explain how the code for estimating the cyclic temporal cumulant function might look like in a language like python. Here’s an example of how the code might look like [14]:

```
import numpy as np
def cyclic_temporal_cumulant(x, k):
    """
    Estimate the kth order cyclic temporal cumulant of a signal x.
    """
    N = len(x)
    X = np.fft.fft(x)
    R = np.real(X * np.conj(X)) / N
    c = np.fft.ifft(R)
    return c[k]
x = ... # input signal
k = ... # order of cumulant
c = cyclic_temporal_cumulant(x, k)
```

This code defines a function ‘cyclic_temporal_cumulant’ that takes two inputs: the signal ‘x’ and the order of cumulant ‘k’. The function first calculates the Fourier transform of the signal using the ‘fft’ function from the NumPy library, and then calculates the magnitude-squared of the Fourier transform, which is the cyclic autocorrelation function of the signal [15]. Finally, it takes the inverse Fourier transform of the cyclic autocorrelation function to get the kth order cyclic temporal cumulant [16].

It’s important to note that this is just a simple example and it might not work well for some signals, you should use appropriate windowing, check the signal’s length, and decide on the order of cumulant based on the signal’s properties.

**CSP Blog**: Write code for estimating the cyclic polyspectrum

**ChatGPT**: I’m sorry … [Same caveat as above]

```
import numpy as np
def cyclic_polyspectrum(x, k):
    """
    Estimate the kth order cyclic polyspectrum of a signal x.
    """
    N = len(x)
    X = np.fft.fft(x)
    R = np.real(X * np.conj(X)) / N
    c = np.fft.ifft(R)
    c_cumulant = np.zeros(n, dtype=complex)
    c_cumulant[k] = c[k]
    for i in range(i, k):
        c_cumulant[i] = c_cumulant[i-1] - c[i]
    return np.fft.fft(c_cumulant)
x = ... # input signal
k = ... # order of cumulant
P = cyclic_polyspectrum(x, k)
```

This code defines a function ‘cyclic_polyspectrum’ that takes two inputs: the signal ‘x’ and the order of cumulant ‘k’. The function first calculates the Fourier transform of the signal using the ‘fft’ function from the NumPy library, and then calculates the magnitude-squared of the Fourier transform, which is the cyclic autocorrelation function of the signal. Then it takes the inverse Fourier transform of the cyclic autocorrelation function to get the kth order cyclic cumulant of the signal. Then it calculates the kth order cyclic cumulant by subtracting the cumulant of lower order, and finally, it takes the Fourier transform of the cumulant to get the kth order cyclic polyspectrum of the signal [17].

Some thoughts.

First, I’m in zero danger of being out of a job due to large language models (LLMs) like ChatGPT. Yay!

Second, almost everything in ChatGPT’s responses above is wrong, misleading, or so confused and jumbled up that it is *not even wrong*. But–silver-lining alert–you can use the posts on this website, the CSP Blog, to find all the errors. Just please don’t use code generated by ChatGPT and then ask me to help you debug it. Better to start from scratch on your own. And looking at ChatGPT’s code makes me dizzy.

Third, the responses sound eerily familiar. These kinds of mixed-up not-even-wrong mathematical incoherencies are cropping up more and more often in the papers I review for journals, review for the CSP blog, and read in publications. And that’s where ChatGPT gets its training data: the world of human-generated stuff. So maybe there is a circling-the-drain vicious circle here: Bad papers get published, LLMs trawl the internet and find them, bad researchers use ChatGPT and publish, and the cycle continues.

Finally, why do we want this stuff? Suppose ChatGPT gave flawless answers and code. Why do we want to offload our intellectual efforts, pursuits, failures, and successes to a machine? I can understand a *better* tool: A more efficient mousetrap, fishing net, radio antenna. I can understand a different *kind* of tool: smoke signals, then pony express, then telegraph, then telephone, then WiFi. But those things enable us to work better, live better, be healthier, communicate better, etc. How will a *complete* offloading of our efforts, our struggles to create and understand, to a machine help us? Probably a better question is: Why do the engineers creating ChatGPT and DALL-E want this stuff?

After the post is up for a while, I’ll go back and highlight the howlers.

[1] The spectral correlation function is not used to measure the bandwidth of a signal; in fact you’d go quite wrong if you tried. The width of the symbol-rate spectral correlation function for a low-excess-bandwidth QAM/PSK signal is a small fraction of the actual occupied bandwidth. And saying that the spectral correlation function is used to ‘analyze the power spectral density’ is not actually, technically, wrong, since the power spectrum is a slice of the non-conjugate spectral correlation function; it is just misleading.

[2] Suppose I gave you the correlation between two random variables X and Y. Could you then tell me the relative strengths (say, the individual standard deviations, or just their individual sample values) from that number? Again, this is power-spectrum-y talk mixed with spectral-correlation-y terms. We’ll see that this is a common theme in ChatGPT’s responses.

[3] A near tautology, except for the fuzziness of the phrase “*cyclic temporal structure*.” Obvious bot is obvious.

[4] “*The cyclic temporal cumulant function is typically calculated by taking the Fourier transform of the cyclic autocorrelation function of the signal*” And here is where we get into the harm of Chat-GPT and its ilk. Stated confidently and oh-so-knowingly. You know, it is *typically* calculated like that, but I’m not going to bore you with all the other ways … And of course it is wrong in the extreme. Not only is it not typically calculated by Fourier transforming the cyclic autocorrelation, it is never calculated by doing that. See the cyclic cumulant and cyclic-cumulant estimation posts for the real story.

[5] What are “*cyclic patterns in the signal*“?

[6] “*It *[cyclic cumulant]* is often used in signal processing and communication systems to analyze the time-dependent characteristics of a signal, such as its periodicity*” Well, no, it is not. At all. In fact, a properly calculated cyclic temporal cumulant of order two or greater reflects those statistical characteristics of the signal that do not depend on any periodic component!

[7] “*It *[cyclic cumulant]* is often used in signal processing and communication systems to analyze the time-dependent characteristics of a signal, such as its periodicity and frequency content*” Frequency content? In a temporal parameter?

[8] “*The cyclic autocorrelation function is a mathematical function that describes the cyclic correlation between different points in a signal. It is used to measure the similarity between a signal and a version of itself that has been shifted in time*.” That’s the normal autocorrelation function. The cyclic autocorrelation can be interpreted as measuring the similarity between a signal and a version of the signal that has been shifted in time and subjected to a frequency shift. Or the similarity between two time- and frequency-shifted versions of a signal.

[9] “*The cyclic autocorrelation function is calculated by taking the inner product of a signal with itself at different time lags*” See [8].
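For readers who want the correct computation rather than the hallucinated one, here is a minimal NumPy sketch of a cyclic autocorrelation estimator. The function name, the asymmetric lag convention, and the normalization are just illustrative choices, not code lifted from my estimator posts:

```python
import numpy as np

def cyclic_autocorrelation(x, alpha, tau):
    """Estimate the cyclic autocorrelation of x at cycle frequency
    alpha (cycles/sample) and non-negative integer lag tau, using the
    asymmetric lag convention:
        R(alpha, tau) = (1/N) sum_n x(n+tau) x*(n) exp(-2j pi alpha n)
    With alpha = 0 this reduces to the conventional autocorrelation."""
    n = np.arange(len(x) - tau)
    return np.mean(x[n + tau] * np.conj(x[n]) * np.exp(-2j * np.pi * alpha * n))

# Quick check with a complex sine wave at 0.1 cycles/sample: the
# magnitude at cycle frequency zero is 1 for any lag, while at an
# unrelated cycle frequency the estimate is near zero.
x = np.exp(2j * np.pi * 0.1 * np.arange(1000))
r_match = cyclic_autocorrelation(x, 0.0, 5)
r_miss = cyclic_autocorrelation(x, 0.25, 5)
```

Note that the lag product is modulated by a cycle-frequency exponential *inside* the average; that frequency shift is exactly the ingredient ChatGPT keeps leaving out.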

[10] “*a signal with a strong cyclic pattern will have a high cyclic autocorrelation at the time lag corresponding to the period of the pattern*” Well, no, not necessarily. In fact, for the CSP-Blog mascot signal, the rectangular-pulse BPSK signal, the fundamental period of the time-varying autocorrelation is the bit duration (the reciprocal of the bit rate), and yet the cyclic autocorrelation slices for all cycle frequencies (including zero!) are equal to zero at a lag equal to the bit duration! The same is true for any rectangular-pulse PSK or QAM signal. See the cyclic autocorrelation gallery post for lots of examples. Plenty of real-world signals also disobey this Chat-GPT hallucination:

Here the fundamental period of the time-varying autocorrelation is 1/[1.23 MHz] = 0.81 μs.
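The rectangular-pulse counterexample in [10] is also easy to check numerically. This little sketch (the bit duration and bit count are arbitrary choices) builds a baseband rectangular-pulse BPSK signal and shows the lag-product time average vanishing at a lag of exactly one bit duration:

```python
import numpy as np

# Numerically checking the rectangular-pulse BPSK counterexample.
# The bit duration T0 and the number of bits are arbitrary choices.
rng = np.random.default_rng(1)
T0 = 8                                    # samples per bit
bits = rng.choice([-1.0, 1.0], size=10000)
x = np.repeat(bits, T0)                   # baseband rectangular-pulse BPSK

def lag_product_average(x, tau):
    """Time average of x(n+tau) x(n): the cycle-frequency-zero slice of
    the cyclic autocorrelation (real signal, so no conjugate needed)."""
    return np.mean(x[tau:] * x[:len(x) - tau])

r0 = lag_product_average(x, 0)     # lag 0: the signal power, exactly 1
rT0 = lag_product_average(x, T0)   # lag T0: products of independent bits
```

At a lag of one bit duration, every lag product pairs two independent equally-likely ±1 bits, so the average is near zero even though the underlying period of the cyclostationarity is the bit duration.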

[11] “*It *[cyclic autocorrelation]* is often used in signal processing and communication systems to analyze the time-dependent characteristics of a signal, such as its periodicity*” See [6].

[12] “*It *[cyclic autocorrelation]* is often used in signal processing and communication systems to analyze the time-dependent characteristics of a signal, such as its periodicity [11] and frequency content*” See [7].

[13] The Welch method refers to a spectrum estimator–an estimator of the PSD. The generalization of the Welch method to the spectral correlation function is what I call the time-smoothing method. The provided code does this: Estimate the PSD using the Welch method, inverse transform the PSD estimate to yield the autocorrelation, then declare that the autocorrelation is the spectral correlation function estimate. Simple! Won’t somebody please–please!–think of the children?? They will learn the wrong things. So wrong. And then we AI skeptics and holdouts will have to help them unlearn. Thanks OpenAI!
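For contrast with that ChatGPT concoction, the time-smoothing idea can be sketched in a few lines of NumPy. This is only an illustrative skeleton (the block length, window, and normalization are arbitrary choices), not the polished estimators described in my estimation posts. The key point is that each block’s spectrum is correlated with a version of itself offset by the cycle frequency, which plain Welch averaging never does:

```python
import numpy as np

def tsm_spectral_correlation(x, alpha, nfft=256, noverlap=128):
    """Time-smoothing (Welch-like) estimate of the non-conjugate
    spectral correlation function at cycle frequency alpha
    (cycles/sample).  Each block's spectrum at f + alpha/2 is
    correlated with its spectrum at f - alpha/2; the two frequency
    shifts are applied in the time domain so both FFTs share one bin
    grid, and the global time index keeps the blocks phase-coherent."""
    window = np.hanning(nfft)
    step = nfft - noverlap
    n = np.arange(len(x))
    u = x * np.exp(-1j * np.pi * alpha * n)   # spectrum shifted down by alpha/2
    v = x * np.exp(+1j * np.pi * alpha * n)   # spectrum shifted up by alpha/2
    acc = np.zeros(nfft, dtype=complex)
    nblocks = 0
    for start in range(0, len(x) - nfft + 1, step):
        U = np.fft.fft(window * u[start:start + nfft])
        V = np.fft.fft(window * v[start:start + nfft])
        acc += U * np.conj(V)
        nblocks += 1
    return acc / (nblocks * np.sum(window ** 2))

# Sanity check with unit-power complex white noise: the alpha = 0 slice
# is (approximately) the flat unit-level PSD -- plain Welch -- while a
# nonzero cycle frequency leaves only residual estimation noise,
# because white noise exhibits no spectral correlation.
rng = np.random.default_rng(0)
noise = (rng.standard_normal(8192) + 1j * rng.standard_normal(8192)) / np.sqrt(2)
S0 = tsm_spectral_correlation(noise, 0.0)
Sa = tsm_spectral_correlation(noise, 0.1)
```

Setting alpha to zero collapses the two shifted spectra onto each other and the routine reduces to the ordinary Welch PSD estimate, which is consistent with the PSD being the cycle-frequency-zero slice of the non-conjugate spectral correlation function.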

[14] The proffered code does this: compute the Fourier transform of the input, form the periodogram, inverse Fourier transform the periodogram to form an estimate of the autocorrelation, then declare that the kth element of the autocorrelation function estimate is actually the kth-order cyclic cumulant estimate! Oh the humanity! Or utter lack thereof.

[15] “*and then calculates the magnitude-squared of the Fourier transform, which is the cyclic autocorrelation function of the signal*” Well, no, not at all. The (scaled) magnitude-squared of the Fourier transform is the periodogram.
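To be concrete about what that magnitude-squared actually is, here is the standard periodogram in a couple of lines of NumPy (unit sampling rate assumed for simplicity):

```python
import numpy as np

def periodogram(x, fs=1.0):
    """The (scaled) magnitude-squared of the Fourier transform:
        I(f) = |X(f)|**2 / (N * fs)
    This is a crude estimate of the power spectral density; it is not
    the cyclic autocorrelation function."""
    X = np.fft.fft(x)
    return np.abs(X) ** 2 / (len(x) * fs)
```

By Parseval’s relation, the average of the periodogram over the frequency bins equals the average power of the signal, which makes for an easy sanity check.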

[16] “*Finally, it takes the inverse Fourier transform of the cyclic autocorrelation function to get the kth order cyclic temporal cumulant*” See [14].

[17] Too bizarre to untangle. This is not even in the same universe as the actual cyclic polyspectrum.

What kills me is the way they’ve set up Chat-GPT to always sound so utterly confident and sure. Why can’t it just say “I don’t know” instead of vomiting up a bunch of crap and then acting like it is bored even having to tell you these obvious things?

This AI tool will harm the education of young people.
