Before we translate the Laplace transform from continuous time to discrete time, deriving the Z transform, let’s take a step back and look at practical filters in continuous time. *Practical* here stands in opposition to *ideal* as in the ideal lowpass, highpass, and bandpass filters we studied earlier in the SPTK thread.

Jump straight to Significance of Practical Filters in CSP.

Ideal filters are linear time-invariant systems with frequency-response (transfer) functions that are piecewise constant. That is, the transfer functions of ideal filters, $H(f)$, are composed of one or more rectangles. Taking some figures from the post on ideal filters, the ideal lowpass, bandpass, and highpass filters have transfer functions shown in the (a) subplots of Figures 1, 2, and 3 of the present post.

The ideal filters are ideal in the sense that they perfectly select and reject frequency components: a frequency component of the input signal is either passed (appears in the output) with a constant scale factor or it is perfectly rejected (absent from the output).

However, ideal filters are unrealizable, which means that they cannot be constructed using physical elements such as resistors, capacitors, and inductors–the ‘elements’ in *lumped-element systems* such as simple passive circuits (no transistors). The basic reason they cannot be built in the real world is that they are non-causal. That is, the impulse-response function (the inverse transform of the transfer function $H(f)$) is non-zero for $t < 0$. This means that the filter must combine inputs from the past *and the future* to produce the output at the present time, which is impossible.

In the physical world, we construct linear time-invariant systems (filters) using various elements, as mentioned above, and the resulting time-domain behavior (such as the output signal given some input signal) can be described in terms of differential equations, as we touched on in the SPTK post on the Laplace transform. The order of the differential equation, which is the order of the highest derivative appearing in the equation, determines the complexity of the system. More complex (higher-order) systems can produce more complex transfer functions, and therefore may more closely approximate ideal filters.

In this post, we’ll take a look at first- and second-order linear time-invariant systems that have input-output relations described by linear differential equations. We call such systems *practical filters*. It might be helpful to point out that the practical filters discussed here are quite general in that the very same equations model electrical-engineering systems like lumped circuits *and* mechanical systems involving elements such as springs, masses, and dashpots. I’m sure they describe other physical systems too. So we’re not *just* doing signal processing in the electrical-engineering context here.

These filters are governed by the simple first-order linear differential equation
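The display equation for (1) did not survive formatting. A reconstruction consistent with everything derived below (unit gain at zero frequency and a time constant of $1/a$) is the following; treat the exact form as my assumption:

```latex
% hedged reconstruction of equation (1):
\frac{d}{dt} y(t) + a\, y(t) = a\, x(t)
```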

where $x(t)$ is interpreted as the filter input, $y(t)$ is the filter output, and $a$ is a constant whose significance will become apparent as we develop solutions to the equation.

As usual, we’d like to find the impulse-response function and the transfer function for this system, but to do that we should make sure that (1) really does correspond to a linear time-invariant system, for only such systems have well-defined impulse responses and transfer functions of the sort we’ve been developing and using in the SPTK series of CSP Blog posts.

Suppose the signal pair $(x_1(t), y_1(t))$ obeys (1), and the same for the pair $(x_2(t), y_2(t))$. Then we have the two equations given by

Then if we simply add these two equations together, we obtain another equation, which is given by

Then because the derivative is linear, we have

which implies that the output $y_1(t) + y_2(t)$ corresponds to the input $x_1(t) + x_2(t)$, and therefore that the output for the sum of two arbitrary inputs is the sum of the outputs for those inputs, establishing linearity of the underlying system that gives rise to the differential equation.

Since (1) is valid for any time $t$, choose $t \rightarrow t - t_0$, which shows that a delayed (or advanced) input $x(t - t_0)$ gives rise to the output $y(t - t_0)$, establishing time-invariance.

A typical way to proceed with solving such differential equations is through transform techniques. This simply means applying a well-defined and well-behaved transformation (operation) to both sides of the equation and then using algebra to solve for the desired quantity, say the impulse response or transfer function. The transform can be the Fourier or Laplace transform or, as we’re leading up to in this part of the SPTK sequence, the Z transform for discrete-time systems.

Let’s carefully apply the Fourier transform to each side of (1), keeping in mind our usual notation that links the time and frequency domains, such as $x(t) \leftrightarrow X(f)$ and $y(t) \leftrightarrow Y(f)$. The analysis looks like this

where we took advantage of a result we derived previously, which is that differentiation in time corresponds to multiplication by $i 2 \pi f$ in frequency: $dy(t)/dt \leftrightarrow (i 2\pi f) Y(f)$.

If $X(f) \neq 0$, we can rearrange this equation to yield an expression for the transfer function $H(f) = Y(f)/X(f)$,

Recall from (13) in the Laplace Transform post that the Fourier transform of the causal decaying exponential $e^{-at}u(t)$ is a simple rational function of $f$,

We can write the transfer function in the appropriate form with a little algebra

and therefore by inspection, we have an expression for the impulse-response function for the first-order practical filter,

We have found expressions for the impulse-response function and the transfer function. We’ll want to plot these and investigate their behavior as a function of $a$, but first let’s obtain one more important function: the system response to a unit-step-function input, more commonly known as the *step response*.
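As a quick numerical sanity check, here is a Python sketch (the post itself uses MATLAB); the forms $H(f) = a/(a + i2\pi f)$ and $h(t) = a e^{-at}u(t)$ are my reading of the lost display equations. The area under the impulse response must equal the DC gain $H(0) = 1$:

```python
import numpy as np

# Assumed first-order filter pair: H(f) = a/(a + i*2*pi*f), h(t) = a*exp(-a*t)u(t).
a = 2.0
dt = 1e-4
t = np.arange(0.0, 20.0 / a, dt)
h = a * np.exp(-a * t)

area = np.sum(h) * dt                        # numerical integral of h(t)
H0 = abs(a / (a + 1j * 2.0 * np.pi * 0.0))   # |H(0)|, the DC gain
```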

We have an impulse response in (11) and an input of interest, the unit-step function $x(t) = u(t)$. We can employ the input-output relation for a linear time-invariant system, which is that the output is the convolution of the input with the impulse response $h(t)$,

This convolution is potentially non-zero only for $t > 0$, because the integrand is zero over the entire integration region when $t < 0$ (see Figure 4).

Keeping in mind that the solution to (14) is valid for $t \geq 0$, we can evaluate the integral easily,

Notice that for any $a > 0$, as $t \rightarrow \infty$, the step response approaches one, and so eventually the response to the step function mirrors the step function. (What kind of filter is that?)
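A brief numerical check of this claim, assuming (as above) the impulse response $h(t) = a e^{-at}u(t)$, so that the step response is $s(t) = 1 - e^{-at}$ for $t \geq 0$:

```python
import numpy as np

# Step response via convolution of the assumed h(t) with a unit step; the
# running integral of h is exactly that convolution.
a = 2.0
dt = 1e-4
t = np.arange(0.0, 10.0 / a, dt)
h = a * np.exp(-a * t)
s_num = np.cumsum(h) * dt                  # convolution with u(t)
s_true = 1.0 - np.exp(-a * t)
max_err = np.max(np.abs(s_num - s_true))
two_thirds_point = 1.0 - np.exp(-1.0)      # value at t = 1/a, about 0.632
```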

Notice that the transfer function cannot be zero and is well-behaved for all frequencies $f$. This is because the magnitude of the denominator is

which can’t be zero for any combination of real $a \neq 0$ and $f$. The maximum of the transfer-function magnitude is at $f = 0$, because the maximum corresponds to the minimum of the denominator magnitude, and elementary calculus tells us that the minimum is at $f = 0$. So the transfer function peaks at $f = 0$. How fast does it decay as we increase $|f|$?

Let’s look at a related function, which is the squared magnitude of $H(f)$ expressed in decibels,

This function simplifies to

for which it is easy to fill out an approximate table of values using $\log_{10}(2) \approx 0.3$,
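The table itself did not survive formatting; a quick computation of the values it likely contained, assuming the squared magnitude $|H(f)|^2 = 1/(1 + (2\pi f/a)^2)$ implied by the discussion:

```python
import numpy as np

# Approximate dB table for G = 10*log10(1 / (1 + x^2)), where x stands for the
# normalized frequency 2*pi*f/a of the assumed first-order transfer function.
x = np.array([0.0, 1.0, 3.0, 10.0])
G_dB = 10.0 * np.log10(1.0 / (1.0 + x**2))
# roughly [0, -3, -10, -20] dB: half power at x = 1, one-tenth power at x = 3
```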

Plots of this function for three values of $a$ are shown in Figure 5.

From the table or the plot, we can see that the transfer function decays to about half its peak at the frequency $f = a/(2\pi)$ and to about one-tenth of its peak at the frequency $f = 3a/(2\pi)$. Unlike the ideal filters, where the bandwidth of the filter can be unambiguously determined by the width of an appropriate rectangle (see Figures 1–3), practical filters have transfer functions that smoothly vary and are lump- or bump-like, in a manner highly reminiscent of the power spectra for communication signals. When we developed the sampling theorem, we encountered the problem of specifying the “maximum frequency” or the “bandwidth” of a signal with a smooth lump-like spectrum, and that led to the realization that there is no unambiguous or always-preferred measure for the bandwidth of a real-world signal.

In the case of practical filters, we have the same problem as with signals–how should we think about, specify, or constrain the bandwidth of the passband or stopband of a practical filter? Well, we generally do the same thing as for signals. So here, for the first-order filters, we can characterize the filter in terms of its 3-dB bandwidth (see Figure 5), its 10-dB bandwidth (see Table 1), a 20-dB bandwidth, a 99% bandwidth, etc.

Since the bandwidth of the filter, however we define bandwidth, is clearly a function of $a$, we see that the parameter $a$ in the original differential equation controls the bandwidth of the equivalent filter, and that the filter is a lowpass filter.

The impulse-response function is given by (11) and is a simple decaying exponential function. At $t = 3/a$, the value of the impulse response is $a e^{-3}$. So by $t = 3/a$, the response has decayed to about one-twentieth of its peak at $t = 0$. Figure 6 shows plots of the impulse response for the same three values of $a$ used in Figure 5.

The energy of the impulse-response function on the interval $[0, T]$ is

The energy is $a/2$ for $T \rightarrow \infty$ and is very nearly that for $T = 3/a$,

So $a$ determines the energy in the impulse response, and the function is well-approximated by restricting it to the interval $[0, 3/a]$. Note that even the interval $[0, 2/a]$ is a good approximation,
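These energy fractions are easy to verify numerically; the closed form $E(T) = (a/2)(1 - e^{-2aT})$ follows from integrating the assumed $h(t) = a e^{-at}u(t)$ squared:

```python
import numpy as np

# Fraction of the total impulse-response energy captured on [0, T] for the
# assumed h(t) = a*exp(-a*t)u(t): E(T)/E(inf) = 1 - exp(-2*a*T).
a = 2.0

def energy_fraction(T, a):
    return 1.0 - np.exp(-2.0 * a * T)

frac_2 = energy_fraction(2.0 / a, a)   # interval [0, 2/a]: about 98.2%
frac_3 = energy_fraction(3.0 / a, a)   # interval [0, 3/a]: about 99.75%
```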

We can, therefore, interpret $1/a$ as a *time constant* for the system, which means that, to a good approximation, fluctuations in the input signal occurring over a time interval of about $1/a$ seconds are combined to yield an output, but values of the input that are separated by much more than $1/a$ seconds are not. This behavior is consistent with our interpretation of a lowpass filter (which the first-order system definitely is; see Figure 5) as a kind of moving-average filter.

The step response (19) is plotted for the three values of $a$ of interest in Figure 7. We note that the response achieves about 2/3 of its final value of one by $t = 1/a$, and after a few multiples of $1/a$ seconds, the response is within about 10% of its final value. These empirical facts simply cement the idea of $1/a$ as a time constant controlling the temporal (and therefore spectral) behavior of the system outputs.

Before changing topics, let’s take a quick look at the phase of $H(f)$. Recall that for a filter to provide a delayed version of its input, the phase of the transfer function must be a linear function of frequency over the passband(s) of the filter. (This arises from consideration of the transfer function of a pure-delay system, $H(f) = e^{-i 2\pi f t_0}$.) The phase is shown in Figure 8, along with vertical dotted lines that remind us of the 3-dB bandwidths of the three filters.

Let’s take the next step forward and increase the order of the governing differential equation by one, yielding the second-order filters,

where $\zeta > 0$ is the damping parameter and $\omega_0 > 0$ is the natural frequency.

I’ll leave it to you to check whether the effective system defined by input $x(t)$ and output $y(t)$ is linear and time-invariant. I’ll proceed as if it is.

We know that the derivatives of $y(t)$ are related to the transform $Y(f)$ by the following relations

so that Fourier transforming both sides of (28) is simple,

which immediately allows us to solve for the transfer function (yay transform techniques!)
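The display equations (28) and (32) were lost in formatting. A standard second-order form consistent with the three root cases analyzed below, written with damping parameter $\zeta$ and natural frequency $\omega_0$ (my notation, assumed), is:

```latex
% hedged reconstruction of (28) and the resulting transfer function (32):
\frac{d^2 y(t)}{dt^2} + 2 \zeta \omega_0 \frac{d y(t)}{dt} + \omega_0^2\, y(t)
  = \omega_0^2\, x(t)
\qquad \Longrightarrow \qquad
H(f) = \frac{\omega_0^2}{(i 2\pi f)^2 + 2 \zeta \omega_0 (i 2\pi f) + \omega_0^2}
```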

To find the impulse-response function we need to inverse transform the transfer function (32). We begin by reexpressing the rational function using $g = 2\pi f$ and some algebra to yield

If we can factor the denominator, we can then use the known Fourier transform of the causal decaying exponential to inverse-transform the result. Applying the trusty quadratic formula, we obtain

Using more elementary algebra we obtain results for both roots $g_1$ and $g_2$,

and

Now we can use our knowledge of Fourier-transform pairs to find expressions for the impulse-response functions for the various root configurations. Recall that the basic transform pairs relating to the kinds of simple rational functions in (37) and (38) are

This immediately leads to the following result for the case of a repeated root ($\zeta = 1$),

For $\zeta > 1$, both roots $g_1$ and $g_2$ are real because $\zeta^2 - 1 > 0$. The factorization of the denominator can be written as

with $g_1 \neq g_2$. The inverse transform follows easily

Finally, for $\zeta < 1$, the two roots are a complex-conjugate pair, so similar algebra and use of the frequency-shifting property of the Fourier transform

leads to the result
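The three root cases are easy to verify numerically; this sketch assumes the denominator polynomial $s^2 + 2\zeta\omega_0 s + \omega_0^2$ (with $s = i2\pi f$) implied by the discussion above, where $\zeta$ and $\omega_0$ are my assumed parameter names:

```python
import numpy as np

# Roots of the assumed second-order denominator for the three damping cases.
def denominator_roots(zeta, w0=1.0):
    return np.atleast_1d(np.roots([1.0, 2.0 * zeta * w0, w0**2])).astype(complex)

r_over = denominator_roots(1.5)   # zeta > 1: two distinct real roots
r_crit = denominator_roots(1.0)   # zeta = 1: repeated real root at -w0
r_under = denominator_roots(0.5)  # zeta < 1: complex-conjugate pair
```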

For the step response, we’ll forgo the derivation of the formula and just convolve the impulse response with a step function in MATLAB.
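For readers without MATLAB, here is a Python sketch of the same numerical approach; the underdamped impulse response and the overshoot formula it checks follow from the second-order form I assumed above ($\zeta$ and $\omega_0$ are my notation):

```python
import numpy as np

# Convolve an assumed underdamped (zeta < 1) second-order impulse response with
# a unit step and compare the overshoot to exp(-pi*zeta/sqrt(1 - zeta^2)).
zeta, w0, dt = 0.5, 2.0 * np.pi, 1e-4
wd = w0 * np.sqrt(1.0 - zeta**2)                  # damped natural frequency
t = np.arange(0.0, 10.0, dt)
h = (w0**2 / wd) * np.exp(-zeta * w0 * t) * np.sin(wd * t)
s = np.cumsum(h) * dt                             # convolution with u(t)
overshoot = s.max() - 1.0
predicted = np.exp(-np.pi * zeta / np.sqrt(1.0 - zeta**2))  # about 0.163
```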

Like the first-order filters, the practical second-order filters that arise from differential equations relating system input to system output are lowpass filters. The combination of $\zeta$ and $\omega_0$ determines the bandwidth of the lowpass filter (however you define it). For fixed $\zeta$, increasing $\omega_0$ increases the bandwidth, and for fixed $\omega_0$, increasing $\zeta$ decreases the bandwidth. These trends, and the general shapes of the obtainable transfer functions, are shown in Figure 9.

The filters have approximately linear phase across their passbands as illustrated in Figure 10. This means that the input signal is shaped by the transfer function, selecting largely the signal’s frequency components near zero frequency, but overall the signal is also delayed–distinct frequency components are delayed appropriately so that the overall signal is not significantly distorted beyond the selection aspect.

The impulse-response functions corresponding to the transfer functions in Figures 9 and 10 are shown in Figure 11. Compare these to the impulse-response functions for the first-order filters in Figure 6. These functions are more complicated, indicating that they can ‘do more’ than their first-order cousins.

Turning to the step response, we obtain the plots shown in Figure 12. Several common filtering and control-system terms are typically introduced with this kind of plot (see also Figure 13). These include *overshoot*, which quantifies the maximum amount by which the step response exceeds its eventual long-term value (here unity); *settling time*, which quantifies the time needed for the step response to achieve some small error relative to its long-term value; and *ringing*, which describes the oscillatory nature of the response as it approaches the settling time.

Finally, it is worth comparing these practical-filter functions with those for an ideal filter. In Figure 14 I’ve plotted the transfer function, impulse-response function, and step-response function for an ideal lowpass filter with bandwidth equal to the 10-dB bandwidth of the second-order filter in Figure 9. As we studied in the post on ideal filters, making such filters causal requires truncation and delay of the impulse-response function, which is clearly non-zero for $t < 0$. Truncating the impulse response and shifting it to enforce causality leads to the step response in the lower plot of Figure 14. We see the ringing, overshoot, and settling, but we have to live with the large delay as well. The practical filters do not have that latter feature.

The practical filters we’ve looked at here in this SPTK post are a first step away from the ideal filters we used to introduce the basic concepts and functions involved in linear time-invariant system analysis (filtering). There are many, many more kinds of practical filters that allow all kinds of engineering tradeoffs between complexity (how hard it is to perform the convolution) and performance (passing frequency components of interest and attenuating those not of interest). Examples are Butterworth, Chebyshev, Gaussian, and others.

Our trajectory in the SPTK thread is to move toward digital filters and the mathematics that pertains thereto. We are now in a position to introduce a major analysis tool for digital filters called the Z transform.

**Significance of Practical Filters in CSP**

Not much! But we are building up to digital filters, which include the ubiquitous finite-impulse-response (FIR) and infinite-impulse-response (IIR) filters.

One could use a filter of the sort we’ve studied here in an algorithm such as the frequency-smoothing method for spectral correlation estimation, instead of a simple rectangular moving-average filter. The effect on the resulting estimate would likely be salubrious, but the computational cost would be much increased, as the moving-average filter (smoother) is extremely cheap to implement.

Previous SPTK Post: The Laplace Transform Next SPTK Post: TBD

Previously, in My Papers [50-52, 54], we have shown that the (multitudinous!) neural networks in the literature that use I/Q data as input and perform modulation recognition (output a modulation-class label) are highly brittle. That is, they minimize the classification error and they converge, but they don’t *generalize*. A trained neural network generalizes well if it can maintain high classification performance even when some of the probability density functions for the data’s random variables differ between the training inputs (in the lab) and the application inputs (in the field). The problem is also called the *dataset-shift* problem or the *domain-adaptation* problem. Generalization is my preferred term because it is simpler and has a strong connection to the human equivalent: we can quite easily generalize our observations and conclusions from one dataset to another without massive retraining of our neural noggins. We can find the cat in the image even if it is upside-down and colored like a giraffe.

Since the unfortunate paper The Literature [R138], our research program has taken the following form:

- Are the RML datasets of high quality? Do they span a reasonable subset of digital modulation parameters? (Answers: No. See here, here, here, here and here.)
- Can a typical convolutional neural network outperform my CSP-based carrier-frequency-offset estimator? (Answer: No attempt I’ve seen comes close.)
- Can a typical convolutional neural network outperform a CSP-based modulation recognizer on the CSPB.ML.2018 and CSPB.ML.2022 datasets? (Answer: No CNN has, but capsule networks can.)
- Can a CNN or capsule network match the generalization ability of a CSP-based modulation recognizer using, say, CSPB.ML.2018 and CSPB.ML.2022? (Answer: With IQ inputs, no. With cyclic-cumulant inputs, yes.)
- Can we create a new type of neural network, with new types of layers, that can take IQ inputs and yet deliver the performance and generalization of the cyclic-cumulant-trained capsule networks? (Answer: As of Snoap’s MILCOM ’23 paper My Papers [55], and upcoming journal paper, yes.)

In other words, don’t use RML datasets, don’t use convolutional neural networks borrowed directly from image-processing successes, and don’t forget to include serious generalization tests in your machine-learning modulation-recognition work. And we’re bringing the receipts.

Here are some My Papers [55] teasers.

Here is the all-important Figure 1:

We use CSPB.ML.2018 and CSPB.ML.2022 to assess both classification performance and generalization ability. Recall that CSPB.ML.2022 is nearly identical to CSPB.ML.2018–the main difference is that the signals’ carrier-frequency offset parameters are governed by two different and non-overlapping uniform distributions. This gives rise to the following “trained on X, tested on Y” probability-of-correct-classification plots:

Now, since the submission of My Papers [55], we have made substantial progress on refining the novel-layer capsule networks. I don’t want to excerpt from that nearly complete, but not yet submitted, paper, but I can provide this basic view of the results:

| Inference Method | Trained On | Tested On | Classification Performance | Generalization Performance |
|---|---|---|---|---|
| CSP Blog CSP | 2018 | 2022 | 0.8 | High |
| CSP Blog CSP | 2022 | 2018 | 0.8 | High |
| IP Cap NN w/IQ | 2018 | 2022 | 0.4 | Low |
| IP Cap NN w/IQ | 2022 | 2018 | 0.6 | Med |
| IP Cap NN w/CC | 2018 | 2022 | 0.9 | High |
| IP Cap NN w/CC | 2022 | 2018 | 0.9 | High |
| New Cap NN w/IQ | 2018 | 2022 | 0.9 | High |
| New Cap NN w/IQ | 2022 | 2018 | 0.9 | High |

Why do the networks with the novel nonlinear layers outperform the image-processing networks, which largely feature convolution layers, *when IQ data is at the network input*? I think it is because the IQ data is not amenable to edge detection, and things like edge detection are the forte of convolutions. In fact, convolutional neural networks were inspired by the eye-brain system, which is well-known for its ability to recognize images quickly and efficiently. See for example The Literature [R191], which tries to explain how the convolutional neural networks came about in an engineering-history sense:

Turning to our IQ data from various radio signals, do we think the eye-brain model is appropriate or useful? Let’s take a look, literally. In Figure 4 I’ve plotted the IQ samples for three different digital QAM signals, each of which has eight points in its constellation: $\pi/4$-DQPSK, punctured-square 8QAM, and 8APSK.

Compare those IQ plots with plots of the higher-order cyclic cumulants for the three signals (8QAM2 is another name for 8APSK), visualized in the style of the recent cyclic-cumulant gallery post, in Figures 5-7.

It is pretty easy to see the difference and tell which is which, just from looking at the pattern. So a neural network that is designed to ‘look’ like us will have no trouble either, and that is why we see such good classification performance **and** good generalization for the cyclic-cumulant-trained image-processing capsule networks.

Now, when Snoap uses his novel-layer IQ-input network, it doesn’t get fed the patterns in Figures 5-7. Instead, we force it to ‘see’ some *proxies* for those theoretical (and beautiful) patterns in Figures 5-7. In particular, we force it to see the Fourier transforms of the IQ samples raised to the powers of two, four, six, and eight. These contain sine waves related to the cyclic cumulants for orders $n = 2, 4, 6, 8$. For our three eight-point constellations, these cyclic-cumulant proxies are shown in Figure 8. Again, our eye-brain system can easily distinguish these patterns–and so can the new novel-layer capsule networks.
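A toy illustration of the proxy idea (this is not the paper’s code, and the parameter values are arbitrary choices of mine): squaring a rectangular-pulse BPSK signal with carrier offset $f_0$ produces a strong spectral line at $2 f_0$, the simplest of these power-law features.

```python
import numpy as np

rng = np.random.default_rng(1)
f0, sps, nsym = 0.05, 8, 512                 # offset, samples/symbol, symbols
bits = rng.integers(0, 2, nsym) * 2 - 1      # +/- 1 symbol values
s = np.repeat(bits, sps).astype(float)       # rectangular-pulse baseband BPSK
n = np.arange(s.size)
x = s * np.exp(1j * 2.0 * np.pi * f0 * n)    # apply the carrier offset
X2 = np.fft.fft(x**2)                        # transform of the squared signal
peak_freq = np.fft.fftfreq(x.size)[np.argmax(np.abs(X2))]  # strongest line
```

Because the rectangular-pulse symbols are $\pm 1$, squaring removes the modulation entirely and leaves a pure tone at $2 f_0$.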

The patterns in Figures 5-7 won’t change if we change the symbol rate or carrier offset, and they don’t change for different bit/symbol sequences, provided that they adhere to the independent and identically distributed assumption. The patterns in Figure 8 will change, somewhat, with changes in symbol rate and carrier offset. The spikes will move around, but their basic shapes–the relationships between the different spikes–will not change.

There is no escape from domain expertise. Maybe neural networks will be the basis for lots of our RF modulation-recognition tasks in the future, maybe not. But we can’t ignore the fundamental nature of the data we wish to classify and expect to do well no matter what approach we take.

When we look at the spectral correlation or cyclic autocorrelation surfaces for a variety of communication signal types, we learn that the cycle-frequency patterns exhibited by modulated signals are many and varied, and we get a feeling for how those variations look (see also the Desultory CSP posts). Nevertheless, there are large equivalence classes in terms of spectral correlation. That simply means that a large number of distinct modulation types map to the exact same second-order statistics, and therefore to the exact same spectral correlation and cyclic autocorrelation surfaces. The gallery of cyclic cumulants will reveal, in an easy-to-view way, that many of these equivalence classes are removed once we consider, jointly, both second- and higher-order statistics.

We develop cyclic cumulants from random-process theory in the cyclic temporal cumulants post. Let’s review the basic formula for an $n$th-order cyclic cumulant before turning to the main topic of this post, which is the visualization of cyclic cumulants.

A cyclic cumulant is expressible in terms of cyclic moments, and both cyclic cumulants and cyclic moments are intimately related to the various $n$th-order joint probability density functions of a cyclostationary random process or time-series. We can sidestep the density functions by focusing on just the moments and cumulants.

The most familiar (to signal processors) moment is the second-order moment

for some complex-valued signal $x(t)$. For a stationary signal, this moment is not a function of time $t$, and it is usually renamed the autocorrelation function. For a cyclostationary signal, this moment is periodic (or almost periodic) in $t$, and is expressible as a generalized Fourier series, the Fourier frequencies of which are the cycle frequencies, and the Fourier amplitudes of which are the cyclic autocorrelation functions.
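For reference, the moment just described and its Fourier-series expansion can be written as follows; the specific symbols are my assumption, chosen to match standard CSP notation, not necessarily the post’s:

```latex
% second-order moment and its generalized Fourier series:
R_x(t, \tau) = E\left[ x(t)\, x^*(t + \tau) \right],
\qquad
R_x(t, \tau) = \sum_{\alpha} R_x^{\alpha}(\tau)\, e^{i 2 \pi \alpha t},
```

where the sum is over the cycle frequencies $\alpha$ and the Fourier amplitudes $R_x^{\alpha}(\tau)$ are the cyclic autocorrelation functions.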

To generalize the second-order moment to arbitrary orders $n$, we consider a slightly more general version of (1),

then we also realize that the moment changes if we delete the conjugation on the second factor or introduce one on the first factor. So we account for conjugations by using an optional-conjugation notation, as in

Now we can more easily generalize this second-order moment to orders $n > 2$ by introducing an $n$-vector of delays, as in

Here the variable $m$ denotes the number of optional conjugations that are selected; it must range from zero to $n$. However, as I’ve noted elsewhere, this notation is not satisfactory because it does not specify exactly which factors in the product are conjugated. To remedy this, in the post on symmetries of higher-order cyclic functions, I introduce a binary $n$-vector that has elements equal to one for the indices of the factors that are conjugated and elements equal to zero for those indices corresponding to unconjugated factors. So for $n = 2$ we get the non-conjugate cyclic autocorrelation with $[0\ 0]$, the conjugated non-conjugate cyclic autocorrelation with $[1\ 1]$, the conjugate cyclic autocorrelation with $[0\ 1]$, and the conjugated conjugate cyclic autocorrelation with $[1\ 0]$.

Now for cyclostationary communication signals $x(t)$, the moments (4) are periodic for some (usually all) choices of even $n$, and so we can express those moments in a Fourier series

or, if you like, with the added specification of the conjugation count $m$,

The cyclic cumulant is a specific nonlinear function of cyclic moments. This follows directly from the well-known relationship between the $n$th-order cumulant of a set of random variables and all the lower-order joint moments of subsets of those random variables (see the cyclic cumulant post for details). The cyclic cumulant is given by the following expression involving a sum of products of cyclic moments, where each product of cyclic moments has a set of cycle frequencies that sum to the cyclic-cumulant cycle frequency,

In this post, then, what we want to do is compute and display, somehow, the multidimensional cyclic-cumulant function for a wide variety of communication signals for which these cumulants are not zero. And that is potentially helpful to us visually oriented humans, because these cyclic-cumulant functions are highly useful as features in signal-processing-based and machine-learning-based modulation-recognition algorithms and systems (My Papers [17,25,26,28,30,38,43,50,51,52,54,55]).

So how should we visualize these complicated multidimensional functions? One way I’ve tried in the past is to plot two-dimensional slices of the function and arrange them in a sequence indexed by the values of a third dimension, then show the sequence in a video. I do this in the cyclic-cumulants and higher-order symmetries posts. That works OK for fourth-order cumulants, but the dimensionality of sixth- and higher-order functions would lead to some very long videos indeed. In the fourth-order case, I can zero out some of the delays, leading to the reduced-dimension cyclic temporal cumulant, fix the cycle frequency and the conjugation configuration, and then make plots of the cyclic-cumulant magnitude as a function of two remaining delays. But for sixth-order cumulants, I’ve got too many delays to deal with for that approach to be viable.

Another possibility is to simply arrange all the cyclic cumulants in a giant matrix and plot that matrix using, say, MATLAB’s imagesc.m. Let’s look at that for rectangular-pulse BPSK. Here the sampling rate is unity (as usual), the bit rate is 1/8, and the carrier offset is zero. I use 32,768 samples to estimate each cyclic cumulant (32,768/8 = 4,096 bits), delays ranging from -8 to 8, and cycle frequencies that are integer multiples of 1/8. Note that this set of cycle frequencies is sufficient to cover all BPSK cycle frequencies because the carrier offset is zero, so every cycle frequency is a harmonic of the bit rate.

I’ll consider cyclic-cumulant orders of 2 and 4 separately. The basic reason I don’t plot them all on one set of axes is that the dimensionalities of the two cyclic-cumulant functions differ due to the delay vector: for $n = 2$ there is one free delay, whereas for $n = 4$ there are three.

Let’s first look at the cyclic cumulants for the ideal BPSK signal, and then we’ll take a quick look at the same functions for that BPSK signal after it experiences a simple multipath-channel impulse response.

All of the non-zero second-order ($n = 2$) cyclic-cumulant magnitudes and phases are shown in Figures 1 and 2, respectively. To check my work, I plot a column of the matrix in Figure 1 in Figure 3. I chose column 23, which corresponds to the conventional autocorrelation, and we know that the autocorrelation for our rectangular-pulse BPSK signal is a triangle with width equal to twice the symbol interval, or 16. The height should be just greater than one, since the BPSK signal has unit power and there is a small amount of noise present. From Figure 3 this all checks out.

Turning to fourth-order, the same three kinds of plots that we see in Figures 1-3 are shown for $n = 4$ in Figures 4-6. Now you’ve literally ‘seen’ a higher-order cyclic cumulant! (I’ve posted the jpg images and the corresponding MATLAB .fig files on the Downloads page.) To check my work, I plot the 53rd column of the magnitude matrix in Figure 6 (see caption).

The second- and fourth-order cyclic-cumulant functions for the ideal rectangular-pulse BPSK signal are estimated for a filtered version of the signal and shown in Figures 7-12. The filter is a four-ray multipath channel with delays 0.0, 2.0, 3.3, and 5.5 samples and associated complex-valued gains 1.0, -0.4+0.25i, 0.2-0.2i, and -0.1-0.07i.

So we can see that this method of visualization has a drawback–too much information is plotted. We can’t make much sense out of the fourth-order plots, so you can imagine that we’d have lots of trouble with the sixth- and eighth-order plots.

Fortunately there is another way.

The important, at least to me, aspect of the multidimensional cyclic-cumulant function is what I call the *cycle-frequency pattern*. We know that cycle frequencies depend on the order $n$, the number of conjugated factors $m$, the signal’s probability structure (for example, the moments and cumulants of the symbol random variable), the pulse-shaping function, and even the particular values of the delay vector (for example, consider rectangular-pulse signals and delays all equal to zero). For the broad class of digital PSK and QAM signals (and SQPSK and CPM/CPFSK), the cycle-frequency pattern can be discerned from a low-dimensional cut or slice of the full cyclic temporal cumulant function. In particular, we can set all delays to zero and not render any cycle frequencies invisible (unlike for rectangular-pulse signals). So the cycle-frequency pattern is then some function

And for almost all of these types of signals, that function is
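The formula lost here is, for most PSK/QAM-type signals, the standard cycle-frequency pattern; this is a hedged reconstruction using my notation, with $f_c$ the carrier offset and $1/T_0$ the symbol rate:

```latex
% assumed standard cycle-frequency pattern for order n with m conjugations:
\alpha = (n - 2m)\, f_c + \frac{k}{T_0}, \qquad k \in \mathbb{Z}.
```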

However, the cyclic-cumulant magnitude might be zero for some, or even many, of these cycle frequencies; the set of cycle frequencies corresponding to all of the non-zero cyclic-cumulant magnitudes is the cycle-frequency pattern.

So an alternative visualization approach is to set the delay vector equal to the origin and find the cyclic-cumulant magnitudes and phases for all the cycle frequencies in (9), up to some maximum desired order, and plot those values. This gives rise to the style of cyclic-cumulant visualizations that form the banner of the CSP Blog–they are what I consider the fingerprint of a communication signal (I was using that term before the Machine Learners got their hands on it, sigh).

Of course there are complications even with this drastically dimension-reduced view of the cyclic cumulants. The main one is that typically the cyclic-cumulant magnitudes grow rapidly with the order $n$, and so the dynamic range of the plot, so to speak, is high. The larger values (for larger $n$) will tend to dominate the plot. So the first thing we can do is raise each cyclic-cumulant magnitude to the power $2/n$. This puts all the cyclic-cumulant magnitudes on an equal footing with the power. The second thing we can do is plot the base-10 logarithm of that warped cyclic-cumulant magnitude. These steps render the cycle-frequency pattern easy to visually grasp–and that’s why we’re here after all.
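The two plotting steps just described can be sketched in a few lines; the scaling fact used in the example (a signal with power $P$ has $n$th-order cyclic-cumulant magnitudes that scale like $P^{n/2}$) follows from the homogeneity of cumulants:

```python
import numpy as np

# Warp an order-n cyclic-cumulant magnitude: raise it to the 2/n power,
# then take the base-10 logarithm.
def warp(cc_mag, n):
    return np.log10(cc_mag ** (2.0 / n))

# Magnitudes scaling like P**(n/2) all map to the common value log10(P),
# so no single order dominates the plot.
P = 4.0
vals = [warp(P ** (n / 2.0), n) for n in (2, 4, 6, 8)]
```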

Putting it all together results in the gallery of cyclic-cumulant features shown in Video 1.

The constellation definitions are shown in Figure 13.

Regarding the CPM and CPFSK feature plots, CPFSK is a CPM signal with a rectangular pulse. All other CPM signals are called CPM, and the pulse type is indicated by a component of the name string. CPM signals can be partial-response signals or full-response signals, depending on whether the frequency pulse extends over more than one symbol interval (partial) or not (full). The underlying pulse-amplitude-modulated signal (that forms part of the argument of the carrier cosine) can have binary or larger alphabets. The alphabet size is also embedded in the filename. So, for example, the feature with title CPMLRC-0.3-3 is a CPM signal with raised-cosine pulses (‘LRC’), modulation index of 0.3, and the pulse spans three symbol intervals.

With the splashy entrance of large-language models like ChatGPT into everyday life and into virtually all aspects of science, engineering, and education, we all want to know how our jobs and careers could be affected by widespread use of artificial intelligence constructs like ChatGPT, Dall-E, and Midjourney. In this interview with a couple of my favorite engineers, I get a feel for how non-AI researchers and developers think about the coming changes, and of course how they view the hype, distortions, and fabrications surrounding predictions of those changes. You can find photos of the interviewees and brief biographies at the end of the post.

The interview transcript is ~~carefully contrived~~ lightly edited for ~~believability~~ clarity.

**CS**: Welcome to the CSP Blog interview, Dan and Eunice! I’m thankful you said yes to my invitation to talk with me about the future of signal processing, and whether or not signal processors (the humans, not the machines) are needed any longer now that we have ChatGPT and the like.

**EA**: Thanks for having us Chad! Nice to be with you, even if virtually. I hope we can shed some light on the topic.

**DP**: Yes, thanks Chad. Great to be here. You know Eunice and I like to talk about engineering, science, and of course signal processing. We’re ready to go!

**CS**: Well, I want to talk about the future of human signal processors, and in particular about the likelihood of AI constructs taking over our jobs and relegating us to, I don’t know, IT or something. But first I want to see how AI, things like ChatGPT and Dall-E, are being used by you and people in your labs. After all, if it doesn’t help us now, maybe we’re in good shape for some time to come. So, do you use AI in your engineering or engineering management work?

**EA**: I don’t, not much anyway. I have tried a couple times to use ChatGPT to sort of get me started on some new signal-processing algorithm, but the answers I get back are so simplistic most of the time, and, uh, I guess some of the time they have errors that I have to find and try to correct. So in the end I just walk away from the thing shaking my head. Let me give you an example. I was trying to do some signal-separation work and so I tried to short-cut use of the internet, libraries, pencil-and-paper, white-boarding, etc., so I asked ChatGPT to come up with an algorithm for separating two signals in noise. It went something like this:

**EA**: ChatGPT came back with this first response:

**DP**: Wait, Eunice didn’t you tell ChatGPT that the signals were cochannel? Maybe it just doesn’t know that word?

**EA**: Well that would be surprising and very bad for OpenAI. They boast about the massive training. Also, if ChatGPT doesn’t know a word (like maybe somebody just horribly misspells something) shouldn’t it just say so?

**CS**: Agreed. But here ChatGPT just seems to ignore key elements of your request–the correlation between most of your words and this response was just too high for it to ignore. I guess. Maybe. Possibly. Who knows?

**EA**: So, yeah, this kind of response is actually a way to train the humans to become what is now known as a prompt engineer.

**CS**: I was looking around the web for the accepted requirements for becoming a prompt engineer. I don’t think you need a college degree in engineering, but a BS in computer science, data science, programming, etc. would be helpful. So it appears to be one of those cases where a high-status word is borrowed from one domain to bolster the look in another–in this case ‘engineering.’

**DP**: Like ‘political science’ I guess. Or the classic ‘sanitation engineer.’

**EA**: Yeah. But it’s minor.

**CS**: Sure, minor. And irritating.

**EA**: So, of course, the weird ChatGPT prompt response prompts in me the desire to provide a refined prompt back to ChatGPT, kind of like talking to a small child. And so *I* am being trained. I promptly came back with

and ChatGPT cheerfully (as always) replied with

But the independent component analysis (ICA) method of signal separation requires that you have multiple receivers (typically microphones in the common audio-signal processing application). And I recall that I specifically tried to head this off by stating in the original prompt “I have only one data record.” So that’s two things that ChatGPT completely ignored–it is overwhelmed by spurious correlations I guess.

**DP**: I’m thinking that you next refined the prompt. ChatGPT is sculpting you into a prompt engineer for sure. I’m starting to get worried about you Eunice.

**EA**: Me too. But, yeah, of course I couldn’t let it go. Here is my next prompt:

and I got this response back:

**CS**: Well, that’s news to me! I hadn’t come across this non-negative matrix stuff before.

**EA**: Yes, me either, and for good reason. It doesn’t apply.

**DP**: It seems like ChatGPT is sort of chasing its tail. This is kind of a new phenomenon: a correlation-seeking treadmill or Möbius strip. I’m starting to think that you can refine your prompts forever, Eunice, and not really get any closer to a solution. I mean, why doesn’t ChatGPT just bring up FRESH filtering? Isn’t that what you intended in the first place?

**EA**: Yes, that is an appropriate solution. I suppose ChatGPT just doesn’t know about it yet, or the correlation between the prompts and what it does know is too low. Not enough FRESH-filtering mentions or descriptions in the training corpus? Anyway, I did continue a little longer:

and so like Dan said, we’re moving in circles. Circling the drain? I dunno.

**CS**: Thanks Eunice. I’m wondering how common that experience is. I think a lot of such experiences don’t bubble up to the top of someone’s Google feed or get written up, breathlessly, on some technophile’s blog. They just get forgotten. Let’s turn to Dan. What do you think Dan?

**DP**: Yeah, I have had similar experiences. Just a lot of useless responses that show that the system does not understand the question and has a highly limited–with respect to nominally competent humans–view of what a solution should look like. A lot of times a solution to some posed problem is just a wrapper around a function or process that mysteriously encapsulates the solution. Like, I ask for a solution to the problem of *framistaning a thingamajig* and I get back some code that sets up, eventually, a call to some function called framistan_the_thingamajig(). Helpful, and right on the money!

But my biggest problem with the LLMs like ChatGPT is that they claim to be able to do things that require separating truth from falsity, but they can’t actually do that. At all. Like debugging. Debugging some signal-processing or mathematical-analysis code. Even super simple stuff seems well beyond the debugging capabilities of the LLMs–I’m not sure why people even mention this as a possible use. For example, I took some of the code available on the CSP Blog … and there’s not much …

**CS**: Yeah, that’s my approach–the CSP Blog is mostly self-help with an assist from me …

**DP**: … because I thought that would be an easy example for your readers to follow, and of course relevant to their work, and I just changed a single thing in a working function. Then I asked ChatGPT to debug the code. I’m not talking about a serious CSP or SP bug in a complex software system! Just a very simple function with a very simple error. Simple for expert humans to spot, anyway.

**EA**: You just wanted to see what the debugging workflow, sort of, of ChatGPT was?

**DP**: Yeah, that’s a good way to put it Eunice. Workflow. What is the workflow here for debugging signal-processing code and, of course, can the system find the bugs and not also raise a bunch of false-alarms about code that actually is fine … that, I mean, that doesn’t have bugs.

**CS**: OK, so what happened? What code did you pluck from the CSP Blog?

**DP**: I took your convolution code from the Signal Processing ToolKit post on convolution and created a simplified version that just does the ‘convolve a unit-height rectangle with itself’ part. It sets up the rectangle, calls conv.m, and plots the result. To introduce a bug, I replaced the second argument in the call to conv.m with a time-reversed version of the rectangle. Actually, this produces the correct result! The time-reversed rectangle is equal to itself. But it should be flagged as a bug. Later I introduced an even more obvious and still-simpler bug. Here is how the ChatGPT session started:

So far, so good. I then provided the code:

The response I got to this very simple debugging task was this:

The final line of the supplied code is a return statement, which does terminate the execution, but there are no further lines after that return statement, so all should be well! Moreover, most of the supplied code is the plotting section, but ChatGPT thinks the entire plotting section is missing. So it fails completely to parse or analyze the provided code. But, OK, I just modified the original submission by multiplying the second argument to conv.m by two:

and resubmitted (*reprompted*, a new word). Didn’t do nothing about the supposedly missing plotting section and of course I kept the return statement in place. I got this response:

So ChatGPT identifies the call to conv.m as correct, when in fact that is the only line of code that is incorrect! Chad, Eunice, it’s stuff like this that tells me we still need human signal processors and coders, and will for a long time yet.
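Dan’s point about that first, inconsequential ‘bug’ is easy to check numerically. The MATLAB itself is not reproduced in the transcript, so the following Python/NumPy sketch is a hypothetical reconstruction of the experiment he describes, not the actual CSP Blog code:

```python
import numpy as np

# Unit-height rectangle; the length of 10 is an arbitrary choice
# for illustration.
rect = np.ones(10)

# Correct call: convolve the rectangle with itself, yielding a triangle.
correct = np.convolve(rect, rect)

# 'Buggy' call: second argument time-reversed. Because the rectangle is
# symmetric, flipping it changes nothing, so the output is identical even
# though the code no longer expresses the intended operation.
buggy = np.convolve(rect, rect[::-1])

assert np.allclose(correct, buggy)  # same triangle either way
```

Because the rectangle equals its own time reversal, the output is bit-for-bit identical, which is exactly why this counts as a latent bug rather than a visible one.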

**CS**: Right, because ChatGPT isn’t really parsing or analyzing *anything* here, it is just trying to give you a high-probability response based on a bunch of things it has seen during training.

**EA**: Well, presumably it has seen MATLAB code during training, we just don’t know. We do know that it is very happy to tell you it can debug your code, but we don’t know how many pairs of {wrong code, right code} it has seen in that training corpus.

**DP**: Yeah, I mean I’ve heard it described as a giant auto-complete system. It takes our familiar word-level auto-complete function and raises it to a paragraph-level or essay-level auto-complete. But we know that auto-complete is not a proper model for doing things like debugging code! We don’t want to get suggestions on what is similar to what we’ve done, we want to find a flaw or falsehood and replace it with correct code or truth. What is true about an auto-complete? Maybe you like the provided completion, maybe you don’t, but there isn’t any truth value to it.

**CS**: I have my own example to share. One of the things ChatGPT says it can do is something called ‘code completion.’ I guess if you’re writing some code to do something–in our case a signal-processing something–and you get stuck and don’t finish the job, you can provide the incomplete code and ChatGPT will finish it (‘complete it’) for you, providing code snippets and ‘even entire functions.’ Well, we all get stuck, don’t we?

**DP**: Yes!

**EA**: For sure.

**CS**: Of course I’m suspicious that ChatGPT can only complete simple codes like a quadratic-equation implementation or take-an-FFT-and-plot-the-magnitude kind of thing, but let’s see. A lot of CSP Blog readers have trouble implementing the time-smoothing method of spectral correlation estimation, so let’s see if ChatGPT can complete it if I just leave off the final steps and place good comments there for the nature of the missing functions or code.

**EA**: But why would we think that ChatGPT is good enough for that level of sophistication in signal-processing coding?

**CS**: I guess because it boasts about it? Here is something it likes to say:

**CS**: OK? So let’s see what happens with the TSM. I have a 262-line MATLAB implementation that is generously commented and computes the spectral correlation and the coherence. I removed the final steps and replaced them with comments that just said: “Missing code”:

**CS**: And just for context, here is the form of the function call, seen at the top of the file that implements the function, which I provided to ChatGPT:

**CS**: So basically I tell ChatGPT what to do in each comment. Here is what I got back:

**EA**: It looks like it got the frequency vector right!

**DP**: Yay! And if S_T is the spectral correlation estimate, then it assigned the output variable scf correctly! And coh = coh! Of course it does!

**CS**: But that’s where the correctness ends, I’m afraid. P_T is indeed the power spectrum estimate, but the coherence, recall, requires that the spectral correlation is normalized by the geometric mean of the PSD values at $f + \alpha/2$ and $f - \alpha/2$. ChatGPT just divided the spectral correlation function by the square-root of the PSD vector. Also, the nature of that normalization is different depending on the variable conj_flag. And this is the point–most of signal processing is surprisingly complicated compared to other kinds of software because there are a lot of underlying mathematical concepts and definitions. I think this is what people mean by leaky abstractions in mathematics and engineering. I didn’t expect good results here, and I got what I expected.
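For readers implementing this themselves, the normalization CS describes can be sketched as follows. The function name, variable names, and circular-shift edge handling here are assumptions for illustration (non-conjugate case only), not the CSP Blog’s actual TSM code:

```python
import numpy as np

def coherence(scf, psd, alpha_bins):
    """Normalize a spectral-correlation estimate into a coherence.

    scf        : spectral correlation estimate vs. frequency bin (1D)
    psd        : power spectrum estimate on the same frequency grid
    alpha_bins : cycle frequency expressed as an (even) number of bins

    The denominator is the geometric mean of the PSD at f + alpha/2
    and f - alpha/2, implemented here with circular shifts (edge
    handling differs in real estimators).
    """
    half = alpha_bins // 2
    denom = np.sqrt(np.roll(psd, -half) * np.roll(psd, half))
    return scf / denom
```

As a sanity check, with alpha_bins = 0 the denominator reduces to the PSD itself, and since the spectral correlation at zero cycle frequency is the PSD, the coherence there is unity.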

So to summarize our conversation on using ChatGPT in our everyday signal-processing work, we don’t because we can’t because it ain’t any good. The apparent silver lining (for OpenAI) is that the more you try, though, the more ChatGPT trains you, taking you from a knowledgeable mathematically inclined signal-processing engineer to a master questioner (prompt engineer). You’ll get better and better at providing more and more prompts. It is, however, unclear if you’ll ever actually get to a useful result, but for sure you interact with the system for longer and longer periods of time. Kinda like how social media platforms nudge you and guide you to ‘high engagement’ items, steadily prolonging your visits to the apps. Hmmm…

**CS**: Alright, let’s move to a new topic. In the past few decades we’ve seen several high-profile biologically inspired algorithms for system optimization go through hype-and-trash kind of cycles: evolutionary algorithms, simulated annealing, genetic algorithms, various kinds of neural networks for inference-making, and now large-language models. These are just tools, and we can use them if they fit the job at hand. What kind of job or task in your professional life might be better done, or more easily done, by using a large-language model? We’ve covered a couple cases where ChatGPT didn’t work out in a design or debug setting, but are there other settings that you can at least envision a large-language model helping?

**DP**: ChatGPT is a language model. It accepts text prompts and provides text responses. So I think there is a possible role for ChatGPT and the like for technical writing. Things like monthly reports, technical reports, presentation outlines, maybe even the simpler emails. Maybe it could, eventually, provide useful starting points for such things in our domain.

**EA**: But what I struggle with is that all those writing tasks are usually highly specific. You don’t just write a generic report for your client or customer: “*Dear Human Source of Funding: We have made substantial technical progress. We have encountered non-serious technical problems and have identified potential solutions. Sincerely, Human Sinks of Funding.*” Doesn’t cut it. Engineering work is always detail-oriented.

**CS**: Well, what does ChatGPT say? I asked it about helping me with ‘engineering reports’ and this is what I got back:

Would it be any faster to try to supply ChatGPT with all those things it asks for rather than just write the report from the start? I mean, are we going to get back on the Möbius strip of doom? Do we need a Certified Prompt Engineer ™ to have ChatGPT create a one-page report? Of course, like Eunice, I’m being trained too:

**DP**: How is that any different than finding a website that quickly explains the elements of a report? Gotta be a million of those around.

**CS**: Yes, how is ChatGPT’s assistance here better than, say, ProjectManager.com?

**EA**: In all honesty, Chad, I’m thinking the answer is: *AI is modern, Project Managers are not*. What do they call the cognitive bias toward the new over the old regardless of the actual quality or benefits involved?

**DP**: I think the MIT people call it ‘new-technology bias,’ a positive-toward-new-tech bias that comes from being, for example, awe-struck over some new machine or viewing computer-program outputs as wondrous, mysterious, and ineffable.

**CS**: OK, let’s now move to discussing some ideas of critics and promoters. One way to look at the latest generative AI tools is as the latest in a long line of technologies that yes, disrupt, but also that will simply end up being used by humans and integrated into artistic, scientific, political, and mundane efforts just as they always do. Here’s Michael Woudenberg:

Many on the thread totally understood the power that these new tools [Midjourney, ChatGPT, Dall-E] provide to the aspiring artist. Painters were panicked about photography as Mark Palmer so well points out in The Joy of Generative AI Cocreation. In the 1990s photographers then panicked about digital film and then cell phone cameras. Today millions of people can take photos that were limited to only professionals with expensive equipment. Yet there are not fewer photographers or less art. There’s more!

Woudenberg on the Polymathic Being Substack

So the idea is let’s not fear the disruption of this new tool, let’s just figure out how to use it. It won’t take over, any more than cameras, movie cameras, cell phone cameras have taken over. We are still in control; we are still the artists; we still ‘do the work.’ The tools change over time, we do not.

**DP**: I get the idea. It reminds me of some of the educational technology that appeared over the years in science and engineering. Suddenly all engineering students had powerful programmable calculators swinging from their belt loops in the 80s, then personal computers, routine access to campus mainframes, wikipedia, e-books, online courses, etc. Professors adapted and good solid engineers are still created by the universities of America.

**EA**: But there is this siren song telling me ‘This time it’s different. This time things are *really* going to change, and not for the better.” And what seems different is that the amount of effort on the part of the artist, scientist, or engineer seems minimal–all the work happens in the AI software.

**DP**: But is that an illusion? Probably people felt the exact same way about cameras–too easy relative to painting! Even though at first it wasn’t easy at all, in that developing the film was messy, expensive, and time-consuming.

**CS**: So we’re kind of converging on the idea that prompt engineers in 2023 are like the first photographers back in the 1800s. Maybe the skill and the training required to be really good at it will start to look like the training and skill required in the 2020s to be really good at photography?

**EA**: OK, I’ll try to accept that!

**CS**: Woudenberg goes on to make the case that humans still have reserved powers. In particular, he thinks critical thinking is still solely the domain of humans:

AI computes, humans think. When humans think, they ask questions because they are curious. AI only works with what they have and asks for no more.

This is an important distinction that often goes overlooked. If you ask ChatGPT a question, it will respond and churn out an answer. To get it to ask for more information, you have to tell it to ask for more information which then becomes a separate sequence of activities. It’s not actually asking you for more information but more of a ‘call and response.’ It’s not curious and it’s never confused.

Woudenberg on Polymathic Being

**DP**: Well it sure does *look* like the AIs are thinking. When that ChatGPT response comes back, it feels like you’re talking to a thinking being–a human.

**EA**: But it is an illusion. Keep that metaphor you mentioned–

**CS**: ‘Autocomplete at the level of an essay’?

**EA**: –yeah, that autocomplete idea, in your head and you won’t be so caught up in the anthropomorphizing of the Chat bot in front of you.

**DP**: Which is what the LLM designers want you to feel! They want you wowed, they want that new-tech bias so you keep coming back, you expand your use, and you spread the good news.

**CS**: But one problem with sort of adopting Woudenberg’s positive stance is that these AI systems make a lot of mistakes and produce a lot of crap, as we’ve seen and shown here on the CSP Blog. I guess it is one thing if we use them for artistic or, say, low-stakes activities like planning a birthday party or goofing with writing:

But when novices [newbies] use it like a search engine or an interactive version of Wikipedia, things are much more serious and we could be doing damage to ourselves.

**EA**: Right, when I ask ChatGPT about something I know well, I see the errors easily. But if I asked it about something I am ignorant about, in an attempt to become less ignorant (like quantum computing), I may very well end up even more ignorant.

**DP**: Or worse, misinformed!

**EA**: Yes, exactly. I can see how ChatGPT can push people *backward* in their intellectual development if used in many of the touted ways.

**CS**: Alright, so Woudenberg is pretty optimistic about our ability to do things that all these AIs can’t do AND about our ability to harness them properly–things can only get better for humans, and I think he would agree with us that humans are still needed for things like creative signal-processing algorithm development, problem-solving, debugging, and making new math.

Let’s switch to another favorite critic: Freddie DeBoer. Freddie is more interested in analyzing and complaining about (in that so-entertaining and well-written way he has) **The Hype**. Which is kinda my thing too.

**EA**: No duh.

**DP**: And how.

**CS**: I think I saw that “Giant Autocomplete” metaphor in Freddie’s work, by the way. In an attempt to explain why he thinks the hype is way over the top, he looks at a classic weakness of natural-language processing systems: the Winograd schema. This is the problem of determining the antecedent of an ambiguous pronoun.

**DP**: Huh? Wazzat?

**EA**: Oh Dan. Come on!

**CS**: Let’s just give Freddie’s examples and things will be clear to us humans Dan.

The ball broke the table because it was made of concrete.

The ball broke the table because it was made of cardboard.

**DP**: Ah, so we have to identify what “it” refers to–either the ball or the table, right?

**EA**: Right.

**CS**: Right. Do you find it easy?

**EA**: Sure. In the first sentence, “it” refers to the ball, and in the second “it” refers to the table.

**DP**: But how is that so easy for us?

**CS**: Well, Freddie’s answer is just that we know, and have internalized, a lot of real-world facts and notions about all manner of both balls and tables. We know that a concrete table is very hard to break with any ball, so we just quickly “know” that the “it” must be the table.

**EA**: Similarly, we know that any table made of cardboard is a weak table indeed and could be broken by all kinds of things–generally not a great material for a table.

**CS**: Freddie explains:

These two sentences are grammatically identical and differ only by the material specified. And yet 99 out of 100 human beings will say that, in the first sentence, “it” refers to the ball, while in the second, “it” refers to the table. Why? Because concrete tables don’t break if you drop balls on them, and balls don’t break tables if they (the balls) are made out of cardboard. In other words, we can coindex these pronouns because we have a theory of the world – we have a sense of how the universe functions that informs our linguistic parsing. And this, fundamentally, is a key difference between human intelligence and a large language model. ChatGPT might get the coindexing right for any given set of sentences, depending on what response its model finds more quantitatively probable. But it won’t do so consistently, and even if it does, it’s not doing so because it has a mechanistic, cause-and-effect model of the world the way that you and I do.

**CS**: A more difficult example from Freddie is the following:

The committee denied the group a parade permit because they advocated violence.

The committee denied the group a parade permit because they feared violence.

**CS**: Here we want to know what the word “they” refers to; “they” is the ambiguous pronoun. What is its antecedent?

**EA**: Yeah, I see that this is tougher because the involved nouns are more abstract than balls and tables. But still, for us puny humans, it’s easy: In the first sentence, “they” refers to the group and in the second it refers to the committee.

**CS**: Yes, and it appears ChatGPT gets this one right–it is canonical and highly likely to appear in the training corpus. Unfamiliar ones cause ChatGPT to fail, like the ball-and-table one. It isn’t as if ChatGPT is steadily building up a model of the world; it is merely increasing its training corpus by using us (all of its users). It is saying “OK, yeah, I heard that one before… let me autocomplete for ya.”

**DP**: Well, let’s just program in some knowledge of the world and then ChatGPT will be right there with us on the Winograd schemas. Right?

**EA**: But I think, Dan, that’s the long-term problem with AI systems based on a catalog of facts or a massive set of rules: We just can’t cram all the stuff into our programs. That’s why, in fact, supervised-learning-based machine learning has been so ascendant in the twenty-first century: it rose on all the previous-era failures of rule- and fact-based systems.

**CS**: And now the pendulum appears to have swung much too far away from AI systems that have, somehow, built-in knowledge or models of the real physical world. Which brings us, of course, to Gary Marcus.

**EA**: Ah yes, the current Cassandra of artificial intelligence.

**DP**: Wet blanket. Party pooper. Naysayer. Prophet of Doom! Which are all the things I *like* about him!

**CS**: Gary’s basic problem with the AI and ML community is the huge hype and the heavy disdain for building in world models/facts, but lately, since ChatGPT, he is concerned with bad actors using the rushed-to-market tools for all kinds of nefarious ends.

**EA**: And he’s probably right to be concerned–we already know there are a large number of people out there that will use any technology they can get their hands on to separate you from your money.

**DP**: And others that use it to push your beliefs around and to mess with your ability to tell truth from fiction. I definitely worry about my elderly parents!

**CS**: He signed that original letter of concern about the dangers–

**EA**: That one-sentence letter?

**CS**: –no, the longer one in March 2023 [link], not the one-sentence one associated with Hinton that came out later [link]. I like parts of that letter because they echo my own thoughts and words (Why do we want any of this?):

Contemporary AI systems are now becoming human-competitive at general tasks,[3] and we must ask ourselves: Should we let machines flood our information channels with propaganda and untruth? Should we automate away all the jobs, including the fulfilling ones? Should we develop nonhuman minds that might eventually outnumber, outsmart, obsolete and replace us? Should we risk loss of control of our civilization?

https://futureoflife.org/open-letter/pause-giant-ai-experiments/

**CS**: And then later he put a couple slides from his talks on his substack summarizing the two paths we can take regarding regulating–

**DP**: or attempting to anyway

**CS**: –highly capable natural-language processors, large-language models, generative-AI image producers, etc. Here they are:

**CS**: The question for you two is: do you think we can avoid the Bleak Future? Can we–should we–regulate these powerful AI tools, like we regulate the airwaves, the phone system, pharmaceuticals, common carriers, etc.?

**EA**: I suppose it is interesting to ponder whether we *should* regulate AI products or AI research, and what that regulation would look like, but it seems to me the more important question is: *Can* we regulate AI? Is it even possible? Is the cat fully out of the bag already?

**DP**: I don’t think we can regulate AI research and development. It can be done with relatively little in the way of capital investments, unlike, say, developing a new antibiotic. Some dude in his basement can create all kinds of AI models and systems with relatively inexpensive hardware and software. So small companies can too. We might be able to regulate the appearance of AI systems in markets or on major websites. Maybe. Kind of like regulating homeopathic products or food supplements–they aren’t allowed on the market until they are checked and found to be benign. But the regulators don’t reach into the food-supplement labs much. They focus on the point of sale.

**EA**: Yeah, the cat is out of the bag. I think maybe only a social force can work to slow or improve the situation. Going back to those hard questions of that original “temporary halt” letter: Should we be doing this work?

**CS**: Alright engineers, we’ve arrived at our final topic: fairness.

**DP**: Are AI or ML systems fair? You mean are they biased? Like in facial recognition systems and the like?

**CS**: No, I mean do the companies that create and train these massive language models operate fairly in the world.

**EA**: Is it fair of them to scour the internet for human-created texts and then use them for profit, effectively cutting off the original creators from their audience?

**CS**: Yeah. That.

**EA**: Well I did hear that OpenAI is now being sued for copyright infringement.

**CS**: Yes, by a couple authors. The crux of the matter is whether or not grabbing copyrighted material, en masse, from sources on the internet and then using that material as training inputs for large language models is “fair use,” which is a copyright-law concept. Here is an explanation from the Copyright Alliance:

That fourth item seems key. I recently asked ChatGPT if it knew about the Cyclostationary Signal Processing Blog, and it rather sorrowfully said no, but added that it doesn’t know anything past 2018 (Ouch: CSPB birthdate is 2015). But it will, perhaps, eventually get around to scraping–I mean copying wholesale–my content. And when that happens, and engineers ask ChatGPT about CSP, it might be able to answer using whole intact paragraphs or even some of my essays or equations. Whither the CSP Blog then?

**EA**: Done for, I presume. Overcome by events. Deprecated.

**DP**: Snuffed. Annihilated. Toast. Term–

**CS**: OK, OK. Jeez guys…

**EA**: And if a Google search for something about CSP ends up showing a bunch of links to AIs like ChatGPT, and then, like, your Blog is the 73rd link, nobody will show up to your Blog anymore. Sorry Chad. RIP CSPB.

**CS**: Yeah. That’s what I was thinking too. But every one of my posts has a copyright notice! How can they get away with this?

**DP**: Money talks? Also, the new-technology bias at work in, well, just about everybody? “You’re just mad and crying because you’re being left behind by this glorious new tech, which we really love. Really. Love. It. Get used to it! Happens to all of us.”

**CS**: I could rush to convert the CSP Blog to a subscription-only website, locking it up, effectively, before the OpenAI scraper-bots get their hands, er, claws, uh, *virtual articulated grasping units* (VAGUs) on it.

**EA**: But a lot of the ground we covered earlier in the interview leads me to believe there is still a role for you, your posts, and most especially the back-and-forth commenting sessions you have with your readers. I think ChatGPT is a long, long way from that kind of expertise–it still doesn’t understand how to do the things it says it can do, like debugging, code completion, creative solution-finding, etc., in the context of signal processing anyway.

**CS**: So, are we agreed then? The world still needs human signal processors?

**DP**: Yup, agreed. At least for a while yet. Hopefully I’ll be dead before we become obsolete. I don’t mind being both dead and obsolete.

**EA**: Agreed. We are still needed. But … we might do well to study prompt engineering on the sly.

**CS**: Well, that’s the end of the interview. Thanks so much for your time and energy. I really appreciate it, and I’ve enjoyed our conversation. Let’s go get a drink!

**EA**: Me too, Chad, lotta fun. Might I suggest you get the “other side” and interview our colleagues Leo Martello and Mary Brevectus?

**CS**: I’ll consider it!

Dan Peritum is an expert on signal processing for communications, communications standards, and demodulation techniques for a wide variety of terrestrial and satellite modulation types. He holds a PhD in Electrical Engineering from the University of Felpersham, UK.

Eunice Akamai has twenty years of experience with statistical signal processing and algorithm design. Her primary technical interests are compressive sensing, fractional-order transforms, array processing for direction-finding, and the theory of non-stationary random processes. She earned a PhD in Applied Mathematics from Ivy University.

This post won’t help you directly with your CSP work or your signal-processing education, but I do hope it might help you indirectly. It might help you by illustrating that peer review is broken, that published technical papers should therefore be viewed with extreme suspicion (including mine, of course), and that the gold-rush mentality that infects so many ML and AI researchers feeds a growing boldness on the part of grifters and fraudsters.

You can download either version of the IEEE 2022 Final Report from the CSP Blog here or here. I couldn’t find any mention of the status of peer-review in the reports. They don’t contain the word ‘peer’ and the instances of ‘review’ relate to the financial-condition sections of the report.

But peer-review is a big part of the IEEE–or at least I assume it to be. According to Google (I’m not going to ask ChatGPT), about 25,000 documents are added to IEEE Xplore each month. I presume the bulk of those are conference and journal papers–a few are standards documents and the like.

And there are known problems with the IEEE peer-review process, problems well beyond those I document here at the CSP Blog. An interesting website for scientific-minded or scholarly people is Retraction Watch. They catalog retracted papers across all kinds of scholarly disciplines, and explain the reasons for the retractions and sometimes the methods by which journals are forced to do retractions. They say the IEEE is a major offender.

For example, here is a relevant Retraction Watch item about IEEE papers in 2022 (the year covered by the aforementioned IEEE Annual Report):

So I wonder why the IEEE does not see fit to report on the sorry state of peer review in its Annual Report? I also wonder why they didn’t just include some high-level statistics on peer review as yet another way to promote themselves: Total number of papers reviewed, number of reviewers, number of countries-of-origin of the reviewers, acceptance rates, etc. But instead I see nothing–just look the other way and whistle past the graveyard. Everything’s fine.

There are people who think peer review is irredeemable and say good riddance to it. One is Adam Mastroianni. He makes some interesting arguments about the failures of peer review and thinks science is better off without it. I think his main point is that science has not benefitted from peer review. We see a modern consensus that science has stagnated (although people disagree about why), and lots of opinions that at least as much good science was done before peer review as after. Peer review is expensive in time spent and is difficult labor when done honestly and conscientiously. So why even use it?

My problem with just letting peer review die is students. *Won’t someone please think of the children!* Heh. I just can’t countenance letting up-and-coming graduate students fend for themselves, trying to find the good stuff hidden in a much larger set of crap. Adam’s argument is something like “the truth will win out in the end,”

“They do triumph eventually.” Eventually. Just tough luck for the people in the here and now, very far from eventually, whenever that is, who have to wade through the current muck. *And all the muck we’ve allowed to accumulate over the years as peer review degraded.*

And I’m not convinced by the pre- and post-peer-review argument. We couldn’t do a controlled experiment. We did some science pre and we did *different* science with *different* people in a *different* world post. Does that mean peer review is a failure? Or might it mean that without peer review the post-peer-review era would have been worse?

The new paper is titled “Deep-Learning-Based Classification of Digitally Modulated Signals Using Capsule Networks and Cyclic Cumulants,” and is My Papers [54]. If you go to the My Papers page, you can download a pdf of the new paper using a link in the citation for [54].

In the extended paper [54], we provide additional details of cyclic-cumulant estimation and direct comparisons to a CSP-based blind modulation-recognition algorithm (My Papers [25,26,28]). The discussions concerning motivations, processing approaches, and future directions are also extended relative to [52].

Like [52], the focus of [54] is on the generalization problem associated with trained neural networks. In our application area, modulation recognition, and in many other areas, a major drawback of using trained neural networks (convolutional neural networks, residual networks, capsule networks, etc.) is that their performance is highly sensitive to slight changes in the probability density functions that describe the random variables influencing the input data. This brittleness has several names, including generalization, dataset-shift, data drift, data shift, and concept shift.

We find, perhaps unsurprisingly, that there is no dataset-shift (generalization) problem for simple modulation-recognition problems if the input is a principled extracted data feature rather than I/Q samples. The principled feature here is a matrix of cyclic-cumulant magnitudes of various orders (such as the features depicted in the CSP Blog banner). By *principled* I simply mean that the feature is directly related to the fundamental mathematical characterization of the data, which is the set of all joint probability density functions for the samples. Such features contrast with data-mining features obtained by rooting around in some giant dataset looking for correlations (and you’ll always find some, principles be damned).

The excellent generalization obtained by our networks when using cyclic-cumulant inputs can be explained by realizing that the (properly estimated and normalized) cyclic cumulants for a BPSK signal with rate $R_1$, carrier offset of $f_1$, and square-root raised-cosine pulse rolloff of $\beta$ are exactly the same as those for a BPSK signal with rate $R_2$, offset $f_2$, and rolloff $\beta$. All BPSK signals (with a fixed rolloff) are characterized by the same feature matrix. So the distribution of the bit rates and/or the carrier offsets is immaterial. This is not the case for I/Q input data.

The drawback of the cyclic-cumulant-input approach to training neural networks is that, well, you have to estimate, blindly, the cyclic-cumulant matrix. If only we could stick with I/Q inputs and get both the high performance and the excellent generalization that comes with using cyclic cumulants as inputs… Well, we can. We’ve done some work to show that and have a couple MILCOM papers in submission. I’m looking forward to seeing you all again at MILCOM 2023 if we can get those papers accepted.

The crucial point, which I’ve made before and so am in danger of belaboring, is that to obtain simultaneous good performance and good generalization in machine-learning modulation recognition, one needs a machine that is designed with the modulation-recognition problem in mind. Therefore, we have explicitly rejected the wholesale copying of successful image-recognition neural networks to the RF domain in favor of designing network layers that have the chance to extract the very features that we *know* work best. The modulation-recognition problem is not the same, in terms of the probabilistic description of the input data, as the image-recognition problem, and convolutions won’t cut it. The original motivation for including all the different two-dimensional convolutions in the network was to mimic the known good performance of biological image-recognition systems (the human eye-brain system). That system is terrible at modulation recognition by staring at plots of I/Q data, but great at finding the cat in the photo.

There is no universal classifier that provides good performance AND good generalization across multiple disparate domains.

Here is an extracted figure from the paper to motivate you to go read the whole thing. We used the CSP Blog datasets CSPB.ML.2018 and CSPB.ML.2022 to assess performance and generalization differences between networks with different kinds of inputs.

In this Signal Processing ToolKit post, we look at a generalization of the Fourier transform called the *Laplace Transform.* This is a stepping stone on the way to the *Z Transform*, which is widely used in discrete-time signal processing, especially in control theory.

Jump straight to ‘Significance of the Laplace Transform in CSP‘ below.

Let’s motivate the upcoming Z transform by generalizing the Fourier transform. But why do we *need* to generalize something so pure, so good, so useful, and so perfect as the Fourier transform???

Consider the unit-ramp function shown in Figure 1. Recalling that the unit-step function $u(t)$ is zero for negative $t$, one for positive $t$, and variously defined as one or zero for $t = 0$ (let’s not worry about that), the unit-ramp can be expressed as $r(t) = t\, u(t)$.

What is the Fourier transform of $r(t) = t\, u(t)$? We can start by writing it down,

$R(f) = \int_{-\infty}^{\infty} t\, u(t)\, e^{-i 2 \pi f t}\, dt.$

An expression for this Fourier transform can be found, but it involves the derivative of the impulse function, so it doesn’t exist as a well-behaved function, and is even difficult to deal with as a generalized function.

Consider also random functions like Gaussian noise and exponentials like $e^{at}$ with $a > 0$. They do not have Fourier transforms. For the exponentials and ramps, the basic problem is that the functions are increasing with $t$ (or whatever the independent variable is) and so the integral–which is the limit of a sum involving the values of that increasing function–cannot converge. For sample paths of random processes like the Gaussian process or a BPSK signal, the limit simply does not converge to any particular value, although the signal does not blow up like the ramp and exponentials.

One way around this lack of convergence in the Fourier transform is to introduce a damping factor inside the transform’s integral to ensure that the signal does not increase with time so much that the integral diverges. For example, if we multiply the unit ramp $t\,u(t)$ by the unit exponential $e^{-t}$ and integrate the result, we get a finite number. This exponential dampening is illustrated in Figure 2.

The exponential $e^{-\sigma t}$ tends to zero rapidly, and controllably with the magnitude of the positive number $\sigma$, and is never zero, so it is a good choice to both preserve the character of the signal it is multiplying (since it is never zero, no values of that function are discarded in the integration) and to ensure that no matter how fast the function under study increases with time, it can be brought down to earth.
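To see the damping at work numerically, here is a quick Python sketch (mine, not from the post) that approximates the damped-ramp integral $\int_0^\infty t\, e^{-\sigma t}\, dt$ with a midpoint Riemann sum and compares it to the closed-form value $1/\sigma^2$:

```python
import math

def damped_ramp_integral(sigma, T=120.0, n=200000):
    # Midpoint-rule approximation of the damped-ramp integral
    #   integral from 0 to infinity of t * exp(-sigma * t) dt,
    # truncated at t = T (the damped tail beyond T is negligible).
    dt = T / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * dt
        total += t * math.exp(-sigma * t)
    return total * dt

# The undamped ramp integral diverges, but the damped version
# converges to 1/sigma^2 for any sigma > 0.
results = {sigma: damped_ramp_integral(sigma) for sigma in (0.5, 1.0, 2.0)}
```

Larger $\sigma$ damps harder and yields a smaller area, but any $\sigma > 0$ tames the ramp.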

So to enable a transform of a signal that is not Fourier transformable, we enter a factor of $e^{-\sigma t}$, for some real number $\sigma$, into our Fourier transform as follows

$X(\sigma, f) = \int_{-\infty}^{\infty} x(t)\, e^{-\sigma t}\, e^{-i 2 \pi f t}\, dt.$

Now let the variable $s$ be equal to $\sigma + i\omega = \sigma + i 2 \pi f$. Then our transform becomes

$X(s) = \int_{-\infty}^{\infty} x(t)\, e^{-st}\, dt,$

which is the Laplace transform of $x(t)$. Most of the time we want to apply this transform to signals that are zero for negative times $t < 0$, so that the Laplace transform is one-sided and is usually written as

$X(s) = \int_{0^-}^{\infty} x(t)\, e^{-st}\, dt, \quad (5)$

but the two-sided transform is also used. The one-sided transform, being applicable to signals that are zero for negative time, is particularly useful for transformation of causal impulse-response functions, which possess that exact property already.

The advantage of the Laplace transform over the Fourier transform is that the functions to be transformed can be poorly behaved–they might correspond to systems that are unstable and so their outputs grow without bound. So you might be able to see why we’ve done without the Laplace transform at the CSP Blog lo these many years–we are almost always interested in communication signals that are perhaps not Fourier transformable but are not growing without bound. We got around the fact that random communication signals (that is, all useful communication signals) are not Fourier transformable by switching our focus from transforms to power spectra.

If the Fourier transform for $x(t)$ exists, then it is given by the Laplace transform with $\sigma = 0$ (that is, with $s = i 2 \pi f$). Following our notation convention for the Fourier transform, the Laplace transform is denoted by the operator $\mathcal{L}[\cdot]$ and also the double-ended arrow as in

$x(t) \leftrightarrow X(s).$

Since the transform is defined by an integral, and integration is itself linear, it follows that the Laplace transform, like the Fourier transform, is a linear transform. This simply means that the Laplace transform of the sum of scaled signals is the sum of scaled Laplace transforms,

$\mathcal{L}[a\, x(t) + b\, y(t)] = a\, X(s) + b\, Y(s).$

Linearity will help us compute the Laplace transform of complicated signals by permitting us to express them as the sum of simpler signals, find the Laplace transform of each of the summands, and finally add them up.

For what values of $s$ does a particular Laplace transform exist? This is typically visually expressed by considering the $s$-plane, which has vertical axis denoted by $i\omega$, which equals $i2\pi f$, or by $\omega$ itself, and horizontal axis denoted by $\sigma$. Let’s take a look at the region of convergence by taking our first Laplace transform: the transform of the exponential function $x(t) = e^{-at}\, u(t)$.

Let’s go through the math. Applying the definition of the transform,

$X(s) = \int_{0^-}^{\infty} e^{-at}\, e^{-st}\, dt = \int_{0^-}^{\infty} e^{-(s+a)t}\, dt.$

Formally, this integral equals

$X(s) = \left[ \frac{e^{-(s+a)t}}{-(s+a)} \right]_{t=0}^{t=\infty}.$

If $\sigma > -a$, then $e^{-(s+a)t} \rightarrow 0$ as $t \rightarrow \infty$, and also we won’t divide by zero because $s + a \neq 0$ for any $\omega$. With the condition $\sigma > -a$, then, the transform is

$X(s) = \frac{1}{s+a}, \quad \sigma > -a. \quad (12)$

The convergence parameter $\sigma$ in $s = \sigma + i\omega$ must be greater than $-a$ for the integral to exist, which can be satisfied whether $a$ is positive or negative (whether the exponential decreases or increases as time increases).

When $a > 0$, the exponential is decreasing, and the region of convergence looks like the shaded area in Figure 3. Since $a$ is positive, $-a$ is negative, and the region of convergence includes a part of the $s$-plane where $\sigma < 0$ and all of the half plane for $\sigma \geq 0$. In particular, the region of convergence contains the $i\omega$ axis, where $\sigma = 0$. This means that the Laplace transform formula is valid if we substitute $s = i2\pi f$ into (12), which yields the Fourier transform of $e^{-at}u(t)$,

$X(f) = \frac{1}{i2\pi f + a},$

which indeed matches the Fourier transform for the decaying exponential obtained by direct computation of the Fourier transform.
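As a numerical sanity check (my sketch, not the post’s), we can approximate the one-sided Laplace integral with a midpoint Riemann sum and compare it to $1/(s+a)$ at a point inside the region of convergence and at a point on the $i\omega$ axis; the helper name `laplace_num` is made up for this example:

```python
import cmath

def laplace_num(x, s, T=60.0, n=200000):
    # Midpoint-rule approximation of the one-sided Laplace integral
    # X(s) = integral from 0 to infinity of x(t) * exp(-s*t) dt (truncated at T).
    dt = T / n
    total = 0j
    for k in range(n):
        t = (k + 0.5) * dt
        total += x(t) * cmath.exp(-s * t)
    return total * dt

a = 1.0  # decaying exponential exp(-a*t) u(t), so the ROC is sigma > -1
s_roc = complex(0.5, 2.0 * cmath.pi)          # sigma = 0.5, inside the ROC
s_axis = complex(0.0, 2.0 * cmath.pi * 0.25)  # on the i*omega axis (also inside)

x = lambda t: cmath.exp(-a * t)
approx_roc = laplace_num(x, s_roc)
approx_axis = laplace_num(x, s_axis)
exact_roc = 1.0 / (s_roc + a)
exact_axis = 1.0 / (s_axis + a)
```

The second evaluation point, on the $i\omega$ axis, reproduces the Fourier transform of the decaying exponential, as claimed.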

The important point is that if the Laplace transform formula corresponds to a region of convergence that includes the $i\omega$ axis, the Fourier transform can easily be determined from the Laplace transform. If the region of convergence does not contain the $i\omega$ axis, then the Fourier transform cannot be determined from the Laplace transform. The Fourier transform in such cases does not exist in the normal sense of a function, but may exist if generalized functions such as impulses are permitted. We’ll see examples shortly.

If $a < 0$, then the region of convergence ($\sigma > -a > 0$) is wholly contained in the right-half plane as illustrated in Figure 4. The $i\omega$ axis is not contained in this region, so that the Fourier transform of the increasing exponential does not exist.

What about when $a = 0$? The function under consideration is $e^{0t}u(t)$, or just the unit-step function itself. The condition on $\sigma$ remains, which is $\sigma > 0$, and under this condition

$u(t) \leftrightarrow \frac{1}{s}, \quad \sigma > 0.$

You might recall that the Fourier transform of the unit-step function is not a particularly friendly function,

$U(f) = \frac{1}{2}\delta(f) + \frac{1}{i2\pi f},$

which invites the question of what is going on at $f = 0$. Compare that expression to the Laplace transform expression $U(s) = 1/s$ with $s = i2\pi f$. Better! Not your best friend (that is rectangular-pulse BPSK of course), to be sure, but friendly enough.

At this point in the development, we have the Laplace transforms of the unit-step function and the exponential function. We’d like to know a lot more if we want to try to apply the transform to problems involving signals and systems. To do that, we could apply the Laplace integral (5) to each of a number of signals we’ve encountered in the SPTK posts, but it is typically easier to try to be more clever. We’d like to understand how common mathematical operations, such as scaling, differentiation, integration, convolution, multiplication, etc., affect a signal’s transform. Then when we encounter a new signal, we try to express that signal in terms of one or more of these operations on a signal for which we already know the transform.

What is the Laplace transform of the signal $a\, x(t)$, for any complex constant $a$, given that we know $x(t) \leftrightarrow X(s)$? Since we already know that the Laplace transform is linear, it follows easily that the transform of the scaled signal is the scaled transform,

$a\, x(t) \leftrightarrow a\, X(s).$

Suppose we have a differentiable function $x(t)$ with Laplace transform $X(s)$. What is $\mathcal{L}\left[\frac{dx(t)}{dt}\right]$? The transform integral (5) is

$\mathcal{L}\left[\frac{dx(t)}{dt}\right] = \int_{0^-}^{\infty} \frac{dx(t)}{dt}\, e^{-st}\, dt.$

We can proceed to evaluate this kind of integral by applying the technique called *integration by parts*,

$\int_a^b u\, dv = \left[ u\, v \right]_a^b - \int_a^b v\, du.$

The first step is crucial: Identify $u$ and $dv$ from the integrand components of the integral to be solved. We’ll make the choice

$u = e^{-st} \mbox{ and } dv = \frac{dx(t)}{dt}\, dt.$

With this choice for $u$ and $dv$, we can identify $du$ and $v$,

$du = -s\, e^{-st}\, dt \mbox{ and } v = x(t).$

With this choice, let’s carefully follow the integration-by-parts rule,

$\mathcal{L}\left[\frac{dx(t)}{dt}\right] = \left[ e^{-st}\, x(t) \right]_{0}^{\infty} + s \int_{0^-}^{\infty} x(t)\, e^{-st}\, dt.$

If $x(t)$ is Laplace transformable for the chosen $s$, then $e^{-st}\, x(t) \rightarrow 0$ as $t \rightarrow \infty$, so that

$\mathcal{L}\left[\frac{dx(t)}{dt}\right] = s\, X(s) - x(0).$

We adopt the convention of my beloved The Literature [R132], and interpret $x(0)$ to be $x(0^-)$, the value of the function just before zero, to get around certain technical issues involving discontinuities at $t = 0$, such as might occur for certain causal linear time-invariant systems‘ impulse-response functions. So the final answer for the derivative of $x(t)$ is

$\frac{dx(t)}{dt} \leftrightarrow s\, X(s) - x(0^-).$

As a preview, since we know that $u(t) \leftrightarrow 1/s$, and $\delta(t) = du(t)/dt$, then $\mathcal{L}[\delta(t)] = s(1/s) - u(0^-) = 1$, which is consistent with $\delta(t) \leftrightarrow 1$, as we’ve seen before.

It follows immediately that the Laplace transform of the second derivative of $x(t)$ ($d^2x(t)/dt^2$) is

$\frac{d^2 x(t)}{dt^2} \leftrightarrow s^2 X(s) - s\, x(0^-) - \left.\frac{dx(t)}{dt}\right|_{t = 0^-}.$
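To check the derivative rule numerically without worrying about an impulse at the origin, here is a sketch (mine) using $x(t) = (1 - e^{-at})u(t)$, which is continuous at $t=0$ with $x(0^-) = 0$; the helper `laplace_num` is a made-up name for a midpoint-rule approximation of the Laplace integral:

```python
import cmath

def laplace_num(x, s, T=60.0, n=200000):
    # Midpoint-rule approximation of the one-sided Laplace integral.
    dt = T / n
    return sum(x((k + 0.5) * dt) * cmath.exp(-s * (k + 0.5) * dt)
               for k in range(n)) * dt

a = 2.0
s = complex(0.8, 2.0 * cmath.pi * 0.4)

# x(t) = (1 - exp(-a*t)) u(t): continuous at t = 0 with x(0-) = 0,
# so no impulse appears in the derivative.
X = laplace_num(lambda t: 1.0 - cmath.exp(-a * t), s)
# dx/dt = a * exp(-a*t) for t > 0.
dX = laplace_num(lambda t: a * cmath.exp(-a * t), s)

rule = s * X - 0.0  # s X(s) - x(0-), with x(0-) = 0
```

The two complex numbers `dX` and `rule` agree to within the accuracy of the numerical integration.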

Next let’s look at differentiation’s inverse operation: integration. What is the Laplace transform of $y(t) = \int_{0^-}^{t} x(\tau)\, d\tau$?

Let’s again reach for integration by parts. Since the integral of $e^{-st}$ is easy, let’s choose that for $dv$,

$u = \int_{0^-}^{t} x(\tau)\, d\tau, \quad dv = e^{-st}\, dt, \quad du = x(t)\, dt, \quad v = -\frac{e^{-st}}{s}.$

Our formula reduces to

$\mathcal{L}[y(t)] = \left[ -\frac{e^{-st}}{s} \int_{0^-}^{t} x(\tau)\, d\tau \right]_{0}^{\infty} + \frac{1}{s} \int_{0^-}^{\infty} x(t)\, e^{-st}\, dt. \quad (29)$

If $\sigma > 0$, if $\int_{0^-}^{\infty} x(\tau)\, d\tau$ is finite, and if there is no impulse in $x(t)$ at the origin, then the first term on the right in (29) is zero. We are left with

$\mathcal{L}[y(t)] = \frac{1}{s} \int_{0^-}^{\infty} x(t)\, e^{-st}\, dt = \frac{X(s)}{s},$

which is satisfying because the effect of differentiation (factor of $s$) undoes the effect of integration (factor of $1/s$). The final result is

$\int_{0^-}^{t} x(\tau)\, d\tau \leftrightarrow \frac{X(s)}{s}.$

As a preview, consider that the unit ramp is the integral of the unit step,

$r(t) = t\, u(t) = \int_{0^-}^{t} u(\tau)\, d\tau.$

What does that imply about $\mathcal{L}[r(t)]$?

Suppose $y(t) = x(ct)$. What is $Y(s)$ in terms of $X(s)$?

First, let’s rule out $c = 0$ because then we don’t have a function of time anymore–we’d be asking about the Laplace transform of $x(0)$, which is the Laplace transform of a constant, which we already know is $x(0)/s$. But let’s also rule out $c < 0$, because those values of $c$ not only compress or expand the time axis, but they swap all the function values for negative time with those for positive time. Yet $X(s)$ is itself only a function of $x(t)$ for $t \geq 0$. So we wouldn’t be able to say anything about the relationship between $Y(s)$ and $X(s)$ if $c < 0$. That leaves the still-considerable set of $c$ that are real numbers greater than zero.

Let’s proceed by evaluating the Laplace integral (5),

$Y(s) = \int_{0^-}^{\infty} x(ct)\, e^{-st}\, dt.$

Let’s do a substitution for the variable of integration:

$\tau = ct, \quad t = \tau/c, \quad dt = d\tau/c.$

This substitution leads to

$Y(s) = \frac{1}{c} \int_{0^-}^{\infty} x(\tau)\, e^{-(s/c)\tau}\, d\tau = \frac{1}{c}\, X\!\left(\frac{s}{c}\right).$

The final result is, in our compact notation,

$x(ct) \leftrightarrow \frac{1}{c}\, X\!\left(\frac{s}{c}\right), \quad c > 0.$

We are interested here in $\mathcal{L}[t\, x(t)]$, but let’s guess at the answer first and then work backward to verify. We know that $s X(s) - x(0^-)$ is the transform of the derivative of $x(t)$, so if there is any significant duality between time and complex frequency in the Laplace transform, we might guess that the transform of $t\, x(t)$ is the derivative of $X(s)$. And since the Laplace transform and Fourier transform are closely related, and the Fourier transform does possess duality, we have good reason to make this guess. Let’s check.

$\frac{dX(s)}{ds} = \int_{0^-}^{\infty} x(t)\, \frac{d}{ds} e^{-st}\, dt = -\int_{0^-}^{\infty} t\, x(t)\, e^{-st}\, dt = -\mathcal{L}[t\, x(t)].$

Apart from the negative sign, the guess is verified:

$t\, x(t) \leftrightarrow -\frac{dX(s)}{ds}.$
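The multiplication-by-$t$ rule can be spot-checked numerically. For $x(t) = e^{-at}u(t)$, the rule predicts $\mathcal{L}[t\, e^{-at}u(t)] = -\frac{d}{ds}\frac{1}{s+a} = \frac{1}{(s+a)^2}$; here is a quick sketch (mine, with a made-up helper name):

```python
import cmath

def laplace_num(x, s, T=80.0, n=200000):
    # Midpoint-rule approximation of the one-sided Laplace integral.
    dt = T / n
    return sum(x((k + 0.5) * dt) * cmath.exp(-s * (k + 0.5) * dt)
               for k in range(n)) * dt

a = 1.0
s = complex(0.5, 2.0 * cmath.pi * 0.3)

# L[t * exp(-a*t) u(t)] should equal -d/ds [1/(s+a)] = 1/(s+a)^2.
approx = laplace_num(lambda t: t * cmath.exp(-a * t), s)
exact = 1.0 / (s + a) ** 2
```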

A delayed version of $x(t)$ is $x(t - t_0)$ with $t_0 > 0$. This delay pushes the signal forward in time (to the right along the time axis). What is the Laplace transform of the delayed signal in terms of the known transform $X(s)$ of the original signal $x(t)$? (You might guess based on the behavior of the Fourier series and transform for delayed signals.)

We have to be a little careful about delaying $x(t)$ here because it may be nonzero for negative time, and when $t_0 > 0$, some of the function defined for negative time shifts into positive time, yet none of that portion of $x(t)$ was used to find $X(s)$.

So what we want to consider is $x(t)\, u(t)$, ensuring that the function is zero for all negative time, and its delayed version $x(t - t_0)\, u(t - t_0)$. Otherwise, if we want to deal with the negative-time portion of $x(t)$, we can use the two-sided Laplace transform.

We’ll proceed directly to the definition (5),

$\mathcal{L}[x(t - t_0)\, u(t - t_0)] = \int_{0^-}^{\infty} x(t - t_0)\, u(t - t_0)\, e^{-st}\, dt.$

We require a change of variables,

$\tau = t - t_0, \quad d\tau = dt.$

Applying this change of variables leads to

$\int_{-t_0}^{\infty} x(\tau)\, u(\tau)\, e^{-s(\tau + t_0)}\, d\tau.$

Since $t_0 > 0$, and the integrand is zero for $\tau < 0$, we have

$e^{-s t_0} \int_{0^-}^{\infty} x(\tau)\, e^{-s\tau}\, d\tau = e^{-s t_0}\, X(s),$

or

$x(t - t_0)\, u(t - t_0) \leftrightarrow e^{-s t_0}\, X(s).$
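The time-delay property is easy to verify numerically for a delayed causal exponential; this is my own sketch (the helper `laplace_num` is a made-up name):

```python
import cmath

def laplace_num(x, s, T=80.0, n=200000):
    # Midpoint-rule approximation of the one-sided Laplace integral.
    dt = T / n
    return sum(x((k + 0.5) * dt) * cmath.exp(-s * (k + 0.5) * dt)
               for k in range(n)) * dt

a, t0 = 1.0, 2.0
s = complex(0.5, 2.0 * cmath.pi * 0.2)

# Delayed causal exponential: exp(-a*(t - t0)) u(t - t0).
delayed = lambda t: cmath.exp(-a * (t - t0)) if t >= t0 else 0.0
approx = laplace_num(delayed, s)
exact = cmath.exp(-s * t0) / (s + a)  # e^{-s t0} X(s) with X(s) = 1/(s+a)
```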

If we shift a Laplace transform by some amount $s_0$, as in $X(s - s_0)$, what is the corresponding time function? If we had an easy-to-evaluate inverse transform, we could apply it here. But we’ve avoided introducing the inverse Laplace transform so far (for good reason), so let’s once again take a guess, and see if that leads to easy analysis.

We know that a shift in frequency for the Fourier transform is a multiplication of the time waveform by a complex exponential,

$x(t)\, e^{i 2 \pi f_0 t} \leftrightarrow X(f - f_0),$

so we can guess that multiplication of the time waveform by something like $e^{s_0 t}$ will produce a shifted version of the Laplace transform. Let’s work it out.

$\mathcal{L}[e^{s_0 t}\, x(t)] = \int_{0^-}^{\infty} x(t)\, e^{s_0 t}\, e^{-st}\, dt = \int_{0^-}^{\infty} x(t)\, e^{-(s - s_0)t}\, dt = X(s - s_0),$

which implies the desired result

$e^{s_0 t}\, x(t) \leftrightarrow X(s - s_0).$

Suppose we have two causal signals $x(t)$ and $y(t)$. Then their (normal) convolution is also causal in that

$z(t) = x(t) \otimes y(t) = \int_{0}^{t} x(\tau)\, y(t - \tau)\, d\tau,$ which is identically zero for $t < 0$.

So all three signals are of the usual sort we are dealing with as we study the one-sided Laplace transform.

What is $Z(s) = \mathcal{L}[z(t)]$? Again, we can make a very good guess by reflecting on the convolution theorem: $Z(s) = X(s)\, Y(s)$. I’ll leave the proof as an exercise for the interested reader.
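We can at least check the convolution result numerically for a pair of causal exponentials whose convolution is known in closed form: for $x(t) = e^{-t}u(t)$ and $y(t) = e^{-2t}u(t)$, the convolution is $z(t) = (e^{-t} - e^{-2t})u(t)$. The sketch below (mine, with a made-up helper name) confirms $Z(s) = X(s)Y(s)$ at one point in the common region of convergence:

```python
import cmath

def laplace_num(x, s, T=80.0, n=200000):
    # Midpoint-rule approximation of the one-sided Laplace integral.
    dt = T / n
    return sum(x((k + 0.5) * dt) * cmath.exp(-s * (k + 0.5) * dt)
               for k in range(n)) * dt

s = complex(0.5, 2.0 * cmath.pi * 0.25)

# Causal factors x(t) = exp(-t) u(t) and y(t) = exp(-2t) u(t); their
# convolution is known in closed form: z(t) = (exp(-t) - exp(-2t)) u(t).
X = laplace_num(lambda t: cmath.exp(-t), s)
Y = laplace_num(lambda t: cmath.exp(-2.0 * t), s)
Z = laplace_num(lambda t: cmath.exp(-t) - cmath.exp(-2.0 * t), s)
```

Algebraically, $\frac{1}{s+1} - \frac{1}{s+2} = \frac{1}{(s+1)(s+2)}$, so the check is exact up to integration error.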

Let’s look at the case of a generic periodic signal $p(t)$. Let the period of the signal be $T_0$. Then the defining characteristic of the periodic signal is that $p(t + T_0) = p(t)$ for all real numbers $t$.

Periodic signals can be written in a lot of equivalent ways because all that is required is that the function over some interval of length $T_0$ is replicated in every other such interval. Consider the periodic rectangular pulse train shown in Figure 5.

Recalling that the function $\mbox{rect}(t)$ is defined as being equal to one on $[-1/2, 1/2]$ and zero otherwise, it is natural to express $p(t)$ in Figure 5 as

$p(t) = \sum_{k=-\infty}^{\infty} \mbox{rect}\!\left(\frac{t - kT_0}{T_1}\right),$ where $T_1 \leq T_0$ is the pulse width.

But for the Laplace transform we’re considering here, we only care about the function for non-negative times $t \geq 0$. We can express the signal in a Laplace-transform-friendly way by using the Base Period $p_0(t)$ shown in Figure 5, where $p_0(t) = p(t)$ for $0 \leq t < T_0$ and $p_0(t) = 0$ otherwise.

Then the entire signal can be expressed as

$p(t) = \sum_{k=-\infty}^{\infty} p_0(t - kT_0),$

and therefore the positive-time portion of $p(t)$ is easily expressed as the truncated sum

$p_+(t) = \sum_{k=0}^{\infty} p_0(t - kT_0).$

To find $P_+(s)$ here, we can invoke the established linearity and time-delay properties of the transform to yield

$P_+(s) = \sum_{k=0}^{\infty} e^{-kT_0 s}\, P_0(s) = \frac{P_0(s)}{1 - e^{-T_0 s}},$

where $P_0(s)$ is the Laplace transform of the base-period signal $p_0(t)$.
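Here is a quick numerical illustration of the periodic-signal formula (my sketch, not the post’s; period, pulse width, and helper name are example choices): a positive-time rectangular pulse train with period $T_0 = 1$ and pulse width $T_1 = 0.5$ has base-period transform $P_0(s) = (1 - e^{-sT_1})/s$, and the full train’s transform should be $P_0(s)/(1 - e^{-T_0 s})$.

```python
import cmath

def laplace_num(x, s, T=60.0, n=300000):
    # Midpoint-rule approximation of the one-sided Laplace integral.
    dt = T / n
    return sum(x((k + 0.5) * dt) * cmath.exp(-s * (k + 0.5) * dt)
               for k in range(n)) * dt

T0, T1 = 1.0, 0.5  # period and pulse width (example values)
s = complex(0.5, 2.0 * cmath.pi * 0.3)

# Positive-time pulse train: one base-period pulse repeated every T0 seconds.
train = lambda t: 1.0 if (t % T0) < T1 else 0.0
approx = laplace_num(train, s)

# Closed form: P0(s) / (1 - exp(-T0*s)) with P0(s) = (1 - exp(-s*T1))/s.
P0 = (1.0 - cmath.exp(-s * T1)) / s
exact = P0 / (1.0 - cmath.exp(-T0 * s))
```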

Now let’s derive the Laplace transforms of some simple signals that we frequently encounter in signal analysis, such as the unit-step function $u(t)$, the ramp function $r(t) = t\,u(t)$, the trigonometric functions, exponentials with real exponents, exponentials with imaginary exponents (sine waves), the rectangular pulse train, etc.

Let’s start with $x(t) = \delta(t)$, where $\delta(t)$ is the impulse function or Dirac’s delta function. Applying the Laplace transform definition directly gives the answer in short order, due to the sifting property of the impulse function and the fact that it integrates to unity,

$\mathcal{L}[\delta(t)] = \int_{0^-}^{\infty} \delta(t)\, e^{-st}\, dt = e^{-s(0)} = 1.$

The unit-step function $u(t)$ is zero for all negative time and one for positive time and is variously defined at $t = 0$,

$u(t) = \begin{cases} 1, & t > 0, \\ 0, & t < 0. \end{cases}$

For $x(t) = u(t)$, we have

$U(s) = \int_{0^-}^{\infty} u(t)\, e^{-st}\, dt = \left[ \frac{e^{-st}}{-s} \right]_{0}^{\infty}.$

Now, if $\sigma > 0$, then $e^{-st} \rightarrow 0$ as $t \rightarrow \infty$ so that

$u(t) \leftrightarrow \frac{1}{s}, \quad \sigma > 0.$

Alternatively, we can observe that the unit-step function is the integral of the impulse function,

$u(t) = \int_{0^-}^{t} \delta(\tau)\, d\tau,$

and apply the integration formula to obtain

$U(s) = \frac{\mathcal{L}[\delta(t)]}{s} = \frac{1}{s}.$

The unit-slope ramp function is defined as

$r(t) = t\, u(t),$

which is also equal to

$r(t) = \int_{0^-}^{t} u(\tau)\, d\tau.$

So we can use the integration rule derived above to immediately find

$R(s) = \frac{U(s)}{s} = \frac{1}{s^2}. \quad (67)$

Alternatively, we can use the multiplication-by-$t$ rule above, since we have $r(t) = t\, u(t)$ and we know $\mathcal{L}[u(t)] = 1/s$,

$R(s) = -\frac{d}{ds}\left(\frac{1}{s}\right)$

and

$R(s) = \frac{1}{s^2},$

as before in (67).

Here $x(t) = e^{-at}\, u(t)$, where $a$ is a real number. If $a < 0$, the exponential grows without bound as $t$ increases. If $a > 0$, the exponential approaches zero from above as $t$ increases. If $a = 0$, we have the unit-step function again. Let’s plug this exponential into the Laplace integral and turn the crank,

$X(s) = \int_{0^-}^{\infty} e^{-at}\, e^{-st}\, dt = \left[ \frac{e^{-(s+a)t}}{-(s+a)} \right]_{0}^{\infty}.$

If $\sigma > -a$, then $e^{-(s+a)t} \rightarrow 0$ as $t \rightarrow \infty$, which means we can evaluate the upper and lower limits as

$X(s) = 0 - \frac{1}{-(s+a)} = \frac{1}{s+a},$

so that

$e^{-at}\, u(t) \leftrightarrow \frac{1}{s+a}, \quad \sigma > -a.$

Next let’s consider the exponential with an imaginary exponent, $x(t) = e^{i 2 \pi f_0 t}\, u(t)$, which is a complex sine wave (use Euler’s Formula). Let’s go through it the same way as for the previous exponential,

$X(s) = \int_{0^-}^{\infty} e^{i 2 \pi f_0 t}\, e^{-st}\, dt = \left[ \frac{e^{-(s - i 2 \pi f_0)t}}{-(s - i 2 \pi f_0)} \right]_{0}^{\infty}.$

If $\sigma > 0$, then $e^{-(s - i2\pi f_0)t} \rightarrow 0$ as $t \rightarrow \infty$, so that the evaluated integral is

$e^{i 2 \pi f_0 t}\, u(t) \leftrightarrow \frac{1}{s - i 2 \pi f_0}, \quad \sigma > 0. \quad (77)$

We can observe that the Laplace transform of $e^{at}u(t)$ is $1/(s - a)$ whether $a$ is real or imaginary. Since the region of convergence here does not include $\sigma = 0$, the formula (77) cannot be used to determine the formula for the Fourier transform of the complex exponential, which we know is an impulse function centered at $f = f_0$.

Let’s find the Laplace transforms of $\sin(2\pi f_0 t)\, u(t)$ and $\cos(2\pi f_0 t)\, u(t)$.

Here we consider $x(t) = \sin(2\pi f_0 t)\, u(t)$, the real-valued sine wave with frequency $f_0$ (period of $1/f_0$ assuming $f_0 \neq 0$). Since we already know the Laplace transform for the complex sine wave and we know that the real sine wave is easily expressed as the sum of two complex sine waves,

$\sin(2\pi f_0 t) = \frac{e^{i 2 \pi f_0 t} - e^{-i 2 \pi f_0 t}}{2i},$

we can apply the linearity property of the transform to quickly obtain the result. We have

$X(s) = \frac{1}{2i}\left[ \frac{1}{s - i 2\pi f_0} - \frac{1}{s + i 2\pi f_0} \right] = \frac{2\pi f_0}{s^2 + (2\pi f_0)^2}.$

Therefore

$\sin(2\pi f_0 t)\, u(t) \leftrightarrow \frac{2\pi f_0}{s^2 + (2\pi f_0)^2}, \quad \sigma > 0.$

For $x(t) = \cos(2\pi f_0 t)\, u(t)$, we have at least three options for finding $X(s)$: (1) direct evaluation of the Laplace integral (as we did for $\sin(2\pi f_0 t)u(t)$); (2) using the derivative rule since $\cos(2\pi f_0 t) = \frac{1}{2\pi f_0} \frac{d}{dt} \sin(2\pi f_0 t)$; (3) using the integration rule since $\sin(2\pi f_0 t) = 2\pi f_0 \int_0^t \cos(2\pi f_0 \tau)\, d\tau$.

To use the derivative rule, which is $\frac{dx(t)}{dt} \leftrightarrow s X(s) - x(0^-)$, we realize that

$\cos(2\pi f_0 t)\, u(t) = \frac{1}{2\pi f_0} \frac{d}{dt}\left[ \sin(2\pi f_0 t)\, u(t) \right],$

so

$\mathcal{L}[\cos(2\pi f_0 t)\, u(t)] = \frac{1}{2\pi f_0}\left[ s\, \frac{2\pi f_0}{s^2 + (2\pi f_0)^2} - 0 \right].$

We then have the desired result,

$\cos(2\pi f_0 t)\, u(t) \leftrightarrow \frac{s}{s^2 + (2\pi f_0)^2}, \quad \sigma > 0.$
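Both trigonometric transforms are easy to spot-check numerically; this is my own sketch (helper name made up), evaluated at a point with $\sigma > 0$:

```python
import cmath

def laplace_num(x, s, T=60.0, n=200000):
    # Midpoint-rule approximation of the one-sided Laplace integral.
    dt = T / n
    return sum(x((k + 0.5) * dt) * cmath.exp(-s * (k + 0.5) * dt)
               for k in range(n)) * dt

f0 = 0.25
w0 = 2.0 * cmath.pi * f0
s = complex(0.5, 2.0 * cmath.pi * 0.1)  # sigma = 0.5 > 0, inside the ROC

sin_num = laplace_num(lambda t: cmath.sin(w0 * t), s)
cos_num = laplace_num(lambda t: cmath.cos(w0 * t), s)

sin_exact = w0 / (s * s + w0 * w0)   # 2*pi*f0 / (s^2 + (2*pi*f0)^2)
cos_exact = s / (s * s + w0 * w0)    # s / (s^2 + (2*pi*f0)^2)
```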

We use periodic pulse trains with various pulse shapes in different parts of signal processing and radio-frequency communication theory and practice. We’ve already encountered rectangular-pulse pulse trains in our study of signals, their representations, the Fourier series, and the Fourier transform. Closer to home, the rectangular-pulse BPSK signal can be viewed as a rectangular pulse train where each pulse is multiplied by, randomly, a $+1$ or a $-1$.

So let’s continue with that level of analysis. We’ll first want to know the Laplace transform of a simple positive-time rectangle, as seen in Figure 6.

The transform of $x(t)$ in Figure 6 is straightforwardly computed by applying the Laplace integral, but it is convenient to use previously established results. In particular, this rectangle is easily expressed as the difference between two unit-step functions,

$x(t) = u(t) - u(t - T).$

Since $u(t) \leftrightarrow 1/s$ and $u(t - T) \leftrightarrow e^{-sT}/s$, we immediately obtain the result

$X(s) = \frac{1 - e^{-sT}}{s}. \quad (89)$

Each of the transforms of the two unit-step functions implies a region of convergence of $\sigma > 0$. But if we directly apply the transform definition we obtain

$X(s) = \int_{0}^{T} e^{-st}\, dt = \frac{1 - e^{-sT}}{s}, \quad (92)$

and there is no restriction on $\sigma$ here, so that the region of convergence includes $\sigma = 0$. Therefore we can check whether this transform reduces to the known Fourier transform of the rectangle when $s = i 2 \pi f$ in (92) (or (89)). We obtain

$X(i 2\pi f) = \frac{1 - e^{-i 2 \pi f T}}{i 2 \pi f} = T\, \mbox{sinc}(fT)\, e^{-i \pi f T},$

which is indeed the Fourier transform of a $T/2$-shifted rectangle with width $T$ and height one.
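The identity between the on-axis Laplace transform and the shifted-sinc Fourier transform is pure algebra, so it can be checked to machine precision (my sketch; the width and frequency are example values):

```python
import cmath

T = 1.0   # rectangle width (height one, supported on [0, T])
f = 0.3   # evaluation frequency

s = 1j * 2.0 * cmath.pi * f
laplace_on_axis = (1.0 - cmath.exp(-s * T)) / s

# Fourier transform of the T/2-shifted width-T rectangle:
# T * sinc(f*T) * exp(-i*pi*f*T), with sinc(x) = sin(pi x)/(pi x).
sinc = cmath.sin(cmath.pi * f * T) / (cmath.pi * f * T)
fourier = T * sinc * cmath.exp(-1j * cmath.pi * f * T)
```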

Recall that the convolution of a rectangle with itself is a triangle. The triangle shown in Figure 7 is in fact the convolution of the rectangle in Figure 6 with itself if $T$ in Figure 6 is replaced by $T/2$. In that case, the convolution has base width $T$ and peak height $T/2$.

We can write down the equations for the two lines making up the triangle and put that expression in the Laplace integral, or we can write it as the convolution of a rectangle with itself (and a scaling factor) and employ the convolution relation. We have the expression

$y(t) = \frac{2}{T}\left[ x(t) \otimes x(t) \right], \quad x(t) = u(t) - u(t - T/2).$

Since $x(t) \leftrightarrow (1 - e^{-sT/2})/s$, we have

$Y(s) = \frac{2}{T}\left( \frac{1 - e^{-sT/2}}{s} \right)^2.$

Now let’s look at the asymmetrical rectangular pulse train shown in Figure 8. Note that this is a shifted (time-delayed) version of the symmetrical pulse train shown in Figure 5.

We can express this as an infinite sum of shifted rectangles,

$p_a(t) = \sum_{k=0}^{\infty} \left[ u(t - kT_0) - u(t - kT_0 - T_1) \right].$

Now, we know the transform of each and every rectangle in that sum,

$u(t - kT_0) - u(t - kT_0 - T_1) \leftrightarrow e^{-kT_0 s}\, \frac{1 - e^{-sT_1}}{s}.$

Adding them all up yields

$P_a(s) = \frac{1 - e^{-sT_1}}{s} \sum_{k=0}^{\infty} e^{-kT_0 s}.$

What is the region of convergence for this Laplace transform? The region of convergence for each transformed rectangle is the entire complex plane (any value of $s$), but we are adding up an infinite number of phase-shifted transforms, so the convergence depends on that sum too.

We need to understand the condition on $s$ for the infinite sum to converge,

$\sum_{k=0}^{\infty} e^{-kT_0 s} = \sum_{k=0}^{\infty} \left( e^{-T_0 s} \right)^k.$

Recall the geometric series formula

$\sum_{k=0}^{\infty} x^k = \frac{1}{1 - x}, \quad |x| < 1.$

Here $x = e^{-T_0 s}$. For convergence, we require that $|e^{-T_0 s}| = e^{-T_0 \sigma} < 1$, which means $\sigma > 0$, and in this case, the transform converges to

$P_a(s) = \frac{1 - e^{-sT_1}}{s}\, \frac{1}{1 - e^{-T_0 s}}, \quad \sigma > 0.$

Finally, let’s look at the symmetric pulse train shown in Figure 5, and replicated here in Figure 9.

We need to represent the positive-time portion of this function. There are an infinite number of identical rectangles that have centers $t = kT_0$ for $k = 1, 2, 3, \ldots$ and one rectangle with center $T_1/4$, width $T_1/2$, and height one. We can use the $\mbox{rect}(\cdot)$ function here,

$p_+(t) = \mbox{rect}\!\left( \frac{t - T_1/4}{T_1/2} \right) + \sum_{k=1}^{\infty} \mbox{rect}\!\left( \frac{t - kT_0}{T_1} \right).$

The transform follows easily,

$P_+(s) = \frac{1 - e^{-sT_1/2}}{s} + \frac{1 - e^{-sT_1}}{s}\, \frac{e^{-s(T_0 - T_1/2)}}{1 - e^{-T_0 s}}.$

The region of convergence is $\sigma > 0$ for the same reasons as outlined in the case of the asymmetric rectangular pulse train.

The inverse Laplace transform is not as simple as the inverse Fourier transform, which is itself scarcely different from the forward Fourier transform. Here we must undertake contour integration if we want to directly evaluate the inverse Laplace transform. The formula is

$x(t) = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} X(s)\, e^{st}\, ds.$

The constant $c$ is any real number in the region of convergence. In practice, such as in control theory or lumped-circuit analysis, the direct computation of the inverse Laplace transform is not common. Instead, the Laplace transform expression is manipulated into a form consisting of known transforms and the full inverse transform is then effectively determined by table lookup and combination due to linearity. We’ll see an example of that shortly.

The Laplace transform is most often used in control problems and in analysis of differential equations governing lumped-parameter circuits (resistor/capacitor/inductor) or other dynamical energetic systems. We will soon progress to the *Z transform* in the SPTK posts, which is essentially the Laplace transform for discrete time, and is commonly applied in digital (discrete-time) control and communication-system problems. In those cases, difference equations (rather than differential equations) are of interest and the Z transform is the right tool.

Let’s just give a taste of why the Laplace transform is an excellent tool for solving differential equations. The idea is that complicated differential equations are transformed into relatively simple sets of polynomial equations, which can be more readily solved. The desired time-domain solution can then be had by inverse Laplace transforming the -domain solution.

Consider the second-order differential equation given by

$a\, \frac{d^2 y(t)}{dt^2} + b\, \frac{d y(t)}{dt} + c\, y(t) = d\, u(t).$

What is $y(t)$, given that we know the four constants $a, b, c, d$ and the initial conditions $y(0^-)$ and $y^\prime(0^-)$? Transforming the equation, we obtain the following function of $s$,

$a\left[ s^2 Y(s) - s\, y(0^-) - y^\prime(0^-) \right] + b\left[ s\, Y(s) - y(0^-) \right] + c\, Y(s) = \frac{d}{s}.$

Gathering terms leads to

$Y(s)\, D(s) = \frac{d}{s} + \alpha s + \beta,$

where $D(s) = a s^2 + b s + c$, $\alpha = a\, y(0^-)$, and $\beta = a\, y^\prime(0^-) + b\, y(0^-)$. We can solve for $Y(s)$ easily using algebra,

$Y(s) = \frac{\alpha s^2 + \beta s + d}{s\, D(s)}. \quad (111)$

We see that $Y(s)$ is a *rational function*–a fraction with polynomials in the numerator and denominator. We need to express this rational function in terms of the kinds of functions that we already know are Laplace transforms, such as $1/(s+a)$. Fortunately, such rational functions as (111) can be expressed as a sum of simpler rational functions. That is, we can factor the denominator and then express the function as the weighted sum of terms with each factor in the denominator:

$Y(s) = \frac{N(s)}{\prod_{k=1}^{n} (s - p_k)} = \sum_{k=1}^{n} \frac{A_k}{s - p_k},$

where is the degree of . Things get a bit complicated when the are not all unique–let’s assume they are though.

Returning to (111), we seek

For , consider ,

which must be true for , so we have . Similarly, by considering , and evaluating at , we obtain . Finally, .

We can evaluate the inverse transform because we can inverse transform each term in the new expression for ,

Not much!
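As a concrete numerical companion to the partial-fraction step, here is a stdlib-Python sketch for the distinct-pole case. The transform X(s) = (s + 3)/((s + 1)(s + 2)) is a hypothetical example, not the post’s equation (111); the residue formula A_k = N(p_k) / prod_{j != k} (p_k - p_j) is the standard one for simple poles of a rational function with monic denominator, and each term A_k/(s - p_k) inverse-transforms to A_k e^{p_k t} for t >= 0.

```python
import math

def polyval(coeffs, x):
    """Evaluate a polynomial (coefficients in descending powers) at x."""
    acc = 0.0
    for c in coeffs:
        acc = acc * x + c
    return acc

def residues(num, poles):
    """Residues A_k for X(s) = N(s) / prod_k (s - p_k), distinct poles only:
    A_k = N(p_k) / prod_{j != k} (p_k - p_j)."""
    out = []
    for k, p in enumerate(poles):
        denom = 1.0
        for j, q in enumerate(poles):
            if j != k:
                denom *= p - q
        out.append(polyval(num, p) / denom)
    return out

# Hypothetical example: X(s) = (s + 3) / ((s + 1)(s + 2))
num, poles = [1.0, 3.0], [-1.0, -2.0]
A = residues(num, poles)                 # partial-fraction weights: [2.0, -1.0]
x = lambda t: sum(a * math.exp(p * t) for a, p in zip(A, poles))  # x(t), t >= 0
```

Each weight comes straight from the cover-up rule, and the time-domain answer is just a sum of decaying exponentials, exactly the table-lookup-and-combine procedure described above.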

Previous SPTK Post: MATLAB’s resample.m Next SPTK Post: Practical Filters

My conclusion is that the DeepSig datasets are as flawed as the DeepSig papers–it was the highly flawed nature of the *papers* that got me started down the critical-review path in the first place.

A reader recently alerted me to a change in the Datasets page at deepsig.ai that may indicate they are listening to critics. Let’s take a look and see if there is anything more to say.

Here is the updated page at deepsig.ai/datasets:

We see that there are “known errata” but that the datasets are still available for download, as ever. However, each one is now called a “Historical” dataset. And it is true that those datasets (the final one includes the hoary string ‘2018’) are ancient, old news, superannuated. In fact, they all come from that distant, hazy, innocent era known as “Before GPT,” which we’ll just call BGPT. If there are any old-school researchers that care about BGPT material, DeepSig is kindly keeping the flame alive. Fine.

But … there is no mention of the nature of the errata (errors). Typically people use the word *errata* to denote errors such as omissions or typographical errors rather than major conceptual errors or massive programming errors, or at least they did BGPT. Those latter errors are more clearly referred to as *flaws* and *bugs*, respectively.

The main point is that we get the “mistakes were made” admission but the vibe is “here is the error-filled material anyway, find the mistakes yourself if you care about that sort of historical, merely academic, thing.” Caveat emptor! I wouldn’t, actually, care much about this, except for the fact that lots of people have used this data to make many many many grandiose claims about ML-based modulation-recognition performance as well as relative claims about “the signal-processing state of the art” (about which they know nothing). Remember, this is the sum total of the higher-order moment mathematics put forth in O’Shea’s The Literature [R138]:

Regarding all those learners and their claims, a simple Google Scholar search reveals The Literature [R138] is cited by at least 1078 papers. (I feel like I’ve had to slog through half of those myself.)

So does DeepSig care about those 1078 researchers (really a couple thousand, since hardly any papers are single-author papers)? What about all the other researchers, students, and practicing engineers who read *those* papers and came away with certain rosy conclusions about ML for MR?

Why not just tell us what the errors are?

Where is the link to the “known errata?”

(DeepSig: Feel free to use these: All BPSK Signals, More on DeepSig Datasets, 2018 RML, One Last Time.)

Why are DeepSig’s fellow machine learners being treated this way?

h/t Steve F.

Here we want to look at more conventional forms of FSK. These signal types don’t necessarily have a continuous phase function. They are generally easier to demodulate and are more robust to noise and interference than the more complicated CPM signal types, but generally have much lower spectral efficiency. They are like the rectangular-pulse PSK of the FSK/CPM world. But they are still used.

Three distinct types of frequency-shift-keyed (FSK) signals are analyzed in this post. The analysis is directed at finding the set of potential cycle frequencies for each type of FSK signal for all orders and conjugation patterns by examining the cyclic temporal moment functions.

The FSK signals analyzed here are not constrained to exhibit a continuous phase function. The three types of signals arise from distinct models for the sequence of phase variables in the generic complex-envelope FSK signal model given by The Literature [R1]

where the sequence of transmitted frequencies is a sequence of independent and identically distributed (IID) random variables drawn from the M-ary set of tones.

The first type of FSK signal corresponds to an independent and identically distributed (IID) phase-variable sequence ,

where the distribution is uniform on the interval [0, 2π). Such an FSK signal is known as *incoherent FSK* (IFSK). The second type of FSK signal is known as *carrier-phase-coherent FSK* (CaPC FSK). For CaPC FSK, the phase sequence depends on the symbol index only through the value of the transmitted frequency,

Thus, for CaPC FSK, the signal consists of bursts of randomly selected fixed-phase oscillator outputs. The third type of FSK signal is called *clock-phase coherent FSK* (ClPC FSK), and it is formed by setting the phase of the oscillator to a constant that depends on the transmitted frequency each time that frequency is selected for transmission. Thus, the phase variables are given by

We analyze the three types of FSK separately next.
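Before diving into the analysis, the three phase conventions can be made concrete with a small simulation sketch. This is illustrative stdlib Python, not the CSP Blog’s actual simulation code; the centered tone set, the common separation df, and the equiprobable IID frequency choices are assumptions.

```python
import cmath, math, random

TWO_PI = 2.0 * math.pi

def fsk(num_symbols, M, T0, df, kind, seed=0):
    """Generate an M-ary FSK complex envelope; kind is 'ifsk', 'capc', or 'clpc'.
    T0 = samples per symbol interval, df = tone separation in normalized Hz."""
    rng = random.Random(seed)
    tones = [(m - (M - 1) / 2.0) * df for m in range(M)]  # centered tone set (assumed)
    osc_phases = [rng.uniform(0.0, TWO_PI) for _ in range(M)]  # one fixed phase per tone
    x = []
    for k in range(num_symbols):
        m = rng.randrange(M)                      # equiprobable frequency choice
        f = tones[m]
        if kind == 'ifsk':
            # Incoherent: a fresh random phase every signaling interval
            phase0 = rng.uniform(0.0, TWO_PI)
            for n in range(T0):
                x.append(cmath.exp(1j * (TWO_PI * f * n + phase0)))
        elif kind == 'capc':
            # Carrier-phase coherent: burst of a free-running oscillator,
            # so the GLOBAL time index preserves the oscillator phase
            for n in range(k * T0, (k + 1) * T0):
                x.append(cmath.exp(1j * (TWO_PI * f * n + osc_phases[m])))
        else:
            # Clock-phase coherent: phase reset each interval, so the same
            # stored waveform is transmitted every time a tone is selected
            for n in range(T0):
                x.append(cmath.exp(1j * (TWO_PI * f * n + osc_phases[m])))
    return x
```

The only difference between the three branches is the phase bookkeeping, which is exactly the distinction drawn among IFSK, CaPC FSK, and ClPC FSK above.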

The complex envelope of the IFSK signal is given by

where the phase sequence is an IID sequence of continuous random phase variables with uniform distribution on [0, 2π), and the frequency sequence is an IID sequence of equiprobable frequencies drawn from the M-ary set of tones.

The IFSK signal can be represented as a *random-pulse* complex-valued PAM signal by simple manipulation,

where

The moments of the symbol sequence are nonzero only when exactly half of the factors are conjugated, a result that follows easily from the properties of the phase sequence. It is also relatively easy to show that the moments of the symbol sequence are identical to those for .
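The claim that the symbol moments vanish unless exactly half of the factors are conjugated can be checked with a quick Monte Carlo experiment on symbols of the form e^{jφ} with φ uniform on [0, 2π). This is a sketch, not a proof; the trial count and seed are arbitrary choices.

```python
import cmath, math, random

def symbol_moment(n, m, trials=200_000, seed=1):
    """Estimate E[ s^(n-m) * conj(s)^m ] for s = exp(j*phi), phi ~ U[0, 2*pi)."""
    rng = random.Random(seed)
    acc = 0j
    for _ in range(trials):
        s = cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi))
        acc += s ** (n - m) * s.conjugate() ** m
    return acc / trials

# For order n = 4: only m = 2 (half the factors conjugated) survives,
# since s^2 * conj(s)^2 = |s|^4 = 1 for every trial.
m_40 = symbol_moment(4, 0)   # ~ 0
m_42 = symbol_moment(4, 2)   # ~ 1
```

The analytical reason is that E[e^{j(n-2m)φ}] = 0 for φ uniform unless n = 2m, which is the same conclusion the moment analysis below reaches.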

Because the pulse function and the symbols are both random, the formulas for digital QAM cumulants presented in the DQAM post do not apply. Let’s try to find the moment functions for the signal. The nth-order temporal moment function is given by

The nth-order moment of the symbols

is a little tricky to evaluate. Let’s express the product as a product of products, each term of which involves one value of . To do this, we employ the notion of partitions once again,

where

is the common value of the index within each partition element, and no two such values are equal. This notation includes all possible selections of indices for the symbols, from all indices equal to a single value to all indices distinct.

Because the symbols are independent, the moment is given by

For each expectation to be nonzero, we require that its order be even and that exactly half of its factors be conjugated. Thus, we require that the overall order n be even with m = n/2 conjugated factors. The moment is given, therefore, by the following expression

The remaining analysis does not depend heavily on the particular set of indices that are chosen; a reasonable choice to focus on is the set in which all indices are equal: . If there are ways to partition the indices so that the resulting moment component is nonzero, then the moment function can be represented by the sum over these components,

Let’s assume that corresponds to the case in which all indices are equal and find the corresponding moment component .

So, we are left with evaluating the moment function for the random pulse,

This moment function is relatively easy to evaluate since the number of conjugations is equal to half the order. The result is given by

where

Thus, the component of the moment function corresponding to identical indices is given by

Note that this component is periodic in t with period equal to the symbol interval T0; all other components possess this property as well. Therefore, the moment function for IFSK is periodic with period T0 and is nonzero only for m = n/2. It follows that the cumulant function is also periodic with period T0 and is nonzero only for m = n/2. In conclusion, the cycle frequencies for IFSK are limited to harmonics of the symbol rate 1/T0 for m = n/2 for all even orders n, which is the desired result of the analysis.

For carrier-phase coherent FSK (CaPC FSK), the carrier phase variable depends only on the value of the transmitted frequency and not explicitly on the symbol index,

where the phase variable is equal to a fixed constant whenever the corresponding frequency is selected. Thus, this kind of FSK modulator transmits a burst of the output of one of M continuously running oscillators during each signaling interval.

We use straightforward analysis to find the temporal moment function for the CaPC FSK signal, which will allow us to determine the largest possible set of moment and cumulant cycle frequencies for the signal. The nth-order temporal moment function is given by

which, after some algebraic manipulation, can be expressed as

where

The random quantities are those that involve the random symbols, so that the expectation can be moved inside the sums. However, as we saw in the case of IFSK, the value of the expectation depends on the nature of the indices. For all indices distinct, the expectation simplifies to

Notice that the expectation results in the n-fold product of sums of sine waves. At the other extreme, all of the indices are equal, and the expectation simplifies to

which is the sum of sine waves with frequencies given by . The other possibilities for the indices also result in the presence of additive sine-wave components. In fact, the notion of partitions is again of use here. The expectation yields sine-wave components with frequencies given by

where can be any of the frequencies , and denotes a partition of the index set with elements.

Since the function is periodic in t with period T0 for any choice of the indices, the actual set of moment cycle frequencies is given by

This is a large set of cycle frequencies. To demonstrate this, and to corroborate the cycle frequencies with those in The Literature [R1], let us compute the cycle frequencies for order n = 2 for M-ary CaPC FSK.

For n = 2 and m = 0, the result takes the same form for all partitions, and the cycle frequencies are given by

For m = 1, the general formula applies,

Table 1 provides the cycle frequencies as a function of the partitions for the two values of . The derived cycle frequencies herein match those in The Literature [R1] (pgs. 450–451) for the special case in which the numbers are integers (which is the only case of CaPC FSK explicitly considered in [R1]).

Partition Element | Number of Elements | (n, m) | Cycle Frequencies
---|---|---|---
{1, 2} | 1 | (2, 0) |
{{1}, {2}} | 2 | (2, 0) |
{1, 2} | 1 | (2, 1) |
{{1}, {2}} | 2 | (2, 1) |
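The structure of Table 1 can be exercised numerically. Assuming, as described in the summary later in this post, that the order-2 cycle frequencies of CaPC FSK are differences of pairs of tones for the non-conjugate case (n, m) = (2, 1) and sums of pairs of tones for the conjugate case (n, m) = (2, 0), each offset by harmonics of the symbol rate 1/T0, a small enumeration sketch is:

```python
def capc_order2_cfs(tones, T0, num_harmonics=2):
    """Enumerate candidate order-2 cycle frequencies for CaPC FSK.
    Non-conjugate: f_i - f_j + k/T0; conjugate: f_i + f_j + k/T0."""
    harmonics = [k / T0 for k in range(-num_harmonics, num_harmonics + 1)]
    nonconj = {round(fi - fj + h, 12) for fi in tones for fj in tones for h in harmonics}
    conj = {round(fi + fj + h, 12) for fi in tones for fj in tones for h in harmonics}
    return sorted(nonconj), sorted(conj)

# Hypothetical binary example: tones at +/- 0.05 normalized Hz, T0 = 10 samples
nc, cj = capc_order2_cfs([-0.05, 0.05], 10)
```

Even for this tiny binary example the candidate lists are long, which illustrates the “large set of cycle frequencies” claim above.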

In the third and final type of FSK signal, clock-phase coherent FSK (ClPC FSK), the phase variable in the generic FSK model is reset at the beginning of each signaling interval such that the carrier phase for each transmitted tone is the same whenever that tone is transmitted. In other words, a specific segment of the oscillator output is transmitted each time the symbol is encountered. So, we transmit one of the following functions each signaling interval

for . Our complex-envelope signal then takes the form

which implies that the phase variable in the generic model is given by

The general case provides a little insight. We consider generic M-ary signaling,

where

The moment function is given by

The value of the expectation will depend on how the indices are chosen, as we have seen in the cases of the other two FSK models. Here, however, the conjugation pattern is irrelevant and any choice of indices that does not result in a moment function of zero results in one that is periodic with period T0. For example, when all the indices are distinct, the expectation is given by (assuming independent symbols)

Thus, the component of the moment function due to distinct values of is given by

which is periodic in t with period T0. All other index conditions can be expressed in terms of partitions of the index set. For each condition, the product of functions can be expressed as a product involving terms associated with a single value of the index. The expectation associated with a particular partition element is given by a product of expectations,

where is the number of elements of , and .

As in the case of distinct indices, each of the expectations in the general case results in a function that is periodic in t with period T0. Therefore, the moment function is a sum of periodic functions, each with period T0, and is therefore itself periodic with period T0. Thus, the cycle frequencies are given by

potentially for all orders (not just even orders). The signal will contain discrete components if the average pulse has nonzero mean,

FSK signals exhibit a variety of cycle-frequency patterns, that is, a variety of types of cycle frequencies as a function of the order n and the number of conjugations m.

For the incoherent FSK (IFSK) signal, the carrier phase is chosen at random for each signaling interval, which results in a random-pulse PAM signal with random complex-valued symbols distributed on the unit circle. The random symbols result in a relative paucity of cycle frequencies: harmonics of the symbol rate for m = n/2.

For the carrier-phase-coherent FSK (CaPC FSK) signal, the carrier phase in each signaling interval is determined by the phase of the chosen oscillator, which is free-running. The cycle frequencies are numerous (even more than for BPSK) and are given by (33) and (34). Examples include multiples of each of the tones, sums and differences of the tones, and these frequencies plus harmonics of the symbol rate. Odd-order cumulants can be nonzero and the location of the maxima of the cyclic cumulant functions depends on the values of the oscillator phases.

For the clock-phase-coherent FSK (ClPC FSK) signal, the carrier phase is reset in each signaling interval such that only one waveform is transmitted per tone; no symbol-generating oscillators are needed to implement this signaling scheme–only stored waveforms. This FSK signaling scheme produces cycle frequencies similar to those for BPSK, except that odd-order cyclic cumulants can be nonzero. The general form of the cycle frequency is .

In summary, only the IFSK signal produces a familiar cycle frequency pattern (QPSK-like). The remaining two FSK signal types produce a great many cycle frequencies and, perhaps more importantly, can exhibit nonzero odd-order cumulants.

Here we simulate the three different classes of FSK signals, apply blind cycle-frequency estimation using the SSCA, use the blindly detected cycle frequencies to estimate the corresponding spectral correlation functions, and finally plot these obtained functions in the usual CSP-Blog three-dimensional surface format.

The carrier frequency for all simulated signals is 0.1 (normalized Hz), the symbol rate is for all binary FSK signals (2FSK, M = 2), for all quaternary FSK signals (4FSK, M = 4), and for all octal FSK signals (8FSK, M = 8). The decreasing symbol rate ensures that the signals are adequately sampled with our default sampling rate of one. The signal power is always unity, and the noise power is , or dB. The signal-to-noise ratio is therefore high, which is desired when we are trying to understand the basic cyclostationarity of the signals.
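The unit-power-signal-plus-noise setup can be sketched as follows. The helper and the particular noise-power value are placeholders for illustration, not necessarily what was used to produce the post’s figures; the noise is circular complex white Gaussian noise with the requested total power split evenly between the real and imaginary parts.

```python
import random

def add_awgn(x, noise_power, seed=0):
    """Add circular complex AWGN with total power noise_power to the samples
    in x (variance noise_power/2 per real/imaginary component)."""
    rng = random.Random(seed)
    sigma = (noise_power / 2.0) ** 0.5
    return [v + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for v in x]

# e.g., unit-power signal plus noise of power 0.1 gives 10*log10(1/0.1) = 10 dB SNR
y = add_awgn([1.0 + 0j] * 10000, 0.1)
```

Keeping the SNR high like this makes the cycle-frequency structure easy to see in the estimated spectral correlation surfaces.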

The obtained spectral correlation plots are arranged in videos for convenience.

The three basic types are treated in the following subsections–and there is a bonus movie of the spectral correlation functions for continuous-phase modulation (CPM) as a preview of a future post on CPM and to provide a contrast with the form of the spectral correlation functions for their closely related FSK kin.

For the IFSK signal type, we look at the three values of M, which is the number of individual frequencies that are “visited” as the incoming bits are turned into symbols and modulated onto a carrier, but we also vary the common separation between adjacent frequencies, which we’ll call Δf. The style of specifying Δf is to report the quotient of the separation and the symbol rate, which leads to the frequency-deviation product Δf·T0, where T0 is the symbol interval. The signals are generated for values of this product in the range 0.5 to 1.5.

From the analysis above, we expect to detect non-conjugate cycle frequencies and no conjugate cycle frequencies for IFSK. The obtained spectral correlation surfaces are shown in Video 1.

Note that the basic cycle-frequency pattern of incoherent FSK is more like that for rectangular-pulse QPSK (or, more generally, higher-order MPSK) than it is like square-root raised-cosine QPSK, which has only a single non-trivial non-conjugate cycle frequency.

For the CaPC FSK signal, we again vary the frequency deviation product between 0.5 and 1.5 and record the results in a video of spectral correlation surfaces, which is shown in Video 2.

Continuing on in the same vein, the video of blindly determined spectral correlation surfaces for clock-phase coherent (ClPC) FSK is shown in Video 3. Like the CaPC FSK signal, and unlike the incoherent FSK signal, the ClPC FSK signal possesses strong conjugate cyclostationarity. Unlike the CaPC FSK signal, however, the ClPC FSK signal has a BPSK-like conjugate cycle-frequency pattern (which is simpler in general).

To show, as a preview of a future post on CPM, how different the spectral correlation surfaces for CPM can be compared to the three FSK signal types considered in the bulk of this post, the blindly determined spectral correlation function surfaces for a variety of CPM signals are shown in Video 4.

Here the relevant parameters are the alphabet size of the underlying pulse-amplitude-modulated (PAM) signal (similar to M for the FSK signals above), the modulation index h, and the response parameter L. See My Papers [8] for a precise mathematical definition of CPM (or await the upcoming post), but the modulation index influences how large the swings in frequency are in response to the randomly varying symbols, and the response parameter specifies the temporal duration of the pulse function for the underlying PAM signal, which modulates the phase of the carrier wave.

When the pulse function is a rectangle, the CPM signal is typically referred to as continuous-phase frequency-shift keying (CPFSK), and otherwise it is typically called CPM. However, for the special case of h = 1/2 and a rectangular pulse, the signal is exactly minimum-shift keying (MSK), and for h = 1/2 and Gaussian pulses, the signal is Gaussian MSK (GMSK) as used in GSM for example. The string ‘LRC’ refers to a raised-cosine pulse function spanning L symbol intervals.

Things look good, right? I mean, just by eyeballing the surfaces in the videos, and knowing the key signal parameters, we can see that the cycle frequencies are often just simple harmonics of the symbol rate (non-conjugate) or offset harmonics (conjugate). But some of the surfaces are more complex than that.

How can we check our work?

We have three elements that need to cohere. The first is the mathematical models and analysis results, the second is the signal-simulation code, and the third is the cycle-frequency and spectral-correlation estimators. To check things, we need to see evidence that the cycle-frequency formulas match the blindly obtained cycle frequencies for the set of CSP-Blog simulated FSK signals. To do that, I’m going to plot the blindly obtained cycle frequencies on the x-axis and the corresponding maximum spectral correlation magnitude on the y-axis. Then I’ll mark the points on the x-axis that correspond to a numerical evaluation of the obtained cycle-frequency formulas.

A typical example (I’m not going to show them all–too tedious) is shown in the following figures for all three values of M. **What we look for is the occurrence of a (significant) detected cycle frequency that is not a predicted one.**
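The check in bold (flagging any significant detected cycle frequency that is not a predicted one) can be automated with a simple set comparison. This is a sketch; the tolerance and the example inputs are assumptions.

```python
def unpredicted_cfs(detected, predicted, tol=1e-4):
    """Return the detected cycle frequencies that have no predicted cycle
    frequency within tol -- these are the troubling ones."""
    return [a for a in detected if all(abs(a - p) > tol for p in predicted)]

# Hypothetical example: one detected CF (0.21) is not in the predicted set
leftovers = unpredicted_cfs([0.1, 0.2, 0.21], [0.1, 0.2, 0.3])
```

An empty return list for every simulated signal is the evidence we want that the formulas, the simulation code, and the estimators all cohere.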

In the DeepSig RML datasets (here, here, here, and here), we see a reference to “CPFSK” as one of the included signal types. In The Literature [R187] we see a reference to “2FSK” as one of the included signal types. There are many other examples of this kind of signal description in datasets and in published papers. “We set out to perform automatic modulation recognition of BPSK, QPSK, MSK, and FSK,” or the like. We’ve already criticized the idea that there is just ‘one BPSK signal’ in the All BPSK Signals post. Things appear to be worse with regard to FSK and CPM. There are many choices, and the temporal, spectral, and cyclic properties of the resulting signal depend heavily on these choices. Therefore the adjusted weights in a neural network must be influenced by those properties and choices. A neural network trained on one choice will likely fail when presented with input signals corresponding to a different choice, although both choices are FSK.

Just which FSK or CPM signal are you talking about in your mod-rec work, and why?

So among the CSP Blog readers that voted, I think the consensus is to produce more “on brand” posts on CSP and the Signal-Processing ToolKit. Also, there is significant interest in doing CSP with GNU Radio, which I have considerable experience with, and so I’ll likely be posting some flowgraph ideas and results at some point in 2023.

Thanks everybody! (But I’ll still rant and rave from time to time; sorry!)

**Update June 25, 2023:** When I said you can vote multiple times, I didn’t mean to ‘spam’ the poll (as my kids would say). Someone just voted for one of the responses ten times in a row (same IP address ten votes within one minute). I meant you can vote for several different items in the poll! So I did remove some of those identical votes. I’ll close the poll at the end of the day June 30.

**Update May 11, 2023:** Please vote in the Reader Poll below (multiple times if you’d like) soon! As of today, *CSP Applications* and *Signal Processing ToolKit* are in the lead, with *Rants* and *Datasets* at the bottom.

The CSP Blog is rolling along here in 2023!

March 2023 broke a record for pageviews in a calendar month with over 7,000 as of this writing early in the day on March 31.

Let’s note some other milestones and introduce a poll.

What a month! We’re at about 7,145 views right now, and the previous monthly record is 6,482.

2023 was the year that a CSP Blog post crossed the 20,000-view milestone: The Spectral Correlation Function. The Cyclic Autocorrelation Function is not far behind.

About 84,000 visitors have been counted over the years since the CSP Blog launched in 2015, with 5,500 this year already. I believe this is just a count of the unique IP addresses that have accessed a page. But the number of subscribers is only 198! You can subscribe (“Follow”) to the CSP Blog by entering an email address in the “Follow Blog via Email” box on the right edge of any viewed page, near the top of the page. You’ll get notified through that email address whenever there is a new post. CSP Blog readers cannot see that email address, just as they cannot see the email address associated with any comment, unless there is an associated gravatar.

I’m planning to have more time available to devote to improving and extending the CSP Blog over the next few months. If you want to have input into that process, consider voting in the poll below.

Thanks so much to all my readers!
