SPTK: Linear Time-Invariant Systems

Previous SPTK Post: The Fourier Transform         Next SPTK Post: Frequency Response

In this Signal Processing Toolkit post, we’ll take a first look at arguably the most important class of system models: linear time-invariant (LTI) systems.

What do signal processors and engineers mean by system? Most generally, a system is a rule or mapping that associates one or more input signals to one or more output signals. As we did with signals, we discuss here various useful dichotomies that break up the set of all systems into different subsets with important properties–important to mathematical analysis as well as to design and implementation. Then we’ll look at time-domain input/output relationships for linear systems. In a future post we’ll look at the properties of linear systems in the frequency domain.

Continue reading “SPTK: Linear Time-Invariant Systems”

On Impulsive Noise, CSP, and Correntropy

I’ve seen several published and pre-published (arXiv.org) technical papers over the past couple of years on the topic of cyclic correntropy (The Literature [R123-R127]). I first criticized such a paper ([R123]) here, but the substance of that review was about my problems with the presented mathematics, not impulsive noise and its effects on CSP. Since the papers keep coming, apparently, I’m going to put down some thoughts on impulsive noise and some evidence regarding simple means of mitigation in the context of CSP. Preview: I don’t think we need to go to the trouble of investigating cyclic correntropy as a means of salvaging CSP from the clutches of impulsive noise.

Continue reading “On Impulsive Noise, CSP, and Correntropy”

Data Set for the Machine-Learning Challenge

Update September 2020. I made a mistake when I created the signal-parameter “truth” files signal_record.txt and signal_record_first_20000.txt. Like the DeepSig RML data sets that I analyzed on the CSP Blog here and here, the SNR parameter in the truth files did not match the actual SNR of the signals in the data files. I’ve updated the truth files and the links below. You can still use the original files for all other signal parameters, but the SNR parameter was in error.

Update July 2020. I originally posted 20,000 signals in the posted data set. I’ve now added another 92,000 for a total of 112,000 signals. The original signals are contained in Batches 1-5, the additional signals in Batches 6-28. I’ve placed these additional Batches at the end of the post to preserve the original post’s content.

I’ve posted 20000 PSK/QAM signals to the CSP Blog. These are the signals I refer to in the post I wrote challenging the machine-learners. In this brief post, I provide links to the data and describe how to interpret the text file containing the signal-type labels and signal parameters.

Overview of Data Set

The 20,000 signals are stored in five zip files, each containing 4000 individual signal files:

Batch 1

Batch 2

Batch 3

Batch 4

Batch 5

The zip files are each about 1 GB in size.

The modulation-type labels for the signals, such as “BPSK” or “MSK,” are contained in the text file:

signal_record_first_20000.txt

Each signal file is stored in a binary format involving interleaved real and imaginary parts, which I call ‘.tim’ files. You can read a .tim file into MATLAB using read_binary.m. Or use the code inside read_binary.m to write your own data-reader; the format is quite simple.

The Label and Parameter File

Let’s look at the format of the truth/label file. The first line of signal_record_first_20000.txt is

1 bpsk  11  -7.4433467080e-04  9.8977795076e-01  10  9  5.4532617590e+00  0.0

which comprises 9 fields. All temporal and spectral parameters (times and frequencies) are normalized with respect to the sampling rate. In other words, the sampling rate can be taken to be unity in this data set. These fields are described in the following list:

  1. Signal index. In the case above this is `1′ and that means the file containing the signal is called signal_1.tim. In general, the nth signal is contained in the file signal_n.tim. The Batch 1 zip file contains signal_1.tim through signal_4000.tim.
  2. Signal type. A string indicating the modulation format of the signal in the file. For this data set, I’ve only got eight modulation types: BPSK, QPSK, 8PSK, \pi/4-DQPSK, 16QAM, 64QAM, 256QAM, and MSK. These are denoted by the strings bpsk, qpsk, 8psk, dqpsk, 16qam, 64qam, 256qam, and msk, respectively.
  3. Base symbol period. In the example above (line one of the truth file), the base symbol period is T_0 = 11.
  4. Carrier offset. In this case, it is -7.4433467080\times 10^{-4}.
  5. Excess bandwidth. The excess bandwidth parameter, or square-root raised-cosine roll-off parameter, applies to all of the signal types except MSK. Here it is 9.8977795076\times 10^{-1}. It can be any real number between 0.1 and 1.0.
  6. Upsample factor. The sixth field is an upsampling parameter U.
  7. Downsample factor. The seventh field is a downsampling parameter D. The actual symbol rate of the signal in the file is computed from the base symbol period, upsample factor, and downsample factor: \displaystyle f_{sym} = (1/T_0)*(D/U). So the BPSK signal in signal_1.tim has rate 0.08181818. If the downsample factor is zero in the truth-parameters file, no resampling was done to the signal.
  8. Inband SNR (dB). The ratio of the signal power to the noise power within the signal’s bandwidth, taking into account the signal type and the excess bandwidth parameter.
  9. Noise spectral density (dB). It is always 0 dB. So the various SNRs are generated by varying the signal power.

To ensure that you have correctly downloaded and interpreted my data files, I’m going to provide some PSD plots and a couple of the actual sample values for a couple of the files.

signal_1.tim

The line from the truth file is:

1 bpsk  11  -7.4433467080e-04  9.8977795076e-01  10  9  5.4532617590e+00  0.0

The first ten samples of the file are:

-5.703014e-02   -6.163056e-01
-1.285231e-01   -6.318392e-01
6.664069e-01    -7.007506e-02
7.731103e-01    -1.164615e+00
3.502680e-01    -1.097872e+00
7.825349e-01    -3.721564e-01
1.094809e+00    -3.123962e-01
4.146149e-01    -5.890701e-01
1.444665e+00    7.358724e-01
-2.217039e-01   -1.305001e+00

An FSM-based PSD estimate for signal_1.tim is:

psd_1

And the blindly estimated cycle frequencies (using the SSCA) are:

cfs_signal_1

The previous plot corresponds to the numerical values:

Non-conjugate (\alpha, C, S):

8.181762695e-02  7.480e-01  5.406e+00

Conjugate (\alpha, C, S):

8.032470942e-02  7.800e-01  4.978e+00
-1.493096002e-03  8.576e-01  1.098e+01
-8.331298083e-02  7.090e-01  5.039e+00

signal_4000.tim

The line from the truth file is

4000 256qam  9  8.3914849139e-04  7.2367959637e-01  9  8  1.0566301192e+01  0.0

which means the symbol rate is given by (1/9)*(8/9) = 0.09876543209. The carrier offset is 0.000839 and the excess bandwidth is 0.723. Because the signal type is 256QAM, it has a single (non-zero) non-conjugate cycle frequency of 0.098765 and no conjugate cycle frequencies. But the square of the signal has cycle frequencies related to the quadrupled carrier:

cfs_signal_4000

Final Thoughts

Is 20000 waveforms a large enough data set? Maybe not. I have generated tens of thousands more, but will not post until there is a good reason to do so. And that, my friends, is up to you!

That’s about it. I think that gives you enough information to ensure that you’ve interpreted the data and the labels correctly. What remains is experimentation, machine-learning or otherwise I suppose. Please get back to me and the readers of the CSP Blog with any interesting results using the Comments section of this post or the Challenge post.

For my analysis of a commonly used machine-learning modulation-recognition data set (RML), see the All BPSK Signals post.

Additional Batches of Signals:

Batch 6

Batch 7

Batch 8

Batch 9

Batch 10

Batch 11

Batch 12

Batch 13

Batch 14

Batch 15

Batch 16

Batch 17

Batch 18

Batch 19

Batch 20

Batch 21

Batch 22

Batch 23

Batch 24

Batch 25

Batch 26

Batch 27

Batch 28

Signal parameters text file

Cyclostationarity of Direct-Sequence Spread-Spectrum Signals

In this post we look at direct-sequence spread-spectrum (DSSS) signals, which can be usefully modeled as a kind of PSK signal. DSSS signals are used in a variety of real-world situations, including the familiar CDMA and WCDMA signals, covert signaling, and GPS. My colleague Antonio Napolitano has done some work on a large class of DSSS signals (The Literature [R11, R17, R95]), resulting in formulas for their spectral correlation functions, and I’ve made some remarks about their cyclostationary properties myself here and there (My Papers [16]).

A good thing, from the point of view of modulation recognition, about DSSS signals is that they are easily distinguished from other PSK and QAM signals by their spectral correlation functions. Whereas most PSK/QAM signals have only a single non-conjugate cycle frequency, and no conjugate cycle frequencies, DSSS signals have many non-conjugate cycle frequencies and in some cases also have many conjugate cycle frequencies.

Continue reading “Cyclostationarity of Direct-Sequence Spread-Spectrum Signals”

Second-Order Estimator Verification Guide

In this post I provide some tools for the do-it-yourself CSP practitioner. One of the goals of this blog is to help new CSP researchers and students to write their own estimators and algorithms. This post contains some spectral correlation function and cyclic autocorrelation function estimates and numerically evaluated formulas that can be compared to those produced by anybody’s code.

The signal of interest is, of course, our rectangular-pulse BPSK signal with symbol rate 0.1 (normalized frequency units) and carrier offset 0.05. You can download a MATLAB script for creating such a signal here.

The formula for the SCF for a textbook BPSK signal is published in several places (The Literature [R47], My Papers [6]) and depends mainly on the Fourier transform of the pulse function used by the textbook signal.

We’ll compare the numerically evaluated spectral correlation formula with estimates produced by my version of the frequency-smoothing method (FSM). The FSM estimates and the theoretical functions are contained in a MATLAB mat file here. (I had to change the extension of the mat file from .mat to .doc to allow posting it to WordPress.) In all the results shown here and that you can download, the processed data-block length is 65536 samples and the FSM smoothing width is 0.02 Hz. A rectangular smoothing window is used. For all cycle frequencies except zero (non-conjugate), a zero-padding factor of two is used in the FSM.

For the cyclic autocorrelation, we provide estimates using two methods: inverse Fourier transformation of the spectral correlation estimate and direct averaging of the second-order lag product in the time domain.

Continue reading “Second-Order Estimator Verification Guide”

Textbook Signals

What good is having a blog if you can’t offer a rant every once in a while? In this post I talk about what I call textbook signals, which are mathematical models of communication signals that are used by many researchers in statistical signal processing for communications.

We’ve already encountered, and used frequently, the most common textbook signal of all: rectangular-pulse BPSK with independent and identically distributed (IID) bits. We’ve been using this signal to illustrate the cyclostationary signal processing concepts and estimators as they have been introduced. It’s a good choice from the point of view of consistency of all the posts and it is easy to generate and to understand. However, it is not a good choice from the perspective of realism. It is rare to encounter a textbook BPSK signal in the practice of signal processing for communications.

I use the term textbook because the textbook signals can be found in standard textbooks, such as Proakis (The Literature [R44]). Textbook signals stand in opposition to signals used in the world, such as OFDM in LTE, slotted GMSK in GSM, 8PAM VSB with synchronization bits in ATSC-DTV, etc.

Typical communication signals combine a textbook signal with an access mechanism to yield the final physical-layer signal–the signal that is actually transmitted (My Papers [11], [16]). What is important for us, here on the CSP blog, is that this combination usually results in a signal with radically different cyclostationarity than the textbook component. So it is not enough to understand textbook signals’ cyclostationarity. We must also understand the cyclostationarity of the real-world signal, which may be sufficiently complex to render mathematical modeling and analysis impossible (at least for me).

Continue reading “Textbook Signals”

Creating a Simple CS Signal: Rectangular-Pulse BPSK

To test the correctness of various CSP estimators, we need a sampled signal with known cyclostationary parameters. Additionally, the signal should be easy to create and understand. A good candidate for this kind of signal is the binary phase-shift keyed (BPSK) signal with rectangular pulse function.

PSK signals with rectangular pulse functions have infinite bandwidth because the signal bandwidth is determined by the Fourier transform of the pulse, which is a sinc() function for the rectangular pulse. So the rectangular pulse is not terribly practical–infinite bandwidth is bad for other users of the spectrum. However, it is easy to generate, and its statistical properties are known.

So let’s jump in. The baseband BPSK signal is simply a sequence of binary (\pm 1) symbols convolved with the rectangular pulse. The MATLAB script make_rect_bpsk.m does this and produces the following plot:

rect_bpsk_time_domain

The signal alternates between amplitudes of +1 and -1 randomly. After frequency shifting and adding white Gaussian noise, we obtain the power spectrum estimate:

rect_bpsk_psd

The power spectrum plot shows why the rectangular-pulse BPSK signal is not popular in practice. The range of frequencies for which the signal possesses non-zero average power is infinite, so it will interfere with signals “nearby” in frequency. However, it is a good signal for us to use as a test input in all of our CSP algorithms and estimators.

The MATLAB script that creates the BPSK signal and the plots above is here. It is an m-file but I’ve stored it in a .doc file due to WordPress limitations I can’t yet get around.