Previous SPTK Post: Examples of Random Variables Next SPTK Post: The Sampling Theorem
In this Signal Processing ToolKit post, I provide an introduction to the concept and use of random processes (also called stochastic processes). This is my perspective on random processes, so although I’ll introduce and use the conventional concepts of stationarity and ergodicity, I’ll end up focusing on the differences between stationary and cyclostationary random processes. The goal is to illustrate those differences with informative graphics and videos; to build intuition in the reader about how the cyclostationarity property comes about, and about how the property relates to the more abstract mathematical object of a random process on one hand and to the concrete data-centric signal on the other.
So … this is the first SPTK post that is also a CSP post.
Jump straight to ‘Significance of Random Processes in CSP’ below.
Introduction
We started our signal-processing toolkit journey by looking at signals, including rectangles, triangles, unit-steps, sinc, etc. We then looked at representing arbitrary signals of time in terms of building-block functions such as Walsh functions, impulses, and harmonically related sine waves. The latter led us to the useful frequency-domain signal representations of the Fourier series and Fourier transform.
We shifted focus to systems–those entities that act on our signals to achieve some goal–and found that our analysis tools enabled much insight into system behavior if the system was linear and time-invariant. We added the crucial tool of convolution to our toolkit, and looked at various kinds of linear time-invariant systems, also known as filters.
We then realized that many of our most important signals (in CSP) don’t quite fit into our analysis framework. In particular, communication signals are well-modeled as random infinite-duration power signals, which are not Fourier transformable. So we started learning probability.
The concept of a random variable bridges the gap between an abstract probability space of events and sets of measurements or numbers that we can operate on with arithmetic, algebra, calculus, and associated computing devices. But random variables are not functions of time. Communication signals are. So we now take the next logical step: going from a random variable to a random function or, as it is more commonly referred to, a random process.
How should we generalize the concept of a signal to include both probability and infinite time? We need infinite time because we want to study our systems’ behavior for arbitrarily long-duration inputs. We need probability because the signals we use in communication theory and practice are inherently unpredictable–they are random.
A useful model of a communication signal is an infinite-duration signal that incorporates some parameters or variables that are in some sense unknown to a receiver of the signal:

$$s(t) = \sum_{k=-\infty}^{\infty} a_k\, p(t - kT_0).$$

Here $a_k$ is the $k$th ‘symbol’ to be transmitted to the receiver, $p(t)$ is a ‘pulse function’ to be modulated (multiplied) by the symbol, and $1/T_0$ is the rate of producing pulses, and therefore the rate of transmitting symbols.
It is possible to use such a model–a single infinite-duration random signal–to construct an entire probabilistic theory of signals. That theory is the fraction-of-time (FOT) probability theory, and I hope to get to that in the SPTK series, or in a CSP post. A more conventional way, and a way that is somewhat easier mathematically even if more obscure physically, is to use a generalization of our previous notion of a probability space.
Probability
A random process is a collection of random variables indexed by time. It is actually more general than that. The indexing can be some other independent variable, such as distance. But for us, swimming in the deep waters of CSP, our random processes are indexed by time $t$.
We use the notation $X(t)$ to denote a random process–typically we use upper-case letters to denote a process, and we’ll continue to use lower-case letters to denote signals. For each $t = t_1$, $X(t_1)$ is a random variable with some cumulative distribution function (CDF) and probability density function (PDF). Moreover, every collection of random variables $\{X(t_1), X(t_2), \ldots, X(t_n)\}$ has an $n$th-order joint PDF and CDF. (Here we are considering real-valued $X(t)$, so the definition of the CDF is straightforward; complex-valued random processes are only a little more tricky.) In this post, we’ll focus on $n = 1$ and $n = 2$. That will be sufficient to ground our discussion in the familiar quantities of mean value (first-order moment), power spectrum, autocorrelation (second-order moment), spectral correlation, and cyclic autocorrelation.
Given the definition of a random process, it follows that the random variable $X(t_1)$ has some PDF we can denote by $f_{X(t_1)}(x)$ and the random variable $X(t_2)$ has PDF $f_{X(t_2)}(x)$. These two PDFs may or may not be identical. The two random variables may or may not be independent, correlated, uncorrelated, etc.

The joint PDF for the two random variables is denoted by $f_{X(t_1)X(t_2)}(x_1, x_2)$.
All the usual probability and random-variable definitions and tools apply.
For random variables, a particular occurrence of the variable, such as the outcome of a particular card draw or die throw, is called an instance or sample of the variable. For a random process, a particular occurrence of the process is a function, and we call it, variously, a sample path, a sample function, a time-series, or a signal.
Moments
Suppose we have knowledge of a random process $X(t)$. Then we know the PDFs for $X(t_j)$ for all $t_j$: the collection of functions $f_{X(t_j)}(x)$. We can therefore compute the mean value for each $t_j$,

$$M_X(t_j) = E\left[X(t_j)\right] = \int_{-\infty}^{\infty} x\, f_{X(t_j)}(x)\, dx,$$

and any particular $M_X(t_j)$ may or may not equal some other $M_X(t_k)$.

Similarly, we can look at the correlation between two elements of the random process,

$$R_X(t_1, t_2) = E\left[X(t_1)X(t_2)\right],$$

which may turn out to be independent of $t_1$ and $t_2$, dependent on them in some general way, or dependent only on their difference $t_1 - t_2$.

The covariance is the correlation of the two variables after their means are removed,

$$C_X(t_1, t_2) = E\left[\left(X(t_1) - M_X(t_1)\right)\left(X(t_2) - M_X(t_2)\right)\right],$$

as usual.
In many cases we don’t actually know the densities (PDFs) of the process itself. However, this is often no barrier to finding the mean, autocorrelation, and covariance functions. Suppose we have a random process that is characterized by a couple of specific random variables, such as a sine-wave phase or amplitude. Then we can consider $X(t)$ as a function of those random variables, say $X(t) = g(t; A, \Phi)$, and we know that the expected value of the process is the expected value of that function of $A$ and $\Phi$, and so all we actually need are the joint PDFs of the random variables $A$ and $\Phi$. An example should make this more clear.
Example: The Random-Phase Sine Wave
Let’s consider the random process defined by

$$X(t) = A\sin(2\pi f_0 t + \Phi).$$

Here $A$ and $f_0$ are constants; they are not random. The only random variable is the sine-wave phase $\Phi$, which is uniform on the interval $[-\pi, \pi)$. This random process models the situation where we receive a sine wave, and we know the amplitude and frequency, but we don’t know anything about the phase. Or, to put it another way, we don’t know the time origin of the wave, the shift of the sine wave relative to the sine wave for $\Phi = 0$.
What is the mean value of this random process? Is it a function of time?
Evaluating the expectation that defines the mean requires the joint distribution of all involved random variables. Since there is only one, $\Phi$, we only require that distribution–no need to know explicitly the various $f_{X(t_j)}(x)$.

The mean value, also known as the first moment of $X(t)$, is given by the expectation

$$M_X(t) = E\left[X(t)\right] = E\left[A\sin(2\pi f_0 t + \Phi)\right],$$

which may be a function of $t$. For our random-phase sine wave and $\Phi$ uniform on $[-\pi, \pi)$, this expectation is equivalent to

$$M_X(t) = \int_{-\pi}^{\pi} A\sin(2\pi f_0 t + \phi)\, \frac{1}{2\pi}\, d\phi.$$

This definite integral is easily evaluated,

$$M_X(t) = \frac{A}{2\pi}\Big[-\cos(2\pi f_0 t + \phi)\Big]_{\phi=-\pi}^{\phi=\pi} = 0.$$

Since $t$ is arbitrary, the mean value (first moment) of the random process is identically zero. We have a ‘zero-mean random process.’

The reason the mean is zero for each and every time instant is that we are averaging over the ensemble of sample functions, where each sample function is delayed by some amount and all delays modulo the period $1/f_0$ are equally represented in that ensemble. A picture helps: see Figure 1.

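As a quick numerical check of the zero-mean claim, here is a minimal Monte Carlo sketch (not code from the post; the use of Python/NumPy and all parameter values are my own arbitrary choices) that averages over many simulated sample paths of the random-phase sine wave:

```python
# Minimal sketch: approximate the ensemble mean of X(t) = A*sin(2*pi*f0*t + Phi),
# with Phi uniform on [-pi, pi), by averaging over many simulated sample paths.
# Parameter values are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
A, f0 = 1.0, 0.05              # assumed amplitude and frequency
t = np.arange(200)             # time instants at which to estimate the mean
num_paths = 10_000             # size of the simulated ensemble

phi = rng.uniform(-np.pi, np.pi, size=(num_paths, 1))   # one phase per sample path
ensemble = A * np.sin(2 * np.pi * f0 * t + phi)          # shape (num_paths, len(t))

mean_estimate = ensemble.mean(axis=0)                    # average down the ensemble
print(np.max(np.abs(mean_estimate)))                     # small, on the order of 1/sqrt(num_paths)
```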
Now let’s look at the autocorrelation function for the random-phase sine wave. We want to find the average value of the product of two values of the process,

$$R_X(t_1, t_2) = E\left[X(t_1)X(t_2)\right].$$

Normally we (and many others) reparameterize the two times $t_1$ and $t_2$ so that $t_1 = t + \tau/2$ and $t_2 = t - \tau/2$, which means that the lag variable is $\tau = t_1 - t_2$, and the time variable $t$ is the center point of the two times. We’ll do that later, but for now I want to keep our two times as $t_1$ and $t_2$.

Our autocorrelation is

$$R_X(t_1, t_2) = E\left[A^2\sin(2\pi f_0 t_1 + \Phi)\sin(2\pi f_0 t_2 + \Phi)\right] = \frac{A^2}{2}\cos\left(2\pi f_0 (t_1 - t_2)\right).$$

So the mean is independent of time, being identically zero, and the autocorrelation is a function only of the difference between the two considered times, $t_1 - t_2$, and is also, therefore, independent of time.
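The ensemble-average autocorrelation can be checked the same way: pick two times, average the product over simulated sample paths, and compare with $\frac{A^2}{2}\cos(2\pi f_0 (t_1 - t_2))$. Again, this is only an illustrative sketch with arbitrary parameter values:

```python
# Sketch: estimate R_X(t1, t2) for the random-phase sine wave by averaging
# X(t1)*X(t2) over simulated sample paths, and compare with the closed form.
import numpy as np

rng = np.random.default_rng(1)
A, f0 = 1.0, 0.05
num_paths = 100_000
t1, t2 = 7.0, 3.0                                  # two arbitrary time instants

phi = rng.uniform(-np.pi, np.pi, num_paths)        # one phase per sample path
x_t1 = A * np.sin(2 * np.pi * f0 * t1 + phi)
x_t2 = A * np.sin(2 * np.pi * f0 * t2 + phi)

estimate = np.mean(x_t1 * x_t2)
theory = 0.5 * A**2 * np.cos(2 * np.pi * f0 * (t1 - t2))
print(estimate, theory)                            # the two values agree closely
```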
What happens when we try to compute the first and second moments (mean and autocorrelation) using only a single sample path? We (and I mean mathematicians, engineers, and statisticians) define the mean and autocorrelation by infinite-time averages in such situations. The mean is just the infinite-time average,

$$\hat{M}_x = \lim_{T\rightarrow\infty} \frac{1}{T}\int_{-T/2}^{T/2} A\sin(2\pi f_0 t + \phi)\, dt.$$

Here $\phi$ is a number, not a random variable–it is an instance of the random variable $\Phi$. It is easy to show that this time-average mean is zero for any $\phi$.

The time-average autocorrelation is typically defined as the following infinite-time average,

$$\hat{R}_x(\tau) = \lim_{T\rightarrow\infty} \frac{1}{T}\int_{-T/2}^{T/2} x(t+\tau/2)\, x(t-\tau/2)\, dt,$$

where we recognize that $t_1 = t + \tau/2$ and $t_2 = t - \tau/2$ so that, again, $\tau = t_1 - t_2$. This average is easy to evaluate too,

$$\hat{R}_x(\tau) = \frac{A^2}{2}\cos(2\pi f_0 \tau),$$

which does not depend on time, and which cannot depend on time, since we integrated over time to obtain the average.
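A rough numerical counterpart for the single-sample-path view is below: a long finite record stands in for the infinite-time limit, and an asymmetric lag product $x(t+\tau)x(t)$ stands in for the symmetric one (the sampling increment, record length, and phase instance are arbitrary choices):

```python
# Sketch: approximate the time-average autocorrelation of one sample path of the
# random-phase sine wave using a long (but finite) record.
import numpy as np

A, f0, phi = 1.0, 0.05, 1.234        # phi is one fixed instance of the random variable Phi
dt = 0.1                             # sampling increment
t = np.arange(100_000) * dt          # long finite record standing in for infinite time
x = A * np.sin(2 * np.pi * f0 * t + phi)

lag = 25                             # lag of tau = 2.5 time units
r_hat = np.mean(x[lag:] * x[:-lag])  # time average of the lag product
print(r_hat, 0.5 * A**2 * np.cos(2 * np.pi * f0 * lag * dt))   # nearly equal
```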
Notice that the ensemble mean matches the time-average mean (zero), and that the ensemble correlation matches the time-average correlation. Does this always happen? No. The ways the two kinds of average can diverge are important, and depend on the particular random variables involved.
For example, consider a different random process given by

$$Y(t) = A\sin(2\pi f_0 t + \phi_0),$$

where $f_0$ and $\phi_0$ are non-random, and $A$ is a uniform random variable on an interval that is symmetric about zero, so that $E[A] = 0$. The mean is easily seen to be zero,

$$M_Y(t) = E[A]\sin(2\pi f_0 t + \phi_0) = 0.$$

On the other hand, the temporal mean of some sample path, where $A$ takes on a concrete value $a$, is

$$\hat{M}_y = \lim_{T\rightarrow\infty}\frac{1}{T}\int_{-T/2}^{T/2} a\sin(2\pi f_0 t + \phi_0)\, dt = 0.$$

So, as with the random phase, the mean of this random-amplitude sine wave is zero (but what if the distribution of $A$ were changed to a uniform distribution on an interval that is not symmetric about zero, so that $E[A] \neq 0$?).
Next let’s compute and compare the probabilistic (ensemble-average) autocorrelation with the temporal (time-average) autocorrelation.
The probabilistic autocorrelation is

$$R_Y(t_1, t_2) = E\left[A^2\right]\sin(2\pi f_0 t_1 + \phi_0)\sin(2\pi f_0 t_2 + \phi_0).$$

If we let $t_1 = t + \tau/2$ and $t_2 = t - \tau/2$, we obtain

$$R_Y(t+\tau/2, t-\tau/2) = \frac{E\left[A^2\right]}{2}\Big[\cos(2\pi f_0 \tau) - \cos(4\pi f_0 t + 2\phi_0)\Big],$$

which is a function of both $t$ and $\tau$.

On the other hand, the temporal correlation is

$$\hat{R}_y(\tau) = \lim_{T\rightarrow\infty}\frac{1}{T}\int_{-T/2}^{T/2} a^2\sin(2\pi f_0 (t+\tau/2) + \phi_0)\sin(2\pi f_0 (t-\tau/2) + \phi_0)\, dt = \frac{a^2}{2}\cos(2\pi f_0 \tau),$$

which is not–cannot be–a function of time $t$. Therefore, for the random-amplitude sine-wave random process, the probabilistic and temporal autocorrelation functions do not match.
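A short simulation makes the mismatch concrete: for the random-amplitude sine wave, the ensemble-average lag product at a fixed lag changes as the center time $t$ changes, which no time average of a single sample path can do. This is an illustrative sketch with arbitrary parameter values:

```python
# Sketch: ensemble-average autocorrelation of Y(t) = A*sin(2*pi*f0*t + phi0) at a
# fixed lag tau, evaluated at several center times t; A is uniform and zero-mean.
import numpy as np

rng = np.random.default_rng(2)
f0, phi0, tau = 0.05, 0.4, 2.0
amps = rng.uniform(-1.0, 1.0, 50_000)              # one amplitude per sample path

def ensemble_autocorr(t):
    # average of Y(t + tau/2) * Y(t - tau/2) over the simulated ensemble
    y1 = amps * np.sin(2 * np.pi * f0 * (t + tau / 2) + phi0)
    y2 = amps * np.sin(2 * np.pi * f0 * (t - tau / 2) + phi0)
    return np.mean(y1 * y2)

# The estimate changes with the center time t: this process is not WSS.
print([round(ensemble_autocorr(t), 3) for t in (0.0, 2.5, 5.0, 7.5)])
```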
Averaging Over the Ensemble vs Over Time
Let’s take a closer look at the differences and similarities between ensemble and temporal averages using a prototypical communication signal: binary pulse-amplitude modulation (PAM) with rectangular pulses. The signal is defined by

$$X(t) = \sum_{k=-\infty}^{\infty} a_k\, p(t - kT_0 - \Phi),$$

where $a_k$ is the $k$th symbol, the symbols are constrained here to be in the set $\{+1, -1\}$, $1/T_0$ is the symbol rate, $\Phi$ is the symbol-clock phase, and $p(t)$ is the pulse function. To make things simple–we’re not trying to be communication-system engineers at the moment–we’ll use the rectangular pulse,

$$p(t) = \begin{cases} 1, & 0 \leq t < T_0, \\ 0, & \text{otherwise}. \end{cases}$$

The symbols are independent and identically distributed–the $+1$s are as probable as the $-1$s, and no symbol depends on any other symbol.
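Something like the sketch below can generate the two kinds of ensembles used in the rest of this section. It is an illustrative stand-in, not the code behind the figures; the helper name pam_ensemble, the use of integer sample shifts to mimic a symbol-clock phase uniform over one symbol interval, and all parameter values are my own choices:

```python
# Sketch: generate an ensemble of rectangular-pulse binary PAM sample paths.
# Set random_phase=True to draw a different symbol-clock phase for each path.
import numpy as np

def pam_ensemble(num_paths=100, num_symbols=1000, samples_per_symbol=10,
                 random_phase=False, rng=np.random.default_rng(3)):
    T0 = samples_per_symbol
    paths = np.empty((num_paths, num_symbols * T0))
    for j in range(num_paths):
        symbols = rng.choice([-1.0, 1.0], size=num_symbols)   # i.i.d., equiprobable
        x = np.repeat(symbols, T0)                             # rectangular pulses
        if random_phase:
            # circularly shift by an integer phase drawn from {0, ..., T0-1},
            # a discrete stand-in for a phase uniform on [0, T0)
            x = np.roll(x, rng.integers(0, T0))
        paths[j] = x
    return paths

ensemble_sync = pam_ensemble(random_phase=False)   # all paths share the same timing
ensemble_rand = pam_ensemble(random_phase=True)    # timing differs from path to path
```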
It is relatively easy to show that the mean value of this random process is zero and that the autocorrelation function is dependent on the nature of the symbol-clock phase random variable $\Phi$. We talked a lot about phase-randomizing in the post on stationary-vs-cyclostationary models, and we obsess (uh, focus) on it here too.

When the symbol-clock phase variable is a constant (nonrandom), the autocorrelation is a periodic function of time; we have a cyclostationary process. When the symbol-clock phase variable is a uniform random variable on an interval with width $T_0$, we have a stationary random process. In this latter case, the autocorrelation function is a unit-height triangle, centered at the origin, with a base of width $2T_0$.

Let’s illustrate these claims using calculations on simulated versions of the PAM signal defined above. We can do both ensemble (probabilistic) averaging and single-sample-path (temporal) averaging, approximately, because we can generate a sizeable ensemble, and each sample path can be made quite long (length equal to many thousands of symbol intervals $T_0$).
Figure 2 shows a portion of an ensemble of rectangular-pulse PAM sample paths for the case where the symbol-clock phase variable $\Phi$ is set to zero. That is, there is no difference between the pulse transition times for all the elements of the ensemble–they have ‘the same timing.’

We can use this non-infinite ensemble to approximate the kinds of ensemble averaging we do with math. For example, we can pick some time $t_1$ and average over all the sample-path values for that time, giving an estimate of $M_X(t_1) = E[X(t_1)]$. If we do that here, we get small numbers like 0.002.

If we pick two times, $t_1$ and $t_2$, we can form an average of the quantity $x_j(t_1)x_j(t_2)$, which is an estimate of the ensemble-average autocorrelation $R_X(t_1, t_2) = E[X(t_1)X(t_2)]$, where $j$ indexes the sample paths. In Figure 2, we’ve indicated the ensemble values for $t_1$ and $t_2$ by the two black vertical lines. If we fix $t_1$, and allow $t_2$ to vary, we can plot the averages as a function of $t_2$. This is done in Figure 3 for the ensemble shown in Figure 2.
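This kind of ensemble-average estimate takes only a few lines of code. In the sketch below (arbitrary parameters, independent of the ensembles used for the figures), the no-phase ensemble is averaged across sample paths at a fixed lag of $T_0/2$ while $t_1$ slides; the estimate swings between roughly one and roughly zero depending on where $t_1$ falls within the symbol interval:

```python
# Sketch: approximate R_X(t1, t2) by averaging x_j(t1)*x_j(t2) over sample paths
# of a rectangular-pulse PAM ensemble with no symbol-clock phase randomization.
import numpy as np

rng = np.random.default_rng(3)
T0, num_paths, num_symbols = 10, 5000, 200
symbols = rng.choice([-1.0, 1.0], size=(num_paths, num_symbols))
paths = np.repeat(symbols, T0, axis=1)          # all paths have the same pulse timing

def ensemble_autocorr(paths, t1, t2):
    # average the lag product down the ensemble (t1, t2 are sample indices)
    return np.mean(paths[:, t1] * paths[:, t2])

# Fix the lag t2 - t1 = T0/2 and slide t1 across two symbol intervals: the estimate
# alternates between ~1 (both times in the same symbol) and ~0 (different symbols).
print([round(ensemble_autocorr(paths, t1, t1 + T0 // 2), 2) for t1 in range(2 * T0)])
```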

When we turn to a time average of a single sample path in the ensemble of Figure 2, we get a much different result, as shown in Figure 4. Moreover, this is the same result we get for any of the sample paths–it does not vary with the index $j$ as defined in the previous paragraph. (Well, that is almost true. You can imagine a sample path for which every symbol $a_k$ is $+1$. For this sample path, the time average is unity. But this sample path occurs ‘almost never,’ which means ‘with probability zero.’)

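The single-sample-path counterpart is a plain time average. In the sketch below (arbitrary parameters, asymmetric lag product), the estimate at lag $T_0/2$ comes out near 0.5 for essentially any sample path, the value of the unit-height triangle $1 - |\tau|/T_0$ at that lag:

```python
# Sketch: time-average autocorrelation of one long rectangular-pulse PAM sample path
# (no phase randomization), estimated at lag T0/2.
import numpy as np

rng = np.random.default_rng(4)
T0, num_symbols = 10, 100_000
x = np.repeat(rng.choice([-1.0, 1.0], size=num_symbols), T0)   # one long sample path

lag = T0 // 2
r_hat = np.mean(x[lag:] * x[:-lag])    # temporal average of the lag product
print(r_hat)                            # approximately 1 - lag/T0 = 0.5
```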
The autocorrelation function estimate for the case of no phase variable in Figures 3 and 4 depends on $t_1$ and $t_2$ in a more general way than through just their difference. This can be seen by forming the estimate for each of a number of choices for $t_1$ and a variety of $t_2$ for each of these choices. The resulting collection of autocorrelation-function estimates can then be viewed as a movie, as shown in Video 1.

Another way of looking at the time variation of the ensemble-average autocorrelation is by fixing the difference $t_2 - t_1$ and varying $t_1$. The autocorrelation estimates in Video 1, on the other hand, fix $t_1$ and allow the difference $t_2 - t_1$ to vary. The former choice is illustrated in Figure 5.

Now let’s look at these same estimated quantities for the ensemble we generated with a uniform random phase variable $\Phi$. Figure 6 shows a portion of the generated ensemble. Since the value of $\Phi$ is randomly chosen for each sample path, the timing of the pulses in the PAM signals differs from path to path.

Figure 7 shows the correspondence between the ensemble-average autocorrelation function and the temporal-average autocorrelation for the random-phase ensemble. Here the two match; you get the same answer whether you average over the ensemble or over time.

Figure 8 shows that the ensemble-average autocorrelation is a function of $t_1$ and $t_2$ only through their difference $t_2 - t_1$. Note that the constant value in each of the subplots of Figure 8 should map to a particular value on the triangle in Video 2. At upper left, $t_2 - t_1 = 0$, and so the constant value there of one maps to the tip of the triangle in Video 2, that is, the value of the triangle for $\tau = 0$, which is satisfyingly also one. The value in Figure 8 for $t_2 - t_1 = T_0/2$ is 0.5, and this is exactly what we see for the triangle in Video 2 when $\tau = T_0/2$.

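A variation of the earlier sketch with a random symbol-clock phase for each sample path (again arbitrary parameters, with integer shifts approximating a phase uniform over one symbol interval) shows both effects at once: sliding $t_1$ at a fixed lag no longer changes the estimate, and sweeping the lag at a fixed $t_1$ traces out the unit-height triangle:

```python
# Sketch: ensemble-average autocorrelation estimates for rectangular-pulse PAM with a
# random symbol-clock phase drawn independently for each sample path.
import numpy as np

rng = np.random.default_rng(5)
T0, num_paths, num_symbols = 10, 5000, 200
symbols = rng.choice([-1.0, 1.0], size=(num_paths, num_symbols))
paths = np.repeat(symbols, T0, axis=1)
shifts = rng.integers(0, T0, size=num_paths)                  # one phase per path
paths = np.array([np.roll(p, s) for p, s in zip(paths, shifts)])

def ensemble_autocorr(paths, t1, t2):
    return np.mean(paths[:, t1] * paths[:, t2])

# Sliding t1 at lag T0/2 now gives (approximately) the same value everywhere ...
print([round(ensemble_autocorr(paths, t1, t1 + T0 // 2), 2) for t1 in range(2 * T0)])
# ... and sweeping the lag at a fixed t1 traces the triangle 1 - |tau|/T0.
print([round(ensemble_autocorr(paths, 50, 50 + lag), 2) for lag in range(T0 + 1)])
```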
Stationarity
When the probabilistic functions of a random process are independent of time, the process possesses a kind of time-invariance that is called stationarity. This property is reminiscent of the all-important system property of time-invariance that we made use of in characterizing linear systems. Systems that are linear and time-invariant are filters, and signals that are time-invariant are stationary, loosely speaking. Of course the signal isn’t actually a constant as a function of time, it is the underlying ensemble that possesses the time-invariance in its statistical functions–stationarity. But we still speak of ‘stationary signals’ in day-to-day work.
There are two kinds of stationary random processes. The stricter kind requires all probabilistic parameters (all moments) to be time-invariant. The weaker kind requires only the first two moments to be time-invariant. This latter kind of stationarity is known as wide-sense stationarity (WSS). Remember, stationarity of any kind is a property of a random process–an ensemble together with a set of joint PDFs for all choices of $n$ and $\{t_1, t_2, \ldots, t_n\}$–not of an individual time-series, signal, or sample path.
Strict-sense stationarity is hard to check because you have to have a simple enough process, and the time, to be able to check all the moments (or PDFs). Wide-sense stationarity saves us from this hassle: we need only check the mean and autocorrelation.
For a wide-sense stationary (WSS) process, the mean must be a constant,

$$M_X(t) = E\left[X(t)\right] = M_X,$$

and the autocorrelation is a function only of the time difference between the two involved times $t_1$ and $t_2$,

$$R_X(t_1, t_2) = E\left[X(t_1)X(t_2)\right] = R_X(t_1 - t_2).$$

We’ve abused the notation a bit here: we’re using $R_X(\cdot)$ to indicate both a function of two variables and a function of one. But we know that for a WSS process the values of $t_1$ and $t_2$ only appear in the formal two-variable autocorrelation as the difference $t_1 - t_2$.
For our two PAM ensembles, only one is wide-sense stationary, and that is the one with the random-phase variable. The other ensemble produces a time-varying autocorrelation (second moment), and is therefore neither wide-sense nor strict-sense stationary.
On the other hand, the sample paths of each of the ensembles are cyclostationary signals. That is, if we compute the cyclic autocorrelation function for a cycle frequency equal to the symbol rate, $\alpha = 1/T_0$, we get a non-zero reliable answer for long processing-block lengths (almost surely).
Ergodicity
Ergodicity is a property of a random process that ensures that the time-averages of the sample paths correspond to the ensemble averages. This is the property that mathematicians must invoke so that they can pursue actual real-world utility of the ensemble sample paths.
Ergodic random processes must be stationary because the time averages of sample paths are, by construction, not functions of time. Therefore, if those averages have any chance to match the ensemble averages, those ensemble averages also cannot be functions of time, and so the process must be stationary.
Our ensemble with the random-phase variable is stationary and ergodic because the mean and the autocorrelation functions in ensemble- and temporal-averaging agree.
Cyclostationarity
A wide-sense cyclostationary random process is one for which the mean and autocorrelation are periodic (or almost-periodic) functions of time, rather than being time-invariant.
A second-order cyclostationary signal, or cyclostationary time-series, is simply a signal for which the infinite-time average defined by

$$R_x^\alpha(\tau) = \lim_{T\rightarrow\infty}\frac{1}{T}\int_{-T/2}^{T/2} x(t+\tau/2)\, x^*(t-\tau/2)\, e^{-i2\pi\alpha t}\, dt$$

or that defined by

$$R_{x^*}^\alpha(\tau) = \lim_{T\rightarrow\infty}\frac{1}{T}\int_{-T/2}^{T/2} x(t+\tau/2)\, x(t-\tau/2)\, e^{-i2\pi\alpha t}\, dt$$

is not identically zero, as a function of the delay $\tau$, for at least one non-trivial cycle frequency $\alpha$. The sole trivial cycle frequency is $\alpha = 0$ for the non-conjugate function $R_x^\alpha(\tau)$. There are no trivial cycle frequencies for the conjugate function $R_{x^*}^\alpha(\tau)$.

It is perfectly possible that there are no non-trivial cycle frequencies for which either the non-conjugate cyclic autocorrelation function $R_x^\alpha(\tau)$ or the conjugate cyclic autocorrelation function $R_{x^*}^\alpha(\tau)$ is non-zero, and yet the signal is still a cyclostationary signal. This can occur if some other, higher-order, moment function possesses a non-trivial cycle frequency. A real-world example is duobinary signaling. Another is simply a square-root raised-cosine QPSK signal with roll-off of zero: it is a wide-sense stationary signal, but possesses many higher-order non-trivial cyclic cumulants.
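To connect the definition to data, here is a simple finite-time approximation (an illustrative sketch using an asymmetric lag product and arbitrary parameters, not a carefully designed estimator) of the non-conjugate cyclic autocorrelation of one rectangular-pulse PAM sample path at cycle frequency $\alpha = 1/T_0$ and lag $\tau = T_0/2$:

```python
# Sketch: crude finite-time estimate of the non-conjugate cyclic autocorrelation of a
# real rectangular-pulse binary PAM sample path at alpha = 1/T0, tau = T0/2.
import numpy as np

rng = np.random.default_rng(6)
T0, num_symbols = 10, 100_000
x = np.repeat(rng.choice([-1.0, 1.0], size=num_symbols), T0)   # one long sample path

alpha, tau = 1.0 / T0, T0 // 2
n = np.arange(len(x) - tau)
lag_product = x[n + tau] * x[n]
r_alpha = np.mean(lag_product * np.exp(-2j * np.pi * alpha * n))
print(abs(r_alpha))   # clearly non-zero (about 0.32); a stationary signal would give ~0 here
```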
Cycloergodicity
If we do have a cyclostationary random process, and we want to use stochastic machinery to, say, derive an algorithm, we will also want to know whether the sample paths of that cyclostationary random process are cyclostationary, and that they possess the same cyclic parameters as the process. This is done by checking for, or invoking, cycloergodicity.
For ergodicity, we check the ensemble-average mean value against the temporal-average mean value for a sample path,

$$E\left[X(t)\right] \overset{?}{=} \lim_{T\rightarrow\infty}\frac{1}{T}\int_{-T/2}^{T/2} x(t)\, dt,$$

and we check the ensemble-average second moment against the temporal-average second moment,

$$E\left[X(t+\tau/2)X(t-\tau/2)\right] \overset{?}{=} \lim_{T\rightarrow\infty}\frac{1}{T}\int_{-T/2}^{T/2} x(t+\tau/2)\, x(t-\tau/2)\, dt.$$

For cycloergodicity, we need a new tool involving time averages to compare with the time-varying ensemble-average autocorrelation. This is the fraction-of-time (FOT) expectation, also called the multiple-sine-wave extractor (or similar), denoted here by $E^{\{\alpha\}}[\cdot]$. This operator returns all finite-strength additive sine-wave components in its argument. We’ll have a post or two in the (far?) future on fraction-of-time probability. Accept for now the claim that the FOT expectation really is an expected-value operation, which means somewhere lurking in here is a new kind of cumulative distribution function and associated probability density function (The Literature [R1]).
For now, a couple of examples should suffice to illustrate the idea. The FOT mean value of a single sine wave is just that sine wave,

$$E^{\{\alpha\}}\left[A\cos(2\pi f_0 t + \phi_0)\right] = A\cos(2\pi f_0 t + \phi_0).$$

(Compare that with the usual temporal average applied to a sine wave, which is zero.)
Suppose we have a periodic function $p(t)$, such as a single sine wave, a square wave, a radar signal, etc., with period $T_0$. Then we know that we can represent $p(t)$ in a Fourier series,

$$p(t) = \sum_{k=-\infty}^{\infty} c_k\, e^{i2\pi k t/T_0}.$$

The FOT sine-wave extractor is linear, so that again we have

$$E^{\{\alpha\}}\left[p(t)\right] = \sum_{k=-\infty}^{\infty} c_k\, e^{i2\pi k t/T_0} = p(t).$$

Suppose some periodic signal $p(t)$ is embedded in noise and interference $w(t)$ that does not possess any periodic component,

$$y(t) = p(t) + w(t).$$

Then

$$E^{\{\alpha\}}\left[y(t)\right] = p(t).$$
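As a crude finite-data stand-in for the sine-wave extractor, synchronized (period-folded) averaging pulls a periodic component out of additive noise; the square-wave signal, noise level, and period below are arbitrary choices for illustration:

```python
# Sketch: recover a periodic component buried in noise by folding the data modulo the
# period and averaging -- a finite-data approximation of extracting E^{alpha}[y(t)].
import numpy as np

rng = np.random.default_rng(7)
period, num_periods = 50, 4000
t = np.arange(period * num_periods)
p = np.sign(np.sin(2 * np.pi * t / period))        # a square wave with period 50 samples
y = p + rng.normal(scale=2.0, size=t.size)         # periodic signal plus strong noise

folded = y.reshape(num_periods, period).mean(axis=0)   # synchronized average over periods
print(np.max(np.abs(folded - p[:period])))             # small: the square wave is recovered
```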
To check the cycloergodic property of a random process, we check the ensemble-average mean value against the FOT mean value,

$$E\left[X(t)\right] \overset{?}{=} E^{\{\alpha\}}\left[x(t)\right].$$

Note that in this check, the ensemble average on the left and the temporal (non-stochastic) average on the right can both be functions of time $t$. We also check the ensemble-average second moment against the FOT second moment,

$$E\left[X(t+\tau/2)X(t-\tau/2)\right] \overset{?}{=} E^{\{\alpha\}}\left[x(t+\tau/2)\, x(t-\tau/2)\right].$$

If these averages match, and the process is cyclostationary, then we say the process is a cycloergodic process.
From this point of view, then, the random process corresponding to the PAM ensemble with a uniform random phase variable $\Phi$ is stationary, ergodic, and non-cycloergodic. The PSDs obtained from ensemble averaging match those obtained from the sample paths (almost surely), and the ensemble and temporal means are zero. We end up with strong observable (sample-path-based) cyclic features, but no cyclic features in the stochastic domain, as illustrated in Figure 9.

On the other hand, if we perform the same operations on the ensemble and sample paths using the ensemble without phase randomization, we see that the cyclic features match (Figure 10). This process is cyclostationary, nonergodic, and cycloergodic.

The properties of the two PAM random processes we have studied in this post are summarized in Table 1.
| Property | PAM with no Symbol-Clock Phase ($\Phi = 0$) | PAM with Random Symbol-Clock Phase ($\Phi$ uniform over one symbol interval) |
| --- | --- | --- |
| Zero Mean Process? | Yes | Yes |
| Stationary Process? | No | Yes |
| Cyclostationary Process? | Yes | No |
| Ergodic Process? | No | Yes |
| Cycloergodic Process? | Yes | No |
| Stationary Sample Paths? | No | No |
| Cyclostationary Sample Paths? | Yes | Yes |
This was a complex, difficult post. I’m sure there are errors. If you find one, please let me know in the Comments.
Significance of Random Processes in CSP
The theory of cyclostationary signals can be formulated in terms of conventional non-stationary random processes, and so from that point of view, random processes are central to CSP. To bridge the gap between a random process (infinite collection of time-series of infinite durations together with a set of probability density functions for all possible collections of random variables plucked from the process) and a signal (one infinite-time member of the ensemble), we need to invoke a kind of ergodicity called cycloergodicity.
A good way to achieve cycloergodicity is to refrain from adding random variables to the random process model that attempt to account for what are simply unknown constants in a given sample path. For instance, refrain from adding random symbol-clock phases and random carrier phases. This will render the process non-stationary (cyclostationary), but will also lead to cycloergodicity. Include random variables in the random process that are models of quantities that also randomly vary in a single sample path–such as transmitted symbols in a PSK or QAM signal.
It is possible to formulate a theory of cyclostationary signals that completely sidesteps the creation of a random process (ensemble plus probability rules). What is needed is a way to characterize the random (erratic, unpredictable) behavior of a signal using only infinite-time averages. This leads to the idea of a fraction-of-time probability to take the place of the ensemble probability. We’ll look at that in a future CSP post.
Previous SPTK Post: Examples of Random Variables Next SPTK Post: The Sampling Theorem