SPTK: Signal Representations

A signal can be written down in many ways. Some of them are more useful than others and can lead to great insights.

Previous SPTK Post: Signals                    Next SPTK Post: Fourier Series

In this Signal Processing ToolKit post, we’ll look at the idea of signal representations. This is a branch of signal-processing mathematics that expresses one signal in terms of one or more signals drawn from a special set, such as the set of all sine waves, the set of harmonically related sine waves, a set of wavelets, a set of piecewise constant waveforms, etc.

Signal representations are a key component of understanding stationary-signal processing tools such as convolution and Fourier series and transforms. Since Fourier series and transforms are an integral part of CSP, signal representations are important for all our discussions at the CSP Blog.

[Jump straight to the ‘Significance of Representations in CSP’ below.]

Basics of Representations

Let’s call our signal of interest s(t), and let us restrict our attention to the interval [0, T]. A representation of s(t) is given by the weighted sum of elementary signals \phi_j(t),

\displaystyle s(t) = \sum_j a_j \phi_j(t), \hfill (1)

where I’ve purposely left off the limits on the sum–it could be a finite sum or, more commonly, an infinite sum. The equality in (1) is sometimes exact, but more often we understand it to be equality in some weaker sense, such as at all points except those in some zero-measure set or the like. Not the kind of mathematical detail that I want to dwell on at the CSP Blog. Don’t want this to be the CSP Bog.

The signal representation is useful when you can learn something about the signal through its representation, or when you can use the representation to perform a signal-processing task, such as compression, transmission over a channel, noise rejection, etc.

So what are good choices for the elementary signals \phi_j(t)?

Orthonormal Representations

A desirable property of the set \{ \phi_j(t) \} is that each distinct pair of functions (\phi_m(t), \phi_n(t)) is orthogonal:

\displaystyle \int_{0}^{T} \phi_m(t) \phi_n^*(t) \, dt = \left\{ \begin{array}{ll} C_n, & n = m, \\ 0, & n \neq m, \end{array} \right. \hfill (2)

where C_n > 0. When two functions are orthogonal, they don’t have anything in common, once you take into account the entire interval [0, T]. This is similar to the notion of orthogonality in vector mechanics, where two vectors are orthogonal when their dot (inner) product is zero. That notion, in turn, generalizes our intuitive geometric notion of perpendicularity.

Why is orthogonality desirable? Suppose we have the representation (1),

\displaystyle s(t) = \sum_{j=-\infty}^\infty a_j \phi_j(t) \hfill (3)

and we want to know the value of, say, a_p. If we multiply s(t) by \phi_p^*(t) and integrate, we get

\displaystyle \int_{0}^T s(t) \phi_p^*(t) \, dt = \int_0^T \phi_p^*(t) \sum_{j=-\infty}^\infty a_j \phi_j(t) = C_p a_p \hfill (4)

If C_n = 1 for all n, then the elementary set of functions is an orthonormal set. In which case, we can find the coefficient a_p of \phi_p(t) by integrating the product of that elementary function with the signal.

The discrete-time versions of these ideas and equations are straightforward–replace integrals with sums.

Representation with a Truncated Series

The idea of analyzing arbitrary signals into definite quantities of other, simpler, building-block signals is appealing, but we can’t help noticing the infinities involved in the equations presented so far in this post.

What happens when we want to represent an arbitrary signal using only a finite number of the basis functions? Which do we choose?

First, let’s look at the energy in a signal s(t) in terms of its representation. We have

\displaystyle s(t) = \sum_k a_k \phi_k(t) \ \ \  t\in [t_1, t_2] \hfill (5)

The energy on the interval t \in [t_1, t_2] is given by

\displaystyle E_s = \int_{t_1}^{t_2} | s(t) |^2 \, dt \hfill (6)


\displaystyle E_s = \int_{t_1}^{t_2} s(t) s^*(t) \, dt. \hfill (7)

Let’s substitute (5) into (7) to yield

\displaystyle E_s = \int_{t_1}^{t_2} s(t) \left[ \sum_k a_k \phi_k(t) \right]^* \, dt \hfill (8)

\displaystyle = \sum_k a_k^* \int_{t_1}^{t_2} s(t) \phi_k^*(t) \, dt \hfill (9)

\displaystyle = \sum_k a_k^* a_k = \sum_k | a_k |^2 \hfill (10)

So the energy of s(t) on the interval [t_1, t_2] is equal to the sum of the squared magnitudes of the basis-function components of s(t). This is a form of Parseval’s relation, which says that the energy in a signal is equal to the energy in a transform of the signal.

Armed with this new fact, let’s look at a truncated series representation for s(t). We have

\displaystyle \hat{s}(t) = \sum_{k=-M}^M \hat{a}_k \phi_k(t) \hfill (11)

where M is smaller than the number of functions in the basis set. That is, if the number of the \phi_k(t) is infinite, then M is finite, and if the number of functions is 2K+1, then M < K.

What should we choose for the \hat{a}_k? Is \hat{a}_k = a_k the best choice? Is it even a good choice? Does that depend on the particular kinds of functions making up the basis set \{\phi_k(t)\}?

One way to determine a good choice for the \hat{a}_k is to choose them so as to minimize the error between the signal (equivalently, its representation) and the approximation (the truncated series). But what error measure do we use? Well, let’s use one that we have encountered in other areas of signal processing and statistics, and that is mathematically friendly: the squared error. In particular, let’s try to minimize  the energy in the direct error:

\displaystyle e(t) = s(t) - \hat{s}(t). \hfill (12)

The error energy is

\displaystyle E_e = \int_{t_1}^{t_2} | e(t) |^2 \, dt \hfill (13)

which we can express as

\displaystyle E_e = \int_{t_1}^{t_2} (s(t) - \hat{s}(t))(s(t) - \hat{s}(t))^* \, dt \hfill (14)

By expanding this expression, we get to

\displaystyle E_e = E_s + \sum_{-M}^M |\hat{a}_k |^2 - \int_{t_1}^{t_2} s(t) \sum_{k=-M}^M \hat{a}_k^* \phi_k^*(t) \, dt - \int_{t_1}^{t_2} s^*(t) \sum_{k=-M}^M \hat{a}_k \phi_k(t) \, dt \hfill (15)

\displaystyle = E_s + \sum_{k=-M}^M \left[ |\hat{a}_k|^2 - \hat{a}_k^* a_k - \hat{a}_k a_k^* \right] \hfill (16)

Let’s add zero to the sum for each k. In particular, add |a_k|^2 - |a_k|^2 to each term:

\displaystyle E_e = E_s + \sum_{k=-M}^M \left( \left[ |\hat{a}_k|^2 - \hat{a}_k^* a_k - \hat{a}_k a_k^* + |a_k|^2\right] + |a_k|^2 \right) \hfill (17)

Simplifying (17) leads to

\displaystyle E_e = E_s - \sum_{k=-M}^M |a_k|^2 +\sum_{k=-M}^M | a_k - \hat{a}_k |^2 \hfill (18)

and we see that only the last term depends on the coefficients \hat{a}_k. That term is never negative, so that the smallest it can be is zero. Therefore, the minimum error energy corresponds to the choice

\displaystyle \hat{a}_k = a_k, \ \ \ k = 0, \pm 1, \pm 2, \ldots \pm M \hfill (19)

and that minimum error is

\displaystyle E_{e, min} = E_s - \sum_{k=-M}^M | a_k |^2.

The best coefficients to use in the truncation (approximation) are those used in the representation.

So we know how to manipulate signal representations, and we know how to specify finite-term approximations to infinite-series representations. We know how to compute a particular coefficient a_k, and we know the value of orthonormal representations. Given all that, how do we pick the particular basis functions \phi_k(t)?

The application of interest drives the nature of s(t), and this will give hints as to good choices for the \phi_k(t). By far the most commonly used set of basis functions is the set of harmonically related sine waves. That choice leads to the Fourier series and the Fourier transform. But some others are of interest too, and simpler, so let’s look at them first.

The Impulse Functions

Let’s consider some function of time t that is reasonably smooth, such as the one in Figure 1. To start our development of an impulse-function representation, let’s consider approximating this function by  a set of shifted and scaled rectangle functions.

Figure 1. A smoothly varying function of time.

The width of each rectangle is \Delta t, and the center points of the rectangles are at the time values k \Delta t, as shown in Figure 2.

Figure 2. One way to approximate the smoothly varying function in Figure 1 by using elementary building-block waveforms such as rectangles.

It might be obvious to you that the approximation gets better as \Delta t decreases. Let’s consider some finite interval of time, say [-T, T] over which to perform this approximation, meaning that for any finite \Delta t, there will be a finite number of rectangles needed for the approximation, say 2K + 1. We can write the approximation in mathematical terms by introducing the rect() function,

\displaystyle \mbox{\rm rect} (t) = \left\{ \begin{array}{ll} 1, & |t| \leq \Delta t, \\ 0, & \mbox{\rm otherwise} \end{array} \right. \hfill (20)

Using (20), the function \mbox{\rm rect}((t - k\Delta t)/\Delta t) is a rectangle with width \Delta t centered at t = k\Delta t. Our approximation is then given by the formula

\displaystyle \underbrace{s(t)}_{Signal} \approx \underbrace{\hat{s}(t)}_{Approx} = \sum_{k=-K}^K \underbrace{s(k\Delta t)}_{\substack{Height\ of \\ Rectangle}}  \underbrace{\mbox{\rm rect}((t-k\Delta t)/\Delta t)}_{\substack{Width\ and \\ Location\ of \\ Rectangle}}, \ \ t \in [-T, T] \hfill (21)

Let’s multiply each term in the sum by 1 = \Delta t / \Delta t. This yields the approximation

\displaystyle \hat{s}(t) = \sum_{k=-K}^K \left[ s(k\Delta t) \right] \left[ \frac{\mbox{\rm rect}((t-k\Delta t)/\Delta t)}{\Delta t} \right] \left[ \Delta t \right], \ \ \ t \in [-T, T] \hfill (22)

The rectangles in the second set of brackets have width \Delta t and height 1/\Delta t. As \Delta t decreases, the rectangles get narrower, but also get taller. Their area is always unity though.

Figure 3. Various unit-area rectangles. When the width shrinks to zero, the rectangle becomes the impulse function.

When we let the width \Delta t approach zero, the height becomes unbounded, and the resulting function is called an impulse or the delta function. It is typically denoted by the symbol \delta(t):

\displaystyle \delta (t) = \lim_{\Delta t \rightarrow 0} \frac{\mbox{\rm rect}(t/\Delta t)}{\Delta t} \hfill (23)

Returning to (22), we see that several things happen as we let \Delta t approach zero. First, the unit-area rectangle functions become delta functions. Second, the number of rectangles 2K+1 becomes infinite. Third, the variable \Delta t becomes the differential du, and the variable k\Delta t becomes the continuous variable u, and ranges over the entire interval [-T, T]. Combining these ideas leads to the impulse representation of s(t),

\displaystyle s(t) = \int_{-T}^T s(u) \delta(t-u)\, du, \ \ t \in [-T, T] \hfill (24)

To represent the entire signal, we let T increase without bound,

\displaystyle s(t) = \int_{-\infty}^\infty s(u) \delta(t-u)\, du \hfill (25)

The meaning of (25) is that the signal s(t) is composed of an infinite number of shifted impulse functions, with strengths s(u)du and locations t = u.

We’ll use this continuous-time impulse-function representation later in the Signal Processing ToolKit posts. We’ll also make use of its discrete-time version.

The continuous-time impulse, \delta(t), is a tricky mathematical object because it is not a function as we usually define a function, but is instead a generalized function. The trickiness comes in because we typically try to use this generalized function in situations involving otherwise completely well-behaved normal functions (signals). For an illuminating and accessible discussion of this trickiness of generalized functions, and the impulse in particular, see my review of a paper by Candan.

The Walsh Basis Functions

Let’s now turn to a more practical and concrete signal representation: The Walsh functions. These are simple binary-valued piecewise-constant functions on some finite interval. For (my) convenience, let’s now consider the interval t \in [0, T] instead of [-T, T] or [t_1, t_2].

Binary-valued just means that the Walsh functions can take on one of two values for any instant of time, let’s say A and -A. Piecewise-constant means that the derivative of each Walsh function is zero everywhere it exists, and it exists at almost all instants of time. The places where the derivative doesn’t exist are those instants where the Walsh function transitions from A to -A or from -A to A.

The simplest binary-valued piecewise-constant function on [0, T] is just the constant A:

\displaystyle \phi_0 (t) = \left\{ \begin{array}{ll} A, & 0 \leq t \leq T \\ 0, & \mbox{\rm otherwise} \end{array} \right. \hfill (26)

For orthonormal representations, we need the energy of the basis functions to be equal to unity, which forces A = T^{-1/2} (why?). But let’s keep using the ‘A’ notation for ease of reading and typing.

There is no unique way to define \phi_1(t). We know that it must be orthogonal to \phi_0(t) (and all other Walsh functions of course), have unit energy, and be binary-valued and piecewise-constant. You can probably envision the inner product (2) of any relatively simple \phi_1(t) with \phi_0(t) since the latter is constant–for the inner product integral (2) to equal zero we’ll need \phi_1(t) to be equal to A on subintervals whose lengths sum to half the interval and equal to -A on the other subintervals. So a straightforward choice is simply two subintervals:

\displaystyle \phi_1(t) = \left\{ \begin{array}{ll} A, & 0 \leq t \leq T/2 \\ -A, & T/2 < t \leq T \\ 0, & \mbox{\rm otherwise} \end{array} \right. \hfill (27)

Figure 4. Illustration of the first two Walsh basis functions. The zeroth function is simply a constant A on the interval [0, T], and the second is A on half the interval and -A on the other half. A is equal to the reciprocal of the square root of T, so that both functions have unit energy and it is simple to verify that they are orthogonal.
Let’s move on to the choice for \phi_2(t). After that, a general pattern will emerge and we can specify an infinite number of orthogonal Walsh functions with a simple recursive algorithm. The third Walsh basis function must satisfy the following three equations:

\displaystyle \int_0^T \phi_0(t) \phi_2^*(t)\, dt = 0 \ \ \  (28)

\displaystyle \int_0^T \phi_1(t) \phi_2^*(t)\, dt = 0 \ \ \  (29)

\displaystyle \int_0^T \phi_2(t) \phi_2^*(t)\, dt = 1 \ \ \  (30)

From (27) we know that \phi_2(t), like \phi_1(t), must be equal to A on a number of subintervals whose lengths sum to T/2 and equal to -A on the remainder of the subintervals, whose lengths also sum to T/2. From (28), \phi_2(t) must take on two values on the interval [0, T/2], else it will be identical to \phi_1(t) or its negative, and therefore it will not be orthogonal to \phi_1(t).  Again, the choice is not unique, but let’s stay with the notion of splitting the intervals in half, like we did to obtain \phi_1(t) from \phi_0(t). This leads to the following choices:

Figure 5. Various choices for the third Walsh function. Each is orthogonal to the first two chosen Walsh functions, but they are not all orthogonal to each other.

Perhaps \phi_2(t)_1 is most appealing because it just replicates \phi_1(t) twice in succession. If we add that function to our growing list of Walsh building-block functions, we can also add \phi_2(t)_2, because it is orthogonal to \phi_2(t)_1 and is orthogonal to \phi_0(t) and \phi_1(t).

\phi_2(t)_2 was created by contracting \phi_1(t) so if fits in [0, T/2], then copying that to the interval [T/2, T], and finally negating the function on [T/2, T]. \phi_2(t)_1 is created the same way except omit the negation.

This leads to a general pattern of creating Walsh functions using previously selected Walsh functions: Shrink, replicate and shift, negate. And shrink, replicate and shift, no negation. So for each previously computed Walsh function we get two more after each iteration of this function-generation algorithm. Let’s call \phi_2(t)_2 by the name \phi_2(t) and \phi_2(t)_1 by the name \phi_3(t).

Let’s continue with an example to make it perfectly clear how to obtain the coefficients in the Walsh representation. Our signal will be s(t) = t on the interval [0, T]. Each representation coefficient a_k is the integrated product of the signal and the corresponding function \phi_k(t),

\displaystyle a_k = \int_0^T s(t) \phi_k^*(t) \, dt. \hfill (32)

Let’s compute the first several coefficients.

\displaystyle a_0 = \int_0^T s(t) \phi_0^*(t) \, dt \hfill (33)


\displaystyle a_0 = \int_0^T At \, dt = \frac{T^{3/2}}{2} \hfill (34)

When T= 1, a_0 = 1/2. Does that match your intuition about ‘how much’ of the ramp s(t) is a constant?


\displaystyle a_1 = \int_0^T t \phi_1^*(t) \, dt = \int_0^{T/2} At \, dt + \int_{T/2}^T (-At)\, dt, \hfill (35)


a_1 = \frac{-T^{3/2}}{4}.

Computing a_2 and a_3 is more tedious, and leads to

\displaystyle a_2 = 0 \hfill (36)

\displaystyle a_3 = \frac{-T^{3/2}}{8} \hfill (37)

Now let’s verify this analysis using a few numerical examples.

Walsh Examples

Since we computed the coefficients a_k for a simple ramp function above, let’s start there,

s(t) = t, \ \ \ \ t \in [0, T] \hfill (38)

Let’s set T= 128 and consider various numbers K of Walsh functions. We plot the signal, the Walsh approximation using K basis functions, and the values of the K Walsh-function coefficients a_k in the following movie:

And, just for fun, here are three more examples: a sine wave, a sine wave plus an exponential, and random Gaussian noise:

Here are visualizations of various numbers of Walsh functions:

Harmonically Related Sine Waves

The most common and most useful set of building-block signal-representation functions are the harmonically related sine waves

\displaystyle \phi_k(t) = e^{i 2 \pi (k/T) t} \hfill (39)

where k ranges over all the integers and T is the length of the interval over which we represent a signal, as usual. You can check that any two such sine waves are orthogonal and that each one has energy equal to T. This representation is known as the Fourier series representation.

We’ll go over how to derive the Fourier-series basis functions using our general representation machinery in the next Signal Processing ToolKit post. We’ll also look at various examples of Fourier series, and preview how the Fourier series leads to the Fourier transform.

The Significance of Representations in CSP

The key functions, or probabilistic parameters, in CSP are things like the time-varying autocorrelation and time-varying temporal moments and cumulants. These functions of time are, well, signals themselves, and so they can be usefully represented in terms of weighted sums of simpler signals such as sine waves. This will become more clear once we study a particularly useful and relevant representation: The Fourier series.

Previous SPTK Post: Signals                    Next SPTK Post: Fourier Series

Author: Chad Spooner

I'm a signal processing researcher specializing in cyclostationary signal processing (CSP) for communication signals. I hope to use this blog to help others with their cyclo-projects and to learn more about how CSP is being used and extended worldwide.

4 thoughts on “SPTK: Signal Representations”

  1. Seems to be a problem with what should be equation 20 (right after equation 19, showing the minimum error). Typo in setting up or closing the equation environment? I just see raw LaTeX…

    1. More raw LaTeX in the section on Impulse Functions:
      “It might be obvious to you that the approximation gets better as $\Delta t$ decreases.”

      1. Clint: Right, thanks. I now see multiple raw latex errors. I must not have done the corresponding search that reveals them! Stay tuned a few minutes…

Leave a Comment, Ask a Question, or Point out an Error