For my second warm-up post for this blog, I want to talk about a small experiment I did about a year ago to try out MathJax with Jekyll to embed some equations.

Alvin Lucier is a sound artist, perhaps best known for his piece “I am Sitting in a Room.” In it, he recites a piece of text that both describes the process behind the work and serves as the raw material it is built from:

I am sitting in a room different from the one you are in now. I am recording the sound of my speaking voice and I am going to play it back into the room again and again until the resonant frequencies of the room reinforce themselves so that any semblance of my speech, with perhaps the exception of rhythm, is destroyed. What you will hear, then, are the natural resonant frequencies of the room articulated by speech. I regard this activity not so much as a demonstration of a physical fact, but more as a way to smooth out any irregularities my speech might have.

A link to a recording of the piece is embedded below. I urge you to listen through the entirety of the piece at some point, rather than just skipping to the end.

It’s a very thought-provoking piece, and it makes you want to set up a rig in your home to hear what your room “sounds” like. However, we can simulate what’s going on and get an idea of what it might sound like with much less time and effort.

Modeling the Piece

Essentially, you can think of the acoustics in your room as a system, where you feed it some sound as input, and the room filters it to create a response (e.g., reverberation and echo). One way we could model this system is to come up with a big equation that dictates how a sound should be processed, taking into account all of the physical and acoustic properties of the room, but we probably don’t have the time or resources (or will!) to figure that out.

Instead, we can estimate the system using an impulse response – how the system responds if you stimulate it with a very loud, sudden, and infinitesimally short impulse. For example, if you were to clap right now, the reverberation you hear would essentially be the impulse response of your room.

Once we have an impulse response, we can estimate the system’s output by convolving the input signal with it. Formally, for a system with an impulse response $h$, we can estimate the response $y$ to an input signal $x$ with:

$$ y(t) = (x * h)(t) = \int_{-\infty}^{\infty} x(\tau) \, h(t - \tau) \, d\tau $$

This is the definition of a convolution, although the equation itself doesn’t offer much intuition. Essentially, we can think of it as two signals being “blended” together. For example, if you were speaking in a cathedral, the sound of your voice reverberating off the walls is essentially your voice convolved with the impulse response of the cathedral (recall: the sound that follows after you clap).

For a more visual example, if we think about convolving two rectangular pulses, we would slide one rectangle through the other, resulting in a triangle:

(Animation: two rectangular pulses being convolved into a triangle. Source: Wikipedia)
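
We can see the same behaviour numerically with NumPy’s built-in np.convolve (the pulse values below are just made-up examples): a rectangular pulse convolved with itself ramps up and back down into a triangle.

import numpy as np

pulse = np.ones(4)                 # a rectangular pulse: [1, 1, 1, 1]
print(np.convolve(pulse, pulse))   # [1. 2. 3. 4. 3. 2. 1.] – a triangle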

Convolving a rectangular pulse with a more complex signal:

(Animation: a rectangular pulse being convolved with a more complex signal. Source: Wikipedia)

One interesting and useful property of convolutions is that a convolution in the time domain amounts to a multiplication in the frequency domain. In practice, you almost never compute a convolution the naive way; instead, you use fast Fourier transforms (FFTs) to operate on the signals’ frequency-domain representations.
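
Written in the notation from before, this property (the convolution theorem) says that

$$ \mathcal{F}\{x * h\}(f) = X(f) \, H(f) $$

where $X$ and $H$ are the Fourier transforms of $x$ and $h$.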

For example, if we wanted to use Python with numpy to convolve two signals, we would use:

import numpy as np
from numpy.fft import fft, ifft

def convolve(a, b):

    # Zero-pad to the length of the full (linear) convolution; multiplying the
    # raw FFTs of unpadded signals would compute a circular convolution and
    # would require a and b to be the same length
    n = len(a) + len(b) - 1

    # Get a frequency-domain representation of our signals using FFT
    a_freq = fft(a, n)
    b_freq = fft(b, n)

    # Multiply them together to perform the convolution
    c_freq = a_freq * b_freq

    # Use an inverse FFT to get the result in the time domain
    c = np.real( ifft(c_freq) )

    return c
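
As a quick sanity check (the toy arrays here are made-up values), this matches NumPy’s built-in np.convolve up to floating-point error:

a = np.array([1.0, 2.0, 3.0])
b = np.array([0.0, 1.0, 0.5])

print(convolve(a, b))     # ~[0.  1.  2.5  4.  1.5]
print(np.convolve(a, b))  # [0.  1.  2.5  4.  1.5]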

Now if we want to model the process of the piece, where there’s a speaker at one side of the room and a microphone at the other, we can treat it as an iterative process in which we repeatedly mix a dry signal (the sound recorded from the last iteration) with a wet signal (the sound of the room’s response to it). We can describe this new process with a recurrence relation:

$$ x_n(t) = x_{n-1}(t) + \alpha \, (x_{n-1} * h)(t), \qquad x_0(t) = x(t) $$

Where $x$ and $h$ are the same as before, and we use $\alpha$ as an attenuation constant to control how much “room” we want to include. For now, we’ll use $\alpha = 0.5$, which attenuates the room’s contribution by about 6 dB.
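
A minimal sketch of this recurrence in code, reusing the convolve function from above (voice and impulse are assumed to be mono NumPy arrays of samples; the per-iteration normalization is my own addition to keep the repeated mixing from clipping):

import numpy as np

def simulate(voice, impulse, iterations=50, alpha=0.5):

    x = voice
    outputs = []

    for _ in range(iterations):

        # Wet signal: the room's response to the current recording
        wet = convolve(x, impulse)

        # Dry signal: the current recording, zero-padded to line up with the wet one
        dry = np.pad(x, (0, len(wet) - len(x)))

        # Mix the two and normalize so the level stays sane across iterations
        x = dry + alpha * wet
        x = x / np.max(np.abs(x))

        outputs.append(x)

    return outputs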

Creating Sounds

To hear what it sounds like, let’s mix a recording of me speaking with an impulse response that I generated using some audio software:

The impulse response of a room
My voice

With the two convolved together and the dry signal mixed back in, it sounds as if I were speaking in the room the impulse response came from:

What it would sound like if I was in the room

Now, we can “fast forward” and hear how this would sound if we used the same process as Alvin Lucier:

After 5 iterations
After 10 iterations
After 25 iterations
After 50 iterations

Finally, we can stitch these together at each iteration to create a mini “Sitting in a Room”:

The full simulated piece, up to 50 iterations
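
For completeness, here is roughly how the stitched version could be produced, using the simulate sketch from earlier (the filenames are placeholders, and I’m assuming 16-bit mono WAV files read with scipy):

import numpy as np
from scipy.io import wavfile

rate, voice = wavfile.read("voice.wav")            # placeholder filename
_, impulse = wavfile.read("impulse_response.wav")  # placeholder filename

# Work with floats in [-1, 1] so the mixing math behaves
voice = voice.astype(np.float64)
voice = voice / np.max(np.abs(voice))
impulse = impulse.astype(np.float64)
impulse = impulse / np.max(np.abs(impulse))

# Run the iterative process and concatenate every iteration back to back
iterations = simulate(voice, impulse, iterations=50)
piece = np.concatenate([voice] + iterations)

wavfile.write("sitting_in_a_room.wav", rate, (piece * 32767).astype(np.int16))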

As you can hear, the more we apply the process, the more the dominant frequencies of the impulse response take over the recording. This makes sense if you think about what repeatedly convolving with the same impulse response does – on every pass, the frequencies the room reinforces get multiplied by the room’s response again, so they keep growing, while the frequencies the room doesn’t reinforce are left further and further behind until they’re effectively inaudible.
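
To make that concrete using the recurrence from before (a back-of-the-envelope sketch under that model, ignoring any normalization between iterations): taking the Fourier transform of both sides turns the convolution into a multiplication, so after $n$ iterations

$$ X_n(f) = \left(1 + \alpha H(f)\right) X_{n-1}(f) = \left(1 + \alpha H(f)\right)^{n} X(f) $$

Every pass scales each frequency by the same factor $1 + \alpha H(f)$, so the frequencies where the room’s response $H(f)$ is large grow exponentially, while everything else becomes negligible by comparison.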

Of course, this is what Alvin told us would happen from the very beginning~

Anyways, the code is here if you want to play with this yourself.