Ron’s right arm: tactility, visualization, and the synesthesia of audio engineering


To the casual observer it might look like Ron’s entire body is being supported by the SoundWorkshop mixing board. Much of his body weight leans into the console, which is coupled to his body through his right arm. Head facing down, he concentrates deeply on the funk/jazz/dub mix in progress. Earlier in the session, hours had been spent aligning and biasing the Otari MTR90 24-track tape machine; selecting and tweaking the outboard EQs, compressors, and reverbs; and getting the mix “in the ballpark.” However, “in the ballpark” won’t do. Ron’s concentrating on the bass. In particular, he’s concentrating on his right arm, for Ron knows his control room, he knows his monitors, he knows that the bass is “in the pocket” when it feels a certain way in his right forearm. The feeling is inaudible, or at least is subtle enough that ears alone cannot be trusted to perceive it.

For Ron, mixing is not just a process of aural listening. Mixes have a feel, and by that I do not mean feeling as a metaphoric quality. Feel is tactile, and feel can even be visually mapped, particularly now that Ron’s mixing workflow is oriented around a Samplitude Digital Audio Workstation. The right arm is still an invaluable sensing organ, but it couples not just with the analog console but with a synesthesia of visual and auditory practices.

I must confess, my paper’s subtitle may be a bit misleading: “Tactility, Visualization, and the Synesthesia of Audio Engineering.” More precisely, I am not interested in replacing one universalizing discourse about audio engineering practice with another universalizing discourse about the same. Instead, I suggest that we, as scholars and engineers, should be interested not only in synesthesia, but in differing historically, culturally, socially, and politically situated synesthesias, plural. Ron, the engineer with whom I begin my analysis, has developed a unique and particular mode for audio engineering which differs from those of other engineers, for example tonmeisters in Istanbul who are also concerned with bass but use different conceptual, bodily, and sensory techniques for achieving their aims.

Despite the differences, what appears to be generalizable across contexts is that audio engineering practices are not reducible to one sense alone. Every widespread form of audio engineering developed to date has depended on the body for the manipulation of interfaces and on audition through headphone or loudspeaker systems. All computer-based audio engineering technologies depend upon the visualization of abstractions of sound and also a visualization of the interface for manipulating sound. However, scholarship on audio engineering has for the most part ignored the sensing body, focusing primarily on the products of audio engineering (i.e., commercially released recordings), on engineering-specific knowledge sets, and on engineering as an art form. My presentation today is a small subset of a much larger project concerning sensory practices and recording occupations. As such, my findings are tentative and intended more to be provocative and exploratory than to stand as a finished research project.

Theorizing the Senses

I argue that in order to understand the production of affect, or perhaps the affect of production, we need to pay attention to bodies, to the senses, to the practices of audio engineering and musicianship. I draw on some seemingly unlikely sources for inspiration, including Brian Massumi’s theorization of the relation between affect, synesthesia, and virtuality; and Charles Hirschkind’s analysis of bodily practices and cultivated “sensorium.”

“For the present is lost with the missing half second, passing too quickly to be perceived, too quickly, actually, to have happened” (Massumi 2002: 30).

Massumi, fusing the experimental psychological research of Benjamin Libet with the philosophical writings of Spinoza and Henri Bergson, is acutely interested in unpacking the relation between so-called “free will” and functions of “higher” consciousness, on the one hand, and autonomic bodily reactions that occur in the brain but are outside of cognitive processes per se, on the other. Among the examples he considers is the mystery of the missing half-second, which in brief is the strange phenomenon of the half-second lag time between an unanticipated sensory stimulus and the brain’s cognition and interpretation of its happening, which is then “back-dated” in time to seem as if the cognition and the stimulus were simultaneous. The most obvious example of this is if you are suddenly burned by scalding water – you feel no pain for at least one-half second, even though all of your pain receptors have fired, as the sensation has yet to be interpreted as pain by your brain.

Why should we care about the missing half-second? Audio engineering, like many musical-technical practices, involves matters of extremely quick timing, of motor movements that appear to happen at will and to result from conscious volition. However, it is impossible for actions of this rapidity to be cognitive or entirely conscious. Therefore, an understanding of these temporal micropractices (or perhaps, microtemporal practices) requires an understanding of the training the body goes through in order to perform audio engineering tasks. Extending this, I argue that audio engineering must necessarily be conceived not as a mono-sensory set of motor skills, but rather as a particular kind of synesthesia. Returning to Massumi:

“Affect is synesthetic, implying a participation of the senses in each other: the measure of a living thing’s potential interactions is its ability to transform the effects of one sensory mode into those of another. Affects are virtual synesthetic perspectives anchored in the actually existing, particular things that embody them” (Massumi 2002: 35).

I am also influenced by the work of Charles Hirschkind, who analyzes the practices of specific forms of Islam in relation to what he terms a particular sensorium. His work concerns what he terms the practice of “ethical listening” present in the contemporary Egyptian da’wa movement, where religious activity works towards cultivating a proper relation between hearing, the heart, and ethical practice:

“I approach the question of the sensorium… from the perspective of a cultural practice through which the perceptual capacities of the subject are honed and, thus, through which the world those capacities inhabit is brought into being, rendered perceptible” (Hirschkind 2001: 624).

The idea of differing, cultivated sensoriums allows us to understand the role of long-term repetitive practices (in the case of Islam, ritualized weeping and “ethical hearing”; in the case of engineering, special modes of listening and certain kinds of engineering-specific tactility and vision) in creating particular modes of being in the world. However, it would be a mistake to suddenly declare the existence of a singular audio engineering sensorium. Rather, audio engineering in different cultural, social, historical, architectural, and technological contexts has come to depend on particular sensoriums. I will sketch out aspects of a few contrasting examples.

Feeling and Timing

Let us consider the matter of audio engineering knowledge through a couple of examples of practices which show that an analysis of knowledge alone is not enough to explain how audio engineering is performed. Returning to Ron’s right arm, his perceptual practices, on an ongoing, real-time basis, inform his choice of EQ, compression, and subharmonic synthesis strategies for addressing the sound of bass in a mix. Although he has a few “default” EQ or compression settings that might be a starting point for further tweaking, ultimately every song, every different electric or acoustic bass instrument produces sequences of timbres that impart unique challenges. There is no visual representation of the bass recording that assists Ron in the process of mixing, and no formula or objective measurement which adequately conceptualizes the field of practice. In other words, it is not conceptualized or abstracted knowledge about bass that informs his particular working style, but rather the simultaneous interaction between a synesthetic sensory disposition towards bass sound and a loosely structured repertoire of signal processing techniques. Clients who come to Ron’s studio often say he “knows” how to mix bass. They’re not totally right: Ron feels how to mix bass.

Ron’s right arm draws our attention to the body as a perceiving organ, but other engineering practices are more apt for demonstrating that missing half-second in action. Let’s consider the once widespread practice of riding the fader during live vocal tracking. We could be tempted to say that the engineer “knows” what the vocalist is going to sing and anticipates the singer’s every move, but this misses the crucial detail that the timing window is so narrow that it is physically impossible, due to that half-second rule, for the engineer to have conceptualized every fader move they make. An engineer who is comfortable with the technique of fader riding – and this is a technique which requires a lot of practice – harnesses certain kinds of knowledge, but ultimately what they do entails a semi-autonomic motor process whereby their fader moves correlate near-immediately with raw sensory data, including auditory stimulus (the sound of the singer) and visual information (the sight of the singer). In short, fader riding is thoroughly non-cognitive. Knowledge of fader-riding, largely a collection of reflections on prior successful and unsuccessful attempts, does help shape a kind of special disposition of the engineer. We could even say that, in part, knowledge preforms the practice. But that knowledge alone cannot explain the immediacy of the practice, let alone the sensorium of the fader-riding engineer.

The production of büyük ses¹: percussion editing in Istanbul

I conducted over two years of intensive field research in numerous recording studios in Istanbul. Two things stood out to me: the incredible speed with which every part of the recording process unfolded, and the considerable manual dexterity of everyone involved, most notably the studio musicians and the engineers. To put it into perspective, it was not uncommon for an entire album’s worth of 36- to 60-track mixes to be conceived, arranged, tracked, edited, mixed, and mastered within a five-day period. No prefabricated samples were used and only a single musician was tracked at a time; tracking consisted primarily of acoustic studio musician overdubs and doubles. This work unfolded in ProTools HD facilities with a minimum of outboard gear, installed in fabric-covered concrete rooms that were not originally designed for music tracking. Due to the extreme nonlinearity of the rooms, and the presence of many null points and standing waves, bass frequencies presented the most significant problem – how to track them, how to fit them into a mix, how to perceive their balance within a mix, how to engineer them.

The first point about Istanbul recording production and the issues of tactility and synesthesia concerns the general sense among engineers that the sound coming from speakers was misleading and unlikely to translate from one room to another. Consequently, the visualization of sample data in the ProTools editing window was used to garner critical information about the sound of the mix. One interesting linguistic turn epitomizes a common interchange between the arranger and engineer: bu miks nasıl görünüyor? – bu miks bitmiş gibi görünüyor. “How does this mix look? – It looks like it could be finished.”² In this case, the expression is not metaphoric, but rather literal, as both parties stare intently at the LCD monitors, appraising the visual impression of the waveform representations of the two-track mixdown as shown on the screen. The visualization corresponds, perhaps, to an auditory image of what the mix could sound like on an imaginary sound system in an imaginary studio, albeit one that the arranger and engineer will never have access to.

A related issue concerns percussion arrangement. The predominant arrangement aesthetic consists of numerous layered percussion instruments of local and foreign origin, creating a complex, polyrhythmic texture coalescing on a small number of strong accents. With the multitude of instruments available today, and the seemingly limitless track count afforded by ProTools HD, this has led to arrangements with ten or more unique instruments playing simultaneously, many with significant energy in the low bass frequency range. In analyzing one innovative percussion arrangement of the song “Gülçini,” a traditional 7/8 horon dance piece from the Eastern Black Sea region of Turkey, it is instructive to see how the primary four-bar groove was constructed (Figure 1). First, the basic askı-davul part was tracked. This part has the closest relation to what could be considered an authentically local (asıl, yerli) performance of the rhythm for a kemençe horon ensemble and dance context.³ Amongst the important musical features, this part has the correct pattern of accents, the correct relative dynamic contrast between accented and unaccented beats, and an appropriate groove (which can be defined as a pattern of expressive microtimings, of events that don’t correspond with a metronomic division of the bar).

Next, the two cajon drum rhythms were overdubbed, with percussionist Soner Akalın listening to a mix of metronome and the askı-davul part. These parts add additional low energy on the downbeats of every measure, and on some of the third and fifth beat accents as well. Following this, the udu, frame drum, and tambourine parts were tracked separately to the mix of askı-davul and the two cajon drums. At this point, the seven-part percussion arrangement had become quite a strong dance rhythm, with four drums providing bass frequency components. As if this seven-part arrangement were not enough, the song arrangers (Aytekin Gazi Ataş and Soner Akalın) decided that there was a need for additional sounds more explicitly evocative of dance, so Aytekin and Soner went into the tracking room, laid down a large plywood box, and overdubbed themselves stomping on it. Four stereo tracks of that were created, as well as three stereo tracks featuring aspirated inhales and exhales on strategic downbeats. A staff-notated rendition of the complete percussion arrangement can be seen in Figure 2.

Figure 3 shows the ProTools visualization of just one measure, in particular the six parts that compete for space in the bass frequency range. Among these parts, the event attacks on the strong beats of the measure (one and five) appear to have been performed at different moments in time, ranging from 30 ms prior to the bar line (the first cajon) to 5 ms after the bar line (the second cajon). While at first it may seem that this is due to some inaccuracy on the part of the percussion performances, or a lack of attention to quantizing the beats, this is far from the case. The audible effect of this arrangement does not convey the sense of multiple nonaligned attacks. Due to the nature of auditory event perception, delays of less than 60 ms between events of similar timbres are typically not perceived as separate entities but rather as acoustic reflections of a single entity (Chowning 1999). Thus, the deviation of up to 35 ms in event attacks produces an effect which sounds like one very long, evolving, and timbrally complex bass drum sound.
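The arithmetic behind this observation can be spelled out in a minimal Python sketch. It assumes the 60 ms figure acts as a hard threshold, which is a simplification of the perceptual account. Only the two cajon onsets are taken from the measurements reported here; the function names and the other two onset values are hypothetical, included purely for illustration.

```python
# Perceptual-fusion window reported in the text (Chowning 1999), in milliseconds.
FUSION_WINDOW_MS = 60.0


def onset_spread_ms(onsets_ms):
    """Return the gap between the earliest and the latest onset, in ms."""
    return max(onsets_ms) - min(onsets_ms)


def fuses_into_one_event(onsets_ms, window_ms=FUSION_WINDOW_MS):
    """True if the latest onset falls within the fusion window of the earliest."""
    return onset_spread_ms(onsets_ms) < window_ms


# Onsets relative to the bar line, in ms (negative = before the bar line).
onsets = {
    "cajon 1": -30.0,     # from the text
    "cajon 2": 5.0,       # from the text
    "aski-davul": -12.0,  # hypothetical value
    "udu": -4.0,          # hypothetical value
}

spread = onset_spread_ms(onsets.values())
print(f"spread: {spread:.0f} ms, fused: {fuses_into_one_event(onsets.values())}")
```

Under these assumptions the spread between the earliest and latest attack stays well inside the fusion window, which is consistent with the strokes being heard as a single extended event rather than as a flam.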

There are two things at play here: the first is a novel performance practice of studio percussionists, who while tracking deliberately offset particular attacks not in order to alter the groove, but instead to create parts that contribute to the illusion of a single, huge bass drum sound. Studio musicians have also developed ways of playing certain instruments while minimizing the volume of the attack component of each separate event. These techniques differ markedly from any traditional performance practice used for live performance on the same instruments.

The second element concerns the practice of engineers in using digital editing to deliberately offset bass-intensive events, when the effect isn’t successfully produced by the studio musician. As I noted earlier, hearing bass in the studio is a precarious operation, and therefore this latter operation is typically done visually by engineers, with the stereo master output of the DAW being used to measure the success. I should point out that no compression is ever used on percussion, and rarely is any EQ applied to the bass frequencies of sound sources. Therefore, this combination of practices ensures that peak amplitudes of multiple sound sources don’t combine to overload the output of the mix.
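The effect on peak level can be illustrated with a toy simulation. This is a sketch under strong assumptions, not a model of the actual session: each stroke is represented by an identical decaying 60 Hz sine, whereas the real instruments differ in timbre, and the sample rate, decay time, and all offsets other than the two extremes (30 ms early and 5 ms late) are hypothetical. The names `drum_hit` and `mix_peak` are illustrative only.

```python
import math

SAMPLE_RATE = 48_000  # samples per second (assumed value)
LEAD_IN_S = 0.05      # silence before the bar line so early strokes fit in the buffer


def drum_hit(offset_ms, freq_hz=60.0, decay_s=0.08, length_s=0.4):
    """Toy bass stroke: a decaying sine starting offset_ms after the bar line."""
    start_s = LEAD_IN_S + offset_ms / 1000.0
    samples = []
    for n in range(int(length_s * SAMPLE_RATE)):
        t = n / SAMPLE_RATE - start_s  # time since this stroke began
        if t < 0:
            samples.append(0.0)  # stroke has not started yet
        else:
            samples.append(math.exp(-t / decay_s) * math.sin(2 * math.pi * freq_hz * t))
    return samples


def mix_peak(offsets_ms):
    """Peak absolute amplitude of the unprocessed sum of one stroke per offset."""
    hits = [drum_hit(offset) for offset in offsets_ms]
    return max(abs(sum(frame)) for frame in zip(*hits))


# Six bass-heavy parts summed with no compression or EQ.
# Only the extremes (-30 ms and +5 ms) come from the text; the rest are hypothetical.
aligned_peak = mix_peak([0, 0, 0, 0, 0, 0])
staggered_peak = mix_peak([-30, -22, -14, -6, 0, 5])

print(f"aligned peak:   {aligned_peak:.2f}")
print(f"staggered peak: {staggered_peak:.2f}")
```

Identical strokes are the worst case for summation, since perfectly aligned copies add sample for sample. Any staggering keeps the individual maxima from coinciding, so the summed peak comes out lower than in the aligned case.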

I began my discussion of Turkish recording workflows with discussion of the speed of the recording process, as well as a mistrust of, or perhaps nondependence on, studio listening. My observations in Turkey indicate that specially cultivated visual practices – seeing when the mix is done – as well as precise motor control in studio musicianship and mouse-keyboard based digital audio editing, form an integral part of the synesthesias and sensoriums of recording professionals in Turkey. Of course, listening does play a part in the creation of this music. However, what is being listened for, and how studio-situated listening relates in real-time to other sensory practices, and to what perhaps we might call a sense of imaginary acoustic ideation, has a local specificity.


Steven Connor has written extensively on the mistrust and fear of tactility and the sensing body that pervades Western historical and ethnographic writing (Connor 2000, 2004). Such feelings run deep, and have had pronounced effects both on the techniques and the subject matters of academic scholarship, although this division between knowledge and practice has not always been the norm. Inside the U.S. studio milieu, abstract knowledge is highly prized as a mark of professionalism. However, the discourses in and surrounding the studio sometimes blur an important reality: that studio work is ultimately a practice, is a craft, is something that not only requires touch and the sensing body but is first and foremost a tactile art. To understand the practices of audio engineering is to understand the synesthesias and sensoriums of audio engineering in different milieus.

Some scholars have written about “golden ears” engineers, those who have amazing listening skills that allegedly surpass those of mortal humans. I ask, what about the engineers with golden eyes and golden forearms?


My research was logistically made possible by a State Department Fellowship from ARIT (American Research Institute in Turkey) and a Fulbright IIE grant. I wish to offer particular thanks to Aytekin Gazi Ataş, Ömer Avcı, Benjamin Brinner, Ladi Dell’aira, Jocelyne Guilbault, Charles Hirschkind, Ayşenur Kolivar, Urum Ulaş Özdemir, and Paul Théberge, who provided invaluable comments on earlier versions of this paper.

About The Author

Dr. Eliot Bates

University of Maryland, College Park


1. Büyük ses literally means “big sound,” and is an aesthetic feature of mixes that developed in the 1990s and became a standard mix paradigm by the early 21st century. Although related to Western concepts of loudness, the büyük ses aesthetic has indigenous origins, yet no known precedents in Anatolian traditional performance practices. See Bates (2010) for a more extensive analysis of büyük ses.

2. The producer, in Turkey, is typically absent from the recording workflow and functions largely as the financier of a project. The aranjor (arranger) refers to an individual who orchestrates a piece and manages the workflow of the recording sessions, yet unlike Western producers lacks the same degree of “creative liberty.” See Bates (2008) for more on Turkish studio work.

3. The kemençe horon ensemble consists at a minimum of a solo singer playing a kemençe (three-stringed box fiddle from the Eastern Black Sea region). More commonly, the kemençeci (kemençe player)/singer is joined by a chorus of voices provided by the horon (line) dancers. In the city of Trabzon and its surrounding area, a single askı-davul drum may accompany the kemençe and singing. See Picken (1975) for a historical account of the kemençe.


Bates, Eliot. 2008. Social interactions, musical arrangement, and the production of digital audio in Istanbul recording studios. Ph.D. dissertation. University of California, Berkeley.

Bates, Eliot. 2010. Mixing for parlak and bowing for a büyük ses: the aesthetics of arranged traditional music in Turkey. Ethnomusicology 54(1), forthcoming.

Chowning, John. 1999. Perceptual fusion and auditory perspective. In Music, cognition, and computerized sound: an introduction to psychoacoustics, edited by P.R. Cook. Cambridge, MA: MIT Press.

Connor, Steven. 2000. Dumbstruck: a cultural history of ventriloquism. Oxford: Oxford University Press.

Connor, Steven. 2004. The book of skin. Ithaca: Cornell University Press.

Hirschkind, Charles. 2001. The ethics of listening: cassette-sermon audition in contemporary Egypt. American Ethnologist 28(3): 623–649.

Iyer, Vijay. 2002. Embodied mind, situated cognition, and expressive microtiming in African-American music. Music Perception 19(3): 387–414.

Massumi, Brian. 2002. Parables for the virtual: movement, affect, sensation. Durham: Duke University Press.

Picken, Laurence Ernest Rowland. 1975. Folk musical instruments of Turkey. London: Oxford University Press.


Kabaosmanoğlu, Yaşar. 2006. ‘Gülçini’ on Rakani. Metropol Müzik Üretim.

Figure 1: “Gülçini” basic askı-davul rhythm

Figure 2: “Gülçini” nine-part percussion arrangement

Figure 3: ProTools visualization of the first measure of “Gülçini,” showing the staggered timings of low frequency percussion strokes.