Some time ago I was skimming through a handful of books dealing with music production and spatial audio when preparing the curriculum for a university course. This led me to the discovery of a peculiar combination of claims regarding how to stage recorded sounds in the depth field:
“Sounds that are closer to us are louder and distant sounds are softer, therefore, the volume of a sound in the mix can be mapped out as front-to-back placement.”
– David Gibson: The Art of Mixing (2005, p. 23)
“One of the most useful cues for range estimation is loudness.”
– Carlos Avendano: ‘Virtual Spatial Sound’ (2004, p. 352)
“The loudness of an instrument affects its balance in the mix; softer instruments also sound a bit farther away. But the primary influence on perception of depth and distance is the amount of early reflections and reverberation.”
– Bob Katz: Mastering Audio (2007, p. 231)
“Distance is NOT loudness. In nature, distant sounds are often softer than near sounds. This is not necessarily the case in recording production. Loudness does not directly contribute to distance localisation in audio recordings.”
– William Moylan: The Art of Recording (2002, p. 185)
“Reverbs are the main tool we use when positioning sounds in the depth field.”
– Roey Izhaki: Mixing Audio (2008, p. 403)
In these five quotes there is notable disagreement concerning the importance of loudness as a distance cue. Gibson (2005) and Avendano (2004) describe loudness as the most important depth cue, while Katz (2007), Moylan (2002) and Izhaki (2008) find that loudness is less important, or even irrelevant, for auditory depth estimation in recordings. So, is loudness an important depth cue in record listening, or an irrelevant one? It is tempting to conclude that half of these scholars have got it all wrong. After all, from the point of view of psychoacoustics, loudness cannot be both an important and an unimportant depth cue. However, as I explain below, relying on psychoacoustics as a means to explain the spatiality of recorded sound may simply provide too narrow a framework for meaningful interaction with sound in music production.
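The physical regularity both camps are responding to is easy to state: in a free field, sound pressure level falls by roughly 6 dB for every doubling of distance from the source (the inverse-square law). A minimal sketch of this relation (the function name and reference values are mine, purely illustrative):

```python
import math

def spl_at_distance(spl_ref_db, d_ref, d):
    """Free-field inverse-square law: level drops ~6 dB per doubling of distance."""
    return spl_ref_db - 20 * math.log10(d / d_ref)

# A source measuring 80 dB SPL at 1 m:
print(round(spl_at_distance(80, 1, 2), 1))  # ~74.0 dB after one doubling
print(round(spl_at_distance(80, 1, 4), 1))  # ~68.0 dB after two doublings
```

This is why loudness is a plausible distance cue in nature; the disagreement above concerns whether the cue survives the artifice of record production, where levels are set freely at the mixing desk.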
Psychoacoustics is essentially the study of how hearing works, that is, how sounds are processed by the ear. Thus, it assumes a conceptual judgement of sound that is directly tied to an external (mind-independent) reality. This has little to do with sense-making per se, which, in the main, belongs to the field of cognition. It could even be argued that psychoacoustic evidence does not possess any real value for the aesthetic decisions that govern the tasks mixing engineers are faced with (e.g. how does a mix engineer meaningfully and intuitively respond to a mix that requires certain spatial adjustments?). Still, psychoacoustics provides a common framework in the literature on music production (e.g. Rumsey: 2001; Corey: 2010), although in recent years more writers have applied cognitive approaches to the study of recorded music and record production (e.g. Moore, Schmidt and Dockwray: 2009; Zagorski-Thomas: 2014).
In this paper spatiality serves as an example of a much larger discussion pertaining to how music producers make sense of their actions when mixing. The focus is to explore the embodied cognitive abilities that govern interaction with sound in the music studio. I frame my discussion around cognitive linguistic theory (Lakoff: 1987; Johnson: 1987; Kövecses: 2002; Gibbs: 2008), building on the embodied mind paradigm. Using this framework, I discuss how conceptual metaphors and image schemas function as hidden structures in our cognitive system that we use to make sense of the often abstract—or at least multifaceted—tasks required for interaction with sound. I propose that music mixing builds on three master metaphors: the signal flow metaphor, the sound stage metaphor, and the container metaphor. Each of these metaphors is discussed below. Finally, I present some future perspectives for interface designs that better reflect the different ways sound engineers, producers and designers make sense of their interaction with the mix.
Cognitive Metaphors in User Interface Design
Relying on visual metaphors is often seen as a means to design more intuitive user interfaces. In a recent article, Bell and colleagues (2015) present a history of visual design metaphors, where they assess recording software’s typical dependence on visual design metaphors—skeuomorphism—from analogue music production technology. They then argue that this dependence is likely to change in the future; and perhaps hardware design will even be modelled on software design as users come to adapt to novel interface schemes from music production software. Indeed, visual metaphors can assist the user in recalling prior knowledge in order to complete the task at hand. Visual metaphors, however, only have this function if they are familiar to the user. Furthermore, the user’s understanding and ability to interact are also shaped by prior knowledge and experiences that are not tied specifically to music production.
The principles of cognitive metaphor theory—building on the idea of the embodied mind— provide a much broader platform to assess music production interface design. Several researchers in sonic interaction design, for example, have used the principles of embodied cognition to map body movements onto parameters of sound (Wilkie: 2009; Loeffler et al.: 2013). As an example of the benefits of this approach, Antle and colleagues (2009) have shown that interfaces built on embodied concepts are much easier to learn.
The original principle of cognitive metaphor theory—when the field emerged more than three decades ago—is that actions, reasoning, and thought are structured by cognitive schemas that are formed through previous sensory-motor experiences. A metaphorical system operates at the back of our minds in every action and it allows us to make sense of what we do and to act appropriately. While many of these embodied sensory-motor experiences are nearly universal to humans, more recent research in cognitive linguistics shows that our cognitive capacities are also shaped by specific cultural understandings and learning processes (Beilock: 2009; Slepian: 2014). Hence, cognitive metaphors are not stable. While they are based, to some degree, on previous sensorimotor experiences common to most humans, they are also shaped by culture and other contextual aspects, such as the person’s specific training, cultural understanding and the historical context. For the sake of clarity these conditions are listed below:
- Previous sensorimotor experiences
- Contextual understanding (history, culture)
- Learned concepts
In the following section, I outline how each of these conditions for embodied knowledge contributes to the forming of the three master metaphors in music production. In doing so, I discuss how the cognitive structures governing sense-making in music mixing have changed with new technologies and new terminology in the field. For example, a sound engineer who is taught to use a specific set of linguistic concepts for a specific music mixing operation may possess different embodied cognitive capacities, and respond differently to a sound stimulus, than another sound engineer who learned to use other concepts for the same operation.
As already mentioned, I base my thesis on three master metaphors. I will provide closer descriptions of each of these in the following sections but it is worthwhile presenting them here in the form of graphical representations (fig. 1) that highlight a few key differences.
Figure 1: Three cognitive metaphors governing music production
The signal flow metaphor is represented as a channel strip—a piece of technology that, here, serves as a symbol for a technology-driven mode of reasoning in music mixing. The sound stage is a representation of the phenomenal space of the recording that uses the actual performance space in live music as a metaphor to make sense of a recording’s spatiality. Finally, the container metaphor takes the form of a cognitive schema that operates as an organizing structure or mental representation of experience and action. These metaphors, then, operate from different perspectives—technology, phenomenal world, and cognition— and they serve here as examples of how different approaches to mixing might be theorized.
The Signal Flow Metaphor
The signal flow metaphor is one of the earliest and most persistent metaphors in the age of recording technology. As the name indicates, it started taking shape with electrical recording technology in the 1920s, where sound was no longer recorded directly onto disc or cylinder, but had to follow an electrical signal path from source through microphone then onto disc. Adjustments to the electrified signal could be made at gain stages that could either interrupt the signal flow or allow some part of the signal to pass through.
The specific form of the signal flow metaphor I explore here found its present form by the end of the 1950s with the implementation of the dynamic fader in mixing desk design. It is generally acknowledged that Tom Dowd was the first person to install dynamic faders on a mixing desk. As a sound engineer at Atlantic Studios, New York, he was among the first group of sound engineers with access to an eight-track Ampex recorder (number three, in fact; Mitch Miller and Les Paul owned the first two, see Cunningham: 1996). Dowd found it difficult to operate eight tracks with rotary knobs and therefore decided to replace the knobs with dynamic faders. With eight dynamic faders at his disposal he could adjust several tracks at the same time with only one finger on each fader, which allowed him to ‘perform’ the mixing process in a much more intuitive way (Moorman: 2003).
Now, as we know, the faders were made to function in such a way that the volume increases when the fader is pushed forward/up and vice versa. Naturally, most engineers today take the direction of the fader movements for granted—after all, this layout has been around for more than half a century. However, if we instead imagine a set of ‘reversed’ faders, where the sound levels increase when you pull the faders towards you, the motor action the engineer performs would fit the idea that increasing loudness makes sound sources appear closer, as Gibson and Avendano propose (i.e. you pull the fader towards you when you want the sound to appear closer, and vice versa). So, why are fader movements not designed this way as default? The classic fader design makes sense from the point of view of signal routing. If full gain is brought up to the faders, the signal is allowed to pass through when you open up the gate by pushing the faders forward/up, and the gate is closed when you pull the faders towards you/down. There is thus a logical link between the way the sound flows through the technology and the actions the sound engineer performs.
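The contrast between the two layouts can be made concrete with a small sketch. Both functions map a normalized fader position (0.0 = pulled fully towards you, 1.0 = pushed fully forward/up) to a linear gain; the function names and the linear mapping are my own illustrative assumptions, not a description of any particular desk:

```python
def classic_fader_gain(position):
    """Signal flow logic: pushing the fader forward/up opens the 'gate'
    and lets the signal pass; pulling it towards you closes the gate."""
    return position

def reversed_fader_gain(position):
    """Hypothetical depth logic: pulling the fader towards you raises the
    gain, matching the intuition that louder sources appear closer."""
    return 1.0 - position

# The same physical gesture means opposite things under the two metaphors:
assert classic_fader_gain(1.0) == 1.0   # forward/up -> signal passes through
assert reversed_fader_gain(1.0) == 0.0  # forward/up -> sound 'moves away'
```

The classic mapping encodes the technology’s point of view (the flow of signal through a gate), while the reversed mapping would encode the listener’s point of view (proximity in the depth field).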
More interesting is the fact that, more than three decades into the age of digital technology, the signal flow metaphor is still the key model for the design of music mixing technology—both in the hardware and the software domain. Graphical representations of the analogue channel strip layout are found in most of today’s DAWs, not to mention buttons illustrating tape loops, scissors, glue and other tools from the age of analogue recording. Given the opportunities to redesign music production interfaces with digital technology, one would think that there are no, or only few, technological constraints preventing interface designers from changing the metaphor. Implementing the logic of the analogue in the digital has, of course, obvious benefits in terms of allowing music producers to make a smooth transition from one domain to the other, but the reasons for sticking to this logic are fading.
The signal flow metaphor is a ‘learned’ metaphor reflected in textbooks on music production and represented in much of the recording technology we operate. Thus, its strength is primarily related to the fact that many mix engineers have learned to operate equipment modelled on this logic—rather than because the metaphor reflects more universal embodied capacities. If this thesis is right, it is fair to speculate that the signal flow metaphor will become weaker in the years to come—or perhaps gradually disappear. Many students enrolling on music production courses at universities today have never, or only occasionally, laid their hands on analogue equipment; needless to say, the way they make sense of music production technology is not necessarily as bound up with the logic of analogue equipment as it is for their older colleagues. It is, therefore, reasonable to believe that other interface metaphors will grow stronger in the future and lead to music production technologies that afford other modes of interaction.
The Sound Stage Metaphor
In 1992 William Moylan (2002) proposed the concept of the sound stage as an analytical tool for evaluating stereo mixes. Although Moylan himself envisioned the sound stage not as a metaphor but as a kind of perceptual object (in fact, Moylan explicitly states that metaphors are weak concepts that should be omitted from analyses of sound), it makes sense to think of the sound stage as derived from a performance metaphor, where individual sounds ‘belong’ to a performer located on a two-dimensional stage. The task of the mix engineer is then to distribute these sound sources in an appropriate way.
If we think of tracks as perceived performances, every sound is an index for what produced it and for the physical gesture involved in its production. Meaningful interaction with recorded music thus relies on visual imagery where the location of musicians on a stage is brought to mind. Albin Zak (2001) proposes that the “natural image” serves mainly as the point of reference for further manipulation and the creation of different forms of unique textures that are created for more dramatic effect and distinctiveness:
“The point of reference for a conventional stereo mix of a rock band is the visual image presented to a listener facing a bandstand. The vocalist is at centre stage with the drummer just behind — the kick and snare drums are dead centre, the high hat to one side, toms and cymbals spread from left to right. Because of its powerful sonic presence and its function as anchor of both groove and chord changes, the bass is also usually placed in the centre. Other instruments in the arrangement are spread across the stereo spectrum, as they would be on stage, balanced in such a way that all can be clearly distinguished. This ’natural’ image is often manipulated for expressive effect, however, and a distinctive stereo image is often a thematic feature of a track.” (Zak: 2001, p. 145)
In Zak’s account, it is still the perceived performance that serves as the underlying model for interacting with the recording. The creation of more expressive effects could then be viewed as ways to extend and enhance the recording’s expressiveness, while the underlying metaphor ties the different auditory experiences to a coherent phenomenal situation, where the listener participates in an imagined performance event. Philip Auslander has called the affective state emerging from this participation the experience of liveness by arguing that recordings form a technologically mediated co-presence of a performance, in spite of the fact that recordings represent a temporal gap between the performance and the reception (Auslander: 1999). The notion of liveness thus corresponds to the idea that recordings can create a perceived performance environment (Moylan: 2002), that is, the illusion of a shared space that includes both the listener and the perceived performance.
Although one of the key tasks for many music producers is to design mixes that produce such affective relations in listeners, few interface designers have made any use of the sound stage metaphor. The introduction of digital audio in some of the big studios during the 1980s, and the shift to mixing interfaces based solely on computers in the 1990s, offered every opportunity to implement the sound stage in the graphic design, but the signal flow metaphor, coupled with the analogue tape metaphor, remained the dominant basis for the design of DAWs.
Recently, Steven Gelineck and colleagues (2013) developed a music-mixing controller for iPad—Tangible Mix (fig. 2)—that builds on the sound stage metaphor. In this interface, dragging elements towards you makes them appear closer to the front edge of the sound stage and vice versa. The interface allows the user to make sense of the recording’s spatiality in terms of the sound stage and also to interact with the sound without having to mentally shift back and forth between the signal flow metaphor and the sound stage metaphor. The application, then, combines the evaluation methods Moylan proposed—methods that are taught on many music production programmes at universities around the world—with the practical aesthetic decisions required of the engineer in the mixing process. As part of a current research project (Gelineck and Walther-Hansen: 2016) the interface is being evaluated in specific teaching situations at Aalborg University, Denmark, where students are asked to use the interface to create mixes that fit specific style idioms. These tasks provide a means to further assess the suitability of the sound stage metaphor for different tasks in different parts of the mix process.
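The translation such an interface has to perform can be sketched in outline: a position on the stage surface is mapped to conventional mix parameters. The parameter ranges and the specific level and reverb curves below are my own assumptions for illustration, not Tangible Mix’s actual implementation:

```python
def stage_position_to_mix_params(x, y):
    """Map a position on a touch surface to mix parameters under the sound
    stage metaphor. x in [-1, 1] is left-right placement; y in [0, 1] is
    depth, with y = 0 the front edge of the stage (nearest the listener).

    The mappings are illustrative: level falls and the reverb send rises
    as a source moves towards the back of the stage.
    """
    pan = x                      # left-right placement maps directly to pan
    level_db = -12.0 * y         # farther back -> softer (down to -12 dB)
    reverb_send = 0.1 + 0.8 * y  # farther back -> more reverberant
    return {"pan": pan, "level_db": level_db, "reverb_send": reverb_send}

# Dragging a source towards the listener (y -> 0) makes it louder and drier:
front = stage_position_to_mix_params(0.0, 0.0)
back = stage_position_to_mix_params(0.0, 1.0)
assert front["level_db"] > back["level_db"]
assert front["reverb_send"] < back["reverb_send"]
```

The point of such a mapping is that the user manipulates a single spatial gesture, while the interface translates it into the several signal-flow operations (fader, pan pot, aux send) that would otherwise have to be performed separately.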
Figure 2: Screen shot of ‘Tangible Mix’
The Container Metaphor
In a recent article (Walther-Hansen: 2015), I used a specialized linguistic corpus (approx. 50 million words) to explore the language of sound engineers and other audio professionals. I found that the concepts sound engineers use to describe sound and the actions they perform to edit the sound often point to an underlying cognitive container schema that governs how they experience recorded sound and how they describe sound quality in language. Consider the following examples from the corpus:
- Bring out the vocal a little bit more
- Open up the sound
- Compress the sound more
- The mix sounds too crowded
- I needed a longer reverb to fill in spaces
- Keep a hole open in the middle
While these expressions may appear random, they are, in fact, highly structured. The expressions (metaphors are in italics) are different representations of the container metaphor, and the examples show how this schema structures our experience of sound and how we put this experience into words. The recording is not physically a container, but we impose a container schema onto the recording to make sense of it, and to make sense of our actions when interacting with sound. Consider, also, how we might say that sounds and sound sources are in the recording, that sounds should be compressed, expanded, and so on. While some tracks may generate an experience of an empty or only partially full container, other tracks may activate a sense of fullness.
Following the internal logic of this metaphor one might say that some sound sources in the recording take up more space than other sound sources. Low frequency sounds, for instance, are often characterised as both larger and heavier than high frequency sounds. Also, loud level sounds often appear larger than low level sounds (Gibson: 2005). Sounds thus have different sizes that take up more or less space in the container. The boundaries of the container may have different characteristics. We can think of containers that enclose the sounds in different ways, providing a more closed or open structure; for example, in some tracks the sounds may seem fixed and constrained, while other tracks have sounds that are more loosely constrained (see Walther-Hansen: 2014 for a broader overview of the container metaphor).
Although, as I have previously argued (Walther-Hansen: 2014; 2015), the container metaphor is a prevalent and established metaphor in the language of sound, and one deeply grounded in embodied sensorimotor experiences, it has yet to be adopted as a model for the design of music production technology. The reasons are undoubtedly tied to the lack of an obvious way to put the container metaphor into operation with present-day technology: it would require some kind of mid-air interface with haptic feedback that allows for precise adjustments of audio (fig. 3).
Figure 3: Mid-air interaction
However, a number of technologies point to the possibility of such interfaces becoming a reality in the future. One of them is Disney’s AIREAL system (Sodhi et al.: 2013), which uses compressed air to provide haptic feedback in mid-air. Second – and more promising – a group of researchers from the University of Bristol has developed a prototype of a mid-air interface that provides haptic feedback with ultrasound (Long et al.: 2014). Unlike Disney’s system, this allows the user to shape objects in mid-air with haptic feedback, which in turn allows the user to move elements of sound around, to expand the container, to compress the container and so on in three-dimensional, mid-air interaction.
I suggest here that mid-air interface technology offers—at least in theory—the prospect of realizing the container metaphor in music production, yet these technologies are currently lacking in a number of areas (e.g. the detail with which you can control ‘objects’ in free air). As these deficiencies are remedied, or come to be accepted by users, mid-air interfaces are likely to provide the technological means for a more intuitive interface.
Conclusion and Future Perspectives
In this paper I have proposed three master metaphors for music production that govern interaction at many levels of the creative process. The goal of this research is to facilitate the design of new interfaces for music production that more closely reflect how the user makes sense of the mix and allow for more intuitive interaction with recorded sound. My study shows that the three metaphors—signal flow, sound stage, and container—function in different ways as mental representations of the mixing process, and these mental representations are activated when performing different tasks (e.g. interacting with technology, evaluating a track, etc.), which allows us to assess their applicability as interface metaphors.
The signal flow metaphor makes sense from the point of view of technology and thus fits into an ‘engineering’ logic, where an understanding of the ‘flow of sound’ through the mixing desk is the key element. It ensures a systemic understanding of how the technology works and it allows the engineer—assuming the engineer has learned to operate the equipment—to make a causal connection between the operations he/she makes and the alterations made to the sound at different stages as it flows through the system.
The sound stage metaphor arises from a search for phenomenal coherence in recorded sound. The sound stage is a mental representation of an experienced auditory ‘reality’, where sounds are causally related to imagined physical sources located on an imagined stage. This metaphor can be (and is) implemented with current technology and it allows for a direct link between auditory stimuli, visual feedback and haptic interaction. Still, it presupposes a view on sound as sound sources, that is, a causal form of listening (Chion: 1994), that affords certain kinds of interactions (e.g. moving sound sources around in the imagined space of the recording) at the expense of others (e.g. shaping the sound’s timbral qualities).
The container metaphor functions as a cognitive schema that is activated when approaching sounds as shapeable objects. This is the position of the sound designer or producer, rather than the engineer. As suggested earlier, if we understand and reason about sound as metaphorical objects and the imagined space of the recording as a metaphorical container, the best way to convert this understanding to meaningful interaction is to develop interfaces that allow us to shape and move sounds as if they were three-dimensional objects and the entire mix as if it were a three-dimensional container. This is not, or not yet, possible with current technology, but there are certainly future prospects for the realization of the theoretical framework presented here.
As previously mentioned, it seems likely that we use different metaphors in different parts of the process. In that case, the ideal controller interface should change its appearance to fit each part of the process. Future studies should explore which metaphors are most relevant in any given part of the music production process, relevant to specific genres, or relevant as a framework to work towards specific aesthetic ideals and so on. Changes to the master metaphor of music mixing also lead to questions about what to teach our music production students in the future. Will the gap between engineers (who know how the signal flow works) and designers or producers (who are primarily concerned with the quality or aesthetics of the sounds) grow bigger? And do the latter need to know about signal flow in analogue mixing desks at all? If not, what kind of interfaces should they learn to operate? Will it then be possible to focus exclusively on the ‘design’ tasks of music production as opposed to the more traditional ‘engineering’ tasks? Such questions are again linked to the conditions of sense-making and how well the technology we use reflects our thinking; to this end, I have here presented different forms of sense-making that can serve as models for future advances in this field.
Antle, A. N., Corness, G. and Droumeva, M. (2009). ‘What the Body Knows: Exploring the Benefits of Embodied Metaphors in Hybrid Physical Digital Environments’. In: Interacting with Computers, 21, pp. 66–75.
Auslander, P. (1999). Liveness: Performance in a Mediatized Culture. Oxon: Routledge.
Avendano, C. (2004). ‘Virtual Spatial Sound’. In: Y. Huang and J. Benesty (eds.), Audio Signal Processing for Next-generation Multimedia Communication Systems. Kluwer Academic Publishers.
Beilock, S. L. (2009). ‘Grounding Cognition in Action: Expertise, Comprehension, and Judgment’. In: Progress in Brain Research, 714.
Bell, A., Hein, E. and Ratcliffe, J. (2015). ‘Beyond Skeuomorphism: The Evolution of Music Production Software User Interface Metaphors’. In: Journal of the Art of Record Production, Issue 09, April.
Chion, M. (1994). Audio-Vision: Sound on Screen. New York: Columbia U.P.
Corey, J. (2010). Audio Production and Critical Listening: Technical Ear Training. Focal Press.
Cunningham, M. (1996). Good Vibrations: A History of Recording Production. Chessington, Surrey: Castle Communications.
Gelineck, S. et al. (2013). ‘Towards an Interface for Music Mixing based on Smart Tangibles and Multitouch’. In: New Interface for Musical Expression. Daejeon/Seoul, Korea Republic: Association for Computing Machinery.
Gelineck, S. and Walther-Hansen, M. (2016). ‘Tangible Mix as a Learning Tool for Music Production’. Forthcoming presentation at the Art of Record Production Conference, Aalborg, Denmark.
Gibbs, J. R. W. (2008). The Cambridge Handbook of Metaphor and Thought. Cambridge and New York: Cambridge University Press.
Gibson, D. (2005). The Art of Mixing: A Visual Guide to Recording, Engineering, and Production. Boston: Thomson Course Technology.
Izhaki, R. (2008). Mixing Audio: Concepts, Practices and Tools. Boston: Focal Press.
Johnson, M. (1987). The Body in the Mind: The Bodily Basis of Meaning, Imagination, and Reason. Chicago London: The University of Chicago Press.
Katz, B. (2007). Mastering Audio: The Art and the Science. Oxford: Focal Press.
Kövecses, Z. (2002). Metaphor: A Practical Introduction. Oxford and New York: Oxford University Press.
Lakoff, G. (1987). Women, Fire and Dangerous Things: What Categories Reveal about the Mind. Chicago and London: The University of Chicago Press.
Loeffler, D. et al. (2013). ‘Developing Intuitive User Interfaces by Integrating Users’ Mental Models into Requirements Engineering’. In: Proceedings of the 27th International BCS Human Computer Interaction Conference.
Long, B. et al. (2014). ‘Rendering Volumetric Haptic Shapes in Mid-Air using Ultrasound’. In: ACM Transactions on Graphics, 33(6).
Moore, A. F., Schmidt, P. and Dockwray, R. (2009). ‘A Hermeneutics of Spatialization for Recorded Song’. In: Twentieth-Century Music, 6(1), pp. 83–114.
Moorman, M. (2003). ‘Tom Dowd and the Language of Music’ (documentary movie).
Moylan, W. (2002). The Art of Recording: Understanding and Crafting the Mix. New York and Oxford: Focal Press.
Rumsey, F. (2001). Spatial Audio. Oxford: Focal Press.
Slepian, M. L., and Ambady, N. (2014). ‘Simulating Sensorimotor Metaphors: Novel Metaphors Influence Sensory Judgments’. In: Cognition, 130, pp. 309-314.
Sodhi, R., Poupyrev, I., Glisson, M. and Israr, A. (2013). ‘AIREAL: Interactive Tactile Experiences in Free Air’. In: SIGGRAPH Conference Proceedings, Pittsburgh.
Walther-Hansen, M. (2014). ‘The Force Dynamic Structure of the Phonographic Container: How Sound Engineers Conceptualise the ‘Inside’ of the Mix’. In: Journal of Music and Meaning, 12.
Walther-Hansen, M. (2015). ‘Balancing Audio: Towards a Cognitive Structure of Sound Interaction in Music Production’. In: Proceedings of Computer Music Multidisciplinary Research, Plymouth, UK.
Wilkie, K., Holland, S. and Mulholland, P. (2009). ‘Evaluating Musical Software Using Conceptual Metaphors’. In: Proceedings of BCS Human-Computer Interaction, 1-5 September 2009, Cambridge, UK.
Zagorski-Thomas, S. (2014). The Musicology of Record Production, Cambridge: Cambridge University Press.
Zak, A. (2001). The Poetics of Rock: Cutting Tracks, Making Records. Berkeley: University of California Press.
For several decades, mixing desks at the BBC in fact had reversed faders. This was not, however, an attempt to adapt to a phenomenal perspective rather than a systemic logic; the reversed faders had a more pragmatic explanation. Allegedly, the radio speaker would accidentally hit the faders once in a while with his notebook. Instead of accidentally increasing the sound level to the maximum, hitting the faders would shut the sound off, which was seen as far better.