Music is made with our hands. Music is made with our feet. Music is made with our bodies, our voices, our rhythms—our movements make music. We see this in the way we make instruments, the way we play them, and the way we move to the sounds they make.

How can we take such an inherently physical art and create digital tools and artifacts for it? How can we understand the designs of our software as participants in mind-body-artifact relationships? How can we design software tools that amplify our natural human rhythms?

The purpose of this piece is to consider the role that Gesture plays in the music-making process, to examine the ways that our tools interact with Gesture, and to explore how different digital devices sense, capture, and translate physical gestures into digital data.

Defining Gesture

Gesture is the product of motion, a “configuration of curves in space and time,” per Wikipedia. In his introduction to Interpreting Musical Gestures, Topics, and Tropes, Robert Hatten describes how “musical gestures are often made distinctive through specific articulations, dynamics, and pacing or timing—and given unique shape by the systematic potential of rhythm and meter, texture, and timbre.”

Gesture is the lens through which we view the shapes and interactions in our movements and understand the dynamics of motion. Gesture forms the medium through which we can communicate meaning, where expression can rise above the discrete, serial nature of language into a continuous world of parallel, entangled constructions.

As Hatten puts it, “human gesture may be understood as a fundamental and inescapable mode of understanding that links us directly to music’s potential expressive meaning.” As Tversky and Jamalian put it, “gestures transform actions on perceptible objects to actions on imagined thoughts, carrying meaning with them rapidly, precisely, and directly.”

Tversky and Jamalian focus on how gesture, along with language and graphics, connects physical tools, like our bodies and created artifacts, with cognitive tools, like thoughts. Artifacts like hammers and chalk let us use our minds and bodies in ways that augment and extend our ability to act on the world, tangibly expanding the outcomes we can achieve. Using these artifacts lets thoughts flow through our bodies and into the world in amplified and extended ways, making the artifacts useful and meaningful.

Designing with Gesture

In designing and creating musical tools, it is paramount that we preserve the ability to make music with our bodily movements and pay attention to the ways we are extending or constraining the set of possible expressible gestures. Musical tools and artifacts, means of amplifying our minds and bodies, are not neutral—they have attitudes and natures that promote certain outcomes.

If we consider the computer, the instrument du jour for making modern music, our means of interaction is limited to our eyes and hands. We can still play instruments, record sounds, and record automation into our computers—each of which can engage different arrangements of our bodies—but once we act within the computer itself, our hands must stay close together and work along fixed planes to press buttons or move a mouse or trackpad.

Bret Victor levels a critique at this default posture of our modern technology (paper, books, computers) in The Humane Representation of Thought, saying that we have invented a style of knowledge work that involves “sitting at a desk, staring at a little tiny rectangle and making little motions with our hands.” His critique reaches back several hundred years, to the time before computers, but it has become especially relevant to music work in recent decades, as we have moved our work into ever smaller rectangles and carry it out through ever smaller motions of our hands.

While theorists like Hatten are excited by the “synthetic and emergent aspects” of gesture and the expressive potential they open up, Victor cautions that in designing software artifacts we are building an environment that limits human gesture, to an extent he views as “inhumane.” Victor’s way out is to think outside the box, to view the computer as a set of tangible, interactable objects within a physical space. This allows a broader set of gestures to be executed within a computing environment. So what is this computing environment, in the musical sense?

Musical Gestures

When we consider musical devices like the flute, piano, and harp, we find that the way we arrange our bodies to facilitate gestures is not so different from how we arrange them for digital devices like computers and phones. In our upright, seated positions, we find similarities to how we sit when using a computer. In the ways we arrange and move our arms, we find similarities to how we use a mouse, keyboard, and touchscreen.

Yet though our bodies can occupy similar positions, we have fewer degrees of freedom to express gestures with our hands on computers and phones than on musical instruments.

When we play the piano, we transmit a piece of intentional data through a gesture of the hands onto the keys. The keys we strike, the force with which we strike them, and how long we hold them down communicate our intention for what sounds we wish the instrument to make and what properties they should have. Our gestures model this sound, and the piano reacts mechanically to produce it.

The keys on a computer keyboard are not so sensitive. While different gestures at a piano key can produce different “values,” a gesture at a computer key transmits only one value: that the key has been pressed. In contrast to the piano, while we can communicate what note to play, when to play it, and for how long, we cannot communicate well how strongly to play it, or subtler notions of how the note should sound.
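
To make the contrast concrete, here is a minimal sketch, in Python with hypothetical type names, of the data each kind of key can carry. The note-on message mirrors how MIDI keyboards report a keypress: which key was struck, plus a velocity value describing how hard.

```python
from dataclasses import dataclass

# A computer key event: the only gestural data is which key,
# and whether it went down or up.
@dataclass
class KeyEvent:
    keycode: int
    pressed: bool        # True on key-down, False on key-up

# A MIDI-style note message, as a piano-like controller sends it:
# the same "which key" information, plus velocity (how forcefully
# the key was struck), with duration implied by the paired note-off.
@dataclass
class NoteOn:
    note: int            # 0-127, which key
    velocity: int        # 0-127, how hard it was struck

@dataclass
class NoteOff:
    note: int

# The computer key collapses the gesture to a single bit of intent...
space_bar = KeyEvent(keycode=32, pressed=True)

# ...while the instrument-style message preserves some of its dynamics.
middle_c = NoteOn(note=60, velocity=96)
```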

This inability to express certain properties within music-computing environments can be thought of as a loss of gestural possibility. Physical objects have high gestural possibility: all the ways we can think to interact with them. Digital objects have gestural possibilities defined by their APIs or programmed abilities, so it is up to programmers to design the ways we can interact with these objects. Often these designs under-appreciate the full range of gestures possible with the objects they model, and in doing so constrict the interaction possibilities.
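
One way to read this constraint: a digital object's interface is an enumeration of the gestures it is willing to accept. A rough sketch, with hypothetical interfaces, of a narrow design beside a slightly broader one:

```python
from typing import Protocol

# A narrow design: the object accepts only a press.
class Pressable(Protocol):
    def press(self) -> None: ...

# A broader design leaves room for more of the physical gesture
# to survive the translation into software.
class Playable(Protocol):
    def press(self, velocity: float) -> None: ...       # how hard
    def aftertouch(self, pressure: float) -> None: ...  # continued pressure
    def release(self) -> None: ...                       # and for how long

# Any gesture the interface does not name simply cannot reach the object.
```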

Digital objects require an extra level of indirection, in which a model for the acceptable gestures is produced, specifying the format for the data that is to represent the gesture. This allows sensors to record and format signals in accordance with this model, such that the signals can be understood and acted upon.
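
A small sketch of what that indirection might look like, using a hypothetical gesture model and sensor-translation function; whatever the body does, only the fields the model names survive into the data:

```python
from dataclasses import dataclass
import time

# The "extra level of indirection": a model fixing the format a gesture
# must take before the system can act on it. These fields are
# hypothetical; a real system would choose its own.
@dataclass
class GestureSample:
    timestamp: float   # seconds
    x: float           # normalized position, 0.0-1.0
    y: float
    pressure: float    # normalized force, 0.0-1.0

def from_raw_sensor(raw_x: int, raw_y: int, raw_force: int) -> GestureSample:
    """Translate a raw sensor reading (say, 10-bit integers from a touch
    surface) into the agreed-upon model, so downstream code can act on it."""
    return GestureSample(
        timestamp=time.time(),
        x=raw_x / 1023,
        y=raw_y / 1023,
        pressure=raw_force / 1023,
    )
```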