The Beat-Boxer and the Steam Machine: a paradigm of primitive speech

Let us suppose a beat-boxer, one of those gifted with the skill to reproduce a whole orchestra of percussive instruments using only his voice, and let us have him put in charge of a large and complicated steam-driven machine of the sort which has something fed in at one end, processes it, and puts the finished article out at the other, all the while whistling and hissing and clanking with various rods and cranks and cogs and arms moving up and down and round and round. It is not entirely automatic, so the beat-boxer has to dance about it, furbishing it here, oiling it there, pulling this lever, turning that valve, as well as loading it at the start and unloading it at the end of each cycle.

Engines of PS Savoie, Lac Leman, built by Sulzer of Winterthur

As he moves, he accompanies himself with a soundtrack of his own improvising, which imitates the rhythm and the sounds of the machine, with some interjections of his own, and punctuated with movements and gestures. If he has a purpose at all in doing this, it is primarily aesthetic: he does it for the joy of it. But that is not to deny that it is useful to him: it keeps him attuned to the rhythm of the machine and serves as a kind of mnemonic (maybe the noises of his own that he inserts correspond to different actions that he performs at various points in the cycle). You might even imagine a circumstance where he uses his beat-box track as a tool to instruct someone else in operating the machine, because there is a correspondence between it and the various stages of the process of operating it.

If you recorded the beat-boxer going about his business, you could analyse the soundtrack to bring out that correspondence, identifying elements of the track that corresponded to this movement of the machine, or that part of the process, or this action on the part of the operator. In this world of viral videos on TikTok and YouTube you could imagine him expanding his repertoire to include other kinds of machinery, different sorts of operation, each with its own soundtrack. The soundtracks on their own would, in a sense, embody the operations – an instance of the true sense of synecdoche, where a whole is conjured by a part. Yet at the same time each soundtrack would be an improvisation, not consciously devised, only incidentally having a structure that corresponded to something else.

This, I would suggest, serves as a paradigm for how human speech could incidentally evolve a structure corresponding to the world in which it was created, a structure that (once discovered) could be parsed and analysed into elements that correspond to things in that world, standing in relations that correspond to the relations in that world – and yet at no time is there a conscious ‘naming of parts’, no ostensive definition where we say a word and point to what it means.

This is the solution, for me, of a problem that has troubled me in a theory I have been evolving for some time. My thesis is that Language, as we know it today, is an artefact of writing, specifically of writing used to transcribe speech (something that does not happen till about a thousand years after writing is invented). My reasoning is that it is only when speech is made visible and we have a chance to study it that we can discover the structure that underlies it, a structure we can then analyse into words and grammatical relations.

The question that needs to be answered is where that structure came from, and how it comes to be made up of elements that correspond to things in the world if it was not expressly devised to do so. And that, I believe, is the question that my example of the beat-boxer and the steam machine answers. Speech in its initial form, I suggest, is no more than the soundtrack of specific human activities, bound up with a larger pattern of gesture, movement and expression that comes naturally to humans engaged in any activity. Its key element is probably rhythm and its character is largely mimetic (or interpretive, if you like): we bind ourselves to the task in hand by improvising sounds and gestures to accompany it. It is, I would suggest, a pleasurable activity, akin to music-making, and its primary motivation is aesthetic: it expresses how it feels to be doing whatever it is – or if you like a larger canvas, how it feels to be human, in this world, doing this thing.

Tens, or probably hundreds, of thousands of years of human activity (which is probably of a fairly consistent character, given that it’s the same sort of creatures living in the same world doing the same sorts of things) will render the improvisation of such soundtracks a matter of instinct and intuition, much like birdsong, with the young attuned to learn it from their elders. And of course I say ‘soundtracks’ only to emphasise the role played by speech – in reality, it is an expressive performance, led by facial expression, gesture and movement, in which speech plays only a contributory part, one very much bound up with the rest and only separable from it when, much later on, the invention of writing (eventually) provides the means of making speech visible – and so capable of study.
