Speech-to-Text and Algorithmic Smoothing

C&W 2022

Speech-to-text technologies bear affinities to other modes of writing and have the potential to become a dominant writing method in the future. As the composition of human speaker/writer and technological listener/writer continues to evolve, new modes of subjectivation and new kinds of writing/speaking subjects are produced, offering new ways of understanding and theorizing both writing and subjectivation.

Algorithm as Talisman

Algorithms are often misunderstood tools, and they range in complexity from the mundane to the abstruse. In common usage, the term "algorithm" often stands in for tasks performed by a computer that are beyond a layperson’s understanding.

“Facebook’s algorithm,” for example, “really means Facebook, and Facebook really means the people, things, priorities, infrastructures, aims, and discourses that animate the site” (Gillespie, 2016, p. 24).

By invoking the term “algorithm” as a talisman, one hopes to ward away claims of subjectivity or manipulation. If an algorithm is involved, this view presumes, the process must be “mathematical, logical, impartial, [and] consistent” (Gillespie, 2016, p. 23). Of course, while the rhetorical aims of using the term might imply otherwise, algorithms themselves have no such requirements.

Statistics and Probability

While it might seem logical to create a speech-to-text system based on a database of words and their corresponding sounds, such library-based models are wildly impractical and perform worse than probabilistic models.

Classical probabilistic models for speech-to-text rely on Hidden Markov Models (HMMs), which treat the words or phonemes as hidden states to be inferred from the observed acoustic signal.
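The HMM idea can be sketched with the Viterbi algorithm, which recovers the most probable sequence of hidden states (here, words) from observed sounds. The two "words," their transition probabilities, and the single-character "acoustic" observations below are invented for illustration, not drawn from any real speech-to-text system:

```python
# A minimal sketch of Viterbi decoding over a toy Hidden Markov Model.
# All states and probabilities are invented for illustration only.

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state sequence for the observations."""
    # V[t][s] = probability of the best path ending in state s at time t
    V = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    path = {s: [s] for s in states}

    for obs in observations[1:]:
        V.append({})
        new_path = {}
        for s in states:
            # Best previous state to transition from, weighted by emission
            prob, prev = max(
                (V[-2][p] * trans_p[p][s] * emit_p[s][obs], p) for p in states
            )
            V[-1][s] = prob
            new_path[s] = path[prev] + [s]
        path = new_path

    best_state = max(V[-1], key=V[-1].get)
    return path[best_state]

# Toy model: two hidden "words," each tending to emit a characteristic sound.
states = ["hello", "yellow"]
start_p = {"hello": 0.6, "yellow": 0.4}
trans_p = {"hello": {"hello": 0.7, "yellow": 0.3},
           "yellow": {"hello": 0.4, "yellow": 0.6}}
emit_p = {"hello": {"h": 0.8, "y": 0.2},
          "yellow": {"h": 0.1, "y": 0.9}}

decoded = viterbi(["h", "h", "y"], states, start_p, trans_p, emit_p)
print(decoded)  # ['hello', 'hello', 'yellow']
```

The decoder never "looks up" a word; it weighs competing hypotheses against each other and outputs the statistically likeliest one, which is precisely why such systems can guess a word from a noisy or ambiguous sound.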

Deterritorialized Smoothing

Félix Guattari describes deterritorialized smoothing as the way in which entities must be altered in order to fit into an existing schema in the world.

A key must be detailed enough to open a specific lock but not so detailed that it contains superfluous information which might cause the lock to reject it. Smoothing is a sort of paring down to make complexity manageable and operable, and it is a feature of all semiologies, signifying or a-signifying. Reality is too “rough” to be integrated into language, for example, so representations shave off certain details, and catch only the inflection points of interest.

Alphabetic Algorithms

Alphabetic cultural techniques take an oral input (spoken sounds, or at least imagined spoken sounds) and produce a literate output (symbolic writing)—and vice versa.

Linguist Roy Harris is quick to point out that there is nothing intrinsically phonetic about the letterforms of alphabetic characters. There are many ways to use writing to “record” a sound produced during speech (as evidenced by, among other things, the experimental spellings of children), and many ways to divide up the stream of sounds (certain alphabets have a distinct letter for the ch sound, for example, while some use two letters). It is not the material of the alphabet that allows it to correspond to speech, but (in Guattari’s terms) its diagrammatic form—the inflection points that allow a letter to be distinguished from other letters and to be arranged as part of a system with those other letters.

Nonsymbolic Futures of Writing

Writing not as final output, but as intermediary “inside” software.

“When we problematize the conventional, representational definition of alphabetic writing, we have the opportunity to redefine alphabetic writing as a (relatively low-resolution) transductive mark-up language; and when that happens, we have the opportunity to think in far broader terms about the scope of our field” (Rieder, 2017, p. 135)