Cross-Modal Abstraction and Selection Pressures in the Evolution of Language

The question of the origin of language has fascinated scientists and philosophers for centuries. Speculation has ranged from “divine intervention” to human invention to Darwinian accounts of the emergence of language (Donald, 1991). In fact, speculation was so rampant that in the 1860s several scientific societies banned the publication of papers on the topic (Holden, 2004). Though recent research has focused on evolution as the driving force in the emergence of language, the view that language is not the product of natural selection has also been put forth (Chomsky, 1972). The present discussion considers some of the evolutionary mechanisms and selection pressures that might have resulted in the emergence of language.

Cross-Modal Abstraction and Language

Although it has been suggested that language ability arose merely as a byproduct of other cognitive adaptations, it appears likely that language was a specific target of natural selection (Pinker & Bloom, 1990). This is not to say that general cognitive abilities are not necessary for language, only that they are not sufficient to account for linguistic ability. In fact, there is evidence that language ability can be disturbed without impairing other cognitive function (Donald, 1991).

One ability that has been suggested as a prerequisite to language is termed ‘cross-modal abstraction’ (Ramachandran & Hubbard, 2001). Essentially, this refers to the ability to make abstract associations between two different sensory modalities, such as vision and audition. People with a rare condition known as synesthesia, as well as some users of hallucinogenic drugs, report experiencing cross-modal interactions. Specifically, some synesthetes report actually seeing a particular color in response to specific musical notes. Others report that each number is a particular color: the number 5 might be red, the number 2 green, and so forth. The ability to make abstract associations between senses has also been demonstrated in people without synesthesia, although they do not physically experience the sensation as synesthetes do.

Evidence for cross-modal abstraction in humans comes from a variety of sources. Ramachandran & Hubbard (2001) report that when participants are shown a curvilinear shape and a rectilinear shape and asked to identify which one is a ‘bouba’ and which one is a ‘kiki’, 95% identify the rectilinear shape as the kiki and the curvilinear shape as the bouba. Presumably this is due to the "roundness" of the sound bouba and the "sharpness" of the sound kiki. Though this finding is consistent with the idea that cross-modal abstraction occurs, more cross-linguistic research is needed to determine whether this effect is confined only to certain languages or cultures. For example, it may be that the effect does not exist in cultures where there is a low occurrence of rectilinearity in the environment. In addition, it could be due to a similarity between the letters that represent the sounds rather than the sounds themselves. There is at least one indication that the effect is not universal, as Rogers & Ross (1975) failed to replicate the findings in a sample from a tribe in Papua New Guinea.

Another line of research, alternately termed phonosemantics or sound symbolism, has investigated the notion that, contrary to the belief that words are arbitrary, some sounds possess a non-arbitrary relationship to their semantic content (Allott, 1995; Magnus, 2001; Nuckolls, 1999). For example, the phoneme combination /sn/ appears disproportionately in words involving the nose and mouth (e.g. sneeze, snot, snorkel, sniff, snout). In another example, English-speaking participants were presented with words for different types of birds or fish from a completely unrelated language. Participants guessed at better than chance levels whether the name was for a bird or a fish (Berlin, 1994, cited in Ramachandran & Hubbard, 2001).

A final line of research investigating cross-modal abstraction is that of metaphor, especially synesthetic metaphor (e.g. “loud shirt”). It has been suggested that metaphors in general are systematic (Lakoff & Johnson, 1980) and that synesthetic metaphors in particular tend to follow the same direction between senses as synesthesia (Ramachandran & Hubbard, 2001). Interestingly, the most common sensory modality included in a synesthetic metaphor is hearing (particularly combinations of hearing and touch, e.g. “soft music”), at least in German and English (Day, 1996). These findings contradict the claims of Ramachandran & Hubbard, however, who indicate that hearing is most commonly paired with vision in people with synesthesia. Ramachandran & Hubbard do point out an apparent relationship between sound and touch, which they attribute to a cross-activation between auditory maps and motor maps (i.e. Broca's Area) in the brain. They go on to say that:

…we conjecture that the representation of certain lip and tongue movements in motor brain maps may be mapped in non-arbitrary ways onto certain sound inflections and phonemic representations in auditory regions and the latter in turn may have non-arbitrary links to an external object’s visual appearance (p. 20).

This is certainly consistent with the notion that language and motor movements (especially gestural movements) are highly related (Donald, 1991; Holden, 2004).

Though the evidence for cross-modal abstraction presented thus far does not show conclusively that it occurs, it does suggest that there are non-arbitrary relationships between semantics and sounds, and possibly between sounds and visual stimuli. This would provide a framework that could serve as the basis for the emergence of a proto-language in human evolution. The evolutionary advantages conferred by such an arrangement, however, are not yet clear.

Selection Pressures on Language

Pinker & Bloom (1990) outline several reproductive advantages of language that should improve overall fitness. First, they note the pedagogical value of language for describing not only the location of food sources but also the behavior of predators. Information about poisonous foods or dangerous animals can be transmitted without the need for first-hand experience. Syntax should confer an advantage here as well. Pinker & Bloom point out that “It makes a difference whether that region has animals that you can eat or animals that can eat you.” Interestingly, it is commonly believed that language is a primary vehicle of culture (because of the ability to pass more knowledge to successive generations), and that culture itself may offer selective advantages (Wilson, 1998), thus resulting in a co-evolution of culture, language and the brain. Strong selection pressures should also originate from the social organization of human groups, namely social interaction among non-kin. The cooperative nature of early humans should have favored transferring information about abstract concepts such as time, intentions and beliefs (Pinker & Bloom, 1990).

Terrence Deacon (1997) dismisses these as selection pressures for the emergence of language, but accepts that they certainly played a role in increasing the complexity of symbolic communication, which Deacon argues must have emerged after the ability to make symbolic associations. In fact, Deacon argues that:

…the computational demands of symbolization…are likely also the indirect source for the selection pressures that initiated and drove the prolonged evolution of an entire suite of capacities and propensities that now constitute our language ‘instinct’ (p. 340).

But what selection pressures would have favored the emergence of symbolic communication in the first place? The answer, Deacon argues, is sexual selection.

In evolutionary theory (Buss, 1988), sexual selection comes into play when an organism must (1) attract a mate, (2) retain that mate, (3) reproduce with the mate, and (4) invest parentally in the resulting offspring (p. 101). The ultimate goal of these acts is to increase reproductive success. The high level of parental investment in humans means that, when choosing a potential mate, one needs a high degree of certainty that the mate intends to provide the required investment. Furthermore, that investment should be for one’s own offspring, not for those of another male. Though there are nonverbal cues that might suggest a male’s ability to invest in offspring (such as possessions, size, etc.), more complex communication is needed to communicate the intention for future investment. This may be one reason why, for short-term relationships, women tend to be more attracted to males who can provide immediate resources, and for long-term relationships, to males who have higher earning potential and can communicate that potential (Cashdan, 1996). Such tendencies in humans are not immutable, however. Buss et al. (1990) found cultural differences in the importance placed on chastity in females (which is important if a male is to avoid expending resources on another male’s offspring). It could be that these cultural differences are due to variations in expected male investment (Cashdan, 1996).

Of particular note is Deacon’s emphasis on the role of meat-eating in strengthening the pressures of sexual selection. A pregnant female must rely on males to provide meat for her and her offspring during times when other food sources may be scarce. A male, on the other hand, wants to ensure that his mate is maintaining sexual exclusivity, lest the meat he provides should go to the children of another male. This could be accomplished through language, as women often use language to indicate their virtue, and males often use language to indicate their ability to procure resources (Buss, 1988).

In any case, Deacon argues, mate selection and retention create a need to communicate in complex ways. For example, it may be advantageous to employ deception, which in turn introduces a selection pressure for detecting deception, which in turn introduces a selection pressure for more complex deception. Thus, in Deacon’s account, the emergence of symbolic communication was driven primarily by sexual selection rather than by pressures such as social cooperation or technological transmission, although these pressures played a role in increasing the complexity of the communication.


In conclusion, there is some evidence of cross-modal abstraction, and there is the suggestion that this abstraction was important in the emergence of language. The evidence reviewed suggests that sound symbolism may exist and that it may be universal, which could indicate that specific sound-semantic mappings have a genetic basis. Deacon (1997) points out, however, that universality is not a reliable indicator of a genetic basis, and argues that such mappings are "impossible to assimilate genetically" (p. 332). Finally, though sexual selection likely played the major role in the emergence of symbolic communication, a variety of other selection pressures helped to maintain and enhance the complexity of the communication, resulting in what we now know as language.


Allott, R. (1995). Sound symbolism. In U. Figge (Ed.), Language in the Würm Glaciation (pp. 15-38). Bochum: Brockmeyer.
Buss, D. M. (1988). Love acts: The evolutionary biology of love. In R. Sternberg & M. Barnes (Eds.), The Psychology of Love (pp. 100-118). New Haven: Yale University Press.
Buss, D. M., Abbott, M., Angleitner, A., Asherian, A., Biaggio, A., Blanco-Villasenor, A., et al. (1990). International preferences in selecting mates: A study of 37 cultures. Journal of Cross-Cultural Psychology, 21, 5-47.
Cashdan, E. (1996). Women's mating strategies. Evolutionary Anthropology, 5(4), 134-143.
Chomsky, N. (1972). Language and Mind. New York: Harcourt, Brace, Jovanovich.
Day, S. (1996). Synaesthesia and synaesthetic metaphors. Psyche, 2, n.p.
Deacon, T. W. (1997). The Symbolic Species: The co-evolution of language and the brain. New York: W. W. Norton.
Donald, M. (1991). Origins of the Modern Mind. Cambridge, MA: Harvard University Press.
Holden, C. (2004). The origin of speech. Science, 303, 1316-1319.
Lakoff, G. & Johnson, M. (1980). Metaphors We Live By. Chicago: University of Chicago Press.
Magnus, M. (2001). What’s in a word? Studies in phonosemantics. Unpublished doctoral dissertation, NTNU.
Nuckolls, J. B. (1999). The case for sound symbolism. Annual Review of Anthropology, 28, 225-252.
Pinker, S. & Bloom, P. (1990). Natural language and natural selection. Behavioral and Brain Sciences, 13, 707-784.
Ramachandran, V. S. & Hubbard, E. M. (2001). Synaesthesia – A window into perception, thought and language. Journal of Consciousness Studies, 8, 3-34.
Rogers, S. K. & Ross, A. S. (1975). A cross-cultural test of the maluma-takete phenomenon. Perception, 4, 105-106.
Wilson, E. O. (1998). Consilience: The Unity of Knowledge. New York: Vintage Books.
