On Language and Humanity: In Conversation With Noam Chomsky

Noam Chomsky Interviewed by Amy Brand

August 12, 2019. The MIT Press Reader.

I have been fascinated with how the mind structures information for as long as I can remember. When I was a kid, my all-time favorite activity in middle school was diagramming sentences with their parts of speech. Perhaps it’s not surprising, then, that I ended up at MIT earning my doctorate on formal models of language and cognition. It was there, in the mid-1980s, that I had the tremendous good fortune of taking several classes on syntax with Noam Chomsky.

Although I ultimately opted off the professorial career track, I’ve been at MIT for most of my career and have stayed true in many ways to that original focus on how language conveys information. Running an academic publishing house is, after all, also about the path from language to information, text to knowledge. It has also given me the opportunity to serve as Chomsky’s editor and publisher. Chomsky and the core values he embodies of deep inquiry, consciousness, and integrity continue to loom large for me and so many others here at MIT, and are well reflected in the interview that follows.

Amy Brand: You have tended to separate your work on language from your political persona and writings. But is there a tension between arguing for the uniqueness of Homo sapiens when it comes to language, on the one hand, and decrying the human role in climate change and environmental degradation, on the other? That is, might our distance from other species be tied up in how we’ve engaged (or failed to engage) with the natural environment?

Noam Chomsky: The technical work itself is in principle quite separate from personal engagements in other areas. There are no logical connections, though there are some more subtle and historical ones that I’ve occasionally discussed (as have others) and that might be of some significance.

Homo sapiens is radically different from other species in numerous ways, too obvious to review. Possession of language is one crucial element, with many consequences. With some justice, it has often in the past been considered to be the core defining feature of modern humans, the source of human creativity, cultural enrichment, and complex social structure.

As for the “tension” you refer to, I don’t quite see it. It is of course conceivable that our distance from other species is related to our criminal race to destroy the environment, but I don’t think that conclusion is sustained by the historical record. For almost all of human history, humans have lived pretty much in harmony with the natural environment, and indigenous groups still do, when they can (they are, in fact, in the forefront of efforts to preserve the environment, worldwide). Human actions have had environmental effects; thus large mammals tended to disappear as human activity extended. But it wasn’t until the agricultural revolution and more dramatically the industrial revolution that the impact became of major significance. And the largest and most destructive effects are in very recent years, and mounting all too fast. The sources of the destruction — which is verging on catastrophe — appear to be institutional, not somehow rooted in our nature.

A.B.: In a foreword to a book on birdsong and language, you wrote, with computational linguist Robert Berwick, that “the bridge between birdsong research and speech and language dovetails extremely well with recent developments in certain strands of current linguistic thinking.” Could you talk about that? What kind of insight might birdsong offer into our own language?

N.C.: Here we have to be rather cautious, distinguishing between language per se and speech, which is an amalgam, involving both language and a specific sensorimotor system used for externalization of language. The two systems are unrelated in evolutionary history; the sensorimotor systems were in place long before language (and Homo sapiens) appeared, and have scarcely been influenced by language, if at all. Speech is also only one form of externalization, even if the most common. It could be sign, which develops among the deaf very much the way speech does, or even touch.

Berwick and I have argued (I think plausibly, but not uncontroversially) that language, an internal system of the mind, is independent of externalization and basically provides expressions of linguistically formulated thought. As such, it is a system of pure structure, lacking linear order and other arrangements that are not really part of language as such but are imposed by requirements of the articulatory system (sign, which uses visual space, exploits some other options). Internal language is based on recursive operations that yield what we called the Basic Property of language: generation of an infinite array of hierarchically structured expressions that are interpreted as thoughts. The externalization system for language has no recursive operations — it is, basically, a deterministic mapping that introduces linear order and other arrangements that are required by the output system.
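[Editor's illustration, not part of the interview.] The distinction drawn above, between a recursive structure-building operation that yields unordered hierarchy and a separate deterministic mapping that imposes linear order, can be made concrete with a toy sketch. Everything here is an assumption made for illustration: the function names `merge`, `leaves`, `depth`, and `externalize`, the use of Python frozensets to stand in for unordered syntactic objects, and the hypothetical `RANK` table that plays the role of the output system. It is not the formalism Berwick and Chomsky propose.

```python
# Toy model only: unordered binary set formation as the structure-building step.
def merge(x, y):
    """Combine two syntactic objects into an unordered set {x, y}."""
    return frozenset([x, y])


def leaves(obj):
    """Collect the words contained in a syntactic object (order not meaningful)."""
    if isinstance(obj, str):
        return [obj]
    return [w for member in obj for w in leaves(member)]


def depth(obj):
    """Hierarchical depth: 0 for a bare word, else 1 + the deepest member."""
    if isinstance(obj, str):
        return 0
    return 1 + max(depth(member) for member in obj)


def externalize(obj, rank):
    """Deterministically map an unordered structure to a word sequence.

    `rank` is a hypothetical word-to-position table standing in for the
    requirements of an output system; the structure itself has no order.
    """
    if isinstance(obj, str):
        return [obj]
    ordered = sorted(obj, key=lambda m: min(rank[w] for w in leaves(m)))
    return [w for member in ordered for w in externalize(member, rank)]


# Hypothetical ordering imposed only at externalization.
RANK = {"colorless": 0, "green": 1, "ideas": 2, "sleep": 3, "furiously": 4}

noun_phrase = merge("colorless", merge("green", "ideas"))
verb_phrase = merge("sleep", "furiously")
sentence = merge(noun_phrase, verb_phrase)

print(merge("green", "ideas") == merge("ideas", "green"))  # True: no linear order
print(depth(sentence))                                      # 3
print(" ".join(externalize(sentence, RANK)))                # colorless green ideas sleep furiously
```

In this sketch `merge` can apply to its own output without limit, which is the sense in which the hierarchy is recursive, while `externalize` adds nothing to the structure and only fixes an order for output.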

Birdsong is very different: it is an output system that is based crucially on linear order with very little structure. There are some suggestive analogies at the output levels, but my own judgment at least, shared I think by my colleagues who do extensive work on birdsong, is that while the phenomena are of considerable interest in themselves, they tell us little about human language.

A.B.: You developed the theory of transformational grammar — the idea that there is a deep, rule-based structure underpinning human language — while a graduate student in the 1950s, and published your first book on it, “Syntactic Structures,” in 1957. How does the field of theoretical linguistics today compare with the future you might have imagined 60 years ago?

N.C.: Unrecognizable. At the time, linguistics altogether was a rather small field, and interest in the work you describe was pretty much confined to RLE [the Research Laboratory of Electronics]. Journals were few, and were scarcely open to work of this kind. My first book — “The Logical Structure of Linguistic Theory” (LSLT) — was submitted to MIT Press in 1955 at Roman Jakobson’s suggestion. It was rejected by reviewers with the rather sensible comment that it didn’t seem to fit in any known field (a 1956 version, with some parts omitted, was published in 1975, when the field was well-established).

Actually there was a tradition, a rather rich one in fact, which this work in some ways revived and extended. But it was completely unknown at the time (and still mostly is).

In the late ’50s Morris Halle and I requested and quickly received authorization to establish a linguistics department, which we considered a pretty wild idea. Linguistics departments were rare. Why should there be one at MIT, of all places? And for a kind of linguistics almost no one had ever heard of? Why would any student apply to such a department? We decided to try anyway, and amazingly, it worked. The first class turned out to be a remarkable group of students, all of whom went on to distinguished careers with original and exciting work — and so it has continued, to the present. Curiously — or maybe not — the same pattern was followed pretty much in other countries, with the “generative enterprise” taking root outside the major universities.

Our first Ph.D. student in fact preceded the establishment of the department: Robert Lees (a colleague at RLE), who wrote a highly influential study on Turkish nominalization. Since there was as yet no department, our friend Peter Elias, chair of the Department of Electrical Engineering at MIT, had the Ph.D. submitted there — rather typical and highly productive MIT informality. It must have surprised proud parents reading the titles of dissertations at graduation.

By now the situation is dramatically different. There are flourishing departments everywhere, with major contributions from Europe, Japan, and many other countries. There are many journals — including MIT Press’s Linguistic Inquiry, which just celebrated its 50th anniversary. Studies of generative grammar have been carried out for a very wide range of typologically varied languages, at a level of depth and scope never previously imaginable. New domains of inquiry have opened up that scarcely existed in the ’50s. Students are exploring questions that could not have been formulated a few years ago. And theoretical work has reached completely new levels of depth and empirical validation, with many promising new avenues being explored.

Morris and I, in later years, often reflected on how little could have been foreseen, even imagined, when we began working together in the early ’50s. What has taken place since seemed almost magical.

A.B.: It was in that 1957 book that you used your well-known sentence: “Colorless green ideas sleep furiously” — a demonstration of how a sentence can be grammatically correct but semantically not make sense, thereby pointing to structure and syntax as something primordial and independent from meaning. A poet may object to the idea that such a sentence is meaningless (and could perhaps describe it as demonstrating what linguist and literary theorist Roman Jakobson called the “poetic function” of language), and a number of people have set about “injecting” the sentence, so to speak, with meaning. Why do you think that is?

N.C.: It’s because of a failure to comprehend the point of this and other examples like it. The point was to refute commonly held beliefs about grammatical status: that it was determined by statistical approximation to a corpus of material, by formal frames, by meaningfulness in some structurally-independent sense, etc. The sentence you cite, call it (1), is plainly grammatical but violates all of the standard criteria. That’s why it was invented as an example. (1) differs from the structurally similar sentence (2) “revolutionary new ideas appear infrequently” (which, unlike (1), has an immediate literal meaning) and from (3) “furiously sleep ideas green colorless,” the original read backwards, which can hardly even be pronounced with normal prosody. The special status of (1) of course arises from the fact that although it violates all of the then-standard criteria for grammaticality, it is of the same grammatical form as (2), with an instantly interpreted literal meaning and in no respect deviant. For that reason, it’s not hard to construct non-literal interpretations for (1) (it is possible, but much more difficult for (3), lacking the structural similarity to fully grammatical expressions like (2)).
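[Editor's illustration, not part of the interview.] One part of the argument above, that word-level statistics drawn from a corpus treat (1) and (3) alike even though only (1) shares the form of (2), can be shown with a toy sketch. The two-sentence corpus, the hand-written `CATEGORY` lexicon, and the helper names are all invented for illustration. Comparing category sequences is only a crude stand-in for "same grammatical form"; it is not the account given in "Syntactic Structures" or LSLT.

```python
from collections import Counter

# Made-up corpus and lexicon; both are assumptions for illustration only.
CORPUS = [
    "revolutionary new ideas appear infrequently",
    "old green houses stand quietly",
]
CATEGORY = {
    "revolutionary": "Adj", "new": "Adj", "old": "Adj",
    "green": "Adj", "colorless": "Adj",
    "ideas": "N", "houses": "N",
    "appear": "V", "stand": "V", "sleep": "V",
    "infrequently": "Adv", "quietly": "Adv", "furiously": "Adv",
}


def bigrams(words):
    """Adjacent word pairs of a tokenized sentence."""
    return list(zip(words, words[1:]))


WORD_BIGRAMS = Counter(bg for s in CORPUS for bg in bigrams(s.split()))
CATEGORY_SEQUENCES = {tuple(CATEGORY[w] for w in s.split()) for s in CORPUS}


def attested_bigrams(sentence):
    """How many of the sentence's word bigrams occur anywhere in the corpus."""
    return sum(WORD_BIGRAMS[bg] > 0 for bg in bigrams(sentence.split()))


def shares_form_with_corpus(sentence):
    """Whether the sentence's category sequence matches some corpus sentence."""
    return tuple(CATEGORY[w] for w in sentence.split()) in CATEGORY_SEQUENCES


s1 = "colorless green ideas sleep furiously"
s3 = "furiously sleep ideas green colorless"

print(attested_bigrams(s1), attested_bigrams(s3))                # 0 0
print(shares_form_with_corpus(s1), shares_form_with_corpus(s3))  # True False
```

On this tiny corpus both sentences have no attested word bigrams, so the count cannot separate them, whereas only (1) has the same category sequence as (2).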

All of this is discussed in “Syntactic Structures,” and much more fully in LSLT (which included some work of mine jointly with Pete Elias developing an account of categorization with an information-theoretic flavor that worked quite well with small samples and the hand-calculation that was the only option in the early ’50s, when we were doing this work as grad students at Harvard).

Failing to grasp the point, quite a few people have offered metaphoric (“poetic”) interpretations of (1), exactly along the lines suggested by the discussion in “Syntactic Structures.” Less so for (3), though even that is possible, with effort. It’s typically possible to concoct some kind of interpretation for just about any word sequence. The relevant question, for the study of language, is how the rules yield literal interpretations (as for (2)) — and secondarily, how other cognitive processes, relying in part on language structure (as in the case of (1), (3)), can provide a wealth of other interpretations.

A.B.: In our interview with Steven Pinker in May, we asked for his thoughts about the impact that the recent explosion of interest in AI and machine learning might have on the field of cognitive science. Pinker said he felt there was “theoretical barrenness” in these realms that was going to produce dead ends unless they were more closely integrated with the study of cognition. The field of cognitive science that you helped originate was a clear break from behaviorism — the emphasis on the impact of environmental factors on behavior over innate or inherited factors — and the work of B. F. Skinner. Do you see the growth of machine learning as something akin to a return to behaviorism? Do you feel the direction in which the field of computing is developing is cause for concern, or might it breathe new life into the study of cognition?

N.C.: Sometimes it is explicitly claimed, even triumphantly. Terrence Sejnowski’s recent “The Deep Learning Revolution,” for example, proclaims that Skinner was right! A rather serious misunderstanding of Skinner, and of the achievements of the “Revolution,” I think.

There are some obvious questions to raise about “machine learning” projects. Take a typical example, the Google Parser. The first question to ask is: what is it for? If the goal is to create a useful device — a narrow form of engineering — there’s nothing more to say.

Suppose the goal is science, that is, to learn something about the world, in this case, about cognition — specifically about how humans process sentences. Then other questions arise. The most uninteresting question, and the only one raised it seems, is how well the program does, say, in parsing the Wall Street Journal corpus.

Let’s say it has 95 percent success, as proclaimed in Google PR, which declares that the parsing problem is basically solved and scientists can move on to something else. What exactly does that mean? Recall that we’re now considering this to be part of science. Each sentence of the corpus can be regarded as the answer to a question posed by experiment: Are you a grammatical sentence of English with such-and-such structure? The answer is: Yes (usually). We then pose the question that would be raised in any area of science. What interest is there in a theory, or method, that gets the answer right in 95 percent of randomly chosen experiments, performed with no purpose? Answer: Virtually no interest at all. What is of interest are the answers to theory-driven critical experiments, designed to answer some significant question.

So if this is “science,” it is of some unknown kind.

The next question is whether the methods used are similar to those used by humans. The answer is: Radically not. Again, some unknown kind of science.

There is also another question, apparently never raised. How well does the Parser work on impossible languages, those that violate universal principles of language? Note that success in parsing such systems counts as failure, if the goals are science. Though it hasn’t been tried to my knowledge, the answer is almost certainly that success would be high, in some cases even higher (many fewer training trials, for example) than for human languages, particularly for systems designed to use elementary properties of computation that are barred in principle for human languages (using linear order, for example). A good deal is by now known about impossible languages, including some psycholinguistic and neurolinguistic evidence about how humans handle such systems — if at all, as puzzles, not languages.

In short, in just about every relevant respect it is hard to see how this work makes any kind of contribution to science, specifically to cognitive science, whatever value it may have for constructing useful devices or for exploring the properties of the computational processes being employed.

It might be argued that the last question is misformulated because there are no impossible languages: any arbitrarily chosen collection of word sequences is as much of a language as any other. Even apart from ample evidence to the contrary, the claim should be rejected on elementary logical grounds: if it were true, no language could ever be learned, trivially. Nevertheless, some such belief was widely held in the heyday of behaviorism and structuralism, sometimes quite explicitly, in what was called “the Boasian thesis” that languages can differ from one another in arbitrary ways, and each must be studied without preconceptions (similar claims were expressed by biologists with regard to the variety of organisms). Similar ideas are at least implicit in some of the machine learning literature. It is however clear that the claims cannot be seriously entertained, and are now known to be incorrect (with regard to organisms as well).

A further word may be useful on the notion of critical experiment — that is, theory-driven experiment designed to answer some question of linguistic interest. With regard to these, the highly-touted mechanical parsing systems happen to perform quite badly, as has been shown effectively by computational cognitive scientist Sandiway Fong. And that’s what matters to science at least, not just matching results of arbitrary experiments with no purpose (such as simulation, or parsing some corpus). The most interesting experiments have to do with “exotic” linguistic constructions that are rare in normal speech but that people instantly understand along with all of the curious conditions they satisfy. Quite a few have been discovered over the years. Their properties are particularly illuminating because they bring to light the unlearned principles that make language acquisition possible — and though I won’t pursue the matter here, investigation of infant language acquisition and careful statistical studies of the linguistic material available to the child (particularly by Charles Yang) reveal that the notion “exotic” extends very broadly for the infant’s experience.

A.B.: You remain as active as ever, on more than one front — collaborating with colleagues from other fields such as computer science and neuroscience on a series of papers in recent years, for example. What are you currently working on?

N.C.: It’s worth going back briefly to the beginning. I began experimenting with generative grammars as a private hobby in the late ’40s. My interest was partly in trying to account for the data in an explicit rule-based generative grammar, but even more so in exploring the topic of simplicity of grammar, “shortest program,” a non-trivial problem, only partially solvable by hand computation because of the intricacy of deeply-ordered rule systems. When I came to Harvard shortly after, I met Morris Halle and Eric Lenneberg. We quickly became close friends, in part because of shared skepticism about prevailing behavioral science doctrines, virtual dogmas at the time. Those shared interests soon led to what later came to be called the “biolinguistics program,” the study of generative grammar as a biological trait of the organism (Eric went on to found the contemporary field of biology of language through his now classic work).

Within the biolinguistic framework, it is at once clear that the “holy grail” would be explanations of fundamental properties of language on the basis of principled generative grammars that meet the twin conditions of learnability and evolvability. That is the criterion for genuine explanation. But that goal was far out of reach.

The immediate task was to try to make sense of the huge amount of new data, and puzzling problems, that rapidly accumulated as soon as the first efforts were made to construct generative grammars. To do so seemed to require quite complex mechanisms. I won’t review the history since, but its basic thrust has been the effort to show that simpler and more principled assumptions can yield the same or better empirical results over a broad range.

By the early ’90s, it seemed to some of us that it was now becoming possible to bite the bullet: to adopt the simplest computational mechanisms that could at least yield the Basic Property and to try to show that fundamental properties of language can be explained in those terms, that is, in terms of what has been called “the strong minimalist thesis” (SMT). By now there has, I think, been considerable progress in this endeavor, with the first genuine explanations of significant universal properties of language that plausibly satisfy the twin conditions.

The task ahead has several parts. A primary task is to determine to what extent SMT can encompass fundamental principles of language that have come to light in research of the past years, and to deal with the critical experiments, those that are particularly revealing with regard to the principles that enter into the functioning of the language faculty and that account for the acquisition of language.

A second task is to distinguish between principles that are specific to language — specific to the innate structure of the language faculty — and other principles that are more general. Particularly illuminating in this regard are principles of computational efficiency — not surprising for a computational system like language. Of particular interest are computational principles specific to systems with limited short-term resource capacity, a category that has recently been shown to have critical empirical consequences. Yet another task is to sharpen these principles so as to include those that play a role in genuine explanation while excluding others that look superficially similar but can be shown to be illegitimate both empirically and conceptually.

Quite interesting work is proceeding in all of these areas, and the time seems ripe for a comprehensive review of developments which, I think, provide a rather new and exciting stage in an ancient field of inquiry.

In brief, for the first time I think that the Holy Grail is at least in view in some core areas, maybe even within reach. That’s the main topic of work I’ve been engaged in recently and hope to be able to put together soon.

A.B.: You recently celebrated — along with a large body of friends and colleagues here on the MIT campus — your 90th birthday. Such milestones are of course cause for reflection, even as one looks ahead. Looking over your work to date, what would you say has been your most significant theoretical contribution to the field of linguistics?

N.C.: Opening up new kinds of questions and topics for inquiry.

A.B.: A very broad question, but perhaps one that speaks to the times we’re living in right now: What do you regard these days as cause for optimism?

N.C.: Several points. First, the times we’re living in are extremely dangerous, in some ways more so than ever before in human history — which will essentially come to an end in any recognizable form if we do not deal effectively with the increasing threats of nuclear war and of environmental catastrophe. That requires reversing the course of the U.S. in dismantling arms control agreements and proceeding — along with Russia — to develop ever more lethal and destabilizing weapons systems; and in not only refusing to join the world in trying to do something about the severe environmental crisis but even aggressively seeking to escalate the threat, a form of criminality with literally no historical antecedent.

Not easy, but it can be done.

There have been other severe crises in human history, even if not on this scale. I’m old enough to remember the days when it seemed that the spread of fascism was inexorable — and I’m not referring to what is referred to as fascism today but something incomparably more awful. But it was overcome.

There are very impressive forms of activism and engagement taking place, mainly among younger people. That’s very heartening.

In the final analysis, we always have two choices: We can choose to descend into pessimism and apathy, assuming that nothing can be done, and helping to ensure that the worst will happen. Or we can grasp the opportunities that exist — and they do — and pursue them to the extent that we can, thus helping to contribute to a better world.

Not a very hard choice.