The Golden Age — A Look at the Original Roots of Artificial Intelligence, Cognitive Science, and Neuroscience (partial transcript)

Noam Chomsky

MIT150 Symposia: Brains, Minds and Machines Symposium, June 16, 2011

Steven Pinker: Um I have a question for uh both Noam and Martin,
that I think is on the minds of many people in this room, and I know it’s been expressed
in some of the questions that I’ve received by email prior to this event,
that there is a- a narrative in which the new direction of uh both artificial intelligence and cognitive science
is one that makes a great deal more use of probabilistic information
that is gleaned from enormous amounts of uh experience during learning –
this is uh uh manifested in branches of cognitive science such as uh neural networks and connectionism,
Bayesian inference models, uh application of machine learning to intelligence,
many of the models both of Tommy Poggio and Josh Tenenbaum.
Uh in the classic work from the Golden Age, and indeed in many of the models since then,
including the models of generative grammar and models of semantic memory,
uh probabilities don’t play a big role.
Uh is the narrative that says that the direction of the field is in
making use of massive amounts of statistical information via learning
uh well maybe I’ll ask you to complete that sentence.
Uh what is the uh ((uh)) well I’ll let you complete the sentence. Noam?

Noam Chomsky: Well there’s a-
uh fir- it’s- um it’s true there’s been a lot of work on um trying to apply
uh statistical uh models to various linguistic problems
uh I think there have been some successes, but a lot of failures.
Uh the successes to my knowledge, at least, and there’s no time to go into details,
the successes that I know of are those that integrate statistical analysis
with some uh ((uh u-)) universal grammar properties, some fundamental properties of language;
when they’re integrated, you sometimes do get results.
Uh it’s uh ((uh uh the)) simplest case maybe is uh
uh the problem of uh identifying words in a running discourse,
something apparently children do, you know, they hear a running discourse, they pick out units,
and the obvious units are words, which are really phonological units.
um and uh ((uh uh)) and uh uh there’s a natural property that I- wrote about it in the 1950s,
and it was taken for granted that if you just take a look at the-
if you have a series of sounds and you look at the transitional probabilities
uh at each point, what’s likely to come next,
uh when you get to a word boundary the probabilities go way down,
you don’t know what’s coming next, if it’s internal to a word, you can predict pretty well,
so if you kind of trace transitional probabilities you ought to get something like word boundaries.
((actually it’s)) I wrote about it in 1955.
And I assumed that that’s correct. Turns out to be false.
Uh it was shown actually by Charles Yang, a former student here,
((comput-)) a PhD in the computer science department, that uh
if you apply that method, it- it basically gives syllables uh in a language like English.
Uh on the other hand if you apply that method under a constraint,
namely the constraint that a word has what’s called a prosodic peak,
you know, like a pitch stress peak, which is true, then you get much better results.
Now there’s more recent work which is still in press
uh by Shukla, Aslin and others which shows that you get still better results
if you apply that uh together with what are called prosodic phrases
uh and it turns out that the you know uh an expre- a sentence let’s say has
units i- uh units of pitch and stress and so on which
are uh connected related to the syntactic structure actually in ways which were studied uh
maybe first seriously by another former student Lisa Selkirk, a colleague of Barbara’s,
uh but uh when you connect- when you interact prosodic peaks with transitional probabilities
then you get a pretty good identification of word boundaries.
Uh well you know that’s the kind of work that I think makes sense, if you uh
uh uh and there are more complex examples but uh
it- it’s a sample of the kind of thing that can work.
On the other hand there’s a lot of work which tries to do sophisticated statistical analysis,
you know Bayesian and so on and so forth,
without any concern for the uh actual structure of language,
as far as I’m aware uh that only achieves success in a very odd sense of success.
There is a succ- notion of success which has developed in
uh computational cognitive science in recent years
which I think is novel in the history of science.
It interprets success as uh approximating unanalyzed data.
Uh so for example if you were, say, to study bee communication this way,
instead of doing the complex experiments that bee scientists do, you know like
uh having bees fly to an island to see if they leave an odor trail and this sort of thing,
if you simply did extensive videotaping of bees swarming, OK,
and you did you know a lot of statistical analysis of it,
uh you would get a pretty good prediction for what bees are likely to do next time they swarm,
actually you’d get a better prediction than bee scientists do,
and they wouldn’t care because they’re not trying to do that.
Uh but and you can make it a better and better approximation by more video tapes
and more statistics and so on.
Uh I mean actually you could do physics this way,
uh instead of studying things like balls rolling down frictionless planes, which can’t happen in nature,
uh if you uh uh took a ton of video tapes of what’s happening outside my office window, let’s say,
you know, leaves flying and various things,
and you did an extensive analysis of them,
uh you would get some kind of prediction of what’s likely to happen next,
certainly way better than anybody in the physics department could do.
Well that’s a notion of success which is I think novel,
I don’t know of anything like it in the history of science.
Uh and in- in those terms you get some kind of successes, and if you look at the literature in the field,
a lot of these papers are listed as successes.
And when you look at them carefully,
they’re successes in this particular sense,
and not the sense that science has ever been interested in.
But it does give you ways of approximating unanalyzed data,
you know analysis of ((a)) corpus and so on and so forth.
I don’t know of any other cases, frankly.
Uh so there are successes where things are integrated with some of the properties of language, but I know of-
((the sec-)) know of none in which they’re not.
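
[Editor's illustration, not part of the talk: the transitional-probability idea Chomsky describes for finding word boundaries can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the method of any study he mentions. The input is assumed to be already divided into syllables, a boundary is placed at each local minimum of the transitional probability, and the three made-up words and the function names `transitional_probabilities` and `segment` are hypothetical.]

```python
from collections import Counter
from itertools import permutations


def transitional_probabilities(utterances):
    """Estimate P(next syllable | current syllable) from adjacent pairs."""
    pair_counts = Counter()
    first_counts = Counter()
    for utt in utterances:
        for a, b in zip(utt, utt[1:]):
            pair_counts[(a, b)] += 1
            first_counts[a] += 1
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


def segment(utt, tp):
    """Split an utterance wherever the transitional probability is a
    strict local minimum (lower than at both neighboring transitions)."""
    probs = [tp.get((a, b), 0.0) for a, b in zip(utt, utt[1:])]
    words, start = [], 0
    for i, p in enumerate(probs):
        left = probs[i - 1] if i > 0 else float("inf")
        right = probs[i + 1] if i + 1 < len(probs) else float("inf")
        if p < left and p < right:
            words.append(utt[start:i + 1])
            start = i + 1
    words.append(utt[start:])
    return words


# Hypothetical toy vocabulary: three invented three-syllable "words".
vocabulary = [["bi", "da", "ku"], ["pa", "do", "ti"], ["go", "la", "bu"]]

# Assumed corpus: every ordering of the three words, as unbroken syllable streams.
corpus = [[syl for word in order for syl in word]
          for order in permutations(vocabulary)]

tp = transitional_probabilities(corpus)
print(segment(corpus[0], tp))
```

[On this toy corpus the probability is 1.0 inside each invented word and 0.5 across word boundaries, so the dips line up with the words. Because a strict local minimum needs a higher value on both sides, this version can never place boundaries at two adjacent transitions, so it cannot isolate a one-syllable word in the middle of an utterance. That is one reason a purely statistical version may do poorly on real speech, and it is where an added constraint such as the prosodic peak mentioned in the talk would come in.]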