ALANTURING.NET
What is Artificial Intelligence?
Jack Copeland
©Copyright B.J. Copeland, May 2000
Artificial Intelligence (AI) is usually defined as the science of making computers do things that require intelligence when done by humans. AI has had some success in limited, or simplified, domains. However, the five decades since the inception of AI have brought only very slow progress, and early optimism concerning the attainment of human-level intelligence has given way to an appreciation of the profound difficulty of the problem.
Table of Contents
What is Intelligence?
1. Strong AI, Applied AI, and CS
2. Alan Turing and the Origins of AI
3. Early AI Programs
4. AI Programming Languages
5. Micro-World AI
6. Expert Systems
7. The CYC Project
8. Top-Down AI vs Bottom-Up AI
9. Connectionism
10. Nouvelle AI
11. Chess
12. Is Strong AI Possible?
13. The Chinese Room Objection
14. For More Information...
What is Intelligence? Quite simple human behaviour can be intelligent, yet quite complex behaviour performed by insects is unintelligent. What is the difference? Consider the behaviour of the digger wasp, Sphex ichneumoneus. When the female wasp brings food to her burrow, she deposits it on the threshold, goes inside the burrow to check for intruders, and then if the coast is clear carries in the food. The unintelligent nature of the wasp's behaviour is revealed if the watching experimenter moves the food a few inches while the wasp is inside the burrow checking. On emerging, the wasp repeats the whole procedure: she carries the food to the threshold once again, goes in to look around, and emerges. She can be made to repeat this cycle of behaviour upwards of forty times in succession. Intelligence--conspicuously absent in the case of Sphex--is the ability to adapt one's behaviour to fit new circumstances. Mainstream thinking in psychology regards human intelligence not as a single ability or cognitive process but rather as an array of separate components. Research in AI has focussed chiefly on the following components of intelligence: learning, reasoning, problem-solving, perception, and language-understanding.
Learning Learning is distinguished into a number of different forms. The simplest is learning by trial-and-error. For example, a simple program for solving mate-in-one chess problems might try out moves at random until one is found that achieves mate. The program remembers the successful move and next time the computer is given the same problem it is able to produce the answer immediately. The simple memorising of individual items--solutions to problems, words of vocabulary, etc.--is known as rote learning. Rote learning is relatively easy to implement on a computer. More challenging is the problem of implementing what is called generalisation. Learning that involves generalisation leaves the learner able to perform better in situations not previously encountered. A program that learns past tenses of regular English verbs by rote will not be able to produce the past tense of e.g. "jump" until presented at least once with "jumped", whereas a program that is able to generalise from examples can learn the "add-ed" rule, and so form the past tense of "jump" in the absence of any previous encounter with this verb. Sophisticated modern techniques enable programs to generalise complex rules from data.
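To make the contrast concrete, here is a minimal Python sketch of the difference between a rote learner and a generalising learner for past tenses. The class names and the tiny training set are invented for illustration; this is not a reconstruction of any particular program.

```python
# A minimal sketch (hypothetical names) contrasting rote learning with generalisation.

class RoteLearner:
    """Memorises (verb, past tense) pairs; cannot handle verbs it has never seen."""
    def __init__(self):
        self.memory = {}

    def learn(self, verb, past):
        self.memory[verb] = past          # store the individual item

    def past_tense(self, verb):
        return self.memory.get(verb)      # None for verbs never presented

class GeneralisingLearner:
    """Infers the 'add -ed' rule from examples and applies it to new verbs."""
    def __init__(self):
        self.rule_confirmed = False

    def learn(self, verb, past):
        if past == verb + "ed":           # every regular example supports the rule
            self.rule_confirmed = True

    def past_tense(self, verb):
        return verb + "ed" if self.rule_confirmed else None

rote, gen = RoteLearner(), GeneralisingLearner()
for v, p in [("walk", "walked"), ("look", "looked")]:
    rote.learn(v, p)
    gen.learn(v, p)

print(rote.past_tense("jump"))   # None -- never presented with "jumped"
print(gen.past_tense("jump"))    # "jumped" -- produced by the generalised rule
```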
Reasoning To reason is to draw inferences appropriate to the situation in hand. Inferences are classified as either deductive or inductive. An example of the former is "Fred is either in the museum or the café; he isn't in the café; so he's in the museum", and of the latter "Previous accidents just like this one have been caused by instrument failure; so probably this one was caused by instrument failure". The difference between the two is that in the deductive case, the truth of the premisses guarantees the truth of the conclusion, whereas in the inductive case, the truth of the premiss lends support to the conclusion that
the accident was caused by instrument failure, but nevertheless further investigation might reveal that, despite the truth of the premiss, the conclusion is in fact false. There has been considerable success in programming computers to draw inferences, especially deductive inferences. However, a program cannot be said to reason simply in virtue of being able to draw inferences. Reasoning involves drawing inferences that are relevant to the task or situation in hand. One of the hardest problems confronting AI is that of giving computers the ability to distinguish the relevant from the irrelevant.
Problem-solving Problems have the general form: given such-and-such data, find x. A huge variety of types of problem is addressed in AI. Some examples are: finding winning moves in board games; identifying people from their photographs; and planning series of movements that enable a robot to carry out a given task. Problem-solving methods divide into special-purpose and general-purpose. A special-purpose method is tailor-made for a particular problem, and often exploits very specific features of the situation in which the problem is embedded. A general-purpose method is applicable to a wide range of different problems. One general-purpose technique used in AI is means-end analysis, which involves the step-by-step reduction of the difference between the current state and the goal state. The program selects actions from a list of means--which in the case of, say, a simple robot, might consist of pickup, putdown, moveforward, moveback, moveleft, and moveright--until the current state is transformed into the goal state.
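As an illustration only, the following Python sketch shows the core loop of means-end analysis for a toy robot on a grid: at each step, pick whichever available action most reduces the difference between the current state and the goal state. The grid world, the action list, and the use of Manhattan distance as the difference measure are assumptions made for this sketch.

```python
# A minimal sketch of means-end analysis for a toy robot on a grid (hypothetical setup).

ACTIONS = {
    "moveright":   (1, 0),
    "moveleft":    (-1, 0),
    "moveforward": (0, 1),
    "moveback":    (0, -1),
}

def difference(state, goal):
    # Manhattan distance serves as the measure of difference between states.
    return abs(goal[0] - state[0]) + abs(goal[1] - state[1])

def means_end_analysis(state, goal):
    plan = []
    while difference(state, goal) > 0:
        # Choose the action whose result is closest to the goal state.
        name, move = min(
            ACTIONS.items(),
            key=lambda item: difference((state[0] + item[1][0], state[1] + item[1][1]), goal),
        )
        state = (state[0] + move[0], state[1] + move[1])
        plan.append(name)
    return plan

print(means_end_analysis((0, 0), (2, 1)))
# ['moveright', 'moveright', 'moveforward'] -- one shortest plan; ties broken arbitrarily
```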
Perception In perception the environment is scanned by means of various sense-organs, real or artificial, and processes internal to the perceiver analyse the scene into objects and their features and relationships. Analysis is complicated by the fact that one and the same object may present many different appearances on different occasions, depending on the angle from which it is viewed, whether or not parts of it are projecting shadows, and so forth. At present, artificial perception is sufficiently well advanced to enable a self-controlled car-like device to drive at moderate speeds on the open road, and a mobile robot to roam through a suite of busy offices searching for and clearing away empty soda cans. One of the earliest systems to integrate perception and action was FREDDY, a stationary robot with a moving TV 'eye' and a pincer 'hand' (constructed at Edinburgh University during the period 1966-1973 under the direction of Donald Michie). FREDDY was able to recognise a variety of objects and could be instructed to assemble simple artefacts, such as a toy car, from a random heap of components.
Language-understanding
A language is a system of signs having meaning by convention. Traffic signs, for example, form a minilanguage, it being a matter of convention that, for example, the hazard-ahead sign means hazard ahead. This meaning-by-convention that is distinctive of language is very different from what is called natural meaning, exemplified in statements like 'Those clouds mean rain' and 'The fall in pressure means the valve is malfunctioning'. An important characteristic of full-fledged human languages, such as English, which distinguishes them from, e.g. bird calls and systems of traffic signs, is their productivity. A productive language is one that is rich enough to enable an unlimited number of different sentences to be formulated within it. It is relatively easy to write computer programs that are able, in severely restricted contexts, to respond in English, seemingly fluently, to questions and statements, for example the Parry and Shrdlu programs described in the section Early AI Programs. However, neither Parry nor Shrdlu actually understands language. An appropriately programmed computer can use language without understanding it, in principle even to the point where the computer's linguistic behaviour is indistinguishable from that of a native human speaker of the language (see the section Is Strong AI Possible?). What, then, is involved in genuine understanding, if a computer that uses language indistinguishably from a native human speaker does not necessarily understand? There is no universally agreed answer to this difficult question. According to one theory, whether or not one understands depends not only upon one's behaviour but also upon one's history: in order to be said to understand one must have learned the language and have been trained to take one's place in the linguistic community by means of interaction with other language-users.
Strong AI, Applied AI, and CS Research in AI divides into three categories: "strong" AI, applied AI, and cognitive simulation or CS. Strong AI aims to build machines whose overall intellectual ability is indistinguishable from that of a human being. Joseph Weizenbaum, of the MIT AI Laboratory, has described the ultimate goal of strong AI as being "nothing less than to build a machine on the model of man, a robot that is to have its childhood, to learn language as a child does, to gain its knowledge of the world by sensing the world through its own organs, and ultimately to contemplate the whole domain of human thought". The term "strong AI", now in wide use, was introduced for this category of AI research in 1980 by the philosopher John Searle, of the University of California at Berkeley. Some believe that work in strong AI will eventually lead to computers whose intelligence greatly exceeds that of human beings. Edward Fredkin, also of MIT AI Lab, has suggested that such machines "might keep us as pets". Strong AI has caught the attention of the media, but by no means all AI researchers view strong AI as worth pursuing. Excessive optimism in the 1950s and 1960s concerning strong AI has given way to an appreciation of the extreme difficulty of the problem, which is possibly the hardest that science has ever undertaken. To date, progress has been meagre. Some critics doubt whether research in the next few decades will produce even a system with the overall intellectual ability of an ant. Applied AI, also known as advanced information-processing, aims to produce commercially viable "smart" systems--such as, for example, a security system that is able to recognise the faces of people who are permitted to enter a particular building. Applied AI has already enjoyed considerable success. Various applied systems are described in this article. In cognitive simulation, computers are used to test theories about how the human mind works--for example, theories about how we recognise faces and other objects, or about how we solve abstract problems (such as the "missionaries and cannibals" problem described later). The theory that is to be tested is expressed in the form of a computer program and the program's performance at the task--e.g. face recognition--is compared to that of a human being. Computer simulations of networks of neurons have contributed both to psychology and to neurophysiology (some of this work is described in the section Connectionism). The program Parry, described below, was written in order to test a particular theory concerning the nature of paranoia. Researchers in cognitive psychology typically view CS as a powerful tool.
Alan Turing and the Origins of AI The earliest substantial work in the field was done by the British logician and computer pioneer Alan Mathison Turing. In 1935, at Cambridge University, Turing conceived the modern computer. He described an abstract computing machine consisting of a limitless memory and a scanner that moves back and forth through the memory, symbol by symbol, reading what it finds and writing further symbols. The actions of the scanner are dictated by a program of instructions that is also stored in the memory in the form of symbols. This is Turing's "stored-program concept", and implicit in it is the possibility of the machine operating on, and so modifying or improving, its own program. Turing's computing machine of 1935 is now known simply as the universal Turing machine. All modern computers are in essence universal Turing machines. During the Second World War Turing was a leading cryptanalyst at the Government Code and Cypher School, Bletchley Park (where the Allies were able to decode a large proportion of the Wehrmacht's radio communications). Turing could not turn to the project of building a stored-program electronic computing machine until the cessation of hostilities in Europe in 1945. Nevertheless, during the wartime years he gave considerable thought to the issue of machine intelligence. Colleagues at Bletchley Park recall numerous off-duty discussions with him on the topic, and at one point Turing circulated a typewritten report (now lost) setting out some of his ideas. One of these colleagues, Donald Michie (who later founded the Department of Machine Intelligence and Perception at the University of Edinburgh), remembers Turing talking often about the possibility of computing machines (1) learning from experience and (2) solving problems by means of searching through the space of possible solutions, guided by rule-of-thumb principles. The modern term for the latter idea is "heuristic search", a heuristic being any rule-of-thumb principle that cuts down the amount of searching required in order to find the solution to a problem. Programming using heuristics is a major part of modern AI, as is the area now known as machine learning. At Bletchley Park Turing illustrated his ideas on machine intelligence by reference to chess. (Ever since, chess and other board games have been regarded as an important test-bed for ideas in AI, since these are a useful source of challenging and clearly defined problems against which proposed methods for problem-solving can be tested.) In principle, a chess-playing computer could play by searching exhaustively through all the available moves, but in practice this is impossible, since it would involve examining an astronomically large number of moves. Heuristics are necessary to guide and to narrow the search. Michie recalls Turing experimenting with two heuristics that later became common in AI, minimax and best-first. The minimax heuristic (described by the mathematician John von Neumann in 1928) involves assuming that one's opponent will move in such a way as to maximise their gains; one then makes one's own move in such a way as to minimise the losses caused by the opponent's expected move. The best-first heuristic involves ranking the moves available to one by means of a rule-of-thumb scoring system and examining the consequences of the highest-scoring move first.
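The minimax idea can be illustrated in a few lines of Python on a tiny, made-up game tree; the sketch below is not Turing's chess routine, and the leaf scores stand in for a rule-of-thumb evaluation of positions.

```python
# A minimal sketch of minimax search on a tiny abstract game tree.
# A position is either a numeric score (from a rule-of-thumb evaluation) or a
# list of successor positions; the tree itself is invented for illustration.

def minimax(position, maximising):
    if isinstance(position, (int, float)):     # leaf: an evaluated position
        return position
    scores = [minimax(child, not maximising) for child in position]
    # The maximising player assumes the opponent will minimise, and vice versa.
    return max(scores) if maximising else min(scores)

# Two moves available to us; the opponent then chooses among the replies shown.
game_tree = [
    [3, 12, 8],    # replies to our first move (the opponent will pick 3)
    [2, 14, 6],    # replies to our second move (the opponent will pick 2)
]
print(minimax(game_tree, maximising=True))   # 3: the best of the worst cases
```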
In London in 1947 Turing gave what was, so far as is known, the earliest public lecture to mention computer intelligence, saying "What we want is a machine that can learn from experience", adding that the "possibility of letting the machine alter its own instructions provides the mechanism for this". In 1948 he wrote (but did not publish) a report entitled "Intelligent Machinery". This was the first manifesto of AI and in it Turing brilliantly introduced many of the concepts that were later to become central, in some cases after reinvention by others. One of these was the concept of "training" a network of artificial neurons to perform specific tasks. In 1950 Turing introduced the test for computer intelligence that is now known simply as the Turing test. This involves three participants, the computer, a human interrogator, and a human "foil". The interrogator attempts to determine, by asking questions of the other two participants, which is the computer. All communication is via keyboard and screen. The interrogator may ask questions as penetrating and wide-ranging as he or she likes, and the computer is permitted to do everything possible to force a wrong identification. (So the computer might answer "No" in response to "Are you a computer?" and might follow a request to multiply one large number by another with a long pause and an incorrect answer.) The foil must help the interrogator to make a correct identification. A number of different people play the roles of interrogator and foil, and if sufficiently many interrogators are unable to distinguish the computer from the human being then (according to proponents of the test) it is to be concluded that the computer is an intelligent, thinking entity. In 1991, the New York businessman Hugh Loebner started the annual Loebner Prize competition, offering a $100,000 prize for the first computer program to pass the Turing test (with $2,000 awarded each year for the best effort). However, no AI program has so far come close to passing an undiluted Turing test. In 1951 Turing gave a lecture on machine intelligence on British radio and in 1953 he published a classic early article on chess programming. Both during and after the war Turing experimented with machine routines for playing chess. (One was called the Turochamp.) In the absence of a computer to run his heuristic chess program, Turing simulated the operation of the program by hand, using paper and pencil. Play was poor! The first true AI programs had to await the arrival of stored-program electronic digital computers.
Early AI Programs The first working AI programs were written in the UK by Christopher Strachey, Dietrich Prinz, and Anthony Oettinger. Strachey was at the time a teacher at Harrow School and an amateur programmer; he later became Director of the Programming Research Group at Oxford University. Prinz worked for the engineering firm of Ferranti Ltd, which built the Ferranti Mark I computer in collaboration with Manchester University. Oettinger worked at the Mathematical Laboratory at Cambridge University, home of the EDSAC computer. Strachey chose the board game of checkers (or draughts) as the domain for his experiment in machine intelligence. Strachey initially coded his checkers program in May 1951 for the pilot model of Turing's Automatic Computing Engine at the National Physical Laboratory. This version of the program did not run successfully; Strachey's efforts were defeated first by coding errors and subsequently by a hardware change that rendered his program obsolete. In addition, Strachey was dissatisfied with the method employed in the program for evaluating board positions. He wrote an improved version for the Ferranti Mark I at Manchester (with Turing's encouragement and utilising the latter's recently completed Programmers' Handbook for the Ferranti computer). By the summer of 1952 this program could, Strachey reported, "play a complete game of Draughts at a reasonable speed". Prinz's chess program, also written for the Ferranti Mark I, first ran in November 1951. It was for solving simple problems of the mate-in-two variety. The program would examine every possible move until a solution was found. On average several thousand moves had to be examined in the course of solving a problem, and the program was considerably slower than a human player. Turing started to program his Turochamp chess-player on the Ferranti Mark I but never completed the task. Unlike Prinz's program, the Turochamp could play a complete game and operated not by exhaustive search but under the guidance of rule-of-thumb principles devised by Turing.
Machine learning Oettinger was considerably influenced by Turing's views on machine learning. His "Shopper" was the earliest program to incorporate learning (details of the program were published in 1952). The program ran on the EDSAC. Shopper's simulated world was a mall of eight shops. When sent out to purchase an item Shopper would if necessary search for it, visiting shops at random until the item was found. While searching, Shopper would memorise a few of the items stocked in each shop visited (just as a human shopper would). Next time Shopper was sent out for the same item, or for some other item that it had already located, it would go to the right shop straight away. As previously mentioned, this simple form of learning is called "rote learning" and is to be contrasted with learning involving "generalisation", which is exhibited by the program described next. Learning involving generalisation leaves the learner able to perform better in situations not previously encountered. (Strachey also investigated aspects of machine learning, taking the game of NIM as his focus, and in 1951 he reported a simple rote-learning scheme in a letter to Turing.)
The first AI program to run in the U.S. was also a checkers program, written in 1952 by Arthur Samuel of IBM for the IBM 701. Samuel took over the essentials of Strachey's program (which Strachey had publicised at a computing conference in Canada in 1952) and over a period of years considerably extended it. In 1955 he added features that enabled the program to learn from experience, and therefore improve its play. Samuel included mechanisms for both rote learning and generalisation. The program soon learned enough to outplay its creator. Successive enhancements that Samuel made to the learning apparatus eventually led to the program winning a game against a former Connecticut checkers champion in 1962 (who immediately turned the tables and beat the program in six games straight). To speed up learning, Samuel would set up two copies of the program, Alpha and Beta, on the same computer, and leave them to play game after game with each other. The program used heuristics to rank moves and board positions ("looking ahead" as many as ten turns of play). The learning procedure consisted in the computer making small numerical changes to Alpha's ranking procedure, leaving Beta's unchanged, and then comparing Alpha's and Beta's performance over a few games. If Alpha played worse than Beta, these changes to the ranking procedure were discarded, but if Alpha played better than Beta then Beta's ranking procedure was replaced with Alpha's. As in biological evolution, the fitter survived, and over many such cycles of mutation and selection the program's skill would increase. (However, the quality of learning displayed by even a simple living being far surpasses that of Samuel's and Oettinger's programs.)
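The flavour of Samuel's mutate-and-compare scheme can be conveyed by the following Python sketch. Everything apart from the Alpha/Beta structure is invented for illustration: the "games" are reduced to a crude simulation in which weights closer to an arbitrary target tend to win, standing in for a real checkers evaluation function.

```python
# A minimal sketch of Samuel-style learning by self-play (details hypothetical):
# Alpha's evaluation weights are perturbed slightly; if Alpha then beats Beta
# over a few games, Beta adopts Alpha's weights, otherwise the change is discarded.
import random

TRUE_WEIGHTS = [1.0, -0.5, 0.25]          # stands in for "ideal" checkers knowledge

def play_match(alpha, beta, games=10):
    """Return Alpha's wins minus Beta's wins; here a player whose weights are
    closer to TRUE_WEIGHTS wins each simulated game with higher probability."""
    def strength(w):
        return -sum((a - b) ** 2 for a, b in zip(w, TRUE_WEIGHTS))
    wins = 0
    for _ in range(games):
        wins += 1 if strength(alpha) + random.gauss(0, 0.1) > strength(beta) + random.gauss(0, 0.1) else -1
    return wins

beta = [0.0, 0.0, 0.0]                    # both players start with naive weights
for _ in range(200):                      # many cycles of "mutation and selection"
    alpha = [w + random.gauss(0, 0.05) for w in beta]   # small numerical changes
    if play_match(alpha, beta) > 0:
        beta = alpha                      # the fitter ranking procedure survives
print(beta)                               # drifts toward TRUE_WEIGHTS over time
```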
Evolutionary computing The work by Samuel just described was among the earliest in a field now called evolutionary computing and is an example of the use of a genetic algorithm or GA. The term "genetic algorithm" was introduced in about 1975 by John Holland and his research group at the University of Michigan, Ann Arbor. Holland's work is principally responsible for the current intense interest in GAs. GAs employ methods analogous to the processes of natural evolution in order to produce successive generations of software entities that are increasingly fit for their intended purpose. The concept in fact goes back to Turing's manifesto of 1948, where he employed the term "genetical search". The use of GAs is burgeoning in AI and elsewhere. In one recent application, a GA-based system and a witness to a crime cooperate to generate on-screen faces that become closer and closer to the recollected face of the criminal.
Reasoning and problem-solving The ability to reason logically is an important aspect of intelligence and has always been a major focus of AI research. In his 1948 manifesto, Turing emphasised that once a computer can prove logical theorems it will be able to search intelligently for solutions to problems. (An example of a simple logical theorem is "given that either X is true or Y is true, and given that X is in fact false, it follows that Y is true".) Prinz used the Ferranti Mark I, the first commercially available computer, to solve logical problems, and in 1949 and 1951 Ferranti built two small experimental special-purpose computers for theorem-proving and other logical work.
An important landmark in this area was a theorem-proving program written in 1955-1956 by Allen Newell and J. Clifford Shaw of the RAND Corporation at Santa Monica and Herbert Simon of the Carnegie Institute of Technology (now Carnegie-Mellon University). The program was designed to prove theorems from the famous logical work Principia Mathematica by Alfred North Whitehead and Bertrand Russell. In the case of one theorem, the proof devised by the program was more elegant than the proof given by Whitehead and Russell. The Logic Theorist, as the program became known, was the major exhibit at a conference organised in 1956 at Dartmouth College, New Hampshire, by John McCarthy, who subsequently became one of the most influential figures in AI. The title of the conference was "The Dartmouth Summer Research Project on Artificial Intelligence". This was the first use of the term "Artificial Intelligence". Turing's original term "machine intelligence" has also persisted, especially in Britain. Newell, Simon and Shaw went on to construct the General Problem Solver, or GPS. The first version of GPS ran in 1957 and work continued on the project for about a decade. GPS could solve an impressive variety of puzzles, for example the "missionaries and cannibals" problem: How are a party of three missionaries and three cannibals to cross a river in a small boat that will take no more than two at a time, without the missionaries on either bank becoming outnumbered by cannibals? GPS would search for a solution in a trial-and-error fashion, under the guidance of heuristics supplied by the programmers. One criticism of GPS, and other programs that lack learning, is that the program's "intelligence" is entirely second-hand, coming from the programmer (mainly via the heuristics, in the case of GPS).
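For comparison, here is a minimal Python sketch that solves the missionaries-and-cannibals puzzle by plain breadth-first search of the state space. Unlike GPS it uses no heuristics, and the representation chosen (left-bank counts plus the boat's position) is simply one convenient encoding assumed for this sketch.

```python
# A minimal sketch: missionaries and cannibals solved by blind breadth-first search.
from collections import deque

START, GOAL = (3, 3, 1), (0, 0, 0)     # (missionaries, cannibals, boat) on the left bank
MOVES = [(1, 0), (2, 0), (0, 1), (0, 2), (1, 1)]   # who crosses in the boat

def safe(m, c):
    # Missionaries are never outnumbered on either bank (or are absent from it).
    return 0 <= m <= 3 and 0 <= c <= 3 and (m == 0 or m >= c) and (3 - m == 0 or 3 - m >= 3 - c)

def successors(state):
    m, c, boat = state
    sign = -1 if boat else 1            # the boat leaves whichever bank it is on
    for dm, dc in MOVES:
        nm, nc = m + sign * dm, c + sign * dc
        if safe(nm, nc):
            yield (nm, nc, 1 - boat)

def solve():
    frontier, seen = deque([(START, [START])]), {START}
    while frontier:
        state, path = frontier.popleft()
        if state == GOAL:
            return path
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [nxt]))

print(solve())    # the 11-crossing solution, as a list of left-bank states
```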
Natural language communication Two of the best-known early programs are Eliza and Parry. Details of both were first published in 1966. These programs gave an eerie semblance of conversing intelligently. Parry, written by Stanford University psychiatrist Kenneth Colby, simulated a human paranoiac. Parry's responses are capitalised in the following extract from a "conversation" between Parry and a psychiatric interviewer.

Why are you in the hospital?
I SHOULDN'T BE HERE.
Who brought you here?
THE POLICE.
What trouble did you have with the police?
COPS DON'T DO THEIR JOB.
What do they do instead?
COPS ARREST THE WRONG PEOPLE.
They arrest the right ones sometimes.
THE LAW SHOULD GET THOSE ITALIAN CROOKS.
Are you Italian?
MY BACKGROUND IS BRITISH BUT I WAS BORN IN THIS COUNTRY.
Do you know any Italian crooks?
I TRY TO AVOID THE UNDERWORLD.
Psychiatrists who were asked to decide whether they were communicating with Parry or a human paranoiac were often unable to tell. Eliza, written by Joseph Weizenbaum at MIT, simulated a human therapist. In the following extract, Eliza "speaks" second.

Men are all alike.
IN WHAT WAY
They're always bugging us about something or other.
CAN YOU THINK OF A SPECIFIC EXAMPLE
Well, my boyfriend made me come here.
YOUR BOYFRIEND MADE YOU COME HERE
He says I'm depressed much of the time.
I AM SORRY TO HEAR THAT YOU ARE DEPRESSED
It's true. I am unhappy.
DO YOU THINK COMING HERE WILL HELP YOU NOT TO BE UNHAPPY
I need some help, that much seems certain.
WHAT WOULD IT MEAN TO YOU IF YOU GOT SOME HELP

Neither Parry nor Eliza can reasonably be described as intelligent. Parry's contributions to the conversation are "canned"--constructed in advance by the programmer and stored away in the computer's memory. As the philosopher Ned Block says, systems like Parry are no more intelligent than is a juke box. Eliza, too, relies on canned sentences and simple programming tricks (such as editing and returning the remark that the human participant has just made).
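A minimal Python sketch of the "editing and returning" trick follows. The two patterns and the pronoun swaps are invented for illustration and are far cruder than Weizenbaum's actual script.

```python
# A minimal sketch of an Eliza-style trick: edit the user's remark and return it.
import re

PRONOUN_SWAPS = {"i": "you", "my": "your", "me": "you", "am": "are"}

def reflect(phrase):
    # Swap first-person words for second-person ones before echoing the phrase back.
    return " ".join(PRONOUN_SWAPS.get(word, word) for word in phrase.lower().split())

RULES = [
    (re.compile(r"i am (.*)", re.I), "I AM SORRY TO HEAR THAT YOU ARE {0}"),
    (re.compile(r"(.*) made me (.*)", re.I), "{0} MADE YOU {1}"),
]

def respond(remark):
    for pattern, template in RULES:
        match = pattern.match(remark)
        if match:
            return template.format(*(reflect(g) for g in match.groups())).upper()
    return "CAN YOU THINK OF A SPECIFIC EXAMPLE"      # canned fallback response

print(respond("Well, my boyfriend made me come here"))   # ... MADE YOU COME HERE
print(respond("I am unhappy."))   # I AM SORRY TO HEAR THAT YOU ARE UNHAPPY.
```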
AI Programming Languages In the course of their work on the Logic Theorist and GPS, Newell, Simon and Shaw developed their Information Processing Language, or IPL, a computer language tailored for AI programming. At the heart of IPL was a highly flexible data-structure they called a "list". A list is simply an ordered sequence of items of data. Some or all of the items in a list may themselves be lists. This leads to richly branching structures. In 1960 John McCarthy combined elements of IPL with elements of the lambda calculus--a powerful logical apparatus dating from 1936--to produce the language that he called LISP (from LISt Processor). In the U.S., LISP remains the principal language for AI work. (The lambda calculus itself was invented by Princeton logician Alonzo Church, while investigating the abstract Entscheidungsproblem, or decision problem, for predicate logic--the same problem that Turing was attacking when he invented the universal Turing machine.) The logic programming language PROLOG (from PROgrammation en LOGique) was conceived by Alain Colmerauer at the University of Marseilles, where the language was first implemented in 1973. PROLOG was further developed by logician Robert Kowalski, a member of the AI group at Edinburgh University. This language makes use of a powerful theorem-proving technique known as "resolution", invented in 1963 at the Atomic Energy Commission's Argonne National Laboratory in Illinois by the British logician Alan Robinson. PROLOG can determine whether or not a given statement follows logically from other given statements. For example, given the statements "All logicians are rational" and "Robinson is a logician", a PROLOG program responds in the affirmative to the query "Robinson is rational?". PROLOG is widely used for AI work, especially in Europe and Japan. Researchers at the Institute for New Generation Computer Technology in Tokyo have used PROLOG as the basis for sophisticated logic programming languages. These languages are in use on non-numerical parallel computers developed at the Institute. (The languages and the computers are known as "Fifth Generation" software and hardware.) Other recent work includes the development of languages for reasoning about time-dependent data such as "the account was paid yesterday". These languages are based on tense logic, a type of logic that permits statements to be located in the flow of time. (Tense logic was invented in 1953 by the philosopher Arthur Prior at the University of Canterbury, New Zealand.)
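The deduction in the logicians example can be imitated in a few lines of Python by naive forward chaining. The sketch below reproduces the outcome of the inference, not PROLOG's resolution procedure, and the fact-and-rule encoding is invented for illustration.

```python
# A rough Python imitation of "All logicians are rational; Robinson is a logician;
# therefore Robinson is rational" -- simple forward chaining, not PROLOG resolution.

facts = {("logician", "robinson")}
rules = [(("logician", "X"), ("rational", "X"))]   # "all logicians are rational"

def follows(query):
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for (premise_pred, _), (conclusion_pred, _) in rules:
            for fact_pred, fact_arg in list(derived):
                if fact_pred == premise_pred and (conclusion_pred, fact_arg) not in derived:
                    derived.add((conclusion_pred, fact_arg))   # instantiate the conclusion
                    changed = True
    return query in derived

print(follows(("rational", "robinson")))   # True: Robinson is rational
```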
Micro-World AI The real world is full of distracting and obscuring detail: generally science progresses by focussing on artificially simple models of reality (in physics, frictionless planes and perfectly rigid bodies, for example). In 1970 Marvin Minsky and Seymour Papert, of the MIT AI Laboratory, proposed that AI research should likewise focus on developing programs capable of intelligent behaviour in artificially simple situations known as micro-worlds. Much research has focussed on the so-called blocks world, which consists of coloured blocks of various shapes and sizes arrayed on a flat surface. An early success of the micro-world approach was SHRDLU, written by Terry Winograd of MIT (details of the program were published in 1972). SHRDLU controlled a robot arm that operated above a flat surface strewn with play blocks (both the arm and the blocks were virtual). SHRDLU would respond to commands typed in natural English, such as "Will you please stack up both of the red blocks and either a green cube or a pyramid". The program would plan out a sequence of actions and in the virtual world the robot arm would arrange the blocks appropriately. SHRDLU could correctly answer questions about its world of blocks, for example "Can a pyramid be supported by a pyramid?" (SHRDLU attempts to stack up two pyramids and fails) and "Is there anything which is bigger than every pyramid but is not as wide as the thing that supports it?" (to which SHRDLU answered "Yes, the blue block"). SHRDLU could also answer questions about its own actions:

Had you touched any pyramid before you put one on the green block?
YES, THE GREEN ONE.
When did you pick it up?
WHILE I WAS STACKING UP A RED CUBE, A LARGE GREEN CUBE, AND A LARGE RED BLOCK.
Why?
TO GET RID OF IT.
Why did you do that?
TO CLEAN OFF THE RED CUBE.
Why did you clear off that cube?
TO PUT IT ON A LARGE GREEN CUBE.
Why did you do that?
TO STACK UP A RED CUBE, A LARGE GREEN CUBE, AND A LARGE RED BLOCK.
Why did you do that?
BECAUSE YOU ASKED ME TO.

Although SHRDLU was initially hailed as a major breakthrough, Winograd soon announced that the program was in fact a dead end. The techniques pioneered in the program proved unsuitable for application in wider, more interesting worlds. Moreover, the appearance that Shrdlu gives of understanding the blocks micro-world, and English statements concerning it, is in fact an illusion. Shrdlu has no idea what a red block is. Another product of the micro-world approach was Shakey, a mobile robot developed at the Stanford Research Institute by Bertram Raphael, Nils Nilsson and their group, during the period 1968-1972. (Shakey can now be viewed at the Boston Computing Museum.) The robot occupied a specially built
micro-world consisting of walls, doorways, and a few simply-shaped wooden blocks. Each wall had a carefully painted baseboard to enable the robot to "see" where the wall met the floor (a simplification of reality that is typical of the micro-world approach). Shakey had about a dozen basic abilities, such as TURN, PUSH and CLIMB-RAMP. These could be combined in various ways by the robot's planning programs. Shakey's primary sensor was a black-and-white television camera. Other sensors included a "bump bar", and odometry that enabled the robot to calculate its position by "dead reckoning". A demonstration video showed Shakey obeying an instruction to move a certain block from one room to another by locating a ramp, pushing the ramp to the platform on which the block happened to be located, trundling up the ramp, toppling the block onto the floor, descending the ramp, and manoeuvring the block to the required room, this sequence of actions having been devised entirely by the robot's planning program without human intervention. Critics emphasise the highly simplified nature of Shakey's environment and point out that, despite these simplifications, Shakey operated excruciatingly slowly--the sequence of actions in the demonstration video in fact took days to complete. The reasons for Shakey's inability to operate on the same time-scale as a human being are examined later in this article. FREDDY, a stationary robot with a TV "eye" mounted on a steerable platform, and a pincer "hand", was constructed at Edinburgh University under the direction of Donald Michie. FREDDY was able to recognise a small repertoire of objects, including a hammer, a cup and a ball, with about 95% accuracy; recognising a single object would take several minutes of computing time. The robot could be "taught" to assemble simple objects, such as a toy car, from a kit of parts. Envisaged applications included production-line assembly work and automatic parcel handling. FREDDY was conceived in 1966 but work was interrupted in 1973, owing to a change in the British Government's funding policy in the wake of a disparaging report on AI (and especially robotics) by the Cambridge mathematician Sir James Lighthill. Work on FREDDY resumed in 1982 with U.S. funding. Roger Schank and his group at Yale applied a form of the micro-world approach to language processing. Their program SAM (1975) could answer questions about simple stories concerning stereotypical situations, such as dining in a restaurant and travelling on the subway. The program could infer information that was implicit in the story. For example, when asked "What did John order?", SAM replies "John ordered lasagne", even though the story states only that John went to a restaurant and ate lasagne. FRUMP, another program by Schank's group (1977), produced summaries in three languages of wire-service news reports. Impressive though SAM and FRUMP are, it is important to bear in mind that these programs are disembodied and have no real idea what lasagne and eating are. As critics point out, understanding a story requires more than an ability to produce strings of symbols in response to other strings of symbols. The greatest success of the micro-world approach is a type of program known as the expert system.
Expert Systems An expert system is a computer program dedicated to solving problems and giving advice within a specialised area of knowledge. A good system can match the performance of a human specialist. The field of expert systems is the most advanced part of AI, and expert systems are in wide commercial use. Expert systems are examples of micro-world programs: their "worlds"--for example, a model of a ship's hold and the containers that are to be stowed in it--are self-contained and relatively uncomplicated. Uses of expert systems include medical diagnosis, chemical analysis, credit authorisation, financial management, corporate planning, document routing in financial institutions, oil and mineral prospecting, genetic engineering, automobile design and manufacture, camera lens design, computer installation design, airline scheduling, cargo placement, and the provision of an automatic customer help service for home computer owners. The basic components of an expert system are a "knowledge base" or KB and an "inference engine". The information in the KB is obtained by interviewing people who are expert in the area in question. The interviewer, or "knowledge engineer", organises the information elicited from the experts into a collection of rules, typically of "if-then" structure. Rules of this type are called "production rules". The inference engine enables the expert system to draw deductions from the rules in the KB. For example, if the KB contains production rules "if x then y" and "if y then z", the inference engine is able to deduce "if x then z". The expert system might then query its user "is x true in the situation that we are considering?" (e.g. "does the patient have a rash?") and if the answer is affirmative, the system will proceed to infer z. In 1965 the AI researcher Edward Feigenbaum and the geneticist Joshua Lederberg, both of Stanford University, began work on Heuristic Dendral, the high-performance program that was the model for much of the ensuing work in the area of expert systems (the name subsequently became DENDRAL). The program's task was chemical analysis. The substance to be analysed might, for example, be a complicated compound of carbon, hydrogen and nitrogen. Starting from spectrographic data obtained from the substance, DENDRAL would hypothesise the substance's molecular structure. DENDRAL's performance rivalled that of human chemists expert at this task, and the program was used in industry and in universities. Work on MYCIN, an expert system for treating blood infections, began at Stanford in 1972. MYCIN would attempt to identify the organism responsible for an infection from information concerning the patient's symptoms and test results. The program would request further information if necessary, asking questions such as "has the patient recently suffered burns?". Sometimes MYCIN would suggest additional laboratory tests. When the program had arrived at a diagnosis it would recommend a course of medication. If requested, MYCIN would explain the reasoning leading to the diagnosis and recommendation. Examples of production rules from MYCIN's knowledge base are (1) If the site of the culture is blood, and the stain of the organism is gramneg, and the morphology of the organism is rod, and the patient
has been seriously burned, then there is evidence (.4) that the identity of the organism is pseudomonas. (The decimal number is a certainty factor, indicating the extent to which the evidence supports the conclusion.) (2) If the identity of the organism is pseudomonas then therapy should be selected from among the following drugs: Colistin (.98), Polymyxin (.96), Gentamicin (.96), Carbenicillin (.65), Sulfisoxazole (.64). (The decimal numbers represent the statistical probability of the drug arresting the growth of pseudomonas.) The program would make a final choice of drug from this list after quizzing the user concerning contra-indications such as allergies. Using around 500 such rules MYCIN achieved a high level of performance. The program operated at the same level of competence as human specialists in blood infections, and rather better than general practitioners. Janice Aikins' medical expert system Centaur (1983) was designed to determine the presence and severity of lung disease in a patient by interpreting measurements from pulmonary function tests. The following is actual output from the expert system concerning a patient at Pacific Medical Center in San Francisco.

The findings about the diagnosis of obstructive airways disease are as follows: Elevated lung volumes indicate overinflation. The RV/TLC ratio is increased, suggesting a severe degree of air trapping. Low mid-expiratory flow is consistent with severe airway obstruction. Obstruction is indicated by curvature of the flow-volume loop which is of a severe degree.
Conclusions: Smoking probably exacerbates the severity of the patient's airway obstruction. Discontinuation of smoking should help relieve the symptoms. Good response to bronchodilators is consistent with an asthmatic condition, and their continued use is indicated.
Pulmonary function diagnosis: Severe obstructive airways disease, asthmatic type.
Consultation finished.

An important feature of expert systems is that they are able to work cooperatively with their human users, enabling a degree of human-computer symbiosis. AI researcher Douglas Lenat says of his expert system Eurisko, which became a champion player in the star-wars game Traveller, that the "final crediting of the win should be about 60/40% Lenat/Eurisko, though the significant point here is that neither Lenat nor Eurisko could have won alone". Eurisko and Lenat cooperatively designed a fleet of warships which exploited the rules of the Traveller game in unconventional ways, and which was markedly superior to the fleets designed by human participants in the game.
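Returning to production rules: the following Python sketch shows, in miniature, how an inference engine chains forward over "if-then" rules of this kind. The rules and the question texts are invented for illustration and are not drawn from MYCIN or Centaur, and no certainty factors are modelled.

```python
# A minimal sketch of a forward-chaining inference engine over production rules.

RULES = [
    ({"rash", "fever"}, "measles_suspected"),     # "if rash and fever then measles is suspected"
    ({"measles_suspected"}, "order_blood_test"),  # "if measles is suspected then order a blood test"
]

def infer(answers):
    """answers maps askable facts (e.g. 'does the patient have a rash?') to True/False."""
    known = {fact for fact, yes in answers.items() if yes}
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in RULES:
            if conditions <= known and conclusion not in known:
                known.add(conclusion)     # the rule fires and its conclusion is added
                changed = True
    return known

conclusions = infer({"rash": True, "fever": True})
print("order_blood_test" in conclusions)   # True: chained from "if x then y" and "if y then z"
```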
Fuzzy logic Some expert systems use fuzzy logic. In standard, non-fuzzy, logic there are only two "truth values", true and false. This is a somewhat unnatural restriction, since we normally think of statements as being nearly true, partly false, truer than certain other statements, and so on. According to standard logic, however, there are no such in-between values--no "degrees of truth"--and any statement is either completely true or completely false. In 1920 and 1930 the Polish philosopher Jan Lukasiewicz introduced a form of logic that employs not just two values but many. Lotfi Zadeh, of the University of California at Berkeley, subsequently proposed that the many values of Lukasiewicz's logic be regarded as degrees of truth, and he coined the expression "fuzzy logic" for the result. (Zadeh published the first of many papers on the subject in 1965.) Fuzzy logic is particularly useful when it is necessary to deal with vague
expressions, such as "bald", "heavy", "high", "low", "hot", "cold" and so on. Vague expressions are difficult to deal with in standard logic because statements involving them--"Fred is bald", say--may be neither completely true nor completely false. Non-baldness shades gradually into baldness, with no sharp dividing line at which the statement "Fred is bald" could change from being completely false to completely true. Often the rules that knowledge engineers elicit from human experts contain vague expressions, so it is useful if an expert system's inference engine employs fuzzy logic. An example of such a rule is: "If the pressure is high but not too high, then reduce the fuel flow a little". (Fuzzy logic is used elsewhere in AI, for example in robotics and in neuron-like computing. There are literally thousands of commercial applications of fuzzy logic, many developed in Japan, ranging from an automatic subway train controller to control systems for washing machines and cameras.)
Limitations of expert systems Expert systems have no "common sense". They have no understanding of what they are for, nor of what the limits of their applicability are, nor of how their recommendations fit into a larger context. If MYCIN were told that a patient who has received a gunshot wound is bleeding to death, the program would attempt to diagnose a bacterial cause for the patient's symptoms. Expert systems can make absurd errors, such as prescribing an obviously incorrect dosage of a drug for a patient whose weight and age are accidentally swapped by the clerk. One project aimed at improving the technology further is described in the next section. The knowledge base of an expert system is small and therefore manageable--a few thousand rules at most. Programmers are able to employ simple methods of searching and updating the KB which would not work if the KB were large. Furthermore, micro-world programming involves extensive use of what are called "domain-specific tricks"--dodges and shortcuts that work only because of the circumscribed nature of the program's "world". More general simplifications are also possible. One example concerns the representation of time. Some expert systems get by without acknowledging time at all. In their micro-worlds everything happens in an eternal present. If reference to time is unavoidable, the microworld programmer includes only such aspects of temporal structure as are essential to the task--for example, that if a is before b and b is before c then a is before c. This rule enables the expert system to merge suitable pairs of before-statements and so extract their implication (e.g. that the patient's rash occurred before the application of penicillin). The system may have no other information at all concerning the relationship "before"--not even that it orders events in time rather than space. The problem of how to design a computer program that performs at human levels of competence in the full complexity of the real world remains open.
The CYC Project CYC (the name comes from "encyclopaedia") is the largest experiment yet in symbolic AI. The project began at the Microelectronics and Computer Technology Corporation in Texas in 1984 under the direction of Douglas Lenat, with an initial budget of U.S.$50 million, and is now Cycorp Inc. The goal is to build a KB containing a significant percentage of the common sense knowledge of a human being. Lenat hopes that the CYC project will culminate in a KB that can serve as the foundation for future generations of expert systems. His expectation is that when expert systems are equipped with common sense they will achieve an even higher level of performance and be less prone to errors of the sort just mentioned. By "common sense", AI researchers mean that large corpus of worldly knowledge that human beings use to get along in daily life. A moment's reflection reveals that even the simplest activities and transactions presuppose a mass of trivial-seeming knowledge: to get to a place one should (on the whole) move in its direction; one can pass by an object by moving first towards it and then away from it; one can pull with a string, but not push; pushing something usually affects its position; an object resting on a pushed object usually but not always moves with the pushed object; water flows downhill; city dwellers do not usually go outside undressed; causes generally precede their effects; time constantly passes and future events become past events ... and so on and so on. A computer that is to get along intelligently in the real world must somehow be given access to millions of such facts. Winograd, the creator of SHRDLU, has remarked "It has long been recognised that it is much easier to write a program to carry out abstruse formal operations than to capture the common sense of a dog". The CYC project involves "hand-coding" many millions of assertions. By the end of the first six years, over one million assertions had been entered manually into the KB. Lenat estimates that it will require some 2 person-centuries of work to increase this figure to the 100 million assertions that he believes are necessary before CYC can begin learning usefully from written material for itself. At any one time as many as 30 people may be logged into CYC, all simultaneously entering data. These knowledge-enterers (or "cyclists") go through newspaper and magazine articles, encyclopaedia entries, advertisements, and so forth, asking themselves what the writer assumed the reader would already know: living things get diseases, the products of a commercial process are more valuable than the inputs, and so on. Lenat describes CYC as "the complement of an encyclopaedia": the primary goal of the project is to encode the knowledge that any person or machine must have before they can begin to understand an encyclopaedia. He has predicted that in the early years of the new millennium, CYC will become "a system with human-level breadth and depth of knowledge". CYC uses its common-sense knowledge to draw inferences that would defeat simpler systems. For example, CYC can infer "Garcia is wet" from the statement "Garcia is finishing a marathon run", employing its knowledge that running a marathon entails high exertion, that people sweat at high levels of exertion, and that when something sweats it is wet. Among the outstanding fundamental problems with CYC are (1) issues in search and problem-solving, for example how to automatically search the KB for information that is relevant to a given problem
(these issues are aspects of the frame problem, described in the section Nouvelle AI) and (2) issues in knowledge representation, for example how basic concepts such as those of substance and causation are to be analyzed and represented within the KB. Lenat emphasises the importance of large-scale knowledge-entry and is devoting only some 20 percent of the project's effort to development of mechanisms for searching, updating, reasoning, learning, and analogizing. Critics argue that this strategy puts the cart before the horse.
Top-Down AI vs Bottom-Up AI Turing's manifesto of 1948 distinguished two different approaches to AI, which may be termed "top down" and "bottom up". The work described so far in this article belongs to the top-down approach. In top-down AI, cognition is treated as a high-level phenomenon that is independent of the low-level details of the implementing mechanism--a brain in the case of a human being, and one or another design of electronic digital computer in the artificial case. Researchers in bottom-up AI, or connectionism, take an opposite approach and simulate networks of artificial neurons that are similar to the neurons in the human brain. They then investigate what aspects of cognition can be recreated in these artificial networks. The difference between the two approaches may be illustrated by considering the task of building a system to discriminate between W, say, and other letters. A bottom-up approach could involve presenting letters one by one to a neural network that is configured somewhat like a retina, and reinforcing neurons that happen to respond more vigorously to the presence of W than to the presence of any other letter. A top-down approach could involve writing a computer program that checks inputs of letters against a description of W that is couched in terms of the angles and relative lengths of intersecting line segments. Simply put, the currency of the bottom-up approach is neural activity and of the top-down approach descriptions of relevant features of the task. The descriptions employed in the top-down approach are stored in the computer's memory as structures of symbols (e.g. lists). In the case of a chess or checkers program, for example, the descriptions involved are of board positions, moves, and so forth. The reliance of top-down AI on symbolically encoded descriptions has earned it the name "symbolic AI". In the 1970s Newell and Simon--vigorous advocates of symbolic AI--summed up the approach in what they called the Physical Symbol System Hypothesis, which says that the processing of structures of symbols by a digital computer is sufficient to produce artificial intelligence, and that, moreover, the processing of structures of symbols by the human brain is the basis of human intelligence. While it remains an open question whether the Physical Symbol System Hypothesis is true or false, recent successes in bottom-up AI have resulted in symbolic AI being to some extent eclipsed by the neural approach, and the Physical Symbol System Hypothesis has fallen out of fashion.
Connectionism Connectionism, or neuron-like computing, developed out of attempts to understand how the brain works at the neural level, and in particular how we learn and remember.
A natural neural network. The Golgi method of staining brain tissue renders the neurons and their interconnecting fibres visible in silhouette.
In one famous connectionist experiment (conducted at the University of California at San Diego and published in 1986), David Rumelhart and James McClelland trained a network of 920 artificial neurons to form the past tenses of English verbs. The network consisted of two layers of 460 neurons:
Each of the 460 neurons in the input layer is connected to each of the 460 neurons in the output layer. Root forms of verbs--such as "come", "look", and "sleep"--were presented (in an encoded form) to one layer of neurons, the input layer. A supervisory computer program observed the difference between the
actual response at the layer of output neurons and the desired response--"came", say--and then mechanically adjusted the connections throughout the network in such a way as to give the network a slight push in the direction of the correct response (this procedure is explained in more detail in what follows). About 400 different verbs were presented one by one to the network and the connections were adjusted after each presentation. This whole procedure was repeated about 200 times using the same verbs. By this stage the network had learned its task satisfactorily and would correctly form the past tense of unfamiliar verbs as well as of the original verbs. For example, when presented for the first time with "guard" the network responded "guarded", with "weep" "wept", with "cling" "clung", and with "drip" "dripped" (notice the double "p"). This is a striking example of learning involving generalisation. (Sometimes, though, the peculiarities of English were too much for the network and it formed "squawked" from "squat", "shipped" from "shape", and "membled" from "mail".) The simple neural network shown below illustrates the central ideas of connectionism.
A pattern-classifier Four of the network's five neurons are for input and the fifth--to which each of the others is connected--is for output. Each of the neurons is either firing (1) or not firing (0). This network can learn to which of two groups, A and B, various simple patterns belong. An external agent is able to "clamp" the four input neurons into a desired pattern, for example 1100 (i.e. the two neurons to the left are firing and the other two are quiescent). Each such pattern has been pre-assigned to one of two groups, A and B. When a pattern is presented as input, the trained network will correctly classify it as belonging to group A or group B, producing 1 as output if the pattern belongs to A, and 0 if it belongs to B (i.e. the output neuron fires in the former case, does not fire in the latter).
Each connection leading to N, the output neuron, has a "weight". What is called the "total weighted input" into N is calculated by adding up the weights of all the connections leading to N from neurons that are firing. For example, suppose that only two of the input neurons, X and Y, are firing. Since the weight of the connection from X to N is 1.5 and the weight of the connection from Y to N is 2, it follows that the total weighted input to N is 3.5. N has a "firing threshold" of 4. That is to say, if N's total weighted input exceeds or equals N's threshold, then N fires; and if the total weighted input is less than the threshold, then N does not fire. So, for example, N does not fire if the only input neurons to fire are X and Y, but N does fire if X, Y and Z all fire. Training the network involves two steps. First, the external agent inputs a pattern and observes the behaviour of N. Second, the agent adjusts the connection-weights in accordance with the rules: (1) If the actual output is 0 and the desired output is 1, increase by a small fixed amount the weight of each connection leading to N from neurons that are firing (thus making it more likely that N will fire next time the network is given the same pattern) (2) If the actual output is 1 and the desired output is 0, decrease by that same small amount the weight of each connection leading to the output neuron from neurons that are firing (thus making it less likely that the output neuron will fire the next time the network is given that pattern as input). The external agent--actually a computer program--goes through this two-step procedure with each of the patterns in the sample that the network is being trained to classify. The agent then repeats the whole process a considerable number of times. During these many repetitions, a pattern of connection weights is forged that enables the network to respond correctly to each of the patterns. The striking thing is that the learning process is entirely mechanistic and requires no human intervention or adjustment. The connection weights are increased or decreased mechanically by a constant amount and the procedure remains the same no matter what task the network is learning. Another name for connectionism is "parallel distributed processing" or PDP. This terminology emphasises two important features of neuron-like computing. (1) A large number of relatively simple processors--the neurons--operate in parallel. (2) Neural networks store information in a distributed or holistic fashion, with each individual connection participating in the storage of many different items of information. The know-how that enables the past-tense network to form "wept" from "weep", for example, is not stored in one specific location in the network but is spread through the entire pattern of connection weights that was forged during training. The human brain also appears to store information in a distributed fashion, and connectionist research is contributing to attempts to understand how the brain does so. Recent work with neural networks includes: (1) The recognising of faces and other objects from visual data. A neural network designed by John Hummel and Irving Biederman at the University of Minnesota can identify about ten objects from simple line drawings. The network is able to recognise the objects--which include a mug and a frying pan--even
when they are drawn from various different angles. Networks investigated by Tomaso Poggio of MIT are able to recognise (a) bent-wire shapes drawn from different angles, (b) faces photographed from different angles and showing different expressions, and (c) objects from cartoon drawings with grey-scale shading indicating depth and orientation. (An early commercially available neuron-like face recognition system was WISARD, designed at the beginning of the 1980s by Igor Aleksander of Imperial College London. WISARD was used for security applications.)
(2) Language processing. Neural networks are able to convert handwriting and typewritten material to standardised text. The U.S. Internal Revenue Service has commissioned a neuron-like system that will automatically read tax returns and correspondence. Neural networks also convert speech to printed text and printed text to speech.
(3) Business applications. Neural networks are being used increasingly for loan risk assessment, real estate valuation, bankruptcy prediction, share price prediction, and other business applications.
(4) Medical applications. These include detecting lung nodules and heart arrhythmia, and predicting patients' reactions to drugs.
(5) Telecommunications. Applications of neural networks include control of telephone switching networks and echo cancellation in modems and on satellite links.
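By way of illustration, here is a minimal sketch, in Python, of the two-step training procedure described above. The network size, the initial weights, the threshold value of 4 and the fixed adjustment step are assumptions chosen for the example, not details taken from any particular historical system.

# A two-layer network: four input neurons connected to one output neuron N.
# fires() computes N's total weighted input and compares it with N's threshold;
# train() applies the two weight-adjustment rules over the training sample.

def fires(weights, pattern, threshold=4.0):
    total = sum(w for w, x in zip(weights, pattern) if x == 1)
    return 1 if total >= threshold else 0

def train(patterns, desired_outputs, weights, step=0.1, repetitions=100):
    for _ in range(repetitions):
        for pattern, desired in zip(patterns, desired_outputs):
            actual = fires(weights, pattern)    # step 1: present the pattern, observe N
            if actual == 0 and desired == 1:    # rule (1): raise weights from firing inputs
                weights = [w + step if x else w for w, x in zip(weights, pattern)]
            elif actual == 1 and desired == 0:  # rule (2): lower them
                weights = [w - step if x else w for w, x in zip(weights, pattern)]
    return weights

# Example: pattern 1100 belongs to group A (output 1), 0011 to group B (output 0).
w = train([[1, 1, 0, 0], [0, 0, 1, 1]], [1, 0], weights=[1.0, 1.0, 1.0, 1.0])
print(fires(w, [1, 1, 0, 0]), fires(w, [0, 0, 1, 1]))   # expected: 1 0

The agent here is just a loop: it clamps each pattern in turn, observes the output neuron, and nudges the weights by a constant amount, exactly as in the procedure described above.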
History of connectionism
In 1933 the psychologist Edward Thorndike suggested that human learning consists in the strengthening of some (then unknown) property of neurons, and in 1949 the psychologist Donald Hebb suggested that it is specifically a strengthening of the connections between neurons in the brain that accounts for learning. In 1943, the neurophysiologist Warren McCulloch of the University of Illinois and the mathematician Walter Pitts of the University of Chicago published an influential theory according to which each neuron in the brain is a simple digital processor and the brain as a whole is a form of computing machine. As McCulloch put it subsequently, "What we thought we were doing (and I think we succeeded fairly well) was treating the brain as a Turing machine".

McCulloch and Pitts gave little discussion of learning and apparently did not envisage fabricating networks of artificial neuron-like elements. This step was first taken, in concept, in 1947-48, when Turing theorised that a network of initially randomly connected artificial neurons--a Turing Net--could be "trained" (his word) to perform a given task by means of a process that renders certain neural pathways effective and others ineffective. Turing foresaw the procedure--now in common use by connectionists--of simulating the neurons and their interconnections within an ordinary digital computer (just as engineers create virtual models of aircraft wings and skyscrapers). However, Turing's own research on neural networks was carried out shortly before the first stored-program electronic computers became available. It was not until 1954 (the year of Turing's death) that Belmont Farley and Wesley Clark, working at MIT, succeeded in running the first computer simulations
of small neural networks. Farley and Clark were able to train networks containing at most 128 neurons to recognise simple patterns (using essentially the training procedure described above). In addition, they discovered that the random destruction of up to 10% of the neurons in a trained network does not affect the network's performance at its task--a feature that is reminiscent of the brain's ability to tolerate limited damage inflicted by surgery, an accident, or disease.

During the 1950s neuron-like computing was studied on both sides of the Atlantic. Important work was done in England by W.K. Taylor at University College, London; J.T. Allanson at Birmingham University; and R.L. Beurle and A.M. Uttley at the Radar Research Establishment, Malvern; and in the U.S. by Frank Rosenblatt at the Cornell Aeronautical Laboratory. In 1957 Rosenblatt began investigating artificial neural networks that he called "perceptrons". Although perceptrons differed only in matters of detail from types of neural network investigated previously by Farley and Clark in the U.S. and by Taylor, Uttley, Beurle and Allanson in Britain, Rosenblatt made major contributions to the field, through his experimental investigations of the properties of perceptrons (using computer simulations), and through his detailed mathematical analyses. Rosenblatt was a charismatic communicator, and soon there were many research groups in the U.S. studying perceptrons. Rosenblatt and his followers called their approach connectionist to emphasise the importance in learning of the creation and modification of connections between neurons, and modern researchers in neuron-like computing have adopted this term. Rosenblatt distinguished between simple perceptrons with two layers of neurons--the networks described earlier for forming past tenses and classifying patterns both fall into this category--and multi-layer perceptrons with three or more layers.
A three-layer perceptron. Between the input layer (bottom) and the output layer (top) lies a so-called 'hidden layer' of neurons.
One of Rosenblatt's important contributions was to generalise the type of training procedure that Farley and Clark had used, which applied only to two-layer networks, so that the procedure can be applied to multi-layer networks. Rosenblatt used the phrase "back-propagating error correction" to describe his method. The method, and the term "back-propagation", are now in everyday use in neuron-like computing (with improvements and extensions due to Bernard Widrow and M.E. Hoff, Paul Werbos, David Rumelhart, Geoffrey Hinton, Ronald Williams, and others). During the 1950s and 1960s, the top-down and bottom-up approaches to AI both flourished, until in 1969 Marvin Minsky and Seymour Papert of MIT, who were both committed to symbolic AI, published a critique of Rosenblatt's work. They proved mathematically that there are a variety of tasks that simple two-layer perceptrons cannot accomplish. Some examples they gave are: (1) No two-layer perceptron can correctly indicate at its output neuron (or neurons) whether there are an even or an odd number of neurons firing in its input layer. (2) No two-layer perceptron can produce at its output layer the exclusive disjunction of two binary inputs X and Y (the so-called "XOR problem").
The exclusive disjunction of two binary inputs X and Y is defined by this table:

X  Y  X XOR Y
0  0     0
0  1     1
1  0     1
1  1     0
It is important to realise that the mathematical results obtained by Minsky and Papert about two-layer perceptrons, while interesting and technically sophisticated, showed nothing about the abilities of perceptrons in general, since multi-layer perceptrons are able to carry out tasks that no two-layer perceptron can accomplish. Indeed, the "XOR problem" illustrates this fact: a simple three-layer perceptron can form the exclusive disjunction of X and Y (as Minsky and Papert knew). Nevertheless, Minsky and Papert conjectured--without any real evidence--that the multi-layer approach is "sterile" (their word). Somehow their analysis of the limitations of two-layer perceptrons convinced the AI community--and the bodies that fund it--of the fruitlessness of pursuing work with neural networks, and the majority of researchers turned away from the approach (although a small number remained loyal). This hiatus in research into neuron-like computing persisted for well over a decade before a renaissance occurred. Causes of the renaissance included (1) a widespread perception that symbolic AI was stagnating (2) the possibility of simulating larger and more complex neural networks, owing to the improvements that had occurred in the speed and memory of digital computers, and (3) results published in the early and mid 1980s by McClelland, Rumelhart and their research group (for example, the past-tenses experiment) which were widely viewed as a powerful demonstration of the potential of neural networks. There followed an explosion of interest in neuron-like computing, and symbolic AI moved into the back seat.
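To make the point concrete, here is a small illustrative sketch, in Python, of a three-layer network of threshold units that computes the exclusive disjunction. The particular weights and thresholds are assumptions chosen by hand for the example; in practice such weights would be found automatically by a training procedure such as back-propagation rather than wired in.

# One hidden layer of two threshold units suffices for XOR:
# H1 behaves as an OR unit, H2 as an AND unit, and the output unit
# fires when H1 fires but H2 does not (i.e. exactly one input is 1).

def unit(total, threshold):
    return 1 if total >= threshold else 0

def xor_net(x, y):
    h1 = unit(x + y, 1)        # fires if at least one input fires
    h2 = unit(x + y, 2)        # fires only if both inputs fire
    return unit(h1 - h2, 1)    # fires if h1 fires and h2 does not

for x in (0, 1):
    for y in (0, 1):
        print(x, y, "->", xor_net(x, y))   # reproduces the table above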
Nouvelle AI
The approach to AI now known as "nouvelle AI" was pioneered at the MIT AI Laboratory by the Australian Rodney Brooks, during the latter half of the 1980s. Nouvelle AI distances itself from traditional characterisations of AI, which emphasise human-level performance. One aim of nouvelle AI is the relatively modest one of producing systems that display approximately the same level of intelligence as insects. Practitioners of nouvelle AI reject micro-world AI, emphasising that true intelligence involves the ability to function in a real-world environment.

A central idea of nouvelle AI is that the basic building blocks of intelligence are very simple behaviours, such as avoiding an object and moving forward. More complex behaviours "emerge" from the interaction of these simple behaviours. For example, a micro-robot whose simple behaviours are (1) collision-avoidance and (2) motion toward a moving object will appear to chase the moving object while hanging back from it a little.

Brooks focussed in his initial work on building robots that behave somewhat like simplified insects (and in doing so he deliberately turned away from traditional characterisations of AI such as the one given at the beginning of this article). Examples of his insect-like mobile robots are Allen (after Allen Newell) and Herbert (after Herbert Simon). Allen has a ring of twelve ultrasonic sonars as its primary sensors and three independent behaviour-producing modules. The lowest-level module makes the robot avoid both stationary and moving objects. With only this module activated, Allen sits in the middle of a room until approached and then scurries away, avoiding obstacles as it goes. The second module makes the robot wander about at random when not avoiding objects, and the third pushes the robot to look for distant places with its sensors and to move towards them. (The second and third modules are in tension--just as our overall behaviour may sometimes be the product of conflicting drives, such as the drive to seek safety and the drive to avoid boredom.) (A short code sketch of this kind of layered behaviour-arbitration is given at the end of this section.)

Herbert has thirty infrared sensors for avoiding local obstacles, a laser system that collects three-dimensional depth data over a distance of about twelve feet in front of the robot, and a hand equipped with a number of simple sensors. Herbert's real-world environment consists of the busy offices and work-spaces of the AI lab. The robot searches on desks and tables in the lab for empty soda cans, which it picks up and carries away. Herbert's seemingly coordinated and goal-directed behaviour emerges from the interactions of about fifteen simple behaviours. Each simple behaviour is produced by a separate module, and each of these modules functions without reference to the others. (Unfortunately, Herbert's mean time from power-on to hardware failure is no more than fifteen minutes, owing principally to the effects of vibration.)

Other robots produced by Brooks and his group include Genghis, a six-legged robot that walks over rough terrain and will obediently follow a human, and Squirt, which bides in dark corners until a noise beckons it out, when it will begin to follow the source of the noise, moving with what appears to be circumspection from dark spot to dark spot. Other experiments involve tiny "gnat" robots. Speaking of
potential applications, Brooks describes possible colonies of gnat robots designed to inhabit the surface of TV and computer screens and keep them clean. Brooks admits that even his more complicated artificial insects come nowhere near the complexity of real insects. One question that must be faced by those working in situated AI is whether insect-level behaviour is a reasonable initial goal. John von Neumann, the computer pioneer and founder, along with Turing, of the research area now known as "artificial life", thought otherwise. In a letter to the cyberneticist Norbert Wiener in 1946, von Neumann argued that automata theorists who select the human nervous system as their model are unrealistically picking "the most complicated object under the sun", and that there is little advantage in selecting instead the ant, since any nervous system at all exhibits "exceptional complexity". Von Neumann believed that "the decisive break" is "more likely to come in another theater" and recommended attention to "organisms of the virus or bacteriophage type" which, he pointed out, are "self-reproductive and ... are able to orient themselves in an unorganised milieu, to move towards food, to appropriate it and to use it". This starting point would, as he put it, provide "a degree of complexity which is not necessarily beyond human endurance".
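The layered organisation of simple behaviour-producing modules described above for Allen can be conveyed by a short sketch. This is only an illustration of priority-based behaviour arbitration, not Brooks' actual subsumption architecture; the module names and the sensor fields are hypothetical.

import random

class Sensors:
    """Hypothetical stand-in for sonar readings."""
    def __init__(self, obstacle_near=False, distant_place=None):
        self.obstacle_near = obstacle_near    # something detected close by
        self.distant_place = distant_place    # direction of a far-off open area, if seen

def avoid(sensors):
    # Lowest-level module: veer away from stationary or moving objects.
    return "turn away from obstacle" if sensors.obstacle_near else None

def explore(sensors):
    # Module that pushes the robot towards distant places seen by its sensors.
    return "move towards " + sensors.distant_place if sensors.distant_place else None

def wander(sensors):
    # Module that makes the robot wander about at random.
    return "wander " + random.choice(["left", "right", "forward"])

def act(sensors):
    # Each module works without reference to the others; the arbitration
    # order decides which behaviour wins when several are applicable.
    for behaviour in (avoid, explore, wander):
        command = behaviour(sensors)
        if command is not None:
            return command

print(act(Sensors(obstacle_near=True)))           # avoidance takes priority
print(act(Sensors(distant_place="far doorway")))  # exploration
print(act(Sensors()))                             # random wandering

No module plans or consults a model of the world; each simply maps current sensor readings to an action, and the apparently purposeful overall behaviour emerges from their interaction.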
The frame problem
The products of nouvelle AI are quite different from those of symbolic AI, such as Shakey and FREDDY. These contained an internal model (or "representation") of their micro-worlds, consisting of symbolic descriptions. This structure of symbols had to be updated continuously as the robot moved or the world changed. The robots' planning programs would juggle with this huge structure of symbols until descriptions were derived of actions that would transform the current situation into the desired situation. All this computation required a large amount of processing time. This is why Shakey performed its tasks with extreme slowness, even though careful design of the robot's environment minimised the complexity of the internal model. In contrast, Brooks' robots contain no internal model of the world. Herbert, for example, continuously discards the information that is received from its sensors, sensory information persisting in the robot's memory for no more than two seconds. AI researchers call the problem of updating, searching, and otherwise manipulating a large structure of symbols in realistic amounts of time the frame problem. The frame problem is endemic to symbolic AI. Some critics of symbolic AI believe that the frame problem is largely insolvable and so maintain that the symbolic approach will not "scale up" to yield genuinely intelligent systems. It is possible that CYC, for example, will succumb to the frame problem long before the system achieves human levels of knowledge.

Nouvelle AI sidesteps the frame problem. Nouvelle systems do not contain a complicated symbolic model of their environment. Information is left "out in the world" until such time as the system needs it. A nouvelle system refers continuously to its sensors rather than to an internal model of the world: it "reads off" the external world whatever information it needs, at precisely the time it needs it. As Brooks puts it, the world is its own best model--always exactly up to date and complete in every detail.
Situated AI
Traditional AI has by and large attempted to build disembodied intelligences whose only way of interacting with the world has been via keyboard and screen or printer. Nouvelle AI attempts to build embodied intelligences situated in the real world. Brooks quotes approvingly from the brief sketches that Turing gave in 1948 and 1950 of the "situated" approach. Turing wrote of equipping a machine "with the best sense organs that money can buy" and teaching it "to understand and speak English" by a process that would "follow the normal teaching of a child". Turing contrasted this with the approach to AI that focuses on abstract activities, such as the playing of chess. He advocated that both approaches be pursued, but until now relatively little attention has been paid to the situated approach.

The situated approach was anticipated in the writings of the philosopher Hubert Dreyfus, of the University of California at Berkeley. Dreyfus is probably the best-known critic of symbolic AI. He has been arguing against the Physical Symbol System Hypothesis since the early 1960s, urging the inadequacy of the view that everything relevant to intelligent behaviour can be captured by means of structures (e.g. lists) of symbolic descriptions. At the same time he has advocated an alternative view of intelligence, which stresses the need for an intelligent agent to be situated in the world, and he has emphasised the role of the body in intelligent behaviour and the importance of such basic activities as moving about in the world and dealing with obstacles. Once reviled by admirers of AI, Dreyfus is now regarded as a prophet of the situated approach.
Cog
Brooks' own recent work has taken the opposite direction to that proposed by von Neumann in the quotations given earlier. Brooks is pursuing AI's traditional goal of human-level intelligence, and with Lynn Andrea Stein, he has built a humanoid robot known as Cog. Cog has four microphone-type sound sensors and is provided with saccading foveated vision by cameras mounted on its "head". Cog's (legless) torso is capable of leaning and twisting. Strain gauges on the spine give Cog information about posture. Heat and current sensors on the robot's motors provide feedback concerning exertion. The arm and manipulating hand are equipped with strain gauges and heat and current sensors. Electrically-conducting rubber membranes on the hand and arm provide tactile information. Brooks believes that Cog will learn to correlate noises with visual events and to extract human voices from background noise; and that in the long run Cog will, through its interactions with its environment and with human beings, learn for itself some of the wealth of common sense knowledge that Lenat and his team are patiently hand-coding into CYC.

Critics of nouvelle AI emphasise that so far the approach has failed to produce a system exhibiting anything like the complexity of behaviour found in real insects. Suggestions by some advocates of nouvelle AI that it is only a short step to systems which are conscious and which possess language seem entirely premature.
Chess
Some of AI's most conspicuous successes have been in chess, its oldest area of research. In 1945 Turing predicted that computers would one day play "very good chess", an opinion echoed in 1949 by Claude Shannon of Bell Telephone Laboratories, another early theoretician of computer chess. By 1958 Simon and Newell were predicting that within ten years the world chess champion would be a computer, unless barred by the rules. Just under 40 years later, on May 11, 1997, in midtown Manhattan, IBM's Deep Blue beat the reigning world champion, Garry Kasparov, in a six-game match.

Critics question the worth of research into computer chess. MIT linguist Noam Chomsky has said that a computer program's beating a grandmaster at chess is about as interesting as a bulldozer's "winning" an Olympic weight-lifting competition. Deep Blue is indeed a bulldozer of sorts--its 256 parallel processors enable it to examine 200 million possible moves per second and to look ahead as many as fourteen turns of play. The huge improvement in machine chess since Turing's day owes much more to advances in hardware engineering than to advances in AI. Massive increases in CPU speed and memory have meant that each generation of chess machine has been able to examine ever more possible moves. Turing's expectation was that chess-programming would contribute to the study of how human beings think. In fact, little or nothing about human thought processes has been learned from the series of projects that culminated in Deep Blue.
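The kind of look-ahead involved can be illustrated with a generic depth-limited minimax search, sketched below in Python. This is only an illustration of the basic idea; Deep Blue's actual search added alpha-beta pruning, special-purpose hardware and a great deal more, and the Game interface assumed here is hypothetical.

# Depth-limited minimax: examine every legal move, follow each line of play
# a fixed number of turns ahead, score the resulting positions, and assume
# both players choose the move that is best for themselves.

def minimax(game, state, depth, maximising):
    if depth == 0 or game.is_terminal(state):
        return game.evaluate(state)           # static evaluation of the position
    values = [minimax(game, game.result(state, move), depth - 1, not maximising)
              for move in game.legal_moves(state)]
    return max(values) if maximising else min(values)

def best_move(game, state, depth=4):
    # Choose the move whose minimax value is best for the player to move.
    return max(game.legal_moves(state),
               key=lambda move: minimax(game, game.result(state, move), depth - 1, False))

The number of positions examined grows exponentially with depth, which is why raw hardware speed, rather than any insight into human thought, has driven most of the progress in machine chess.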
Is Strong AI Possible?
The ongoing success of applied Artificial Intelligence and of cognitive simulation seems assured. However, strong AI, which aims to duplicate human intellectual abilities, remains controversial. The reputation of this area of research has been damaged over the years by exaggerated claims of success that have appeared both in the popular media and in the professional journals. At the present time, even an embodied system displaying the overall intelligence of a cockroach is proving elusive, let alone a system rivalling a human being. The difficulty of "scaling up" AI's so far relatively modest achievements cannot be overstated. Five decades of research in symbolic AI has failed to produce any firm evidence that a symbol-system can manifest human levels of general intelligence. Critics of nouvelle AI regard as mystical the view that high-level behaviours involving language-understanding, planning, and reasoning will somehow "emerge" from the interaction of basic behaviours like obstacle avoidance, gaze control and object manipulation. Connectionists have been unable to construct working models of the nervous systems of even the simplest living things. Caenorhabditis elegans, a much-studied worm, has approximately 300 neurons, whose pattern of interconnections is perfectly known. Yet connectionist models have failed to mimic the worm's simple nervous system. The "neurons" of connectionist theory are gross oversimplifications of the real thing. However, this lack of substantial progress may simply be testimony to the difficulty of strong AI, not to its impossibility.

Let me turn to the very idea of strong artificial intelligence. Can a computer possibly be intelligent, think and understand? Noam Chomsky suggests that debating this question is pointless, for it is a question of decision, not fact: decision as to whether to adopt a certain extension of common usage. There is, Chomsky claims, no factual question as to whether any such decision is right or wrong--just as there is no question as to whether our decision to say that aeroplanes fly is right, or our decision not to say that ships swim is wrong. However, Chomsky is oversimplifying matters. Of course we could, if we wished, simply decide to describe bulldozers, for instance, as things that fly. But obviously it would be misleading to do so, since bulldozers are not appropriately similar to the other things that we describe as flying. The important questions are: could it ever be appropriate to say that computers are intelligent, think, and understand, and if so, what conditions must a computer satisfy in order to be so described?

Some authors offer the Turing test as a definition of intelligence: a computer is intelligent if and only if the test fails to distinguish it from a human being. However, Turing himself in fact pointed out that his test cannot provide a definition of intelligence. It is possible, he said, that a computer which ought to be described as intelligent might nevertheless fail the test because it is not capable of successfully imitating a human being. For example, why should an intelligent robot designed to oversee mining on the moon necessarily be able to pass itself off in conversation as a human being? If an intelligent entity can fail the test, then the test cannot function as a definition of intelligence.
It is even questionable whether a computer's passing the test would show that the computer is intelligent. In 1956 Claude Shannon and John McCarthy raised the objection to the test that it is possible in principle to design a program containing a complete set of "canned" responses to all the questions that an interrogator could possibly ask during the fixed time-span of the test. Like Parry, this machine would produce answers to the interviewer's questions by looking up appropriate responses in a giant table. This objection--which has in recent years been revived by Ned Block, Stephen White, and myself--seems to show that in principle a system with no intelligence at all could pass the Turing test.

In fact AI has no real definition of intelligence to offer, not even in the sub-human case. Rats are intelligent, but what exactly must a research team achieve in order for it to be the case that the team has created an artefact as intelligent as a rat? In the absence of a reasonably precise criterion for when an artificial system counts as intelligent, there is no way of telling whether a research program that aims at producing intelligent artefacts has succeeded or failed. One result of AI's failure to produce a satisfactory criterion of when a system counts as intelligent is that whenever AI achieves one of its goals--for example, a program that can summarise newspaper articles, or beat the world chess champion--critics are able to say "That's not intelligence!" (even critics who have previously maintained that no computer could possibly do the thing in question). Marvin Minsky's response to the problem of defining intelligence is to maintain that "intelligence" is simply our name for whichever problem-solving mental processes we do not yet understand. He likens intelligence to the concept "unexplored regions of Africa": it disappears as soon as we discover it. Earlier Turing made a similar point, saying "One might be tempted to define thinking as consisting of those mental processes that we don't understand". However, the important problem remains of giving a clear criterion of what would count as success in strong artificial intelligence research.
The Chinese Room Objection
One influential objection to strong AI, the Chinese room objection, originates with the philosopher John Searle. Searle claims to be able to prove that no computer program--not even a computer program from the far-distant future--could possibly think or understand. Searle's alleged proof is based on the fact that every operation that a computer is able to carry out can equally well be performed by a human being working with paper and pencil. As Turing put the point, the very function of an electronic computer is to carry out any process that could be carried out by a human being working with paper and pencil in a "disciplined but unintelligent manner". For example, one of a computer's basic operations is to compare the binary numbers in two storage locations and to write 1 in some further storage location if the numbers are the same. A human can perfectly well do this, using pieces of paper as the storage locations. To believe that strong AI is possible is to believe that intelligence can "emerge" from long chains of basic operations each of which is as simple as this one. Given a list of the instructions making up a computer program, a human being can in principle obey each instruction using paper and pencil. This is known as "handworking" a program.

Searle's Chinese room objection is as follows. Imagine that, at some stage in the future, AI researchers in, say, China announce a program that really does think and understand, or so they claim. Imagine further that in a Turing test (conducted in Chinese) the program cannot be distinguished from human beings. Searle maintains that, no matter how good the performance of the program, and no matter what algorithms and data structures are employed in the program, it cannot in fact think and understand. This can be proved, he says, by considering an imaginary human being, who speaks no Chinese, handworking the program in a closed room. (Searle extends the argument to connectionist AI by considering not a room containing a single person but a gymnasium containing a large group of people, each one of whom simulates a single artificial neuron.) The interrogator's questions, expressed in the form of Chinese ideograms, enter the room through an input slot. The human in the room--Clerk, let's say--follows the instructions in the program and carries out exactly the same series of computations that an electronic computer running the program would carry out. These computations eventually produce strings of binary symbols that the program instructs Clerk to correlate, via a table, with patterns of squiggles and squoggles (actually Chinese ideograms). Clerk finally pushes copies of the ideograms through an output slot. As far as the waiting interrogator is concerned, the ideograms form an intelligent response to the question that was posed. But as far as Clerk is concerned, the output is just squiggles and squoggles--hard won, but completely meaningless. Clerk does not even know that the inputs and outputs are linguistic expressions. Yet Clerk has done everything that a computer running the program would do. It surely follows, says Searle, that since Clerk does not understand the input and the output after working through the program, then nor does an electronic computer.

Few accept Searle's objection, but there is little agreement as to exactly what is wrong with it. My own response to Searle, known as the Logical Reply to the Chinese room objection, is this.
The fact that Clerk says "No" when asked whether he understands the Chinese input and output by no means shows that the wider system of which Clerk is a part does not understand Chinese. The wider system consists of
Clerk, the program, quantities of data (such as the table correlating binary code with ideograms), the input and output slots, the paper memory store, and so forth. Clerk is just a cog in a wider machine. Searle's claim is that the statement "The system as a whole does not understand" follows logically from the statement "Clerk does not understand". The logical reply holds that this claim is fallacious, for just the same reason that it would be fallacious to claim that the statement "The organisation of which Clerk is a part has no taxable assets in Japan" follows logically from the statement "Clerk has no taxable assets in Japan". If the logical reply is correct then Searle's objection to strong AI proves nothing.
For More Information About AI Read my book: Artificial Intelligence: A Philosophical Introduction Jack Copeland Oxford UK and Cambridge, Mass.: Basil Blackwell, September 1993, reprinted 1994, 1995, 1997 (twice), 1998, 1999 (xii, 320). Translated into Hebrew (1995), Spanish (1996). Second edition forthcoming in 2001.