Introducing:
The Cyc Foundation
April 13, 2006
Motivations
Wikimedia Foundation: “Imagine a world in which every single person is given free access to the sum of all human knowledge. That's what we're doing.”
Cyc Foundation: “Imagine a world in which every single person is given free access to programs that reason with the sum of all human knowledge. That's what we're doing.”
Topic Map – Top Level
Cyc Reasoning System (diagram)
• Knowledge Users work through a User Interface (with Natural Language Dialog)
• Cyc Reasoning Modules
• Cyc Ontology & Knowledge Base
• Interface to External Data Sources, covering Data Bases, Web Pages, Text Sources and Other KBs
• Other Applications connect through the Cyc API
• Knowledge Authors contribute through Knowledge Entry Tools
Help Find Information by Inference (+KB)
Query: “Someone happy”
Caption: “A man watching his daughter take her first step”
Help Find Information by Inference (+KB)
Query: “Someone happy”
(∃x) (feelsEmotion x Happiness Positive)
Logical Inference (deduction)
Caption: “A man watching his daughter take her first step”
(∃x,y) (and (father x y) (gender x Female) (sees x y) (walking
Help Find Information by Inference (+KB)
Logical Inference (deduction):
(∃x) (feelsEmotion x Happiness Positive)
...
(implies (and (isa ?BIG-EVENT HumanLifecycleMilestone) (doneBy ?BIG-EVENT ?CHILD) (sees ?PARENT ?BIG-EVENT) (children ?PARENT ?CHILD)) (holdsIn ?BIG-EVENT (feelsEmotionTypeAtLevel ?PARENT (PositiveAmountFn Pride))))
...
(∃x,y) (and (father x y) (gender x Female) (sees x y) (walking
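The deduction step can be made concrete with a small forward-chaining sketch. The Python below applies the rule from the slide to a hand-written encoding of the caption. It is not Cyc's inference engine. The tuple encoding, the matcher, and the constants FirstStep-1, Man-1 and Daughter-1 are hypothetical stand-ins. The further steps elided on the slide ("..."), which connect Pride to the query (feelsEmotion x Happiness Positive), are not reproduced.

```python
def is_var(term):
    """Variables are strings starting with '?', as in CycL."""
    return isinstance(term, str) and term.startswith("?")

def unify(pattern, fact, bindings):
    """Match a pattern against a ground fact, extending bindings; None on failure."""
    if is_var(pattern):
        if pattern in bindings:
            return bindings if bindings[pattern] == fact else None
        return {**bindings, pattern: fact}
    if isinstance(pattern, tuple) and isinstance(fact, tuple):
        if len(pattern) != len(fact):
            return None
        for p, f in zip(pattern, fact):
            bindings = unify(p, f, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == fact else None

def substitute(term, bindings):
    """Replace bound variables in a term with their values."""
    if is_var(term):
        return bindings.get(term, term)
    if isinstance(term, tuple):
        return tuple(substitute(t, bindings) for t in term)
    return term

def match_all(patterns, facts, bindings):
    """Yield every set of bindings that satisfies all patterns (a conjunction)."""
    if not patterns:
        yield bindings
        return
    for fact in facts:
        extended = unify(patterns[0], fact, bindings)
        if extended is not None:
            yield from match_all(patterns[1:], facts, extended)

def forward_chain(rules, facts):
    """Apply (antecedents, consequent) rules until no new facts appear."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in rules:
            for b in list(match_all(antecedents, facts, {})):
                new_fact = substitute(consequent, b)
                if new_fact not in facts:
                    facts.add(new_fact)
                    changed = True
    return facts

# The rule from the slide, written as (antecedents, consequent).
PRIDE_RULE = (
    [("isa", "?BIG-EVENT", "HumanLifecycleMilestone"),
     ("doneBy", "?BIG-EVENT", "?CHILD"),
     ("sees", "?PARENT", "?BIG-EVENT"),
     ("children", "?PARENT", "?CHILD")],
    ("holdsIn", "?BIG-EVENT",
     ("feelsEmotionTypeAtLevel", "?PARENT", ("PositiveAmountFn", "Pride"))),
)

# Hypothetical ground facts for "A man watching his daughter take her first step".
caption_facts = {
    ("isa", "FirstStep-1", "HumanLifecycleMilestone"),
    ("doneBy", "FirstStep-1", "Daughter-1"),
    ("sees", "Man-1", "FirstStep-1"),
    ("children", "Man-1", "Daughter-1"),
}

kb = forward_chain([PRIDE_RULE], caption_facts)
query = ("holdsIn", "?E",
         ("feelsEmotionTypeAtLevel", "?WHO", ("PositiveAmountFn", "?EMOTION")))
answers = [b["?WHO"] for b in match_all([query], kb, {})]
print(answers)  # ['Man-1']
```

The rule fires once and concludes that Man-1 feels Pride during FirstStep-1. A real KB would need additional rules, the "..." on the slide, to reach the original "Someone happy" query.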
ALICE
JUDGE11[9:26:0] I WILL ASK YOU SOME QUESTIONS
PROGRAM[9:26:0] Do you think your plan will succeed?
JUDGE11[9:26:16] WHAT IS THE COLOR OF A BLUE CAR
PROGRAM[9:26:16] WHAT COLOR IS a blue car
JUDGE11[9:26:35] WHERE IS SUE’S NOSE WHEN SUE IS IN HER HOUSE
PROGRAM[9:26:35] Where it belongs. Try searching the World Wide Web.
JUDGE11[9:26:49] WHAT HAPPENS TO AN ICE CUBE IN A HOT DRINK
PROGRAM[9:26:49] Try it.
JUDGE11[9:27:25] HOW IS THE FATHER OF ANDY S MOTHER RELATED TO ANDY
PROGRAM[9:27:25] Fine as far as I know.
Natural Language Understanding requires having lots of knowledge
1. The pen is in the box. The box is in the pen.
2. The police watched the demonstrators…
 …because they feared violence.
 …because they advocated violence.
System: Ready for input.
User: what do you know about anthrax?
System: By “anthrax” do you mean (1) the heavy metal band or (2) bacteria?
User: 2
System: Here is what I know about it: Anthrax is a species of organism. All anthrax is a bacterium. All anthrax can lethally infect mammals. Anthrax is extremely toxic for people to inhale. Iraq has biological weapons that deliver anthrax. Anthrax is opaque, tangible, microscopic, . Anthrax can be dried into spores stored indefinitely
System: Ready for input.
User: what do you know about TR1?
System: I know nothing about it. What sort of thing is TR1?
User: bacteria
System: TR1 lethally infects_________?
User: People
System: By “People”, I assume you mean human beings, not People Magazine.
System: Does it infect some broader category of animals (e.g., air-breathing vertebrates, chordates, placental mammals, primates, eucaryotes, etc.)?
User: mammals
System: Is TR1 similar to anthrax?
User: yes
System: Which of the following are true of TR1: . . .
User: Al Qaida possesses some TR1.
System: Ready for input.
User: Does Osama Bin Laden control anything that can be the basis of weapons of mass destruction?
System: Yes, some TR1.
User: please explain
System: Osama Bin Laden controls Al Qaida. Al Qaida possesses some TR1. TR1 is a bacteria that lethally infects people.
Efficiency vs. Expressiveness
Chart: the efficiency vs. expressiveness tradeoff, plotting C++, PASCAL, LISP, First-order logic, nth-order logic, and English/German, along with Cyc's HL (heuristic level language) and EL (epistemological level language).
Continuing improvements in inference performance won’t negatively affect expressiveness. Use two cooperating languages (EL and HL) to escape the limitations of an age-old tradeoff.
NOW: CyN in Doom3 (2005)
BURC: Bootstrapping Using ResearchCyc
• Goal: To extend Cyc’s knowledge base using “relationships implied to be possible, normal or commonplace in the world”
• Prior work with Cyc knowledge entry has been manually oriented
• How will we collect common sense without a body and manual labor…?
• Read, Parse, Mine!
• Proposal: Read text, Parse into a database, Extract relations between words, Propose hypothetical relations between concepts
BURC: Basic Analogy
• The Shotgun approach to the Human Genome
• Extract millions of fragments
• Knit them back together by finding commonalities
• Will it work for the Human Memome?
• James Burke: ‘Mr. Connections’
Lenat’s Bootstrap Hypothesis: once Cyc reaches a certain level/scale it can help in its own development and start using NLP to augment its knowledge base
Mining Adjective Knowledge Example
• “white blouse” as factoid fragment
• Hypothesis: (plausibleValueOfType Blouse mainColorOfObject WhiteColor)
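Below is a minimal sketch of turning one adjective-noun fragment into a hypothesis of the form shown in the example. The two lookup tables are hypothetical stand-ins for a real mapping from English words to Cyc constants. Only Blouse, mainColorOfObject and WhiteColor come from the slide.

```python
from typing import Optional

# Hypothetical word-to-constant tables; a real system would consult Cyc's lexicon.
NOUN_TO_COLLECTION = {"blouse": "Blouse"}
ADJ_TO_SLOT_VALUE = {"white": ("mainColorOfObject", "WhiteColor")}

def fragment_to_hypothesis(adjective: str, noun: str) -> Optional[str]:
    """Map an adjective-noun fragment to a plausibleValueOfType hypothesis, if both words are known."""
    collection = NOUN_TO_COLLECTION.get(noun.lower())
    slot_value = ADJ_TO_SLOT_VALUE.get(adjective.lower())
    if collection is None or slot_value is None:
        return None  # unknown word: no hypothesis proposed
    predicate, value = slot_value
    return f"(plausibleValueOfType {collection} {predicate} {value})"

print(fragment_to_hypothesis("white", "blouse"))
# (plausibleValueOfType Blouse mainColorOfObject WhiteColor)
```

Fragments with an unmapped word (e.g. "silky blouse") yield None rather than a guessed assertion. The output is a hypothesis to be checked, not a fact.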
Flow of Processing BNC Data
Diagram: BNC Data → Parsers 1–5 → one Frag File per parser → Merged Frag File → Extractor / DB Manager → Hypothesis File.
The Extractor / DB Manager also uses a Link Fragments DB and Cyc/RCyc (Upper Ontology, Core Theories, Domain-Specific Theories, Facts (Database)).
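The merge step in this flow can be sketched as follows. The fragments from each parser's Frag File are combined into one frequency table, and only fragments seen often enough are passed on to the Extractor. The one-fragment-per-line format and the min_count threshold are assumptions; the slide does not say how fragments are stored or filtered.

```python
from collections import Counter
from typing import Iterable

def merge_frag_files(frag_files: Iterable[Iterable[str]]) -> Counter:
    """Combine per-parser fragment lists into one frequency table (the Merged Frag File)."""
    merged = Counter()
    for frag_file in frag_files:
        # Normalize case and skip blank lines so the same fragment counts once per occurrence.
        merged.update(line.strip().lower() for line in frag_file if line.strip())
    return merged

def candidate_fragments(merged: Counter, min_count: int = 2) -> list[str]:
    """Keep fragments seen at least min_count times (hypothetical filter)."""
    return sorted(f for f, n in merged.items() if n >= min_count)

# Hypothetical output of five parsers, one list per Frag File.
parser_outputs = [
    ["white blouse", "red car"],
    ["white blouse"],
    ["White blouse", "blue sky"],
    [],
    ["red car"],
]
merged = merge_frag_files(parser_outputs)
print(merged["white blouse"])       # 3
print(candidate_fragments(merged))  # ['red car', 'white blouse']
```

Counting across parsers before extraction means a relation implied many times in the corpus ranks above a one-off parse.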
(Very) Brief History of Cyc
• c. 1967 – AI is used on toy problems.
• c. 1977 – Expert systems reason in narrow domains.
• c. 1983 – Lenat, Minsky, Feigenbaum, Kay, and others recognize the need for a substrate of shared world knowledge, and realize it would take hundreds of person-years to “prime the pump”.
• 1984 – Admiral Bob Inman convinces Lenat to leave Stanford and pursue this high-risk, high-payoff project (Cyc) within MCC.
• 1994 – Cycorp is formed.
“The driver of the power of intelligent systems is the knowledge the systems have about their universe of discourse, not the sophistication of the reasoning process the systems employ. Cyc has not only the world’s largest knowledge base, but the best represented from a technical point of view.”
-- Ed Feigenbaum, inventor of the first expert system, editor of the AI Handbook
“People have silly reasons why computers don’t really think. The answer is we haven’t programmed them right; they just don’t have much common sense. There’s been only one large project to do something about that, that’s the famous Cyc project…”
-- Marvin Minsky
How has Cycorp done?
• 20 years
• 3 million facts and rules (hand-entered)
• Compelling demos
• Some applications (constrained by business model)
• The basis for much greater growth
• “If the right way to build an A.I. involves giving Cyc away for free, that is what we will do.” – Doug Lenat (repeatedly)
 – Note: Jury is out on what the “right way” is
Cycorp: True to its Promise
• OpenCyc
 – The entire Cyc structural ontology: FREE
 – 300,000 concept terms, ~2M facts and rules
• ResearchCyc
 – Equal to Full Cyc (w/ Research-only license)
 – Source code for inference engine not released (And it doesn’t really matter.)
 – API with 18,000 functions and macros!
 – Ability to compile in your own additions
• Q: Will more be released? A: It depends.
 – Cycorp must financially support its own R&D.
 – Existing releases must result in major project benefits.
Time for the Next Phase
• Cycorp has gotten us to where we are
 – Representational ability
 – Inference ability
 – …and will continue (R&D leader, commercialization)
• The rest of the world will help get us where we are going
 – Breadth of content
 – Broad real-world diffusion
“The thinking that got us to where we are today is insufficient to solve the problems that exist today. To solve today's problems requires a new level of thinking.”
-- Einstein
Building Cyc qua Engineering Task
Chart: rate of learning plotted against amount known, with time marks at 1984, 2004 and 2006.
• Stages shown: codify & enter each piece of knowledge, by hand (CYC: 750 person-years, 21 years realtime, $75 million); learning via natural language; learning by discovery
• Also marked: Frontier of human knowledge
Building Cyc qua Engineering Task
Chart: rate of learning plotted against amount known, with time marks at 1984, 2004 and 2006.
• Stage shown: codify & enter each piece of knowledge, by hand (CYC: 750 person-years, 21 years realtime, $75 million)
• Annotations: 10 years; 1000 years
How will we get the knowledge?
Games That Matter!
Foundation as Continuation
• Are we trying to make an A.I.? – No.
• Are we trying to make computers behave much more intelligently? – Yes!
Mission (DRAFT)
The Cyc Foundation has been formed as an independent not-for-profit organization to hasten the arrival of intelligent tools that will help humanity.
Assumptions
• (Currently) 9 ideas that shape strategy, objectives and policy
• These may need to be validated, modified or augmented
• In some cases, assumptions are followed by related policy
Assumption #1
Long before computers are as smart as people, they will be (in some cases already have been) put to use to cure disease, address hunger problems, make important new scientific discoveries and help people work together. Smarter computers will do a better job of this.
Assumption #2
Cycorp has developed and cared for what we believe is an important piece of the AI puzzle. They have always wanted to release it to the public, but only once people could realistically develop it further on their own without endangering the project. One fear was “forking”, or creating incompatible variants of the knowledge base. Cycorp and The Foundation will cooperate on a single KB.
Flow of Cyc Data
Diagram components: Cyc Foundation; Cycorp; RCyc User; Gamer / Wikipedia user; Team (Subject-matter expert, Ontologist)
Assumption #3
The knowledge that will give computers human-like intelligence ultimately needs to be free. That's our best hope of having it put to best use. Portions of knowledge will always be held proprietary. The more widely shared a piece of knowledge is, the greater the force pulling all of its representations toward freedom (to avoid the burden of maintaining a non-standard representation).
Assumption #4
Proposed Semantic Web standards (such as those related to OWL) are an important step in the right direction, because they provide a foundation for working with meaning on the Web. The Cyc ontology will be a valuable addition, because it can act as a semantic hub, allowing us to have shared meaning. There is some concern that a top-down central ontology will dictate use of terms that may not meet a project’s needs. We will be able to show that the Cyc ontology can satisfy both needs (shared meaning and project-specific terms) and will be a useful complement to the great work that has already been done toward the Semantic Web.
Assumption #5
We all have something to learn. We all have something to teach. The Foundation mission will benefit from a very broad base of support, rather than the traditional rule by the technical elite.
Assumption #6
For this effort, focused work by many will be more valuable than genius work by a few.
To be most helpful, people should work together, and on tasks where they are capable of contributing successfully. (Example: don’t go off and try to “solve the A.I. problem” by yourself.)
Assumption #7
Regular humans can be turned off by overly technical talk that is out of place – and rightly so.
We need to be inclusive in our language and in our activities in order to ensure the broadest base of support and participation. This is especially true in the Cyclify initiative.
Assumption #8
There is no “us” and “them”
• The Foundation is managed by its volunteer board and run by its volunteer members
• The Foundation will start with no employees
• There will be no BDFL (Benevolent Dictator for Life)
Assumption #9
Fun is mandatory!
• By comparison, contributing to SETI is like cleaning your oven while you sleep.
• This work will be hands-on, compelling and (hopefully) addictive.
• If you’re not having fun, find out why and fix it.
Cyclify
Foundation Goals
• Convert human knowledge to a form that computers can reason with
 – Grow the Cyc Ontology and KB Exponentially
• Establish a standard vocabulary and language for representing concepts & knowledge
• Support the creation of intelligent tools
• Promote free and efficient knowledge transfer
Cyclify Knowledge Collection Activities
• Web Games
 – Validate acquired knowledge
 – Multiple-choice fact entry
 – More?
• Wikipedia Linking
• KR Dating Service
 – Wiki-based knowledge entry
 – A SME paired with an ontologist
• WordNet Linking
Playflow Within Cyclify
Diagram components: Wikipedia Data and Wikipedia user; Gamer; RCyc User; RCyc (three instances); Cycorp K. Acquisition Data; Game Server; Wiki Knowledge Server; Team (Subject-matter expert, Ontologist)
Web game screenshot:
Status: I’m thinking of a sentence…
I have 2 answers
Fibromyalgia is caused by ticks.
Because I read about it on the web.
True / False / Don’t Know / Doesn’t make sense
Score: 24
Web game screenshot:
Status: Submitting...
I think this sentence is probably not right
Thank you! Answers: 2 You agreed with: 100%
I now have a better understanding of: Fibromyalgia is caused by ticks.
Next
Score: 26 (Score: +2)
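The sketch below shows one way the game could turn individual answers into the feedback on these screens ("Answers: 2", "You agreed with: 100%", "probably not right"). The agreement formula, the verdict rule and the function names are hypothetical; the slides do not describe how Cyclify actually scores or validates answers.

```python
from collections import Counter

CHOICES = ("True", "False", "Don't Know", "Doesn't make sense")

def agreement_percent(prior_answers, answer):
    """Share of earlier answers that match this player's answer (hypothetical metric)."""
    if not prior_answers:
        return 100.0
    matches = sum(1 for a in prior_answers if a == answer)
    return 100.0 * matches / len(prior_answers)

def verdict(answers):
    """Hypothetical rule: compare True vs. False votes; other choices are not counted."""
    tally = Counter(answers)
    if tally["False"] > tally["True"]:
        return "I think this sentence is probably not right"
    if tally["True"] > tally["False"]:
        return "I think this sentence is probably right"
    return "I'm not sure about this sentence yet"

def submit(prior_answers, answer):
    """Record one answer and return (agreement %, verdict over all answers so far)."""
    if answer not in CHOICES:
        raise ValueError(f"unknown choice: {answer!r}")
    pct = agreement_percent(prior_answers, answer)
    return pct, verdict(list(prior_answers) + [answer])

pct, message = submit(["False", "False"], "False")
print(f"You agreed with: {pct:.0f}%")  # You agreed with: 100%
print(message)                        # I think this sentence is probably not right
```

Keeping "Don't Know" and "Doesn't make sense" out of the True/False comparison lets unsure players answer honestly without tipping the verdict.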
Current Architecture
Diagram, split by a DMZ Boundary into computer (inside) and computer (outside):
• Cyc Image: GAFs (web gathered, hypothesized, asserted …) and Forward rules
• A SubL form, running a KAG-collecting query, writes an XML file
• The XML file is copied with scp across the DMZ Boundary
• Populator (java) loads the XML file into the KAGs PostgreSQL database
• Question Server (java) serves the Applets
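The hand-off across the DMZ can be sketched as a small populator. It reads the exported XML file and loads it into the database that the Question Server reads from. This is a hypothetical sketch. The XML element and attribute names are invented, and Python's built-in sqlite3 stands in for the PostgreSQL database so the example runs without a server. The source labels echo the GAF categories on the slide.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical export format; the slide does not specify the XML schema.
SAMPLE_XML = """<kags>
  <kag id="1" source="web-gathered">Fibromyalgia is caused by ticks.</kag>
  <kag id="2" source="hypothesized">(plausibleValueOfType Blouse mainColorOfObject WhiteColor)</kag>
</kags>"""

def populate(conn: sqlite3.Connection, xml_text: str) -> int:
    """Parse the exported XML and upsert each item into the kags table; return rows loaded."""
    root = ET.fromstring(xml_text)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS kags (id INTEGER PRIMARY KEY, source TEXT, sentence TEXT)"
    )
    rows = [
        (int(k.get("id")), k.get("source"), (k.text or "").strip())
        for k in root.iter("kag")
    ]
    # INSERT OR REPLACE is SQLite syntax; PostgreSQL would use INSERT ... ON CONFLICT.
    conn.executemany("INSERT OR REPLACE INTO kags VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")
loaded = populate(conn, SAMPLE_XML)
print(loaded)  # 2
```

Because the populator only reads a file, the Cyc image never has to accept connections from outside the DMZ. Only the copied XML crosses the boundary.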
Cyc Foundation Projects
• Nonprofit Formation (planning/budgeting/filing)
• Foundation Website
• Cyclify
• Fundraising
• Membership management
• Events
• ResearchCyc
 – Recommend Cyc features / functions / design
 – Help with ResearchCyc testing, documentation
Budgeting
• Must develop budget related to Year 1 plan
• Possible areas of spending
 – Legal filings
 – Server hosting
 – W3C membership
 – Conference attendance
 – Fundraising
Foundation Website
• Requirements
 – Content management features
 – Collaboration features
 – Out-of-the-box ease of use
 – Free
• Currently evaluating Joomla (Mambo)
• Desired launch: May 15
Cyclify Projects
• First Web Game
 – Develop game
 – Viral marketing
 – Add wiki linking activity
• Wiki Knowledge Collection
 – Set up wikip.cyclify.org
 – Add frame for ontologizing
 – Feed wikip links to Web game
• Back End
 – Design and implement PlayFlow
 – Submit collected knowledge to Cycorp
Fundraising
• Individual Memberships
 – Free membership for first 6 months for Cyclify members and ResearchCyc users?
 – How much?
 – What do you get?
• Corporate Donations
 – Need to prepare story
 – Seems feasible to get donations
What does nonprofit mean?
• Cannot have investors or disburse earnings
• Can have earnings, though
• Revenues must come from services that are within mission
• 501(c)(3)? (like Wikimedia Foundation)
• Or 501(c)(6)? (like Eclipse Foundation)
The Foundation Board of Directors
Name | Position | Role
John De Oliveira | Founder and President | Strategy, Corp. Fundraising
Mark Baltzegar | Co-Founder and Vice President | Strategy, Game Devel., IT
OPEN | Secretary, Treasurer | Secretary, Treasurer
David James | Board Member | Organizational Dynamics
OPEN | Board Member | Standards
OPEN | Board Member | Events, Operations Delegator
OPEN | Board Member | Architecture, Playflow Design
TBD Sept. 2006 | Board Member | Oversight
The Foundation: Membership
Roles: Project Leader, Cyclify; Project Leader, ResearchCyc; High Scorer (current month); High Scorer (all time)
Members: Stu Baurman, Keith Wright, Kino Coursey, Pierluigi Miraglia, Douglas Miles, Gavin Matthews, Arturo Hernandez, Joe Simone, David Whitten, Guyren Howe, Brad Bouldin, John Cabral, Larry Lefkowitz, Ben Rode, Bill Jarrold, Jason Azbahr, ~100 ResearchCyc Users, and YOU!
ResearchCyc Users
Categories: Government; Government-related; Commercial; University; NPOs
Language Computer Corporation; Air Force Rome Labs; 21st Century Technologies; Houston VA Medical Center; Austin Info Systems; Lockheed Martin ATLD; MIT Media Lab; Stanford NLP Dept.; U of Pennsylvania; Rensselaer AI and Reasoning Lab; Radboud U (Netherlands); Stone’s Throw Technologies; U of Illinois Urbana-Champaign; Northwestern U; U of Toronto; ISI; Knowledge Media Institute, Open University; U of Stuttgart; U of Minnesota; Witan International; Linkoping U (Sweden); NTT Communications Science Laboratories (Japan); Fraunhofer Institute; Sapio Systems (Denmark); Terra Incognita; Trimtab Consulting; TNO-DMV (Netherlands); Microfabrica, Inc.; New Mexico Highlands Univ.; Harvard U; ANSER, Inc.; SRI; Daxtron Labs; U of Maryland; LBJ School of Public Affairs; Xerox PARC; U of Hawaii; Tokyo Inst. of Technology; Institute for the Study of Accelerating Change
How can I help?
• Humans (a.k.a. common sense experts)
• Programmers
 – Web programmers
 – Cyc programmers
• Ontologists
• Subject-matter experts
• Bloggers
Human Cyclists*
• Play the Web Game
• Come up with new game ideas
• Link Wikipedia to Cyc
• Learn more about Cyc
• Befriend an ontologist
• Tell a friend about Cyclify
• Write about Cyclify on a blog
• Help with viral marketing
• Design a logo
• T-Shirts: Buy one, or Create and sell them
*From now on, we’re all “Cyclists” – people who interact with Cyc in one way or another.
Programmers
• Help design and build a web services interface
• Learn the architecture of Web Game #1
• Design an add-on for the Web game
• Learn how to use the question server
• Propose a new game
• Help develop/support technical infrastructure
• Help organize documentation
• Help write the Cyc books – to be published by O'Reilly
Ontologists
• Identify gaps in the knowledge base
• Befriend a Subject Matter Expert
 – Work together on a domain
• Befriend a Human Cyclist
 – Teach one who wants to learn basic ontology skills
• Help organize documentation
• Help write the Cyc books
Bloggers
• Blog about Cyclify
• Link to each other’s blogs
Timeline (Milestones)
• May 15 – Launch Foundation Website
• Build membership up until July 15
• June 15
 – File Articles of Formation w/ Sec. of State
 – First Web game in beta
• July 15 – Launch Game
• October – First OpenCyc build containing game data