A New Approach to the Rapid Construction of Knowledge-Based Tutoring Systems

Vu Le (1,2), Gheorghe Tecuci (1), Mihai Boicu (1)

(1) Learning Agents Center, MSN 6B3, Volgenau School of Information Technology and Engineering, George Mason University, 4400 University Dr., Fairfax, VA 22030, USA
[email protected], [email protected], [email protected], http://lac.gmu.edu
(2) Advanced Information Technology, BAE Systems, 3811 N. Fairfax, Arlington, VA 22203, USA
Abstract: This paper introduces a new approach to the rapid construction of knowledge-based tutoring systems. The approach provides methods to rapidly acquire the basic abstract problem solving strategies of an application domain directly from a subject matter expert. It allows an instructional designer to rapidly design lessons for teaching these abstract problem solving strategies without having to define illustrative examples, because the examples are generated automatically by the system from the domain knowledge base. The paper also presents a machine learning based approach to the learning and generation of test questions. This approach is implemented in the Disciple framework and has been successfully used and evaluated in several courses at the US Army War College and George Mason University. We present it as a way to represent and tutor the "ill-defined" domains that have remained elusive for many intelligent tutoring systems.
Introduction

This paper presents a new approach to rapidly building knowledge-based tutoring systems, that is, tutoring systems that can teach students the expert knowledge that is stored and used in knowledge-based systems. Several such tutoring systems exist, including Demonstr8 (Blessing, 1997), CTAT (Koedinger et al., 2003), Diag (Eugenio et al., 2005), and Simulated Student (Matsuda et al., 2005). These systems are usually difficult (Murray, 1999) and time-consuming (Anderson, 1992) to build: it takes a group of software developers, knowledge engineers, subject matter experts, and instructional designers more than 300 hours to produce one hour of instructional material (Ainsworth & Fleming, 2004). Researchers in this area have therefore been working on authoring systems that simplify the task of building tutoring systems. The resulting systems, however, tend to share a common trade-off: tutoring systems that are easy to build are coupled with limited domain knowledge, while those that can handle deeper knowledge are hard to build. Moreover, the tutoring systems that can teach complex knowledge have been limited to "well-defined" domains (those that allow a clear distinction between good and bad answers or solutions). The approach presented in this paper offers a compromise: the resulting systems are fast to build and can teach complex problem solving knowledge in "ill-defined" domains such as design, law, medical diagnosis, history, intelligence analysis, and military planning (Lynch et al., 2006).

A knowledge-based tutoring system has two main components: the knowledge-based system, or expert system, and the tutoring system. The expert system provides the necessary expert knowledge for a particular domain, and the tutoring system uses this knowledge to construct tutorial material. This paper presents a methodology for rapidly transforming the domain knowledge into pedagogical knowledge; notably, the approach allows the tutoring system to teach students the way the expert system reasons in problem solving. The approach is implemented in the Disciple system and has been used in teaching courses at the US Army War College and George Mason University (Tecuci et al., 2007).

The rest of the paper is organized as follows. The next section introduces the problem solving paradigm used in this approach, which is pedagogically tuned, as well as the intelligence analysis domain. It is followed by our approach to the abstraction of reasoning trees, which facilitates the identification and tutoring of the main problem solving strategies of a domain. The subsequent sections present the lesson design and generation process, and then the learning and generation of test questions using machine learning techniques. The paper ends with experimental results, a summary, and future research directions.

This material is based on research partially sponsored by the Air Force Office of Scientific Research (FA9550-07-1-0268) and the Air Force Research Laboratory (FA8750-04-1-0257). The US Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the Air Force Office of Scientific Research, the Air Force Research Laboratory, or the US Government.
Problem Solving Paradigm

The new approach assumes that the expert system uses the problem reduction/solution synthesis paradigm, also known as "divide and conquer" or "problem decomposition" (Durham, 2000), to solve problems. In this paradigm, illustrated in Figure 1, a complex problem P1 is successively reduced to simpler problems P11, ..., P1n via the reduction operators ROi. The reduction continues until elementary problems P111, ..., P11m are reached, for which the solutions S111, ..., S11m are known. The synthesis process then combines these solutions successively, from the simplest problems upwards, via the synthesis operators SOj, until a solution S1 is found for the original problem P1. This process creates two reasoning trees: a reduction tree and a synthesis tree, shown as the blue and green trees, respectively, in Figure 1. The two trees have symmetric nodes and links (edges), but the directions of their edges are opposite: the edges of the reduction tree point top-down, indicating how a problem is reduced to simpler problems, whereas the edges of the synthesis tree point bottom-up, showing how the solutions are combined into the solution of the upper-level problem.
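To make this paradigm concrete, the following minimal Python sketch shows the reduce-then-synthesize recursion described above. The data structures and operator callbacks are our own illustration, provided only to clarify the control flow; they are not Disciple's actual representation:

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Problem:
    statement: str

@dataclass
class Solution:
    statement: str

def solve(problem: Problem,
          reduce_op: Callable[[Problem], List[Problem]],             # RO_i
          known_solution: Callable[[Problem], Optional[Solution]],
          synthesize_op: Callable[[Problem, List[Solution]], Solution]  # SO_j
          ) -> Solution:
    """Reduce `problem` top-down until elementary problems with known
    solutions are reached, then synthesize the solutions bottom-up."""
    solution = known_solution(problem)
    if solution is not None:              # elementary problem (S111, ..., S11m)
        return solution
    sub_problems = reduce_op(problem)     # P11, ..., P1n
    sub_solutions = [solve(p, reduce_op, known_solution, synthesize_op)
                     for p in sub_problems]
    return synthesize_op(problem, sub_solutions)  # combine into the solution
```

The single recursion traces both trees of Figure 1: the calls to reduce_op generate the top-down reduction tree, and the returns through synthesize_op generate the bottom-up synthesis tree.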
Figure 1: Problem reduction/solution synthesis paradigm

To illustrate the problem solving paradigm, Figure 2 presents a fragment of one of the reasoning trees in the intelligence analysis domain. These trees solve problems such as "Assess whether Iran is pursuing nuclear power for peaceful purposes" or "Assess whether Al Qaeda has nuclear weapons," based on partial, uncertain, and even false information from open-source pieces of evidence (such as newspaper articles, web sites, news agency reports, and books). The trees in Figure 2 are augmented with pairs of questions and answers that make explicit how the problems are decomposed and the solutions synthesized:

I need to: Assess whether Al Qaeda has nuclear weapons.
Q: What factors should I consider to determine whether Al Qaeda has nuclear weapons?
A: Characteristics associated with possession of nuclear weapons and current evidence that it has nuclear weapons.
Therefore I need to solve two sub-problems:
- Assess the possibility that Al Qaeda might have nuclear weapons based on the characteristics associated with the possession of nuclear weapons.
- Assess the current evidence that Al Qaeda has nuclear weapons.
Q: What are the characteristics associated with possession of nuclear weapons?
A: Reasons, desire, and ability to obtain nuclear weapons.
Therefore I need to solve three sub-problems:
- Assess whether Al Qaeda has reasons to obtain nuclear weapons.
- Assess whether Al Qaeda has the desire to obtain nuclear weapons.
- Assess whether Al Qaeda has the ability to obtain nuclear weapons.

In this way, the initial problem is successively reduced to simpler and simpler problems, shown with a blue background in Figure 2. Then the solutions of the simplest problems are found, and these solutions (shown with a green background) are successively composed, from the bottom up, until the solution of the initial problem is obtained (i.e., "It is likely that Al Qaeda has nuclear weapons."). The intelligence analysts who have used our system evaluated this type of reasoning as very appropriate for teaching new analysts because it is explicit and natural. However, the reasoning trees generated for real-world problems are very large: Figure 2 shows only the top 23 nodes of a tree that has 1,758 nodes. The question is how to systematically teach new analysts based on such complex trees. Our solution is described in the next sections.
Figure 2: Hypothesis analysis through problem reduction and solution synthesis
Abstraction of Reasoning Trees

Although the reasoning trees generated during problem solving are very large, they consist of repeated applications of a few abstract reasoning strategies. This is illustrated in Figure 3, where the blue-bordered sub-trees on the left-hand side are concrete applications of the abstract reduction strategy shown with a red border on the right-hand side. Indeed, each of the blue sub-trees is an application of the following abstract strategy: in order to assess to what extent a certain piece of evidence (e.g., EVD-Reuters-01-01c, a fragment of a Reuters News Agency report) favors a certain hypothesis (e.g., "Al Qaeda desires to obtain nuclear weapons."), one has to solve two sub-problems: 1) assess to what extent that piece of evidence favors that hypothesis, assuming that the piece of evidence is believable, and 2) assess the believability of that piece of evidence. There are other abstract strategies for analyzing the believability of direct testimonial evidence, the believability of testimonial evidence obtained at second hand, etc. (Schum, 2001). The abstraction of the reasoning trees results in a set of abstraction rules that dictate how reasoning trees are to be abstracted, that is, which problem solving steps are concrete applications of a given abstract reasoning strategy. These abstraction rules are important for generating examples for the tutoring lessons, as discussed in later sections. In all, only 22 abstract reduction strategies and 22 abstract synthesis strategies are repeatedly applied to generate the large reasoning tree for solving the problem "Assess whether Al Qaeda has nuclear weapons." As a consequence, there are only 217 abstract nodes in the abstract reasoning tree, corresponding to the 1,758 nodes of the concrete tree.
[Figure 3 shows a concrete reduction side by side with its abstraction. Concrete tree: "Assess to what extent the piece of evidence EVD-Reuters-01-01c favors the hypothesis that Al Qaeda desires to obtain nuclear weapons" is reduced, via the question "What factors determine how a piece of evidence favors a hypothesis?" and the answer "Its relevance and believability," into "Assess to what extent the piece of evidence EVD-Reuters-01-01c favors the hypothesis that Al Qaeda desires to obtain nuclear weapons, assuming that EVD-Reuters-01-01c is believable" and "Assess the believability of EVD-Reuters-01-01c." Abstract tree: "Assess to what extent the piece of evidence favors the hypothesis" is reduced to "Assess to what extent the piece of evidence favors the hypothesis, assuming that the piece of evidence is believable" and "Assess the believability of the piece of evidence."]
Figure 3: A reasoning tree and its abstract tree

In conclusion, the abstraction process identifies the problem solving strategies from which complex reasoning trees are generated. Therefore, one approach to teaching new professionals how to solve problems is to teach them the abstract reasoning strategies of their domain and to illustrate these strategies with concrete examples, as discussed in the next section.
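To suggest what an abstraction rule might look like, here is a deliberately simplified Python sketch that maps concrete problem statements to their abstract forms by pattern matching. The representation of rules as regular expressions over statement strings is our own illustration; Disciple learns and applies abstraction rules over its internal problem representations:

```python
import re
from typing import Optional

# Each abstraction rule pairs a pattern over concrete problem statements
# (with groups for the specific evidence and hypothesis) with the abstract
# statement it maps to.
ABSTRACTION_RULES = [
    (re.compile(r"Assess to what extent the piece of evidence (\S+) "
                r"favors the hypothesis that (.+)\."),
     "Assess to what extent the piece of evidence favors the hypothesis."),
    (re.compile(r"Assess the believability of (\S+)"),
     "Assess the believability of the piece of evidence."),
]

def abstract_statement(concrete: str) -> Optional[str]:
    """Return the abstract form of a concrete problem statement,
    or None if no abstraction rule matches."""
    for pattern, abstract in ABSTRACTION_RULES:
        if pattern.match(concrete):
            return abstract
    return None

# The concrete node from Figure 3 maps to its abstract counterpart:
print(abstract_statement(
    "Assess to what extent the piece of evidence EVD-Reuters-01-01c "
    "favors the hypothesis that Al Qaeda desires to obtain nuclear weapons."))
# -> Assess to what extent the piece of evidence favors the hypothesis.
```

Applying such rules to every node of the concrete tree, and merging the nodes that map to the same abstract statement, is in essence what collapses the 1,758 concrete nodes into the 217 abstract ones.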
Abstraction-Based Lessons

Given the available abstractions of the reasoning trees in the knowledge base, lessons can be designed to tutor the abstract reasoning strategies. One or several related strategies form the foundation of a lesson. Each strategy included in the design provides the content for tutoring how to reduce a problem to simpler problems, or how to synthesize several solutions into the solution of a more complex problem. The instructor can, however, modify this content and add further explanations or definitions. Depending on its complexity, a lesson can contain several sections, each associated with an abstract strategy. The selection of the abstract strategies and their association with the lesson sections are facilitated by a drag-and-drop capability. An example of lesson sections is shown in Figure 4. This lesson teaches how to break the problem "Assess the believability of the reporter of the piece of evidence" into two sub-problems, and focuses on the second sub-problem, "Assess the credibility of the reporter of the piece of evidence." The instructor can specify the order in which the lesson sections are presented during tutoring. This ordering capability allows the instructor to apply various tutoring strategies, such as breadth-first, depth-first, or any ordering that the instructor deems appropriate for the students' knowledge or for his or her own teaching style.
Figure 4: Lesson designed with two abstract strategies
Once the design is done, a lesson script is generated automatically from the configuration of the lesson sections and their content. This script drives the generation of lessons for the tutoring system. It is expressed in the Abstraction-Based Lesson Emulation (ABLE) scripting language, which supports designing and building abstraction-based lessons in a very flexible way (Le, 2008). Because the script is generated automatically, the instructor does not have to write a single line of ABLE, which significantly speeds up the lesson design process. The design process is further simplified by the fact that the instructor does not have to define the examples that illustrate a lesson. These examples are generated automatically through the associations between lesson sections and abstract strategies: each abstract strategy is the abstraction of several concrete applications in the reasoning tree, any of which can be presented as an illustrating example. This drastically reduces design time and instructor effort. Note also that the automatic generation of lesson examples provides a high level of generality, allowing lessons to be customized to student knowledge or interests (Le et al., 2008). Indeed, a student may select a different domain knowledge base, such as drug trafficking or crime scene investigation, and the system will generate the examples accordingly.
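The ABLE syntax itself is described in (Le, 2008). As a hypothetical sketch of the information such a script must carry, consider the following Python structures; the class and attribute names are ours, and knowledge_base.reduction_steps() is an assumed interface:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class LessonSection:
    title: str
    abstract_strategy_id: str      # link to a strategy in the abstract tree
    instructor_notes: str = ""     # optional added explanations/definitions

@dataclass
class Lesson:
    name: str
    sections: List[LessonSection] = field(default_factory=list)  # teaching order

def generate_examples(section: LessonSection, knowledge_base) -> list:
    """Retrieve, at tutoring time, the concrete reductions that the
    abstraction rules map to this section's abstract strategy; each one
    can serve as an illustrating example for the section."""
    return [step for step in knowledge_base.reduction_steps()
            if step.abstract_strategy_id == section.abstract_strategy_id]
```

Note that the examples are not stored in the script: because they are retrieved from whichever knowledge base the student selects, the same lesson works unchanged for intelligence analysis, drug trafficking, or crime scene investigation.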
The lesson script can then be used to generate sets of lessons for tutoring purposes. The generated lessons teach the abstract strategies for solving a problem and illustrate them with automatically generated examples. Figure 5 displays part of a lesson that teaches how to assess a piece of evidence that favors a hypothesis. The top part teaches the analyst how to solve this problem at the abstract level, and the bottom part illustrates this abstract reasoning with examples generated from the domain knowledge base. The abstract reasoning covers four basic abstract strategies, but only the top strategy and one of the three bottom strategies are visible in Figure 5. Each of the three bottom strategies shows an alternative way of assessing the extent to which the information provided by a piece of evidence is believable, depending on the type of evidence (i.e., direct testimonial evidence, testimonial evidence obtained at second hand, or testimonial evidence about tangible evidence). The tutor fades out the two strategies that are not illustrated by the example in the bottom part of Figure 5 (see the right-hand side of Figure 5). A student can request additional examples that illustrate the other strategies, or can select them directly from a list. He or she can also click on the blue hyperlinks to receive brief or detailed definitions, or even entire presentations, on important concepts such as believability or objectivity. Some lessons display these descriptions automatically, as part of the lesson's flow, which may also use spoken text.

Figure 5: Part of a lesson

Figure 5 illustrates only the first part of the lesson, which teaches the reduction strategies for assessing the support provided by a piece of evidence for a hypothesis. The second part of the lesson teaches the corresponding synthesis strategies; the structure of the synthesis lessons is similar to that of their reduction counterparts. Once the instructor finishes the lesson design and generation phase, the test question design phase begins, as explained in the next section.
Machine-Learning Based Test Generation

The new approach also alleviates the problem of creating the test questions used to assess the students' knowledge. The instructional designer teaches the system one test question example, and the system learns from it to generate a whole class of similar test questions. The principle behind this type of learning is generalization. The designer selects a reduction step from a reasoning tree. This step, represented by a problem and its sub-problems, is an example E of a previously learned problem reduction rule R from the domain knowledge base. The instructional designer then changes the reduction step E, either by deleting some sub-problems, by modifying them, or by adding deliberately incorrect sub-problems, to create an omission, modification, or construction test example, respectively. The instructional designer also provides a specific hint for the test example, as well as specific feedback for each of the possible student answers (i.e., correct, incomplete, and incorrect). The result is a specific test example TE, which is an extension and modification of the reduction example E. By performing the corresponding extensions and modifications of the general rule R (which generated E), the agent learns the general test question rule TR. TR can then generate a class of test questions similar to TE, based on the current domain knowledge base (Le et al., 2008).

To test the knowledge level, the designer drops one or several sub-problems of a reasoning step to produce a deliberately incomplete reasoning step. This strategy creates the omission test. The incomplete step should alert a student who has learned the strategy and encounters the step during testing. This type of test question examines the student's knowledge level, the first of the six levels of cognition of Bloom's Taxonomy (Bloom, 1956). The modification test examines the student's comprehension level by modifying the content of one of the sub-problems of a reasoning step; it requires deeper knowledge of the subject than knowledge-level tests. A third, more challenging, type of test question is the construction test. The designer defines several sub-problems which may be unrelated, or incorrectly related, to the correct sub-problems of a problem. The test question presents a problem and a list of potential sub-problems, including both correct and incorrect ones, and the student must select the correct sub-problems. This type of test requires the student to analyze the sub-problems in order to build a correct reasoning step (Le, 2008).

An example of a generated test question is shown in Figure 6, where a red-bordered problem is reduced to two red-bordered sub-problems (see the bottom of Figure 6), in the context of a larger reasoning tree. The student is asked whether this reasoning is complete (i.e., it includes all the necessary sub-problems of the reduced problem), incomplete (it misses some sub-problems, but the present ones are correct), or incorrect (it includes incorrect sub-problems), by clicking on the corresponding answer in the upper right-hand part of Figure 6. The student then receives appropriate feedback, either confirming the answer or explaining the mistake. He or she may also request a hint and, in self-testing mode (as opposed to assessment mode), may review the corresponding lesson by clicking on the "Go To Lesson" button.
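The following Python sketch illustrates, under our own simplified representation, how an omission test rule TR extends a reduction rule R and how it is instantiated. The variable syntax (?actor, ?weapon) and the helper names are assumptions made for this illustration:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class ReductionRule:        # R: a previously learned problem reduction rule
    problem: str
    sub_problems: List[str]

@dataclass
class OmissionTestRule:     # TR: learned from the designer's test example TE
    rule: ReductionRule
    omitted_index: int      # which sub-problem the designer dropped
    hint: str
    feedback: Dict[str, str]  # student answer -> feedback text

def instantiate(pattern: str, bindings: Dict[str, str]) -> str:
    """Substitute knowledge-base values for the rule's variables."""
    for var, value in bindings.items():
        pattern = pattern.replace(var, value)
    return pattern

def generate_omission_test(tr: OmissionTestRule,
                           bindings: Dict[str, str]) -> Tuple[str, List[str]]:
    """Produce one concrete, deliberately incomplete reasoning step;
    the correct student answer for it is 'incomplete'."""
    shown = [instantiate(sp, bindings)
             for i, sp in enumerate(tr.rule.sub_problems)
             if i != tr.omitted_index]
    return instantiate(tr.rule.problem, bindings), shown

# With bindings such as {"?actor": "Al Qaeda", "?weapon": "nuclear weapons"},
# this reproduces an omission variant of the top reduction in Figure 2;
# bindings from another knowledge base yield questions of the same class.
```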
Figure 6: A generated omission test question
Experimentation

We have built a tutoring system based on the principles of this new approach in the Disciple environment (Tecuci, 1998). The tutoring system has been used and evaluated in several courses at the US Army War College and at George Mason University. The Army War College students were high-ranking military officers who were either experienced intelligence analysts or users of intelligence. In contrast, the George Mason University students were computer science graduate students with no significant intelligence analysis experience. In both cases, the students followed the lessons defined in the system and were then assessed with the test questions generated by the system. As expected, the system was perceived as more useful by the novice analysts. However, even the expert analysts from the Army War College considered the system useful in teaching them a rigorous, systematic approach to the "ill-defined" domain of intelligence analysis. Both before and after taking the lessons, the George Mason University students (i.e., the novice analysts) were asked to subjectively assess their knowledge of several of the topics taught, on a 6-point scale from none to very high. The results showed that the students considered their post-lesson knowledge of the application domain to be much better than their pre-lesson knowledge. Moreover, their self-assessed post-lesson knowledge was confirmed by the good results they obtained on the tests generated by the system at the end of the classes.
Conclusion

In this paper we have presented an overview of our research on a new approach to the rapid development of systems for tutoring expert problem solving knowledge in "ill-defined" domains. This approach is based on methods from the areas of expert systems, machine learning, and intelligent tutoring systems, which we have researched, developed, and evaluated over the years. It enables the rapid development of a new type of tutoring system that transforms domain knowledge into pedagogical knowledge and automates most of the tasks of constructing a tutoring system, from designing lessons and defining lesson examples to generating test questions. The method is applicable to a wide range of ill-defined domains, due to its use of the general problem reduction/solution synthesis approach to problem solving. It can rapidly acquire the basic abstract problem solving strategies of the application domain. It allows an instructional designer to rapidly design lessons for teaching these abstract strategies without defining examples, because the examples are generated automatically by the system from the domain knowledge base. It also allows the rapid learning and generation of test questions. These capabilities confer a high degree of generality: the tutoring system can be applied to several related application domains (such as nuclear proliferation, drug trafficking, crime investigation, or law) with no change to the lessons. They also allow the students themselves to customize the lessons, by selecting not only the examples that illustrate the taught problem solving strategies, but even the domain knowledge base, to better fit their interests and knowledge. Using this approach, we have developed a tutoring system for intelligence analysts that has been successfully used in several courses with both expert and novice analysts.
The developed tutoring methods and their current implementation in the Disciple environment have several limitations that point to future research directions. For example, the lesson design module currently requires that a lesson first introduce an abstract strategy and then illustrate it with examples. It would be easy to extend this module to allow an instructional designer to define other lesson organizations, such as introducing the examples first and then their abstraction. The types of test questions currently learned and generated by the system are not very complex or diverse, as each test question is based on a single problem solving rule from the domain knowledge base. It would not be very difficult to learn more complex test questions based on several related reasoning rules; it is also necessary to devise new and more challenging types of test questions. The current student model is quite limited, and more research is needed both to develop a more complex model and to use it more effectively in tutoring. In addition, more work is needed to significantly improve the interaction with the student.
References

Ainsworth, S.E., & Fleming, P.F. (2004). Teachers as instructional designers: Does involving a classroom teacher in the design of computer-based learning environments improve their effectiveness? Proceedings of the First Joint Meeting of the EARLI SIGs Instructional Design and Learning and Instruction with Computers, pp. 283-291.

Anderson, J.R. (1992). Intelligent tutoring and high school mathematics. Proceedings of the Second International Conference on Intelligent Tutoring Systems, Berlin, Germany: Springer-Verlag.

Blessing, S.B. (1997). A programming by demonstration authoring tool for model tracing tutors. Artificial Intelligence in Education, 233-261.

Bloom, B.S. (1956). Taxonomy of Educational Objectives, Handbook I: The Cognitive Domain. New York: David McKay Co.

Durham, S. (2000). Product-centered approach to information fusion. AFOSR Forum on Information Fusion, Arlington, VA.

Eugenio, B., Fossati, D., Yu, D., Haller, S., & Glass, M. (2005). Natural language generation for intelligent tutoring systems: a case study. In AIED 2005.

Koedinger, K.R., Aleven, V., & Heffernan, N.T. (2003). Toward a rapid development environment for cognitive tutors. In U. Hoppe, F. Verdejo, & J. Kay (Eds.), Proceedings of the 11th International Conference on Artificial Intelligence in Education, AI-ED 2003 (pp. 455-457). Amsterdam: IOS Press.

Le, V. (2008). Abstraction of Reasoning for Problem Solving and Tutoring Assistants. Ph.D. Dissertation in Information Technology, Learning Agents Center, Volgenau School of IT&E, George Mason University.

Le, V., Tecuci, G., & Boicu, M. (2008). Agent shell for the development of tutoring systems for expert problem solving knowledge. Proceedings of the 9th International Conference on Intelligent Tutoring Systems, Montreal, Canada.

Lynch, C., Ashley, K., Aleven, V., & Pinkwart, N. (2006). Defining ill-defined domains: A literature survey. Proceedings of the Workshop on Intelligent Tutoring Systems for Ill-Defined Domains at the 8th International Conference on Intelligent Tutoring Systems, pp. 1-10. Jhongli, Taiwan.

Matsuda, N., Cohen, W.W., & Koedinger, K.R. (2005). Building cognitive tutors with programming by demonstration. International Conference on Inductive Logic Programming, 41-46.

Murray, T. (1999). Authoring intelligent tutoring systems: An analysis of the state of the art. International Journal of Artificial Intelligence in Education, 98-129.

Schum, D.A. (2001). The Evidential Foundations of Probabilistic Reasoning. Northwestern University Press.

Tecuci, G. (1998). Building Intelligent Agents: An Apprenticeship Multistrategy Learning Theory, Methodology, Tool and Case Studies. London, England: Academic Press.

Tecuci, G., Boicu, M., Marcu, D., Boicu, C., Barbulescu, M., Ayers, C., & Cammons, C. (2007). Cognitive assistants for analysts. Journal of Intelligence Community Research and Development.