Contributors: Omar Alonso, Shamim Alpha, Steve Buxton, Chung-Ho Chen, Jack Chen, Yun Cheng, Michele Cyran, Paul Dixon, Mohammad Faisal, Elena Huang, Garrett Kaminaga, Ji Sun Kang, Bryn Llewellyn, Wesley Lin, Yasuhiro Matsuda, Gerda Shank, and Steve Yang.

The Programs (which include both the software and documentation) contain proprietary information of Oracle Corporation; they are provided under a license agreement containing restrictions on use and disclosure and are also protected by copyright, patent and other intellectual and industrial property laws. Reverse engineering, disassembly or decompilation of the Programs, except to the extent required to obtain interoperability with other independently created software or as specified by law, is prohibited.

The information contained in this document is subject to change without notice. If you find any problems in the documentation, please report them to us in writing. Oracle Corporation does not warrant that this document is error-free. Except as may be expressly permitted in your license agreement for these Programs, no part of these Programs may be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without the express written permission of Oracle Corporation.

If the Programs are delivered to the U.S. Government or anyone licensing or using the programs on behalf of the U.S. Government, the following notice is applicable:

Restricted Rights Notice
Programs delivered subject to the DOD FAR Supplement are "commercial computer software" and use, duplication, and disclosure of the Programs, including documentation, shall be subject to the licensing restrictions set forth in the applicable Oracle license agreement. Otherwise, Programs delivered subject to the Federal Acquisition Regulations are "restricted computer software" and use, duplication, and disclosure of the Programs shall be subject to the restrictions in FAR 52.227-19, Commercial Computer Software - Restricted Rights (June, 1987). Oracle Corporation, 500 Oracle Parkway, Redwood City, CA 94065.

The Programs are not intended for use in any nuclear, aviation, mass transit, medical, or other inherently dangerous applications. It shall be the licensee's responsibility to take all appropriate fail-safe, backup, redundancy, and other measures to ensure the safe use of such applications if the Programs are used for such purposes, and Oracle Corporation disclaims liability for any damages caused by such use of the Programs.

Oracle is a registered trademark, and ConText, Gist, Oracle Store, Oracle8, Oracle8i, Oracle9i, PL/SQL, SQL*Net and SQL*Plus are trademarks or registered trademarks of Oracle Corporation. Other names may be trademarks of their respective owners.
Contents

Send Us Your Comments
Preface
What’s New in Oracle Text?

1 SQL Statements and Operators
    ALTER INDEX
    ALTER TABLE: Supported Partitioning Statements

2 Indexing
    Lexer
    Section Group
    Stoplist
    Storage
    Wordlist
    System Parameters
        General System Parameters
        Default Index Parameters

3 CONTAINS Query Operators
    Operator Precedence
        Group 1 Operators
        Group 2 Operators and Characters
        Procedural Operators
        Precedence Examples
        Altering Precedence
    ABOUT
    ACCUMulate ( , )
    AND (&)
    Broader Term (BT, BTG, BTP, BTI)
    EQUIValence (=)
    Fuzzy
    Narrower Term (NT, NTG, NTP, NTI)
    NEAR (;)
    NOT (~)
    OR (|)
    Preferred Term (PT)
    Related Term (RT)
    soundex (!)
    stem ($)
    Stored Query Expression (SQE)
    SYNonym (SYN)
    threshold (>)
    Translation Term (TR)
    Translation Term Synonym (TRSYN)
    Top Term (TT)
    weight (*)
    wildcards (% _)
        Right-Truncated Queries
        Left- and Double-Truncated Queries
        Improving Wildcard Query Performance
    WITHIN

4 Special Characters in Queries
    Grouping Characters
    Escape Characters
        Querying Escape Characters
    Reserved Words and Characters

Executables
    ctxkbtc Limitations
    ctxkbtc Constraints on Thesaurus Terms
    ctxkbtc Constraints on Thesaurus Relations
    Extending the Knowledge Base
    Adding a Language-Specific Knowledge Base
    Order of Precedence for Multiple Thesauri
    Size Limits for Extended Knowledge Base

A Result Tables
    CTX_QUERY Result Tables
        EXPLAIN Table
        HFEEDBACK Table
    CTX_DOC Result Tables
        Filter Table
        Gist Table
        Highlight Table
        Markup Table
        Theme Table
        Token Table
    CTX_THES Result Tables and Data Types
        EXP_TAB Table Type

B Supported Document Formats
    About Document Filtering Technology
        Supported Platforms
        Environment Variables
        Requirements for UNIX Platforms
        OLE2 Object Support
    Supported Document Formats
        Word Processing - Generic
        Word Processing - DOS
        Word Processing - International
        Word Processing - Windows
        Word Processing - Macintosh
        Word Processing - Unix

F Scoring Algorithm
    Scoring Algorithm for Word Queries
        Example
        DML and Scoring

H Stopword Transformations
    Understanding Stopword Transformations
        Word Transformations
        AND Transformations
        OR Transformations
        ACCUMulate Transformations
        MINUS Transformations
        NOT Transformations
        EQUIValence Transformations
        NEAR Transformations
        Weight Transformations
        Threshold Transformations
        WITHIN Transformations

I English Knowledge Base Category Hierarchy
    Branch 1: science and technology
    Branch 2: business and economics
    Branch 3: government and military
    Branch 4: social environment
    Branch 5: geography
    Branch 6: abstract ideas and concepts

Index
Send Us Your Comments
Oracle Text Reference, Release 9.2
Part No. A96518-01

Oracle Corporation welcomes your comments and suggestions on the quality and usefulness of this document. Your input is an important part of the information used for revision.

■ Did you find any errors?
■ Is the information clearly presented?
■ Do you need more information? If so, where?
■ Are the examples correct? Do you need more examples?
■ What features did you like most?

If you find any errors or have any other suggestions for improvement, please indicate the document title and part number, and the chapter, section, and page number (if available). You can send comments to us in the following ways:

■ Electronic mail: [email protected]
■ FAX: (650) 506-7227 Attn: Server Technologies Documentation Manager
■ Postal service:
    Oracle Corporation
    Server Technologies Documentation
    500 Oracle Parkway, Mailstop 4op11
    Redwood Shores, CA 94065
    USA
If you would like a reply, please give your name, address, telephone number, and (optionally) electronic mail address. If you have problems with the software, please contact your local Oracle Support Services.
Preface
This manual provides reference information for Oracle Text. Use it as a reference for creating Oracle Text indexes, for issuing Oracle Text queries, for presenting documents, and for using the Oracle Text PL/SQL packages.

This preface contains these topics:

■ Audience
■ Organization
■ Related Documentation
■ Conventions
■ Documentation Accessibility
Audience
Oracle Text Reference is intended for an Oracle Text application developer or a system administrator responsible for maintaining the Oracle Text system.

To use this document, you need experience with the Oracle relational database management system, SQL, SQL*Plus, and PL/SQL. See the documentation provided with your hardware and software for additional information.

If you are unfamiliar with the Oracle RDBMS and related tools, read Chapter 1, “An Introduction to the Oracle Server”, in Oracle9i Concepts. The chapter is a comprehensive introduction to the concepts and terminology used throughout Oracle documentation.
Organization
This document contains:

Chapter 1, "SQL Statements and Operators"
This chapter describes the SQL statements and operators you can use with Oracle Text.

Chapter 2, "Indexing"
This chapter describes the indexing types you can use to create an Oracle Text index.

Chapter 3, "CONTAINS Query Operators"
This chapter describes the operators you can use in CONTAINS queries.

Chapter 4, "Special Characters in Queries"
This chapter describes the special characters you can use in CONTAINS queries.

Chapter 5, "CTX_ADM Package"
This chapter describes the procedures in the CTX_ADM PL/SQL package.

Chapter 7, "CTX_DDL Package"
This chapter describes the procedures in the CTX_DDL PL/SQL package. Use this package for maintaining your index.

Chapter 8, "CTX_DOC Package"
This chapter describes the procedures in the CTX_DOC PL/SQL package. Use this package for document services such as document presentation.

Chapter 9, "CTX_OUTPUT Package"
This chapter describes the procedures in the CTX_OUTPUT PL/SQL package. Use this package to manage your index error log files.

Chapter 10, "CTX_QUERY Package"
This chapter describes the procedures in the CTX_QUERY PL/SQL package. Use this package to manage queries such as to count hits and to generate query explain plan information.

Chapter 11, "CTX_REPORT"
This chapter describes the procedures in the CTX_REPORT PL/SQL package. Use this package to create various index reports.

Chapter 12, "CTX_THES Package"
This chapter describes the procedures in the CTX_THES PL/SQL package. Use this package to manage your thesaurus.

Chapter 13, "CTX_ULEXER Package"
This chapter describes the data types in the CTX_ULEXER PL/SQL package. Use this package with the user defined lexer.

Chapter 14, "Executables"
This chapter describes the supplied executables for Oracle Text including ctxload, the thesaurus loading program, and ctxkbtc, the knowledge base compiler.

Appendix A, "Result Tables"
This appendix describes the result tables for some of the procedures in CTX_DOC, CTX_QUERY, and CTX_THES packages.

Appendix B, "Supported Document Formats"
This appendix describes the supported document formats that can be filtered with the Inso filter for indexing.

Appendix C, "Loading Examples"
This appendix provides some basic examples for populating a text table.

Appendix D, "Supplied Stoplists"
This appendix describes the supplied stoplist for each supported language.

Appendix E, "Alternate Spelling Conventions"
This appendix describes the alternate spelling conventions used for German, Danish, and Swedish.

Appendix F, "Scoring Algorithm"
This appendix describes the scoring algorithm used for word queries.

Appendix G, "Views"
This appendix describes the Oracle Text views.

Appendix H, "Stopword Transformations"
This appendix describes stopword transformations.

Appendix I, "English Knowledge Base Category Hierarchy"
This appendix describes the supplied English Knowledge Base.
Related Documentation
For more information about Oracle Text, see these Oracle resources:

■ Oracle Text technical information, collateral, code samples, training slides, and other material are available at http://otn.oracle.com/products/text/
In North America, printed documentation is available for sale in the Oracle Store at http://oraclestore.oracle.com/
Customers in Europe, the Middle East, and Africa (EMEA) can purchase documentation from http://www.oraclebookshop.com/
Other customers can contact their Oracle representative to purchase printed documentation. To download free release notes, installation documentation, white papers, or other collateral, please visit the Oracle Technology Network (OTN). You must register online before using OTN; registration is free and can be done at http://otn.oracle.com/admin/account/membership.html
If you already have a username and password for OTN, then you can go directly to the documentation section of the OTN Web site at http://otn.oracle.com/docs/index.htm
To access the database documentation search engine directly, please visit http://tahiti.oracle.com
Conventions
This section describes the conventions used in the text and code examples of this documentation set. It describes:

■ Conventions in Text
■ Conventions in Code Examples
Conventions in Text
We use various conventions in text to help you more quickly identify special terms. The following table describes those conventions and provides examples of their use.

Convention: Bold
Meaning: Bold typeface indicates terms that are defined in the text or terms that appear in a glossary, or both.
Examples:
    The C datatypes such as ub4, sword, or OCINumber are valid.
    When you specify this clause, you create an index-organized table.

Convention: Italics
Meaning: Italic typeface indicates query terms, book titles, emphasis, syntax clauses, or placeholders.
Examples:
    The following query searches for oracle.
    Oracle9i Concepts
    You can specify the parallel_clause.
    Run Uold_release.SQL where old_release refers to the release you installed prior to upgrading.

Convention: UPPERCASE monospace (fixed-width font)
Meaning: Uppercase monospace typeface indicates elements supplied by the system. Such elements include parameters, privileges, datatypes, RMAN keywords, SQL keywords, SQL*Plus or utility commands, packages and methods, as well as system-supplied column names, database objects and structures, user names, and roles.
Examples:
    You can specify this clause only for a NUMBER column.
    You can back up the database using the BACKUP command.
    Query the TABLE_NAME column in the USER_TABLES data dictionary view.
    Specify the ROLLBACK_SEGMENTS parameter.
    Use the DBMS_STATS.GENERATE_STATS procedure.

Convention: lowercase monospace (fixed-width font)
Meaning: Lowercase monospace typeface indicates executables and sample user-supplied elements. Such elements include computer and database names, net service names, and connect identifiers, as well as user-supplied database objects and structures, column names, packages and classes, user names and roles, program units, and parameter values.
Examples:
    Enter sqlplus to open SQL*Plus.
    The department_id, department_name, and location_id columns are in the hr.departments table.
    Set the QUERY_REWRITE_ENABLED initialization parameter to true.
    Connect as oe user.
Conventions in Code Examples
Code examples illustrate SQL, PL/SQL, SQL*Plus, or other command-line statements. They are displayed in a monospace (fixed-width) font and separated from normal text as shown in this example:

    SELECT username FROM dba_users WHERE username = 'MIGRATE';
The following table describes typographic conventions used in code examples and provides examples of their use.

Convention: [ ]
Meaning: Brackets enclose one or more optional items. Do not enter the brackets.
Example:
    DECIMAL (digits [ , precision ])

Convention: { }
Meaning: Braces enclose two or more items, one of which is required. Do not enter the braces.
Example:
    {ENABLE | DISABLE}

Convention: |
Meaning: A vertical bar represents a choice of two or more options within brackets or braces. Enter one of the options. Do not enter the vertical bar.
Examples:
    {ENABLE | DISABLE}
    [COMPRESS | NOCOMPRESS]

Convention: ...
Meaning: Horizontal ellipsis points indicate either:
    ■ That we have omitted parts of the code that are not directly related to the example
    ■ That you can repeat a portion of the code
Examples:
    CREATE TABLE ... AS subquery;
    SELECT col1, col2, ... , coln FROM employees;

Convention: . . . (vertical)
Meaning: Vertical ellipsis points indicate that we have omitted several lines of code not directly related to the example.

Convention: Other notation
Meaning: You must enter symbols other than brackets, braces, vertical bars, and ellipsis points as shown.
Examples:
    acctbal NUMBER(11,2);
    acct CONSTANT NUMBER(4) := 3;

Convention: Italics
Meaning: Italicized text indicates variables for which you must supply particular values.
Example:
    CONNECT SYSTEM/system_password

Convention: UPPERCASE
Meaning: Uppercase typeface indicates elements supplied by the system. We show these terms in uppercase in order to distinguish them from terms you define. Unless terms appear in brackets, enter them in the order and with the spelling shown. However, because these terms are not case sensitive, you can enter them in lowercase.
Examples:
    SELECT last_name, employee_id FROM employees;
    SELECT * FROM USER_TABLES;
    DROP TABLE hr.employees;

Convention: lowercase
Meaning: Lowercase typeface indicates programmatic elements that you supply. For example, lowercase indicates names of tables, columns, or files.
Examples:
    SELECT last_name, employee_id FROM employees;
    sqlplus hr/hr
Documentation Accessibility
Our goal is to make Oracle products, services, and supporting documentation accessible, with good usability, to the disabled community. To that end, our documentation includes features that make information available to users of assistive technology. This documentation is available in HTML format, and contains markup to facilitate access by the disabled community. Standards will continue to evolve over time, and Oracle Corporation is actively engaged with other market-leading technology vendors to address technical obstacles so that our documentation can be accessible to all of our customers. For additional information, visit the Oracle Accessibility Program Web site at http://www.oracle.com/accessibility/

Accessibility of Code Examples in Documentation
JAWS, a Windows screen reader, may not always correctly read the code examples in this document. The conventions for writing code require that closing braces should appear on an otherwise empty line; however, JAWS may not always read a line of text that consists solely of a bracket or brace.
What’s New in Oracle Text?
This chapter describes new features of Oracle Text (formerly Oracle8i interMedia Text) and provides pointers to additional information. The following topics are covered:

■ Release 9.2 New Features in Oracle Text
■ Release 9.0.1 New Features in Oracle Text
Release 9.2 New Features in Oracle Text
The following features are new for this release:

■ Document Classification
The new CTX_CLS.TRAIN procedure enables you to generate rules for routing documents to different categories.
See Also: TRAIN in Chapter 6, "CTX_CLS Package"

■ User Defined Lexer
The user-defined lexer enables you to create lexing solutions for indexing and querying languages not supported by Oracle Text such as Arabic.
See Also: USER_LEXER in Chapter 2, "Indexing"

■ Query Templating
CONTAINS and CATSEARCH are no longer limited to their respective CONTEXT and CTXCAT grammars. Query templating enables you to use the CONTEXT grammar and associated operators in CATSEARCH queries and vice-versa.
See Also: CATSEARCH in Chapter 1, "SQL Statements and Operators"

■ CREATE INDEX ONLINE Support
You can create a CONTEXT index while allowing inserts, updates, and deletes to your base table.
See Also: CREATE INDEX in Chapter 1, "SQL Statements and Operators"

■ Parallel Indexing Enhancements
Parallel indexing is now supported for non-partitioned tables. You can use parallelism with CREATE INDEX and ALTER INDEX with parameters replace, resume, and sync. You can also run CTX_DDL.SYNC_INDEX and CTX_DDL.OPTIMIZE_INDEX with a parallel degree.
See Also:
    CREATE INDEX in Chapter 1, "SQL Statements and Operators"
    SYNC_INDEX in Chapter 7, "CTX_DDL Package"

■ Stem Indexing
Stem indexing enables better performance for stem ($) queries by indexing the stem form in addition to the base form.
See Also: BASIC_LEXER in Chapter 2, "Indexing"

■ Chinese Lexer
New CHINESE_LEXER enables you to index traditional and simplified Chinese text more efficiently.
See Also: CHINESE_LEXER in Chapter 2, "Indexing"

■ URIType indexing
You can create CONTEXT indexes on URIType columns.
See Also: CREATE INDEX in Chapter 1, "SQL Statements and Operators"

■ CTXXPATH
The CTXXPATH indextype enables you to speed up ExistsNode() queries on XMLType columns.
See Also:
    Syntax for CTXXPATH Indextype in Chapter 1, "SQL Statements and Operators"
    Oracle9i Application Developer’s Guide - XML

■ ORA:CONTAINS Support in ExistsNode()
You can call the CONTAINS function within an ExistsNode() statement without a Text index.
See Also:
    Oracle9i Application Developer’s Guide - XML
    CREATE_POLICY in Chapter 7, "CTX_DDL Package".
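As an illustration of the CREATE INDEX ONLINE support listed among the Release 9.2 features, the following sketch creates a CONTEXT index with the ONLINE keyword and then issues a CONTAINS query against it. The table docs, its columns, and the index name docs_text_idx are hypothetical names chosen for this example and do not come from this manual. See CREATE INDEX in Chapter 1 for the authoritative syntax.

```sql
-- Hypothetical text table; the table and column names are illustrative only.
CREATE TABLE docs (
  id   NUMBER PRIMARY KEY,
  text VARCHAR2(4000)
);

-- Create the CONTEXT index. With ONLINE, inserts, updates, and deletes
-- on docs remain possible while the index is being built.
CREATE INDEX docs_text_idx ON docs(text)
  INDEXTYPE IS CTXSYS.CONTEXT ONLINE;

-- Query the indexed column; SCORE(1) returns the relevance score
-- computed for the CONTAINS call labeled 1.
SELECT id, SCORE(1)
  FROM docs
 WHERE CONTAINS(text, 'oracle', 1) > 0;
```

Rows changed while the index is being built, or afterward, are not searchable until the index is synchronized, for example with CTX_DDL.SYNC_INDEX.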
Release 9.0.1 New Features in Oracle Text The following sections outline the new features in this release. ■
Document Classification A document classification application is one that classifies an incoming stream of documents based on their content. These applications are also known as document routing or filtering applications. For example, an online news agency might need to classify its incoming stream of articles as they arrive into categories such as politics, crime, and sports. Oracle Text enables you to build such applications with the new CTXRULE index type. This index type indexes the rules (queries) that define classifications or routing criteria. When documents arrive, the new MATCHES operator can be used to categorize and route each document.
Note: Oracle Text supports document classification for only plain text, XML, and HTML documents.
See Also: CREATE INDEX and MATCHES statements in Chapter 1, "SQL Statements and Operators". Oracle Text Application Developer’s Guide for more information about document classification.
■
Local Partitioned Index Support You can create local partitioned indexes on partitioned text tables. To do so, use CREATE INDEX with the LOCAL PARTITION clause. You can also rebuild partitioned indexes with ALTER INDEX. See Also: CREATE INDEX and ALTER INDEX in Chapter 1,
"SQL Statements and Operators". ■
IGNORE Format Column Value The format column in your text table allows you to specify whether binary or text data is stored in the text column. A new format column value of IGNORE is provided. When you issue the CREATE INDEX statement and specify a format column, any row whose format column is set to IGNORE is ignored during indexing. This feature is useful for
indexing text columns that contain data incompatible with text indexing such as images or raw binary data. See Also: CREATE INDEX in Chapter 1, "SQL Statements and
Operators". ■
USER_DATASTORE Enhancement When you specify your user procedure for the USER_DATASTORE, you can return permanent BLOB and CLOB locators for your IN/OUT parameter. See Also: USER_DATASTORE in Chapter 2, "Indexing".
■
New Korean Lexer In this release, Oracle Text continues to support the indexing and querying of Korean text with a new Korean lexer, KOREAN_MORPH_LEXER. The KOREAN_MORPH_LEXER lexer offers the following benefits over the KOREAN_LEXER: ■
better morphological analysis of Korean text
■
faster indexing
■
smaller indexes
■
more accurate query searching See Also: KOREAN_MORPH_LEXER in Chapter 2, "Indexing".
■
New Japanese Lexer In this release, Oracle Text continues to support the indexing and querying of Japanese text with a new Japanese lexer JAPANESE_LEXER. This lexer offers the following benefits over the JAPANESE_VGRAM_LEXER: ■
generates a smaller index
■
better query response time
■
generates real word tokens resulting in better query precision See Also: JAPANESE_LEXER in Chapter 2, "Indexing".
■
XMLType Indexing Oracle Text supports the indexing of text columns of type XMLType.
Note: XMLType indexing is supported only for the CONTEXT
index type.
See Also: Oracle Text Application Developer’s Guide for more information about XMLType indexing. ■
All Language Stopwords You can create a MULTI_STOPLIST type stoplist that contains words that are to be stopped in more than one language. This new stopword type is called ALL. For example, you can use an ALL stopword when you need to index international documents that contain English fragments. See Also: ADD_STOPWORD in Chapter 7, "CTX_DDL Package".
■
UTF-16 Auto-detection Oracle Text supports UTF-16 conversion to the database character set with the charset and Inso filters. These filters can convert documents that are UTF-16 big-endian (AL16UTF16) or little-endian (AL16UTF16LE). Oracle Text also supports endian auto-detection when the character set column or charset filter is set to UTF16AUTO. See Also: CHARSET_FILTER in Chapter 2, "Indexing".
■
INSO_FILTER Timeout Attribute The INSO_FILTER document filter has a new timeout attribute that allows you to specify the maximum time Oracle waits for a document to be filtered during indexing. You can use this mechanism to avoid hanging during the index operation. See Also: INSO_FILTER in Chapter 2, "Indexing".
■
XML Path Searching XML documents can have parent-child tag structures such as the following: dog
Use this group type for indexing HTML documents and for defining sections in HTML documents.
XML_SECTION_GROUP
Use this group type for indexing XML documents and for defining sections in XML documents.
Section Group Preference
Description
AUTO_SECTION_GROUP
Use this group type to automatically create a zone section for each start-tag/end-tag pair in an XML document. The section names derived from XML tags are case sensitive as in XML. Attribute sections are created automatically for XML tags that have attributes. Attribute sections are named in the form attribute@tag. Stop sections, empty tags, processing instructions, and comments are not indexed. The following limitations apply to automatic section groups:
■ You cannot add zone, field, or special sections to an automatic section group.
■ Automatic sectioning does not index XML document types (root elements). However, you can define stop sections with document type.
■ The length of the indexed tags, including prefix and namespace, cannot exceed 64 characters. Tags longer than this are not indexed.
PATH_SECTION_GROUP
Use this group type to index XML documents. Behaves like the AUTO_SECTION_GROUP. The difference is that with this section group you can do path searching with the INPATH and HASPATH operators. Queries are also case-sensitive for tag and attribute names. Stop sections are not allowed.
NEWS_SECTION_GROUP
Use this group for defining sections in newsgroup formatted documents according to RFC 1036.
Section Group Examples
Creating Section Groups in HTML Documents The following statement creates a section group called htmgroup with the HTML_SECTION_GROUP group type.
begin
  ctx_ddl.create_section_group('htmgroup', 'HTML_SECTION_GROUP');
end;
You can optionally add sections to this group using the CTX_DDL.ADD_SECTION procedure. To index your documents, you can issue a statement such as:
create index myindex on docs(htmlfile) indextype is ctxsys.context
  parameters('filter ctxsys.null_filter section group htmgroup');
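As a minimal sketch of adding a section and then querying within it, the block below assumes the CTX_DDL.ADD_ZONE_SECTION procedure is used for the "add section" step, and that the docs table has an id column; the section name heading and the mapping to the H1 tag are purely illustrative. Sections would normally be added before the index is created so that they are picked up at indexing time.

```sql
begin
  -- Assumption: map the HTML <H1> tag to a zone section named 'heading'.
  -- Run this before creating the index that uses htmgroup.
  ctx_ddl.add_zone_section('htmgroup', 'heading', 'H1');
end;
/

-- After indexing with "section group htmgroup", restrict a search to the section.
-- The id column is hypothetical.
select id
  from docs
 where contains(htmlfile, 'oracle within heading') > 0;
```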
Creating Section Groups in XML Documents The following statement creates a section group called xmlgroup with the XML_SECTION_GROUP group type.
begin
  ctx_ddl.create_section_group('xmlgroup', 'XML_SECTION_GROUP');
end;
You can optionally add sections to this group using the CTX_DDL.ADD_SECTION procedure. To index your documents, you can issue a statement such as:
create index myindex on docs(htmlfile) indextype is ctxsys.context
  parameters('filter ctxsys.null_filter section group xmlgroup');
Automatic Sectioning in XML Documents The following statement creates a section group called auto with the AUTO_SECTION_GROUP group type. This section group automatically creates sections from tags in XML documents.
begin
  ctx_ddl.create_section_group('auto', 'AUTO_SECTION_GROUP');
end;
CREATE INDEX myindex on docs(htmlfile) INDEXTYPE IS ctxsys.context
  PARAMETERS('filter ctxsys.null_filter section group auto');
Classifier Types This section describes the classifier type used to create a preference for CTX_CLS.TRAIN.
RULE_CLASSIFIER Use the RULE_CLASSIFIER type for creating preferences for the rule generating procedure, CTX_CLS.TRAIN. This type has the following attributes:
Attribute Name    Data Type   Default   Min Value   Max Value
THRESHOLD         I           50        1           99
MAX_TERMS         I           100       20          2000
MEMORY_SIZE       I           500       10          4000
NT_THRESHOLD      F           0.001     0           0.90
TERM_THRESHOLD    I           10        0           100

Description
THRESHOLD: Threshold (as a percentage) for rule generation. A rule is output only when its confidence level is larger than the threshold.
MAX_TERMS: For each class, a list of relevant terms is selected to form rules. This attribute specifies the maximum number of terms that can be selected for each class.
MEMORY_SIZE: Typical memory usage for training, in MB. Larger values improve performance.
NT_THRESHOLD: A threshold for term selection. Two thresholds guide the two steps in selecting relevant terms. This threshold controls the first step, in which terms are selected as candidates for further consideration in the second step. A term is chosen when the ratio of its occurrence frequency to the number of documents in the training set is larger than this threshold.
TERM_THRESHOLD: Threshold (as a percentage) for term selection. This threshold controls the second step of term selection. Each candidate term has a numerical quantity calculated to indicate its correlation with a given class. The candidate term is selected for this class only when the ratio of its quantity value to the maximum value for all candidate terms in the class is larger than this threshold.
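The attributes above are set on a preference in the usual CTX_DDL way. The following is a minimal sketch; the preference name my_rule_classifier and the chosen values are hypothetical and only need to fall inside the documented minimum and maximum for each attribute. The resulting preference would then be passed to CTX_CLS.TRAIN, whose arguments are not shown here.

```sql
begin
  -- 'my_rule_classifier' is a hypothetical preference name.
  ctx_ddl.create_preference('my_rule_classifier', 'RULE_CLASSIFIER');

  -- Only output rules whose confidence exceeds 60 percent (allowed range 1-99).
  ctx_ddl.set_attribute('my_rule_classifier', 'THRESHOLD', '60');

  -- Allow up to 200 terms per class (allowed range 20-2000).
  ctx_ddl.set_attribute('my_rule_classifier', 'MAX_TERMS', '200');

  -- Give training roughly 800 MB of memory (allowed range 10-4000).
  ctx_ddl.set_attribute('my_rule_classifier', 'MEMORY_SIZE', '800');
end;
/
```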
Stoplists Stoplists identify the words in your language that are not to be indexed. In English, you can also identify stopthemes that are not to be indexed. By default, the system indexes text using the system-supplied stoplist that corresponds to your database language. Oracle Text provides default stoplists for most languages including English, French, German, Spanish, Dutch, and Danish. These default stoplists contain only stopwords. See Also: For more information about the supplied default
stoplists, see Appendix D, "Supplied Stoplists".
Multi-Language Stoplists You can create multi-language stoplists to hold language-specific stopwords. A multi-language stoplist is useful when you use the MULTI_LEXER to index a table that contains documents in different languages, such as English, German, and Japanese. To create a multi-language stoplist, use the CTX_DDL.CREATE_STOPLIST procedure and specify a stoplist type of MULTI_STOPLIST. You add language-specific stopwords with CTX_DDL.ADD_STOPWORD. At indexing time, the language column of each document is examined, and only the stopwords for that language are eliminated. At query time, the session language setting determines the active stopwords, just as it determines the active lexer when using the multi-lexer.
Creating Stoplists You can create your own stoplists using the CTX_DDL.CREATE_STOPLIST procedure. With this procedure you can create a BASIC_STOPLIST for a single-language stoplist, or you can create a MULTI_STOPLIST for a multi-language stoplist. When you create your own stoplist, you must specify it in the parameter clause of CREATE INDEX.
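The steps described above can be sketched as follows. The stoplist name my_multi_stoplist, the lexer preference my_multi_lexer, the table docs, and its text and lang columns are all hypothetical; the sketch assumes a MULTI_LEXER preference has already been created and that the lang column holds each document's language.

```sql
begin
  -- Create a multi-language stoplist (hypothetical name).
  ctx_ddl.create_stoplist('my_multi_stoplist', 'MULTI_STOPLIST');

  -- Add language-specific stopwords; the third argument names the language.
  ctx_ddl.add_stopword('my_multi_stoplist', 'the', 'english');
  ctx_ddl.add_stopword('my_multi_stoplist', 'der', 'german');
end;
/

-- Name the stoplist in the parameter clause of CREATE INDEX.
-- Assumes my_multi_lexer (a MULTI_LEXER preference) and the lang column exist.
create index myindex on docs(text) indextype is ctxsys.context
  parameters('lexer my_multi_lexer language column lang stoplist my_multi_stoplist');
```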
Modifying the Default Stoplist The default stoplist is always named CTXSYS.DEFAULT_STOPLIST. You can use the following procedures to modify this stoplist:
■ CTX_DDL.ADD_STOPWORD
■ CTX_DDL.REMOVE_STOPWORD
■ CTX_DDL.ADD_STOPTHEME
■ CTX_DDL.ADD_STOPCLASS
When you modify CTXSYS.DEFAULT_STOPLIST with the CTX_DDL package, you must re-create your index for the changes to take effect.
Dynamic Addition of Stopwords You can add stopwords dynamically to a default or custom stoplist with ALTER INDEX. When you add a stopword dynamically, you need not re-index, because the word immediately becomes a stopword and is removed from the index.
Note: Even though you can dynamically add stopwords to an
index, you cannot dynamically remove stopwords. To remove a stopword, you must use CTX_DDL.REMOVE_STOPWORD, drop your index and re-create it.
See Also: ALTER INDEX in Chapter 1, "SQL Statements and
Operators".
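As a brief illustration of dynamic stopword addition, the statement below assumes an existing CONTEXT index named myindex and uses the add stopword form of the ALTER INDEX parameter string; the stopword itself is arbitrary.

```sql
-- Adds 'the' as a stopword to the stoplist used by myindex without re-indexing.
-- The index name myindex is an assumption carried over from earlier examples.
alter index myindex rebuild parameters('add stopword the');
```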
System-Defined Preferences When you install Oracle Text, some indexing preferences are created. You can use these preferences in the parameter clause of CREATE INDEX or define your own. The default index parameters are mapped to some of the system-defined preferences described in this section.
See Also: For more information about default index parameters, see "Default Index Parameters" on page 2-93.
System-defined preferences are divided into the following categories:
■ Data Storage
■ Filter
■ Lexer
■ Section Group
■ Stoplist
■ Storage
■ Wordlist
Data Storage CTXSYS.DEFAULT_DATASTORE This preference uses the DIRECT_DATASTORE type. You can use this preference to create indexes for text columns in which the text is stored directly in the column.
CTXSYS.FILE_DATASTORE This preference uses the FILE_DATASTORE type.
CTXSYS.URL_DATASTORE This preference uses the URL_DATASTORE type.
Filter CTXSYS.NULL_FILTER This preference uses the NULL_FILTER type.
CTXSYS.INSO_FILTER This preference uses the INSO_FILTER type.
Lexer CTXSYS.DEFAULT_LEXER The default lexer depends on the language used at install time. The following sections describe the default settings for CTXSYS.DEFAULT_LEXER for each language.
American and English Language Settings If your language is English, this preference uses the BASIC_LEXER with the index_themes attribute disabled.
Danish Language Settings If your language is Danish, this preference uses the BASIC_LEXER with the following option enabled:
■ alternate spelling (alternate_spelling attribute set to DANISH)
Dutch Language Settings If your language is Dutch, this preference uses the BASIC_LEXER with the following option enabled:
■ composite indexing (composite attribute set to DUTCH)
German and German DIN Language Settings If your language is German, this preference uses the BASIC_LEXER with the following options enabled:
■ composite indexing (composite attribute set to GERMAN)
■ alternate spelling (alternate_spelling attribute set to GERMAN)
Finnish, Norwegian, and Swedish Language Settings If your language is Finnish, Norwegian, or Swedish, this preference uses the BASIC_LEXER with the following option enabled:
■ alternate spelling (alternate_spelling attribute set to SWEDISH)
Japanese Language Settings If your language is Japanese, this preference uses the JAPANESE_VGRAM_LEXER.
Korean Language Settings If your language is Korean, this preference uses the KOREAN_MORPH_LEXER. All attributes for the KOREAN_MORPH_LEXER are enabled.
Chinese Language Settings If your language is Simplified or Traditional Chinese, this preference uses the CHINESE_VGRAM_LEXER.
Other Languages For all other languages not listed in this section, this preference uses the BASIC_LEXER with no attributes set.
See Also: To learn more about these options, see BASIC_LEXER on page 2-38.
CTXSYS.BASIC_LEXER This preference uses the BASIC_LEXER.
Section Group CTXSYS.NULL_SECTION_GROUP This preference uses the NULL_SECTION_GROUP type.
CTXSYS.HTML_SECTION_GROUP This preference uses the HTML_SECTION_GROUP type.
CTXSYS.AUTO_SECTION_GROUP This preference uses the AUTO_SECTION_GROUP type.
CTXSYS.PATH_SECTION_GROUP This preference uses the PATH_SECTION_GROUP type.
Stoplist CTXSYS.DEFAULT_STOPLIST This stoplist preference defaults to the stoplist of your database language. See Also: For a complete list of the stop words in the supplied stoplists, see Appendix D, "Supplied Stoplists".
CTXSYS.EMPTY_STOPLIST This stoplist has no words.
Storage CTXSYS.DEFAULT_STORAGE This storage preference uses the BASIC_STORAGE type.
Wordlist CTXSYS.DEFAULT_WORDLIST This preference uses the language stemmer for your database language. If your language is not listed in Table 2–7 on page 2-71, this preference defaults to the NULL stemmer and the GENERIC fuzzy matching attribute.
System Parameters This section describes the Oracle Text system parameters. They fall into the following categories:
■ General System Parameters
■ Default Index Parameters
General System Parameters When you install Oracle Text, in addition to the system-defined preferences, the following system parameters are set:
System Parameter: MAX_INDEX_MEMORY
Description: This is the maximum indexing memory that can be specified in the parameter clause of CREATE INDEX and ALTER INDEX.

System Parameter: DEFAULT_INDEX_MEMORY
Description: This is the default indexing memory used with CREATE INDEX and ALTER INDEX.

System Parameter: LOG_DIRECTORY
Description: This is the directory for CTX_OUTPUT log files.

System Parameter: CTX_DOC_KEY_TYPE
Description: This is the default input key type, either ROWID or PRIMARY_KEY, for the CTX_DOC procedures. Set to ROWID at install time. See also: CTX_DOC.SET_KEY_TYPE on page 8-22.
You can view system defaults by querying the CTX_PARAMETERS view. You can change defaults using the CTX_ADM.SET_PARAMETER procedure.
Default Index Parameters This section describes the index parameters you can use when you create context and ctxcat indexes.
CONTEXT Index Parameters The following default parameters are used when you do not specify preferences in the parameter clause of CREATE INDEX when you create a context index. Each default parameter names a system-defined preference to use for data storage, filtering, lexing, and so on.

System Parameter: DEFAULT_DATASTORE
Used When: No datastore preference specified in parameter clause of CREATE INDEX.
Default Value: CTXSYS.DEFAULT_DATASTORE

System Parameter: DEFAULT_FILTER_FILE
Used When: No filter preference specified in parameter clause of CREATE INDEX, and either of the following conditions is true:
■ Your files are stored in external files (BFILES) or
■ You specify a datastore preference that uses FILE_DATASTORE
Default Value: CTXSYS.INSO_FILTER

System Parameter: DEFAULT_FILTER_BINARY
Used When: No filter preference specified in parameter clause of CREATE INDEX, and Oracle detects that the text column datatype is RAW, LONG RAW, or BLOB.
Default Value: CTXSYS.INSO_FILTER

System Parameter: DEFAULT_FILTER_TEXT
Used When: No filter preference specified in parameter clause of CREATE INDEX, and Oracle detects that the text column datatype is either LONG, VARCHAR2, VARCHAR, CHAR, or CLOB.
Default Value: CTXSYS.NULL_FILTER

System Parameter: DEFAULT_SECTION_HTML
Used When: No section group specified in parameter clause of CREATE INDEX, and when either of the following conditions is true:
■ Your datastore preference uses URL_DATASTORE or
■ Your filter preference uses INSO_FILTER.
Default Value: CTXSYS.HTML_SECTION_GROUP

System Parameter: DEFAULT_SECTION_TEXT
Used When: No section group specified in parameter clause of CREATE INDEX, and when you do not use either URL_DATASTORE or INSO_FILTER.
Default Value: CTXSYS.NULL_SECTION_GROUP

System Parameter: DEFAULT_STORAGE
Used When: No storage preference specified in parameter clause of CREATE INDEX.
Default Value: CTXSYS.DEFAULT_STORAGE

System Parameter: DEFAULT_LEXER
Used When: No lexer preference specified in parameter clause of CREATE INDEX.
Default Value: CTXSYS.DEFAULT_LEXER

System Parameter: DEFAULT_STOPLIST
Used When: No stoplist specified in parameter clause of CREATE INDEX.
Default Value: CTXSYS.DEFAULT_STOPLIST

System Parameter: DEFAULT_WORDLIST
Used When: No wordlist preference specified in parameter clause of CREATE INDEX.
Default Value: CTXSYS.DEFAULT_WORDLIST
CTXCAT Index Parameters The following default parameters are used when you create a CTXCAT index with CREATE INDEX and do not specify any parameters in the parameter string. The CTXCAT index supports only the index set, lexer, storage, stoplist, and wordlist parameters. Each default parameter names a system-defined preference.

System Parameter: DEFAULT_CTXCAT_INDEX_SET
Used When: No index set specified in parameter clause of CREATE INDEX.

System Parameter: DEFAULT_CTXCAT_STORAGE
Used When: No storage preference specified in parameter clause of CREATE INDEX.
Default Value: CTXSYS.DEFAULT_STORAGE

System Parameter: DEFAULT_CTXCAT_LEXER
Used When: No lexer preference specified in parameter clause of CREATE INDEX.
Default Value: CTXSYS.DEFAULT_LEXER

System Parameter: DEFAULT_CTXCAT_STOPLIST
Used When: No stoplist specified in parameter clause of CREATE INDEX.
Default Value: CTXSYS.DEFAULT_STOPLIST

System Parameter: DEFAULT_CTXCAT_WORDLIST
Used When: No wordlist preference specified in parameter clause of CREATE INDEX.
Default Value: CTXSYS.DEFAULT_WORDLIST
Note that while you can specify a wordlist preference for CTXCAT indexes, most of the attributes do not apply, since the catsearch query language does not support wildcarding, fuzzy, and stemming. The only attribute that is useful is PREFIX_INDEX for Japanese data.
CTXRULE Index Parameters The following default parameters are used when you create a CTXRULE index with CREATE INDEX and do not specify any parameters in the parameter string. The CTXRULE index supports only the lexer, storage, stoplist, and wordlist parameters. Each default parameter names a system-defined preference.
System Parameter: DEFAULT_CTXRULE_LEXER
Used When: No lexer preference specified in parameter clause of CREATE INDEX.
Default Value: CTXSYS.DEFAULT_LEXER

System Parameter: DEFAULT_CTXRULE_STORAGE
Used When: No storage preference specified in parameter clause of CREATE INDEX.
Default Value: CTXSYS.DEFAULT_STORAGE

System Parameter: DEFAULT_CTXRULE_STOPLIST
Used When: No stoplist specified in parameter clause of CREATE INDEX.
Default Value: CTXSYS.DEFAULT_STOPLIST

System Parameter: DEFAULT_CTXRULE_WORDLIST
Used When: No wordlist preference specified in parameter clause of CREATE INDEX.
Default Value: CTXSYS.DEFAULT_WORDLIST
Viewing Default Values You can view system defaults by querying the CTX_PARAMETERS view. For example, to see all parameters and values, you can issue: SQL> SELECT par_name, par_value from ctx_parameters;
Changing Default Values You can change a default value using the CTX_ADM.SET_PARAMETER procedure to name another custom or system-defined preference to use as default.
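For instance, the following sketch points the DEFAULT_STOPLIST parameter at the system-defined CTXSYS.EMPTY_STOPLIST preference and then checks the result. It assumes the session has the privileges needed to call CTX_ADM (typically the CTXSYS user); the choice of parameter and value is only an illustration.

```sql
begin
  -- Make the empty stoplist the default for new CONTEXT indexes.
  -- Assumption: executed by a user allowed to run CTX_ADM, such as CTXSYS.
  ctx_adm.set_parameter('DEFAULT_STOPLIST', 'CTXSYS.EMPTY_STOPLIST');
end;
/

-- Confirm the new default.
select par_name, par_value
  from ctx_parameters
 where par_name = 'DEFAULT_STOPLIST';
```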
3 CONTAINS Query Operators This chapter describes operator precedence and provides description, syntax, and examples for every CONTAINS operator. The following topics are covered:
■ Operator Precedence
■ ABOUT
■ ACCUMulate ( , )
■ AND (&)
■ Broader Term (BT, BTG, BTP, BTI)
■ EQUIValence (=)
■ Fuzzy
■ HASPATH
■ INPATH
■ MINUS (-)
■ Narrower Term (NT, NTG, NTP, NTI)
■ NEAR (;)
■ NOT (~)
■ OR (|)
■ Preferred Term (PT)
■ Related Term (RT)
■ soundex (!)
■ stem ($)
■ Stored Query Expression (SQE)
■ SYNonym (SYN)
■ threshold (>)
■ Translation Term (TR)
■ Translation Term Synonym (TRSYN)
■ Top Term (TT)
■ weight (*)
■ wildcards (% _)
■ WITHIN
Operator Precedence Operator precedence determines the order in which the components of a query expression are evaluated. Text query operators can be divided into two sets of operators that have their own order of evaluation. These two groups are described below as Group 1 and Group 2. In all cases, query expressions are evaluated in order from left to right according to the precedence of their operators. Operators with higher precedence are applied first. Operators of equal precedence are applied in order of their appearance in the expression from left to right.
Group 1 Operators Within query expressions, the Group 1 operators have the following order of evaluation from highest precedence to lowest:
1. EQUIValence (=)
2. NEAR (;)
3. weight (*), threshold (>)
4. MINUS (-)
5. NOT (~)
6. WITHIN
7. AND (&)
8. OR (|)
9. ACCUMulate ( , )
Group 2 Operators and Characters Within query expressions, the Group 2 operators have the following order of evaluation from highest to lowest:
1. Wildcard Characters
2. ABOUT
3. stem ($)
4. Fuzzy
5. soundex (!)
Procedural Operators Other operators not listed under Group 1 or Group 2 are procedural. These operators have no sense of precedence attached to them. They include the SQE and thesaurus operators.
Precedence Examples
Query Expression              Order of Evaluation
w1 | w2 & w3                  (w1) | (w2 & w3)
w1 & w2 | w3                  (w1 & w2) | w3
?w1, w2 | w3 & w4             (?w1), (w2 | (w3 & w4))
abc = def ghi & jkl = mno     ((abc = def) ghi) & (jkl=mno)
dog and cat WITHIN body       dog and (cat WITHIN body)
In the first example, because AND has a higher precedence than OR, the query returns all documents that contain w1 and all documents that contain both w2 and w3. In the second example, the query returns all documents that contain both w1 and w2 and all documents that contain w3. In the third example, the fuzzy operator is first applied to w1, then the AND operator is applied to arguments w3 and w4, then the OR operator is applied to term w2 and the results of the AND operation, and finally, the score from the fuzzy operation on w1 is added to the score from the OR operation. The fourth example shows that the equivalence operator has higher precedence than the AND operator. The fifth example shows that the AND operator has lower precedence than the WITHIN operator.
Altering Precedence Precedence is altered by grouping characters as follows:
■ Within parentheses, expansion or execution of operations is resolved before other expansions regardless of operator precedence.
■ Within parentheses, precedence of operators is maintained during evaluation of expressions.
■ Within brackets, expansion operators are not applied to expressions unless the operators are also within the brackets.
See Also: Grouping Characters in Chapter 4, "Special Characters in Queries".
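To make the effect of grouping concrete, here is a small sketch comparing default precedence with explicit parentheses. The table docs and its id and text columns are hypothetical, and the query words are arbitrary.

```sql
-- Default precedence: AND binds more tightly than OR,
-- so this is evaluated as dog | (cat & bird).
select id
  from docs
 where contains(text, 'dog | cat & bird') > 0;

-- Parentheses are resolved first, so OR is applied before AND here:
-- the document must contain bird and at least one of dog or cat.
select id
  from docs
 where contains(text, '(dog | cat) & bird') > 0;
```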
ABOUT General Behavior In all languages, an ABOUT query increases the number of relevant documents returned from the same query without this operator. Oracle scores results for an ABOUT query with the most relevant document receiving the highest score.
English and French Behavior In English and French, use the ABOUT operator to query on concepts. The system looks up concept information in the theme component of the index.
Note: You need not have a theme component in the index to issue ABOUT queries in English. However, having a theme component in the index yields the best results for ABOUT queries.
Oracle retrieves documents that contain concepts that are related to your query word or phrase. For example, if you issue an ABOUT query on California, the system might return documents that contain the terms Los Angeles and San Francisco, which are cities in California. The document need not contain the term California to be returned in this ABOUT query. The word or phrase specified in your ABOUT query need not exactly match the themes stored in the index. Oracle normalizes the word or phrase before performing lookup in the index. You can use the ABOUT operator with the CONTAINS and CATSEARCH SQL operators.
Improving ABOUT Results The ABOUT operator uses the supplied knowledge base in English and French to interpret the phrase you enter. Your ABOUT query therefore is limited to knowing and interpreting the concepts in the knowledge base. You can improve the results of your ABOUT queries by adding your application-specific terminology to the knowledge base. See Also: Extending the Knowledge Base in Chapter 14,
"Executables".
Syntax
Syntax: about(phrase)
Description: In all languages, increases the number of relevant documents returned for the same query without the ABOUT operator. The phrase parameter can be a single word or a phrase, or a string of words in free text format. In English, returns documents that contain concepts related to phrase. The score returned is a relevance score. Oracle ignores any query operators that are included in phrase. If your index contains only theme information, an ABOUT operator and operand must be included in your query on the text column or else Oracle returns an error. The phrase you specify cannot be more than 4000 characters.
Case-Sensitivity ABOUT queries give the best results when your query is formulated with proper case. This is because the normalization of your query is based on the knowledge catalog which is case-sensitive. However, you need not type your query in exact case to obtain results from an ABOUT query. The system does its best to interpret your query. For example, if you enter a query of CISCO and the system does not find this in the knowledge catalog, the system might use Cisco as a related concept for look-up.
Limitations
■ The phrase you specify in an ABOUT query cannot be more than 4000 characters.
■ You cannot combine the WITHIN operator with the ABOUT operator, as in 'ABOUT (xyz) WITHIN abc'.
■ You cannot combine ABOUT with any operator involving offset information, such as NEAR or WITHIN.
Examples Single Words To search for documents that are about soccer, use the following syntax: ’about(soccer)’
Phrases You can further refine the query to include documents about soccer rules in international competition by entering the phrase as the query term: ’about(soccer rules in international competition)’
In this English example, Oracle returns all documents that have themes of soccer, rules, or international competition. In terms of scoring, documents which have all three themes will generally score higher than documents that have only one or two of the themes.
Unstructured Phrases You can also query on unstructured phrases, such as the following: ’about(japanese banking investments in indonesia)’
Combined Queries You can use other operators, such as AND or NOT, to combine ABOUT queries with word queries. For example, you can issue the following combined ABOUT and word query: ’about(dogs) and cat’
You can combine an ABOUT query with another ABOUT query as follows: ’about(dogs) not about(labradors)’
Note: You cannot combine ABOUT with the WITHIN operator
like ’ABOUT (xyz) WITHIN abc’.
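A combined ABOUT and word query can be placed in a full SELECT statement as sketched below. The table docs and its id and text columns are hypothetical; the label 1 ties the SCORE operator to the CONTAINS call.

```sql
-- Documents about dogs that also contain the word cat,
-- most relevant first. Table and column names are assumptions.
select id, score(1) as relevance
  from docs
 where contains(text, 'about(dogs) and cat', 1) > 0
 order by score(1) desc;
```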
ABOUT Query with CATSEARCH You can issue ABOUT queries with CATSEARCH using the query template method with grammar set to CONTEXT as follows:
select pk||' ==> '||text from test
where catsearch(text,
'<query>
  <textquery grammar="context">
    about(California)
  </textquery>
  <score datatype="integer"/>
</query>','')>0
order by pk;
ACCUMulate ( , ) Use the ACCUM operator to search for documents that contain at least one occurrence of any of the query terms. The accumulate operator ranks documents according to the total term weight of a document.
Syntax
Syntax: term1,term2
        term1 accum term2
Description: Returns documents that contain term1 or term2. Ranks documents according to document term weight, with the highest scores assigned to documents that have the highest total term weight.
Examples The following example returns documents that contain either soccer, Brazil, or cup and assigns the highest scores to the documents that contain all three terms: ’soccer, Brazil, cup’
The following example also returns documents that contain either soccer, Brazil, or cup. However, the weight operator ensures that documents with Brazil score higher than documents that contain only soccer and cup. ’soccer, Brazil*3, cup’
Notes
Accumulate Scoring ACCUM scores documents based on two criteria:
■ document term weights
■ document term scores
Term weight refers to the weight you place on a query term. A query such as x,y,z has term weights of 1 for each term. A query of x, y*3, z has term weights of 1, 3, and 1 for the individual terms. Accumulate scoring guarantees that if a document A matches p terms with a total term weight of m, and document B matches q terms with a total term weight of m+1, document B has a higher relevance score than document A, regardless of the numbers p and q. If two documents have the same total term weight, the higher relevance score goes to the document with the higher weighted average term score. The following table illustrates accumulate scoring:
Document  Query      Score(x)  Score(y)  Score(z)  Total Term Weight  Score(query)
A         x,y,z      10        0         0         1                  3
B         x,y,z      10        20        0         2                  38
C         x,y,z      10        20        30        3                  73
D         x,y,z      50        50        0         2                  50
E         x, y*3, z  100       0         100       2                  40
F         x, y*3, z  0         1         0         3                  41
Each row in the table shows the score for an accumulate query. The first four rows show the scores for query x,y,z for documents A, B, C, D. The next two rows show the scores for query x, y*3,z for documents E and F. Assume that x, y and z stand for three different words. The query for document E and F has a weight of 3 on the second query term to arbitrarily make it the most important query term. The total document term weight is shown for each document. For example, document A has a matching weight of one since only one query term matches the document. Similarly document C has a weight of 3 since all query terms with weight 1 match the document. The table shows that documents that have higher query term weights are always scored higher than those that contain lower query term weights. For example, document C always scores higher than documents A, B, and D, since document C has the highest query term weight. Similarly, document F scores higher than document E, since F has a higher matching weight. For documents that have equal term weights, such as document B and D, the higher score goes to the document with the higher weighted average term score, which is document D.
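An accumulate query with a weighted term can be issued and ranked as in the sketch below. The table docs and its id and text columns are hypothetical; the query string is the weighted example from this section.

```sql
-- Rank documents by accumulate score. Brazil carries a term weight of 3,
-- soccer and cup a term weight of 1 each.
select id, score(1) as relevance
  from docs
 where contains(text, 'soccer, Brazil*3, cup', 1) > 0
 order by score(1) desc;
```

Under the scoring rules above, a document that matches only Brazil (total term weight 3) would be expected to rank above one that matches only soccer and cup (total term weight 2).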
AND (&) Use the AND operator to search for documents that contain at least one occurrence of each of the query terms.
Syntax Syntax
Description
term1&term2
Returns documents that contain term1 and term2. Returns the minimum score of its operands. All query terms must occur; lower score taken.
term1 and term2
Examples
To obtain all the documents that contain the terms blue and black and red, issue the following query:
'blue & black & red'
In an AND query, the score returned is the score of the lowest-scoring query term. In this example, if the three individual scores for the terms blue, black, and red are 10, 20, and 30 within a document, the document scores 10.
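The minimum-score rule for AND can be modeled with a few lines of Python. This is a sketch of the rule as stated above, not Oracle code; the function name `and_score` and the convention that a score of 0 means the term is absent are assumptions.

```python
def and_score(term_scores):
    """Score of an AND query: the lowest operand score, or 0 if any term is absent."""
    if not term_scores:
        return 0
    if any(score <= 0 for score in term_scores.values()):
        return 0  # every query term must occur in the document
    return min(term_scores.values())
```

For the example above, `and_score({"blue": 10, "black": 20, "red": 30})` evaluates to 10.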
Broader Term (BT, BTG, BTP, BTI)
Use the broader term operators (BT, BTG, BTP, BTI) to expand a query to include the term that has been defined in a thesaurus as the broader or higher level term for a specified term. They can also expand the query to include the broader term for the broader term and the broader term for that broader term, and so on up through the thesaurus hierarchy.
Syntax
Syntax                                  Description
BT(term[(qualifier)][,level][,thes])    Expands a query to include the term defined in the thesaurus as a broader term for term.
BTG(term[(qualifier)][,level][,thes])   Expands a query to include all terms defined in the thesaurus as broader generic terms for term.
BTP(term[(qualifier)][,level][,thes])   Expands a query to include all the terms defined in the thesaurus as broader partitive terms for term.
BTI(term[(qualifier)][,level][,thes])   Expands a query to include all the terms defined in the thesaurus as broader instance terms for term.
term
Specify the operand for the broader term operator. Oracle expands term to include the broader term entries defined for the term in the thesaurus specified by thes. For example, if you specify BTG(dog), the expansion includes only those terms that are defined as broader term generic for dog. You cannot specify expansion operators in the term argument. The number of broader terms included in the expansion is determined by the value for level.
qualifier
Specify a qualifier for term, if term is a homograph (word or phrase with multiple meanings, but the same spelling) that appears in two or more nodes in the same hierarchy branch of thes. If a qualifier is not specified for a homograph in a broader term query, the query expands to include the broader terms of all the homographic terms.
level
Specify the number of levels traversed in the thesaurus hierarchy to return the broader terms for the specified term. For example, a level of 1 in a BT query returns the broader term entry, if one exists, for the specified term. A level of 2 returns the broader term entry for the specified term, as well as the broader term entry, if one exists, for the broader term. The level argument is optional and has a default value of one (1). Zero or negative values for the level argument return only the original query term.
thes
Specify the name of the thesaurus used to return the expansions for the specified term. The thes argument is optional and has a default value of DEFAULT. A thesaurus named DEFAULT must exist in the thesaurus tables if you use this default value.
Note: If you specify thes, you must also specify level.
Examples
The following query returns all documents that contain the term tutorial or the BT term defined for tutorial in the DEFAULT thesaurus:
'BT(tutorial)'
When you specify a thesaurus name, you must also specify level, as in:
'BT(tutorial, 2, mythes)'
Broader Term Operator on Homographs
If machine is a broader term for crane (building equipment) and bird is a broader term for crane (waterfowl) and no qualifier is specified for a broader term query, the query
BT(crane)
expands to:
'{crane} or {machine} or {bird}'
If waterfowl is specified as a qualifier for crane in a broader term query, the query
BT(crane{(waterfowl)})
expands to the query:
'{crane} or {bird}'
Note: When specifying a qualifier in a broader or narrower term query, the qualifier and its notation (parentheses) must be escaped, as is shown in this example.
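The way level and qualifier shape a broader term expansion can be modeled with a small in-memory thesaurus. The sketch below is illustrative Python, not the CTX_THES API: the `BROADER` dictionary, the functions `bt_expand` and `to_query`, and the bird-to-animal entry are all hypothetical and exist only to show the traversal.

```python
# Hypothetical thesaurus: (term, qualifier) -> broader term.
# A qualifier of None means the term is not a homograph.
BROADER = {
    ("crane", "building equipment"): "machine",
    ("crane", "waterfowl"): "bird",
    ("bird", None): "animal",  # hypothetical entry, added to show level 2
}


def bt_expand(term, qualifier=None, level=1):
    """Return the term followed by its broader terms, up to `level` levels."""
    expansion = [term]
    # Every thesaurus entry for the term; a qualifier narrows homographs to one.
    starts = [key for key in BROADER
              if key[0] == term and (qualifier is None or key[1] == qualifier)]
    for key in starts:
        current = key
        for _ in range(level):  # range() is empty for zero or negative levels
            parent = BROADER.get(current)
            if parent is None:
                break
            if parent not in expansion:
                expansion.append(parent)
            current = (parent, None)
    return expansion


def to_query(terms):
    """Render an expansion in the '{a} or {b}' form used in this section."""
    return " or ".join("{" + t + "}" for t in terms)
```

With this data, `to_query(bt_expand("crane"))` yields `{crane} or {machine} or {bird}`, and passing the waterfowl qualifier drops the machine branch, matching the two expansions shown above.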
Related Topics
You can browse a thesaurus using procedures in the CTX_THES package.
See Also: For more information on browsing the broader terms in your thesaurus, see CTX_THES.BT in Chapter 12, "CTX_THES Package".
EQUIValence (=)
Use the EQUIV operator to specify an acceptable substitution for a word in a query.
Syntax
Syntax               Description
term1=term2          Specifies that term2 is an acceptable substitution for term1. Score calculated as the sum of all occurrences of both terms.
term1 equiv term2
Examples
The following example returns all documents that contain either the phrase alsatians are big dogs or labradors are big dogs:
'labradors=alsatians are big dogs'
Operator Precedence
The EQUIV operator has higher precedence than all other operators except the expansion operators (fuzzy, soundex, stem).
Fuzzy
Use the fuzzy operator to expand queries to include words that are spelled similarly to the specified term. This type of expansion is helpful for finding more accurate results when there are frequent misspellings in your document set. The new fuzzy syntax enables you to rank the result set so that documents that contain words with high similarity to the query word are scored higher than documents with lower similarity. You can also limit the number of expanded terms. Unlike stem expansion, the number of words generated by a fuzzy expansion depends on what is in the index. Results can vary significantly according to the contents of the index.
Supported Languages
Oracle Text supports fuzzy definitions for English, German, Italian, Dutch, Spanish, and OCR.
Stopwords
If the fuzzy expansion returns a stopword, the stopword is not included in the query or highlighted by CTX_DOC.HIGHLIGHT or CTX_DOC.MARKUP.
Base-Letter Conversion
If base-letter conversion is enabled for a text column and the query expression contains a fuzzy operator, Oracle operates on the base-letter form of the query.
This query expands to the first six fuzzy variations of government in the index that have a similarity score over 70. In addition, documents in the result set are weighted according to their similarity to government. Documents containing words most similar to government receive the highest score.
You can skip unnecessary parameters using the appropriate number of commas. For example:
'fuzzy(government,,,weight)'
Backward Compatibility Syntax
The old fuzzy syntax from previous releases is still supported. This syntax is as follows:
Parameter   Description
?term       Expands term to include all terms with similar spellings as the specified term.
HASPATH
Use this operator to find all XML documents that contain a specified section path. You can also use this operator to do section equality testing. Your index must be created with the PATH_SECTION_GROUP for this operator to work.
Syntax
Syntax               Description
HASPATH(path)        Searches an XML document set and returns a score of 100 for all documents where path exists. Separate parent and child paths with the / character. For example, you can specify A/B/C. See example.
HASPATH(A="value")   Searches an XML document set and returns a score of 100 for all documents that have the element A with content value and only value. See example.
Example
Path Testing
The query
HASPATH(A/B/C)
finds and returns a score of 100 for the document
<A><B><C>dog</C></B></A>
To limit the query to the term dog and nothing else, you can use a section equality test with the HASPATH operator. For example,
HASPATH(A="dog")
finds and returns a score of 100 only for the first document, and not the second.
Limitations
Because of how XML section data is recorded, false matches might occur with XML sections that are completely empty as follows:
<E>
A query of HASPATH(A/B/E) or HASPATH(A/D/C) falsely matches this document. This type of false matching can be avoided by inserting text between empty tags.
INPATH
Use this operator to do path searching in XML documents. This operator is like the WITHIN operator except that the right-hand side is a parenthesized path, rather than a single section name. Your index must be created with the PATH_SECTION_GROUP for the INPATH operator to work.
Syntax
The INPATH operator has the following syntax:
Top-Level Tag Searching
Syntax             Description
term INPATH (/A)   Returns documents that have term within the top-level tags <A> and </A>. The A tag must be a top-level tag, which is the document-type tag.
Single-Level Wildcard Searching
Syntax                Description
term INPATH (A/*/B)   Returns documents where term appears in a B element which is a grandchild (two levels down) of a top-level A element. For example, a document containing <A><D><B>term</B></D></A> is returned.
Multi-level Wildcard Searching
Syntax                      Description
term INPATH (A/*/B/*/*/C)   Returns documents where term appears in a C element which is 3 levels down from a B element which is two levels down (grandchild) of a top-level A element.
Any-Level Descendant Searching
Syntax                      Description
term INPATH(A//B)           Returns documents where term appears in a B element which is some descendant (any level) of a top-level A element.
Attribute Searching
Syntax                      Description
term INPATH (//A/@B)        Returns documents where term appears in the B attribute of an A element at any level. Attributes must be bound to a direct parent.
Descendant/Attribute Existence Testing
Syntax                           Description
term INPATH (A[B])               Returns documents where term appears in a top-level A element which has a B element as a direct child.
term INPATH (A[.//B])            Returns documents where term appears in a top-level A element which has a B element as a descendant at any level.
term INPATH (//A[@B])            Finds documents where term appears in an A element at any level which has a B attribute. Attributes must be tied to a direct parent.
Attribute Value Testing
Syntax                           Description
term INPATH (A[@B = "value"])    Finds all documents where term appears in a top-level A element which has a B attribute whose value is value.
term INPATH (A[@B != "value"])   Finds all documents where term appears in a top-level A element which has a B attribute whose value is not value.
Tag Value Testing
Syntax                           Description
term INPATH (A[B = "value"])     Returns documents where term appears in an A tag which has a B tag whose value is value.
Not
Syntax                           Description
term INPATH (A[NOT(B)])          Finds documents where term appears in a top-level A element which does not have a B element as an immediate child.
AND and OR Testing
Syntax                              Description
term INPATH (A[B and C])            Finds documents where term appears in a top-level A element which has both a B element and a C element as immediate children.
term INPATH (A[B and @C="value"])   Finds documents where term appears in a top-level A element which has a B element and a C attribute whose value is value.
term INPATH (A[B OR C])             Finds documents where term appears in a top-level A element which has a B element or a C element.
Combining Path and Node Tests
Syntax                              Description
term INPATH (A[@B = "value"]/C/D)   Returns documents where term appears in a D element which is the child of a C element, which is the child of a top-level A element with a B attribute whose value is value.
Nested INPATH
You can nest the entire INPATH expression in another INPATH expression as follows:
(dog INPATH (//A/B/C)) INPATH (D)
When you do so, the two INPATH paths are completely independent. The outer INPATH path does not change the context node of the inner INPATH path. For example:
(dog INPATH (A)) INPATH (D)
never finds any documents, because the inner INPATH is looking for dog within the top-level tag A, and the outer INPATH constrains that to documents with top-level tag D. A document can have only one top-level tag, so this expression never finds any documents.
Case-Sensitivity
Tags and attribute names in path searching are case-sensitive. That is,
dog INPATH (A)
finds <A>dog</A> but does not find <a>dog</a>. Instead use
dog INPATH (a)
Examples
Top-Level Tag Searching
To find all documents that contain the term dog in the top-level tag <A>:
dog INPATH (/A)
Direct Parentage Searching
To find all documents that contain the term dog in a B element that is a direct child of a top-level A element:
dog INPATH(A/B)
Path Testing
You can test if a path exists with the HASPATH operator. For example, the query:
HASPATH(A/B/C)
finds and returns a score of 100 for the document
<A><B><C>dog</C></B></A>
without the query having to reference dog at all.
Limitations
Testing for Equality
The following is an example of an INPATH equality test.
dog INPATH (A[@B = "foo"])
The following limitations apply for these expressions:
■ Only equality and inequality are supported. Range operators and functions are not supported.
■ The left hand side of the equality must be an attribute. Tags and literals here are not allowed.
■ The right hand side of the equality must be a literal. Tags and attributes here are not allowed.
■ The test for equality depends on your lexer settings. With the default settings, the query
dog INPATH (A[@B= "pot of gold"])
because OF is a default stopword in English and the query matches any word in that position. dog
because the underscore character is not a join character by default.
MINUS (-)
Use the MINUS operator to search for documents that contain one query term while lowering the rank of documents that also contain a second query term. The MINUS operator is useful for lowering the score of documents that contain unwanted noise terms.
Syntax
Syntax              Description
term1-term2         Returns documents that contain term1. Calculates score by subtracting the score of term2 from the score of term1. Only documents with positive score are returned.
term1 minus term2
Examples
Suppose a query on the term cars always returned high scoring documents about Ford cars. You can lower the scoring of the Ford documents by using the expression:
'cars - Ford'
In essence, this expression returns documents that contain the term cars and possibly Ford. However, the score for a returned document is the score of cars minus the score of Ford.
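The subtraction rule, together with the requirement that only positively scored documents are returned, can be modeled as follows. This is a Python sketch of the description above, not Oracle code; the function name `minus_score` and the use of `None` to mean "document not returned" are assumptions.

```python
def minus_score(score_term1, score_term2):
    """Return the MINUS score, or None when the document is not returned."""
    if score_term1 <= 0:
        return None  # the document must contain term1
    result = score_term1 - score_term2
    # Only documents with a positive score are returned.
    return result if result > 0 else None
```

A document that scores 60 for cars and 25 for Ford would score 35 under this model, while one that scores 20 for cars and 30 for Ford would drop out of the result set.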
Narrower Term (NT, NTG, NTP, NTI)
Use the narrower term operators (NT, NTG, NTP, NTI) to expand a query to include all the terms that have been defined in a thesaurus as the narrower or lower level terms for a specified term. They can also expand the query to include all of the narrower terms for each narrower term, and so on down through the thesaurus hierarchy.
Syntax
Syntax                                  Description
NT(term[(qualifier)][,level][,thes])    Expands a query to include all the lower level terms defined in the thesaurus as narrower terms for term.
NTG(term[(qualifier)][,level][,thes])   Expands a query to include all the lower level terms defined in the thesaurus as narrower generic terms for term.
NTP(term[(qualifier)][,level][,thes])   Expands a query to include all the lower level terms defined in the thesaurus as narrower partitive terms for term.
NTI(term[(qualifier)][,level][,thes])   Expands a query to include all the lower level terms defined in the thesaurus as narrower instance terms for term.
term
Specify the operand for the narrower term operator. term is expanded to include the narrower term entries defined for the term in the thesaurus specified by thes. The number of narrower terms included in the expansion is determined by the value for level. You cannot specify expansion operators in the term argument.
qualifier
Specify a qualifier for term, if term is a homograph (word or phrase with multiple meanings, but the same spelling) that appears in two or more nodes in the same hierarchy branch of thes. If a qualifier is not specified for a homograph in a narrower term query, the query expands to include all of the narrower terms of all homographic terms.
level
Specify the number of levels traversed in the thesaurus hierarchy to return the narrower terms for the specified term. For example, a level of 1 in an NT query returns all the narrower term entries, if any exist, for the specified term. A level of 2 returns all the narrower term entries for the specified term, as well as all the narrower term entries, if any exist, for each narrower term. The level argument is optional and has a default value of one (1). Zero or negative values for the level argument return only the original query term.
thes
Specify the name of the thesaurus used to return the expansions for the specified term. The thes argument is optional and has a default value of DEFAULT. A thesaurus named DEFAULT must exist in the thesaurus tables if you use this default value.
Note: If you specify thes, you must also specify level.
Examples
The following query returns all documents that contain either the term cat or any of the NT terms defined for cat in the DEFAULT thesaurus:
'NT(cat)'
If you specify a thesaurus name, you must also specify level, as in:
'NT(cat, 2, mythes)'
The following query returns all documents that contain either fairy tale or any of the narrower instance terms for fairy tale as defined in the DEFAULT thesaurus:
'NTI(fairy tale)'
That is, if the terms cinderella and snow white are defined as narrower term instances for fairy tale, Oracle returns documents that contain fairy tale, cinderella, or snow white.
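Because a term can have several narrower terms, a narrower term expansion fans out level by level. The Python sketch below models that traversal on a plain dictionary; it is not the CTX_THES API. The function `nt_expand` and the `NARROWER_INSTANCE` dictionary are hypothetical, with the dictionary holding the fairy tale entries from the example above.

```python
# Hypothetical narrower-instance hierarchy: term -> list of narrower terms.
NARROWER_INSTANCE = {
    "fairy tale": ["cinderella", "snow white"],
}


def nt_expand(term, thesaurus, level=1):
    """Return the term followed by its narrower terms, down to `level` levels."""
    expansion = [term]
    frontier = [term]
    for _ in range(max(level, 0)):  # zero or negative levels expand nothing
        next_frontier = []
        for current in frontier:
            for child in thesaurus.get(current, []):
                if child not in expansion:
                    expansion.append(child)
                    next_frontier.append(child)
        frontier = next_frontier
    return expansion
```

Calling `nt_expand("fairy tale", NARROWER_INSTANCE)` returns the three terms named in the example. Keeping one dictionary per hierarchy (generic, partitive, instance) would reflect the point that each narrower term operator expands only within its own branch.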
Notes
Each hierarchy in a thesaurus represents a distinct, separate branch, corresponding to the four narrower term operators. In a narrower term query, Oracle only expands the query using the branch corresponding to the specified narrower term operator.
Related Topics
You can browse a thesaurus using procedures in the CTX_THES package.
See Also: For more information on browsing the narrower terms in your thesaurus, see CTX_THES.NT in Chapter 12, "CTX_THES Package".
NEAR (;)
Use the NEAR operator to return a score based on the proximity of two or more query terms. Oracle returns higher scores for terms closer together and lower scores for terms farther apart in a document.
Note: The NEAR operator works only with word queries. You cannot use NEAR in ABOUT queries.
Specify the terms in the query separated by commas. The query terms can be single words or phrases.
max_span
Optionally specify the size of the biggest clump. The default is 100. Oracle returns an error if you specify a number greater than 100. A clump is the smallest group of words in which all query terms occur. All clumps begin and end with a query term. For near queries with two terms, max_span is the maximum distance allowed between the two terms. For example, to query on dog and cat where dog is within 6 words of cat, issue the following query:
'near((dog, cat), 6)'
order
Specify TRUE for Oracle to search for terms in the order you specify. The default is FALSE. For example, to search for the words monday, tuesday, and wednesday in that order with a maximum clump size of 20, issue the following query:
'near((monday, tuesday, wednesday), 20, TRUE)'
Note: To specify order, you must always specify a number for the max_span parameter.
Oracle might return different scores for the same document when you use identical query expressions that have the order flag set differently. For example, Oracle might return different scores for the same document when you issue the following queries:
'near((dog, cat), 50, FALSE)'
'near((dog, cat), 50, TRUE)'
NEAR Scoring
The scoring for the NEAR operator combines frequency of the terms with proximity of terms. For each document that satisfies the query, Oracle returns a score between 1 and 100 that is proportional to the number of clumps in the document and inversely proportional to the average size of the clumps. This means many small clumps in a document result in higher scores, since small clumps imply closeness of terms. The number of terms in a query also affects score. Queries with many terms, such as seven, generally need fewer clumps in a document to score 100 than do queries with few terms, such as two. A clump is the smallest group of words in which all query terms occur. All clumps begin and end with a query term. You can define clump size with the max_span parameter as described in this section.
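For the two-term case, the effect of max_span and order on whether a document matches can be sketched in Python. This models only the match test, not the NEAR score. The function name `near_two_terms` is hypothetical, and the sketch assumes single-word terms, exact string comparison, and distance measured as the difference between word positions, none of which is confirmed by the text above.

```python
def near_two_terms(words, term1, term2, max_span=100, order=False):
    """True if term1 and term2 occur within max_span words of each other."""
    positions1 = [i for i, word in enumerate(words) if word == term1]
    positions2 = [i for i, word in enumerate(words) if word == term2]
    for i in positions1:
        for j in positions2:
            if order and j < i:
                continue  # with order set, term2 must follow term1
            # Assumed distance measure: difference in word positions.
            if abs(j - i) <= max_span:
                return True
    return False
```

For a word list such as `"the dog chased a big black cat".split()`, the pair dog and cat matches with a max_span of 6 but not with a max_span of 4, and reversing the terms with order set to True no longer matches.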
NEAR with Other Operators
You can use the NEAR operator with other operators such as AND and OR. Scores are calculated in the regular way. For example, to find all documents that contain the terms tiger, lion, and cheetah where the terms lion and tiger are within 10 words of each other, issue the following query:
'near((lion, tiger), 10) AND cheetah'
The score returned for each document is the lower score of the near operator and the term cheetah.
You can also use the equivalence operator to substitute a single term in a near query:
'near((stock crash, Japan=Korea), 20)'
This query asks for all documents that contain the phrase stock crash within twenty words of Japan or Korea.
Backward Compatibility NEAR Syntax
You can write near queries using the syntax of previous ConText releases. For example, to find all documents where lion occurs near tiger, you can write:
'lion near tiger'
or with the semi-colon as follows:
'lion;tiger'
This query is equivalent to the following query:
'near((lion, tiger), 100, FALSE)'
Note: Only the syntax of the NEAR operator is backward compatible. In the example, the score returned is calculated using the clump method as described in this section.
Highlighting with the NEAR Operator
When you use highlighting and your query contains the near operator, all occurrences of all terms in the query that satisfy the proximity requirements are highlighted. Highlighted terms can be single words or phrases. For example, assume a document contains the following text:
Chocolate and vanilla are my favorite ice cream flavors. I like chocolate served in a waffle cone, and vanilla served in a cup with caramel syrup.
If the query is near((chocolate, vanilla), 100, FALSE), the following is highlighted:
<<Chocolate>> and <<vanilla>> are my favorite ice cream flavors. I like <<chocolate>> served in a waffle cone, and <<vanilla>> served in a cup with caramel syrup.
However, if the query is near((chocolate, vanilla), 4, FALSE), only the following is highlighted:
<<Chocolate>> and <<vanilla>> are my favorite ice cream flavors. I like chocolate served in a waffle cone, and vanilla served in a cup with caramel syrup.
See Also: For more information about the procedures you can use for highlighting, see Chapter 8, "CTX_DOC Package".
Section Searching and NEAR
You can use the NEAR operator with the WITHIN operator for section searching as follows:
'near((dog, cat), 10) WITHIN Headings'
When evaluating expressions such as these, Oracle looks for clumps that lie entirely within the given section. In this example, only those clumps that contain dog and cat that lie entirely within the section Headings are counted. That is, if the term dog lies within Headings and the term cat lies five words from dog, but outside of Headings, this pair of words does not satisfy the expression and is not counted.
NOT (~)
Use the NOT operator to search for documents that contain one query term and not another.
Syntax
Syntax            Description
term1~term2       Returns documents that contain term1 and not term2.
term1 not term2
Examples
To obtain the documents that contain the term animals but not dogs, use the following expression:
'animals ~ dogs'
Similarly, to obtain the documents that contain the term transportation but not automobiles or trains, use the following expression:
'transportation not (automobiles or trains)'
Note: The NOT operator does not affect the scoring produced by the other logical operators.
OR (|)
Use the OR operator to search for documents that contain at least one occurrence of any of the query terms.
Syntax
Syntax           Description
term1|term2      Returns documents that contain term1 or term2. Returns the maximum score of its operands. At least one term must exist; higher score taken.
term1 or term2
Examples
To obtain the documents that contain the term cats or the term dogs, use either of the following expressions:
'cats | dogs'
'cats OR dogs'
Scoring
In an OR query, the score returned is the score for the highest-scoring query term. In the example, if the scores for cats and dogs are 30 and 40 within a document, the document scores 40.
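The maximum-score rule for OR can be written as a one-line Python model. This is a sketch of the rule as stated above, not Oracle code; the function name `or_score` and the convention that a result of 0 means no query term occurs are assumptions.

```python
def or_score(term_scores):
    """Score of an OR query: the highest operand score (0 if nothing matches)."""
    return max(term_scores.values(), default=0)
```

With the scores from the example, `or_score({"cats": 30, "dogs": 40})` evaluates to 40.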
Preferred Term (PT)
Use the preferred term operator (PT) to replace a term in a query with the preferred term that has been defined in a thesaurus for the term.
Syntax
Syntax            Description
PT(term[,thes])   Replaces the specified word in a query with the preferred term for term.
term
Specify the operand for the preferred term operator. term is replaced by the preferred term defined for the term in the specified thesaurus. However, if no PT entries are defined for the term, term is not replaced in the query expression and term is the result of the expansion. You cannot specify expansion operators in the term argument.
thes
Specify the name of the thesaurus used to return the expansions for the specified term. The thes argument is optional and has a default value of DEFAULT. As a result, a thesaurus named DEFAULT must exist in the thesaurus tables before using any of the thesaurus operators.
Examples
The term automobile has a preferred term of car in a thesaurus. A PT query for automobile returns all documents that contain the word car. Documents that contain the word automobile are not returned.
Related Topics
You can browse a thesaurus using procedures in the CTX_THES package.
See Also: For more information on browsing the preferred terms in your thesaurus, see CTX_THES.PT in Chapter 12, "CTX_THES Package".
Related Term (RT)
Use the related term operator (RT) to expand a query to include all related terms that have been defined in a thesaurus for the term.
Syntax
Syntax            Description
RT(term[,thes])   Expands a query to include all the terms defined in the thesaurus as a related term for term.
term
Specify the operand for the related term operator. term is expanded to include term and all the related entries defined for term in thes. You cannot specify expansion operators in the term argument.
thes
Specify the name of the thesaurus used to return the expansions for the specified term. The thes argument is optional and has a default value of DEFAULT. As a result, a thesaurus named DEFAULT must exist in the thesaurus tables before using any of the thesaurus operators.
Examples
The term dog has a related term of wolf. An RT query for dog returns all documents that contain the word dog or wolf.
Related Topics
You can browse a thesaurus using procedures in the CTX_THES package.
See Also: For more information on browsing the related terms in your thesaurus, see CTX_THES.RT in Chapter 12, "CTX_THES Package".
soundex (!)
Use the soundex (!) operator to expand queries to include words that have similar sounds; that is, words that sound like other words. This function allows comparison of words that are spelled differently, but sound alike in English.
Syntax
Syntax   Description
!term    Expands a query to include all terms that sound the same as the specified term (English-language text only).
Examples
SELECT ID, COMMENT FROM EMP_RESUME
WHERE CONTAINS (COMMENT, '!SMYTHE') > 0 ;
ID COMMENT
-- ------------
23 Smith is a hard worker who..
Language
Soundex works best for languages that use a 7-bit character set, such as English. It can be used, with lesser effectiveness, for languages that use an 8-bit character set, such as many Western European languages. If you have base-letter conversion specified for a text column and the query expression contains a soundex operator, Oracle operates on the base-letter form of the query.
stem ($)
Use the stem ($) operator to search for terms that have the same linguistic root as the query term. Stemming performance can be improved by using the index_stems attribute of the BASIC_LEXER preference. The Oracle Text stemmer, licensed from Xerox Corporation's XSoft Division, supports the following languages: English, French, Spanish, Italian, German, and Dutch.
Syntax
Syntax   Description
$term    Expands a query to include all terms having the same stem or root word as the specified term.
Examples
Input          Expands To
$scream        scream screaming screamed
$distinguish   distinguish distinguished distinguishes
$guitars       guitars guitar
$commit        commit committed
$cat           cat cats
$sing          sang sung sing
Behavior with Stopwords
If stem returns a word designated as a stopword, the stopword is not included in the query or highlighted by CTX_DOC.HIGHLIGHT or CTX_DOC.MARKUP.
Stored Query Expression (SQE)
Use the SQE operator to call a stored query expression created with the CTX_QUERY.STORE_SQE procedure. Stored query expressions can be used for creating predefined bins for organizing and categorizing documents or to perform iterative queries, in which an initial query is refined using one or more additional queries.
Syntax
Syntax          Description
SQE(SQE_name)   Returns the results for the stored query expression SQE_name.
Examples
To create an SQE named disasters, use CTX_QUERY.STORE_SQE as follows:
begin
  ctx_query.store_sqe('disasters', 'hurricane or earthquake or blizzard');
end;
This stored query expression returns all documents that contain either hurricane, earthquake or blizzard. This SQE can then be called within a query expression as follows:
SELECT SCORE(1), docid FROM news
WHERE CONTAINS(resume, 'sqe(disasters)', 1) > 0
ORDER BY SCORE(1);
SYNonym (SYN)
Use the synonym operator (SYN) to expand a query to include all the terms that have been defined in a thesaurus as synonyms for the specified term.
Syntax
Syntax             Description
SYN(term[,thes])   Expands a query to include all the terms defined in the thesaurus as synonyms for term.
term
Specify the operand for the synonym operator. term is expanded to include term and all the synonyms defined for term in thes. You cannot specify expansion operators in the term argument.
thes
Specify the name of the thesaurus used to return the expansions for the specified term. The thes argument is optional and has a default value of DEFAULT. A thesaurus named DEFAULT must exist in the thesaurus tables if you use this default value.
Examples
The following query expression returns all documents that contain the term dog or any of the synonyms defined for dog in the DEFAULT thesaurus:
'SYN(dog)'
Compound Phrases in Synonym Operator
Expansions of compound phrases for a term in a synonym query are returned as AND conjunctives. For example, the compound phrase temperature + measurement + instruments is defined in a thesaurus as a synonym for the term thermometer. In a synonym query for thermometer, the query is expanded to:
{thermometer} OR ({temperature}&{measurement}&{instruments})
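The way a compound synonym turns into an AND group inside the OR expansion can be sketched as a small string builder. This is illustrative Python, not Oracle code; the function name `syn_expand` and the representation of a compound phrase as a space-separated string are assumptions.

```python
def syn_expand(term, synonyms):
    """Build the expanded query text for a term and its synonyms."""
    parts = ["{" + term + "}"]
    for synonym in synonyms:
        words = synonym.split()
        if len(words) == 1:
            parts.append("{" + words[0] + "}")
        else:
            # A compound phrase becomes an AND conjunction of its words.
            parts.append("(" + "&".join("{" + w + "}" for w in words) + ")")
    return " OR ".join(parts)
```

Calling `syn_expand("thermometer", ["temperature measurement instruments"])` produces the expansion shown above.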
Related Topics
You can browse your thesaurus using procedures in the CTX_THES package.
See Also: For more information on browsing the synonym terms in your thesaurus, see CTX_THES.SYN in Chapter 12, "CTX_THES Package".
3-44
Oracle Text Reference
threshold (>)
threshold (>) Use the threshold operator (>) in two ways: ■
at the expression level
■
at the query term level
The threshold operator at the expression level eliminates documents in the result set that score below a threshold number. The threshold operator at the query term level selects a document based on how a term scores in the document.
Syntax

Syntax          Description
expression>n    Returns only those documents in the result set that score above the threshold n.
term>n          Within an expression, returns documents that contain the query term with score of at least n.
Examples At the expression level, to search for documents that contain relational databases and to return only documents that score greater than 75, use the following expression: ’relational databases > 75’
At the query term level, to select documents that have at least a score of 30 for lion and contain tiger, use the following expression: ’(lion > 30) and tiger’
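The two uses of the threshold operator can be modeled in a few lines of Python. This is only a sketch of the semantics stated in the syntax table; the function names and all score values are invented for illustration.

```python
def expression_threshold(scores, n):
    """Model of 'expression > n': keep only documents whose
    overall score is above n."""
    return {doc: s for doc, s in scores.items() if s > n}

def term_threshold(term_scores, n):
    """Model of 'term > n' inside an expression: the term counts
    as a hit only where its score is at least n."""
    return {doc for doc, s in term_scores.items() if s >= n}

# Invented per-document scores for two terms.
lion = {"d1": 45, "d2": 30, "d3": 12}
tiger = {"d1": 8, "d3": 20}

# '(lion > 30) and tiger': lion must reach the threshold, tiger
# only needs to be present.
hits = term_threshold(lion, 30) & set(tiger)
print(sorted(hits))

# 'relational databases > 75' applied to invented overall scores.
print(expression_threshold({"d1": 80, "d2": 75, "d3": 40}, 75))
```

Note that the sketch follows the table literally: the expression-level form excludes a score equal to n, while the term-level form accepts it.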
Translation Term (TR) Use the translation term operator (TR) to expand a query to include all defined foreign language equivalent terms.
Syntax

Syntax                      Description
TR(term[, lang, [thes]])    Expands term to include all the foreign equivalents that are defined for term.

term
Specify the operand for the translation term operator. term is expanded to include all the foreign language entries defined for term in thes. You cannot specify expansion operators in the term argument.

lang
Optionally, specify which foreign language equivalents to return in the expansion. The language you specify must match the language as defined in thes. If you omit this parameter, the system expands to use all defined foreign language terms.

thes
Optionally, specify the name of the thesaurus used to return the expansions for the specified term. The thes argument has a default value of DEFAULT. As a result, a thesaurus named DEFAULT must exist in the thesaurus tables before you can use any of the thesaurus operators.

Note: If you specify thes, you must also specify lang.
Examples Consider a thesaurus MY_THES with the following entries for cat:
cat
  SPANISH: gato
  FRENCH: chat
To search for all documents that contain cat or the Spanish translation of cat, issue the following query: ’tr(cat, spanish, my_thes)’
Because lang restricts the expansion to Spanish, the French entry chat is not included. This query expands to: ’{cat}|{gato}’
Related Topics You can browse a thesaurus using procedures in the CTX_THES package. See Also: For more information on browsing the related terms in your thesaurus, see CTX_THES.TR in Chapter 12, "CTX_THES Package".
Translation Term Synonym (TRSYN) Use the translation term synonym operator (TRSYN) to expand a query to include all the defined foreign equivalents of the query term, the synonyms of the query term, and the foreign equivalents of the synonyms.
Syntax

Syntax                         Description
TRSYN(term[, lang, [thes]])    Expands term to include foreign equivalents of term, the synonyms of term, and the foreign equivalents of the synonyms.

term
Specify the operand for this operator. term is expanded to include all the foreign language entries and synonyms defined for term in thes. You cannot specify expansion operators in the term argument.

lang
Optionally, specify which foreign language equivalents to return in the expansion. The language you specify must match the language as defined in thes. If you omit this parameter, the system expands to use all defined foreign language terms.

thes
Optionally, specify the name of the thesaurus used to return the expansions for the specified term. The thes argument has a default value of DEFAULT. As a result, a thesaurus named DEFAULT must exist in the thesaurus tables before you can use any of the thesaurus operators.

Note: If you specify thes, you must also specify lang.
Examples Consider a thesaurus MY_THES with the following entries for cat:
cat
  SPANISH: gato
  FRENCH: chat
  SYN lion
    SPANISH: leon
To search for all documents that contain cat, the Spanish equivalent of cat, the synonym of cat, or the Spanish equivalent of lion, issue the following query: ’trsyn(cat, spanish, my_thes)’
This query expands to: ’{cat}|{gato}|{lion}|{leon}’
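The TRSYN expansion above can be reproduced with a short Python model. It is a sketch under assumptions, not the Oracle implementation: the nested dictionary MY_THES is a hypothetical encoding of the thesaurus entries in this example, and the function names are invented.

```python
# Hypothetical encoding of the MY_THES entries from the example.
MY_THES = {
    "cat": {"translations": {"SPANISH": ["gato"], "FRENCH": ["chat"]},
            "synonyms": ["lion"]},
    "lion": {"translations": {"SPANISH": ["leon"]}, "synonyms": []},
}

def translations(term, lang, thes):
    """Foreign equivalents of term; every language when lang is None."""
    by_lang = thes.get(term, {}).get("translations", {})
    if lang is None:
        return [t for terms in by_lang.values() for t in terms]
    return list(by_lang.get(lang.upper(), []))

def expand_trsyn(term, lang=None, thes=MY_THES):
    """Model of TRSYN: the term, its foreign equivalents, its
    synonyms, and the foreign equivalents of those synonyms."""
    terms = [term] + translations(term, lang, thes)
    for syn in thes.get(term, {}).get("synonyms", []):
        terms.append(syn)
        terms.extend(translations(syn, lang, thes))
    return "|".join("{%s}" % t for t in terms)

print(expand_trsyn("cat", "spanish"))
```

Calling expand_trsyn("cat") with no language returns every defined foreign term in this model, which mirrors the description of the lang argument.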
Related Topics You can browse a thesaurus using procedures in the CTX_THES package. See Also: For more information on browsing the translation and synonym terms in your thesaurus, see CTX_THES.TRSYN in Chapter 12, "CTX_THES Package".
Top Term (TT) Use the top term operator (TT) to replace a term in a query with the top term that has been defined for the term in the standard hierarchy (BT, NT) in a thesaurus. Top terms in the generic (BTG, NTG), partitive (BTP, NTP), and instance (BTI, NTI) hierarchies are not returned.
Syntax

Syntax             Description
TT(term[,thes])    Replaces the specified word in a query with the top term in the standard hierarchy (BT, NT) for term.

term
Specify the operand for the top term operator. term is replaced by the top term defined for the term in the specified thesaurus. However, if no TT entries are defined for term, term is not replaced in the query expression and term is the result of the expansion. You cannot specify expansion operators in the term argument.

thes
Specify the name of the thesaurus used to return the expansions for the specified term. The thes argument is optional and has a default value of DEFAULT. A thesaurus named DEFAULT must exist in the thesaurus tables if you use this default value.
Examples The term dog has a top term of animal in the standard hierarchy of a thesaurus. A TT query for dog returns all documents that contain the term animal. Documents that contain only the word dog are not returned.
Related Topics You can browse your thesaurus using procedures in the CTX_THES package.
See Also: For more information on browsing the top terms in your thesaurus, see CTX_THES.TT in Chapter 12, "CTX_THES Package".
weight (*) The weight operator multiplies the score of a query term by the given factor. The resulting score is capped at 100. For example, the query cat, dog*2 sums the score of cat with twice the score of dog, and the total is capped at 100.
In expressions that contain more than one query term, use the weight operator to adjust the relative scoring of the query terms. You can reduce the score of a query term by using the weight operator with a number less than 1; you can increase the score of a query term by using the weight operator with a number greater than 1, up to 10.
The weight operator is useful in accumulate, OR, or AND queries when the expression has more than one query term. With no weighting on individual terms, the score cannot tell you which of the query terms occurs the most. With term weighting, you can alter the scores of individual terms and hence make the overall document ranking reflect the terms you are interested in.
Syntax

Syntax    Description
term*n    Returns documents that contain term. Calculates score by multiplying the raw score of term by n, where n is a number from 0.1 to 10.
Examples You have a collection of sports articles. You are interested in the articles about soccer, in particular Brazilian soccer. It turns out that a regular query on soccer or Brazil returns many high ranking articles on US soccer. To raise the ranking of the articles on Brazilian soccer, you can issue the following query: ’soccer or Brazil*3’
Table 3–1 illustrates how the weight operator can change the ranking of three hypothetical documents A, B, and C, which all contain information about soccer.
The columns in the table show the total score of four different query expressions on the three documents.

Table 3–1
Document  soccer  Brazil  soccer or Brazil  soccer or Brazil*3
A         20      10      20                30
B         10      30      30                90
C         50      20      50                60
The score in the third column containing the query soccer or Brazil is the score of the highest scoring term. The score in the fourth column containing the query soccer or Brazil*3 is the larger of the score of the first column soccer and of the score Brazil multiplied by three, Brazil*3. With the initial query of soccer or Brazil, the documents are ranked in the order C B A. With the query of soccer or Brazil*3, the documents are ranked B C A, which is the preferred ranking.
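The arithmetic behind Table 3–1 can be checked with a short Python sketch. It models only what the text states (OR takes the highest operand score, weighting multiplies and caps at 100); the function and variable names are invented for illustration.

```python
def weight(score, n):
    """Model of term*n: multiply the score by n, capped at 100."""
    if not 0.1 <= n <= 10:
        raise ValueError("n must be between 0.1 and 10")
    return min(100, score * n)

def or_score(*scores):
    """OR scores a document by its highest-scoring operand."""
    return max(scores)

# (soccer score, Brazil score) for documents A, B, and C.
DOCS = {"A": (20, 10), "B": (10, 30), "C": (50, 20)}

def rank(weight_brazil=1):
    """Return (ranking, totals) for 'soccer or Brazil*weight_brazil'."""
    totals = {doc: or_score(soccer, weight(brazil, weight_brazil))
              for doc, (soccer, brazil) in DOCS.items()}
    return sorted(totals, key=totals.get, reverse=True), totals

print(rank())    # soccer or Brazil
print(rank(3))   # soccer or Brazil*3
```

The unweighted call ranks the documents C, B, A, and the weighted call ranks them B, C, A, matching the rankings described above.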
wildcards (% _) Wildcard characters can be used in query expressions to expand word searches into pattern searches. The wildcard characters are:

Wildcard Character  Description
%                   The percent wildcard specifies that any characters can appear in multiple positions represented by the wildcard.
_                   The underscore wildcard specifies a single position in which any character can occur.

Note: When a wildcard expression translates to a stopword, the stopword is not included in the query and not highlighted by CTX_DOC.HIGHLIGHT or CTX_DOC.MARKUP.
Right-Truncated Queries Right truncation involves placing the wildcard on the right-hand-side of the search string. For example, the following query expression finds all terms beginning with the pattern scal: ’scal%’
Left- and Double-Truncated Queries Left truncation involves placing the wildcard on the left-hand-side of the search string. To find words such as king, wing or sing, you can write your query as follows: ’_ing’
You can write this query more generally as: ’%ing’
You can also combine left-truncated and right-truncated searches to create double-truncated searches. The following query finds all documents that contain words containing the substring benz: ’%benz%’
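The truncation patterns above can be tried out with a small Python model that translates the two wildcards into a regular expression. This is a sketch, not how Oracle Text evaluates wildcards: it assumes that % may also match zero characters (as in SQL LIKE), and the word list TERMS and the function names are invented.

```python
import re

def wildcard_to_regex(pattern):
    """Translate % (any run of characters) and _ (exactly one
    character) into a compiled regular expression."""
    out = []
    for ch in pattern:
        if ch == "%":
            out.append(".*")
        elif ch == "_":
            out.append(".")
        else:
            out.append(re.escape(ch))
    return re.compile("".join(out))

def matching_terms(pattern, terms):
    """Return the terms that match the whole wildcard pattern."""
    rx = wildcard_to_regex(pattern)
    return [t for t in terms if rx.fullmatch(t)]

# Invented word list standing in for the indexed terms.
TERMS = ["king", "wing", "sing", "string",
         "scale", "scalar", "benzene", "nitrobenzene"]

for pattern in ("scal%", "_ing", "%ing", "%benz%"):
    print(pattern, matching_terms(pattern, TERMS))
```

In this model _ing matches only the four-letter words, while %ing also matches string, which is the sense in which the second form is more general.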
Improving Wildcard Query Performance You can improve wildcard query performance by adding a substring or prefix index. When your wildcard queries are left- and double-truncated, you can improve query performance by creating a substring index. Substring indexes improve query performance for all types of left-truncated wildcard searches such as %ed, _ing, or %benz%. When your wildcard queries are right-truncated, you can improve performance by creating a prefix index. A prefix index improves query performance for wildcard searches such as to%. See Also: For more information about creating substring and prefix indexes, see "BASIC_WORDLIST" in Chapter 2.
WITHIN You can use the WITHIN operator to narrow a query down into document sections. Document sections can be one of the following:
■ zone sections
■ field sections
■ attribute sections
■ special sections (sentence or paragraph)
Syntax

Syntax                         Description
expression WITHIN section      Searches for expression within the pre-defined zone, field, or attribute section. If section is a zone, expression can contain one or more WITHIN operators (nested WITHIN) whose section is a zone or special section. If section is a field or attribute section, expression cannot contain another WITHIN operator.
expression WITHIN SENTENCE     Searches for documents that contain expression within a sentence. Specify an AND or NOT query for expression. The expression can contain one or more WITHIN operators (nested WITHIN) whose section is a zone or special section.
expression WITHIN PARAGRAPH    Searches for documents that contain expression within a paragraph. Specify an AND or NOT query for expression. The expression can contain one or more WITHIN operators (nested WITHIN) whose section is a zone or special section.
Examples Querying Within Zone Sections To find all the documents that contain the term San Francisco within the section Headings, write your query as follows: ’San Francisco WITHIN Headings’
To find all the documents that contain the term sailing and contain the term San Francisco within the section Headings, write your query in one of two ways: ’(San Francisco WITHIN Headings) and sailing’ ’sailing and San Francisco WITHIN Headings’
Compound Expressions with WITHIN To find all documents that contain the terms dog and cat within the same section Headings, write your query as follows: ’(dog and cat) WITHIN Headings’
This query is logically different from: ’dog WITHIN Headings and cat WITHIN Headings’
This query finds all documents that contain dog and cat where the terms dog and cat are in Headings sections, regardless of whether they occur in the same Headings section or different sections.
Near with WITHIN
To find all documents in which dog is near cat within the section Headings, write your query as follows: ’dog near cat WITHIN Headings’
Note: The near operator has higher precedence than the WITHIN operator, so parentheses are not necessary in this example. This query is equivalent to (dog near cat) WITHIN Headings.
Nested WITHIN Queries You can nest the WITHIN operator to search zone sections within zone sections. For example, assume that a document set has the zone section AUTHOR nested within the zone section BOOK. You write a nested WITHIN query to find all occurrences of scott within the AUTHOR section of the BOOK section as follows: ’(scott WITHIN AUTHOR) WITHIN BOOK’
Querying Within Field Sections The syntax for querying within a field section is the same as querying within a zone section. The syntax for most of the examples given in the previous section, "Querying Within Zone Sections", applies to field sections. However, field sections behave differently from zone sections in terms of:
■ Visibility: You can make text within a field section invisible.
■ Repeatability: WITHIN queries cannot distinguish repeated field sections.
■ Nestability: You cannot issue a nested WITHIN query with a field section.
The following sections describe these differences.
Visible Flag in Field Sections
When a field section is created with the visible flag set to FALSE in CTX_DDL.ADD_FIELD_SECTION, the text within a field section can only be queried using the WITHIN operator. For example, assume that TITLE is a field section defined with visible flag set to FALSE. Then the query dog without the WITHIN operator will not find a document containing: <TITLE>The dog</TITLE> I like my pet.
To find such a document, you can use the WITHIN operator as follows: ’dog WITHIN TITLE’
Alternatively, you can set the visible flag to TRUE when you define TITLE as a field section with CTX_DDL.ADD_FIELD_SECTION.
See Also: For more information about creating field sections, see ADD_FIELD_SECTION in Chapter 7, "CTX_DDL Package".
Repeated Field Sections WITHIN queries cannot distinguish repeated field sections in a document. For example, consider the document with the repeated section <author>: <author>Charles Dickens</author> <author>Martin Luther King</author>
Assuming that <author> is defined as a field section, a query such as (charles and martin) within author returns the document, even though these words occur in separate tags. To have WITHIN queries distinguish repeated sections, define the sections as zone sections.
Nested Field Sections
You cannot issue a nested WITHIN query with field sections. Doing so raises an error.
Querying Within Sentence or Paragraphs Querying within sentence or paragraph boundaries is useful to find combinations of words that occur in the same sentence or paragraph. To query sentence or paragraphs, you must first add the special section to your section group before you index. You do so with CTX_DDL.ADD_SPECIAL_SECTION. To find documents that contain dog and cat within the same sentence: ’(dog and cat) WITHIN SENTENCE’
To find documents that contain dog and cat within the same paragraph: ’(dog and cat) WITHIN PARAGRAPH’
To find documents that contain sentences with the word dog but not cat: ’(dog not cat) WITHIN SENTENCE’
Querying Within Attribute Sections You can query within attribute sections when you index with either XML_SECTION_GROUP or AUTO_SECTION_GROUP as your section group type. Assume you have an XML document as follows: <book title="Tale of Two Cities">It was the best of times.</book>
You can define the section title@book to be the attribute section title. You can do so with the CTX_DDL.ADD_ATTR_SECTION procedure or dynamically after indexing with ALTER INDEX.
Note: When you use the AUTO_SECTION_GROUP to index XML documents, the system automatically creates attribute sections and names them in the form attribute@tag.
If you use the XML_SECTION_GROUP, you can give attribute sections any name with CTX_DDL.ADD_ATTR_SECTION. To search on Tale within the attribute section title, you issue the following query: ’Tale WITHIN title’
Constraints for Querying Attribute Sections The following constraints apply to querying within attribute sections:
■ Regular queries on attribute text do not hit the document unless qualified in a WITHIN clause. Assume you have an XML document as follows:
<book title="Tale of Two Cities">It was the best of times.</book>
A query on Tale by itself does not produce a hit on the document unless qualified with WITHIN title@book. (This behavior is like that of field sections when you set the visible flag to false.)
■ You cannot use attribute sections in a nested WITHIN query.
■ Phrases ignore attribute text. For example, if the original document looked like:
Now is the time for all good <word type="noun"> men </word> to come to the aid.
Then this document would hit on the regular query good men, ignoring the intervening attribute text.
■ WITHIN queries can distinguish repeated attribute sections. This behavior is like zone sections but unlike field sections. For example, you have a document as follows:
<book title="Tale of Two Cities">It was the best of times.</book>
<book title="Of Human Bondage">The sky broke dull and gray.</book>
Assume that book is a zone section and book@author is an attribute section. Consider the query: ’(Tale and Bondage) WITHIN book@author’
This query does not hit the document, because tale and bondage are in different occurrences of the attribute section book@author.
Notes
Section Names
The WITHIN operator requires you to know the name of the section you search. A list of defined sections can be obtained using the CTX_SECTIONS or CTX_USER_SECTIONS views.
Section Boundaries For special and zone sections, the terms of the query must be fully enclosed in a particular occurrence of the section for the document to satisfy the query. This is not a requirement for field sections. For example, consider the query where bold is a zone section: ’(dog and cat) WITHIN bold’
This query finds: <B>dog cat</B>
but it does not find: <B>dog</B><B>cat</B>
This is because dog and cat must be in the same bold section. This behavior is especially useful for special sections, where ’(dog and cat) WITHIN sentence’
means find dog and cat within the same sentence. Field sections, on the other hand, are meant for non-repeating, embedded metadata such as a title section. Queries within field sections cannot distinguish between occurrences. All occurrences of a field section are considered to be parts of a single section. For example, the query: (dog and cat) WITHIN title
can find a document like this: <TITLE>dog</TITLE><TITLE>cat</TITLE>
In return for this field section limitation and for the overlap and nesting limitations, field section queries are generally faster than zone section queries, especially if the section occurs in every document, or if the search term is common.
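The difference between zone and field section boundaries can be made concrete with a small Python model. It is only a sketch of the behavior described in these notes: a document is represented as a hypothetical list of (section name, text) occurrences, and the function names are invented.

```python
def tokens(text):
    """Lower-cased words of a section occurrence."""
    return set(text.lower().split())

def within_zone(terms, section, doc):
    """Zone (or special) section: every term must fall inside
    one and the same occurrence of the section."""
    return any(set(terms) <= tokens(text)
               for name, text in doc if name == section)

def within_field(terms, section, doc):
    """Field section: all occurrences are treated as parts of a
    single section, so terms may come from different occurrences."""
    merged = set()
    for name, text in doc:
        if name == section:
            merged |= tokens(text)
    return set(terms) <= merged

# One occurrence holding both words, and two occurrences with
# one word each.
one_occurrence = [("bold", "dog cat")]
two_occurrences = [("bold", "dog"), ("bold", "cat")]

print(within_zone(["dog", "cat"], "bold", one_occurrence))
print(within_zone(["dog", "cat"], "bold", two_occurrences))
print(within_field(["dog", "cat"], "bold", two_occurrences))
```

In this model the zone query fails on the two-occurrence document while the field query succeeds, which is the trade-off the notes describe: field sections give up the ability to distinguish occurrences.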
Limitations The WITHIN operator has the following limitations:
■ You cannot embed the WITHIN clause in a phrase. For example, you cannot write: term1 WITHIN section term2
■ You cannot combine WITHIN with expansion operators, such as $ ! and *.
■ Since WITHIN is a reserved word, you must escape the word with braces to search on it.
■ You cannot combine the WITHIN operator with the ABOUT operator, as in ’ABOUT (xyz) WITHIN abc’.
4 Special Characters in Queries
This chapter describes the special characters that can be used in Text queries. In addition, it provides a list of the words and characters that Oracle Text treats as reserved words and characters. The following topics are covered in this chapter:
■ Grouping Characters
■ Escape Characters
■ Reserved Words and Characters
Grouping Characters The grouping characters control operator precedence by grouping query terms and operators in a query expression. The grouping characters are:

Grouping Character  Description
()                  The parentheses characters serve to group terms and operators found between the characters.
[]                  The bracket characters serve to group terms and operators found between the characters; however, they prevent the expansion operators (fuzzy, soundex, stem) from penetrating the group.
The beginning of a group of terms and operators is indicated by an open character from one of the sets of grouping characters. The ending of a group is indicated by the occurrence of the appropriate close character for the open character that started the group. Between the two characters, other groups may occur. For example, the open parenthesis indicates the beginning of a group. The first close parenthesis encountered is the end of the group. Any open parentheses encountered before the close parenthesis indicate nested groups.
Escape Characters To query on words or symbols that have special meaning in query expressions, such as and, &, or, |, and accum, you must escape them. There are two ways to escape characters in a query expression:

Escape Character  Description
{}                Use braces to escape a string of characters or symbols. Everything within a set of braces is considered part of the escape sequence. When you use braces to escape a single character, the escaped character becomes a separate token in the query.
\                 Use the backslash character to escape a single character or symbol. Only the character immediately following the backslash is escaped.
In the following examples, an escape sequence is necessary because each expression contains a Text operator or reserved symbol:
’AT\&T’
’{AT&T}’
’high\-voltage’
’{high-voltage}’
Note: If you use braces to escape an individual character within a word, the character is escaped, but the word is broken into three tokens. For example, a query written as high{-}voltage searches for high - voltage, with a space on either side of the hyphen.
Querying Escape Characters The open brace { signals the beginning of the escape sequence, and the closed brace } indicates the end of the sequence. Everything between the opening brace and the closing brace is part of the escaped query expression (including any open brace characters). To include the close brace character in an escaped query expression, use }}. To escape the backslash escape character, use \\.
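When query text comes from user input, the two escaping styles can be applied programmatically. The following Python helpers are a minimal sketch of the rules stated above; they are not part of Oracle Text, the function names are invented, and the default set of reserved characters is an assumption drawn from the reserved characters listed in this chapter.

```python
def escape_braces(text):
    """Escape a whole string with braces. A closing brace inside
    the string is written as }} per the rule above."""
    return "{" + text.replace("}", "}}") + "}"

def escape_backslash(text, reserved="&|,;~-?$!>*%_{}()[]\\"):
    """Escape each reserved character individually with a
    backslash. A literal backslash becomes two backslashes."""
    return "".join("\\" + ch if ch in reserved else ch for ch in text)

print(escape_braces("AT&T"))
print(escape_backslash("AT&T"))
print(escape_backslash("high-voltage"))
```

The backslash helper handles reserved characters only. Reserved words such as AND or WITHIN would still need the brace form in this model.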
Reserved Words and Characters The following table lists the Oracle Text reserved words and characters that must be escaped when you want to search them in CONTAINS queries:

Reserved Word  Reserved Character  Operator
ABOUT          (none)              ABOUT
ACCUM          ,                   Accumulate
AND            &                   And
BT             (none)              Broader Term
BTG            (none)              Broader Term Generic
BTI            (none)              Broader Term Instance
BTP            (none)              Broader Term Partitive
FUZZY          ?                   fuzzy
(none)         {}                  escape characters (multiple)
(none)         \                   escape character (single)
(none)         ()                  grouping characters
(none)         []                  grouping characters
HASPATH        (none)              HASPATH
INPATH         (none)              INPATH
MINUS          -                   MINUS
NEAR           ;                   NEAR
NOT            ~                   NOT
NT             (none)              Narrower Term
NTG            (none)              Narrower Term Generic
NTI            (none)              Narrower Term Instance
NTP            (none)              Narrower Term Partitive
OR             |                   OR
PT             (none)              Preferred Term
RT             (none)              Related Term
(none)         $                   stem
(none)         !                   soundex
SQE            (none)              Stored Query Expression
SYN            (none)              Synonym
(none)         >                   threshold
TR             (none)              Translation Term
TRSYN          (none)              Translation Term Synonym
TT             (none)              Top Term
(none)         *                   weight
(none)         %                   wildcard character (multiple)
(none)         _                   wildcard character (single)
WITHIN         (none)              WITHIN
5 CTX_ADM Package
This chapter provides reference information for using the CTX_ADM PL/SQL package to administer servers and the data dictionary. CTX_ADM contains the following stored procedures:

Name           Description
RECOVER        Cleans up database objects for deleted Text tables.
SET_PARAMETER  Sets system-level defaults for index creation.
SHUTDOWN       Shuts down a single ctxsrv server or all currently running servers.

Note: Only the CTXSYS user can use the procedures in CTX_ADM.
RECOVER The RECOVER procedure cleans up the Text data dictionary, deleting objects such as leftover preferences.
Syntax
  CTX_ADM.RECOVER;
Example
  begin
    ctx_adm.recover;
  end;
Notes You need not call CTX_ADM.RECOVER to perform system recovery if ctxsrv servers are running; any ctxsrv servers that are running automatically perform system recovery approximately every fifteen minutes. RECOVER provides a method for users to perform recovery on command.
SET_PARAMETER The SET_PARAMETER procedure sets system-level parameters for index creation.
Syntax
  CTX_ADM.SET_PARAMETER(param_name  IN VARCHAR2,
                        param_value IN VARCHAR2);

param_name
Specify the name of the parameter to set, which can be one of the following:
■ max_index_memory (maximum memory allowed for indexing)
■ default_index_memory (default memory allocated for indexing)
■ log_directory (directory for ctx_output files)
■ ctx_doc_key_type (default input key type for CTX_DOC procedures)
■ file_access_role
■ default_datastore (default datastore preference)
■ default_filter_file (default filter preference for data stored in files)
■ default_filter_text (default text filter preference)
■ default_section_html (default html section group preference)
■ default_section_xml (default xml section group preference)
■ default_section_text (default text section group preference)
■ default_lexer (default lexer preference)
■ default_wordlist (default wordlist preference)
■ default_stoplist (default stoplist preference)
■ default_storage (default storage preference)
■ default_ctxcat_lexer
■ default_ctxcat_stoplist
■ default_ctxcat_storage
■ default_ctxcat_wordlist
■ default_ctxrule_lexer
■ default_ctxrule_stoplist
■ default_ctxrule_storage
■ default_ctxrule_wordlist
See Also: To learn more about the default values for these parameters, see "System Parameters" in Chapter 2.

param_value
Specify the value to assign to the parameter. For max_index_memory and default_index_memory, the value you specify must have the following syntax:
  number[M|G|K]
where M stands for megabytes, G stands for gigabytes, and K stands for kilobytes. For each of the other parameters, specify the name of a preference to use as the default for indexing.
Example
  begin
    ctx_adm.set_parameter('default_lexer', 'my_lexer');
  end;
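A client program that builds values for max_index_memory or default_index_memory may want to check the number[M|G|K] syntax before calling SET_PARAMETER. The following Python sketch shows one way to do so. It is not part of Oracle Text: the function name is invented, and it assumes binary multiples (1K = 1024 bytes), that a bare number means bytes, and that the unit letter is accepted in either case.

```python
import re

# number[M|G|K]: digits followed by an optional unit letter.
_SIZE = re.compile(r"(\d+)([MGK]?)")
# Assumed multipliers; a bare number is taken to mean bytes.
_UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

def parse_index_memory(value):
    """Validate a memory value and return it as a byte count."""
    m = _SIZE.fullmatch(value.strip().upper())
    if m is None:
        raise ValueError("expected number[M|G|K], got %r" % value)
    return int(m.group(1)) * _UNITS[m.group(2)]

print(parse_index_memory("64M"))
print(parse_index_memory("1G"))
```

A value such as 10T or 1.5G raises ValueError in this model, since neither fits the documented syntax.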
SHUTDOWN The SHUTDOWN procedure shuts down the specified ctxsrv server.
Syntax
  CTX_ADM.SHUTDOWN(name   IN VARCHAR2 DEFAULT 'ALL',
                   sdmode IN NUMBER   DEFAULT NULL);

name
Specify the name (internal identifier) of the ctxsrv server to shut down. Default is ALL.

sdmode
Specify the shutdown mode for the server:
■ 0 or NULL (Normal)
■ 1 (Immediate)
■ 2 (Abort)
Default is NULL.

Examples
  begin
    ctx_adm.shutdown('DRSRV_3321', 1);
  end;

Notes If you do not specify a ctxsrv server to shut down, CTX_ADM.SHUTDOWN shuts down all currently running ctxsrv servers. The names of all currently running ctxsrv servers can be obtained from the CTX_SERVERS view.
Related Topics Thesaurus Loader (ctxload) in Chapter 14, "Executables"
6 CTX_CLS Package
This chapter provides reference information for using the CTX_CLS PL/SQL package to generate CTXRULE rules for a set of documents.

Name   Description
TRAIN  Generates rules that define document categories. Output based on input training document set.
TRAIN Use this procedure to generate query rules that select document categories. You must supply a training set consisting of categorized documents. Each document must belong to one or more categories. This procedure generates the queries that define the categories and then writes the results to a table.
This procedure requires that your document table have an associated populated context index. For best results, the index should be synchronized before running this procedure.
You must also have a document table and a category table. The documents can be in any format supported by Oracle Text. For example, your document and category tables can be defined as:
  create table trainingdoc(
    docid number primary key,
    text  varchar2(4000));
  create table category (
    docid CONSTRAINT fk_id REFERENCES trainingdoc(docid),
    categoryid number);
Syntax
  CTX_CLS.TRAIN(
    index_name      in varchar2,
    doc_id          in varchar2,
    cattab          in varchar2,
    catdocid        in varchar2,
    catid           in varchar2,
    restab          in varchar2,
    rescatid        in varchar2,
    resquery        in varchar2,
    resconfid       in varchar2,
    preference_name in varchar2 DEFAULT NULL
  );

index_name
Specify the name of the context index associated with your document training set.

doc_id
Specify the name of the document id column in the document table. This column must contain unique document ids. This column must be a NUMBER.

cattab
Specify the name of the category table. You must have SELECT privilege on this table.

catdocid
Specify the name of the document id column in the category table. The document ids in this table must also exist in the document table. This column must be a NUMBER.

catid
Specify the name of the category ID column in the category table. This column must be a NUMBER.

restab
Specify the name of the result table. You must have INSERT privilege on this table.

rescatid
Specify the name of the category ID column in the result table. This column must be a NUMBER.

resquery
Specify the name of the query column in the result table. This column must be VARCHAR2, CHAR, CLOB, NVARCHAR2, or NCHAR. The queries generated in this column connect terms with AND or NOT operators, such as: ’T1 & T2 ~ T3’
Terms can also be theme tokens and be connected with the ABOUT operator, such as: ’about(T1) & about(T2) ~ about(T3)’

resconfid
Specify the name of the confidence column in the result table. This column contains the estimated probability from training data that a document is relevant if that document satisfies the query.

preference_name
Specify the name of the preference. For attributes, see "Classifier Types" in Chapter 2, "Indexing".
Example The CTX_CLS.TRAIN procedure requires that your document table have an associated context index. For example, your document table can be defined and populated as follows:
  set serverout on
  exec dbms_output.put_line(TO_CHAR(SYSDATE,'MM-DD-YYYY HH24:MI:SS')||':start');
  create table doc (id number primary key, text varchar2(2000));
  insert into doc values(1,'In 2002, Europe changed its currency to the EURO');
  insert into doc values(2,'The NASDAQ rose today in heavy stock trading.');
  insert into doc values(3,'The EURO lost 1 cent today against the US dollar');
  insert into doc values(4,'Salt Lake City hosts the winter Olympic games');
  insert into doc values(5,'ESPN broadcasts World Cup Soccer games.');
  insert into doc values(6,'Soccer champion Diego Maradona retires.');
Create the CONTEXT index:
  exec ctx_ddl.drop_preference('my_lexer');
  exec ctx_ddl.create_preference('my_lexer','BASIC_LEXER');
  exec ctx_ddl.set_attribute('my_lexer','INDEX_THEMES','NO');
  exec ctx_ddl.set_attribute('my_lexer','INDEX_TEXT','YES');
  CREATE INDEX docx on doc(text) INDEXTYPE IS ctxsys.context PARAMETERS('LEXER my_lexer');
You must also create a category table as follows to relate the documents to categories:
  create table category (doc_id number, cat_id number, cat_name varchar2(100));
  insert into category values (1,1,'Finance');
  insert into category values (2,1,'Finance');
  insert into category values (3,1,'Finance');
  insert into category values (4,2,'Sports');
  insert into category values (5,2,'Sports');
  insert into category values (6,2,'Sports');
CTX_CLS.TRAIN writes to a result table that can be defined like:
  create table restab (cat_id number, query VARCHAR2(400), conf number);
To populate the result table for later CTXRULE indexing, set your RULE_CLASSIFIER preference attributes and call CTX_CLS.TRAIN as follows:
  exec exec exec exec exec exec exec exec
  exec ctx_cls.train('docx','id','category','doc_id','cat_id','restab','cat_id','query','conf','my_classifier');
  exec ctx_output.end_log();
  create table catname as (select distinct cat_id, cat_name from category);
  set termout on
  select rpad(id,6) doc_id, rpad(cat_name,8) cat_name, rpad(text,50) text
    from doc, category where id=doc_id;
  select rpad(a.cat_id,8) cat_id, rpad(cat_name,8) cat_name, rpad(query,30) rule
    from restab a, catname b where b.cat_id=a.cat_id;
TEXT
--------------------------------------------------
In 2002, Europe changed its currency to the EURO
The NASDAQ rose today in heavy stock trading.
The EURO lost 1 cent today against the US dollar
Salt Lake City hosts the winter Olympic games
ESPN broadcasts World Cup Soccer games.
Soccer champion Diego Maradona retires.

6 rows selected.
The generated rules for the categories of FINANCE and SPORTS are as follows:

CAT_ID   CAT_NAME RULE
-------- -------- ------------------------------
1        Finance  EURO
1        Finance  TODAY ~ EURO
2        Sports   GAMES
2        Sports   SOCCER ~ GAMES
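The rules in the result table are meant for later CTXRULE indexing. As a minimal sketch of that step, the statements below index the query column of restab and then classify a new piece of text with the MATCHES operator. The index name restabx and the sample sentence are illustrative assumptions, not part of the original example.

```sql
-- Sketch only: restabx and the sample text are hypothetical.
-- Index the generated rules so that documents can be routed against them.
create index restabx on restab(query) indextype is ctxsys.ctxrule;

-- Return the categories whose rules match the incoming text.
select distinct cat_id
  from restab
 where matches(query, 'The EURO gained against the dollar today') > 0;
```

Joining the result to catname on cat_id would return the category names instead of the identifiers.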
7 CTX_DDL Package

This chapter provides reference information for using the CTX_DDL PL/SQL package to create and manage the preferences, section groups, and stoplists required for Text indexes. CTX_DDL contains the following stored procedures and functions:

Name                   Description
ADD_ATTR_SECTION       Adds an attribute section to a section group.
ADD_FIELD_SECTION      Creates a field section and assigns it to the specified section group.
ADD_INDEX              Adds an index to a catalog index preference.
ADD_SPECIAL_SECTION    Adds a special section to a section group.
ADD_STOPCLASS          Adds a stopclass to a stoplist.
ADD_STOP_SECTION       Adds a stop section to an automatic section group.
ADD_STOPTHEME          Adds a stoptheme to a stoplist.
ADD_STOPWORD           Adds a stopword to a stoplist.
ADD_SUB_LEXER          Adds a sub-lexer to a multi-lexer preference.
ADD_ZONE_SECTION       Creates a zone section and adds it to the specified section group.
CREATE_INDEX_SET       Creates an index set for CTXCAT index types.
CREATE_POLICY          Creates a policy to use with ORA:CONTAINS().
CREATE_PREFERENCE      Creates a preference in the Text data dictionary.
CREATE_SECTION_GROUP   Creates a section group in the Text data dictionary.
CREATE_STOPLIST        Creates a stoplist.
DROP_INDEX_SET         Drops an index set.
DROP_POLICY            Drops a policy.
DROP_PREFERENCE        Deletes a preference from the Text data dictionary.
DROP_SECTION_GROUP     Deletes a section group from the Text data dictionary.
DROP_STOPLIST          Drops a stoplist.
OPTIMIZE_INDEX         Optimizes the index.
REMOVE_INDEX           Removes an index from a CTXCAT index preference.
REMOVE_SECTION         Deletes a section from a section group.
REMOVE_STOPCLASS       Deletes a stopclass from a stoplist.
REMOVE_STOPTHEME       Deletes a stoptheme from a stoplist.
REMOVE_STOPWORD        Deletes a stopword from a stoplist.
SET_ATTRIBUTE          Sets a preference attribute.
SYNC_INDEX             Synchronizes the index.
UNSET_ATTRIBUTE        Removes a set attribute from a preference.
UPDATE_POLICY          Updates a policy.
ADD_ATTR_SECTION
Adds an attribute section to an XML section group. This procedure is useful for defining attributes in XML documents as sections. This allows you to search XML attribute text with the WITHIN operator.

Note: When you use AUTO_SECTION_GROUP, attribute sections are created automatically. Attribute sections created automatically are named in the form tag@attribute.

Syntax
CTX_DDL.ADD_ATTR_SECTION(
  group_name   in varchar2,
  section_name in varchar2,
  tag          in varchar2);

group_name
Specify the name of the XML section group. You can add attribute sections only to XML section groups.

section_name
Specify the name of the attribute section. This is the name used for WITHIN queries on the attribute text. The section name you specify cannot contain the colon (:), comma (,), or dot (.) characters. The section name must also be unique within group_name. Section names are case-insensitive. Attribute section names can be no more than 64 bytes long.

tag
Specify the name of the attribute in tag@attr form. This parameter is case-sensitive.
Examples
Consider an XML file that defines the BOOK tag with a TITLE attribute as follows:

<BOOK TITLE="Tale of Two Cities">
It was the best of times.
</BOOK>

To define the title attribute as an attribute section, create an XML_SECTION_GROUP and define the attribute section as follows:

begin
  ctx_ddl.create_section_group('myxmlgroup', 'XML_SECTION_GROUP');
  ctx_ddl.add_attr_section('myxmlgroup', 'booktitle', 'BOOK@TITLE');
end;

When you define the TITLE attribute section as such and index the document set, you can query the XML attribute text as follows:

'Cities within booktitle'
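To show where the section group and the WITHIN query fit, the following sketch creates a CONTEXT index that uses the section group and then runs the query through CONTAINS. The table xmldocs and the index xmldocs_idx are hypothetical names chosen for illustration; only myxmlgroup and booktitle come from the example.

```sql
-- Hypothetical table and index names, for illustration only.
create table xmldocs (id number primary key, text clob);

-- Attach the section group when the index is created.
create index xmldocs_idx on xmldocs(text)
  indextype is ctxsys.context
  parameters ('section group myxmlgroup');

-- Search the TITLE attribute text through the booktitle attribute section.
select id
  from xmldocs
 where contains(text, 'Cities within booktitle') > 0;
```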
ADD_FIELD_SECTION
Creates a field section and adds the section to an existing section group. This enables field section searching with the WITHIN operator.

Field sections are delimited by start and end tags. By default, the text within field sections is indexed as a sub-document separate from the rest of the document.

Unlike zone sections, field sections cannot nest or overlap. As such, field sections are best suited for non-repeating, non-overlapping sections such as TITLE and AUTHOR markup in email- or news-type documents.

Because of how field sections are indexed, WITHIN queries on field sections are usually faster than WITHIN queries on zone sections.

Syntax
CTX_DDL.ADD_FIELD_SECTION(
  group_name   in varchar2,
  section_name in varchar2,
  tag          in varchar2,
  visible      in boolean default FALSE
);

group_name
Specify the name of the section group to which section_name is added. You can add up to 64 field sections to a single section group. Within the same group, zone section names and field section names cannot be the same.

section_name
Specify the name of the section to add to the group_name. You use this name to identify the section in queries. Avoid using names that contain non-alphanumeric characters such as _, since these characters must be escaped in queries. Section names are case-insensitive. Within the same group, zone section names and field section names cannot be the same. The terms Paragraph and Sentence are reserved for special sections. Section names need not be unique across tags. You can assign the same section name to more than one tag, making details transparent to searches.

tag
Specify the tag which marks the start of a section. For example, if the tag is <H1>, specify H1. The start tag you specify must be unique within a section group. If group_name is an HTML_SECTION_GROUP, you can create field sections for the META tag's NAME/CONTENT attribute pairs. To do so, specify tag as meta@namevalue where namevalue is the value of the NAME attribute whose CONTENT attribute is to be indexed as a section. Refer to the example. Oracle knows what the end tags look like from the group_type parameter you specify when you create the section group.

visible
Specify TRUE to make the text visible within the rest of the document. By default the visible flag is FALSE. This means that Oracle indexes the text within field sections as a sub-document separate from the rest of the document. However, you can set the visible flag to TRUE if you want text within the field section to be indexed as part of the enclosing document.
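The parameters above can be put together as in the following sketch, which defines one visible field section and one field section on a META tag. The group name htmgroup and the section names title and author are illustrative choices, and the block assumes that htmgroup does not already exist.

```sql
begin
  -- Illustrative names; assumes htmgroup has not been created yet.
  ctx_ddl.create_section_group('htmgroup', 'HTML_SECTION_GROUP');

  -- <TITLE>...</TITLE>: visible => TRUE also indexes the text as part of
  -- the enclosing document.
  ctx_ddl.add_field_section('htmgroup', 'title', 'TITLE', TRUE);

  -- <META NAME="author" CONTENT="..."> indexed as the field section author,
  -- using the meta@namevalue form of the tag parameter.
  ctx_ddl.add_field_section('htmgroup', 'author', 'meta@author');
end;
/
```

After indexing with this section group, a query such as 'smith within author' would search only the CONTENT text of the author META tag.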
CREATE_SECTION_GROUP

Section Group Preference   Description

HTML_SECTION_GROUP
Use this group type for indexing HTML documents and for defining sections in HTML documents.

XML_SECTION_GROUP
Use this group type for indexing XML documents and for defining sections in XML documents.

AUTO_SECTION_GROUP
Use this group type to automatically create a zone section for each start-tag/end-tag pair in an XML document. The section names derived from XML tags are case sensitive as in XML. Attribute sections are created automatically for XML tags that have attributes. Attribute sections are named in the form tag@attribute. Stop sections, empty tags, processing instructions, and comments are not indexed. The following limitations apply to automatic section groups:
■ You cannot add zone, field, or special sections to an automatic section group.
■ Automatic sectioning does not index XML document types (root elements). However, you can define stop sections with document type.
■ The length of the indexed tags, including prefix and namespace, cannot exceed 64 characters. Tags longer than this are not indexed.

PATH_SECTION_GROUP
Use this group type to index XML documents. Behaves like the AUTO_SECTION_GROUP. The difference is that with this section group you can do path searching with the INPATH and HASPATH operators. Queries are also case-sensitive for tag and attribute names.

NEWS_SECTION_GROUP
Use this group for defining sections in newsgroup formatted documents according to RFC 1036.
Example
The following command creates a section group called htmgroup with the HTML group type.

begin
  ctx_ddl.create_section_group('htmgroup', 'HTML_SECTION_GROUP');
end;

The following command creates a section group called auto with the AUTO_SECTION_GROUP group type to be used to automatically index tags in XML documents.

begin
  ctx_ddl.create_section_group('auto', 'AUTO_SECTION_GROUP');
end;
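A path section group is created the same way. The sketch below creates one, attaches it to an index, and runs a path search with INPATH. The names xmlpathgroup, pathdocs, and pathdocs_idx, as well as the BOOK element in the query, are assumptions made for illustration.

```sql
begin
  -- Hypothetical group name.
  ctx_ddl.create_section_group('xmlpathgroup', 'PATH_SECTION_GROUP');
end;
/

-- Hypothetical table holding XML documents.
create table pathdocs (id number primary key, text clob);

create index pathdocs_idx on pathdocs(text)
  indextype is ctxsys.context
  parameters ('section group xmlpathgroup');

-- Find documents where the word appears inside a top-level BOOK element.
-- Tag names in path queries are case-sensitive.
select id
  from pathdocs
 where contains(text, 'times INPATH(/BOOK)') > 0;
```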
Related Topics WITHIN operator in Chapter 3, "CONTAINS Query Operators". "Section Group Types" in Chapter 2, "Indexing". ADD_ZONE_SECTION ADD_FIELD_SECTION ADD_SPECIAL_SECTION REMOVE_SECTION DROP_SECTION_GROUP
CREATE_STOPLIST
Use this procedure to create a new, empty stoplist. Stoplists can contain words or themes that are not to be indexed.

You can also create multi-language stoplists to hold language-specific stopwords. A multi-language stoplist is useful when you index a table that contains documents in different languages, such as English, German, and Japanese. When you do so, your text table must contain a language column.

You can add either stopwords, stopclasses, or stopthemes to a stoplist using ADD_STOPWORD, ADD_STOPCLASS, or ADD_STOPTHEME.

You can specify a stoplist in the parameter string of CREATE INDEX or ALTER INDEX to override the default stoplist CTXSYS.DEFAULT_STOPLIST.

Syntax
CTX_DDL.CREATE_STOPLIST(
  stoplist_name IN VARCHAR2,
  stoplist_type IN VARCHAR2 DEFAULT 'BASIC_STOPLIST');

stoplist_name
Specify the name of the stoplist to be created.

stoplist_type
Specify BASIC_STOPLIST to create a stoplist for a single language. This is the default. Specify MULTI_STOPLIST to create a stoplist with language-specific stopwords. At indexing time, the language column of each document is examined, and only the stopwords for that language are eliminated. At query time, the session language setting determines the active stopwords, just as it determines the active lexer when using the multi-lexer.

Note: When indexing a multi-language table with a multi-language stoplist, your table must have a language column.
Example
Single Language Stoplist
The following code creates a stoplist called mystop:

begin
  ctx_ddl.create_stoplist('mystop', 'BASIC_STOPLIST');
end;

Multi-Language Stoplist
The following code creates a multi-language stoplist called multistop and then adds two language-specific stopwords:

begin
  ctx_ddl.create_stoplist('multistop', 'MULTI_STOPLIST');
  ctx_ddl.add_stopword('multistop', 'Die', 'german');
  ctx_ddl.add_stopword('multistop', 'Or', 'english');
end;
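A stoplist takes effect only when it is named in the parameter string of the index. The following sketch shows both cases, assuming stoplists named mystop and multistop exist. The tables docs and mldocs, the language column lang, and the index names are hypothetical.

```sql
-- Hypothetical tables: docs(id, text) and mldocs(id, lang, text).

-- Single-language stoplist overriding CTXSYS.DEFAULT_STOPLIST.
create index docs_idx on docs(text)
  indextype is ctxsys.context
  parameters ('stoplist mystop');

-- Multi-language stoplist: the index must also name the language column.
create index mldocs_idx on mldocs(text)
  indextype is ctxsys.context
  parameters ('language column lang stoplist multistop');
```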
Related Topics ADD_STOPWORD ADD_STOPCLASS ADD_STOPTHEME DROP_STOPLIST CREATE INDEX in Chapter 1, "SQL Statements and Operators". ALTER INDEX in Chapter 1, "SQL Statements and Operators". Appendix D, "Supplied Stoplists"
DROP_INDEX_SET
Drops an index set.

Syntax
CTX_DDL.DROP_INDEX_SET(set_name in varchar2);

set_name
Specify the name of the index set to drop.
DROP_POLICY
Drops a policy created with CREATE_POLICY.

Syntax
CTX_DDL.DROP_POLICY(policy_name IN VARCHAR2);

policy_name
Specify the name of the policy to drop.
DROP_PREFERENCE
The DROP_PREFERENCE procedure deletes the specified preference from the Text data dictionary. Dropping a preference does not affect indexes that have already been created using that preference.

Syntax
CTX_DDL.DROP_PREFERENCE(preference_name IN VARCHAR2);

preference_name
Specify the name of the preference to be dropped.

Example
The following code drops the preference my_lexer.

begin
  ctx_ddl.drop_preference('my_lexer');
end;
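Setup scripts often drop a preference before re-creating it, and the drop fails when the preference does not exist yet. One possible way to guard the call, sketched below, is to check the CTX_USER_PREFERENCES view first. The sketch assumes that preference names are stored in uppercase in the PRE_NAME column.

```sql
declare
  n number;
begin
  -- Assumption: the view stores preference names in uppercase.
  select count(*) into n
    from ctx_user_preferences
   where pre_name = 'MY_LEXER';

  -- Drop only when the preference is present.
  if n > 0 then
    ctx_ddl.drop_preference('my_lexer');
  end if;
end;
/
```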
Related Topics CREATE_PREFERENCE
DROP_SECTION_GROUP
The DROP_SECTION_GROUP procedure deletes the specified section group, as well as all the sections in the group, from the Text data dictionary.

Syntax
CTX_DDL.DROP_SECTION_GROUP(group_name IN VARCHAR2);

group_name
Specify the name of the section group to delete.

Examples
The following code drops the section group htmgroup and all its sections:

begin
  ctx_ddl.drop_section_group('htmgroup');
end;
Related Topics CREATE_SECTION_GROUP
DROP_STOPLIST
Drops a stoplist from the Text data dictionary. When you drop a stoplist, you must re-create or rebuild the index for the change to take effect.

Syntax
CTX_DDL.DROP_STOPLIST(stoplist_name in varchar2);

stoplist_name
Specify the name of the stoplist.

Example
The following code drops the stoplist mystop:

begin
  ctx_ddl.drop_stoplist('mystop');
end;
Related Topics CREATE_STOPLIST
OPTIMIZE_INDEX
Use this procedure to optimize the index. You optimize your index after you synchronize it. Optimizing the index removes old data and minimizes index fragmentation. Optimizing the index can improve query response time.

You can optimize in fast, full, or token mode. In token mode, you specify a specific token to be optimized. You can use token mode to optimize index tokens that are frequently searched, without spending time on optimizing tokens that are rarely referenced. An optimized token can improve query response time for that token.

Note: Optimizing an index can result in better response time only if you insert, delete, or update documents in your base table after your initial indexing operation.

Using this procedure to optimize your index is recommended over using the ALTER INDEX statement.

Limitations
The CTX_DDL.OPTIMIZE_INDEX procedure optimizes at most 16,000 document ids. To continue optimizing more document ids, re-run this procedure.

Syntax
CTX_DDL.OPTIMIZE_INDEX(
  idx_name        IN VARCHAR2,
  optlevel        IN VARCHAR2,
  maxtime         IN NUMBER   DEFAULT NULL,
  token           IN VARCHAR2 DEFAULT NULL,
  part_name       IN VARCHAR2 DEFAULT NULL,
  parallel_degree IN NUMBER   DEFAULT 1
);

idx_name
Specify the name of the index. If you do not specify an index name, Oracle chooses a single index to optimize.
optlevel
Specify optimization level as a string. You can specify one of the following methods for optimization:

Value   Description

FAST or CTX_DDL.OPTLEVEL_FAST
This method compacts fragmented rows. However, old data is not removed.

FULL or CTX_DDL.OPTLEVEL_FULL
In this mode you can optimize the entire index or a portion of the index. This method compacts rows and removes old data (deleted rows). Optimizing in full mode runs even when there are no deleted rows.

TOKEN
This method lets you specify a specific token to be optimized. Oracle does a FULL optimization on the token you specify with token. Use this method to optimize those tokens that are searched frequently. Token optimization is not supported for CTXRULE indexes.

maxtime
Specify maximum optimization time, in minutes, for FULL optimize. When you specify the symbol CTX_DDL.MAXTIME_UNLIMITED (or pass in NULL), the entire index is optimized. This is the default.

token
Specify the token to be optimized.

part_name
Specify the name of the index partition to optimize.

parallel_degree
Specify the parallel degree as a number for parallel optimization. The actual parallel degree depends on your resources.
Examples
The following two examples optimize the index for fast optimization.

begin
  ctx_ddl.optimize_index('myidx','FAST');
end;

begin
  ctx_ddl.optimize_index('myidx',CTX_DDL.OPTLEVEL_FAST);
end;

The following example optimizes the index token Oracle:

begin
  ctx_ddl.optimize_index('myidx','token', TOKEN=>'Oracle');
end;
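The maxtime and part_name parameters are not covered by the examples above. A minimal sketch of both follows; the 10-minute limit and the partition name part1 are illustrative values, and the second call assumes that myidx is a partitioned index.

```sql
begin
  -- FULL optimization limited to roughly 10 minutes.
  ctx_ddl.optimize_index('myidx', 'FULL', maxtime => 10);

  -- FULL optimization of a single index partition (part1 is a placeholder).
  ctx_ddl.optimize_index('myidx', CTX_DDL.OPTLEVEL_FULL, part_name => 'part1');
end;
/
```

Named notation keeps the call readable when earlier optional parameters are skipped.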
Related Topics ALTER INDEX in Chapter 1, "SQL Statements and Operators".
REMOVE_INDEX
Removes the index with the specified column list from a CTXCAT index set preference.

Note: This procedure does not remove a CTXCAT sub-index from the existing index. To do so, you must drop your index and re-index with the modified index set preference.

Syntax
CTX_DDL.REMOVE_INDEX(
  set_name    in varchar2,
  column_list in varchar2,
  language    in varchar2 default NULL
);

set_name
Specify the name of the index set.

column_list
Specify the name of the column list to remove.
REMOVE_SECTION
The REMOVE_SECTION procedure removes the specified section from the specified section group. You can specify the section by name or by id. You can view section id with the CTX_USER_SECTIONS view.

Syntax 1
Use the following syntax to remove a section by section name:

CTX_DDL.REMOVE_SECTION(
  group_name   in varchar2,
  section_name in varchar2
);

group_name
Specify the name of the section group from which to delete section_name.

section_name
Specify the name of the section to delete from group_name.

Syntax 2
Use the following syntax to remove a section by section id:

CTX_DDL.REMOVE_SECTION(
  group_name in varchar2,
  section_id in number
);

group_name
Specify the name of the section group from which to delete section_id.

section_id
Specify the section id of the section to delete from group_name.

Examples
The following code drops a section called Title from the htmgroup:

begin
  ctx_ddl.remove_section('htmgroup', 'Title');
end;
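Removal by section id (Syntax 2) can be sketched as follows. The lookup assumes the CTX_USER_SECTIONS columns SEC_SECTION_GROUP, SEC_NAME, and SEC_ID, and assumes that group and section names are stored in uppercase. Verify both assumptions against the view definition in your release.

```sql
declare
  v_id number;
begin
  -- Assumed column names and uppercase storage; check CTX_USER_SECTIONS.
  -- The SELECT INTO expects exactly one matching row.
  select sec_id into v_id
    from ctx_user_sections
   where sec_section_group = 'HTMGROUP'
     and sec_name = 'TITLE';

  -- Passing a NUMBER selects the remove-by-id form of the procedure.
  ctx_ddl.remove_section('htmgroup', v_id);
end;
/
```

Because one section name can be assigned to more than one tag, the lookup may return several rows, in which case a cursor loop is needed instead of SELECT INTO.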
Related Topics ADD_FIELD_SECTION ADD_SPECIAL_SECTION ADD_ZONE_SECTION
REMOVE_STOPCLASS
Removes a stopclass from a stoplist.

Syntax
CTX_DDL.REMOVE_STOPCLASS(
  stoplist_name in varchar2,
  stopclass     in varchar2
);

stoplist_name
Specify the name of the stoplist.

stopclass
Specify the name of the stopclass to be removed.

Example
The following code removes the stopclass NUMBERS from the stoplist mystop.

begin
  ctx_ddl.remove_stopclass('mystop', 'NUMBERS');
end;
Related Topics ADD_STOPCLASS
REMOVE_STOPTHEME
Removes a stoptheme from a stoplist.

Syntax
CTX_DDL.REMOVE_STOPTHEME(
  stoplist_name in varchar2,
  stoptheme     in varchar2
);

stoplist_name
Specify the name of the stoplist.

stoptheme
Specify the stoptheme to be removed from stoplist_name.

Example
The following code removes the stoptheme banking from the stoplist mystop:

begin
  ctx_ddl.remove_stoptheme('mystop', 'banking');
end;
Related Topics ADD_STOPTHEME
REMOVE_STOPWORD
Removes a stopword from a stoplist. For the removal of a stopword to be reflected in the index, you must rebuild your index.

Syntax
CTX_DDL.REMOVE_STOPWORD(
  stoplist_name in varchar2,
  stopword      in varchar2,
  language      in varchar2 default NULL
);

stoplist_name
Specify the name of the stoplist.

stopword
Specify the stopword to be removed from stoplist_name.

language
Specify the language of stopword to remove when the stoplist you specify with stoplist_name is of type MULTI_STOPLIST. You must specify the Globalization Support name or abbreviation of an Oracle-supported language. You can also remove ALL stopwords.

Example
The following code removes the stopword because from the stoplist mystop:

begin
  ctx_ddl.remove_stopword('mystop','because');
end;
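For a stoplist of type MULTI_STOPLIST, the language argument identifies which language-specific stopword to remove. The following sketch assumes a multi-language stoplist named multistop that already holds the German stopword Die. Both names are illustrative.

```sql
begin
  -- Assumes multistop is a MULTI_STOPLIST containing 'Die' for German.
  ctx_ddl.remove_stopword('multistop', 'Die', 'german');
end;
/
```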
Related Topics ADD_STOPWORD
SET_ATTRIBUTE
Sets a preference attribute. You use this procedure after you have created a preference with CTX_DDL.CREATE_PREFERENCE.

Syntax
ctx_ddl.set_attribute(
  preference_name in varchar2,
  attribute_name  in varchar2,
  attribute_value in varchar2);

preference_name
Specify the name of the preference.

attribute_name
Specify the name of the attribute.

attribute_value
Specify the attribute value. You can specify boolean values as TRUE or FALSE, T or F, YES or NO, Y or N, ON or OFF, or 1 or 0.

Example
Specifying File Data Storage
The following example creates a data storage preference called filepref that tells the system that the files to be indexed are stored in the operating system. The example then uses CTX_DDL.SET_ATTRIBUTE to set the PATH attribute to the directory /docs.

begin
  ctx_ddl.create_preference('filepref', 'FILE_DATASTORE');
  ctx_ddl.set_attribute('filepref', 'PATH', '/docs');
end;
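Boolean attributes are passed as strings, like any other attribute value. The sketch below sets three BASIC_LEXER boolean attributes using different accepted spellings. The preference name my_lexer is illustrative, and the block assumes that no preference of that name exists yet.

```sql
begin
  -- Assumes my_lexer does not exist yet.
  ctx_ddl.create_preference('my_lexer', 'BASIC_LEXER');

  -- Any of the accepted boolean spellings can be used.
  ctx_ddl.set_attribute('my_lexer', 'INDEX_TEXT',   'YES');
  ctx_ddl.set_attribute('my_lexer', 'INDEX_THEMES', 'NO');
  ctx_ddl.set_attribute('my_lexer', 'MIXED_CASE',   'TRUE');
end;
/
```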
See Also: For more information about data storage, see "Datastore Types" in Chapter 2, "Indexing". For more examples of using SET_ATTRIBUTE, see CREATE_PREFERENCE.
SYNC_INDEX
Synchronizes the index to process inserts, updates, and deletes to the base table.

Syntax
ctx_ddl.sync_index(
  idx_name        IN VARCHAR2 DEFAULT NULL,
  memory          IN VARCHAR2 DEFAULT NULL,
  part_name       IN VARCHAR2 DEFAULT NULL,
  parallel_degree IN NUMBER   DEFAULT 1);

idx_name
Specify the name of the index.

memory
Specify the runtime memory to use for synchronization. This value overrides the DEFAULT_INDEX_MEMORY system parameter. The memory parameter specifies the amount of memory Oracle uses for the synchronization operation before flushing the index to disk. Specifying a large amount of memory:
■ improves indexing performance because there is less I/O
■ improves query performance and maintenance because there is less fragmentation

Specifying smaller amounts of memory increases disk I/O and index fragmentation, but might be useful when runtime memory is scarce.

part_name
Specify the name of the index partition to synchronize.

parallel_degree
Specify the degree of parallelism for the synchronization. A number greater than 1 turns on parallel synchronization. The actual degree of parallelism might be smaller depending on your resources.

Example
The following example synchronizes the index myindex with 2 megabytes of memory:

begin
  ctx_ddl.sync_index('myindex', '2M');
end;

The following example synchronizes the part1 index partition with 2 megabytes of memory:

begin
  ctx_ddl.sync_index('myindex', '2M', 'part1');
end;
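Parallel synchronization can be requested in the same way. The sketch below uses named notation so that part_name can be skipped. The degree of 2 is an arbitrary illustrative value, and the degree actually used depends on available resources.

```sql
begin
  -- Illustrative values; the actual degree may be lower than requested.
  ctx_ddl.sync_index(idx_name        => 'myindex',
                     memory          => '2M',
                     parallel_degree => 2);
end;
/
```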
Related Topics ALTER INDEX in Chapter 1, "SQL Statements and Operators".
UNSET_ATTRIBUTE
Removes a set attribute from a preference.

preference_name
Specify the name of the preference.

attribute_name
Specify the name of the attribute.

Example
Enabling/Disabling Alternate Spelling
The following example shows how you can enable alternate spelling for German and then disable it with CTX_DDL.UNSET_ATTRIBUTE:

begin
  ctx_ddl.create_preference('GERMAN_LEX', 'BASIC_LEXER');
  ctx_ddl.set_attribute('GERMAN_LEX', 'ALTERNATE_SPELLING', 'GERMAN');
end;

To disable alternate spelling, use the CTX_DDL.UNSET_ATTRIBUTE procedure as follows:

begin
  ctx_ddl.unset_attribute('GERMAN_LEX', 'ALTERNATE_SPELLING');
end;
Related Topics SET_ATTRIBUTE
UPDATE_POLICY
Updates a policy created with CREATE_POLICY. Replaces the preferences of the policy. Null arguments are not replaced.

Specify the filter preference to use.

section_group
Specify the section group to use.

lexer
Specify the lexer preference to use.

stoplist
Specify the stoplist to use.

wordlist
Specify the wordlist to use.
NULL, NULL, NULL, NULL, NULL, NULL);
8 CTX_DOC Package

This chapter describes the CTX_DOC PL/SQL package for requesting document services.

Note: You can use this package only when your index type is CONTEXT. This package does not support the CTXCAT index type.

The CTX_DOC package includes the following procedures and functions:

Name           Description
FILTER         Generates a plain text or HTML version of a document.
GIST           Generates a gist or theme summaries for a document.
HIGHLIGHT      Generates plain text or HTML highlighting offset information for a document.
IFILTER        Generates a plain text version of binary data. Can be called from a USER_DATASTORE procedure.
MARKUP         Generates a plain text or HTML version of a document with query terms highlighted.
PKENCODE       Encodes a composite textkey string (value) for use in other CTX_DOC procedures.
SET_KEY_TYPE   Sets CTX_DOC procedures to accept rowid or primary key document identifiers.
THEMES         Generates a list of themes for a document.
TOKENS         Generates all index tokens for a document.
FILTER
Use the CTX_DOC.FILTER procedure to generate either a plain text or HTML version of a document. You can store the rendered document in either a result table or in memory. This procedure is generally called after a query, from which you identify the document to be filtered.

Syntax 1: In-Memory Result Storage
CTX_DOC.FILTER(
  index_name IN VARCHAR2,
  textkey    IN VARCHAR2,
  restab     IN OUT NOCOPY CLOB,
  plaintext  IN BOOLEAN DEFAULT FALSE);

Syntax 2: Result Table Storage
CTX_DOC.FILTER(
  index_name IN VARCHAR2,
  textkey    IN VARCHAR2,
  restab     IN VARCHAR2,
  query_id   IN NUMBER DEFAULT 0,
  plaintext  IN BOOLEAN DEFAULT FALSE);

index_name
Specify the name of the index associated with the text column containing the document identified by textkey.

textkey
Specify the unique identifier (usually the primary key) for the document. The textkey parameter can be one of the following:
■ a single column primary key value
■ an encoded specification for a composite (multiple column) primary key. Use CTX_DOC.PKENCODE.
■ the rowid of the row containing the document

You toggle between primary key and rowid identification using CTX_DOC.SET_KEY_TYPE.

restab
You can specify that this procedure store the marked-up text to either a table or to an in-memory CLOB. To store results to a table, specify the name of the table. The result table must exist before you make this call.

See Also: "Filter Table" in Appendix A, "Result Tables" for more information about the structure of the filter result table.

To store results in memory, specify the name of the CLOB locator. If restab is NULL, a temporary CLOB is allocated and returned. You must de-allocate the locator after using it. If restab is not NULL, the CLOB is truncated before the operation.

query_id
Specify an identifier to use to identify the row inserted into restab. When query_id is not specified or set to NULL, it defaults to 0. You must manually truncate the table specified in restab.

plaintext
Specify TRUE to generate a plaintext version of the document. Specify FALSE to generate an HTML version of the document if you are using the INSO filter or indexing HTML documents.
Example
In-Memory Filter
The following code shows how to filter a document to HTML in memory.

declare
  mklob clob;
  amt number := 40;
  line varchar2(80);
begin
  ctx_doc.filter('myindex','1', mklob, FALSE);
  -- mklob is NULL when passed-in, so ctx_doc.filter will allocate a temporary
  -- CLOB for us and place the results there.
  dbms_lob.read(mklob, amt, 1, line);
  dbms_output.put_line('FIRST 40 CHARS ARE:'||line);
  -- have to de-allocate the temp lob
  dbms_lob.freetemporary(mklob);
end;

Create the filter result table to store the filtered document as follows:

create table filtertab (query_id number, document clob);

To obtain a plaintext version of document with textkey 20, issue the following statement:

begin
  ctx_doc.filter('newsindex', '20', 'filtertab', '0', TRUE);
end;
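Once the procedure has run, the filtered text can be read back from the result table by query_id. The sketch below also shows how a composite primary key might be passed as textkey using CTX_DOC.PKENCODE. The key values smith and 14 and the query_id of 1 are placeholders, and the second call assumes that the table behind newsindex has a two-column primary key.

```sql
-- Read back the plaintext stored under query_id 0.
select document from filtertab where query_id = 0;

-- Sketch for a composite (two-column) primary key; the key values are
-- placeholders and must be passed in primary key column order.
begin
  ctx_doc.filter('newsindex',
                 ctx_doc.pkencode('smith', '14'),
                 'filtertab', 1, TRUE);
end;
/
```

Because the result table is not truncated automatically, using a distinct query_id for each call keeps the rows from different requests apart.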
GIST
Use the CTX_DOC.GIST procedure to generate a gist and theme summaries for a document. You can generate paragraph-level or sentence-level gists or theme summaries.

Syntax 1: In-Memory Storage
CTX_DOC.GIST(
  index_name    IN VARCHAR2,
  textkey       IN VARCHAR2,
  restab        IN OUT CLOB,
  glevel        IN VARCHAR2 DEFAULT 'P',
  pov           IN VARCHAR2 DEFAULT 'GENERIC',
  numParagraphs IN NUMBER DEFAULT NULL,
  maxPercent    IN NUMBER DEFAULT NULL,
  num_themes    IN NUMBER DEFAULT 50);

Syntax 2: Result Table Storage
CTX_DOC.GIST(
  index_name    IN VARCHAR2,
  textkey       IN VARCHAR2,
  restab        IN VARCHAR2,
  query_id      IN NUMBER DEFAULT 0,
  glevel        IN VARCHAR2 DEFAULT 'P',
  pov           IN VARCHAR2 DEFAULT NULL,
  numParagraphs IN NUMBER DEFAULT NULL,
  maxPercent    IN NUMBER DEFAULT NULL,
  num_themes    IN NUMBER DEFAULT 50);
index_name
Specify the name of the index associated with the text column containing the document identified by textkey.

textkey
Specify the unique identifier (usually the primary key) for the document. The textkey parameter can be one of the following:
■ a single column primary key value
■ an encoded specification for a composite (multiple column) primary key. To encode a composite textkey, use the CTX_DOC.PKENCODE procedure.
■ the rowid of the row containing the document

You toggle between primary key and rowid identification using CTX_DOC.SET_KEY_TYPE.

restab
You can specify that this procedure store the gist and theme summaries to either a table or to an in-memory CLOB. To store results to a table, specify the name of the table.

See Also: "Gist Table" in Appendix A, "Result Tables" for more information about the structure of the gist result table.

To store results in memory, specify the name of the CLOB locator. If restab is NULL, a temporary CLOB is allocated and returned. You must de-allocate the locator after using it. If restab is not NULL, the CLOB is truncated before the operation.

query_id
Specify an identifier to use to identify the row(s) inserted into restab.

glevel
Specify the type of gist or theme summary to produce. The possible values are:
■ P for paragraph
■ S for sentence

The default is P.

pov
Specify whether a gist or a single theme summary is generated. The type of gist or theme summary generated (sentence-level or paragraph-level) depends on the value specified for glevel. To generate a gist for the entire document, specify a value of 'GENERIC' for pov. To generate a theme summary for a single theme in a document, specify the theme as the value for pov. When using result table storage and you do not specify a value for pov, this procedure returns the generic gist plus up to fifty theme summaries for the document. When using in-memory result storage to a CLOB, you must specify a pov. However, if you do not specify pov, this procedure generates only a generic gist for the document.

Note: The pov parameter is case sensitive. To return a gist for a document, specify 'GENERIC' in all uppercase. To return a theme summary, specify the theme exactly as it is generated for the document. Only the themes generated by THEMES for a document can be used as input for pov.

numParagraphs
Specify the maximum number of document paragraphs (or sentences) selected for the document gist or theme summaries. The default is 0.

Note: The numParagraphs parameter is used only when this parameter yields a smaller gist or theme summary size than the gist or theme summary size yielded by the maxPercent parameter. This means that the system always returns the smallest size gist or theme summary.

maxPercent
Specify the maximum number of document paragraphs (or sentences) selected for the document gist or theme summaries as a percentage of the total paragraphs (or sentences) in the document. The default is 0.

Note: The maxPercent parameter is used only when this parameter yields a smaller gist or theme summary size than the gist or theme summary size yielded by the numParagraphs parameter. This means that the system always returns the smallest size gist or theme summary.

num_themes
Specify the number of theme summaries to produce when you do not specify a value for pov. For example, if you specify 10, this procedure returns the top 10 theme summaries. The default is 50. If you specify 0 or NULL, this procedure returns all themes in a document. If the document contains more than 50 themes, only the top 50 themes show conceptual hierarchy.
Examples In-Memory Gist The following example generates a non-default size generic gist of at most 10 paragraphs. The result is stored in memory in a CLOB locator. The code then de-allocates the returned CLOB locator after using it. set serveroutput on; declare gklob clob; amt number := 40; line varchar2(80); begin ctx_doc.gist(’newsindex’,’34’,gklob, pov => ’GENERIC’,numParagraphs => 10); -- gklob is NULL when passed-in, so ctx-doc.gist will allocate a temporary -- CLOB for us and place the results there. dbms_lob.read(gklob, amt, 1, line); dbms_output.put_line(’FIRST 40 CHARS ARE:’||line); -- have to de-allocate the temp lob dbms_lob.freetemporary(gklob); end;
Result Table Gists
The following example creates a gist table called CTX_GIST:
create table CTX_GIST (query_id number, pov varchar2(80), gist CLOB);
Gists and Theme Summaries
The following example returns a default sized paragraph level gist for document 34 as well as the top 10 theme summaries in the document:
begin
  ctx_doc.gist('newsindex','34','CTX_GIST', 1, num_themes => 10);
end;
The following example generates a non-default size gist of at most 10 paragraphs:
begin
  ctx_doc.gist('newsindex','34','CTX_GIST',1,pov => 'GENERIC',numParagraphs => 10);
end;
The following example generates a gist whose number of paragraphs is at most 10 percent of the total paragraphs in the document:
begin
  ctx_doc.gist('newsindex','34','CTX_GIST',1,pov => 'GENERIC', maxPercent => 10);
end;
Theme Summary
The following example returns a paragraph level theme summary for insects for document 34. The default theme summary size is returned.
begin
  ctx_doc.gist('newsindex','34','CTX_GIST',1, pov => 'insects');
end;
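Reading Gists Back from the Result Table
Once gists have been written to a result table, they can be inspected with ordinary SQL. The following is a minimal sketch, assuming the CTX_GIST table created earlier and the query_id value of 1 used in the preceding calls. The 200-character preview length is an arbitrary choice for illustration.

```sql
-- List each point of view stored for query_id 1, with the size of its
-- gist and a short preview of the gist text.
select pov,
       dbms_lob.getlength(gist)      as gist_length,
       dbms_lob.substr(gist, 200, 1) as gist_preview
from CTX_GIST
where query_id = 1
order by pov;
```

A generic gist appears with a pov of GENERIC, and each theme summary appears under its own theme name.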
HIGHLIGHT
Use the CTX_DOC.HIGHLIGHT procedure to generate highlight offsets for a document. The offset information is generated for the terms in the document that satisfy the query you specify. These highlighted terms are either the words that satisfy a word query or the themes that satisfy an ABOUT query. You can generate highlight offsets for either plaintext or HTML versions of the document. You can apply the offset information to the same documents filtered with CTX_DOC.FILTER. You usually call this procedure after a query, from which you identify the document to be processed. You can store the highlight offsets in either an in-memory PL/SQL table or a result table.
Syntax 1: In-Memory Result Storage
CTX_DOC.HIGHLIGHT(
  index_name  IN VARCHAR2,
  textkey     IN VARCHAR2,
  text_query  IN VARCHAR2,
  restab      IN OUT NOCOPY HIGHLIGHT_TAB,
  plaintext   IN BOOLEAN DEFAULT FALSE);
Syntax 2: Result Table Storage
CTX_DOC.HIGHLIGHT(
  index_name  IN VARCHAR2,
  textkey     IN VARCHAR2,
  text_query  IN VARCHAR2,
  restab      IN VARCHAR2,
  query_id    IN NUMBER DEFAULT 0,
  plaintext   IN BOOLEAN DEFAULT FALSE);
index_name
Specify the name of the index associated with the text column containing the document identified by textkey.
textkey
Specify the unique identifier (usually the primary key) for the document. The textkey parameter can be one of the following:
■ a single column primary key value
■ an encoded specification for a composite (multiple column) primary key. Use the CTX_DOC.PKENCODE function.
■ the rowid of the row containing the document
You toggle between primary key and rowid identification using CTX_DOC.SET_KEY_TYPE.
text_query
Specify the original query expression used to retrieve the document. If NULL, no highlights are generated. If text_query includes wildcards, stemming, or fuzzy matching that results in stopwords being returned, HIGHLIGHT does not highlight the stopwords. If text_query contains the threshold operator, the operator is ignored. The HIGHLIGHT procedure always returns highlight information for the entire result set.
restab
You can specify that this procedure store highlight offsets to either a table or to an in-memory PL/SQL table. To store results to a table, specify the name of the table. The table must exist before you call this procedure.
See Also: For more information about the structure of the highlight result table, see "Highlight Table" in Appendix A, "Result Tables".
To store results to an in-memory table, specify the name of the in-memory table of type CTX_DOC.HIGHLIGHT_TAB. The HIGHLIGHT_TAB datatype is defined as follows:
type highlight_rec is record (
  offset number,
  length number
);
type highlight_tab is table of highlight_rec index by binary_integer;
CTX_DOC.HIGHLIGHT clears HIGHLIGHT_TAB before the operation.
query_id
Specify the identifier used to identify the row inserted into restab.
When query_id is not specified or set to NULL, it defaults to 0. You must manually truncate the table specified in restab.
plaintext
Specify TRUE to generate plaintext offsets of the document. Specify FALSE to generate HTML offsets of the document if you are using the INSO filter or indexing HTML documents.
Examples
Create Highlight Table
Create the highlight table to store the highlight offset information:
create table hightab(query_id number, offset number, length number);
Word Highlight Offsets
To obtain HTML highlight offset information for document 20 for the word dog:
begin
  ctx_doc.highlight('newsindex', '20', 'dog', 'hightab', 0, FALSE);
end;
Theme Highlight Offsets
Assuming the index newsindex has a theme component, you obtain HTML highlight offset information for the theme query of politics by issuing the following statement:
begin
  ctx_doc.highlight('newsindex', '20', 'about(politics)', 'hightab', 0, FALSE);
end;
The output of this statement is the set of offsets to highlighted words and phrases that represent the theme of politics in the document.
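In-Memory Highlight Offsets
The examples in this section all use a result table. The in-memory form can be sketched as follows, assuming the same newsindex index and document 20; the variable name hl is illustrative only. The call passes a variable of type CTX_DOC.HIGHLIGHT_TAB in place of a table name and therefore takes no query_id.

```sql
set serveroutput on
declare
  hl ctx_doc.highlight_tab;   -- in-memory table of (offset, length) records
begin
  -- In-memory form: the fourth argument is the PL/SQL table, not a table name.
  ctx_doc.highlight('newsindex', '20', 'dog', hl, FALSE);
  -- Print one line for each highlighted term.
  for i in 1..hl.count loop
    dbms_output.put_line('offset=' || hl(i).offset || ' length=' || hl(i).length);
  end loop;
end;
/
```

Because CTX_DOC.HIGHLIGHT clears the in-memory table before the operation, the same variable can be reused across calls without manual cleanup.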
IFILTER
Use this procedure when you need to filter binary data to text. This procedure takes binary data (BLOB IN), filters the data with the Inso filter, and writes the text version to a CLOB. CTX_DOC.IFILTER employs the safe callout, so it can be called from a user datastore procedure. Unlike CTX_DOC.FILTER, it does not require an index.
Requirements Because CTX_DOC.IFILTER employs the safe callout mechanism, the SQL*Net listener must be running and configured for extproc agent startup.
Syntax CTX_DOC.IFILTER(data IN BLOB, text IN OUT NOCOPY CLOB);
data
Specify the binary data to be filtered.
text
Specify the destination CLOB. The filtered data is placed here. This parameter must be a valid CLOB locator that is writable. Passing NULL or a non-writable CLOB results in an error. Filtered text is appended to the end of existing content, if any.
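Example
The following is a minimal sketch of a call to IFILTER. The table my_docs and its columns id and content are hypothetical names used only for illustration. Because the text parameter must already be a valid, writable locator, the sketch allocates a temporary CLOB before the call.

```sql
set serveroutput on
declare
  bdata blob;
  txt   clob;
begin
  -- my_docs, id, and content are hypothetical names for a table of binary documents.
  select content into bdata from my_docs where id = 1;
  -- IFILTER does not allocate the destination, so create a temporary CLOB first.
  dbms_lob.createtemporary(txt, TRUE);
  ctx_doc.ifilter(bdata, txt);
  dbms_output.put_line('Filtered length: ' || dbms_lob.getlength(txt));
  -- Release the temporary CLOB when finished.
  dbms_lob.freetemporary(txt);
end;
/
```

Since filtered text is appended to existing content, starting from a freshly created temporary CLOB avoids mixing the output of separate calls.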
MARKUP The CTX_DOC.MARKUP procedure takes a query specification and a document textkey and returns a version of the document in which the query terms are marked-up. These marked-up terms are either the words that satisfy a word query or the themes that satisfy an ABOUT query. You can set the marked-up output to be either plaintext or HTML. You can use one of the pre-defined tagsets for marking highlighted terms, including a tag sequence that enables HTML navigation. You usually call CTX_DOC.MARKUP after a query, from which you identify the document to be processed. You can store the marked-up document either in memory or in a result table.
index_name
Specify the name of the index associated with the text column containing the document identified by textkey.
textkey
Specify the unique identifier (usually the primary key) for the document. The textkey parameter can be one of the following:
■ a single column primary key value
■ an encoded specification for a composite (multiple column) primary key. Use the CTX_DOC.PKENCODE function.
■ the rowid of the row containing the document
You toggle between primary key and rowid identification using CTX_DOC.SET_KEY_TYPE.
text_query
Specify the original query expression used to retrieve the document. If text_query includes wildcards, stemming, or fuzzy matching that results in stopwords being returned, MARKUP does not highlight the stopwords. If text_query contains the threshold operator, the operator is ignored. The MARKUP procedure always returns highlight information for the entire result set.
restab
You can specify that this procedure store the marked-up text to either a table or to an in-memory CLOB.
To store results to a table, specify the name of the table. The result table must exist before you call this procedure.
See Also: For more information about the structure of the markup result table, see "Markup Table" in Appendix A, "Result Tables".
To store results in memory, specify the name of the CLOB locator. If restab is NULL, a temporary CLOB is allocated and returned. You must de-allocate the locator after using it. If restab is not NULL, the CLOB is truncated before the operation.
query_id
Specify the identifier used to identify the row inserted into restab. When query_id is not specified or set to NULL, it defaults to 0. You must manually truncate the table specified in restab.
plaintext
Specify TRUE to generate a plaintext marked-up document. Specify FALSE to generate a marked-up HTML version of the document if you are using the INSO filter or indexing HTML documents.
tagset
Specify one of the following pre-defined tagsets. The second and third columns show how the four different tags are defined for each tagset:
Specify the character(s) inserted by MARKUP to indicate the start of a highlighted term. The sequence of starttag, endtag, prevtag and nexttag with respect to the highlighted word is as follows: ... prevtag starttag word endtag nexttag...
endtag
Specify the character(s) inserted by MARKUP to indicate the end of a highlighted term.
prevtag
Specify the markup sequence that defines the tag that navigates the user to the previous highlight. In the markup sequences prevtag and nexttag, you can specify the following offset variables which are set dynamically:
Offset Variable   Value
%CURNUM           the current offset number
%PREVNUM          the previous offset number
%NEXTNUM          the next offset number
See the description of the HTML_NAVIGATE tagset for an example.
nexttag
Specify the markup sequence that defines the tag that navigates the user to the next highlight tag.
Within the markup sequence, you can use the same offset variables you use for prevtag. See the explanation for prevtag and the HTML_NAVIGATE tagset for an example.
Examples
In-Memory Markup
The following code generates a marked-up document and stores it in memory. The code passes a NULL CLOB locator to MARKUP and then de-allocates the returned CLOB locator after using it.
set serveroutput on
declare
  mklob clob;
  amt number := 40;
  line varchar2(80);
begin
  ctx_doc.markup('myindex','1','dog & cat', mklob);
  -- mklob is NULL when passed in, so ctx_doc.markup will allocate a temporary
  -- CLOB for us and place the results there.
  dbms_lob.read(mklob, amt, 1, line);
  dbms_output.put_line('FIRST 40 CHARS ARE:'||line);
  -- have to de-allocate the temp lob
  dbms_lob.freetemporary(mklob);
end;
Markup Table
Create the highlight markup table to store the marked-up document as follows:
create table markuptab (query_id number, document clob);
Word Highlighting in HTML To create HTML highlight markup for the words dog or cat for document 23, issue the following statement:
Theme Highlighting in HTML
To create HTML highlight markup for the theme of politics for document 23, issue the following statement:
begin
  ctx_doc.markup(index_name => 'my_index',
                 textkey    => '23',
                 text_query => 'about(politics)',
                 restab     => 'markuptab',
                 query_id   => '1',
                 tagset     => 'HTML_DEFAULT');
end;
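Custom Start and End Tags
The starttag and endtag parameters described above can also be supplied directly. The following is a minimal sketch, assuming the markuptab table created earlier and assuming that explicitly supplied tags take the place of those defined by the tagset. The choice of bold tags and the query_id value of 2 are illustrative only.

```sql
begin
  -- Wrap each highlighted occurrence of dog in bold tags.
  ctx_doc.markup(index_name => 'my_index',
                 textkey    => '23',
                 text_query => 'dog',
                 restab     => 'markuptab',
                 query_id   => 2,
                 starttag   => '<b>',
                 endtag     => '</b>');
end;
/

-- Retrieve the marked-up document written by the call above.
select document from markuptab where query_id = 2;
```

Using a distinct query_id for each call keeps the rows apart, since the result table is not truncated automatically.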
PKENCODE The CTX_DOC.PKENCODE function converts a composite textkey list into a single string and returns the string. The string created by PKENCODE can be used as the primary key parameter textkey in other CTX_DOC procedures, such as CTX_DOC.THEMES and CTX_DOC.GIST.
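Syntax
CTX_DOC.PKENCODE(
  pk1  IN VARCHAR2,
  pk2  IN VARCHAR2 DEFAULT NULL,
  pk3  IN VARCHAR2 DEFAULT NULL,
  pk4  IN VARCHAR2 DEFAULT NULL,
  pk5  IN VARCHAR2 DEFAULT NULL,
  pk6  IN VARCHAR2 DEFAULT NULL,
  pk7  IN VARCHAR2 DEFAULT NULL,
  pk8  IN VARCHAR2 DEFAULT NULL,
  pk9  IN VARCHAR2 DEFAULT NULL,
  pk10 IN VARCHAR2 DEFAULT NULL,
  pk11 IN VARCHAR2 DEFAULT NULL,
  pk12 IN VARCHAR2 DEFAULT NULL,
  pk13 IN VARCHAR2 DEFAULT NULL,
  pk14 IN VARCHAR2 DEFAULT NULL,
  pk15 IN VARCHAR2 DEFAULT NULL,
  pk16 IN VARCHAR2 DEFAULT NULL)
RETURN VARCHAR2;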
Each PK argument specifies a column element in the composite textkey list. You can encode at most 16 column elements.
Returns String that represents the encoded value of the composite textkey.
Examples
begin
  ctx_doc.gist('newsindex',CTX_DOC.PKENCODE('smith', 14), 'CTX_GIST');
end;
In this example, smith and 14 constitute the composite textkey value for the document.
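Because PKENCODE returns an ordinary string, the encoded key can also be stored in a variable and reused. The following is a minimal sketch under the same assumptions as the example above (index newsindex, result table CTX_GIST); the variable name enc_key and its declared length are illustrative.

```sql
set serveroutput on
declare
  enc_key varchar2(2000);
begin
  -- Encode the two-column key once.
  enc_key := ctx_doc.pkencode('smith', 14);
  dbms_output.put_line('Encoded textkey: ' || enc_key);
  -- Make sure textkey values are interpreted as primary keys, then reuse the key.
  ctx_doc.set_key_type('PRIMARY_KEY');
  ctx_doc.gist('newsindex', enc_key, 'CTX_GIST');
end;
/
```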
SET_KEY_TYPE Use this procedure to set the CTX_DOC procedures to accept either the ROWID or the PRIMARY_KEY document identifiers. This setting affects the invoking session only.
Syntax ctx_doc.set_key_type(key_type in varchar2);
key_type
Specify either ROWID or PRIMARY_KEY as the input key type (document identifier) for CTX_DOC procedures. This parameter defaults to the value of the CTX_DOC_KEY_TYPE system parameter.
Note: When your base table has no primary key, setting key_type to PRIMARY_KEY is ignored. The textkey parameter you specify for any CTX_DOC procedure is interpreted as a ROWID.
Example
To set CTX_DOC procedures to accept primary key document identifiers, do the following:
begin
  ctx_doc.set_key_type('PRIMARY_KEY');
end;
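The opposite setting can be sketched in the same way. The following block switches the session to ROWID identification and passes a rowid as the textkey. The table news and its column article_id are hypothetical names for the base table of the newsindex index.

```sql
set serveroutput on
declare
  v_rowid    varchar2(30);
  the_tokens ctx_doc.token_tab;
begin
  -- news and article_id are hypothetical names used only for illustration.
  select rowidtochar(rowid) into v_rowid from news where article_id = 34;
  -- From here on, textkey values in this session are treated as rowids.
  ctx_doc.set_key_type('ROWID');
  ctx_doc.tokens('newsindex', v_rowid, the_tokens);
  dbms_output.put_line(the_tokens.count || ' tokens found');
  -- Switch back so that later calls in this session accept primary key values.
  ctx_doc.set_key_type('PRIMARY_KEY');
end;
/
```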
THEMES Use the CTX_DOC.THEMES procedure to generate a list of themes for a document. You can store each theme as a row in either a result table or an in-memory PL/SQL table you specify.
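Syntax: Result Table Storage
CTX_DOC.THEMES(
  index_name  IN VARCHAR2,
  textkey     IN VARCHAR2,
  restab      IN VARCHAR2,
  query_id    IN NUMBER DEFAULT 0,
  full_themes IN BOOLEAN DEFAULT FALSE,
  num_themes  IN NUMBER DEFAULT 50);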
index_name
Specify the name of the index for the text column.
textkey
Specify the unique identifier (usually the primary key) for the document. The textkey parameter can be one of the following:
■ a single column primary key value
■ an encoded specification for a composite (multiple column) primary key. When textkey is a composite key, you must encode the composite textkey string using the CTX_DOC.PKENCODE function.
■ the rowid of the row containing the document
You toggle between primary key and rowid identification using CTX_DOC.SET_KEY_TYPE.
restab
You can specify that this procedure store results to either a table or to an in-memory PL/SQL table. To store results in a table, specify the name of the table.
See Also: For more information about the structure of the theme result table, see "Theme Table" in Appendix A, "Result Tables".
To store results in an in-memory table, specify the name of the in-memory table of type THEME_TAB. The THEME_TAB datatype is defined as follows:
type theme_rec is record (
  theme varchar2(2000),
  weight number
);
type theme_tab is table of theme_rec index by binary_integer;
CTX_DOC.THEMES clears the THEME_TAB you specify before the operation.
query_id
Specify the identifier used to identify the row(s) inserted into restab.
full_themes
Specify whether this procedure generates a single theme or a hierarchical list of parent themes (full themes) for each document theme. Specify TRUE for this procedure to write full themes to the THEME column of the result table. Specify FALSE for this procedure to write single theme information to the THEME column of the result table. This is the default.
num_themes
Specify the number of themes to retrieve. For example, if you specify 10, the top 10 themes are returned for the document. The default is 50. If you specify 0 or NULL, this procedure returns all themes in a document. If the document contains more than 50 themes, only the top 50 themes show conceptual hierarchy.
Examples
In-Memory Themes
The following example generates the top 10 themes for document 1 and stores them in an in-memory table called the_themes. The example then loops through the table to display the document themes.
declare
  the_themes ctx_doc.theme_tab;
begin
  ctx_doc.themes('myindex','1',the_themes, num_themes => 10);
  for i in 1..the_themes.count loop
    dbms_output.put_line(the_themes(i).theme||':'||the_themes(i).weight);
  end loop;
end;
Theme Table
The following example creates a theme table called CTX_THEMES:
create table CTX_THEMES (query_id number, theme varchar2(2000), weight number);
Single Themes
To obtain a list of the top 20 themes where each element in the list is a single theme, issue a statement like the following:
begin
  ctx_doc.themes('newsindex','34','CTX_THEMES',1,full_themes => FALSE, num_themes => 20);
end;
Full Themes
To obtain a list of the top 20 themes where each element in the list is a hierarchical list of parent themes, issue a statement like the following:
begin
  ctx_doc.themes('newsindex','34','CTX_THEMES',1,full_themes => TRUE, num_themes => 20);
end;
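Reading the Theme Table
After either call, the themes can be read back from the result table. The following query is a minimal sketch, assuming the CTX_THEMES table created above and the query_id value of 1 used in the preceding calls. Ordering by weight is a presentation choice, not something the procedure requires.

```sql
-- Show the stored themes for query_id 1, heaviest first.
select theme, weight
from CTX_THEMES
where query_id = 1
order by weight desc;
```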
TOKENS Use this procedure to identify all text tokens in a document. The tokens returned are those tokens which are inserted into the index. This feature is useful for implementing document classification, routing, or clustering. Stopwords are not returned. Section tags are not returned because they are not text tokens.
Syntax 1: In-Memory Table Storage
CTX_DOC.TOKENS(
  index_name IN VARCHAR2,
  textkey    IN VARCHAR2,
  restab     IN OUT NOCOPY TOKEN_TAB);
Syntax 2: Result Table Storage
CTX_DOC.TOKENS(
  index_name IN VARCHAR2,
  textkey    IN VARCHAR2,
  restab     IN VARCHAR2,
  query_id   IN NUMBER DEFAULT 0);
index_name
Specify the name of the index for the text column.
textkey
Specify the unique identifier (usually the primary key) for the document. The textkey parameter can be one of the following:
■ a single column primary key value
■ an encoded specification for a composite (multiple column) primary key. To encode a composite textkey, use the CTX_DOC.PKENCODE function.
■ the rowid of the row containing the document
You toggle between primary key and rowid identification using CTX_DOC.SET_KEY_TYPE.
restab
You can specify that this procedure store results to either a table or to an in-memory PL/SQL table.
The tokens returned are those tokens which are inserted into the index for the document (or row) named with textkey. Stopwords are not returned. Section tags are not returned because they are not text tokens.
Specifying a Token Table
To store results to a table, specify the name of the table. Token tables can be named anything, but must include the following columns, with names and data types as specified.
Table 8–1
Column Name   Type           Description
QUERY_ID      NUMBER         The identifier for the results generated by a particular call to CTX_DOC.TOKENS (only populated when the table is used to store results from multiple TOKENS calls)
TOKEN         VARCHAR2(64)   The token string in the text.
OFFSET        NUMBER         The position of the token in the document, relative to the start of the document, which has a position of 1.
LENGTH        NUMBER         The character length of the token.
Specifying an In-Memory Table
To store results to an in-memory table, specify the name of the in-memory table of type TOKEN_TAB. The TOKEN_TAB datatype is defined as follows:
type token_rec is record (
  token varchar2(64),
  offset number,
  length number
);
type token_tab is table of token_rec index by binary_integer;
CTX_DOC.TOKENS clears the TOKEN_TAB you specify before the operation.
query_id
Specify the identifier used to identify the row(s) inserted into restab.
Examples
In-Memory Tokens
The following example generates the tokens for document 1 and stores them in an in-memory table, declared as the_tokens. The example then loops through the table to display the document tokens.
declare
  the_tokens ctx_doc.token_tab;
begin
  ctx_doc.tokens('myindex','1',the_tokens);
  for i in 1..the_tokens.count loop
    dbms_output.put_line(the_tokens(i).token);
  end loop;
end;
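Token Table
The result-table form of TOKENS can be sketched as follows. The table name CTX_TOKENS is an assumption made for illustration; the columns follow the token table structure described above, and the index and document are the same as in the in-memory example.

```sql
-- Token table with the required column names and data types.
create table CTX_TOKENS (query_id number,
                         token    varchar2(64),
                         offset   number,
                         length   number);

begin
  -- Result-table form: pass the table name and a query_id.
  ctx_doc.tokens('myindex', '1', 'CTX_TOKENS', 1);
end;
/

-- List the tokens of document 1 in document order.
select t.token, t.offset, t.length
from CTX_TOKENS t
where t.query_id = 1
order by t.offset;
```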
9 CTX_OUTPUT Package
This chapter provides reference information for using the CTX_OUTPUT PL/SQL package. CTX_OUTPUT contains the following stored procedures:
Name           Description
ADD_EVENT      Add an event to the index log.
END_LOG        Halts logging of index and document services requests.
LOGFILENAME    Returns the name of the current log file.
REMOVE_EVENT   Remove an event from the index log.
START_LOG      Starts logging index and document service requests.
ADD_EVENT
Use this procedure to add an event to the index log for more detailed log output. Currently the only event you can add is CTX_OUTPUT.EVENT_INDEX_PRINT_ROWID, which logs the rowid of each row after it is indexed. This is useful for debugging a failed index operation.
Syntax CTX_OUTPUT.ADD_EVENT(event in varchar2);
event
Specify the type of index event to log. Currently the only event you can add is the CTX_OUTPUT.EVENT_INDEX_PRINT_ROWID which logs the rowid of each row after it is indexed.
Example begin CTX_OUTPUT.ADD_EVENT(CTX_OUTPUT.EVENT_INDEX_PRINT_ROWID); end;
END_LOG
Halts logging of index and document service requests.
Syntax CTX_OUTPUT.END_LOG;
Example begin CTX_OUTPUT.END_LOG; end;
LOGFILENAME Returns the filename for the current log. This procedure looks for the log file in the directory specified by the LOG_DIRECTORY system parameter.
Syntax CTX_OUTPUT.LOGFILENAME RETURN VARCHAR2;
Returns Log file name.
Example
declare
  logname varchar2(100);
begin
  logname := CTX_OUTPUT.LOGFILENAME;
  dbms_output.put_line('The current log file is: '||logname);
end;
REMOVE_EVENT Use this procedure to remove an event from the index log.
Syntax CTX_OUTPUT.REMOVE_EVENT(event in varchar2);
event
Specify the type of index event to remove from the log. Currently the only event you can add and remove is the CTX_OUTPUT.EVENT_INDEX_PRINT_ROWID.
Example begin CTX_OUTPUT.REMOVE_EVENT(CTX_OUTPUT.EVENT_INDEX_PRINT_ROWID); end;
START_LOG Begin logging index and document service requests.
Syntax CTX_OUTPUT.START_LOG(logfile in varchar2);
logfile
Specify the name of the log file. The log is stored in the directory specified by the system parameter LOG_DIRECTORY.
Example
begin
  CTX_OUTPUT.START_LOG('mylog1');
end;
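The CTX_OUTPUT procedures are typically used together around an operation you want to diagnose. The following is a minimal sketch of such a session, using only the calls described in this chapter. The log name mylog1 is taken from the example above, and the placeholder comment marks where the index operation of interest would run.

```sql
set serveroutput on
begin
  -- Start logging and ask for the rowid of each indexed row to be recorded.
  ctx_output.start_log('mylog1');
  ctx_output.add_event(ctx_output.event_index_print_rowid);
  dbms_output.put_line('Logging to: ' || ctx_output.logfilename);

  -- ... run the index operation to be diagnosed here ...

  -- Remove the event and stop logging.
  ctx_output.remove_event(ctx_output.event_index_print_rowid);
  ctx_output.end_log;
end;
/
```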
10 CTX_QUERY Package
This chapter describes the CTX_QUERY PL/SQL package you can use for generating query feedback, counting hits, and creating stored query expressions.
Note: You can use this package only when your index type is CONTEXT. This package does not support the CTXCAT index type.
The CTX_QUERY package includes the following procedures and functions:
Name           Description
BROWSE_WORDS   Returns the words around a seed word in the index.
COUNT_HITS     Returns the number of hits for a query.
EXPLAIN        Generates query expression parse and expansion information.
HFEEDBACK      Generates hierarchical query feedback information (broader term, narrower term, and related term).
REMOVE_SQE     Removes a specified stored query expression from the SQL tables.
STORE_SQE      Executes a query and stores the results in stored query expression tables.
BROWSE_WORDS
This procedure enables you to browse words in an Oracle Text index. You specify a seed word and BROWSE_WORDS returns the words around it in the index, and a rough count of the number of documents that contain each word. This feature is useful for refining queries. You can identify the following:
■ unselective words (words that have low document count)
■ misspelled words in the document set
Syntax 1: To Store Results in Table
ctx_query.browse_words(
  index_name IN VARCHAR2,
  seed       IN VARCHAR2,
  restab     IN VARCHAR2,
  browse_id  IN NUMBER DEFAULT 0,
  numwords   IN NUMBER DEFAULT 10,
  direction  IN VARCHAR2 DEFAULT BROWSE_AROUND,
  part_name  IN VARCHAR2 DEFAULT NULL
);
Syntax 2: To Store Results in Memory
ctx_query.browse_words(
  index_name IN VARCHAR2,
  seed       IN VARCHAR2,
  resarr     IN OUT BROWSE_TAB,
  numwords   IN NUMBER DEFAULT 10,
  direction  IN VARCHAR2 DEFAULT BROWSE_AROUND,
  part_name  IN VARCHAR2 DEFAULT NULL
);
index_name
Specify the name of the index. You can specify schema.name. Must be a local index.
seed
Specify the seed word. This word is lexed before browse expansion. The word need not exist in the token table. seed must be a single word. Using multiple words as the seed results in an error.
restab
Specify the name of the result table. You can enter restab as schema.name. The table must exist before you call this procedure, and you must have INSERT permissions on the table. This table must have the following schema.
Column      Datatype
browse_id   number
word        varchar2(64)
doc_count   number
Existing rows in restab are not deleted before BROWSE_WORDS is called.
resarr
Specify the name of the result array. resarr is of type ctx_query.browse_tab.
type browse_rec is record (
  word varchar2(64),
  doc_count number
);
type browse_tab is table of browse_rec index by binary_integer;
browse_id
Specify a numeric identifier between 0 and 2^32. The rows produced for this browse have this value in the browse_id column in restab. When you do not specify browse_id, it defaults to 0.
numwords
Specify the number of words returned.
direction
Specify the direction for the browse. You can specify one of:
Value    Behavior
BEFORE   Browse seed word and words alphabetically before the seed.
AROUND   Browse seed word and words alphabetically before and after the seed.
AFTER    Browse seed word and words alphabetically after the seed.
Symbols CTX_QUERY.BROWSE_BEFORE, CTX_QUERY.BROWSE_AROUND, and CTX_QUERY.BROWSE_AFTER are defined for these literal values as well.
part_name
Specify the name of the index partition to browse.
Example
Browsing Words with Result Table
begin
  ctx_query.browse_words('myindex','dog','myres',numwords=>5,direction=>'AROUND');
end;
select word, doc_count from myres order by word;
WORD      DOC_COUNT
--------  ---------
CZAR             15
DARLING           5
DOC              73
DUNK            100
EAR               3
Browsing Words with Result Array
set serveroutput on;
declare
  resarr ctx_query.browse_tab;
begin
  ctx_query.browse_words('myindex','dog',resarr,5,CTX_QUERY.BROWSE_AROUND);
  for i in 1..resarr.count loop
    dbms_output.put_line(resarr(i).word || ':' || resarr(i).doc_count);
  end loop;
end;
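Creating the Result Table
The result table example assumes that myres already exists. The following is a minimal sketch of creating it with the required schema and of using browse_id to keep the rows from separate browses apart, since existing rows are not deleted. The browse_id value of 7 is arbitrary.

```sql
-- Result table with the schema required by BROWSE_WORDS.
create table myres (browse_id number,
                    word      varchar2(64),
                    doc_count number);

begin
  -- Tag the rows from this browse with browse_id 7.
  ctx_query.browse_words('myindex', 'dog', 'myres',
                         browse_id => 7,
                         numwords  => 5,
                         direction => 'AFTER');
end;
/

-- Read back only the rows produced by that browse.
select word, doc_count
from myres
where browse_id = 7
order by word;
```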
COUNT_HITS Returns the number of hits for the specified query. You can call COUNT_HITS in exact or estimate mode. Exact mode returns the exact number of hits for the query. Estimate mode returns an upper-bound estimate but runs faster than exact mode.
Syntax CTX_QUERY.COUNT_HITS ( index_name IN VARCHAR2, text_query IN VARCHAR2, exact IN BOOLEAN DEFAULT TRUE, part_name IN VARCHAR2 DEFAULT NULL ) RETURN NUMBER;
index_name
Specify the index name. text_query
Specify the query. exact
Specify TRUE for an exact count. Specify FALSE for an upper-bound estimate. Specifying FALSE returns a less accurate number but runs faster. part_name
Specify the name of the index partition to query.
Notes If the query contains structured criteria, you should use SELECT COUNT(*).
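Example
The following is a minimal sketch that calls COUNT_HITS in both modes for the same query. The index name newsindex and the query term dog are borrowed from earlier examples in this document and are assumptions here.

```sql
set serveroutput on
declare
  exact_hits number;
  est_hits   number;
begin
  -- Exact mode: returns the exact number of hits.
  exact_hits := ctx_query.count_hits('newsindex', 'dog', TRUE);
  -- Estimate mode: returns an upper-bound estimate but runs faster.
  est_hits := ctx_query.count_hits('newsindex', 'dog', FALSE);
  dbms_output.put_line('Exact: ' || exact_hits || ', estimate: ' || est_hits);
end;
/
```

Because the estimate is an upper bound, it can be used as a quick check before deciding whether the exact count is worth computing.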
EXPLAIN
Use CTX_QUERY.EXPLAIN to generate explain plan information for a query expression. The EXPLAIN plan provides a graphical representation of the parse tree for a Text query expression. This information is stored in a result table. This procedure does not execute the query. Instead, this procedure can tell you how a query is expanded and parsed before you issue the query. This is especially useful for stem, wildcard, thesaurus, fuzzy, soundex, or about queries. Parse trees also show the following information:
■ order of execution (precedence of operators)
■ ABOUT query normalization
■ query expression optimization
■ stop-word transformations
■ breakdown of composite-word tokens
Knowing how Oracle evaluates a query is useful for refining and debugging queries. You can also design your application so that it uses the explain plan information to help users write better queries.
Limitation You cannot use EXPLAIN with remote queries.
Specify the query expression to be used as criteria for selecting rows. When you include a wildcard, fuzzy, or soundex operator in text_query, this procedure looks at the index tables to determine the expansion. Wildcard, fuzzy (?), and soundex (!) expression feedback does not account for lazy deletes as in regular queries.
explain_table
Specify the name of the table used to store representation of the parse tree for text_query. You must have at least INSERT and DELETE privileges on the table used to store the results from EXPLAIN.
See Also: For more information about the structure of the explain table, see "EXPLAIN Table" in Appendix A, "Result Tables".
sharelevel
Specify whether explain_table is shared by multiple EXPLAIN calls. Specify 0 for exclusive use and 1 for shared use. This parameter defaults to 0 (single-use). When you specify 0, the system automatically truncates the result table before the next call to EXPLAIN. When you specify 1 for shared use, this procedure does not truncate the result table. Only results with the same explain_id are updated. When no results with the same explain_id exist, new results are added to the EXPLAIN table.
explain_id
Specify a name that identifies the explain results returned by an EXPLAIN procedure when more than one EXPLAIN call uses the same shared EXPLAIN table. This parameter defaults to NULL.
part_name
Specify the name of the index partition to query.
Example
Creating the Explain Table
To create an explain table called test_explain, for example, use the following SQL statement:
create table test_explain(
  explain_id varchar2(30),
  id number,
  parent_id number,
  operation varchar2(30),
  options varchar2(30),
  object_name varchar2(64),
  position number,
  cardinality number);
Executing CTX_QUERY.EXPLAIN
To obtain the expansion of a query expression such as comp% OR ?smith, use CTX_QUERY.EXPLAIN as follows:
ctx_query.explain(
  index_name    => 'newindex',
  text_query    => 'comp% OR ?smith',
  explain_table => 'test_explain',
  sharelevel    => 0,
  explain_id    => 'Test');
Retrieving Data from Explain Table
To read the explain table, you can select the columns as follows:
select explain_id, id, parent_id, operation, options, object_name, position
from test_explain order by id;
The output is ordered by ID to simulate a hierarchical query:
EXPLAIN_ID  ID   PARENT_ID OPERATION    OPTIONS OBJECT_NAME POSITION
----------- ---- --------- ------------ ------- ----------- --------
Test        1    0         OR           NULL    NULL        1
Test        2    1         EQUIVALENCE  NULL    COMP%       1
Test        3    2         WORD         NULL    COMPTROLLER 1
Test        4    2         WORD         NULL    COMPUTER    2
Test        5    1         EQUIVALENCE  (?)     SMITH       2
Test        6    5         WORD         NULL    SMITH       1
Test        7    5         WORD         NULL    SMYTHE      2
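The parse tree can also be walked with a true hierarchical query, using the ID and PARENT_ID columns. The following is a minimal sketch, assuming the test_explain table and the explain_id of Test from the example above, and assuming (as in the sample output) that the root node has a PARENT_ID of 0. The indentation width is an arbitrary presentation choice.

```sql
-- Indent each operation by its depth in the parse tree.
select lpad(' ', 2 * (level - 1)) || operation as operation,
       options,
       object_name,
       position
from test_explain
start with parent_id = 0 and explain_id = 'Test'
connect by prior id = parent_id and prior explain_id = explain_id
order siblings by position;
```

Matching explain_id across parent and child rows keeps the tree intact when the explain table is shared by several EXPLAIN calls.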
HFEEDBACK
In English or French, this procedure generates hierarchical query feedback information (broader term, narrower term, and related term) for the specified query. Broader term, narrower term, and related term information is obtained from the knowledge base. However, only knowledge base terms that are also in the index are returned as query feedback information. This increases the chances that terms returned from HFEEDBACK produce hits over the currently indexed document set. Hierarchical query feedback information is useful for suggesting other query terms to the user.
Note: CTX_QUERY.HFEEDBACK is only supported in English and French.
index_name
Specify the name of the index for the text column to be queried.
text_query
Specify the query expression to be used as criteria for selecting rows.
feedback_table
Specify the name of the table used to store the feedback terms.
See Also: For more information about the structure of the hfeedback table, see "HFEEDBACK Table" in Appendix A, "Result Tables".
sharelevel
Specify whether feedback_table is shared by multiple HFEEDBACK calls. Specify 0 for exclusive use and 1 for shared use. This parameter defaults to 0 (single-use). When you specify 0, the system automatically truncates the feedback table before the next call to HFEEDBACK. When you specify 1 for shared use, this procedure does not truncate the feedback table. Only results with the same feedback_id are updated. When no results with the same feedback_id exist, new results are added to the feedback table.
feedback_id
Specify a value that identifies the feedback results returned by a call to HFEEDBACK when more than one HFEEDBACK call uses the same shared feedback table. This parameter defaults to NULL.
part_name
Specify the name of the index partition to query.
Example
Create HFEEDBACK Result Table
Create a result table to use with CTX_QUERY.HFEEDBACK as follows:
CREATE TABLE restab (
  feedback_id VARCHAR2(30),
  id NUMBER,
  parent_id NUMBER,
  operation VARCHAR2(30),
  options VARCHAR2(30),
  object_name VARCHAR2(80),
  position NUMBER,
  bt_feedback ctx_feedback_type,
  rt_feedback ctx_feedback_type,
  nt_feedback ctx_feedback_type
) NESTED TABLE bt_feedback STORE AS res_bt
  NESTED TABLE rt_feedback STORE AS res_rt
  NESTED TABLE nt_feedback STORE AS res_nt;
CTX_FEEDBACK_TYPE is a system-defined type in the CTXSYS schema.
See Also: For more information about the structure of the hfeedback table, see "HFEEDBACK Table" in Appendix A, "Result Tables".
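Call CTX_QUERY.HFEEDBACK
The following code calls the HFEEDBACK procedure with the query computer industry. The index name my_index is a placeholder; substitute the name of your own index.

  BEGIN
    ctx_query.hfeedback (index_name     => 'my_index',
                         text_query     => 'computer industry',
                         feedback_table => 'restab',
                         sharelevel     => 0,
                         feedback_id    => 'query10'
    );
  END;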
Select From the Result Table
The following code extracts the feedback data from the result table. It extracts broader term, narrower term, and related term feedback separately from the nested tables.

  DECLARE
    i NUMBER;
  BEGIN
    FOR frec IN (
      SELECT object_name, bt_feedback, rt_feedback, nt_feedback
        FROM restab
       WHERE feedback_id = 'query10' AND object_name IS NOT NULL
    ) LOOP
      dbms_output.put_line('Broader term feedback for ' || frec.object_name || ':');
      i := frec.bt_feedback.FIRST;
      WHILE i IS NOT NULL LOOP
        dbms_output.put_line(frec.bt_feedback(i).text);
        i := frec.bt_feedback.NEXT(i);
      END LOOP;

      dbms_output.put_line('Related term feedback for ' || frec.object_name || ':');
      i := frec.rt_feedback.FIRST;
      WHILE i IS NOT NULL LOOP
        dbms_output.put_line(frec.rt_feedback(i).text);
        i := frec.rt_feedback.NEXT(i);
      END LOOP;

      dbms_output.put_line('Narrower term feedback for ' || frec.object_name || ':');
      i := frec.nt_feedback.FIRST;
      WHILE i IS NOT NULL LOOP
        dbms_output.put_line(frec.nt_feedback(i).text);
        i := frec.nt_feedback.NEXT(i);
      END LOOP;
    END LOOP;
  END;
Sample Output
The following output is for the example above, which queries on computer industry:

  Broader term feedback for computer industry:
  hard sciences
  Related term feedback for computer industry:
  computer networking
  electronics
  knowledge
  library science
  mathematics
  optical technology
  robotics
  satellite technology
  semiconductors and superconductors
  symbolic logic
  telecommunications industry
  Narrower term feedback for computer industry:
  ABEND - abnormal end of task
  AT&T Starlans
  ATI Technologies, Incorporated
  ActivCard
  Actrade International Ltd.
  Alta Technology
  Amiga Format
  Amiga Library Services
  Amiga Shopper
  Amstrat Action
  Apple Computer, Incorporated
  ..
Note: The HFEEDBACK information you obtain depends on the contents of your index and knowledge base and as such might differ from the output above.
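The sharelevel and feedback_id parameters described above can be combined so that several queries write to one feedback table. The following is a minimal sketch, assuming the restab table from the example above and a hypothetical index named my_index; the second query string is also only an illustration.

```sql
-- Sketch only: my_index is a hypothetical index name.
-- restab is the result table created in the example above.
BEGIN
  ctx_query.hfeedback(
    index_name     => 'my_index',
    text_query     => 'computer industry',
    feedback_table => 'restab',
    sharelevel     => 1,           -- shared use: restab is not truncated
    feedback_id    => 'query10');

  ctx_query.hfeedback(
    index_name     => 'my_index',
    text_query     => 'telecommunications industry',
    feedback_table => 'restab',
    sharelevel     => 1,
    feedback_id    => 'query11');  -- distinct id keeps the results apart
END;
/

-- Rows from both calls coexist and are told apart by feedback_id.
SELECT feedback_id, object_name
  FROM restab
 WHERE feedback_id IN ('query10', 'query11')
   AND object_name IS NOT NULL
 ORDER BY feedback_id;
```

With sharelevel set to 0 instead, each call would start from an empty table, so only the most recent query's feedback would be available.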
REMOVE_SQE
The CTX_QUERY.REMOVE_SQE procedure removes the specified stored query expression.

Syntax
  CTX_QUERY.REMOVE_SQE(query_name IN VARCHAR2);

query_name
Specify the name of the stored query expression to be removed.

Examples
  begin
    ctx_query.remove_sqe('disasters');
  end;
STORE_SQE
This procedure creates a stored query expression. Only the query definition is stored.

Supported Operators
Stored query expressions support all of the CONTAINS query operators. Stored query expressions also support all of the special characters and other components that can be used in a query expression, including other stored query expressions.

Privileges
Users can create and remove stored query expressions that they own. Users can use stored query expressions owned by anyone. The CTXSYS user can create or remove stored query expressions for any user.

Syntax
  CTX_QUERY.STORE_SQE(query_name IN VARCHAR2,
                      text_query IN VARCHAR2);

query_name
Specify the name of the stored query expression to be created. If you are CTXSYS, you can specify this as user.name.

text_query
Specify the query expression to be associated with query_name.

Examples
  begin
    ctx_query.store_sqe('disasters', 'hurricanes | earthquakes');
  end;
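Once stored, the expression can be referenced from a CONTAINS query with the SQE operator. The following is a sketch only: the table news and its columns id and text are hypothetical, and a CONTEXT index on news.text is assumed.

```sql
-- Hypothetical table news(id, text) with a CONTEXT index on text.
BEGIN
  ctx_query.store_sqe('disasters', 'hurricanes | earthquakes');
END;
/

-- sqe(disasters) expands to the stored definition at query time.
SELECT id
  FROM news
 WHERE CONTAINS(text, 'sqe(disasters)') > 0;

-- Remove the stored query expression when it is no longer needed.
BEGIN
  ctx_query.remove_sqe('disasters');
END;
/
```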
11 CTX_REPORT
This chapter describes how to use the CTX_REPORT package to create various index reports. This chapter contains the following topics:
■ Procedures in CTX_REPORT
■ Using the Function Versions

Procedures in CTX_REPORT
The CTX_REPORT package contains the following procedures:

Name                    Description
DESCRIBE_INDEX          Creates a report describing the index.
DESCRIBE_POLICY         Creates a report describing a policy.
CREATE_INDEX_SCRIPT     Creates a SQL*Plus script to duplicate the named index.
CREATE_POLICY_SCRIPT    Creates a SQL*Plus script to duplicate the named policy.
INDEX_SIZE              Creates a report to show the internal objects of an index, their tablespaces and used sizes.
INDEX_STATS             Creates a report to show the various statistics of an index.
TOKEN_INFO              Creates a report showing the information for a token, decoded.
TOKEN_TYPE              Translates a name and returns a numeric token type.
Using the Function Versions
Some of the procedures in the CTX_REPORT package have function variants. You can call these functions as follows:

  select ctx_report.describe_index('MYINDEX') from dual;

In SQL*Plus, to generate an output file to send to support, you can do:

  set long 64000
  set pages 0
  set heading off
  set feedback off
  spool outputfile
  select ctx_report.describe_index('MYINDEX') from dual;
  spool off
DESCRIBE_INDEX
Creates a report describing the index. This includes the settings of the index meta-data, the indexing objects used, the settings of the attributes of the objects, and index partition descriptions, if any.

You can call this operation as a procedure with an IN OUT CLOB parameter or as a function that returns the report as a CLOB.

Syntax
  procedure CTX_REPORT.DESCRIBE_INDEX(
    index_name IN VARCHAR2,
    report     IN OUT NOCOPY CLOB
  );

  function CTX_REPORT.DESCRIBE_INDEX(
    index_name IN VARCHAR2
  ) return CLOB;

index_name
Specify the name of the index to describe.

report
Specify the CLOB locator to which to write the report. If report is NULL, a session-duration temporary CLOB will be created and returned. It is the caller’s responsibility to free this temporary CLOB as needed. The report CLOB will be truncated before report is generated, so any existing contents will be overwritten by this call.
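The procedure form and the caller's responsibility for the temporary CLOB can be illustrated as follows. This is a minimal sketch, assuming an index named MYINDEX (the placeholder name used elsewhere in this chapter) and using the standard DBMS_LOB package to read and free the CLOB.

```sql
SET SERVEROUTPUT ON
DECLARE
  rpt CLOB;  -- NULL on entry, so a session-duration temporary CLOB is created
BEGIN
  ctx_report.describe_index('MYINDEX', rpt);

  -- Show the size of the report and its first 200 characters.
  dbms_output.put_line('report length: ' || dbms_lob.getlength(rpt));
  dbms_output.put_line(dbms_lob.substr(rpt, 200, 1));

  -- The caller frees the temporary CLOB once it is no longer needed.
  dbms_lob.freetemporary(rpt);
END;
/
```

The same pattern applies to the other CTX_REPORT procedures that take a report parameter.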
DESCRIBE_POLICY
Creates a report describing the policy. This includes the settings of the policy meta-data, the indexing objects used, and the settings of the attributes of the objects.

You can call this operation as a procedure with an IN OUT CLOB parameter or as a function that returns the report as a CLOB.

Syntax
  procedure CTX_REPORT.DESCRIBE_POLICY(
    policy_name IN VARCHAR2,
    report      IN OUT NOCOPY CLOB
  );

  function CTX_REPORT.DESCRIBE_POLICY(
    policy_name IN VARCHAR2
  ) return CLOB;

policy_name
Specify the name of the policy to describe.

report
Specify the CLOB locator to which to write the report. If report is NULL, a session-duration temporary CLOB will be created and returned. It is the caller’s responsibility to free this temporary CLOB as needed. The report CLOB will be truncated before report is generated, so any existing contents will be overwritten by this call.
CREATE_INDEX_SCRIPT
Creates a SQL*Plus script which will create a text index that duplicates the named text index. The created script will include creation of preferences identical to those used in the named text index. However, the names of the preferences will be different.

You can call this operation as a procedure with an IN OUT CLOB parameter or as a function that returns the report as a CLOB.

Syntax
  procedure CTX_REPORT.CREATE_INDEX_SCRIPT(
    index_name      in varchar2,
    report          in out nocopy clob,
    prefname_prefix in varchar2 default null
  );

  function CTX_REPORT.CREATE_INDEX_SCRIPT(
    index_name      in varchar2,
    prefname_prefix in varchar2 default null
  ) return clob;

index_name
Specify the name of the index.

report
Specify the CLOB locator to which to write the script. If report is NULL, a session-duration temporary CLOB will be created and returned. It is the caller’s responsibility to free this temporary CLOB as needed. The report CLOB will be truncated before report is generated, so any existing contents will be overwritten by this call.

prefname_prefix
Specify the optional prefix to use for preference names. If prefname_prefix is omitted or NULL, the index name will be used. The prefname_prefix follows index length restrictions.
CREATE_POLICY_SCRIPT
Creates a SQL*Plus script which will create a text policy that duplicates the named text policy. The created script will include creation of preferences identical to those used in the named text policy.

You can call this operation as a procedure with an IN OUT CLOB parameter or as a function that returns the report as a CLOB.

Syntax
  procedure CTX_REPORT.CREATE_POLICY_SCRIPT(
    policy_name     in varchar2,
    report          in out nocopy clob,
    prefname_prefix in varchar2 default null
  );

  function CTX_REPORT.CREATE_POLICY_SCRIPT(
    policy_name     in varchar2,
    prefname_prefix in varchar2 default null
  ) return clob;

policy_name
Specify the name of the policy.

report
Specify the CLOB locator to which to write the script. If report is NULL, a session-duration temporary CLOB will be created and returned. It is the caller’s responsibility to free this temporary CLOB as needed. The report CLOB will be truncated before report is generated, so any existing contents will be overwritten by this call.

prefname_prefix
Specify the optional prefix to use for preference names. If prefname_prefix is omitted or NULL, the policy name will be used. The prefname_prefix follows policy length restrictions.
INDEX_SIZE
Creates a report showing the internal objects of the text index or text index partition, along with their tablespaces, allocated sizes, and used sizes.

You can call this operation as a procedure with an IN OUT CLOB parameter or as a function that returns the report as a CLOB.

Syntax
  procedure CTX_REPORT.INDEX_SIZE(
    index_name IN VARCHAR2,
    report     IN OUT NOCOPY CLOB,
    part_name  IN VARCHAR2 DEFAULT NULL
  );

  function CTX_REPORT.INDEX_SIZE(
    index_name IN VARCHAR2,
    part_name  IN VARCHAR2 DEFAULT NULL
  ) return clob;

index_name
Specify the name of the index to describe.

report
Specify the CLOB locator to which to write the report. If report is NULL, a session-duration temporary CLOB will be created and returned. It is the caller’s responsibility to free this temporary CLOB as needed. The report CLOB will be truncated before report is generated, so any existing contents will be overwritten by this call.

part_name
Specify the name of the index partition (optional). If part_name is NULL, and the index is a local partitioned text index, then all objects of all partitions will be displayed. If part_name is provided, then only the objects of that partition will be displayed.
INDEX_STATS
Creates a report showing various calculated statistics about the text index. This procedure will fully scan the text index tables, so it may take a long time to run for large indexes.

INDEX_STATS will create and use a session-duration temporary table, which will be created in the CTXSYS temporary tablespace.

Syntax
  procedure index_stats(
    index_name in varchar2,
    report     in out nocopy clob,
    part_name  in varchar2 default null,
    frag_stats in boolean default TRUE,
    list_size  in number default 100
  );

index_name
Specify the name of the index to describe. You can specify a CONTEXT, CTXCAT, CTXRULE, or CTXXPATH index.

report
Specify the CLOB locator to which to write the report. If report is NULL, a session-duration temporary CLOB will be created and returned. It is the caller’s responsibility to free this temporary CLOB as needed. The report CLOB will be truncated before report is generated, so any existing contents will be overwritten by this call.

part_name
Specify the name of the index partition. If the index is a local partitioned index, then part_name must be provided. INDEX_STATS will calculate the statistics for that index partition.

frag_stats
Specify TRUE to calculate fragmentation statistics. If frag_stats is FALSE, the report will not show any statistics relating to size of index data. However, the operation should take less time and resources to calculate the token statistics.

list_size
Specify the number of elements in each compiled list. list_size has a maximum value of 1000.
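The frag_stats and list_size parameters trade report detail for run time. The following sketch, which assumes a non-partitioned index with the placeholder name MYINDEX, requests token statistics only and shortens each compiled list to 20 entries.

```sql
SET SERVEROUTPUT ON
DECLARE
  rpt CLOB;  -- NULL on entry; INDEX_STATS creates a temporary CLOB
BEGIN
  ctx_report.index_stats(
    index_name => 'MYINDEX',  -- placeholder index name
    report     => rpt,
    frag_stats => FALSE,      -- skip the fragmentation statistics
    list_size  => 20);        -- 20 entries per list instead of the default 100

  dbms_output.put_line('report length: ' || dbms_lob.getlength(rpt));
  dbms_lob.freetemporary(rpt);
END;
/
```

INDEX_STATS appears here only in its procedure form because the syntax above lists no function variant for it.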
Example
The following is sample output for INDEX_STATS on a context index. This report has been truncated for clarity. It shows some of the token statistics and all of the fragmentation statistics.

The fragmentation statistics are at the end of the report. They show the optimal row fragmentation, an estimated amount of garbage data in the index, and a list of the most fragmented tokens. Running CTX_DDL.OPTIMIZE_INDEX cleans up the index.

  -----------------------------------------------------------------
                          TOKEN STATISTICS
  -----------------------------------------------------------------
  unique tokens:
  average $I rows per token:
  tokens with most $I rows:
    telecommunications industry (THEME)
    science and technology (THEME)
    EMAIL (FIELD SECTION "SOURCE")
    DEC (FIELD SECTION "TIMESTAMP")
    electronic mail (THEME)
    computer networking (THEME)
    communications (THEME)
    95 (FIELD SECTION "TIMESTAMP")
    15 (FIELD SECTION "TIMESTAMP")
    HEADLINE (ZONE SECTION)

  average size per token:
  tokens with largest size:
    T (NORMAL)
    SAID (NORMAL)
    HEADLINE (ZONE SECTION)
    NEW (NORMAL)
    I (NORMAL)
    MILLION (PREFIX)                            222
    D (NORMAL)                                  219
    MILLION (NORMAL)                            215
    U (NORMAL)                                  192
    DEC (FIELD SECTION "TIMESTAMP")             186

  average frequency per token:                 2.00
  most frequent tokens:
    HEADLINE (ZONE SECTION)
    DEC (FIELD SECTION "TIMESTAMP")
    95 (FIELD SECTION "TIMESTAMP")
    15 (FIELD SECTION "TIMESTAMP")
    T (NORMAL)
    D (NORMAL)
    881115 (THEME)
    881115 (NORMAL)
    I (NORMAL)
    geography (THEME)

  token statistics by type:
    token type:
      unique tokens:
      total rows:
      average rows:
      total size:
      average size:
      average frequency:
      most frequent tokens:
        T
        D
        881115
        I
        SAID
        C
        NEW
        MILLION
        FIRST
        COMPANY

    token type:
      unique tokens:
      total rows:
      average rows:
      total size:
      average size:
      average frequency:                       2.40
      most frequent tokens:
        881115                                   58
        political geography                      52
        geography                                52
        United States                            51
        business and economics                   50
        abstract ideas and concepts              48
        North America                            48
        science and technology                   46
        NKS                                      34
        nulls                                    34

The fragmentation portion of this report is as follows:

  -----------------------------------------------------------------
                      FRAGMENTATION STATISTICS
  -----------------------------------------------------------------
  total size of $I data:
  $I rows:
  estimated $I rows if optimal:
  estimated row fragmentation:
  garbage docids:
  estimated garbage size:
  most fragmented tokens:
    telecommunications industry (THEME)
    science and technology (THEME)
    EMAIL (FIELD SECTION "SOURCE")
    DEC (FIELD SECTION "TIMESTAMP")
    electronic mail (THEME)
    computer networking (THEME)
    communications (THEME)
    95 (FIELD SECTION "TIMESTAMP")
    HEADLINE (ZONE SECTION)
    15 (FIELD SECTION "TIMESTAMP")
TOKEN_INFO
Creates a report showing the information for a token, decoded. This procedure will fully scan the info for a token, so it may take a long time to run for very large tokens.

You can call this operation as a procedure with an IN OUT CLOB parameter or as a function that returns the report as a CLOB.

Syntax
  procedure CTX_REPORT.TOKEN_INFO(
    index_name   in varchar2,
    report       in out nocopy clob,
    token        in varchar2,
    token_type   in number,
    part_name    in varchar2 default null,
    raw_info     in boolean default FALSE,
    decoded_info in boolean default TRUE
  );

  function CTX_REPORT.TOKEN_INFO(
    index_name   in varchar2,
    token        in varchar2,
    token_type   in number,
    part_name    in varchar2 default null,
    raw_info     in varchar2 default 'N',
    decoded_info in varchar2 default 'Y'
  ) return clob;

index_name
Specify the name of the index.

report
Specify the CLOB locator to which to write the report. If report is NULL, a session-duration temporary CLOB will be created and returned. It is the caller’s responsibility to free this temporary CLOB as needed. The report CLOB will be truncated before report is generated, so any existing contents will be overwritten by this call.

token
Specify the token text. The token may be case-sensitive, depending on the passed-in token type.

token_type
Specify the token type. THEME, ZONE, ATTR, PATH, and PATH ATTR tokens are case-sensitive. Everything else gets passed through the lexer, so if the index’s lexer is case-sensitive, the token input is case-sensitive.

part_name
Specify the name of the index partition. If the index is a local partitioned index, then part_name must be provided. TOKEN_INFO will apply to just that index partition.

raw_info
Specify TRUE to include a hex dump of the index data. If raw_info is TRUE, the report will include a hex dump of the raw data in the token_info column.

decoded_info
Specify TRUE to decode and include docid and offset data. If decoded_info is FALSE, CTX_REPORT will not attempt to decode the token information. This is useful when you just want a dump of data.

resolve_docids
Specify TRUE to resolve docids to rowids.

To facilitate inline invocation, the boolean arguments are varchar2 in the function variant. You can pass in 'Y', 'N', 'YES', 'NO', 'T', 'F', 'TRUE', or 'FALSE'.
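The difference between the boolean arguments of the procedure and the varchar2 arguments of the function can be seen side by side. This is a sketch under stated assumptions: MYINDEX is a placeholder for a non-partitioned index and ORACLE is an arbitrary example token; only the parameters listed in the syntax above are used.

```sql
SET SERVEROUTPUT ON
-- Procedure variant: boolean flags, report written to a CLOB.
DECLARE
  rpt CLOB;
BEGIN
  ctx_report.token_info(
    index_name   => 'MYINDEX',   -- placeholder index name
    report       => rpt,
    token        => 'ORACLE',    -- example token
    token_type   => 0,           -- 0 = normal text token
    raw_info     => FALSE,
    decoded_info => TRUE);
  dbms_output.put_line('report length: ' || dbms_lob.getlength(rpt));
  dbms_lob.freetemporary(rpt);
END;
/

-- Function variant: the same flags passed as strings, usable inline in SQL.
-- Arguments are positional: index_name, token, token_type, part_name,
-- raw_info, decoded_info.
SELECT ctx_report.token_info('MYINDEX', 'ORACLE', 0, NULL, 'N', 'Y')
  FROM dual;
```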
TOKEN_TYPE
This is a helper function that translates an English name into a numeric token type. It is suitable for use with TOKEN_INFO, or any other CTX API that takes a token_type.

Syntax
  function token_type(
    index_name in varchar2,
    type_name  in varchar2
  ) return number;

  TOKEN_TYPE_TEXT      number := 0;
  TOKEN_TYPE_THEME     number := 1;
  TOKEN_TYPE_ZONE_SEC  number := 2;
  TOKEN_TYPE_ATTR_TEXT number := 4;
  TOKEN_TYPE_ATTR_SEC  number := 5;
  TOKEN_TYPE_PREFIX    number := 6;
  TOKEN_TYPE_PATH_SEC  number := 7;
  TOKEN_TYPE_PATH_ATTR number := 8;
  TOKEN_TYPE_STEM      number := 9;

index_name
Specify the name of the index.

type_name
Specify an English name for token_type. The following strings are legal input. All input is case-insensitive.

Input           Meaning                           Type Returned
TEXT            Normal text token.                0
THEME           Theme token.                      1
ZONE SEC        Zone token.                       2
ATTR TEXT       Text that occurs in attribute.    4
ATTR SEC        Attribute section.                5
PREFIX          Prefix token.                     6
PATH SEC        Path section.                     7
PATH ATTR       Path attribute section.           8
STEM            Stem form token.                  9
FIELD TEXT      Text token in field section.      16-79
FIELD PREFIX    Prefix token in field section.    616-679
FIELD STEM      Stem token in field section.      916-979

For FIELD types, the index meta-data must be read on each call. If you call TOKEN_TYPE repeatedly for FIELD types, consider caching the returned values in local variables rather than calling TOKEN_TYPE each time. The fixed types (0 - 9) are also defined as constants in this package.

Example
  typenum := ctx_report.token_type('myindex', 'field author text');
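The caching advice above can be sketched as follows. The index name myindex and the field section author come from the example above; the token values SMITH and JONES are hypothetical and stand in for whatever tokens you want to inspect.

```sql
SET SERVEROUTPUT ON
DECLARE
  TYPE name_list IS TABLE OF VARCHAR2(64);
  authors     name_list := name_list('SMITH', 'JONES');  -- hypothetical tokens
  author_type NUMBER;
  rpt         CLOB;
BEGIN
  -- Look up the FIELD token type once; this reads the index meta-data.
  author_type := ctx_report.token_type('myindex', 'field author text');

  -- Reuse the cached value for every TOKEN_INFO call.
  FOR i IN 1 .. authors.COUNT LOOP
    ctx_report.token_info('myindex', rpt, authors(i), author_type);
    dbms_output.put_line(authors(i) || ': report length '
                         || dbms_lob.getlength(rpt));
  END LOOP;

  dbms_lob.freetemporary(rpt);
END;
/
```

The report CLOB is reused across iterations because each call truncates it before writing.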
12 CTX_THES Package
This chapter provides reference information for using the CTX_THES package to manage and browse thesauri. These thesaurus functions are based on the ISO-2788 and ANSI Z39.19 standards except where noted.

Knowing how information is stored in your thesaurus helps in writing queries with thesaurus operators. You can also use a thesaurus to extend the knowledge base, which is used for ABOUT queries in English and French and for generating document themes.

CTX_THES contains the following stored procedures and functions:

Name                  Description
ALTER_PHRASE          Alters thesaurus phrase.
ALTER_THESAURUS       Renames or truncates a thesaurus.
BT                    Returns all broader terms of a phrase.
BTG                   Returns all broader terms generic of a phrase.
BTI                   Returns all broader terms instance of a phrase.
BTP                   Returns all broader terms partitive of a phrase.
CREATE_PHRASE         Adds a phrase to the specified thesaurus.
CREATE_RELATION       Creates a relation between two phrases.
CREATE_THESAURUS      Creates the specified thesaurus.
CREATE_TRANSLATION    Creates a new translation for a phrase.
DROP_PHRASE           Removes a phrase from thesaurus.
DROP_RELATION         Removes a relation between two phrases.
DROP_THESAURUS        Drops the specified thesaurus from the thesaurus tables.
DROP_TRANSLATION      Drops a translation for a phrase.
HAS_RELATION          Tests for the existence of a thesaurus relation.
NT                    Returns all narrower terms of a phrase.
NTG                   Returns all narrower terms generic of a phrase.
NTI                   Returns all narrower terms instance of a phrase.
NTP                   Returns all narrower terms partitive of a phrase.
OUTPUT_STYLE          Sets the output style for the expansion functions.
PT                    Returns the preferred term of a phrase.
RT                    Returns the related terms of a phrase.
SN                    Returns scope note for phrase.
SYN                   Returns the synonym terms of a phrase.
THES_TT               Returns all top terms for phrase.
TR                    Returns the foreign equivalent of a phrase.
TRSYN                 Returns the foreign equivalent of a phrase, synonyms of the phrase, and foreign equivalent of the synonyms.
TT                    Returns the top term of a phrase.
UPDATE_TRANSLATION    Updates an existing translation.
See Also: Chapter 3, "CONTAINS Query Operators" for more information about the thesaurus operators.
ALTER_PHRASE
Alters an existing phrase in the thesaurus. Only CTXSYS or the thesaurus owner can alter a phrase.

Syntax
  CTX_THES.ALTER_PHRASE(tname   in varchar2,
                        phrase  in varchar2,
                        op      in varchar2,
                        operand in varchar2 default null);

tname
Specify thesaurus name.

phrase
Specify phrase to alter.

op
Specify the alter operation as a string or symbol. You can specify one of the following operations with the op and operand pair:

  op:      RENAME or CTX_THES.OP_RENAME
  meaning: Rename phrase. If the new phrase already exists in the thesaurus, this procedure raises an exception.
  operand: Specify new phrase. You can include qualifiers to change, add, or remove qualifiers from phrases.

  op:      PT or CTX_THES.OP_PT
  meaning: Make phrase the preferred term. Existing preferred terms in the synonym ring become non-preferred synonyms.
  operand: (none)

  op:      SN or CTX_THES.OP_SN
  meaning: Change the scope note on the phrase.
  operand: Specify new scope note.

operand
Specify argument to the alter operation. See table for op.
Examples
Correct misspelled word in thesaurus:

  ctx_thes.alter_phrase('thes1', 'tee', 'rename', 'tea');

Remove qualifier from mercury (metal):

  ctx_thes.alter_phrase('thes1', 'mercury (metal)', 'rename', 'mercury');

Add qualifier to mercury:

  ctx_thes.alter_phrase('thes1', 'mercury', 'rename', 'mercury (planet)');

Make Kowalski the preferred term in its synonym ring:

  ctx_thes.alter_phrase('thes1', 'Kowalski', 'pt');

Change scope note for view cameras:

  ctx_thes.alter_phrase('thes1', 'view cameras', 'sn', 'Cameras with lens focusing');
ALTER_THESAURUS
Use this procedure to rename or truncate an existing thesaurus. Only the thesaurus owner or CTXSYS can invoke this procedure on a given thesaurus.

Syntax
  CTX_THES.ALTER_THESAURUS(tname   in varchar2,
                           op      in varchar2,
                           operand in varchar2 default null);

tname
Specify the thesaurus name.

op
Specify the alter operation as a string or symbol. You can specify one of two operations:

  op:      RENAME or CTX_THES.OP_RENAME
  meaning: Rename thesaurus. Returns an error if the new name already exists.
  operand: Specify new thesaurus name.

  op:      TRUNCATE or CTX_THES.OP_TRUNCATE
  meaning: Truncate thesaurus.
  operand: None.

operand
Specify the argument to the alter operation. See table for op.

Examples
Rename thesaurus THES1 to MEDICAL:

  ctx_thes.alter_thesaurus('thes1', 'rename', 'medical');

or

  ctx_thes.alter_thesaurus('thes1', ctx_thes.op_rename, 'medical');

You can use symbols for any op argument, but all further examples will use strings.

Remove all phrases and relations from thesaurus THES1:

  ctx_thes.alter_thesaurus('thes1', 'truncate');
BT
This function returns all broader terms of a phrase as recorded in the specified thesaurus.

Syntax 1: Table Result
  CTX_THES.BT(restab IN OUT NOCOPY EXP_TAB,
              phrase IN VARCHAR2,
              lvl    IN NUMBER DEFAULT 1,
              tname  IN VARCHAR2 DEFAULT 'DEFAULT');

Syntax 2: String Result
  CTX_THES.BT(phrase IN VARCHAR2,
              lvl    IN NUMBER DEFAULT 1,
              tname  IN VARCHAR2 DEFAULT 'DEFAULT')
  RETURN VARCHAR2;

restab
Optionally, specify the name of the expansion table to store the results. This table must be of type EXP_TAB which the system defines as follows:

  type exp_rec is record (
    xrel    varchar2(12),
    xlevel  number,
    xphrase varchar2(256)
  );
  type exp_tab is table of exp_rec index by binary_integer;

See Also: "CTX_THES Result Tables and Data Types" in Appendix A, "Result Tables" for more information about EXP_TAB.

phrase
Specify phrase to look up in thesaurus.

lvl
Specify how many levels of broader terms to return. For example, 2 means get the broader terms of the broader terms of the phrase.

tname
Specify thesaurus name. If not specified, the system default thesaurus is used.

Returns
This function returns a string of broader terms in the form:

  {bt1}|{bt2}|{bt3} ...

Example
String Result
Consider a thesaurus named MY_THES that has an entry for cat as follows:

  cat
    BT1 feline
      BT2 mammal
        BT3 vertebrate
          BT4 animal

To look up the broader terms for cat up to two levels, issue the following statements:

  set serveroutput on
  declare
    terms varchar2(2000);
  begin
    terms := ctx_thes.bt('CAT', 2, 'MY_THES');
    dbms_output.put_line('The broader expansion for CAT is: '||terms);
  end;

This code produces the following output:

  The broader expansion for CAT is: {cat}|{feline}|{mammal}

Table Result
The following code does a broader term lookup for white wolf using the table result. The record fields are referenced by the names defined in EXP_REC (xrel and xphrase):

  set serveroutput on
  declare
    xtab ctx_thes.exp_tab;
  begin
    ctx_thes.bt(xtab, 'white wolf', 2, 'my_thesaurus');
    for i in 1..xtab.count loop
      dbms_output.put_line(xtab(i).xrel||' '||xtab(i).xphrase);
    end loop;
  end;

This code produces the following output:

  PHRASE WHITE WOLF
  BT WOLF
  BT CANINE
  BT ANIMAL

Related Topics
OUTPUT_STYLE
Broader Term (BT, BTG, BTP, BTI) Operators in Chapter 3, "CONTAINS Query Operators"
BTG
This function returns all broader terms generic of a phrase as recorded in the specified thesaurus.

Syntax 1: Table Result
  CTX_THES.BTG(restab IN OUT NOCOPY EXP_TAB,
               phrase IN VARCHAR2,
               lvl    IN NUMBER DEFAULT 1,
               tname  IN VARCHAR2 DEFAULT 'DEFAULT');

Syntax 2: String Result
  CTX_THES.BTG(phrase IN VARCHAR2,
               lvl    IN NUMBER DEFAULT 1,
               tname  IN VARCHAR2 DEFAULT 'DEFAULT')
  RETURN VARCHAR2;

restab
Optionally, specify the name of the expansion table to store the results. This table must be of type EXP_TAB which the system defines as follows:

  type exp_rec is record (
    xrel    varchar2(12),
    xlevel  number,
    xphrase varchar2(256)
  );
  type exp_tab is table of exp_rec index by binary_integer;

See Also: "CTX_THES Result Tables and Data Types" in Appendix A, "Result Tables" for more information about EXP_TAB.

phrase
Specify phrase to look up in thesaurus.

lvl
Specify how many levels of broader terms to return. For example, 2 means get the broader terms of the broader terms of the phrase.

tname
Specify thesaurus name. If not specified, the system default thesaurus is used.

Returns
This function returns a string of broader terms generic in the form:

  {bt1}|{bt2}|{bt3} ...

Example
To look up the broader terms generic for cat up to two levels, issue the following statements:

  set serveroutput on
  declare
    terms varchar2(2000);
  begin
    terms := ctx_thes.btg('CAT', 2, 'MY_THES');
    dbms_output.put_line('the broader expansion for CAT is: '||terms);
  end;

Related Topics
OUTPUT_STYLE
Broader Term (BT, BTG, BTP, BTI) Operators in Chapter 3, "CONTAINS Query Operators"
BTI
This function returns all broader terms instance of a phrase as recorded in the specified thesaurus.

Syntax 1: Table Result
  CTX_THES.BTI(restab IN OUT NOCOPY EXP_TAB,
               phrase IN VARCHAR2,
               lvl    IN NUMBER DEFAULT 1,
               tname  IN VARCHAR2 DEFAULT 'DEFAULT');

Syntax 2: String Result
  CTX_THES.BTI(phrase IN VARCHAR2,
               lvl    IN NUMBER DEFAULT 1,
               tname  IN VARCHAR2 DEFAULT 'DEFAULT')
  RETURN VARCHAR2;

restab
Optionally, specify the name of the expansion table to store the results. This table must be of type EXP_TAB which the system defines as follows:

  type exp_rec is record (
    xrel    varchar2(12),
    xlevel  number,
    xphrase varchar2(256)
  );
  type exp_tab is table of exp_rec index by binary_integer;

See Also: "CTX_THES Result Tables and Data Types" in Appendix A, "Result Tables" for more information about EXP_TAB.

phrase
Specify phrase to look up in thesaurus.

lvl
Specify how many levels of broader terms to return. For example, 2 means get the broader terms of the broader terms of the phrase.

tname
Specify thesaurus name. If not specified, the system default thesaurus is used.

Returns
This function returns a string of broader terms instance in the form:

  {bt1}|{bt2}|{bt3} ...

Example
To look up the broader terms instance for cat up to two levels, issue the following statements:

  set serveroutput on
  declare
    terms varchar2(2000);
  begin
    terms := ctx_thes.bti('CAT', 2, 'MY_THES');
    dbms_output.put_line('the broader expansion for CAT is: '||terms);
  end;

Related Topics
OUTPUT_STYLE
Broader Term (BT, BTG, BTP, BTI) Operators in Chapter 3, "CONTAINS Query Operators"
BTP
This function returns all broader terms partitive of a phrase as recorded in the specified thesaurus.

Syntax 1: Table Result
  CTX_THES.BTP(restab IN OUT NOCOPY EXP_TAB,
               phrase IN VARCHAR2,
               lvl    IN NUMBER DEFAULT 1,
               tname  IN VARCHAR2 DEFAULT 'DEFAULT');

Syntax 2: String Result
  CTX_THES.BTP(phrase IN VARCHAR2,
               lvl    IN NUMBER DEFAULT 1,
               tname  IN VARCHAR2 DEFAULT 'DEFAULT')
  RETURN VARCHAR2;

restab
Optionally, specify the name of the expansion table to store the results. This table must be of type EXP_TAB which the system defines as follows:

  type exp_rec is record (
    xrel    varchar2(12),
    xlevel  number,
    xphrase varchar2(256)
  );
  type exp_tab is table of exp_rec index by binary_integer;

See Also: "CTX_THES Result Tables and Data Types" in Appendix A, "Result Tables" for more information about EXP_TAB.

phrase
Specify phrase to look up in thesaurus.

lvl
Specify how many levels of broader terms to return. For example, 2 means get the broader terms of the broader terms of the phrase.

tname
Specify thesaurus name. If not specified, the system default thesaurus is used.

Returns
This function returns a string of broader terms partitive in the form:

  {bt1}|{bt2}|{bt3} ...

Example
To look up the broader terms partitive for cat up to two levels, issue the following statements:

  set serveroutput on
  declare
    terms varchar2(2000);
  begin
    terms := ctx_thes.btp('CAT', 2, 'MY_THES');
    dbms_output.put_line('the broader expansion for CAT is: '||terms);
  end;

Related Topics
OUTPUT_STYLE
Broader Term (BT, BTG, BTP, BTI) Operators in Chapter 3, "CONTAINS Query Operators"
CREATE_PHRASE
The CREATE_PHRASE procedure adds a new phrase to the specified thesaurus.

Note: Even though you can create thesaurus relations with this procedure, Oracle recommends that you use CTX_THES.CREATE_RELATION rather than CTX_THES.CREATE_PHRASE to create relations in a thesaurus.

tname
Specify the name of the thesaurus in which the new phrase is added or the existing phrase is located.

phrase
Specify the phrase to be added to a thesaurus or the phrase for which a new relationship is created.

rel
Specify the new relationship between phrase and relname. This parameter is supported only for backward compatibility. Use CTX_THES.CREATE_RELATION to create new relations in a thesaurus.

relname
Specify the existing phrase that is related to phrase. This parameter is supported only for backward compatibility. Use CTX_THES.CREATE_RELATION to create new relations in a thesaurus.

Returns
The ID for the entry.
Examples
Creating Entries for Phrases
In this example, two new phrases (os and operating system) are created in a thesaurus named tech_thes.

  begin
    ctx_thes.create_phrase('tech_thes','os');
    ctx_thes.create_phrase('tech_thes','operating system');
  end;
CREATE_RELATION
Creates a relation between two phrases in the thesaurus.

Note: Oracle recommends that you use CTX_THES.CREATE_RELATION rather than CTX_THES.CREATE_PHRASE to create relations in a thesaurus.

Only the thesaurus owner and CTXSYS can invoke this procedure on a given thesaurus.

Syntax
  CTX_THES.CREATE_RELATION(tname     in varchar2,
                           phrase    in varchar2,
                           rel       in varchar2,
                           relphrase in varchar2);

tname
Specify the thesaurus name.

phrase
Specify the phrase to alter or create. If phrase is a disambiguated homograph, you must specify the qualifier. If phrase does not exist in the thesaurus, it is created.

rel
Specify the relation to create. The relation is from phrase to relphrase. You can specify one of the following relations:

  relation:  BT*/NT*
  meaning:   Add hierarchical relation.
  relphrase: Specify related phrase. The relationship is interpreted from phrase to relphrase.

  relation:  RT
  meaning:   Add associative relation.
  relphrase: Specify phrase to associate.

  relation:  SYN
  meaning:   Add phrase to a synonym ring.
  relphrase: Specify an existing phrase in the synonym ring.

  relation:  Specify language (for example, JAPANESE:)
  meaning:   Add translation for a phrase.
  relphrase: Specify new translation phrase.
12-18 Oracle Text Reference
CREATE_RELATION
relphrase
Specify the related phrase. If relphrase does not exist in tname, relphrase is created. See table for rel.
Notes
The relation you specify for rel is interpreted as from phrase to relphrase. For example, consider dog with broader term animal:
dog BT animal
To add this relation, specify the arguments as follows:
begin
  CTX_THES.CREATE_RELATION('thes','dog','BT','animal');
end;
Note: The order in which you specify arguments for CTX_THES.CREATE_RELATION is different from the order you specify them with CTX_THES.CREATE_PHRASE.
Examples
Create relation VEHICLE NT CAR:
ctx_thes.create_relation('thes1', 'vehicle', 'NT', 'car');
Create Japanese translation for you:
ctx_thes.create_relation('thes1', 'you', 'JAPANESE:', 'kimi');
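Because CREATE_RELATION creates any phrase that does not yet exist, a small hierarchy can be built with relation calls alone. The following is a minimal sketch under that reading of the parameter descriptions; the thesaurus name demo_thes and its phrases are hypothetical.

```plsql
begin
  -- demo_thes and all phrases below are illustrative names
  ctx_thes.create_thesaurus('demo_thes', FALSE);

  -- each call creates phrase and relphrase if they are not already present
  ctx_thes.create_relation('demo_thes', 'dog',    'BT', 'canine');
  ctx_thes.create_relation('demo_thes', 'canine', 'BT', 'animal');
  ctx_thes.create_relation('demo_thes', 'dog',    'RT', 'wolf');

  -- look at the hierarchy from the top, two levels down
  dbms_output.put_line(ctx_thes.nt('animal', 2, 'demo_thes'));
end;
```

The lookup uses NT rather than BT only to view the same hierarchy from the opposite direction.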
CREATE_THESAURUS
The CREATE_THESAURUS procedure creates an empty thesaurus with the specified name in the thesaurus tables.
Syntax
CTX_THES.CREATE_THESAURUS(name     IN VARCHAR2,
                          casesens IN BOOLEAN DEFAULT FALSE);
name
Specify the name of the thesaurus to be created. The name of the thesaurus must be unique. If a thesaurus with the specified name already exists, CREATE_THESAURUS returns an error and does not create the thesaurus.
casesens
Specify whether the thesaurus to be created is case-sensitive. If casesens is TRUE, Oracle retains the case of all terms entered in the specified thesaurus. As a result, queries that use the thesaurus are case-sensitive.
Example
begin
  ctx_thes.create_thesaurus('tech_thes', FALSE);
end;
CREATE_TRANSLATION
Use this procedure to create a new translation for a phrase in a specified language.
Syntax
CTX_THES.CREATE_TRANSLATION(tname       in varchar2,
                            phrase      in varchar2,
                            language    in varchar2,
                            translation in varchar2);
tname
Specify the name of the thesaurus, using no more than 30 characters.
phrase
Specify the phrase in the thesaurus to which to add a translation. The phrase must already exist in the thesaurus, or an error is raised.
language
Specify the language of the translation, using no more than 10 characters.
translation
Specify the translated term, using no more than 256 characters. If a translation for this phrase already exists, the new translation is added without removing the original translation, as long as the two are not the same. Adding the same translation twice results in an error.
Example
The following code adds the Spanish translation for dog to my_thes:
begin
  ctx_thes.create_translation('my_thes', 'dog', 'SPANISH', 'PERRO');
end;
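The rule that a second, different translation is kept alongside the first while an identical one is rejected can be sketched as follows. The source does not state which error is raised, so this sketch catches any exception; that is an assumption made for illustration, not the documented error handling.

```plsql
begin
  ctx_thes.create_translation('my_thes', 'dog', 'SPANISH', 'PERRO');

  -- a different translation in the same language is added next to the first
  ctx_thes.create_translation('my_thes', 'dog', 'SPANISH', 'CAN');

  begin
    -- repeating an existing translation is documented to raise an error
    ctx_thes.create_translation('my_thes', 'dog', 'SPANISH', 'PERRO');
  exception
    when others then
      dbms_output.put_line('duplicate translation rejected: ' || sqlerrm);
  end;
end;
```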
DROP_PHRASE
Removes a phrase from the thesaurus. Only the thesaurus owner and CTXSYS can invoke this procedure on a given thesaurus.
Syntax
CTX_THES.DROP_PHRASE(tname  in varchar2,
                     phrase in varchar2);
tname
Specify thesaurus name.
phrase
Specify phrase to drop. If phrase is a disambiguated homograph, you must include the qualifier. When phrase does not exist in tname, this procedure raises an exception.
BT* / NT* relations are patched around the dropped phrase. For example, if A has BT B, and B has BT C, then after B is dropped, A has BT C. When the dropped phrase has multiple broader terms, a relation is established from each of its narrower terms to each of its broader terms.
Note that BT, BTG, BTP, and BTI are separate hierarchies, so if A has BTG B, and B has BTI C, then when B is dropped, no relation is implicitly created between A and C.
RT relations are not patched. For example, if A has RT B, and B has RT C, then when B is dropped, no associative relation is created between A and C.
Example
Assume you have the following relations defined in mythes:
wolf BT canine
canine BT animal
You drop phrase canine:
begin
  ctx_thes.drop_phrase('mythes', 'canine');
end;
The resulting thesaurus is patched and looks like:
wolf BT animal
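The patching rule for a phrase with more than one broader term can be exercised with a sketch like the one below. The thesaurus name patch_thes and its phrases are hypothetical, and the comments describe what the rule above leads one to expect rather than verified output.

```plsql
begin
  -- patch_thes and all phrases below are illustrative names
  ctx_thes.create_thesaurus('patch_thes', FALSE);
  ctx_thes.create_relation('patch_thes', 'wolf',   'BT', 'canine');
  ctx_thes.create_relation('patch_thes', 'canine', 'BT', 'animal');
  ctx_thes.create_relation('patch_thes', 'canine', 'BT', 'predator');

  ctx_thes.drop_phrase('patch_thes', 'canine');

  -- by the patching rule, wolf is expected under both former
  -- broader terms of canine
  dbms_output.put_line(ctx_thes.nt('animal',   1, 'patch_thes'));
  dbms_output.put_line(ctx_thes.nt('predator', 1, 'patch_thes'));
end;
```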
DROP_RELATION
Removes a relation between two phrases from the thesaurus.
Note: CTX_THES.DROP_RELATION removes only the relation between two phrases. Phrases are never removed by this call.
Only the thesaurus owner and CTXSYS can invoke this procedure on a given thesaurus.
Syntax CTX_THES.DROP_RELATION(tname in phrase in rel in relphrase in
rel
Specify relation to drop. The relation is from phrase to relphrase. You can specify one of the following relations:
relation | meaning | relphrase
BT*/NT* | Remove hierarchical relation. | Optionally specify relphrase. If not provided, all relations of that type for the phrase are removed.
RT | Remove associative relation. | Optionally specify relphrase. If not provided, all RT relations for the phrase are removed.
SYN | Remove phrase from its synonym ring. | (none)
PT | Remove preferred term designation from the phrase. The phrase remains in the synonym ring. | (none)
language | Remove a translation from a phrase. | Optionally specify relphrase. You can specify relphrase when there are multiple translations for a phrase for the language, and you want to remove just one translation. If relphrase is NULL, all translations for the phrase for the language are removed.
relphrase
Specify the related phrase.
Notes
The relation you specify for rel is interpreted as from phrase to relphrase. For example, consider dog with broader term animal:
dog BT animal
To remove this relation, specify the arguments as follows:
begin
  CTX_THES.DROP_RELATION('thes','dog','BT','animal');
end;
You can also remove this relation using NT as follows:
begin
  CTX_THES.DROP_RELATION('thes','animal','NT','dog');
end;
Example
Remove relation VEHICLE NT CAR:
ctx_thes.drop_relation('thes1', 'vehicle', 'NT', 'car');
Remove all narrower term relations for vehicle:
ctx_thes.drop_relation('thes1', 'vehicle', 'NT');
Remove Japanese translations for me:
ctx_thes.drop_relation('thes1', 'me', 'JAPANESE:');
Remove a specific Japanese translation for me:
ctx_thes.drop_relation('thes1', 'me', 'JAPANESE:', 'boku');
DROP_THESAURUS
The DROP_THESAURUS procedure deletes the specified thesaurus and all of its entries from the thesaurus tables.
Syntax
CTX_THES.DROP_THESAURUS(name IN VARCHAR2);
name
Specify the name of the thesaurus to be dropped.
Examples
begin
  ctx_thes.drop_thesaurus('tech_thes');
end;
DROP_TRANSLATION
Use this procedure to remove one or more translations for a phrase.
Syntax CTX_THES.DROP_TRANSLATION (tname phrase language translation
tname
Specify the name of the thesaurus, using no more than 30 characters.
phrase
Specify the phrase in the thesaurus from which to remove a translation. The phrase must already exist in the thesaurus or an error is raised.
language
Optionally, specify the language of the translation, using no more than 10 characters. If language is not specified, translation must also be omitted, and all translations in all languages for the phrase are removed. An error is raised if the phrase has no translations.
translation
Optionally, specify the translated term to remove, using no more than 256 characters. If no such translation exists, an error is raised.
Example
The following code removes the Spanish translation for dog:
begin
  ctx_thes.drop_translation('my_thes', 'dog', 'SPANISH', 'PERRO');
end;
To remove all translations for dog in all languages:
begin
  ctx_thes.drop_translation('my_thes', 'dog');
end;
HAS_RELATION
Use this function to test whether a thesaurus relation exists without actually doing the expansion. The function returns TRUE if the phrase has any of the relations in the specified list.
Syntax
CTX_THES.HAS_RELATION(phrase in varchar2,
                      rel    in varchar2,
                      tname  in varchar2 default 'DEFAULT')
return boolean;
phrase
Specify the phrase.
rel
Specify a single thesaural relation or a comma-separated list of relations, except PT. Specify 'ANY' for any relation.
tname
Specify the thesaurus name.
Example
The following example prints TRUE if the phrase cat in the DEFAULT thesaurus has any broader terms or broader generic terms:
set serveroutput on
declare
  result boolean;
begin
  result := ctx_thes.has_relation('cat','BT,BTG');
  if (result) then
    dbms_output.put_line('TRUE');
  else
    dbms_output.put_line('FALSE');
  end if;
end;
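Since HAS_RELATION checks for a relation without performing the expansion, one plausible use is as a guard in front of an expansion call. The following is a minimal sketch of that pattern, reusing the MY_THES thesaurus name from the other examples in this chapter.

```plsql
declare
  terms varchar2(2000);
begin
  -- expand only when cat actually has narrower terms
  if ctx_thes.has_relation('cat', 'NT', 'MY_THES') then
    terms := ctx_thes.nt('cat', 2, 'MY_THES');
    dbms_output.put_line('narrower terms: ' || terms);
  else
    dbms_output.put_line('cat has no narrower terms in MY_THES');
  end if;
end;
```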
NT
This function returns all narrower terms of a phrase as recorded in the specified thesaurus.
Syntax 1: Table Result
CTX_THES.NT(restab IN OUT NOCOPY EXP_TAB,
            phrase IN VARCHAR2,
            lvl    IN NUMBER DEFAULT 1,
            tname  IN VARCHAR2 DEFAULT 'DEFAULT');
Syntax 2: String Result
CTX_THES.NT(phrase IN VARCHAR2,
            lvl    IN NUMBER DEFAULT 1,
            tname  IN VARCHAR2 DEFAULT 'DEFAULT')
RETURN VARCHAR2;
restab
Optionally, specify the name of the expansion table to store the results. This table must be of type EXP_TAB, which the system defines as follows:
type exp_rec is record (
  xrel    varchar2(12),
  xlevel  number,
  xphrase varchar2(256)
);
type exp_tab is table of exp_rec index by binary_integer;
See Also: "CTX_THES Result Tables and Data Types" in Appendix A, "Result Tables" for more information about EXP_TAB.
phrase
Specify phrase to look up in thesaurus.
lvl
Specify how many levels of narrower terms to return. For example, 2 means get the narrower terms of the narrower terms of the phrase.
tname
Specify thesaurus name. If not specified, the system default thesaurus is used.
Returns
This function returns a string of narrower terms in the form:
{nt1}|{nt2}|{nt3} ...
Example
String Result
Consider a thesaurus named MY_THES that has an entry for cat as follows:
cat
  NT domestic cat
  NT wild cat
  BT mammal
mammal
  BT animal
domestic cat
  NT Persian cat
  NT Siamese cat
To look up the narrower terms for cat down to two levels, issue the following statements:
declare
  terms varchar2(2000);
begin
  terms := ctx_thes.nt('CAT', 2, 'MY_THES');
  dbms_output.put_line('the narrower expansion for CAT is: '||terms);
end;
This code produces the following output:
the narrower expansion for CAT is: {cat}|{domestic cat}|{Persian cat}|{Siamese cat}|{wild cat}
Table Result
The following code does a narrower term lookup for canine using the table result:
declare
  xtab ctx_thes.exp_tab;
begin
  ctx_thes.nt(xtab, 'canine', 2, 'my_thesaurus');
  for i in 1..xtab.count loop
    dbms_output.put_line(lpad(' ', 2*xtab(i).xlevel) ||
      xtab(i).xrel || ' ' || xtab(i).xphrase);
  end loop;
end;
This code produces the following output:
PHRASE CANINE
  NT WOLF (Canis lupus)
    NT WHITE WOLF
    NT GREY WOLF
  NT DOG (Canis familiaris)
    NT PIT BULL
    NT DASCHUND
    NT CHIHUAHUA
  NT HYENA (Canis mesomelas)
  NT COYOTE (Canis latrans)
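The table result also allows filtering on the fields of each EXP_REC instead of printing every row. The sketch below keeps only the direct narrower terms. It assumes that immediate narrower terms carry xrel = 'NT' and xlevel = 1, which is inferred from the sample output above rather than stated in the source.

```plsql
declare
  xtab ctx_thes.exp_tab;
begin
  ctx_thes.nt(xtab, 'canine', 2, 'my_thesaurus');
  for i in 1..xtab.count loop
    -- assumption: level 1 rows with relation NT are the direct narrower terms
    if xtab(i).xrel = 'NT' and xtab(i).xlevel = 1 then
      dbms_output.put_line(xtab(i).xphrase);
    end if;
  end loop;
end;
```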
Related Topics OUTPUT_STYLE Narrower Term (NT, NTG, NTP, NTI) Operators in Chapter 3, "CONTAINS Query Operators"
NTG
This function returns all narrower terms generic of a phrase as recorded in the specified thesaurus.
Syntax 1: Table Result
CTX_THES.NTG(restab IN OUT NOCOPY EXP_TAB,
             phrase IN VARCHAR2,
             lvl    IN NUMBER DEFAULT 1,
             tname  IN VARCHAR2 DEFAULT 'DEFAULT');
Syntax 2: String Result
CTX_THES.NTG(phrase IN VARCHAR2,
             lvl    IN NUMBER DEFAULT 1,
             tname  IN VARCHAR2 DEFAULT 'DEFAULT')
RETURN VARCHAR2;
restab
Optionally, specify the name of the expansion table to store the results. This table must be of type EXP_TAB, which the system defines as follows:
type exp_rec is record (
  xrel    varchar2(12),
  xlevel  number,
  xphrase varchar2(256)
);
type exp_tab is table of exp_rec index by binary_integer;
See Also: "CTX_THES Result Tables and Data Types" in Appendix A, "Result Tables" for more information about EXP_TAB.
phrase
Specify phrase to look up in thesaurus.
lvl
Specify how many levels of narrower terms to return. For example, 2 means get the narrower terms of the narrower terms of the phrase.
tname
Specify thesaurus name. If not specified, the system default thesaurus is used.
Returns
This function returns a string of narrower terms generic in the form:
{nt1}|{nt2}|{nt3} ...
Example
To look up the narrower terms generic for cat down to two levels, issue the following statements:
declare
  terms varchar2(2000);
begin
  terms := ctx_thes.ntg('CAT', 2, 'MY_THES');
  dbms_output.put_line('the narrower expansion for CAT is: '||terms);
end;
Related Topics OUTPUT_STYLE Narrower Term (NT, NTG, NTP, NTI) Operators in Chapter 3, "CONTAINS Query Operators"
NTI
This function returns all narrower terms instance of a phrase as recorded in the specified thesaurus.
Syntax 1: Table Result
CTX_THES.NTI(restab IN OUT NOCOPY EXP_TAB,
             phrase IN VARCHAR2,
             lvl    IN NUMBER DEFAULT 1,
             tname  IN VARCHAR2 DEFAULT 'DEFAULT');
Syntax 2: String Result
CTX_THES.NTI(phrase IN VARCHAR2,
             lvl    IN NUMBER DEFAULT 1,
             tname  IN VARCHAR2 DEFAULT 'DEFAULT')
RETURN VARCHAR2;
restab
Optionally, specify the name of the expansion table to store the results. This table must be of type EXP_TAB, which the system defines as follows:
type exp_rec is record (
  xrel    varchar2(12),
  xlevel  number,
  xphrase varchar2(256)
);
type exp_tab is table of exp_rec index by binary_integer;
See Also: "CTX_THES Result Tables and Data Types" in Appendix A, "Result Tables" for more information about EXP_TAB.
phrase
Specify phrase to look up in thesaurus.
lvl
Specify how many levels of narrower terms to return. For example, 2 means get the narrower terms of the narrower terms of the phrase.
tname
Specify thesaurus name. If not specified, the system default thesaurus is used.
Returns
This function returns a string of narrower terms instance in the form:
{nt1}|{nt2}|{nt3} ...
Example
To look up the narrower terms instance for cat down to two levels, issue the following statements:
declare
  terms varchar2(2000);
begin
  terms := ctx_thes.nti('CAT', 2, 'MY_THES');
  dbms_output.put_line('the narrower expansion for CAT is: '||terms);
end;
Related Topics OUTPUT_STYLE Narrower Term (NT, NTG, NTP, NTI) Operators in Chapter 3, "CONTAINS Query Operators"
NTP
This function returns all narrower terms partitive of a phrase as recorded in the specified thesaurus.
Syntax 1: Table Result
CTX_THES.NTP(restab IN OUT NOCOPY EXP_TAB,
             phrase IN VARCHAR2,
             lvl    IN NUMBER DEFAULT 1,
             tname  IN VARCHAR2 DEFAULT 'DEFAULT');
Syntax 2: String Result
CTX_THES.NTP(phrase IN VARCHAR2,
             lvl    IN NUMBER DEFAULT 1,
             tname  IN VARCHAR2 DEFAULT 'DEFAULT')
RETURN VARCHAR2;
restab
Optionally, specify the name of the expansion table to store the results. This table must be of type EXP_TAB, which the system defines as follows:
type exp_rec is record (
  xrel    varchar2(12),
  xlevel  number,
  xphrase varchar2(256)
);
type exp_tab is table of exp_rec index by binary_integer;
See Also: "CTX_THES Result Tables and Data Types" in Appendix A, "Result Tables" for more information about EXP_TAB.
phrase
Specify phrase to look up in thesaurus.
lvl
Specify how many levels of narrower terms to return. For example, 2 means get the narrower terms of the narrower terms of the phrase.
tname
Specify thesaurus name. If not specified, the system default thesaurus is used.
Returns
This function returns a string of narrower terms partitive in the form:
{nt1}|{nt2}|{nt3} ...
Example
To look up the narrower terms partitive for cat down to two levels, issue the following statements:
declare
  terms varchar2(2000);
begin
  terms := ctx_thes.ntp('CAT', 2, 'MY_THES');
  dbms_output.put_line('the narrower expansion for CAT is: '||terms);
end;
Related Topics OUTPUT_STYLE Narrower Term (NT, NTG, NTP, NTI) Operators in Chapter 3, "CONTAINS Query Operators"
OUTPUT_STYLE
Sets the output style for the return string of the CTX_THES expansion functions. This procedure has no effect on the table results of the CTX_THES expansion functions.
Syntax
CTX_THES.OUTPUT_STYLE (
  showlevel   IN BOOLEAN DEFAULT FALSE,
  showqualify IN BOOLEAN DEFAULT FALSE,
  showpt      IN BOOLEAN DEFAULT FALSE,
  showid      IN BOOLEAN DEFAULT FALSE
);
showlevel
Specify TRUE to show level in BT/NT expansions.
showqualify
Specify TRUE to show phrase qualifiers.
showpt
Specify TRUE to show preferred terms with an asterisk *.
showid
Specify TRUE to show phrase ids.
Notes
The general syntax of the return string for CTX_THES expansion functions is:
{pt indicator:phrase (qualifier):level:phraseid}
The preferred term indicator is an asterisk followed by a colon at the start of the phrase. The qualifier is in parentheses after a space at the end of the phrase. Level is a number.
The following is an example return string for turkey the bird:
*:TURKEY (BIRD):1:1234
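As a minimal sketch of how the style settings might be applied around a single expansion call, the block below turns on levels and preferred-term markers, runs a string-result lookup, and then restores the documented defaults. The MY_THES thesaurus and the phrase are taken from the other examples in this chapter; no particular output is claimed.

```plsql
declare
  terms varchar2(2000);
begin
  -- show levels and preferred-term markers; leave qualifiers and ids off
  ctx_thes.output_style(showlevel   => TRUE,
                        showqualify => FALSE,
                        showpt      => TRUE,
                        showid      => FALSE);

  terms := ctx_thes.nt('CAT', 2, 'MY_THES');
  dbms_output.put_line(terms);

  -- restore the documented defaults so later calls are unaffected
  ctx_thes.output_style(FALSE, FALSE, FALSE, FALSE);
end;
```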
PT
This function returns the preferred term of a phrase as recorded in the specified thesaurus.
Syntax 1: Table Result
CTX_THES.PT(restab IN OUT NOCOPY EXP_TAB,
            phrase IN VARCHAR2,
            tname  IN VARCHAR2 DEFAULT 'DEFAULT')
RETURN varchar2;
Syntax 2: String Result
CTX_THES.PT(phrase IN VARCHAR2,
            tname  IN VARCHAR2 DEFAULT 'DEFAULT')
RETURN varchar2;
restab
Optionally, specify the name of the expansion table to store the results. This table must be of type EXP_TAB, which the system defines as follows:
type exp_rec is record (
  xrel    varchar2(12),
  xlevel  number,
  xphrase varchar2(256)
);
type exp_tab is table of exp_rec index by binary_integer;
See Also: "CTX_THES Result Tables and Data Types" in Appendix A, "Result Tables" for more information about EXP_TAB.
phrase
Specify phrase to look up in thesaurus.
tname
Specify thesaurus name. If not specified, the system default thesaurus is used.
Returns
This function returns the preferred term as a string in the form:
{pt}
Example
Consider a thesaurus MY_THES with the following preferred term definition for automobile:
AUTOMOBILE
  PT CAR
To look up the preferred term for automobile, execute the following code:
declare
  terms varchar2(2000);
begin
  terms := ctx_thes.pt('AUTOMOBILE','MY_THES');
  dbms_output.put_line('The preferred term for automobile is: '||terms);
end;
Related Topics OUTPUT_STYLE Preferred Term (PT) Operator in Chapter 3, "CONTAINS Query Operators"
RT
This function returns the related terms of a term in the specified thesaurus.
Syntax 1: Table Result
CTX_THES.RT(restab IN OUT NOCOPY EXP_TAB,
            phrase IN VARCHAR2,
            tname  IN VARCHAR2 DEFAULT 'DEFAULT');
Syntax 2: String Result
CTX_THES.RT(phrase IN VARCHAR2,
            tname  IN VARCHAR2 DEFAULT 'DEFAULT')
RETURN varchar2;
restab
Optionally, specify the name of the expansion table to store the results. This table must be of type EXP_TAB, which the system defines as follows:
type exp_rec is record (
  xrel    varchar2(12),
  xlevel  number,
  xphrase varchar2(256)
);
type exp_tab is table of exp_rec index by binary_integer;
See Also: "CTX_THES Result Tables and Data Types" in Appendix A, "Result Tables" for more information about EXP_TAB.
phrase
Specify phrase to look up in thesaurus.
tname
Specify thesaurus name. If not specified, the system default thesaurus is used.
Returns
This function returns a string of related terms in the form:
{rt1}|{rt2}|{rt3}| ...
Example
Consider a thesaurus MY_THES with the following related term definition for dog:
DOG
  RT WOLF
  RT HYENA
To look up the related terms for dog, execute the following code:
declare
  terms varchar2(2000);
begin
  terms := ctx_thes.rt('DOG','MY_THES');
  dbms_output.put_line('The related terms for dog are: '||terms);
end;
This code produces the following output:
The related terms for dog are: {dog}|{wolf}|{hyena}
Related Topics OUTPUT_STYLE Related Term (RT) Operator in Chapter 3, "CONTAINS Query Operators"
SN
This function returns the scope note of the given phrase.
Syntax
CTX_THES.SN(phrase IN VARCHAR2,
            tname  IN VARCHAR2 DEFAULT 'DEFAULT')
RETURN VARCHAR2;
phrase
Specify phrase to look up in thesaurus.
tname
Specify thesaurus name. If not specified, the system default thesaurus is used.
Returns
This function returns the scope note as a string.
Example
declare
  note varchar2(80);
begin
  note := ctx_thes.sn('camera','mythes');
  dbms_output.put_line('CAMERA');
  dbms_output.put_line(' SN ' || note);
end;
Sample output:
CAMERA
 SN Optical cameras
SYN
This function returns all synonyms of a phrase as recorded in the specified thesaurus.
Syntax 1: Table Result
CTX_THES.SYN(restab IN OUT NOCOPY EXP_TAB,
             phrase IN VARCHAR2,
             tname  IN VARCHAR2 DEFAULT 'DEFAULT');
Syntax 2: String Result
CTX_THES.SYN(phrase IN VARCHAR2,
             tname  IN VARCHAR2 DEFAULT 'DEFAULT')
RETURN VARCHAR2;
restab
Optionally, specify the name of the expansion table to store the results. This table must be of type EXP_TAB, which the system defines as follows:
type exp_rec is record (
  xrel    varchar2(12),
  xlevel  number,
  xphrase varchar2(256)
);
type exp_tab is table of exp_rec index by binary_integer;
See Also: "CTX_THES Result Tables and Data Types" in Appendix A, "Result Tables" for more information about EXP_TAB.
phrase
Specify phrase to look up in thesaurus.
tname
Specify thesaurus name. If not specified, the system default thesaurus is used.
Returns
This function returns a string of the form:
{syn1}|{syn2}|{syn3} ...
Example
String Result
Consider a thesaurus named ANIMALS that has an entry for cat as follows:
CAT
  SYN KITTY
  SYN FELINE
To look up the synonyms for cat and obtain the result as a string, issue the following statements:
declare
  synonyms varchar2(2000);
begin
  synonyms := ctx_thes.syn('CAT','ANIMALS');
  dbms_output.put_line('the synonym expansion for CAT is: '||synonyms);
end;
This code produces the following output:
the synonym expansion for CAT is: {CAT}|{KITTY}|{FELINE}
Table Result
The following code looks up the synonyms for canine and obtains the results in a table. The contents of the table are printed to the standard output.
declare
  xtab ctx_thes.exp_tab;
begin
  ctx_thes.syn(xtab, 'canine', 'my_thesaurus');
  for i in 1..xtab.count loop
    dbms_output.put_line(lpad(' ', 2*xtab(i).xlevel) ||
      xtab(i).xrel || ' ' || xtab(i).xphrase);
  end loop;
end;
This code produces the following output:
PHRASE CANINE
  PT DOG
  SYN PUPPY
  SYN MUTT
  SYN MONGREL
Related Topics OUTPUT_STYLE SYNonym (SYN) Operator in Chapter 3, "CONTAINS Query Operators"
THES_TT
This procedure finds and returns all top terms of a thesaurus. A top term is defined as any term that has a narrower term but no broader terms. This procedure differs from TT in that TT takes a phrase and finds the top term for that phrase, whereas THES_TT searches the whole thesaurus and finds all top terms.
Large Thesauri
Because this procedure searches the whole thesaurus, it can take some time on large thesauri. Oracle recommends that you not call it often for such thesauri. Instead, your application should call it once, store the results in a separate table, and use those stored results.
Syntax
CTX_THES.THES_TT(restab IN OUT NOCOPY EXP_TAB,
                 tname  IN VARCHAR2 DEFAULT 'DEFAULT');
restab
Specify the name of the expansion table to store the results. This table must be of type EXP_TAB, which the system defines as follows:
type exp_rec is record (
  xrel    varchar2(12),
  xlevel  number,
  xphrase varchar2(256)
);
type exp_tab is table of exp_rec index by binary_integer;
See Also: "CTX_THES Result Tables and Data Types" in Appendix A, "Result Tables" for more information about EXP_TAB.
tname
Specify thesaurus name. If not specified, the system default thesaurus is used.
Returns
This procedure returns all top terms and stores them in restab.
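The recommendation to call THES_TT once and reuse stored results could look like the sketch below. The table my_top_terms and the thesaurus name my_thes are hypothetical; the table is assumed to have been created beforehand with a single phrase column.

```plsql
-- assumes a hypothetical table created beforehand:
--   create table my_top_terms (phrase varchar2(256));
declare
  xtab ctx_thes.exp_tab;
begin
  -- one full scan of the thesaurus
  ctx_thes.thes_tt(xtab, 'my_thes');

  -- persist the top terms so later code can query the table instead
  for i in 1..xtab.count loop
    insert into my_top_terms (phrase) values (xtab(i).xphrase);
  end loop;
  commit;
end;
```

The stored rows go stale whenever the thesaurus changes, so an application using this pattern would need to refresh the table after thesaurus maintenance.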
TR
For a given monolingual thesaurus, this function returns the foreign language equivalent of a phrase as recorded in the thesaurus.
Note: Foreign language translation is not part of the ISO-2788 or ANSI Z39.19 thesaural standards. The behavior of TR is specific to Oracle Text.
Syntax 1: Table Result
CTX_THES.TR(restab IN OUT NOCOPY EXP_TAB,
            phrase IN VARCHAR2,
            lang   IN VARCHAR2 DEFAULT NULL,
            tname  IN VARCHAR2 DEFAULT 'DEFAULT');
Syntax 2: String Result
CTX_THES.TR(phrase IN VARCHAR2,
            lang   IN VARCHAR2 DEFAULT NULL,
            tname  IN VARCHAR2 DEFAULT 'DEFAULT')
RETURN VARCHAR2;
restab
Optionally, specify the name of the expansion table to store the results. This table must be of type EXP_TAB, which the system defines as follows:
type exp_rec is record (
  xrel    varchar2(12),
  xlevel  number,
  xphrase varchar2(256)
);
type exp_tab is table of exp_rec index by binary_integer;
See Also: "CTX_THES Result Tables and Data Types" in Appendix A, "Result Tables" for more information about EXP_TAB.
phrase
Specify phrase to look up in thesaurus.
lang
Specify the foreign language. Specify 'ALL' for all translations of phrase.
tname
Specify thesaurus name. If not specified, the system default thesaurus is used.
Returns
This function returns a string of foreign terms in the form:
{ft1}|{ft2}|{ft3} ...
Example
Consider a thesaurus MY_THES with the following entries for cat:
cat
  SPANISH: gato
  FRENCH: chat
  SYN lion
    SPANISH: leon
To look up the translation for cat, you can issue the following statements:
declare
  trans      varchar2(2000);
  span_trans varchar2(2000);
begin
  trans := ctx_thes.tr('CAT','ALL','MY_THES');
  span_trans := ctx_thes.tr('CAT','SPANISH','MY_THES');
  dbms_output.put_line('the translations for CAT are: '||trans);
  dbms_output.put_line('the Spanish translations for CAT are: '||span_trans);
end;
This code produces the following output:
the translations for CAT are: {CAT}|{CHAT}|{GATO}
the Spanish translations for CAT are: {CAT}|{GATO}
Related Topics OUTPUT_STYLE Translation Term (TR) Operator in Chapter 3, "CONTAINS Query Operators"
TRSYN
For a given monolingual thesaurus, this function returns the foreign equivalent of a phrase, synonyms of the phrase, and foreign equivalents of the synonyms as recorded in the specified thesaurus.
Note: Foreign language translation is not part of the ISO-2788 or ANSI Z39.19 thesaural standards. The behavior of TRSYN is specific to Oracle Text.
Syntax 1: Table Result
CTX_THES.TRSYN(restab IN OUT NOCOPY EXP_TAB,
               phrase IN VARCHAR2,
               lang   IN VARCHAR2 DEFAULT NULL,
               tname  IN VARCHAR2 DEFAULT 'DEFAULT');
Syntax 2: String Result
CTX_THES.TRSYN(phrase IN VARCHAR2,
               lang   IN VARCHAR2 DEFAULT NULL,
               tname  IN VARCHAR2 DEFAULT 'DEFAULT')
RETURN VARCHAR2;
restab
Optionally, specify the name of the expansion table to store the results. This table must be of type EXP_TAB, which the system defines as follows:
type exp_rec is record (
  xrel    varchar2(12),
  xlevel  number,
  xphrase varchar2(256)
);
type exp_tab is table of exp_rec index by binary_integer;
See Also: "CTX_THES Result Tables and Data Types" in Appendix A, "Result Tables" for more information about EXP_TAB.
phrase
Specify phrase to look up in thesaurus.
lang
Specify the foreign language. Specify 'ALL' for all translations of phrase.
tname
Specify thesaurus name. If not specified, the system default thesaurus is used.
Returns
This function returns a string of foreign terms in the form:
{ft1}|{ft2}|{ft3} ...
Example
Consider a thesaurus MY_THES with the following entries for cat:
cat
  SPANISH: gato
  FRENCH: chat
  SYN lion
    SPANISH: leon
To look up the translation and synonyms for cat, you can issue the following statements:
declare
  synonyms varchar2(2000);
  span_syn varchar2(2000);
begin
  synonyms := ctx_thes.trsyn('CAT','ALL','MY_THES');
  span_syn := ctx_thes.trsyn('CAT','SPANISH','MY_THES');
  dbms_output.put_line('all synonyms for CAT are: '||synonyms);
  dbms_output.put_line('the Spanish synonyms for CAT are: '||span_syn);
end;
This code produces the following output:
all synonyms for CAT are: {CAT}|{CHAT}|{GATO}|{LION}|{LEON}
the Spanish synonyms for CAT are: {CAT}|{GATO}|{LION}|{LEON}
Related Topics OUTPUT_STYLE Translation Term Synonym (TRSYN) Operator in Chapter 3, "CONTAINS Query Operators"
TT
This function returns the top term of a phrase as recorded in the specified thesaurus.
Syntax 1: Table Result
CTX_THES.TT(restab IN OUT NOCOPY EXP_TAB,
            phrase IN VARCHAR2,
            tname  IN VARCHAR2 DEFAULT 'DEFAULT');
Syntax 2: String Result
CTX_THES.TT(phrase IN VARCHAR2,
            tname  IN VARCHAR2 DEFAULT 'DEFAULT')
RETURN varchar2;
restab
Optionally, specify the name of the expansion table to store the results. This table must be of type EXP_TAB, which the system defines as follows:
type exp_rec is record (
  xrel    varchar2(12),
  xlevel  number,
  xphrase varchar2(256)
);
type exp_tab is table of exp_rec index by binary_integer;
See Also: "CTX_THES Result Tables and Data Types" in Appendix A, "Result Tables" for more information about EXP_TAB.
phrase
Specify phrase to look up in thesaurus.
tname
Specify thesaurus name. If not specified, the system default thesaurus is used.
Returns
This function returns the top term string in the form:
{tt}
Example
Consider a thesaurus MY_THES with the following broader term entries for dog:
DOG
  BT1 CANINE
    BT2 MAMMAL
      BT3 VERTEBRATE
        BT4 ANIMAL
To look up the top term for DOG, execute the following code:
declare
  terms varchar2(2000);
begin
  terms := ctx_thes.tt('DOG','MY_THES');
  dbms_output.put_line('The top term for DOG is: '||terms);
end;
This code produces the following output:
The top term for DOG is: {ANIMAL}
Related Topics OUTPUT_STYLE Top Term (TT) Operator in Chapter 3, "CONTAINS Query Operators"
UPDATE_TRANSLATION
Use this procedure to update an existing translation.
Syntax
CTX_THES.UPDATE_TRANSLATION(tname           in varchar2,
                            phrase          in varchar2,
                            language        in varchar2,
                            translation     in varchar2,
                            new_translation in varchar2);
tname
Specify the name of the thesaurus, using no more than 30 characters.
phrase
Specify the phrase in the thesaurus whose translation is to be updated. The phrase must already exist in the thesaurus or an error is raised.
language
Specify the language of the translation, using no more than 10 characters.
translation
Specify the translated term to update. If no such translation exists, an error is raised. You can specify NULL if there is only one translation for the phrase. If you specify NULL and the phrase has more than one translation in the specified language, an error is raised.
new_translation
Optionally, specify the new form of the translated term.
Example
The following code updates the Spanish translation for dog:
begin
  ctx_thes.update_translation('my_thes', 'dog', 'SPANISH:', 'PERRO', 'CAN');
end;
13 CTX_ULEXER Package
This chapter provides reference information for using the CTX_ULEXER PL/SQL package with the user-defined lexer. CTX_ULEXER declares the following type:
Name | Description
WILDCARD_TAB | Index-by table type you use to specify the offset of characters to be treated as wildcard characters by the user-defined lexer query procedure.
WILDCARD_TAB
TYPE WILDCARD_TAB IS TABLE OF NUMBER INDEX BY BINARY_INTEGER;
Use this index-by table type to specify the offset of those characters in the query word to be treated as wildcard characters by the user-defined lexer query procedure.
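A variable of this type is declared and filled like any other index-by table. The sketch below is illustrative only: the offsets 3 and 7 are made-up values, and the source does not say here whether offsets count from 0 or 1 or how the table is handed to the query procedure.

```plsql
declare
  wildcards ctx_ulexer.wildcard_tab;
begin
  -- hypothetical offsets of the characters to treat as wildcards
  wildcards(1) := 3;
  wildcards(2) := 7;

  for i in 1..wildcards.count loop
    dbms_output.put_line('wildcard at offset ' || wildcards(i));
  end loop;
end;
```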
14 Executables
This chapter discusses the executables shipped with Oracle Text. The following topics are discussed:
■ Thesaurus Loader (ctxload)
■ Knowledge Base Extension Compiler (ctxkbtc)
Thesaurus Loader (ctxload)
Use ctxload to do the following with a thesaurus:
■ Import a thesaurus file into the Oracle Text thesaurus tables.
■ Export a loaded thesaurus to a user-specified operating-system file.
An import file is an ASCII flat file that contains entries for synonyms, broader terms, narrower terms, or related terms, which can be used to expand queries.
See Also: For examples of import files for thesaurus importing, see "Structure of ctxload Thesaurus Import File" in Appendix C, "Loading Examples".
Text Loading
The ctxload program no longer supports the loading of text columns. To load files to a text column in batch, Oracle recommends that you use SQL*Loader.
See Also: "SQL*Loader Example" in Appendix C, "Loading Examples".
Specify the username and password of the user running ctxload.
The username and password can be followed immediately by @sqlnet_address to permit logon to remote databases. The value for sqlnet_address is a database connect string. If the TWO_TASK environment variable is set to a remote database, you do not have to specify a value for sqlnet_address to connect to the database.
-name object_name
When you use ctxload to export/import a thesaurus, use object_name to specify the name of the thesaurus to be exported/imported. You use object_name to identify the thesaurus in queries that use thesaurus operators.
Note: Thesaurus name must be unique. If the name specified for the thesaurus is identical to an existing thesaurus, ctxload returns an error and does not overwrite the existing thesaurus.
When you use ctxload to update/export a text field, use object_name to specify the index associated with the text column.
-file file_name
When ctxload is used to import a thesaurus, use file_name to specify the name of the import file which contains the thesaurus entries. When ctxload is used to export a thesaurus, use file_name to specify the name of the export file created by ctxload.
Note: If the name specified for the thesaurus dump file is identical to an existing file, ctxload overwrites the existing file.
Optional Arguments
-thes
Import a thesaurus. Specify the source file with the -file argument. You specify the name of the thesaurus to be imported with -name.
-thescase y | n
Specify y to create a case-sensitive thesaurus with the name specified by -name and populate the thesaurus with entries from the thesaurus import file specified by -file. If -thescase is y (the thesaurus is case-sensitive), ctxload enters the terms in the thesaurus exactly as they appear in the import file.
The default for -thescase is n (case-insensitive thesaurus).
Note: -thescase is valid for use with only the -thes argument.
-thesdump
Export a thesaurus. Specify the name of the thesaurus to be exported with the -name argument. Specify the destination file with the -file argument.
-log
Specify the name of the log file to which ctxload writes any national-language supported (Globalization Support) messages generated during processing. If you do not specify a log file name, the messages appear on the standard output.
-trace
Enables SQL statement tracing using ALTER SESSION SET SQL_TRACE TRUE. This command captures all processed SQL statements in a trace file, which can be used for debugging. The location of the trace file is operating-system dependent and can be modified using the USER_DUMP_DEST initialization parameter.
See Also: For more information about SQL trace and the USER_DUMP_DEST initialization parameter, see Oracle9i Database Administrator's Guide.
-pk
Specify the primary key value of the row to be updated or exported. When the primary key is compound, you must enclose the values within double quotes and separate the keys with a comma.
-export
Exports the contents of a CLOB or BLOB column in a database table into the operating system file specified by -file. ctxload exports the CLOB or BLOB column in the row specified by -pk. When you use -export, you must specify a primary key with -pk.
-update
Updates the contents of a CLOB or BLOB column in a database table with the contents of the operating system file specified by -file. ctxload updates the CLOB or BLOB column for the row specified by -pk. When you use -update, you must specify a primary key with -pk.
ctxload Examples
This section provides examples for some of the operations that ctxload can perform.
See Also: For more document loading examples, see Appendix C, "Loading Examples".
Thesaurus Import Example
The following example imports a thesaurus named tech_doc from an import file named tech_thesaurus.txt:
ctxload -user jsmith/123abc -thes -name tech_doc -file tech_thesaurus.txt
Thesaurus Export Example
The following example dumps the contents of a thesaurus named tech_doc into a file named tech_thesaurus.out:
ctxload -user jsmith/123abc -thesdump -name tech_doc -file tech_thesaurus.out
Knowledge Base Extension Compiler (ctxkbtc)
The knowledge base is the information source Oracle Text uses to perform theme analysis, such as theme indexing, processing ABOUT queries, and document theme extraction with the CTX_DOC package. A knowledge base is supplied for English and French. With the ctxkbtc compiler, you can do the following:
■ Extend your knowledge base by compiling one or more thesauri with the Oracle Text knowledge base. The extended information can be application-specific terms and relationships. During theme analysis, the extended portion of the knowledge base overrides any terms and relationships in the knowledge base where there is overlap.
■ Create a new user-defined knowledge base by compiling one or more thesauri. In languages other than English and French, this feature can be used to create a language-specific knowledge base.
See Also: For more information about the knowledge base packaged with Oracle Text, see Appendix I, "English Knowledge Base Category Hierarchy". For more information about the ABOUT operator, see ABOUT operator in Chapter 3, "CONTAINS Query Operators". For more information about document services, see Chapter 8, "CTX_DOC Package".
Knowledge Base Character Set
Knowledge bases can be in any single-byte character set. Supplied knowledge bases are in WE8ISO8859P1. You can store an extended knowledge base in another character set such as US7ASCII.
Specify the username and password for the administrator creating an extended knowledge base. This user must have write permission to the ORACLE_HOME directory.
-name thesname1 [thesname2 ... thesname16]
Specify the name(s) of the thesauri (up to 16) to be compiled with the knowledge base to create the extended knowledge base. The thesauri you specify must already be loaded with ctxload with the -thescase Y option.
-revert
Reverts the extended knowledge base to the default knowledge base provided by Oracle Text.
-stoplist stoplistname
Specify the name of the stoplist. Stopwords in the stoplist are added to the knowledge base as useless words that are prevented from becoming themes or contributing to themes. You can still add stopthemes after running this command using CTX_DDL.ADD_STOPTHEME.
-verbose
Displays all warnings and messages, including non-Globalization Support messages, to the standard output.
-log
Specify the log file for storing all messages. When you specify a log file, no messages are reported to standard output.
ctxkbtc Usage Notes
■ Before running ctxkbtc, you must set the NLS_LANG environment variable to match the database character set.
■ The user issuing ctxkbtc must have write permission to the ORACLE_HOME, since the program writes files to this directory.
■ Before being compiled, each thesaurus must be loaded into Oracle Text case sensitive with the "-thescase Y" option in ctxload.
■ Running ctxkbtc twice removes the previous extension.
ctxkbtc Limitations
The ctxkbtc program has the following limitations:
■ When upgrading or downgrading your database to a different release, Oracle recommends that you recompile your extended knowledge base in the new environment for theme indexing and related features to work correctly.
■ Knowledge base extension cannot be performed when theme indexing is being performed. In addition, any SQL sessions that are using Oracle Text functions must be exited and reopened to make use of the extended knowledge base.
■ There can be only one user extension per installation. Since a user extension affects all users at the installation, only administrators or terminology managers should extend the knowledge base.
ctxkbtc Constraints on Thesaurus Terms
Terms are case sensitive. If a thesaurus has a term in uppercase, for example, the same term present in lowercase form in a document will not be recognized. The maximum length of a term is 80 characters. Disambiguated homographs are not supported.
ctxkbtc Constraints on Thesaurus Relations
The following constraints apply to thesaurus relations:
■ BTG and BTP are the same as BT. NTG and NTP are the same as NT.
■ Only preferred terms can have a BT, NTs or RTs.
■ If a term has no USE relation, it will be treated as its own preferred term.
■ If a set of terms are related by SYN relations, only one of them may be a preferred term.
■ An existing category cannot be made a top term.
■ There can be no cycles in BT and NT relations.
■ A term can have at most one preferred term and at most one BT. A term may have any number of NTs.
■ An RT of a term cannot be an ancestor or descendent of the term. A preferred term may have any number of RTs up to a maximum of 32.
■ The maximum height of a tree is 16 including the top term level.
■ When multiple thesauri are being compiled, a top term in one thesaurus should not have a broader term in another thesaurus.
Note: The thesaurus compiler will tolerate certain violations of the above rules. For example, if a term has multiple BTs, it ignores all but the last one it encounters. Similarly, BTs between existing knowledge base categories will only result in a warning message. Such violations are not recommended since they might produce undesired results.
Extending the Knowledge Base
You can extend the supplied knowledge base by compiling one or more thesauri with the Oracle Text knowledge base. The extended information can be application-specific terms and relationships. During theme analysis, the extended portion of the knowledge base overrides any terms and relationships in the knowledge base where there is overlap.
When extending the knowledge base, Oracle recommends that, when appropriate, new terms be linked to one of the categories in the knowledge base for best results in theme proving.
See Also: For more information about the knowledge base, see Appendix I, "English Knowledge Base Category Hierarchy".
If new terms are kept completely disjoint from existing categories, fewer themes from new terms will be proven. The result is poorer precision and recall with ABOUT queries, as well as poor quality of gists and theme highlighting. You link new terms to existing terms by making an existing term the broader term for the new terms.
Example for Extending the Knowledge Base
You purchase a medical thesaurus medthes containing a hierarchy of medical terms. The four top terms in the thesaurus are the following:
■ Anesthesia and Analgesia
■ Anti-Allergic and Respiratory System Agents
■ Anti-Inflammatory Agents, Antirheumatic Agents, and Inflammation Mediators
■ Antineoplastic and Immunosuppressive Agents
To link these terms to the existing knowledge base, add the following entries to the medical thesaurus to map the new terms to the existing health and medicine branch:
health and medicine
  NT Anesthesia and Analgesia
  NT Anti-Allergic and Respiratory System Agents
  NT Anti-Inflammatory Agents, Antirheumatic Agents, and Inflammation Mediators
  NT Antineoplastic and Immunosuppressive Agents
Set your Globalization Support language environment variable to match the database character set. For example, if your database character set is WE8ISO8859P1 and you are using American English, set your NLS_LANG as follows:
setenv NLS_LANG AMERICAN_AMERICA.WE8ISO8859P1
Assuming the medical thesaurus is in a file called med.thes, you load the thesaurus as medthes with ctxload as follows:
ctxload -thes -thescase y -name medthes -file med.thes -user ctxsys/ctxsys
To link the loaded thesaurus medthes to the knowledge base, use ctxkbtc as follows:
ctxkbtc -user ctxsys/ctxsys -name medthes
Adding a Language-Specific Knowledge Base
You can extend theme functionality to languages other than English or French by loading your own knowledge base for any single-byte whitespace delimited language, including Spanish. Theme functionality includes theme indexing, ABOUT queries, theme highlighting, and the generation of themes, gists, and theme summaries with the CTX_DOC PL/SQL package.
You extend theme functionality by adding a user-defined knowledge base. For example, you can create a Spanish knowledge base from a Spanish thesaurus. To load your language-specific knowledge base, follow these steps:
1. Load your custom thesaurus using ctxload.
2. Set NLS_LANG so that the language portion is the target language. The charset portion must be a single-byte character set.
3. Compile the loaded thesaurus using ctxkbtc:
ctxkbtc -user ctxsys/ctxsys -name my_lang_thes
This command compiles your language-specific knowledge base from the loaded thesaurus. To use this knowledge base for theme analysis during indexing and ABOUT queries, specify the NLS_LANG language as the THEME_LANGUAGE attribute value for the BASIC_LEXER preference.
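The last step, setting THEME_LANGUAGE on a BASIC_LEXER preference, might look like the sketch below. The preference name my_lang_lexer, the table docs, its column text, and the index name docs_idx are hypothetical; SPANISH stands in for whatever language portion you used in NLS_LANG when compiling.

```sql
BEGIN
  -- Create a lexer preference based on BASIC_LEXER (name is hypothetical).
  ctx_ddl.create_preference('my_lang_lexer', 'BASIC_LEXER');
  -- Turn on theme indexing so ABOUT queries can use the knowledge base.
  ctx_ddl.set_attribute('my_lang_lexer', 'INDEX_THEMES', 'YES');
  -- Point theme analysis at the user-defined knowledge base language.
  ctx_ddl.set_attribute('my_lang_lexer', 'THEME_LANGUAGE', 'SPANISH');
END;
/

-- Assumed table and column; the lexer preference is passed in PARAMETERS.
CREATE INDEX docs_idx ON docs(text)
  INDEXTYPE IS ctxsys.context
  PARAMETERS ('lexer my_lang_lexer');
```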
Limitations for Adding a Knowledge Base
The following limitations hold for adding knowledge bases:
■ Oracle supplies knowledge bases in English and French only. You must provide your own thesaurus for any other language.
■ You can only add knowledge bases for languages with single-byte character sets. You cannot create a knowledge base for languages which can be expressed only in multi-byte character sets. If the database is a multi-byte universal character set, such as UTF-8, the NLS_LANG parameter must still be set to a compatible single-byte character set when compiling the thesaurus.
■ Adding a knowledge base works best for whitespace delimited languages.
■ You can have at most one knowledge base per Globalization Support language.
■ Obtaining hierarchical query feedback information such as broader terms, narrower terms and related terms does not work in languages other than English and French. In other languages, the knowledge bases are derived entirely from your thesauri. In such cases, Oracle recommends that you obtain hierarchical information directly from your thesauri.
Order of Precedence for Multiple Thesauri
When multiple thesauri are to be compiled, precedence is determined by the order in which thesauri are listed in the arguments to the compiler (most preferred first). A user thesaurus always has precedence over the built-in knowledge base.
Size Limits for Extended Knowledge Base
The following table lists the size limits associated with creating and compiling an extended knowledge base:
Description of Parameter                                                                  Limit
Number of RTs (from + to) per term                                                        32
Number of terms per a single hierarchy (i.e., all narrower terms for a given top term)   64000
Number of new terms in an extended knowledge base                                         1 million
Number of separate thesauri that can be compiled into a user extension to the KB          16
A Result Tables
This appendix describes the structure of the result tables used to store the output generated by the procedures in the CTX_QUERY, CTX_DOC, and CTX_THES packages. The following topics are discussed in this appendix:
■ CTX_QUERY Result Tables
■ CTX_DOC Result Tables
■ CTX_THES Result Tables and Data Types
CTX_QUERY Result Tables
For the CTX_QUERY procedures that return results, tables for storing the results must be created before the procedure is called. The tables can be named anything, but must include columns with specific names and data types. This section describes the following types of result tables, and their required columns:
■ EXPLAIN Table
■ HFEEDBACK Table
EXPLAIN Table
Table A–1 describes the structure of the table to which CTX_QUERY.EXPLAIN writes its results.
Table A–1
Column Name    Datatype        Description
EXPLAIN_ID     VARCHAR2(30)    The value of the explain_id argument specified in the EXPLAIN call.
ID             NUMBER          A number assigned to each node in the query execution tree. The root operation node has ID = 1. The nodes are numbered in a top-down, left-first manner as they appear in the parse tree.
PARENT_ID      NUMBER          The ID of the execution step that operates on the output of the ID step. Graphically, this is the parent node in the query execution tree. The root operation node (ID = 1) has PARENT_ID = 0.
OPERATION      VARCHAR2(30)    Name of the internal operation performed. Refer to Table A–2 for possible values.
OPTIONS        VARCHAR2(30)    Characters that describe a variation on the operation described in the OPERATION column. When an OPERATION has more than one OPTIONS associated with it, OPTIONS values are concatenated in the order of processing. See Table A–3 for possible values.
OBJECT_NAME    VARCHAR2(80)    Section name, wildcard term, weight, or threshold value or term to look up in the index.
POSITION       NUMBER          The order of processing for nodes that all have the same PARENT_ID. The positions are numbered in ascending order starting at 1.
CARDINALITY    NUMBER          Reserved for future use. You should create this column for forward compatibility.
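The column list in Table A–1 translates directly into a CREATE TABLE statement. The sketch below creates such a table, fills it with one CTX_QUERY.EXPLAIN call, and then walks the ID/PARENT_ID tree. The table name test_explain, the index name my_index, the query text, and the explain_id value are all hypothetical.

```sql
-- Result table with the columns and datatypes listed in Table A-1.
CREATE TABLE test_explain (
  explain_id  VARCHAR2(30),
  id          NUMBER,
  parent_id   NUMBER,
  operation   VARCHAR2(30),
  options     VARCHAR2(30),
  object_name VARCHAR2(80),
  position    NUMBER,
  cardinality NUMBER
);

BEGIN
  -- Assumed index name and query; explain_id tags the rows of this call.
  ctx_query.explain(
    index_name    => 'my_index',
    text_query    => 'dog AND cat',
    explain_table => 'test_explain',
    sharelevel    => 0,
    explain_id    => 'Test');
END;
/

-- Indent each node under its parent; the root has ID = 1, PARENT_ID = 0.
SELECT LPAD(' ', 2 * (LEVEL - 1)) || operation || ' ' ||
       options || ' ' || object_name AS step
FROM   test_explain
START WITH id = 1 AND explain_id = 'Test'
CONNECT BY PRIOR id = parent_id AND explain_id = 'Test'
ORDER SIBLINGS BY position;
```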
Operation Column Values
Table A–2 shows the possible values for the OPERATION column of the explain table.
Table A–2
Operation Value    Query Operator                           Equivalent Symbol
ABOUT              ABOUT                                    (none)
ACCUMULATE         ACCUM                                    ,
AND                AND                                      &
COMPOSITE          (none)                                   (none)
EQUIVALENCE        EQUIV                                    =
MINUS              MINUS                                    -
NEAR               NEAR                                     ;
NOT                NOT                                      ~
NO_HITS            (no hits will result from this query)
OR                 OR                                       |
PHRASE             (a phrase term)
SECTION            (section)
THRESHOLD          >                                        >
WEIGHT             *                                        *
WITHIN             within                                   (none)
WORD               (a single term)
OPTIONS Column Values
The following table lists the possible values for the OPTIONS column of the explain table.
Table A–3
Options Value    Description
($)              Stem
(?)              Fuzzy
(!)              Soundex
(T)              Order for ordered Near.
(F)              Order for unordered Near.
(n)              A number associated with the max_span parameter for the Near operator.
HFEEDBACK Table
Table A–4 describes the table to which CTX_QUERY.HFEEDBACK writes its results.
Table A–4
Column Name    Datatype             Description
FEEDBACK_ID    VARCHAR2(30)         The value of the feedback_id argument specified in the HFEEDBACK call.
ID             NUMBER               A number assigned to each node in the query execution tree. The root operation node has ID = 1. The nodes are numbered in a top-down, left-first manner as they appear in the parse tree.
PARENT_ID      NUMBER               The ID of the execution step that operates on the output of the ID step. Graphically, this is the parent node in the query execution tree. The root operation node (ID = 1) has PARENT_ID = 0.
OPERATION      VARCHAR2(30)         Name of the internal operation performed. Refer to Table A–5 for possible values.
OPTIONS        VARCHAR2(30)         Characters that describe a variation on the operation described in the OPERATION column. When an OPERATION has more than one OPTIONS associated with it, OPTIONS values are concatenated in the order of processing. See Table A–6 for possible values.
OBJECT_NAME    VARCHAR2(80)         Section name, wildcard term, weight, threshold value or term to look up in the index.
POSITION       NUMBER               The order of processing for nodes that all have the same PARENT_ID. The positions are numbered in ascending order starting at 1.
BT_FEEDBACK    CTX_FEEDBACK_TYPE    Stores broader feedback terms. See Table A–7.
RT_FEEDBACK    CTX_FEEDBACK_TYPE    Stores related feedback terms. See Table A–7.
NT_FEEDBACK    CTX_FEEDBACK_TYPE    Stores narrower feedback terms. See Table A–7.
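Because the three feedback columns are nested tables, a CREATE TABLE for an hfeedback result table also needs a storage clause for each of them. The sketch below assumes a result table named restab and storage table names res_bt, res_rt, and res_nt, all hypothetical. It uses the column names BT_FEEDBACK, RT_FEEDBACK, and NT_FEEDBACK given in the CTX_FEEDBACK_TYPE section.

```sql
-- Result table for CTX_QUERY.HFEEDBACK, following Table A-4.
CREATE TABLE restab (
  feedback_id VARCHAR2(30),
  id          NUMBER,
  parent_id   NUMBER,
  operation   VARCHAR2(30),
  options     VARCHAR2(30),
  object_name VARCHAR2(80),
  position    NUMBER,
  bt_feedback ctxsys.ctx_feedback_type,
  rt_feedback ctxsys.ctx_feedback_type,
  nt_feedback ctxsys.ctx_feedback_type
)
-- Each nested table column needs its own storage table.
NESTED TABLE bt_feedback STORE AS res_bt
NESTED TABLE rt_feedback STORE AS res_rt
NESTED TABLE nt_feedback STORE AS res_nt;
```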
Operation Column Values
Table A–5 shows the possible values for the OPERATION column of the hfeedback table.
Table A–5
Operation Value    Query Operator                      Equivalent Symbol
ABOUT              ABOUT                               (none)
ACCUMULATE         ACCUM                               ,
AND                AND                                 &
EQUIVALENCE        EQUIV                               =
MINUS              MINUS                               -
NEAR               NEAR                                ;
NOT                NOT                                 ~
OR                 OR                                  |
SECTION            (section)
TEXT               word or phrase of a text query
THEME              word or phrase of an ABOUT query
THRESHOLD          >                                   >
WEIGHT             *                                   *
WITHIN             within                              (none)
OPTIONS Column Values
The following table lists the values for the OPTIONS column of the hfeedback table.
Table A–6
Options Value    Description
(T)              Order for ordered Near.
(F)              Order for unordered Near.
(n)              A number associated with the max_span parameter for the Near operator.
CTX_FEEDBACK_TYPE
The CTX_FEEDBACK_TYPE is a nested table of objects. This datatype is pre-defined in the ctxsys schema. Use this type to define the columns BT_FEEDBACK, RT_FEEDBACK, and NT_FEEDBACK.
The nested table CTX_FEEDBACK_TYPE holds objects of type CTX_FEEDBACK_ITEM_TYPE, which is also pre-defined in the ctxsys schema. This object is defined with three members and one method as follows:
Table A–7
CTX_FEEDBACK_ITEM_TYPE Members and Methods    Type      Description
text                                          member    Feedback term.
cardinality                                   member    (reserved for future use.)
score                                         member    (reserved for future use.)
The SQL code that defines these objects is as follows:
CREATE OR REPLACE TYPE ctx_feedback_type AS TABLE OF ctx_feedback_item_type;
CREATE OR REPLACE TYPE ctx_feedback_item_type AS OBJECT
  (text        VARCHAR2(80),
   cardinality NUMBER,
   score       NUMBER,
   MAP MEMBER FUNCTION rank RETURN REAL,
   PRAGMA RESTRICT_REFERENCES (rank, RNDS, WNDS, RNPS, WNPS)
  );
CREATE OR REPLACE TYPE BODY ctx_feedback_item_type AS
  MAP MEMBER FUNCTION rank RETURN REAL IS
  BEGIN
    RETURN score;
  END rank;
END;
See Also: For an example of how to select from the hfeedback table and its nested tables, refer to CTX_QUERY.HFEEDBACK in Chapter 10, "CTX_QUERY Package".
CTX_DOC Result Tables
The CTX_DOC procedures return results stored in a table. Before calling a procedure, you must create the table. The tables can be named anything, but must include columns with specific names and data types. This section describes the following result tables and their required columns:
■ Filter Table
■ Gist Table
■ Highlight Table
■ Markup Table
■ Theme Table
Filter Table
A filter table stores one row for each filtered document returned by CTX_DOC.FILTER. Filtered documents can be plain text or HTML.
When you call CTX_DOC.FILTER for a document, the document is processed through the filter defined for the text column and the results are stored in the filter table you specify.
Filter tables can be named anything, but must include the following columns, with names and datatypes as specified:
Table A–8
Column Name    Type      Description
QUERY_ID       NUMBER    The identifier for the results generated by a particular call to CTX_DOC.FILTER (only populated when table is used to store results from multiple FILTER calls)
DOCUMENT       CLOB      Text of the document, stored in plain text or HTML.
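A filter table with these two columns could be created as in the sketch below. The table name filtertab is hypothetical.

```sql
-- Result table for CTX_DOC.FILTER, following Table A-8.
CREATE TABLE filtertab (
  query_id NUMBER,  -- distinguishes results of separate FILTER calls
  document CLOB     -- filtered text, plain text or HTML
);
```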
Gist Table
A Gist table stores one row for each Gist/theme summary generated by CTX_DOC.GIST. Gist tables can be named anything, but must include the following columns, with names and data types as specified:
Table A–9
Column Name    Type            Description
QUERY_ID       NUMBER          Query ID.
POV            VARCHAR2(80)    Document theme. Case depends on how themes were used in the document or represented in the knowledge base. POV has the value of GENERIC for the document GIST.
GIST           CLOB            Text of Gist or theme summary, stored as plain text
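Under the same conventions, a Gist table might be declared as follows. The table name gisttab is hypothetical.

```sql
-- Result table for CTX_DOC.GIST, following Table A-9.
CREATE TABLE gisttab (
  query_id NUMBER,
  pov      VARCHAR2(80),  -- theme, or GENERIC for the document Gist
  gist     CLOB           -- Gist or theme summary as plain text
);
```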
Highlight Table
A highlight table stores offset and length information for highlighted terms in a document. This information is generated by CTX_DOC.HIGHLIGHT. Highlighted terms can be the words or phrases that satisfy a word or an ABOUT query.
If a document is formatted, the text is filtered into either plain text or HTML and the offset information is generated for the filtered text. The offset information can be used to highlight query terms for the same document filtered with CTX_DOC.FILTER.
Highlight tables can be named anything, but must include the following columns, with names and datatypes as specified:
Table A–10
Column Name    Type      Description
QUERY_ID       NUMBER    The identifier for the results generated by a particular call to CTX_DOC.HIGHLIGHT (only populated when table is used to store results from multiple HIGHLIGHT calls)
OFFSET         NUMBER    The position of the highlight in the document, relative to the start of document which has a position of 1.
LENGTH         NUMBER    The length of the highlight.
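The following sketch creates a highlight table and fills it for one document. The table name hightab, the index name my_index, the document key '1', and the query 'dog' are hypothetical. The arguments are passed positionally as index name, text key, query, result table, query ID, and a plaintext flag.

```sql
-- Result table for CTX_DOC.HIGHLIGHT, following Table A-10.
CREATE TABLE hightab (
  query_id NUMBER,
  offset   NUMBER,  -- 1-based position in the filtered text
  length   NUMBER
);

BEGIN
  -- FALSE requests offsets relative to the HTML-filtered version.
  ctx_doc.highlight('my_index', '1', 'dog', 'hightab', 1, FALSE);
END;
/

-- List the highlights of this call in document order.
SELECT offset, length
FROM   hightab
WHERE  query_id = 1
ORDER BY offset;
```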
Markup Table
A markup table stores documents in plain text or HTML format with the query terms in the documents highlighted by markup tags. This information is generated when you call CTX_DOC.MARKUP.
Markup tables can be named anything, but must include the following columns, with names and datatypes as specified:
Table A–11
Column Name    Type      Description
QUERY_ID       NUMBER    The identifier for the results generated by a particular call to CTX_DOC.MARKUP (only populated when table is used to store results from multiple MARKUP calls)
DOCUMENT       CLOB      Marked-up text of the document, stored in plain text or HTML format
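A markup table has the same shape as a filter table. A minimal declaration, with the hypothetical name markuptab, is sketched below.

```sql
-- Result table for CTX_DOC.MARKUP, following Table A-11.
CREATE TABLE markuptab (
  query_id NUMBER,
  document CLOB  -- document text with query terms wrapped in markup tags
);
```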
Theme Table
A theme table stores one row for each theme generated by CTX_DOC.THEMES. The value stored in the THEME column is either a single theme phrase or a string of parent themes, separated by colons.
Theme tables can be named anything, but must include the following columns, with names and data types as specified:
Table A–12
Column Name    Type              Description
QUERY_ID       NUMBER            Query ID
THEME          VARCHAR2(2000)    Theme phrase or string of parent themes separated by colons (:).
WEIGHT         NUMBER            Weight of theme phrase relative to other theme phrases for the document.
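As an illustration, the sketch below creates a theme table, populates it for one document, and lists the themes by weight. The names themetab and my_index and the document key '1' are hypothetical.

```sql
-- Result table for CTX_DOC.THEMES, following Table A-12.
CREATE TABLE themetab (
  query_id NUMBER,
  theme    VARCHAR2(2000),
  weight   NUMBER
);

BEGIN
  -- Positional arguments: index name, text key, result table, query ID.
  ctx_doc.themes('my_index', '1', 'themetab', 1);
END;
/

-- Strongest themes first.
SELECT theme, weight
FROM   themetab
WHERE  query_id = 1
ORDER BY weight DESC;
```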
Token Table
A token table stores the text tokens for a document as output by the CTX_DOC.TOKENS procedure. Token tables can be named anything, but must include the following columns, with names and data types as specified.
Table A–13
Column Name    Type            Description
QUERY_ID       NUMBER          The identifier for the results generated by a particular call to CTX_DOC.TOKENS (only populated when table is used to store results from multiple TOKENS calls)
TOKEN          VARCHAR2(64)    The token string in the text.
OFFSET         NUMBER          The position of the token in the document, relative to the start of document which has a position of 1.
LENGTH         NUMBER          The character length of the token.
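A token table and a call that fills it could look like the sketch below. The names toktab and my_index and the document key '1' are hypothetical.

```sql
-- Result table for CTX_DOC.TOKENS, following Table A-13.
CREATE TABLE toktab (
  query_id NUMBER,
  token    VARCHAR2(64),
  offset   NUMBER,  -- 1-based position of the token in the document
  length   NUMBER   -- character length of the token
);

BEGIN
  -- Positional arguments: index name, text key, result table, query ID.
  ctx_doc.tokens('my_index', '1', 'toktab', 1);
END;
/

-- Tokens in document order.
SELECT token, offset, length
FROM   toktab
WHERE  query_id = 1
ORDER BY offset;
```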
CTX_THES Result Tables and Data Types
The CTX_THES expansion functions such as BT, NT, and SYN can return the expansions in a table of type EXP_TAB. You can specify the name of your table with the restab argument.
EXP_TAB Table Type
The EXP_TAB table type is a table of rows of type EXP_REC. The EXP_REC and EXP_TAB types are defined as follows in the CTXSYS schema:
type exp_rec is record (
  xrel    varchar2(12),
  xlevel  number,
  xphrase varchar2(256)
);
type exp_tab is table of exp_rec index by binary_integer;
When you call a thesaurus expansion function and specify restab, the system returns the expansion as an EXP_TAB table. Each row in this table is of type EXP_REC and represents a word or phrase in the expansion. The following table describes the fields in EXP_REC:
EXP_REC Field    Description
xrel             The xrel field contains the relation of the term to the input term (e.g. 'SYN', 'PT', 'RT', etc.). The xrel value is PHRASE when the input term appears in the expansion. For translations, the xrel value is the language.
xlevel           The xlevel field is the level of the relation. This is used mainly when xrel is a hierarchical relation (BT*/NT*). The xlevel field is 0 when xrel is PHRASE. The xlevel field is 2 for translations of synonyms under TRSYN. The xlevel field is 1 for operators that are not hierarchical, such as PT and RT.
xphrase          The xphrase is the related term. This includes a qualifier in parentheses, if one exists for the related term. Compound terms are not de-compounded.
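The sketch below shows one way to read an EXP_TAB result, assuming the types are referenced through the CTX_THES package as ctx_thes.exp_tab. The thesaurus name my_thes and the phrase 'wolf' are hypothetical. It requests broader terms two levels up and prints each row indented by its xlevel.

```sql
DECLARE
  xtab ctx_thes.exp_tab;  -- receives one EXP_REC row per expansion term
BEGIN
  -- Arguments: result table, phrase, number of levels, thesaurus name.
  ctx_thes.bt(xtab, 'wolf', 2, 'my_thes');

  FOR i IN 1 .. xtab.COUNT LOOP
    -- xlevel is 0 for the input phrase itself (xrel = PHRASE).
    dbms_output.put_line(
      LPAD(' ', 2 * xtab(i).xlevel) ||
      xtab(i).xrel || ' ' || xtab(i).xphrase);
  END LOOP;
END;
/
```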
B Supported Document Formats
This appendix contains a list of the document formats supported by the Inso filtering technology. The following topics are covered in this appendix:
■ About Document Filtering Technology
■ Supported Document Formats
■ Unsupported Formats
About Document Filtering Technology
Oracle Text uses document filtering technology licensed from Stellent Chicago, Inc. This filtering technology enables you to index most document formats. This technology also enables you to convert documents to HTML for document presentation, with the CTX_DOC package.
See Also: For a list of supported formats, see "Supported Document Formats" in this Appendix.
To use Inso filtering for indexing and DML processing, you must specify the INSO_FILTER object in your filter preference.
To use Inso filtering technology for converting documents to HTML with the CTX_DOC package, you need not use the INSO_FILTER indexing preference, but you must still set up your environment to use this filtering technology as described in this appendix.
To convert documents to HTML format, Inso filtering technology relies on shared libraries and data files licensed from Stellent Chicago, Inc. The following sections discuss the supported platforms and how to enable Inso filtering on the different platforms.
Supported Platforms
Inso filter technology is supported on the following platforms:
■ Sun Solaris on SPARC 32-bit and 64-bit (2.5.1 - 2.6,7-8)
■ IBM AIX 32-bit and 64-bit (4.2 - 4.3)
■ HP-UX 32-bit and 64-bit (10.2 - 11.0)
■ DEC UNIX for Alpha/Tru64 UNIX (4.0)
■ SGI IRIX 32-bit and 64-bit (6.3)
■ Microsoft Windows
  ■ Intel x86 WinNT (4.0 and above)
  ■ Intel x86 Win95, Win98 SE, Win2000, and Windows ME
■ Red Hat Linux for Intel x86 (5.2 - 7.0)
Environment Variables
All environment variables related to Inso filtering must be made visible to Oracle Text.
Requirements for UNIX Platforms
The following requirements apply to Solaris, IBM AIX, HP/UX, Digital UNIX, SGI, and Linux platforms:
■ Ensure the *.flt files have execute permission granted to the operating system user running the Oracle database and ctxsrv server.
■ Set the $PATH variable to include the location of the *.flt files, in particular the location of the file isunx2.flt, and $ORACLE_HOME/ctx/lib, which is the location of the shared libraries for Inso filtering.
■ Set the $HOME environment variable to allow Inso technology to write files to a sub-directory (.oit) in the $HOME directory.
■ Access to a running X-Windows server is required to perform vector graphics image conversion.
Filtering Vector Graphic Formats
Follow these steps to filter vector graphic formats on UNIX platforms:
■ Start an X server to filter vector graphic formats. If no X server exists (system detects no X libraries, such as Xm, Xt, and X11), vector graphic filtering is not performed. Vector graphic formats include CAD drawings and presentation formats such as Power Point 97. Bitmap formats include GIF, JPEG, and TIF formats as well as bitmap formats.
■ Because the system depends on X libraries to perform vector graphic conversion, ensure that the system-specific library path environment variable for the X libraries is set correctly.
■ Set the $DISPLAY environment variable. For example, setting DISPLAY=:0.0 tells the system to use the X server on the console.
OLE2 Object Support
There are platform-dependent limits on what Inso filter technology can do with OLE2 objects. On all platforms, when a metafile snapshot is available, Inso technology will use it to convert the object.
When a metafile snapshot is not available on UNIX platforms, Inso technology cannot convert the OLE2 object. However, when a metafile snapshot is not available on the NT platform, the original application is used (if available) to convert the OLE2 object.
Supported Document Formats
The following table lists all of the document formats that Oracle Text supports for filtering. Document filtering is used for indexing, DML, and for converting documents to HTML with the CTX_DOC package. This filtering technology is based on Outside In HTML Export and Outside In Content Access technology licensed from Stellent Chicago, Inc.
Note: This list does not represent the complete list of formats that Oracle is able to process. The external filter framework enables Oracle to process any document format, provided an external filter exists which can filter all the formats to plain text.
Word Processing - Generic
Format
Version
ASCII Text (7 & 8 bit versions)
All versions
ANSI Text (7 & 8 bit)
All versions
Unicode Text
All versions
HTML
Versions through 3.0 (some limitations)
IBM Revisable Form Text
All versions
IBM FFT
All versions
Microsoft Rich Text Format (RTF)
All versions
Word Processing - DOS
Format
Version
DEC WPS Plus (WPL)
Versions through 4.1
DEC WPS Plus (DX)
Versions through 4.0
DisplayWrite 2 & 3 (TXT)
All versions
DisplayWrite 4 & 5
Versions through Release 2.0
Enable
Versions 3.0, 4.0 and 4.5
First Choice
Versions through 3.0
Framework
Version 3.0
IBM Writing Assistant
Version 1.01
Lotus Manuscript
Versions through 2.0
MASS11
Versions through 8.0
Microsoft Word
Versions through 6.0
Microsoft Works
Versions through 2.0
MultiMate
Versions through 4.0
Navy DIF
All versions
Nota Bene
Version 3.0
Office Writer
Version 4.0 to 6.0
PC-File Letter
Versions through 5.0
PC-File+ Letter
Versions through 3.0
PFS:Write
Versions A, B, and C
Professional Write
Versions through 2.1
Q&A
Version 2.0
Samna Word
Versions through Samna Word IV+
SmartWare II
Version 1.02
Sprint
Versions through 1.0
Total Word
Version 1.2
Volkswriter 3 & 4
Versions through 1.0
Wang PC (IWP)
Versions through 2.6
WordMARC
Versions through Composer Plus
WordPerfect
Versions through 6.1
WordStar
Versions through 7.0
WordStar 2000
Versions through 3.0
XyWrite
Versions through III Plus
Word Processing - International
Format
Version
JustSystems Ichitaro
Version 5.0, 6.0, 8.0, 9.0, and 10.0
Word Processing - Windows
Format
Version
AMI/AMI Professional
Versions through 3.1
Corel WordPerfect for Windows
Versions through 2002
JustWrite
Versions through 3.0
Legacy
Versions through 1.1
Lotus WordPro (NT on Intel only)
SmartSuite 96, 97, Millennium and Millennium 9.6
Lotus WordPro (all supported platforms except NT on Intel; Text only)
SmartSuite 97, Millennium, and Millennium 9.6
Microsoft Windows Works
Versions through 4.0
Microsoft Windows Write
Versions through 3.0
Microsoft Word 97
Word 97
Microsoft Word 2000
Word 2000
Microsoft Word 2002 (Office XP)
Word 2002
Microsoft Word for Windows
Versions through 7.0
Microsoft WordPad
All versions
Novell Perfect Works
Version 2.0
Novell WordPerfect for Windows
Versions through 7.0
Professional Write Plus
Version 1.0
Q&A Write for Windows
Version 3.0
Star Office Writer for Windows (Text only)
Version 5.2
WordStar for Windows
Version 1.0
Word Processing - Macintosh
Format
Version
Microsoft Word
Versions 4.0 through 6.0
Microsoft Word 98
Word 98
WordPerfect
Versions 1.02 through 3.0
Microsoft Works
Versions through 2.0
MacWrite II
Version 1.1
Word Processing - Unix
Format
Version
Star Office Writer for Windows
Version 5.2
Desktop Publishing
Format
Version
Adobe FrameMaker
Version 6.0
Spreadsheets Formats
Format
Version
Enable
Versions 3.0, 4.0 and 4.5
First Choice
Versions through 3.0
Framework
Version 3.0
Lotus 1-2-3 (DOS & Windows)
Versions through 5.0
Lotus 1-2-3 for SmartSuite
SmartSuite 97, Millennium, and Millennium 9.6
Lotus 1-2-3 Charts (DOS & Windows)
Versions through Millennium 9.6
Lotus 1-2-3 (OS/2)
Versions through 2.0
Lotus 1-2-3 Charts (OS/2)
Versions through 2.0
Lotus Symphony
Versions 1.0, 1.1 and 2.0
Microsoft Excel 97
Excel 97
Microsoft Excel 2000
Excel 2000
Microsoft Excel 2002 (Office XP)
Excel 2002
Microsoft Excel Windows
Versions 2.2 through 7.0
Microsoft Excel Macintosh
Versions 3.0 - 4.0 and 98
Microsoft Excel Charts
Versions 2.x - 7.0
Microsoft Multiplan
Version 4.0
Microsoft Windows Works
Versions through 4.0
Microsoft Works (DOS)
Versions through 2.0
Microsoft Works (Mac)
Versions through 2.0
Mosaic Twin
Version 2.5
Novell Perfect Works
Version 2.0
QuattroPro for DOS
Versions through 5.0
QuattroPro for Windows
Versions through 2002
PFS:Professional Plan
Version 1.0
SuperCalc 5
Version 4.0
SmartWare II
Version 1.02
VP Planner 3D
Version 1.0
Databases Formats
Format
Version
Access
Versions through 2.0
dBASE
Versions through 5.0
DataEase
Version 4.x
dBXL
Version 1.3
Enable
Versions 3.0, 4.0 and 4.5
First Choice
Versions through 3.0
FoxBase
Version 2.1
Framework
Version 3.0
Microsoft Windows Works
Versions through 4.0
Microsoft Works (DOS)
Versions through 2.0
Microsoft Works (Mac)
Versions through 2.0
Paradox (DOS)
Versions through 4.0
Paradox (Windows)
Versions through 1.0
Personal R:BASE
Version 1.0
R:BASE 5000
Versions through 3.1
R:BASE System V
Version 1.0
Reflex
Version 2.0
Q&A
Versions through 2.0
SmartWare II
Version 1.02
Display Formats
Format
Version
PDF - Portable Document Format
Acrobat Versions 2.1, 3.0, 4.0, and 5.0 including Japanese PDF.
Presentation Formats
Format
Version
Corel Presentations
Versions 8.0, 9.0 and 2002
Novell Presentations
Versions 3.0 and 7.0
Harvard Graphics for DOS
Versions 2.x & 3.x
Harvard Graphics
Windows versions
Freelance 96
Freelance 96
Freelance for Windows
SmartSuite 97, Millennium, and Millennium 9.6
Freelance for Windows
Version 1.0 and 2.0
Freelance for OS/2
Versions through 2.0
Microsoft PowerPoint for Windows
Versions through 7.0
Microsoft PowerPoint 97
PowerPoint 97
Microsoft PowerPoint 2000
PowerPoint 2000
Microsoft PowerPoint 2002 (Office XP)
PowerPoint 2002
Microsoft PowerPoint for Macintosh
Version 4.0 and 98
Standard Graphic Formats
The following table lists the graphic formats that the INSO filter recognizes. This means that indexing a text column that contains any of these formats produces no error. As such, it is safe for the column to contain any of these formats.
Note: The INSO filter cannot extract textual information from graphics.
Format
Version
Binary Group 3 Fax
All versions
BMP (including RLE, ICO, CUR & OS/2 DIB)
Windows
CALS Raster
Type 1 and II
CDR (if TIFF image is embedded in it)
Corel Draw version 2.0 - 9.0
CGM - Computer Graphics Metafile
ANSI, CALS, NIST, Version 3.0
DCX (multi-page PCX)
Microsoft Fax
DRW - Micrografx Designer
Version 3.1
DRW - Micrografx Draw
Version 4.0
DXF (Binary and ASCII) AutoCAD Drawing Interchange Format
Versions through 14
EMF
Windows Enhanced Metafile
EPS - Encapsulated PostScript
If TIFF image is embedded in it
FPX - Kodak Flash Pix
No specific version
GIF - Graphics Interchange Format
Compuserve
GP4 - Group 4 CALS format
Types I and II
HPGL - Hewlett Packard Graphics Language
Version 2.0
IMG - GEM Paint
No specific version
JFIF (JPEG not in TIFF)
All versions
JPEG
All versions
Novell Perfect Works (Draw)
Novell version 2.0
PBM - Portable Bitmap
No specific version
PCD - Kodak Photo CD
Version 1.0
PCX Bitmap
PC Paintbrush
PGM - Portable Graymap
No specific version
PIC
Lotus 1-2-3 Picture File Format - No Specific Version
PICT1 & PICT2 (Raster)
Macintosh Standard
PNG - Portable Network Graphics Internet Format
Version 1.0
PNTG
MacPaint
PPM - Portable Pixmap
No specific version
Progressive JPEG
No Specific version
PSP - Paintshop Pro (NT on Intel only)
Versions 5.0 and 5.0.1
SDW
Ami Draw
Snapshot (Lotus)
All versions
SRS - Sun Raster File Format
No specific version
Targa
Truevision
TIFF
Versions through 6
TIFF CCITT Group 3 & 4
Fax Systems
VISO
Visio 4 (Page Preview only), 5, 2000, 2002
WBMP
No Specific version
WMF
Windows Metafile
WordPerfect Graphics [WPG and WPG2]
Versions through 2.0
XBM - X-Windows Bitmap
x10 compatible
XPM - X-Windows Pixmap
x10 compatible
XWD - X-Windows Dump
x10 compatible
Other
Format
Version
Executable (EXE, DLL)
No specific version
Executable for Windows NT
No specific version
Microsoft Project (Text only)
Project 98
MSG (Text only)
Microsoft Outlook mail format
vCard Electronic Business Card
Versit version 2.1
WML
Compatible with version 5.2
Unsupported Formats
Password protected documents and documents with password protected content are not supported by the Inso filter.
C Loading Examples
This appendix provides examples of how to load text into a text column. It also describes the structure of ctxload import files:
■ SQL INSERT Example
■ SQL*Loader Example
■ Structure of ctxload Thesaurus Import File
SQL INSERT Example
A simple way to populate a text table is to create a table with two columns, id and text, using CREATE TABLE and then use the INSERT statement to load the data. This example makes the id column the primary key, which is optional. The text column is VARCHAR2:
    create table docs (id number primary key, text varchar2(80));
To populate the text column, use the INSERT statement as follows:
    insert into docs values(1, 'this is the text of the first document');
    insert into docs values(12, 'this is the text of the second document');
SQL*Loader Example
The following example shows how to use SQL*Loader to load mixed format documents from the operating system to a BLOB column. The example has two steps:
■ create the table
■ issue the SQL*Loader command that reads the control file and loads data into the table
See Also: For a complete discussion on using SQL*Loader, see Oracle9i Database Utilities
Creating the Table
This example loads to a table articles_formatted created as follows:
    CREATE TABLE articles_formatted (
      ARTICLE_ID NUMBER PRIMARY KEY,
      AUTHOR VARCHAR2(30),
      FORMAT VARCHAR2(30),
      PUB_DATE DATE,
      TITLE VARCHAR2(256),
      TEXT BLOB
    );
The article_id column is the primary key. Documents are loaded in the text column, which is of type BLOB.
Issuing the SQL*Loader Command
The following command starts the loader, which reads the control file loader1.dat:
    sqlldr userid=demo/demo control=loader1.dat log=loader.log
Example Control File: loader1.dat
This SQL*Loader control file defines the columns to be loaded and instructs the loader to load the data line by line from loader2.dat into the articles_formatted table. Each line in loader2.dat holds a comma separated list of fields to be loaded.
    -- load file example
    load data
    INFILE 'loader2.dat'
    INTO TABLE articles_formatted
    APPEND
    FIELDS TERMINATED BY ','
    (article_id SEQUENCE (MAX,1),
     author CHAR(30),
     format,
     pub_date SYSDATE,
     title,
     ext_fname FILLER CHAR(80),
     text LOBFILE(ext_fname) TERMINATED BY EOF)
This control file instructs the loader to load data from loader2.dat to the articles_formatted table in the following way:
1. The ordinal position of the line describing the document fields in loader2.dat is written to the article_id column.
2. The first field on the line is written to the author column.
3. The second field on the line is written to the format column.
4. The current date given by SYSDATE is written to the pub_date column.
5. The title of the document, which is the third field on the line, is written to the title column.
6. The name of each document to be loaded is read into the ext_fname temporary variable, and the actual document is loaded in the text BLOB column.
Example Data File: loader2.dat
This file contains the data to be loaded into each row of the table, articles_formatted. Each line contains a comma separated list of the fields to be loaded in articles_formatted. The last field of every line names the file to be loaded into the text column:
    Ben Kanobi, plaintext,Kawasaki news article,../sample_docs/kawasaki.txt,
    Joe Bloggs, plaintext,Java plug-in,../sample_docs/javaplugin.txt,
    John Hancock, plaintext,Declaration of Independence,../sample_docs/indep.txt,
    M. S. Developer, Word7,Newsletter example,../sample_docs/newsletter.doc,
    M. S. Developer, Word7,Resume example,../sample_docs/resume.doc,
    X. L. Developer, Excel7,Common example,../sample_docs/common.xls,
    X. L. Developer, Excel7,Complex example,../sample_docs/solvsamp.xls,
    Pow R. Point, Powerpoint7,Generic presentation,../sample_docs/generic.ppt,
    Pow R. Point, Powerpoint7,Meeting presentation,../sample_docs/meeting.ppt,
    Java Man, PDF,Java Beans paper,../sample_docs/j_bean.pdf,
    Java Man, PDF,Java on the server paper,../sample_docs/j_svr.pdf,
    Ora Webmaster, HTML,Oracle home page,../sample_docs/oramnu97.html,
    Ora Webmaster, HTML,Oracle Company Overview,../sample_docs/oraoverview.html,
    John Constable, GIF,Laurence J. Ellison : portrait,../sample_docs/larry.gif,
    Alan Greenspan, GIF,Oracle revenues : Graph,../sample_docs/oragraph97.gif,
    Giorgio Armani, GIF,Oracle Revenues : Trend,../sample_docs/oratrend.gif,
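To make the field-to-column mapping of the control file concrete, the following Python sketch mimics it for lines in the loader2.dat layout. It is an illustration only and does not call SQL*Loader: the function name describe_rows and the first_id parameter are hypothetical, the split is a plain comma split (titles containing commas are not handled), and whitespace is kept exactly as it appears because SQL*Loader's own trimming rules are not modeled here.

```python
SAMPLE = """Ben Kanobi, plaintext,Kawasaki news article,../sample_docs/kawasaki.txt,
Joe Bloggs, plaintext,Java plug-in,../sample_docs/javaplugin.txt,
"""


def describe_rows(data_text, first_id=1):
    """Map each data line onto the columns named in the control file.

    first_id is a hypothetical starting value for the generated article_id;
    in the real load the sequence value is produced by SQL*Loader.
    """
    rows = []
    lines = [ln for ln in data_text.splitlines() if ln.strip()]
    for offset, line in enumerate(lines):
        fields = line.split(",")
        # Fields 1-4: author, format, title, and the external file name.
        author, doc_format, title, ext_fname = fields[:4]
        rows.append({
            "article_id": first_id + offset,  # ordinal position of the line
            "author": author,
            "format": doc_format,
            "pub_date": "SYSDATE",            # placeholder for the load date
            "title": title,
            "ext_fname": ext_fname,           # FILLER: names the LOB file only
        })
    return rows


for row in describe_rows(SAMPLE):
    print(row["article_id"], row["author"], "->", row["ext_fname"])
```

The ext_fname value is not stored in the table; it only tells the loader which file to read into the text column.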
Structure of ctxload Thesaurus Import File
The import file must use the following format for entries in the thesaurus:
    phrase
     BT broader_term
     NT narrower_term1
     NT narrower_term2
     . . .
     NT narrower_termN
     BTG broader_term
     NTG narrower_term1
     NTG narrower_term2
     . . .
     NTG narrower_termN
     BTP broader_term
     NTP narrower_term1
     NTP narrower_term2
     . . .
     NTP narrower_termN
     BTI broader_term
     NTI narrower_term1
     NTI narrower_term2
     . . .
     NTI narrower_termN
     SYN synonym1
     SYN synonym2
     . . .
     SYN synonymN
     USE synonym1 or SEE synonym1 or PT synonym1
     RT related_term1
     RT related_term2
     . . .
     RT related_termN
     SN text
     language_key: term
phrase
is a word or phrase that is defined as having synonyms, broader terms, narrower terms, and/or related terms. In compliance with ISO-2788 standards, a TT marker can be placed before a phrase to indicate that the phrase is the top term in a hierarchy; however, the TT marker is not required. In fact, ctxload ignores TT markers during import. A top term is identified as any phrase that does not have a broader term (BT, BTG, BTP, or BTI).
Note: The thesaurus query operators (SYN, PT, BT, BTG, BTP, BTI, NT, NTG, NTP, NTI, and RT) are reserved words and, thus, cannot be used as phrases in thesaurus entries.
BT, BTG, BTP, BTI broader_termN
are the markers that indicate broader_termN is a broader (generic|partitive|instance) term for phrase. broader_termN is a word or phrase that conceptually provides a more general description or category for phrase. For example, the word elephant could have a broader term of land mammal.
NT, NTG, NTP, NTI narrower_termN
are the markers that indicate narrower_termN is a narrower (generic|partitive|instance) term for phrase. If phrase does not have a broader (generic|partitive|instance) term, but has one or more narrower (generic|partitive|instance) terms, phrase is created as a top term in the respective hierarchy (in an Oracle Text thesaurus, the BT/NT, BTG/NTG, BTP/NTP, and BTI/NTI hierarchies are separate structures). narrower_termN is a word or phrase that conceptually provides a more specific description for phrase. For example, the word elephant could have narrower terms of indian elephant and african elephant.
SYN synonymN
is a marker that indicates phrase and synonymN are synonyms within a synonym ring. synonymN is a word or phrase that has the same meaning as phrase. For example, the word dog could have a synonym of canine.
Note: Synonym rings are not defined explicitly in Oracle Text thesauri. They are created by the transitive nature of synonyms.
USE SEE PT synonym1
are markers that indicate phrase and synonym1 are synonyms within a synonym ring (similar to SYN). The markers USE, SEE or PT also indicate synonym1 is the preferred term for the synonym ring. Any of these markers can be used to define the preferred term for a synonym ring.
RT related_termN
is the marker that indicates related_termN is a related term for phrase. related_termN is a word or phrase that has a meaning related to, but not necessarily synonymous with phrase. For example, the word dog could have a related term of wolf.
Note: Related terms are not transitive. If a phrase has two or more related terms, the terms are related only to the parent phrase and not to each other.
SN text
is the marker that indicates the following text is a scope note (i.e. comment) for the preceding entry.
language_key term
term is the translation of phrase into the language specified by language_key.
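Because synonym rings arise from the transitive nature of synonyms rather than from an explicit definition, it can help to see how a set of SYN pairs collapses into rings. The following Python sketch groups terms with a union-find structure. It illustrates the idea only and is not how Oracle Text stores a thesaurus; the function name synonym_rings and the sample term hound are hypothetical.

```python
def synonym_rings(pairs):
    """Group terms into rings, given (phrase, synonym) pairs.

    Two terms end up in the same ring if they are linked directly or
    through any chain of synonym pairs (transitivity).
    """
    parent = {}

    def find(term):
        # Follow parent links to the ring representative, compressing the path.
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]
            term = parent[term]
        return term

    for first, second in pairs:
        root_first, root_second = find(first), find(second)
        if root_first != root_second:
            parent[root_second] = root_first

    rings = {}
    for term in list(parent):
        rings.setdefault(find(term), set()).add(term)
    return sorted(sorted(ring) for ring in rings.values())


# dog-canine and canine-hound place all three terms in one ring.
print(synonym_rings([("dog", "canine"), ("canine", "hound"), ("cat", "feline")]))
```

Related terms (RT) would not be grouped this way, since they are not transitive.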
Alternate Hierarchy Structure
In compliance with thesauri standards, the load file supports formatting hierarchies (BT/NT, BTG/NTG, BTP/NTP, BTI/NTI) by indenting the terms under the top term and using NT (or NTG, NTP, NTI) markers that include the level for the term:
    phrase
     NT1 narrower_term1
      NT2 narrower_term1.1
      NT2 narrower_term1.2
       NT3 narrower_term1.2.1
       NT3 narrower_term1.2.2
     NT1 narrower_term2
     . . .
     NT1 narrower_termN
Using this method, the entire branch for a top term can be represented hierarchically in the load file.
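The leveled markers carry enough information to recover each term's parent: a term at level N belongs to the most recent term at level N-1. The following Python sketch shows that reading of the format. It is an illustration under that assumption, not the ctxload parser; the function name hierarchy_pairs is hypothetical.

```python
import re

# Leading space, an NT/NTG/NTP/NTI marker with a level number, then the term.
LEVEL_LINE = re.compile(r"^\s+NT[GPI]?(\d+)\s+(.+?)\s*$")


def hierarchy_pairs(lines):
    """Return (broader_term, narrower_term) pairs from leveled NT lines."""
    pairs = []
    stack = []  # stack[i] holds the most recent term at level i; level 0 is the top term
    for line in lines:
        if not line.strip():
            continue
        match = LEVEL_LINE.match(line)
        if match is None:
            # A line without a leveled marker starts a new top term.
            stack = [line.strip()]
            continue
        level, term = int(match.group(1)), match.group(2)
        if level < 1 or level > len(stack):
            raise ValueError("level %d has no parent: %r" % (level, line))
        del stack[level:]                  # drop deeper terms from earlier branches
        pairs.append((stack[level - 1], term))
        stack.append(term)
    return pairs


example = [
    "animal",
    " NT1 mammal",
    "  NT2 cat",
    "   NT3 domestic cat",
    "  NT2 dog",
]
print(hierarchy_pairs(example))
```

The sketch ignores other line types (SYN, RT, SN, and so on); a line that carries such a marker would be treated as a new top term here.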
Usage Notes for Terms in Import Files
The following conditions apply to the structure of the entries in the import file:
■ each entry (phrase, BT, NT, or SYN) must be on a single line followed by a newline character
■ entries can consist of a single word or phrases
■ the maximum length of an entry (phrase, BT, NT, or SYN) is 255 characters, not including the BT, NT, and SYN markers or the newline characters
■ entries cannot contain parentheses or plus signs
■ each line of the file that starts with a relationship (BT, NT, etc.) must begin with at least one space
■ a phrase can occur more than once in the file
■ each phrase can have one or more narrower term entries (NT, NTG, NTP), broader term entries (BT, BTG, BTP), synonym entries, and related term entries
■ each broader term, narrower term, synonym, and preferred term entry must start with the appropriate marker and the markers must be in capital letters
■ the broader terms, narrower terms, and synonyms for a phrase can be in any order
■ homographs must be followed by parenthetical disambiguators everywhere they are used. For example: cranes (birds), cranes (lifting equipment)
■ compound terms are signified by a plus sign between each factor (for example, buildings + construction)
■ compound terms are allowed only as synonyms or preferred terms for other terms, never as terms by themselves, or in hierarchical relations
■ terms can be followed by a scope note (SN), total maximum length of 2000 characters, on subsequent lines
■ multi-line scope notes are allowed, but require an SN marker on each line of the note
Example of Incorrect SN usage:
    VIEW CAMERAS
     SN Cameras with through-the lens focusing and a
     range of movements of the lens plane relative
     to the film plane
Example of Correct SN usage:
    VIEW CAMERAS
     SN Cameras with through-the lens focusing and a
     SN range of movements of the lens plane relative
     SN to the film plane
■ Multi-word terms cannot start with reserved words (for example, use is a reserved word, so use other door is not an allowed term; however, use is an allowed term)
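Several of the conditions above are mechanical and could be checked before a file is handed to ctxload. The following Python sketch covers three of them: relationship lines start with a capitalized marker after leading space, entries are at most 255 characters, and every line of a scope note carries its own SN marker. It is a hypothetical pre-check, not a ctxload feature; the function name check_import_lines and the message texts are made up for this example, and the marker set is taken from this appendix (UF appears only in Example 3).

```python
MARKERS = {
    "BT", "BTG", "BTP", "BTI", "NT", "NTG", "NTP", "NTI",
    "SYN", "USE", "SEE", "PT", "RT", "SN", "UF",
}
MAX_ENTRY_LENGTH = 255


def check_import_lines(lines):
    """Return (line_number, message) tuples for lines that break a rule."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if not text.strip():
            continue
        if text[0] not in " \t":
            # No leading space: the line is a phrase.
            if len(text.strip()) > MAX_ENTRY_LENGTH:
                problems.append((number, "phrase longer than 255 characters"))
            continue
        parts = text.split(None, 1)
        token = parts[0]
        entry = parts[1] if len(parts) > 1 else ""
        if token.endswith(":"):
            continue  # "language_key: term" translation line, not checked here
        marker = token.rstrip("0123456789")  # NT1, NT2, ... carry a level number
        if marker.upper() not in MARKERS:
            problems.append((number, "line does not start with a relationship marker"))
        elif marker != marker.upper():
            problems.append((number, "marker must be in capital letters"))
        elif marker != "SN" and len(entry) > MAX_ENTRY_LENGTH:
            problems.append((number, "entry longer than 255 characters"))
    return problems


incorrect = [
    "VIEW CAMERAS",
    " SN Cameras with through-the lens focusing and a",
    " range of movements of the lens plane relative",
    " to the film plane",
]
print(check_import_lines(incorrect))
```

Run on the incorrect scope note above, the sketch flags the two continuation lines that lack an SN marker. Rules that need context, such as homograph disambiguators or the 2000-character scope note total, are left out.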
Usage Notes for Relationships in Import Files
The following conditions apply to the relationships defined for the entries in the import file:
■ related term entries must follow a phrase or another related term entry
■ related term entries start with one or more spaces, the RT marker, followed by white space, then the related term on the same line
■ multiple related terms require multiple RT markers
Example of incorrect RT usage:
    MOVING PICTURE CAMERAS
     RT CINE CAMERAS
     TELEVISION CAMERAS
Example of correct RT usage:
    MOVING PICTURE CAMERAS
     RT CINE CAMERAS
     RT TELEVISION CAMERAS
■ Terms are allowed to have multiple broader terms, narrower terms, and related terms
Examples of Import Files
This section provides three examples of correctly formatted thesaurus import files.
Example 1 (Flat Structure)
    cat
     SYN feline
     NT domestic cat
     NT wild cat
     BT mammal
    mammal
     BT animal
    domestic cat
     NT Persian cat
     NT Siamese cat
    wild cat
     NT tiger
    tiger
     NT Bengal tiger
    dog
     BT mammal
     NT domestic dog
     NT wild dog
     SYN canine
    domestic dog
     NT German Shepard
    wild dog
     NT Dingo
Example 2 (Hierarchical)
    animal
     NT1 mammal
      NT2 cat
       NT3 domestic cat
        NT4 Persian cat
        NT4 Siamese cat
       NT3 wild cat
        NT4 tiger
         NT5 Bengal tiger
      NT2 dog
       NT3 domestic dog
        NT4 German Shepard
       NT3 wild dog
        NT4 Dingo
    cat
     SYN feline
    dog
     SYN canine
Example 3
    35MM CAMERAS
     BT MINIATURE CAMERAS
    CAMERAS
     BT OPTICAL EQUIPMENT
     NT MOVING PICTURE CAMERAS
     NT STEREO CAMERAS
    LAND CAMERAS
     USE VIEW CAMERAS
    VIEW CAMERAS
     SN Cameras with through-the lens focusing and a range of
     SN movements of the lens plane relative to the film plane
     UF LAND CAMERAS
     BT STILL CAMERAS
D Supplied Stoplists
This appendix describes the default stoplists for all the different languages supported and lists the stopwords in each. The following stoplists are described:
■ English Default Stoplist
■ Chinese Stoplist (Traditional)
■ Chinese Stoplist (Simplified)
■ Danish (dk) Default Stoplist
■ Dutch (nl) Default Stoplist
■ Finnish (sf) Default Stoplist
■ French (f) Default Stoplist
■ German (d) Default Stoplist
■ Italian (i) Default Stoplist
■ Portuguese (pt) Default Stoplist
■ Spanish (e) Default Stoplist
■ Swedish (s) Default Stoplist
English Default Stoplist
The following English words are defined as stop words:
Stop word
a
be
had
it
only
she
was
about
because
has
its
of
some
we
after
been
have
last
on
such
were
all
but
he
more
one
than
when
also
by
her
most
or
that
which
an
can
his
mr
other
the
who
any
co
if
mrs
out
their
will
and
corp
in
ms
over
there
with
are
could
inc
mz
s
they
would
as
for
into
no
so
this
up
at
from
is
not
says
to
Chinese Stoplist (Traditional)
The following traditional Chinese words are defined in the default stoplist for this language.
Chinese Stoplist (Simplified)
The following simplified Chinese words are defined in the default stoplist for this language.
Danish (dk) Default Stoplist
The following Danish words are defined in the default stoplist for this language:
Stop word
af
en
god
hvordan
med
og
udenfor
aldrig
et
han
I
meget
oppe
under
alle
endnu
her
De
mellem
på
ved
altid
få
hos
i
mere
rask
vi
bagved
lidt
hovfor
imod
mindre
hurtig
de
fjernt
hun
ja
når
sammen
der
for
hvad
jeg
hvonår
temmelig
du
foran
hvem
langsom
nede
nok
efter
fra
hvor
mange
nej
til
eller
gennem
hvorhen
måske
nu
uden
Dutch (nl) Default Stoplist
The following Dutch words are defined in the default stoplist for this language:
Stop word
aan
betreffende
eer
had
juist
na
overeind
van
weer
aangaande
bij
eerdat
hadden
jullie
naar
overigens
vandaan
weg
aangezien
binnen
eerder
hare
kan
nadat
pas
vanuit
wegens
achter
binnenin
eerlang
heb
klaar
net
precies
vanwege
wel
achterna
boven
eerst
hebben
kon
niet
reeds
veeleer
weldra
afgelopen
bovenal
elk
hebt
konden
noch
rond
verder
welk
al
bovendien
elke
heeft
krachtens
nog
rondom
vervolgens
welke
aldaar
bovengenoemd
en
hem
kunnen
nogal
sedert
vol
wie
aldus
bovenstaand
enig
hen
kunt
nu
sinds
volgens
wiens
alhoewel
bovenvermeld
enigszins
het
later
of
sindsdien
voor
wier
alias
buiten
enkel
hierbeneden
liever
ofschoon
slechts
vooraf
wij
alle
daar
er
hierboven
maar
om
sommige
vooral
wijzelf
allebei
daarheen
erdoor
hij
mag
omdat
spoedig
vooralsnog
zal
alleen
daarin
even
hoe
meer
omhoog
steeds
voorbij
ze
alsnog
daarna
eveneens
hoewel
met
omlaag
tamelijk
voordat
zelfs
altijd
daarnet
evenwel
hun
mezelf
omstreeks
tenzij
voordezen
zichzelf
altoos
daarom
gauw
hunne
mij
omtrent
terwijl
voordien
zij
ander
daarop
gedurende
ik
mijn
omver
thans
voorheen
zijn
andere
daarvanlangs
geen
ikzelf
mijnent
onder
tijdens
voorop
zijne
anders
dan
gehad
in
mijner
ondertussen
toch
vooruit
zo
anderszins
dat
gekund
inmiddels
mijzelf
ongeveer
toen
vrij
zodra
behalve
de
geleden
inzake
misschien
ons
toenmaals
vroeg
zonder
behoudens
die
gelijk
is
mocht
onszelf
toenmalig
waar
zou
beide
dikwijls
gemoeten
jezelf
mochten
onze
tot
waarom
zouden
beiden
dit
gemogen
jij
moest
ook
totdat
wanneer
zowat
ben
door
geweest
jijzelf
moesten
op
tussen
want
zulke
beneden
doorgaand
gewoon
jou
moet
opnieuw
uit
waren
zullen
bent
dus
gewoonweg
jouw
moeten
opzij
uitgezonderd
was
bepaald
echter
haar
mogen
over
vaak
jouwe
wat
zult
Finnish (sf) Default Stoplist
The following Finnish words are defined in the default stoplist for this language:
Stop word
aina
hyvin
kesken
me
nyt
takia
yhdessä
ylös
alla
hoikein
kukka
mikä
oikea
tässä
ansiosta
ilman
kyllä
miksi
oikealla
te
ei
ja
kylliksi
milloin
paljon
ulkopuolella
enemmän
jälkeen
tarpeeksi
milloinkan
siellä
vähän
ennen
jos
lähellä
koskaan
sinä
vahemmän
etessa
kanssa
läpi
minä
ssa
vasen
haikki
kaukana
liian
missä
sta
vasenmalla
hän
kenties
lla
miten
suoraan
vastan
he
ehkä
luona
kuinkan
tai
vielä
hitaasti
keskellä
lla
nopeasti
takana
vieressä
French (f) Default Stoplist
The following French words are defined in the default stoplist for this language:
Stop word
a
beaucoup
comment
encore
lequel
moyennant
près
ses
toujours
afin
ça
concernant
entre
les
ne
puis
sien
tous
ailleurs
ce
dans
et
lesquelles
ni
puisque
sienne
toute
ainsi
ceci
de
étaient
lesquels
non
quand
siennes
toutes
alors
cela
dedans
était
leur
nos
quant
siens
très
après
celle
dehors
étant
leurs
notamment
que
soi
trop
attendant
celles
déjà
etc
lors
notre
quel
soi-même
tu
au
celui
delà
eux
lorsque
notres
quelle
soit
un
aucun
cependant
depuis
furent
lui
nôtre
quelqu'un
sont
une
aucune
certain
des
grâce
ma
nôtres
quelqu'une
suis
vos
au-dessous
certaine
desquelles
hormis
mais
nous
quelque
sur
votre
au-dessus
certaines
desquels
hors
malgré
nulle
quelques-unes
ta
vôtre
auprès
certains
dessus
ici
me
nulles
quelques-uns
tandis
vôtres
auquel
ces
dès
il
même
on
quels
tant
vous
aussi
cet
donc
ils
mêmes
ou
qui
te
vu
aussitôt
cette
donné
jadis
mes
où
quiconque
telle
y
autant
ceux
dont
je
mien
par
quoi
telles
autour
chacun
du
jusqu
mienne
parce
quoique
tes
aux
chacune
duquel
jusque
miennes
parmi
sa
tienne
auxquelles
chaque
durant
la
miens
plus
sans
tiennes
auxquels
chez
elle
laquelle
moins
plusieurs
sauf
tiens
avec
combien
elles
là
moment
pour
se
toi
à
comme
en
le
mon
pourquoi
selon
ton
German (d) Default Stoplist
The following German words are defined in the default stoplist for this language:
Stop word
ab
dann
des
es
ihnen
keinem
obgleich
sondern
welchem
aber
daran
desselben
etwa
ihr
keinen
oder
sonst
welchen
allein
darauf
dessen
etwas
ihre
keiner
ohne
soviel
welcher
als
daraus
dich
euch
Ihre
keines
paar
soweit
welches
also
darin
die
euer
ihrem
man
sehr
über
wem
am
darüber
dies
eure
Ihrem
mehr
sei
um
wen
an
darum
diese
eurem
ihren
mein
sein
und
wenn
auch
darunter
dieselbe
euren
Ihren
meine
seine
uns
wer
auf
das
dieselben
eurer
Ihrer
meinem
seinem
unser
weshalb
aus
dasselbe
diesem
eures
ihrer
meinen
seinen
unsre
wessen
außer
daß
diesen
für
ihres
meiner
seiner
unsrem
wie
bald
davon
dieser
fürs
Ihres
meines
seines
unsren
wir
bei
davor
dieses
ganz
im
mich
seit
unsrer
wo
beim
dazu
dir
gar
in
mir
seitdem
unsres
womit
bin
dazwischen
doch
gegen
ist
mit
selbst
vom
zu
bis
dein
dort
genau
ja
nach
sich
von
zum
bißchen
deine
du
gewesen
je
nachdem
Sie
vor
zur
bist
deinem
ebenso
her
jedesmal
nämlich
sie
während
zwar
da
deinen
ehe
herein
jedoch
neben
sind
war
zwischen
dabei
deiner
ein
herum
jene
nein
so
wäre
zwischens
dadurch
deines
eine
hin
jenem
nicht
sogar
wären
dafür
dem
einem
hinter
jenen
nichts
solch
warum
dagegen
demselben
einen
hintern
jener
noch
solche
was
dahinter
den
einer
ich
jenes
nun
solchem
wegen
damit
denn
eines
ihm
kaum
nur
solchen
weil
danach
der
entlang
ihn
kein
ob
solcher
weit
daneben
derselben
er
Ihnen
keine
ober
solches
welche
Italian (i) Default Stoplist
The following Italian words are defined in the default stoplist for this language:
Stop word
a
da
durante
lo
o
seppure
un
affinchè
dachè
e
loro
onde
si
una
agl'
dagl'
egli
ma
oppure
siccome
uno
agli
dagli
eppure
mentre
ossia
sopra
voi
ai
dai
essere
mio
ovvero
sotto
vostro
al
dal
essi
ne
per
su
all'
dall'
finché
neanche
perchè
subito
alla
dalla
fino
negl'
perciò
sugl'
alle
dalle
fra
negli
però
sugli
allo
dallo
giacchè
nei
poichè
sui
anzichè
degl'
gl'
nel
prima
sul
avere
degli
gli
nell'
purchè
sull'
bensì
dei
grazie
nella
quand'anche
sulla
che
del
I
nelle
quando
sulle
chi
dell'
il
nello
quantunque
sullo
cioè
delle
in
nemmeno
quasi
suo
come
dello
inoltre
neppure
quindi
talchè
comunque
di
io
noi
se
tu
con
dopo
l'
nonchè
sebbene
tuo
contro
dove
la
nondimeno
sennonchè
tuttavia
cosa
dunque
le
nostro
senza
tutti
Portuguese (pt) Default Stoplist
The following Portuguese words are defined in the default stoplist for this language:
Stop word
a
bem
e
longe
para
se
você
abaixo
com
ela
mais
por
sem
vocês
adiante
como
elas
menos
porque
sempre
agora
contra
êle
muito
pouco
sim
ali
debaixo
eles
não
próximo
sob
antes
demais
em
ninguem
qual
sobre
aqui
depois
entre
nós
quando
talvez
até
depressa
eu
nunca
quanto
todas
atras
devagar
fora
onde
que
todos
bastante
direito
junto
ou
quem
vagarosamente
Spanish (e) Default Stoplist
The following Spanish words are defined in the default stoplist for this language:
Stop word
a
aquí
cuantos
esta
misma
nosotras
querer
tales
usted
acá
cada
cuán
estar
mismas
nosotros
qué
tan
ustedes
ahí
cierta
cuánto
estas
mismo
nuestra
quien
tanta
varias
ajena
ciertas
cuántos
este
mismos
nuestras
quienes
tantas
varios
ajenas
cierto
de
estos
mucha
nuestro
quienesquiera
tanto
vosotras
ajeno
ciertos
dejar
hacer
muchas
nuestros
quienquiera
tantos
vosotros
ajenos
como
del
hasta
muchísima
nunca
quién
te
vuestra
al
cómo
demasiada
jamás
muchísimas
os
ser
tener
vuestras
algo
con
demasiadas
junto
muchísimo
otra
si
ti
vuestro
alguna
conmigo
demasiado
juntos
muchísimos
otras
siempre
toda
vuestros
algunas
consigo
demasiados
la
mucho
otro
sí
todas
y
yo
alguno
contigo
demás
las
muchos
otros
sín
todo
algunos
cualquier
el
lo
muy
para
Sr
todos
algún
cualquiera
ella
los
nada
parecer
Sra
tomar
allá
cualquieras
ellas
mas
ni
poca
Sres
tuya
allí
cuan
ellos
más
ninguna
pocas
Sta
tuyo
aquel
cuanta
él
me
ningunas
poco
suya
tú
aquella
cuantas
esa
menos
ninguno
pocos
suyas
un
aquellas
cuánta
esas
mía
ningunos
por
suyo
una
aquello
cuántas
ese
mientras
no
porque
suyos
unas
aquellos
cuanto
esos
mío
nos
que
tal
unos
Swedish (s) Default Stoplist
The following Swedish words are defined in the default stoplist for this language:
ab
efter
ja
sin
aldrig
efterät
jag
skall
all
eftersom
långsamt
som
alla
ej
långt
till
alltid
eller
lite
tillräckligt
än
emot
man
tillsammans
ännu
en
med
trots att
ånyo
ett
medan
under
är
fastän
mellan
uppe
att
för
mer
ut
av
fort
mera
utan
avser
framför
mindre
utom
avses
från
mot
vad
bakom
genom
myckett
väl
bra
gott
när
var
bredvid
hamske
nära
varför
dä
han
nej
vart
där
här
nere
varthän
de
hellre
ni
vem
dem
hon
nu
vems
den
hos
och
vi
denna
hur
oksa
vid
deras
i
om
vilken
dess
in
över
det
ingen
på
detta
innan
så
du
inte
sådan
E Alternate Spelling Conventions
This appendix describes the alternate spelling conventions that Oracle Text uses in the German, Danish, and Swedish languages. It also describes how to enable alternate spelling. The following topics are covered:
■ Overview
■ German Alternate Spelling
■ Danish Alternate Spelling
■ Swedish Alternate Spelling
Overview
This appendix lists the alternate spelling conventions Oracle Text uses for German, Danish, and Swedish. These languages contain words that have more than one accepted spelling. When a language has more than one way of spelling a word, Oracle indexes the word in its basic form. For example, in German the basic form of the ä character is ae, so words containing the ä character are indexed with ae as the substitution. Oracle also converts query terms to their basic forms before lookup. As a result, users can query words with either spelling.
Enabling Alternate Spelling
You enable alternate spelling by specifying GERMAN, DANISH, or SWEDISH for the ALTERNATE_SPELLING attribute of the BASIC_LEXER. For example, to enable alternate spelling in German, you can issue the following statements:

begin
  ctx_ddl.create_preference('GERMAN_LEX', 'BASIC_LEXER');
  ctx_ddl.set_attribute('GERMAN_LEX', 'ALTERNATE_SPELLING', 'GERMAN');
end;
Disabling Alternate Spelling
To disable alternate spelling, use the CTX_DDL.UNSET_ATTRIBUTE procedure as follows:

begin
  ctx_ddl.unset_attribute('GERMAN_LEX', 'ALTERNATE_SPELLING');
end;
German Alternate Spelling
The German alphabet is the English alphabet plus the additional characters: ä ö ü ß. The following table lists the alternate spelling conventions Oracle Text uses for these characters.

Character  Alternate Spelling Substitution
ä          ae
ü          ue
ö          oe
Ä          AE
Ü          UE
Ö          OE
ß          ss
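The substitution table above amounts to a character-by-character mapping. The following Python sketch is illustrative only: the names GERMAN_SUBSTITUTIONS and to_base_form are hypothetical, and this is not Oracle's lexer code. It applies the German substitutions listed in the table to show why either spelling of a word ends up as the same base form.

```python
# Illustrative only: applies the German substitutions from the table above.
# GERMAN_SUBSTITUTIONS and to_base_form are hypothetical names, not Oracle Text APIs.
GERMAN_SUBSTITUTIONS = {
    "ä": "ae", "ü": "ue", "ö": "oe",
    "Ä": "AE", "Ü": "UE", "Ö": "OE",
    "ß": "ss",
}


def to_base_form(word: str) -> str:
    """Replace each special character with its alternate spelling substitution."""
    return "".join(GERMAN_SUBSTITUTIONS.get(ch, ch) for ch in word)


if __name__ == "__main__":
    # Both spellings of each word reduce to the same base form.
    for word in ("schön", "schoen", "Straße", "Strasse"):
        print(word, "->", to_base_form(word))
```

Because both the indexed text and the query term would pass through the same mapping, a query written with ae and one written with ä compare equal after conversion.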
Danish Alternate Spelling
The Danish alphabet is the Latin alphabet without the w, plus the special characters: ø æ å. The following table lists the alternate spelling conventions Oracle Text uses for these characters.

Character  Alternate Spelling Substitution
æ          ae
ø          oe
å          aa
Æ          AE
Ø          OE
Å          AA
Swedish Alternate Spelling
The Swedish alphabet is the English alphabet without the w, plus the additional characters: å ä ö. The following table lists the alternate spelling conventions Oracle Text uses for these characters.

Character  Alternate Spelling Substitution
ä          ae
å          aa
ö          oe
Ä          AE
Å          AA
Ö          OE
F Scoring Algorithm
This appendix describes the scoring algorithm for word queries. You obtain the score using the SCORE operator.

Note: This appendix discusses how Oracle calculates score for word queries, which is different from the way it calculates score for ABOUT queries in English.
Scoring Algorithm for Word Queries
To calculate a relevance score for a returned document in a word query, Oracle uses an inverse frequency algorithm based on Salton's formula. Inverse frequency scoring assumes that frequently occurring terms in a document set are noise terms, and so these terms are scored lower. For a document to score high, the query term must occur frequently in the document but infrequently in the document set as a whole.

The following table illustrates Oracle's inverse frequency scoring. The first column shows the number of documents in the document set, and the second column shows the number of occurrences of the term in the document necessary to score 100. This table assumes that only one document in the set contains the query term.

Number of Documents in Document Set  Occurrences of Term in Document Needed to Score 100
1                                    34
5                                    20
10                                   17
50                                   13
100                                  12
500                                  10
1,000                                9
10,000                               7
100,000                              5
1,000,000                            4
The table illustrates that if only one document contained the query term and there were five documents in the set, the term would have to occur 20 times in the document to score 100. If there were 1,000,000 documents in the set, the term would have to occur only 4 times in the document to score 100.
Example
You have 5000 documents dealing with chemistry in which the term chemical occurs at least once in every document. The term chemical thus occurs frequently in the document set. You have a document that contains 5 occurrences of chemical and 5 occurrences of the term hydrogen. No other document contains the term hydrogen. The term hydrogen thus occurs infrequently in the document set.

Because chemical occurs so frequently in the document set, its score for the document is lower than that of hydrogen, which is infrequent in the document set as a whole. The score for hydrogen is therefore higher than that of chemical. This is so even though both terms occur 5 times in the document.

Note: Even if the relatively infrequent term hydrogen occurred 4 times in the document, and chemical occurred 5 times in the document, the score for hydrogen might still be higher, because chemical occurs so frequently in the document set (at least 5000 times).

Inverse frequency scoring also means that adding documents that contain hydrogen lowers the score for that term in the document, and adding more documents that do not contain hydrogen raises the score.
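The effect described in this example can be reproduced with a generic inverse-frequency weight. The following Python sketch rests on an assumption made purely for illustration: it uses a textbook weighting, occurrences × (1 + log10(N / n)), where N is the number of documents in the set and n is the number of documents containing the term. It is not presented as Oracle's actual formula, the function name is hypothetical, and the values it produces are not Oracle scores. Only the relative ordering of the terms is of interest.

```python
import math


def inverse_frequency_weight(occurrences: int, docs_in_set: int, docs_with_term: int) -> float:
    """Textbook inverse-frequency weight (an assumption, not Oracle's formula)."""
    return occurrences * (1 + math.log10(docs_in_set / docs_with_term))


DOCS_IN_SET = 5000

# chemical appears in every document; hydrogen appears in only one.
chemical = inverse_frequency_weight(5, DOCS_IN_SET, docs_with_term=5000)
hydrogen = inverse_frequency_weight(5, DOCS_IN_SET, docs_with_term=1)
# Even with one occurrence fewer, the rare term still outweighs the common one.
hydrogen_four = inverse_frequency_weight(4, DOCS_IN_SET, docs_with_term=1)

if __name__ == "__main__":
    print(f"chemical, 5 occurrences: {chemical:.2f}")
    print(f"hydrogen, 5 occurrences: {hydrogen:.2f}")
    print(f"hydrogen, 4 occurrences: {hydrogen_four:.2f}")
```

Under the same assumed weighting, raising docs_with_term for hydrogen lowers its weight, and raising docs_in_set while docs_with_term stays at 1 raises it, which matches the behavior described above.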
DML and Scoring
Because the scoring algorithm is based on the number of documents in the document set, inserting, updating, or deleting documents in the document set is likely to change the score for any given term. If DML is heavy, you or your Oracle administrator must optimize the index. Perfect relevance ranking is obtained by executing a query right after optimizing the index. If DML is light, Oracle still gives fairly accurate relevance ranking. In either case, you or your Oracle administrator must synchronize the index with CTX_DDL.SYNC_INDEX.
G Views
This appendix lists all of the views provided by Oracle Text. The system provides the following views:
■ CTX_CLASSES
■ CTX_INDEXES
■ CTX_INDEX_ERRORS
■ CTX_INDEX_OBJECTS
■ CTX_INDEX_PARTITIONS
■ CTX_INDEX_SETS
■ CTX_INDEX_SET_INDEXES
■ CTX_INDEX_SUB_LEXERS
■ CTX_INDEX_SUB_LEXER_VALUES
■ CTX_INDEX_VALUES
■ CTX_OBJECTS
■ CTX_OBJECT_ATTRIBUTES
■ CTX_OBJECT_ATTRIBUTE_LOV
■ CTX_PARAMETERS
■ CTX_PENDING
■ CTX_PREFERENCES
■ CTX_PREFERENCE_VALUES
■ CTX_SECTIONS
■ CTX_SECTION_GROUPS
■ CTX_SERVERS
■ CTX_SQES
■ CTX_STOPLISTS
■ CTX_STOPWORDS
■ CTX_SUB_LEXERS
■ CTX_THESAURI
■ CTX_THES_PHRASES
■ CTX_USER_INDEXES
■ CTX_USER_INDEX_ERRORS
■ CTX_USER_INDEX_OBJECTS
■ CTX_USER_INDEX_PARTITIONS
■ CTX_USER_INDEX_SETS
■ CTX_USER_INDEX_SET_INDEXES
■ CTX_USER_INDEX_SUB_LEXERS
■ CTX_USER_INDEX_SUB_LEXER_VALS
■ CTX_USER_INDEX_VALUES
■ CTX_USER_PENDING
■ CTX_USER_PREFERENCES
■ CTX_USER_PREFERENCE_VALUES
■ CTX_USER_SECTIONS
■ CTX_USER_SECTION_GROUPS
■ CTX_USER_SQES
■ CTX_USER_STOPLISTS
■ CTX_USER_STOPWORDS
■ CTX_USER_SUB_LEXERS
■ CTX_USER_THESAURI
■ CTX_USER_THES_PHRASES
■ CTX_VERSION
CTX_CLASSES
This view displays all the preference categories registered in the Text data dictionary. It can be queried by any user.

Column Name      Type          Description
CLA_NAME         VARCHAR2(30)  Class name
CLA_DESCRIPTION  VARCHAR2(80)  Class description
CTX_INDEXES
This view displays all indexes that are registered in the Text data dictionary for the current user. It can be queried by CTXSYS.

Column Name          Type           Description
IDX_ID               NUMBER         Internal index id.
IDX_OWNER            VARCHAR2(30)   Owner of index.
IDX_NAME             VARCHAR2(30)   Name of index.
IDX_TABLE_OWNER      VARCHAR2(30)   Owner of table.
IDX_TABLE            VARCHAR2(30)   Table name.
IDX_KEY_NAME         VARCHAR2(256)  Primary key column(s).
IDX_TEXT_NAME        VARCHAR2(30)   Text column name.
IDX_DOCID_COUNT      NUMBER         Number of documents indexed.
IDX_STATUS           VARCHAR2(12)   Status.
IDX_LANGUAGE_COLUMN  VARCHAR2(256)  Name of the language column in base table.
IDX_FORMAT_COLUMN    VARCHAR2(256)  Name of the format column in base table.
IDX_CHARSET_COLUMN   VARCHAR2(256)  Name of the charset column in base table.
CTX_INDEX_ERRORS
This view displays the DML errors and is queryable by CTXSYS.

Column Name      Type            Description
ERR_INDEX_OWNER  VARCHAR2(30)    Index owner.
ERR_INDEX_NAME   VARCHAR2(30)    Name of index.
ERR_TIMESTAMP    DATE            Time of error.
ERR_TEXTKEY      VARCHAR2(18)    ROWID of errored document or name of errored operation (for example, ALTER INDEX).
ERR_TEXT         VARCHAR2(4000)  Error text.
CTX_INDEX_OBJECTS
This view displays the objects that are used for each class in the index. It can be queried by CTXSYS.

Column Name      Type          Description
IXO_INDEX_OWNER  VARCHAR2(30)  Index owner.
IXO_INDEX_NAME   VARCHAR2(30)  Index name.
IXO_CLASS        VARCHAR2(30)  Class name.
IXO_OBJECT       VARCHAR2(30)  Object name.
CTX_INDEX_PARTITIONS
This view displays all index partitions. It can be queried by CTXSYS.

Column Name               Type          Description
IXP_ID                    NUMBER(38)    Index partition id.
IXP_INDEX_OWNER           VARCHAR2(30)  Index owner.
IXP_INDEX_NAME            VARCHAR2(30)  Index name.
IXP_INDEX_PARTITION_NAME  VARCHAR2(30)  Index partition name.
IXP_TABLE_OWNER           VARCHAR2(30)  Table owner.
IXP_TABLE_NAME            VARCHAR2(30)  Table name.
IXP_TABLE_PARTITION_NAME  VARCHAR2(30)  Table partition name.
IXP_DOCID_COUNT           NUMBER(38)    Number of documents associated with the partition.
IXP_STATUS                VARCHAR2(12)  Partition status.
CTX_INDEX_SETS
This view displays all index set names. It can be queried by any user.

Column Name  Type          Description
IXS_OWNER    VARCHAR2(30)  Index set owner.
IXS_NAME     VARCHAR2(30)  Index set name.
CTX_INDEX_SET_INDEXES
This view displays all the sub-indexes in an index set. It can be queried by any user.

Column Name          Type           Description
IXX_INDEX_SET_OWNER  VARCHAR2(30)   Index set owner.
IXX_INDEX_SET_NAME   VARCHAR2(30)   Index set name.
IXX_COLLIST          VARCHAR2(500)  Column list of the sub-index.
IXX_STORAGE          VARCHAR2(500)  Storage clause of the sub-index.
CTX_INDEX_SUB_LEXERS
This view shows the sub-lexers for each language for each index. It can be queried by CTXSYS.

Column Name      Type          Description
ISL_INDEX_OWNER  VARCHAR2(30)  Index owner.
ISL_INDEX_NAME   VARCHAR2(30)  Index name.
ISL_LANGUAGE     VARCHAR2(30)  Language of sub-lexer.
ISL_ALT_VALUE    VARCHAR2(30)  Alternate value of language.
ISL_OBJECT       VARCHAR2(30)  Name of lexer object used for this language.
CTX_INDEX_SUB_LEXER_VALUES
This view shows the sub-lexer attributes and their values. It can be queried by CTXSYS.

Column Name      Type           Description
ISV_INDEX_OWNER  VARCHAR2(30)   Index owner.
ISV_INDEX_NAME   VARCHAR2(30)   Index name.
ISV_LANGUAGE     VARCHAR2(30)   Language of sub-lexer.
ISV_OBJECT       VARCHAR2(30)   Name of lexer object used for this language.
ISV_ATTRIBUTE    VARCHAR2(30)   Name of sub-lexer attribute.
ISV_VALUE        VARCHAR2(500)  Value of attribute of sub-lexer.
CTX_INDEX_VALUES
This view displays attribute values for each object used in indexes. This view is queryable by CTXSYS.

Column Name      Type           Description
IXV_INDEX_OWNER  VARCHAR2(30)   Index owner.
IXV_INDEX_NAME   VARCHAR2(30)   Index name.
IXV_CLASS        VARCHAR2(30)   Class name.
IXV_OBJECT       VARCHAR2(30)   Object name.
IXV_ATTRIBUTE    VARCHAR2(30)   Attribute name.
IXV_VALUE        VARCHAR2(500)  Attribute value.
CTX_OBJECTS
This view displays all of the Text objects registered in the Text data dictionary. This view can be queried by any user.

Column Name      Type          Description
OBJ_CLASS        VARCHAR2(30)  Object class (Datastore, Filter, Lexer, etc.)
OBJ_NAME         VARCHAR2(30)  Object name
OBJ_DESCRIPTION  VARCHAR2(80)  Object description
CTX_OBJECT_ATTRIBUTES
This view displays the attributes that can be assigned to preferences of each object. It can be queried by all users.

Column Name      Type           Description
OAT_CLASS        VARCHAR2(30)   Object class (Data Store, Filter, Lexer, etc.)
OAT_OBJECT       VARCHAR2(30)   Object name
OAT_ATTRIBUTE    VARCHAR2(64)   Attribute name
OAT_DESCRIPTION  VARCHAR2(80)   Description of attribute
OAT_REQUIRED     VARCHAR2(1)    Required attribute, either Y or N.
OAT_STATIC       VARCHAR2(1)    Not currently used.
OAT_DATATYPE     VARCHAR2(64)   Attribute datatype
OAT_DEFAULT      VARCHAR2(500)  Default value for attribute
OAT_MIN          NUMBER         Minimum value.
OAT_MAX          NUMBER         Maximum value.
OAT_MAX_LENGTH   NUMBER         Maximum length.
CTX_OBJECT_ATTRIBUTE_LOV
This view displays the allowed values for certain object attributes provided by Oracle Text. It can be queried by all users.

Column Name      Type          Description
OAL_CLASS        NUMBER(38)    Class of object.
OAL_OBJECT       VARCHAR2(30)  Object name.
OAL_ATTRIBUTE    VARCHAR2(32)  Attribute name.
OAL_LABEL        VARCHAR2(30)  Attribute value label.
OAL_VALUE        VARCHAR2(64)  Attribute value.
OAL_DESCRIPTION  VARCHAR2(80)  Attribute value description.
CTX_PARAMETERS
This view displays all system-defined parameters as defined by CTXSYS. It can be queried by any user.

Column Name  Type           Description
PAR_NAME     VARCHAR2(30)   Parameter name (see the list that follows this table).
PAR_VALUE    VARCHAR2(500)  Parameter value. For max_index_memory and default_index_memory, PAR_VALUE stores a string consisting of the memory amount. For the other parameter names, PAR_VALUE stores the names of the preferences used as defaults for index creation.

PAR_NAME is one of the following parameter names:
■ max_index_memory
■ ctx_doc_key_type
■ default_index_memory
■ default_datastore
■ default_filter_binary
■ default_filter_text
■ default_filter_file
■ default_section_html
■ default_section_xml
■ default_section_text
■ default_lexer
■ default_stoplist
■ default_storage
■ default_wordlist
■ default_ctxcat_lexer
■ default_ctxcat_index_set
■ default_ctxcat_stoplist
■ default_ctxcat_storage
■ default_ctxcat_wordlist
■ default_ctxrule_lexer
■ default_ctxrule_stoplist
■ default_ctxrule_storage
■ default_ctxrule_wordlist
■ log_directory
■ file_access_role
CTX_PENDING
This view displays a row for each of the user's entries in the DML Queue. It can be queried by CTXSYS.

Column Name         Type          Description
PND_INDEX_OWNER     VARCHAR2(30)  Index owner.
PND_INDEX_NAME      VARCHAR2(30)  Name of index.
PND_PARTITION_NAME  VARCHAR2(30)  Name of partition for local partition indexes. NULL for normal indexes.
PND_ROWID           ROWID         ROWID to be indexed.
PND_TIMESTAMP       DATE          Time of modification.
CTX_PREFERENCES
This view displays preferences created by Oracle Text users, as well as all the system-defined preferences included with Oracle Text. The view contains one row for each preference. It can be queried by all users.

Column Name  Type          Description
PRE_OWNER    VARCHAR2(30)  Username of preference owner.
PRE_NAME     VARCHAR2(30)  Preference name.
PRE_CLASS    VARCHAR2(30)  Preference class.
PRE_OBJECT   VARCHAR2(30)  Object used.
CTX_PREFERENCE_VALUES
This view displays the values assigned to all the preferences in the Text data dictionary. The view contains one row for each value. It can be queried by all users.

Column Name     Type           Description
PRV_OWNER       VARCHAR2(30)   Username of preference owner.
PRV_PREFERENCE  VARCHAR2(30)   Preference name.
PRV_ATTRIBUTE   VARCHAR2(64)   Attribute name
PRV_VALUE       VARCHAR2(500)  Attribute value
CTX_SECTIONS
This view displays information about all the sections that have been created in the Text data dictionary. It can be queried by any user.

Column Name        Type          Description
SEC_OWNER          VARCHAR2(30)  Owner of the section group.
SEC_SECTION_GROUP  VARCHAR2(30)  Name of the section group.
SEC_TYPE           VARCHAR2(30)  Type of section, either ZONE, FIELD, SPECIAL, ATTR, or STOP.
SEC_ID             NUMBER        Section id.
SEC_NAME           VARCHAR2(30)  Name of section.
SEC_TAG            VARCHAR2(64)  Section tag
SEC_VISIBLE        VARCHAR2(1)   Y or N visible indicator for field sections only.
CTX_SECTION_GROUPS
This view displays information about all the section groups that have been created in the Text data dictionary. It can be queried by any user.

Column Name  Type          Description
SGP_OWNER    VARCHAR2(30)  Owner of section group.
SGP_NAME     VARCHAR2(30)  Name of section group.
SGP_TYPE     VARCHAR2(30)  Type of section group
CTX_SERVERS
This view displays ctxsrv servers that are currently active. It can be queried only by CTXSYS.

Column Name      Type          Description
SER_NAME         VARCHAR2(60)  Server identifier
SER_STATUS       VARCHAR2(8)   Server status (IDLE, RUN, EXIT)
SER_ADMMBX       VARCHAR2(60)  Admin pipe mailbox name for server.
SER_OOBMBX       VARCHAR2(60)  Out-of-band mailbox name for server.
SER_SESSION      NUMBER        No Longer Used
SER_AUDSID       NUMBER        Server audit session ID
SER_DBID         NUMBER        Server database ID
SER_PROCID       VARCHAR2(10)  No Longer Used
SER_PERSON_MASK  VARCHAR2(30)  Personality mask for server
SER_STARTED_AT   DATE          Date on which server was started
SER_IDLE_TIME    NUMBER        Idle time, in seconds, for server
SER_DB_INSTANCE  VARCHAR2(10)  Database instance ID
SER_MACHINE      VARCHAR2(64)  Name of host machine on which server is running
CTX_SQES
This view displays the definitions for all SQEs that have been created by users. It can be queried by all users.

Column Name  Type            Description
SQE_OWNER    VARCHAR2(30)    Owner of SQE.
SQE_NAME     VARCHAR2(30)    Name of SQE.
SQE_QUERY    VARCHAR2(2000)  Query Text
CTX_STOPLISTS
This view displays stoplists. It can be queried by all users.

Column Name  Type          Description
SPL_OWNER    VARCHAR2(30)  Owner of stoplist.
SPL_NAME     VARCHAR2(30)  Name of stoplist.
SPL_COUNT    NUMBER        Number of stopwords
SPL_TYPE     VARCHAR2(30)  Type of stoplist, MULTI or BASIC.
CTX_STOPWORDS
This view displays the stopwords in each stoplist. It can be queried by all users.

Column Name   Type          Description
SPW_OWNER     VARCHAR2(30)  Stoplist owner.
SPW_STOPLIST  VARCHAR2(30)  Stoplist name.
SPW_TYPE      VARCHAR2(10)  Stop type, either STOP_WORD, STOP_CLASS, or STOP_THEME.
SPW_WORD      VARCHAR2(80)  Stopword.
SPW_LANGUAGE  VARCHAR2(30)  Stopword language.
CTX_SUB_LEXERS
This view contains information on multi-lexers and the sub-lexer preferences they contain. It can be queried by any user.

Column Name    Type          Description
SLX_OWNER      VARCHAR2(30)  Owner of the multi-lexer preference.
SLX_NAME       VARCHAR2(30)  Name of the multi-lexer preference.
SLX_LANGUAGE   VARCHAR2(30)  Language of the referenced lexer (full name, not abbreviation).
SLX_ALT_VALUE  VARCHAR2(30)  An alternate value for the language.
SLX_SUB_OWNER  VARCHAR2(30)  Owner of the sub-lexer.
SLX_SUB_NAME   VARCHAR2(30)  Name of the sub-lexer.
CTX_THESAURI
This view displays information about all the thesauri that have been created in the Text data dictionary. It can be queried by any user.

Column Name  Type          Description
THS_OWNER    VARCHAR2(30)  Thesaurus owner
THS_NAME     VARCHAR2(30)  Thesaurus name
CTX_THES_PHRASES
This view displays phrase information for all thesauri in the Text data dictionary. It can be queried by any user.

Column Name     Type            Description
THP_THESAURUS   VARCHAR2(30)    Thesaurus name.
THP_PHRASE      VARCHAR2(256)   Thesaurus phrase.
THP_QUALIFIER   VARCHAR2(256)   Thesaurus qualifier.
THP_SCOPE_NOTE  VARCHAR2(2000)  Thesaurus scope notes.
CTX_USER_INDEXES
This view displays all indexes that are registered in the Text data dictionary for the current user. It can be queried by all users.

Column Name          Type           Description
IDX_ID               NUMBER         Internal index id.
IDX_NAME             VARCHAR2(30)   Name of index.
IDX_TYPE             VARCHAR2(30)   Type of index: CONTEXT, CTXCAT, or CTXRULE
IDX_TABLE_OWNER      VARCHAR2(30)   Owner of table.
IDX_TABLE            VARCHAR2(30)   Table name.
IDX_KEY_NAME         VARCHAR2(256)  Primary key column(s).
IDX_TEXT_NAME        VARCHAR2(30)   Text column name.
IDX_DOCID_COUNT      NUMBER         Number of documents indexed.
IDX_STATUS           VARCHAR2(12)   Status, either INDEXED or INDEXING.
IDX_LANGUAGE_COLUMN  VARCHAR2(256)  Name of the language column of base table.
IDX_FORMAT_COLUMN    VARCHAR2(256)  Name of the format column of base table.
IDX_CHARSET_COLUMN   VARCHAR2(256)  Name of the charset column of base table.
CTX_USER_INDEX_ERRORS
This view displays the indexing errors for the current user and is queryable by all users.

Column Name     Type            Description
ERR_INDEX_NAME  VARCHAR2(30)    Name of index.
ERR_TIMESTAMP   DATE            Time of error.
ERR_TEXTKEY     VARCHAR2(18)    ROWID of errored document or name of errored operation (for example, ALTER INDEX).
ERR_TEXT        VARCHAR2(4000)  Error text.
CTX_USER_INDEX_OBJECTS
This view displays the preferences that are attached to the indexes defined for the current user. It can be queried by all users.

Column Name     Type          Description
IXO_INDEX_NAME  VARCHAR2(30)  Name of index.
IXO_CLASS       VARCHAR2(30)  Object name
IXO_OBJECT      VARCHAR2(80)  Object description
CTX_USER_INDEX_PARTITIONS
This view displays all index partitions for the current user. It is queryable by all users.

Column Name               Type          Description
IXP_ID                    NUMBER(38)    Index partition id.
IXP_INDEX_NAME            VARCHAR2(30)  Index name.
IXP_INDEX_PARTITION_NAME  VARCHAR2(30)  Index partition name.
IXP_TABLE_OWNER           VARCHAR2(30)  Table owner.
IXP_TABLE_NAME            VARCHAR2(30)  Table name.
IXP_TABLE_PARTITION_NAME  VARCHAR2(30)  Table partition name.
IXP_DOCID_COUNT           NUMBER(38)    Number of documents associated with the index partition.
IXP_STATUS                VARCHAR2(12)  Partition status.
CTX_USER_INDEX_SETS
This view displays all index set names that belong to the current user. It is queryable by all users.

Column Name  Type          Description
IXS_NAME     VARCHAR2(30)  Index set name.
CTX_USER_INDEX_SET_INDEXES
This view displays all the indexes in an index set that belong to the current user. It is queryable by all users.

Column Name         Type           Description
IXX_INDEX_SET_NAME  VARCHAR2(30)   Index set name.
IXX_COLLIST         VARCHAR2(500)  Column list of the index.
IXX_STORAGE         VARCHAR2(500)  Storage clause of the index.
CTX_USER_INDEX_SUB_LEXERS
This view shows the sub-lexers for each language for each index for the querying user. This view can be queried by all users.

Column Name     Type          Description
ISL_INDEX_NAME  VARCHAR2(30)  Index name.
ISL_LANGUAGE    VARCHAR2(30)  Language of sub-lexer.
ISL_ALT_VALUE   VARCHAR2(30)  Alternate value of language.
ISL_OBJECT      VARCHAR2(30)  Name of lexer object used for this language.
CTX_USER_INDEX_SUB_LEXER_VALS
This view shows the sub-lexer attributes and their values for the querying user. This view can be queried by all users.

Column Name     Type           Description
ISV_INDEX_NAME  VARCHAR2(30)   Index name.
ISV_LANGUAGE    VARCHAR2(30)   Language of sub-lexer.
ISV_OBJECT      VARCHAR2(30)   Name of lexer object used for this language.
ISV_ATTRIBUTE   VARCHAR2(30)   Name of sub-lexer attribute.
ISV_VALUE       VARCHAR2(500)  Value of sub-lexer attribute.
CTX_USER_INDEX_VALUES
This view displays attribute values for each object used in indexes for the current user. This view is queryable by all users.

Column Name     Type           Description
IXV_INDEX_NAME  VARCHAR2(30)   Index name.
IXV_CLASS       VARCHAR2(30)   Class name.
IXV_OBJECT      VARCHAR2(30)   Object name.
IXV_ATTRIBUTE   VARCHAR2(30)   Attribute name.
IXV_VALUE       VARCHAR2(500)  Attribute value.
CTX_USER_PENDING
This view displays a row for each of the user's entries in the DML Queue. It can be queried by all users.

Column Name         Type          Description
PND_INDEX_NAME      VARCHAR2(30)  Name of index.
PND_PARTITION_NAME  VARCHAR2(30)  Name of partition for local partition indexes. NULL for normal indexes.
PND_ROWID           ROWID         Rowid to be indexed.
PND_TIMESTAMP       DATE          Time of modification.
CTX_USER_PREFERENCES
This view displays all preferences defined by the current user. It can be queried by all users.

Column Name  Type          Description
PRE_NAME     VARCHAR2(30)  Preference name.
PRE_CLASS    VARCHAR2(30)  Preference class.
PRE_OBJECT   VARCHAR2(30)  Object used.
CTX_USER_PREFERENCE_VALUES
This view displays all the values for preferences defined by the current user. It can be queried by all users.

Column Name     Type           Description
PRV_PREFERENCE  VARCHAR2(30)   Preference name.
PRV_ATTRIBUTE   VARCHAR2(64)   Attribute name
PRV_VALUE       VARCHAR2(500)  Attribute value
CTX_USER_SECTIONS
This view displays information about the sections that have been created in the Text data dictionary for the current user. It can be queried by all users.

Column Name        Type          Description
SEC_SECTION_GROUP  VARCHAR2(30)  Name of the section group.
SEC_TYPE           VARCHAR2(30)  Type of section, either ZONE, FIELD, SPECIAL, STOP, or ATTR.
SEC_ID             NUMBER        Section id.
SEC_NAME           VARCHAR2(30)  Name of section.
SEC_TAG            VARCHAR2(64)  Section tag
SEC_VISIBLE        VARCHAR2(1)   Y or N visible indicator for field sections.
CTX_USER_SECTION_GROUPS
This view displays information about the section groups that have been created in the Text data dictionary for the current user. It can be queried by all users.

Column Name  Type          Description
SGP_NAME     VARCHAR2(30)  Name of section group.
SGP_TYPE     VARCHAR2(30)  Type of section group
CTX_USER_SQES
This view displays the definitions for all system and session SQEs that have been created by the current user. It can be viewed by all users.

Column Name  Type            Description
SQE_OWNER    VARCHAR2(30)    Owner of SQE.
SQE_NAME     VARCHAR2(30)    Name of SQE.
SQE_QUERY    VARCHAR2(2000)  Query Text
CTX_USER_STOPLISTS
This view displays stoplists for the current user. It is queryable by all users.

Column Name  Type          Description
SPL_NAME     VARCHAR2(30)  Name of stoplist.
SPL_COUNT    NUMBER        Number of stopwords
SPL_TYPE     VARCHAR2(30)  Type of stoplist, MULTI or BASIC.
CTX_USER_STOPWORDS
This view displays stopwords in each stoplist for the current user. It can be queried by all users.

Column Name   Type          Description
SPW_STOPLIST  VARCHAR2(30)  Stoplist name.
SPW_TYPE      VARCHAR2(10)  Stop type, either STOP_WORD, STOP_CLASS, or STOP_THEME.
SPW_WORD      VARCHAR2(80)  Stopword.
SPW_LANGUAGE  VARCHAR2(30)  Stopword language.
CTX_USER_SUB_LEXERS
For the current user, this view contains information on multi-lexers and the sub-lexer preferences they contain. It can be queried by any user.

Column Name    Type          Description
SLX_NAME       VARCHAR2(30)  Name of the multi-lexer preference.
SLX_LANGUAGE   VARCHAR2(30)  Language of the referenced lexer (full name, not abbreviation).
SLX_ALT_VALUE  VARCHAR2(30)  An alternate value for the language.
SLX_SUB_OWNER  VARCHAR2(30)  Owner of the sub-lexer.
SLX_SUB_NAME   VARCHAR2(30)  Name of the sub-lexer.
CTX_USER_THESAURI
This view displays the information about all of the thesauri that have been created in the system by the current user. It can be viewed by all users.

Column Name  Type          Description
THS_NAME     VARCHAR2(30)  Thesaurus name
CTX_USER_THES_PHRASES
This view displays the phrase information of all thesauri owned by the current user. It can be queried by all users.

Column Name     Type            Description
THP_THESAURUS   VARCHAR2(30)    Thesaurus name.
THP_PHRASE      VARCHAR2(256)   Thesaurus phrase.
THP_QUALIFIER   VARCHAR2(256)   Phrase qualifier.
THP_SCOPE_NOTE  VARCHAR2(2000)  Scope note of the phrase.
CTX_VERSION
This view displays the CTXSYS data dictionary and code version number information.

Column Name  Type         Description
VER_DICT     CHAR(9)      The CTXSYS data dictionary version number.
VER_CODE     VARCHAR2(9)  The version number of the code linked in to the Oracle shadow process. This column calls out to a trusted callout to fetch the version number for linked-in code. Thus, you can use this column to detect and verify patch releases.
H Stopword Transformations
This appendix describes stopword transformations. The following topic is covered:
■ Understanding Stopword Transformations
Understanding Stopword Transformations
When you use a stopword or stopword-only phrase as an operand for a query operator, Oracle rewrites the expression to eliminate the stopword or stopword-only phrase and then executes the query. The following section describes the stopword rewrites or transformations for each operator. In all tables, the Stopword Expression column describes the query expression or component of a query expression, while the right-hand column describes the way Oracle rewrites the query.

The token stopword stands for a single stopword or a stopword-only phrase. The token non_stopword stands for either a single non-stopword, a phrase of all non-stopwords, or a phrase of non-stopwords and stopwords. The token no_lex stands for a single character or a string of characters that is neither a stopword nor a word that is indexed. For example, the + character by itself is an example of a no_lex token.

When the Stopword Expression column completely describes the query expression, a rewritten expression of no_token means that no hits are returned when you enter such a query. When the Stopword Expression column describes a component of a query expression with more than one operator, a rewritten expression of no_token means that a no_token value is passed to the next step of the rewrite.

Transformations that contain a no_token as an operand in the Stopword Expression column describe intermediate transformations in which the no_token is a result of a previous transformation. These intermediate transformations apply when the original query expression has at least one stopword and more than one operator. For example, consider the following compound query expression:

'(this NOT dog) AND cat'

Assuming that this is the only stopword in this expression, Oracle applies the following transformations in the following order:

stopword NOT non_stopword => no_token
no_token AND non_stopword => non_stopword

The resulting expression is:

'cat'
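The two-step rewrite in this example can be traced with a small recursive function. The following Python sketch is a hedged illustration, not Oracle's query rewriter: the names rewrite, NO_TOKEN, SYMMETRIC, and LEFT_ONLY are hypothetical, expressions are modeled as nested (left, operator, right) tuples rather than parsed query strings, and each term is a single word, so stopword-only phrases and no_lex tokens are not handled. It covers only the AND, OR, ACCUM, EQUIV, MINUS, and NOT rules tabulated in this appendix.

```python
# Hypothetical model of the rewrite rules; not Oracle code.
NO_TOKEN = None  # stands for the no_token result

SYMMETRIC = {"AND", "OR", "ACCUM", "EQUIV"}  # either operand can survive alone
LEFT_ONLY = {"MINUS", "NOT"}                 # only the left operand can survive alone


def rewrite(expr, stopwords):
    """Rewrite a term (str) or a (left, operator, right) tuple, removing stopwords."""
    if isinstance(expr, str):
        # A stopword by itself rewrites to no_token.
        return NO_TOKEN if expr.lower() in stopwords else expr

    left, op, right = expr
    left = rewrite(left, stopwords)
    right = rewrite(right, stopwords)

    if op in SYMMETRIC:
        if left is NO_TOKEN:
            return right  # also yields no_token when both sides are no_token
        if right is NO_TOKEN:
            return left
    elif op in LEFT_ONLY:
        if left is NO_TOKEN:
            return NO_TOKEN
        if right is NO_TOKEN:
            return left
    else:
        raise ValueError(f"operator not covered by this sketch: {op}")

    # Neither operand was eliminated, so the expression is unchanged.
    return (left, op, right)


if __name__ == "__main__":
    query = (("this", "NOT", "dog"), "AND", "cat")
    print(rewrite(query, stopwords={"this"}))
```

With this as the only stopword, the inner NOT expression collapses to no_token and the outer AND then keeps only cat, mirroring the two transformations listed above.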
Word Transformations

Stopword Expression  Rewritten Expression
stopword             no_token
no_lex               no_token

The first transformation means that a stopword or stopword-only phrase by itself in a query expression results in no hits. The second transformation says that a term that is not lexed, such as the + character, results in no hits.
AND Transformations

Stopword Expression        Rewritten Expression
non_stopword AND stopword  non_stopword
non_stopword AND no_token  non_stopword
stopword AND non_stopword  non_stopword
no_token AND non_stopword  non_stopword
stopword AND stopword      no_token
no_token AND stopword      no_token
stopword AND no_token      no_token
no_token AND no_token      no_token
OR Transformations

Stopword Expression       Rewritten Expression
non_stopword OR stopword  non_stopword
non_stopword OR no_token  non_stopword
stopword OR non_stopword  non_stopword
no_token OR non_stopword  non_stopword
stopword OR stopword      no_token
no_token OR stopword      no_token
stopword OR no_token      no_token
no_token OR no_token      no_token
ACCUMulate Transformations

Stopword Expression           Rewritten Expression
non_stopword ACCUM stopword   non_stopword
non_stopword ACCUM no_token   non_stopword
stopword ACCUM non_stopword   non_stopword
no_token ACCUM non_stopword   non_stopword
stopword ACCUM stopword       no_token
no_token ACCUM stopword       no_token
stopword ACCUM no_token       no_token
no_token ACCUM no_token       no_token
MINUS Transformations

Stopword Expression           Rewritten Expression
non_stopword MINUS stopword   non_stopword
non_stopword MINUS no_token   non_stopword
stopword MINUS non_stopword   no_token
no_token MINUS non_stopword   no_token
stopword MINUS stopword       no_token
no_token MINUS stopword       no_token
stopword MINUS no_token       no_token
no_token MINUS no_token       no_token
NOT Transformations

Stopword Expression           Rewritten Expression
non_stopword NOT stopword     non_stopword
non_stopword NOT no_token     non_stopword
stopword NOT non_stopword     no_token
no_token NOT non_stopword     no_token
stopword NOT stopword         no_token
no_token NOT stopword         no_token
stopword NOT no_token         no_token
no_token NOT no_token         no_token
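The eight rows of each binary-operator table follow one of two patterns. For AND, OR, and ACCUM, a non_stopword on either side survives. For MINUS and NOT, the expression survives only when the left operand is a non_stopword. The Python sketch below regenerates the rows from that summary so the two patterns can be compared. The function names and the symmetric flag are hypothetical, introduced only for this illustration.

```python
from itertools import product

OPERANDS = ("non_stopword", "stopword", "no_token")


def rewrite_pair(left, right, symmetric):
    """Apply the summarized rule to one pair of operand kinds.

    Returns None when neither operand is eliminated, because no
    stopword rewrite takes place in that case.
    """
    left_gone = left != "non_stopword"
    right_gone = right != "non_stopword"
    if not left_gone and not right_gone:
        return None
    if left_gone and right_gone:
        return "no_token"
    if right_gone:
        return "non_stopword"
    # Only the left operand was eliminated.
    return "non_stopword" if symmetric else "no_token"


def table(op, symmetric):
    """Return (stopword expression, rewritten expression) rows for op."""
    rows = []
    for left, right in product(OPERANDS, repeat=2):
        rewritten = rewrite_pair(left, right, symmetric)
        if rewritten is not None:
            rows.append((f"{left} {op} {right}", rewritten))
    return rows


for expression, rewritten in table("NOT", symmetric=False):
    print(f"{expression:30}{rewritten}")
```

Calling table("AND", symmetric=True) yields the same eight operand combinations, and the only rows that differ from the NOT output are those with a stopword or no_token on the left and a non_stopword on the right.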
EQUIValence Transformations

Stopword Expression           Rewritten Expression
non_stopword EQUIV stopword   non_stopword
non_stopword EQUIV no_token   non_stopword
stopword EQUIV non_stopword   non_stopword
no_token EQUIV non_stopword   non_stopword
stopword EQUIV stopword       no_token
no_token EQUIV stopword       no_token
stopword EQUIV no_token       no_token
no_token EQUIV no_token       no_token
Note: When you use query explain plan, not all of the equivalence
transformations are represented in the EXPLAIN table.
NEAR Transformations

Stopword Expression           Rewritten Expression
non_stopword NEAR stopword    non_stopword
non_stopword NEAR no_token    non_stopword
stopword NEAR non_stopword    non_stopword
no_token NEAR non_stopword    non_stopword
stopword NEAR stopword        no_token
no_token NEAR stopword        no_token
stopword NEAR no_token        no_token
no_token NEAR no_token        no_token
Weight Transformations

Stopword Expression           Rewritten Expression
stopword * n                  no_token
no_token * n                  no_token

Threshold Transformations

Stopword Expression           Rewritten Expression
stopword > n                  no_token
no_token > n                  no_token

WITHIN Transformations

Stopword Expression           Rewritten Expression
stopword WITHIN section       no_token
no_token WITHIN section       no_token
I English Knowledge Base Category Hierarchy

This appendix provides a list of all the concepts in the knowledge base that serve as categories. The appendix is divided into six sections, corresponding to the six main branches of the knowledge base:
■ Branch 1: science and technology
■ Branch 2: business and economics
■ Branch 3: government and military
■ Branch 4: social environment
■ Branch 5: geography
■ Branch 6: abstract ideas and concepts

The categories are presented in an inverted-tree hierarchy and within each category, sub-categories are listed in alphabetical order.

Note: This appendix does not contain all the concepts found in the knowledge base. It only contains those concepts that serve as categories (meaning they are parent nodes in the hierarchy).
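In the listings that follow, each category is preceded by a bracketed number. Assuming that number is the category's depth in the hierarchy (1 for a top-level category of a branch), a run of entries can be turned back into an indented outline. The Python sketch below shows one way to do this. The function names are hypothetical, and the sample string is taken from the start of Branch 1.

```python
import re

# One entry is a bracketed level followed by a name that runs
# up to the next opening bracket.
ENTRY = re.compile(r"\[(\d+)\]\s*([^\[]+)")


def parse_listing(text):
    """Return (level, name) pairs from a run of '[n] name' entries."""
    return [(int(level), name.strip()) for level, name in ENTRY.findall(text)]


def as_outline(entries, indent="  "):
    """Render entries as an indented outline, one category per line."""
    return "\n".join(indent * (level - 1) + name for level, name in entries)


sample = (
    "[1] communications [2] journalism [3] broadcast journalism "
    "[3] photojournalism [3] print journalism [4] newspapers "
    "[2] public speaking"
)
print(as_outline(parse_listing(sample)))
```

The sketch relies on category names never containing an opening square bracket, which holds for the entries in this appendix.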
Branch 1: science and technology [1] communications [2] journalism [3] broadcast journalism [3] photojournalism [3] print journalism [4] newspapers [2] public speaking [2] publishing industry [3] desktop publishing [3] periodicals [4] business publications [3] printing [2] telecommunications industry [3] computer networking [4] Internet technology [5] Internet providers [5] Web browsers [5] search engines [3] data transmission [3] fiber optics [3] telephone service
[1] formal education [2] colleges and universities [3] academic degrees [3] business education [2] curricula and methods [2] library science [2] reference books [2] schools [2] teachers and students
[1] hard sciences [2] aerospace industry [3] satellite technology [3] space exploration [4] Mars exploration [4] lunar exploration [4] space explorers [4] spacecraft and space stations [2] chemical industry [3] chemical adhesives [3] chemical dyes
[3] chemical engineering [3] materials technology [4] industrial ceramics [4] metal industry [5] aluminum industry [5] metallurgy [5] steel industry [4] plastics [4] rubber [4] synthetic textiles [3] paints and finishing materials [3] pesticides [4] fungicides [4] herbicides [2] chemistry [3] chemical properties [3] chemical reactions [3] chemicals [4] chemical acids [4] chemical elements [4] molecular reactivity [4] molecular structure [3] chemistry tools [4] chemical analysis [4] chemistry glassware [4] purification and isolation of chemicals [3] organic chemistry [3] theory and physics of chemistry [2] civil engineering [3] building architecture [3] construction industry [4] building components [5] exterior structures [6] entryways and extensions [6] landscaping [6] ornamental architecture [6] roofs and towers [6] walls [6] windows [5] interior structures [6] building foundations [6] building systems [7] electrical systems [7] fireproofing and insulation [7] plumbing [6] rooms [4] buildings and dwellings
[5] outbuildings [4] carpentry [4] construction equipment [4] construction materials [5] paneling and composites [5] surfaces and finishing [2] computer industry [3] computer hardware industry [4] computer components [5] computer memory [5] microprocessors [4] computer peripherals [5] data storage devices [4] hand-held computers [4] laptop computers [4] mainframes [4] personal computers [4] workstations [3] computer science [4] artificial intelligence [3] computer security and data encryption [4] computer viruses and protection [3] computer software industry [4] CAD-CAM [4] client-server software [4] computer programming [5] programming development tools [5] programming languages [4] operating systems [3] computer standards [3] cyberculture [3] human-computer interaction [3] information technology [4] computer multimedia [5] computer graphics [5] computer sound [5] computer video [4] databases [4] document management [4] natural language processing [4] spreadsheets [3] network computing [3] supercomputing and parallel computing [3] virtual reality [2] electrical engineering [2] electronics [3] consumer electronics [4] audio electronics [4] video electronics
[3] electronic circuits and components [4] microelectronics [4] semiconductors and superconductors [3] radar technology [2] energy industry [3] electric power industry [3] energy sources [4] alternative energy sources [4] fossil fuels industry [5] coal industry [5] petroleum products industry [4] nuclear power industry [2] environment control industries [3] heating and cooling systems [3] pest control [3] waste management [2] explosives and firearms [3] chemical explosives [3] firearm parts and accessories [3] recreational firearms [2] geology [3] geologic formations [3] geologic substances [4] mineralogy [5] gemstones [5] igneous rocks [5] metamorphic rocks [5] sedimentary rocks [3] hydrology [3] meteorology [4] atmospheric science [4] clouds [4] storms [4] weather modification [4] weather phenomena [4] winds [3] mining industry [3] natural disasters [3] oceanography [3] seismology [3] speleology [3] vulcanology [2] inventions [2] life sciences [3] biology [4] biochemistry [5] biological compounds [6] amino acids [6] enzymes
[6] hormones [7] androgens and anabolic steroids [7] blood sugar hormones [7] corticosteroids [7] estrogens and progestins [7] gonadotropins [7] pituitary hormones [7] thyroid hormones [6] lipids and fatty acids [6] nucleic acids [6] sugars and carbohydrates [6] toxins [6] vitamins [5] cell reproduction [5] cell structure and function [5] molecular genetics [4] botany [5] algae [5] fungi [5] plant diseases [5] plant kingdom [6] ferns [6] flowering plants [7] cacti [7] grasses [6] mosses [6] trees and shrubs [7] conifers [7] deciduous trees [7] palm trees [5] plant physiology [6] plant development [6] plant parts [4] lower life forms [5] bacteria [5] viruses [4] paleontology [5] dinosaurs [4] physiology [5] anatomy [6] cardiovascular systems [6] digestive systems [6] extremities and appendages [6] glandular systems [6] head and neck [7] ear anatomy [7] eye anatomy [7] mouth and teeth
[6] immune systems [7] antigens and antibodies [6] lymphatic systems [6] muscular systems [6] nervous systems [6] reproductive systems [6] respiratory systems [6] skeletal systems [6] tissue systems [6] torso [6] urinary systems [5] reproduction and development [4] populations and vivisystems [5] biological evolution [5] ecology [6] ecological conservation [6] environmental pollution [5] genetics and heredity [4] zoology [5] invertebrates [6] aquatic sponges [6] arthropods [7] arachnids [8] mites and ticks [8] scorpions [8] spiders [7] crustaceans [7] insects [6] coral and sea anemones [6] jellyfish [6] mollusks [7] clams, oysters, and mussels [7] octopi and squids [7] snails and slugs [6] starfish and sea urchins [6] worms [5] vertebrates [6] amphibians [6] birds [7] birds of prey [8] owls [7] game birds [7] hummingbirds [7] jays, crows, and magpies [7] parrots and parakeets [7] penguins [7] pigeons and doves [7] warblers and sparrows
[7] water birds [8] ducks, geese, and swans [8] gulls and terns [8] pelicans [7] woodpeckers [7] wrens [6] fish [7] boneless fish [8] rays and skates [8] sharks [7] bony fish [8] deep sea fish [8] eels [8] tropical fish [7] jawless fish [6] mammals [7] anteaters and sloths [8] aardvarks [7] carnivores [8] canines [8] felines [7] chiropterans [7] elephants [7] hoofed mammals [8] cattle [8] goats [8] horses [8] pigs [8] sheep [7] hyraxes [7] marine mammals [8] seals and walruses [9] manatees [8] whales and porpoises [7] marsupials [7] monotremes [7] primates [8] lemurs [7] rabbits [7] rodents [6] reptiles [7] crocodilians [7] lizards [7] snakes [7] turtles [3] biotechnology [4] antibody technology [5] immunoassays
[4] biometrics [5] voice recognition technology [4] genetic engineering [4] pharmaceutical industry [5] anesthetics [6] general anesthetics [6] local anesthetics [5] antagonists and antidotes [5] antibiotics, antimicrobials, and antiparasitics [6] anthelmintics [6] antibacterials [7] antimalarials [7] antituberculars and antileprotics [6] antifungals [6] antivirals [6] local anti-infectives [5] antigout agents [5] autonomic nervous system drugs [6] neuromuscular blockers [6] skeletal muscle relaxants [5] blood drugs [5] cardiovascular drugs [6] antihypertensives [5] central nervous system drugs [6] analgesics and antipyretics [6] antianxiety agents [6] antidepressants [6] antipsychotics [6] narcotic and opioid analgesics [6] nonsteroidal anti-inflammatory drugs [6] sedative-hypnotics [5] chemotherapeutics, antineoplastic agents [5] dermatomucosal agents [6] topical corticosteroids [5] digestive system drugs [6] antacids, adsorbents, and antiflatulents [6] antidiarrheals [6] antiemetics [6] antiulcer agents [6] digestants [6] laxatives [5] eye, ear, nose, and throat drugs [6] nasal agents [6] ophthalmics
vasoconstrictors [6] otics, ear care drugs [5] fluid and electrolyte balance drugs [6] diuretics [5] hormonal agents [5] immune system drugs [6] antitoxins and antivenins [6] biological response modifiers [6] immune serums [6] immunosuppressants [6] vaccines and toxoids [5] oxytocics [5] respiratory drugs [6] antihistamines [6] bronchodilators [6] expectorants and antitussives [5] spasmolytics [5] topical agents [3] health and medicine [4] healthcare industry [5] healthcare providers and practices [5] medical disciplines and specialties [6] cardiology [6] dentistry [6] dermatology [6] geriatrics [6] neurology [6] obstetrics and gynecology [6] oncology [6] ophthalmology [6] pediatrics [5] medical equipment [6] artificial limbs and organs [6] dressings and supports [5] medical equipment manufacturers [5] medical facilities [4] medical problems [5] blood disorders [5] cancers and tumors [6] carcinogens [5] cardiovascular disorders [5] developmental disorders [5] environment-related afflictions [5] gastrointestinal disorders [5] genetic and hereditary disorders [5] infectious diseases
diseases [5] injuries [5] medical disabilities [5] neurological disorders [5] respiratory disorders [5] skin conditions [4] nutrition [4] practice of medicine [5] alternative medicine [5] medical diagnosis [6] medical imaging [5] medical personnel [5] medical procedures [6] physical therapy [6] surgical procedures [7] cosmetic surgery [4] veterinary medicine [2] machinery [3] machine components [2] mathematics [3] algebra [4] linear algebra [4] modern algebra [3] arithmetic [4] elementary algebra [3] calculus [3] geometry [4] mathematical topology [4] plane geometry [4] trigonometry [3] math tools [3] mathematical analysis [3] mathematical foundations [4] number theory [4] set theory [4] symbolic logic [3] statistics [2] mechanical engineering [2] physics [3] acoustics [3] cosmology [4] astronomy [5] celestial bodies [6] celestial stars [6] comets [6] constellations [6] galaxies
[6] moons [6] nebulae [6] planets [5] celestial phenomena [3] electricity and magnetism [3] motion physics [3] nuclear physics [4] subatomic particles [3] optical technology [4] holography [4] laser technology [5] high-energy lasers [5] low-energy lasers [3] thermodynamics [2] robotics [2] textiles [2] tools and hardware [3] cements and glues [3] hand and power tools [4] chisels [4] drills and bits [4] gauges and calipers [4] hammers [4] machine tools [4] planes and sanders [4] pliers and clamps [4] screwdrivers [4] shovels [4] trowels [4] wrenches [3] knots
[1] social sciences [2] anthropology [3] cultural identities [4] Native Americans [3] cultural studies [4] ancient cultures [3] customs and practices [2] archeology [3] ages and periods [3] prehistoric humanoids [2] history [3] U.S. history [4] slavery in the U.S. [3] ancient Rome [4] Roman emperors [3] ancient history
[3] biographies [3] historical eras [2] human sexuality [3] homosexuality [3] pornography [3] prostitution [3] sexual issues [2] linguistics [3] descriptive linguistics [4] grammar [5] parts of speech [4] phonetics and phonology [3] historical linguistics [3] languages [3] linguistic theories [3] rhetoric and figures of speech [3] sociolinguistics [4] dialects and accents [3] writing and mechanics [4] punctuation and diacritics [4] writing systems [2] psychology [3] abnormal psychology [4] anxiety disorders [4] childhood onset disorders [4] cognitive disorders [4] dissociative disorders [4] eating disorders [4] impulse control disorders [4] mood disorders [4] personality disorders [4] phobias [4] psychosomatic disorders [4] psychotic disorders [4] somatoform disorders [4] substance related disorders [3] behaviorist psychology [3] cognitive psychology [3] developmental psychology [3] experimental psychology [3] humanistic psychology [3] neuropsychology [3] perceptual psychology [3] psychiatry [3] psychoanalytic psychology [3] psychological states and behaviors [3] psychological therapy [3] psychological tools and techniques [3] sleep psychology
[4] sleep disorders [2] sociology [3] demographics [3] social identities [4] gender studies [4] senior citizens [3] social movements and institutions [3] social structures
Branch 2: business and economics [1] business services industry
[1] commerce and trade [2] electronic commerce [2] general commerce [2] international trade and finance [2] mail-order industry [2] retail trade industry [3] convenience stores [3] department stores [3] discount stores [3] supermarkets [2] wholesale trade industry
[1] corporate business [2] business enterprise [3] entrepreneurship [2] business fundamentals [2] consulting industry [2] corporate finance [3] accountancy [2] corporate management [2] corporate practices [2] diversified companies [2] human resources [3] employment agencies [2] office products [2] quality control [3] customer support [2] research and development [2] sales and marketing [3] advertising industry
[1] economics
[1] financial institutions [2] banking industry [2] insurance industry [2] real-estate industry
[1] industrial business [2] industrial engineering [3] production methods [2] industrialists and financiers [2] manufacturing [3] industrial goods manufacturing
[1] public sector industry
[1] taxes and tariffs
[1] transportation [2] aviation [3] aircraft [3] airlines [3] airports [3] avionics [2] freight and shipping [3] package delivery industry [3] trucking industry [2] ground transportation [3] animal powered transportation [3] automotive industry [4] automobiles [4] automotive engineering [5] automotive parts [5] internal combustion engines [4] automotive sales [4] automotive service and repair [4] car rentals [4] motorcycles [4] trucks and buses [3] human powered vehicles [3] rail transportation [4] subways [4] trains [3] roadways and driving [2] marine transportation [3] boats and ships [3] seamanship [3] waterways [2] travel industry [3] hotels and lodging [3] tourism [4] cruise lines [4] places of interest [4] resorts and spas
[1] work force [2] organized labor
Branch 3: government and military [1] government [2] county government [2] forms and philosophies of government [2] government actions [2] government bodies and institutions [3] executive branch [4] U.S. presidents [4] executive cabinet [3] judiciary branch [4] Supreme Court [5] chief justices [3] legislative branch [4] house of representatives [4] senate [2] government officials [3] royalty and aristocracy [3] statesmanship [2] government programs [3] social programs [4] welfare [2] international relations [3] Cold War [3] diplomacy [3] immigration [2] law [3] business law [3] courts [3] crimes and offenses [4] controlled substances [5] substance abuse [4] criminals [4] organized crime [3] law enforcement [3] law firms [3] law systems [4] constitutional law [3] legal bodies [3] legal customs and formalities [3] legal judgments [3] legal proceedings [3] prisons and punishments [2] municipal government [3] municipal infrastructure [3] urban areas
[4] urban phenomena [4] urban structures [2] politics [3] civil rights [3] elections and campaigns [3] political activities [3] political advocacy [4] animal rights [4] consumer advocacy [3] political parties [3] political principles and philosophies [4] utopias [3] political scandals [3] revolution and subversion [4] terrorism [2] postal communications [2] public facilities [2] state government
[1] military [2] air force [2] armored clothing [2] army [2] cryptography [2] military honors [2] military intelligence [2] military leaders [2] military ranks [3] army, air force, and marine ranks [3] navy and coast guard ranks [2] military wars [3] American Civil War [3] American Revolution [3] World War I [3] World War II [3] warfare [2] military weaponry [3] bombs and mines [3] chemical and biological warfare [3] military aircraft [3] missiles, rockets, and torpedoes [3] nuclear weaponry [3] space-based weapons [2] navy [3] warships [2] service academies
Branch 4: social environment [1] belief systems [2] folklore [2] mythology [3] Celtic mythology [3] Egyptian mythology [3] Greek mythology [3] Japanese mythology [3] Mesopotamian and Sumerian mythology [3] Norse and Germanic mythology [3] Roman mythology [3] South and Central American mythology [3] mythological beings [3] myths and legends [2] paranormal phenomena [3] astrology [3] occult [3] superstitions [2] philosophy [3] epistemology [3] ethics and aesthetics [3] metaphysics [3] philosophical logic [3] schools of philosophy [2] religion [3] God and divinity [3] doctrines and practices [3] history of religion [3] religious institutions and structures [3] sacred texts and objects [4] Bible [4] liturgical garments [3] world religions [4] Christianity [5] Christian denominations [5] Christian heresies [5] Christian theology [5] Mormonism [5] Roman Catholicism [6] popes [6] religious orders [5] evangelism [5] protestant reformation [4] Islam [4] Judaism [4] eastern religions [5] Buddhism
emergency medical services fire prevention and suppression hazardous material control heavy rescue
[1] family [2] death and burial [3] funeral industry [2] divorce [2] infancy [2] kinship and ancestry [2] marriage [2] pregnancy [3] contraception [2] upbringing
beds candles carpets and rugs cases, cabinets, and chests chairs and sofas curtains, drapes, and screens functional wares [3] cleaning supplies [2] home appliances [2] kitchenware [3] cookers [3] fine china [3] glassware
[1] leisure and recreation [2] arts and entertainment [3] broadcast media [4] radio [5] amateur radio [4] television [3] cartoons, comic books, and superheroes [3] cinema [4] movie stars [4] movie tools and techniques [4] movies [3] entertainments and spectacles [4] entertainers [3] humor and satire [3] literature [4] children's literature [4] literary criticism [4] literary devices and techniques [4] poetry [5] classical poetry [4] prose [5] fiction [6] horror fiction [6] mystery fiction [4] styles and schools of literature [3] performing arts [4] dance [5] ballet [5] choreography [5] folk dances [5] modern dance [4] drama [5] dramatic structure
[5] stagecraft [4] music [5] blues music [5] classical music [5] composition types [5] folk music [5] jazz music [5] music industry [5] musical instruments [6] keyboard instruments [6] percussion instruments [6] string instruments [6] wind instruments [7] brass instruments [7] woodwinds [5] opera and vocal [5] popular music and dance [5] world music [3] science fiction [3] visual arts [4] art galleries and museums [4] artistic painting [5] painting tools and techniques [5] styles and schools of art [4] graphic arts [4] photography [5] cameras [5] photographic lenses [5] photographic processes [5] photographic techniques [5] photographic tools [4] sculpture [5] sculpture tools and techniques [2] crafts [2] games [3] indoor games [4] board games [4] card games [4] video games [3] outdoor games [2] gaming industry [3] gambling [2] gardening [2] hobbies [3] coin collecting [3] stamp collecting [2] outdoor recreation [3] hunting and fishing [2] pets
[2] restaurant industry [2] sports [3] Olympics [3] aquatic sports [4] canoeing, kayaking, and rafting [4] swimming and diving [4] yachting [3] baseball [3] basketball [3] bicycling [3] bowling [3] boxing [3] equestrian events [4] horse racing [4] rodeo [3] fantasy sports [3] fitness and health [4] fitness equipment [3] football [3] golf [3] gymnastics [3] martial arts [3] motor sports [4] Formula I racing [4] Indy car racing [4] NASCAR racing [4] drag racing [4] motorcycle racing [4] off-road racing [3] soccer [3] sports equipment [3] tennis [3] track and field [3] winter sports [4] hockey [4] ice skating [4] skiing [2] tobacco industry [2] toys
Branch 5: geography [1] cartography [2] explorers
[1] physical geography [2] bodies of water [3] lakes [3] oceans [3] rivers [2] land forms [3] coastlands [3] continents [3] deserts [3] highlands [3] islands [3] lowlands [3] mountains [3] wetlands
[1] political geography [2] Africa [3] Central Africa [4] Angola [4] Burundi [4] Central African Republic [4] Congo [4] Gabon [4] Kenya [4] Malawi [4] Rwanda [4] Tanzania [4] Uganda [4] Zaire [4] Zambia [3] North Africa [4] Algeria [4] Chad [4] Djibouti [4] Egypt [4] Ethiopia [4] Libya [4] Morocco [4] Somalia [4] Sudan
[4] Tunisia [3] Southern Africa [4] Botswana [4] Lesotho [4] Mozambique [4] Namibia [4] South Africa [4] Swaziland [4] Zimbabwe [3] West Africa [4] Benin [4] Burkina Faso [4] Cameroon [4] Equatorial Guinea [4] Gambia [4] Ghana [4] Guinea [4] Guinea-Bissau [4] Ivory Coast [4] Liberia [4] Mali [4] Mauritania [4] Niger [4] Nigeria [4] Sao Tome and Principe [4] Senegal [4] Sierra Leone [4] Togo [2] Antarctica [2] Arctic [3] Greenland [3] Iceland [2] Asia [3] Central Asia [4] Afghanistan [4] Bangladesh [4] Bhutan [4] India [4] Kazakhstan [4] Kyrgyzstan [4] Nepal [4] Pakistan [4] Tajikstan [4] Turkmenistan [4] Uzbekistan [3] East Asia [4] China [4] Hong Kong [4] Japan
[4] Macao [4] Mongolia [4] North Korea [4] South Korea [4] Taiwan [3] Southeast Asia [4] Brunei [4] Cambodia [4] Indonesia [4] Laos [4] Malaysia [4] Myanmar [4] Papua New Guinea [4] Philippines [4] Singapore [4] Thailand [4] Vietnam [2] Atlantic area [3] Azores [3] Bermuda [3] Canary Islands [3] Cape Verde [3] Falkland Islands [2] Caribbean [3] Antigua and Barbuda [3] Bahamas [3] Barbados [3] Cuba [3] Dominica [3] Dominican Republic [3] Grenada [3] Haiti [3] Jamaica [3] Netherlands Antilles [3] Puerto Rico [3] Trinidad and Tobago [2] Central America [3] Belize [3] Costa Rica [3] El Salvador [3] Guatemala [3] Honduras [3] Nicaragua [3] Panama [2] Europe [3] Eastern Europe [4] Albania [4] Armenia [4] Azerbaijan
[4] Belarus [4] Bulgaria [4] Czech Republic [4] Czechoslovakia [4] Estonia [4] Greece [4] Hungary [4] Latvia [4] Lithuania [4] Moldava [4] Poland [4] Republic of Georgia [4] Romania [4] Russia [5] Siberia [4] Slovakia [4] Soviet Union [4] Ukraine [4] Yugoslavia [5] Bosnia and Herzegovina [5] Croatia [5] Macedonia [5] Montenegro [5] Serbia [5] Slovenia [3] Western Europe [4] Austria [4] Belgium [4] Denmark [4] Faeroe Island [4] Finland [4] France [4] Germany [4] Iberia [5] Andorra [5] Portugal [5] Spain [4] Ireland [4] Italy [4] Liechtenstein [4] Luxembourg [4] Monaco [4] Netherlands [4] Norway [4] San Marino [4] Sweden [4] Switzerland [4] United Kingdom [5] England
[5] Northern Ireland [5] Scotland [5] Wales [2] Indian Ocean area [3] Comoros [3] Madagascar [3] Maldives [3] Mauritius [3] Seychelles [3] Sri Lanka [2] Mediterranean [3] Corsica [3] Cyprus [3] Malta [3] Sardinia [2] Middle East [3] Bahrain [3] Iran [3] Iraq [3] Israel [3] Jordan [3] Kuwait [3] Lebanon [3] Oman [3] Palestine [3] Qatar [3] Saudi Arabia [3] Socotra [3] Syria [3] Turkey [3] United Arab Emirates [3] Yemen [2] North America [3] Canada [3] Mexico [3] United States [4] Alabama [4] Alaska [4] Arizona [4] Arkansas [4] California [4] Colorado [4] Delaware [4] Florida [4] Georgia [4] Hawaii [4] Idaho [4] Illinois [4] Indiana
[4] Iowa [4] Kansas [4] Kentucky [4] Louisiana [4] Maryland [4] Michigan [4] Minnesota [4] Mississippi [4] Missouri [4] Montana [4] Nebraska [4] Nevada [4] New England [5] Connecticut [5] Maine [5] Massachusetts [5] New Hampshire [5] Rhode Island [5] Vermont [4] New Jersey [4] New Mexico [4] New York [4] North Carolina [4] North Dakota [4] Ohio [4] Oklahoma [4] Oregon [4] Pennsylvania [4] South Carolina [4] South Dakota [4] Tennessee [4] Texas [4] Utah [4] Virginia [4] Washington [4] Washington D.C. [4] West Virginia [4] Wisconsin [4] Wyoming [2] Pacific area [3] American Samoa [3] Australia [4] Tasmania [3] Cook Islands [3] Fiji [3] French Polynesia [3] Guam [3] Kiribati [3] Mariana Islands
[3] Marshall Islands [3] Micronesia [3] Nauru [3] New Caledonia [3] New Zealand [3] Palau [3] Solomon Islands [3] Tonga [3] Tuvalu [3] Vanuatu [3] Western Samoa [2] South America [3] Argentina [3] Bolivia [3] Brazil [3] Chile [3] Colombia [3] Ecuador [3] French Guiana [3] Guyana [3] Paraguay [3] Peru [3] Suriname [3] Uruguay [3] Venezuela
Branch 6: abstract ideas and concepts [3] past [3] regularity of time [3] relative age [4] stages of development [3] simultaneity [3] time measurement [4] instants [3] timeliness [4] earliness [4] lateness [3] transience
composite word indexing, 2-43 fuzzy matching, 2-72 index defaults, 2-89 stemming, 2-71 supplied stoplist, D-6
E empty indexes creating, 1-37, 1-46 EMPTY_STOPLIST system-defined preference, 2-91 END_LOG procedure, 9-3 endjoins attribute, 2-41 English fuzzy matching, 2-72 index defaults, 2-89 supplied stoplist, D-2 english attribute (Korean lexer), 2-52 environment variables setting for Inso filter, B-3 equivalence operator, 3-16 stopword transformations, H-6 with NEAR, 3-33 errors indexing, 1-40 escaping special characters, 4-3 example, 1-39 EXP_TAB table type, A-12 expansion operator soundex, 3-40 stem, 3-41 viewing, 10-6 EXPLAIN procedure, 10-6 example, 10-8 result table, A-2 explain table creating, 10-7 retrieving data example, 10-8 structure, A-2 extending knowledge base, 12-6 external filters specifying, 2-31
F failed index operation resuming, 1-6 features new, xxv field section defining, 7-5 limitations, 7-7 querying, 3-56 field sections adding dynamically, 1-8 repeated, 3-59 WITHIN example, 3-58 file data storage example, 7-30 FILE_DATASTORE object, 2-11 example, 2-11 FILE_DATASTORE system-defined preference, 2-88 filter formats supported, B-5 FILTER procedure, 8-2 example, 8-4 in-memory example, 8-3 result table, A-8 filter table structure, A-8 filter types, 2-23 filtering to plain text, 8-13 to plain text and HTML, 8-2 filters character-set, 2-24 Inso, 2-26, B-2 user, 2-31 Finnish index defaults, 2-89 supplied stoplist, D-7 format column, 1-35 formatted documents filtering, 2-26 fragmentation of index, 1-37 French fuzzy matching, 2-72 supplied stoplist, D-8
French stemming, 2-71 ftp_proxy attribute, 2-14 fuzzy matching automatic language detection, 2-72 example for enabling, 2-75 specifying a language, 2-73 fuzzy operator, 3-17 fuzzy_match attribute, 2-73 fuzzy_numresults attribute, 2-73 fuzzy_score attribute, 2-73
G German alternate spelling attribute, 2-45 alternate spelling conventions, E-3 composite word indexing, 2-43 fuzzy matching, 2-72 index defaults, 2-89 stemming, 2-71 supplied stoplist, D-9 gist generating, 8-5 GIST procedure example, 8-8 result table, A-8 updated syntax, 8-5 Gist table structure, A-8
H hanja attribute, 2-52 HASPATH operator, 3-19 HFEEDBACK procedure, 10-9 example, 10-10 result table, A-5 hierarchical query feedback information generating, 10-9 hierarchical relationships in thesaurus import file, C-8 HIGHLIGHT procedure, 8-10 example, 8-12 result table, A-10 highlight table example, 8-12 structure, A-10 highlighting generating markup, 8-14 generating offsets, 8-10 with NEAR operator, 3-34 hit counting, 10-5 HOME environment variable setting for INSO, B-3 homographs in broader term queries, 3-14 in queries, 3-13 in thesaurus import file, C-9 HTML bypassing filtering, 2-27 filtering to, 8-2 generating highlight offsets for, 8-10 highlight markup, 8-14 highlighting example, 8-18 indexing, 1-44, 2-30, 2-81, 7-33 zone section example, 7-23 HTML_SECTION_GROUP example, 2-82 HTML_SECTION_GROUP object, 1-44, 2-81, 7-23, 7-33 with NULL_FILTER, 2-30 HTML_SECTION_GROUP system-defined preference, 2-90 http_proxy attribute, 2-14
I i_index_clause attribute, 2-79 i_table_clause attribute, 2-78 IFILTER procedure, 8-13 IGNORE format column value, 1-35 import file examples of, C-11 structure, C-6 index creating, 1-29 renaming, 1-3 viewing registered, G-4 index creation custom preference example, 1-38 default example, 1-38
index creation parameters example, 2-79 index errors deleting, 1-40 viewing, 1-40 index fragmentation, 1-37 index maintenance, 1-2 index objects, 2-1 viewing, G-5, G-9 index optimization, 1-6 index preference about, 2-2 creating, 2-2, 7-30 index requests logging, 9-6 index tablespace parameters specifying, 2-78 index tokens generating for a document, 8-26 INDEX_PROCEDURE user_lexer attribute, 2-56 index_stems attribute, 2-44 index_text attribute, 2-44 index_themes attribute, 2-43 indexing master/detail example, 2-10 parallel, 1-4, 1-32 themes, 2-43 indextype context, 1-29 inflectional stemming enabling, 2-72 INPATH operator, 3-21 INPUT_TYPE user_lexer attribute, 2-56 INSERT statement loading example, C-2 Inso filter index preference object, 2-26 setting up, B-2 supported formats, B-5 supported platforms, B-2 unsupported formats, B-14 INSO_FILTER object, 2-26 character-set conversion, 2-28 INSO_FILTER system-defined preference, 2-89 inverse frequency scoring, F-2 Italian
J JA16EUC character set, 2-48, 2-49 JA16SJIS character set, 2-48, 2-49 Japanese fuzzy matching, 2-72 index defaults, 2-90 indexing, 2-48 japanese attribute (Korean lexer), 2-52 Japanese character sets supported, 2-48 Japanese EUC character set, 2-49 JAPANESE_LEXER, 2-49 JAPANESE_VGRAM_LEXER object, 2-48 JOB_QUEUE_PROCESSES initialization parameter, 1-4, 1-32
K k_table_clause attribute, 2-78 knowledge base supported character set, 12-6 user-defined, 12-10 knowledge base extension compiler, 12-6 knowledge catalog category hierarchy, I-1 KO16KSC5601 character set, 2-50, 2-51 Korean fuzzy matching, 2-72 index defaults, 2-90 Korean character sets supported, 2-50, 2-51 Korean text indexing, 2-51 KOREAN_LEXER object, 2-50 KOREAN_MORP_LEXER, 2-51 composite example, 2-53 supplied dictionaries, 2-51
L language setting, 2-37 language column, 1-36 left-truncated searching
improving performance, 2-73 lexer types, 2-37 loading text SQL INSERT example, C-2 SQL*Loader example, C-3 loading thesaurus, 12-2 LOB columns loading, C-3 LOG_DIRECTORY system parameter, 2-92, 9-4 LOGFILENAME procedure, 9-4 logging index requests, 9-6 logical operators with NEAR, 3-33 LONG columns indexing, 1-31 long_word attribute, 2-52
M maintaining index, 1-2 MARKUP procedure, 8-14 example, 8-18 HTML highlight example, 8-18 result table, A-10 markup table example, 8-18 structure, A-10 master/detail data storage, 2-8 example, 2-8, 7-31 master/detail tables indexing example, 2-10 MAX_INDEX_MEMORY system parameter, 2-92 max_span parameter in near operator, 3-32 maxdocsize attribute, 2-14 maxthreads attribute, 2-13 maxurls attribute, 2-14 memory for index synchronize, 1-7 for indexing, 1-7, 1-37, 1-46, 7-53 META tag creating field sections for, 7-7 creating zone section for, 7-23 MINUS operator, 3-28 stopword transformations, H-5 mixed character-set columns indexing, 2-24
R r_table_clause attribute, 2-78 rebuilding index example, 1-10 syntax, 1-4 RECOVER procedure, 5-2 related term operator, 3-39
2-60
related term query feedback, 10-9 relevance ranking word queries, F-2 REMOVE_EVENT procedure, 9-5 REMOVE_SECTION procedure, 7-47 REMOVE_SQE procedure, 10-14 REMOVE_STOPCLASS procedure, 7-49 REMOVE_STOPTHEME procedure, 7-50 REMOVE_STOPWORD procedure, 7-51 renaming index, 1-3 repeated field sections querying, 3-59 replacing preferences, 1-4 reserved words and characters, 4-4 escaping, 4-3 result table TOKENS, A-11 result tables, A-1 CTX_DOC, A-8 CTX_QUERY, A-2 CTX_THES, A-12 resuming failed index, 1-6 example, 1-10 RFC 1738 URL specification, 2-12 RT function, 12-42 RT operator, 3-39 RULE_CLASSIFIER type, 2-84 rules generating, 6-2
S Salton’s formula for scoring, F-2 scope notes finding, 12-44 SCORE operator, 1-51 scoring accumulate, 3-10 effect of DML, F-3 for NEAR operator, 3-33 scoring algorithm word queries, F-2 section group creating, 7-33 viewing information about, G-15 section group example, 2-82
V VARCHAR2 column indexing, 1-31 verb_adjective attribute, 2-52 version numbers viewing, G-26 viewing operator expansion, 10-6 operator precedence, 10-6 views, G-1 visible flag for field sections, 7-6 visible flag in field sections, 3-58
X XML documents attribute sections, 7-3 doctype sensitive sections, 7-24 indexing, 1-45, 2-82, 7-34 querying, 3-59 XML sectioning, 2-83 XML_SECTION_GROUP example, 2-83 XML_SECTION_GROUP object, 1-45, 2-81, 7-33 XMLType column indexing, 1-47
Z ZHS16CGB231280 character set, 2-47 ZHS16GBK character set, 2-48 ZHT16BIG5 character set, 2-48 ZHT32EUC character set, 2-48 ZHT32TRIS character set, 2-48 zone section