e-business Globalization Solution Design Guide: Getting Started
Easily comprehend state-of-the-art globalization technologies
See how best practice design guidelines can work for you
Learn ways to achieve cost-effective globalization
Xiao Hui Zhu Ming Zhu Cui Bei Shu Yi Zhen Xu Xia Li Ming Li Fei Qu
ibm.com/redbooks
International Technical Support Organization e-business Globalization Solution Design Guide: Getting Started December 2002
SG24-6851-00
Note: Before using this information and the product it supports, read the information in “Notices” on page vii.
Notices This information was developed for products and services offered in the U.S.A. IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service. IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing, IBM Corporation, North Castle Drive Armonk, NY 10504-1785 U.S.A. The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you. This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice. 
Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk. IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you. Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental. COPYRIGHT LICENSE: This information contains sample application programs in source language, which illustrates programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs. 
You may copy, modify, and distribute these sample programs in any form without payment to IBM for the purposes of developing, using, marketing, or distributing application programs conforming to IBM's application programming interfaces.
Trademarks The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both: AFP™ AIX/L™ AIX® DB2 Universal Database™ DB2®
IBM® Netfinity® Redbooks(logo)™ S/390® SP™
Tivoli® ViaVoice® VisualAge® VTAM® WebSphere®
The following terms are trademarks of International Business Machines Corporation and Lotus Development Corporation in the United States, other countries, or both: Lotus Notes®
Lotus®
Notes®
The following terms are trademarks of other companies: ActionMedia, LANDesk, MMX, Pentium and ProShare are trademarks of Intel Corporation in the United States, other countries, or both. Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. Java and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. C-bus is a trademark of Corollary, Inc. in the United States, other countries, or both. UNIX is a registered trademark of The Open Group in the United States and other countries. SET, SET Secure Electronic Transaction, and the SET Logo are trademarks owned by SET Secure Electronic Transaction LLC.
Preface Globalization is not a feature—it is an architecture.1 Globalization is the proper design and execution of systems, software, services, and procedures so that one instance of software, executing on a single server or end-user machine, can process multilingual data and present culturally correct information (for example, collation, date, and number formats). As the Internet increasingly drives the economy, today's market is quickly becoming more and more geared toward multinational participation and international transactions. The challenge for those companies that intend to thrive in this environment is that you cannot just drop globalization on top of your existing applications. Globalization permeates so many areas that it must be taken into consideration from the very beginning of the development cycle. This redbook presents an architecture, a working example, and an accompanying set of methodologies. The sample solution is built on WebSphere Application Server and the DB2 Universal Database, together with Web Services technologies incorporating dynamic e-business concepts. We will introduce IBM's recommended globalization architecture and how it works throughout the application development cycle, and will also explain from the customer's point of view how to plan and design a multilingual solution, with our working example validating the soundness of this architecture. Our target audience includes design architects who are new to or at the entry level of e-business globalization. Software developers can also use this book as a reference when developing globalized e-business applications.
The GCL
The Globalization Certification Laboratory (GCL) is an organization established by IBM Corporate Globalization. GCL provides the following services to IBM internal and external customers:
Globalization Comprehensive Interoperability Test Services: Tests products and solutions to verify from a globalization perspective whether they behave properly in various e-business scenarios.
Globalization Enablement Services: Enables customers' products and solutions with globalization features (multilingual capabilities, proper data format, etc.).
Globalization Consultation Services: Provides consultation services in the architectural design of customers' products and solutions in order to minimize their expenditures for globalization application development.
Other globalization-specific Test Services: Provides testing services for production platforms covering one or more specific text encodings or locales.
The team that wrote this redbook
This redbook was produced by a team of specialists from the Globalization Certification Laboratory working together with the International Technical Support Organization, Raleigh Center. You can contact members of the GCL team at [email protected].
1 Addison P. Phillips, Globalization Architect/Manager, Globalization Engineering, webMethods, Inc.
Figure 0-1 The team that wrote this redbook—Front row (LTR): CP Chang, Xia Li, Bei Shu, Fei Qu, Xiao Hui Zhu, Ming Zhu Cui, Feng Zheng. Back row (LTR): Yi Zhen Xu, Ting Yong Zhu, Buck Stearns, Ming Li, Yang Wang
Xiao Hui Zhu is an advisory software engineer at the IBM Development Lab in China. She has worked for IBM since November 1994, starting her career as a project manager in the Globalization organization and performing various roles, including tester, architect, coordinator, and consultant. Currently, she is the technical leader for the Globalization Certification Laboratory located in Shanghai. Xiao Hui Zhu wrote:
Chapter 1, “What is globalization?” on page 3 (with Yi Zhen Xu)
Chapter 2, “Why is globalization necessary?” on page 9
Chapter 3, “How to implement globalization” on page 13
Chapter 4, “Single Executable” on page 17
Chapter 5, “Unicode support” on page 21
Chapter 6, “Locale model” on page 23 (with Xia Li)
Chapter 8, “Input and output of multilingual data” on page 37 (with Yi Zhen Xu)
Chapter 9, “Linguistic services” on page 43 (with Xia Li)
Chapter 10, “Global Business Object” on page 51 (with Xia Li and Ming Li)
Chapter 14, “A development methodology for globalized applications” on page 87 (with Ming Zhu Cui)
Ming Zhu Cui is a software engineer in the Globalization Certification Laboratory. She received her MS in Computer Sciences at Oxford Brookes University in 2000. She joined IBM in May 2001 and has been involved in Globalization Inter-operability Testing as a developer and tutorial writer. Her main interests lie in Java programming, globalization solutions, and technical writing. Ming Zhu Cui wrote:
Chapter 12, “Overview” on page 63
Chapter 13, “Environment” on page 77
Chapter 14, “A development methodology for globalized applications” on page 87 (with Xiao Hui Zhu)
Chapter 15, “Design and development” on page 91
Chapter 16, “Testing” on page 119
Chapter 17, “Maintenance” on page 135
Appendix A, “Server-side installation and configuration for Our Global Travel Shanghai Demo” on page 141
Appendix B, “Client-side installation and configuration for Our Global Travel Shanghai Demo” on page 157
Appendix C, “CSS and artwork globalization” on page 165 (with Fei Qu)
Bei Shu is a software engineer in the Globalization Certification Laboratory. She joined IBM in April 2001 and has been involved in many globalization solution projects as tester, developer, and coordinator. She has deep interests and abundant experience in XML-related technologies and their contribution to globalization. Bei Shu wrote:
Chapter 7, “Localization pack” on page 29
Chapter 11, “Localization” on page 57
Yi Zhen Xu is a software engineer in the Globalization Certification Laboratory. He joined IBM in April 2001 and participated in many globalization solution projects as tester, developer, and team leader. His area of specialty includes globalization technologies, XML-related technologies, Voice Server, and Web-based application development. Yi Zhen Xu wrote:
Chapter 1, “What is globalization?” on page 3 (with Xiao Hui Zhu)
Chapter 8, “Input and output of multilingual data” on page 37 (with Xiao Hui Zhu)
Xia Li is a software engineer with the Globalization Certification Laboratory. She has extensive experience in globalized Web site development and globalization interoperability test projects as test coordinator. She holds a BS in English for Science and Technology. Xia Li wrote:
Chapter 6, “Locale model” on page 23 (with Xiao Hui Zhu)
Chapter 9, “Linguistic services” on page 43 (with Xiao Hui Zhu)
Chapter 10, “Global Business Object” on page 51 (with Xiao Hui Zhu and Ming Li)
Ming Li is a software engineer with the Globalization Certification Lab. He joined GCL one year ago and primarily focuses on J2EE architecture, Web Services, and EIP. He has abundant development experience in Java-based Web applications, took part in developing the Translation Communication Tool (TCT), and is interested in open-source Java projects such as Tomcat and JBoss. Ming Li wrote:
Chapter 10, “Global Business Object” on page 51 (with Xiao Hui Zhu and Xia Li)
Fei Qu is the Artwork Designer of the Globalization Certification Laboratory located in Shanghai, P. R. China. Fei Qu contributed:
All screen graphics
Appendix C, “CSS and artwork globalization” on page 165 (with Ming Zhu Cui)
Editorial staff
Buck Stearns was managing editor. He is a Solution Development IT Specialist for IGS Business Development at ITSO’s Raleigh Center. Prior to joining ITSO, he worked two years as a mobile employee assigned to Tivoli Services, and previously logged over 25 years in the banking and insurance industries. He has extensive experience in IT management disciplines, and holds undergraduate and graduate degrees in English from the University of North Carolina at Chapel Hill. Gail Christensen of ITSO Raleigh served as executive editor. Linda Robinson of ITSO Raleigh was our graphics design supervisor.
Contributors Thanks to the following people for their contributions to this project: Feng Zheng is the Software Engineering Manager of the Globalization Certification Laboratory located in Shanghai, P. R. China. Feng Zheng wrote the section entitled “The GCL” at the beginning of this Preface. Ting Yong Zhu is a software engineer in the IBM Research Lab in China, and has worked for IBM since August 2000. He received his BS in Mathematics and MS in Computer Science and Engineering from East China Normal University. His interests include exploring the Linux world and object-oriented technologies. He is one of the technical reviewers of this redbook. Yang Wang is a software engineer at the Globalization Certification Laboratory and has worked for IBM since November 2000. Yang Wang has joined or led many globalization solution projects with various roles, including tester, coordinator, developer, and project leader, and has become one of the key engineers in the lab. Yang Wang was our other technical reviewer. Thomas Hampp-Bahnmueller is Technical Team Lead, Text Analysis Framework, Globalization Architectural Technical Team (GATT), Germany, and D.J. McCloskey is Principal, Software Development, Dublin, Ireland. They provided much of the information used in Chapter 9, “Linguistic services” on page 43. 
And thanks to the following people working in the Globalization Center of Competency (GCoC) and Globalization Architecture and Technology Team (GATT) for their contributions to this book:
Ahmed Talaat, Globalization Development Manager - Bidirectional Scripts, Cairo, Egypt
Akio Kido, Linux Globalization, Yamato, Japan
Akira K Oda, Manager, Globalization Center of Competency, Yamato, Japan
Alexis Cheng, DB2 UDB Globalization, Markham, Ontario, Canada
Art Day, S/390 Software Design, Poughkeepsie, New York, USA
Charles Pau, Director Globalization Architecture and Technology, Cambridge, Massachusetts, USA
CP Chang, Manager of Globalization, CDL, Shanghai, PRC
Debasish Banerjee, WebSphere Internationalization Architect, Rochester, Minnesota, USA
Dennis Hebert, WebSphere Application Server Development, Research Triangle Park, North Carolina, USA
Elizabeth Cuan, AIX National Language Support Development, Austin, Texas, USA
Israel Ervin Gidali, Globalization Manager, GCoC - Complex Text Languages, Petah Tikva, Israel
Joe Ross, Tivoli Internationalization, Austin, Texas, USA
Julius Griffith, Globalization Support, User Technology Solutions Team, San José, California, USA
Katsushi Takeuchi, Senior Product Development Manager, Lotus Development, Westford, Massachusetts, USA
Kentaroh Noji, Globalization Architecture, Globalization Center of Competency, Yamato, Japan
Mark Davis, Chief Globalization Architect, Globalization Center of Competency, San José, California, USA
Markus Scherer, GCoC San Jose/Unicode/ICU International Components for Unicode for C/C++ Project Leader, Globalization Center of Competency, San José, California, USA
Matitiahu Allouche, Bidi Architect, GCoC - Bidirectional Scripts, Israel
Mike Moriarty, Corporate Globalization Strategy and Architecture, National Language Support and Information Development, Rochester, Minnesota, USA
Ranat Thopunya, Manager, Globalization Center of Competency, Bangkok, Thailand
Rasha Morgan, IT Specialist, National Language Support and Business Services, Cairo, Egypt
Takaaki Shiratori, Globalization Architecture, Yamato, Japan
Tetsuji Orita, DBCS SPA, Code Page Standard, Yamato, Japan
Thomas McBride, Technical CEM for Globalization, Integrated File System, NetServer, e-business Management and Integration, Rochester, Minnesota, USA
V.S. Umamaheswaran, IBM Standards Projects Authority for SIRS 030 - Coded Character Sets, IBM Rep to UTC/CAC/JTC1/SC2, Globalization Center of Competency and Language Services, Markham, Ontario, Canada
William Nettles, Distributed Strategy, Strategy/Architecture and Planning, San José, California, USA
W.J. (Bill) Sullivan, Program Director for Globalization, National Language Support and Information Development, Southbury, Connecticut, USA
Yukiko Kane, Technical Advisor, Globalization and Production Planning Services, Research Triangle Park, North Carolina, USA
Become a published author Join us for a two- to six-week residency program! Help write an IBM Redbook dealing with specific products or solutions, while getting hands-on experience with leading-edge technologies. You'll team with IBM technical professionals, Business Partners and/or customers. Your efforts will help increase product acceptance and customer satisfaction. As a bonus, you'll develop a network of contacts in IBM development labs, and increase your productivity and marketability. Find out more about the residency program, browse the residency index, and apply online at: ibm.com/redbooks/residencies.html
Comments welcome Your comments are important to us! We want our Redbooks to be as helpful as possible. Send us your comments about this or other Redbooks in one of the following ways: Use the online Contact us review redbook form found at: ibm.com/redbooks
Mail your comments to: IBM Corporation, International Technical Support Organization Dept. HZ8 Building 662 P.O. Box 12195 Research Triangle Park, NC 27709-2195
Part 1
Introduction This part includes a general introduction to the globalization area from the end user’s perspective and what technology can provide.
Chapter 1.
What is globalization?
One of the key aspects of globalization is the ability to handle multiple languages. As we all know, much of the original development in the field of computers was done in English. But compared to English, most other languages involve a greater degree of computer processing. Many of these languages, such as Chinese, have a very large character set, while others, such as French, have accent marks, and still others, such as Arabic, have bi-directional (left-to-right and right-to-left) input and output.
Figure 1-1 Chinese has a very large character set
Les journalistes de la presse spécialisé mondiale ont reconnu les performances des logiciels IBM ViaVoice.
Figure 1-2 French has its accent marks
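The three difficulties named above (large character sets, accent marks, and bi-directional text) can be observed directly in character data. The following is a minimal sketch that uses Python's standard unicodedata module. The sample characters were chosen here for illustration and are not taken from the figures.

```python
import unicodedata

# Accent marks: the same visible letter can be stored in two ways.
composed = "\u00e9"      # e-acute as a single precomposed code point
decomposed = "e\u0301"   # plain e followed by COMBINING ACUTE ACCENT

# The two strings look alike but compare unequal until they are normalized.
same_before = composed == decomposed
same_after = (unicodedata.normalize("NFC", composed)
              == unicodedata.normalize("NFC", decomposed))

# Bi-directional text: every character carries a direction class.
# 'L' = left-to-right, 'R' = right-to-left, 'AL' = Arabic letter,
# 'EN' = European number (digits run left to right even in Arabic text).
directions = {
    "a": unicodedata.bidirectional("a"),
    "\u05d0": unicodedata.bidirectional("\u05d0"),  # HEBREW LETTER ALEF
    "\u0627": unicodedata.bidirectional("\u0627"),  # ARABIC LETTER ALEF
    "7": unicodedata.bidirectional("7"),
}

# Large character sets: a Chinese character needs more bytes than an
# English letter when encoded as UTF-8.
bytes_for_a = len("a".encode("utf-8"))
bytes_for_cjk = len("\u4e2d".encode("utf-8"))  # CJK ideograph U+4E2D

print(same_before, same_after)
print(directions)
print(bytes_for_a, bytes_for_cjk)
```

An application that compares or searches text without normalizing it first would treat the two spellings of the accented letter as different strings, which is one reason accent handling needs explicit design attention.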
Computer software was originally designed to work with English, one of the simplest languages in terms of characters. No wonder then that we face difficulties today using more complicated character sets. Although English is still the most commonly used language on the Internet, we cannot assume that it will always maintain its present level of dominance. Internet business is global. It expands the traditional transaction region to a global level. End users in this global community increasingly expect to interact with Web pages and applications in their native languages, and these cultural expectations should be satisfied by computer applications. A particular concern might be language differences, sorting systems, calendar presentation, or even the cultural suitability of icons. Therefore, a fundamental challenge in developing multilingual Web applications is to customize user interactions to each user's cultural expectations.
Thinking internationally
As with any Web site, developing a multilingual Web site begins with identifying your audience. Only when you are clearly aware of the purpose of your site can you begin to design that site to meet the requirements and expectations of future customers.
Figure 1-4 Identifying the audience is the first step to designing a multilingual Web site
The more languages that a multilingual Web site provides, the more effort its development takes. Yet it is generally far less expensive to build and maintain a single multilingual Web site than to duplicate efforts with parallel sites for different languages. Even the same language can vary widely in different countries and regions. For example, people in Mainland China write in Simplified Chinese, while people in Hong Kong and Taiwan use Traditional Chinese. Simplified and Traditional Chinese differ from each other in both glyphs and wording.
Figure 1-5 Simplified and Traditional Chinese differ from each other in both glyphs and wording
Good translation should “translate” not only the language itself but also the relevant culture. Translators should translate the source English message into the target language using a tone that is right for the specific audience. In addition, local individuals, such as ordinary native readers, lawyers, and the marketing team, should be asked to review the contents. This will guarantee that the message is conveyed around the world accurately and with proper regard to legal requirements. It is also the developers' responsibility to ensure that the site is designed suitably and works well for every target audience.
Figure 1-6 Multilingual sites should be designed for every target audience
Winning globally
To win in global business, you must think from a global perspective. Globalization technology is becoming more and more prevalent in a variety of fields. Up-to-date Web sites now provide online translation services, some of which even work with spoken language. Research shows that only 8% of the world's population speaks English as their first language. If your Web application can communicate with users only in English, you might well lose your customers. To make your site multilingual is to make it communicate directly with as many customers as possible across the remaining 92%. In this way, your site can attract more customers and thus benefit from more business opportunities. Multilingual Web applications multiply your e-business.
Figure 1-7 Multilingual Web applications can multiply your e-business
Globalization is everywhere
Consider a simple travel application. What can globalization bring the customer? A travel agent wants to set up a Web site to serve tourists from around the world. In general, tourists as a group are much more likely to visit a Web site if they can read its contents easily. At the very least, this site must be able to:
Provide sightseeing information in the user's language and cultural setting (for example, date format or currency symbol), based on the browser's setting or the user's selection. A more sophisticated site can also take cultural differences into consideration when conducting business. For example, it could recommend different itineraries for people of different national backgrounds.
Distinguish between information that is dependent upon the server's cultural setting and that which is dependent on the client's, and then ensure the integrity of that information. When a ticket price in US dollars is displayed on a Japanese client machine, that price might either be converted to Japanese Yen or left in US dollars. In either case, using a particular currency symbol carries with it the responsibility to ensure the accuracy of its corresponding amount.
Accept input in the user's language and cultural setting.
Store and display user information in the user's own language and format (for example, name and address).
Figure 1-8 Serve users in their own languages and formats
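The first requirement above, choosing a language from the browser's setting or the user's selection, can be sketched as follows. This is a simplified illustration, not a complete implementation: the function names, the SUPPORTED set, and the sample header values are hypothetical, and the parsing handles only the common form of the HTTP Accept-Language header (it ignores the wildcard tag).

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language header, best first."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        quality = 1.0  # a tag without an explicit q-value has weight 1
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:
            # position preserves the browser's order among equal weights
            weighted.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(weighted)]


def choose_locale(header, supported, user_choice=None, default="en"):
    """Pick a supported locale: an explicit user choice wins over the browser."""
    if user_choice and user_choice.lower() in supported:
        return user_choice.lower()
    for tag in parse_accept_language(header):
        # Try the full tag first, then fall back to the bare language.
        for candidate in (tag, tag.split("-")[0]):
            if candidate in supported:
                return candidate
    return default


SUPPORTED = {"en", "ja", "zh-cn", "zh-tw", "fr"}  # hypothetical site locales

print(choose_locale("ja-JP,ja;q=0.9,en;q=0.5", SUPPORTED))
print(choose_locale("de-DE,de;q=0.9", SUPPORTED))
print(choose_locale("ja-JP,ja;q=0.9", SUPPORTED, user_choice="zh-TW"))
```

Letting the explicit selection override the browser setting matters for travelers in particular, who often use a borrowed or public machine configured for someone else's language.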
Some applications might need more capabilities in addition to these common customer needs for globalization, and technologies are advancing aggressively to make all of the following things happen:
Machine-assisted translation might be able to provide on-the-fly translations for rapidly changing contents (for example, in the case of tourism Web sites). Note, however, that since the technology is still immature, client expectations must be set appropriately wherever it is used.
Pervasive computing is making e-business an “any place, any time” phenomenon.
Voice technology (both speech-to-text and text-to-speech) lets people easily check account balances via telephones and in their own languages.
Figure 1-9 Voice servers and the enterprise environment
The portal concept makes customization easier and more culture-oriented so that globalization features can be merged together seamlessly with other design considerations.
Globalization requirements are everywhere. They serve various cultural expectations for users around the world interacting with computer systems. Their differences can range from the obvious to the subtle, such that sometimes customers are not even aware of their existence. For example, a currency amount can be displayed in a form compatible with its users' cultural conventions, and the date can be represented correctly based on their preferred calendar systems (for example, the Chinese lunar calendar, the Arabic calendar, or the Hebrew calendar). When text is formatted and handled (either displayed or printed), text boundary analysis can locate appropriate points for word-wrapping text so that it fits within specific margins, or locate general linguistic boundaries for whole-word searching or indexing.
Figure 1-10 Line break and word break differ in different languages 1
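To make the point about currency and date conventions concrete, the sketch below formats the same amount and the same date for three locales. The CONVENTIONS table, the function names, and the choice to print an ISO currency code after the amount are assumptions made for this illustration. A production system would take such rules from a maintained locale database rather than hard-coding them.

```python
from datetime import date
from decimal import Decimal

# Hypothetical convention table, for illustration only.
CONVENTIONS = {
    "en-US": {"group": ",", "decimal": ".", "date": "{m:02d}/{d:02d}/{y}"},
    "de-DE": {"group": ".", "decimal": ",", "date": "{d:02d}.{m:02d}.{y}"},
    "ja-JP": {"group": ",", "decimal": ".", "date": "{y}/{m:02d}/{d:02d}"},
}


def format_amount(amount, currency_code, locale_id, digits=2):
    """Format a Decimal with the locale's separators plus a currency code."""
    rules = CONVENTIONS[locale_id]
    text = f"{amount:,.{digits}f}"  # for example "1,234,567.89"
    # Swap both separators in one pass so neither overwrites the other.
    table = str.maketrans({",": rules["group"], ".": rules["decimal"]})
    # The currency code travels with the amount, so a price in US dollars
    # is never relabeled as another currency by the display layer.
    return f"{text.translate(table)} {currency_code}"


def format_date(day, locale_id):
    """Format a date in the field order the locale expects."""
    pattern = CONVENTIONS[locale_id]["date"]
    return pattern.format(y=day.year, m=day.month, d=day.day)


price = Decimal("1234567.89")
departure = date(2002, 12, 5)
for locale_id in CONVENTIONS:
    print(locale_id, format_amount(price, "USD", locale_id),
          format_date(departure, locale_id))
```

Note that 12/05/2002 and 05.12.2002 denote the same day here. A date shown in the wrong convention is not merely unfamiliar; it can be read as a different date.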
While comparing and sorting strings, some cultural conventions will be selected to take precedence over other properties in order to have the culturally expected result. Along with other script-dependent considerations, punctuation must sometimes be ignored. Accent differences are occasionally treated as key sorting properties. When a sequence consisting of two or more letters must be considered a single letter in sorting, specific guidance is required. In English, the sorting order of words is generally straightforward. In Simplified Chinese, while sorting by phonetic spelling (PinYin) in alphabetical order is also the most widely used method, sometimes the character counts override that order. For example, the words LanTianBaiYun, ShiJie, TongXue, and Da (given here in PinYin) can be sorted alphabetically as: Da, LanTianBaiYun, ShiJie, TongXue.
But users might instead prefer to use character counts as the primary sorting rule and alphabetic order as the secondary: Da, ShiJie, TongXue, LanTianBaiYun.
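The two orderings can be reproduced with a short sketch. It works on the PinYin spellings only and relies on the fact that each PinYin syllable stands for one Chinese character, so the syllable count equals the character count. Comparing the joined, lowercased spellings is a simplification made here; real PinYin collation also has to deal with tones and with characters that share a spelling.

```python
# Each word is listed as its PinYin syllables, one syllable per character.
words = [
    ["Tong", "Xue"],
    ["Da"],
    ["Lan", "Tian", "Bai", "Yun"],
    ["Shi", "Jie"],
]


def pinyin(syllables):
    """Join the syllables back into the spelling used in the text."""
    return "".join(syllables)


# Rule 1: alphabetical order of the PinYin spelling.
alphabetical = sorted(words, key=lambda w: pinyin(w).lower())

# Rule 2: character count first, PinYin spelling as the tie-breaker.
count_first = sorted(words, key=lambda w: (len(w), pinyin(w).lower()))

print([pinyin(w) for w in alphabetical])
print([pinyin(w) for w in count_first])
```

Both rules are legitimate, which is why the sort order is best treated as a user or locale preference rather than a fixed property of the data.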
1 This illustration is from “Introduction to ICU” (http://oss.software.ibm.com/icu/userguide/boundaryAnalysis.html)
Moreover, different languages sort the same characters differently. For example, in Swedish, z comes before ö if sorted in an ascending order, while in German, z comes after ö. More technical details will be introduced in the following chapters.
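The Swedish and German example can be imitated with two hand-written sort keys. The sketch below is deliberately simplified: the alphabet string and the mapping table are assumptions that cover only lowercase letters, and swedish_key raises an error for any character outside its alphabet. They are not complete collation rules, which in practice come from a collation library.

```python
# Simplified orderings for illustration only.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"  # å, ä, ö follow z
GERMAN_BASE = {"ä": "a", "ö": "o", "ü": "u", "ß": "ss"}  # sort with base letter


def swedish_key(word):
    """Rank each letter by its position in the Swedish alphabet."""
    return [SWEDISH_ALPHABET.index(ch) for ch in word.lower()]


def german_key(word):
    """Compare by base letters first, then by the original spelling."""
    lowered = word.lower()
    base = "".join(GERMAN_BASE.get(ch, ch) for ch in lowered)
    return (base, lowered)


letters = ["z", "ö", "o"]
swedish_order = sorted(letters, key=swedish_key)
german_order = sorted(letters, key=german_key)

print(swedish_order)  # ö is a separate letter at the end of the alphabet
print(german_order)   # ö is grouped with o, well before z
```

The input list is identical in both calls; only the key function changes. This is the behavior a globalized application needs: the data stays the same and the collation rule is selected per user.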
Chapter 2.
Why is globalization necessary? The Internet's rapid growth has opened unprecedented opportunities for businesses beyond national boundaries and geographical barriers. Recent surveys have shown that over two-thirds of Internet users live outside the United States, and that they would very much like to see sites in their own languages and sensitive to their cultural and national conventions. As a company's e-business expands globally, there will inevitably be a growing number of globalization demands on the e-business applications. End users are much more likely to purchase from a Web site written in their own languages. The need to support multiple languages and cultures is even more important for large businesses. Recent surveys indicate that over 40 percent of large companies having more than 500 employees provide multilingual Web sites to run their businesses. IBM has long been in the forefront of the IT industry in providing customers with globally enabled applications. Globalization is well understood by IBM as the Internet expands. Architectural principles are strictly constructed to meet the growing demand for global application design. Technologies are quickly being enabled to ease application development; and engineers are effectively moving forward to help their customers by showing them how to put all the pieces together into workable global solutions. Today many e-business entities seek help from IBM in extending their presence worldwide. The solutions that IBM offers its customers will not only meet their business needs, but also provide maximum leverage of limited resources. Drawing on its extensive experience in this area, IBM has identified a number of architectural principles for designing and developing such applications, thereby enabling its customers to achieve even more benefits.
Multiple languages
The customer can use a single server to support applications in different languages as well as those that handle multiple languages simultaneously, thus reducing the cost and time needed to develop and deploy applications worldwide. For example, applications that serve smaller language groups can run alongside other applications and share the same resources.
Figure 2-1 Single Executable for all is the basis for globalization
Flexibility Since any server can support all available languages, the customer can design its network and deploy servers based on load levels and resources rather than language support requirements. For example, one individual server can easily handle visits from both Chinese and American customers even though there is as much as a 13-hour time zone difference separating the two.
Figure 2-2 A well-globalized solution can balance loads effectively
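One practical consequence of serving both groups from one server is that the server cannot assume a single local time. A common approach, sketched below, is to record each event once in UTC and convert it only for display. The fixed offsets are an assumption made to keep the example self-contained (China Standard Time at UTC+8 and US Eastern Standard Time at UTC-5, which gives the 13-hour difference mentioned above). The variable names and the sample timestamp are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets, assumed for illustration. A real deployment would use a
# time zone database so that daylight saving changes are handled.
SHANGHAI = timezone(timedelta(hours=8), "CST")
NEW_YORK = timezone(timedelta(hours=-5), "EST")

# The server records one unambiguous instant in UTC ...
order_placed = datetime(2002, 12, 2, 1, 30, tzinfo=timezone.utc)

# ... and converts it only when presenting it to a particular user.
shanghai_view = order_placed.astimezone(SHANGHAI)
new_york_view = order_placed.astimezone(NEW_YORK)

# Difference between the two offsets.
offset_gap = SHANGHAI.utcoffset(None) - NEW_YORK.utcoffset(None)

print(shanghai_view.isoformat())
print(new_york_view.isoformat())
print(offset_gap)
```

The two views fall on different calendar days, yet they compare as equal because they denote the same instant. The offset also means that business hours in the two regions barely overlap, so their peak loads tend to reach a shared server at different times.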
Lower total cost of ownership An IBM customer can use the same version and patch level of a product throughout the world, thereby reducing the cost of support, maintenance, and training. IBM itself therefore sets a very good example by shipping a software product with only one code-base for all languages, thereby greatly reducing unnecessary expense for itself while simplifying customer usage.
Consistent data handling Customers using multiple solutions expect each included product to handle data identically and consistent with established industry standards—for example, in collation and date/time formatting. In order to develop a multilingual application, they most likely will have picked up an assortment of global programming tools. If these tools can interact with one another, this creates added value for each tool and thus improves the total solution.
Shorter time to market If various localized versions for different geographies are handled separately rather than centrally in a global product development, the multinational product owner cannot deploy a product until all language versions are available. The waiting time can last from a couple of days to several weeks or even many months, and the functions and features might not be strictly consistent. Following the methodologies covered in this book (especially the concepts of the Single Executable and localization packs) will greatly shorten the time it takes to deliver all localized versions for different geographies as soon as possible and at the same time (that is, worldwide simultaneous general availability).
Consistent delivery Upon implementing the methodologies discussed in this book, a product owner will not be compelled to create separate language versions for all maintenance releases, updates, fix packs, patches, etc. The Single Executable model means that changes to executables can be delivered independently of their translation so that new translated versions are needed only when there are changes to the product's user interface.
Chapter 3.
How to implement globalization

Generally speaking, globalization belongs in the category of ease-of-use technologies. Offering ease-of-use solutions to your customers is critical to business success. Various globalization elements are available for ease-of-use solutions, ranging from the explicit to the subtle. This chapter gives a brief introduction to implementing globalization, and the remainder of the book adds many detailed explanations and examples.

Globalization functions can be enabled by the operating system, software product, or business application. The operating system usually provides basic support, being the minimum set of baseline requirements that a globalization solution needs, but all three work together in the following four ways to provide an integrated and seamless globalized solution (see footnote 1):

1. The end user can input, view, and print characters from diverse languages, and the system accepts data, processes it, and outputs results correctly.

Some languages such as English, French, German, and Spanish are easy to handle, while others are complicated in terms of the programming required for computer processing. Languages written in scripts such as Thai, Hebrew, and Arabic are called complex display languages. Hebrew and Arabic have letters that are displayed from right to left. Since they also mix in other languages and numbers that display from left to right, they require what is called bi-directional support. Thai and Arabic have characters that change vertical position or shape depending on the characters around them, and so require contextual support.

Special devices are being envisioned and then designed by globalization professionals as new script requirements emerge. The standard keyboard is the most common input medium, while desktop or notebook displays and printers are the most prevalent output devices. Software assistants such as Input Method Editors (IMEs) are employed to support data entry of composed characters or large character sets.

Now that pervasive computing devices such as cell phones, personal digital assistants (PDAs), and pagers have increasingly important roles, they are constantly being equipped with new input/output mechanisms as research labs continue turning out pioneering technologies. On-screen keyboards not only improve the appearance of these devices, but also bring improved functionality to the end user. Speech recognition introduces a brand-new input method with new globalization challenges, in that computers now must recognize numerous spoken languages and dialects. Handwriting recognition similarly makes the computer more approachable and easier to use, while challenging it to detect various written scripts and individual handwriting styles.

1 The four categories here follow the IBM G11N organization's opinions, which can be found at http://eou2.austin.ibm.com/global/global_int.nsf/Publish/982. However, the details are written by GCL.
Figure 3-1 Voice technology architecture (components shown: Web/Application Server; Enterprise Server; System Management Component; Telephony and Media Component; Public Switched Telephone Network; Language Support Component; Dialogic Hardware and Software; VoiceXML Browser; Reco Engine; TTS Engine; Global VoiceXML Application; Enterprise Data)
So that physical devices can support the full character set of diversified languages, the operating system must at the very least provide corresponding support for IMEs, fonts, and layout software.

2. Correct cultural support of data. For example, dates, times, numbers, and currency amounts must be displayed and processed in the formats that users prefer. (See Chapter 6, “Locale model”.) Cultural support can be accomplished through the use of locales and locale-sensitive functions. IBM's recommended cultural support solution is International Components for Unicode (ICU), an open source project that it sponsors. ICU can be provided to operating systems, software products, and business applications to meet their globalization needs. Advanced cultural support might involve business logic. For example, income tax calculators must compute tax amounts based on nation-specific income policies and the individual's reported income.

3. Multilingual support through Unicode technology. Unicode is the universal character-encoding scheme for written characters and text, covering the character sets used by many of the world's written scripts. Because it provides a consistent way of handling multilingual text interchange internationally, Unicode is in widespread use today. It has been widely accepted as the basis for industry standards such as HTML and XML and for Java's multilingual capabilities, thus providing one of the principal foundations of e-business.

4. Users can choose language and cultural preferences. The working example introduced in Part 3, “Our Global Travel Shanghai Demo: A working example”, explains how to make this happen.
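To make item 2 concrete, here is a minimal sketch of a locale-sensitive function. It uses the JDK's own java.text.NumberFormat rather than ICU, purely so that the example is self-contained; the class and method names (CulturalFormatSketch, formatNumber) are hypothetical and chosen only for illustration.

```java
import java.text.NumberFormat;
import java.util.Locale;

class CulturalFormatSketch {
    /** Formats the same numeric value according to the conventions of the given locale. */
    static String formatNumber(double value, Locale locale) {
        NumberFormat format = NumberFormat.getNumberInstance(locale);
        format.setMaximumFractionDigits(2); // keep at most two decimal places
        return format.format(value);
    }

    public static void main(String[] args) {
        double value = 1234567.89;
        System.out.println(formatNumber(value, Locale.US));      // 1,234,567.89
        System.out.println(formatNumber(value, Locale.GERMANY)); // 1.234.567,89
    }
}
```

The program logic is identical for both calls; only the locale argument changes, which is the property that the later chapters rely on.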
Part 2
Globalization application design
This part includes a general description of how to develop multilingual applications.
Chapter 4.
Single Executable

The world abounds with software products and applications encompassing a wide range of technical areas, and it is impossible to have a single architecture that will work for all of them. Nevertheless, there still must be interrelated base elements to enable a successful and efficient global e-business solution. In this chapter, we briefly introduce the building blocks essential to composing a multilingual system.

Above all, it is most important that each product or application provide total support for all languages through a Single Executable. This is key to ensuring that a globalized system can be designed, built, and maintained efficiently and correctly. This methodology has many benefits. For a product owner, it greatly simplifies development, testing, and support. For a product user, only one body of globally executable code must be installed per platform, and a straightforward system configuration method makes it work for different languages.

There is no easy way to evolve from single-language applications to their globalized counterparts. To surmount the significant obstacles that will confront you, several approaches have been devised for the delivery of multi-language translated applications. From today's standpoint, earlier thinking seems rather crude, and we can readily see how far software engineers have progressed in this area. Basically there are three kinds of approaches to enabling applications with language awareness and cultural sensitivity, as illustrated in Figure 4-1.
Figure 4-1 Three different program category types
1. Programs with messages, menus, and cultural behavior embedded in their code
This is the most expensive approach among the three, because each language needs its own separate program, and each of these programs can serve only its own particular language. Therefore, costs are escalated by the redundant testing, maintenance, and support required for such kinds of programs.

2. Programs with separated but bound or linked messages, menus, and cultural behavior
This approach shows some improvement over the first. Here the application is generated from a common program source that bundles culture-sensitive files. Program source code is separated from culture-related considerations, thus making it easier to maintain and leverage the existing investment. However, the executable still can support only the language sets packaged, and functional testing, maintenance, and support must still be repeated for each language. Even if there is only a single set of code, it might have certain assumptions “burned” into it at compile time (for example, the use of single byte-only or multiple byte-aware string libraries).

3. Single Executable programs that dynamically retrieve resources
This is a dramatic improvement over the previous two approaches, and the benefits are many. In this approach, software programs are developed that allow the Single Executable produced by source compilation to handle the cultural needs of all supported locales. The difference from the second approach is that cultural and language-independent program code calls cultural and language-dependent information at runtime, thereby greatly reducing the expenditure of cost and effort otherwise invested throughout the product life cycle.

Employing this best-of-breed technique brings up several design and implementation considerations:
1. The executable source code must logically be the only one used for all the supported locales.
2. Only one executable should be built/tested for all supported locales, and there should be no lag in code availability.
3. Only one version of the executable should be manufactured and distributed, although package options might be made available (see footnote 1).
4. Only one logic fix pack should apply in all supported locales.
5. The addition of new locales will generally require no modifications or additions to the program executable.
6. Locale-sensitive operations are supplied through a common API support mechanism that provides a full set of globalization functions.
7. All functions must behave correctly for all supported locales, including but not limited to the following:
– Number representation
– Date representation
– Time representation
– Currency representation
– Messaging
– User interface

1 Supported locale resources can be freely selected for packaging with the Single Executable.
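The third approach can be sketched in a few lines. In the following hedged example, the language-independent method message() never changes; locale resources are registered at run time, so adding a locale requires no change to the program logic. The class name, the in-memory map, and the sample strings are all assumptions made for illustration; a real product would load these resources from files packaged alongside the executable.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class SingleExecutableSketch {
    // Hypothetical in-memory store standing in for externally packaged locale resources.
    private static final Map<String, Map<String, String>> RESOURCES = new HashMap<>();

    /** Registers the resources for one locale without touching program logic. */
    static void addLocale(Locale locale, Map<String, String> messages) {
        RESOURCES.put(locale.toLanguageTag(), new HashMap<>(messages));
    }

    /** Language-independent code path: the text is looked up at run time. */
    static String message(String key, Locale locale) {
        Map<String, String> messages = RESOURCES.get(locale.toLanguageTag());
        if (messages == null || !messages.containsKey(key)) {
            return key; // no resource available: fall back to the key itself
        }
        return messages.get(key);
    }

    public static void main(String[] args) {
        addLocale(Locale.US, Collections.singletonMap("greeting", "Hello"));
        addLocale(Locale.FRANCE, Collections.singletonMap("greeting", "Bonjour"));
        System.out.println(message("greeting", Locale.US));
        System.out.println(message("greeting", Locale.FRANCE));
    }
}
```

Returning the key when a resource is missing is only one possible policy; Chapter 6 discusses hierarchical fall-back, which is usually preferable.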
Chapter 5.
Unicode support An encoding system is a method of assigning numbers to individual characters so that a computer can process those characters. In computing's early days, there were hundreds of different encoding systems spanning many different language sets. No single system existed that could process every single character from all languages throughout the world. Another drawback was that those encoding systems might conflict with one another in that the same character could have different “number” representations in different systems. Then Unicode evolved. The Unicode consortium, found at http://www.unicode.org, publishes Unicode information. Now when the world wants to talk, it speaks in Unicode. Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language. Unicode can represent every character in the world by providing a single consistent “character to number” mapping schema. Figure 5-1 illustrates some of these “character to number” mappings.
Figure 5-1 A single consistent “character-to-number” mapping schema
Unicode can be used as a lingua franca across systems and languages. The International Components for Unicode (ICU) and Java are the IBM-recommended ways to handle Unicode text. With Unicode, characters in different character sets can be displayed in the same Web page simultaneously, as illustrated in Figure 5-2.
Figure 5-2 Characters from different scripts can be displayed in the same Web page simultaneously
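The “character to number” mapping can be inspected directly in Java, whose strings are Unicode-based. The following sketch prints the code point of a character and the number of bytes it occupies in the UTF-8 encoding form; the class and method names are hypothetical, and the non-ASCII sample characters are written as Java Unicode escapes so that the source file itself stays plain ASCII.

```java
import java.nio.charset.StandardCharsets;

class UnicodeSketch {
    /** Returns the code point of the first character of s in the usual U+XXXX notation. */
    static String codePointOf(String s) {
        return String.format("U+%04X", s.codePointAt(0));
    }

    /** Returns how many bytes the string occupies when encoded as UTF-8. */
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String latinA = "A";
        String eAcute = "\u00E9";  // Latin small letter e with acute
        String zhong = "\u4E2D";   // CJK ideograph meaning "middle"
        System.out.println(codePointOf(latinA) + " " + utf8Length(latinA)); // U+0041 1
        System.out.println(codePointOf(eAcute) + " " + utf8Length(eAcute)); // U+00E9 2
        System.out.println(codePointOf(zhong) + " " + utf8Length(zhong));   // U+4E2D 3
    }
}
```

The code point is the same on every platform; only the number of bytes varies with the encoding form chosen for storage or transmission.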
Chapter 6.
Locale model

In English, the word “locale” means a place where something happens or has happened. It is a key term in globalization, but its meaning there is less straightforward and is best explained in several short statements.

In globalization, the word “locale” was borrowed by software engineering from geography to indicate that human cultural expectations of computer behavior fall into clumps that can be grouped together, most commonly by language and country or region. This clumping of expectations has allowed the use of computer standards that describe sets of related expectations, such as how dates and times are formatted and how words are sorted.

For the purposes of this architecture, a locale means a specification of a language and country/region, or a specification of a language, country/region, and variant. Thus a locale can be specified by a string such as “French-Belgium”. It does not mean a data structure that contains information for a language and country/region.

Locale is used by the software industry in general to mean any of the following related concepts:
– The set of people who share a set of common expectations about their computer interactions
– The common expectations of computer behaviors that those people share
– The name given to one of those particular sets of expectations or people
– The computer-readable data (and sometimes code) that encapsulates those behaviors

A locale model contains assumptions about all of these cultural features. In particular, any adequate locale model used in a global e-business system must meet these requirements:
– The locale model accounts for at least language and country/region and has some additional way of specifying variants.
– It includes support for the major categories of locale-dependent computing.
– It provides for hierarchical fall-back behavior at either the source or runtime levels.
– It allows different locales to be set per client. For multi-client server software, this means that there must be a way to have different locale processing for each client context (which may be per thread, depending on the client interaction model).
– The locale model supports conventions that allow all locale-sensitive components of an e-business system to communicate appropriately about locale settings.

This chapter covers several frequently used locale models. It presents the complications introduced by dates, times, currencies, and so on. Typically, programs call international services such as those found in Java or ICU to handle these complications.
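The hierarchical fall-back requirement can be illustrated with a short sketch. Given a locale such as French-Belgium, a lookup would first try the most specific name and then progressively more general ones. The class name, the method name, and the use of “root” as the label for the final, language-neutral level are assumptions of this sketch, not part of any particular locale model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class LocaleFallbackSketch {
    /** Lists the locale names to try, most specific first, ending with the root level. */
    static List<String> fallbackChain(Locale locale) {
        String language = locale.getLanguage();
        String region = locale.getCountry();
        String variant = locale.getVariant();
        List<String> chain = new ArrayList<>();
        if (!language.isEmpty() && !region.isEmpty() && !variant.isEmpty()) {
            chain.add(language + "_" + region + "_" + variant);
        }
        if (!language.isEmpty() && !region.isEmpty()) {
            chain.add(language + "_" + region);
        }
        if (!language.isEmpty()) {
            chain.add(language);
        }
        chain.add("root"); // language-neutral defaults
        return chain;
    }

    public static void main(String[] args) {
        System.out.println(fallbackChain(Locale.forLanguageTag("fr-BE"))); // [fr_BE, fr, root]
    }
}
```

A resource that is missing for fr_BE would then be taken from fr, and failing that from the root level, so that a partially translated locale still behaves sensibly.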
Numbers and mathematics
The decimal system (base 10) is used in almost every country of the world. However, number formats vary considerably, and there are still traditional (non-decimal) numbering systems such as Roman numerals that are used in important contexts.
Table 6-1 Number format
Currency format
Currency format is usually composed of the following elements:
– A locale
– Its currency name
– Its currency subunits (for example, the Egyptian pound contains 100 piasters)
– Its currency symbol (and whether symbols should be displayed to the right or to the left)
– Positive format
– Negative format (whether the minus sign should be displayed to the right or to the left)
– Currency codes (defined by ISO 4217: 1995, for the currency code used in international banking)
– Currency separators (for example, $9,876,543.21)

Currency separators include thousands separators, decimal separators, decimal position, field length, and padding character (the symbol used to “pad” out the format to a specific string length). Different countries/regions have different formats and rules for currency. Table 6-2 and Table 6-3 show typical currencies used in international banking:
Table 6-2 Currency format
Table 6-3 Currency separators

Locale   Thousands Separator   Decimal Separator   Decimal Position   Field Length     Padding Character
ar_EG    apostrophe            comma               2                  Not applicable   None
en_US    comma                 period              2                  Not applicable   Not applicable
zh_CN    comma                 period              2                  12               None
The situation is slightly more complicated in real life, since a given locale can have different formats for different currencies. For example, people in the US might want to display a chart showing both dollars and rupees, but using an English format for rupees rather than the Indian one (that is, with Hindi letters).

Important: Since both Java and ICU have multiple currency support (see http://oss.software.ibm.com/icu4j/doc/com/ibm/icu/util/Currency.html), numeric amounts should always be paired with their corresponding ISO currency tags. Otherwise, people might confuse values expressed in British pounds with values expressed in Japanese yen.
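One way to follow this advice is never to pass a bare number around, but always a value that carries its ISO 4217 code. The sketch below uses the JDK class java.util.Currency; the MoneySketch class and its toTaggedString method are hypothetical names introduced only for this illustration.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Currency;

final class MoneySketch {
    final BigDecimal amount;
    final Currency currency;

    /** Pairs an amount with its ISO 4217 code; unknown codes raise IllegalArgumentException. */
    MoneySketch(String amount, String isoCode) {
        this.currency = Currency.getInstance(isoCode);
        // Round to the number of subunit digits the currency normally uses (0 for yen, 2 for pounds).
        int digits = Math.max(0, currency.getDefaultFractionDigits());
        this.amount = new BigDecimal(amount).setScale(digits, RoundingMode.HALF_EVEN);
    }

    /** Locale-neutral rendering: the ISO code followed by the amount. */
    String toTaggedString() {
        return currency.getCurrencyCode() + " " + amount.toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(new MoneySketch("9876543.21", "GBP").toTaggedString()); // GBP 9876543.21
        System.out.println(new MoneySketch("9876543.21", "JPY").toTaggedString()); // JPY 9876543
    }
}
```

Because the code travels with the amount, a presentation layer can later choose any locale-specific format without risk of mislabeling the currency.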
Date
Table 6-4 shows the common and short formats for presenting dates within several countries/regions. When further precision is required, there are additional considerations. For example:
– Whether a leading zero should be used for the day and month.
– Although numbers are usually presented in decimal format, other locale-specific characters might be desired, such as in Hebrew.
– Differences might exist between storage format (keyboard sequence) and presentation format, such as with bi-directional scripts.
Table 6-4 Date format

Time
Table 6-5 shows formats for presenting the time, and they can be adjusted depending on various business circumstances. For more precision, we can add other parameters. For example:
– Time zone information (EST, CST, GMT, etc.) might need to be appended to the representation.
– Where situations warrant, the separator symbol between minutes and seconds can be omitted.
– Regarding AM and PM indicators, different geographies can have different understandings as to what constitutes midnight and noon.
– Weekends, holidays, and daylight saving time can make things even more complicated.
Table 6-5 Time format
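Time zone handling in particular is easy to get wrong, so a small sketch may help. It renders one and the same instant for two clients in different zones using the java.time API. The class name, the chosen instant, and the numeric output pattern are assumptions for illustration; a production system would normally use a locale-specific format for presentation.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

class TimeZoneSketch {
    // Purely numeric pattern: date, 24-hour time, and offset from UTC.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm xxx");

    /** Renders the same instant as wall-clock time in the given time zone. */
    static String render(Instant instant, String zoneId) {
        return ZonedDateTime.ofInstant(instant, ZoneId.of(zoneId)).format(FORMAT);
    }

    public static void main(String[] args) {
        Instant instant = Instant.parse("2002-12-01T12:00:00Z");
        System.out.println(render(instant, "Asia/Shanghai"));    // 2002-12-01 20:00 +08:00
        System.out.println(render(instant, "America/New_York")); // 2002-12-01 07:00 -05:00
    }
}
```

Storing the instant once and converting only at presentation time keeps server-side data consistent regardless of where each client is located.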
Calendar
The Gregorian calendar is used today in most places in the world and is the standard calendar for international business transactions. However, some countries/regions still use their own calendars for historical, political, religious, cultural, or even astrological reasons.

For thousands of years, the Chinese people have used their own lunar calendar (still unofficially used even today), which accurately reflects the moon's orbit around the earth. The ancient Chinese used this calendar to guide their annual planting, and this custom has persisted into the 21st century. In China, many senior citizens do not know their birth date by the Gregorian calendar, but only by the Chinese lunar calendar.

In countries/regions such as Japan, the local calendar has an additional era name that is derived from the name of the reigning emperor. It is otherwise a Gregorian calendar, but the year count restarts from year 1 once a new emperor begins his reign.

The Hijri calendar used in some Arabic countries/regions is more sophisticated. It begins in the year 622 Gregorian and is a lunar calendar where each month begins with the new moon. Consequently, the number of days in the month is not fixed each year, but changes depending on this cycle.
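The Japanese and Hijri calendars described above are both available in the JDK's java.time.chrono package, which makes the differences easy to observe. The sketch below converts a Gregorian date into each calendar; the class and method names are hypothetical, and the JDK's Hijri implementation is one particular variant of that calendar, so results for other variants may differ by a day or so.

```java
import java.time.LocalDate;
import java.time.chrono.HijrahDate;
import java.time.chrono.JapaneseDate;
import java.time.chrono.JapaneseEra;
import java.time.temporal.ChronoField;

class CalendarSketch {
    /** Returns the era of the Japanese imperial calendar that contains the given date. */
    static JapaneseEra japaneseEra(LocalDate date) {
        return JapaneseDate.from(date).getEra();
    }

    /** Returns the year counted from the start of that era. */
    static int japaneseYearOfEra(LocalDate date) {
        return JapaneseDate.from(date).get(ChronoField.YEAR_OF_ERA);
    }

    /** Returns the year of the same day in the Hijri calendar as implemented by the JDK. */
    static int hijrahYear(LocalDate date) {
        return HijrahDate.from(date).get(ChronoField.YEAR);
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2002, 12, 1);
        // The Heisei era began in 1989, so 2002 is year 14 of that era.
        System.out.println(japaneseEra(date) + " " + japaneseYearOfEra(date));
        System.out.println(hijrahYear(date));
    }
}
```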
Telephone
Telephone numbers vary in length from country to country, but certain fields are common. For example, a hypothetical call from one country to another might require a calling sequence of “011-xxxx-yyy-000-0000”, consisting of:
– 011, the international access code from the USA and Canada
– xxxx, the country code (1 to 4 digits)
– yyy, the area code (1 to 3 digits, with no leading zero)
– 000-0000, the local portion (usually 8 digits or less)

To place an international telephone call, you must know the international access code of the country from which you are dialing as well as the country code of the country you are trying to reach.
Table 6-6 International telephone codes
Country/Region   International Access Code   CCITT/ITU Code   Internal Phone Format
Egypt            00                          20               (12) 3456789
Germany          00                          49               12345-6789012345678
United States    011                         1                (123) 456-7890
China            00                          86               (10)65391188
Measure
The measurement systems used in various countries/regions differ due to historical and linguistic reasons. For example, people in the United States still tend to use miles instead of kilometers. The common terms used for the same units can also differ. Kg and kilogram are recognized metric designations in the USA and Canada, but not in Greece, China, Russia, and many other countries/regions. Another example is the size of paper used in printers and typewriters, which is inconsistent throughout the world.
Icons
Icons are pictures of objects or actions. The meaning of an icon can depend on the local culture, so its local meaning must be taken into account. To reduce the chance of a product rejection in a particular country/region or culture, we might:
– Allow for icon substitution (not bind icons into executable code)
– Aim for widespread acceptance, or prepare different icons to suit different cultures
– If possible, avoid using icons that are similar to an offensive symbol in the target countries/regions
Conventions
Many other things vary considerably in form and meaning within particular countries/regions and cultures. For example:
– Abbreviations: The same symbol may have different meanings in different countries/regions. For example, some peoples may interpret an X as crossing out what is not desired rather than indicating what is to be selected.
– Question marks: A great many languages (such as English, French, and German) use the question mark to indicate interrogation, but there are some exceptions to this. In Spanish, questions always begin with an inverted question mark (¿). For example, “¿Qué es eso?” The Greek question mark looks very much like a semicolon, while the Greek semicolon resembles an elevated period.
– Percent symbol: The most common symbol used to indicate percent is %, as in 37%. There are exceptions. The Dutch language as written in Belgium and the Netherlands sometimes uses pct (as in 37 pct) to represent percentages. In the province of Québec in Canada, the number and symbol are written as 37 %, with a space separating the two. In Turkey, the percent symbol is written before the number, as in %37. In Arabic countries/regions, the percent sign should logically be written after the number, but since writing progresses from right to left, it is displayed on the left.
– Pound sign/number sign symbol: The symbol “#” is known as the pound sign, hash mark, or number sign in various countries/regions, but unknown in many others.
– Wildcard symbols: A wildcard symbol is any graphical character used to specify an indefinite argument in a search or other query. The asterisk (*), question mark (?), and ampersand (&) are commonly used wildcard symbols.
– Navigation and motoring: In the UK, India, Japan, and South Africa, people drive on the left side of the road. In North America, South America, China, most of Europe and Africa, and the Middle East, they drive on the right.
– Numeric superstitions: Superstitions are beyond exhaustive cataloging, but a few numeric superstitions are worth specific mention because of their impact on individual and mass behavior. For example, the number 8 is usually associated with wealth in China, while the number 13 is treated as an unlucky sign in most western countries/regions.
Sorting order
The sorting order of two strings is the order in which they should appear when sorted. This order is typically derived according to weights given to each of the characters in the string. There are, however, a number of complicating factors for different languages (see footnote 1). In the context of national language support, a correct sort must produce the following:
– Predictable results: The sort result must always be the same, regardless of the initial order of the items.
– Culturally expected order: A person will easily find an item in a sorted list only if it is sorted in the expected order. People usually expect items to be sorted in alphabetical sequence. For example, if the user is searching for the item “H90U42” and finds “L49M31” first, the user will expect “H90U42” to precede “L49M31” and thus will search for “H90U42” only among the items occurring before “L49M31”, taking no notice of the items that follow it.
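The gap between a raw code-value sort and a culturally expected sort shows up even in English. In the following sketch, which uses the JDK class java.text.Collator, a plain string sort places every uppercase letter before every lowercase one, while the collator produces the alphabetical order a reader expects. The class and method names are hypothetical.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class SortingSketch {
    /** Sorts by raw character values, ignoring cultural expectations. */
    static List<String> sortByCodeValue(List<String> items) {
        List<String> copy = new ArrayList<>(items);
        Collections.sort(copy);
        return copy;
    }

    /** Sorts using the collation rules of the given locale. */
    static List<String> sortFor(Locale locale, List<String> items) {
        List<String> copy = new ArrayList<>(items);
        copy.sort(Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("cherry", "Banana", "apple");
        System.out.println(sortByCodeValue(items));    // [Banana, apple, cherry]
        System.out.println(sortFor(Locale.US, items)); // [apple, Banana, cherry]
    }
}
```

Both sorts are predictable, but only the second also satisfies the culturally expected order requirement.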
Conclusion In addition to all of the things discussed above, many other customs and practices differ from country/region to country/region. Different countries/regions have different “lucky” (fortunate) colors, while colors in general are perceived differently in different countries/regions. Business etiquette is very culture-sensitive. For example, business dress codes are strictly enforced in Arab countries/regions. To be successful in business on a worldwide basis, it is imperative that you understand and accommodate such cultural differences. In the realm of e-business, that means satisfying individual cultural needs by providing sound locale model software programs.
1 For more information, see Section 5.17 in the Unicode Standard.
Chapter 7.
Localization pack

Since Single Executable, discussed in Chapter 4, “Single Executable”, means to have one and only one executable for multiple locales, we must use a standardized approach (which we call localization packs) for working with different sets of locale-specific program data. There are two types of localization packs, based on the type of program that uses them:
– Application-dependent (such as menus, dialogs, and other user-interface elements)
– Application-independent (such as collation tables, transliteration rules, and the names of date and time elements)

A very simple example helps to explain this concept. In an application program, a single key (“Msg1”) is associated with a single string value (“Hello”), such that when we want other language versions, corresponding localization packs can easily be created. See Example 7-1.

Example 7-1 A simple example containing the greetings in different languages
English Version:
STRINGTABLE
BEGIN
  Msg1 "Hello"
END
Simplified-Chinese Version:
STRINGTABLE
BEGIN
  Msg1 "你好"
END
French Version:
…
Japanese Version:
The localization pack manager is the module that manages the location, loading, and accessing of localization pack resources. There are various localization pack formats such as Java resource bundles, but in this chapter we discuss only the XML format (which has been recommended by the IBM globalization organization).

Java has different formats for resource bundles, among which the most commonly used are ListResourceBundle and PropertyResourceBundle. A ListResourceBundle is compiled Java code that is hardly readable outside the Java environment. A PropertyResourceBundle is backed by property files holding unstructured key-value mappings, and the ISO 8859-1 character encoding is used when saving properties to or loading them from a stream. For characters that cannot be directly represented in this encoding, Unicode escapes are used, which makes the property files hard to read. XML files can store human-readable and well-structured information, although in general Java resource bundles have better performance.

Most XML discussion will be reserved until later on in this book. However, since this format has various implementation approaches, we will cover some of those in this chapter.
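The readability problem with Unicode escapes is easy to demonstrate. In the sketch below, the property text holds the Simplified Chinese greeting as two escape sequences; java.util.Properties decodes them on loading, but a translator reading the raw text sees only hexadecimal codes. The class name, the key, and the greeting are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class PropertyEscapeSketch {
    /** Loads property text and returns the decoded value stored under the given key. */
    static String load(String propertiesText, String key) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return properties.getProperty(key);
    }

    public static void main(String[] args) {
        // The doubled backslashes put literal escape sequences into the text,
        // exactly as they would appear in a property file on disk.
        String fileContent = "Msg1=\\u4F60\\u597D\n";
        System.out.println(fileContent.trim());        // what a human reader sees
        System.out.println(load(fileContent, "Msg1")); // what the program gets
    }
}
```

An XML localization pack saved in a Unicode encoding can hold the same greeting as readable characters, which is one of the reasons for the recommendation above.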
XML source format and implementations
Considering that localization packs need a cross-platform format and all-in-one character repository, the IBM globalization organization recommends XML because:
– It is platform-independent (flexible enough to accommodate the needs of various platforms)
– By default it uses Unicode for document encoding so that it is capable of processing multilingual data without data loss
– It is an Internet standard, meaning that it can meet content format requirements for Web applications

Furthermore, due to its popularity there are already many tools for working with XML files. For example, XML Spy (see http://www.xmlspy.com) is an Integrated Development Environment (IDE) for XML that contains many useful functions to simplify typical XML editing tasks. It offers different presentations of an XML document, including the enhanced grid view.

The enhanced grid view is XML Spy's core presentation and editing view. This view allows you to see and directly manipulate elements in your XML document, such as the actual data that it contains.
Figure 7-1 XML Spy—enhanced grid view
XML Spy shows the hierarchical structure of any XML-compliant document through a set of nested containers that can easily be expanded and collapsed to get a clear picture of the document's structure. All items contained in an XML document such as the XML declaration, document type declaration, or any element that contains child elements are displayed in a structured way that allows for easy manipulation of content and structure simultaneously. A hierarchical item is represented with a gray side bar and a tiny arrow. An element is denoted with the icon “<>”, and an attribute is denoted with the icon “=”. The enhanced grid view presents data graphically, which makes editing considerably more comfortable. For example, you can:
– Click the side bar to expand or collapse the item
– Drag and drop elements
– Insert new rows
– Copy/paste your data to and from other applications such as Excel and Access
When opening any XML document, XML Spy uses its built-in incremental validating parser both to check the document for well-formedness and to validate it against any specified DTD or XSD schema. The same parser is also used while editing a document that refers to one of these schemas in order to provide intelligent editing help and immediately display any validation error encountered. For localization packs in XML format, it is also important to validate source and translated files both before and after translation. XML Spy is a suitable tool for the preparation, editing, and maintenance of XML localization packs.

The typical e-business application essentially uses HTML as the interface shown to end users. In a multilingual e-business environment, the XML-based localization pack manager can act as a kind of XML parser and therefore have different implementations as dictated by the various HTML generation approaches. Here we present three modes for localization pack manager implementation: embed mode, extend mode, and synthesize mode.

Suppose we have a page containing the greeting message “hello” and need to show that word in the language of a specific locale (for example, zh_CN). The target HTML is shown in Figure 7-2.
Figure 7-2 A simple page containing the greeting message
And the localization pack XML files that store the multi-language greeting messages are shown in Figure 7-3 and Figure 7-4.
Figure 7-3 Localizationpack_en_US.xml
Figure 7-4 Localizationpack_zh_CN.xml
For other languages, Localizationpack_xx_XX.xml (where “xx_XX” is the locale string) can be produced if necessary. Usually, the programs that generate the resulting HTML are JSPs, servlets, or portlets. We provide only pseudocode here for demonstration purposes.
Embed mode
This mode is so called because the code accessing the localization pack XML is embedded in programs such as JSPs, servlets, or portlets, where the HTML source is produced line by line. The working flow shown in Figure 7-5 gets the current locale, selects the corresponding localization pack file for that locale in order to construct the XML instance tree, and accesses the nodes in the tree to insert them into the right place in the resulting HTML.

To fetch the translated keywords from localization packs, many Java-based XML parser APIs can be applied (such as DOM or SAX). We recommend that you use a parser API that supports XPath, because XPath is an excellent language for manipulating path expressions (working much like the directory path in a computer file system to identify nodes in an XML document). Thus it is easy to locate the message “Hello” by the path expression “//greeting/Msg1”, “//Msg1”, or “/localizationpack/greeting/Msg1”.
This method is simple and classic, with no extra files except the localization packs themselves. Note, however, that since the page layout and program data are merged in the program code, the program might need to be rebuilt if the layout changes.
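The embed-mode flow described above can be sketched in Java with the standard JAXP DOM and XPath APIs. This is a minimal illustration, not the book's pseudocode: the class name, the in-memory pack contents (including the zh_CN greeting), and the element layout are assumptions inferred from the path expression “/localizationpack/greeting/Msg1”, and a real program would read the Localizationpack_xx_XX.xml files from disk and escape the text for HTML.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EmbedModeSketch {
    // Hypothetical in-memory stand-ins for Localizationpack_en_US.xml and Localizationpack_zh_CN.xml.
    static final Map<String, String> PACKS = new HashMap<>();
    static {
        PACKS.put("en_US", "<localizationpack><greeting><Msg1>Hello</Msg1></greeting></localizationpack>");
        PACKS.put("zh_CN", "<localizationpack><greeting><Msg1>\u4F60\u597D</Msg1></greeting></localizationpack>");
    }

    // Build the XML instance tree for the locale and evaluate a path expression against it.
    static String lookup(String locale, String path) {
        try {
            Document tree = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(PACKS.get(locale))));
            return XPathFactory.newInstance().newXPath().evaluate(path, tree);
        } catch (Exception e) {
            throw new IllegalStateException("Cannot read localization pack for " + locale, e);
        }
    }

    // Produce the HTML piece by piece, inserting the localized message in place.
    static String renderPage(String locale) {
        StringBuilder html = new StringBuilder();
        html.append("<html><body>");
        html.append("<p>").append(lookup(locale, "//greeting/Msg1")).append("</p>");
        html.append("</body></html>");
        return html.toString();
    }

    public static void main(String[] args) {
        System.out.println(renderPage("en_US"));
    }
}
```

Because the layout lives in renderPage, any layout change means editing and rebuilding this class, which is the trade-off noted above.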
Extend mode
Besides the Java XML parser, you can also use XSL (eXtensible Stylesheet Language) to process XML using XPath path expressions. In this mode, the resulting HTML is not generated line by line in JSPs, servlets, or portlets, but is instead produced by transforming an XML file with an XSL file, as shown in Figure 7-7 on page 35.
The program creates a temporary XML file containing dynamic data from the back-end logic and applies an external XSL file to transform it into the resulting HTML. It is the XSL's responsibility to manage the localization packs and obtain the required locale data. The following XSL syntax can access nodes from another XML file (see footnote 1):
<xsl:value-of select="document($lpfile)//greeting/Msg1"/>
Using XSL separates the page layout from the program data and saves you from writing XML parser code. When the layout changes, all you need to do is modify your XSL files, without rebuilding the program.
Important: Since the number of files doubles because each program needs at least one corresponding XSL file, using XSL might have an adverse impact on program performance.
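As a rough sketch of extend mode, the following Java program applies a stylesheet through the standard javax.xml.transform API and passes the localization pack location in the lpfile parameter used by the document($lpfile) expression. The stylesheet, the data XML with its user element, the class name, and the use of a temporary file for the pack are all assumptions made for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ExtendModeSketch {
    // Hypothetical stylesheet: the layout lives here; localized text comes from the pack named by $lpfile.
    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html' indent='no'/>"
        + "<xsl:param name='lpfile'/>"
        + "<xsl:template match='/page'>"
        + "<html><body><p>"
        + "<xsl:value-of select='document($lpfile)//greeting/Msg1'/>"
        + ", <xsl:value-of select='user'/>"
        + "</p></body></html>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Transform the temporary data XML into HTML, pulling the greeting from the given pack.
    static String render(String packXml, String dataXml) {
        try {
            // Write the pack to a temporary file so that document() can load it by URI.
            Path pack = Files.createTempFile("Localizationpack_", ".xml");
            try {
                Files.write(pack, packXml.getBytes(StandardCharsets.UTF_8));
                Transformer transformer = TransformerFactory.newInstance()
                        .newTransformer(new StreamSource(new StringReader(XSL)));
                transformer.setParameter("lpfile", pack.toUri().toString());
                StringWriter html = new StringWriter();
                transformer.transform(new StreamSource(new StringReader(dataXml)), new StreamResult(html));
                return html.toString();
            } finally {
                Files.deleteIfExists(pack);
            }
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String pack = "<localizationpack><greeting><Msg1>Hello</Msg1></greeting></localizationpack>";
        System.out.println(render(pack, "<page><user>Ming</user></page>"));
    }
}
```

Here the Java code contains no layout at all; changing the page means editing only the stylesheet text.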
Synthesize mode
Like extend mode, synthesize mode transforms XML with XSL to get the resulting HTML. Here, however, the XML is actually an HTML page containing the page layout, with the messages to be translated marked by the syntax <word lppath="//greeting/Msg1"/>. This HTML is produced in the same way as in embed mode.
Footnote 1: Not the file on which the transformation is performed; that file is the data XML.
This XSL is different from the XSL used in extend mode. It acts as a translator that searches for all tags named <word>, replaces them with the content fetched from the localization pack XML by the path expression defined in the attribute “lppath”, and copies the remaining parts to the resulting HTML. In this way, one XSL can be applied to all transformations. To make the source HTML transformable, all you must do is add an XML header at the top of your HTML.
Like extend mode, synthesize mode saves you from writing XML parser code. Moreover, the total number of files is small because only one XSL file is needed. The drawbacks are that the program must be rebuilt when your layout changes, and that the tag <word> remains in the resulting HTML, although it does not affect the display. The XSL can be revised to make the resulting HTML more concise.
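The replacement step of synthesize mode can be illustrated as follows. Note the substitution of technique: XSLT 1.0 has no standard way to evaluate a path expression stored in an attribute without processor extensions, so this sketch performs the same <word> replacement with the Java DOM and XPath APIs instead of with XSL. The class name and sample inputs are hypothetical, and unlike the XSL described above, this version removes the <word> tags entirely.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SynthesizeModeSketch {
    // Replace every <word lppath="..."/> in the source page with text from the localization pack.
    static String translate(String sourceHtml, String packXml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document page = builder.parse(new InputSource(new StringReader(sourceHtml)));
            Document pack = builder.parse(new InputSource(new StringReader(packXml)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            // getElementsByTagName returns a live list, so walk it backwards while replacing nodes.
            NodeList words = page.getElementsByTagName("word");
            for (int i = words.getLength() - 1; i >= 0; i--) {
                Element word = (Element) words.item(i);
                String text = xpath.evaluate(word.getAttribute("lppath"), pack);
                word.getParentNode().replaceChild(page.createTextNode(text), word);
            }

            // Serialize the modified tree; everything except the <word> tags is copied unchanged.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.METHOD, "xml");
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(page), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Cannot translate page", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><p><word lppath=\"//greeting/Msg1\"/></p></body></html>";
        String pack = "<localizationpack><greeting><Msg1>Hello</Msg1></greeting></localizationpack>";
        System.out.println(translate(page, pack));
    }
}
```

As in the text, a single translator serves every page, because the path expressions travel with the pages rather than with the translator.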
Conclusion
Using XML as the source format for localization packs provides flexibility in organizing both the localization packs and the localization pack manager.
Chapter 8. Input and output of multilingual data
Input and output are very commonly used computer terms describing the two roles a computer plays in communicating with end users. Input is the process of getting data from users, and output is the process of sending back a comprehensible reply to them. In this context, globalization means the ability to input text in different languages with a keyboard, mouse, or other device and to properly present it in those languages on the screen or printer. Generally, these functions are supported by the operating system. By using linguistic services, a more human-friendly interface such as speech input can be enabled for the end user.
Complex input
IMEs (input method engines, or editors) have been developed for handling complex input for certain language sets. Scripts such as Chinese are composed of a set of ideographs, and the supported character set is quite large. Input thus becomes very complex. IMEs are designed to solve this problem. Usually, IMEs are bundled with the operating system. They provide user-friendly, keyboard-based input methods (such as Pin Yin in Chinese), determine the right character based on the user's input, and return it to whatever application is running on the system for further processing.
Figure 8-1 IMEs help to handle complex input for certain language sets
If we want to input Chinese characters, we first need to choose a Chinese IME. Once this IME is active, a panel is usually displayed to accept input. Figure 8-1 shows a Chinese IME that provides the Chinese Pin Yin input method.
When we want to input a Chinese word, we key in its Pin Yin in Latin letters; in this example, we key in the letters “zhongwenshuru”.
The Pin Yin Latin alphabet characters are shown on the top line of the panel, with their candidate Chinese glyphs listed just beneath. You select a Chinese word by pressing “1”, and its Chinese characters then become your input.
Multiple IMEs can be installed on the same operating system to provide multilingual input. No matter what the current working locale is, the end user is always allowed to choose the preferred IME for a particular input language script. Given the ability to freely switch IMEs, an end user can easily input multilingual data regardless of system locale. A typical example is Microsoft's Windows operating system, which lets end users select and install the appropriate IMEs, together with their associated “hot keys.”
Figure 8-2 IME selection panel in Microsoft Windows
In addition, IMEs also let end users choose their preferred encoding method. Windows 2000 has a good range of IMEs, providing both Unicode and non-Unicode (also known as “legacy” or OEM) character set APIs so that the application can define whether values generated by the IME should be in Unicode or non-Unicode encoding. Figure 8-3 shows Windows 2000 Unicode character IMEs in different languages.
Figure 8-3 Unicode characters input by different IMEs
Complex output
In certain scripts such as Japanese and Chinese, input becomes extraordinarily complex and difficult. In languages such as Hebrew, Arabic, and Thai, on the other hand, it is with output that the complexity arises.
Glyphs are visible shapes representing characters in languages. Each language must have enough glyphs to represent all defined composite characters. As far as mapping between characters in memory and glyphs on the screen or printer is concerned, processing is straightforward as long as the mapping is one-to-one. However, the scripts of Hebrew, Arabic, and Thai do not have a one-to-one mapping, and this creates big challenges for software engineers when it comes to designing a mechanism to handle such languages on computers. Two characters can merge into a single glyph, one character can split into multiple glyphs, and glyphs can be arbitrarily ordered across the screen. There are even cases where a character splits into glyphs, and one of those glyphs then migrates across other glyphs to merge with another glyph some distance away.
Figure 8-4 Two characters merge into a single glyph
In Arabic, a character can have four possible shapes:
Isolated: the character is linked to neither the preceding nor the following character.
Final: the character is linked to the preceding character but not to the following one.
Initial: the character is linked to the following character but not to the preceding one.
Middle: the character is linked to both the preceding and following characters.
Figure 8-5 A character from Arabic can have four possible shapes
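The four shapes can be made concrete with Unicode's compatibility code points. As an illustration (the class and method names are ours), the Arabic letter BEH (U+0628) has isolated, final, initial, and medial presentation forms at U+FE8F through U+FE92, and compatibility normalization maps each of them back to the one abstract character. Rendering engines normally choose the shape automatically from the neighboring characters, so applications store only U+0628.

```java
import java.text.Normalizer;

class ArabicShapesSketch {
    // The abstract character stored in memory: ARABIC LETTER BEH.
    static final char BEH = '\u0628';
    // Its four presentation forms in the Arabic Presentation Forms-B block.
    static final char[] BEH_FORMS = { '\uFE8F', '\uFE90', '\uFE91', '\uFE92' };
    static final String[] SHAPE_NAMES = { "isolated", "final", "initial", "middle" };

    // Compatibility normalization (NFKC) maps a shaped form back to the abstract character.
    static char abstractCharacter(char presentationForm) {
        return Normalizer.normalize(String.valueOf(presentationForm), Normalizer.Form.NFKC).charAt(0);
    }

    public static void main(String[] args) {
        for (int i = 0; i < BEH_FORMS.length; i++) {
            System.out.printf("%s form U+%04X is character U+%04X%n",
                    SHAPE_NAMES[i], (int) BEH_FORMS[i], (int) abstractCharacter(BEH_FORMS[i]));
        }
    }
}
```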
Some languages such as Arabic, Farsi, Urdu, Hebrew, and Yiddish have scripts that are called bi-directional, because the text is written from right to left, while embedded numbers or segments of text in languages such as English or French are written from left to right.
Figure 8-6 Some languages have bi-directional scripts
Presentation of bi-directional text
You can think of bi-directional text as a collection of segments, with each segment taking a right-to-left or left-to-right direction. The order in which the segments are stored might not be the same as the order in which they appear on the screen. The presentation is usually performed by a special bi-directional reordering algorithm (in most cases based on the algorithm published in the Unicode Standard). The side of the presentation area (the window on a screen or the page on a printer) on which the first segment is presented is called its “global orientation” (also called “paragraph orientation,” “basic orientation,” “writing order,” or “reading order”). It is possible to explicitly indicate the appropriate global orientation to the rendering engine (that is, whether it is from the right or from the left). This is done using different techniques for different environments. For example, in HTML it is done with the DIR attribute, which is set to DIR=RTL or DIR=LTR as needed. Since English is written from left to right, the default global orientation is LTR when not otherwise explicitly indicated. Before presenting bi-directional text, you must make sure that the global orientation is not left at its default LTR value but is set according to the requirements of the text. If this is not done, the text will appear mixed up and unreadable. Furthermore, the period that ends a sentence might erroneously appear at the right-most side instead of the left-most, where it should be.
Tip: We recommend that globalized products be enabled for BiDi (allowing input, processing, and presentation of bi-directional text) so that substantial time and cost can potentially be saved should your system be required to support Arabic or Hebrew.
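The notions of directional segments and global orientation can be explored with the java.text.Bidi class, which implements the Unicode bi-directional algorithm. The sketch below is only an illustration (the class name and sample strings are ours): it lets the first strongly directional character decide the global orientation and then lists each segment (run) with its direction.

```java
import java.text.Bidi;

class BidiRunsSketch {
    // Describe the global orientation and the directional runs of a piece of text.
    static String describe(String text) {
        // The first strong character decides the base direction; LTR is the fallback.
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        StringBuilder sb = new StringBuilder(bidi.baseIsLeftToRight() ? "base=LTR" : "base=RTL");
        for (int i = 0; i < bidi.getRunCount(); i++) {
            // Even embedding levels are left-to-right, odd levels are right-to-left.
            String direction = (bidi.getRunLevel(i) % 2 == 0) ? "LTR" : "RTL";
            sb.append(" [").append(bidi.getRunStart(i)).append(',')
              .append(bidi.getRunLimit(i)).append(") ").append(direction);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // English text followed by a Hebrew word (written with escapes to keep the source ASCII).
        System.out.println(describe("Hello \u05E9\u05DC\u05D5\u05DD"));
        // The same words with the Hebrew first, which flips the global orientation.
        System.out.println(describe("\u05E9\u05DC\u05D5\u05DD Hello"));
    }
}
```

The runs are reported in storage order; a rendering engine reorders them for display according to their levels and the global orientation.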
Mirroring of GUI elements
When a product must support Arabic or Hebrew, not only does the textual data need to be translated into that language, but the graphic elements in the user interface should also be mirrored to allow easy readability by customers who are used to reading from right to left. Mirroring means a symmetrical horizontal inversion of the GUI elements, such as buttons, labels, menus, or vertical scroll bars.
Figure 8-7 Mirroring user interface for bi-directional scripts
A font is a complete set of type in one style and size for representing glyphs on computers. Software such as Windows can ship with a range of separate fonts that cover the supported languages. By choosing different fonts for different scripts, users can see their text presented correctly. For less sophisticated software, a single unified font that can be used across all product lines and present the full repertoire of Unicode might be useful.
Figure 8-8 Different fonts make different display results
Complex input and output both require that applications use the proper APIs. Developers must be able to make the right assumptions. A common misunderstanding is that the width of a string is its length times the width of its first character, when it is in fact the sum of the widths of its characters. An ideal set of APIs would enable applications to handle text input and output for all languages through a single interface, so that the applications using those APIs require very little in the way of language-specific code. It should include specialized, built-in support functions such as bi-directional rendering, contextual character shaping, and combining characters, as well as provide specialized functions such as word-breaking and justification rules. The Unicode standard defines the rules governing the shaping and positioning of glyphs. The Java Swing JTextPane component provides such APIs and makes the work of supporting complex input and output fairly routine. For more information about Java text components, see http://java.sun.com/docs/books/tutorial/uiswing/components/text.html.
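A related assumption that often fails is that one stored character equals one displayed character. As a small illustration of the combining-character support mentioned above (the class name is ours), java.text.BreakIterator can count user-perceived characters, treating a base letter and its combining marks as a single unit.

```java
import java.text.BreakIterator;

class DisplayUnitsSketch {
    // Count user-perceived characters: a base letter plus its combining marks counts once.
    static int displayUnits(String text) {
        BreakIterator boundaries = BreakIterator.getCharacterInstance();
        boundaries.setText(text);
        int count = 0;
        while (boundaries.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // "cafe" followed by U+0301 COMBINING ACUTE ACCENT, which attaches to the final "e".
        String word = "cafe\u0301";
        System.out.println("stored chars: " + word.length());
        System.out.println("display units: " + displayUnits(word));
    }
}
```

Code that sizes buffers, truncates strings, or positions a cursor by counting stored characters would treat the accent as a separate character here.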
The new technologies
In today's world we do not just mean keyboards, printers, screens, or mice when we talk about input and output. Ever-evolving technologies bring brand-new meanings to these traditional concepts.
As its name implies, text-to-speech is a technology for converting text to speech. Using this technology, a system can output information in voice form without the need for any pre-recording, which is valuable for providing changeable information with a big vocabulary. This technology can be applied to such applications as outputting voice information to PC speakers, sound cards, or telephones.
Speech recognition is the technology of converting speech to text. Using speech recognition technology, a system can accept users' voice input and “understand” what they are saying. Speech recognition can be applied to such applications as voice command, voice navigation, and dictation. Using text-to-speech and speech recognition technology, applications can act as a voice gateway between ordinary text systems and voice users (for example, IBM WebSphere Voice Server).
Optical character recognition (OCR) can be used to input information from characters appearing on paper. First the page is scanned into an image, so that the OCR software can recognize the glyphs in the image and convert them into characters. All characters on a page can thus be input in just a few minutes. OCR performs well for the printed page, and is most suitable for inputting substantial information from paper to computer.
Handwriting recognition enables users to input characters by writing. Commonly, handwriting recognition tools provide a “pen” and a “panel” that users employ to write down characters. The tool then reads the user's handwriting and performs character recognition, capturing and processing such additional information as the order of the pen strokes. Handwriting recognition provides an alternative to inputting characters by keyboard and can be used on Palm-type devices. Users can input such characters as Chinese more freely through handwriting recognition than through IMEs on a keyboard. In short, handwriting recognition lets users input information in a more natural way.
Chapter 9. Linguistic services
People fundamentally communicate ideas and concepts through the use of natural language. The encoding of these ideas into specific languages such as English or Chinese is analogous to the encoding of the text that represents them into character sets such as ISO 8859-1 or Big5. We can view information processing at two distinct levels: the character-based processing required to read and display text, and the linguistic-based processing that interrogates text at the language level in order to identify the various properties of words.
Linguistic services are required by more sophisticated global applications to address the challenges presented by the growing body of electronic multilingual data on both the World Wide Web and corporate intranets. Some of the key global e-business solution components are possible only through the application of linguistic services (for example, voice as input/output and linguistically sensitive searches).
Linguistic services for a particular language generally require an extensive understanding of that language. In most cases, processing is language-sensitive. Even though many algorithms used are language-independent, they are driven by language-specific data. In other cases, both the algorithms and data are language-specific. Situations also exist where a particular technology is language-independent (for example, in clustering), although these situations are rare and usually depend on lower-level services that are themselves language-specific (such as segmentation). Finally, some types of languages have unique linguistic features not shared by others, thus requiring specific support for those features.
General low-level linguistic tools
Linguistic services encompass a variety of technologies, including:
Spell checking, which verifies that the spellings of words are correct. The concept of misspelling does not apply to ideographic languages such as Chinese or Korean in the same sense as to orthographic languages. For ideographic languages, input is controlled by an Input Method Editor (IME). Any character generated by the IME is valid and represents a real word. The issue for these languages is not one of orthographic validity, but of identifying grammatical or semantic mistakes that arise from accidental use of a mistakenly selected word or character. This is similar to the situation in English, where it is possible to misspell one word as another valid word, so that only a grammar check or statistical analysis against common mistakes will reveal that this orthographically correct word is in fact a misspelling and thus contextually incorrect. Spell checking technology is widely used in word processors.
Figure 9-1 Spell checker helps to verify the spellings of the wordings
Grammar checking, the process of verifying that sentence structure is valid according to a set of rules. These rules form the grammar and are language-specific. As previously discussed, grammar checking can be used to find combinations of characters that are valid in spelling but contextually incorrect. For example, someone might erroneously use a character that looks or sounds similar to the intended character. An English example might be “I red (meaning read) the book last night.” Today's word processors incorporate this technology. Grammar checking plays a key role in spell checking for ideographic languages. It is also used in the very important area of disambiguation. Closely related to grammar checking is the grammatical parsing used to parse queries for natural language question-answering applications.
Hyphenation, the very common practice of using the hyphen symbol to compose words and to mark line-break boundaries. Hyphenation is meaningless in Chinese, Japanese, and Korean.
Figure 9-2 Hyphenation is used to compose words and make line breaks
The thesaurus, a system providing lists of synonyms and related words for an input word. This technology is very helpful when searching categorically. The base technology is a dictionary of synonyms. Words have multiple “senses” or meanings, and therefore a single input word has a number of synsets (sets of synonyms relating to each sense of the word). There are various classes of relationship: hypernyms and hyponyms, where the relationship is hierarchical (such as “car: vehicle, transport, etc.” or “vehicle: car, bus, etc.”), and antonyms, where the related words mean the contrary of the input word (for example, “like: dislike, hate, despise, etc.”).
Figure 9-3 Lists and synonyms for the word “communication” found in Merriam-Webster's Collegiate Thesaurus
Disambiguation, the process of selecting the correct option where the same sentence can have different meanings, each of them valid. For example, “The ladies of our thrift store have cast off clothing of every kind, and they may be seen in the basement on Tuesdays.” Does this mean that the ladies have old clothing that they simply wish to donate to a good cause, or that they have disrobed in public? Other kinds of disambiguation include: – Word-sense disambiguation (“bank” as in money or “bank” as in river?)
– Part-of-speech disambiguation (“saw” as a verb or “saw” as a noun?)
– Syntactic disambiguation (In the sentence “Peter saw the man in the park with a telescope,” who had the telescope?)
Disambiguation as a technology is very difficult to achieve, needing very large amounts of raw data as part of its rule repository.
Segmentation, the division of text into segments (usually words, sentences, and paragraphs). For segmentation there are two types of language: those that separate words using spaces and other punctuation (for example, English), and those in which words are not separated from each other at all (such as Chinese). In the first case, acceptable (though naïve) segmentation can be achieved simply by breaking the text at punctuation. However, breaking at each ideographic character in languages that have no separation between words (such as Chinese) yields results that are totally meaningless.
Though it is comparatively easy to segment text in languages such as English, there are still complications. More sophisticated segmentation, even in these punctuation-delimited languages, can be a real challenge. Multi-word expressions and ambiguous delimiters are very common in these languages. Sentence segmentation is also challenging because the most common sentence delimiter (the period) is highly ambiguous: it is also used in abbreviations, dates, filenames, ellipses, and numeric expressions. For example:
This year their profit increases by 49.45%. This surprises the industry.
Although there are three periods in the paragraph, there are only two sentences. We must also be able to segment multi-word expressions correctly, some of whose parts have no meaning on their own (for example, “ad hoc”) and others of which have multiple interpretations, such as:
I have a black belt.
Do I possess a high level of proficiency in a martial art, or simply a belt that is black? The identification of the correct segmentation (“[black] [belt]” versus “[black belt]”) is analogous to the situation in languages not delimited by punctuation, such as Chinese. Finding the correct segmentation still poses real challenges for the IT industry today.
Dictionary: Dictionaries play a very fundamental role in linguistic services. The most common interpretation is the single-language dictionary that provides information about words. The information provided usually includes a set of possible definitions of the input word and, with each definition, the grammatical category or part-of-speech to which the word belongs (for example, noun or verb). In the most general sense, dictionaries provide a “key-value” lookup for any input, returning any type of value. In the linguistic services sense, however, dictionaries provide everything from the part-of-speech and base forms of words to lists of synonyms for a given word. Dictionaries are generally used to annotate text for higher-level linguistic analysis applications such as summarization or categorization.
Morphological analysis, the study of the forms and structures of words and how they apply to many different languages. Morphology is a particularly important aspect of language. The morphology of a word identifies the “base form” of that word (for example, “running” has the base form “run”). This information is particularly useful in search applications to find more hits. Morphology also describes the rules that allow other words to be formed from the base form. This process is called inflection. Irregular cases exist where the rules of inflection are not obvious (for example, “went” has the base form “go”). Other important morphological information includes part-of-speech, number (singular/plural), and gender (masculine/feminine/neuter). In many languages these can be derived from formational elements of the word.
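Returning to the sentence segmentation example given earlier in this list, the following sketch contrasts a naive split at every period with the rule-based sentence iterator in java.text.BreakIterator. It illustrates the ambiguity of the period only; the class name is ours, and BreakIterator stands in for whatever segmentation service a product actually uses.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class SentenceSegmentationSketch {
    // Split text into sentences using the locale-aware sentence rules of BreakIterator.
    static List<String> sentences(String text) {
        BreakIterator boundaries = BreakIterator.getSentenceInstance(Locale.US);
        boundaries.setText(text);
        List<String> result = new ArrayList<>();
        int start = boundaries.first();
        for (int end = boundaries.next(); end != BreakIterator.DONE; start = end, end = boundaries.next()) {
            result.add(text.substring(start, end).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "This year their profit increases by 49.45%. This surprises the industry.";
        // Naive approach: every period ends a segment, so the decimal point splits the number.
        System.out.println("naive segments: " + text.split("\\.").length);
        // Rule-based approach: the period inside 49.45 is not treated as a sentence end.
        System.out.println("sentences: " + sentences(text));
    }
}
```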
Language identification
By “automatic language identification” we mean the computerized process of determining the language in which a natural language text is written (see footnote 1). This term refers to computing the language of a document by looking at the text and, optionally, other metadata that might be helpful in determining the document language (see footnote 2). The term “language identification” (without the “automatic” qualification) is also used for standards and systems to tag information objects containing natural language text (such as XML or HTML) with a label to explicitly encode the language of the text contained in the object. Automatic language identification works without human intervention. Though unrelated to a specific form of markup, its results can be used by a program to automatically label information objects with “language identification” tags. The algorithms used for language identification are mostly language-neutral in the sense that they can be trained on a new language simply by presenting representative sample documents to the system, thus requiring only a minimal effort to provide this function for a new language.
Text mining
Summarization: Automatic summarization is the task of creating a summary of a document's content. Most people are familiar with human-written summaries such as abstracts for scientific articles, summaries on book covers, or the executive summaries prefacing business reports. Automatic summarization does not really strive to create such high-quality summaries, and automatic summaries do not try to let a reader do completely without the original document. Instead, the goal is to allow the reader to decide by reading only a few sentences whether the whole document seems relevant. If this is the case, the reader can proceed to look at the full document. A good example of this kind of automatic summarization is the mini-summary that search engines provide in their result lists.
Footnote 1: This task is sometimes called automatic language detection.
Footnote 2: Finding and interpreting an explicit language markup that might be present in some document formats is not covered by the term language identification.
Figure 9-4 Text summarization
Categorization: Automatic categorization is the task of assigning categories to documents (for example, the sports and business sections in newspapers). These categories are taken from a predefined list or tree called a taxonomy. Multiple taxonomies can be used to represent different views on a subject. A good example of a specific general-purpose taxonomy is the Yahoo directory structure. To initiate automatic categorization, you need more than a taxonomy and an input document. The automatic categorization engine must first be trained for a taxonomy. This is done by giving it sample documents for each category in the taxonomy. The categorization engine then looks at each document and derives rules from it. These rules are later used by the engine to classify new documents as they come in.
Figure 9-5 Text categorization: The document “An Introduction to Lotus Notes” will be categorized as “software” since it is a document discussing the software Lotus Notes.
Clustering: Automatic clustering is the task of grouping documents according to the similarity of their content. For example, all documents about medicine might end up in one cluster. While related to categorization, clustering does not require training documents or a pre-existing taxonomy. It works purely on document contents. Technically, clustering is based on the distribution of terms across document collections. Cluster labels are chosen based on the most frequent words for each cluster, and there is no way to know in advance or to influence how clustering will group the documents of a collection. The automatically chosen labels for each cluster give the user an idea of what the documents within each cluster have in common.
The nature of the challenges undertaken by clustering, summarization, and categorization is such that it is impossible to obtain absolute precision. Consider that the results of two manual categorizations or summaries created by two different people will usually be significantly different. Although these automated technologies are usable in themselves, the infrastructure (such as taxonomies and end-user tools for training the algorithms) is not yet in place, and there is always the chance that user expectations will be disappointed by unsatisfactory results.
The biggest problem with clustering and categorization derives from their underlying base technologies, such as segmentation and disambiguation. Though summarization is commonly seen in Internet search results, it cannot create summaries of a quality equal to those written by humans. Still, categorization and summarization are very advanced emerging linguistic technologies that are particularly useful in knowledge-management applications.
Speech
Speech technology (both speech-to-text and text-to-speech) is very useful in many areas, especially pervasive computing. It provides conversational access by receiving spoken input and sending audio output, so that people who (due to time, location, or cost constraints) do not have access to a computer can access online businesses by telephone. Users benefit from the convenience of using the mobile Internet for self-service transactions, while companies enjoy the Web's relatively low transaction costs. Voice service is also commonly used in daily office routines. A big office might be equipped with a voice reception program for inbound calls, “hearing” an employee's name and then transferring the call directly to the right extension after searching its database. IBM occupies a strong leadership position in this industry. Its speech recognition and synthesis capabilities can help tremendously in the development and deployment of applications running over phone systems.
Text search
There are two basic search technologies: looking for groups of characters regardless of the language, and using morphological analysis to search for words and their variants (for example, searching for the word “begin” will find documents containing “begin” as well as “begins,” “began,” “begun,” and “beginning”). The first type of search generally performs better, but the second gives better results if the user's language is known and morphological analysis is available for it. As more and more non-English content is added to the Internet as well as company intranets, we will need to search across languages, even to the extent that if a user enters the word “spring” and indicates an interest in the season, the term is translated to “Frühling” in German and that word is used in searching German documents.
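The second kind of search can be sketched as a lookup that reduces both the query and each document word to a base form before comparing them. Everything here is a toy assumption for illustration: the class name, the hand-written five-entry base-form dictionary, and the sample documents. A real system would obtain base forms from a morphological analyzer for the user's language.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class MorphologicalSearchSketch {
    // Hypothetical base-form dictionary covering only the variants of "begin".
    static final Map<String, String> BASE_FORM = new HashMap<>();
    static {
        for (String variant : new String[] { "begin", "begins", "began", "begun", "beginning" }) {
            BASE_FORM.put(variant, "begin");
        }
    }

    // Words missing from the dictionary are treated as their own base form.
    static String baseForm(String word) {
        String lower = word.toLowerCase(Locale.ROOT);
        return BASE_FORM.getOrDefault(lower, lower);
    }

    // Return the documents that contain any word sharing the query's base form.
    static List<String> search(String query, List<String> documents) {
        String target = baseForm(query);
        List<String> hits = new ArrayList<>();
        for (String document : documents) {
            for (String token : document.split("[^\\p{L}]+")) {
                if (!token.isEmpty() && baseForm(token).equals(target)) {
                    hits.add(document);
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> documents = Arrays.asList(
                "The meeting began at noon.",
                "Spring begins in March.",
                "Nothing relevant here.");
        System.out.println(search("begin", documents));
    }
}
```

A plain character-group search for “begin” would find the second document but miss “began” in the first, which is the gain in recall the text describes.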
Machine translation
Machine translation (MT) refers to the automatic translation of human language by computers. For instance, an English-to-German MT system translates English (the source language) into German (the target language) without human intervention.
Chapter 10. Global Business Object
We borrowed the concept of the Global Business Object from the Business Object (also sometimes known as Common Business Components), which encapsulates commonly used business functions within Enterprise JavaBeans (EJB) components so that they can be shared and reused by several applications. For global e-business applications, Global Business Objects should follow the localization pack architecture introduced previously. The business functions contained in these objects are presented in the user's preferred language and then shared across multilingual applications. Furthermore, they should also follow the legal, cultural, and business conventions of the user's locale. For example, since user profile registration is commonly used, the Global Business Object can provide a one-for-all solution enabling users to enter and display name, address, and telephone number.
Note: Details of implementing name and address format can be found in 15.6, “Global Business Object” on page 112.
The challenge comes with different countries having totally different cultural conventions. In the United States, a given name is followed by a surname. The exception is in a telephone book, where these components are reversed. For a person whose given name is John and surname is Smith, we usually write his name as “John Smith.” But in a telephone book, it would be written as “Smith, John” (in a surname, comma, given name sequence).
Smith, A R               Anchorage, AK 99501     907-337-4789
Smith, Aaron R           North Kenai, AK 99611   907-776-8318
Smith, Ardy and Rob      Juneau, AK 99801        907-586-6220
Smith, Billy R and Ann   Wasilla, AK 99654       907-376-8224
Figure 10-1 Sample phone book-type address format
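A Global Business Object for names has to treat the order of name components as data rather than hard-coding it. The sketch below is a hypothetical illustration only: the class, the enum, and its constants are our own names, it is not the implementation referred to in 15.6, and joining surname-first names with a single space is an assumption that does not hold for every script.

```java
class NameFormatSketch {
    // Component orders discussed in this chapter; the constant names are assumptions.
    enum Order {
        GIVEN_FIRST,   // United States and most Western countries: "John Smith"
        SURNAME_FIRST, // most Eastern countries: surname, then given name
        DIRECTORY      // telephone book style: "Smith, John"
    }

    // Format a name from its components according to the requested order.
    static String format(String given, String surname, Order order) {
        switch (order) {
            case SURNAME_FIRST:
                return surname + " " + given;
            case DIRECTORY:
                return surname + ", " + given;
            default:
                return given + " " + surname;
        }
    }

    public static void main(String[] args) {
        System.out.println(format("John", "Smith", Order.GIVEN_FIRST));
        System.out.println(format("John", "Smith", Order.DIRECTORY));
    }
}
```

Storing given name and surname separately and choosing the order by locale at display time is what lets one business object serve all of these conventions.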
The name format in most Western countries is the same as that in the United States. But in most Eastern countries, a person’s given name comes after his or her surname. For a Korean whose given name is and surname is , we would write her name as .

Although courtesy titles such as Mr. and Ms. are widely used, professional or academic titles are rarely part of English protocol. However, in some countries, professional titles such as doctor, lawyer, and engineer are commonly used, and sometimes it is imperative to respect this more traditional business etiquette. The Japanese often use professional titles in place of actual names as an acknowledgment of a person’s status.

There are other kinds of titles as well. In Japan, customers are usually addressed with a respectful title (honorific). The honorific “sama” is used to address someone whose station is above yours. For example, instead of , we address the customer in business letters. “san” is also used as an honorific; it is used in e-mail and when you address the customer face to face.
Korea also has such titles of respect. To show respect in business letters, “Kwiha” is used; for example, . “nim” is a less formal honorific, used in addressing a customer face to face.

Address format is a more complicated topic. Even within a single language such as English, there can be multiple address formats that the business object needs to support. For example, letters delivered to foreign countries may or may not need postal codes, depending on the country. The postal format presented on envelopes differs from that in phone books.

With cultural conventions, things get more complex. In most Western countries, people put street first and then city, state, and country when addressing envelopes. Even when following the same sequence, there are minor differences among countries. For example, in Germany states or provinces are not used in postal addresses, so that when a German writes his address, he simply uses his street, postal code, city, and country. And in most Eastern countries such as Japan, people put country first, then state, city, and street.
1155 Benhust Avenue
New York
NY 10036
U.S.A.

An example of the address format in the United States on an envelope. The first line is the street, the second line is the city, the third line is the state abbreviation and zip code, and the fourth line is the country.

An example of the address format in Japan on an envelope. The first line is the zip code, and everything else is written in one line in the sequence of country, state, city, and street.
Figure 10-2 Different address formats in different countries
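The address conventions above could be captured along the following lines. This is a hedged sketch rather than the book's implementation: the class and field names are invented, the German sample address is made up, and the line breaks chosen for the German layout are an assumption (the text only gives the order street, postal code, city, country).

```java
import java.util.Locale;

/** Hypothetical formatter that lays out postal addresses per country. */
public class AddressFormatSketch {

    /** Simple value holder; the field names are illustrative. */
    public static final class Address {
        final String street, city, state, postalCode, country;

        public Address(String street, String city, String state,
                       String postalCode, String country) {
            this.street = street;
            this.city = city;
            this.state = state;
            this.postalCode = postalCode;
            this.country = country;
        }
    }

    /** Returns the envelope layout for the country of the given locale. */
    public static String format(Address a, Locale locale) {
        String country = locale.getCountry();
        if (country.equals("JP")) {
            // Postal code first, then country, state, city, street on one line.
            return a.postalCode + "\n"
                 + a.country + " " + a.state + " " + a.city + " " + a.street;
        }
        if (country.equals("DE")) {
            // No state or province; the line breaks here are an assumption.
            return a.street + "\n" + a.postalCode + " " + a.city + "\n" + a.country;
        }
        // Default: the United States layout shown in Figure 10-2.
        return a.street + "\n" + a.city + "\n"
             + a.state + " " + a.postalCode + "\n" + a.country;
    }

    public static void main(String[] args) {
        Address us = new Address("1155 Benhust Avenue", "New York", "NY", "10036", "U.S.A.");
        System.out.println(format(us, Locale.US));

        // Made-up German address, used only to show the different layout.
        Address de = new Address("Musterstrasse 1", "Berlin", "", "10115", "Germany");
        System.out.println(format(de, Locale.GERMANY));
    }
}
```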
Since business objects are designed to work across applications and languages, we recommend that you maintain a reusable stand-alone library with open interfaces so that developers can easily pick up whatever is needed. Following are typical samples. Starting with these as your guide, you can easily customize functions and extend coverage.

Example 10-1 shows how to implement country-specific calculations such as income and property tax. The real cases will be more sophisticated. What we introduce here is solely for the purpose of explaining concepts and basic implementation approaches.

Example 10-1 Implementation

First, we define a common interface for all supported customized calculations.

/*
 * The interface defines all customized calculation functions.
 */
package com.ibm.gcl.git4.gbo;

public interface CustomizedCalculation {
    /**
     * Calculate income tax
     * @return tax
     * @param value salary
     */
    double calcIncomeTax( double value );

    /**
     * Calculate house tax
     * @return tax
     * @param value salary
     */
    double calcHouseTax( double value );
}
Next, for every country we define a class that implements this interface. In this example, locale_A and locale_B represent two different locales. The class com.ibm.gcl.git4.gbo.impl.locale_A.CustomizedCalculationImpl is an implementation for country A.

package com.ibm.gcl.git4.gbo.impl.locale_A;

import com.ibm.gcl.git4.gbo.CustomizedCalculation;

public class CustomizedCalculationImpl implements CustomizedCalculation {
    public double calcIncomeTax( double value ) {
        return value * 0.08; // formula for calculating income tax of country A
    }

    public double calcHouseTax( double value ) {
        return value * 0.03; // formula for calculating house tax of country A
    }
}

The class com.ibm.gcl.git4.gbo.impl.locale_B.CustomizedCalculationImpl is an implementation for country B.

package com.ibm.gcl.git4.gbo.impl.locale_B;

import com.ibm.gcl.git4.gbo.CustomizedCalculation;

public class CustomizedCalculationImpl implements CustomizedCalculation {
    public double calcIncomeTax( double value ) {
        return value * 0.05; // formula for calculating income tax of country B
    }

    public double calcHouseTax( double value ) {
        return value * 0.026; // formula for calculating house tax of country B
    }
}

Note: The GBOCustomizedCalculation class implements GBO-customized calculations. For calculating income tax, we use the localeStr parameter to find the implementation class that provides the correct calculation method.

package com.ibm.gcl.git4.gbo;

public class GBOCustomizedCalculation {
    private final static String IMPL_CLASS_NAME_PREFIX = "com.ibm.gcl.git4.gbo.impl.";
    private final static String IMPL_CLASS_NAME_SUFFIX = ".CustomizedCalculationImpl";

    /**
     * Calculate income tax.
     * @return tax
     * @param value salary
     * @param localeStr locale string
     */
    public double calcIncomeTax( double value, String localeStr ) {
        CustomizedCalculation calculator = getCalculator(localeStr);
        if ( calculator != null ) {
            return calculator.calcIncomeTax( value );
        }
        return 0;
    }

    /**
     * Calculate house tax.
     * @return tax
     * @param value salary
     * @param localeStr locale string
     */
    public double calcHouseTax( double value, String localeStr ) {
        CustomizedCalculation calculator = getCalculator(localeStr);
        if ( calculator != null ) {
            return calculator.calcHouseTax( value );
        }
        return 0;
    }

    /**
     * Get calculator in respect of locale.
     */
    private CustomizedCalculation getCalculator( String locStr ) {
        String resName = getLocaleImplClassname( locStr );
        try {
            // Load the locale-specific implementation class by name.
            return ( CustomizedCalculation ) Class.forName( resName ).newInstance();
        } catch ( Exception e ) {
            // LOG is a logger whose declaration is not shown in this listing.
            if ( LOG.isWarnEnabled() ) {
                LOG.warn( "Cannot instantiate the calculator class " + resName );
            }
            return null;
        }
    }

    /**
     * Get class name of concrete calculator.
     *
     * @return java.lang.String
     * @param locStr java.lang.String
     */
    private String getLocaleImplClassname( String locStr ) {
        // The locale string is lowercased here, so the implementation
        // package names must be lowercase for the lookup to succeed.
        return IMPL_CLASS_NAME_PREFIX + locStr.trim().toLowerCase() + IMPL_CLASS_NAME_SUFFIX;
    }

    /**
     * Main method.
     * @param args java.lang.String[]
Localization

Localization is the process of creating additional localization packs for programs. (See Chapter 7, “Localization pack” on page 29 for more on localization packs.) These programs can be applications, middleware, operating systems, or other components that need a localized user interface (UI) or that provide common cross-platform services shared among multiple applications.

In information technology, the user interface (UI) is everything designed into an information device with which a human being may interact, including display screen, keyboard, mouse, light pen, the appearance of a desktop, illuminated characters, help messages, and how an application program or a Web site invites interaction and responds to it. The user interface can arguably include the total user experience, which may include the aesthetic appearance of the device, response time, and the content that is presented to the user within the context of the user interface.
Application-independent localization To reduce the total work needed to create and maintain culture-sensitive functionalities and to remain consistent among various applications, a common library service providing a cross-platform, constantly refined set of locale-related information is necessary for an effective globalization architecture. Such things as collation tables, currency formatting information, and transliteration rules can all be created and maintained in centralized places and used as needed by applications through the use of localization packs. A cornerstone of this effort is ICU (International Components for Unicode), an IBM-supported open-source project that serves up cultural data. The repertoires for this information will continue to be maintained by IBM for the benefit of IBM and its customers. ICU is a pair of libraries (one in C/C++ and the other in Java) that make it easy to add robust Unicode and internationalization support to various applications. This library provides:
Calendar support
Message formatting
Character set conversions
Normalization
Collation (language-sensitive)
Number and currency formatting
Date and time formatting
Time zones
Locales (200+ supported)
Transliteration
Resource bundles
Word, line, and sentence breaks
For more information on ICU4J, go to http://oss.software.ibm.com/icu4j/, and for ICU4C, to http://oss.software.ibm.com/icu/.
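To make two of the services listed above concrete (currency formatting and language-sensitive collation), here is a small sketch. It deliberately uses the standard java.text classes so that it runs without extra libraries; ICU4J offers classes with similar names in its com.ibm.icu.text package, but this example does not call ICU itself. The class and method names are illustrative.

```java
import java.text.Collator;
import java.text.NumberFormat;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Illustrative use of locale-sensitive formatting and collation from the JDK. */
public class LocaleServicesSketch {

    /** Formats an amount as currency using the conventions of the given locale. */
    public static String currency(double amount, Locale locale) {
        return NumberFormat.getCurrencyInstance(locale).format(amount);
    }

    /** Returns a copy of the words sorted by the locale's collation rules. */
    public static List<String> sort(List<String> words, Locale locale) {
        List<String> copy = new ArrayList<>(words);
        Collections.sort(copy, Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(currency(1234.5, Locale.US));
        System.out.println(currency(1234.5, Locale.GERMANY));

        // "\u00c4pfel" is the German word written with A-umlaut. A plain
        // String comparison would sort it after "Zebra"; the collator does not.
        List<String> words = Arrays.asList("Zebra", "\u00c4pfel");
        System.out.println(sort(words, Locale.GERMAN));
    }
}
```

Centralizing calls like these behind one shared library is what keeps formatting consistent across applications.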
Application-specific localization Localizing the UI of a particular application involves translating the text in localization packs, modifying images and icons as necessary for regional considerations, and modifying layout to accommodate text or image-size changes.
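As a rough illustration of how translated text in a localization pack might be looked up at run time, the sketch below keeps one resource bundle per language and falls back to a default language when a translation is missing. The class name, the keys, and the sample strings are assumptions for this example; the localization pack format described in Chapter 7 may differ.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

/** Hypothetical lookup of UI text from per-language packs with a fallback. */
public class LocalizationPackSketch {
    private final Map<String, ResourceBundle> packs = new HashMap<>();
    private final String defaultLanguage;

    public LocalizationPackSketch(String defaultLanguage) {
        this.defaultLanguage = defaultLanguage;
    }

    /** Registers translated text for one language, given in .properties syntax. */
    public void addPack(String language, String properties) {
        try {
            packs.put(language, new PropertyResourceBundle(new StringReader(properties)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the text for the key in the locale's language, else in the default language. */
    public String text(String key, Locale locale) {
        ResourceBundle pack = packs.get(locale.getLanguage());
        if (pack == null || !pack.containsKey(key)) {
            // Assumes a pack for the default language has been registered.
            pack = packs.get(defaultLanguage);
        }
        return pack.getString(key);
    }

    public static void main(String[] args) {
        LocalizationPackSketch ui = new LocalizationPackSketch("en");
        ui.addPack("en", "flight.search=Flight search\nhotel.search=Hotel search");
        ui.addPack("fr", "flight.search=Recherche de vols");

        System.out.println(ui.text("flight.search", Locale.FRANCE)); // French text
        System.out.println(ui.text("hotel.search", Locale.FRANCE));  // falls back to English
    }
}
```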
Localization services

Different models are used for software localization services. IBM has many translation centers spread throughout different countries, and they provide translation services to worldwide development labs to help produce globalized versions. The IBM workflow starts with English source files sent to translation centers by development groups. The translated files are then returned to those groups, which integrate them back into the product as translated versions. Figure 11-1 depicts the complete cycle.
Figure 11-1 Translation process
The detailed translation process can be described as follows:

1. Developers extract all translatable content and create the source file(s) for translation. We call these localization packs.
2. The translation coordinator collects these localization packs, verifies them, and sends them to the Translation Service Center (TSC) as source files.
3. The TSC distributes the validated source files to the corresponding language translators.
4. The translators perform translation using various translation tools and then return the translated files to the TSC.
5. The TSC packages the returned files, verifies them, and returns them to the coordinator.
6. The translation coordinator unpacks the translated files and gives them to the developers after verification.

These steps might be repeated several times should changes occur during development. Throughout the entire cycle, there are a number of tools for reinforcing quality control. We will introduce five of these in the paragraphs that follow.
Localization tools

Various tools have been developed for localization. Most of these, such as TranslationManager (see http://tcdct1.bbhulb91.de.ibm.com/ibmtrans/tm2.htm for detailed information concerning this tool) and TRADOS (see http://www.trados.com), facilitate translation and are therefore categorized as translation tools. Apart from the translation tools, IBM translation service centers also apply a communication tool called the Translation Problem Reporting System (TPRS) to report problems or bugs they find during the translation process.

TranslationManager

As one of the key translation tools used by the IBM localization business, TranslationManager is a computer-assisted translation system that automates repetitive tasks when documents are translated from one language into another. It processes and translates more than 100 different file formats (including XML, DOC, and RTF), according to their markups, and presents a working environment that lets you translate and perform all tasks closely related to translation. This includes:

Looking up terms in dictionaries
Making use of translation memories
Checking the spelling of your translations
Supporting the management of huge translation projects
Figure 11-2 TranslationManager working environment
TranslationManager comes with general-purpose dictionaries integrated into its programming. The translator can add new entries to these as required.

A translation memory is a database that stores segments of original documents together with their translations. If a new document for translation is an updated version of a former one that has already been stored in the translation memory, TranslationManager compares the two documents and, where the contents are identical, automatically translates the new document based on the translation of the old one. Where they differ, TranslationManager prompts the user either to select translations from its translation memories or dictionaries, or to input new translations. Translation memories are therefore the key components, in that they contain reusable text strings.

Through the management of translation memories and dictionaries, TranslationManager makes the process of translation more efficient and economical, ensures quality and consistency, and thus enhances the implementation and management of localization.
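The exact-match reuse that a translation memory provides can be pictured with a small sketch. This is not how TranslationManager is implemented; it is a simplified model in which the memory is a map from source segments to translations, and every name in it is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified model of exact-match reuse from a translation memory. */
public class TranslationMemorySketch {
    private final Map<String, String> memory = new HashMap<>();

    /** Stores a source segment together with its approved translation. */
    public void remember(String source, String target) {
        memory.put(source, target);
    }

    /**
     * Pre-translates a document segment by segment. Segments that are
     * identical to stored ones are translated automatically; the others
     * come back as null so that a translator can supply them.
     */
    public List<String> pretranslate(List<String> segments) {
        List<String> result = new ArrayList<>();
        for (String segment : segments) {
            result.add(memory.get(segment));
        }
        return result;
    }

    public static void main(String[] args) {
        TranslationMemorySketch tm = new TranslationMemorySketch();
        tm.remember("Save", "Speichern");
        tm.remember("Cancel", "Abbrechen");

        // The updated document reuses two old segments and adds a new one.
        List<String> updated = Arrays.asList("Save", "Cancel", "Print");
        System.out.println(tm.pretranslate(updated)); // [Speichern, Abbrechen, null]
    }
}
```

Real tools also offer near (fuzzy) matches and dictionary lookups for the segments left open here.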
Translation Problem Reporting System (TPRS) Developed by GCL of IBM China, TPRS is a strategic tool in the IBM translation process and an official communication tool for IBM translation projects. TPRS provides a sophisticated vehicle for translation problem tracking and group discussion on the project level. It provides a consistent method for any member of the IBM Translation Community to submit an inquiry and receive a traced response from another member of that project or an IBM terminologist. IBM external users (vendors) can also have secured access to the system if so authorized. Each problem raised is tracked throughout the project life cycle and solved by designated individuals, thus eliminating duplication of effort. TPRS is available for both Netscape and Internet Explorer.
Part 3
Our Global Travel Shanghai Demo: A working example In this part, we provide an actual case study to see how the theory works in a real solution. Note: Go to http://gcls2.cn.ibm.com/ to see a variety of multilingual "showcases" under “GCL Service/Demo and Product.” This includes an enhanced version of the Shanghai case study demo called "Global Travel BeiJing."
12.1 Multilingual front-end Our Global Travel Shanghai Demo is a simulation of a comprehensive multilingual Web site that specializes in providing information and services relating to travel in Shanghai—including weather reports, an introduction to attractions, flight searches and airline ticket bookings, hotel searches and room reservations, and currency conversion. We do not guarantee that this demo will completely reflect the customer's real situation and environment. In addition, the name “Our Global Travel Shanghai Demo” is fictitious. No resemblance is intended to the name of any actual travel agency or Web site.
Figure 12-1 Scenario of Our Global Travel Shanghai Demo
12.1.1 Multilingual user interface

As with any Web site, developing a multilingual site begins by identifying your audience. Only when you are clearly aware of your site's purpose can you begin to design it to meet the requirements and expectations of your target customers. Our Global Travel Shanghai Demo serves a global audience speaking a variety of languages. One of its main globalization features is the user interface, which can be displayed in any of 12 languages (English, French, German, Italian, Spanish, Portuguese, Arabic, Hebrew, Simplified Chinese, Traditional Chinese, Japanese, and Korean). All of these languages are listed in the Language Selection List on the home page so that users can navigate through the site in any of them.
Figure 12-2 Home pages of Our Global Travel Shanghai Demo in English, German and Japanese
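A site like this needs some way to map a requested language onto the 12 it supports. The following sketch shows one possible approach using the language-matching support in java.util.Locale. It is an assumption for illustration only: the demo itself relies on the Language Selection List, and the choice of English as the fallback and of a priority string as input are not taken from the source.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical resolver that maps a requested language to a supported one. */
public class LanguageSelectionSketch {

    /** Fallback when nothing matches (an assumption for this sketch). */
    static final Locale DEFAULT = Locale.forLanguageTag("en");

    /** The 12 user interface languages listed in the text. */
    static final List<Locale> SUPPORTED = Arrays.asList(
            Locale.forLanguageTag("en"), Locale.forLanguageTag("fr"),
            Locale.forLanguageTag("de"), Locale.forLanguageTag("it"),
            Locale.forLanguageTag("es"), Locale.forLanguageTag("pt"),
            Locale.forLanguageTag("ar"), Locale.forLanguageTag("he"),
            Locale.forLanguageTag("zh-CN"), Locale.forLanguageTag("zh-TW"),
            Locale.forLanguageTag("ja"), Locale.forLanguageTag("ko"));

    /**
     * Resolves a priority string such as "fr-CA,fr;q=0.8,en;q=0.5"
     * to the best supported locale, or to the default if none matches.
     */
    public static Locale resolve(String requested) {
        if (requested == null || requested.trim().isEmpty()) {
            return DEFAULT;
        }
        List<Locale.LanguageRange> ranges = Locale.LanguageRange.parse(requested);
        Locale match = Locale.lookup(ranges, SUPPORTED);
        return match != null ? match : DEFAULT;
    }

    public static void main(String[] args) {
        System.out.println(resolve("fr-CA,fr;q=0.8,en;q=0.5").toLanguageTag());
        System.out.println(resolve("sv").toLanguageTag());
    }
}
```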
One of the main features of Our Global Travel Shanghai Demo is that it also supports bi-directional display, required for users speaking Arabic or Hebrew.
Figure 12-3 Home pages of Our Global Travel Shanghai Demo in Arabic and Hebrew
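One small building block of bi-directional support is deciding whether a piece of text should be laid out right to left. The sketch below uses the standard java.text.Bidi class for that decision; how the demo actually handles Arabic and Hebrew pages is not shown in the source, so the class name and the idea of returning an HTML dir value are assumptions.

```java
import java.text.Bidi;

/** Hypothetical helper that picks a text direction value for HTML output. */
public class TextDirectionSketch {

    /**
     * Returns "rtl" when the base direction of the text is right to left,
     * otherwise "ltr". The base direction is taken from the first strongly
     * directional character, defaulting to left to right.
     */
    public static String htmlDir(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        return bidi.baseIsLeftToRight() ? "ltr" : "rtl";
    }

    public static void main(String[] args) {
        System.out.println(htmlDir("Flight search"));                 // English text
        System.out.println(htmlDir("\u0645\u0631\u062d\u0628\u0627")); // Arabic text
        System.out.println(htmlDir("\u05e9\u05dc\u05d5\u05dd"));       // Hebrew text
    }
}
```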
Apart from language, locale-sensitive data throughout the working sample Web site conforms to the cultural conventions of the user's locale. A detailed introduction to this data is given in the following sections.
12.1.2 Multilingual main functions Our Global Travel Shanghai Demo provides many traveling services to users who are interested in taking a trip to Shanghai. Here we introduce two main functions to show how convenient and enjoyable it can be for a multilingual Web site to serve and entertain its users.
Flight search and airline ticket booking

This is one of the primary services provided by Our Global Travel Shanghai Demo. Clicking the Flight link on the menu bar displays the Flight Search page. In English, this page looks like Figure 12-4.
Figure 12-4 Flight search page of Our Global Travel Shanghai Demo in English
To a French-speaking user, the same page looks like Figure 12-5:
Figure 12-5 Flight search page of Our Global Travel Shanghai Demo in French
Users need to specify their flight search criteria, including trip type, destination, departing and returning date, and seating class. If there are any flights meeting the user's requirements, the search result page will be displayed. Figure 12-6 on page 69 shows part of a flight search result page in English:
Figure 12-6 Flight search result page of Our Global Travel Shanghai Demo in English
To an Egyptian user, the flight search result page will look like Figure 12-7.
Figure 12-7 Flight search result page of Our Global Travel Shanghai Demo in Arabic
If a user needs to know more about the airline company or agency, he or she can click the name of that company or agency and a separate window will open containing information about the organization. The user can select an itinerary and follow the booking steps in this working example. The result of that booking will then be displayed.
Hotel search and room reservation Another main function provided by Our Global Travel Shanghai Demo is hotel search and room reservation. Users can access this function by clicking the Hotel link on the menu bar of any Web page. If the language selected is English, the hotel search page shown in Figure 12-8 appears:
Figure 12-8 Hotel search page of Our Global Travel Shanghai Demo in English
The same page for a German user looks like Figure 12-9 on page 71.
Figure 12-9 Hotel search page of Our Global Travel Shanghai Demo in German
Users fill in their search criteria, such as check-in and check-out dates, hotel rating, room type, and price range. If there are hotels meeting the user's requirements, the search result page will be displayed. Figure 12-10 on page 72 represents part of a Hotel Search Result page shown to an English-speaking user:
Figure 12-10 Hotel search result page of Our Global Travel Shanghai Demo in English
If a Chinese user is visiting this Web site, the same page can be displayed in Simplified Chinese, as shown in Figure 12-11:
Figure 12-11 Hotel search result page of Our Global Travel Shanghai Demo in Simplified Chinese
If a user needs to know more about the hotel or the hotel agency, he or she can click the name of that hotel or travel agency and a separate window will open to show details about the selected organization. The user can then select any listed hotel and follow the reservation steps in the working example, after which a reservation confirmation will be displayed.
12.2 Multilingual Web Services Our Global Travel Shanghai Demo runs in a mature Web Services environment.
What are Web Services?

Until now, what a user could do with e-business on the Web was some or all of the following:

Search for the Web site addresses of business providers via one or more search engines.
Browse the providers' Web sites one by one to find the most suitable service provider.
Manually initiate a business transaction (such as online shopping) through the Web site, or by e-mail, phone calls, or personal visits if the site does not provide direct business-to-customer services.

As you might expect, these steps might be repeated many times before the most suitable business provider is found and the business transaction can be made successfully.

Web Services raise e-business to a new level of interoperability by providing functions (or sets of functions) that are packaged together and presented as a single entity that is made available (“published”) on the Internet. These functions are the building blocks for creating open, distributed systems that provide end users and enterprises with a methodology to quickly deploy their computing assets worldwide in a very cost-effective manner. And these functions can be grouped and nested to provide comprehensive functionality for specific objectives such as travel services, where separate layers of the high-level Web Service handle lower-level sub-Web Services such as car rentals.

A Web Service is an interface whose service description specifies a set of operations using a standard form of XML notation to provide the low-level details required to invoke the service over the Web. Web Services, then, provide the basis for a Web-centric programming model that is loosely coupled and standardized to be as “open” as the World Wide Web, and that allows for rapid application integration within the enterprise as well as collaboratively among enterprises and their customers.
The three legs of this architectural stool (“service provider,” “service requestor,” and “service registry”) are the formal supports for this programming model that allows these “services” to be specified, registered to the Web, and invoked. Both the programming language and computing platform neutralities of this dynamic programming model allow new e-applications to be built that can integrate the new programming logic of Web Services with programs executing on multiple computers, including the ability to leverage existing legacy programming.

The Universal Description, Discovery, and Integration (UDDI) Business Registry provides a place for Web Services to be published and found over the Web. UDDI defines open, platform-neutral standards that let all participants share information in a global business registry, discover each other's services, and define how they can interact with each other across the Web. Web services are published in the UDDI Registry using the Web Services Description Language (WSDL). They interact with each other through the Simple Object Access Protocol (SOAP).
Architecture of Web Services Web Services architecture is based on Service Oriented Architecture (SOA), as illustrated in Figure 12-12 on page 74.
Figure 12-12 Service Oriented Architecture (SOA)
SOA emphasizes how service components are described, published, found, and invoked to support dynamic and automated e-business. There are three basic roles in the architecture:

The service requestor searches the service registry for specific services and, if they are found, invokes a service directly through dynamic binding. In this working example, the Our Global Travel Shanghai Demo Web site, the airline agencies, and the hotel agencies all act as service requestors.
The service registry registers and categorizes published service providers and offers a search function to service requestors. In our working example, this refers to the IBM UDDI Business Registry.
The service provider publishes the specifications of its services in a standard way and responds to requests for using its services. In our working example, this refers to all of the airline companies, airline agencies, hotels, and hotel agencies that register their services at the service registry.

Often a service provider in one Web service can act as a service requestor in another; airline and hotel agencies are examples. These three roles interact with one another using three basic operations: publish, find, and bind.
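The publish, find, and bind operations can be modeled in a few lines. The sketch below is an in-memory stand-in, not UDDI or the UDDI4J API: the Registry class, the Service interface, the category names, and the sample request are all invented for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory model of the three SOA roles and their operations. */
public class SoaRolesSketch {

    /** A service as seen by requestors; the single operation is illustrative. */
    public interface Service {
        String invoke(String request);
    }

    /** Stand-in for a service registry such as a UDDI registry. */
    public static class Registry {
        private final Map<String, List<Service>> byCategory = new HashMap<>();

        /** Publish: a provider registers its service under a category. */
        public void publish(String category, Service service) {
            byCategory.computeIfAbsent(category, k -> new ArrayList<>()).add(service);
        }

        /** Find: a requestor looks up all services in a category. */
        public List<Service> find(String category) {
            return byCategory.getOrDefault(category, new ArrayList<>());
        }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();

        // A hotel agency acts as a provider and publishes its service.
        registry.publish("hotel", request -> "rooms available for " + request);

        // The travel site acts as a requestor: find, then bind (invoke).
        for (Service service : registry.find("hotel")) {
            System.out.println(service.invoke("check-in 2002-12-24"));
        }
    }
}
```

Because requestors depend only on the registry and the service interface, a provider can be added or replaced without changing the requestor, which is the loose coupling the text describes.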
Benefits of Web Services

Web Services is a rapidly emerging technology that extends the power and reach of dynamic global e-business with automated process chains that can run across integrated business patterns and system environments. In this way, Web Services breathes new life into the Web by enabling it to support meaningful collaboration beyond its role as a medium for static or dynamic text, images, or scripts. Now it can also transport service interfaces and service-invoking messages.
Web Services also free e-business from the browser. On the one hand, business providers do not have to restrict their information to what can be displayed on a browser. Instead, with the help of WSDL and SOAP they can describe their functionality in a more direct and dynamic way for the user to invoke using a standardized and formal open interface. On the other hand, business requestors do not have to view the service with a browser. They can invoke the services directly by sending the proper SOAP message to the business provider over the Web. For example, unlike many other Web sites providing traveling services, Our Global Travel Shanghai Demo does not have to build tight connections with service providers such as hotel and airline ticket agencies, but rather finds them from a business registry and invokes their services at runtime. In turn, these agencies will also search for the existence of hotels or airline companies from a business registry (which can be the same one) and invoke Web Services to retrieve their details or make reservations. The Web Services environment exposes all registered service providers to potential customers and business partners, thus offering more choices to service requestors while simultaneously providing more business opportunities to service providers. In this way, B2C and B2B e-businesses are chained together and all parties along the chain can enjoy optimal benefits.
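To give a feel for what "sending the proper SOAP message" involves, the sketch below assembles a SOAP 1.1 request envelope as a string. Only the envelope namespace is standard; the searchHotels operation, its urn:example:travel namespace, and its parameters are hypothetical, and a real client would normally rely on a SOAP toolkit rather than string concatenation.

```java
/** Hypothetical builder for a SOAP 1.1 request envelope. */
public class SoapRequestSketch {

    /** Namespace of the SOAP 1.1 envelope. */
    static final String SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/";

    /** Escapes the characters that are not allowed in XML element text. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Builds a request for an invented searchHotels operation. */
    public static String hotelSearchEnvelope(String city, String roomType) {
        return "<soap:Envelope xmlns:soap=\"" + SOAP_ENV + "\">"
             + "<soap:Body>"
             + "<searchHotels xmlns=\"urn:example:travel\">"
             + "<city>" + escape(city) + "</city>"
             + "<roomType>" + escape(roomType) + "</roomType>"
             + "</searchHotels>"
             + "</soap:Body>"
             + "</soap:Envelope>";
    }

    public static void main(String[] args) {
        System.out.println(hotelSearchEnvelope("Shanghai", "double"));
    }
}
```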
Multilingual capabilities of Web Services

The multilingual capabilities of a Web Services architecture are determined primarily by the service requestor and the service provider. The service requestor provides the user interface to end users, and its multilingual capability determines whether it can accept and display multilingual data to them. The service provider holds the actual resources or provides services, and its multilingual capability determines whether it can hold data in multiple languages and provide data to service requestors in their preferred languages.

In our working example, the service requestor (the Web site of Our Global Travel Shanghai Demo) can accept user requirements and display search results in the users' preferred languages. The service interfaces of the service providers (airline companies, hotels, and agencies) accept locale information from service requestors as a parameter, and any returned textual data will be in the language complying with the specified locale.

If the service registry supports multilingual data, that adds to the multilingual capabilities of Web Services. The IBM UDDI Business Registry used in this working example can accept and display in its Web site the name and description of registered business entities in multiple languages. The following illustration, captured from the UDDI Web site, shows the detailed business information of a fictitious airline company (note that the description is in 12 languages):
Figure 12-13 A screen capture of IBM UDDI Business Registry with business description in 12 languages
In our working example, the “Description” section for detailed information about airline companies, hotels, or agencies is taken directly from the UDDI Registry. Figure 12-14 contains details about that airline (Milky Way, Ltd) as displayed to users of Our Global Travel Shanghai Demo.
Figure 12-14 An airline company details page of Our Global Travel Shanghai Demo
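The idea that a service provider accepts locale information as a parameter and returns text in the matching language can be sketched as follows. This is a simplified illustration, not the demo's service interface: the class and method names, the fallback to English, and the sample description texts are assumptions; only the company name comes from the example above.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical provider that returns a description in the requested language. */
public class LocalizedProviderSketch {
    private final Map<String, String> descriptions = new HashMap<>();

    /** Stores the description for one language tag, for example "en" or "de". */
    public void addDescription(String languageTag, String text) {
        descriptions.put(languageTag, text);
    }

    /**
     * Returns the description for the locale, trying the full tag first,
     * then the language alone, then English as an assumed default.
     */
    public String describe(Locale locale) {
        String text = descriptions.get(locale.toLanguageTag());
        if (text == null) {
            text = descriptions.get(locale.getLanguage());
        }
        if (text == null) {
            text = descriptions.get("en");
        }
        return text;
    }

    public static void main(String[] args) {
        LocalizedProviderSketch airline = new LocalizedProviderSketch();
        airline.addDescription("en", "Milky Way, Ltd is an airline company.");
        airline.addDescription("de", "Milky Way, Ltd ist eine Fluggesellschaft.");

        System.out.println(airline.describe(Locale.GERMANY)); // German text
        System.out.println(airline.describe(Locale.JAPAN));   // falls back to English
    }
}
```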
Chapter 13.
Environment

This chapter deals with two aspects of the Our Global Travel Shanghai Demo environment:

Architecture
Product globalization capabilities
13.1 Architecture

Based on the structure of Web Services, the architecture of Our Global Travel Shanghai Demo is composed of three parts: Requestor, Provider, and Registry, as shown in Figure 13-1.
Figure 13-1 Extended enterprise environment of Our Global Travel Shanghai Demo
13.1.1 Development environment This consists of the following:
Hardware
Software
WebSphere execution environment
VisualAge for Java development environment
WebSphere Studio development environment
WebSphere Studio and VisualAge for Java interactions
Web Services development environment
Hardware PCs with Pentium III 750 MHz processors and 512 MB RAM.
Software

Windows 2000 Server
IBM HTTP Server V1.3.19
DB2 Universal Database V7.2 Enterprise Edition on Windows 2000
WebSphere Application Server Advanced Edition V4.0
WebSphere Application Server Advanced Edition Single Server V4.0
WebSphere Studio Advanced Edition V4.0 for Multiplatforms
VisualAge for Java Enterprise Edition V4.0
Web Services ToolKit V2.4
The development environment covers five functional areas:
The WebSphere execution environment
The VisualAge for Java development environment
The WebSphere Studio development environment
WebSphere Studio and VisualAge for Java interactions
The Web Services development environment
WebSphere execution environment

The WebSphere execution environment includes:

WebSphere Application Server: Provides the foundation for building Web sites and serves as the cornerstone for IBM Web application offerings and services.
IBM HTTP Server
DB2
Classes and HTML/JSP files: Includes the servlets, JavaBeans, EJBs, and HTML/JSP files that perform locale exchange and business logic.
Enterprise data: Supports connections among a variety of enterprise data sources.
VisualAge for Java development environment
VisualAge for Java is an integrated visual environment that supports the complete cycle of Java program development and provides excellent support for Java code editing and debugging as well as JSP debugging. With the help of the Visual Composition Editor and SmartGuides, creating new applets, packages, and classes is quick and convenient.
WebSphere Studio development environment
WebSphere Studio is a suite of tools used collaboratively to create, assemble, publish, and maintain dynamic interactive Web applications. It contains several wizards and built-in editors that help users build Web sites easily. WebSphere Studio also provides support for team development and serves as a version control tool.
WebSphere Studio and VisualAge for Java interactions
WebSphere Studio 4.0 provides a “facilitate” function for integrating with VisualAge for Java. Using this function you can develop servlets and JavaBeans in VisualAge and then bring them back into Studio. You can also use Studio wizards to create servlets and JavaBeans and then transfer them to VisualAge for further maintenance.
Web Services development environment
In our working sample, Web Services are built primarily with the help of the Web Services ToolKit (WSTK) V2.4, which provides the following components and functions:
A set of client runtime executables for application access:
– A UDDI4J API allowing applications to perform Save, Delete, Find, and Get operations against a UDDI registry (a private or public UDDI registry that resides on the Internet)
– A service registry API allowing applications to perform Publish, Unpublish, and Find operations against a UDDI registry
– A service proxy API allowing applications to access Web Services via the SOAP protocol
Specifications for WSDL 1.1, WSFL (Web Services Flow Language), and HTTPR (reliable HTTP)
A set of design-time Web Services tools:
– A WSDL Generator tool for assisting in encapsulating legacy code inside Web Services. This tool creates WSDL documents and SOAP deployment descriptors from “legacy” JavaBeans, Java classes, EJBs, and COM objects.
– A Web Services Toolkit Configuration tool for facilitating setup and customization of the Web Services Toolkit
– A service proxy generator tool (proxygen) that creates a client-side interface for connecting to Web Services
– A service implementation template generator tool (servicegen) that creates a server-side interface template so that client programs can access Web Services
Apache SOAP code base for SOAP V2.2 and the Apache AXIS alpha 1, which is the follow-on project to the Apache SOAP project (also known as SOAP 3.0)
SOAP COM support allowing Microsoft COM objects to be accessed by SOAP services
A preview of WSDL4J (WSDL for Java), a set of Java classes useful in working programmatically with WSDL documents
13.1.2 Runtime environment
By Web Services architectural design, our working example runs in three separate environments: requestor, provider, and registry.
Requestor
One Netfinity Server with 1 GB RAM and the following software:
Red Hat Linux 7.1
DB2 Universal Database Enterprise Edition V7.2
WebSphere Application Server Advanced Edition V4.0
WebSphere Personalization Server V4.0.0
IBM HTTP Server V1.3.19
One IBM 300PL Personal Computer with the following software:
Windows 2000 Server
WebSphere Application Server Advanced Edition Single Server V4.0
WebSphere Translation Server V1.0
Provider
One Netfinity Server with 1 GB RAM and the following software:
Red Hat Linux 7.1
DB2 Universal Database Enterprise Edition V7.2
WebSphere Application Server Advanced Edition V4.0
Registry
IBM 300PL Personal Computer with the following software:
Windows 2000 Server
DB2 Universal Database Enterprise Edition V7.2
WebSphere Application Server Advanced Edition Single Server V4.0
Private UDDI Registry Preview
13.2 Product globalization capabilities
These consist of the following:
IBM WebSphere Application Server Advanced Edition V4.0
IBM DB2 Universal Database
13.2.1 IBM WebSphere Application Server Advanced Edition V4.0
In this section we focus on the globalization capabilities of WebSphere Application Server Advanced Edition V4.0 for Windows 2000/AIX/Linux.
Note: WebSphere Application Server Advanced Edition V4.0 needs configuration before it can handle Unicode data. Refer to Appendix A.3, “IBM WebSphere Application Server V4.0” on page 144 for details.
HTML, Servlet, and JSP
WebSphere Application Server accepts Unicode-encoded data in HTML, servlet, and JSP files. The end user can submit multilingual data in Unicode, and the JSP or servlet can handle this data in different locales without data loss.
Figure 13-2 HTML Unicode example
Figure 13-3 Servlet Unicode example
Figure 13-4 JSP Unicode example
XML support
With XML support, WebSphere Application Server can accept data in any language, in either a Unicode or a non-Unicode encoding.
Figure 13-5 Unicode encoding in XML file
SOAP In Web Services, Simple Object Access Protocol (SOAP) is a standard for reliably transporting e-business messages encoded in XML between businesses over the Web. The SOAP function in WebSphere Application Server can handle Unicode data without data loss, and thus makes it possible for multilingual data to be transported through Web Services in sound condition.
UDDI4J The UDDI API serves as a channel for publishing and searching in Web Services' UDDI centers. The WebSphere Application Server UDDI4J can handle Unicode in different locales without data loss, thereby ensuring that multilingual data used by Web Services can be sent to, retrieved from, and processed at UDDI centers in different locales.
Figure 13-6 UDDI publish example
Figure 13-7 UDDI find example
13.2.2 IBM DB2 Universal Database
In this section we look at the globalization capabilities provided by IBM DB2 Universal Database V7.1 (or later). Before a database can store UTF-8 data, you must specify the code set for this database at creation:
CREATE DATABASE database USING CODESET UTF-8 TERRITORY territory
Note: Here “territory” refers to the correct territory code.
Following are three major ways to insert and retrieve data in DB2:
Native DB function
DB2 supports Unicode on Windows 2000, Red Hat Linux 6.2, and AIX 4.3.2. You can create a database with Unicode to support all languages.
DB2 Call Level Interface function
The Call Level Interface (CLI) is IBM's callable SQL interface for the DB2 family of database servers. The CLI application sets up a connection to the back-end database. Data sent from the CLI is encoded according to the code page defined in the CLI. If that CLI code page is the same as the code page for the back-end database, no data conversion is needed. Therefore, if both the CLI and database code pages are Unicode, data can be inserted into and retrieved from the database via CLI without data loss.
JDBC function
DB2 supports JDBC. On Windows 2000, Red Hat Linux 6.2, and AIX 4.3.2 systems with DB2 JDBC functionality, Unicode data can be inserted into or retrieved from a Unicode database without data loss.
Chapter 14. A development methodology for globalized applications
Logically, the task of developing a general Web application can be broken up into separate functions: a project manager, design architects, a logic group, a content group, and design artists. [1] For a multilingual Web application, you will also need a translation coordinator, especially when several language versions will be launched at the same time.
The project manager is responsible for monitoring the work progress, managing the schedule, and coordinating resources.
The design architects are responsible for designing the architecture of the Web site and the JSP modules. The design architects should make sure that the design conforms to the globalization architecture.
The logic group is responsible for database customization and servlet development, which enable the back-end functions and globalization features of the Web site and ensure that the application implements the globalization architecture.
The content group is responsible for preparing data and developing Web pages using technologies such as JSP and JavaScript. The content group prepares data in the source language version. Once this is done, the content is translated into other languages by local Translation Service Centers.
The design artists are responsible for the style of Web pages and the arrangement of images, tables, and texts.
The translation coordinator is responsible for the coordination of the content translation performed by local Translation Service Centers. The tasks of the translation coordinator include:
– Organizing the translation process
– Distributing and collecting localization packs
– Raising application-specific translation requirements
[1] In practice, some of these roles might be occupied by the same person or distributed differently among the team.
Figure 14-1 A typical multilingual application development team
Development process
Phase 1: Architecture design
In the architectural design phase, the design architects decide the Web site's structure. Based on these results, a scenario is created to illustrate the main function and business flow of the Web site. The static pages of the Web application should be developed during this phase.
Phase 2: Prototype development
The objective of prototype development is to overcome challenges you might meet during the development of a multilingual Web application such as Our Global Travel Shanghai Demo, and in so doing to manage risk and resources effectively. In the Our Global Travel Shanghai Demo prototype, we implemented an airline search function. In this phase, the logic group is required to work closely with the design architects to define the key functions of each prototype module. The detailed design and development of each of these functions is then assigned to individual members of the logic group. The content group also works with the design architects to define the trading flow and finish the catalog data collection, page design, and data translation work required by the prototype. Meanwhile, the design artists work with the content group to determine the page style and design the artwork for the prototype.
Phase 3: Web application development
After a prototype has been built, the logic group focuses on Web application program development. This involves the implementation of business functions, the realization of globalization features, and the integration of separate modules. At the same time, the content group is responsible for Web content tuning, localization pack development, and JSP and servlet development. When Web content tuning is complete, the design artists finalize the Web site's style and begin to decorate the pages.
At this stage, the translation coordinator begins to work with the worldwide Translation Service Centers (TSCs) to translate textual data required by the Web application into all supported languages. The customized database, servlets, translated catalog data, localization packs, and Web pages are then combined to build the Web application.
Throughout this entire development process, the project manager should closely monitor work progress in order to effectively manage the schedule and resources.
Chapter 15. Design and development
This chapter covers the following:
Single Executable
Unicode support
Locale model
Localization pack
Machine translation
Global Business Object
Localization
15.1 Single Executable As mentioned in Chapter 4, “Single Executable” on page 17, the foundation of a well-structured globalized application is the Single Executable, a program free of cultural information. The application programming code in our working example strictly follows the concept of the Single Executable.
Structure of the application program
In our working example, there is only one set of binary code running in the background of the Our Global Travel Shanghai Demo Web site, and this code includes all of the JavaBeans, JSP pages, and business objects needed to run it in any locale, with text in any language. Figure 15-1 illustrates the application program structure.
Figure 15-1 Application program structure
Core Application Logic Bean The Core Application Logic Bean executes the primary business logic. This includes collecting and organizing information for the Web, searching the IBM UDDI Business Registry for agencies, invoking agency service interfaces, and storing and retrieving data from the database. This bean is devoid of any language-dependent data or locale-sensitive logic.
Localization Pack Manager Bean
The Localization Pack Manager Bean serves as the bridge between JSP code and localization packs. Web application content, including all data displayed in Web pages, is organized and stored in separate resource bundles called localization packs (one pack for each language supported in our working example). Because all localization packs are in XML file format, the Localization Pack Manager Bean invokes the IBM XML parser to parse localization pack content. The JSP code is responsible for invoking the Localization Pack Manager Bean at runtime to retrieve Web content from the relevant localization pack according to the current language settings. Example 15-1 contains JSP code that displays the title Welcome to Our Global Travel Shanghai Demo on the Our Global Travel Shanghai Demo main menu bar:
Figure 15-2 Menu bar in English
Figure 15-3 Menu bar in Arabic Example 15-1 Sample JSP codes for the menu bar
Note: Cells on the menu bar have fixed relative widths, so any text that appears within a cell wraps and the cell expands automatically to accommodate any extra space needed. If cell widths were chosen in any other fashion (such as with the NOWRAP option), this would not work.
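The lookup performed by the Localization Pack Manager Bean can be sketched with the XML parser that ships with the JDK. This is only an illustration under stated assumptions: the class name LocalizationPackSketch and the pack layout (a section element such as UserInfo containing one element per label) are hypothetical, and the JDK parser stands in for the IBM XML parser used in our working example. Only the method name getStringByTagName is taken from the sample code in this chapter.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical stand-in for the Localization Pack Manager Bean.
class LocalizationPackSketch {
    private Document pack;

    // Parses the XML text of one localization pack (layout is an assumption).
    LocalizationPackSketch(String xml) {
        try {
            pack = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Unreadable localization pack", e);
        }
    }

    // Returns the text stored under <section><tag>...</tag></section>, or null if absent.
    String getStringByTagName(String section, String tag) {
        NodeList sections = pack.getElementsByTagName(section);
        if (sections.getLength() == 0) {
            return null;
        }
        NodeList entries = ((Element) sections.item(0)).getElementsByTagName(tag);
        return entries.getLength() == 0 ? null : entries.item(0).getTextContent();
    }

    public static void main(String[] args) {
        // A tiny French pack; \u00E9 is the letter e with an acute accent.
        String frenchPack = "<Pack><UserInfo>"
                + "<FirstName>Pr\u00E9nom</FirstName><LastName>Nom</LastName>"
                + "</UserInfo></Pack>";
        LocalizationPackSketch parser = new LocalizationPackSketch(frenchPack);
        System.out.println(parser.getStringByTagName("UserInfo", "LastName"));
    }
}
```

Because the page code only names a section and a tag, switching languages means loading a different pack, not changing the page code.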
Locale Model Bean
The Locale Model Bean invokes the ICU4J (International Components for Unicode for Java) API to provide locale-sensitive computing for specific needs in our working example. ICU4J is a Java library that provides robust and full-featured Unicode support on a wide variety of platforms. This library includes support for locale-sensitive computing. From it, the following categories were used to provide culture-related information in Our Global Travel Shanghai Demo:
Calendar
Currency
Date and time formatting
Dictionary sorting
Message formatting
Number formatting
Time zones
For example, in the Hotel Booking Details page of our working example, the room price is displayed in the currency format retrieved through the Locale Model Bean that invokes ICU4J. Example 15-2 contains a code snippet that is part of the JSP for displaying room price.
Example 15-2 Sample JSP code for retrieving formatted room price
...
RoomInfo roomInfo = null;
float roomPrice = 0;
roomInfo = (RoomInfo)hotelSearchResult.getRoomInfoList().elementAt(roomi);
roomPrice = roomInfo.getPrice();
String bigRoomPrice = localeModelBean.getBigCurrencyNumber(new BigDecimal(roomPrice), localeStr);
...
With this code, the room price is displayed in the currency format conforming to the current user locale. For example, a room price of CNY 10098 displayed in the en_US user locale will be CNY 10,098.00, while in the fr_FR locale it will be CNY 10 098,00.
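The locale-dependent grouping and decimal separators described above can be reproduced with the JDK alone. The sketch below is not the Locale Model Bean: it swaps ICU4J for java.text.NumberFormat, and the names CurrencyFormatSketch and formatAmount are hypothetical.

```java
import java.math.BigDecimal;
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical helper; uses the JDK's NumberFormat in place of ICU4J.
class CurrencyFormatSketch {
    // Formats an amount with two decimals using the locale's separators,
    // prefixed by the ISO currency code of the price (not of the locale).
    static String formatAmount(BigDecimal amount, String currencyCode, Locale locale) {
        NumberFormat format = NumberFormat.getNumberInstance(locale);
        format.setMinimumFractionDigits(2);
        format.setMaximumFractionDigits(2);
        return currencyCode + " " + format.format(amount);
    }

    public static void main(String[] args) {
        BigDecimal roomPrice = new BigDecimal("10098");
        // en_US groups with a comma and uses a decimal point.
        System.out.println(formatAmount(roomPrice, "CNY", Locale.US));
        // fr_FR uses a decimal comma; the exact space character used for
        // grouping depends on the locale data of the Java runtime.
        System.out.println(formatAmount(roomPrice, "CNY", Locale.FRANCE));
    }
}
```

The sketch builds the amount from a string because converting a float to BigDecimal can introduce binary rounding artifacts in prices.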
Format Bean
The main function of the Format Bean is to invoke the Global Business Object to get culture-related information such as the order of different elements in people's names, associated dates, and addresses, and then to sequence relevant data and wrap it in HTML tags so that it can be displayed directly on the Web page and in an order conforming to cultural conventions. The Global Business Object connects to data sources in properties files containing the mapping between locales and their corresponding information such as name and address sequences. Example 15-3 contains an excerpt from the MyAccountInfo.jsp that displays user profiles. This piece of code puts into the formatBean object the values for the user's title, first name, middle name (if any), and last name, all of which have been retrieved from the database.
Example 15-3 Sample JSP code for Personal Information Table
htmlElement = new com.ibm.gcl.git4.travel.user.HtmlElement();
htmlElement.setLabelName(parser.getStringByTagName("UserInfo", "Title"));
htmlElement.setTagName("title");
htmlElement.setElementValue(strTitle);
formatBean.setTitle(htmlElement);
htmlElement = new com.ibm.gcl.git4.travel.user.HtmlElement();
htmlElement.setLabelName(parser.getStringByTagName("UserInfo", "FirstName"));
htmlElement.setTagName("firstName");
htmlElement.setElementValue(strFirstName);
formatBean.setFirstName(htmlElement);
htmlElement = new com.ibm.gcl.git4.travel.user.HtmlElement();
htmlElement.setLabelName(parser.getStringByTagName("UserInfo", "MiddleName"));
htmlElement.setTagName("middleName");
htmlElement.setElementValue(strMiddleName);
formatBean.setMidName(htmlElement);
htmlElement = new com.ibm.gcl.git4.travel.user.HtmlElement();
htmlElement.setLabelName(parser.getStringByTagName("UserInfo", "LastName"));
htmlElement.setTagName("lastName");
htmlElement.setElementValue(strLastName);
formatBean.setLastName(htmlElement);
...
The last line displays personal information on the My Account Info page, shown in Figure 15-4.
Figure 15-4 Personal information page
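The sequencing step performed by the Format Bean can be illustrated as follows. This is a minimal sketch, not the Global Business Object: the class name NameOrderSketch is hypothetical, the mapping is held in a map instead of properties files, and the two name orders shown are illustrative assumptions only.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration of locale-driven ordering of name elements.
class NameOrderSketch {
    // Assumed mapping from locale to the order of name parts.
    private static final Map<String, String[]> NAME_ORDER = new HashMap<>();
    static {
        NAME_ORDER.put("en_US", new String[] {"title", "firstName", "middleName", "lastName"});
        NAME_ORDER.put("zh_CN", new String[] {"lastName", "firstName", "title"});
    }

    // Joins the available name parts in the order configured for the locale,
    // falling back to the en_US order when the locale has no entry.
    static String formatName(Map<String, String> parts, Locale locale) {
        String[] order = NAME_ORDER.getOrDefault(locale.toString(), NAME_ORDER.get("en_US"));
        StringBuilder result = new StringBuilder();
        for (String key : order) {
            String value = parts.get(key);
            if (value == null || value.isEmpty()) {
                continue; // skip parts the user did not supply, such as a middle name
            }
            if (result.length() > 0) {
                result.append(' ');
            }
            result.append(value);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        Map<String, String> parts = new HashMap<>();
        parts.put("title", "Ms.");
        parts.put("firstName", "Mei");
        parts.put("lastName", "Wang");
        System.out.println(formatName(parts, Locale.US));
        System.out.println(formatName(parts, Locale.SIMPLIFIED_CHINESE));
    }
}
```

Keeping the order in data means a new locale needs a new mapping entry, not new application code.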
Based on this programming model, although there is only one set of application programming code running in the background of Our Global Travel Shanghai Demo, it is sufficient to handle all supported languages.
15.2 Unicode support The Our Global Travel Shanghai Demo Web site must be able to accept user input and display Web content in 12 languages, and sometimes data in different languages appears on the same page. Users can input data in their own languages through their browsers. Figure 15-5 represents part of a user profile displayed in English. Since the user registered her name in Chinese, her first and last names are displayed in Chinese, while other data is in English.
Figure 15-5 An Our Global Travel Shanghai Demo page with multilingual data
Unicode is used in this working example to accept, process, and display textual data without data corruption. The data in Figure 15-5 is retrieved from two data sources: the localization pack and the user database. The user database stores user information, including the user's title, first name, middle name (if any), last name, and birthday, simply because this data does not need translation.
All data that does need translating [1] is stored in the localization packs of the various languages. When the JSP code was invoked to display Figure 15-5 on page 95, it retrieved data from both the user database and the localization pack of the current language. Because both the DB2 databases and the XML localization packs support Unicode, data is encoded in UTF-8 and then stored in the two data sources. After retrieval by the JSP code via the relevant JavaBeans, this data can be displayed in a Web page in the same way it is stored, without data loss, even when user data for different fields (cells) is in different languages.
Note: Because the encoding of Web pages is required to be UTF-8, the character encoding of both HTML and JSP files should be set to UTF-8 (see Example 15-4).
Example 15-4 Specifying UTF-8 character encoding
<!-- The following code specifies UTF-8 as the character encoding for the document. -->
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<%-- The following code is for a JSP file to set the content type of the response being sent to the client. The content type may include the type of character encoding used, and here UTF-8 is specified. --%>
<% response.setContentType("text/html; charset=UTF-8"); %>
As stated in Chapter 5, “Unicode support” on page 21, Unicode is the universal character-encoding scheme for written characters and text. [2] Unicode uniquely identifies any character of any language. Therefore, words in different languages can be stored in the same database, even within the same cell, and displayed simultaneously on the same Web page without distorting any of them.
Our Global Travel Shanghai Demo requires that your browser be Internet Explorer 5.5 (or above) or Netscape 6.2 (or above). Both browsers support Unicode and can accept and display Unicode without data corruption.
International Components for Unicode for Java (ICU4J) is used to support the application program by providing locale-sensitive computing. ICU includes support for working with Unicode strings (collation, iteration, character classification) as well as efficient conversions to and from a very wide set of other encodings. ICU also includes Unicode-based locale support for standard locale-related features, such as formatting and parsing dates, times, currencies, and other numeric formats.
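The claim that text in several languages survives storage and retrieval when UTF-8 is used throughout can be checked with a small round trip. This is a minimal sketch using only the JDK; the class and method names are hypothetical, and the byte array merely stands in for a database column or an XML file.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demonstration of a lossless UTF-8 round trip.
class Utf8RoundTripSketch {
    // Encodes text as it would be written to a UTF-8 data source.
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decodes bytes read back from a UTF-8 data source.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // An English label followed by a Chinese family name (U+738B).
        String mixed = "Last name: \u738B";
        byte[] stored = encode(mixed);
        System.out.println("Bytes stored: " + stored.length);
        System.out.println("Round trip intact: " + mixed.equals(decode(stored)));
    }
}
```

Data loss typically appears only when one step in the chain decodes with a different character encoding than the one used to encode.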
15.3 Locale model As we mentioned in Chapter 12, “Overview” on page 63, our entire Web site can be displayed in all 12 supported languages. Just as globalization is not restricted to the process and result of translation, language itself cannot satisfy all the needs of our users. However, it can make them more comfortable with Our Global Travel Shanghai Demo Web site if the content displayed is in their own languages and conforms to their own cultural conventions. For this reason, how to categorize languages and cultural conventions into relevant locales is as essential to the organization of content as to our source code.
[1] For example, the user's gender and birthplace, together with labels such as “first name” and “last name.”
[2] The Unicode Standard 3.0, page 1
15.3.1 Structure of locale model
Our Global Travel Shanghai Demo supports 12 sets of locales following the locale model specified in J2SE documentation. [1] Each locale applies a two-letter ISO language code to indicate language and a two-letter ISO country/region code to indicate country/region, with an underscore (“_”) in between. The 12 locale sets supported in our working example are:
Each locale set consists of one language and one country/region. For example, fr_FR accounts for both the French language and the nation of France. In addition to language and country/region codes, the locale model in our working example also applies a variant code for cases where the euro currency is needed for automatic data processing and output. Because our working example applies ICU4J V1.8 for locale-sensitive computing, the EURO variant must be added to the locales for the European countries that use the euro, thereby creating the following new locales:
fr_FR_EURO
de_DE_EURO
it_IT_EURO
es_ES_EURO
For example, if we have a Single Executable program that calculates an airline ticket price, it will return that price based on the user locale's currency code. If the fr_FR locale is passed to this program, you will get a ticket price in the format FRF x xxx,xx. However, if the fr_FR_EURO locale is passed, the result will be EUR x xxx,xx. [2]
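The language, country/region, and variant parts of a locale identifier map directly onto java.util.Locale. The sketch below shows one way to split an identifier such as fr_FR_EURO; the class name LocaleVariantSketch and the helper parseLocale are hypothetical and are not taken from our working example.

```java
import java.util.Locale;

// Hypothetical helper that turns an identifier such as "fr_FR_EURO" into a Locale.
class LocaleVariantSketch {
    // Splits on the underscore into at most three parts:
    // language, country/region, and an optional variant.
    static Locale parseLocale(String identifier) {
        String[] parts = identifier.split("_", 3);
        String language = parts[0];
        String country = parts.length > 1 ? parts[1] : "";
        String variant = parts.length > 2 ? parts[2] : "";
        return new Locale(language, country, variant);
    }

    public static void main(String[] args) {
        Locale plain = parseLocale("fr_FR");
        Locale euro = parseLocale("fr_FR_EURO");
        System.out.println(plain.getLanguage() + " / " + plain.getCountry());
        System.out.println(euro + " has variant " + euro.getVariant());
    }
}
```

Which currency a given runtime associates with fr_FR depends on the version of its locale data, so the sketch deliberately stops at constructing the locale.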
15.3.2 Identification of user locale In our working example, we apply two ways of identifying user locale.
Automatic selection from user agent setting
When a user accesses the Our Global Travel Shanghai Demo Web site via a user agent, the site will retrieve the user agent's top language setting, and use it as the initial locale for Our Global Travel Shanghai Demo. [3] For example, if the top language setting of the user agent is en-US, then the user will view the home page of Our Global Travel Shanghai Demo in American English.
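The automatic selection described above amounts to matching the user agent's language preferences against the locales the site supports. The sketch below does this with the JDK's Locale.LanguageRange and Locale.lookup (available since Java 8), not with the servlet API used in our working example. The class name, the short list of supported locales, and the fallback to en_US are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical matcher from an Accept-Language value to a supported locale.
class AcceptLanguageSketch {
    // Assumed subset of supported locales, for illustration only.
    static final List<Locale> SUPPORTED =
            Arrays.asList(Locale.US, Locale.FRANCE, Locale.GERMANY, Locale.SIMPLIFIED_CHINESE);
    // Assumed fallback when nothing matches.
    static final Locale DEFAULT = Locale.US;

    // Returns the supported locale that best matches the header value.
    static Locale initialLocale(String acceptLanguage) {
        if (acceptLanguage == null || acceptLanguage.isEmpty()) {
            return DEFAULT;
        }
        try {
            // parse() orders the ranges by descending weight (the q values).
            List<Locale.LanguageRange> ranges = Locale.LanguageRange.parse(acceptLanguage);
            Locale match = Locale.lookup(ranges, SUPPORTED);
            return match != null ? match : DEFAULT;
        } catch (IllegalArgumentException e) {
            return DEFAULT; // malformed header
        }
    }

    public static void main(String[] args) {
        System.out.println(initialLocale("en-US,en;q=0.8"));
        System.out.println(initialLocale("fr-FR,fr;q=0.9,en;q=0.5"));
        System.out.println(initialLocale("ja"));
    }
}
```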
[1] http://java.sun.com/j2se/1.3/docs/api/java/util/Locale.html
[2] In later versions of ICU4J, the euro is the default currency for the European countries that use it, and the EURO variant is no longer needed.
[3] This arrangement is made under the assumption that the locale setting for a user's agent reflects that user's general language preference for navigating the Internet.
User selection Apart from the Automatic Selection, our working example also allows users to change the Web site language by selecting from the language list on the home page.
Figure 15-6 Language list of Our Global Travel Shanghai Demo
There are 12 languages supported by our working example, each for a single country/region. Each language and country/region pair forms a locale. To make it more convenient for users to find their preferred language, each language name is displayed in the language itself. Once the user has selected a language, the corresponding locale becomes the Web site's locale and consequently results in many locale-sensitive changes within that site. The advantage of this approach is that users can always make their own choices instead of depending entirely on the system “intelligence” to select the user interface language for them.
Note: An e-business application can provide either of the approaches for identifying user locale, according to customer requirements and application capability.
From a development point of view, automatic selection from a user agent setting is convenient because it retrieves that user agent's language setting from the Accept-Language header (refer to the HTTP/1.1 Specification for details) through the request.getLocale() or request.getLocales() methods (refer to Java Servlet Specifications Version 2.3 for more information) that are supported by many middleware applications. For user convenience, it would be better to provide both approaches. However, this might cause inconsistency between the application-level locale setting and the middleware-level (especially servlet) locale setting, so the application should be carefully designed and developed to synchronize usage of the two locale settings. Moreover, Unicode is recommended for data encoding so that code conversion can be avoided during locale switching.
Recently, the Internationalization Service has emerged as a new and comprehensive solution for the automatic identification of caller locale as well as time zone by transparently creating and propagating the internationalization context during business method invocation.
The Internationalization Service can be used in both of the locale identification approaches mentioned above:
In the first approach, which uses request.getLocales() to determine the caller locale, you can use the default configuration of the Internationalization Service.
In the second approach (where application-level protocol is used to determine the caller locale), you must deploy the receiving servlet to run under AMI (Application Managed Internationalization) policy (available from IBM WebSphere Application Server Enterprise Edition Extension, Release 5.0), and programmatically set the invocation internationalization context. [1]
15.3.3 Implementation of locale-sensitive features Locale not only includes language information but also implies various cultural expectations. As we mentioned in 15.1, “Single Executable” on page 92, these cultural features are implemented using the Locale Model Bean, which invokes ICU4J and tailors its results according to the needs of our working example. The following date format example addresses implementation details: Date format changes from locale to locale. Most of us have heard stories about the misunderstandings over dates between Englishmen and Americans because they do not use the same date format. For example, “5/12/00” to an Englishman indicates December 5, 2000, while to an American it means May 12, 2000. In countries such as China, a full date begins with the year, followed by the month, and then by the day of month (and sometimes followed by the day of the week). In Our Global Travel Shanghai Demo, the date is displayed in full text on the home page. If a locale uses a lunar calendar, it will be displayed together with the Gregorian calendar. Otherwise, only the Gregorian calendar will be displayed. For example:
Figure 15-7 Date displayed in English
Figure 15-8 Date displayed in Simplified Chinese
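The difference in date order between locales can be seen with the JDK's own formatters. This is a sketch only: it uses java.time in place of ICU4J, covers the Gregorian calendar alone, and the class name FullDateSketch is hypothetical. The exact wording and punctuation of the output depend on the locale data shipped with the Java runtime.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.Locale;

// Hypothetical helper that writes a date in full text for a given locale.
class FullDateSketch {
    static String fullDate(LocalDate date, Locale locale) {
        return DateTimeFormatter.ofLocalizedDate(FormatStyle.FULL)
                .withLocale(locale)
                .format(date);
    }

    public static void main(String[] args) {
        // The date that "5/12/00" means to a British reader.
        LocalDate date = LocalDate.of(2000, 12, 5);
        System.out.println(fullDate(date, Locale.US));                 // month before day
        System.out.println(fullDate(date, Locale.UK));                 // day before month
        System.out.println(fullDate(date, Locale.SIMPLIFIED_CHINESE)); // year first
    }
}
```

Writing the month out in full, as the home page does, removes the ambiguity of an all-numeric date such as 5/12/00.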
This feature is implemented in the following steps.
JSP code gets the current user locale
In the index.jsp that forms the home page, the current user locale is identified in three steps:
1. See if the user makes any new selection from the Language Selection List. If yes, use it as the user locale and update the language setting in HttpSession. If not, go to step 2.
2. Retrieve the language setting in HttpSession. The language setting in HttpSession is initialized with the preferred language of the user agent and then reset when the user makes a selection from the Language Selection List. Unless this value is null, it is used as the current user locale. Otherwise, go to step 3.
3. Use the user agent's preferred language as the user locale, and store it as the language setting in HttpSession.
Because the Language Selection List only exists on the home page, only steps 2 and 3 are required to identify the current user locale in the JSP code for other Web pages.
Example 15-5 Sample code in index.jsp to identify user locale
<%
/* "select-lang" is the parameter in the index.jsp referring to the user's new selection from the "Language Selection List". "session_lang" is the HttpSession parameter indicating the language setting of the user session. "userLocale" is a java.util.Locale object indicating user locale. */
localeStr = request.getParameter("select-lang");
/* If the user selects any new language from the Language Selection List, the newly selected language will be used to identify his locale, and the language setting of the user session will be updated. */
if (localeStr != null) {
    userLocale = new Locale(getLangCode(localeStr), getRegionCode(localeStr));
    session.setAttribute("session_lang", userLocale);
} else {
    userLocale = (Locale)session.getAttribute("session_lang");
    /* If the language setting of the user session has been set, then this setting will be used as the current user locale. Otherwise, the preferred language of the user agent is used and stored in the session. */
    if (userLocale == null) {
        userLocale = request.getLocale();
        session.setAttribute("session_lang", userLocale);
    }
}
...
%>
[1] For more information on the Internationalization Service, refer to The Internationalization Service in IBM WebSphere by Debasish Banerjee, Jeffrey A. Frey, and Robert H. High, Jr, found at http://www.unicode.org/iuc/iuc20/a318.html, and Towards the Internationalization of Web Services in IBM WebSphere by Debasish Banerjee and Casey A. Swenson, found at http://www.w3.org/2002/02/01-i18n-workshop/Banerjee.html
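The three-step lookup can also be written as a plain Java method, which makes it easy to test outside a servlet container. In this sketch a Map stands in for the HTTP session, and the class and method names are hypothetical; only the attribute name session_lang and the order of the steps come from the example above.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical container-free version of the user locale lookup.
class UserLocaleResolverSketch {
    static final String SESSION_KEY = "session_lang";

    // selected: value chosen in the language list (for example "fr_FR"), or null.
    // session: stand-in for the HTTP session attributes.
    // agentLocale: preferred locale reported by the user agent.
    static Locale resolve(String selected, Map<String, Object> session, Locale agentLocale) {
        Locale userLocale;
        if (selected != null) {
            // Step 1: a new selection wins and is remembered in the session.
            String[] parts = selected.split("_", 2);
            userLocale = new Locale(parts[0], parts.length > 1 ? parts[1] : "");
            session.put(SESSION_KEY, userLocale);
        } else {
            // Step 2: reuse the locale stored in the session, if any.
            userLocale = (Locale) session.get(SESSION_KEY);
            if (userLocale == null) {
                // Step 3: fall back to the user agent's preference and store it.
                userLocale = agentLocale;
                session.put(SESSION_KEY, userLocale);
            }
        }
        return userLocale;
    }

    public static void main(String[] args) {
        Map<String, Object> session = new HashMap<>();
        System.out.println(resolve(null, session, Locale.US));    // first visit
        System.out.println(resolve("fr_FR", session, Locale.US)); // user picks French
        System.out.println(resolve(null, session, Locale.US));    // later page
    }
}
```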
JSP code invokes relevant classes in the Locale Model Bean
This is also done in the index.jsp:
Example 15-6 Sample JSP code invoking relevant classes in the Locale Model Bean
<%
// Retrieve calendar information according to user locale.
String[] calendarStr = TravelCalendar.getInstance(currentDate, userLocale).getResultCalendar();
if (calendarStr != null) {
    // Display on the Web page the date in the Gregorian calendar
    // and the date in the lunar calendar, if any.
    for (int d = 0; d < calendarStr.length; d++) {
        out.println(calendarStr[d] + "<br>");
    }
}
%>
Locale Model Bean invokes ICU4J and tailors the result according to the requirements of our working example
The lines of code shown in Example 15-7 on page 101 are extracted from TravelCalendar.java, a Locale Model Bean class.
Example 15-7 Sample Java code in TravelCalendar.java to present calendar information
import java.util.Date;
import java.util.Locale;

public abstract class TravelCalendar {
    protected Date date = null;
    protected Locale userLocale = null;
    protected String year = null;
    protected String month = null;
    protected String day = null;
    protected String[] resultCalendar = null;

    abstract public String[] getResultCalendar();

    public static TravelCalendar getInstance(Date date, Locale userLocale) {
        // Without a date or a locale, no calendar can be built.
        if ((date == null) || (userLocale == null)) return null;
        try {
            // Invoke the relevant calendar class according to user locale.
            TravelCalendar calendar = (TravelCalendar)
                Class.forName("TravelCalendar_" + userLocale.toString()).newInstance();
            calendar.date = date;
            calendar.userLocale = userLocale;
            return calendar;
        } catch (Exception e) {
            // The calendar class for this locale is missing or cannot be instantiated.
            return null;
        }
    }
    ...
}
Example 15-8 Sample Java code in TravelCalendar_zh_CN.java to present calendar information of P. R. China
import java.util.ArrayList;
import java.util.Date;
import java.util.Locale;
// Import ICU4J packages.
import com.ibm.util.ChineseCalendar;
import com.ibm.util.Calendar;

// This class invokes ICU4J to retrieve the required Chinese calendar information
// and then tailors the result to the requirements of our working example.
public class TravelCalendar_zh_CN extends TravelCalendar {
    public TravelCalendar_zh_CN() {
        super();
    }

    public String[] getResultCalendar() {
        this.resultCalendar = this.handleChineseCalendar();
        return this.resultCalendar;
    }

    /* This method returns an array of String containing a date in the Gregorian calendar and the date in the Chinese lunar calendar, both in the date format of the "zh_CN" locale. */
    private String[] handleChineseCalendar() {
        int year = 0;
        int month = 0;
        int leap_Month = 0;
        int dayOfMonth = 0;
        String chCalStr = null;
        ArrayList calendarList = new ArrayList();
        ChineseCalendar chCal = new ChineseCalendar();
        chCal.setTime(this.date);
        year = chCal.get(Calendar.YEAR);
        month = chCal.get(Calendar.MONTH) + 1; // base 1
        leap_Month = chCal.get(ChineseCalendar.IS_LEAP_MONTH);
        dayOfMonth = chCal.get(Calendar.DAY_OF_MONTH);
        calendarList.add(formatGregorianDateString(date, userLocale));
        chCalStr = formatChineseDateString(year, month, leap_Month, dayOfMonth);
        if (chCalStr != null) calendarList.add(chCalStr);
        return toStringArray(calendarList.toArray());
    }

    // Transfer Object array to String array.
    private static String[] toStringArray(Object[] obj) { ... }

    // Returns the date according to the Chinese lunar calendar.
    private String formatChineseDateString(int year, int month, int leap_Month, int dayOfMonth) { ... }

    // Returns the Gregorian calendar date in the date format for the "zh_CN" locale.
    private String formatGregorianDateString(Date date, Locale userLocale) { ... }
    ...
}
e-Business Globalization Solution Design Guide
15.3.4 Locale-sensitive features displayed in Our Global Travel Shanghai Demo
In addition to the date formatting introduced in 15.3.3, “Implementation of locale-sensitive features” on page 99, other globalization features in our working example are implemented by the Locale Model Bean using the ICU4J API in order to satisfy users' cultural expectations.
Calendar
To make it easier for end users to select dates, Our Global Travel Shanghai Demo provides a pop-up calendar that is displayed at runtime when the user clicks the calendar icon.
Figure 15-9 A calendar in English from Our Global Travel Shanghai
Weekdays are ordered according to users' cultural conventions. For example, the calendar for an American user has Sunday as the first day of week, while that for a European uses Monday. Figure 15-10 represents a calendar shown to a French user.
Figure 15-10 A calendar in French from Our Global Travel Shanghai
Users can change the year and month by pressing the forward or backward buttons, and then select the date by clicking the number indicating the day.
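The week-start convention described above can be read from locale data rather than hard-coded. The following is a minimal sketch that uses the JDK class java.util.Calendar as a stand-in for the ICU4J-based Locale Model Bean; the class name WeekStartSketch and its method are hypothetical and are not part of the demo.

```java
import java.util.Calendar;
import java.util.Locale;

final class WeekStartSketch {
    /** Returns the first day of the week for a locale, as a Calendar.SUNDAY..SATURDAY constant. */
    static int firstDayOfWeek(Locale locale) {
        return Calendar.getInstance(locale).getFirstDayOfWeek();
    }

    public static void main(String[] args) {
        // An American user sees weeks that start on Sunday, a French user on Monday.
        System.out.println(firstDayOfWeek(Locale.US) == Calendar.SUNDAY);
        System.out.println(firstDayOfWeek(Locale.FRANCE) == Calendar.MONDAY);
    }
}
```

A calendar widget can use the returned constant to decide which weekday occupies the first column.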
Time zone
Since Our Global Travel Shanghai Demo is a Web site providing global traveling services, time zones deserve consideration. In pages relating to flight bookings, the local times for both the departure place and the destination are displayed to give users more complete trip information. Figure 15-11 on page 104 represents part of the schedule for a flight from Paris to Shanghai.
Figure 15-11 Flight information with times in different time zones
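Displaying departure and arrival times in their own time zones comes down to converting one instant between zones. The sketch below uses the java.time API of a current JDK rather than the ICU4J calls of the demo; the class name, the departure time, and the 11-hour flight time are illustrative assumptions, not data from Figure 15-11.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

final class FlightTimeSketch {
    /** Computes the arrival time in the destination zone from a local departure time and a flight duration. */
    static ZonedDateTime arrivalLocalTime(LocalDateTime departureLocal, ZoneId from, ZoneId to, Duration flightTime) {
        return departureLocal.atZone(from)      // interpret the departure in its own zone
                .plus(flightTime)               // advance along the timeline by the flight duration
                .withZoneSameInstant(to);       // express the same instant in the destination zone
    }

    public static void main(String[] args) {
        // Hypothetical schedule: leave Paris at 13:30 local time, fly for 11 hours.
        ZonedDateTime arrival = arrivalLocalTime(
                LocalDateTime.of(2002, 12, 10, 13, 30),
                ZoneId.of("Europe/Paris"),
                ZoneId.of("Asia/Shanghai"),
                Duration.ofHours(11));
        System.out.println(arrival.toLocalDateTime()); // 2002-12-11T07:30
    }
}
```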
Dictionary sorting
In our working example, all lists except the language selection list on the home page are sorted in the dictionary order of the locale's language so that users can find entries more quickly. Figure 15-12 was captured from the Flight Search page:
Figure 15-12 List of cities in dictionary order
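Dictionary order differs from plain code-point order as soon as accented letters appear. The following sketch uses the JDK class java.text.Collator as a stand-in for the ICU4J collation used by the demo; the class name and the city list are hypothetical.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class DictionarySortSketch {
    /** Returns a copy of the entries sorted in the dictionary order of the given locale. */
    static List<String> sortForLocale(List<String> entries, Locale locale) {
        List<String> sorted = new ArrayList<>(entries);
        sorted.sort(Collator.getInstance(locale));
        return sorted;
    }

    public static void main(String[] args) {
        // "\u00C9vian" is "Evian" with an acute accent on the E; "Ath\u00E8nes" is the French name of Athens.
        List<String> cities = Arrays.asList("Zurich", "\u00C9vian", "Paris", "Ath\u00E8nes");
        // A plain String sort would place the accented entry after "Zurich";
        // the collator files it under E, where a French reader expects it.
        System.out.println(sortForLocale(cities, Locale.FRENCH));
    }
}
```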
Message format
Messages are a concatenation of strings, numbers, and dates. “Message format” is a term related more to linguistics than to culture. It refers to the challenge of sequencing the various parts of a message in an automated and language-neutral way so that localization can be realized without hard-coding message strings and concatenation. In our working example, message format is realized by extracting the parts of the message stored in the localization pack, concatenating them in a predefined sequence and then displaying a well-formed string to the end user. Figure 15-13 and Figure 15-14 on page 105 are examples of how message format is used in the Hotel Search Result page.
Figure 15-13 Message format of “1 night”
Figure 15-14 Message format of “3 nights”
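The “1 night” and “3 nights” messages can also be produced from a single translatable pattern. The sketch below uses the JDK class java.text.MessageFormat with a choice sub-format, which is a different technique from the concatenation of localization pack parts described above. The pattern text and the class name are assumptions made for illustration.

```java
import java.text.MessageFormat;
import java.util.Locale;

final class NightsMessageSketch {
    // Hypothetical pattern, as a translator might store it for the en_US locale.
    // The choice sub-format selects the singular text for 1 and the plural text above 1.
    static final String EN_US_PATTERN = "{0,choice,1#{0} night|1<{0} nights}";

    /** Formats the number of nights using a locale-specific pattern. */
    static String nights(String pattern, Locale locale, int count) {
        return new MessageFormat(pattern, locale).format(new Object[] { count });
    }

    public static void main(String[] args) {
        System.out.println(nights(EN_US_PATTERN, Locale.US, 1)); // 1 night
        System.out.println(nights(EN_US_PATTERN, Locale.US, 3)); // 3 nights
    }
}
```

Because word order and plural rules live in the pattern, a translator can rearrange them without any change to the calling code.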
15.4 Localization pack
Although Our Global Travel Shanghai Demo has 12 user interfaces in all (each in a different language), there is only one set of globally executable and locale-independent code on the back end. This is called the Single Executable. Single Executable code is free from language- and culture-dependent data. This data is stored in localization packs and is retrieved dynamically by the Single Executable code at runtime. This requires a unified way to access all localization packs and to retrieve data from them.
File structure of localization packs There is a different localization pack corresponding to each locale supported by our working example. Each localization pack is stored under the directory carrying the name of its corresponding locale.1 All locale directories are bundled under a common directory named Resources, which is accessible to the Localization Pack Manager Bean within the Single Executable code.
Figure 15-15 File structure of localization packs
This file structure ensures that by specifying locale, different localization packs can be accessed in the same manner.
Format and encoding of localization packs
In our working example, each localization pack is in XML file format with the attribute encoding="UTF-8".
1 For example, the en_US localization pack is stored under the en_US directory.
UTF-8 is used for data encoding so that data in any language can be stored in localization packs. The XML file format is applied for two reasons. First, it allows flexible and hierarchical content structure, thus facilitating quick and easy data retrieval. Second, the XML file format makes it easy for content to be translated with TranslationManager (a translation tool widely used in IBM Translation Service Centers to localize IBM products). In Our Global Travel Shanghai Demo, the localization pack is structured according to Web pages and primary functions. Figure 15-16 represents an excerpt from the en_US localization pack:
Figure 15-16 A section of localization pack for en_US locale
Hierarchy of localization packs The concept of “fall back” used in JDKs for resource bundle management is applied in our working example. The en_US locale is set as our default locale. This means that if the localization pack for a certain locale is missing, we use the en_US localization pack as a matter of course. Upon a user's initial visit to Our Global Travel Shanghai Demo, the system will map the language setting in his IE or Netscape browser to the closest supported locale as its locale parameter. If the user locale is not supported by existing localization packs, the closest locale will be used as a replacement. For example, if the user locale is zh_SG (Singapore)—which is not supported by Our Global Travel Shanghai Demo—the first compatible locale containing “zh” (which is zh_CN) will be used instead. However, if no “closest locale” can be found (for example, Russia, whose locale is ru_RU), the en_US default localization pack will be used. Unless the user selects another language from the home page, this locale will be passed to the Localization Pack Manager Bean as the determinant for the localization pack to be used for data retrieval.
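The fallback rule just described (exact match, then the first supported locale with the same language, then en_US) can be sketched as follows. This is an illustration under stated assumptions, not the demo's Localization Pack Manager Bean: the class name, the method, and the list of supported locales are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class LocaleFallbackSketch {
    static final String DEFAULT_LOCALE = "en_US";

    /** Picks the localization pack locale for a requested locale string such as "zh_SG". */
    static String resolve(String requested, List<String> supported) {
        if (requested == null || requested.isEmpty()) return DEFAULT_LOCALE;
        if (supported.contains(requested)) return requested;     // exact match
        String language = requested.split("_")[0];
        for (String candidate : supported) {                     // first pack with the same language
            if (candidate.split("_")[0].equals(language)) return candidate;
        }
        return DEFAULT_LOCALE;                                   // no compatible locale
    }

    public static void main(String[] args) {
        // Hypothetical subset of the supported locales, in lookup order.
        List<String> supported = Arrays.asList("en_US", "zh_CN", "zh_TW", "fr_FR", "ja_JP");
        System.out.println(resolve("zh_SG", supported)); // zh_CN
        System.out.println(resolve("ru_RU", supported)); // en_US
    }
}
```

The order of the supported list matters here: with zh_CN listed before zh_TW, a zh_SG user receives the Simplified Chinese pack, as in the example in the text.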
Localization pack manager
The multilingual user interface is assembled at runtime with pieces of data retrieved dynamically from the localization pack for the user's locale. The Localization Pack Manager acts as the bridge between the JSP code and localization packs. It is responsible for selecting and loading the relevant localization pack, parsing it to pick out the required data, and returning that data to the invoking JSP code. The Localization Pack Manager is also in charge of the hierarchy of localization packs used in our working example. If the user locale is not supported in our working example, the Localization Pack Manager selects the supported locale closest to the user's locale. If no compatible locale can be found, the en_US default localization pack is used instead.
15.5 Machine translation
In addition to its user interfaces in 12 languages (translated in advance by human beings), Our Global Travel Shanghai Demo also allows you to experience dynamic machine translation performed by WebSphere Translation Server V1.0 from English to four single-byte character set (SBCS) languages (French, German, Italian, and Spanish) as well as to four double-byte character set (DBCS) languages (Simplified Chinese, Traditional Chinese, Japanese, and Korean).
15.5.1 What is machine translation?
“Machine translation (MT) is automatic translation of human language by computers. For instance, an English-to-German MT system translates English (the source language) into German (the target language) without human intervention.” 1 Though there is still much room for improvement before it can fully replace human translation, machine translation is an exciting technology that has made great progress over the years. It is currently capable of conveying the gist of text and can assist human translators in producing translation drafts. Further human intervention might be required to tune machine translation output, especially in mission-critical activities such as legal documentation, but in cases where instant translation is required and the gist of a document is sufficient, it can save significant time and effort.
15.5.2 WebSphere Translation Server
WebSphere Translation Server is a major achievement of IBM's long-term efforts in the realm of globalization. As an add-on tool for the end-to-end globalization process, WebSphere Translation Server helps bridge language barriers in the world of e-commerce by exploiting machine translation. IBM has combined several powerful technologies to deliver an enterprise-level MT solution for the Web. The WebSphere Translation Server comprises a Translation Services Gateway (TSG) and a set of IBM's language engines. The Translation Services Gateway uses the language engines to provide instantaneous translation services to Web clients, including servlets, JSPs, and plugin-enabled Web servers.2 The User Dictionary Manager (UDM) is a supplementary tool for building user dictionaries containing context-related translations from the source language to the target language. This allows users some control over machine translation output. Figure 15-17 on page 108 provides a high-level view of WebSphere Translation Server components.
1 InfoCenter of the WebSphere Translation Server V1.0
2 InfoCenter of the WebSphere Translation Server V1.0
Figure 15-17 WebSphere Translation Server components
Each language engine supports machine translation for one language pair. The supported language pairs in WebSphere Translation Server 1.0 are:
– English to or from French
– English to or from German
– English to or from Italian
– English to Japanese
– English to Korean
– English to Simplified Chinese
– English to or from Spanish
– English to Traditional Chinese
WebSphere Translation Server is capable of two methods of machine translation—on-the-fly translation and on-demand translation. On-the-fly translation occurs automatically if the user sets the language preference in his Web browser. 1 For example, if the language used on the server is English and the preferred language on the user's browser is Simplified Chinese, the Translation Server will automatically translate so that what the user sees on his browser is a Web page in Simplified Chinese. On-demand translation allows the end user to decide what text string or which URL-specified Web page and for which language the machine translation should be performed. WebSphere Translation Server provides an open API for users to customize the on-demand translation function in order to meet the needs of their applications.
15.5.3 Solution for Our Global Travel Shanghai Demo
In our working example, we applied the WebSphere Translation Server 1.0 on-demand translation function to perform machine translation, based on two considerations:
– Language selection. Users can select the language into which a page is machine translated (the source language is English).
– Hyperlinks. On-the-fly translation can translate the current Web page, but not pages linked from it. Since machine translation in our working example must continue to apply as the user follows links from page to page, on-the-fly translation cannot satisfy our needs.
1 InfoCenter of the WebSphere Translation Server V1.0
Process of on-demand machine translation
In our working example, the TranslatorServlet servlet works as a bridge between the WebSphere Application Server containing Our Global Travel Shanghai Demo and the WebSphere Translation Server. Figure 15-18 illustrates how on-demand machine translation operates in our working example.
[Diagram: the Web tier (HTML, JSP, servlet), the TranslatorServlet on WebSphere Application Server, the URL to visit, and the WTS API, connected by arrows numbered 1 to 6]
Figure 15-18 Process of on-demand machine translation with TranslatorServlet
1. Set the URL of the TranslatorServlet, together with the necessary parameters, as the URL prefix for the entire translation process. For example:
http://hostname/TranslatorServlet?langpair=ende&translatepage=
where langpair=ende means that the Web page is to be translated from English to German, and translatepage refers to the URL address of the Web page to be translated.
2. When a user visits a Web page, the URL of that Web page is appended to the translatepage= parameter.
3. TranslatorServlet then connects to the Web page whose URL is specified by the translatepage= parameter, and the Web page data is retrieved.
4. TranslatorServlet then connects to WebSphere Translation Server and passes it the retrieved data for translation.
5. WebSphere Translation Server passes the translated data back to TranslatorServlet.
6. The translated data is assembled into the translated Web page, which is then forwarded to the user's browser.
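Steps 1 and 2 amount to simple URL construction. The sketch below builds the address that a link on a translated page would point to. It assumes that the page URL is percent-encoded before it is appended, which the text does not state; the class name, the method, and the sample page URL are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class TranslatorUrlSketch {
    /** Builds the TranslatorServlet URL for a page; langPair is a code such as "ende" (English to German). */
    static String translatedUrl(String servletBase, String langPair, String pageUrl) {
        try {
            // Assumption: the target URL is percent-encoded so that its own "?" and "&" survive as one parameter value.
            return servletBase + "?langpair=" + langPair
                    + "&translatepage=" + URLEncoder.encode(pageUrl, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(translatedUrl(
                "http://hostname/TranslatorServlet", "ende", "http://hostname/hotel/search.jsp"));
    }
}
```

Rewriting every hyperlink on the returned page in this way is what keeps the user inside the translated view when following links.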
Define application-specific dictionaries
Since it is difficult for a machine to pick the most suitable translation for a given word or phrase from its dictionary database, WebSphere Translation Server applies the User Dictionary File (UDF) so that individual users can input more context-relevant translations. If a word or phrase appears in both the UDF and the language engine, the UDF translation is used. In our working example, we used the User Dictionary Manager to build a User Dictionary File in two languages, German and Simplified Chinese. These two languages were selected at random to help us determine to what extent the User Dictionary File can enhance machine translation quality. Figure 15-19 is an excerpt from the UDF in German.
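The precedence rule (a UDF entry overrides the language engine's own entry) can be pictured as a two-level lookup. The sketch below is only a conceptual illustration with plain maps; it does not reflect how WebSphere Translation Server matches terms internally, and the class name and dictionary entries are hypothetical rather than taken from Figure 15-19.

```java
import java.util.HashMap;
import java.util.Map;

final class UserDictionarySketch {
    /** Looks a term up in the user dictionary first and falls back to the engine dictionary. */
    static String translateTerm(String term, Map<String, String> userDictionary, Map<String, String> engineDictionary) {
        String fromUser = userDictionary.get(term);
        if (fromUser != null) return fromUser;  // the UDF entry takes precedence
        return engineDictionary.get(term);      // may be null if neither dictionary knows the term
    }

    public static void main(String[] args) {
        Map<String, String> engine = new HashMap<>();
        engine.put("single", "ledig");          // general sense: unmarried
        engine.put("hotel", "Hotel");
        Map<String, String> udf = new HashMap<>();
        udf.put("single", "Einzelzimmer");      // travel-site sense: a room for one person
        System.out.println(translateTerm("single", udf, engine)); // Einzelzimmer
        System.out.println(translateTerm("hotel", udf, engine));  // Hotel
    }
}
```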
Figure 15-19 Part of User Dictionary File from English to German
The original Hotel Search Page in German as translated by the WebSphere Translation Server without the German UDF looks like Figure 15-20 on page 111.
Figure 15-20 Hotel search translated from English to German by the WebSphere Translation Server without UDF
By applying the UDF to our WebSphere Translation Server translation, the German Hotel Search page context seems more relevant:
Figure 15-21 Hotel search translated from English to German by the WebSphere Translation Server using UDF
During the testing phase that followed the development of our working example, we conducted an extensive user survey covering machine translation quality.1 The German and Simplified Chinese pages translated by the WebSphere Translation Server with UDF showed a significantly higher user satisfaction than those without.
15.6 Global Business Object
As we mentioned in Chapter 10, “Global Business Object” on page 51, the term “Global Business Objects” (GBOs) refers to business functions that can be shared and reused by multiple global applications. In our working example, two GBOs have been set up for the following business functions.
Name, title, and address format Different countries have different formats for a person's name. For example, in the United States and Europe, a person's full name usually begins with the given name followed by the surname, while in China, Japan, and Korea, surnames come before given names. Moreover, in the United States and Europe, a person's title comes before his name, while in China, Korea, and Japan, it comes after.
1 See 16.6, “Usability testing” on page 132.
Address formats also differ in different countries. In the United States and Europe, the order of address begins with a person's name, followed by apartment number, street number, street name, city, state or province, and country. In China, Korea, and Japan, the order of address is just the opposite—country, province, city, street name and number, room or apartment number, and the user's name. The Global Business Object GBONameAddressStyle.class can provide the format for a person's name (either with or without his title), as well as the format for his address, both in conformity with the cultural conventions of the user locale. Examples of GBONameAddressStyle.class usage can be found on the Registration page and the My Account Info page, shown in Figure 15-22.
Figure 15-22 Name format in the en_US locale
The GBO also applies the concept of the Single Executable. That is, locale-sensitive information is separated from core business logic. In Figure 15-22, information regarding how a person's name or address is formatted in a certain locale is separated from the programming code and stored in property files called Format.properties, one for each locale. In the file for the en_US locale, for example, name format information is stored as shown in Example 15-9.
Example 15-9 Name format information in property file of en_US locale
NameFormat: T 1 2 3
Example 15-10 shows the GBONameAddressStyle code for retrieving information about persons' name formats.
Example 15-10 Sample code in GBONameAddressStyle.java providing name format information
/**
 * Get the format of people's name.
 * The following table includes the characters in the returned string and their meanings.
 * 1: First name
 * 2: Middle name
 * 3: Last name
 * 4: Father's name
 * 5: Family name
 * 6: First last name
 * 7: Second last name
 * 8: Preposition
 * T: Title
 *
 * @param locale the user locale
 * @return String of name format
 */
public static String getNameFormat(Locale locale) {
    if (locale == null) return ERROR_INVALID_LOCALE_STRING;
    ResourceBundle bundle = getBundle(locale);
    if (bundle != null) {
        return bundle.getString("NameFormat");
    }
    return "";
}
In our working example, the Format Bean acts as the bridge between JSP code and GBONameAddressStyle.class. When the Registration page or My Account Info page must be displayed, the relevant JSP code invokes the Format Bean, which then invokes the GBONameAddressStyle class, passing the user locale (for example, en_US) so that the returned value is a string such as “T 1 2 3.” The Format Bean then assembles the separate parts of the person's name according to this format, wrapping them with HTML tags if needed. For example, if a user inputs his title as “Mr.,” his first name as “William,” his middle name as “M,” and his last name as “Jones,” then the name displayed on the My Account Info page for an en_US user locale will be as shown in Figure 15-23.
Figure 15-23 Name format in the en_US locale
The same section of that page displayed for a ja_JP user locale conforms to the customary Japanese name format:
Figure 15-24 Name format in the ja_JP locale
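The way a format string such as “T 1 2 3” drives the assembly of a name can be sketched as follows. This is a simplified illustration, not the demo's Format Bean: the class and method names are hypothetical, parts are joined with single spaces, HTML wrapping is omitted, and the format string “3 1 T” used for the second call is an assumed value for a surname-first locale, not the actual content of any Format.properties file.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class NameFormatSketch {
    /** Assembles a display name; nameFormat holds space-separated codes such as "T 1 2 3". */
    static String formatName(String nameFormat, Map<String, String> parts) {
        List<String> ordered = new ArrayList<>();
        for (String code : nameFormat.trim().split("\\s+")) {
            String value = parts.get(code);
            if (value != null && !value.isEmpty()) ordered.add(value); // skip parts the user left blank
        }
        return String.join(" ", ordered);
    }

    public static void main(String[] args) {
        Map<String, String> parts = new HashMap<>();
        parts.put("T", "Mr.");      // title
        parts.put("1", "William");  // first name
        parts.put("2", "M");        // middle name
        parts.put("3", "Jones");    // last name
        System.out.println(formatName("T 1 2 3", parts)); // Mr. William M Jones
        System.out.println(formatName("3 1 T", parts));   // Jones William Mr.
    }
}
```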
Measurement system The measurement systems used in various countries differ for historical reasons. For example, the British system is still used in the USA and Canada, while the Metric system is used in most other countries. GBOMeasure provides a conversion function between different measurement systems. In our working example, the flying distance between two cities is displayed in miles for users having en_US locales, and in kilometers for those from other locales.
Figure 15-25 Distance using the en_US measurement system
Figure 15-26 Distance using the fr_FR measurement system
The implementation of this GBO is somewhat more complicated than the previous one, since within a given measurement system there are various units of length and weight and the conversion from one measurement system to another must be based on source and target units of length or weight. In our working example, there is a file named UnitDef.xml that contains commonly used units of length, weight, area, etc. The units of length are stored as shown in Example 15-11 on page 116.
Example 15-11 Units of length contained in UnitDef.xml
For each supported locale, there is an XML file called Unit.xml that contains the units commonly used in that locale. For example, the Unit.xml for en_US contains the units of length shown in Example 15-12.
Example 15-12 Units of length contained in Unit.xml in the en_US locale
When a unit in one category (for example, Linear) is being converted to a unit used in another locale and the target unit is not specified, GBOMeasure.class gets the default unit for that category from Unit.xml in the target locale, as shown in Example 15-13.
Example 15-13 Sample code of GBOMeasure.java
/**
 * Convert a value in source unit to a value in the default unit used in a specific locale.
 * @return new value in the target unit.
 *         0 if source unit is empty;
 *         0 if the default unit of the locale does not exist;
 * @param value the double value in the source unit
 * @param sourceUnitName source unit
 * @param localeStr String value of a locale, e.g., "en_US"
 * @param category unit category, e.g., "Linear".
 */
public double convertWithLocaleDefUnit(
        double value,
        String sourceUnitName,
        String localeStr,
        String category) {
    if (localeStr == null) {
        // illegal locale string
        if (LOG.isWarnEnabled()) {
            LOG.warn("Invalid parameter: localeStr=" + localeStr);
        }
        return 0;
    }
    //to get the default unit of a certain category used in a specific locale
    String targetUnitName = getDefaultUnit(localeStr, category);
    //to convert the value in the source unit to the value in the target unit, and then return the new value
    return convert(value, sourceUnitName, targetUnitName);
}
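The two helper steps that Example 15-13 relies on, looking up the locale's default unit and converting between units, can be sketched with in-memory tables in place of UnitDef.xml and Unit.xml. The class name, the table contents, and the method names below are assumptions for illustration; only the conversion factors (1 kilometer = 1000 meters, 1 mile = 1609.344 meters) are fixed facts.

```java
import java.util.HashMap;
import java.util.Map;

final class MeasureSketch {
    // Stand-in for UnitDef.xml: length of each unit expressed in meters.
    static final Map<String, Double> METERS_PER_UNIT = new HashMap<>();
    // Stand-in for the per-locale Unit.xml files: default unit of the Linear category.
    static final Map<String, String> DEFAULT_LINEAR_UNIT = new HashMap<>();
    static {
        METERS_PER_UNIT.put("kilometer", 1000.0);
        METERS_PER_UNIT.put("mile", 1609.344);
        DEFAULT_LINEAR_UNIT.put("en_US", "mile");
        DEFAULT_LINEAR_UNIT.put("fr_FR", "kilometer");
    }

    /** Converts between two length units through meters; returns 0 for an unknown unit. */
    static double convert(double value, String sourceUnit, String targetUnit) {
        Double from = METERS_PER_UNIT.get(sourceUnit);
        Double to = METERS_PER_UNIT.get(targetUnit);
        if (from == null || to == null) return 0;
        return value * from / to;
    }

    /** Converts a length to the default unit of a locale; returns 0 if the locale has no default unit. */
    static double convertToLocaleDefault(double value, String sourceUnit, String localeStr) {
        String target = DEFAULT_LINEAR_UNIT.get(localeStr);
        if (target == null) return 0;
        return convert(value, sourceUnit, target);
    }

    public static void main(String[] args) {
        // A 100-mile distance shown to a fr_FR user, approximately 160.9344 kilometers.
        System.out.println(convertToLocaleDefault(100, "mile", "fr_FR"));
    }
}
```

Converting through a common base unit keeps the unit table linear in size: each new unit needs one factor rather than one factor per pair of units.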
15.7 Localization
In our working example, localization has three aspects: locale model, GBOs, and localization packs. GBOs are application-independent, while localization packs are application-specific.
15.7.1 Locale model Locale model localization pertains primarily to locale-sensitive computing provided by the ICU4J-based Locale Model Bean. ICU4J also builds and maintains a repertoire of locale-related information—for example, time zones, date and time format, and number format. Using ICU4J, localization is accomplished simply by passing user locale as a parameter into the Locale Model Bean. Culturally correct data is then returned. Therefore, locale model localization is implemented by the development team.
15.7.2 GBO GBO localization pertains primarily to the property files accompanying GBO classes (where locale-sensitive information is stored). GBO localization for a certain locale means collecting locale-related information required for this GBO and then organizing it into a property file according to the pre-defined file structure. For example, in GBONameAddressStyle.class locale-sensitive information for various locales is stored in the Format.properties file—one for each locale. GBO localization is implemented by the development team as well, though information collection might require local support.1
15.7.3 Localization packs
Localization packs are application-specific because they contain all language-dependent Web content needed for Our Global Travel Shanghai Demo. Producing them is also the last step of localization in our working example. During the development phase, the Our Global Travel Shanghai Demo prototype was set up in English, so that the English-version localization pack became our base.
Translation of localization packs The English localization pack was then sent to IBM's Translation Service Centers (TSCs) in Brazil, China, Egypt, France, Germany, Israel, Italy, Japan, South Korea, Spain, and Taiwan to be translated into Portuguese, Simplified Chinese, Arabic, French, German, Hebrew, Italian, Japanese, Korean, Spanish, and Traditional Chinese respectively. In our working example, a Readme file containing special requirements relating to the localization pack translation was delivered to the local TSCs together with the localization pack itself. Example 15-14 on page 118 illustrates one such readme:
1 Production of localization packs in other languages requires full local translation support.
Example 15-14 Localization pack translation readme file example
The LocalizationPack file comprises all locale-unique information that will be presented in the user interface.
Please refer to the English site of Our Global Travel Shanghai Demo when you are translating this file. This will help you get a rough idea about what you are translating.
Translate punctuations (for example, colons) according to your locale. For example, in LocalizationPack.xml, notice the placement of the colon in the following sentence:
Street Address1:
The names and addresses of all airlines, hotels, and agencies are fictitious and only need be transliterated.
We strongly recommend that translated XML files be encoded in UTF-8 format.
The readme file also contains information and instructions about the use of the Translation Problem Reporting System (TPRS):
A project named Global Travel Shanghai Translation has been opened in the TPRS for communication purpose while you are doing translation for Our Global Travel Shanghai Demo. The TSC of your country/region has been added to this project as TSC user administrator. Please take advantage of TPRS to report problems, share information, and get help. Since important inquiries and reminders about translation will be posted on TPRS, prompt access and response is needed during translation and the translation verification test of Our Global Travel Shanghai Demo.
Following the instructions in the readme file, local TSCs applied various translation tools such as TranslationManager (TM) to translate the localization pack and reported any problems or bugs via TPRS. After translation, each TSC returned the translated localization pack to the Our Global Travel Shanghai Demo developer group and also kept a copy for the coming translation verification test.
Translation verification test All localization packs were loaded onto the Web server under the same directory accessible to the application program. TSCs then went through the entire Our Global Travel Shanghai Demo Web site in their native languages to verify that the translations were done correctly. If not, they updated the corresponding localization packs and returned them to the developer group. During this testing phase, TPRS was also used as the communication tool between local TSCs and the developer group. When all Web pages in every supported language appeared correct both functionally and culturally, the translation verification test was considered complete.
Chapter 16. Testing
Multilingual applications must display content in various languages and usually also present locale-sensitive information. Since it is difficult for a testing team to have native testers for all supported locales, local or outsourced localization services must be involved in the testing phase of multilingual applications. The testing of a multilingual application should cover not only the business functions provided by the application, but also language-dependent content and globalization features.
Testing process and focuses
The testing process of our working example can be divided into three phases:
1. Test case design and documentation
A testing team should be formed before coding is finalized. Because translation and locale-sensitive content (including globalization features) were also tested, the testing team for our working example included native testers from the localization service providers (see Chapter 11, “Localization” on page 57). The coding team gives detailed specifications to the testing team concerning:
– The application scenario, including all primary functions
– Languages supported
– Globalization features demonstrated in the project
The testing team then designs and documents test cases with expected results based on this information. Test cases are designed to verify all project functions, support for all required languages, and the appropriateness of all globalization features. The tester should document each test step and its results. A complete test case document should include the following information:
– Test objectives
– The creation and latest update dates of this test case
– The test case number and name
– Its platform(s)
– Additional testing tools, data, and hardware and software requirements (optional)
– Scenario execution details, including steps and screens (to be filled in by the tester)
– Test results (to be filled in by tester)
2. Test environment building
When coding is complete, the testing team sets up the test environment with the aid of the coding team. The test environment should be identical to the actual application runtime environment.
3. Testing focuses
During the testing period, open and instant communications between the coding and testing teams are absolutely essential. Whenever the testing team finds any problem, it should inform the coding team immediately so that it can make a fix as soon as possible. In our working example, testing focused on the following:
– Function testing
– Translation testing
– Globalization feature testing
– Linguistic service testing
– Browser testing
– Usability testing
In the following sections, these six testing focuses are explained in detail.
16.1 Function testing
One advantage of a well-designed and developed globalized application is that function testing is unnecessary for every localized version. Since all language versions use the same set of programs (the Single Executable 1), we have good reason to assume that a localized version works in exactly the same way as the source language product in business functions. Therefore, only the source language version must undergo all function testing cases in order to ensure basic functional competence. The following example from Our Global Travel Shanghai Demo testing checks whether the Air Ticket Search function works correctly in the source language version:
1. Start your browser and visit Our Global Travel Shanghai Demo home page.
2. Verify that the home page is displayed.
3. From the home page, click the Flights link on the menu bar.
4. Verify that the Flight Search page is displayed as shown in Figure 16-1 on page 121.
1 See 15.1, “Single Executable” on page 92.
Figure 16-1 Flight Search page of Our Global Travel Shanghai Demo
Here, all options and input text fields are set with default values.
5. Select the From XXX to Shanghai option (where XXX refers to the destination or departure city name).
6. In the XXX pull-down list, select the --Please Select-- option.
7. Click the Submit button.
8. Verify that the Please select the departing city error message is displayed on the same page.
9. Select the From Shanghai to XXX option.
10. Click the Submit button.
11. Verify that the Please select the destination city error message is displayed on the same page.
12. Select a city from its pull-down list.
13. Click the calendar icon.
14. Verify that a calendar pops up.
15. Select the departure and return dates from the pop-up calendar, making the return date earlier than departure.
16. Click the Submit button.
17. Verify that the Sorry, the returning date is earlier than the departure date. Please try again. error message is displayed on the same page.
18. Select one departure date and make it earlier than today.
19. Click the Submit button.
20. Verify that the Sorry, the departure date should be later than the current date. Please try again. error message is displayed on the same page.
21. Select the correct departure and returning dates.
22. Make all six required seat text fields blank.
23. Click the Submit button.
24. Verify that the Please fill in the number of tickets required for at least one type of seat. error message is displayed on the same page.
25. Input abnormal numbers into the text fields for required seats (such as -1, 1.1, 0, or very large numbers such as 999999999).
26. Click the Submit button.
27. Verify that the The amount of First/Business/Economy class for adult/child is invalid. Please try again. error message is displayed on the same page.
28. Input normal numbers for seats required.
29. Select a number for results displayed per page.
30. Click the Submit button.
31. Verify that the Flight Search Result page appears as in Figure 16-2 on page 123.
e-Business Globalization Solution Design Guide
Figure 16-2 A Flight Search Result page of Our Global Travel Shanghai Demo
32. Verify that all displayed information for the following is correct:
a. Trip type (the default value is Round trip)
b. Departure and destination places
c. Departure and returning dates
d. Seat class and number
e. Ticket price
f. Search results per page
g. Result order (the default value is Lowest Price)
33. Move the mouse over the Connecting to or Details link of any flight itinerary.
34. Verify that detailed flight information pops up.
Figure 16-3 A Flight Search Result Page with detailed flight information
35. If so, verify that this information is correct.
36. Click the Search Again button.
37. Verify that the Flight Search Page is re-displayed.
38. Check the One Way option in trip type.
39. Repeat steps 5 - 35.
40. Click the Search Again button.
41. Check the Shortest Flight search result order.
42. Click the Submit button.
43. Verify that the search result is displayed from the shortest flying distance to the longest.
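The input checks exercised in steps 15 through 27 can be pictured as a small validation routine. The following is only a sketch under stated assumptions, not the demo's actual code: the class name, the order of the checks, the MAX_SEATS_PER_FIELD limit, and the shortened wording of the invalid-amount message are all hypothetical, and the dates are passed in explicitly so that the checks can be repeated with a fixed "today".

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Optional;

// Hypothetical validator mirroring the error cases in the test procedure.
class FlightSearchValidator {
    // Assumed upper bound; the procedure only says very large numbers are rejected.
    static final int MAX_SEATS_PER_FIELD = 99;

    /** Returns the first validation error found, or empty if the input passes. */
    static Optional<String> validate(LocalDate today, LocalDate departure,
                                     LocalDate returning, List<String> seatFields) {
        // A null returning date stands for a one-way trip.
        if (returning != null && returning.isBefore(departure)) {
            return Optional.of("Sorry, the returning date is earlier than the departure date. Please try again.");
        }
        if (departure.isBefore(today)) {
            return Optional.of("Sorry, the departure date should be later than the current date. Please try again.");
        }
        boolean anyFilled = false;
        for (String field : seatFields) {
            String text = (field == null) ? "" : field.trim();
            if (text.isEmpty()) {
                continue; // blank fields are allowed as long as one field is filled
            }
            anyFilled = true;
            if (!isValidSeatCount(text)) {
                // Simplified message; the demo names the seat class and passenger type.
                return Optional.of("The amount of seats is invalid. Please try again.");
            }
        }
        if (!anyFilled) {
            return Optional.of("Please fill in the number of tickets required for at least one type of seat.");
        }
        return Optional.empty();
    }

    /** Accepts whole numbers from 1 to MAX_SEATS_PER_FIELD; rejects -1, 1.1, 0, 999999999. */
    static boolean isValidSeatCount(String text) {
        if (!text.matches("\\d{1,9}")) {
            return false; // not a plain non-negative integer of at most nine digits
        }
        int count = Integer.parseInt(text);
        return count >= 1 && count <= MAX_SEATS_PER_FIELD;
    }
}
```

Writing the rules down in this form also gives testers a checklist of boundary values (the day before today, a returning date one day before departure, and the four abnormal seat numbers) to repeat in every localized version.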
16.2 Translation testing
Translation testing has two major purposes:
1. To check translation accuracy and contextual pertinence
2. To improve translation quality
Localization service providers in all supported locales join the testing team to test the Web site translation in their respective languages. During the Localization phase, if the source localization pack is in XML format, it is difficult for translators to catch the meaning of separate words or phrases without knowing and understanding their context. During translation testing, testers review all translated Web pages to pick out words and phrases that might have been inappropriately translated. For example, in Our Global Travel Shanghai Demo Hotel Search and Reservation function, the word “single” in room type means that the room is for one person only.
Figure 16-4 A section of the Hotel Search Page in Our Global Travel Shanghai Demo
However, without knowing the context, it is possible for this word to be translated as “not married.”
Figure 16-5 A section of localization pack for translation
Such translation defects should be discovered during translation testing. If any translation defect is detected, the corresponding localization pack should be updated with a translation more pertinent to the context.
16.3 Globalization feature testing
The objective of globalization feature testing is to ensure that the application provides its globalization features correctly. For this reason, globalization feature testing should be performed on all localized versions. Common concerns include:
- Whether the user language interface conforms to the locale selected by users
- Whether character sets are handled correctly, especially multi-byte character sets such as Chinese, Japanese, and Korean
- Whether locale-sensitive information is displayed in a correct way, including date and time format, name and address format, number and currency format, and dictionary sorting
- Whether the bi-directional data display is adequately supported, especially for Arabic and Hebrew
Following is the globalization feature testing procedure on the Registration page of Our Global Travel Shanghai Demo: From the Welcome page, click the Register button and a form will be displayed. Figure 16-6 shows part of this registration form.
Figure 16-6 Registration Form in Our Global Travel Shanghai Demo
Local testers should check the following points according to their own cultural conventions: If all field names are displayed in the correct language. If the date-of-birth format conforms to the conventional date format for the current user locale. In Figure 16-6 (which is displayed for the en_US user locale), the date-of-birth format is month-day-year. The same section in the Web page for a Chinese, Japanese, or Korean user should be year-month-day. Testers should verify this feature. If the format for a person's name (including honorific) follows the cultural convention of the current user locale. For example, for an American user, the order is honorific, first name, middle name, and last name, while for a Japanese user, it should be last name, first name, and honorific (and the middle name field should not be displayed at all, since Japanese do not have middle names).
Figure 16-7 Our Global Travel Shanghai Demo showing the name format in ja_JP user locale
If the default value for the Country/Region selection is set to be the country/region corresponding to the current user locale. If the list of country/region and state names is in the dictionary sort sequence according to the cultural convention of the current user locale.
Figure 16-8 Our Global Travel Shanghai Demo showing list of names of countries/regions
If the postal address order follows the corresponding cultural convention, and when the Current Address country/region is changed, if the address format changes accordingly. Figure 16-9 shows the change of address format when the Country/Region selection is changed from United States to China.
Figure 16-9 Our Global Travel Shanghai Demo showing different address formats in different user locales
When the Current Address country/region is changed, if the state list is updated correspondingly.
Figure 16-10 Our Global Travel Shanghai Demo showing the change of state when the country/region Is changed
When the user locale is ar_EG or iw_IL, if the bi-directional data display is correct.
Figure 16-11 Our Global Travel Shanghai Demo showing BiDi display
Whenever a defect is found in culture-sensitive features, local testers should report this to the development team for repair.
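Some of the checks in this list can be supported by small automated probes that local testers run alongside their visual review. The sketch below is an assumption-laden illustration rather than part of the demo: it uses the standard JDK classes (java.time and java.text.Collator) instead of ICU4J, and the class and method names are hypothetical. It reports the field order of a locale's short date format and sorts a list of names in the locale's dictionary order.

```java
import java.text.Collator;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for spot-checking locale-sensitive output.
class LocaleChecks {
    /** Returns the field order of the locale's short date format, such as "MDY" or "YMD". */
    static String shortDateFieldOrder(Locale locale) {
        // Probe date whose year (1999), month (11), and day (28) cannot be confused.
        LocalDate probe = LocalDate.of(1999, 11, 28);
        String text = DateTimeFormatter.ofLocalizedDate(FormatStyle.SHORT)
                .withLocale(locale)
                .format(probe);
        StringBuilder order = new StringBuilder();
        Matcher groups = Pattern.compile("[0-9]+").matcher(text);
        while (groups.find()) {
            int value = Integer.parseInt(groups.group());
            if (value == 11) {
                order.append('M');
            } else if (value == 28) {
                order.append('D');
            } else if (value == 99 || value == 1999) {
                order.append('Y'); // two-digit or four-digit year
            }
        }
        return order.toString();
    }

    /** Returns a copy of the names sorted by the locale's collation rules. */
    static List<String> sortForLocale(List<String> names, Locale locale) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Collator.getInstance(locale));
        return sorted;
    }
}
```

With this sketch, the en_US locale should report month-day-year and the Chinese, Japanese, and Korean locales year-month-day, which matches the expectation stated for the date-of-birth field. The collator places accented names next to their unaccented neighbors instead of at the end of the list, which is what a plain code-point sort would do.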
16.4 Linguistic testing
Linguistic service testing verifies whether any linguistic service provided by a multilingual application follows the linguistic conventions of each supported locale.
In our working example, the main linguistic service provided is machine translation performed by the IBM WebSphere Translation Server. Machine translation testing has two emphases: machine translation quality and the functional competence of machine-translated pages.
Machine translation quality
With the aid of the user-defined file provided by WebSphere Translation Server, machine translation quality can be improved. This test can help us set up a User Dictionary File (UDF) in order to make the translation pertinent to this specific project. For example, Figure 16-12 shows the result of machine translation from English to German without UDF:
Figure 16-12 Content translated from English to German by the WebSphere Translation Server without UDF
Based on the original machine translation output, we set up a UDF to define the translation of certain words and phrases. As a result, the output shown in Figure 16-12 changes to that shown in Figure 16-13 on page 130.
Figure 16-13 Content translated from English to German by the WebSphere Translation Server with UDF
Functional competence of machine-translated pages
As we mentioned in 15.5, “Machine translation” on page 107, the IBM WebSphere Translation Server in our working example is accessed via a translation servlet. This servlet retrieves data from the source Web page in English and sends it to the WebSphere Translation Server, which then translates and returns it to the translation servlet. This servlet assembles all translated data to form a new Web page displayed on the user's browser. Because the WebSphere Translation Server only translates Web content and does not pass parameters from one page to another, we need to add additional scripts in the translation servlet to pass these parameters. Functional competence testing of machine-translated pages can be viewed as a combination of linguistic and functional testing. It is used to ensure that:
- The linguistic service is complete. In other words, within Our Global Travel Shanghai Demo Web site, any Web page linked from a machine-translated page will also be translated by the WebSphere Translation Server.
- The linguistic services do not interfere with the business functions that the project must provide. No business function is lost or broken on machine-translated pages.
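One way to picture both requirements is a helper that rewrites every link on a translated page so that the next request also goes through the translation servlet and carries the original parameters with it. This is a minimal sketch, not the servlet used in the demo: the /translate path and the lang and url parameter names are hypothetical, and only the standard java.net.URLEncoder class is used.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical link rewriter for machine-translated pages.
class TranslatedLinkRewriter {
    // Assumed servlet path, for illustration only.
    static final String SERVLET_PATH = "/translate";

    /**
     * Wraps the target URL, including its own query string, in a request to the
     * translation servlet so that the linked page is translated as well and no
     * business parameter is lost on the way.
     */
    static String rewrite(String targetUrl, String targetLanguage) {
        try {
            return SERVLET_PATH
                    + "?lang=" + URLEncoder.encode(targetLanguage, "UTF-8")
                    + "&url=" + URLEncoder.encode(targetUrl, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }
}
```

Because the whole target URL is encoded as a single parameter value, its own parameters survive the round trip unchanged. A functional competence test can then compare the decoded value against the link on the English source page.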
16.5 Browser testing
Browser testing involves tasks users execute within a browser. Since people in different locales can have different browser preferences, testing multilingual applications covers more browser activities than testing monolingual ones. The following are some of the more common concerns for browser testing:
- Browser-dependent user operations. For example, what will happen if the Back, Forward, or Reset buttons are selected during the transaction cycle?
- Cookie-related activities, such as what will happen if the user enables or disables the cookie settings.
- Vendor-specific performance. For example, whether the application runs correctly under both Internet Explorer and Netscape Navigator.
- Artwork-related problems. For example, different fonts might display different glyphs for the same code point, so the Web page should be designed in such a way that the content in all supported languages can be displayed neatly and professionally.
Our working example requires that both IE and Netscape be supported. Though there are few differences between these two browsers, those that exist can still adversely affect the application to a certain extent. For example, when the user locale is ar_EG, Internet Explorer can display numbers in Hindi (Arabic-Indic) digits while Netscape cannot. For this reason, Arabic (Western) numerals are used when the user locale is ar_EG so that both IE and Netscape can display numbers in the proper way. In this case, browser testing can help developers get a better understanding of browsers and thus make the application fit into the display capabilities of both supported browsers. After browser testing, we see that Our Global Travel Shanghai Demo Web site can be displayed with the best quality on Microsoft Internet Explorer 5.5 or higher and Netscape 6.2 or higher, both running on Windows 2000 Server. Browser testing should also check the artwork. For instance, in our working example, the original column width of some tables is just enough to contain the data in English in one line, but not to contain that in languages whose average word length is longer than that of English.
During browser testing, German and Spanish testers found that some tables looked sloppy because they were not wide enough.
Figure 16-14 An example of a sloppy table
As a result, the artwork designer re-designed these tables so that they can now neatly accommodate Web content in all supported languages.
Figure 16-15 The same table after width adjustment
For more details concerning artwork design and development for globalized applications, please refer to Appendix C, “CSS and artwork globalization” on page 165.
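The decision described in this section to show Western digits for the ar_EG locale can be expressed on the server side as well. The following sketch assumes the standard JDK formatting classes rather than ICU4J, and the class and method names are hypothetical. It takes the locale's own number format and replaces only the digit set, so the digits are 0 through 9 whichever digits the runtime would pick by default.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical helper that forces Western (0-9) digits for any locale.
class WesternDigits {
    static String formatWholeNumber(long value, Locale locale) {
        NumberFormat format = NumberFormat.getIntegerInstance(locale);
        if (format instanceof DecimalFormat) {
            DecimalFormat decimal = (DecimalFormat) format;
            DecimalFormatSymbols symbols = decimal.getDecimalFormatSymbols();
            symbols.setZeroDigit('0'); // the digits 1-9 follow the zero digit
            decimal.setDecimalFormatSymbols(symbols);
        }
        // Grouping is switched off to keep the probe output simple.
        format.setGroupingUsed(false);
        return format.format(value);
    }
}
```

A call such as WesternDigits.formatWholeNumber(2002, Locale.forLanguageTag("ar-EG")) then produces digits that both supported browsers can render, at the cost of not using the digits that some Arabic-speaking users might expect.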
16.6 Usability testing
Usability testing evaluates how user-friendly an application is. From a globalization perspective, testing should evaluate whether customers in various locales can interact freely with a multilingual application and thus make full use of it. It is not easy to form a technical testing team, because ideally such a team should include end users who speak the native language in various locales. There are two alternatives: formal review and extensive survey.
1. Formal review
Good candidates for a formal review are professional translators, technical engineers, globalization consultants, and actual customers. By introducing pertinent suggestions, such a review can help to enhance the technical and cultural correctness of a multilingual application. During the testing phase of our working example, we requested senior IBM engineers, consultants, managers, and directors to review our product.
2. Extensive survey
An extensive survey can be made among companies that will market, sell, and/or support the localized version of a multilingual application, or through online surveys aimed at the end users of a Web application. In our working example, we used an online survey to test the usability of the machine translation performed by the IBM WebSphere Translation Server. Our working example is equipped with a tutorial that can be linked from the home page of Our Global Travel Shanghai Demo Web site. In the tutorial there is an online survey for users to give their feedback concerning the quality of the machine translation section of our working example. Figure 16-16 on page 133 shows part of this survey.
Figure 16-16 A section of the online survey on the machine-translated pages of Our Global Travel Shanghai Demo
We requested our worldwide testers and users to navigate Our Global Travel Shanghai Demo through its machine-translated Web pages and then fill in and submit this survey. Such a survey gives the development team better knowledge of the performance and quality of the IBM WebSphere Translation Server.
Chapter 17. Maintenance
Apart from general issues involved in the maintenance of any Web application, special maintenance requirements can arise for globalized Web applications. In this chapter, we focus on such requirements.
17.1 Adding new languages
One great advantage of a globalized application over non-globalized applications is that it is much easier to add new language versions of existing products. In the case of a non-globalized application, adding a new language version takes almost as much effort as developing an entirely new product. The new language version must have a new set of application programming code. This is not just a re-installation. The coding team must modify all locale-related code so that the output conforms to the cultural conventions of that locale. The translation team must search thoroughly through the application programming code to pick out language-dependent data and translate it into the new language. Locale-related computing and language-dependent data might be embedded in other code throughout the program, and a change at one point might affect how the program runs at others. Because the coding and translating processes are complex, errors can easily occur, and testing and debugging the new language version can also be complicated. For a globalized application, on the other hand, adding a new language version is much simpler. The main activity is localization, that is, creating a localized version in the new language. As discussed in Chapter 11, “Localization” on page 57, adding a new language can affect two areas: locale-related computing and language-dependent content.
17.1.1 Locale-related computing
In a globalized application, locale-related computing is separated from the core business logic. In this case, even if there is locale-related code to add for the new language version, the core business logic will be unaffected. Locale-related computing involves code that invokes locale-supporting code libraries such as ICU and Global Business Object (GBO). Normally, adding a new language version affects only the GBO, which needs a property file attached that contains the required locale-related information for the new locale. The code library can then be invoked directly for the new locale. The majority of locale-related computing in Our Global Travel Shanghai Demo was implemented by invoking ICU4J, which contains culture-sensitive information and computing for many locales throughout the world. Therefore, we do not need to add any locale-related computing to our source code; we only need to pass the new locale as a parameter to the Locale Model Bean. The returned value will always conform to the cultural convention of that locale. For example, if we want Our Global Travel Shanghai Demo to support the Thai language in the Hotel Search Result page, the room price should be displayed in Thai baht. By invoking the relevant ICU4J code with a locale parameter of “th_TH”, we can retrieve the price formatted in the Thai currency automatically. For the GBO-provided name and address format, if Thai must be supported, we need only add the Format.property file containing the name and address formats used in Thailand.
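The idea of passing the new locale as a parameter can be sketched as follows. This is not the Locale Model Bean from the demo: the class and method names are hypothetical, and the sketch uses the JDK's java.text.NumberFormat and java.util.Currency as stand-ins for the ICU4J classes of the same purpose, so the exact output string can differ from what ICU4J returns.

```java
import java.text.NumberFormat;
import java.util.Currency;
import java.util.Locale;

// Hypothetical stand-in for the currency part of a locale model.
class LocalePriceFormatter {
    /** Formats a price using the currency and number conventions of the locale. */
    static String formatPrice(double amount, Locale locale) {
        return NumberFormat.getCurrencyInstance(locale).format(amount);
    }

    /** Returns the ISO 4217 code of the locale's currency, for example THB for th_TH. */
    static String currencyCode(Locale locale) {
        return Currency.getInstance(locale).getCurrencyCode();
    }
}
```

Supporting Thai in this sketch needs no new code: a caller passes Locale.forLanguageTag("th-TH") to the same two methods that already serve the other locales, and the library supplies the currency and the number pattern.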
17.1.2 Language-dependent content
In a globalized application based on the concept of the Single Executable, language-dependent content has also been separated from the source code and grouped into an independent resource bundle called the localization pack. Adding a new language version only requires creating a new localization pack in the new language by translation. To translate this data, the development team need only send the localization pack in the source language to the new language's translation team. This team translates the data and stores it in a new localization pack with exactly the same file structure as the source localization pack and then returns the new pack to the development team. The development team loads the new localization pack onto the server under the same directory as other localization packs so that the content in the new language can be retrieved automatically by the program, just as content in other languages is. As a result, what users in the new locale view on their browsers is content in their own language. Throughout the translation process, the translation team does not need to deal with the package's source code, but only with content. Translation is thus focused on the localization pack and does not interfere with the application program. Therefore, translation is much easier and far less likely to cause program errors.
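The way a program picks up a new localization pack without code changes can be illustrated with a small lookup class. The sketch below is an assumption, not the demo's implementation: it reads packs in Java properties format from strings (the demo's packs may be XML files on the server), the class name and keys are hypothetical, and missing entries fall back to the source language.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

// Hypothetical registry of localization packs keyed by language code.
class LocalizationPacks {
    private final Map<String, ResourceBundle> packs = new HashMap<>();
    private final String sourceLanguage;

    LocalizationPacks(String sourceLanguage) {
        this.sourceLanguage = sourceLanguage;
    }

    /** Registers a pack; adding a language is only a matter of calling this once more. */
    void load(String language, String propertiesText) {
        try {
            packs.put(language, new PropertyResourceBundle(new StringReader(propertiesText)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the text for the key, falling back to the source-language pack. */
    String text(String language, String key) {
        ResourceBundle pack = packs.get(language);
        if (pack != null && pack.containsKey(key)) {
            return pack.getString(key);
        }
        // Assumes the source-language pack has been loaded and is complete.
        return packs.get(sourceLanguage).getString(key);
    }
}
```

Because every pack shares the same keys, the calling code never changes when a language is added. The fallback also makes an incomplete translation visible as source-language text on the page, which translation testers can spot easily.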
17.2 Changing or adding globalization features
Sometimes locale-related information changes, or new information emerges, as politics, economies, and financial systems change. As a result, the globalization features of existing globalized applications might need to be updated to reflect these changes. For example, during the development of Our Global Travel Shanghai Demo, the Euro became the common currency of the participating European Union countries, and the local currencies of these countries were withdrawn from use. Originally the currencies used in this project for countries such as France and Germany were their local currencies, not the Euro. To catch up with this historical change, our source code was modified. Since ICU4J supports the Euro currency, modifying source code only affected the Locale Model Bean that invokes ICU4J V1.8 for currency information. Our original code for formatting a value in French francs looked like this:
NumberFormat.getCurrencyInstance(new Locale("fr", "FR")).format(100000)
giving a result of “100 000,00 F”. To get the currency in Euros, the code is now:
NumberFormat.getCurrencyInstance(new Locale("fr", "FR", "EURO")).format(100000)
giving a result of “100 000,00 €” (see footnote 1).
Adding new globalization features also involves modifying the Locale Model Bean, the localization pack, and possibly the JSP code. For example, what if Our Global Travel Shanghai Demo needs to display the holiday information of various countries? First, the Locale Model Bean must invoke ICU4J to get the holiday information of various locales. Secondly, the word “holiday” itself might also need to be displayed. As a result, there should be an item containing the word “holiday” in various languages in the corresponding localization packs. Thirdly, the relevant JSP code might also need minor updating in order to make the new features visible on Web pages. Normally modification begins with the source version. After this specific feature has been tested successfully, the modification can easily be mirrored into all other languages by updating the relevant sections of the localization packs.
1 Please note that in later versions of ICU4J, the Euro is the default currency for the participating European Union countries, and the variant “_EURO” is no longer needed to retrieve a value in Euros through ICU4J.
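Where a formatter's default currency does not yet reflect such a change, the currency can also be set explicitly while keeping the locale's number conventions. The following sketch uses the JDK's java.text.NumberFormat and java.util.Currency, not ICU4J V1.8, and its class and method names are hypothetical. It plays the role that the “_EURO” variant played in the code above.

```java
import java.text.NumberFormat;
import java.util.Currency;
import java.util.Locale;

// Hypothetical helper that separates "how numbers look" from "which currency".
class ExplicitCurrency {
    /** Formats the amount with the locale's conventions but the given ISO 4217 currency. */
    static String formatIn(double amount, Locale locale, String isoCurrencyCode) {
        NumberFormat format = NumberFormat.getCurrencyInstance(locale);
        format.setCurrency(Currency.getInstance(isoCurrencyCode));
        return format.format(amount);
    }

    /** Returns the currency the runtime currently associates with the locale. */
    static String defaultCurrencyCode(Locale locale) {
        return Currency.getInstance(locale).getCurrencyCode();
    }
}
```

Keeping this choice in one place, as the Locale Model Bean does, means that a future currency change touches a single class and leaves the JSP pages and the business logic alone.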
Part 4
Appendixes These appendixes include the following: Appendix A, “Server-side installation and configuration for Our Global Travel Shanghai Demo” on page 141 Appendix B, “Client-side installation and configuration for Our Global Travel Shanghai Demo” on page 157 Appendix C, “CSS and artwork globalization” on page 165
Server-side installation and configuration for Our Global Travel Shanghai Demo In this appendix we describe the detailed processes for installing and configuring the products involved in Our Global Travel Shanghai Demo (our working example).
Prerequisites
- Windows 2000 Server
- Red Hat Linux 7.1 server along with two additional Linux packages required for WebSphere Application Server: Ncurses4 and Pdksh
- JDK 1.1.8 or above pre-installed on both servers
The procedures described here should be used in conjunction with your product installation guides, and all required parameters should be set in accordance with the machines used for building your own environment.
A.1 IBM HTTP Server V1.3.19
This section provides detailed instructions for installing and configuring IBM HTTP Server V1.3.19 for Linux on a Red Hat Linux 7.1 server. Before installation, ensure that the following IP ports are unused:
- 80 (standard HTTP port)
- 443 (standard HTTPS port)
- 8008 (IBM HTTP Server Administration port)
A.1.1 Install IBM HTTP Server
1. Log in as root.
2. Start a terminal session.
3. Insert the IBM WebSphere Application Server V4.0 CD-ROM (containing the IBM HTTP Server) into your CD-ROM drive and mount the CD.
4. Change to the ihs_128 subdirectory of the root installation directory on that CD.
5. Create an rpm.list text file listing all packages shown in Table A-1.

Table A-1 Packages needed to be included in the rpm.list text file
Package                               Description
gsk5bas-5.0-3.61.i386.rpm             GSK certificate and security
IBM_ADMIN_EN-1.3.19-0.i386.rpm        IBM Administration Server documentation
IBM_ADMIN_Server-1.3.19-0.i386.rpm    IBM Administration Server program files
IBM_FastCGI-1.3.19-0.i386.rpm         Implementation of FastCGI standard - increases CGI performance
IBM_HTTP_Server-1.3.19-0.i386.rpm     IBM HTTP Server program files
IBM_MAN_ENU-1.3.19-0.i386.rpm         HTTP Server manual (man) pages
IBM_MSG_EN-1.3.19-0.i386.rpm          Language message files for IBM HTTP Server
IBM_SSL_128-1.3.19-0.i386.rpm         IBM SSL (Secure Sockets Layer) 128bit library
IBM_SSL_Base-1.3.19-0.i386.rpm        IBM SSL (Secure Sockets Layer) program files
IBM_SSL_EN-1.3.19-0.i386.rpm          Language message files for IBM SSL

6. Install these packages using the rpm (Red Hat Package Manager tool) command:
# for i in `cat rpm.list`; do rpm -U --nodeps $i ; done
A.1.2 Configure IBM HTTP Server
After installing IBM HTTP Server V1.3.19, complete the following configuration tasks on your IBM HTTP Server machine:
1. Create an HTTP Server admin account. This account is used to access the HTTP Administration Server Configuration GUI.
2. Create a UNIX runtime account. Although the HTTP Server process is started under the root account, it must be configured and then switched to run under another account. A UNIX account must be created especially for this purpose. Run the ./setupadm script and supply the user ID and any other information required by the UNIX runtime account.
3. Update httpd.conf. The httpd.conf HTTP Server configuration file must be updated to reflect the fully qualified host name of the server and the UNIX account under which it must run.
4. Restart the HTTP Server.
A.2 IBM DB2 Universal Database V7.2.1 This section provides detailed instructions for installing and configuring IBM DB2 Universal Database V7.2.1 Enterprise Edition for Linux on a Red Hat Linux 7.1 server. Before installing, ensure that the following IP ports are unused:
A.2.1 Install DB2 Universal Database Server
1. Log in as root.
2. Start a terminal session.
3. Insert the DB2 V7.2.1 CD-ROM into your CD-ROM drive and mount the CD.
4. Using the following command, start the DB2 installer program:
# ./db2setup
5. On the Install DB2 V7.2.1 window, select only DB2 UDB Enterprise Edition. 6. Select the DB2 Product Library option. 7. On the DB2 Product Library window, select the correct option for your locale under the DB2 Product Library (HTML) section. 8. Create a DB2 Instance option. 9. On the DB2 Warehouse Control Database window, disable the Setup DB2 Warehouse Control Database. 10.Create the Administration Server. 11.The db2setup program will install the selected components. 12.If you are prompted to register the product, complete the registration and return to the install window. 13.Select OK as required as you proceed through the steps that follow until installation is complete.
A.2.2 Configure the DB2 Universal Database Server 1. Update root administrative groups The DB2 Server installation automatically sets up the db2asgrp administrative group. You must add the user root to this group.
Appendix A. Server-side installation and configuration for Our Global Travel Shanghai Demo
2. Update JDBC level
Although the default installation for IBM DB2 V7.2.1 is JDBC 1.2, IBM WebSphere Application Server 4.0 requires the usage of JDBC 2.0. To update the DB2 JDBC level, complete the following steps:
a. Change to user .
b. Add the lines shown in Example A-1 to the end of that user's .bashrc environment file.
Example A-1 Sample code to add to the .bashrc environment file
if [ -f /home//sqllib/java12/usejdbc2 ]; then
  . /home//sqllib/java12/usejdbc2
fi
3. Configure TCPIP communication mode If your DB2 server does not use TCP/IP as its primary communication method, then it must be configured to use TCP/IP: a. Change to user . b. Check whether TCPIP is the current DB2 communication method. The following command should return the value tcpip: $db2set DB2COMM
c. If not, use the following command to reset the DB2COMM DB2 environment variable: db2set DB2COMM=tcpip
4. Update root environment file
WebSphere Application Server must run under the root account and requires access to the DB2 environment so that it can access the WebSphere Application Server administration database. This requires that the root account's .bashrc environment file be edited to add the content shown in Example A-2 to the end of that file.
Example A-2 Contents to be added to the .bashrc environment file
# Set up DB2 environment for root user.
if [ -f /home/db2inst1/sqllib/db2profile ]; then
  . /home/db2inst1/sqllib/db2profile
fi
# Force DB2 to use JDBC 2.0.
if [ -f /home//sqllib/java12/usejdbc2 ]; then
  . /home//sqllib/java12/usejdbc2
fi
5. Set up WebSphere Application Server administration database You are now ready to set up the WebSphere Application Server repository (also known as the WebSphere Application Server database).This database will be populated with WebSphere Application Server schema and default values in a later task. Once you have completed this task, reboot the system so that your changes can take effect.
A.3 IBM WebSphere Application Server V4.0 This section provides detailed instructions for installing IBM WebSphere Application Server Advanced Edition V4.0 for Linux on a Red Hat Linux 7.1 server.
Before installing, please ensure that the following IP ports on your server are unoccupied by any active service:
- 900 (bootstrap port)
- 9000 (Location Service Daemon)
- 9080 (default application server)
Note: Since this installation updates the httpd.conf configuration file as part of its Web server plug-in component installation, the IBM HTTP Server process must be stopped prior to installing WebSphere Application Server.
A.3.1 Install WebSphere Application Server Advanced Edition V4.0 To install IBM WebSphere Application Server Advanced Edition V4.0, use the GUI installer interface as follows: 1. Log in as root. 2. Start a terminal session. 3. Insert the IBM WebSphere Application Server V4.0 CD-ROM into your CD-ROM drive and mount the CD. 4. Change your directory to the installation root. 5. Using the following command, execute the install.sh installation script: #./install.sh
6. In the Installation Options window, select Custom Installation. 7. In the Choose Application Server Components window, choose all options except IBM HTTP Server.
Figure A-1 Choose Application Server Components window
8. In the Choose Webserver Plugin window, choose only the IBM HTTP Server Plugin.
9. In the Database Options window, enter data as depicted in Figure A-2 and then click Next.
Figure A-2 Database Options window
10.In the Select Destination Directory window, accept the default location for the WebSphere Application Server (/opt/WebSphere/AppServer). 11.In the Install Options Selected window, ensure that the correct components have been selected and then click Install to start the installation.
Figure A-3 Install Options Selected window
12.In the Location of Configuration Files window, enter the path to the httpd.conf IBM HTTP Server configuration file. 13.In the Setup Complete window, click Finish.
A.3.2 Configure WebSphere Application Server Advanced Edition V4.0 Note: The configuration used in this section was specifically constructed for the multilingual enablement of WebSphere Application Server. Since Unicode is required to encode multilingual data, we set default.client.encoding on the WebSphere Advanced Administrative Console in Our Global Travel Shanghai Demo to “UTF-8”. See Figure A-4 on page 149.
Figure A-4 Set default.client.encoding to UTF-8
Default.client.encoding is a system property that defines the client code-set for parsing input values. Unless default.client.encoding is modified, WebSphere Application Server will use ISO 8859-1 as its default character set.
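The effect of this setting can be reproduced outside the application server. The sketch below does not use any WebSphere API; it is a plain Java illustration, with hypothetical class and method names, of what happens when the bytes of a UTF-8 form submission are decoded with ISO 8859-1 instead of UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demonstration of decoding request bytes with two code sets.
class EncodingDemo {
    /** Decodes raw request bytes as UTF-8, or as ISO 8859-1 when useUtf8 is false. */
    static String decode(byte[] requestBytes, boolean useUtf8) {
        return new String(requestBytes,
                useUtf8 ? StandardCharsets.UTF_8 : StandardCharsets.ISO_8859_1);
    }
}
```

For a two-character Chinese city name sent as six UTF-8 bytes, decoding as UTF-8 restores the two characters, while decoding as ISO 8859-1 yields six unrelated Latin-1 characters. That corruption is what multilingual input would suffer if the default were left unchanged.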
A.4 IBM WebSphere Translation Server V1.0
In this section we describe the steps required to install WebSphere Translation Server V1.0 on a Windows 2000 Server. The WebSphere Translation Server software package includes the following (where “**” represents the language code, for example, mtcn.jar for Simplified Chinese or mtjp.jar for Japanese):
- setup.exe
- setupwin.jar
- Language package mt**.jar
- UDM package udm**.jar ()
- Fix pack JAR files
Instructions 1. When you run setup.exe, the installation program will begin by searching for the Java VM on your server.
Figure A-5 InstallShield Wizard is searching for Java Virtual Machine
Warning: If JDK1.1.8 or above was not installed on the same machine, your installation will not continue. 2. Use the installation wizard to select necessary components from the list shown in Figure A-6.
Figure A-6 Product selection window of WebSphere Translation Server setup wizard
If WebSphere Translation Server is being installed on your machine for the first time, you must select the WebSphere Translation Server Plugin Support and WebSphere Translation Server Support components. Although more than one User Dictionary Manager can be selected (assuming you have downloaded the corresponding udm**.JAR files), only one language engine can be installed at a time.
Before installing a language engine, please rename the corresponding “mt**.jar” to “language.jar”. For example, if you want to install the Simplified Chinese Engine, you must rename the “mtcn.jar” file to “language.jar”.1 Use the installation wizard to select the default options as you proceed through the steps that follow. 3. Repeat the installation process until all necessary language JARs and User Dictionary Managers have been installed. 4. When you run fixpack.jar, Figure A-7 will appear.
Figure A-7 Select FixPack Component(s) to install window of FixPack Installation Wizard
From this list, select the components that you want to install.
5. After the installation is finished, restart your server.
A.5 UDDI Registry Center
In this section we introduce the steps required to create a private UDDI Registry Center on a Windows 2000 Server.
1. Install IBM DB2 Universal Database Enterprise Edition V7.2 for Windows:
   a. Run setup.exe for the DB2 software package.
   b. Use the installation wizard to select the appropriate options.2 Be sure to remember the user name and password that you establish for the DB2 administrator,3 since these will be needed for the installation tasks that follow. Add this user name to the Administrative Group for your computer.
   c. After installation is complete, restart your server.
1 Since only one language engine can be installed at a time, the installation wizard installs language.jar only. By renaming “mtcn.jar” to “language.jar”, the Simplified Chinese language engine will be installed.
2 Defaults should be chosen in most instances.
3 For example, “db2admin”.
2. Install IBM WebSphere Application Server Advanced Edition Single Server V4.0 for Windows:
   a. Run setup.exe for the WebSphere Application Server SE V4.0 software package.
   b. There are two installation options, Typical and Custom. Select Custom Installation.
   c. In the Choose Application Server Components window, select all necessary components.
Figure A-8 Choose Application Server Components window of WebSphere Application Server Single Server V4.0 Installation Wizard
If there is no IBM HTTP Server pre-installed on your machine, be sure to include this selection, since it is required for proper functioning of the WebSphere Application Server Single Server and IBM UDDI Registry Preview.
   d. Select IBM HTTP Server in the Choose Web Server Plugins window.
Figure A-9 Choose Web Server Plugins window of WebSphere Application Server Single Server V4.0 Installation Wizard
   e. In the Security Options window, enter the user name and password of an account that has DB2 database administrative privileges (for example, “db2admin”).
Figure A-10 Security Options window of WebSphere Application Server Single Server V4.0 Installation Wizard
   f. Use the installation wizard to select the default options as you proceed through the steps that follow.
   g. Once installation is complete, restart your server.
3. Install IBM WebSphere UDDI Registry Preview for Windows:
   a. Run install.bat for the UDDI Registry Preview software package.
   b. Enter the user name and password of an account that has DB2 database and HTTP Server administrative privileges (for example, “db2admin”).
Figure A-11 Installation Wizard of IBM WebSphere UDDI Registry Preview
   c. Select the default options as the installation wizard leads you through the steps that follow. For more details, refer to the readme.txt for this package.
4. Once installation is complete, WebSphere will automatically start your DB2, HTTP server, WebSphere, and WebSphere UDDI Registry Preview applications.
5. To start the UDDI Registry Center, run startUDDI.bat from the installation directory.
6. To stop the UDDI Registry Center, run stopUDDI.bat from the installation directory.
A.6 IBM WebSphere Personalization Server V4.0
In this section we explain how to install IBM WebSphere Personalization Server for Linux on a Red Hat Linux 7.1 server.
Prerequisites: IBM WebSphere Application Server must be installed on the same machine, and the WebSphere Application Server Administration Server must be started.
1. Insert the IBM WebSphere Personalization Server V4.0 CD-ROM into your CD-ROM drive and mount the CD.
2. Change to the installation root directory and run the ./install.sh installation script.
3. In the product selection window, select IBM WebSphere Personalization V4.0 Server.
Figure A-12 IBM WebSphere Personalization V4.0 Suite Installer
4. During installation, the WebSphere Application Server configuration will be accessed and the rpm (Red Hat Package Manager) database automatically updated.
5. Use the installation wizard to select the appropriate options. Defaults should be chosen in most instances.
Appendix B. Client-side installation and configuration for Our Global Travel Shanghai Demo
This appendix is optional reading for end users, but should prove most helpful to globalization application developers.
B.1 Installation
In our working example, Our Global Travel Shanghai Demo acts as a global Web site supporting 12 languages. In order to conserve resources, we need one client PC to display the Web site in all supported languages. We recommend that the following software be installed on this client PC:
– Windows 2000 MUI
– Internet Explorer 5.5 (or higher) or Netscape Communicator 6.2 (or higher)
– The Macromedia Flash 5.0 plug-in
B.2 Configuration
Client-side configuration requires two steps: configuring system settings and configuring browser settings.
B.2.1 System settings configuration
This determines which languages can be supported by this PC and thus displayed in its browser.
1. In the Windows menu bar, click Start -> Settings -> Control Panel to open the Control Panel window, and then double-click Regional Options to configure regional settings.
Figure B-1 Open Regional Options under Control Panel
2. In the Language settings for the system pane, check all of the following, and then click the Apply button:
– Arabic
– Hebrew
– Japanese
– Korean
– Simplified Chinese
– Traditional Chinese
– Western Europe and United States
Figure B-2 Select language settings
Note: Here you might be asked to locate Windows 2000 setup files (that is, on your Windows 2000 installation CD) in order to install the files needed to support these languages.
B.2.2 Browser settings configuration
Browser language settings determine the languages that browsers can display, as well as the default language for the Our Global Travel Shanghai Demo Web page. Users can change language settings according to their preferences. Using Internet Explorer 5.5 as an example, configuring browser settings requires the following:
1. From the Internet Explorer menu bar, select Tools -> Internet Options.
Figure B-3 Click Internet Options on the IE menu
2. Now click the Language button at the lower right corner of the General tab.
Figure B-4 Open the Internet Options window
3. The Language Preference window pops up. Select all languages that you need the Web site to display, position your preferred language at the top of the list as shown in Figure B-5, and click OK.
Figure B-5 Edit Language Preference sequence
4. Click the Settings button to the right of the General tab, and from the Settings window (shown in Figure B-6) select the Every visit to the page option and click OK.
Figure B-6 Change settings for users visiting these pages
5. In the Internet Options window, select the Advanced tab and then click the Restore Defaults button at the bottom. Now click OK to apply those settings and close the Internet Options window.
Figure B-7 Restore default settings
6. On the first visit to Our Global Travel Shanghai Demo, you might be asked to install the “Flash 5.0” plug-in. If so, click OK to download and install it.
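The language list ordered in step 3 is what a browser sends to the server in the HTTP Accept-Language header. The following is a minimal sketch of how server-side code might turn that header into a default locale, not the demo's actual implementation. The class name LanguageNegotiator, the three-locale supported list, and the en_US fallback are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper; the supported list is an assumed subset, not the demo's real one.
final class LanguageNegotiator {
    static final List<Locale> SUPPORTED =
            Arrays.asList(Locale.US, Locale.GERMANY, Locale.JAPAN);
    static final Locale FALLBACK = Locale.US;

    // Pick the best supported locale for an Accept-Language header value.
    static Locale pick(String acceptLanguage) {
        if (acceptLanguage == null || acceptLanguage.trim().isEmpty()) {
            return FALLBACK;
        }
        List<Locale.LanguageRange> ranges;
        try {
            // Parses entries such as "ja,en;q=0.5" and sorts them by weight.
            ranges = Locale.LanguageRange.parse(acceptLanguage);
        } catch (IllegalArgumentException e) {
            return FALLBACK; // malformed header
        }
        // Basic filtering lets the range "ja" match the supported tag "ja-JP".
        List<Locale> matches = Locale.filter(ranges, SUPPORTED);
        return matches.isEmpty() ? FALLBACK : matches.get(0);
    }
}
```

With this sketch, a browser whose list starts with Japanese would receive the ja_JP version, and a browser that lists only unsupported languages would receive the fallback.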
Appendix C. CSS and artwork globalization
As e-businesses have increasingly gone global and multilingual, people have begun to recognize the importance of designing and developing their artwork from a globalization perspective, that is, in a way that makes it easy to add new language versions for these applications. Just as it is usually difficult to add new languages to non-globalized e-business applications, artwork done without a globalization perspective can be very expensive in terms of the time and effort required to produce new language versions. Many pictures and tables must then be redrawn or redesigned, work that would have been unnecessary had they been designed from a global perspective in the first place. The following sections of this appendix consider how to exploit CSS in the design and development of globalized artwork.
What is CSS?
CSS stands for cascading style sheets. Styles define how to display HTML elements. For example, “font-size: 12px” defines the font size, and “background-color: #FFFFFF” defines the background color. When a Web designer applies the class “heading” to an element, the font and background color defined for that class are displayed on the Web page. Styles can be separated from HTML code and stored in external style sheets, and multiple external style sheets can be cascaded into a single CSS file. Example C-1 contains a CSS code sample.
Example: C-1 Sample CSS code
/* Define the font and background color of the user-defined 'heading' class */
.heading {
  font-family: sans-serif;
  background-color: #FFFFFF;
  font-size: 12px;
}
Advantages of CSS
CSS has brought with it many improvements in Web page design. A Web designer can now design layouts for multiple Web pages, and even an entire site, with just one CSS or a relatively small collection of them rather than designing the site page by page. Moreover, if a site-wide design change is needed, changing the CSS alone handles this chore for the whole site.
The big advantage that CSS brings to globalized e-business artwork design is that a well-designed CSS can meet all display requirements of Web pages in different languages. As such, CSS can be viewed as the Single Executable for artwork design. We borrow the concept of the “Single Executable” from e-business globalization, where it refers to the technique of developing software so that the single executable produced from the compiled source handles the cultural convention needs of all supported languages. A globalized program is a program devoid of all cultural information. Similarly, a globalized CSS is a CSS without cultural information.
To support the Single Executable in programming, culture-sensitive computing is separated from the Single Executable code, and whenever a piece of cultural information is needed, the Single Executable code invokes the culture-sensitive computing to retrieve it. Moreover, language-dependent data is also separated from the source code and grouped into localization packs. When an application needs to display textual data, the Single Executable code retrieves that data from the localization packs.
The Single Executable technique is also applicable to artwork globalization, and the following are some tips for making a CSS file Single Executable for artwork globalization.
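The separation of language-dependent data from code can be sketched in a few lines of Java. This is an illustration under stated assumptions, not the demo's implementation: the class LocalizationPacks, the key “greeting”, and the two sample packs are hypothetical, and in-memory maps stand in for real localization pack files.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: locale-neutral code looks text up in per-locale packs.
final class LocalizationPacks {
    private static final String DEFAULT_PACK = "en_US";
    private static final Map<String, Map<String, String>> PACKS = new HashMap<>();

    static {
        // Illustrative sample data only.
        Map<String, String> en = new HashMap<>();
        en.put("greeting", "How are you?");
        Map<String, String> fr = new HashMap<>();
        fr.put("greeting", "Comment allez-vous ?");
        PACKS.put("en_US", en);
        PACKS.put("fr_FR", fr);
    }

    // Return the text for a key in the given locale, falling back to the
    // default pack and finally to the key itself.
    static String text(Locale locale, String key) {
        Map<String, String> pack = PACKS.get(locale.toString());
        if (pack == null || !pack.containsKey(key)) {
            pack = PACKS.get(DEFAULT_PACK);
        }
        String value = pack.get(key);
        return value != null ? value : key;
    }
}
```

The calling code never mentions a language; adding a new language version means adding a pack, not changing the code. In real Java applications, java.util.ResourceBundle provides this kind of lookup and fallback.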
C.1 How to make CSS Single Executable
We recommend using a central CSS that contains the general elements of Web page layout and is devoid of culture-sensitive information and language-dependent data. Culture-sensitive information can be defined in separate CSS files that are invoked by the central CSS. As in globalized programming, language-dependent data can be stored in separate localization packs, which JSP code can retrieve for display in Web pages.
C.1.1 Avoid locale-related restrictions
Different countries and regions have their own cultural preferences and restrictions, and we recommend that CSS code handling these features be separated from the central CSS. The following is an example taken from the artwork design for Our Global Travel Shanghai Demo, handling color preferences in different countries.
Color
Color preferences exist in most countries and regions. Table C-1 lists the implications of preferred colors in various regions.
Table C-1 Color preferences of various countries/regions
Country/Region | Color | Implication
--- | --- | ---
United States | Blue | Trustworthy, official business, philosophy, soothing
Taiwan | Red | Celebration, government, fire, summer, good luck, joy, fertility, good fortune
South Korea | Yellow | Joy, happiness
Japan | Orange | Love, happiness
In Our Global Travel Shanghai Demo, we consequently implemented different background colors for different language versions. See Figure C-1.
Figure C-1 Using different background colors for different language versions
For each locale supported by Our Global Travel Shanghai Demo, we defined a cascading style sheets file named “default.css” containing culture-sensitive information. default.css is separated from the central CSS file. In default.css, we define the background color for this locale. See Example C-2.
Example: C-2 Defining the background color
.background {background-color: #327A38} /* en_US */
.background {background-color: #456384} /* de_DE */
.background {background-color: #E87830} /* ja_JP */
Each default.css is stored under the directory named for its locale, as shown in Example C-3.
Example: C-3 Storing default.css
/en_US/default.css
/de_DE/default.css
/ja_JP/default.css
In this way, JSP code can easily retrieve culture-sensitive information from these CSS files, as illustrated in Example C-4.
Example: C-4 Retrieving culture-sensitive information from CSS files
<% String localStr = "en_US"; // localStr should be the locale string, such as de_DE or ja_JP %>
/css/default.css">
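The lookup of a per-locale style sheet can also be sketched as a small Java helper that a JSP might call. This is only an illustration based on the directory layout of Example C-3. The class LocaleStylesheet, the supported-locale set, and the en_US fallback are assumptions, and any application context path is omitted.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper; directory layout follows Example C-3.
final class LocaleStylesheet {
    // Assumed set of locale directories that exist on the server.
    private static final Set<String> SUPPORTED =
            new HashSet<>(Arrays.asList("en_US", "de_DE", "ja_JP"));
    private static final String FALLBACK = "en_US";

    // Build the path of the culture-sensitive style sheet for a locale.
    static String path(Locale locale) {
        String name = locale.toString(); // for example "ja_JP"
        if (!SUPPORTED.contains(name)) {
            name = FALLBACK; // unknown locales get the default look
        }
        return "/" + name + "/default.css";
    }

    // Build the HTML link element that references that style sheet.
    static String linkTag(Locale locale) {
        return "<link rel=\"stylesheet\" type=\"text/css\" href=\""
                + path(locale) + "\">";
    }
}
```

Checking the locale against a known set keeps a request for an unsupported locale from producing a link to a style sheet that does not exist.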
C.2 Avoid language-dependent restrictions
Textual data in images
Text embedded in image files cannot be automatically translated by today's translation tools, so when a Web page image containing text in one language must be displayed in another, the image has to be redrawn to accommodate the translated text. To avoid this kind of problem, we recommend that your images not contain textual data, so that the same image files can be used in all language versions of an e-business application.
Tables
Table design (especially with regard to column width) can also be adversely affected by the language of the data contained in the table. A sentence or phrase in English can grow longer or shorter when translated into another language, so if a piece of data must fit in a single table cell, columns should be wide enough to contain that data in every language version. Figure C-2 shows a table cell not wide enough to hold the French translation of “How are you.”
Figure C-2 Table with fixed column width
The CSS code producing the table in Figure C-2 is shown in Example C-5.
Example: C-5 Sample CSS code producing a table with fixed column width
.table {
  font-family: "Arial", "Helvetica", sans-serif;
  font-size: 12px;
  font-weight: bold;
  color: #000066;
  background-color: #BCEAFE;
  height: 26px;
  width: 120px;
}
Since its table column width is fixed, the data in French is displayed on two lines instead of one, making the table look awkward.
To solve this problem, we can remove the fixed width from the CSS as follows:
Example: C-6 Sample CSS code producing a table without fixed column width
.table {
  font-family: "Arial", "Helvetica", sans-serif;
  font-size: 12px;
  font-weight: bold;
  color: #000066;
  background-color: #BCEAFE;
  height: 26px;
}
As a result, the cells now appear as shown in Figure C-3.
Figure C-3 Table with flexible column width
Buttons
As with table design, a button should also be wide enough to neatly display its label. Figure C-4 shows a bad example of button design.
Figure C-4 Button with distorted label due to fixed width
This overlap is caused by the fixed width of the button. Even in CSS, if a button width is fixed, a label will occupy more than one line when it exceeds that width. This happens frequently in e-business Web site globalization because design artists fix button widths according to the length of the label in the source language version. When the label is then translated into other languages, its length can easily outgrow the button width. To solve this type of problem, we recommend that artwork designs not fix button widths with CSS code such as the following:
button {width: n px; ...}
where “n” refers to a number of pixels.
Figure C-5 Button with correct label
If its width is not fixed, a button can adjust its length according to that of its label—a prime example of how a single CSS might handle Web pages in multiple languages.
If a button is inset in a table cell, then we recommend against fixing that table's column width. Otherwise, you will limit button width so that when the label exceeds its column width, its button will then be unable to expand its own width to match the length of the label.
C.3 Further considerations for bi-directional data display
If an e-business has Arabic or Hebrew versions, the design artist should also take bi-directional data display into consideration. This means that Arabic and Hebrew data is displayed from right to left rather than left to right, as is the case for data in most other languages.
Figure C-6 Uni-directional data display
Figure C-7 Bi-directional data display
Table C-2 reflects our experience in BiDi (bi-directional) support:
Table C-2 Suggested CSS/HTML for BiDi support
Items | BiDi Consideration | Explanation
--- | --- | ---
Text Direction | Use div {direction: rtl; unicode-bidi: embed;} in BiDi pages (rtl: right to left, ltr: left to right). | The direction style can be used in many HTML elements. Normally, we add this attribute in the body tag, so the whole page will be right to left or left to right. This style should dynamically get its value from the localization pack, where we define the page direction.
Alignment | div {text-align: right;} is the default for BiDi elements with a right-to-left direction. | We recommend not specifying the alignment explicitly when it is the same as the default in HTML/JSP.
Frame | Framesets do not support the direction specification. | If the relative horizontal positions of frames have to be mirrored for right-to-left presentation, this must be done by manually changing the order of the frames within the frameset. Consequently, using frames in HTML is not recommended. Furthermore, it is not easy to communicate between the different frames in one frameset.
Picture | Some pictures need to be mirrored for right-to-left presentation. | It is recommended that the JSP generate image file names dynamically.
Special Character | Some special characters are swapped: <, (, {, and [ are swapped with >, ), }, and ] respectively. | This swap is done within right-to-left segments of text. In some cases, however, the boundaries of those right-to-left segments cannot be correctly inferred algorithmically, so it is possible to “help” the algorithm by adding LRM (\u200E, left-to-right mark), RLM (\u200F, right-to-left mark), or tags such as <span> with a dir attribute or style at the proper places.
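The role of the base direction and of the RLM mark described in Table C-2 can be explored with the java.text.Bidi class from the standard Java library. The following is a minimal sketch, assuming a hypothetical helper class BidiCheck whose result could feed a dir attribute in a JSP. It is not taken from the demo.

```java
import java.text.Bidi;

// Hypothetical helper built on the standard java.text.Bidi class.
final class BidiCheck {

    // Return "rtl" or "ltr" according to the first strongly directional
    // character in the text; text with no strong character counts as "ltr".
    static String dirAttribute(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        return bidi.baseIsLeftToRight() ? "ltr" : "rtl";
    }

    // True when the text contains both left-to-right and right-to-left runs.
    static boolean isMixed(String text) {
        return new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT).isMixed();
    }

    public static void main(String[] args) {
        String hebrew = "\u05E9\u05DC\u05D5\u05DD"; // a Hebrew word
        String mixed = "IBM " + hebrew;             // starts with Latin letters
        System.out.println(dirAttribute(hebrew));
        System.out.println(dirAttribute(mixed));
        // Prefixing RLM (\u200F) makes the detected base direction right to left.
        System.out.println(dirAttribute("\u200F" + mixed));
    }
}
```

The last call shows the kind of “help” mentioned in the table: the visible text is unchanged, but the invisible mark tells the algorithm which direction the segment belongs to.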
Conclusion
With the proper use of CSS in your globalized e-business project artwork, it is possible that only one set of CSS files will be needed for all language versions. Any modification or update to style (including font, color, and page layout) can be implemented by simply changing the relevant parts of the CSS rather than redrawing or redesigning Web pages in every language. In this way, significant savings in time, effort, and money can be realized.
Glossary
agent. (1) In systems management, a user that, for a particular interaction, has assumed an agent role. (2) An entity that represents one or more managed objects by (a) emitting notifications regarding the objects and (b) handling requests from managers for management operations to modify or query the objects. (3) A system that assumes an agent role. (4) Software that acts on behalf of a user as it performs tasks within an application program. An agent may run on both the client and the server.
API. See application programming interface.
application programming interface. (1) A software interface that enables applications to communicate with each other. An API is the set of programming language constructs or statements that can be coded in an application program to obtain the specific functions and services provided by an underlying operating system or service program. (2) In VTAM, the language structure used in control blocks so that application programs can reference them and be identified to VTAM.
BiDi. Languages with BiDirectional scripts such as Arabic and Hebrew, whose general flow of text proceeds horizontally from right to left, but numbers, English, and other left-to-right language text are written from left to right.
browser. See Web browser.
class. (1) In object-oriented design or programming, a model or template that can be instantiated to create objects with a common definition and therefore, common properties, operations, and behavior. An object is an instance of a class. (2) In the AIX operating system, pertaining to the I/O characteristics of a device. System devices are classified as block or character devices.
client. A computer system or process that requests a service of another computer system or process that is typically referred to as a server. Multiple clients may share access to a common server.
code.
code page. (1) An assignment of graphic characters and control function meanings to all code points; for example, assignment of characters and meanings to 256 code points for an 8-bit code, assignment of characters and meanings to 128 code points for a 7-bit code. (2) In the Print Management Facility, a font library member that associates code points and character identifiers. A code page also identifies invalid code points. (3) A particular assignment of hexadecimal identifiers to graphic characters. (4) In AFP support, a font file that associates code points and graphic character identifiers.
code point. (1) A 1-byte code representing one of 256 potential characters. (2) In SNA management services (SNA/MS), a 1- or 2-byte value that identifies a particular meaning to the receiver of an alert so that appropriate text can be displayed.
DBCS. See Double Byte Character Set.
Document Object Model (DOM). A programming interface specification being developed by the World Wide Web Consortium (W3C) that lets a programmer create and modify HTML pages and XML documents as full-fledged program objects. Currently, HTML (HyperText Markup Language) and XML (eXtensible Markup Language) are ways to express a document in terms of a data structure. As program objects, such documents will be able to have their contents and data “hidden” within the object, helping to ensure control over who can manipulate the document. As objects, documents can carry with them the object-oriented procedures called methods. DOM is a strategic and open effort to specify how to provide programming control over documents.
document type definition (DTD). The rules that specify the structure for a particular class of SGML or XML documents. The DTD defines the structure with elements, attributes, and notations, and it establishes constraints for how each element, attribute, and notation may be used within the particular class of documents. A DTD is analogous to a database schema in that the DTD completely describes the structure for a particular markup language.
DOM. See Document Object Model.
Double Byte Character Set (DBCS). A set of characters in which each character is represented by 2 bytes. Scripts such as Japanese, Chinese, and Korean contain more characters than can be represented by 256 code points, thus requiring two bytes to uniquely represent each character.
DTD. See Document Type Definition.
e-business. Either (a) the transaction of business over an electronic medium such as the Internet or (b) any organization (for example, commercial, industrial, nonprofit, educational, or governmental) that transacts its business over an electronic medium such as the Internet. An e-business combines the resources of traditional information systems with the vast reach of an electronic medium such as the Internet (including the World Wide Web, intranets, and extranets); it connects critical business systems directly to critical business constituencies: customers, employees, and suppliers. The key to becoming an e-business is building a transaction-based Web site in which all core business processes (especially all processes that require a dynamic and interactive flow of information) are put online to improve service, cut costs, and sell products.
EJB. See Enterprise JavaBeans.
Enterprise JavaBeans (EJB). EJB reduces the complexity of developing middleware by providing automatic support for middleware services such as transactions, security, database connectivity and so on. It simplifies the development of middleware that is transactional, portable, and scalable.
eXtensible Markup Language (XML). A standard meta language for defining markup languages that was derived from and is a subset of SGML. XML omits the more complex and less-used parts of SGML and makes it much easier to (a) write applications to handle document types, (b) author and manage structured information, and (c) transmit and share structured information across diverse computing systems. The use of XML does not require the robust applications and processing that is necessary for SGML. XML is being developed under the auspices of the World Wide Web Consortium (W3C).
eXtensible Stylesheet Language (XSL). A Working Draft of the World Wide Web Consortium (W3C) that describes a language for specifying style sheets for XML documents. XSL originates from the Cascading Style Sheet (CSS) language that was developed for HTML and the Document Style Semantics and Specification Language (DSSSL) that was developed for SGML. Just as XML provides the capability to create any number of classes of documents, XSL provides styling capabilities that can be applied to any class of document.
font. A family of characters of a given size and style; for example, 9-point Helvetica.
G11N.
See Globalization.
globalization. In software engineering, the combined processes of internationalization and localization. Globalization is sometimes known as National Language Support (NLS), internationalization is sometimes known as NLS enablement, and localization is sometimes known as NLS implementation. The word “globalization” is sometimes abbreviated as “G11N”; this notation is used because there are 11 letters between the first letter “G” and the last letter “N” in the word “globalization.” See also Internationalization and localization. glyph. (1) The Unicode Standard (Version 1.0) defines glyph as the actual shape (bit pattern, outline) of a character image. For example, an italic A and a roman A. are two different glyphs representing the same underlying character. Strictly speaking, any two images which differ in shape constitute different glyphs. In this usage, glyph is a synonym for character image, or simply image. (2) An image, usually of a character, in a font. (IBM Dictionary of Computing). Graphical User Interface (GUI). A type of computer interface consisting of a visual metaphor of a real-world scene, often of a desktop. Within that scene are icons, representing actual objects, that the user can access and manipulate with a pointing device. GUI.
See Graphical User Interface.
home page. The initial Web page that is returned by a Web site when a user specifies the uniform resource locator (URL) for the Web site. For example, if a user specifies the URL for the IBM Web site, which is http://www.ibm.com, the Web page that is returned is the IBM home page. Essentially, the home page is the entry point for accessing the contents of the Web site. The home page may sometimes be called the “welcome page” or the “front page.” HTML.
See HyperText Markup Language.
HTTP.
See HyperText Transport Protocol.
HyperText Markup Language (HTML). A markup language that conforms to the SGML standard and was designed primarily to support the online display of textual and graphical information that includes hypertext links. HyperText Transport Protocol (HTTP). In the Internet suite of protocols, the protocol that is used to transfer and display hypertext documents. I18N.
See Internationalization.
icon. (1) A graphic symbol, displayed on a screen, that a user can point to with a device such as a mouse in order to select a particular function or software application. (2) A graphical representation of an object (for example, a file or program) that consists of an image, an image background, and a label. ICU. See International Components for Unicode Technology. ICU4C. “ICU for C” is an ICU implementation in C and C++. It provides robust, full-featured and most up-to-date Unicode support on a wide variety of platforms. ICU4J. “ICU for J” is an ICU implementation in Java. It supplies functionality that is not found in the standard Java runtime and allow you to provide a more completely internationalized application. ideogram. A picture or symbol used in a system of writing to represent a thing or an idea but not a particular word or phrase for it.1 ideographic language. A written language in which each character (ideogram) represents a thing or an idea (but not a particular word or phrase). An example of such a language is written Chinese (Zhongwen). Contrast with logogram and phonetic language. IME.
See Input Method Editor.
Input Method Editor (IME). Software components that perform conversions between user operations such as typing keys, speaking, or writing using a pen device to generate text input for applications, usually by user-guided dictionary lookup. The most common input method editors allow users to type text in Chinese, Japanese, or Korean languages, that have thousands of different characters, on a regular-sized keyboard. Typically a sequence of several characters are entered and then converted as a single entity. This conversion may have to be retried because there may be several possible translations. Similarly, for hand-writing recognition, the user may write a series of characters, they are converted, and then the user selects the correct text from several possible conversion results. International Components for Unicode Technology (ICU). The International Components for Unicode supply a complete package for Unicode enabling software for both C/C++ and Java programming languages. ICU provides robust, full-featured, commercial-quality and freely available Unicode support. It includes Unicode compliant support for locale-sensitive string comparison, date/time/number/currency/message formatting, text boundary detection, character set conversion, transliteration and so on. The design and architecture of ICU is parallel to the internationalization support in JDK.
International Organization for Standardization (ISO). An organization of national standards bodies from various countries, established to promote the development of standards that facilitate the international exchange of goods and services, and to develop cooperation in intellectual, scientific, technological, and economic activity.
internationalization. In software engineering, the process of producing a product that is independent of any particular language, script, culture, and coded character set. Strictly speaking, an internationalized product is not usable in any region of the world until it is localized to a specific region. Once a product has been internationalized, it can be localized for a specific language, script, culture, and coded character set with minimal expense and effort. The word “internationalization” is sometimes abbreviated as “I18N”; this notation is used because there are 18 letters between the first letter “I” and the last letter “N” in the word “internationalization.” See also globalization and localization.
ISO. See International Organization for Standardization.
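The abbreviation rule described under internationalization (first letter, count of the letters in between, last letter) can be checked mechanically. The following is a small sketch; the class and method names are hypothetical and are not part of any product discussed in this book.

```java
import java.util.Locale;

public class NumeronymSketch {
    // Builds the abbreviation: first letter + number of letters in between + last letter.
    public static String numeronym(String word) {
        if (word.length() < 3) {
            return word; // too short to abbreviate
        }
        int between = word.length() - 2;
        String result = word.charAt(0) + String.valueOf(between) + word.charAt(word.length() - 1);
        // Locale.ROOT keeps the result stable; a Turkish default locale would
        // otherwise uppercase "i" to a dotted capital I.
        return result.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(numeronym("internationalization")); // I18N
        System.out.println(numeronym("localization"));         // L10N
    }
}
```

The explicit Locale.ROOT argument is itself a small internationalization lesson: case conversion is locale-sensitive.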
JAR. Java ARchive, a file that contains the class, image, and sound files for a Java applet gathered into a single file and compressed for faster downloading to your Web browser. An applet that comes as part of a Web page that you request may include several files, each of which would have to be downloaded along with the Web page. By putting the applet components in a single file and compressing that file, download time is saved. When a programmer gets a Java program development kit, a small program or utility called “jar” is included. The jar utility lets you create, list, and extract the individual files from a JAR file. Ordinarily, a browser user will not need to “open” or view a JAR file directly. It is opened when the Web page is received and the applet is in some manner initiated. The JAR format is based on the popular zip file format.
Java. An object-oriented programming language for portable interpretive code that supports interaction among remote objects. Java was developed and specified by Sun Microsystems, Incorporated.
Java Database Connectivity (JDBC). An application programming interface (API) that has the same characteristics as Open Database Connectivity (ODBC) but is specifically designed for use by Java database applications. Also, for databases that do not have a JDBC driver, JDBC includes a JDBC-to-ODBC bridge, which is a mechanism for converting JDBC to ODBC; it presents the JDBC API to Java database applications and converts this to ODBC. JDBC was developed by Sun Microsystems, Inc. and various partners and vendors.
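The JAR entry above describes creating and listing the files in an archive with the jar utility. The same two operations are available programmatically through the standard java.util.jar package. The sketch below builds a JAR in memory and lists its entries; the class name JarSketch and the file names are placeholders chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;

public class JarSketch {
    // Packs name -> content pairs into a single compressed JAR held in memory.
    public static byte[] createJar(Map<String, String> files) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (JarOutputStream jar = new JarOutputStream(bytes)) {
            for (Map.Entry<String, String> file : files.entrySet()) {
                jar.putNextEntry(new JarEntry(file.getKey()));
                jar.write(file.getValue().getBytes(StandardCharsets.UTF_8));
                jar.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    // Lists the entry names in the order they are stored in the archive.
    public static List<String> listJar(byte[] jarBytes) {
        List<String> names = new ArrayList<>();
        try (JarInputStream jar = new JarInputStream(new ByteArrayInputStream(jarBytes))) {
            for (JarEntry entry = jar.getNextJarEntry(); entry != null; entry = jar.getNextJarEntry()) {
                names.add(entry.getName());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return names;
    }

    public static void main(String[] args) {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("DemoApplet.class", "placeholder class bytes");
        files.put("images/logo.gif", "placeholder image bytes");
        System.out.println(listJar(createJar(files)));
    }
}
```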
Java Development Kit (JDK). A software package that can be used to write, compile, debug, and run Java applets and applications.
Java Virtual Machine (JVM). A software implementation of a central processing unit (CPU) that runs compiled Java code (applets and applications).
JavaServer Pages (JSP). A technology for controlling the content or appearance of Web pages through the use of servlets, small programs that are specified in the Web page and run on the Web server to modify the Web page before it is sent to the user who requested it. JSP technology, from Sun Microsystems, the developer of Java, builds on the Servlet application program interface (API): the Web server translates each JSP page into a servlet.
JDBC. See Java Database Connectivity.
JDK. See Java Development Kit.
JSP. See JavaServer Pages.
JVM. See Java Virtual Machine.
L10N. See localization.
locale. The term locale was borrowed by software engineering from geography to indicate that the distribution of human cultural expectations of computer behavior falls into clumps that can be grouped together, most commonly by language and country or region. This clumping of expectations has allowed the use of computer standards that describe sets of related expectations, such as how dates and times are formatted and how words are sorted. For the purposes of this book, a locale means a specification of a language and country, or a specification of a language, country, and variant. Thus a locale can be specified by a string such as “French-Belgium”. It does not mean a data structure that contains information for a language and country.
localization. In software engineering, the process of adapting an internationalized product for a specific language, script, culture, and coded character set. In localization, semantics are preserved, but syntax may change. The word “localization” is sometimes abbreviated as “L10N”; this notation is used because there are 10 letters between the first letter “L” and the last letter “N” in the word “localization.” See also globalization and internationalization.
localization pack. For a single program to have one executable for multiple locales, it must have a standardized approach to working with different sets of locale-specific program data. “Localization packs” is the generic name given to the standardized approaches that this globalization architecture recommends.
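The locale entry above defines a locale as a specification of language and country, optionally with a variant, rather than as a container of data. The JDK's java.util.Locale follows the same idea: it is an identifier that other classes use to look up data. The sketch below parses identifiers of the form used elsewhere in this book (for example fr_BE or zh_CN); the class name LocaleSketch and the parse helper are assumptions for illustration.

```java
import java.util.Locale;

public class LocaleSketch {
    // Parses "language", "language_COUNTRY", or "language_COUNTRY_variant".
    // Locale.Builder rejects ill-formed parts with an unchecked exception.
    public static Locale parse(String spec) {
        String[] parts = spec.split("_", 3);
        Locale.Builder builder = new Locale.Builder().setLanguage(parts[0]);
        if (parts.length > 1) {
            builder.setRegion(parts[1]);
        }
        if (parts.length > 2) {
            builder.setVariant(parts[2]);
        }
        return builder.build();
    }

    public static void main(String[] args) {
        for (String spec : new String[] {"fr_BE", "zh_CN", "ar_EG"}) {
            Locale locale = parse(spec);
            // The display name is looked up from locale data; the Locale itself is only a key.
            System.out.println(spec + " -> " + locale.getDisplayName(Locale.ENGLISH));
        }
    }
}
```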
logogram. (1) A letter, symbol, or sign used to represent an entire word.[1] See also ideogram. (2) The Chinese writing system is called a logographic or character system, because each symbol is a character/logograph. (The following is a paraphrasing of Comrie's definition.) There are five processes by which the characters are created: pictographs, ideographs, compound ideographs, loan characters, and phonetic compounds.[2]
machine translation. Automatic translation of human language by computers.
middleware. A vague term that refers to the software between an application program and the lower-level platform functions.
multilingual. (1) A multilingual program can support multiple languages, either concurrently, or one language at any given time. (2) Referring to many languages. A multilingual program strives to handle data in a way that is not dependent on a particular language or writing system. Multilingual documents combine text which is written in different languages. Multilingual may refer to many languages which all use the same script (such as English, French, and German), or to many languages which use distinct scripts (such as German, Hebrew, and Korean). The latter case is also referred to as multiscript.
National Language Version. A variant of an original product that implements national language support for a particular region of the world.
pel. See picture element.
phonetic language. A written language in which separate symbols represent vowels and consonants. Examples of phonetic languages are English, Greek, and Russian. Contrast with ideographic language.
picture element. Also called a “pel” or “pixel”. (1) In computer graphics, the smallest element of a display surface that can be independently assigned color and intensity. (2) The area of the finest detail that can be reproduced effectively on the recording medium. (3) An element of a raster pattern about which a toned area on a photoconductor can appear.
pixel. See picture element.
SAX. See Simple API for XML.
SBCS. See Single Byte Character Set.
server. A functional unit that provides services to one or more clients over a network. Examples include a file server, a print server, and a mail server.
Service Oriented Architecture (SOA). The architecture applied in Web services. It sets forth three roles and three operations. The three roles are the service provider, the service requester, and the service registry. The objects acted upon are the service and the service description, and the operations performed by the actors on these objects are publish, find, and bind.
servlet. An application program, written in the Java programming language, that is executed on a Web server. A reference to a servlet appears in the markup for a Web page, in the same way that a reference to a graphics file appears. The Web server executes the servlet and sends the results of the execution (if there are any) to the Web browser.
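The three roles and three operations named in the Service Oriented Architecture entry can be made concrete with a toy, in-memory model. This sketch is only an illustration of publish, find, and bind; it does not use UDDI, WSDL, or SOAP, and every name in it (SoaSketch, Registry, "hotel-search") is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class SoaSketch {
    /** Service registry role: maps a service description to the published service. */
    public static class Registry {
        private final Map<String, Function<String, String>> services = new HashMap<>();

        /** publish: a service provider registers a service under its description. */
        public void publish(String description, Function<String, String> service) {
            services.put(description, service);
        }

        /** find: a service requester looks up a service by its description. */
        public Function<String, String> find(String description) {
            Function<String, String> service = services.get(description);
            if (service == null) {
                throw new IllegalArgumentException("No service published as: " + description);
            }
            return service;
        }
    }

    /** Runs publish, find, and bind once and returns the reply of the service. */
    public static String demo(String city) {
        Registry registry = new Registry();
        // Provider role: publish a service and its description.
        registry.publish("hotel-search", c -> "Hotels available in " + c);
        // Requester role: find the service through the registry.
        Function<String, String> service = registry.find("hotel-search");
        // Requester role: bind to the service, that is, invoke it.
        return service.apply(city);
    }

    public static void main(String[] args) {
        System.out.println(demo("Shanghai"));
    }
}
```

In a real Web services deployment the registry, provider, and requester run on separate systems, and the description is a document rather than a string key.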
Single Executable. A technique for developing software so that the single executable produced from the compiled source handles the cultural conventions of all supported languages. The result is a globalized program, whose code is devoid of all cultural information; that information is supplied as locale-specific data.
SOA. See Service Oriented Architecture.
SOAP. See Simple Object Access Protocol.
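The Single Executable idea defined above can be shown in miniature: the code below contains no branch for any particular country, and the cultural convention (here, digit grouping and decimal separators) is selected entirely by the locale passed in as data. It is a sketch that uses the JDK's java.text.NumberFormat; the class name SingleExecutableSketch is an assumption.

```java
import java.text.NumberFormat;
import java.util.Locale;

public class SingleExecutableSketch {
    // One code path for every market: the locale arrives as data, not as a code branch.
    public static String formatAmount(double amount, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(amount);
    }

    public static void main(String[] args) {
        Locale[] supported = {Locale.US, Locale.GERMANY, Locale.FRANCE, Locale.JAPAN};
        for (Locale locale : supported) {
            System.out.println(locale + ": " + formatAmount(1234567.5, locale));
        }
    }
}
```

Supporting a further language then means adding locale data, not recompiling or forking the program.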
Traditional Chinese. Characters used mainly in Taiwan and historic Chinese documents. See also Simplified Chinese.
UDDI. See Universal Description, Discovery, and Integration.
UDM. See User Dictionary Manager.
Simple API for XML (SAX). An application program interface (API) that allows a programmer to interpret a Web file that uses the Extensible Markup Language (XML), that is, a Web file that describes a collection of data. SAX is an alternative to using the Document Object Model (DOM) to interpret the XML file. As its name suggests, it is a simpler interface than DOM and is appropriate where many or very large files are to be processed, but it contains fewer capabilities for manipulating the data content.[3]
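To make the contrast with DOM in the SAX entry concrete, the sketch below processes a document as a stream of events and never builds a tree in memory, which is why SAX suits many or very large files. It uses the JDK's standard javax.xml.parsers and org.xml.sax packages; the class name SaxSketch and the sample element names are placeholders.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxSketch {
    // Counts how often each element name occurs, using only start-element events.
    public static Map<String, Integer> countElements(String xml) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                counts.merge(qName, 1, Integer::sum);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not parse XML", ex);
        }
        return counts;
    }

    public static void main(String[] args) {
        String xml = "<pack><message id=\"a\"/><message id=\"b\"/></pack>";
        System.out.println(countElements(xml));
    }
}
```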
Unicode. An international character code for information processing. Its aim is to encode all characters used for written communication in a simple and consistent manner. The Unicode character encoding was originally established as a fixed-width encoding of 16 bits, to provide enough code points for all the scripts and technical symbols in common usage around the world, plus some ancient scripts; characters beyond the 16-bit range are reached through surrogates (see UTF-16). Accented characters can be composed by concatenating two or more Unicode characters.
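The last sentence of the Unicode entry, that accented characters can be composed from two or more Unicode characters, has a practical consequence: the same visible text can have more than one encoded form, so text is usually normalized before comparison. The sketch below uses the JDK's java.text.Normalizer; the class name CompositionSketch is an assumption.

```java
import java.text.Normalizer;

public class CompositionSketch {
    // Converts a base letter followed by combining marks into its precomposed form (NFC).
    public static String compose(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String decomposed = "e\u0301";   // letter e followed by a combining acute accent
        String precomposed = "\u00e9";   // single code point for e with acute accent

        System.out.println("equal before normalizing: " + decomposed.equals(precomposed));
        System.out.println("equal after normalizing:  " + compose(decomposed).equals(precomposed));
        System.out.println("lengths: " + decomposed.length() + " and " + compose(decomposed).length());
    }
}
```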
Simple Object Access Protocol (SOAP). A communications technology and message enveloping mechanism based on XML (Extensible Markup Language). SOAP was developed jointly by IBM, Microsoft, and several other companies and has been submitted to W3C for standardization. A major application of SOAP is to allow businesses to link their computing systems over the Internet in a platform-independent manner.
Unicode Standard. A universal character encoding standard that supports the interchange, processing, and display of text that is written in any of the languages of the modern world. It can also support many classical and historical texts and is continually being expanded. The Unicode Standard is compatible with ISO/IEC 10646.[4]
Simplified Chinese. Characters defined and used in the People's Republic of China and Singapore. Simplified Chinese characters are derived from traditional Chinese characters in one of two ways: (1) by removing or simplifying strokes of the traditional Chinese character, or (2) by replacing the traditional Chinese character with another, simpler traditional Chinese character. See also Traditional Chinese.
Single Byte Character Set (SBCS). A set of characters in which each character is represented by 1 byte.
Universal Description, Discovery, and Integration (UDDI). The focus of Universal Description, Discovery, and Integration (UDDI) is the definition of a set of services supporting the description and discovery of (1) businesses, organizations, and other Web services providers, (2) the Web services they make available, and (3) the technical interfaces which may be used to access those services. Based on a common set of industry standards, including HTTP, XML, XML Schema, and SOAP, UDDI provides an interoperable, foundational infrastructure for a Web services-based software environment for both publicly available services and services only exposed internally within an organization.[5]
Uniform Resource Locator (URL). (1) A sequence of characters that represents information resources on a computer or in a network such as the Internet. This sequence of characters includes (a) the abbreviated name of the protocol used to access the information resource and (b) the information used by the protocol to locate the information resource. For example, in the context of the Internet, these are abbreviated names of some protocols used to access various information resources: http, ftp, gopher, telnet, and news; and this is the URL for the IBM home page: http://www.ibm.com. (2) The address of an item on the World Wide Web. It includes the protocol followed by the fully qualified domain name (sometimes called the host name) and the request. The Web server typically maps the request portion of the URL to a path and file name. For example, if the URL is http://www.ibm.com/e-business/info/, the protocol is http; the fully qualified domain name is www.ibm.com; and the request is /e-business/info/.
URL. See Uniform Resource Locator.
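The decomposition given in the URL entry (protocol, fully qualified domain name, and request) corresponds directly to the scheme, host, and path accessors of the JDK's java.net.URI class. The sketch below applies them to the example URL from the entry; the class name UrlPartsSketch is an assumption.

```java
import java.net.URI;

public class UrlPartsSketch {
    // Returns { protocol, fully qualified domain name, request } for a URL string.
    public static String[] parts(String url) {
        URI uri = URI.create(url);
        return new String[] {uri.getScheme(), uri.getHost(), uri.getPath()};
    }

    public static void main(String[] args) {
        String[] p = parts("http://www.ibm.com/e-business/info/");
        System.out.println("protocol: " + p[0]);
        System.out.println("domain:   " + p[1]);
        System.out.println("request:  " + p[2]);
    }
}
```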
user. (1) Any person or any thing that may issue or receive commands and messages to or from the information processing system. (2) Anyone who requires the services of a computing system.
User Dictionary Manager (UDM). In the IBM WebSphere Translation Server, a dictionary creation utility that allows you to create domain-specific dictionaries for use with the Translation Server. The existing dictionaries of the WebSphere Translation Server may not address words or senses of words specific to your application. The User Dictionary Manager helps you compose supplementary dictionaries that contain words and phrases specific to your application.
UTF-8. UCS Transformation Format, 8-bit. An X/Open standardized encoding which includes all of the characters represented in ISO/IEC 10646, such that no null bytes (which many C and UNIX interfaces treat as the end of a string) are embedded in the data stream. As originally defined for ISO/IEC 10646, the encoding uses one to six bytes to represent a character; code points up to 0x0010FFFF, the limit noted under UTF-32, need at most four bytes. The encoding can be used to support a Unicode charmap for an XPG4 locale. See also ISO/IEC 10646 and Unicode.
UTF-16. UCS transformation format for 16 planes of group 00, 16-bit form. UTF-16 is the ISO/IEC encoding that is equivalent to the Unicode standard with the use of surrogates. See also ISO/IEC 10646 and Unicode.
UTF-32. UCS transformation format, 32-bit form. It is restricted in values to the range 0x00000000 to 0x0010FFFF. See also ISO/IEC 10646 and Unicode.
W3C. See World Wide Web Consortium.
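The UTF-8, UTF-16, and UTF-32 entries above differ mainly in how many bytes each needs for a given character. The sketch below measures this for four sample characters, including one outside the 16-bit range that UTF-16 represents with a surrogate pair. The class name UtfSketch and the choice of samples are assumptions; the UTF-32 size is computed as four bytes per code point rather than by calling a UTF-32 encoder.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UtfSketch {
    // Number of bytes the text occupies in the given encoding.
    public static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    // UTF-32 is fixed width: four bytes for every code point.
    public static int utf32Length(String text) {
        return 4 * text.codePointCount(0, text.length());
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",                                     // Basic Latin letter
            "\u00e9",                                // Latin small letter e with acute
            "\u4e2d",                                // a Chinese ideograph
            new String(Character.toChars(0x1D11E))   // musical G clef, beyond the 16-bit range
        };
        for (String s : samples) {
            System.out.printf("U+%04X  UTF-8: %d  UTF-16: %d  UTF-32: %d%n",
                    s.codePointAt(0),
                    byteLength(s, StandardCharsets.UTF_8),
                    byteLength(s, StandardCharsets.UTF_16BE),
                    utf32Length(s));
        }
    }
}
```

UTF_16BE is used instead of UTF_16 so that the byte order mark the latter prepends does not inflate the counts.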
Web. See World Wide Web.
Web browser. A client program that initiates requests to a Web server and displays the information that the server returns.
Web page. Any document that can be accessed by a uniform resource locator (URL) on the World Wide Web.
Web Service. An interface whose service description specifies a set of operations using a standard form of XML notation to provide the low level details required to invoke the service over the Web.
Web Services Description Language (WSDL). An XML format for describing network services as a set of endpoints operating on messages containing either document-oriented or procedure-oriented information. The operations and messages are described abstractly, and then bound to a concrete network protocol and message format to define an endpoint. Related concrete endpoints are combined into abstract endpoints (services). WSDL is extensible to allow description of endpoints and their messages regardless of what message formats or network protocols are used to communicate; however, the only bindings described in this document describe how to use WSDL in conjunction with SOAP 1.1, HTTP GET/POST, and MIME.[6]
Web site. A Web server that is managed by a single entity (an organization or an individual) and contains information in hypertext for its users, often including hypertext links to other Web sites. Each Web site has a home page. In a uniform resource locator (URL), the Web site is indicated by the fully qualified domain name. For example, in the URL http://www.ibm.com/e-business/info/, the Web site is indicated by www.ibm.com, which is the fully qualified domain name.
World Wide Web (WWW). A network of servers that contain programs and files. Many of the files contain hypertext links to other documents available through the network.
World Wide Web Consortium (W3C). The World Wide Web Consortium (W3C) describes itself as follows: “The World Wide Web Consortium exists to realize the full potential of the Web. The W3C is an industry consortium which seeks to promote standards for the evolution of the Web and interoperability between WWW products by producing specifications and reference software. Although W3C is funded by industrial members, it is vendor-neutral, and its products are freely available to all. The Consortium is international; jointly hosted by the MIT Laboratory for Computer Science in the United States and in Europe by INRIA who provide both local support and performing core development. The W3C was initially established in collaboration with CERN, where the Web originated, and with support from DARPA and the European Commission.”
WSDL. See Web Services Description Language.
XML. See eXtensible Markup Language.
XSL. See eXtensible Stylesheet Language.
1. Webster's Ninth
2. Comrie
3. See http://searchwebservices.techtarget.com/sDefinition/0,,sid26_gci213728,00.html
4. For more information, see http://www.unicode.org/.
5. UDDI Version 3.0 Published Specification
6. Web Services Description Language (WSDL) 1.1
Related publications
The publications listed in this section are considered particularly suitable for a more detailed discussion of the topics covered in this redbook.
IBM Redbooks
For information on ordering these publications, see “How to get IBM Redbooks”.
WebSphere Advanced V4.0 Handbook, SG24-6176
Web Services Wizardry with WebSphere Studio Application Developer, SG24-6292
IBM WebSphere Voice Server 2.0 Implementation Guide, SG24-6537
Other resources
Further information sources:
XML SPY Version 3.5 documentation, found by clicking Help on the menu bar
InfoCenter of the WebSphere Translation Server V1.0, available in the product software under the WTS directory
Java Servlet Specifications Version 2.3, found at http://jcp.org/jsr/detail/053.jsp
The Internationalization Service in IBM WebSphere, by Debasish Banerjee, Jeffrey Frey, and Robert High, presented at the 20th International Unicode Conference, Washington, DC (Jan 2002), found at http://www.unicode.org/iuc/iuc20/a318.html
Towards the Internationalization of Web Services in IBM WebSphere, by Debasish Banerjee and Casey A. Swenson, found at http://www.w3.org/2002/02/01-i18n-workshop/Banerjee.html
The Unicode Standard 3.0, found at http://www.unicode.org/unicode/uni2book/u2.html
UDDI Version 3.0 Published Specification, found at http://www.oasis-open.org/committees/uddi-spec/tcspecs.shtml#uddiv2
Web Services Description Language (WSDL) 1.1, found at http://www.w3.org/TR/wsdl
Referenced Web sites
These Web sites are also relevant as further information sources:
IBM Globalization Strategy http://eou2.austin.ibm.com/global/global_int.nsf/Publish/982
The Java Tutorial—Using JFC/Swing Text Components http://java.sun.com/docs/books/tutorial/uiswing/components/text.html
Java 2 Platform, Standard Edition, v 1.3.1 API Specification http://java.sun.com/j2se/1.3/docs/api/java/util/Locale.html
IBM TranslationManager http://tcdct1.bbhulb91.de.ibm.com/ibmtrans/tm2.htm
Hypertext Transfer Protocol -- HTTP/1.1 (RFC 2616) http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc2616.html
TRADOS Language Technology http://www.trados.com
Unicode Home Page http://www.unicode.org
How to get IBM Redbooks You can order hardcopy Redbooks, as well as view, download, or search for Redbooks at the following Web site: ibm.com/redbooks
You can also download additional materials (code samples or diskette/CD-ROM images) from that site.
IBM Redbooks collections Redbooks are also available on CD-ROMs. Click the CD-ROMs button on the Redbooks Web site for information about all the CD-ROMs offered, as well as updates and formats.
Index Numerics 8859-1 character encoding 30, 43, 149
A abbreviations 27, 46 Access 31 address format 127 Africa 27 AIX 81, 85 all-in-one character repository 30 American 10, 99, 103, 126 English 97 AMI See Application Managed Internationalization ampersand 27 antonyms 45 Apache 80 applets 79 application program structure 92 programming code 136 run time environment 120 scenario 119 Application Managed Internationalization 99 application-dependent localization packs 29 application-independent localization packs 29, 57 ar_EG 25, 97, 128, 131 Arabic 3, 7, 13, 26–27, 40–41, 64–66, 69, 93, 117, 125, 131, 159, 171 character shapes 39 architectural design phase 88 artwork design 131–132 artwork-related problems 131 asterisk 27 automatic categorization 48 clustering 48 language detection 47 identification 47 summarization 47
B B2B 75 B2C 75 base 10 24 Belgium 23, 27 BiDi See bi-directional bi-directional display 65, 125, 128, 171 input and output 3 rendering 41
formal review 132 Format Bean 94, 114 formatBean object 94 fortunate colors See lucky colors fr_FR 94, 97, 115 France 117, 137, 166 French 3, 13, 23, 27, 40, 64, 67–68, 97, 103, 107–108, 117, 137, 169 function testing 120, 130
G GBO See Global Business Object GBOMeasure 115 GBONameAddressStyle 113–114 GCL ix, xi–xii, 13, 60 See also Globalization Certification Laboratory German 8, 13, 26–27, 49, 52–53, 64–65, 70–71, 107–112, 117, 129–131 Germany 137, 166 global orientation 40 Global Business Object 51, 54, 94, 112–117, 136 globalization ix–xii, 1, 3, 5–7, 9–10, 13–14, 19, 23, 30, 37, 57–58, 64, 81, 85, 87–88, 96, 103, 107, 119–120, 125, 137, 157, 165–166, 170 features 119 testing 125 Globalization Certification Laboratory ix–xi Globalization Comprehensive Interoperability Test Services ix Globalization Consultation Services ix Globalization Enablement Services ix glyphs 4, 38, 41–42, 131, 174 description 39 grammar checking 44 grammatical mistakes 43 Greece 27 Greek 27 Gregorian calendar 26, 99 GSK certificate 142 GUI elements 41 installer interface 145
H handwriting recognition 14, 42 hardware and software requirements 119 hash mark 27 Hebrew 7, 13, 25, 39–41, 64–66, 117, 125, 159, 171 Hijri calendar 26 Hindi 25, 131 holiday information 137 Hong Kong 4 honorific 126 hot keys 38 hotel search and room reservation 70 HTML 40, 47, 79, 94, 114, 143, 165, 171
servlets 81 HTTP 96, 98, 142, 154, 174 See also IBM HTTP Server HTTPR 80 HTTPS 142 HTTPSession 99 hypernyms 45 hyphenation 44–45 hyponyms 45
I IBM 9–10, 13–14, 22, 49, 57–58, 99, 107, 129–130, 132, 144, 151, 154 300PL Personal Computer 80 Administration Server documentation 142 Corporate Globalization ix external users 60 globalization organization 30 HTTP Server 78–80, 142, 145–146, 148, 152 localization business 59 SSL (Secure Sockets Layer) 142 terminologist 60 translation process 60 projects 60 service centers 59 UDDI Business Registry See UDDI Business Registry Universal Database DB2 85 Web application offerings and services 79 WebSphere See WebSphere XML parser 92 IBM China 60 IBM Translation Community 60 IBM Translation Service Centers 106, 117 icons 4, 27, 58 ICU 14, 22, 24–25, 57, 96, 136, 175 See also International Components for Unicode Technology ICU4C 58, 175 ICU4J 58, 93–94, 96–97, 99–100, 103, 117, 136–137, 175 IDE See Integrated Development Environment ideograph 37 ideographic character 46 language 43–44, 175 IE See Internet Explorer ihs_128 subdirectory 142 IME 14, 37–38, 43, 175 See also Input Method Editor India 27 Indian 25 inflection 47 Input Method Editor 13, 175 installation wizard See InstallShield Index
InstallShield 150 Integrated Development Environment 30 international banking 24 International Components for Unicode 14 International Components for Unicode Technology 175 Internationalization Service 98 Internet ix, xiv, 4, 9, 30, 49, 73, 80, 97, 161–162 Internet Explorer 60, 96, 131, 158, 160 intranet 43, 49 IP ports 142–143, 145 ISO 4217 24 ISO 8859-1 character encoding See 8859-1 character encoding ISO currency tag 25 Israel 117, 167 it_IT 97 Italian 64, 107–108, 117 Italy 117, 166 iteration 96 iw_IL 97, 128
K Kg See kilogram kilogram 27 kilometers 27, 115 knowledge management 49 ko_KR 97 Korea 112–113, 117, 167 Korean 43–44, 64, 107–108, 117, 125–126, 159
L language
barriers 107 code 149 dictionary order 104 engine 107–109, 150–151 identification 47 information 99 interface 125 JARs See also JAR files list 98 markup 47 message files 142 package 149 pair 108 preference 14, 97, 108, 161–162 scripts 37–38 selection 104, 108 sets 18, 21, 37 setting 93, 97–99, 106, 159–160 support requirements 10 supported 119 translation team 136 translators 59 versions 11, 29, 87, 120, 136, 165, 167, 169, 172 Language Selection List 64, 99 language-dependent content 119, 136 data 92, 105, 136, 166 information 18 Web content 117 language-dependent data 166 language-independent algorithms 43 program code 18 technology 43 language-neutral algorithms 47 sequencing 104 language-sensitive collation 57 processing 43 language-specific algorithms 43 code 41 data 43 grammar 44 legacy character set APIs See non-Unicode character set APIs code 80 objects 80 programming 73 legal conventions 51 line-break boundaries 44 lingua franca 22 linguistic analysis 47 boundaries 7 conventions 128 features 43
message formats 104 service 37, 43, 46, 129–130 technologies 49 testing 120, 128, 130 tools 43 linguistically sensitive searches 43 linguistic-based processing 43 Linux 81, 141 See also Red Hat Linux ListResourceBundle 30 locale definition 23 directory 105 model 23–24, 28, 97 localization 117 settings 24 Locale Model Bean 93, 100, 103, 117, 136–137 locale-related codes 136 computing 136 features 96 information 57, 117, 136–137 restrictions 166 locale-sensitive changes 98 components 24 computing 93, 96–97, 117 content 119 data 66 features 99, 103 information 113, 117, 119, 125 logic 92 operations 19 locale-specific program data 29 locale-supporting code libraries 136 localization pack 11, 29, 32–34, 92, 104, 125 development 88 formats 30 manager 30, 32 resources 30 Localization Pack Manager Bean 92, 105–106 localization service providers 119 lucky colors 28 lunar calendar 7, 26, 99
M machine translation 6, 49, 107–110, 112, 129, 132 quality 129 machine-translated pages 130 Macromedia Flash 158, 163 Mainland China 4 message format 104 formatting 57, 93 messaging 19 metadata 47 metric 27 Metric system 115 Microsoft
COM objects 80 Internet Explorer See Internet Explorer Windows See Windows 2000 Middle East 27 miles 27, 115 monolingual applications 131 morphological analysis 47, 49 morphology 47 MT See machine translation multi-byte character sets 125 multilingual 5, 9, 11, 14–15, 17, 30, 32, 38, 43, 51, 64, 66, 75, 81, 83, 87–88, 95, 106, 119, 128, 130, 132, 148, 165, 176 multiple currency support 25
N name and address format 6, 51, 125, 136 sequences 94 national conventions 9 language support 28 National Language Version 176 native language 132 testers 119 natural language 43 question-answering applications 44 text 47 Ncurses4 141 Netfinity Server 80 Netherlands 27 Netscape 60, 96, 106, 131, 158 new language versions 136 non-decimal numbering systems 24 non-Unicode character set APIs 38 encoded language 83 encoding 38 normalization 57 North America 27 NOWRAP option 93 number eight 27 format ix, 24, 117 formatting 57, 93 representation 19, 21 sign symbol 27 thirteen 27 numbering system 24 numeric superstitions 27
O OCR See optical character recognition OEM character set APIs
See non-Unicode character set APIs on-demand translation 108 online survey 132 on-the-fly translation 108 optical character recognition 42 orthographic validity 43 orthographically correct word 44 Our Global Travel Shanghai Demo 63–72, 74–76, 78, 88, 92–93, 95–97, 99, 103, 105–109, 117–118, 120–121, 123–126, 128, 130–133, 136–137, 141, 148, 158, 160, 163, 166–167 Global Travel Shanghai Translation 118 outsourced localization services 119
P page design 88 Palm devices 42 Paris 103 parser API 33 parsing queries 44 part-of-speech disambiguation 46 pct 27 Pdksh 141 percent symbol 27 Personal Information Table 94 pervasive computing 6, 13, 49 Pin Yin 7, 37 Latin alphabet characters 38 portlet 33–34 Portuguese 64, 117 postal address order 127 pound sign 27 preferred colors 166 product life cycle 18 property file 136 PropertyResourceBundle 30 prototype development 88 proxygen 80 pseudo-code 33 pt_BR 97
Q Québec 27 question marks 27
R Red Hat Linux 80, 85, 141–144, 154 Redbooks Web site 182 Contact us xiv Redhat Package Management 142, 155 resource bundle 58 management 106 Roman numerals 24 root account 142, 144 rpm See Redhat Package Management RTF 59 ru_RU 106
rupee 25 Russia 27, 106
S SAX 33 See also Simple API for XML SBCS languages 107 scenario execution details 119 Schema 30, 32 segmentation 43, 46, 49 semantical mistakes 43 service components 74 description 73 interfaces 74–75, 92 provider 73–75, 124 registry 73–75 requestor 73–75 Service Oriented Architecture See SOA servicegen 80 service-invoking messages 74 servlet 33–34, 79, 81–82, 87, 89, 98, 107, 109, 130, 177 development 88 Shanghai 64, 66, 103, 121 Simple API for XML 177 Simple Object Access Protocol See SOAP Simplified Chinese 4, 7, 64, 72, 99, 107–108, 110, 112, 117, 149, 151, 159, 177 Singapore 106 Single Executable 10–11, 17–19, 29, 120, 136, 166, 177 SmartGuides 79 SOA 73–74, 177 SOAP 75, 80, 83, 177 sorting order 7, 28 source language 49 product 120 Web page 130 South Africa 27 South America 27 South Korea 117, 167 Spain 117, 167 Spanish 13, 27, 64, 107–108, 117, 131 speech input 37 recognition 13, 42, 49 synthesis 49 technology 49 spell checking 43–44 state list 127 string libraries multi byte-aware 18 single byte-only 18 summarization 47, 49 Swedish 8 synonyms 45, 47 synsets 45 syntactic disambiguation 46
synthesize mode 32, 34–35
T Taiwan 4, 117, 167 target language 49 taxonomy 48 TCP/IP See TCPIP communication mode TCPIP communication mode 144 technical testing team 132 telephone numbers 26 test case 119 design and documentation 119 environment 120 objectives 119 results 119 testing team 119–120 text annotation 47 boundary analysis 7 mining 47 search 49 TextPane 41 See Java TextPane component text-to-speech technology 6, 42, 49 th_TH 136 Thai 13, 39 Barr (currency) 136 Thailand 136 thesaurus 45 time format 117, 125 formatting 57, 93 representation 19 zone 25, 58, 93, 98, 103, 117 TM See TranslationManager TPRS See Translation Problem Reporting System trading flow 88 Traditional Chinese 4, 64, 107–108, 117, 159, 177 Trados 59 training the algorithms 48 translated Web pages 124 translation accuracy and contextual pertinence 124 coordinator 87, 89 defects 125 memory 59–60 quality 124 servlet 130 team 136–137 testing 120, 124–125 tools 59 verification test 118 Translation Problem Reporting System 59–60, 118 Translation Server 108 Translation Service Center 59, 87, 89, 106, 117–118 Translation Services Gateway 107
TranslationManager 59–60, 106, 118 TranslatorServlet 109 transliteration rules 29, 57 TSC See Translation Service Center TSG See Translation Services Gateway Turkey 27 tutorial 132
U UDDI 177 Business Registry 73–76, 92 centers 84 find example 85 publish example 84 registry 80 Registry Center 151, 154 Registry Preview 81, 152–153 Web site 75 UDDI4J 84 API 80 UDF 109–112, 129–130 UDM 149–151 See User Dictionary Manager UK 27 Unicode 14, 28, 30, 38, 57, 81–83, 85, 93, 95–96, 98, 148 unified font 41 United States 9, 26, 51, 112–113, 127, 159, 166 See also USA Universal Description, Discovery, and Integration See UDDI UNIX run time account 142–143 Urdu 40 URL-specified Web page 108 USA 26–27, 115 usability testing 120, 132 user database 95 interface 19, 41, 64, 75, 98, 105–106 User Dictionary File See UDF User Dictionary Manager 107, 109 user-friendly application 132 UTF-8 85, 96, 105, 148
V vendor-specific performance 131 Visual Composition Editor 79 VisualAge for Java 79 voice 43 reception program 49 servers 6 service 49 technology 6, 14
W Web applications 4–5, 30, 79, 135 content tuning 88 Services ix, xi, 72–75, 78–80, 83, 99, 181 Description Language See WSDL Flow Language 80 ToolKit 80 Web-centric programming model 73 WebSphere Advanced Administrative Console 148 Application Server ix, xii, 79–81, 83, 99, 109, 141–142, 144, 147–149, 152, 154 execution environment 79 Personalization Server 80, 154 Studio 79 Translation Server 80, 107–111, 129–130, 132–133, 149–150 UDDI Registry Preview 153 Voice Server 42 WebSphere Translation Server 112 Western Europe 159 whole-word indexing 7 searching 7 wildcard symbols 27 Windows 41, 152–153 See also Windows 2000 Windows 2000 38, 85 MUI 158 Server 78, 80–81, 131, 141, 149, 151 setup files 160 Unicode character IMEs 38 word processors 44 word-breaking 41 word-sense disambiguation 45 word-wrapping text 7 World Wide Web 43, 73, 178 worldwide testers and users 133 WRAP option 93 WSDL 73, 75, 179 for Java 80 WSDL4J See WSDL for Java WSFL See Web Services Flow Language WSTK See Web Services ToolKit
X XML 14, 30, 33–34, 47, 59, 73, 83, 96, 105, 116, 124, 179 header 35 instance tree 33 localization packs 32 parser 32–33, 35, 92 See also eXtensible Markup Language XML Spy 30–31
XPATH 33
XSD Schema 32
XSL 33–35, 179
  See also eXtensible Stylesheet Language
Y
Yahoo directory structure 48
Yiddish 40
Z
zh_CN 25, 32, 97, 106
zh_SG 106
zh_TW 97
e-business Globalization Solution Design Guide
Getting Started

Easily comprehend state-of-the-art globalization technologies
See how best practice design guidelines can work for you
Learn ways to achieve cost-effective globalization
The Internet transcends national boundaries and geographical barriers. Many e-business entities have sought help from IBM in extending their e-business worldwide. IBM’s own marketing messages have stressed the global aspect of e-business, and our customers therefore expect IBM to be able to provide the solutions.

Take a simple e-commerce application, for example. A company wants to set up a Web site to sell to customers all over the world. Studies have shown that users are much more likely to purchase from a Web site in their own language. With the worldwide growth of e-business, globalization is no longer just an added value but a requirement for global e-business applications.

In fact, globalization has become an architecture in the realm of e-business. The key to globalization architecture is the Single Executable: the design and execution of systems, software, services, and procedures so that one instance of software, executing on a single server or end-user machine, can process multilingual data and present culturally correct information (for example, collation, date, and number formats).

This IBM Redbook presents a globalization architecture, a working example, and an accompanying set of methodologies. It explains, from the customer’s point of view, how to plan and design a multilingual solution with the IBM-recommended globalization application architecture, how that architecture works throughout the application development cycle, and how the working example validates its soundness.
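As an illustration of the Single Executable idea described above, the following is a minimal Java sketch, not code taken from this Redbook. It assumes a standard JDK with its full locale data installed; the class name, method names, and sample values are hypothetical. One program formats the same number and date, and sorts the same words, according to whichever locale is passed in at run time.

```java
import java.text.Collator;
import java.text.NumberFormat;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: one executable, many locales chosen at run time.
class SingleExecutableSketch {

    // Format a number using the grouping and decimal conventions of the locale.
    static String formatNumber(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    // Format a date using the long date style of the locale.
    static String formatDate(LocalDate date, Locale locale) {
        return DateTimeFormatter.ofLocalizedDate(FormatStyle.LONG)
                .withLocale(locale)
                .format(date);
    }

    // Return a copy of the words sorted by the collation rules of the locale.
    static List<String> sortForLocale(List<String> words, Locale locale) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        // Sample data only; a real application would take the locale
        // from the user's request or profile.
        double amount = 1234567.891;
        LocalDate date = LocalDate.of(2002, 12, 1);
        List<String> words = List.of("b", "a", "C");

        Locale[] locales = { Locale.US, Locale.GERMANY, Locale.SIMPLIFIED_CHINESE };
        for (Locale locale : locales) {
            System.out.println(locale + ": "
                    + formatNumber(amount, locale) + " | "
                    + formatDate(date, locale) + " | "
                    + sortForLocale(words, locale));
        }
    }
}
```

In a Web application that follows this pattern, the locale would typically come from each incoming request rather than from a fixed array, so that a single running instance can serve users in several languages at once.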
INTERNATIONAL TECHNICAL SUPPORT ORGANIZATION
BUILDING TECHNICAL INFORMATION BASED ON PRACTICAL EXPERIENCE

IBM Redbooks are developed by the IBM International Technical Support Organization. Experts from IBM, Customers and Partners from around the world create timely technical information based on realistic scenarios. Specific recommendations are provided to help you implement IT solutions more effectively in your environment.
For more information: ibm.com/redbooks SG24-6851-00