SAP TechEd ‘03 Las Vegas
Unicode July 7th 2005
Dr. Christian Hansen Netweaver DT Internationalization, SAP AG
Agenda
1. What is Unicode?
2. Who needs Unicode?
3. How to go there?
SAP AG 2005, Dr. Christian Hansen
© 2003 SAP Labs, LLC
Session ID, Speaker
About Code Pages: Conventional Code Pages
Disadvantages of old standard code pages:
Each covers only a subset of all characters used
Incompatibilities between different code pages
Only restricted data exchange possible
Too many of them
[Figure: cloud of vendor names (Kyocera, Canon, Apple, HP, IBM, Microsoft) and code page names (ISO-1 to ISO-9, EBCDIC 697/0277 and 697/0500, 1250, 1251, 1252, 1254, 1256, 1257, ASCII, BIG-5, SJIS)]
SAP: Languages: 41, Characters: 22,378, Code Pages: 390
Solution: Unicode, one Code Page for all Scripts
Japanese, Chinese, Hebrew, Korean, Greek, Taiwanese, Icelandic, Russian, Ukrainian, English, Danish, Dutch, German, Finnish, French, Italian, Norwegian, Portuguese, Spanish, Swedish, Turkish, Thai, Croatian, Czech, Hungarian, Polish, Romanian, Slovakian, Slovene
And more languages can be supported easily without the need for new code pages or other new methods (e.g. Vietnamese!)
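Solution: Unicode characters
[Figure: layout of the Unicode code space. A first block of 65,000 characters holds ASCII, General Scripts, Symbols, CJK Ideographs, Hangul, Compatibility and the Surrogate Area; the Surrogate Area gives access to an additional 1,000,000 characters.]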
Representation of Unicode Characters
UTF-16 – Unicode Transformation Format, 16 bit encoding
Fixed length, 1 character = 2 bytes (surrogate pairs = 2 + 2 bytes)
Platform-dependent byte order (big/little endian)
2 byte alignment restriction
UTF-8 – Unicode Transformation Format, 8 bit encoding
Variable length, 1 character = 1...4 bytes
Platform independent
No alignment restriction
7 bit US ASCII compatible
Character   Unicode scalar value   UTF-16 big endian   UTF-16 little endian   UTF-8
a           U+0061                 00 61               61 00                  61
ä           U+00E4                 00 E4               E4 00                  C3 A4
α           U+03B1                 03 B1               B1 03                  CE B1
㑹          U+3479                 34 79               79 34                  E3 91 B9
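The byte sequences in the table can be reproduced in any Unicode-aware language. The following Python sketch is for illustration only; the helper names `hex_bytes` and `encodings` are made up for this example. It also encodes U+1D11E, a character that is not on the slide, to show what a surrogate pair looks like.

```python
def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated upper-case hex, e.g. 'C3 A4'."""
    return " ".join(f"{b:02X}" for b in data)


def encodings(ch: str) -> dict:
    """Return the scalar value and the three encodings of one character."""
    return {
        "scalar": f"U+{ord(ch):04X}",
        "utf-16-be": hex_bytes(ch.encode("utf-16-be")),
        "utf-16-le": hex_bytes(ch.encode("utf-16-le")),
        "utf-8": hex_bytes(ch.encode("utf-8")),
    }


# The four characters from the table, plus U+1D11E (added here as an
# assumption, for illustration), which needs a surrogate pair in UTF-16.
for ch in ["a", "\u00e4", "\u03b1", "\u3479", "\U0001D11E"]:
    print(encodings(ch))
```

For the last character the UTF-16 form is four bytes long, which is the "2 + 2 bytes" case mentioned for surrogate pairs.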
Internationalized Software with Unicode
The Unicode Standard was adopted by IBM and several other companies including Apple, HP, JustSystem, Microsoft®, Oracle, Sun™, Sybase, and Unisys.
Unicode is required by modern standards such as XML, Java™, ECMAScript (JavaScript™), LDAP, CORBA 3.0, and WML.
Unicode is also the official way to implement ISO/IEC 10646 and is supported in many operating systems and all modern browsers.
Check out http://www.unicode.org/ !
Scenario: System integration and non-Unicode
[Figure: the names "Jörg Müller" and "조희정" are exchanged via Web Dynpro between a non-Unicode ABAP system and a Unicode J2EE system]
ABAP (non-Unicode, KSC5601): 조희정, J#rg M#ller
J2EE (Unicode): 조희정, Jörg Müller
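The effect in this scenario can be imitated with a few lines of Python. This is a sketch under assumptions, not SAP code: the function name `through_codepage` is hypothetical, Python's `euc_kr` codec stands in for KSC5601, and `#` is used as the replacement character because that is what the slide shows.

```python
def through_codepage(text: str, codec: str, placeholder: str = "#") -> str:
    """Simulate passing text through a system limited to one code page.

    Characters that the code page cannot represent are replaced by the
    placeholder, mirroring the 'J#rg M#ller' result on the slide.
    """
    result = []
    for ch in text:
        try:
            ch.encode(codec)
            result.append(ch)
        except UnicodeEncodeError:
            result.append(placeholder)
    return "".join(result)


names = ["J\u00f6rg M\u00fcller", "\uc870\ud76c\uc815"]  # Jörg Müller, 조희정
for codec in ("euc_kr", "latin-1", "utf-8"):
    print(codec, [through_codepage(name, codec) for name in names])
```

Only the Unicode encoding keeps both names intact; each single code page damages one of them.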
Scenario: System integration and Unicode
[Figure: the names "Jörg Müller" and "조희정" are exchanged via Web Dynpro between a Unicode ABAP system and a Unicode J2EE system]
ABAP (Unicode): 조희정, Jörg Müller
J2EE (Unicode): 조희정, Jörg Müller
SAP NetWeaver™: the integration platform?
Evolution of mySAP Technology
[Figure: SAP NetWeaver™ stack diagram]
People Integration: Multi-Channel Access, Portal, Collaboration
Information Integration: Business Intelligence, Knowledge Management, Master Data Management
Process Integration: Integration Broker, Business Process Management
Application Platform: J2EE, ABAP, DB and OS Abstraction
Alongside the stack: Composite Application Framework, Life Cycle Management
Below the stack: .NET, WebSphere, …
Unifies and aligns people, information and business processes
Integrates across technologies and organizational boundaries
A safe choice with full .NET and J2EE interoperability
The business foundation for SAP and partners
Powers business-ready solutions that reduce custom integration
Its Enterprise Services Architecture increases business process flexibility
SAP NetWeaver™ with non-Unicode ABAP stack
Evolution of mySAP Technology
[Figure: the SAP NetWeaver™ stack diagram with "non-Unicode ABAP" in the Application Platform. Over four animation steps, parts of the diagram are marked "no" one after another, three marks in total.]
Only solution for full integration: Unicode
Evolution of mySAP Technology
[Figure: the SAP NetWeaver™ stack diagram with "Unicode ABAP" in the Application Platform. The three marks now read "yes".]
Agenda
2. Who needs Unicode?
Everybody making full use of SAP NetWeaver
Old solution for multiple languages: MDMP*
[Figure: West European View, Japanese View, Korean View]
* Check your system type with report RSCPINST (current configuration)
Old solution for multiple languages: MDMP
[Figure: West European View, Japanese View, Korean View, overlaid with the stamp "End of support with NetWeaver '04 (see notes 79991 and 838402)"]
(As of release NetWeaver 04s and moving forward, MDMP will no longer be supported)
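Why separate views are fragile can be shown with a small Python sketch. It is an illustration only: the two bytes are an arbitrary example, and Python's `latin-1`, `shift_jis` and `euc_kr` codecs are assumed stand-ins for the West European, Japanese and Korean code pages of such a system.

```python
# Two bytes stored without any information about their code page.
raw = b"\xb0\xa1"

# The same bytes, read through three different "views".
views = {
    "West European view (latin-1)": raw.decode("latin-1"),
    "Japanese view (shift_jis)": raw.decode("shift_jis", errors="replace"),
    "Korean view (euc_kr)": raw.decode("euc_kr", errors="replace"),
}

for name, text in views.items():
    print(f"{name}: {text!r}")
```

Only one of the views shows the text that was originally entered; the others show different characters for the same stored bytes. In Unicode each character has exactly one code point, so this ambiguity does not arise.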
Only solution for multiple languages: Unicode
Unicode
Agenda
2. Who needs Unicode?
Everybody making full use of SAP NetWeaver
All Korean customers using more languages than only English and Korean
Oracle Database setting KO16KSC5601
The default database character set for SAP systems running on Oracle is WE8DEC. Note 695899 described a way to change this to the Korean character set KO16KSC5601. Several Korean customers used this to make their database transparent for access by non-SAP products (e.g. DB-Link). Due to an incompatible change in the Oracle database, this will not be supported in the future (Oracle 10, ERP2005). See SAP note 858869 (still in progress).
The only way to make the database transparent is a Unicode setup.
Agenda
2. Who needs Unicode?
Everybody making full use of SAP NetWeaver
All Korean customers using more languages than only English and Korean
All Korean customers that have been using KO16KSC5601 as Oracle character set
…
Unicode Statistics: Current Figures
[Figure: Unicode shares since 6.10 (%), plotted over dates from August 2004 to July 2005; y-axis "Shares (%)" from 2.5 to 8.0]
Worldwide, more than 1700 Unicode systems are already running
How to go there: Unicode availability
Unicode enabled mySAP Components
SAP Web Application Server: WAS 6.20
mySAP Customer Relationship Management (CRM): mySAP CRM 4.0
mySAP Supply Chain Management (SCM): mySAP SCM 4.X
mySAP Supplier Relationship Management (SRM): mySAP SRM 4.0
mySAP Business Intelligence (BW): mySAP BW 3.5
mySAP Product Lifecycle Management (PLM): Ramp up
mySAP Strategic Enterprise Management (SEM): SEM 4.0
SAP R/3 Enterprise: Ext. Set 2.00
SAP Note 79991
How to go there: platform support
SAP supports Unicode systems on the following platforms:
Database system   W2K   Linux³   Solaris¹   HP¹   Tru64¹   AIX¹   AS/400   OS/390
SQL Server        ✓     -        -          -     -        -      -        -
Oracle            ✓     ✓        ✓          ✓     ✓        ✓      -        -
DB/2              ✓     ✓        ✓          ✓     -        ✓      ✓        -²
SAP DB            ✓     ✓        ✓          ✓     ✓        ✓      -        -
¹ 64 bit versions only
² OS/390 support is planned for Q3/2004 with DB/2 V8.1.
³ Tentatively, 64 bit version will be available in Q2/2004
There will be no support for Informix.
SAP Note 379940
How to go there: Unicode System installation
Unicode is the default for new installations!
How to go there: Unicode System conversion
Converting existing systems to Unicode needs several steps:
Upgrade to a Unicode compliant version of the application (see note 79991)
Adapt ABAP, C/C++ programs
Convert the database (System Copy)
Install Unicode executables
Check interfaces (3rd party software might not be ready for Unicode)
Conversion projects need thorough planning and execution!
What to consider I: Hardware Requirements
Based on parallel benchmarking of Unicode / non-Unicode test systems
CPU: +30%, depending on existing scenario (MDMP, double byte)
RAM: +50%, Application Servers are based on UTF-16 internally
Database size: UTF-8*: up to +35%; UTF-8**: up to +10%; UTF-16: up to +60..70%
Network Load: UTF-8: almost no change due to efficient compression
* +35% is the observed maximum in growth for small systems (db size < 200GB)
** +10% is the observed maximum for bigger systems (db size > 200GB)
First customer conversions indicate: DB size increase due to Unicode conversion is outweighed by size decrease due to DB reorganization, so the DB actually shrinks!
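As a rough planning aid, the percentages can be turned into a small calculator. The Python sketch below is hypothetical and not an SAP sizing tool. It assumes the reading CPU +30%, RAM +50%, database +35% (UTF-8, below 200 GB), +10% (UTF-8, larger systems) and +70% (UTF-16, upper end of the range), and it treats every figure as an upper bound.

```python
def estimate_unicode_sizing(cpu_units: float, ram_gb: float, db_gb: float,
                            db_encoding: str = "utf-8") -> dict:
    """Upper-bound resource estimate after a Unicode conversion.

    The percentages are the observed maxima quoted on the slide; the
    function name and parameters are invented for this illustration.
    """
    if db_encoding == "utf-8":
        # Assumption: a database of exactly 200 GB counts as a "bigger" system.
        db_growth_pct = 35 if db_gb < 200 else 10
    elif db_encoding == "utf-16":
        db_growth_pct = 70  # upper end of the +60..70% range
    else:
        raise ValueError(f"unknown database encoding: {db_encoding}")

    return {
        "cpu_units": cpu_units * (100 + 30) / 100,
        "ram_gb": ram_gb * (100 + 50) / 100,
        "db_gb": db_gb * (100 + db_growth_pct) / 100,
    }


print(estimate_unicode_sizing(cpu_units=4, ram_gb=16, db_gb=100))
print(estimate_unicode_sizing(cpu_units=4, ram_gb=16, db_gb=500, db_encoding="utf-16"))
```

Because reorganization during the conversion can shrink the database, real results may be well below these bounds.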
What to consider II: Outside communication
Outside communication: Sapnet quicklink Unicode@sap: Unicode@sap --> Unicode library --> ABAP and Unicode --> TechEd 2004: CI253 External Unicode Interfaces
Communication: The Ideal Picture
The ideal picture: only Unicode components
Conversions are done algorithmically (1:1 relation)
No data misinterpretation
No data loss
All business relevant characters available at the same time
[Figure: Unicode-only landscape with JAVA Application (Portal), RFC Client (SAP_UC), R/3 Enterprise, mySAP BW, 3rd Party, Files, Internet, ...]
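The 1:1 relation between Unicode formats can be checked directly. The following Python sketch is illustrative, with hypothetical helper names: it converts a text from UTF-16 to UTF-8 and back and compares the result with the original bytes.

```python
def utf16_to_utf8(data: bytes) -> bytes:
    """Re-encode UTF-16 (little endian) bytes as UTF-8."""
    return data.decode("utf-16-le").encode("utf-8")


def utf8_to_utf16(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as UTF-16 (little endian)."""
    return data.decode("utf-8").encode("utf-16-le")


text = "J\u00f6rg M\u00fcller \uc870\ud76c\uc815"  # Jörg Müller 조희정
utf16 = text.encode("utf-16-le")
utf8 = utf16_to_utf8(utf16)
round_trip = utf8_to_utf16(utf8)

# Both formats encode the same code points, so nothing is lost.
print(round_trip == utf16)
```

No conversion table and no knowledge of the language is needed, which is why the conversion involves neither misinterpretation nor data loss.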
Communication: Reality
The reality: Unicode and non-Unicode components
Conversions between incompatible code pages everywhere
Only common subset exchangeable
Special rules have to be obeyed to make communication possible
[Figure: mixed landscape with JAVA Application (Portal); R/3 4.6C (ISO8859-1, SJIS); RFC Client (char, SAP_CODEPAGE = 1100); mySAP BW (ISO8859-1); 3rd Party (EBCDIC); R/3 Enterprise; Files; Internet pages declaring charset=iso-8859-1, charset=windows-1257, charset=Shift_JIS and charset=utf-8; surrounded by code page names such as 1251, 1252, ISO-1, ISO-2, ISO-3, ISO-7, ISO-8, ISO-9, SJIS, BIG-5, 697/0500 and 697/0277]
What to consider III: ABAP Programming
ABAP Programming: Sapnet quicklink Unicode@sap: Unicode@sap --> Unicode library --> ABAP and Unicode --> TechEd 2004: CI252 Making ABAP Programs Unicode enabled
Transparent Unicode Enabling of R/3
Character Expansion Model
Separate Unicode and non-Unicode versions of R/3
No explicit Unicode data type in ABAP
Single ABAP source for Unicode and non-Unicode systems
Non-Unicode R/3: 1 character = 1 byte (types C, N, D, T, STRING); Non-Unicode kernel; Non-Unicode database
Unicode R/3: 1 character = 2 bytes (UTF-16) (types C, N, D, T, STRING); Unicode kernel; Unicode database
Transparent Unicode Enabling of R/3
Implications:
Major part of ABAP coding is ready for Unicode without any changes
Minor part of ABAP coding has to be adapted to comply with Unicode restrictions
Challenge: clear distinction between character and byte processing: 1 Character ≠ 1 Byte
Find the relevant places with transaction UCCHECK (call the transaction today if you are already on SAP_BASIS ≥ 6.10)
Unicode Restrictions – Example
Access To Structures With Offset/Length
Structure must begin with characters
Offset/length counted in characters
Access only allowed within the character type prefix of a structure
Structure stru: N(6), C(4), X(3), C(5)
Access with +off(len):
… = stru+13(5).   "Unicode error!
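The rule can be modelled in a few lines. The Python sketch below is a simplified illustration of the three conditions stated on the slide, not the behaviour of the ABAP compiler or runtime; the function names and the tuple representation of a structure are assumptions made for this example.

```python
# Character-like ABAP types named on the slides; X (byte type) is not one of them.
CHAR_TYPES = {"C", "N", "D", "T"}


def char_prefix_length(fields) -> int:
    """Length in characters of the leading run of character-type fields."""
    total = 0
    for type_code, length in fields:
        if type_code not in CHAR_TYPES:
            break
        total += length
    return total


def check_offset_access(fields, offset: int, length: int) -> bool:
    """Return True if stru+offset(length) stays inside the character prefix."""
    prefix = char_prefix_length(fields)
    if prefix == 0:
        raise ValueError("structure does not begin with character fields")
    if offset + length > prefix:
        raise ValueError(
            f"+{offset}({length}) exceeds the character prefix of {prefix} characters"
        )
    return True


# The structure from the slide: N(6), C(4), X(3), C(5).
stru = [("N", 6), ("C", 4), ("X", 3), ("C", 5)]
print(char_prefix_length(stru))
print(check_offset_access(stru, 6, 4))
```

In this model the character prefix of the example structure is 10 characters long (N(6) plus C(4)), so an access like `stru+13(5)` is rejected while `stru+6(4)` is accepted.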
ABAP list layout in Unicode systems
ABAP lists: Difference between memory and display length
Example character: '한'
              Character units in the memory   Display columns
Non-Unicode   2                               2
Unicode       1                               2
1 Character ≠ 1 Display Column
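The same difference can be observed outside ABAP. The following Python sketch is an approximation for illustration: the function `display_columns` is hypothetical and uses the East Asian Width property from the standard `unicodedata` module, counting wide and fullwidth characters as two columns and everything else as one. Python's `euc_kr` codec is assumed as the non-Unicode Korean code page.

```python
import unicodedata


def display_columns(text: str) -> int:
    """Approximate display width: wide/fullwidth characters take 2 columns."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


ch = "\ud55c"  # the Hangul syllable from the slide

print(len(ch.encode("euc_kr")))  # units in memory, non-Unicode (bytes)
print(len(ch))                   # units in memory, Unicode (characters)
print(display_columns(ch))       # display columns in both cases
```

A list layout that reserves one output column per character unit in memory therefore comes out too narrow for such text on a Unicode system.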
Example: ABAP list layout in Unicode systems
[Figure: the same ABAP list shown on a non-Unicode system and on a Unicode system]
What to consider IV: Database conversion
Database Conversion: Sapnet quicklink Unicode@sap: Unicode@sap --> Unicode library --> Unicode Conversion Library --> Basic Information --> CI206 Conversion of SAP Systems to Unicode
Conversion Preparation: Concept
Before the database conversion to Unicode is executed, all text data must be assigned a correct code page.
Single Code Page Systems / Unambiguous Blended Code Page Systems (ca. 90% of all customer installations): easy
MDMP Systems / Ambiguous Blended Code Page Systems (ca. 10% of all customer installations): complex
Why?
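Why the code page assignment has to come first can be sketched as follows. This Python example is a simplified illustration, not the SAP conversion tooling: the mapping `CODEPAGE_BY_LANGUAGE`, the language keys and the function `to_unicode` are assumptions, and Python codecs stand in for the SAP code pages.

```python
# Hypothetical assignment of code pages to language keys.
CODEPAGE_BY_LANGUAGE = {
    "DE": "latin-1",
    "JA": "shift_jis",
    "KO": "euc_kr",
}


def to_unicode(raw: bytes, language_key) -> str:
    """Convert stored bytes to Unicode text using the assigned code page."""
    codec = CODEPAGE_BY_LANGUAGE.get(language_key)
    if codec is None:
        # Without an assignment, the bytes could mean several different texts.
        raise LookupError("no code page assigned; text cannot be converted safely")
    return raw.decode(codec)


rows = [
    (b"M\xfcller", "DE"),
    (b"\x82\xa0", "JA"),
    (b"\xb0\xa1", "KO"),
]
for raw, language_key in rows:
    print(language_key, to_unicode(raw, language_key))
```

In a single code page system every row uses the same code page, so the assignment is trivial. In an MDMP system the correct code page has to be determined for each piece of text, which is what makes the preparation complex.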
Documentation The following documents are required for the conversion of non-Unicode SAP systems to Unicode:
Unicode Conversion Guide; available for Web AS 6.20/6.30/6.40.
System Copy Guide; available for Web AS 6.20/6.30/6.40.
Single Code Page and MDMP!
SAP Note 548016; valid for Web AS 6.20/6.30/6.40.
References and Contacts www.service.sap.com/unicode@sap
Unicode Conversion Library
Summary
1. What is Unicode?
The state of the art technology for internationalized text processing
2. Who needs Unicode?
Everybody making full use of SAP NetWeaver
All Korean customers using more languages than only English and Korean
All Korean customers that have been using KO16KSC5601 as Oracle character set
3. How to go there?
New installation
System conversion
Further information
Find more information about Unicode at:
service.sap.com/Unicode@sap
service.sap.com/Unicode
Unicode Conversion
Unicode@sap --> Unicode library --> Unicode Conversion Library --> Basic Information --> CI206 Conversion of SAP Systems to Unicode
Unicode@sap --> Unicode library --> Unicode Conversion Library --> Unicode Conversion Kit 620 --> Unicode Conversion Guide Web AS 6.20/6.30 SP 50
ABAP Unicode programming
Unicode@sap --> Unicode library --> ABAP and Unicode --> TechEd 2004: CI252 Making ABAP Programs Unicode enabled
Unicode@sap --> Unicode library --> ABAP and Unicode --> TechEd 2004: CI253 External Unicode Interfaces
Details for further reading
Unicode@sap --> Unicode library --> ABAP and Unicode --> ABAP Programs in Unicode Systems: Requirements
Unicode@sap --> Unicode library --> ABAP and Unicode --> ABAP List Layout in Unicode Systems: Development Guide
Also recommended: SAP Unicode learning maps, available at quicklink rkt-unicode.