Sponsored by:
CDMA TroubleShooting
Ft. Lauderdale, FL March 4, 2008
Agenda
9:10 Voice Trouble Resolution: David Weixelman, Network Engineer, Sprint
9:55 Q&A
10:05 SMS Trouble Resolution: Daniel Salek, Staff Engineer, Qualcomm
10:40 Q&A
10:50 Break
11:05 Packet Data: Nars Haran, US Cellular; Bryan Cook, Senior Staff Engineer, Qualcomm
11:50 Q&A
Ft Lauderdale, March ’08
www.cdg.org
Sponsored by Verisign

Contributions
• Many thanks to the following for their contributions to the materials:
  – Bryan Cook, Qualcomm
  – Nars Haran, US Cellular
  – Jeff Kraus, US Cellular
  – Devora Pippenger, Syniverse
  – Daniel Salek, Qualcomm

CDMA Voice TroubleShooting
Ft. Lauderdale, FL March 4, 2008

Opening Remarks
• This presentation will attempt to take a holistic view of the trouble ticketing process with Document 87 as the centerpiece.
• I will speak about processes leading up to the main points of information in Document 87 as well as discuss information within the document and address processes after a ticket is resolved.
• This will hopefully provide you a template for improving your customer service and employee development in the world of roaming.

Outline / Agenda
1. Organizational Preparation
2. Ticket Methodology
3. Entrance criteria
4. Tools
5. Object lesson: Checklist
6. Investigation Results Report
7. Work Load And Root Cause Analysis
8. Wrap Up

#1 Organizational Preparation
• Know your roaming network configuration. Know your own network configuration.
• Billing system access and basic navigational skills to access the customer’s account. This helps in validating any discrepancies between the billing system and the HLR.
• Network access to STPs, HLRs, MSCs, SMSCs and any other mission-critical applications needed for roaming troubleshooting, such as the trouble ticket system.
• IS-41 fundamental call flow and standards knowledge for analysis of call traces.
• Pictures, pictures, pictures. If you can’t draw it, you don’t know it as well as you should.

#1 Organizational Preparation
Understanding your network configuration.
[Diagram labels: Verisign, Syniverse, Direct Links with Carrier A, Direct Links with Carrier B]

#1 Organizational Preparation
• Detail-oriented people are usually best suited for this type of job. Also, people who can empathize with the customer’s situation and go the extra mile for resolution are your key personnel.
• Employees who personalize the situation are well suited for troubleshooting.
• If you have cross-functional teams (Customer Care, Tier II, Tier III, etc.) handling roaming tickets, make sure all teams are in agreement on best practices for trouble ticket resolution.
• Define where each team’s roles and responsibilities start and stop. This is usually best done through Service Level Agreements between the groups.

#2 Ticket Methodology
• Every ticket logged is an opportunity to evaluate and learn something:
  – At the customer level
    • Carriers get a first-hand look at what the customer is telling them about their service
  – At the troubleshooting level (Customer Care, Tier 2 and Tier 3 levels)
    • Are they prompted to ask the correct clarifying questions?
    • Are they following the established processes?
    • Do they have the proper tools at each level to correct the issues?

#2 Ticket Methodology
  – At the implementation level
    • Did something get overlooked during the implementation process?
    • Does the implementation process need to be modified to accommodate new service enhancements?
  – At the roaming partner level
    • Are there particular areas having the same, repeated issues with the same roaming partner?
  – At the device level
    • Are there particular devices having certain issues?
    • Are they newly launched devices?

#2 Ticket Methodology
• Evaluating these types of questions can identify and drive inefficiencies out of all levels of the troubleshooting process and possibly other internal organizations, especially on the device front.
• With many carriers, roaming is an afterthought. If device testing is not thoroughly completed from both a network and a roaming perspective, carriers put the roaming ‘testing’ into the hands of their subscribers. This is not a good way to give subscribers a positive roaming experience, especially if they are roaming internationally halfway around the world.
• Of course a balance must be struck between time to launch and testing. This is easier said than done.

#2 Ticket Methodology
• What are the ticket metrics that are important for determining quality work throughout the troubleshooting process?
  – Is the process focused only on how fast tickets are closed?
    • Fast closure alone = quality customer service?
  – Or is post-mortem ticket analysis performed by the respective management teams to gauge their teams’ strengths and weaknesses?
    • Knowledgeable, skilled employees + consistent performance through defined best practices + ticket analysis = quality customer service

#2 Ticket Methodology
[Chart labels: Speed To Closure (Fast, Slow); Knowledge (Large Amount, Small Amount); Quality. Three data points are described:]
• Someone with poor knowledge, great closing speed, and poor quality
• Someone with great knowledge, great closing speed, and great quality
• Someone with great knowledge, slow closing speed, and good quality

#2 Ticket Methodology
• Standardize what technical information to capture. This means establishing entrance and exit criteria (Document 87).
• Identify and document common problems and solutions, and from that create a troubleshooting ‘check list’.
• Establish a systematic methodology for trouble resolution
  – Once personnel get comfortable with a methodology, they can ‘free lance’ to match their individual skill sets and talents, provided the resolution and quality metrics are met. The key point is that resolution and quality are to be monitored.

#3 Entrance Criteria
  – MDN, MIN/IRM
  – ESN/MEID/UIMID
  – Detected date and time
  – Roaming MSCID
  – Problem description
  – Location (City/State or City/Country)
  – Duration of stay and alternate contact information
  – Problem carrier’s contact information

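As an illustration only, the entrance criteria could be captured in a structured record so that incomplete tickets are caught when they are logged. The Python sketch below is an assumption made for this example: the `RoamingTicket` class, its field names, and the `missing_fields` helper are hypothetical and are not taken from Document 87 or any ticketing system.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class RoamingTicket:
    """Hypothetical record holding the entrance criteria for one roaming ticket."""
    mdn: str = ""
    min_or_irm: str = ""
    esn_meid_uimid: str = ""
    detected_datetime: str = ""
    roaming_mscid: str = ""
    problem_description: str = ""
    location: str = ""
    duration_and_alt_contact: str = ""
    problem_carrier_contact: str = ""


def missing_fields(ticket: RoamingTicket) -> List[str]:
    """Return the names of entrance criteria that are still blank."""
    return [f.name for f in fields(ticket) if not getattr(ticket, f.name).strip()]


# Example: a ticket logged with only two of the nine criteria filled in.
ticket = RoamingTicket(mdn="5555550100", problem_description="Can't originate")
print(missing_fields(ticket))
```

A check like this could run at the Customer Care level before a ticket is escalated, so Tier 2 does not have to go back to the customer for basic information.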
#4 System Tools
  – MSC access
  – HLR access to validate customer profiles
  – SMSC access
  – STP access
  – SS7 messaging analyzer (Access7)
  – RSP messaging analyzer
  – Troubleshooting ticketing system
  – Billing system

#5 Object Lesson: Checklist
[Call flow diagram. Legend: Registration = B, Call Termination = A. Elements shown: Serving MSC, Home MSC, Home HLR, Home STP, STP, Direct Connects, SS7 Providers, Incoming, "Hom Voice" (truncated).]
Registration messages:
  – Reg Not (B2)
  – Reg Not Return Result (VLR Info) (B2)
Call termination messages:
  – LOC REQ (A1)
  – Routing Request (A2)
  – Route Request Return Result (TLDN#) (A3)
  – LOC REQ RET (A4)
  – Standard Route Through the PSTN (A5)
  – No Answer & Page TMO / Redirect Request (A6)
  – XFER NUM (A7)
  – XFER NUM (A8)
  – Redirect Request Response (A9)
  – ROUTE (A10)

#5 Object Lesson: Checklist
Type of Issue: Call origination: Can’t originate calls because of ‘manual roaming’ / credit card prompt
Focus Areas:
  – Serving MSC
  – Home HLR
  – MBI/IRM block
  – Special events
Possible Cause:
• Handset has ‘locked’ on a network where no automatic roaming agreement is implemented
• MBI/IRM block to which the phone belongs is not loaded in the roaming partner’s serving MSC
• MBI/IRM block is not pointed to the correct HLR point code
• SS7 network capacity limits and/or issues causing registration failure (example: Super Bowl or other large public events)
• MSCID of the roaming partner’s serving MSC is not loaded in the home HLR

#5 Object Lesson: Checklist
Type of Issue: Call origination: Fast busy / call failed
Focus Areas:
  – Serving MSC
  – Home HLR
  – MBI/IRM block
  – Special events
Possible Cause:
• Poor signal strength / too weak to connect to cell site
• Network capacity / cell site in use is at capacity
• Handset / phone equipment transmitter failure or error
• PRL cycle has not yet acquired an available network

#5 Object Lesson: Checklist
Type of Issue: Call origination: Fast busy / call failed
Focus Areas:
  – Serving MSC
  – Home HLR
  – MBI/IRM block
  – Special events
Possible Cause:
• Similar to ‘fast busy’, a channel could not be found due to network capacity constraints
• If the mobile is not stationary and the network attempts to hand off to a cell site in which all channels are being used by other callers, the call will drop
• Signal strength is lower than the minimum needed to maintain the call
• Network outage and/or maintenance

#5 Object Lesson: Checklist
Type of Issue: Call termination: Receive recording ‘The number you have dialed is incorrect. Please check the number and dial again’
Focus Areas:
  – Customer education
  – Serving MSC
  – Home MSC
Possible Cause:
• The required number of digits is not keyed correctly by the caller
• Validate that the TLDN received from the roaming partner contains the correct digits. If the Digits Identifier is labeled as International, 011 should not be sent to the Home carrier by the Roaming Partner or by their RSP. The standard is that the TLDN should not have 011 in front of it on a Routing Request Response
• On a Lucent MSC, validate that ‘Apply Dialing Prefix For International TLDN’ is turned on in the switch

#5 Object Lesson: Checklist
Type of Issue: Call termination: Incoming calls go directly to voice mail without any ‘rings’ while the handset is roaming
Focus Areas:
  – Home HLR
  – Home MSC
  – Customer education
  – Coverage
Possible Cause:
• Home carrier’s HLR point codes are not loaded in the roaming partner’s network
• Applicable home MSC point code is not loaded in the roaming partner’s network
• HLR has not registered the handset on a roaming network (for various reasons) and the HLR has deleted the last location of the registration, i.e. the HLR does not know where to tell the home MSC to direct the call, and thus the traffic switch routes the call by default to the voice mail platform without paging any network
• Signal strength is lower than the minimum needed to ‘page’ the mobile on the roaming network, and the call is redirected to the home traffic switch to terminate on the voice mail platform

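One possible way to make such a checklist usable inside a ticketing tool is a simple lookup from a categorized issue to its focus areas. The Python sketch below is illustrative only: the `CHECKLIST` dictionary, its short issue keys, and the `focus_areas` function are hypothetical names chosen for this example, and only three of the checklist entries are transcribed.

```python
from typing import Dict, List

# Focus areas transcribed from the checklist slides; the short keys are
# hypothetical labels invented for this sketch.
CHECKLIST: Dict[str, List[str]] = {
    "manual roaming / credit card prompt": [
        "Serving MSC", "Home HLR", "MBI/IRM block", "Special events",
    ],
    "incorrect number recording": [
        "Customer education", "Serving MSC", "Home MSC",
    ],
    "straight to voice mail": [
        "Home HLR", "Home MSC", "Customer education", "Coverage",
    ],
}


def focus_areas(issue: str) -> List[str]:
    """Return the focus areas for a categorized issue, or [] if it is not categorized."""
    return CHECKLIST.get(issue.strip().lower(), [])


print(focus_areas("Straight to voice mail"))
print(focus_areas("something new"))
```

An empty result signals an uncategorized issue, which is a natural trigger for adding a new checklist entry after root cause analysis.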
#6 Investigation Results Report
• The Investigation Results Report (IRR) provides the ticket analyst(s) with information on:
  – What was done to resolve the issue
  – The date and time it was resolved
  – The root cause of the problem
  – Any action items that need to be taken
• If this information is consistently captured and analyzed, it can be highly useful for root cause analysis
• There is no reason to have a results report if it is not used further in the wider scope of root cause analysis. It essentially becomes reporting for the sake of reporting and increases inefficiency in the troubleshooting process.

#7 Work Load And Root Cause Analysis
• Trending should be performed on the entrance criteria data received:
  – Volume of roaming tickets
  – Handset types
  – Categorized issues
  – Root cause and other information
• Each categorized problem should have a troubleshooting check list associated with it

#7 Work Load And Root Cause Analysis
Case Report
MDN | Device | City | ST/Country | Problem
17406207-080209 | RIM Blackberry 8830 | Buenos Aires | Argentina | Searching for service
17415876-080212 | RIM Blackberry 8830 | Adelaide | Australia | Can't originate/terminate
17335469-080117 | Handspring Treo 700W | Georgetown | Cayman Islands | Can't originate/terminate
17354258-080124 | Samsung SPH-A900M | Bogota | Colombia | Can't terminate
17362912-080127 | Handspring Treo 650 | Guayaquil | Ecuador | Can't originate
17364974-080128 | Handspring Treo 700W | Bangalore | India | Can't originate
17415744-080212 | RIM Blackberry 8830 | Bengal | Jamaica | Can't originate/terminate
17314585-080110 | Handspring Treo 650 | Acapulco | Mexico | Can't originate/terminate due to auth issue
17386313-080202 | Motorola Q | Bangkok | Thailand | Can't originate/terminate

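Trending on a case report like this one can be as simple as counting tickets per device and per problem category. The Python sketch below is a minimal illustration using the nine rows of the case report; the `CASES` list and the variable names are assumptions made for this example, not part of any ticketing tool.

```python
from collections import Counter

# (device, problem) pairs transcribed from the case report rows.
CASES = [
    ("RIM Blackberry 8830", "Searching for service"),
    ("RIM Blackberry 8830", "Can't originate/terminate"),
    ("Handspring Treo 700W", "Can't originate/terminate"),
    ("Samsung SPH-A900M", "Can't terminate"),
    ("Handspring Treo 650", "Can't originate"),
    ("Handspring Treo 700W", "Can't originate"),
    ("RIM Blackberry 8830", "Can't originate/terminate"),
    ("Handspring Treo 650", "Can't originate/terminate due to auth issue"),
    ("Motorola Q", "Can't originate/terminate"),
]

# Count tickets per device and per problem category.
by_device = Counter(device for device, _ in CASES)
by_problem = Counter(problem for _, problem in CASES)

print(by_device.most_common(1))   # [('RIM Blackberry 8830', 3)]
print(by_problem.most_common(1))  # [("Can't originate/terminate", 4)]
```

With only nine tickets the counts are small, but the same grouping applied month over month is what reveals whether a particular device or issue category is trending upward.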
#7 Work Load And Root Cause Analysis
• Categories For CDMA Voice Issues
  – Generally speaking, most voice issues fall within one of the categories below:
    • Can’t originate
    • Can’t originate. Fast busy
    • Can’t originate to a specific number
    • Can’t originate international calls
    • Can’t originate or terminate
    • Can’t originate or terminate. Welcome to…carrier’s name.
    • Can’t retrieve voice mails
    • Can’t deposit a voice mail
    • Can’t terminate
    • Can’t terminate from specific numbers
    • Coverage complaints, dropped calls etc.
    • Not a roaming issue
    • Searching for service

#8 Wrap Up
[Flowchart in two parts: Resolution Process and Process Improvement]
Resolution Process:
1. Roaming customer contacts Customer Care
   – Where? Issue?
   – Billing/Account verification
   – Logs ticket
   – Categorize issue
   – Resolved? Yes: End. No: Tier 2.
2. Tier 2
   – System checks of MBI/Network Elements
   – Capture/Analyze RSP and/or live traces
   – Possibly test with customer or roaming partner
   – Clarify the issue if need be
   – Resolved? Yes: End. No: Tier 3.
3. Tier 3
   – Further system checks in the network
   – Analyze live traces received from Tier 2
   – Possibly test with customer or roaming partner
   – Resolved? Yes: End. No: No resolution. New problem.
Process Improvement:
Investigation Results Report -> Root cause analysis -> Trending -> Lessons learned -> Updated training/tools/knowledge

#8 Wrap Up
[Chart: Knowledge, Resolution, and Tickets plotted for Month 1 through Month 4; vertical axis from 0 to 25]

#8 Wrap Up
• Questions
• Action Items

SMS Roaming Troubleshooting
Ft Lauderdale March, 2008

Contents
• Assumptions
• Background
• Reference documentation/Tools
• Possible Problems
• Troubleshooting Process

Assumptions
• Voice Roaming working
  – System Determination
  – Registration
  – ANSI-41 authorization
• Focus on SMS-specific issues
• Assume element/link failures alarmed
  – Focus here on subscriber-reported issues
• Not addressing Billing issues
  – In general assume billing records produced at MC
• Post-implementation issues
  – Assume initial testing completed

Background – Roaming Architecture
• ANSI-41 network elements involved in SMS Roaming:
  – Message Center (MC) – aka Short Message Service Center (SMSC). Store-and-forward function for messages. End-point for SMS communication with a Mobile Station (MS)
  – Mobile Switching Center (MSC) – Includes (for convenience) the VLR and Base Station. ANSI-41 to IS-2000 interface, and relay point for SMS messages
  – Home Location Register (HLR) – Stores subscriber location and profile information. Doesn’t see actual SMS message contents
  – Roaming Service Provider (RSP) – Usually present in CDMA-CDMA roaming today. Provides signaling connectivity and ANSI-41 translation. Looks like an MSC/VLR to the home network, and an HLR/MC to the serving network.

Background – Message Flows (1 of 3)
• Mobile-Terminated (MT):
[Diagram: MC, HLR, MSC, and MS, with arrows numbered 1 to 5]
1. Message arrives at MC, addressed to MS
2. MC queries HLR for MS location – SMS Request (SMSREQ) message
3. HLR checks subscriber is authorized, returns address (SMS_Address from registration time)
4. MC sends message to MSC using the address received in the previous step – SMS Delivery Point To Point (SMDPP) message
5. MSC delivers message to MS over the air

Background – Message Flows (2 of 3)
• Mobile-Terminated with delayed delivery (MT):
[Diagram: MC, HLR, MSC, and MS, with arrows numbered 1 to 9]
1 - 4. As per previous slide
5. MS goes into coverage hole, message delivery fails. MSC sets “SMS Delivery Pending” flag for MS
6. Some time later, MS returns to coverage, makes system access
7. System access plus pending flag trigger MSC to send notification to advise MC that MS is available – SMSNotification (SMSNOT) message
8. MC resends SMDPP
9. Message is delivered successfully to MS
Other scenarios are possible – if HLR knows that subscriber is unavailable, it will issue the SMSNOT instead of the MSC

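The pending-flag behavior in steps 5 to 9 can be pictured with a toy state model. The Python sketch below is an illustration under simplifying assumptions: `ToyMSC` and its methods are hypothetical names, no real ANSI-41 encoding is involved, and the HLR-originated SMSNOT case is not modeled.

```python
class ToyMSC:
    """Toy model of the serving MSC's delivery-pending behavior (illustrative only)."""

    def __init__(self):
        self.in_coverage = set()  # MSs currently reachable over the air
        self.pending = set()      # MSs with the "SMS Delivery Pending" flag set

    def smdpp(self, ms: str) -> bool:
        """Attempt delivery; on failure set the pending flag (step 5)."""
        if ms in self.in_coverage:
            return True
        self.pending.add(ms)
        return False

    def system_access(self, ms: str) -> bool:
        """MS returns to coverage (step 6); True means an SMSNOT goes to the MC (step 7)."""
        self.in_coverage.add(ms)
        if ms in self.pending:
            self.pending.discard(ms)
            return True
        return False


msc = ToyMSC()
events = []
if not msc.smdpp("ms1"):                 # step 5: delivery fails in coverage hole
    events.append("delivery failed, pending flag set")
if msc.system_access("ms1"):             # steps 6 and 7
    events.append("SMSNOT sent to MC")
    if msc.smdpp("ms1"):                 # steps 8 and 9: MC resends, delivery succeeds
        events.append("message delivered")
print(events)
```

The model makes one troubleshooting point visible: if the pending flag is never set, or is lost, the system access in step 6 produces no SMSNOT and delivery depends entirely on the MC retry schedule.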
Background – Message Flows (3 of 3)
• Mobile-Originated (MO) – Indirect Routing
• Indirect routing means that the message is routed through the originator’s MC:
[Diagram: MS, MSC, and MC, with arrows numbered 1 to 3]
1. The MS originates a short message
2. The MSC sends the message to the MC for this MS (SMDPP)
3. The MC analyzes the destination address, and routes the message on. If the destination is an MS which belongs to another MC, the message will be sent to that MC

Reference Documentation
• There are several sources of information you can turn to when faced with a problem:
  – General Reference
    • ANSI-41 standard
    • ANSI-41 textbook
    • SMPP standard
  – Roaming-specific
    • SMS Roaming Reference Document
  – Carrier-specific
    • TDS
    • SMS Roaming Partner Qualification Form
    • SMS Test Plan Results

Tools
• Tools available to assist in troubleshooting:
  – HLR
    • O/b subscriber profile, registration status
  – MC
    • O/b message queue, maybe subscriber profile
    • Billing records
  – MSC/VLR
    • I/b subscriber profile/status, SMSDPF?
    • No billing records produced
  – Protocol Analyzer
    • Real time, may be swamped by roaming traffic
  – RSP Trace
    • See message delivery attempts, longer storage

Possible Problems (1 of 5)
• List some potential areas where problems can arise:
• Subscriber Provisioning
  – Home vs Roaming
    • Some HLRs define a separate value of SMSTERMREST and SMSORIGREST to be sent to MSCs designated as “roaming”.
  – Unusual values
    • The most common values for these parameters are 0 (Block all) and 3 (Allow all). Other values might be handled poorly…
  – Service Option
    • Specific service options are defined for SMS (6 & 14). Usually, however, these aren’t required to be present in the CDMASOL.

Possible Problems (2 of 5)
• MSC Datafill
  – SMSADDR Population
    • MSC’s PC/GT in application layer
    • ITU vs ANSI encoding can be tricky
    • This value usually overwritten by the RSP
  – MC address
    • Required for MO-SMS
    • Associated by MIN range or HLR
    • For roamers typically the same as the HLR address, i.e. RSP

Possible Problems (3 of 5)
• RSP Datafill
  – SMSADDR Population
    • Overwrite with their own address
  – MC address
    • Required for MO-SMS
    • Info supplied by home operator
    • MC defined as valid sender for MT-SMS
  – Addressing
    • Map serving-network-supplied addresses to home-required values, e.g. MDN in SMS_OOA
• HLR Datafill
  – SMSADDR (Again)
    • Some HLRs statically define the SMSADDR against the MSCID

Possible Problems (4 of 5)
• User Error
  – Wrong dialplan
    • Enter destination address in format for visited country
    • Enter a short code only valid for the visited network’s subscribers
• Message “Jamming”
  – Subscriber not able to receive any messages
    • Can occur when an overlength message arrives: it fails delivery but remains at the front of the queue in the MC, so it is attempted again before any new incoming message
• Commercial Issues
  – SMS Roaming not yet implemented in a particular market
    • Customers often expect/assume SMS to be present wherever voice roaming is available

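The message “jamming” case is a form of head-of-line blocking, which a few lines of Python can illustrate. This is a sketch under assumptions: `MAX_LEN` is an arbitrary limit chosen for the example (not a value from the slides), and `deliver_head_of_line` is a hypothetical function that models strict in-order delivery from an MC queue.

```python
from collections import deque

MAX_LEN = 160  # assumed delivery limit for this sketch, not a value from the source


def deliver_head_of_line(queue: deque) -> list:
    """Deliver strictly in order; stop at the first message that fails delivery."""
    delivered = []
    while queue and len(queue[0]) <= MAX_LEN:
        delivered.append(queue.popleft())
    return delivered


# An overlength message at the front of the queue blocks everything behind it.
queue = deque(["x" * 200, "hello", "call me"])
print(deliver_head_of_line(queue))  # []
print(len(queue))                   # 3
```

In this model the subscriber receives nothing until the overlength message is removed from the front of the queue, which matches the symptom of a subscriber who cannot receive any messages at all.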
Possible Problems (5 of 5)
• Intermittent / Performance Issues
  – Hardest to troubleshoot
    • Often reported after subscriber returns home
    • Roaming cases may actually provide more information: access to trace information after-the-fact via RSP
  – Examples:
    • “I never received an important message, but I received other messages”
    • “I was powered on in good coverage for hours before my messages arrived”
  – Trending/aggregation may be important to decide if a bigger problem exists

Troubleshooting Process
• General stages will be equivalent to other roaming services
• Specific details will vary for SMS within the stages:
  – Clarifying the issue
  – Confirming expected behavior
  – Investigation
  – Resolution Actions
  – Feedback/lessons

Clarifying the Issue
• Eliminate wider roaming issues
  – Phone shows signal strength
  – Make/receive voice calls
• SMS specific
  – MO, MT or both affected?
  – Exact destination address for MO issues
  – Length of attempted message
• Impact
  – User-, MSC-, HLR-, MC-, Application-, Operator-wide?
  – Works at home?
• Time
  – Used to work/never worked/past fault
Exchange troubleshooting information as specified by IRT

Confirming Expected Behavior
• Is SMS supposed to work for this market?
  – Does the troubleshooting team have access to an up-to-date list of markets where MO/MT SMS is expected?
• Reference Check
  – Test Results/TDS/RPQF
  – Historical troubleshooting information
  – Is this a new issue?

Investigation
• Checklist
  – Subscriber authorized for SMS at HLR & VLR
  – Check RSP tool for delivery attempts
    • If not present, may not be reaching RSP (datafill error, link/element outage) or may not be reaching RSP application layer (overlength)
    • If present, check response. “Postponed” is the only SMSCAUSE value that indicates a notification is pending
  – Check MC logs/queue
• Retest
  – Recreate issue if possible
  – Capture complete logs with protocol analyzer or MC/MSC tool
  – MC retry schedule may mask SMSNOT functioning

Resolution Actions
• Datafill errors: fix per operational policy
  – e.g. maintenance window only
• Provisioning errors: fix
• Subscriber “reset” actions
  – E.g. power cycle, VLR clear at RSP
  – May fix an unexplained problem
  – May prevent the problem from ever being explained
  – Balance between short- and long-term benefit to subscriber base
• Capability Gaps
  – Escalate per company procedures

Feedback
• How to ensure knowledge gained during the troubleshooting process is captured and available in the future?
  – Knowledgebase
  – Training
  – Vendor follow-up
  – Statistical analysis

Thank You!
Packet Data Roaming Troubleshooting
Ft Lauderdale March, 2008

Packet Data Roaming
• For the purposes of this module, “data roaming” implies:
  – A subscriber accessing data services in a foreign network
  – 1xRTT and/or EV-DO used to access data services
  – Voice roaming is also functioning

Roaming IP Access with Mobile IP
• IP address assigned by home agent (HA)
  – Visited operator provides COA
  – Mobile IP tunnel created between visited PDSN/FA and HA
• Public Internet access tunnels back to home network
• Access to home network servers without NAT
[Diagram labels. Home Operator: 10.23.45.13, HA, AAA, PDSN, PCF, RAN, Application Server. Visited Operator: COA, AAA, PDSN FA, PCF, RAN. Connection between operators: Internet/CRX.]

Roaming IP Access with Simple IP
• Serving network assigns IP address to roamer
• NAT required if private IP address assigned
• Direct access to the public Internet
• VPN over public Internet to access home application servers
[Diagram labels. Home Network: AAA, PDSN, PCF, RAN, Application Server. Serving Network: NAT, AAA, PDSN, 10.23.45.13, PCF, RAN. Connection between networks: Internet/CRX.]

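Whether NAT is needed under Simple IP depends on whether the address assigned by the serving network is private. As a small illustration, Python's standard `ipaddress` module can make that check; the `needs_nat` function name is an assumption made for this sketch, and 10.23.45.13 is the example address shown in the diagram.

```python
import ipaddress


def needs_nat(address: str) -> bool:
    """Return True when the assigned address is private, so NAT is needed to reach the public Internet."""
    return ipaddress.ip_address(address).is_private


print(needs_nat("10.23.45.13"))  # True
print(needs_nat("8.8.8.8"))      # False
```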
Implementing Roaming with L2TP
• Home operator LNS assigns roaming MS its IP address
• L2TP tunnel is created between visited PDSN/LAC and LNS
• Must tunnel back to home network to access public Internet
• Access application servers in home network without NAT
[Diagram labels. Home Operator: 10.23.45.13, LNS, AAA, PDSN, PCF, RAN, Application Server. Visited Operator: AAA, PDSN FA, PCF, RAN. Connection between operators: Internet/CRX.]

Aspects of Data Roaming Troubleshooting
• Pre vs. post commercial implementation (focus here is post)
• Functional vs. performance
  – Functional troubleshooting (It doesn’t work!)
  – Performance troubleshooting (It works, but not very well!)
• Billing for data roaming is out of scope of this training module

Organizational Procedures
• Essentially the same as described in voice troubleshooting
  – Prepare organization (personnel, trouble ticket system, etc.)
  – Standardize what technical information to capture
  – Identify and document common problems and solutions
  – Establish systematic methodology

Functional Troubleshooting

Troubleshooting Scenarios
• Subscribers reporting trouble vs. engineers troubleshooting a known issue with a device:
  – Engineers have access to many more tools than subscribers
  – Different methodologies are used in each case
• Device scenarios
  – Handset only: Depends strongly on network logs for troubleshooting
  – Handset with data cable and laptop: More tools available
  – Data card (or tethered handset): Allows access to greatest number of network tools, although handset applications are more difficult to test

Clarifying Questions (1/2)
• The first step in troubleshooting is a high-level clarification of the situation
• Important to all troubleshooting scenarios
  – Does a data roaming implementation exist?
    • Obviously, this should be “yes”; otherwise no issue exists
  – Does the handset/application function in the home network?
    • If “no”, then focus on issues in the home network first

Clarifying Questions (2/2)
• Does voice roaming work in the foreign network?
  – If “no”, then focus on voice roaming first
  – System selection or HLR authentication related?
• Do any data applications work at all?
  – If “yes”, then many potential issues are eliminated
    • System selection, authentication, basic network connectivity
  – Shift focus to the specific application
• Does data authentication obviously fail?
  – If “yes”, then focus on the data authentication component

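The clarifying questions form a short decision sequence, which can be sketched as a function. This Python example is illustrative only: `first_focus` and its parameter names are hypothetical, and the returned strings simply restate the focus areas named in the questions.

```python
def first_focus(data_roaming_implemented: bool,
                works_at_home: bool,
                voice_roaming_works: bool,
                any_data_app_works: bool) -> str:
    """Walk the clarifying questions in order and return where to focus first."""
    if not data_roaming_implemented:
        return "no issue: data roaming is not implemented"
    if not works_at_home:
        return "home network"
    if not voice_roaming_works:
        return "voice roaming"
    if any_data_app_works:
        # System selection, authentication, and basic connectivity are eliminated.
        return "specific application"
    return "system selection, authentication, basic network connectivity"


print(first_focus(True, True, True, True))
print(first_focus(True, True, False, True))
```

Ordering the checks this way means the cheapest questions (is the service expected to work, does it work at home) are answered before any network logs are pulled.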
Troubleshooting Subscriber Reported Issue (1/3)
• Assume clarifying questions have been answered
• Assume subscriber can’t access device tools (e.g. tracert, WireShark)
• Important for home operator to gather information about the subscriber’s device
• The required device information is currently being standardized in a CDG reference document
• Identifies troubleshooting info operators should gather:
  – MSID (IMSI, IRM)
  – MEID/ESN
  – MDN
  – NAI
  – IP Address
  – Technology
  – MIP, SIP
  – Application
  – tracert (if available, but requires data card and subscriber sophistication)

Troubleshooting Subscriber Reported Issue (2/3)
• Essentially dependent on infrastructure logs, as subscribers don’t have access to or knowledge of device tools
Methodology:
• Use a systematic approach, and eliminate categories of issues
• System selection failure
  – Look at subscriber’s PRL and roaming partner’s TDS
  – Work with roaming partner to determine possible issues
• Authentication failure
  – Review relevant H-AAA logs
  – Look for clues on reason for failure (bad password?)

Troubleshooting Subscriber Reported Issue (3/3)
• Routing Issues
  – Check Home HA or LNS logs (pass authentication, etc?)
  – Look for possible firewall, port blocking, and routing table issues
  – Work with CRX and roaming partner engineers
• PPP Issues
  – Obtain roaming subscriber’s A10/A11 logs if available (e.g. RADCOM)
  – Otherwise, very difficult

Field Engineering Troubleshooting
• Implies an engineer troubleshooting in the roaming market
• Engineer could be from home or visited market
• In either case, coordination between home/visited operators is usually required
• More tools are available and, obviously, a greater level of technical knowledge

Network Tools
Assumes data card or tethered laptop:
Tool Name | Purpose
ipconfig | Provides TCP/IP information (e.g., IP address, adapters, gateways)
netstat | Displays current TCP/IP connections and protocol information
ping, hrping, pathping | Generates ICMP echo requests to diagnose routing, address resolution, latency, etc.
tracert, traceroute | Provides hop count and RTT for a server
nslookup | Provides DNS and IP address information of a remote host
route | View and modify the local routing table
hostname | Provides the local computer’s NETBIOS hostname
telnet | Terminal emulator to allow terminal-mode sessions with a host
FTP, TFTP | Allows for TCP and UDP file transfers to and from a server
WireShark/Ethereal | Allows for packet sniffing, stream analysis, TCP traces, throughput calculation, etc.

Mobile IP Error Code Values
• Code Values for Mobile IP Registration Reply Messages
  – 0-8 Success Codes
    • 0 = registration accepted
  – 9-63 No allocation guidelines currently exist
  – 64-127 Error Codes from the Foreign Agent
    • 67 = MN Failed Authentication
    • 68 = HA Failed Authentication
  – 128-192 Error Codes from the Home Agent
    • 129 = Administratively prohibited
  – 193-200 Error Codes from the Gateway Foreign Agent
  – 201-255 No allocation guidelines currently exist
• The error code values can help explain why Mobile IP registration failed.
• General MIP numbers found at: http://www.iana.org/assignments/mobileip-numbers

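When reading registration reply codes out of logs, the ranges listed on this slide can be turned into a quick classifier. The Python sketch below is illustrative: `classify_mip_code` is a hypothetical helper, and it encodes only the ranges given on the slide.

```python
def classify_mip_code(code: int) -> str:
    """Map a Mobile IP Registration Reply code to the range it falls in."""
    if not 0 <= code <= 255:
        raise ValueError("code must be in the range 0-255")
    if code <= 8:
        return "success"
    if 64 <= code <= 127:
        return "foreign agent error"
    if 128 <= code <= 192:
        return "home agent error"
    if 193 <= code <= 200:
        return "gateway foreign agent error"
    return "no allocation guideline"  # 9-63 and 201-255


print(classify_mip_code(67))   # foreign agent error
print(classify_mip_code(129))  # home agent error
```

Knowing which element rejected the registration tells the engineer whether to start with the visited operator (foreign agent codes) or with the home network (home agent codes).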
PPP Connection Failures
• When PPP connections are unexpectedly failing, a few items can be verified
• Checklist:
  – Verify the correct networking interface/modem is selected for the connection
  – Verify RF conditions are sufficient for establishing a connection
  – Verify no other interfaces have active TCP/IP bindings on the device
  – View PDSN, AAA, and PPP logs (from device)

Application Connectivity Issues •
Variety of reasons may cause Application Connectivity issues: – Firewalls • IP address ranges or specific application traffic may be blocked • Examples: ICMP, SSH, Instant Messenger, Peer-to-peer traffic – Port blocking • Port ranges an application needs may be closed for security reasons – Server availability • A server may not exist or may have been moved • May have exceeded the maximum number of connections – Routing table • Routes to an application server may not exist in routing tables Ft Lauderdale, March ’08
Application Connectivity Issues
• A few things can be tried to mitigate application connectivity issues.
• Checklist:
  – Try pinging the local host to verify the network interface is up
  – Try pinging the server (remote host)
  – Check whether port blocking may be occurring
  – Try different source/destination ports (if possible)
  – Verify the route to the gateway host is defined
  – Try another default gateway that may have a route to the host
  – Try using another application server that may be less loaded
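The port-blocking item in the checklist above can be partly automated for TCP services. The sketch below is one possible approach using only the Python standard library; the helper names `tcp_port_open` and `check_ports` are illustrative assumptions, and the host and ports passed to them would be whatever the application under test requires.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, unreachable routes,
        # and name-resolution failures.
        return False


def check_ports(host: str, ports, timeout: float = 3.0) -> dict:
    """Try each TCP port in turn and report which ones accepted a connection."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}
```

A failed connection does not by itself distinguish a firewall or blocked port from a server that is down or a missing route, so the result is best read together with the ping and routing checks in the list. UDP services are not covered by this sketch.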
Performance Troubleshooting
Performance Troubleshooting
• Assumes the application(s) are working, but not well
• Geographic distance to home servers can add significant latency that cannot be avoided
• Usually requires engineers to troubleshoot
• Most performance troubleshooting requires significant coordination among:
  – Internal routing engineers
  – CRX
  – ISPs
Performance Troubleshooting
Type of Issue | Focus Area | Possible Cause
Latency Issues | Network and Device | Number of hops and routing problems; routing problem; spurious device traffic and laptop/device performance
Throughput Issues | Network | IP fragmentation
Throughput Issues | Transport | TCP congestion control; UDP packet loss
Throughput Issues | Application and Device | Spurious device traffic and laptop/device performance; application server settings; server selection and loading
High Packet Error/Loss Rate | Cables and Devices | Physical cables and devices
High Packet Error/Loss Rate | Network | IP fragmentation; insufficient core network capacity; network loading
Sub-optimal Media and Application Performance | Network / Transport / Core Network / Application / Physical Cables | QoS; server settings; latency; IP fragmentation; high packet error/loss rates
Latency Issues
• A variety of issues may cause high or variable latency:
  – Number of hops
    • Too many hops between the client and server increase the RTT
  – Routing problem
    • Inefficiencies in routing tables may cause packets to not take the minimum path
    • Incorrect default gateway selection causes redirection to other hosts
  – Network loading
    • Other users sharing the same data pipe cause packets to be queued
  – Spurious device traffic
    • Unaccounted-for traffic generated by malware, spam, etc. shares the data pipe and reduces throughput
What to Verify for Latency Issues
• When performance does not meet expectations due to latency issues:
  – Throughput may be lower than expected
  – Application responsiveness may be poor
• Checklist:
  – Verify number of hops to server (traceroute)
  – Verify round-trip time to server (ping)
  – Verify network loading (number of other users)
  – Verify no extraneous or foreign traffic is being generated by the device
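For the round-trip-time item in the checklist, it is often useful to record more than a single ping result, because variable latency shows up as a wide spread between the best and worst samples. The sketch below summarizes RTT samples from captured ping output. It assumes the per-reply `time=... ms` (or `time<1ms`) field printed by common ping tools; the function name `summarize_rtt` is an illustrative assumption, and output formats that differ from this would need a different pattern.

```python
import re
import statistics

# Matches the per-reply field such as "time=23.4 ms", "time=23ms" or "time<1ms".
# Summary lines like "time 2003ms" (no "=" or "<") are deliberately not matched.
_RTT_RE = re.compile(r"time[=<]\s*([\d.]+)\s*ms", re.IGNORECASE)


def summarize_rtt(ping_output: str) -> dict:
    """Summarize the round-trip times found in captured ping output."""
    samples = [float(value) for value in _RTT_RE.findall(ping_output)]
    if not samples:
        return {"count": 0}
    return {
        "count": len(samples),
        "min_ms": min(samples),
        "avg_ms": statistics.mean(samples),
        "max_ms": max(samples),
        # A large spread between min and max suggests queuing or load.
        "spread_ms": max(samples) - min(samples),
    }
```

Comparing summaries taken from the visited network and from the home network against the same server can help show whether the added latency is mostly geographic distance or something that varies with load.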
Throughput Performance Issues
• A variety of reasons may cause throughput issues:
  – IP fragmentation
    • Fragmenting of IP packets causes additional physical layer packets to be generated
    • Results in a high percentage of packets in error, retransmissions, and delays
  – TCP congestion control issues / UDP packet loss
    • Retransmissions will trigger TCP slow start and congestion avoidance
    • Network congestion may cause lost UDP datagrams
  – Spurious device traffic
    • Unaccounted-for traffic generated by malware, spam, etc. shares the data pipe and reduces throughput
  – Application server settings / server selection / server loading
    • Sub-optimal FTP server settings will reduce data transfer capabilities
    • A public server or a server located too many hops away may cause reduced throughput
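To make the IP fragmentation point concrete, the arithmetic below estimates how many fragments a single IPv4 datagram becomes when it crosses a link with a smaller MTU. This is a sketch under stated assumptions: a 20-byte IPv4 header with no options, and the standard rule that every fragment except the last carries a payload that is a multiple of 8 bytes. The function name `fragment_count` is illustrative.

```python
import math

IPV4_HEADER_BYTES = 20  # assumption: no IP options present


def fragment_count(datagram_bytes: int, mtu_bytes: int) -> int:
    """Estimate how many IPv4 fragments a datagram needs on a link with the given MTU."""
    if datagram_bytes <= IPV4_HEADER_BYTES:
        raise ValueError("datagram must be larger than the IP header")
    if datagram_bytes <= mtu_bytes:
        return 1  # fits as-is, no fragmentation
    payload = datagram_bytes - IPV4_HEADER_BYTES
    # Each non-final fragment carries a payload that is a multiple of 8 bytes.
    per_fragment = ((mtu_bytes - IPV4_HEADER_BYTES) // 8) * 8
    if per_fragment <= 0:
        raise ValueError("MTU too small to carry any payload")
    return math.ceil(payload / per_fragment)
```

For example, a 1500-byte datagram sent over a path with a 576-byte MTU is split into three fragments, each with its own header, and losing any one of them forces the whole datagram to be resent. Lowering the sending host's MTU or TCP MSS so that packets fit the path is one common way to avoid this.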
Thank You!