Network Troubleshooting Overview

  • November 2019
These sections introduce you to the concepts and practice of network troubleshooting:

  • Introduction to Network Troubleshooting
  • Network Troubleshooting Framework
  • Troubleshooting Strategy

Network troubleshooting means recognizing and diagnosing networking problems with the goal of keeping your network running optimally. As a network administrator, your primary concern is maintaining connectivity of all devices (a process often called fault management). You also continually evaluate and improve your network's performance. Because serious networking problems can sometimes begin as performance problems, paying attention to performance can help you address issues before they become serious.

About Connectivity Problems

Connectivity problems occur when end stations cannot communicate with other areas of your local area network (LAN) or wide area network (WAN). Using management tools, you can often fix a connectivity problem before users even notice it. Connectivity problems include:

  • Loss of connectivity - When users cannot access areas of your network, your organization's effectiveness is impaired. Correct any connectivity break immediately.
  • Intermittent connectivity - Although users have access to network resources some of the time, they still face periods of downtime. Intermittent connectivity problems can indicate that your network is on the verge of a major break. If connectivity is erratic, investigate the problem immediately.
  • Timeout problems - Timeouts cause loss of connectivity, but are often associated with poor network performance.
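The three categories above can be told apart from a series of reachability probes. The following Python sketch is an illustration only: it assumes you already have probe results as round-trip times in milliseconds (None meaning no reply), and the function name and the 500 ms "slow" threshold are hypothetical choices, not values from this document.

```python
from typing import Optional, Sequence


def classify_connectivity(rtts_ms: Sequence[Optional[float]],
                          slow_threshold_ms: float = 500.0) -> str:
    """Sort probe results into the connectivity categories described above.

    Each entry is a round-trip time in milliseconds, or None when the probe
    got no reply. The 500 ms default is a hypothetical threshold.
    """
    if not rtts_ms:
        raise ValueError("need at least one probe result")
    replies = [rtt for rtt in rtts_ms if rtt is not None]
    if not replies:
        return "loss of connectivity"       # nothing answered at all
    if len(replies) < len(rtts_ms):
        return "intermittent connectivity"  # some probes answered, some did not
    if max(replies) > slow_threshold_ms:
        return "timeout risk"               # every probe answered, but slowly
    return "ok"


print(classify_connectivity([12.0, None, 15.0]))  # intermittent connectivity
```

Checking for total loss first mirrors the priority in the list: a complete break needs immediate correction, while slow but successful replies point toward a performance problem.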

About Performance Problems

Your network has performance problems when it is not operating as effectively as it should. For example, response times may be slow, the network may not be as reliable as usual, and users may be complaining that it takes them longer to do their work. Some performance problems are intermittent, such as instances of duplicate addresses. Other problems can indicate a growing strain on your network, such as consistently high utilization rates.

If you regularly examine your network for performance problems, you can extend the usefulness of your existing network configuration and plan network enhancements, instead of waiting for a performance problem to adversely affect the users' productivity.

Solving Connectivity and Performance Problems

When you troubleshoot your network, you employ tools and knowledge already at your disposal. With an in-depth understanding of your network, you can use network software tools, such as Ping, and network devices, such as analyzers, to locate problems, and then make corrections, such as swapping equipment or reconfiguring segments, based on your analysis.

Transcend® provides another set of tools for network troubleshooting. These tools have graphical user interfaces that make managing and troubleshooting your network easier. With Transcend applications, you can:

  • Baseline your network's normal status to use as a basis for comparison when the network operates abnormally
  • Precisely monitor network events
  • Be notified immediately of critical problems on your network, such as a device losing connectivity
  • Establish alert thresholds to warn you of potential problems that you can correct before they affect your network
  • Resolve problems by disabling ports or reconfiguring devices
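Baselining and alert thresholds can be illustrated with a small Python sketch. It is not how Transcend implements them. It assumes utilization samples (in percent) that you have already collected, derives a threshold as the baseline mean plus k standard deviations (k = 3 is an arbitrary choice), and then raises alerts using a rising and a falling threshold, loosely modeled on RMON alarms, so that a value hovering near the limit does not trigger a flood of repeated alerts. All names are hypothetical.

```python
import statistics
from typing import Iterable, List, Tuple


def baseline_threshold(samples: Iterable[float], k: float = 3.0) -> float:
    """Alert threshold = mean of the baseline samples + k population std devs."""
    values = list(samples)
    return statistics.mean(values) + k * statistics.pstdev(values)


def alarm_events(readings: Iterable[float], rising: float,
                 falling: float) -> List[Tuple[int, str]]:
    """Return (index, 'rising' | 'falling') events with hysteresis.

    After a rising alert, no new rising alert fires until the value has
    dropped to the falling threshold or below.
    """
    if falling >= rising:
        raise ValueError("falling threshold must be below rising threshold")
    events: List[Tuple[int, str]] = []
    armed = True  # True while waiting for a rising crossing
    for i, value in enumerate(readings):
        if armed and value >= rising:
            events.append((i, "rising"))
            armed = False
        elif not armed and value <= falling:
            events.append((i, "falling"))
            armed = True
    return events


normal_week = [20, 22, 19, 21, 18]        # baseline utilization, percent
limit = baseline_threshold(normal_week)   # about 24.2
print(alarm_events([21, 30, 31, 29, 19, 26], rising=limit, falling=20))
# [(1, 'rising'), (4, 'falling'), (5, 'rising')]
```

Readings 2 and 3 stay above the limit but produce no new alert, because the alarm only re-arms after utilization drops back to the falling threshold.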

See "Your Network Troubleshooting Toolbox" for details about each troubleshooting tool.

Network Troubleshooting Framework

The International Organization for Standardization (ISO) Open Systems Interconnection (OSI) reference model is the standard framework for describing network communications. Its seven-layer structure provides a clear picture of how network communications work. Protocols (rules) govern communications between the layers of a single system and among several systems. In this way, devices made by different manufacturers or built to different designs can still communicate, as long as they implement the same protocols. By understanding how network troubleshooting fits into the framework of the OSI model, you can identify the layer at which a problem is located and which type of troubleshooting tool to use.

For example, unreliable packet delivery can be caused by a problem with the transmission media or with a router configuration. If you are receiving high rates of FCS errors and alignment errors, which you can monitor with Status Watch, the problem is probably located at the physical layer and not the network layer.

Figure 1 shows how to troubleshoot the layers of the OSI model. Table 5 describes the data that the network management tools can collect as it relates to the OSI model layers.

Table 5 Network Data and the OSI Model Layers

Layer | Data Collected | Transcend NCS Tool Used
Application, Presentation, Session, Transport | Protocol information and other Remote Monitoring (RMON) and RMON2 data | LANsentry Manager; Traffix Manager (for more detail)
Network | Routing information | Status Watch; LANsentry Manager (for more detail); Traffix Manager (for more detail)
Data Link | Traffic counts and other packet breakdowns | Status Watch; LANsentry Manager (for more detail)
Physical | Error counts | Status Watch

Figure 1 OSI Reference Model and Network Troubleshooting

For information about network troubleshooting tools, see "Your Network Troubleshooting Toolbox".
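The layered reasoning in the FCS and alignment error example can be sketched in Python. This is a hypothetical rule of thumb, not a documented procedure: it encodes only the lower three rows of Table 5, and it assumes a 1% combined FCS-plus-alignment error rate as the point where the physical layer becomes the prime suspect for unreliable packet delivery.

```python
from typing import Dict, List, Tuple

# Lower three rows of Table 5: layer -> (data collected, Transcend NCS tools).
TABLE_5_LOWER: Dict[str, Tuple[str, List[str]]] = {
    "Network": ("Routing information",
                ["Status Watch", "LANsentry Manager", "Traffix Manager"]),
    "Data Link": ("Traffic counts and other packet breakdowns",
                  ["Status Watch", "LANsentry Manager"]),
    "Physical": ("Error counts", ["Status Watch"]),
}


def suspect_layer(frames: int, fcs_errors: int, alignment_errors: int,
                  max_error_rate: float = 0.01) -> str:
    """Guess where unreliable delivery originates (hypothetical 1% cutoff).

    A high rate of FCS and alignment errors points at the transmission
    media (physical layer); otherwise suspect configuration, such as a
    router, at the network layer.
    """
    if frames <= 0:
        raise ValueError("frames must be positive")
    error_rate = (fcs_errors + alignment_errors) / frames
    return "Physical" if error_rate > max_error_rate else "Network"


layer = suspect_layer(frames=10_000, fcs_errors=150, alignment_errors=20)
print(layer, "->", TABLE_5_LOWER[layer][1])  # Physical -> ['Status Watch']
```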

Troubleshooting Strategy

How do you know when you are having a network problem? The answer depends on your site's network configuration and on your network's normal behavior. See "Knowing Your Network" for more information. If you notice changes on your network, ask the following questions:

  • Is the change expected or unusual?
  • Has this event ever occurred before?
  • Does the change involve a device or network path for which you already have a backup solution in place?
  • Does the change interfere with vital network operations?
  • Does the change affect one or many devices or network paths?

After you have an idea of how the change is affecting your network, you can categorize it as critical or noncritical. Both categories need resolution (except for changes that are one-time occurrences); the difference between them is how much time you have to fix the problem.

By using a strategy for network troubleshooting, you can approach a problem methodically and resolve it with minimal disruption to network users. It is also important to have an accurate and detailed map of your current network environment. Beyond that, a good approach to problem resolution is:

  • Recognizing Symptoms
  • Understanding the Problem
  • Identifying and Testing the Cause of the Problem
  • Solving the Problem

Recognizing Symptoms

The first step to resolving any problem is to identify and interpret the symptoms. You may discover network problems in several ways. Users may complain that the network seems slow or that they cannot connect to a server. You may pass your network management station and notice that a node icon is red. Your beeper may go off and display the message: WAN connection down.

User Comments

Although you can often solve networking problems before users notice a change in their environment, you invariably get feedback from your users about how the network is running, such as:

  • They cannot print.
  • They cannot access the application server.
  • It takes them much longer to copy files across the network than it usually does.
  • They cannot log on to a remote server.
  • When they send e-mail to another site, they get a routing error message.
  • Their system freezes whenever they try to Telnet.

Network Management Software Alerts

Network management software, as described in "Your Network Troubleshooting Toolbox", can alert you to areas of your network that need attention. For example:

  • The application displays red (Warning) icons.
  • Your weekly Top-N utilization report (which lists the 10 ports with the highest utilization rates) shows that one port is experiencing much higher utilization levels than normal.
  • You receive an e-mail message from your network management station that the threshold for broadcast and multicast packets has been exceeded.
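A Top-N report like the one mentioned above is easy to sketch. In this Python example the port names, the utilization figures, and the "twice its own baseline" rule for flagging a port are all hypothetical; the point is only to show how ranking ports and comparing each with its normal level work together.

```python
import heapq
from typing import Dict, List, Tuple


def top_n_utilization(utilization: Dict[str, float],
                      n: int = 10) -> List[Tuple[str, float]]:
    """Return the n busiest ports as (port, percent) pairs, busiest first."""
    return heapq.nlargest(n, utilization.items(), key=lambda item: item[1])


def unusual_ports(current: Dict[str, float], baseline: Dict[str, float],
                  factor: float = 2.0) -> List[str]:
    """Ports running at least `factor` times their own baseline utilization."""
    return sorted(
        port for port, value in current.items()
        if baseline.get(port, 0.0) > 0 and value >= factor * baseline[port]
    )


this_week = {"1/1": 12.0, "1/2": 85.0, "1/3": 40.0}
normal = {"1/1": 10.0, "1/2": 30.0, "1/3": 35.0}
print(top_n_utilization(this_week, n=2))  # [('1/2', 85.0), ('1/3', 40.0)]
print(unusual_ports(this_week, normal))   # ['1/2']
```

Ranking alone places port 1/3 among the busiest even though that is normal for it; comparing against the baseline singles out 1/2 as the port that deserves attention.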

These signs usually provide additional information about the problem, allowing you to focus on the right area.

Analyzing Symptoms

When a symptom occurs, ask yourself these types of questions to narrow the location of the problem and to get more data for analysis:

  • To what degree is the network not acting normally (for example, does it now take one minute to perform a task that normally takes five seconds)?
  • On what subnetwork is the user located?
  • Is the user trying to reach a server, end station, or printer on the same subnetwork or on a different subnetwork?
  • Are many users complaining that the network is operating slowly or that a specific network application is operating slowly?
  • Are many users reporting network logon failures?
  • Are the problems intermittent? For example, some files may print with no problems, while other printing attempts generate error messages, make users lose their connections, and cause systems to freeze.

Understanding the Problem

Networks are designed to move data from a transmitting device to a receiving device. When communication becomes problematic, you must determine why data are not traveling as expected and then find a solution. The two most common causes for data not moving reliably from source to destination are:

  • The physical connection breaks (that is, a cable is unplugged or broken).
  • A network device is not working properly and cannot send or receive some or all data.

Network management software can easily locate and report a physical connection break (a layer 1 problem). It is more difficult to determine why a network device is not working as expected, which is often related to a layer 2 or layer 3 problem. To determine why a network device is not working properly, look first for:

  • Valid service - Is the device configured properly for the type of service it is supposed to provide? For example, has Quality of Service (QoS), which defines the transmission parameters, been established?
  • Restricted access - Is an end station supposed to be able to connect with a specific device, or is that connection restricted? For example, is a firewall set up that prevents that device from accessing certain network resources?
  • Correct configuration - Is there a misconfiguration of IP address, subnet mask, gateway, or broadcast address? Network problems are commonly caused by misconfiguration of newly connected or configured devices. See "Manager-to-Agent Communication" for more information.

Identifying and Testing the Cause of the Problem

After you develop a theory about the cause of the problem, test your theory. The test must conclusively prove or disprove your theory. Two general rules of troubleshooting are:

  • If you cannot reproduce a problem, then no problem exists unless it happens again on its own.
  • If the problem is intermittent and you cannot replicate it, you can configure your network management software to catch the event in progress.

For example, with LANsentry Manager, you can set alarms and automatic packet capture filters to monitor your network and inform you when the problem occurs again. See "Configuring Transcend NCS" for more information.

Although network management tools can provide a great deal of information about problems and their general location, you may still need to swap equipment or replace components of your network until you locate the exact trouble spot. After you test your theory, either fix the problem as described in "Solving the Problem" or develop another theory.

Sample Problem Analysis

This section illustrates the analysis phase of a typical troubleshooting incident.

On your network, a user cannot access the mail server. You need to establish two areas of information:

  • What you know - In this case, the user's workstation cannot communicate with the mail server.
  • What you do not know and need to test:
     • Can the workstation communicate with the network at all, or is the problem limited to communication with the server? Test by sending a ping or by connecting to other devices.
     • Is the workstation the only device that is unable to communicate with the server, or do other workstations have the same problem? Test connectivity at other workstations.
     • If other workstations cannot communicate with the server, can they communicate with other network devices? Again, test the connectivity.

The analysis process follows these steps:

1. Can the workstation communicate with any other device on the subnetwork?
   • If no, then go to step 2.
   • If yes, determine if only the server is unreachable.
      • If only the server cannot be reached, this suggests a server problem. Confirm by doing step 2.
      • If other devices cannot be reached, this suggests a connectivity problem in the network. Confirm by doing step 3.
2. Can other workstations communicate with the server?
   • If no, then most likely it is a server problem. Go to step 3.
   • If yes, then the problem is that the workstation is not communicating with the subnetwork. (This situation can be caused by workstation issues or a network issue with that specific station.)
3. Can other workstations communicate with other network devices?
   • If no, then the problem is likely a network problem.
   • If yes, the problem is likely a server problem.
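The three-step analysis above is a small decision tree, and writing it out in Python makes each branch explicit. The function and parameter names are hypothetical; each parameter stands for the yes/no answer to one question in the steps, and some answers go unused on some paths, exactly as in the written procedure. The returned labels match the three cases analyzed next (server, subnetwork, workstation).

```python
def locate_problem(reaches_subnet: bool,
                   only_server_unreachable: bool,
                   others_reach_server: bool,
                   others_reach_other_devices: bool) -> str:
    """Follow the sample analysis; return 'server', 'subnetwork' or 'workstation'.

    reaches_subnet: step 1, can the workstation reach any other device?
    only_server_unreachable: step 1 follow-up, is the server the only device
        it cannot reach?
    others_reach_server: step 2, can other workstations reach the server?
    others_reach_other_devices: step 3, can other workstations reach other
        network devices?
    """

    def step_3() -> str:
        # Yes: only the server is affected. No: the network itself is at fault.
        return "server" if others_reach_other_devices else "subnetwork"

    if reaches_subnet and not only_server_unreachable:
        # Several devices are unreachable: suspect the network, confirm with step 3.
        return step_3()
    # Step 1 answered "no", or only the server is unreachable: go to step 2.
    if others_reach_server:
        return "workstation"
    # Nobody reaches the server: most likely the server, confirmed by step 3.
    return step_3()


print(locate_problem(True, True, False, True))  # server
```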

When you determine whether the problem is with the server, subnetwork, or workstation, you can further analyze the problem, as follows:

  • For a problem with the server - Examine whether the server is running, whether it is properly connected to the network, and whether it is configured appropriately.
  • For a problem with the subnetwork - Examine each device on the path between the users and the server.
  • For a problem with the workstation - Examine whether the workstation can access other network resources and whether it is configured to communicate with that particular server.

Equipment for Testing

To help identify and test the cause of problems, have available:

  • A laptop computer with a CD-ROM drive (to read the online documentation) that is loaded with a terminal emulator, a TCP/IP stack, a TFTP server, and some key network management applications, such as LANsentry® Manager. With the laptop computer, you can plug into any subnetwork to gather and analyze data about the segment.
  • A spare managed hub to swap for any hub that does not have management. Swapping in a managed hub allows you to quickly spot which port is generating the errors.
  • A single-port probe to insert in the network if you are having a problem where you do not have management capability.
  • Console cables for each type of connector, labeled and stored in a secure place.

Solving the Problem

Many device or network problems are straightforward to resolve, but others yield misleading symptoms. If one solution does not work, continue with another. A solution often involves:

  • Upgrading software or hardware (for example, upgrading to a new version of agent software or installing Gigabit Ethernet devices)
  • Balancing your network load by analyzing:
     • Which users communicate with which servers
     • What the user traffic levels are in different segments
    Based on these findings, you can decide how to redistribute network traffic.
  • Adding segments to your LAN (for example, adding a new switch where utilization is continually high)
  • Replacing faulty equipment (for example, replacing a module that has port problems or replacing a network card that has a faulty jabber protection mechanism)
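The load-balancing analysis (which users talk to which servers, and how much traffic each segment carries) amounts to two aggregations. This Python sketch assumes you already have per-conversation byte counts from whatever monitoring tool you use; the Flow record, its field names, and the sample data are hypothetical.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Tuple


class Flow(NamedTuple):
    """One hypothetical traffic record: who talked to whom, where, how much."""
    user: str
    server: str
    segment: str
    octets: int


def summarize(flows: Iterable[Flow]) -> Tuple[Counter, Counter]:
    """Total octets per (user, server) pair and per segment."""
    by_pair: Counter = Counter()
    by_segment: Counter = Counter()
    for flow in flows:
        by_pair[(flow.user, flow.server)] += flow.octets
        by_segment[flow.segment] += flow.octets
    return by_pair, by_segment


flows = [
    Flow("alice", "mail", "eng", 4_000_000),
    Flow("bob", "files", "eng", 9_000_000),
    Flow("carol", "files", "sales", 1_000_000),
]
pairs, segments = summarize(flows)
print(segments.most_common(1))  # [('eng', 13000000)]
print(pairs.most_common(1))     # [(('bob', 'files'), 9000000)]
```

A segment that dominates the totals, like eng here, is a candidate for moving heavy users closer to their servers or for adding a segment.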

To help solve problems, have available:

  • Spare hardware equipment (such as modules and power supplies), especially for your critical devices
  • A recent backup of your device configurations to reload if flash memory gets corrupted (which can sometimes happen due to a power outage)

Use the Transcend NCS application suite Network Admin Tools to save and reload your software configurations to devices.


Network Cable and Hub Troubleshooting

Does the PC (workstation) see any network resources, servers, or other workstations? Note that some versions of networking software display "remembered" resources even when the PC can't access them, so you'll need to actually click on a given resource to see if it's really available.

Have you recently added a network hub? Is the workstation the first workstation on a new hub, being stacked or chained to existing hub(s)? When connecting hubs or switches with twisted pair (RJ-45 connector) cabling, whether 10BaseT or 100BaseT, make sure that you either connect to an "X" port (uplink port) or use a special crossover cable. A crossover cable, unlike a straight-through cable, connects pins 1 and 2 on one end to pins 3 and 6 on the other end, and vice versa. Each of these pin pairs must use its own twisted pair for noise protection. If you do have an X or uplink port, you normally see that it is connected to an adjacent port by a line or other symbol. You can only use one or the other, since they share the same physical circuitry, with the X port performing the pair reversal. Some people seem to think that hubs are bulletproof, but I've seen as many bad hubs and bad ports on hubs as bad network adapters. Hubs also have a power transformer that needs to be plugged into a live outlet.

Most network adapters have one or more onboard LEDs to show the status of the link and network activity (traffic). If your documentation tells you that you have such a link light, is it lit? No link light indicates an actual break in your physical layer. Check the physical connectors at all points in the failed path, and make sure that you are within all of the limits for your physical layer in terms of number of workstations and distances. On a 10BaseT or 100BaseT network, swap the workstation cable to another port on the hub and see if it works. While it's possible that the adapter, or the port or device to which it's connected, is bad or powered off, the problem is usually caused by the cable. Wireless and IR adapters may fail simply due to physical location (a blind spot) or distance from the transceiver. Old coax networks can have wrong or missing termination (the most common, Thin Ethernet, requires 50 Ohm terminators at segment ends).

Have you cloned the software configuration from another workstation on the network (everything but the unique portion of the IP address, assuming you're set up for TCP/IP)? It's too easy to make a mistake with which protocol should be the default, with the spelling of a workgroup, and so on. At an active workstation, go through every option in the network setup and print screen every page and subpage that comes up. Keep the printouts for future reference when you run into networking problems with a similar workstation. If this is the first workstation on the network, or the second on a peer-to-peer network, go with the defaults and make use of the operating system's built-in troubleshooter, at least in Windows versions. Your problem is most likely software configuration, which is far too in-depth to address in the chart. When in doubt, reboot.

Does the Device Manager see the network adapter and report no conflicts? Try reinstalling the driver and rebooting. In Windows, start by deleting the existing network device in Device Manager. If Windows still won't recognize the network adapter, it could be conflicting with another hardware adapter, or it could be faulty. If the adapter is built in, either on the motherboard or in a notebook, try restoring the defaults in CMOS Setup. Proceed to the Conflict Resolution flowchart.

Have you tried a known good cable? Even if the link light is lit, it doesn't mean your cable is capable of carrying network traffic. An incredible number of techs make these cables wrong out of sheer laziness or ignorance. Don't say, "But it's a new cable!" Only four conductors are used in normal 10BaseT and 100BaseT implementations, and the wiring is straight through: 1-1, 2-2, 3-3, 6-6. Pins 1 and 2, and pins 3 and 6, must each use a twisted pair, or the longer runs will fail and shorter runs will act unpredictably. Visually inspect connectors to make sure they are solid and wired properly (that is, two twisted pairs, one for 1-2 and one for 3-6). Squint into the transparent connector and note the color coding for pins 1, 2, 3 and 6. Then look at the other end of the cable and make sure that the color coding is the same, AND that a matched pair (a solid wire and its matching striped wire) is used for pins 1 and 2 and another for pins 3 and 6.

Take the PC (just the system box, which some people call the CPU) to another workstation location and swap it with the PC there. If you get right on the network, the physical link to the location where it failed is bad. That could be the patch cable, the in-wall wiring, or the port on the hub it connects to. If it doesn't work at the new location, it's either the network adapter or the software configuration. If it's an add-in adapter and you have a spare, by all means try swapping it out, but the software settings are more often the culprit. Make sure the driver is up-to-date and the correct version for the OS, make sure that you have cloned all the settings (except the machine name or final IP address) from a working machine, and try going through the OS troubleshooting steps.

Are your network access problems random or intermittent? Check for loose connectors. It's very easy to install an RJ-45 connector improperly or fail to crimp it tightly enough to hold the cable, so that it loosens with just a minor physical movement. The problem might also be interference somewhere in the cable run. Make sure the cable isn't draped over the back of a CRT or running directly over fluorescent lights or other noisy RF emitters. You could also be experiencing software conflicts with other processes on the PC. Try eliminating all tasks except the minimal network configuration and do some large file transfers to see if the hardware layer is solid. More likely it's simply the loading of the network, a traffic jam, or you're exceeding the number of simultaneous users supported by the hardware (including wireless) or the software.

Are you using Shielded Twisted Pair (STP) or any other cable type with a non-signaling shield? Note that this is not the usual case for twisted pair cabling. Make sure that the shield is grounded at one end only, or you could end up with a ground loop and a constant leakage current. If it's not grounded at either end, it may act as an antenna to pick up and disperse interference. Also, make sure that your cables, even when grounded, are intelligently routed. Stay away from transformers, high-current junctions, and heavy equipment that can induce lots of electrical noise, though it's primarily the higher frequencies you need to worry about.

Are you within the physical layer limits for your network? This applies to both wired and wireless networks. Don't go by the number in the IEEE standard; use the limit in the hub, switch or base station documentation. Be aware that the distance limitations are based on a normal operating environment with the proper cabling or antennas installed. If your cables are made wrong, routed poorly, or are low quality, the limits will be reduced. Rerouting cables, adding repeaters (amplifiers) or eliminating sources of interference can increase the reach of your network.

Have you tried a different port on the hub? There's no rule that says hubs have to fail all at once, and even though performance degradation of a single port is a rarity, it's worth trying. It could also be that the cable end plugged into the hub wasn't crimped on as tightly as it could have been, causing the performance of the link to depend on the exact position of the cable, an unacceptable situation.
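The straight-through, crossover, and pair rules described above can be checked mechanically once you have traced a cable. In this hypothetical Python sketch, pin_map records which pin on end B each of the four used pins on end A connects to, and pair_color records which twisted pair (by color) each pin on end A uses; both inputs, the color names, and all function names are assumptions for illustration.

```python
from typing import Dict

STRAIGHT_THROUGH = {1: 1, 2: 2, 3: 3, 6: 6}
CROSSOVER = {1: 3, 2: 6, 3: 1, 6: 2}  # pins 1 and 2 swap with pins 3 and 6


def classify_cable(pin_map: Dict[int, int], pair_color: Dict[int, str]) -> str:
    """Return 'straight-through', 'crossover', 'split pair', or 'miswired'."""
    # Pins 1 and 2 must share one twisted pair, pins 3 and 6 a different one.
    pairs_ok = (pair_color[1] == pair_color[2]
                and pair_color[3] == pair_color[6]
                and pair_color[1] != pair_color[3])
    if not pairs_ok:
        return "split pair"  # the link light may still come on
    if pin_map == STRAIGHT_THROUGH:
        return "straight-through"
    if pin_map == CROSSOVER:
        return "crossover"
    return "miswired"


good_pairs = {1: "orange", 2: "orange", 3: "green", 6: "green"}
print(classify_cable({1: 3, 2: 6, 3: 1, 6: 2}, good_pairs))  # crossover
```

The pair check runs first because a cable with correct pin-to-pin continuity can still have split pairs, which is exactly the fault a simple continuity test misses.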

Does the problem, be it lost connections, slow performance or anything else, occur during periods when network traffic is high or a large number of users are logged on? There are many reasons a network can bog down or have trouble in high-traffic or high-user-count situations, including the natural limitations of the technologies being used. In general, if you are using a passive hub, you can greatly increase your network performance during high-traffic periods by swapping the hub for a switch. Also, if you are running a hybrid LAN, with a mix of 10BaseT and 100BaseT adapters, you should upgrade them all to 100BaseT, provided the cable plant is all Cat 5, which it better be!

Is the PC flaky when it's not on the network? If so, don't waste any more time on network diagnostics; proceed to Motherboard, CPU and RAM failure and look for the symptoms the PC is displaying. This isn't a good test of software problems, since you run different applications and have different resource usage when you're connected to the network.

You should always have a proven long bypass cable for testing that you can run directly from the workstation to the hub without going through walls, ceilings, etc. Make sure you are within the distance limits for twisted pair, wireless and IR, and within the total number of active stations limit for wireless and IR. Check for physical cable damage. The sheathing on Cat 5 cables is thin, and the inner conductors can be easily broken if the cable is stretched or crimped.

Does a new network adapter fix the problem? New PCI network adapters cost less than $10, so there's no reason not to try one. If you're running a wireless network with notebooks and add-on wireless adapters, borrow one from a good unit. If the new network adapter hasn't fixed the problem and you've gone through all the physical layer diagnostics to get here, it's a software issue.

Issue: Basic network troubleshooting.

Cause:

If a computer is unable to connect to a network or see other computers on a network, it may be necessary to troubleshoot the network. A network may not work for any of the following reasons:

1. Network card not connected properly.
2. Bad network card drivers or software settings.
3. Firewall preventing computers from seeing each other.
4. Connection-related issues.
5. Bad network hardware.

Solution: Because of the large variety of network configurations, operating systems, setups, etc., not all of the information below may apply to your network or operating system. If your computer is connected to a company or large network, or you are not the administrator of the network, and you are unable to resolve your issues after following the recommendations below, contact the network administrator or a company representative. Note: If you are being prompted for a network password and do not know it, Computer Hope is unable to help you obtain a new password or find out the old one.

Verify connections / LEDs

Verify that the network cable is properly connected to the back of the computer. When checking the connection of the network cable, also ensure that the LEDs on the network card are properly illuminated. For example, a network card with a solid green LED or light usually indicates that the card is connected or receiving a signal. Note: generally, a flashing green light indicates that data is being sent or received. If, however, the card has no lights lit, or has orange or red lights, it is possible that the card is bad, is not connected properly, or is not receiving a signal from the network. If you are on a small or local network and can check a hub or switch, verify that the cables are properly connected and that the hub or switch has power.

Adapter resources

If this is a new network card being installed into the computer, ensure that the card's resources are properly set and do not conflict with any other hardware in the computer. If you are using Windows 95, 98, ME, 2000 or XP, verify that Device Manager shows no conflicts or errors. Additional help and information about Device Manager and resources can be found on our device manager page.

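Adapter functionality

Verify that the computer can ping itself by using the ping command. Windows / MS-DOS users ping the computer from an MS-DOS (command) prompt; Unix / Linux variant users ping the computer from the shell. To ping the localhost, type either:

ping 127.0.0.1

or

ping localhost

This should show a listing of replies. Because 127.0.0.1 is the loopback address, these replies come from the computer's own TCP/IP software; they confirm that TCP/IP is working, but do not by themselves prove that the network card is. If you receive an error or the transmission fails, TCP/IP is likely not installed or configured correctly. Pinging the computer's own assigned IP address additionally confirms that the card is installed and has an address configured; if that fails, the network card may not be physically installed correctly, may be misconfigured, or may be bad.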

Protocol

Verify that the correct protocols are installed on the computer. Most networks today use TCP/IP, but may also use or require IPX/SPX and NetBEUI. Additional information and help with installing and reinstalling a network protocol can be found in document CH000470. When the TCP/IP protocol is installed, unless a DHCP server or other computer assigns the IP address, the user must specify an IP address as well as a subnet mask. To do this, follow the instructions below.

1. Click Start / Settings / Control Panel.
2. Double-click the Network icon.
3. Within the Configuration tab, double-click the TCP/IP protocol icon. Note: Do not click the PPP or Dial-Up adapter; click the network card adapter.
4. In the TCP/IP properties, click the IP Address tab.
5. Select the option to specify an IP address.
6. Enter the IP address and subnet mask. For example:
   IP Address: 102.55.92.1
   Subnet Mask: 255.255.255.192
7. When specifying these values, all computers on the network must have the same subnet mask and a different IP address within the same subnet. For example, when using the above values on one computer, you would use an IP address of 102.55.92.2 on another computer and specify the same subnet mask.
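Step 7 can be checked with Python's standard ipaddress module. This sketch (the function name is hypothetical) confirms that two statically assigned addresses are distinct and fall in the same subnet for the given mask. With the example mask 255.255.255.192 (a /26), each subnet holds 64 addresses, 62 of them usable by hosts, so 102.55.92.1 and 102.55.92.2 share the subnet 102.55.92.0/26, while an address such as 102.55.92.70 would not.

```python
import ipaddress
from typing import List


def check_static_pair(ip_a: str, ip_b: str, mask: str) -> List[str]:
    """Return a list of problems with two static IPv4 settings (empty if OK)."""
    problems: List[str] = []
    if ipaddress.IPv4Address(ip_a) == ipaddress.IPv4Address(ip_b):
        problems.append("duplicate IP address")
    # strict=False accepts a host address and derives the network from the mask.
    net_a = ipaddress.IPv4Network(f"{ip_a}/{mask}", strict=False)
    net_b = ipaddress.IPv4Network(f"{ip_b}/{mask}", strict=False)
    if net_a != net_b:
        problems.append(f"different subnets: {net_a} vs {net_b}")
    return problems


print(check_static_pair("102.55.92.1", "102.55.92.2", "255.255.255.192"))   # []
print(check_static_pair("102.55.92.1", "102.55.92.70", "255.255.255.192"))
# ['different subnets: 102.55.92.0/26 vs 102.55.92.64/26']
```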

Firewall

If your computer network uses a firewall, ensure that all required ports are open. If possible, close the firewall software program or disconnect the computer from the firewall to make sure it is not causing the problem.

Additional time

In some cases it may take a computer some additional time to detect or see the network. If you are unable to see the network after booting the computer, give the computer 2-3 minutes to detect it. Windows users may also want to press the F5 (refresh) key in Network Neighborhood to refresh the network connections and possibly detect the network.

Additional troubleshooting

If you are still unable to connect to or see the network after following or verifying the above recommendations, try one or more of the recommendations below.

If you have installed or are using TCP/IP as your protocol, you can ping another computer's IP address to verify that the computer is able to send and receive data. To do this, Windows or MS-DOS users must be at a command prompt, and Linux / Unix variant users must open or be at a shell. Once at the prompt, assuming that the address of the computer you wish to ping is 102.55.92.2, you would type:

ping 102.55.92.2

If you receive a response from this address (and it is a different computer), the computer is communicating over the network. If you are still unable to connect to or see the network, other issues may be present.

Another method of determining network issues is to use the tracert command if you are an MS-DOS or Windows user, or the traceroute command if you are a Linux / Unix variant user. To use this command you must be at the command prompt or shell. Once at the prompt, assuming that the address is again 102.55.92.2, type:

tracert 102.55.92.2

or

traceroute 102.55.92.2

This lists each hop (router) between the computer and the destination. If the connection fails, the last hop that responds shows roughly where the path breaks, so review the listing to determine which device is causing the issue.
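The ping and tracert/traceroute checks above can be scripted. This Python sketch only builds and runs the same commands. It assumes ping and tracert/traceroute are installed and on the PATH, uses the -n (Windows) or -c (Unix/Linux) option to limit the number of echo requests, and treats a zero exit status as success. Exit status conventions vary by platform (Windows ping, for instance, may return 0 when the reply is "Destination host unreachable"), so read the printed output before drawing conclusions. The function names are hypothetical.

```python
import platform
import subprocess
from typing import List, Optional


def ping_command(host: str, count: int = 4,
                 system: Optional[str] = None) -> List[str]:
    """Build the ping command line; Windows uses -n, Unix-like systems -c."""
    system = system or platform.system()
    count_flag = "-n" if system == "Windows" else "-c"
    return ["ping", count_flag, str(count), host]


def trace_command(host: str, system: Optional[str] = None) -> List[str]:
    """tracert on Windows, traceroute elsewhere."""
    system = system or platform.system()
    return ["tracert", host] if system == "Windows" else ["traceroute", host]


def run(command: List[str]) -> bool:
    """Run a command, echo its output, and report whether it exited with 0."""
    result = subprocess.run(command, capture_output=True, text=True)
    print(result.stdout)
    return result.returncode == 0


def basic_checks(remote: str) -> None:
    """Loopback first (local TCP/IP), then the remote host; trace if it fails."""
    if not run(ping_command("127.0.0.1")):
        print("Loopback ping failed: check the local TCP/IP installation.")
    elif not run(ping_command(remote)):
        # No reply from the remote computer: list the hops to see where it stops.
        run(trace_command(remote))
```

For example, calling basic_checks("102.55.92.2") from an interactive session repeats the manual sequence in this section. If a command is missing from the PATH, subprocess.run raises FileNotFoundError rather than returning a status.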
