IEG3090 A case study: Skype
Dah Ming Chiu Chinese University of Hong Kong
Skype overview
VoIP, Instant Messaging and Videoconferencing
Original name of project – “Sky Peer-to-Peer”
Built by the developers of the P2P system Kazaa
Kazaa was used for content sharing, but the system ran into legal problems
The company changed direction and adapted the Kazaa technology for VoIP
Skype was free, easy to use, with good performance
It grew rapidly after its release in August 2003
It was sold to eBay in 2005 for US$2.6B
User interface of latest version
User population
By Q4 of 2008, Skype had 405M accounts, with more than 33M daily active users
It had 20.5B Skype-to-Skype minutes and 2.6B Skype-out minutes
Skype-out is charged; Skype needs to pay local phone companies for access to their customers – a form of “peering”
It had US$145M net revenue
New announcements
New initiatives to generate revenue: Skype for SIP, Skype for iPhone
Why is Skype successful?
Many possible reasons
Using P2P technology is one possible reason
Better codecs?
Better handling of peers behind NAT and firewalls?
Better integration with Instant Messaging and Presence?
Better integration with PSTN?
…
The Skype protocol is encrypted, so it cannot truly be “reverse engineered”; we can only infer what it does from its observable behavior
Learning about a protocol from traffic analysis
Baset and Schulzrinne, “An analysis of the Skype peer-to-peer Internet telephony protocol”, Infocom 2006
Capture packets using tools such as Wireshark
Try to deduce the design of the system from traffic analysis
We can compare Skype with a SIP-based VoIP system
SIP – session initiation protocol
Look up user’s addr/port, based on name
First look up the proxy server of the user’s domain in DNS, similar to a mail server lookup
Then look up the user at the proxy server
Then negotiate the rest of the parameters between the two peers
SIP is a text-based protocol allowing you to
1) Register with the proxy server
2) Ask DNS: where is my friend’s proxy server?
3) Ask the proxy server: what is my friend’s IP addr?
4) Set up the rest of the parameters
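The four steps above can be sketched as a toy lookup. This is only an illustration under stated assumptions: the DNS table, the registrar table, the domain `example.org` and the function names are all hypothetical, and no real SIP messages or DNS queries are involved.

```python
# Toy model of the SIP lookup steps; every table and name here is hypothetical.
DNS_TABLE = {"example.org": "proxy.example.org"}   # domain -> proxy server
REGISTRAR = {"proxy.example.org": {}}              # proxy -> {user: (ip, port)}


def register(user, domain, addr):
    """Step 1: a user registers its current addr/port with its domain's proxy."""
    proxy = DNS_TABLE[domain]
    REGISTRAR[proxy][user] = addr


def locate(sip_name):
    """Steps 2-3: find the proxy via DNS, then ask the proxy for the user."""
    user, domain = sip_name.split("@", 1)
    proxy = DNS_TABLE.get(domain)          # step 2: where is the proxy server?
    if proxy is None:
        return None
    return REGISTRAR[proxy].get(user)      # step 3: what is the user's addr/port?


register("bob", "example.org", ("192.0.2.10", 5060))
print(locate("bob@example.org"))
```

Step 4, negotiating the remaining parameters, happens directly between the two peers once the address is known and is not modeled here.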
Skype nodes
Skype client (SC): uses a host cache to remember super nodes
Super node (SN): provides P2P service, a bit like a proxy server but providing more services
Login server: the only centralized server; provides the login service
Not all nodes are born equal
What a node is capable of doing depends on:
  Does the node have a public IP address?
  Is the node behind a NAT box?
  Is the node behind a firewall that blocks UDP packets?
Only peers with public IP addresses (and with sufficient CPU and memory) can become a SN
A peer cannot control whether it becomes a SN
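The eligibility rule above can be written as a small predicate. This is a minimal sketch: the `Node` fields and the function name are made up for illustration, and the actual promotion logic inside Skype is not known.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical fields mirroring the questions on the slide.
    has_public_ip: bool
    udp_blocked: bool
    enough_cpu_and_memory: bool


def eligible_for_supernode(node: Node) -> bool:
    """Eligibility only: a peer cannot choose whether it is actually promoted."""
    return node.has_public_ip and node.enough_cpu_and_memory


home_pc = Node(has_public_ip=False, udp_blocked=False, enough_cpu_and_memory=True)
campus_host = Node(has_public_ip=True, udp_blocked=False, enough_cpu_and_memory=True)
print(eligible_for_supernode(home_pc), eligible_for_supernode(campus_host))
```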
Services provided by SN
A few special “bootstrap” SNs help a SC find the login server
A SN helps a SC determine whether it is behind a NAT or a UDP-blocking firewall
A SN helps a SC search for a user
Any user logged in during the last 72 hours can be found
This is validated by the paper
A SN helps two SCs behind firewalls by relaying their voice packets
Transport protocols used
Signaling is done using TCP. Why?
Voice packets are usually transferred using UDP. Why?
If there is a UDP-blocking firewall in between, TCP is used for voice instead.
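The transport rule on this slide amounts to a single decision. The sketch below only restates it; the function name and the returned dictionary are illustrative assumptions, not part of any Skype interface.

```python
def pick_transports(udp_blocked: bool) -> dict:
    """Signaling always uses TCP; voice prefers UDP and falls back to TCP."""
    return {
        "signaling": "TCP",
        "voice": "TCP" if udp_blocked else "UDP",
    }


print(pick_transports(udp_blocked=False))
print(pick_transports(udp_blocked=True))
```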
Bootstrapping
To avoid hard-wiring the login server info in the software, perhaps a small number of SNs are hard-wired (with fixed IP addresses, using ports 80 and 443)
These bootstrap SNs seem to provide the addr/port of the login server
They may also help a SC determine whether it is behind a NAT or firewall
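One way to picture bootstrapping is as a fallback search for a reachable SN. The sketch below assumes the client tries its host cache before the hard-wired list; that ordering, the addresses (taken from documentation ranges) and all names are assumptions for illustration.

```python
# Hypothetical bootstrap list; the real hard-wired addresses are not known here.
BOOTSTRAP_SNS = [("192.0.2.1", 80), ("192.0.2.2", 443)]


def pick_entry_node(host_cache, is_reachable, bootstrap_sns=BOOTSTRAP_SNS):
    """Return the first reachable SN, trying the host cache before the hard-wired list."""
    for addr in list(host_cache) + list(bootstrap_sns):
        if is_reachable(addr):
            return addr
    return None  # no SN reachable, so login cannot proceed


# Example: with an empty host cache the client falls back to a bootstrap SN.
alive = {("192.0.2.2", 443)}
print(pick_entry_node([], lambda addr: addr in alive))
```

Passing `is_reachable` as a function keeps the sketch free of real network calls, so the fallback order can be checked in isolation.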
STUN
STUN = Simple Traversal of User Datagram Protocol (UDP) through NATs
Skype may use something similar to STUN
A server implementing STUN responds to a client; after repeated messages, the client knows whether it is behind a NAT and/or a firewall
Figure: STUN client and STUN server; what lies in between: NAT or firewall, or nothing?
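The core idea of a STUN-style probe is that the server reports the source address it saw, and the client compares that with its own local address. The sketch below covers only this first comparison; the function name and labels are hypothetical, and a full STUN exchange runs further tests (for example to tell NAT types apart) that are not modeled.

```python
def classify(local_addr, mapped_addr):
    """Classify a client from one probe.

    local_addr  -- (ip, port) the client sent from
    mapped_addr -- (ip, port) the server reports seeing, or None if no UDP reply arrived
    """
    if mapped_addr is None:
        return "udp-blocked"      # no reply: likely a UDP-blocking firewall
    if mapped_addr == local_addr:
        return "public"           # address unchanged: no NAT in between
    return "nat"                  # address rewritten on the way: behind a NAT


print(classify(("10.0.0.5", 4000), ("203.0.113.9", 61000)))
```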
Login
The login server ensures each user has a unique name
It establishes session keys for SCs
Based on close examination of the few protocol messages, it appears Skype is using SSL
The login servers appear to be in Denmark and the Netherlands
Try UDP first, then TCP
Searching for users
When not restricted by a firewall, the SC goes through this loop:
1) Talk to the SN (TCP), get a list of peers
2) Talk to the list of peers (UDP)
3) If not found, get a longer list from the SN and repeat the search
Something in-between flood and random walk?
If restricted by firewall, SN does the search instead
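The search loop for an unrestricted SC can be sketched as an expanding search. This is a guess at the shape of the loop, not the actual protocol: the initial list size, the doubling, the round limit and the callback names are all assumptions made for illustration.

```python
def search(target, ask_supernode, ask_peer, list_size=8, max_rounds=4):
    """Expanding search: query the peers named by the SN, widening the list each round.

    ask_supernode(n)       -- returns up to n peer addresses (TCP in the slides)
    ask_peer(peer, target) -- returns the target's addr/port or None (UDP in the slides)
    """
    asked = set()
    for _ in range(max_rounds):
        for peer in ask_supernode(list_size):        # step 1: get a list of peers
            if peer in asked:
                continue                             # do not query the same peer twice
            asked.add(peer)
            result = ask_peer(peer, target)          # step 2: talk to each peer
            if result is not None:
                return result
        list_size *= 2                               # step 3: ask for a longer list
    return None


# Example with an in-memory stand-in for the network.
peers = [f"p{i}" for i in range(32)]
directory = {"p20": {"alice": ("198.51.100.7", 4000)}}
found = search("alice",
               lambda n: peers[:n],
               lambda peer, name: directory.get(peer, {}).get(name))
print(found)
```

Each round contacts more peers than a random walk would but far fewer than a full flood, which is one way to read the "in-between" remark on the slide.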
More about search
Login server may help search
Search results are cached in intermediate nodes
Wildcard searches are allowed
The retrieved results are not identical
Tools used to help analysis
Ethereal – used to capture packets generated by Skype
NetPeeker – used to tune the bandwidth to study Skype operation under congestion
DNS – reverse DNS lookup
MaxMind – used to look up the country/city of an IP address
NAT box
UDP-blocking firewall
AutoIt – a scripting tool used to start/stop Skype sessions
Netstat – used to check the addr/port of a
Experiments to reverse engineer
Try the three scenarios one by one:
Public IP
Behind NAT
Behind firewall
Try emptying the host cache
Write a script to try many sessions and capture session-specific data
Number of SN seen
After 8163 successful logins
Distribution of SNs
Note, majority (83.7%) found in US
About the authors
Professor Henning Schulzrinne is chair of Columbia’s CS Dept.
He is one of the most important contributors to multimedia networking, particularly VoIP
He was the main contributor to RTP, SIP, etc.
P2P SIP also started with his project at Columbia
There are more references on “Skype analysis” at: http://www1.cs.columbia.edu/~salman/skype/