ARTICLE ON BITTORRENT PROTOCOL FOR (P2P) TRANSFER OF DATA
Submitted on 05/11/2008
Submitted by PRIYA V V CSE, JEC
[email protected]
Introduction
BitTorrent is a peer-to-peer (P2P) file sharing protocol used to distribute large amounts of data. Developed by Bram Cohen in 2001, it enables fast downloading of large files using minimal Internet bandwidth. It costs nothing to use and includes no spyware or pop-up advertising. Unlike other download methods, BitTorrent maximizes transfer speed by gathering pieces of the file and downloading these pieces simultaneously from people who already have them. This makes the download process much faster than with other protocols, especially for popular and very large files such as videos and television programs. The protocol is now maintained by Cohen's company, BitTorrent, Inc.
Client-server download process
Figure 1 shows the conventional client-server download process, against which BitTorrent is compared. The main steps are:
• Open a Web page and click a link to download a file to the computer.
• The Web browser software on the computer (the client) tells the server (a central computer that holds the Web page and the file the user wants to download) to transfer a copy of the file to the user's computer.
• The transfer is handled by a protocol (a set of rules) such as FTP (File Transfer Protocol) or HTTP (HyperText Transfer Protocol).
The transfer speed is affected by a number of variables, including the type of protocol, the amount of traffic on the server, and the number of other computers that are downloading the file. If the file is both large and popular, the demands on the server are great and the download will be slow.
Fig 1: Client-server download process
BitTorrent Overview
Fig 2: Participants of the file transfer process
Common terminology and participants for the file transfer
Seed or seeder – A computer with a complete copy of a BitTorrent file (at least one seed computer is necessary for a BitTorrent download to operate).
Swarm – A group of computers simultaneously sending (uploading) or receiving (downloading) the same file.
.torrent – A pointer file that directs the computer to the file the user wants to download. It usually includes:
• the URL of the tracker
• pieces (a hash for each piece of the file)
• piece length
• name
• length of the file
Tracker – A server that manages the BitTorrent file-transfer process and has a complete view of the swarm. It includes information like:
• peer cache (IP address, port, peer id)
• state information (completed or downloading)
Leechers – Systems that have some or none of the chunks of the file.
Clients – The peers taking part in the transfer (seeders or leechers).
Messages – Transfers of information between nodes:
• Peer-peer messages use TCP sockets to communicate.
• Peer-tracker messages use HTTP request/response; the tracker returns a random list of peers.
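The fields listed above can be illustrated with a minimal sketch of a single-file .torrent structure and the HTTP request a client might send to the tracker. The file contents, file name, tracker URL and peer id are hypothetical placeholders. The bencoding rules and the request parameter names follow the published BitTorrent specification as commonly documented; this is an illustration, not a complete client.

```python
import hashlib
from urllib.parse import urlencode


def bencode(value):
    """Encode ints, strings/bytes, lists and dicts in bencoding."""
    if isinstance(value, int):
        return b"i" + str(value).encode() + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return str(len(value)).encode() + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and are written in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


PIECE_LENGTH = 262144                      # 256 KiB, an assumed piece size
data = b"x" * (PIECE_LENGTH + 1000)        # hypothetical file contents

# "pieces" is the concatenation of the 20-byte SHA-1 hash of every piece.
pieces = b"".join(
    hashlib.sha1(data[i:i + PIECE_LENGTH]).digest()
    for i in range(0, len(data), PIECE_LENGTH)
)

info = {
    "name": "example.bin",                 # hypothetical file name
    "length": len(data),
    "piece length": PIECE_LENGTH,
    "pieces": pieces,
}
metainfo = {"announce": "http://tracker.example.org/announce", "info": info}

# The tracker identifies the torrent by the SHA-1 hash of the bencoded info dict.
info_hash = hashlib.sha1(bencode(info)).digest()

query = urlencode({
    "info_hash": info_hash,
    "peer_id": b"-XX0001-123456789012",    # hypothetical 20-byte peer id
    "port": 6881,
    "uploaded": 0,
    "downloaded": 0,
    "left": len(data),
})
announce_url = metainfo["announce"] + "?" + query
```

Writing `bencode(metainfo)` to disk would give a file with the layout described above; a real client would also handle multi-file torrents and parse the tracker's bencoded reply.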
BitTorrent Download Process
Fig 3: BitTorrent's peer-to-peer download process
1. Open a Web page and click a link for the file.
2. The BitTorrent client software communicates with a tracker to find other computers running BitTorrent that have the complete file (seed computers) and those with a portion of the file (peers that are usually in the process of downloading the file).
3. The tracker identifies the swarm: the connected computers that have all or a portion of the file and are in the process of sending or receiving it.
4. The tracker helps the client software trade pieces of the file with other computers in the swarm. The user's computer receives multiple pieces of the file simultaneously.
If the user continues to run the BitTorrent client software after the download is complete, others can receive pieces of the file from the user's computer, and the user's future download rates improve because he or she is ranked higher in the "tit-for-tat" system (a system used for fair sharing of data).
Downloading pieces of the file at the same time helps solve a common problem with other peer-to-peer download methods: peers upload at a much slower rate than they download. By downloading multiple pieces from several peers at once, the overall speed is greatly improved. The more computers involved in the swarm, the faster the file transfer occurs, because there are more sources for each piece of the file. For this reason, BitTorrent is especially useful for large and popular files.
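The effect of downloading from several slow uploaders at once can be shown with a deliberately simple model. It assumes that each source contributes a fixed rate to this downloader and that the only other limit is the downloader's own downlink; the rates used are made-up illustrative numbers, not measurements.

```python
def download_rate(downlink_kbps, source_upload_kbps):
    """Aggregate rate from several sources, capped by our own downlink."""
    return min(downlink_kbps, sum(source_upload_kbps))


DOWNLINK = 4000   # hypothetical downlink capacity in kbit/s
UPLOAD = 256      # hypothetical rate each peer can upload to us, in kbit/s

one_source = download_rate(DOWNLINK, [UPLOAD])          # a single slow uploader
eight_sources = download_rate(DOWNLINK, [UPLOAD] * 8)   # eight pieces in parallel
many_sources = download_rate(DOWNLINK, [UPLOAD] * 20)   # limited by the downlink
```

Under these assumptions one source gives 256 kbit/s, eight sources give 2048 kbit/s, and twenty sources saturate the 4000 kbit/s downlink.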
Algorithms used for piece selection
The order in which pieces are selected by different peers is critical for good performance. If an inefficient policy is used, peers may end up in a situation where each holds the same set of easily available pieces and none of the missing ones. If the original seed is then taken down prematurely, the file cannot be completely downloaded. What are “good policies”?
Random first piece
Initially, a peer has nothing to trade, so it is important to get a complete piece as soon as possible. The peer selects a random piece of the file and downloads it.
Rarest piece first
Determine the pieces that are rarest among your peers and download those first. This ensures that the most commonly available pieces are left until the end. Rarest first also ensures that a large variety of pieces are downloaded from the seed.
Endgame mode
Near the end, missing pieces are requested from every peer that has them. When a piece arrives, the pending requests for that piece are cancelled. This ensures that a download is not prevented from completing by a single peer with a slow transfer rate. Some bandwidth is wasted, but in practice not very much.
Choking
Choking is a temporary refusal to upload. It is one of BitTorrent’s most powerful ideas for dealing with free riders (those who only download but never upload). Given a set of file block requests from its peers, a BitTorrent node needs to determine which requests to service and which to ignore. A peer periodically chooses a set of peers to which it uploads pieces over TCP connections. Only 4 peers are unchoked at a time: the 3 best uploaders and 1 random peer. Peers are chosen for this random (optimistic) unchoke in a round-robin fashion. The optimistic unchoking mechanism allows a new BitTorrent node to receive some file pieces so that it has a chance to contribute back to the swarm.
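The rarest-first and choking policies described above can be sketched as two small functions. The data structures (a set of piece indices per peer and a table of observed upload rates) and all names are assumptions made for illustration. The optimistic slot is filled by a random choice here rather than strict round-robin, and real clients add timers and further rules.

```python
import random
from collections import Counter


def rarest_first(my_pieces, peer_pieces, rng=random):
    """Return a missing piece index held by the fewest peers, or None."""
    availability = Counter()
    for pieces in peer_pieces.values():
        # Count only pieces we still need.
        availability.update(pieces - my_pieces)
    if not availability:
        return None
    rarest = min(availability.values())
    candidates = sorted(i for i, n in availability.items() if n == rarest)
    # Ties are broken randomly so peers do not all chase the same piece.
    return rng.choice(candidates)


def choose_unchoked(upload_rates, regular_slots=3, rng=random):
    """Unchoke the best uploaders plus one optimistic (random) peer."""
    ranked = sorted(upload_rates, key=upload_rates.get, reverse=True)
    unchoked = ranked[:regular_slots]
    remaining = ranked[regular_slots:]
    if remaining:
        unchoked.append(rng.choice(remaining))  # optimistic unchoke
    return unchoked


# Hypothetical swarm state: we hold piece 0; three peers advertise their pieces.
mine = {0}
peers = {"a": {0, 1, 2}, "b": {1, 2}, "c": {1}}
next_piece = rarest_first(mine, peers)   # piece 2 is held by fewer peers than piece 1

# Hypothetical rates (kbit/s) at which each peer has been uploading to us.
rates = {"a": 50, "b": 10, "c": 40, "d": 30, "e": 5}
unchoked = choose_unchoked(rates)        # "a", "c", "d" plus one of "b" or "e"
```

Ranking peers by what they upload to us is what makes the scheme tit-for-tat, while the random slot gives newcomers with nothing to offer yet a way in.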
Advantages
The advantages are:
• BitTorrent networking is not a ‘publish-subscribe’ model like Kazaa; instead, BitTorrent is true peer-to-peer networking where users do the actual file serving.
• Torrents enforce 99% quality control by filtering out corrupted and dummy files, ensuring that downloads contain only what they claim to contain.
• Torrents actively encourage users to share their complete files, while punishing users who only download files.
• BitTorrent can achieve download speeds over 1.5 megabits per second.
• BitTorrent code is open-source, advertising-free, and adware/spyware-free. This means that no single person profits from BitTorrent's success.
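The filtering of corrupted data mentioned above rests on the piece hashes stored in the .torrent file: each downloaded piece is hashed with SHA-1 and compared with the expected 20-byte value. A minimal sketch follows, with hypothetical piece contents and function names.

```python
import hashlib

SHA1_LEN = 20  # size in bytes of one SHA-1 digest


def verify_piece(index, data, pieces_field):
    """Check a downloaded piece against the hash stored in the .torrent."""
    expected = pieces_field[index * SHA1_LEN:(index + 1) * SHA1_LEN]
    return hashlib.sha1(data).digest() == expected


# Hypothetical two-piece file and the "pieces" field a .torrent would carry.
piece0, piece1 = b"first piece data", b"second piece data"
pieces_field = hashlib.sha1(piece0).digest() + hashlib.sha1(piece1).digest()

good = verify_piece(1, piece1, pieces_field)               # intact piece
bad = verify_piece(1, b"second piece dat4", pieces_field)  # corrupted piece
```

A client that finds a mismatch discards the piece and requests it again, typically from a different peer.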
Limitations and security vulnerabilities
(a) Lack of anonymity
BitTorrent does not offer its users anonymity. It is possible to obtain the IP addresses of all current, and possibly previous, participants in a swarm from the tracker. This may expose users with insecure systems to attacks.
(b) Dial-up versus broadband
BitTorrent is best suited to continuously connected broadband environments; dial-up users find it less efficient because of frequent disconnects and slow download rates.
(c) Leech problem
BitTorrent file sharers, compared with users of client/server technology, often have little incentive to become seeders after they finish downloading. As a result, torrent swarms gradually die out, which lowers the chance of obtaining older torrents. Some BitTorrent websites have attempted to address this by recording each user's download and upload ratio, visible either to everyone or only to that user, and by giving people with better ratios access to newer torrent files.
(d) Legal issues
There has been much controversy over the use of BitTorrent trackers. BitTorrent metafiles themselves do not store copyrighted data, hence BitTorrent itself is not illegal; it is the use of it to copy copyrighted material that contravenes laws in some locations.
Technologies built on BitTorrent
The BitTorrent protocol is still under development and may therefore acquire new features and other enhancements, such as improved efficiency.
Distributed trackers
In June 2005, BitTorrent, Inc. released version 4.2.0 of the Mainline BitTorrent client. This release supported "trackerless" torrents, featuring a DHT (distributed hash table) implementation that allowed the client to use torrents that do not have a working BitTorrent tracker. Another idea that has surfaced in Vuze is that of virtual torrents. This idea is based on the distributed tracker approach and is used to describe some web resource. Currently, it is used for instant messaging. It is implemented using a special messaging protocol and requires an appropriate plugin. Anatomic P2P is another approach, which uses a decentralized network of nodes that route traffic to dynamic trackers. Most BitTorrent clients also use peer exchange (PEX) to gather peers in addition to trackers and DHT. Peer exchange asks known peers whether they know of any other peers.
Web seeding
Web seeding was implemented in 2006 as the ability of BitTorrent clients to download torrent pieces from an HTTP source in addition to the swarm. The advantage of this feature is that a site may distribute a torrent for a particular file or batch of files and make those files available for download from that same web server; this can greatly simplify seeding and load balancing once support for the feature is implemented in the various BitTorrent clients. The latest version of the popular download manager GetRight supports downloading a file over the HTTP, FTP, and BitTorrent protocols.
RSS feeds
A technique called broadcatching combines RSS with the BitTorrent protocol to create a content delivery system, further simplifying and automating content distribution. A script periodically checks the feed for new items and uses them to start the download.
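The broadcatching idea of a script that checks a feed and picks up new torrents can be sketched as follows. The feed content and URLs are invented for illustration, and the sketch parses an in-memory string instead of fetching a live feed. It assumes torrents are published as RSS enclosures, which is one common convention rather than a requirement.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed announcing two episodes as torrent enclosures.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <title>Example show</title>
  <item><title>Episode 1</title>
    <enclosure url="http://example.org/ep1.torrent"
               type="application/x-bittorrent" length="0"/></item>
  <item><title>Episode 2</title>
    <enclosure url="http://example.org/ep2.torrent"
               type="application/x-bittorrent" length="0"/></item>
  <item><title>News without a download</title></item>
</channel></rss>"""


def new_torrents(feed_xml, seen):
    """Return torrent URLs in the feed that have not been seen before."""
    found = []
    for item in ET.fromstring(feed_xml).iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None:
            continue
        url = enclosure.get("url")
        if enclosure.get("type") == "application/x-bittorrent" and url not in seen:
            seen.add(url)       # remember it so it is not queued twice
            found.append(url)
    return found


seen_urls = set()
first_check = new_torrents(SAMPLE_FEED, seen_urls)   # both episodes are new
second_check = new_torrents(SAMPLE_FEED, seen_urls)  # nothing new on the next poll
```

In a real setup the script would run on a timer, download the feed over HTTP and hand each new URL to a BitTorrent client.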
Encryption
Since BitTorrent makes up a large proportion of total traffic, some ISPs have chosen to throttle (slow down) BitTorrent transfers to ensure network capacity remains available for other uses. For this reason, methods have been developed to disguise BitTorrent traffic in an attempt to thwart these efforts. Protocol header encryption (PHE) and Message stream encryption/Protocol encryption (MSE/PE) are features of some BitTorrent clients that attempt to make BitTorrent hard to detect and throttle. The latest official BitTorrent client (v6) supports MSE/PE encryption. In September 2006 it was reported that some software could detect and throttle BitTorrent traffic masquerading as HTTP traffic. In general, although encryption can make it difficult to determine what is being shared, BitTorrent is vulnerable to traffic analysis. Thus, even with MSE/PE, it may be possible for an ISP to recognize BitTorrent, to determine that a system is no longer downloading but only uploading, and to terminate its connection by injecting TCP RST (reset flag) packets.
Multitracker
Multitracker is a BitTorrent metadata format proposed by John Hoffman and implemented by several indexing websites. It allows the use of multiple trackers per file, so if one tracker fails, others can continue supporting the file transfer. It is implemented in several clients, such as Vuze, BitComet, BitTornado, KTorrent and µTorrent. Trackers are placed in groups, or tiers, with a tracker randomly chosen from the top tier and tried, moving to the next tier if all the trackers in the top tier fail. Torrents with multiple trackers can decrease the time it takes to download a file, but this also has a few consequences:
• Users have to contact more trackers, leading to more overhead traffic.
• Torrents from closed trackers suddenly become downloadable by non-members, as they can connect to a seed via an open tracker.
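The tier-by-tier tracker selection described above can be sketched like this. The tracker URLs and the `try_tracker` callback are hypothetical stand-ins for a real HTTP announce, and the sketch omits refinements that real clients add, such as remembering which tracker last worked.

```python
import random


def announce_with_tiers(tiers, try_tracker, rng=random):
    """Try trackers tier by tier; return (url, peers) for the first success."""
    for tier in tiers:
        candidates = list(tier)
        rng.shuffle(candidates)          # random order within a tier
        for url in candidates:
            peers = try_tracker(url)     # None signals a failed tracker
            if peers is not None:
                return url, peers
    return None, []                      # every tracker in every tier failed


# Hypothetical tiers: two trackers in the top tier, one fallback below.
TIERS = [
    ["http://a.example/announce", "http://b.example/announce"],
    ["http://c.example/announce"],
]


def fake_tracker(url):
    """Stand-in for an HTTP announce: only the fallback tracker is up."""
    if url == "http://c.example/announce":
        return [("192.0.2.10", 6881)]
    return None


used, peer_list = announce_with_tiers(TIERS, fake_tracker)
```

Here both top-tier trackers fail, so the client falls through to the second tier and obtains its peer list there.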
Decentralized keyword search
Even with distributed trackers, a third party is still required to find a specific torrent. This is usually done through a direct hyperlink from the website of the content owner or through indexing websites like The Pirate Bay or Torrentz. In May 2007, Cornell University published a paper proposing a new approach to searching a peer-to-peer network for inexact strings, which could replace the functionality of a central indexing site.
Development
An unofficial feature that is as yet unimplemented (as of 2 February 2008) is Similarity Enhanced Transfer (SET), a technique for improving the speed at which peer-to-peer file sharing and content distribution systems can share data. SET, proposed by researchers Pucha, Andersen, and Kaminsky, works by spotting chunks of identical data in files that are an exact or near match to the one needed and transferring these data to the client if the 'exact' data are not present. Their experiments suggested that SET will help greatly with less popular files, but not as much with popular data, which many peers are already downloading.
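The core idea of reusing identical chunks from a similar file can be illustrated with a much simplified sketch. It assumes fixed-size chunks compared by SHA-1 hash and is not the researchers' actual algorithm, whose details are not given here; the file contents and function names are made up.

```python
import hashlib


def chunk_hashes(data, chunk_size):
    """Hash every fixed-size chunk of a file."""
    return [
        hashlib.sha1(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def reusable_chunks(wanted_hashes, similar_data, chunk_size):
    """Map wanted chunk index -> index of an identical chunk in a similar file."""
    available = {
        h: i for i, h in enumerate(chunk_hashes(similar_data, chunk_size))
    }
    return {
        w_index: available[h]
        for w_index, h in enumerate(wanted_hashes)
        if h in available
    }


# Hypothetical files that differ only in their first 4-byte chunk.
wanted_file = b"AAAABBBBCCCC"
similar_file = b"XXXXBBBBCCCC"

matches = reusable_chunks(chunk_hashes(wanted_file, 4), similar_file, 4)
```

In this toy case chunks 1 and 2 of the wanted file could be fetched from peers holding the similar file, leaving only chunk 0 to come from peers with the exact file.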
Conclusion
In 2008 the Canadian Broadcasting Corporation (CBC) became the first public broadcaster in North America to make a full show available for download using BitTorrent. The Norwegian Broadcasting Corporation (NRK) has experimented with BitTorrent distribution since March 2008. Only selected material for which NRK owns all royalties is published. Responses have been very positive, and NRK is planning to offer more content. Usage of the protocol accounts for significant Internet traffic, though the precise amount has proven difficult to measure. There are numerous BitTorrent clients available for a variety of computing platforms. According to isoHunt, the total size of the torrents is currently more than 1.1 petabytes.