World Wide Web

The World Wide Web (abbreviated as the Web or WWW) is a system of Internet servers that supports hypertext to access several Internet protocols on a single interface. Almost every protocol type available on the Internet is accessible on the Web, including e-mail, FTP, Telnet, and Usenet news. In addition to these, the World Wide Web has its own protocol: HyperText Transfer Protocol, or HTTP. These protocols are explained below.

The World Wide Web provides a single interface for accessing all these protocols, creating a convenient and user-friendly environment. It is no longer necessary to be conversant with these protocols in separate, command-level environments, as was typical in the early days of the Internet; the Web gathers them into a single system. Because of this feature, and because of the Web's ability to work with multimedia and advanced programming languages, the Web is the most popular component of the Internet.

The operation of the Web relies primarily on hypertext as its means of information retrieval. Hypertext is text containing words that connect to other documents; these words are called links and are selectable by the user. A single hypertext document can contain links to many documents. In the context of the Web, words or graphics may serve as links to other documents, images, video, and sound. Links may or may not follow a logical path, as each connection is programmed by the creator of the source document. Overall, the Web contains a complex virtual web of connections among a vast number of documents, graphics, videos, and sounds.

Producing hypertext for the Web is accomplished by creating documents with a language called HyperText Markup Language, or HTML. With HTML, tags are placed within the text to accomplish document formatting, visual features such as font size, italics and bold, and the creation of hypertext links. Graphics and multimedia may also be incorporated into an HTML document. HTML is an evolving language, with new tags added as each upgrade of the language is developed and released. For example, visual formatting features are now often separated from the HTML document and placed into Cascading Style Sheets (CSS). This has several advantages, including the fact that an external style sheet can centrally control the formatting of multiple documents. The World Wide Web Consortium (W3C), led by Web founder Tim Berners-Lee, coordinates the effort of standardizing HTML. The W3C now calls the language XHTML and considers it to be an application of the XML language standard.

The World Wide Web consists of files, called pages or home pages, containing links to documents and resources throughout the Internet.
The Web provides a vast array of experiences including multimedia presentations, real-time collaboration, interactive pages, radio and television broadcasts, and the automatic "push" of information to a client computer or to an RSS reader. Programming languages such as Java, JavaScript, Visual Basic, Cold Fusion and XML extend the capabilities of the Web. Much information on the Web is served dynamically from content stored in databases. The Web is therefore not a fixed entity, but one in a constant state of development and flux.
The World Wide Web (or the "Web") is a system of interlinked, hypertext documents accessed via the Internet. With a Web browser, a user views Web pages that may contain text, images, and other multimedia and navigates between them using hyperlinks. The Web was created around 1990 by the Englishman Sir Tim Berners-Lee and the Belgian Robert Cailliau working at CERN in Geneva, Switzerland. Since then, Berners-Lee has played an active role in guiding the development of Web standards (such as the markup languages in which Web pages are composed), and in recent years has advocated his vision of a Semantic Web.
How the Web works

Viewing a Web page or other resource on the World Wide Web normally begins either by typing the URL of the page into a Web browser, or by following a hypertext link to that page or resource. The first step, behind the scenes, is for the server-name part of the URL to be resolved into an IP address by the global, distributed Internet database known as the Domain Name System (DNS). The browser then establishes a TCP connection with the server at that IP address. The next step is for an HTTP request to be sent to the Web server, requesting the resource. In the case of a typical Web page, the HTML text is first requested and parsed by the browser, which then makes additional requests for graphics and any other files that form part of the page in quick succession. (When considering Web site popularity statistics, these additional file requests give rise to the difference between a single 'page view' and an associated number of server 'hits'.) The Web browser then renders the page as described by the HTML, CSS and other files received, incorporating the images and other resources as necessary. This produces the on-screen page that the viewer sees. Most Web pages will themselves contain hyperlinks to other related pages and perhaps to downloads, source documents, definitions and other Web resources. Such a collection of useful, related resources, interconnected via hypertext links, is what has been dubbed a 'web' of information. Making it available on the Internet created what
Tim Berners-Lee first called the WorldWideWeb (note the name's use of CamelCase, subsequently discarded) in 1990.[1]
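This sequence can be traced in a few lines of Python. The following is a minimal sketch using only the standard library; example.org is a stand-in for any Web host.

```python
import socket

host = "example.org"  # hypothetical example host

# Step 1: resolve the server name to an IP address via DNS.
ip_address = socket.gethostbyname(host)
print(f"{host} resolved to {ip_address}")

# Step 2: open a TCP connection to the server (port 80 for HTTP).
conn = socket.create_connection((ip_address, 80))

# Step 3: send an HTTP request for the resource.
request = f"GET / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
conn.sendall(request.encode("ascii"))

# Step 4: read the response; a browser would now parse the HTML and
# issue further requests for images, stylesheets and scripts.
response = b""
while chunk := conn.recv(4096):
    response += chunk
conn.close()
print(response.split(b"\r\n")[0].decode())  # e.g. "HTTP/1.1 200 OK"
```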
Caching

If the user returns to a page fairly soon, it is likely that the data will not be retrieved from the source Web server again. By default, browsers cache all web resources on the local hard drive. The browser sends an HTTP request that asks for the data only if it has been updated since the last download. If it has not, the cached version is reused in the rendering step. This is particularly valuable in reducing the amount of Web traffic on the Internet. The decision about expiration is made independently for each resource (image, stylesheet, JavaScript file etc., as well as for the HTML itself). Thus even on sites with highly dynamic content, many of the basic resources are supplied only once per session or less. It is worthwhile for any Web site designer to collect all the CSS and JavaScript into a few site-wide files so that they can be downloaded into users' caches, reducing page download times and demands on the server.

There are other components of the Internet that can cache Web content. The most common in practice are built into corporate and academic firewalls, where they cache web resources requested by one user for the benefit of all. Some search engines such as Google or Yahoo! also store cached content from Web sites. Apart from the facilities built into Web servers that can ascertain when physical files have been updated, it is possible for designers of dynamically generated web pages to control the HTTP headers sent back to requesting users, so that pages are not cached when they should not be (for example, Internet banking and news pages). This also helps to explain the difference between the HTTP 'GET' and 'POST' verbs: data requested with a GET may be cached, if other conditions are met, whereas data obtained after POSTing information to the server usually will not.
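The revalidation step can be reproduced directly. Below is a rough sketch of a conditional request in Python; the URL is a placeholder, and it assumes the server reports a Last-Modified header.

```python
import urllib.request
from urllib.error import HTTPError

url = "http://example.org/logo.png"  # hypothetical cached resource

# First fetch: note the Last-Modified date the server reports.
with urllib.request.urlopen(url) as resp:
    last_modified = resp.headers.get("Last-Modified")
if last_modified is None:
    raise SystemExit("server sends no Last-Modified header; cannot revalidate")

# Later fetch: ask for the data only if it has changed since then.
req = urllib.request.Request(url, headers={"If-Modified-Since": last_modified})
try:
    with urllib.request.urlopen(req) as resp:
        print("resource changed, new copy downloaded:", resp.status)
except HTTPError as err:
    if err.code == 304:
        print("304 Not Modified: the cached copy can be reused")
    else:
        raise
```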
Java and JavaScript

A significant advance in Web technology was Sun Microsystems' Java platform. It enables Web pages to embed small programs (called applets) directly into the rendered page. These applets run on the end-user's computer, providing a richer user interface than simple web pages. Java client-side applets never gained the popularity that Sun had hoped for, for a variety of reasons, including lack of integration with other content (applets were confined to small boxes within the rendered page) and the fact that many computers at the time were supplied to end users without a suitably installed JVM, so that a download by the user was required before applets would appear. Adobe Flash now performs many of the functions that were originally envisioned for Java applets, including the playing of video
content, animation and some rich UI features. Java itself has become more widely used as a platform and language for server-side and other programming.

JavaScript, on the other hand, is a scripting language that was initially developed for use within Web pages; the standardized version is ECMAScript. While its name is similar to Java, JavaScript was developed by Netscape and has almost nothing to do with Java, apart from the fact that, like Java, its syntax is derived from the C programming language. In conjunction with a Web page's Document Object Model (DOM), JavaScript has become a much more powerful technology than its creators originally envisioned. The manipulation of a page's Document Object Model after the page is delivered to the client has been called Dynamic HTML (DHTML), to emphasize a shift away from static HTML displays. In its simplest form, all the optional information and actions available on a JavaScripted Web page will have been downloaded when the page was first delivered.

Ajax ("Asynchronous JavaScript And XML") is a JavaScript-based technology that may have a significant effect on the development of the World Wide Web. Ajax provides a method whereby large or small parts within a Web page may be updated, using new information obtained over the network in response to user actions. This allows the page to be much more responsive, interactive and interesting, without the user having to wait for whole-page reloads. Ajax is seen as an important aspect of what is being called Web 2.0. Examples of Ajax techniques currently in use can be seen in Gmail, Google Maps, etc.
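Ajax itself is JavaScript running in the browser, but the server half of the exchange can be sketched in Python. The hypothetical endpoint below returns a small JSON fragment that a page script would splice into the existing page, rather than a whole new HTML document; the /time path and port are invented for illustration.

```python
import json
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

class FragmentHandler(BaseHTTPRequestHandler):
    """Serves the small data fragments an Ajax page would request."""

    def do_GET(self):
        if self.path == "/time":
            # Return a tiny JSON fragment, not a whole HTML page; the
            # browser-side script splices it into the current page,
            # e.g.  fetch('/time').then(r => r.json())
            body = json.dumps({"now": time.ctime()}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), FragmentHandler).serve_forever()
```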
Publishing Web pages

The Web is available to individuals outside the mass media. In order to "publish" a Web page, one does not have to go through a publisher or other media institution, and potential readers can be found in all corners of the globe. Unlike books and documents, hypertext does not need a linear order from beginning to end; it is not necessarily broken down into the hierarchy of chapters, sections, subsections, etc.

Many different kinds of information are now available on the Web, and it has become easier for those who wish to learn about other societies, their cultures and peoples. When traveling in a foreign country or a remote town, one might be able to find some information about the place on the Web, especially if the place is in one of the developed countries. Local newspapers, government publications, and other materials are easier to access, and therefore the variety of information obtainable with the same effort may be said to have increased for users of the Internet. Although some Web sites are available in multiple languages, many are in the local language only. Additionally, not all software supports all special characters and right-to-left (RTL) languages. These factors challenge the notion that the World Wide Web will bring unity to the world.
The increased opportunity to publish materials is certainly observable in the countless personal pages, as well as pages by families, small shops, etc., facilitated by the emergence of free Web hosting services.
Statistics

According to a 2001 study,[11] there were more than 550 million documents on the Web, mostly in the "invisible Web". A 2002 survey of 2,024 million Web pages[12] determined that by far the most Web content was in English (56.4%); next were pages in German (7.7%), French (5.6%) and Japanese (4.9%). A more recent study, which used web searches in 75 different languages to sample the Web, determined that there were over 11.5 billion web pages in the publicly indexable Web as of the end of January 2005.[13]
Speed issues

Frustration over congestion in the Internet infrastructure, and the high latency that results in slow browsing, has led to an alternative name for the World Wide Web: the World Wide Wait. Speeding up the Internet is an ongoing discussion over the use of peering and QoS technologies. Other solutions to reduce the World Wide Wait can be found at the W3C. Standard guidelines for ideal Web response times are (Nielsen 1999, page 42):

• 0.1 second (one tenth of a second): ideal response time. The user doesn't sense any interruption.
• 1 second: highest acceptable response time. Download times above 1 second interrupt the user experience.
• 10 seconds: unacceptable response time. The user experience is interrupted and the user is likely to leave the site or system.
These numbers are useful for planning server capacity.
Internet

The Internet is a computer network made up of thousands of networks worldwide. No one knows exactly how many computers are connected to the Internet; it is certain, however, that these number in the millions and are growing. No one is in charge of the Internet. There are organizations which develop technical aspects of this network and set standards for creating applications on it, but no governing body is in control. The Internet backbone, through which Internet traffic flows, is owned by private companies.
All computers on the Internet communicate with one another using the Transmission Control Protocol/Internet Protocol suite, abbreviated TCP/IP. Computers on the Internet use a client/server architecture: a remote server machine provides files and services to the user's local client machine. Software can be installed on a client computer to take advantage of the latest access technology.

An Internet user has access to a wide variety of services: electronic mail, file transfer, vast information resources, interest group membership, interactive collaboration, multimedia displays, real-time broadcasting, breaking news, shopping opportunities, and much more. The Internet consists primarily of a variety of access protocols, many of which feature programs that allow users to search for and retrieve material made available by the protocol.
The Internet is a worldwide, publicly accessible network of interconnected computer networks that transmit data by packet switching using the standard Internet Protocol (IP). It is a "network of networks" that consists of millions of smaller domestic, academic, business, and government networks, which together carry various information and services, such as electronic mail, online chat, file transfer, and the interlinked Web pages and other documents of the World Wide Web.
Creation

The USSR's launch of Sputnik spurred the United States to create the Advanced Research Projects Agency (ARPA) in February 1958 to regain a technological lead.[1][2] ARPA created the Information Processing Technology Office (IPTO) to further the research of the Semi Automatic Ground Environment (SAGE) program, which had networked country-wide radar systems together for the first time. J. C. R. Licklider was selected to head the IPTO, and saw universal networking as a potential unifying human revolution. Licklider had moved from the Psycho-Acoustic Laboratory at Harvard University to MIT in 1950, after becoming interested in information technology. At MIT, he served on a committee that established Lincoln Laboratory and worked on the SAGE project. In 1957 he became a Vice President at BBN, where he bought the first production PDP-1 computer and conducted the first public demonstration of time-sharing.

At the IPTO, Licklider recruited Lawrence Roberts to head a project to implement a network, and Roberts based the technology on the work of Paul Baran, who had written an exhaustive study for the U.S. Air Force that recommended packet switching (as opposed to circuit switching) to make a network highly robust and survivable. After much work, the first node went live at UCLA on October 29, 1969 on what would be called the ARPANET, one of the "eve" networks of today's Internet. Following on from this, the British Post Office, Western Union International and Tymnet collaborated to create the
first international packet-switched network, referred to as the International Packet Switched Service (IPSS), in 1978. This network grew from Europe and the US to cover Canada, Hong Kong and Australia by 1981.

The first TCP/IP wide-area network was operational by January 1, 1983, when the United States' National Science Foundation (NSF) constructed a university network backbone that would later become the NSFNet. It was then followed by the opening of the network to commercial interests in 1985. Important separate networks that offered gateways into, and later merged with, the NSFNet include Usenet, BITNET and various commercial and educational networks such as X.25, CompuServe and JANET. Telenet (later called Sprintnet) was a large privately funded national computer network with free dial-up access in cities throughout the U.S. that had been in operation since the 1970s. This network eventually merged with the others in the 1990s as the TCP/IP protocol became increasingly popular. The ability of TCP/IP to work over these pre-existing communication networks, especially the international X.25 IPSS network, allowed for a great ease of growth. Use of the term "Internet" to describe a single global TCP/IP network originated around this time.
Internet protocols

In this context, there are three layers of protocols:
• At the lowest of the three (OSI layer 3) is IP (Internet Protocol), which defines the datagrams or packets that carry blocks of data from one node to another. The vast majority of today's Internet uses version four of the IP protocol (i.e. IPv4), and although IPv6 is standardized, it exists only as "islands" of connectivity, and there are many ISPs without any IPv6 connectivity.[1] ICMP (Internet Control Message Protocol) also exists at this level. ICMP is connectionless; it is used for control, signaling, and error reporting purposes.
• TCP (Transmission Control Protocol) and UDP (User Datagram Protocol) exist at the next layer up (OSI layer 4); these are the protocols by which data is transmitted. TCP makes a virtual 'connection', which gives some level of guarantee of reliability. UDP is a best-effort, connectionless transport, in which data packets that are lost in transit will not be re-sent. (The contrast is illustrated in the sketch after this list.)
• The application protocols sit on top of TCP and UDP and occupy layers 5, 6, and 7 of the OSI model. These define the specific messages and data formats sent and understood by the applications running at each end of the communication. Examples of these protocols are HTTP, FTP, and SMTP.
• There is another protocol layer below IP (layer 2 of the OSI model). This layer is rarely discussed; on dial-up and similar point-to-point links it is typically PPP, while on LANs it is nearly always Ethernet.
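The practical difference between the two transport protocols shows up directly in the sockets API. A minimal Python sketch follows; example.org and the local UDP listener address are placeholders.

```python
import socket

# TCP: a connection is established first; delivery and ordering are
# guaranteed by acknowledgements and retransmission.
tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tcp.connect(("example.org", 80))           # three-way handshake happens here
tcp.sendall(b"GET / HTTP/1.1\r\nHost: example.org\r\n\r\n")
print(tcp.recv(64))                        # reliable, in-order bytes
tcp.close()

# UDP: no connection and no delivery guarantee; each datagram is sent
# on a best-effort basis and may be lost or reordered in transit.
udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp.sendto(b"ping", ("127.0.0.1", 9999))   # hypothetical local listener
udp.close()
```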
Internet structure

There have been many analyses of the Internet and its structure. For example, it has been determined that both the Internet IP routing structure and the hypertext links of the World Wide Web are examples of scale-free networks. Similar to the way the commercial Internet providers connect via Internet exchange points, research networks tend to interconnect into large subnetworks such as:

• GEANT
• GLORIAD
• Abilene Network
• JANET (the UK's Joint Academic Network, aka UKERNA)
These in turn are built around relatively smaller networks; see also the list of academic computer network organizations. In network diagrams, the Internet is often represented by a cloud symbol, into and out of which network communications can pass.
ICANN

The Internet Corporation for Assigned Names and Numbers (ICANN) is the authority that coordinates the assignment of unique identifiers on the Internet, including domain names, Internet Protocol (IP) addresses, and protocol port and parameter numbers. A globally unified namespace (i.e., a system of names in which there is one and only one holder of each name) is essential for the Internet to function. ICANN is headquartered in Marina del Rey, California, but is overseen by an international board of directors drawn from across the Internet technical, business, academic, and noncommercial communities. The US government continues to have the primary role in approving changes to the root zone file that lies at the heart of the domain name system.

Because the Internet is a distributed network comprising many voluntarily interconnected networks, the Internet as such has no governing body. ICANN's role in coordinating the assignment of unique identifiers distinguishes it as perhaps the only central coordinating body on the global Internet, but the scope of its authority extends only to the Internet's systems of domain names, IP addresses, and protocol port and parameter numbers. On November 16, 2005, the World Summit on the Information Society, held in Tunis, established the Internet Governance Forum (IGF) to discuss Internet-related issues.
Language

The prevalent language for communication on the Internet is English. This may be a result of the Internet's origins, as well as English's role as a lingua franca. It may also be related to the poor capability of early computers to handle characters other than those in the basic Latin alphabet.
After English (30% of Web visitors), the most-requested languages on the World Wide Web are Chinese (14%), Japanese (8%), Spanish (8%), German (5%), French (5%), Portuguese (3.5%), Korean (3%), Italian (3%) and Arabic (2.5%) (from Internet World Stats, updated January 11, 2007). By continent, 36% of the world's Internet users are based in Asia, 29% in Europe, and 21% in North America ([2] updated January 11, 2007). The Internet's technologies have developed enough in recent years that good facilities are available for development and communication in most widely used languages. However, some glitches such as mojibake (incorrect display of foreign-language characters, also known as kryakozyabry) still remain.
The mobile Internet

The Internet can now be accessed virtually anywhere by numerous means. Mobile phones, datacards, handheld game consoles and cellular routers allow users to connect to the Internet from anywhere there is a cellular network supporting that device's technology.
Common uses of the Internet

E-mail

The concept of sending electronic text messages between parties, in a way analogous to mailing letters or memos, predates the creation of the Internet. Even today it can be important to distinguish between Internet and internal e-mail systems. Internet e-mail may travel and be stored unencrypted on many other networks and machines outside both the sender's and the recipient's control. During this time it is quite possible for the content to be read and even tampered with by third parties, if anyone considers it important enough. Purely internal or intranet mail systems, where the information never leaves the corporate or organization's network, are much more secure, although in any organization there will be IT and other personnel whose job may involve monitoring, and occasionally accessing, the e-mail of other employees not addressed to them. Web-based e-mail (webmail) between parties on the same webmail system may not actually 'go' anywhere; it merely sits on the one server and is tagged in various ways so as to appear in one person's 'sent items' list and in others' 'inboxes' or other 'folders' when viewed.
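The first hop of an Internet e-mail, its submission to a mail server over SMTP, can be sketched with Python's standard library. All names, addresses and credentials below are placeholders; note that without TLS the message would travel as plain text.

```python
import smtplib
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "alice@example.org"      # hypothetical sender
msg["To"] = "bob@example.net"          # hypothetical recipient
msg["Subject"] = "Hello over SMTP"
msg.set_content("Sent across the open Internet unless TLS is used.")

# Submit to a mail server; from here the message may be relayed and
# stored on machines outside the sender's and recipient's control.
with smtplib.SMTP("mail.example.org", 587) as server:  # placeholder host
    server.starttls()                  # encrypt at least the first hop
    server.login("alice", "password")  # placeholder credentials
    server.send_message(msg)
```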
The World Wide Web
Many people use the terms Internet and World Wide Web (a.k.a. the Web) interchangeably, but in fact the two terms are not synonymous. The Internet and the Web
are two separate but related things. The Internet is a massive network of networks, a networking infrastructure. It connects millions of computers together globally, forming a network in which any computer can communicate with any other computer as long as they are both connected to the Internet. Information that travels over the Internet does so via a variety of languages known as protocols.

The World Wide Web, or simply the Web, is a way of accessing information over the medium of the Internet. It is an information-sharing model that is built on top of the Internet. The Web uses the HTTP protocol, only one of the languages spoken over the Internet, to transmit data. Web services, which use HTTP to allow applications to communicate in order to exchange business logic, use the Web to share information. The Web also utilizes browsers, such as Internet Explorer or Netscape, to access Web documents called Web pages that are linked to each other via hyperlinks. Web documents can also contain graphics, sounds, text and video.

The Web is just one of the ways that information can be disseminated over the Internet. The Internet, not the Web, is also used for e-mail (which relies on SMTP), Usenet news groups, instant messaging, file sharing (text, image, video, mp3, etc.) and FTP. So the Web is just a portion of the Internet, albeit a large portion, but the two terms are not synonymous and should not be confused.

Through keyword-driven Internet research using search engines like Google, millions worldwide have easy, instant access to a vast and diverse amount of online information. Compared to encyclopedias and traditional libraries, the World Wide Web has enabled a sudden and extreme decentralization of information and data.

Many individuals and some companies and groups have adopted the use of "Web logs" or blogs, which are largely used as easily updatable online diaries. Some commercial organizations encourage staff to fill them with advice on their areas of specialization in the hope that visitors will be impressed by the expert knowledge and free information, and be attracted to the corporation as a result. One example of this practice is Microsoft, whose product developers publish their personal blogs in order to pique the public's interest in their work. For more information on the distinction between the World Wide Web and the Internet itself, which in everyday use are sometimes confused, see Dark internet, where this is discussed in more detail.
Remote access

The Internet allows computer users to connect to other computers and information stores easily, wherever they may be across the world. They may do this with or without the use of security, authentication and encryption technologies, depending on the requirements. This is encouraging new ways of working from home, collaboration and information sharing in many industries. An accountant sitting at home can audit the books of a
company based in another country, on a server situated in a third country that is remotely maintained by IT specialists in a fourth. These accounts could have been created by home-working book-keepers in other remote locations, based on information e-mailed to them from offices all over the world. Some of these things were possible before the widespread use of the Internet, but the cost of private leased lines would have made many of them infeasible in practice.

An office worker away from their desk, perhaps on the other side of the world on a business trip or a holiday, can open a remote desktop session into their normal office PC using a secure Virtual Private Network (VPN) connection via the Internet. This gives the worker complete access to all of their normal files and data, including e-mail and other applications, while away from the office. This concept is also referred to by some network security people as the Virtual Private Nightmare, because it extends the secure perimeter of a corporate network into its employees' homes; this has been the source of some notable security breaches, but also provides security for the workers.
Collaboration
The low-cost and nearly instantaneous sharing of ideas, knowledge, and skills has made collaborative work dramatically easier. Not only can a group cheaply communicate and test, but the wide reach of the Internet allows such groups to form easily in the first place, even among niche interests. An example of this is the free software movement in software development, which produced GNU and Linux from scratch and has taken over development of Mozilla and OpenOffice.org (formerly known as Netscape Communicator and StarOffice).

Internet 'chat', whether in the form of IRC 'chat rooms' or channels or via instant messaging systems, allows colleagues to stay in touch in a very convenient way while working at their computers during the day. Messages can be sent and viewed even more quickly and conveniently than via e-mail. Extensions to these systems may allow files to be exchanged and 'whiteboard' drawings to be shared, as well as voice and video contact between team members.

Version control systems allow collaborating teams to work on shared sets of documents without either accidentally overwriting each other's work or having members wait until they get 'sent' documents to be able to add their thoughts and changes.
File sharing

A computer file can be e-mailed to customers, colleagues and friends as an attachment. It can be uploaded to a Web site or FTP server for easy download by others. It can be put into a "shared location" or onto a file server for instant use by colleagues. The load of
bulk downloads to many users can be eased by the use of "mirror" servers or peer-to-peer networks. In any of these cases, access to the file may be controlled by user authentication; the transit of the file over the Internet may be obscured by encryption, and money may change hands before or after access to the file is given. The price can be paid by the remote charging of funds from, for example, a credit card whose details are also passed (hopefully fully encrypted) across the Internet. The origin and authenticity of the file received may be checked by digital signatures or by MD5 or other message digests.

These simple features of the Internet, on a worldwide basis, are changing the basis for the production, sale, and distribution of anything that can be reduced to a computer file for transmission. This includes all manner of office documents, publications, software products, music, photography, video, animations, graphics and the other arts. This in turn is causing seismic shifts in each of the existing industry associations, such as the RIAA and MPAA in the United States, that previously controlled the production and distribution of these products in that country.
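Verifying a download against a published digest, as mentioned above, takes only a few lines. A minimal Python sketch follows; the filename and expected digest are hypothetical, and in current practice a stronger hash such as SHA-256 would be preferred over MD5.

```python
import hashlib

expected = "9e107d9d372bb6826bd81d3542a419d6"  # digest published by the sender

def md5_of(path: str) -> str:
    """Compute the MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

actual = md5_of("download.zip")  # hypothetical downloaded file
print("match" if actual == expected else "file corrupted or tampered with")
```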
Streaming media

Many existing radio and television broadcasters provide Internet 'feeds' of their live audio and video streams (for example, the BBC). They may also allow time-shift viewing or listening, such as Preview, Classic Clips and Listen Again features. These providers have been joined by a range of pure Internet 'broadcasters' who never had on-air licenses. This means that an Internet-connected device, such as a computer or something more specific, can be used to access on-line media in much the same way as was previously possible only with a television or radio receiver. The range of material is much wider, from pornography to highly specialized technical Web-casts. Podcasting is a variation on this theme, where the material (usually audio) is first downloaded in full and then may be played back on a computer or shifted to a digital audio player to be listened to on the move. These techniques using simple equipment allow anybody, with little censorship or licensing control, to broadcast audio-visual material on a worldwide basis.

Webcams can be seen as an even lower-budget extension of this phenomenon. While some webcams can give full-frame-rate video, the picture is usually either small or updates slowly. Internet users can watch animals around an African waterhole, ships in the Panama Canal, the traffic at a local roundabout or their own premises, live and in real time. Video chat rooms, video conferencing, and remotely controllable webcams are also popular. Many uses can be found for personal webcams in and around the home, with and without two-way sound.
Voice telephony (VoIP)

VoIP stands for Voice over IP, where IP refers to the Internet Protocol that underlies all Internet communication. This phenomenon began as an optional two-way voice extension to some of the instant messaging systems that took off around the year 2000. In recent years many VoIP systems have become as easy to use and as convenient as a normal
telephone. The benefit is that, as the Internet carries the actual voice traffic, VoIP can be free or cost much less than a normal telephone call, especially over long distances and especially for those with always-on ADSL or DSL Internet connections. Thus VoIP is maturing into a viable alternative to traditional telephones. Interoperability between different providers has improved, and the ability to call or receive a call from a traditional telephone is now available. Simple, inexpensive VoIP modems are now available that eliminate the need for a PC. Voice quality can still vary from call to call but is often equal to, and can even exceed, that of traditional calls.

Remaining problems for VoIP include emergency telephone number dialing and reliability. Currently a few VoIP providers provide some 911 dialing, but it is not universally available. Traditional phones are line-powered and operate during a power failure; VoIP does not do so without a backup power source for the electronics. Most VoIP providers offer unlimited national calling, but the direction in VoIP is clearly toward global coverage with unlimited minutes for a low monthly fee.

VoIP has also become increasingly popular within the gaming world, as a form of communication between players. Popular gaming VoIP clients include Ventrilo and Teamspeak, and others are available as well.
Censorship

Some governments, such as those of Saudi Arabia, Iran, North Korea, the People's Republic of China and Cuba, restrict what people in their countries can access on the Internet, especially political and religious content. This is accomplished through software that filters domains and content so that they may not be easily accessed or obtained without elaborate circumvention.

In Norway, Finland and Sweden, major Internet service providers have voluntarily agreed (possibly to avoid such an arrangement being turned into law) to restrict access to sites listed by police. While this list of forbidden URLs is only supposed to contain addresses of known child pornography sites, the content of the list is secret.[citation needed]

Many countries have enacted laws making the possession or distribution of certain material, such as child pornography, illegal, but do not use filtering software. There are many free and commercially available software programs with which a user can choose to block offensive Web sites on individual computers or networks, for example to limit a child's access to pornography or violence. See Content-control software.
Internet access
Common methods of home access include dial-up, landline broadband (over coaxial cable, fibre-optic or copper wires), Wi-Fi, satellite, and 3G (e.g. EVDO) cell-phone technology. Public places to use the Internet include libraries and Internet cafes, where computers with Internet connections are available. There are also Internet access points in many public places such as airport halls and coffee shops, in some cases just for brief use while standing. Various terms are used, such as "public Internet kiosk", "public access terminal", and "Web payphone". Many hotels now also have public terminals, though these are usually fee-based.

Wi-Fi provides wireless access to computer networks, and therefore can do so to the Internet itself. Hotspots providing such access include Wi-Fi cafes, where a would-be user needs to bring their own wireless-enabled devices such as a laptop or PDA. These services may be free to all, free to customers only, or fee-based. A hotspot need not be limited to a confined location: a whole campus or park, or even an entire city, can be enabled. Grassroots efforts have led to wireless community networks. Commercial Wi-Fi services covering large city areas are in place in London, Vienna, Toronto, San Francisco, Philadelphia, Chicago and Pittsburgh; the Internet can then be accessed from such places as a park bench.[5]

Apart from Wi-Fi, there have been experiments with proprietary mobile wireless networks like Ricochet, various high-speed data services over cellular phone networks, and fixed wireless services. High-end mobile phones such as smartphones generally come with Internet access through the phone network. Web browsers such as Opera are available on these advanced handsets, which can also run a wide variety of other Internet software. More mobile phones have Internet access than PCs, though it is not as widely used. An Internet access provider and protocol matrix differentiates the methods used to get online.
Leisure

The Internet has been a major source of leisure since before the World Wide Web, with entertaining social experiments such as MUDs and MOOs being conducted on university servers, and humor-related Usenet groups receiving much of the main traffic. Today, many Internet forums have sections devoted to games and funny videos; short cartoons in the form of Flash movies are also popular. Over 6 million people use blogs or message boards as a means of communication and for the sharing of ideas. The pornography and gambling industries have both taken full advantage of the World Wide Web, and often provide a significant source of advertising revenue for other Web sites. Although many governments have attempted to put restrictions on both industries' use of the Internet, this has generally failed to stop their widespread popularity. A song in the Broadway musical Avenue Q, titled "The Internet Is for Porn", refers to the popularity of this aspect of the Internet.
One main area of leisure on the Internet is multiplayer gaming. This form of leisure creates communities, bringing people of all ages and origins together to enjoy the fast-paced world of multiplayer games. These range from MMORPGs to first-person shooters, from role-playing games to online gambling. This has revolutionized the way many people interact and spend their free time on the Internet. While online gaming has been around since the 1970s, modern modes of online gaming began with services such as GameSpy and MPlayer, to which players of games would typically subscribe. Non-subscribers were limited to certain types of gameplay or certain games.

Many use the Internet to access and download music, movies and other works for their enjoyment and relaxation. As discussed above, there are paid and unpaid sources for all of these, using centralized servers and distributed peer-to-peer technologies. Discretion is needed, as some of these sources take more care over the original artists' rights and over copyright laws than others. Many also use the World Wide Web to access news, weather and sports reports, to plan and book holidays, and to find out more about their random ideas and casual interests.

People use chat, messaging and e-mail to make and stay in touch with friends worldwide, sometimes in the same way as some previously had pen pals. Social networking Web sites like Friends Reunited and many others put and keep people in contact for their enjoyment. The Internet has also seen a growing number of Internet operating systems, where users can access their files, folders, and settings via the Internet; an example of an open-source web OS is Eyeos.
Complex architecture

Many computer scientists see the Internet as a "prime example of a large-scale, highly engineered, yet highly complex system".[6] The Internet is extremely heterogeneous. (For instance, data transfer rates and physical characteristics of connections vary widely.) The Internet exhibits "emergent phenomena" that depend on its large-scale organization. For example, data transfer rates exhibit temporal self-similarity. Further adding to the complexity of the Internet is the ability of more than one computer to use the Internet through a single node, creating the possibility of a very deep, hierarchical sub-network that could in theory be extended almost indefinitely (disregarding the addressing limitations of the IPv4 protocol). However, since the principles of this architecture date back to the 1960s, it might not be the solution best suited to modern needs, and the possibility of developing alternative structures is currently being looked into.[7]
Marketing
The Internet has also become a large market for companies; some of the biggest companies today have grown by taking advantage of the efficient nature of low-cost advertising and commerce through the Internet, also known as e-commerce. It is the fastest way to spread information to a vast number of people simultaneously. The Internet has also subsequently revolutionized shopping; for example, a person can order a CD online and receive it in the mail within a couple of days, or download it directly in some cases. The Internet has also greatly facilitated personalized marketing, which allows a company to market a product to a specific person or group of people better than any other advertising medium.

Examples of personalized marketing include online communities such as MySpace, Friendster, Orkut, and others, which thousands of Internet users join to advertise themselves and make friends online. Many of these users are young teens and adolescents ranging from 13 to 25 years old. In turn, when they advertise themselves they advertise their interests and hobbies, which online marketing companies can use as information about what those users will purchase online, in order to advertise their own products to those users.
The name Internet

Internet is traditionally written with a capital first letter, as it is a proper noun. The Internet Society, the Internet Engineering Task Force, the Internet Corporation for Assigned Names and Numbers, the World Wide Web Consortium, and several other Internet-related organizations use this convention in their publications.

Historically, Internet and internet have had different meanings, with internet meaning "an interconnected set of distinct networks" and Internet referring to the worldwide, publicly available IP internet. Under this distinction, "the Internet" is the familiar network via which websites such as Wikipedia are accessed, whereas "an internet" can exist between any two remote locations.[8] Any group of distinct networks connected together is an internet; each of these networks may or may not be part of the Internet. The distinction was evident in many RFCs, books, and articles from the 1980s and early 1990s (some of which, such as RFC 1918, refer to "internets" in the plural), but has recently fallen into disuse.[citation needed] Instead, the term intranet is generally used for private networks. See also: extranet.

Some people write the term lower-case when referring to it as a medium (like radio or newspaper, e.g. "I found it on the internet") and capitalized when referring to the global network.
Web server

The term Web server can mean one of two things:

1. A computer program that is responsible for accepting HTTP requests from clients (known as Web browsers) and serving them HTTP responses along with optional data content, which usually consists of Web pages such as HTML documents and linked objects (images, etc.).
2. A computer that runs such a program, providing the functionality described in the first sense of the term.
Common features

Although Web server programs differ in detail, they all share some basic common features:

1. HTTP: every Web server program operates by accepting HTTP requests from the network and providing an HTTP response to the requester. The HTTP response typically consists of an HTML document, but can also be a raw text file, an image, or some other type of document (defined by MIME types); if an error is found in the client request, or arises while trying to serve the request, a Web server has to send an error response, which may include custom HTML or text messages to better explain the problem to end users.
2. Logging: Web servers usually also have the capability of logging detailed information about client requests and server responses to log files; this allows the Webmaster to collect statistics by running log analyzers on the log files.

In practice, many Web servers also implement the following features:

1. Authentication: an optional authorization request (requesting a user name and password) before allowing access to some or all kinds of resources.
2. Handling of not only static content (file content recorded in the server's filesystem(s)) but also dynamic content, by supporting one or more related interfaces (SSI, CGI, SCGI, FastCGI, JSP, PHP, ASP, ASP.NET, or a server API such as NSAPI, ISAPI, etc.).
3. HTTPS support (via SSL or TLS) to allow secure (encrypted) connections to the server, on the standard port 443 instead of the usual port 80.
4. Content compression (e.g. by gzip encoding) to reduce the size of responses (to lower bandwidth usage, etc.).
5. Virtual hosting, to serve many Web sites using one IP address.
6. Large file support, to be able to serve files larger than 2 GB on a 32-bit OS.
7. Bandwidth throttling, to limit the speed of responses so as not to saturate the network and to be able to serve more clients.
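The two universal features, answering HTTP requests and logging them, can be seen in even the smallest server. The following Python sketch uses the standard library's built-in handler, which serves files from the current directory and writes one log line per request; the port is arbitrary.

```python
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

# SimpleHTTPRequestHandler answers GET requests with files from the
# current directory (static content) and logs each request to stderr
# in the common "client - [timestamp] request status size" form.
server = ThreadingHTTPServer(("localhost", 8080), SimpleHTTPRequestHandler)
print("Serving on http://localhost:8080/ - requests will be logged below")
server.serve_forever()
```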
Origin of returned content

The origin of the content sent by a server is called:

• static if it comes from an existing file lying on a filesystem;
• dynamic if it is generated on the fly by some other program, script or API called by the Web server.
Serving static content is usually much faster (from 2 to 100 times) than serving dynamic content, especially if the latter involves data pulled from a database.
Performance

Web server programs are expected to serve requests quickly from more than one TCP/IP connection at a time. The main key performance parameters (measured under a varying load of clients and requests per client) are:

• the number of requests per second (depending on the type of request, etc.);
• the latency time in milliseconds for each new connection or request;
• the throughput in bytes per second (depending on file size, cached or uncached content, available network bandwidth, etc.).
The three parameters above vary noticeably depending on the number of active connections, so a fourth parameter is the concurrency level supported by a Web server under a specific configuration. Last but not least, the specific server model used to implement a Web server program can bias the performance and scalability level that can be reached.
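The first two parameters can be estimated from the client side with a short script. The sketch below issues requests sequentially, so it is a rough illustration rather than a serious benchmark; the URL and request count are arbitrary.

```python
import time
import urllib.request

url = "http://localhost:8080/"  # hypothetical server under test
n = 100

start = time.perf_counter()
latencies = []
for _ in range(n):
    t0 = time.perf_counter()
    with urllib.request.urlopen(url) as resp:
        resp.read()
    latencies.append((time.perf_counter() - t0) * 1000)
elapsed = time.perf_counter() - start

print(f"requests per second: {n / elapsed:.1f}")
print(f"mean latency: {sum(latencies) / n:.1f} ms")
```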
Load limits

A Web server (program) has defined load limits: it can handle only a limited number of concurrent client connections (usually between 2 and 60,000, by default between 500 and 1,000) per IP address (and IP port), and it can serve only a certain maximum number of requests per second, depending on:

• its own settings;
• the HTTP request type;
• the content origin (static or dynamic);
• whether the served content is cached;
• the hardware and software limits of the OS on which it is running.
When a web server is near to or over its limits, it becomes overloaded and thus unresponsive.
Software

The most common HTTP serving programs are:[2]

• Apache HTTP Server from the Apache Software Foundation;
• Internet Information Services (IIS) from Microsoft;
• Sun Java System Web Server from Sun Microsystems.
There are thousands of different Web server programs available, many of which are specialized for very specific purposes, so the fact that a web server is not very popular does not necessarily mean that it has a lot of bugs or poor performance.
Apache HTTP Server

Developer: Apache Software Foundation
Latest release: 2.2.4 / January 10, 2007
OS: Cross-platform
Genre: Web server
License: Apache License
Website: http://httpd.apache.org/
The Apache HTTP Server, commonly referred to simply as Apache, is a web server notable for playing a key role in the initial growth of the World Wide Web. Apache was the first viable alternative to the Netscape Communications Corporation web server (currently known as Sun Java System Web Server), and has since evolved to rival other Unix-based web servers in terms of functionality and performance. Since April 1996 Apache has been the most popular HTTP server on the World Wide Web; as of March 2007 Apache served 58% of all websites.[1]

The project's name was chosen for two reasons:[2] out of respect for the Native American Apache tribe (Indé), well known for their endurance and their skills in warfare,[3] and because of the project's roots as a set of patches to the codebase of NCSA HTTPd 1.3, making it "a patchy" server.[4]

Apache is developed and maintained by an open community of developers under the auspices of the Apache Software Foundation. The application is available for a wide variety of operating systems, including Microsoft Windows, Novell NetWare and Unix-like operating systems such as Linux and Mac OS X. Released under the Apache License, Apache is free and open-source software.
History
The first version of the Apache web server was created by Robert McCool, who was heavily involved with the National Center for Supercomputing Applications web server, known simply as NCSA HTTPd. When McCool left NCSA in mid-1994, the development of httpd stalled, leaving a variety of patches for improvements circulating through e-mails. McCool was not alone in his efforts: several other developers helped form the original "Apache Group", including Brian Behlendorf, Roy T. Fielding, Rob Hartill, David Robinson, Cliff Skolnick, Randy Terbush, Robert S. Thau, Andrew Wilson, Eric Hagberg, Frank Peters, and Nicolas Pioch.

Version 2 of the Apache server was a substantial rewrite of much of the Apache 1.x code, with a strong focus on further modularization and the development of a portability layer, the Apache Portable Runtime. The Apache 2.x core has several major enhancements over Apache 1.x, including UNIX threading, better support for non-Unix platforms (such as Microsoft Windows), a new Apache API, and IPv6 support.[5] The first alpha release of Apache 2 was in March 2000, with the first general-availability release on 6 April 2002.[6] Version 2.2 introduced a new authorization API that allows for more flexibility; it also features improved cache and proxy modules.[7]
Features

Apache supports a variety of features, many implemented as compiled modules which extend the core functionality. These range from server-side programming language support to authentication schemes. Common language interfaces include mod_perl, mod_python, Tcl, and PHP. Popular authentication modules include mod_access, mod_auth, and mod_digest. Other features include SSL and TLS support (mod_ssl), a proxy module, a useful URL rewriter (also known as a rewrite engine, implemented under mod_rewrite), custom log files (mod_log_config), and filtering support (mod_include and mod_ext_filter). Apache logs can be analyzed through a web browser using free scripts such as AWStats/W3Perl or Visitors.

Virtual hosting allows one Apache installation to serve many different actual websites. For example, one machine with one Apache installation could simultaneously serve www.example.com, www.test.com, test47.test-server.test.com, etc. Apache also features configurable error messages, DBMS-based authentication databases, and content negotiation. It is also supported by several graphical user interfaces (GUIs) which permit easier, more intuitive configuration of the server.
Usage

Apache is primarily used to serve both static content and dynamic Web pages on the World Wide Web. Many web applications are designed expecting the environment and features that Apache provides.
Apache is the web server component of the popular LAMP web server application stack, alongside Linux, MySQL, and the PHP/Perl/Python programming languages. Apache is redistributed as part of various proprietary software packages, including the Oracle RDBMS and the IBM WebSphere application server. Mac OS X integrates Apache as its built-in web server and as support for its WebObjects application server. It is also supported in some way by Borland in the Kylix and Delphi development tools. Apache is included with Novell NetWare 6.5, where it is the default web server.

Apache is used for many other tasks where content needs to be made available in a secure and reliable way. One example is sharing files from a personal computer over the Internet: a user who has Apache installed on their desktop can put arbitrary files in Apache's document root, which can then be shared. Programmers developing web applications often use a locally installed version of Apache in order to preview and test code as it is being developed.

Microsoft Internet Information Services (IIS) is the main competitor to Apache, trailed by Sun Microsystems' Sun Java System Web Server and a host of other applications such as Zeus Web Server.
License

The software license under which software from the Apache Foundation is distributed is a distinctive part of the Apache HTTP Server's history and presence in the open-source software community. The Apache License allows for the distribution of both open- and closed-source derivations of the source code. The Free Software Foundation does not consider the Apache License to be compatible with version 2.0 of the GNU General Public License (GPL), in that software licensed under the Apache License cannot be integrated with software that is distributed under the GPL:

"This is a free software license but it is incompatible with the GPL. The Apache Software License is incompatible with the GPL because it has a specific requirement that is not in the GPL: it has certain patent termination cases that the GPL does not require. We don't think those patent termination cases are inherently a bad idea, but nonetheless they are incompatible with the GNU GPL." —http://www.gnu.org/philosophy/license-list.html
The current draft of Version 3 of the GPL includes a provision (Section 7e) which allows it to be compatible with licenses that have patent retaliation clauses, including the Apache License.
The name Apache is a registered trademark and may only be used with the trademark holder's express permission.[8]
Web browser
A Web browser is a software application that enables a user to display and interact with text, images, and other information typically located on a Web page at a website on the World Wide Web or a local area network. Text and images on a Web page can contain hyperlinks to other Web pages at the same or different website. Web browsers allow a user to quickly and easily access information provided on many Web pages at many websites by traversing these links. Web browsers format HTML information for display, so the appearance of a Web page may differ between browsers. Some of the Web browsers available for personal computers include Internet Explorer, Mozilla Firefox, Safari, Opera, and Netscape in order of descending popularity (as of August 2006).[1] Web browsers are the most commonly used type of HTTP user agent. Although browsers are typically used to access the World Wide Web, they can also be used to access information provided by Web servers in private networks or content in file systems.
Protocols and standards

Web browsers communicate with Web servers primarily using HTTP (HyperText Transfer Protocol) to fetch webpages. HTTP allows Web browsers to submit information to Web servers as well as fetch Web pages from them. The most commonly used version of HTTP is HTTP/1.1, which is fully defined in RFC 2616. HTTP/1.1 has its own required standards that Internet Explorer does not fully support, but most other current-generation Web browsers do.
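An HTTP/1.1 exchange is plain text and easy to inspect. A short Python sketch follows, using the standard library's http.client against a placeholder host.

```python
import http.client

conn = http.client.HTTPConnection("example.org")  # placeholder host
conn.request("GET", "/", headers={"User-Agent": "sketch-browser/0.1"})
resp = conn.getresponse()

print(resp.version)           # 11 -> the connection spoke HTTP/1.1
print(resp.status, resp.reason)
for name, value in resp.getheaders():
    print(f"{name}: {value}") # e.g. Content-Type names the MIME format
conn.close()
```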
Market share for May 2007:[2]

• Internet Explorer - 78.67%
• Firefox - 14.54%
• Safari - 4.82%
• Netscape - 0.83%
• Opera - 0.74%
• Opera Mini - 0.16%
• Mozilla - 0.15%
Pages are located by means of a URL (uniform resource locator, RFC 1738), which is treated as an address, beginning with http: for HTTP access. Many browsers also support a variety of other URL types and their corresponding protocols, such as gopher: for Gopher (a hierarchical hyperlinking protocol), ftp: for FTP (file transfer protocol), rtsp: for RTSP (real-time streaming protocol), and https: for HTTPS (an SSL-encrypted version of HTTP). The file format for a Web page is usually HTML (HyperText Markup Language) and is identified in the HTTP protocol using a MIME content type. Most browsers natively support a variety of formats in addition to HTML, such as the JPEG, PNG and GIF image formats, and can be extended to support more through the use of plugins. The
combination of HTTP content type and URL protocol specification allows Web page designers to embed images, animations, video, sound, and streaming media into a Web page, or to make them accessible through the Web page.

Early Web browsers supported only a very simple version of HTML. The rapid development of proprietary Web browsers led to the development of non-standard dialects of HTML, leading to problems with Web interoperability. Modern Web browsers support a combination of standards-based and de facto HTML and XHTML, which should display in the same way across all browsers. No browser yet fully supports HTML 4.01, XHTML 1.x or CSS 2.1. Currently many sites are designed using WYSIWYG HTML-generation programs such as Macromedia Dreamweaver or Microsoft FrontPage. These often generate non-standard HTML by default, hindering the work of the W3C in developing standards, specifically with XHTML and CSS (cascading style sheets, used for page layout).

Some of the more popular browsers include additional components to support Usenet news, IRC (Internet relay chat), and e-mail. Protocols supported may include NNTP (network news transfer protocol), SMTP (simple mail transfer protocol), IMAP (Internet message access protocol), and POP (post office protocol). These browsers are often referred to as Internet suites or application suites rather than merely Web browsers.
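How a browser decomposes a URL into protocol, host and path, and how a file format is named by a MIME content type, can both be illustrated with Python's standard library; the URL below is a made-up example.

```python
import mimetypes
from urllib.parse import urlparse

url = "https://example.org:8443/images/logo.png?size=large"
parts = urlparse(url)
print(parts.scheme)    # 'https' -> which protocol to speak (HTTP over SSL/TLS)
print(parts.hostname)  # 'example.org' -> the name to resolve via DNS
print(parts.port)      # 8443
print(parts.path)      # '/images/logo.png'

# The server declares the format in a Content-Type header; the same
# MIME names can be guessed from the filename extension.
print(mimetypes.guess_type(parts.path)[0])  # 'image/png'
```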
Internet Explorer

Windows Internet Explorer (formerly Microsoft Internet Explorer, abbreviated MSIE, and commonly abbreviated to IE) is a series of proprietary graphical web browsers developed by Microsoft and included as part of the Microsoft Windows line of operating systems starting in 1995. After the first release for Windows 95, additional versions of Internet Explorer were developed for other operating systems: Internet Explorer for Mac and Internet Explorer for UNIX (the latter for use through the X Window System on Solaris and HP-UX). Only the Windows version remains in active development; the Mac OS X version is no longer supported. It has been the most widely used web browser since 1999, peaking at nearly 90% market share with IE6 in the early 2000s, corresponding to over 900 million users worldwide by 2006.[1][2]
Netscape (web browser)

Netscape is the general name for a series of web browsers originally produced by Netscape Communications Corporation, now a subsidiary of AOL. The original browser
was once the dominant browser in terms of usage share, but as a result of the first browser war it lost virtually all of its share to Internet Explorer.

Netscape Navigator was the name of Netscape's web browser from versions 1.0 to 4.8. The first beta versions of the browser were released in 1994 and known as Mosaic, and then Mosaic Netscape, until a legal challenge from the National Center for Supercomputing Applications (makers of NCSA Mosaic, which many of Netscape's founders had helped develop) led to the name change to Netscape Navigator. The company's name also changed from Mosaic Communications Corporation to Netscape Communications Corporation. The browser was easily the most advanced available and was therefore an instant success, becoming market leader while still in beta.

Netscape's feature count and market share continued to grow rapidly after version 1.0 was released. Version 2.0 added a full mail reader called Netscape Mail, thus transforming Netscape from a mere web browser into an Internet suite. During this period, both the browser and the suite were known as Netscape Navigator. Around the same time, AOL started bundling Microsoft's Internet Explorer with its software.