Birth of the World Wide Web

Historical Perspective

S.No.  Year          Activity
1.     1989          Tim Berners-Lee at CERN proposed a hypertext-based information management system.
2.     1990          Robert Cailliau, along with Berners-Lee, reformatted the proposal as the World Wide Web (WWW). Berners-Lee implemented a server and a command-line browser using an initial version of HTTP.
3.     1991          CERN made this software available for anonymous FTP download.
4.     1993          50 different sites were running HTTP servers; the number grew to 200 within 6 months.
5.     1993 onwards  Because HTTP is an open specification, people started writing their own browser and server software, including GUI-based browsers that supported typographic controls and the display of images.

Building Blocks of the Web

Tim Berners-Lee devised the following as essential components of web technology:

1. A markup language for formatting hypertext documents. HyperText Markup Language (HTML) allows cross-referencing of documents via hyperlinks.
2. A uniform notation scheme for addressing web-accessible resources over the network. Such a scheme is called a Uniform Resource Identifier (URI) or Uniform Resource Locator (URL).
3. A protocol for transporting messages over the network, e.g. HyperText Transfer Protocol (HTTP).

The Uniform Resource Locator

A flexible and extensible scheme that supports other protocols besides HTTP.

Difference between URL, URN and URI

• URL (Uniform Resource Locator): uses ‘locator’ information that embeds both a server address and a file location.
• URN (Uniform Resource Name): uses a simpler, human-readable name that does not change even when the resource is moved to another location. URNs failed to materialize as a globally supported web notation, so in practice URLs are used.
• URI (Uniform Resource Identifier): defined by the W3C as the union of URL and URN. URI is formally the more correct term.

Prepared By: Syed Feroz Zainvi
E-mail: [email protected]
Available At: http://www.zainvi.tophonors.com

Generalized Notation for URL

    scheme://host[:port#]/path/…/[;urlparams][?query_string][#anchor]

• scheme – underlying protocol to be used, e.g. HTTP or FTP.
• host – name or IP address of the web server being accessed.
• port# – port number that the target web server listens on. The default port for an HTTP server is 80.
• path – file system path from the ‘root’ directory of the server to the desired document. In practice, a web server may use aliasing to point to documents, gateways and services that are not explicitly accessible from the server’s root directory.
• urlparams – used for session identifiers in web servers supporting the Java Servlet API.
• query_string – produced as a result of user-entered variables in HTML forms. ‘=’ separates a parameter from its value, and ‘&’ marks the boundary between parameter-value pairs.
• anchor – reference to a positional marker within the requested document, like a bookmark. If present, it follows a hash mark or pound sign (#).

[ ] marks optional parameters. Pay attention to the positions of / ; ? =

e.g. http://www.mywebsite.com/sj/test;id=8097?name=sviergn&x=true#stuff

This notation applies to most protocols, such as http, https and ftp. Some protocols use their own notation, e.g. mailto:[email protected]

Fundamentals of HTTP

HTTP is an application-level protocol in the TCP/IP protocol suite, using TCP as the underlying transport protocol for transmitting messages. HTTP is a basic protocol that enables communication between web programs.

Advantage: simple and widely used
Disadvantage: stateless, with limited functionality

a) HTTP uses a request/response paradigm.
b) A request or response consists of a start line (request line or status line) and a group of lines containing message headers, followed by a blank line and then the message body.
c) It is a stateless protocol. An HTTP transaction is a single request and a single response.

HTTP Servers, Browsers and Proxies

• Web servers are essentially HTTP servers.
• Web browsers, however, are much more than HTTP clients. Usually, web browsers also provide FTP, local file access, e-mail client, netnews, Gopher and similar functionality.
• Proxies:
    • May act as a server, or as a client making requests to a web server on behalf of other clients.
    • Enable HTTP transfers across firewalls.
    • Support caching of HTTP messages.
    • Support filtering of HTTP requests.
    • Fill many other roles as well.
• Generally, there are one or more proxies between servers and browsers.

A connection is defined as a virtual circuit composed of HTTP agents, including browsers, servers and intermediate proxies participating in the exchange.
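
Before a browser or proxy can contact a server, it has to split the URL into the parts of the generalized notation (scheme, host, port, path, urlparams, query string, anchor). The following is a minimal sketch of that step using Python's standard urllib.parse module. The names describe_url and DEFAULT_PORTS are hypothetical, the default ports for https and ftp are well-known values not stated in these notes, and the id value 8097 assumes the example URL was meant to contain no space.

```python
from urllib.parse import parse_qs, urlparse

# Port 80 for HTTP comes from the notes; 443 and 21 are well-known
# defaults added here as an assumption.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def describe_url(url: str) -> dict:
    """Split a URL into the parts named in the generalized notation."""
    parts = urlparse(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        # Fall back to the scheme's default port when the URL gives none.
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path,
        "urlparams": parts.params,         # text after ';' in the last path segment
        "query": parse_qs(parts.query),    # 'a=1&b=2' -> {'a': ['1'], 'b': ['2']}
        "anchor": parts.fragment,          # text after '#'
    }


info = describe_url(
    "http://www.mywebsite.com/sj/test;id=8097?name=sviergn&x=true#stuff"
)
for name, value in info.items():
    print(f"{name:10} {value}")
```

Only the scheme, host and port decide which server the client connects to; the path, urlparams and query string are sent to that server in the request, while the anchor is used by the browser after the document arrives.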

Stateless Protocol

When a protocol supports ‘state’, the interaction between client and server can contain a sequence of commands. The server is required to maintain the state of the connection throughout the transmission of successive commands, until the connection is terminated. The sequence of transmitted and executed commands is often called a session.

HTTP is stateless for simplicity, but this imposes limitations on the capabilities of web applications:

• The lifetime of a connection is a single request/response exchange.
• There is no way to maintain persistent information about a ‘session’ of successive interactions between a client and server.
• There is no way to batch requests together, e.g. to ask a web server for an HTML page and all the images it references during the course of one connection.

Cookies can be used for maintaining state in web applications.

HTTP/1.1 supports connections that outlive a single request/response exchange. It assumes that the connection remains in place until it is broken, or until an HTTP client requests that it be broken. It does so for the sake of efficiency, however, not to support state.

Structure of HTTP Messages

Request:

    METHOD /path-to-resource HTTP/version-number    <--- REQUEST LINE
    Header-Name1: value
    Header-Name2: value

    [optional request body]

• METHOD – GET, POST, …
• path-to-resource – path portion of the requested URL
• version-number – HTTP version used by the client

e.g. for the URL http://www.mywebsite.com/sj/index.html, the HTTP request message will be

    GET /sj/index.html HTTP/1.1
    Host: www.mywebsite.com

In the case of a GET request, there is no body.

Response:

    HTTP/version-number status-code message (human-readable)
    Header-Name1: value
    Header-Name2: value

    [response body]

e.g.

    HTTP/1.1 200 OK
    Content-Type: text/html
    Content-Length: 9934

    ………. ……….
    ….. …..

Request/response transmission is not that simple. Complex negotiations occur between browsers and servers to determine what information should be sent. For instance, HTML pages may contain references to other accessible resources, such as images and applets. Clients that support rendering of images and applets must parse the retrieved HTML page to determine what additional resources are needed, and then send HTTP requests to retrieve those additional resources. Server-browser interactions are much more complex for advanced applications.
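
The message structure and the follow-up requests described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular browser works: the helper names (build_get_request, parse_response, ImageFinder) are hypothetical, the response text is made up rather than captured from a real server, and only <img> tags are handled (applets and other resource types are left out).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


def build_get_request(url: str) -> str:
    """Format a bodiless GET request: request line, headers, blank line."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return f"GET {path} HTTP/1.1\r\nHost: {parts.netloc}\r\n\r\n"


def parse_response(raw: str):
    """Split a response into (version, status code, message, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")  # the blank line ends the headers
    status_line, *header_lines = head.split("\r\n")
    version, code, message = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return version, int(code), message, headers, body


class ImageFinder(HTMLParser):
    """Collect the src attribute of every <img> tag in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


page_url = "http://www.mywebsite.com/sj/index.html"
request = build_get_request(page_url)

# Hypothetical response, invented for this example.
sample_body = '<html><body><img src="logo.gif"></body></html>'
sample_response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    f"Content-Length: {len(sample_body)}\r\n"
    "\r\n" + sample_body
)

version, code, message, headers, body = parse_response(sample_response)

# Parse the page, then build one further GET request per referenced image.
finder = ImageFinder()
finder.feed(body)
follow_ups = [build_get_request(urljoin(page_url, src)) for src in finder.sources]

print(request)
print(code, message, headers["Content-Type"])
print(follow_ups[0])
```

Because HTTP is stateless, each follow-up request is a complete message in its own right: it repeats the Host header and carries nothing that ties it to the request for the page itself.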

To be continued…

References

Leon Shklar and Richard Rosen, Web Application Architecture: Principles, Protocols and Practices, John Wiley & Sons Ltd.