The Web and HTTP
Last updated
The World Wide Web is an Internet application.
Uniform Resource Identifier (URI): identifies a resource.
Uniform Resource Locator (URL): a common form of URI that identifies a resource by its location.
HTTP (HyperText Transfer Protocol) is the Web's application-layer protocol. It is implemented in two programs: a client program and a server program. The client and server exchange HTTP messages.
A Web page consists of objects. An HTML file references the other objects in the page by their URLs.
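As a small illustration of how a URL names an object, the sketch below splits one into its host name and path with Python's standard urllib.parse module. The host and path are taken from the example URL used later in these notes; the http:// scheme prefix is an assumption added so the parser can recognise the host part.

```python
from urllib.parse import urlsplit

# Assumption: the scheme "http://" is added for parsing; the notes give only host and path.
url = "http://www.someschool.edu/somedir/page.html"

parts = urlsplit(url)
host = parts.hostname   # which server holds the object
path = parts.path       # which object on that server

print("host:", host)
print("path:", path)
```

The host name is what the client connects to, and the path is what it puts in the request line.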
HTTP uses TCP as its underlying transport protocol.
Because an HTTP server maintains no information about the clients, HTTP is said to be a stateless protocol.
Although HTTP uses persistent connections in its default mode, HTTP clients and servers can be configured to use non-persistent connections instead.
Example (non-persistent connection):
The HTTP client process initiates a TCP connection to the server on port number 80. (1 RTT)
The HTTP client sends an HTTP request message to the server via its socket. The request message includes the path name /someDepartment/home.index.
The HTTP server process receives the request message via its socket, encapsulates the object in an HTTP response message, and sends the response message to the client via its socket.
The HTTP server process tells TCP to close the TCP connection.
The HTTP client receives the response message. The TCP connection terminates.
The first four steps are then repeated for each of the referenced JPEG objects.
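The steps above can be sketched with plain TCP sockets. This is only a loopback illustration under stated assumptions: the toy server listens on an ephemeral port on 127.0.0.1 rather than port 80, serves a single hypothetical HTML body, and the function names (serve_once, fetch, run_demo) are made up for this example.

```python
import socket
import threading


def serve_once(listener: socket.socket, body: bytes) -> None:
    """Toy server: accept one connection, answer one request, then close."""
    conn, _ = listener.accept()
    with conn:
        request = b""
        # Read until the blank line that ends the request headers.
        while b"\r\n\r\n" not in request:
            chunk = conn.recv(1024)
            if not chunk:
                break
            request += chunk
        response = (
            b"HTTP/1.1 200 OK\r\n"
            b"Connection: close\r\n"
            b"Content-Length: " + str(len(body)).encode("ascii") + b"\r\n"
            b"\r\n" + body
        )
        conn.sendall(response)
    # Leaving the "with" block closes the TCP connection (non-persistent).


def fetch(host: str, port: int, path: str) -> bytes:
    """Client: open a TCP connection, send one request, read until close."""
    with socket.create_connection((host, port)) as sock:
        request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:          # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


def run_demo() -> bytes:
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(
        target=serve_once, args=(listener, b"<html>hello</html>"), daemon=True
    )
    server.start()
    try:
        return fetch("127.0.0.1", port, "/someDepartment/home.index")
    finally:
        server.join(timeout=5)
        listener.close()


response = run_demo()
print(response.split(b"\r\n", 1)[0].decode("ascii"))  # the status line
```

Fetching each referenced object would mean calling fetch again, which opens a fresh TCP connection every time.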
In their default modes, most browsers open 5 to 10 parallel TCP connections, and each of these connections handles one request-response transaction.
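Typically, the HTTP server closes a connection when it isn’t used for a certain time (a configurable timeout interval).
Pipelining: the client sends several requests back to back without waiting for each response. Responses must come back in request order, so pipelining is subject to head-of-line blocking.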
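Head-of-line blocking can be seen in a toy model. The sketch assumes the server works on all pipelined requests concurrently and that ready_times[i] is the moment response i is ready; because responses must be delivered in request order, a response cannot leave before every earlier one is ready. The function name delivery_times is hypothetical.

```python
from itertools import accumulate


def delivery_times(ready_times):
    """Earliest time each pipelined response can be delivered, in request order.

    Response i is held back until all earlier responses are ready,
    so its delivery time is the running maximum of the ready times.
    """
    return list(accumulate(ready_times, max))


slow_first = delivery_times([5.0, 1.0, 1.0])
print(slow_first)
```

Here two quick responses are stuck behind one slow response at the head of the line.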
Request line and header lines
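The method name, such as GET, is case sensitive.
Connection: close asks for a non-persistent connection: the server closes the connection after sending the response.
HEAD makes the server respond without the requested object. It is often used for debugging.
PUT uploads an object to the server.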
The URL actually accessed is www.someschool.edu/somedir/page.html, formed from the Host header value and the path in the request line. When a proxy is used, the request line contains the complete URI, including the authority (host and port number) part.
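To make the request format concrete, here is a minimal parser sketch. The sample messages are assumptions assembled from the host and path mentioned above; parse_request and effective_url are hypothetical helper names, and the parser ignores many details of real HTTP (folded headers, bodies, repeated fields).

```python
def parse_request(message: str):
    """Split a request message into its request line fields and header lines."""
    head, _, _body = message.partition("\r\n\r\n")
    lines = head.split("\r\n")
    method, target, version = lines[0].split(" ")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()   # field names are case-insensitive
    if version == "HTTP/1.1" and "host" not in headers:
        raise ValueError("HTTP/1.1 requests must carry a Host header")
    return method, target, version, headers


def effective_url(target: str, headers: dict) -> str:
    """URL being accessed: the target itself if absolute, else Host + path."""
    if "://" in target:          # complete URI, as sent to a proxy
        return target
    return headers["host"] + target


direct = (
    "GET /somedir/page.html HTTP/1.1\r\n"
    "Host: www.someschool.edu\r\n"
    "Connection: close\r\n"
    "\r\n"
)
via_proxy = (
    "GET http://www.someschool.edu/somedir/page.html HTTP/1.1\r\n"
    "Host: www.someschool.edu\r\n"
    "\r\n"
)

method, target, version, headers = parse_request(direct)
print(method, effective_url(target, headers))

_, proxy_target, _, proxy_headers = parse_request(via_proxy)
print(effective_url(proxy_target, proxy_headers))
```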
Status line and header lines
The Date: header line indicates the time and date when the HTTP response was created and sent by the server.
HTTP/1.1 specifies that the Host field in the request header is mandatory. All other header fields are optional.
Status codes:
1XX - informational, not the final response yet
2XX - success
3XX - redirection, retrieve from another location or cache
4XX - client error
5XX - server error
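The first digit of the status code is enough to pick the class. A minimal sketch, assuming a well-formed status line; parse_status_line and status_class are hypothetical helper names.

```python
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def parse_status_line(line: str):
    """Split a status line into protocol version, numeric code and phrase."""
    version, code, phrase = line.split(" ", 2)
    return version, int(code), phrase


def status_class(code: int) -> str:
    """Map a status code to its class using the hundreds digit."""
    return STATUS_CLASSES.get(code // 100, "unknown")


version, code, phrase = parse_status_line("HTTP/1.1 404 Not Found")
print(code, phrase, "->", status_class(code))
```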
HTTP is stateless. We can use cookies to keep track of users.
Cookie technology has four components:
a cookie header line in the HTTP response message
a cookie header line in the HTTP request message
a cookie file kept on the user’s end system and managed by the user’s browser
a back-end database at the Web site.
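Example: the response message from the server carries a Set-Cookie: header line with an identification number.
Subsequent requests sent by the client to that site then carry a Cookie: header line with the same identification number.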
Cookies can thus be used to create a user session layer on top of stateless HTTP.
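The four components can be modelled in a few lines. This is a simplified sketch: the classes Site and Browser, the host name shop.example, and the starting ID 1678 are all hypothetical, and the cookie is reduced to a bare ID string, whereas real cookies are name=value pairs with attributes.

```python
from itertools import count


class Site:
    """Web site with a back-end database keyed by cookie ID."""

    def __init__(self):
        self._next_id = count(1678)   # arbitrary starting ID for the example
        self.database = {}            # cookie ID -> list of requested paths

    def handle(self, path, request_headers):
        response_headers = {}
        cookie_id = request_headers.get("Cookie")
        if cookie_id not in self.database:
            # Unknown user: create an entry and hand out a new ID.
            cookie_id = str(next(self._next_id))
            self.database[cookie_id] = []
            response_headers["Set-Cookie"] = cookie_id
        self.database[cookie_id].append(path)
        return response_headers


class Browser:
    """Browser with a cookie file mapping host -> cookie ID."""

    def __init__(self):
        self.cookie_file = {}

    def visit(self, host, site, path):
        request_headers = {}
        if host in self.cookie_file:
            request_headers["Cookie"] = self.cookie_file[host]
        response_headers = site.handle(path, request_headers)
        if "Set-Cookie" in response_headers:
            self.cookie_file[host] = response_headers["Set-Cookie"]
        return request_headers   # what was sent, for inspection


site = Site()
browser = Browser()
first = browser.visit("shop.example", site, "/a")
second = browser.visit("shop.example", site, "/b")
print(first, second, site.database)
```

The first request carries no cookie; the second carries the ID the site issued, so the site can link both requests to the same user.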
A Web cache, also called a proxy server, is a network entity that satisfies HTTP requests on behalf of an origin Web server. The Web cache has its own disk storage and keeps copies of recently requested objects in this storage.
A cache is both a server and a client at the same time.
Typically a Web cache is purchased and installed by an ISP.
Through the use of Content Distribution Networks (CDNs), Web caches are increasingly playing an important role in the Internet.
There are shared CDNs (such as Akamai and Limelight) and dedicated CDNs (such as Google and Netflix).
Conditional GET
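A request message is a conditional GET if it uses the GET method and includes an If-Modified-Since: header line. It lets a cache verify that its stored copy of an object is still up to date.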
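A cache validating its copy with a conditional GET might look roughly like this. It is an in-memory sketch, not a network implementation: Origin and Cache are hypothetical classes, the path /page.html and the timestamps are made up, and dates are formatted with the standard email.utils helpers.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


class Origin:
    """Origin server: path -> (body, last-modified time)."""

    def __init__(self):
        self.objects = {}

    def put(self, path, body, modified):
        self.objects[path] = (body, modified)

    def get(self, path, headers):
        body, modified = self.objects[path]
        since = headers.get("If-Modified-Since")
        if since is not None and modified <= parsedate_to_datetime(since):
            return 304, {}, b""          # Not Modified: no object in the response
        last_modified = format_datetime(modified, usegmt=True)
        return 200, {"Last-Modified": last_modified}, body


class Cache:
    """Web cache: acts as a client toward the origin server."""

    def __init__(self, origin):
        self.origin = origin
        self.store = {}                  # path -> (body, Last-Modified string)

    def get(self, path):
        headers = {}
        if path in self.store:
            # Conditional GET: ask only for changes since our copy.
            headers["If-Modified-Since"] = self.store[path][1]
        status, response_headers, body = self.origin.get(path, headers)
        if status == 304:
            return self.store[path][0], "validated"
        self.store[path] = (body, response_headers["Last-Modified"])
        return body, "fetched"


origin = Origin()
origin.put("/page.html", b"v1", datetime(2015, 9, 2, 9, 23, 24, tzinfo=timezone.utc))
cache = Cache(origin)
print(cache.get("/page.html"))   # first request: full object
print(cache.get("/page.html"))   # second request: copy confirmed by a 304
```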
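Non-persistent HTTP, without parallel connections: N + 1 TCP connections are opened, where N is the number of referenced objects and the extra one fetches the base Web page. The response time is 2(N + 1) RTT + transmission time, since each connection costs one RTT for the TCP handshake and one RTT for the request and response.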
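Persistent HTTP, without pipelining: a single TCP connection is opened and reused for the base Web page and all N referenced objects. The response time is (2 + N) RTT + transmission time: one RTT for the TCP handshake, one for the base page, and one per referenced object.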
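The two response-time formulas above can be compared numerically. In this sketch the function names are hypothetical, transmission_time is assumed to be the total time to transmit the page and all its objects, and the sample values (10 objects, 50 ms RTT, 200 ms transmission) are made up.

```python
def non_persistent_time(n_objects: int, rtt: float, transmission_time: float) -> float:
    """(N + 1) connections, each costing 2 RTT, plus total transmission time."""
    return 2 * (n_objects + 1) * rtt + transmission_time


def persistent_time(n_objects: int, rtt: float, transmission_time: float) -> float:
    """(2 + N) RTT in total, plus total transmission time."""
    return (2 + n_objects) * rtt + transmission_time


n, rtt, tx = 10, 0.05, 0.2   # assumed example values, in seconds
print("non-persistent:", non_persistent_time(n, rtt, tx))
print("persistent:    ", persistent_time(n, rtt, tx))
```

The gap between the two grows by one RTT for every additional referenced object.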