Transport Layer
Last updated
A transport-layer protocol provides for logical communication between application processes running on different hosts.
On the sending side, the transport layer converts the application-layer messages it receives from a sending application process into transport-layer packets, known as transport-layer segments.
Break the application messages into smaller chunks
Add a transport-layer header to each chunk to create the transport-layer segment.
Pass the segment to the network layer at the sending end system, where the segment is encapsulated within a network-layer packet (a datagram).
Sent to the destination.
Relationship Between Transport and Network Layers:
A transport-layer protocol provides logical communication between processes running on different hosts
A network-layer protocol provides logical communication between hosts.
We temporarily refer to both TCP and UDP packets as segments, and to the network-layer packet as a datagram.
But in real-world terminology:
"Packets" are used in the network layer (e.g., IP packets).
"Datagrams" are used in the transport layer for UDP.
"Segments" are used in the transport layer for TCP.
Every host has at least one network-layer address, a so-called IP address. Extending host-to-host delivery to process-to-process delivery is called transport-layer multiplexing and demultiplexing. The transport layer receives segments from the network layer and delivers these segments to the appropriate processes.
A process can have one or more sockets. The transport layer in the receiving host does not actually deliver data directly to a process, but instead to an intermediary socket.
A port can be shared by several sockets. For example, a connection is identified by the 4-tuple (source_ip, source_port, dest_ip, dest_port), and a server can handle multiple connections on the same port.
In general, a port cannot be shared by multiple applications (processes), but it can be done with forking.
Each transport-layer segment has a set of fields to direct incoming transport-layer segments to the appropriate sockets. This job of delivering the data in a transport-layer segment to the correct socket is called demultiplexing.
The job of gathering data chunks at the source host from different sockets, encapsulating each data chunk with header information (that will later be used in demultiplexing) to create segments, and passing the segments to the network layer is called multiplexing.
UDP way of demultiplexing: a segment is delivered to the host, directed to the socket whose bound port matches the segment's destination port, and thereby reaches the corresponding process.
We can associate a specific port number with a UDP socket via the socket's bind() method.
**A UDP socket is fully identified by a 3-tuple consisting of a destination IP address, a destination port number, and the protocol.**
A UDP communication is fully identified by a 5-tuple consisting of a source IP address, source port number, destination IP address, destination port number and protocol. The source IP address and port number are needed because the destination may need to send some information back to the source.
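The point that only the destination (IP, port) matters for UDP demultiplexing can be sketched with Python's socket API on the loopback interface; the port is chosen by the OS here, and the payload is an arbitrary example:

```python
import socket

# Receiver: bind() fixes the (IP, port) the OS demultiplexes on.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))        # port 0 lets the OS pick a free port
addr = receiver.getsockname()

# Sender: an unbound UDP socket; only the destination (IP, port)
# decides which socket the datagram is delivered to.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"hello", addr)

data, src = receiver.recvfrom(1024)    # src carries the sender's (IP, port),
print(data)                            # so the receiver knows where to reply

sender.close()
receiver.close()
```

The `src` value returned by `recvfrom` is exactly the source IP address and port mentioned above: it is what the destination uses to send information back.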
A TCP socket is identified by a 5-tuple: (source IP address, source port number, destination IP address, destination port number, protocol)
In contrast with UDP, two arriving TCP segments with different source IP addresses or source port numbers will (with the exception of a TCP segment carrying the original connection-establishment request) be directed to two different sockets.
The TCP server application has a "welcoming socket" that waits for connection-establishment requests from TCP clients on port 12000.
The TCP client creates a socket and sends a connection-establishment request segment with the lines:
A connection-establishment request is nothing more than a TCP segment with destination port number 12000 and a special connection-establishment bit set in the TCP header. The segment also includes a source port number that was chosen by the client.
When the host operating system of the computer running the server process receives the incoming connection-request segment with destination port 12000, it locates the server process that is waiting to accept a connection on port number 12000. The server process then creates a new socket:
Also, the transport layer at the server notes the following four values in the connection-request segment: (1) the source port number in the segment, (2) the IP address of the source host, (3) the destination port number in the segment, and (4) its own IP address. The newly created connection socket is identified by these four values plus the protocol.
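The welcoming-socket versus connection-socket distinction can be sketched in Python; the port is chosen by the OS and the payload is an illustrative example:

```python
import socket
import threading

# Welcoming socket: only accepts connection requests; it never carries data.
welcoming = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
welcoming.bind(("127.0.0.1", 0))       # OS picks a free port
welcoming.listen()
host, port = welcoming.getsockname()

def client():
    c = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    c.connect((host, port))            # triggers the SYN / SYNACK / ACK handshake
    c.sendall(b"hi")
    c.close()

t = threading.Thread(target=client)
t.start()

# accept() returns a NEW socket, identified by the 4-tuple
# (source IP, source port, dest IP, dest port) noted in the text.
conn, peer = welcoming.accept()
data = conn.recv(16)

t.join()
conn.close()
welcoming.close()
```

Each further client connecting to the same welcoming port would get its own `conn` socket, distinguished by its source IP and source port.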
The two minimal transport-layer services—process-to-process data delivery and error checking—are the only two services that UDP provides!
UDP can provide error checking!
UDP takes messages from the application process, attaches source and destination port number fields for the multiplexing/demultiplexing service, adds two other small fields, and passes the resulting segment (more specifically, a datagram) to the network layer. The network layer encapsulates the transport-layer segment into an IP packet and then makes a best-effort attempt to deliver the packet to the receiving host.
Why do some application developers choose UDP rather than TCP?
UDP has no congestion control, so the application keeps finer control over what data is sent and when.
UDP is faster because there is no three-way handshake.
No connection state: a server devoted to a particular application can typically support more active clients.
Small packet header overhead. The TCP segment has 20 bytes of header overhead in every segment, whereas UDP has only 8 bytes of overhead.
The UDP header has only four fields, each consisting of two bytes.
The length field specifies the number of bytes in the UDP segment (header plus data).
The checksum is used by the receiving host to check whether errors have been introduced into the segment.
UDP at the sender side performs the 1s complement of the sum of all the 16-bit words in the segment, with any overflow encountered during the sum being wrapped around.
At the receiver side, all 16-bit words are added, including the checksum. The expected outcome is 1111111111111111 (all 1s). If any bit is 0, errors have been introduced.
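The sender/receiver checksum computation above can be sketched in a few lines of Python, using made-up 16-bit words:

```python
def udp_checksum(words):
    """1s-complement of the wrap-around sum of 16-bit words (a sketch
    of the UDP checksum computation described in the text)."""
    total = 0
    for w in words:
        total += w
        total = (total & 0xFFFF) + (total >> 16)  # wrap overflow back around
    return (~total) & 0xFFFF

# Sender side: compute the checksum over example words.
words = [0x4500, 0x1234, 0xABCD]
csum = udp_checksum(words)

# Receiver side: add all words including the checksum; the result
# should be all 1s (0xFFFF) if no errors were introduced.
total = sum(words) + csum
total = (total & 0xFFFF) + (total >> 16)
print(hex(total))  # 0xffff
```

Flipping any single bit in `words` before the receiver-side sum would leave at least one 0 bit in `total`, signaling an error.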
Although UDP provides error checking, it does not do anything to recover from an error. Some implementations of UDP simply discard the damaged segment; others pass the damaged segment to the application with a warning.
Full-duplex service
Point-to-point
Once the data passes through the socket, the data is in the hands of TCP running in the client. TCP directs this data to the connection's send buffer, which is one of the buffers that is set aside during the initial three-way handshake. From time to time, TCP will grab chunks of data from the send buffer and pass the data to the network layer.
The maximum amount of data that can be grabbed and placed in a segment is limited by the maximum segment size (MSS). The MSS is typically set by first determining the length of the largest link-layer frame that can be sent by the local sending host (the so-called maximum transmission unit, MTU), and then setting the MSS to ensure that a TCP segment (when encapsulated in an IP datagram) plus the TCP/IP header length (typically 40 bytes) will fit into a single link-layer frame.
TCP pairs each chunk of client data with a TCP header, thereby forming TCP segments. The segments are passed down to the network layer, where they are separately encapsulated within network-layer IP datagrams. The IP datagrams are then sent into the network. When TCP receives a segment at the other end, the segment’s data is placed in the TCP connection’s receive buffer.
TCP can buffer: the TCP entity may choose to buffer data prior to sending, or not. Buffering reduces overhead (fewer headers) but increases delay.
The connection can be established either passively or actively. In the passive manner, TCP receives a connection segment; in the active manner, TCP executes a connect primitive.
The 4-bit header length field specifies the length of the TCP header in 32-bit words. The TCP header can be of variable length due to the TCP options field. (Typically, the options field is empty, so that the length of the typical TCP header is 20 bytes.)
The optional and variable-length options field is used when a sender and receiver negotiate the maximum segment size (MSS) or as a window scaling factor for use in high-speed networks. A timestamping option is also defined
The flag field contains 6 bits.
The ACK bit is used to indicate that the value carried in the acknowledgment field is valid; that is, the segment contains an acknowledgment for a segment that has been successfully received.
The RST, SYN, and FIN bits are used for connection setup and teardown.
The CWR and ECE bits are used in explicit congestion notification.
Setting the PSH bit indicates that the receiver should pass the data to the upper layer immediately.
Finally, the URG bit is used to indicate that there is data in this segment that the sending-side upper-layer entity has marked as “urgent.”
The location of the last byte of this urgent data is indicated by the 16-bit urgent data pointer field. TCP must inform the receiving-side upper-layer entity when urgent data exists and pass it a pointer to the end of the urgent data. (In practice, the PSH, URG, and the urgent data pointer are not used. However, we mention these fields for completeness.)
Sequence Numbers and Acknowledgement Numbers
TCP views data as an unstructured, but ordered, stream of bytes. Sequence numbers are assigned to bytes, not to segments.
The sequence number for a segment is therefore the byte-stream number of the first byte in the segment.
Each of the segments that arrive from Host B has a sequence number for the data flowing from B to A. The acknowledgment number that Host A puts in its segment is the sequence number of the next byte Host A is expecting from Host B.
In truth, both sides of a TCP connection randomly choose an initial sequence number. This is done to minimize the possibility that a segment that is still present in the network from an earlier, already-terminated connection between two hosts is mistaken for a valid segment in a later connection between these same two hosts (which also happen to be using the same port numbers as the old connection)
A Case Study for Sequence and Acknowledgment Numbers
The third segment's purpose is to acknowledge the data received from the server. It carries no data, but because TCP has a sequence number field, the segment still needs to have some sequence number.
The recommended value of α is 0.125, giving the update EstimatedRTT = (1 − α) · EstimatedRTT + α · SampleRTT.
In statistics, such an average is called an exponential weighted moving average (EWMA). The word “exponential” appears in EWMA because the weight of a given SampleRTT decays exponentially fast as the updates proceed
In addition to having an estimate of the RTT, it is also valuable to have a measure of the variability of the RTT. DevRTT is an estimate of how much SampleRTT typically deviates from EstimatedRTT: DevRTT = (1 − β) · DevRTT + β · |SampleRTT − EstimatedRTT|.
The recommended value of β is 0.25.
Setting and Managing the Retransmission Timeout Interval
The timeout should be set a little larger than the average estimated RTT: TimeoutInterval = EstimatedRTT + 4 · DevRTT.
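The EstimatedRTT, DevRTT, and timeout updates can be sketched together; the RTT samples below are made-up values in seconds:

```python
ALPHA, BETA = 0.125, 0.25  # recommended values for the EWMA updates

def update(estimated_rtt, dev_rtt, sample_rtt):
    """One EWMA update step for EstimatedRTT and DevRTT, plus the
    resulting retransmission timeout (a sketch of the rules above)."""
    estimated_rtt = (1 - ALPHA) * estimated_rtt + ALPHA * sample_rtt
    dev_rtt = (1 - BETA) * dev_rtt + BETA * abs(sample_rtt - estimated_rtt)
    timeout = estimated_rtt + 4 * dev_rtt   # a safety margin of 4 deviations
    return estimated_rtt, dev_rtt, timeout

est, dev = 0.1, 0.05                        # arbitrary initial values
for sample in [0.12, 0.11, 0.30, 0.10]:     # one spike among steady samples
    est, dev, timeout = update(est, dev, sample)
    print(round(est, 3), round(dev, 3), round(timeout, 3))
```

Note how the 0.30 spike inflates DevRTT, and therefore the timeout, much more than it moves EstimatedRTT: old samples decay exponentially, but the deviation term reacts quickly.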
Receiver buffer overflow.
TCP provides flow control by having the sender maintain a variable called the receive window. Informally, the receive window is used to give the sender an idea of how much free buffer space is available at the receiver.
Host B tells Host A how much spare room it has in the connection buffer by placing its current value of rwnd in the receive window field of every segment it sends to A. Initially, Host B sets rwnd = RcvBuffer. Note that to pull this off, Host B must keep track of several connection-specific variables.
By keeping the amount of unacknowledged data less than the value of rwnd, Host A is assured that it is not overflowing the receive buffer at Host B.
The problem is that rwnd is only carried on segments that contain data or an ACK. If the buffer is full and then emptied, no message will be sent to the sender to inform it that the buffer is available again.
To solve this problem, the TCP specification requires Host A to continue to send segments with one data byte when B’s receive window is zero. These segments will be acknowledged by the receiver. Eventually the buffer will begin to empty and the acknowledgments will contain a nonzero rwnd value.
Establish a TCP connection (three-way handshake):
Step 1 — the client sends a SYN segment:
SYN = 1
A random initial sequence number client_isn
Step 2 — the server allocates TCP buffers and variables for the connection and sends a connection-granted (SYNACK) segment:
SYN = 1
The acknowledgment field of the TCP segment header is set to client_isn + 1
The server chooses its own initial sequence number server_isn
Step 3 — the client allocates TCP buffers and variables and sends an acknowledgment segment:
SYN = 0
The acknowledgment field of the TCP segment header is set to server_isn + 1
May carry client-to-server data in the segment payload.
In all subsequent message exchanges, the SYN bit is set to zero.
Close a TCP connection
The client issues a close command: a special segment with FIN set to 1 is sent to the server, and the client goes into FIN_WAIT_1. When the ACK is received, the client proceeds to FIN_WAIT_2. When the client receives a FIN from the server, it goes into TIME_WAIT and waits (typically 30 seconds) before entering CLOSED.
If an illegal connection-establishment segment is received by the server, a special segment with RST set to 1 is sent back to the client.
A single-timer practice:
You can imagine that the timer is associated with the oldest unacknowledged segment.
A subtlety:
If ACK=120 is received, then even if ACK=100 was never received, Host A does not need to retransmit that segment, because TCP acknowledgments are cumulative.
Modifications
Each time TCP retransmits, it sets the next timeout interval to twice the previous value. This modification provides a limited form of congestion control.
The receiver will send a duplicate ACK to the sender to trigger a fast retransmit.
Event: Arrival of in-order segment with expected sequence number; all data up to the expected sequence number already acknowledged.
Action: Delayed ACK. Wait up to 500 msec for arrival of another in-order segment. If the next in-order segment does not arrive in this interval, send an ACK.
Event: Arrival of in-order segment with expected sequence number; one other in-order segment waiting for ACK transmission.
Action: Immediately send a single cumulative ACK, ACKing both in-order segments.
Event: Arrival of out-of-order segment with higher-than-expected sequence number; gap detected.
Action: Immediately send a duplicate ACK, indicating the sequence number of the next expected byte (which is the lower end of the gap).
Event: Arrival of segment that partially or completely fills in a gap in the received data.
Action: Immediately send an ACK, provided that the segment starts at the lower end of the gap.
If the TCP sender receives three duplicate ACKs for the same data, it takes this as an indication that the segment following the segment that has been ACKed three times has been lost. In the case that three duplicate ACKs are received, the TCP sender performs a fast retransmit, retransmitting the missing segment before that segment’s timer expires.
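The sender-side bookkeeping that triggers a fast retransmit can be sketched as a small duplicate-ACK counter; the state dictionary and function names here are illustrative, not part of any real TCP implementation:

```python
def ack_arrived(state, ack):
    """Process one arriving ACK number; return "fast_retransmit" when
    the third duplicate ACK for the same data is seen (a sketch)."""
    if ack == state["last_ack"]:
        state["dup_count"] += 1
        if state["dup_count"] == 3:
            state["dup_count"] = 0
            return "fast_retransmit"     # resend before the timer expires
    else:                                 # new cumulative ACK: reset counting
        state["last_ack"], state["dup_count"] = ack, 0
    return "ok"

state = {"last_ack": 100, "dup_count": 0}
# Three repeats of ACK 100 (a gap at byte 100), then the gap is filled.
events = [ack_arrived(state, a) for a in [100, 100, 100, 120]]
print(events)  # ['ok', 'ok', 'fast_retransmit', 'ok']
```

The third duplicate of ACK 100 triggers the retransmission of the segment starting at byte 100; the later ACK 120 shows the gap was filled.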
The case above actually illustrates a difference between TCP and GBN: because TCP's acknowledgments are cumulative, TCP does not retransmit segments that have already been correctly received.
Considering the introduction of selective acknowledgement policy, TCP’s error-recovery mechanism is probably best categorized as a hybrid of GBN and SR protocols.
End-to-end congestion control
TCP segment loss (as indicated by a timeout or the receipt of three duplicate acknowledgments) is taken as an indication of network congestion, and TCP decreases its window size accordingly. We’ll also see a more recent proposal for TCP congestion control that uses increasing round-trip segment delay as an indicator of increased network congestion
Network-assisted congestion control
With network-assisted congestion control, routers provide explicit feedback to the sender and/or receiver regarding the congestion state of the network.
For network-assisted congestion control, congestion information is typically fed back from the network to the sender in one of two ways. Direct feedback may be sent from a network router to the sender. This form of notification typically takes the form of a choke packet (essentially saying, “I’m congested!”). The second and more common form of notification occurs when a router marks/updates a field in a packet flowing from sender to receiver to indicate congestion. Upon receipt of a marked packet, the receiver then notifies the sender of the congestion indication. This latter form of notification takes a full round-trip time.
The approach taken by TCP is to have each sender limit the rate at which it sends traffic into its connection as a function of perceived network congestion.
A congestion window to limit the send rate:
We define a loss event at the sender side: either a timeout or three duplicate ACKs. If the network is too congested, some datagrams may be discarded, which is the cause of a loss event.
If there are no loss events, TCP will increase its congestion window size. Because TCP uses acknowledgments to trigger (or clock) its increase in congestion window size, TCP is said to be self-clocking. The faster the ACKs arrive, the faster the window size grows.
A lost segment implies congestion, and hence, the TCP sender’s rate should be decreased when a segment is lost.
An acknowledged segment indicates that the network is delivering the sender’s segments to the receiver, and hence, the sender’s rate can be increased when an ACK arrives for a previously unacknowledged segment.
When a TCP connection begins, the value of cwnd is typically initialized to a small value of 1 MSS. Thus, in the slow-start state, the value of cwnd begins at 1 MSS (i.e. one segment) and increases by 1 MSS every time a transmitted segment is first acknowledged.
This means the cwnd doubles per RTT.
The three ways in which the TCP slow start process can end are:
Loss Event (Timeout): If a loss event such as network congestion is detected, marked by a timeout, the TCP sender sets the congestion window size (cwnd) to 1 and initiates the slow start process from the beginning. Alongside, it also sets the slow start threshold (ssthresh) to cwnd/2, which is half the cwnd value when the congestion was noticed.
Reaching Slow Start Threshold: The second way slow start ends is related to the value of ssthresh. Since ssthresh is set to half the value of cwnd when congestion was last detected, it would be imprudent to continue doubling cwnd when it matches or surpasses the ssthresh value. Therefore, when cwnd equals ssthresh, slow start concludes, and TCP shifts into congestion avoidance mode.
Three Duplicate ACKs: The final scenario in which slow start can terminate is if three duplicate ACKs (acknowledgements) are detected. In this case, TCP executes a fast retransmit and enters the fast recovery state. This mechanism allows TCP to quickly respond to packet losses and recover the lost data.
TCP adopts a more conservative approach and increases the value of cwnd by just a single MSS every RTT. The TCP sender increases cwnd slightly every time it receives an acknowledgement (ACK): by MSS × (MSS/cwnd) bytes, so that after a full window of ACKs, cwnd has grown by about one MSS.
TCP’s congestion-avoidance algorithm behaves the same when a timeout occurs as in the case of slow start: The value of cwnd is set to 1 MSS, and the value of ssthresh is updated to half the value of cwnd when the loss event occurred. Recall, however, that a loss event also can be triggered by a triple duplicate ACK event.
The value of cwnd is increased by 1 MSS for every duplicate ACK received for the missing segment that caused TCP to enter the fast-recovery state.
Eventually, when an ACK arrives for the missing segment, TCP enters the congestion-avoidance state after deflating cwnd. If a timeout event occurs, fast recovery transitions to the slow-start state after performing the same actions as in slow start and congestion avoidance: The value of cwnd is set to 1 MSS, and the value of ssthresh is set to half the value of cwnd when the loss event occurred.
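The slow-start / congestion-avoidance / loss-reaction rules above can be condensed into a toy simulation. This is a sketch in whole-RTT steps with cwnd measured in MSS units; the event names and the simplification of fast recovery to "halve cwnd" are assumptions for illustration, not a faithful TCP implementation:

```python
def evolve(cwnd, ssthresh, event):
    """One coarse step of Reno-style congestion control (a sketch)."""
    if event == "timeout":
        # Restart slow start: ssthresh = cwnd/2, cwnd back to 1 MSS.
        return 1, max(cwnd // 2, 2)
    if event == "triple_dup_ack":
        # Fast retransmit/recovery, simplified: halve cwnd and ssthresh.
        return max(cwnd // 2, 2), max(cwnd // 2, 2)
    # event == "rtt_of_acks": one RTT of successful acknowledgments.
    if cwnd < ssthresh:
        return cwnd * 2, ssthresh        # slow start: double per RTT
    return cwnd + 1, ssthresh            # congestion avoidance: +1 MSS per RTT

cwnd, ssthresh = 1, 8
trace = []
for ev in ["rtt_of_acks"] * 5 + ["triple_dup_ack", "rtt_of_acks"]:
    cwnd, ssthresh = evolve(cwnd, ssthresh, ev)
    trace.append(cwnd)
print(trace)  # [2, 4, 8, 9, 10, 5, 6]
```

The trace shows the doubling phase (1 → 2 → 4 → 8), the switch to additive increase once cwnd reaches ssthresh = 8, the multiplicative decrease on three duplicate ACKs, and additive growth resuming afterwards, i.e. the AIMD sawtooth.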
TCP congestion control is often referred to as an additive-increase, multiplicative-decrease (AIMD) form of congestion control.
Assuming that RTT and W are approximately constant over the duration of the connection, the TCP transmission rate ranges from W/(2 · RTT) to W/RTT.
The average throughput of a connection is (0.75 · W)/RTT.
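A quick worked instance of these bounds, with made-up numbers (a window of 10 segments of 1500 bytes and an RTT of 100 ms):

```python
# W in bits: 10 segments × 1500 bytes × 8 bits/byte (example values).
W = 10 * 1500 * 8
RTT = 0.1  # seconds

low = W / (2 * RTT)      # rate just after a multiplicative decrease
high = W / RTT           # rate just before the next loss event
avg = 0.75 * W / RTT     # average of the sawtooth

print(low, high, avg)    # 600000.0 1200000.0 900000.0 (bits per second)
```

So this example connection oscillates between 0.6 and 1.2 Mbps, averaging 0.9 Mbps.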
TCP congestion control converges to provide an equal share of a bottleneck link’s bandwidth among competing TCP connections.
The joint throughput of the two connections bounces from point to point (A to B to C ...), converging to the intersection of the equal-bandwidth-share line and the full-utilization line.
At the network layer, two bits (with four possible values, overall) in the Type of Service field of the IP datagram header are used for ECN.
The TCP sender, in turn, reacts to an ACK with an ECE congestion indication by halving the congestion window, as it would react to a lost segment using fast retransmit, and sets the CWR (Congestion Window Reduced) bit in the header of the next transmitted TCP sender-to-receiver segment.