The Sockets Interface
From the perspective of the Linux kernel, a socket is an end point for communication. From the perspective of a Linux program, a socket is an open file with a corresponding descriptor.
Internet socket addresses are stored in 16-byte structures of type sockaddr_in.
The sin_addr field holds a 32-bit IP address. The IP address and port number are always stored in network (big-endian) byte order.
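As a minimal sketch of the byte-order rule, the helper below fills in a sockaddr_in by hand. The name make_ipv4_addr is hypothetical, and the sketch assumes an IPv4 dotted-decimal string and a port given in host byte order.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/* Hypothetical helper: fill `sa` from a dotted-decimal string and a
 * host-order port. Returns 0 on success, -1 if `ip` is not valid IPv4. */
int make_ipv4_addr(struct sockaddr_in *sa, const char *ip, unsigned short port)
{
    memset(sa, 0, sizeof(*sa));
    sa->sin_family = AF_INET;
    sa->sin_port = htons(port);          /* host order -> network order */
    /* inet_pton stores the address in network byte order already. */
    if (inet_pton(AF_INET, ip, &sa->sin_addr) != 1)
        return -1;
    return 0;
}
```

Reading a field back for display goes the other way, for example ntohs(sa.sin_port).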
The connect, bind, and accept functions require a pointer to a protocol-specific socket address structure.
When the sockets interface was designed, C had no generic void * pointer, so the sockets functions were defined to expect a pointer to a generic sockaddr structure.
Pointers to protocol-specific structures must therefore be cast to sockaddr *.
Today, a generic void * pointer would serve the same purpose.
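A short sketch of the cast in practice, assuming an IPv4 TCP socket bound to the loopback address. The helper name bind_loopback_any_port is hypothetical, and port 0 asks the kernel to pick any free port.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: bind a TCP socket to 127.0.0.1 on a kernel-chosen
 * port. Returns the descriptor, or -1 on failure. */
int bind_loopback_any_port(void)
{
    struct sockaddr_in sa;               /* protocol-specific structure */
    int fd;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sa.sin_port = htons(0);              /* 0 = let the kernel choose */

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    /* bind expects the generic type, so the pointer is cast. */
    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```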
In practice, we set up a server with the following steps:
1. We invoke getaddrinfo(const char *host, const char *service, const struct addrinfo *hints, struct addrinfo **result) to obtain a list of addrinfo structures that represent the possible network addresses we can use. For a server, host is set to NULL and service is set to the port number.
2. We iterate through the addrinfo structures in the result list and try each one with the socket and bind functions. The goal is to create a socket and bind it to an address and port where it can listen for incoming connections. If the socket or bind call fails, we try the next addrinfo structure in the list until we find one that works or exhaust all options.
3. We invoke the listen function on the bound socket to turn it into a listening socket. This tells the operating system that we want this socket to be used to accept incoming connection requests.
4. We call accept in a loop to wait for and accept incoming connections. When a client connects, accept returns a new socket that is linked specifically to that client. We can then use this new socket to communicate with the client.
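The first three steps above can be sketched as one helper. This is a sketch under stated assumptions rather than a definitive implementation: the name open_listenfd, the backlog of 128, and the SO_REUSEADDR option are choices made for illustration.

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return a listening descriptor on `port`, or -1. */
int open_listenfd(const char *port)
{
    struct addrinfo hints, *list, *p;
    int fd = -1, on = 1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;                   /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;               /* connections */
    hints.ai_flags = AI_PASSIVE | AI_NUMERICSERV;  /* wildcard, numeric port */
    if (getaddrinfo(NULL, port, &hints, &list) != 0)
        return -1;

    /* Walk the list until socket and bind both succeed. */
    for (p = list; p != NULL; p = p->ai_next) {
        fd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (fd < 0)
            continue;                              /* try the next entry */
        /* Assumption: allow quick restarts on the same port. */
        setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
        if (bind(fd, p->ai_addr, p->ai_addrlen) == 0)
            break;                                 /* success */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(list);

    /* 128 is an arbitrary backlog hint chosen for this sketch. */
    if (fd >= 0 && listen(fd, 128) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A caller would then perform the final step itself by calling accept on the returned descriptor in a loop.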
The socket Function
Clients and servers use the socket function to create a socket descriptor.
In a call such as socket(AF_INET, SOCK_STREAM, 0), AF_INET indicates that we are using 32-bit IP addresses and SOCK_STREAM indicates that the socket will be an end point for a connection.
The best practice is to use the getaddrinfo function to generate these parameters automatically, so that the code is protocol-independent.
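For reference, a minimal sketch of the call with hard-coded arguments. The helper name make_tcp_socket is hypothetical, and real code would more likely take the three arguments from an addrinfo structure.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: create an IPv4 stream socket.
 * Returns the new descriptor, or -1 on error. */
int make_tcp_socket(void)
{
    /* Protocol 0 selects the default for this family and type (TCP). */
    return socket(AF_INET, SOCK_STREAM, 0);
}
```

The descriptor returned here is only partially opened: it cannot be read or written until it is connected (client) or bound, listened on, and accepted from (server).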
The connect Function
The connect function attempts to establish an Internet connection with the server at the socket address addr, where addrlen is sizeof(sockaddr_in). The connect function blocks until either the connection is successfully established or an error occurs. If successful, the clientfd descriptor is ready for reading and writing, and the resulting connection is characterized by the socket pair (x:y, addr.sin_addr:addr.sin_port), where x is the client's IP address and y is the ephemeral port that identifies the client process on the client host. As with socket, the best practice is to use getaddrinfo to supply the arguments to connect.
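A client-side sketch that lets getaddrinfo supply the arguments to both socket and connect. The helper name open_clientfd is an assumption for illustration, as is the choice to require a numeric port through AI_NUMERICSERV.

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: connect to host:port.
 * Returns a connected descriptor, or -1 on failure. */
int open_clientfd(const char *host, const char *port)
{
    struct addrinfo hints, *list, *p;
    int fd = -1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;         /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;     /* connections */
    hints.ai_flags = AI_NUMERICSERV;     /* port must be numeric */
    if (getaddrinfo(host, port, &hints, &list) != 0)
        return -1;

    /* Try each address until socket and connect both succeed. */
    for (p = list; p != NULL; p = p->ai_next) {
        fd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (fd < 0)
            continue;
        /* connect blocks until the connection is up or has failed. */
        if (connect(fd, p->ai_addr, p->ai_addrlen) == 0)
            break;
        close(fd);
        fd = -1;
    }
    freeaddrinfo(list);
    return fd;
}
```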
The bind Function
The bind function asks the kernel to associate the server’s socket address in addr with the socket descriptor sockfd.
The listen Function
The client is the active entity that initiates connection requests. The server is the passive entity that waits for connection requests from clients. By default, the kernel assumes that a descriptor created by the socket function is an active socket on the client side of a connection. A server calls the listen function to tell the kernel that the descriptor will be used on the server side instead.
The backlog argument is a hint about the number of outstanding connection requests that the kernel should queue up before it starts to refuse requests.
The accept Function
Servers wait for connection requests from clients by calling the accept function.
The accept function waits for a connection request from a client to arrive on the listening descriptor listenfd, then fills in the client’s socket address in addr, and returns a connected descriptor that can be used to communicate with the client using Unix I/O functions.
The listening descriptor serves as an end point for client connection requests. It is typically created once and exists for the lifetime of the server.
The connected descriptor is the end point of the connection that is established between the client and the server. It is created each time the server accepts a connection request and exists only as long as it takes the server to service a client.
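This implies that a single port on a host can carry many connections at once: every connected descriptor shares the server's listening port, and each connection is identified by its 5-tuple (protocol, client address, client port, server address, server port).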
getaddrinfo and getnameinfo convert back and forth between binary socket address structures and the string representations of hostnames, host addresses, service names, and port numbers.
The getaddrinfo Function
The getaddrinfo function converts string representations of hostnames, host addresses, service names, and port numbers into socket address structures.
It is reentrant and works with any protocol.
Given host and service (the two components of a socket address), getaddrinfo returns a result that points to a linked list of addrinfo structures, each of which points to a socket address structure that corresponds to host and service.
After a client calls getaddrinfo, it walks this list, trying each socket address in turn until the calls to socket and connect succeed and the connection is established.
Similarly, a server tries each socket address on the list until the calls to socket and bind succeed and the descriptor is bound to a valid socket address.
To avoid memory leaks, the application must eventually free the list by calling freeaddrinfo. If getaddrinfo returns a nonzero error code, the application can call gai_strerror to convert the code to a message string.
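The list walk, the error report, and the cleanup can be sketched without opening any sockets. The helper name count_addresses is hypothetical. It only counts the entries that getaddrinfo returns for stream sockets.

```c
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: count the candidate stream addresses for
 * host:service. Returns the count, or -1 if the lookup fails. */
int count_addresses(const char *host, const char *service)
{
    struct addrinfo hints, *list, *p;
    int rc, n = 0;

    memset(&hints, 0, sizeof(hints));
    hints.ai_socktype = SOCK_STREAM;

    rc = getaddrinfo(host, service, &hints, &list);
    if (rc != 0) {
        /* gai_strerror turns the error code into a message string. */
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rc));
        return -1;
    }
    for (p = list; p != NULL; p = p->ai_next)
        n++;
    freeaddrinfo(list);                  /* avoid leaking the list */
    return n;
}
```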
The optional hints argument is an addrinfo structure that provides finer control over the list of socket addresses that getaddrinfo returns. When passed as a hints argument, only the ai_family, ai_socktype, ai_protocol, and ai_flags fields can be set. The other fields must be set to zero (or NULL). We use memset to zero the entire structure and then set a few selected fields:
Setting ai_family to AF_INET restricts the list to IPv4 addresses. Setting it to AF_INET6 restricts the list to IPv6 addresses.
Setting ai_socktype to SOCK_STREAM restricts the list to at most one addrinfo structure for each unique address, one whose socket address can be used as the end point of a connection.
SOCK_STREAM is the reliable, connection-oriented socket type used by TCP. UDP uses SOCK_DGRAM.
The ai_flags field is a bit mask that further modifies the default behavior. You create it by ORing together combinations of the following values.
AI_ADDRCONFIG. This flag is recommended if you are using connections. It asks getaddrinfo to return IPv4 addresses only if the local host is configured for IPv4, and IPv6 addresses only if it is configured for IPv6.
AI_CANONNAME. By default, the ai_canonname field is NULL. If this flag is set, it instructs getaddrinfo to point the ai_canonname field in the first addrinfo structure in the list to the canonical (official) name of host.
AI_NUMERICSERV. By default, the service argument can be a service name or a port number. This flag forces the service argument to be a port number.
AI_PASSIVE. By default, getaddrinfo returns socket addresses that can be used by clients as active sockets in calls to connect. This flag instructs it to return socket addresses that can be used by servers as listening sockets. In this case, the host argument should be NULL. The address field in the resulting socket address structure(s) will be the wildcard address, which tells the kernel that this server will accept requests to any of the IP addresses for this host. This is the desired behavior for all of our example servers.
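The hints setup and the effect of AI_PASSIVE can be checked with a small sketch. The helper name passive_is_wildcard is hypothetical, and the sketch restricts itself to IPv4 so that the result can be compared against INADDR_ANY.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: 1 if a passive lookup for `port` yields the IPv4
 * wildcard address, 0 if it does not, -1 if the lookup fails. */
int passive_is_wildcard(const char *port)
{
    struct addrinfo hints, *list;
    struct sockaddr_in *sa;
    int result;

    memset(&hints, 0, sizeof(hints));    /* unused fields must be zero */
    hints.ai_family = AF_INET;           /* IPv4 only */
    hints.ai_socktype = SOCK_STREAM;     /* connections */
    hints.ai_flags = AI_PASSIVE | AI_NUMERICSERV;  /* flags are ORed */

    if (getaddrinfo(NULL, port, &hints, &list) != 0)
        return -1;
    sa = (struct sockaddr_in *)list->ai_addr;
    result = (sa->sin_addr.s_addr == htonl(INADDR_ANY));
    freeaddrinfo(list);
    return result;
}
```

Because AI_NUMERICSERV is set, a service name such as "http" is rejected here and the helper returns -1.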
When getaddrinfo creates an addrinfo structure in the output list, it fills in each field except for ai_flags.
One of the elegant aspects of getaddrinfo is that the fields in an addrinfo structure are opaque, in the sense that they can be passed directly to the functions in the sockets interface without any further manipulation by the application code.
The getnameinfo Function
The getnameinfo function is the inverse of getaddrinfo. It is reentrant and protocol-independent.
The sa argument points to a socket address structure of size salen bytes, host to a buffer of size hostlen bytes, and service to a buffer of size servlen bytes.
If getnameinfo returns a nonzero error code, the application can convert it to a string by calling gai_strerror.
The flags argument is a bit mask that modifies the default behavior.
NI_NUMERICHOST. By default, getnameinfo tries to return a domain name in host. Setting this flag will cause it to return a numeric address string instead.
NI_NUMERICSERV. By default, getnameinfo will look in /etc/services and, if possible, return a service name instead of a port number. Setting this flag forces it to skip the lookup and simply return the port number.
It is good programming practice to explicitly close any descriptors that you have opened.