The Sockets Interface

From the perspective of the Linux kernel, a socket is an end point for communication. From the perspective of a Linux program, a socket is an open file with a corresponding descriptor.

Internet socket addresses are stored in 16-byte structures having the type sockaddr_in.

/* IP socket address structure */
struct sockaddr_in  {
  uint16_t        sin_family;  /* Protocol family */
  uint16_t        sin_port;    /* Port number in network byte order */
  struct in_addr  sin_addr;    /* IP address in network byte order */
  unsigned char   sin_zero[8]; /* Pad to sizeof(struct sockaddr) */
};

The sin_addr field holds a 32-bit IPv4 address. The IP address and port number are always stored in network (big-endian) byte order.
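
As a minimal sketch of what network byte order means in practice, the helper below fills a sockaddr_in from a dotted-decimal string and a host-order port. The name make_ipv4_addr is hypothetical and chosen only for illustration; htons and inet_pton are the standard conversion routines.

```c
#include <arpa/inet.h>  /* htons, inet_pton */
#include <netinet/in.h> /* struct sockaddr_in */
#include <stdint.h>
#include <string.h>     /* memset */
#include <sys/socket.h> /* AF_INET */

/* Hypothetical helper: fill *sa from a dotted-decimal IPv4 string and a
 * port given in host byte order.
 * Returns 0 on success, -1 if ip is not a valid IPv4 address string. */
int make_ipv4_addr(struct sockaddr_in *sa, const char *ip, uint16_t port) {
  memset(sa, 0, sizeof(*sa)); /* Also clears the sin_zero padding */
  sa->sin_family = AF_INET;
  sa->sin_port = htons(port); /* Host order -> network order */
  if (inet_pton(AF_INET, ip, &sa->sin_addr) != 1) /* Stored in network order */
    return -1;
  return 0;
}
```

For port 8080 (0x1F90), the two bytes of sin_port in memory are 0x1F followed by 0x90 on any host, regardless of the host's own byte order.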

The connect, bind, and accept functions require a pointer to a protocol-specific socket address structure.

C had no generic void * pointer when the sockets interface was designed, so the sockets functions were defined to expect a pointer to a generic sockaddr structure.

struct sockaddr {
  uint16_t  sa_family;    /* Protocol family */
  char      sa_data[14];  /* Address data */
};

Applications must cast pointers to protocol-specific structures to struct sockaddr * when calling these functions.

If the interface were designed today, it would use a generic void * pointer instead.
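
The cast can be sketched as follows. This is an illustration under stated assumptions, not part of the original notes: the helper name bind_loopback_any_port is hypothetical, and it binds to the loopback address on port 0 so that the kernel picks a free port.

```c
#include <arpa/inet.h>  /* htons, htonl */
#include <netinet/in.h> /* struct sockaddr_in, INADDR_LOOPBACK */
#include <string.h>     /* memset */
#include <sys/socket.h> /* socket, bind */
#include <unistd.h>     /* close */

/* Hypothetical helper: create a TCP socket bound to 127.0.0.1 on a
 * kernel-chosen port (port 0 asks the kernel to pick one).
 * Returns the descriptor, or -1 on error. */
int bind_loopback_any_port(void) {
  struct sockaddr_in sa;
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0)
    return -1;

  memset(&sa, 0, sizeof(sa));
  sa.sin_family = AF_INET;
  sa.sin_port = htons(0);
  sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

  /* bind takes the generic type, so cast the protocol-specific pointer */
  if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
    close(fd);
    return -1;
  }
  return fd;
}
```

The same cast is needed when passing a sockaddr_in to connect, accept, or getsockname.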

In practice, we set up a server with the following steps:

  • We invoke getaddrinfo(const char *host, const char *service, const struct addrinfo *hints, struct addrinfo **result) to obtain a list of addrinfo structures that represent the possible network addresses we can use.

    For a server, the host is set to NULL and the service is set to the port number.

  • We iterate through each of the addrinfo structures in the result list and try to use them with the socket and bind functions. The goal is to create a socket and bind it to an address and port where it can listen for incoming connections. If the socket or bind call fails, we try the next addrinfo structure in the list until we find one that works or exhaust all options.

  • We invoke the listen function on the bound socket to turn it into a listening socket. This tells the operating system that we want this socket to be used to accept incoming connection requests.

  • We call accept in a loop to wait for and accept incoming connections. When a client connects, accept returns a new socket that's specifically linked to that client. We can then use this new socket to communicate with the client.

The socket Function

Clients and servers use the socket function to create a socket descriptor.

#include <sys/types.h>
#include <sys/socket.h>

int socket(int domain, int type, int protocol);

// For the protocol parameter, pass 0 and the system chooses the correct protocol for the given type.
// Returns: nonnegative descriptor if OK, −1 on error

clientfd = socket(AF_INET, SOCK_STREAM, 0);

AF_INET indicates that we are using 32-bit IP addresses, and SOCK_STREAM indicates that the socket will be an end point for a connection.

The best practice is to use the getaddrinfo function to generate these parameters automatically, so that the code is protocol-independent.
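
A minimal sketch of creating a socket with error checking is shown below. The helper names make_tcp_socket and socket_type are hypothetical; socket_type uses the standard SO_TYPE socket option to confirm which type the kernel created.

```c
#include <stdio.h>      /* perror */
#include <sys/socket.h> /* socket, getsockopt */
#include <unistd.h>     /* close */

/* Hypothetical helper: create an IPv4 stream socket and report failures.
 * Returns a nonnegative descriptor, or -1 on error. */
int make_tcp_socket(void) {
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0)
    perror("socket");
  return fd;
}

/* Hypothetical helper: return the type of the socket fd
 * (for example SOCK_STREAM), or -1 on error. */
int socket_type(int fd) {
  int type;
  socklen_t len = sizeof(type);
  if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) < 0)
    return -1;
  return type;
}
```

At this point the descriptor is only partially opened: it cannot be used for reading or writing until the client calls connect or the server calls bind, listen, and accept.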

The connect Function

#include <sys/socket.h>

int connect(int clientfd, const struct sockaddr *addr, socklen_t addrlen);

// Returns: 0 if OK, −1 on error

The connect function attempts to establish an Internet connection with the server at the socket address addr, where addrlen is sizeof(struct sockaddr_in). The connect function blocks until either the connection is successfully established or an error occurs. If successful, the clientfd descriptor is ready for reading and writing, and the resulting connection is characterized by the socket pair (x:y, addr.sin_addr:addr.sin_port), where x is the client's IP address and y is the ephemeral port that identifies the client process on the client host. As with socket, the best practice is to use getaddrinfo to supply the arguments to connect.

The bind Function

#include <sys/socket.h>

int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen);

// Returns: 0 if OK, −1 on error

The bind function asks the kernel to associate the server’s socket address in addr with the socket descriptor sockfd.

The listen Function

Clients are active entities that initiate connection requests. Servers are passive entities that wait for connection requests from clients. By default, the kernel assumes that a descriptor created by the socket function is an active socket on the client side of a connection. A server calls the listen function to tell the kernel that the descriptor will be used by a server instead.

#include <sys/socket.h>

int listen(int sockfd, int backlog);

// Returns: 0 if OK, −1 on error

The backlog argument is a hint about the number of outstanding connection requests that the kernel should queue up before it starts to refuse requests.

The accept Function

Servers wait for connection requests from clients by calling the accept function.

#include <sys/socket.h>

int accept(int listenfd, struct sockaddr *addr, socklen_t *addrlen);

// Returns: nonnegative connected descriptor if OK, −1 on error

The accept function waits for a connection request from a client to arrive on the listening descriptor listenfd, then fills in the client’s socket address in addr, and returns a connected descriptor that can be used to communicate with the client using Unix I/O functions.

The listening descriptor serves as an end point for client connection requests. It is typically created once and exists for the lifetime of the server.

The connected descriptor is the end point of the connection that is established between the client and the server. It is created each time the server accepts a connection request and exists only as long as it takes the server to service a client.

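Because every accepted connection gets its own connected descriptor, a single port on a host can serve many connections at once. Each connection is identified by its 5-tuple: protocol, source address, source port, destination address, and destination port.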
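
The difference between the listening and connected descriptors can be sketched in a single process on the loopback interface. The function name loopback_echo_once is hypothetical, and the code assumes a short message that arrives in one read; a robust server would loop on read and write.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: one listening socket, one client, one accepted
 * connection. Sends msg from the client, echoes it on the server side,
 * and copies the reply into out.
 * Returns the number of bytes echoed back, or -1 on error. */
ssize_t loopback_echo_once(const char *msg, char *out, size_t outlen) {
  struct sockaddr_in sa;
  socklen_t salen = sizeof(sa);
  char buf[128];
  size_t len = strlen(msg);
  ssize_t got, n = -1;
  int listenfd = -1, clientfd = -1, connfd = -1;

  if (len == 0 || len > sizeof(buf))
    return -1;

  memset(&sa, 0, sizeof(sa));
  sa.sin_family = AF_INET;
  sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  sa.sin_port = htons(0); /* Let the kernel choose a free port */

  /* Server side: socket, bind, listen */
  if ((listenfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) goto done;
  if (bind(listenfd, (struct sockaddr *)&sa, sizeof(sa)) < 0) goto done;
  if (listen(listenfd, 8) < 0) goto done;
  /* Learn which port the kernel assigned */
  if (getsockname(listenfd, (struct sockaddr *)&sa, &salen) < 0) goto done;

  /* Client side: the kernel queues the connection on the listening socket */
  if ((clientfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) goto done;
  if (connect(clientfd, (struct sockaddr *)&sa, salen) < 0) goto done;

  /* accept returns a new connected descriptor; listenfd stays open */
  if ((connfd = accept(listenfd, NULL, NULL)) < 0) goto done;

  if (write(clientfd, msg, len) != (ssize_t)len) goto done;
  if ((got = read(connfd, buf, sizeof(buf))) <= 0) goto done;
  if (write(connfd, buf, (size_t)got) != got) goto done; /* Echo */
  n = read(clientfd, out, outlen);

done:
  if (connfd >= 0) close(connfd);
  if (clientfd >= 0) close(clientfd);
  if (listenfd >= 0) close(listenfd);
  return n;
}
```

In a real server the accept call sits in a loop: listenfd is reused for every iteration, while each connfd is closed once its client has been serviced.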

Host and Service Conversion

getaddrinfo and getnameinfo convert back and forth between binary socket address structures and the string representations of hostnames, host addresses, service names, and port numbers.

The getaddrinfo Function

The getaddrinfo function converts string representations of hostnames, host addresses, service names, and port numbers into socket address structures.

It is reentrant and works with any protocol.

#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

int getaddrinfo(const char *host, const char *service,
                const struct addrinfo *hints,
                struct addrinfo **result);

// Returns: 0 if OK, nonzero error code on error

void freeaddrinfo(struct addrinfo *result);

// Returns: nothing

const char *gai_strerror(int errcode);

// Returns: error message

struct addrinfo {
  int              ai_flags;      /* Hints argument flags */
  int              ai_family;     /* First arg to socket function */
  int              ai_socktype;   /* Second arg to socket function */
  int              ai_protocol;   /* Third arg to socket function */
  char            *ai_canonname;  /* Canonical hostname */
  socklen_t        ai_addrlen;    /* Size of ai_addr struct */
  struct sockaddr *ai_addr;       /* Ptr to socket address structure */
  struct addrinfo *ai_next;       /* Ptr to next item in linked list */
};

Given host and service (the two components of a socket address), getaddrinfo returns a result that points to a linked list of addrinfo structures, each of which points to a socket address structure that corresponds to host and service.

After a client calls getaddrinfo, it walks this list, trying each socket address in turn until the calls to socket and connect succeed and the connection is established.

Similarly, a server tries each socket address on the list until the calls to socket and bind succeed and the descriptor is bound to a valid socket address.

To avoid memory leaks, the application must eventually free the list by calling freeaddrinfo. If getaddrinfo returns a nonzero error code, the application can call gai_strerror to convert the code to a message string.

The optional hints argument is an addrinfo structure that provides finer control over the list of socket addresses that getaddrinfo returns. When passed as a hints argument, only the ai_family, ai_socktype, ai_protocol, and ai_flags fields can be set. The other fields must be set to zero (or NULL). In practice, we use memset to zero the entire structure and then set a few selected fields:

  • Setting ai_family to AF_INET restricts the list to IPv4 addresses. Setting it to AF_INET6 restricts the list to IPv6 addresses.

  • Setting ai_socktype to SOCK_STREAM restricts the list to at most one addrinfo structure for each unique address, one whose socket address can be used as the end point of a connection.

    SOCK_STREAM selects a reliable, connection-oriented byte stream, which is what TCP provides. UDP uses SOCK_DGRAM.

  • The ai_flags field is a bit mask that further modifies the default behavior. You create it by ORing together combinations of the following values.

    • AI_ADDRCONFIG. This flag is recommended if you are using connections. It asks getaddrinfo to return IPv4 addresses only if the local host is configured for IPv4. Similarly for IPv6.

    • AI_CANONNAME. By default, the ai_canonname field is NULL. If this flag is set, it instructs getaddrinfo to point the ai_canonname field in the first addrinfo structure in the list to the canonical (official) name of host.

    • AI_NUMERICSERV. By default, the service argument can be a service name or a port number. This flag forces the service argument to be a port number.

    • AI_PASSIVE. By default, getaddrinfo returns socket addresses that can be used by clients as active sockets in calls to connect. This flag instructs it to return socket addresses that can be used by servers as listening sockets. In this case, the host argument should be NULL. The address field in the resulting socket address structure(s) will be the wildcard address, which tells the kernel that this server will accept requests to any of the IP addresses for this host. This is the desired behavior for all of our example servers.

When getaddrinfo creates an addrinfo structure in the output list, it fills in each field except for ai_flags.

One of the elegant aspects of getaddrinfo is that the fields in an addrinfo structure are opaque, in the sense that they can be passed directly to the functions in the sockets interface without any further manipulation by the application code.
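
Walking the result list can be sketched as follows. The helper name print_addresses and the 64-byte buffer size are assumptions made for this example; the hints restrict the list to IPv4 stream sockets, and each entry is printed in numeric form.

```c
#include <netdb.h>      /* getaddrinfo, getnameinfo, gai_strerror */
#include <stdio.h>
#include <string.h>     /* memset */
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: print the numeric address of every entry that
 * getaddrinfo returns for host.
 * Returns the number of entries printed, or -1 on error. */
int print_addresses(const char *host) {
  struct addrinfo hints, *listp, *p;
  char buf[64]; /* Assumed large enough for a numeric IPv4 string */
  int rc, count = 0;

  memset(&hints, 0, sizeof(hints));
  hints.ai_family = AF_INET;       /* IPv4 only */
  hints.ai_socktype = SOCK_STREAM; /* At most one entry per address */

  if ((rc = getaddrinfo(host, NULL, &hints, &listp)) != 0) {
    fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rc));
    return -1;
  }

  /* ai_addr and ai_addrlen are passed along without inspection */
  for (p = listp; p; p = p->ai_next) {
    if (getnameinfo(p->ai_addr, p->ai_addrlen, buf, sizeof(buf),
                    NULL, 0, NI_NUMERICHOST) == 0) {
      printf("%s\n", buf);
      count++;
    }
  }

  freeaddrinfo(listp); /* Avoid leaking the list */
  return count;
}
```

Calling print_addresses with a hostname prints one line per IPv4 address that the name maps to; with a numeric string such as "127.0.0.1" it prints that single address.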

The getnameinfo Function

The getnameinfo function is the inverse of getaddrinfo. It is reentrant and protocol-independent.

#include <sys/socket.h>
#include <netdb.h>

int getnameinfo(const struct sockaddr *sa, socklen_t salen,
                char *host, size_t hostlen,
                char *service, size_t servlen, int flags);

// Returns: 0 if OK, nonzero error code on error

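The sa argument points to a socket address structure of size salen bytes, host to a buffer of size hostlen bytes, and service to a buffer of size servlen bytes. getnameinfo converts the socket address structure to the corresponding host and service name strings and copies them into the host and service buffers.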

If getnameinfo returns a nonzero error code, the application can convert it to a string by calling gai_strerror.

The flags argument is a bit mask that modifies the default behavior.

  • NI_NUMERICHOST. By default, getnameinfo tries to return a domain name in host. Setting this flag will cause it to return a numeric address string instead.

  • NI_NUMERICSERV. By default, getnameinfo will look in /etc/services and if possible, return a service name instead of a port number. Setting this flag forces it to skip the lookup and simply return the port number.

Helper Functions for the Sockets Interface

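#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define LISTENQ 1024 /* Backlog hint for listen; 1024 is an assumed value */

/* Open a connection to the server at <hostname, port>.
 * Returns a connected descriptor, or -1 on error. */
int open_clientfd(char *hostname, char *port) {
  int clientfd;
  struct addrinfo hints, *listp, *p;

  /* Get a list of potential server addresses */
  memset(&hints, 0, sizeof(struct addrinfo));
  hints.ai_socktype = SOCK_STREAM;  /* Open a connection */
  hints.ai_flags = AI_NUMERICSERV;  /* Using a numeric port argument */
  hints.ai_flags |= AI_ADDRCONFIG;  /* Recommended for connections */
  if (getaddrinfo(hostname, port, &hints, &listp) != 0)
    return -1;

  /* Walk the list until one address connects */
  for (p = listp; p; p = p->ai_next) {
    if ((clientfd = socket(p->ai_family, p->ai_socktype, p->ai_protocol)) < 0)
      continue; /* Socket failed, try the next address */
    if (connect(clientfd, p->ai_addr, p->ai_addrlen) != -1)
      break; /* Success */
    close(clientfd); /* Connect failed, try the next address */
  }

  freeaddrinfo(listp);
  if (!p) /* All connects failed */
    return -1;
  else
    return clientfd;
}

/* Open a listening descriptor on port.
 * Returns the listening descriptor, or -1 on error. */
int open_listenfd(char *port) {
  struct addrinfo hints, *listp, *p;
  int listenfd, optval = 1;

  /* Get a list of potential server addresses */
  memset(&hints, 0, sizeof(struct addrinfo));
  hints.ai_socktype = SOCK_STREAM;             /* Accept connections */
  hints.ai_flags = AI_PASSIVE | AI_ADDRCONFIG; /* On any IP address */
  hints.ai_flags |= AI_NUMERICSERV;            /* Using a numeric port */
  if (getaddrinfo(NULL, port, &hints, &listp) != 0)
    return -1;

  /* Walk the list until one address binds */
  for (p = listp; p; p = p->ai_next) {
    if ((listenfd = socket(p->ai_family, p->ai_socktype, p->ai_protocol)) < 0)
      continue; /* Socket failed, try the next address */

    /* Eliminates "Address already in use" error from bind */
    setsockopt(listenfd, SOL_SOCKET, SO_REUSEADDR,
               (const void *)&optval, sizeof(int));

    if (bind(listenfd, p->ai_addr, p->ai_addrlen) == 0)
      break; /* Success */

    close(listenfd); /* Bind failed, try the next address */
  }

  freeaddrinfo(listp);
  if (!p) /* No address worked */
    return -1;

  /* Make it a listening socket ready to accept connection requests */
  if (listen(listenfd, LISTENQ) < 0) {
    close(listenfd);
    return -1;
  }
  return listenfd;
}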

It is good programming practice to explicitly close any descriptors that you have opened.
