# The Sockets Interface

<figure><img src="https://p.ipic.vip/e0m23m.png" alt="Screenshot 2023-06-11 at 4.01.01 AM"><figcaption></figcaption></figure>

From the perspective of the Linux kernel, a socket is an end point for communication. From the perspective of a Linux program, a socket is an open file with a corresponding descriptor.

Internet socket addresses are stored in 16-byte structures having the type `sockaddr_in`.

```c
/* IP socket address structure */
struct sockaddr_in  {
	uint16_t        sin_family;  /* Protocol family*/
  uint16_t        sin_port;    /* Port number in network byte order */
  struct in_addr  sin_addr;    /* IP address in network byte order */
  unsigned char   sin_zero[8]; /* Pad to sizeof(struct sockaddr) */
};
```

`sin_addr` is a 32-bit address. The IP address and port number are always stored in network byte order.

The `connect`, `bind`, and `accept` functions require a pointer to a protocol-specific socket address structure.

In old days, the sockets functions expect a pointer to a generic `sockaddr` structure.

```c
struct sockaddr {
	uint16_t  sa_family;    /* Protocol family */
	char      sa_data[14];  /* Address data  */
};
```

Pointers pointing to protocol-specific structures should be cast to `sockaddr`.

Now, we use a generic `void *` pointer.

Literally we setup a server through the following procedures:

* We invoke `getaddrinfo(const char *host, const char *service, const struct addrinfo *hints, struct addrinfo **result)` to obtain a list of `addrinfo` structures that represent the possible network addresses we can use.

  For a server, the `host` is set to `NULL` and the `service` is set to the port number.
* We iterate through each of the `addrinfo` structures in the result list and try to use them with the `socket` and `bind` functions. The goal is to create a socket and bind it to an address and port where it can listen for incoming connections. If the `socket` or `bind` call fails, we try the next `addrinfo` structure in the list until we find one that works or exhaust all options.
* We invoke the `listen` function on the bound socket to turn it into a listening socket. This tells the operating system that we want this socket to be used to accept incoming connection requests.
* We call `accept` in a loop to wait for and accept incoming connections. When a client connects, `accept` returns a new socket that's specifically linked to that client. We can then use this new socket to communicate with the client.

## The `socket` Function

Clients and servers use the `socket` function to create a *socket descriptor*.

```c
#include <sys/types.h>
#include <sys/socket.h>

int socket(int domain, int type, int protocol);

// for the parameter protocol you can just use 0 and the system will choose the correct protocol based on the type.
// Returns: nonnegative descriptor if OK, −1 on error

clientfd = socket(AF_INET, SOCK_STREAM, 0);
```

`AF_INET` indicates that we are using 32-bit IP addresses and `SOCK_ STREAM` indicates that the socket will be an end point for a connection.

The best practice is to use the `getaddrinfo` function to generate these parameters automatically, so that the code is protocol-independent.

## The `connect` Function

```c
#include <sys/socket.h>

int connect(int clientfd, const struct sockaddr *addr, socklen_t addrlen);

// Returns: 0 if OK, −1 on error
```

The `connect` function attempts to establish an Internet connection with the server at the socket address `addr`, where addrlen is `sizeof(sockaddr_in)`. **The `connect` function blocks until either the connection is successfully established or an error occurs.** If successful, the `clientfd` descriptor is now ready for reading and writing, and the resulting connection is characterized by the socket pair `(x:y, addr.sin_addr:addr.sin_port)`. As with socket, the best practice is to use `getaddrinfo` to supply the arguments to connect.

## The `bind` Function

```c
#include <sys/socket.h>

int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen);

// Returns: 0 if OK, −1 on error
```

The `bind` function asks the kernel to associate the server’s socket address in `addr` with the socket descriptor `sockfd`.

## The `listen` Function

The client is the active side that initiates connection requests. The server side is the passive entities that wait for connection requests from clients. By default, the kernel assumes that a descriptor created by the `socket` function is the client side of a connection. A server calls the `listen` function to tell the kernel that the descriptor is on the server side.

```c
#include <sys/socket.h>

int listen(int sockfd, int backlog);
```

The `backlog` argument is a hint about the number of outstanding connection requests that the kernel should queue up before it starts to refuse requests.

## The `accept` Function

Servers wait for connection requests from clients by calling the `accept` function.

```c
#include <sys/socket.h>

int accept(int listenfd, struct sockaddr *addr, int *addrlen);

// Returns: nonnegative connected descriptor if OK, −1 on error
```

The accept function waits for a connection request from a client to arrive on the listening descriptor listenfd, then fills in the client’s socket address in addr, and returns a **connected descriptor** that can be used to communicate with the client using Unix I/O functions.

The **listening descriptor** serves as an end point for client connection requests. It is typically created once and exists for the lifetime of the server.

The **connected descriptor** is the end point of the connection that is established between the client and the server. It is created each time the server accepts a connection request and exists only as long as it takes the server to service a client.

![Screenshot 2023-05-13 at 9.13.48 PM](https://p.ipic.vip/901bi8.png)

![Screenshot 2023-05-27 at 7.12.33 AM](https://p.ipic.vip/wph8pk.png)

This implies that a port in a host can have multiple connections. A connection is identified by the 5-tuple.

## Host and Service Conversion

`getaddrinfo` and `getnameinfo` converts back and forth between binary socket address strcutures and the string representations of hostnames, host addresses, service names and port numbers.

**The `getaddrinfo`** Function

The `getaddrinfo` function converts string representations of hostnames, host addresses, service names, and port numbers into socket address structures.

It is reentrant and works with any protocol.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

int getaddrinfo(const char *host, const char *service,
                const struct addrinfo *hints,
                struct addrinfo **result);

// Returns: 0 if OK, nonzero error code on error

void freeaddrinfo(struct addrinfo *result);

// Returns: nothing

const char *gai_strerror(int errcode);

// Returns: error message

struct addrinfo {
  int ai_flags;     /* Hints argument flags */
	int ai_family;    /* First arg to socket function */
	int ai_socktype;  /* Second arg to socket function */
	int ai_protocol;  /* Third arg to socket function */
  char *ai_canoname; /* Canonical hostname */
  size_t ai_addrlen; /* Size of ai_addr struct */
  struct sockaddr *ai_addr; /* Ptr to socket address structure */
  struct addrinfo *ai_next; /* Ptr to next item in linked list */
};
```

Given `host` and `service` (the two components of a socket address), `getaddrinfo` returns a result that points to a linked list of `addrinfo` structures, each of which points to a socket address structure that corresponds to `host` and `service`.

After a client calls `getaddrinfo`, it walks this list, trying each socket address in turn until the calls to socket and connect succeed and the connection is established.

Similarly, a server tries each socket address on the list until the calls to socket and bind succeed and the descriptor is bound to a valid socket address.

To avoid memory leaks, the application must eventually free the list by calling `freeaddrinfo`. If `getaddrinfo` returns a nonzero error code, the application can call `gai_strerror` to convert the code to a message string.

The optional `hints` argumet is an `addrinfo` structure that provides finer control over the list of socket addresses that `getaddrinfo` returns. When passes as a `hints` argument, only the `ai_family`, `ai_socktype`, `ai_protocol`, and `ai_flags` fields can be set. The other fields must be set to zero (or NULL). We use `memset` to zero the entire structure and set a few selected fields:

* Setting `ai_family` to `AF_INET` restricts the list to IPv4 addresses. Setting it to `AF_INET6` restricts the list to IPv6 addresses.
* Setting `ai_socktype` to `SOCK_STREAM` restricts the list to at most one `addrinfo` structure for each unique address, one whose socket address can be used as the end point of a connection.

  `SOCK_STREAM` is a type of reliable and connection-oriented socket adopted by TCP. UDP uses `SOCK_DGRAM`.
* The `ai_flags` field is a bit mask that further modifies the default behavior. You create it by oring combinations of various values.
  * `AI_ADDRCONFIG`. This flag is recommended if you are using connections. It asks getaddrinfo to return IPv4 addresses only if the local host is configured for IPv4. Similarly for IPv6.
  * `AI_CANONNAME`. By default, the ai\_canonname field is NULL. If this flag is set, it instructs `getaddrinfo` to point the ai\_canonname field in the first addrinfo structure in the list to the canonical (official) name of host.
  * `AI_NUMERICSERV`. By default, the service argument can be a service name or a port number. This flag forces the service argument to be a port number.
  * `AI_PASSIVE`. Bydefault, `getaddrinfo` returns socket addresses that can be used by clients as active sockets in calls to connect. This flag instructs it to return socket addresses that can be used by servers as listening sockets. In this case, the host argument should be NULL. The address field in the resulting socket address structure(s) will be the *wildcard address*, which tells the kernel that this server will accept requests to any of the IP addresses for this host. This is the desired behavior for all of our example servers.

When `getaddrinfo` creates an `addrinfo` structure in the output list, it fills in each field except for `ai_flags`.

One of the elegant aspects of `getaddrinfo` is that the fields in an `addrinfo` structure are opaque, in the sense that they can be passed directly to the functions in the sockets interface without any further manipulation by the application code.

**The `getnameinfo` Function**

The `getnameinfo` function is inverse of `getaddrinfo`. It is reentrant and protocol-independent.

```c
#include <sys/socket.h>
#include <netdb.h>

int getnameinfo(const struct sockaddr *sa, socklen_t salen,
                char *host, size_t hostlen,
                char *service, size_t servlen, int flags);

// Returns: 0 if OK, nonzero error code on error
```

The `sa` argument points to a socket address structure of size `salen` bytes, `host` to a buffer of size `hostlen` bytes and `service` to a buffer of size `servlen` bytes.

If `getnameinfo` returns a nonzero error code, the application can convert it to a string by calling `gai_strerror`.

The `flags` argument is a bit mask that modifies the default behavior.

* `NI_NUMERICHOST`. By default, `getnameinfo` tries to return a domain name in host. Setting this flag will cause it to return a numeric address string instead.
* `NI_NUMERICSERV`. By default, `getnameinfo` will look in /etc/services and if possible, return a service name instead of a port number. Setting this flag forces it to skip the lookup and simply return the port number.

## Helper Functions for the Sockets Interface

```c
int open_clientfd(char *hostname, char *port) {
  int clientfd;
  struct addrinfo hints, *listp, *p;
  
  memset(&hints, 0, sizeof(struct addrinfo));
  hints.ai_socktype = SOCK_STREAM;
  hints.ai_flags = AI_NUMERICSERV;
  hints.ai_flags |=  AI_ADDRCONFIG;
  
  getaddrinfo(hostname, port, &hints, &lisp);
  
  for(p = listp; p; p = p->ai_next) {
    if ((clientfd = socket(p->ai_family, p->ai_socktype, p->ai_protocol)) < 0)
      continue;
    if (connect(clientfd, p->ai_addr, p->ai_addrlen) != -1)
      break;
    close(clientfd);
  }
  
  freeaddrinfo(listp);
  if(!p)
    return -1;
  else
    return clientfd;
}
```

```c
int open_listenfd(char *port) {
  struct addrinfo hints, *listp, *p;
  int listenfd, optval=1;
  
  memset(&hints, 0, sizeof(struct addrinfo));
  hints.ai_socktype = SOCK_STREAM;
  hints.ai_flags = AI_PASSIVE | AI_ADDRCONFIG; // on any IP addresses
  hint.ai_flags = AI_NUMERICSERV; // using port number
  getaddrinfo(NULL, port, &hints, &listp);
  
  for(p = listp; p; p = p->ai_next) {
    if((listenfd = socket(p->ai_family, p->ai_socktype, p->ai_protocol)) < 0)
      continue;
    
    // Eliminates "Address already in use" error from bind
    setsockopt(listenfd, SOL_SOCKET, SO_REUSEADDR, (const void *)&optval, sizeof(int));
    
    if (bind(listenfd, p->addr, p->ai_addrlen) == 0)
      break;
    
    close(listenfd);
  }
  
  freeaddrinfo(listp);
  if(!p)
    return -1;
  
  if(listen(listenfd, LISTRENQ) < 0) {
    close(listenfd);
    return -1;
  }
  return listenfd;
}
```

It is good programming practice to explicitly close any descriptors that you have opened.
