Fundamental Types

The rest of this chapter covers Rust’s types from the bottom up, starting with simple numeric types like integers and floating-point values, then moving on to types that hold more data: boxes, tuples, arrays, and strings.

Fixed-Width Numeric Types

Integer Types

u8: Rust uses the u8 type for byte values. For example, reading data from a binary file or socket yields a stream of u8 values.

char: Characters in Rust are 32 bits long.

Integer Literals:

  • Integer literals in Rust can take a suffix indicating their type: 42u8 is a u8 value, and 1729isize is an isize.

  • In the end, if multiple types could work, Rust defaults to i32 if that is among the possibilities.

  • The prefixes 0x, 0o, and 0b designate hexadecimal, octal, and binary literals.
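The literal rules above can be sketched as a few assertions. This is a minimal illustration; the function name integer_literal_examples is made up for this note.

```rust
// Illustrative function name; only the body matters.
fn integer_literal_examples() {
    // A suffix fixes the literal's type.
    let a = 42u8;
    let b = 1729isize;
    assert_eq!(std::mem::size_of_val(&a), 1);
    assert_eq!(std::mem::size_of_val(&b), std::mem::size_of::<isize>());

    // With no other constraint, an integer literal defaults to i32 (4 bytes).
    let c = 7;
    assert_eq!(std::mem::size_of_val(&c), 4);

    // Hexadecimal, octal, and binary prefixes.
    assert_eq!(0xff, 255);
    assert_eq!(0o17, 15);
    assert_eq!(0b1010, 10);
}
```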

Byte Literals:

let c1: u8 = b'A';
let c2: u8 = b'\\';
let c3: u8 = b'\x1b'; // The byte-literal form emphasizes that c3 is an ASCII code (escape, 27).

Type Casts: cast based on raw bits:

// Conversions that are out of range for the destination
// produce values that are equivalent to the original modulo 2^N, 
// where N is the width of the destination in bits. This
// is sometimes called "truncation."
assert_eq!( 1000_i16 as u8, 232_u8);
assert_eq!(65535_u32 as i16, -1_i16);
assert_eq!( -1_i8 as u8, 255_u8);
assert_eq!( 255_u8 as i8, -1_i8);

Overflow: defined behavior (in C/C++, signed integer overflow is undefined behavior)

  • In a debug build, Rust panics.

  • In a release build, the operation wraps around: it produces the value equivalent to the mathematically correct result modulo the range of the value.

Different Mechanisms for Handling Overflows:
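The integer types offer four families of methods for choosing the overflow behavior explicitly, regardless of build mode: checked, wrapping, saturating, and overflowing. A minimal sketch using u8 and i8 (the function name is illustrative only):

```rust
// Illustrative function name; only the body matters.
fn overflow_handling_examples() {
    // checked_*: returns None if the result does not fit.
    assert_eq!(250_u8.checked_add(10), None);
    assert_eq!(250_u8.checked_add(5), Some(255));

    // wrapping_*: returns the result modulo 2^N (260 mod 256 = 4).
    assert_eq!(250_u8.wrapping_add(10), 4);

    // saturating_*: clamps to the nearest representable value.
    assert_eq!(250_u8.saturating_add(10), 255);
    assert_eq!((-128_i8).saturating_sub(1), -128);

    // overflowing_*: returns the wrapped result plus an overflow flag.
    assert_eq!(250_u8.overflowing_add(10), (4, true));
    assert_eq!(250_u8.overflowing_add(5), (255, false));
}
```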

Floating-Point Types

Type Inference: If Rust finds that either floating-point type could fit a variable without type specified, it chooses f64 by default.

Floating Constants: The types f32 and f64 have associated constants for the IEEE-required special values like INFINITY, NEG_INFINITY (negative infinity), NAN (the not-a-number value), and MIN and MAX (the smallest and largest finite values).

The std::f32::consts and std::f64::consts modules provide various commonly used mathematical constants like E, PI, and the square root of two.
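A few assertions illustrate the associated constants and the consts modules. This is a sketch; the function name is made up for this note.

```rust
use std::f64::consts::{E, PI, SQRT_2};

// Illustrative function name; only the body matters.
fn float_constant_examples() {
    // Special values.
    assert!(f64::NAN.is_nan());
    assert_eq!(f64::NEG_INFINITY, -f64::INFINITY);
    assert!((-1.0_f64 / f64::INFINITY).is_sign_negative()); // negative zero

    // MIN is the most negative finite value, MAX the largest.
    assert!(f64::MAX.is_finite());
    assert_eq!(-f32::MIN, f32::MAX);

    // Mathematical constants, checked up to rounding error.
    assert!(((PI / 2.0).sin() - 1.0).abs() < 1e-12);
    assert!((E.ln() - 1.0).abs() < 1e-12);
    assert!((SQRT_2 * SQRT_2 - 2.0).abs() < 1e-12);
}
```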

Conversions: Unlike C and C++, Rust performs almost no numeric conversions implicitly. But you can always write out explicit conversions using the as operator: i as f64, or x as i32.

The bool Type

Conversion to Integer: Rust’s as operator can convert bool values to integer types, but it does not convert integers to bool; write an explicit comparison such as x != 0 instead.
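A short sketch of the bool conversion (the function name is illustrative):

```rust
// Illustrative function name; only the body matters.
fn bool_cast_examples() {
    assert_eq!(false as i32, 0);
    assert_eq!(true as i32, 1);

    // Going the other way requires an explicit comparison.
    let x = 5;
    let nonzero: bool = x != 0;
    assert!(nonzero);
}
```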

Characters

Rust’s character type char represents a single Unicode character, as a 32-bit value. You can write any Unicode character as '\u{HHHHHH}', where HHHHHH is a hexadecimal number up to six digits long.

Conversion:

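  • The as operator converts a char to any integer type. For target types smaller than 32 bits, the upper bits of the character’s value are truncated.

  • u8 is the only type the as operator will convert to char. Every integer type other than u8 includes values that are not permitted Unicode code points. For other integers, use std::char::from_u32, which returns an Option<char>.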
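The conversion rules above can be checked with a few assertions. A minimal sketch; the function name is illustrative.

```rust
// Illustrative function name; only the body matters.
fn char_cast_examples() {
    // char to integer: the code point, truncated if the target is narrow.
    assert_eq!('*' as i32, 42);
    assert_eq!('ಠ' as u16, 0xca0);
    assert_eq!('ಠ' as i8, -0x60); // U+0CA0 truncated to eight bits, signed

    // u8 to char is the only integer-to-char cast `as` allows.
    assert_eq!(97_u8 as char, 'a');

    // Wider integers go through a checked conversion.
    assert_eq!(std::char::from_u32(0x2764), Some('❤'));
    assert_eq!(std::char::from_u32(0xD800), None); // surrogate, not a valid char

    // The \u{...} escape names a character by code point.
    assert_eq!('\u{CA0}', 'ಠ');
}
```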

Tuples

Indices: Tuples allow only constants as indices, like t.4. You can’t write t.i or t[i] to get the ith element.

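Zero-tuple: The type () has exactly one value, also written (). It is called the unit type, and a function with no declared return type returns it.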
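A sketch of constant indexing, destructuring, and the zero-tuple. The helpers min_max and do_nothing are hypothetical names introduced only for this example.

```rust
// Hypothetical helper: returns the smaller and larger of two values as a pair.
fn min_max(a: i32, b: i32) -> (i32, i32) {
    if a <= b { (a, b) } else { (b, a) }
}

// Hypothetical helper: with no declared return type, it returns ().
fn do_nothing() {}

// Illustrative function name; only the body matters.
fn tuple_examples() {
    let t = ("Brazil", 1985);
    assert_eq!(t.0, "Brazil"); // indices must be constants
    assert_eq!(t.1, 1985);

    // Tuples are often unpacked with a pattern instead of indexed.
    let (lo, hi) = min_max(9, 3);
    assert_eq!((lo, hi), (3, 9));

    let unit: () = do_nothing();
    assert_eq!(unit, ());
}
```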

Pointer Types

Rust is designed to help keep allocations to a minimum.

The value ((0, 0), (1440, 900)) is stored as four adjacent integers. If you store it in a local variable, you’ve got a local variable four integers wide. Nothing is allocated in the heap.

Three pointer types: references, boxes, and unsafe pointers.

References

The expression &x borrows a reference to x. Given a reference r, the expression *r refers to the value r points to. Like a C pointer, a reference does not automatically free any resources when it goes out of scope.


Unlike C:

  • Rust references are never null.

  • Rust tracks the ownership and lifetimes of values, so mistakes like dangling pointers, double frees, and pointer invalidation are ruled out at compile time.


Two Flavors of References:

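  • &T: an immutable, shared reference, as with const T* in C. You can have many shared references to a value at once, but none of them can modify it.

  • &mut T: a mutable, exclusive reference, as with T* in C. As long as the reference exists, you may not have any other references of any kind to that value.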
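The two flavors can be sketched as follows (the function name is illustrative):

```rust
// Illustrative function name; only the body matters.
fn reference_examples() {
    // Any number of shared references may coexist.
    let x = 10;
    let r1 = &x;
    let r2 = &x;
    assert_eq!(*r1 + *r2, 20);

    // A mutable reference is exclusive for as long as it is in use.
    let mut y = 32;
    {
        let m = &mut y;
        *m += 32;
    } // m is gone here, so y can be used directly again
    assert_eq!(y, 64);
}
```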

Boxes

The simplest way to allocate a value in the heap is to use Box::new.

When a Box named b goes out of scope, the memory is freed immediately, unless b has been moved (by returning it, for example).
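A minimal sketch of allocating with Box::new and moving the box out of a function. The helper make_boxed_pair is a hypothetical name used only here.

```rust
// Hypothetical helper: allocates a tuple on the heap and hands ownership
// of the box to the caller, so nothing is freed inside this function.
fn make_boxed_pair() -> Box<(i32, &'static str)> {
    let t = (12, "eggs");
    let b = Box::new(t); // heap allocation
    b // moved to the caller
}

// Illustrative function name; only the body matters.
fn box_examples() {
    let b = make_boxed_pair();
    assert_eq!(b.0, 12);
    assert_eq!(b.1, "eggs");
} // b goes out of scope here and the heap memory is freed
```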

Raw Pointers

Rust also has the raw pointer types *mut T and *const T.
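Creating a raw pointer is safe, but dereferencing one is only allowed inside an unsafe block, and unlike references, raw pointers may be null. A small sketch (the function name is illustrative):

```rust
// Illustrative function name; only the body matters.
fn raw_pointer_examples() {
    let mut x = 10;
    let p: *mut i32 = &mut x; // creating the pointer is safe

    // Dereferencing requires unsafe: the compiler no longer checks validity.
    let value = unsafe {
        *p = 11;
        *p
    };
    assert_eq!(value, 11);
    assert_eq!(x, 11);

    // Raw pointers can be null.
    let q: *const i32 = std::ptr::null();
    assert!(q.is_null());
}
```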

Arrays, Vectors, and Slices

  • The type [T; N] represents an array of N values, each of type T. An array’s size is a constant determined at compile time and cannot be changed.

  • The type Vec<T>, called a vector of Ts, is a dynamically allocated, growable sequence of values of type T. A vector’s elements live on the heap, so you can resize vectors at will.

  • The types &[T] and &mut [T], called a shared slice of Ts and mutable slice of Ts, are references to a series of elements that are a part of some other value, like an array or vector. A mutable slice &mut [T] lets you read and modify elements, but can’t be shared; a shared slice &[T] lets you share access among several readers, but doesn’t let you modify elements.

Arrays

The useful methods you’d like to see on arrays—iterating over elements, searching, sorting, filling, filtering, and so on—are all provided as methods on slices, not arrays. But Rust implicitly converts a reference to an array to a slice when searching for methods, so you can call any slice method on an array directly.
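For instance, slice methods such as sort, contains, and iter can be called directly on an array. A minimal sketch with an illustrative function name:

```rust
// Illustrative function name; only the body matters.
fn array_examples() {
    let mut chaos = [3, 5, 4, 1, 2];
    chaos.sort(); // a slice method, called on an array
    assert_eq!(chaos, [1, 2, 3, 4, 5]);
    assert_eq!(chaos.len(), 5);
    assert!(chaos.contains(&4));

    let sum: i32 = chaos.iter().sum();
    assert_eq!(sum, 15);

    // [V; N] builds an array of N copies of V.
    let zeros = [0u8; 4];
    assert_eq!(zeros, [0, 0, 0, 0]);
}
```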

Vectors

A Vec<T> consists of three values:

  • A pointer to the heap-allocated buffer for the elements, which is created and owned by the Vec<T>

  • The number of elements that buffer has the capacity to store

  • The number it actually contains now (in other words, its length).


If you know in advance how many elements a vector will need, you can call Vec::with_capacity instead of Vec::new to create a vector with a buffer large enough to hold them all from the start. This avoids repeated reallocation as the vector grows.


The pop method removes the last element and returns an Option<T>: Some(x) if the vector was not empty, None otherwise.
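Length, capacity, and pop can be sketched as follows (the function name is illustrative):

```rust
// Illustrative function name; only the body matters.
fn vector_examples() {
    // Reserve room up front; the vector is still empty.
    let mut v: Vec<i32> = Vec::with_capacity(4);
    assert_eq!(v.len(), 0);
    assert!(v.capacity() >= 4);

    for i in 0..4 {
        v.push(i * 10);
    }
    assert_eq!(v, vec![0, 10, 20, 30]);
    assert_eq!(v.len(), 4);

    // pop returns Some(last) or None if the vector is empty.
    assert_eq!(v.pop(), Some(30));
    assert_eq!(v.len(), 3);

    let mut empty: Vec<i32> = Vec::new();
    assert_eq!(empty.pop(), None);
}
```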

Slices

Slices are always passed by reference. A reference to a slice is a fat pointer: a two-word value comprising a pointer to the slice’s first element, and the number of elements in the slice.


Because references to both arrays and vectors convert to slice references, a function that takes a slice reference can operate on either.

[!important]

A reference to a vector is intrinsically different from a reference to a slice, though Rust can implicitly convert a reference to a vector to a reference to a slice.
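A minimal sketch of one function serving vectors, arrays, and sub-ranges alike. The helper name sum is hypothetical.

```rust
// Hypothetical helper: sums any run of f64 values, wherever they live.
fn sum(values: &[f64]) -> f64 {
    values.iter().sum()
}

// Illustrative function name; only the body matters.
fn slice_examples() {
    let v: Vec<f64> = vec![1.0, 2.0, 3.0];
    let a: [f64; 2] = [0.5, 0.5];

    assert_eq!(sum(&v), 6.0); // &Vec<f64> converts to &[f64]
    assert_eq!(sum(&a), 1.0); // &[f64; 2] converts to &[f64]
    assert_eq!(sum(&v[1..]), 5.0); // a slice of part of the vector
}
```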


String Types

In C++, there are two types of strings: const char * and std::string. Rust has a similar split between the borrowed &str and the owned String.

String Literals

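A string literal may span multiple lines. The newline and any leading whitespace on the continuation line are included in the string, unless the line ends with a backslash, in which case both are dropped.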


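No escape sequences are recognized in raw strings, which are written with an r prefix, as in r"C:\Program Files".

To include a double quote in a raw string, add matching pound signs around the delimiters, as in r#"..."#.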
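The literal forms described in this section can be sketched as follows. The sample strings and the function name are invented for illustration.

```rust
// Illustrative function name; only the body matters.
fn string_literal_examples() {
    // Ordinary literal with escapes.
    let quoted = "\"Ouch!\" said the well.\n";
    assert!(quoted.starts_with('"'));
    assert!(quoted.ends_with('\n'));

    // A literal that spans lines keeps the newline and the indentation.
    let two_lines = "first
    second";
    assert_eq!(two_lines.lines().count(), 2);
    assert!(two_lines.lines().nth(1).unwrap().starts_with(' '));

    // A trailing backslash drops the newline and the leading whitespace.
    let one_line = "first \
                    second";
    assert_eq!(one_line, "first second");

    // Raw string: backslashes are ordinary characters.
    let path = r"C:\Program Files\Gorillas";
    assert_eq!(path, "C:\\Program Files\\Gorillas");

    // Raw string with pound signs, so it can contain double quotes.
    let said = r#"She said "hi"."#;
    assert_eq!(said, "She said \"hi\".");
}
```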

Byte Strings

A string literal with the b prefix is a byte string. It is neither a String nor a &str, because its contents are not required to be UTF-8. Such a string is a reference to an array of u8 values (bytes) rather than Unicode text; for example, b"GET" has type &[u8; 3].

Raw byte strings start with br".
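A short sketch of byte strings and raw byte strings (the function name is illustrative):

```rust
// Illustrative function name; only the body matters.
fn byte_string_examples() {
    // b"GET" has type &[u8; 3].
    let method = b"GET";
    assert_eq!(method, &[b'G', b'E', b'T']);
    assert_eq!(method.len(), 3);

    // Byte strings may hold bytes that are not valid UTF-8.
    let bytes = b"\xff\xfe";
    assert_eq!(bytes[0], 255);

    // Raw byte string: the backslash is just a byte.
    let raw = br"\n";
    assert_eq!(raw, &[b'\\', b'n']);
}
```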

Strings in Memory

Rust strings are sequences of Unicode characters, stored using UTF-8. Each ASCII character in a string is stored in one byte. Other characters take up multiple bytes.

A String has a resizable buffer holding UTF-8 text while a &str (pronounced “stir” or “string slice”) is a reference to a run of UTF-8 text owned by someone else: it “borrows” the text.

A String or &str’s .len() method returns its length. The length is measured in bytes, not characters.
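The difference between byte length and character count shows up as soon as a string contains non-ASCII text. A minimal sketch with an illustrative function name:

```rust
// Illustrative function name; only the body matters.
fn string_length_examples() {
    // 'ಠ' takes three bytes in UTF-8, '_' takes one.
    assert_eq!("ಠ_ಠ".len(), 7);
    assert_eq!("ಠ_ಠ".chars().count(), 3);

    // A String owns its buffer; a &str borrows part of it.
    let owned: String = "hello".to_string();
    let borrowed: &str = &owned[1..4];
    assert_eq!(borrowed, "ell");
    assert_eq!(owned.len(), 5);
}
```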

It is impossible to modify a &str. For creating new strings at run time, use String.

The type &mut str does exist, but it is not very useful, since almost any operation on UTF-8 can change its overall byte length, and a slice cannot reallocate its referent. In practice, the main in-place operations available on &mut str are make_ascii_uppercase and make_ascii_lowercase, which modify the text in place and affect only single-byte characters, by definition.

Create Strings

  • .to_string() converts a &str to a String

  • format!() macro works just like println!(), except that it returns a new String

  • Arrays, slices, and vectors of strings have two methods, .concat() and .join(sep), that form a new String from many strings.
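The three approaches above can be sketched as follows (the function name and sample text are illustrative):

```rust
// Illustrative function name; only the body matters.
fn string_creation_examples() {
    // &str to String.
    let s: String = "too many pets".to_string();
    assert_eq!(s, "too many pets");

    // format! builds a String instead of printing.
    let coords = format!("{}°{:02}′{:02}″N", 24, 5, 23);
    assert_eq!(coords, "24°05′23″N");

    // concat and join on a collection of strings.
    let bits = vec!["veni", "vidi", "vici"];
    assert_eq!(bits.concat(), "venividivici");
    assert_eq!(bits.join(", "), "veni, vidi, vici");
}
```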

Other String-Like Types

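Rust guarantees that String and &str hold valid UTF-8, but programs sometimes have to handle text that is not valid Unicode, such as filenames, binary data, and strings supplied by the operating system. Rust’s solution is to offer a few string-like types for these situations: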

  • Stick to String and &str for Unicode text.

  • When working with filenames, use std::path::PathBuf and &Path instead.

  • When working with binary data that isn’t UTF-8 encoded at all, use Vec<u8> and &[u8].

  • When working with environment variable names and command-line arguments in the native form presented by the operating system, use OsString and &OsStr.

  • When interoperating with C libraries that use null-terminated strings, use std::ffi::CString and &CStr.

Type Aliases
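The type keyword declares a new name for an existing type, much like typedef in C++. A minimal sketch; the alias Bytes and the helper total are illustrative names, not part of any library.

```rust
// Illustrative alias: Bytes is another name for Vec<u8>, not a new type.
type Bytes = Vec<u8>;

// Hypothetical helper that takes the alias; a plain Vec<u8> works too.
fn total(data: &Bytes) -> u32 {
    data.iter().map(|&b| u32::from(b)).sum()
}
```

Because an alias does not create a distinct type, a value declared as Vec<u8> can be passed wherever Bytes is expected.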
