golang

Data Structures

Arrays

Mechanical Sympathy

Relative speeds of caches on a 3GHz 4 core CPU

  • L1 cache (64kb cache): 16 instructions or 1.3 ns
  • L2 cache (256kb cache): 44 instructions or 3.6 ns
  • L3 cache (8MB): 153 instructions or 13 ns
    • Cache for all cores.
    • On intel, usually data on L1 and L2 is copied on L3.
  • Main memory: 428 instructions or 35.6 ns (27 times slower)

Computer breaks up memory into cache lines = 64 bytes (or can change based on computer)

Data from main memory is moved into L1 using cache lines. Hardware decides where the cache lines live, either on L1 or L2 or L3.

Preferred: The cache line is already in L1 or L2 when we need it instead of moving it every time from main memory.

  • Write code predictable access patterns to memory.
  • Write code which makes data access more efficient.

TLB miss affect performance quite a bit. TLB caches are very important.

Slices

Slices are dynamic arrays, without a fixed length. They use arrays under the hood.

Slices use 3 words: 1 is a pointer to the backing array that stores the elements, 1 is the length of the array, 1 is the capacity of the array.

Appending to a slice uses value semantics, a new slice is returned, if the capacity is reached, it creates a new backing array double the capacity and points to it.

Append only doubles the backing array when it gets to 1024 capacity. After that, it increases by 25%.

If we know the capacity , we can preset it using the make function.

Taking a slice of a slice, results in a new slice but uses the same backing array. This can be dangerous as both the slices can modify the backing array and the changes will be seen by all slices.

Using a 3 notation slice [a, b, c], sets the capacity to c, on appending, if length = capacity c, a new backing array is created.

Ideally, use copy function to copy the slice backing array to a new backing array to isolate changes and avoid side effects.

append is generally a smell, because there is a chance if length = capacity, the array is doubled and a new backing array is created. All the pointers to the previous array (other pointers) will be invalid and it creates a memory leak. Hence append can cause side affects unless we are very careful or know what we are dealing with.

More about strings

Go uses UTF-8, and the order of storage is byte > code point > character

A code point is basically the number of bytes used to represent 1 UTF-8 character. Example, for english letters, code point = 1 byte but for chinese characters, it can be 3 bytes. When we iterate over a string, if it contains 1 chinese character and 1 english letter, the underlying array is of length 4 and when we iterate using index and value semantics, index will be 0 and then 3 in the 2nd iteration.