Home   About Me   Blog  

Go GC and maps with pointers.(25 Jan 2018)

The other day I read a blog post titled GC is bad and you should feel bad by @philpearl. It's a well written blogpost and I would welcome you to go read it.
In it he says:
In our ML fraud scoring pipeline we have a huge (>5GB) table of pre-calculated probabilities that we kept in Go map[string]X in RAM.
Recently we’d seen high GC pause times, corresponding with high 95th percentile response time peaks on our API.
Once we’d moved the allocations off-heap into a custom hash-map backed by mmap allocated memory,.., the GC pauses practically disappeared.


Soon after, I saw a tweet by Damian Gryski, (if you are on twitter and love Go, then I would highly recommend you follow Damian)
In the tweet, he suggested that a minimum perfect hash function can be created to be used in place of the string keys to the map thus reducing GC times.
I was curious as to how using a hash would help to relieve the pressure on the garbage collector.
So, I set out to profile the GC times; first when you have a map whose keys are strings and second when you have a map whose keys are strings that have been hashed into ints.

The program I used to profile that is:


package main
import (
    "strconv"
    "fmt"
    "os"
    "runtime"
    "time"
)
// run this program as:
/*
#!/bin/bash
for t in 1 2; do go run gctest.go $t; done
*/          

func timeGC() time.Duration {
    start := time.Now()
    runtime.GC()
    return time.Since(start)
}

func main() {
    const N = 30e6
    if len(os.Args) != 2 {
        fmt.Printf("usage: %s [1 2]\n(number selects the test)\n", os.Args[0])
        return
    }
    switch os.Args[1] {
    case "1":
        // create a big map of strings. since strings contain pointers, 
        // we expect this to have more pressure on GC.
        m := make(map[string]string)
        for i := 0; i < N; i++ {
            n := strconv.Itoa(i)
            m[n] = n
        }
        runtime.GC()
        fmt.Printf("With a map of strings, GC took: %s\n", timeGC())
        _ = m["0"]
    case "2":
        // create a map of ints. We want to store strings in the map but unlike in case 1, 
        // we will hash the strings to ints (which have no pointer)
        // so we expect less pressure on GC. We are using  strconv.Atoi(str) 
        // as a stand-in for a proper hash function
        m := make(map[int]int)
        for i := 0; i < N; i++ {
            str := strconv.Itoa(i)
            // hash string to int
            n,_ := strconv.Atoi(str)
            m[n] = n
        }
        runtime.GC()
        fmt.Printf("With map of strings(that are hashed to ints), GC took:: %s\n", timeGC())
        _ = m[0]
    }
}
                 


On running it, the results were;
With a map of strings, GC took:: 775.280442ms
With map of strings(that are hashed to ints), GC took:: 19.086413ms


For the map whose keys are strings(ie *pointers) the Garbage collector ran for 775.2ms on my machine, while for the other map whose keys are strings that have been hashed into ints the GC ran for 19ms.
NB: Note that I used strconv.Atoi(str) in place of a proper string to integer hash function. In practice, you would be using a proper hash implementation

That contrast is much more clear when visualized, So I used Dave Cheney's gcvis program to visualize the programs GC trace data.
The results are as shown below:

GC profile of a creating a large Go map whose keys are strings;
map with string keys GC profile
GC profile of a creating a large Go map whose keys are ints(ie keys are strings that are hashed into ints);
map with string keys hashed to ints GC profile

In conclusion;
(a) The Go Garbage collector is designed to minimise latencies.
(b) And it seems like one of the things that can cause the Garbage collector to have long pause times are large maps that have strings(any pointer really) as keys.
(c) And it seems like one of the ways to solve the issue in (b) above is to hash the strings into ints and then using the ints as keys, (You would of course have methods to Query the map using your earlier strings)
I'm left wondering, shouldn't the Go runtime then not do it for me/you by default? Anytime the runtime sees a map that is larger than X and the keys to that map contains pointers, the runtime inlines(for lack of a better term) that map into one whose keys are ints(or whatever) and does the string->ints hashing for you without you even noticing.
Maybe it is just my naivety...