Darkfoe's Blog

Had quite the time recently trying to make an API that feeds GeoJSON polygon data into a React Native MapView. Just a handful of calls was enough to melt the service and lock up pods in k8s, but the Postgres database was not really taking a beating, which was weird.

A little bit of digging with logging and I was able to see that assembling the data points for the polygons into structs was destroying performance somewhere inside the handler. A little more digging with a pprof run confirmed it - 30% of our execution time, with only one request in flight, was spent on memory allocations, particularly where we assemble the features, which may have lat/long arrays as large as a few hundred points.
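I won't reproduce the full profiling setup here, but a minimal way to get this kind of number is to hang the stock pprof handlers off a side port next to the real API. The port and the placeholder main below are just an example sketch, not the app's actual wiring:

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // side-effect import: registers /debug/pprof/* on the default mux
)

func main() {
	// Serve the profiling endpoints on a local side port next to the real API.
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	select {} // placeholder for starting the actual gin router in the real service
}

Then, while replaying a request against the API, go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30 shows where CPU time goes, and go tool pprof http://localhost:6060/debug/pprof/allocs shows where the allocations are coming from.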

So, a sanitized snippet of the old code:

var features []models.GeoJSONFeature
for rows.Next() {
	var result models.Response
	if err := rows.Scan(&result.ID, &result.ObjectID, &result.Holder, &result.ShapeArea, &result.ShapeLen, &result.Geom); err != nil {
		c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to scan row"})
		fmt.Println(err)
		return
	}

	// Create a GeoJSON feature for each result
	feature := models.GeoJSONFeature{
		Type:     "Feature",
		Geometry: json.RawMessage(result.Geom), // Use the GeoJSON geometry directly
		ID:       result.ID,
	}
	features = append(features, feature)
}

// Construct the GeoJSON response
geoJSONResponse := models.GeoJSONResponse{
	Type:     "FeatureCollection",
	Features: features,
}

This part right here in particular is the murder scene:

// Create a GeoJSON feature for each result
feature := models.GeoJSONFeature{
	Type:     "Feature",
	Geometry: json.RawMessage(result.Geom), // Use the GeoJSON geometry directly
	ID:       result.ID,
}
features = append(features, feature)

In a nutshell, what is going on here is that we are creating a brand new GeoJSONFeature struct for each row in the database, then appending it to the slice, which generates yet another copy. The original struct becomes garbage the moment the iteration ends, so the garbage collector is left cleaning up one short-lived allocation per row. It's quite aggressive.
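If you want to see that churn for yourself, a rough benchmark along these lines will show it. The GeoJSONFeature here is a cut-down stand-in for models.GeoJSONFeature, and the geometry string is a made-up placeholder for what gets scanned out of Postgres; run it with go test -bench . and watch the allocs/op column climb:

package geojson_test

import (
	"encoding/json"
	"testing"
)

// Cut-down stand-in for models.GeoJSONFeature.
type GeoJSONFeature struct {
	Type     string
	Geometry json.RawMessage
	ID       int
}

// Stand-in for the polygon JSON we scan out of Postgres.
var fakeGeom = `{"type":"Polygon","coordinates":[[[0,0],[1,0],[1,1],[0,0]]]}`

func BenchmarkNaiveAppend(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		var features []GeoJSONFeature
		for row := 0; row < 500; row++ {
			features = append(features, GeoJSONFeature{
				Type:     "Feature",
				Geometry: json.RawMessage(fakeGeom), // string -> []byte conversion copies the geometry on every row
				ID:       row,
			})
		}
		_ = features
	}
}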

So here comes sync.Pool to save the day.

var geoJSONFeaturePool = sync.Pool{
	New: func() interface{} {
		return &models.GeoJSONFeature{}
	},
}
var features []models.GeoJSONFeature        // Slice to hold GeoJSON features
var pooledFeatures []*models.GeoJSONFeature // Track features to return to the pool

for rows.Next() {
	var result models.Response
	if err := rows.Scan(&result.ID, &result.ObjectID, &result.Holder, &result.ShapeArea, &result.ShapeLen, &result.Geom); err != nil {
		c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to scan row"})
		fmt.Println(err)
		return
	}

	// Get a GeoJSONFeature from the pool
	feature := geoJSONFeaturePool.Get().(*models.GeoJSONFeature)
	feature.Type = "Feature"
	feature.Geometry = json.RawMessage(result.Geom)
	feature.ID = result.ID

	// Append the feature to the slice
	features = append(features, *feature)

	// Track the feature to return it to the pool later
	pooledFeatures = append(pooledFeatures, feature)
}

// Return all pooled features to the pool after use
for _, feature := range pooledFeatures {
	geoJSONFeaturePool.Put(feature)
}

// Construct the GeoJSON response
geoJSONResponse := models.GeoJSONResponse{
	Type:     "FeatureCollection",
	Features: features,
}

This change pretty much immediately fixed the performance of the API. It went from using up to a gig of RAM for three pods with just a handful of requests, to a max of about 50 MB of RAM for requests returning a few megabytes of data.

But how does it work? The gist of it is that we are creating a pool of objects and reusing them instead of allocating new ones, which cuts down the number of memory allocations and the garbage left behind for the collector. So instead of allocating a struct wrapping a several-hundred-point geometry, copying it, destroying it, and doing that over and over again, the runtime has objects it can reuse - we simply overwrite the data in them and don't have to malloc a new one every single time.
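If the pattern still feels abstract, here is the same Get / overwrite / Put cycle boiled down to a standalone snippet. The Feature struct here is a toy, not the real model:

package main

import (
	"fmt"
	"sync"
)

type Feature struct {
	Type string
	ID   int
}

var featurePool = sync.Pool{
	// New only runs when the pool has nothing left to hand out.
	New: func() interface{} {
		return &Feature{}
	},
}

func main() {
	// Get hands back a recycled *Feature, or a fresh one from New.
	f := featurePool.Get().(*Feature)

	// Overwrite every field you rely on - pooled objects keep their old values.
	f.Type = "Feature"
	f.ID = 42
	fmt.Println(f.Type, f.ID)

	// Put returns the object so the next Get can reuse it instead of allocating.
	featurePool.Put(f)
}

One thing worth knowing: the runtime is free to drop pooled objects during a GC, so sync.Pool is a way of easing allocation pressure, not a guaranteed cache.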