Garbage Collection in Redux Applications

January 24th, 2019 by Jan Monschke

The SoundCloud Xbox application is a regular React Redux application that runs in a native web view wrapper on Microsoft’s UWP framework. In order to allow users to play music in the background, we also ship a native C++ playback layer with the app. A thin API wrapper allows for communication between the web layer and the native layer.

Background audio is not the only reason for us to ship a native player; we also have to make sure that our application doesn’t use more than 128 MB when put into background mode. This is because applications with more than 128 MB can get killed by the Xbox application scheduler when it notices that the system or a game needs more memory. Our application shares the 128 MB with the web browser process, so the actual amount of memory someone is allowed to use is much lower than 128 MB.

The native player we’re using is the same one used in our mobile applications, which means it’s optimized for low memory consumption. Our initial tests, which we ran before we released the application, showed a low-enough memory usage of the application, and we were confident that we would not hit the maximum memory limit anytime soon.

The Memory Leak That Was Just a Cache

When we ran more memory tests later on, we noticed that our memory consumption for sessions with a use time of more than one hour was steadily growing and never shrinking. As a result, our hunt for a memory leak began. We quickly identified our Redux store as the part of our application that just kept growing, which makes complete sense, since Redux does not define explicit ways to remove parts of the state in order to save memory.

Just to be clear, I’m not blaming Redux here for clogging up our users’ memory! Memory consumption should not be your concern when you are building a web application, unless you have to work in an environment that is enforcing strict memory limits (like the Xbox application scheduler) or you are building an application that users are spending a lot of time in without reloading the page or closing the app (e.g. in the case of Electron apps).

Normalized State

Our Redux store is normalized, which means that we organize incoming data into its different type entities instead of keeping nested structures. Working with nested structures can be very tricky — e.g. when you want to propagate changes of an entity. Let’s look at an example:

{
  "playlists": [
    {
      "id": "p1",
      "tracks": [{ "id": "t1" }, { "id": "t2" }]
    },
    {
      "id": "p2",
      "tracks": [{ "id": "t1" }, { "id": "t3" }]
    }
  ]
}

The above code represents a user’s playlists. They have two playlists (p1 and p2) that each have two tracks. p1 contains tracks t1 and t2, and p2 contains tracks t1 and t3. If the user now wants to update the title of track t1, we have to iterate over all playlists and all tracks in those playlists, because track t1 might be in other playlists as well. This problem becomes even more complicated if you have other collections of tracks in your Redux store (e.g. a user’s liked tracks), because then you will have to iterate over all track collections in your store to make sure you’re updating the track in all places:

{
  "entities": {
    "playlists": {
      "p1": {},
      "p2": {}
    },
    "tracks": {
      "t1": {},
      "t2": {},
      "t3": {}
    }
  }
}

A normalized state like the one above allows for better change propagation because it a) removes duplicates, so you only ever have to change a single object, and b) merges entities from different requests into a single “entities” store so that you don’t have to worry about updating tracks from other collections. In this case, we only have to update the title of the object at entities.tracks.t1 and it will propagate correctly throughout our entire application.

Another great property of normalized state is that it becomes trivial to check if you already have a locally cached instance of, for example, a track you want to render. This can decrease the number of requests your application makes by a lot:

{
  "entities": {
    "playlists": {
      "p1": {},
      "p2": {}
    },
    "tracks": {}
  },
  "playlists": ["p1", "p2"]
}

In the example above, you can see that our playlist representation now only consists of an array of IDs that “point” to playlist entities.

If you are interested in learning more about normalization and how to add it to your application, I recommend checking out normalizr.

Garbage Collection in JavaScript

In the case of our problem, with every request we made, we were increasing the number of objects in our entities store, and we never removed objects, even if they were no longer needed. Usually in JavaScript, your data gets garbage collected once you remove all references to it (e.g. myState = null). In the case where all your data is encapsulated in your React components, garbage collection happens automatically when React unmounts your component. However, when you use Redux, the Redux store will always have a reference to all your data, so it will never get released.

Another way to free memory in JavaScript is to just reload the page every now and then when you detect that the session is already very long or when a lot of objects have accumulated in your entities store. Of course, this requires that your app be able to reload quickly so that users don’t get frustrated with your site’s performance. Additionally, you should make sure your users are not playing any media, because you really don’t want to interrupt them.

For our use case, we wanted to save memory for a background-listening experience, so we were not able to just reload the page. We came to the conclusion that we had to build a garbage collector on top of our Redux store to free memory efficiently. However, this is not the first time we’ve had to build a garbage collection algorithm into one of our applications. The main website at soundcloud.com has had a garbage collection system built into its model layer from the very beginning. But since the main site is built with Backbone.js, we could not reuse the garbage collecting implementation for the Xbox application.

Tracing Garbage Collection

When we looked at the different approaches to garbage collection, we saw a striking similarity between how tracing garbage collection works and the way Redux works in general.

Mark and sweep in a tracing garbage collector

A tracing garbage collector works in two steps (mark and sweep):

Mark — This marks objects that still have references.
Sweep — This scans all objects in memory and removes all the ones that are not marked.

If we were to implement a garbage collector in Redux this way, we would never be able to remove objects, because there would always be references to all objects. In order to make this concept work, we needed to be able to remove references, and as such, find a way to determine which parts of the state were no longer needed. If we could identify unused state, we could remove the state and all its referenced entities.

Redux provides, by design, all the tools necessary to identify used and unused state. Each component’s selector function (mapStateToProps) tells you which part of the state it needs, so we could use those to keep track of state that was no longer used. The only problem is that we didn’t have a list of all selectors currently onscreen, but that’s exactly where our garbage collector library comes in.

Our Redux Garbage Collector

The first thing our library does is keep track of selector functions that are currently onscreen. We do this by wrapping our component with a higher-order component that mimics React Redux’s connect API:

function (mapStateToProps) {
  return compose(
    connect(mapStateToProps),
    lifecycle({
      componentDidMount() {
        // Register onscreen selector
        registerSelector(mapStateToProps, this.props, this)
      },
      componentWillUnmount() {
        // Remove onscreen selector
        removeSelector(this)
      },
    })
  );
}

(I left the other parameters of connect out to shorten the example code.)

We are using Recompose to create a composed component that a) connects to the Redux store, and b) registers and removes the selector when the component mounts/unmounts:

let selectors = [];
function registerSelector(selectorFn, props, instance) {
  selectors.push({ selectorFn, props, instance });
}

In registerSelector, we add the selector to our list of onscreen selectors, along with a reference to the passed props and a reference to the component’s instance:

function removeSelector(instance) {
  selectors = selectors.filter(selector => selector.instance !== instance);
}

We need the reference to the instance of the component because we use it to remove a selector from the list. It’s important to mention that we need to check against the reference of the component and not against a reference of the selector itself. This is because we want to prevent removing a selector of the component that is going to unmount, regardless of the selector function. Another component — one that is still onscreen — might use the same selector function (by reference), but we want to keep its selector function. If we were to filter by the reference of the component’s selector function, we’d run the risk of removing selectors we still wanted to keep around.

Right now, we are keeping track of selectors of components that are onscreen. We can use this information to calculate the part of our component’s state that we can remove, but first we have to change our selectors slightly so that we can make sense of the data they return.

As described before, the bulk of our data is organized in the entities part of our Redux store, which is the part of the store we should remove unused objects from. All other parts of the store are only ID references or arrays of ID references, and they are not consuming that much data, so we can keep them.

However, one problem we’re facing now is that selector functions can return arbitrary shapes of data, and they might aggregate data in a way that does not represent the pure data of the Redux store. One example would be a component that renders the title of a track:

function (store, { trackId }) {
  return {
    title: store.entities.tracks[trackId].title
  };
}

The result of that function would be something like this:

{
  "title": "Redbone"
}

Just looking at this result, the garbage collector will not be able to determine which parts of the state are used to compute the title. In order to make sense of a selector’s result, we need it to somehow return its dependencies in addition to its regular result. This is why all garbage collectable selectors have to return data in the following shape:

{
  "result": {
    "title": "Redbone"
  },
  "dependencies": [
    {
      "entity": "track",
      "id": "t1"
    }
  ]
}

The data returned in dependencies gives us clear instructions on which type of entity and which instance of this entity this selector depends on. Now, returning this extra metadata from all of your components is pretty tedious, which is why we mainly use these selectors in specialized container components that abstract this extra overhead away for us:

<WithTrack trackId="t1">
  (track) => <TrackComponent track={track}>
</WithTrack>

In the above example, WithTrack is connected to the Redux store, and it makes sure to return the proper metadata. This removes a lot of the extra overhead created by the garbage collection library.

Taking Out the Garbage

The actual garbage collection is initiated by a regular Redux action. Let’s call it the sweep action since we are now in the sweep phase of the original algorithm:

function reducer(state, action) {
  if (action.type === SWEEP_ACTION_TYPE) {
    return sweep(state, action.selectors);
  } else {
    return combinedReducers(state, action);
  }
}

The sweep action is “intercepted” in our top-level reducer, which passes the action to our sweep reducer:

function sweep(oldState, selectors) {
  const entities = selectors
    .reduce(
      (dependencies, { selector, props }) =>
        dependencies.concat(selector(props).dependencies),
      []
    )
    .reduce(
      (newState, { entity, id }) =>
        (newState.entities[entity][id] = oldState.entities[entity][id]),
      {}
    );

  return {
    ...oldState,
    entities
  };
}

There are essentially three things happening in our sweep reducer.

We create a list of all dependencies by iterating over and evaluating all selectors. Here we make use of the stored props as well, because we need to pass them to each selector function in order to get the correct dependencies. When we’re done, we end up with an array that contains the dependencies of all components that are onscreen right now.
We start off with an empty object and then copy over each required entity from the oldState to the newState by iterating over the dependencies. The result of this is a fresh entities object that only consists of entities that are required by mounted components. This essentially performs the sweep we know from the algorithm, but instead of removing objects from the state, we actually started with a clean state and then added objects to it. Instead of mark & sweep, we basically execute a mark & keep.
We return the garbage collected state with the slimmed down entities object. Our store now only contains the objects that are necessary to render the current screen.

Determining when is best to start the garbage collection depends on the nature of your application. Some applications might just do it at a certain interval, some might want to check if they are using a lot of memory, and others might just execute every time the user is moved to a different major screen.

Performance

We have not observed a slowdown of our application due to our garbage collection process. Our initial thoughts were that components might rerender after garbage collection, but because the state references in the components are the same before and after garbage collection, components do not rerender. The only thing that differentiates garbage collection from any other Redux action is that it will evaluate all the selector functions on top of all reducers. If you keep your selector functions pure, you should not see a performance hit.

You might be thinking to yourself: “Wow, this would really make my application so much more efficient. I will add it right away!!!” Please, before you consider adding a garbage collection process to your application, check if you actually have problems with too-high memory usage. I would argue that 99 percent of Redux applications don’t require a custom garbage collection process, as they’re either too shortlived or they don’t have strict memory restrictions.

This system is working very well for us and we will keep using it on new applications when needed. We would love to know what you think about our approach if you have dealt with similar problems :)