Crave For Games

The text below is one of the notes from the StoneDrop development log. If you want a list of only technical articles you can go to the Articles page.
Optimizations in INSIDE

Key points from Playdead's talk on the optimization techniques and tools used in INSIDE (powered by Unity).

Video 1. Tools, Tricks and Technologies for Reaching Stutter Free 60 FPS in INSIDE. November 2016, Unite


- the scrolling world of INSIDE is organized as a hierarchy of sub-scenes (layers): areas, then puzzles (several per area, each 1-3 minutes of gameplay or ~100 meters), and for each puzzle: gameplay (mutable) + environment + backdrop (immutable)

- only the mutable scenes are reloaded on respawn

- scenes inside the view frustum form the zone of active scenes; scenes within some distance of it form the prepare zone and are loaded; all other scenes are unloaded

- a visibility set is determined (both automatically and manually) for each camera "window"; only potentially visible scenes are loaded and activated

- each scene is voxelized at build time, and the voxels that contain active objects are serialized; at runtime this data is used to narrow down the number of active scenes by testing the active voxels against the camera frustum

- each object has a boundsculler: a bounds with a set of facing planes combined with AND or OR logic; the object is active only if the bounds are inside the frustum and the front side of the plane(s) faces the camera

- boundscullers are used to subdivide scenes (by walls or floor/ceiling)

- to time-slice initialization, the concept of PreAwake was introduced: at build time all objects with PreAwake are cached in the scene, and after loading, one step of IEnumerator PreAwake() is called on one object per frame until all steps of all objects are completed; Awake should contain only very fast code to avoid stutters on scene loading

- game start is used to pre-warm shaders and to run other global time-consuming operations

- scene unloading is a time-sliced, bottom-up destruction of objects

- a scheduler with prioritized requests is implemented to distribute the workload evenly across multiple frames.
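
The boundsculler rule above could be sketched roughly like this. It is plain C# on top of System.Numerics, not Playdead's code: the names BoundsCuller, FacingMode and IsActive are made up for illustration, the frustum is assumed to be passed as planes whose normals point into the visible volume, and the bounds test is a conservative "not fully outside" check, which may differ from the test used in the game.

```csharp
using System.Numerics;

public enum FacingMode { All, Any } // AND / OR over the facing planes

// Hypothetical sketch: axis-aligned bounds plus a set of facing planes.
public sealed class BoundsCuller
{
    public Vector3 Min, Max;
    public Plane[] FacingPlanes = new Plane[0];
    public FacingMode Mode = FacingMode.All;

    // frustum: planes whose normals point into the visible volume (assumption).
    public bool IsActive(Plane[] frustum, Vector3 cameraPosition)
    {
        return TouchesFrustum(frustum) && FacesCamera(cameraPosition);
    }

    private bool TouchesFrustum(Plane[] frustum)
    {
        for (int i = 0; i < frustum.Length; i++)
        {
            Vector3 n = frustum[i].Normal;
            // Pick the box corner that lies farthest along the plane normal.
            var corner = new Vector3(
                n.X >= 0f ? Max.X : Min.X,
                n.Y >= 0f ? Max.Y : Min.Y,
                n.Z >= 0f ? Max.Z : Min.Z);
            // If even that corner is behind the plane, the box is fully outside.
            if (Plane.DotCoordinate(frustum[i], corner) < 0f) return false;
        }
        return true;
    }

    private bool FacesCamera(Vector3 cameraPosition)
    {
        if (FacingPlanes.Length == 0) return true;
        for (int i = 0; i < FacingPlanes.Length; i++)
        {
            // Positive signed distance: the camera is on the front side.
            bool front = Plane.DotCoordinate(FacingPlanes[i], cameraPosition) > 0f;
            if (Mode == FacingMode.Any && front) return true;
            if (Mode == FacingMode.All && !front) return false;
        }
        return Mode == FacingMode.All;
    }
}
```

A wall that should only be drawn from one side would get a single facing plane; a room that is visible through either of two openings would get two planes with FacingMode.Any.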

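The PreAwake idea maps naturally onto C# iterators. Below is a minimal sketch, assuming a hypothetical IPreAwake interface and PreAwakeRunner class (neither is a Unity API); it advances exactly one step of one object per call, and in Unity Tick() would be called once per frame.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical interface: heavy setup is split into steps by `yield return`.
public interface IPreAwake
{
    IEnumerator PreAwake();
}

// Runs one PreAwake step of one object per Tick().
public sealed class PreAwakeRunner
{
    private readonly Queue<IEnumerator> pending = new Queue<IEnumerator>();

    public bool IsDone { get { return pending.Count == 0; } }

    // In the talk the object list is cached at build time; here it is passed in.
    public void Enqueue(IList<IPreAwake> objects)
    {
        for (int i = 0; i < objects.Count; i++)
            pending.Enqueue(objects[i].PreAwake()); // creates the iterator, runs nothing yet
    }

    // Call once per frame.
    public void Tick()
    {
        if (pending.Count == 0) return;
        // MoveNext() runs the code up to the next yield (or to the end).
        if (!pending.Peek().MoveNext()) pending.Dequeue();
    }
}

// Example object with three chunks of work.
public sealed class HeavyObject : IPreAwake
{
    public int StepsRun;

    public IEnumerator PreAwake()
    {
        StepsRun++;        // e.g. build a lookup table
        yield return null;
        StepsRun++;        // e.g. resolve references
        yield return null;
        StepsRun++;        // e.g. final setup
    }
}
```

With two HeavyObject instances queued, six calls to Tick() finish all initialization, and no single frame runs more than one chunk.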


- zero-garbage policy (on a per-frame basis)

- the full game can be played with no GC calls with a 400 MB heap

- do not use Resources.UnloadUnusedAssets during gameplay (too heavy); destroy unused assets manually after a scene is unloaded

- when the player dies or loads a chapter, call GC.Collect() and Resources.UnloadUnusedAssets().
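
The notes do not say how the zero-garbage policy was achieved, so the following is only one common way to get there: allocate everything up front and reuse it. Pool and MemoryPolicy are hypothetical names, and the sketch is plain C# without Unity types.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical pool: all instances are created up front,
// so Rent/Return do not allocate during gameplay.
public sealed class Pool<T> where T : class, new()
{
    private readonly Stack<T> free;

    public Pool(int capacity)
    {
        free = new Stack<T>(capacity);
        for (int i = 0; i < capacity; i++) free.Push(new T());
    }

    public int Available { get { return free.Count; } }

    // Returns null when exhausted instead of silently allocating.
    public T Rent() { return free.Count > 0 ? free.Pop() : null; }

    public void Return(T item) { free.Push(item); }
}

public static class MemoryPolicy
{
    // Call only where a hitch is invisible (player death, chapter load).
    public static void CollectAtSafePoint()
    {
        GC.Collect();
        // In Unity this is also where Resources.UnloadUnusedAssets() would be called.
    }
}
```

Returning null on exhaustion is a deliberate choice here: it makes an undersized pool show up during testing instead of quietly producing garbage.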


Unity source modifications:

- time-sliced scene integration (last step of loading)

- time-sliced initialization of legacy animation system

- GameObject activation optimizations

- modified GC strategy of Mono: collect only when the heap is full (by default it happens at 20% usage).



- profile runner: combines stabilized profiler frame data from different parts of the game in one graph

- automated continuous integration (CI) with profiling on the target platform and a server that keeps the full history; it makes it possible to monitor progressions and regressions in a particular sub-scene

- a separate sub-scene loading profiler to get rid of spikes while loading/activating

- automated playback (recorded input with error correction) and automated testing, both for performance checks (framerate, memory, heap) and for culling logic checks (monitoring the events of scene loading, activation, deactivation and unloading).


Optimizations (for Unity 5.4, both Mono and IL2CPP):

- use parentheses to avoid unnecessary costly operations (group the scalars first, then multiply the vector by the result only once, not the other way around)

- use cached components (the overhead of Unity's safety checks is noticeable)

- if possible, use local positions instead of global ones (Unity traverses the hierarchy up and down on each global position request/change because internally only the local position is stored)

- reduce engine calls (cache everything if possible)

- avoid using Vector math; do it yourself

- avoid foreach, it generates garbage (UPD: fixed in 5.5 for some collections)

- replacing Lists with Arrays doesn't give a noticeable performance boost

- Unity skips transform updates if you set the same position

- every position or rotation set carries the penalty of sending messages down the hierarchy to update child world matrices, so use SetLocalPositionAndRotation (only the global variant is exposed in 5.6) and GetPositionAndRotation (not exposed yet; expose it yourself if you have a source licence)

- if you have a source licence, add the transform methods SetLocalPositionNoNotify and SetLocalRotationNoNotify to avoid the penalty of going down the hierarchy in animated objects (the animation will update child positions anyway); this gives a huge performance boost for animated characters (in INSIDE it gave an 8-10x speed-up)

- avoid Dictionaries and other "modern" structures (also subscriptions, because they are built on linked lists; keep multi-threading in mind if you do so)

- use StringBuilder instead of string concatenation to avoid garbage

- avoid using Update and FixedUpdate; write your own manager to propagate these calls through your hierarchy (this resulted in a 1-2 ms speed-up in INSIDE).
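
The parentheses tip is easy to see with a counter. The Vec3 struct below is a hypothetical stand-in for Unity's Vector3, instrumented only for this demonstration; C# evaluates `dir * speed * dt` left to right, so without parentheses the vector is scaled twice.

```csharp
// Hypothetical stand-in for a 3D vector type, with a demo-only counter.
public struct Vec3
{
    public float X, Y, Z;
    public static int ComponentMultiplies; // counts float multiplies done on vector components

    public Vec3(float x, float y, float z) { X = x; Y = y; Z = z; }

    public static Vec3 operator *(Vec3 v, float s)
    {
        ComponentMultiplies += 3; // one multiply per component
        return new Vec3(v.X * s, v.Y * s, v.Z * s);
    }
}

public static class Movement
{
    // (dir * speed) * dt: two vector scalings, 6 component multiplies.
    public static Vec3 Ungrouped(Vec3 dir, float speed, float dt)
    {
        return dir * speed * dt;
    }

    // dir * (speed * dt): one float multiply, then one vector scaling.
    public static Vec3 Grouped(Vec3 dir, float speed, float dt)
    {
        return dir * (speed * dt);
    }
}
```

Both methods compute the same displacement; the grouped version does 4 float multiplies in total instead of 6 and creates one temporary vector fewer.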

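A custom update manager can be as small as the sketch below. ITickable, UpdateManager and Mover are hypothetical names, and a flat list is used for brevity where the talk describes propagating the calls through a hierarchy; in Unity, TickAll would be called from the single remaining Update().

```csharp
using System.Collections.Generic;

public interface ITickable
{
    void Tick(float deltaTime);
}

// One engine callback drives every registered object.
public sealed class UpdateManager
{
    private readonly List<ITickable> items = new List<ITickable>();

    public void Register(ITickable item) { items.Add(item); }

    // Not safe to call from inside Tick(); defer removals in real code.
    public void Unregister(ITickable item)
    {
        int i = items.IndexOf(item);
        if (i < 0) return;
        // Swap-remove: O(1) removal, update order is not preserved.
        items[i] = items[items.Count - 1];
        items.RemoveAt(items.Count - 1);
    }

    public void TickAll(float deltaTime)
    {
        // Plain for loop: no enumerator, no garbage.
        for (int i = 0; i < items.Count; i++)
            items[i].Tick(deltaTime);
    }
}

// Example object that used to have its own Update().
public sealed class Mover : ITickable
{
    public float X;
    public float Speed = 1f;

    public void Tick(float deltaTime) { X += Speed * deltaTime; }
}
```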