Avoiding compiler crash (or endless compilation)

Sometime, shader compilation is long. Or ultra-long. Or freezing the browser. Or even crashing it after a timeout. Worse: this can happen for other peoples (often under another OS) on your shaders while it was okay for you, then your shader can be unlisted because of something you don’t experience (very frustrating).

Before suggesting solutions and what to care about, it’s important to understand…

What happens at early compilation

  • Functions do not really exist on GPU, because there is no stack to jump out then go back (that’s why recursivity is not allowed). This is just a writing aid, like macros. So all functions are first inlined.
  • Loops used to be fake as well. But even now that dynamic loops do exist, optimizers strongly prefer to keep unrolling them for performances: loop content is duplicated as many times as loop steps, with loop variable replaced by its successive const values. One problem is that optimizers don’t foresee that it might overwhelm resources (starting with final code length).
  • Branching vs divergence: when in a same warp (i.e. 32 pixels neighborhood) different conditional (“if“) branches are followed, SIMD parallelism force each thread to run them all (masking the result when not the right branch for a given thread), as shown in these demos.  For variable length loops (for, while) or early exit (conditional break in a loop) this can be even more involved.
    This firstly impact runtime performances, but branches obviously also lengthen the inlined code length (e.g. if big functions are called in branches).
    Also,  while dFdx, dFdy, fwidth might just give silly values or get unset/reset across diverging pixels, on some systems the function texture() try to do better to find the MIPmap LOD to use,  which may consist in evaluating the whole code 4 time to recover a 2×2 neighborhood on which evaluating the derivatives of texture coordinates.
  • The resulting functionless and (almost) loopless simplified but very longer GLSL code is then really compiled and optimized. But the code length and compile duration might overwhelm resources and fail, causing a crash.
  • Note that before compilation, Angle applies various code modifications to turn around some bugs occurring on some drivers/boards, then on Windows it transpiles GLSL to HLSL. And after, both shading languages are compiled into intermediate assembly language ARB, to be compiled and optimized again to get a true GPU executable. So in total there is a full stack of code rewrite and optimization.

Now, just consider the big figure: e.g. for a ray-marching code with a long stepping loop, containing branches (e.g. “if hit”) calling functions (to get the normal, the textures values, etc), that might themselves contain loops on function (e.g. for procedural texturing). Worse: the shading part launching shadow rays (or reflected/refracted rays) with a brand new marching loop (yes, it would be duplicated for each step of the main loop). In addition to the “map” function testing the whole scene for ray-intersection at every step, and this one is likely to also contain loops and further functions call.
The true code length before the true compilation is the huge combinatory of all this. You have no idea how long it could be. Well, indeed you have to.

What can we do ?

Think

  • Do you really need 1000000 steps ? sure ?
  • Do you really need to detail procedural texture (or shape) down to nanometer ? (think about where falls the pixel size limit).
  • Do you really need to compute the texture also for shadow evaluation ?
  • Can’t you first test a raw hit, then inspect the details once this step is reached ?
  • Can’t some part be done in as separate buffer (i.e., stored for the whole frame rather than evaluated for each pixel) ? BTW, does it really need to be re-evaluated at each time step ?
  • Can’t a repeated pattern be done implicitly with a simple mod/fract rather than with an explicit loop ?
  • Or can’t you find the only one (or few) items that can really meet the pixel ?

Within the unroll & inline logic

  • Deferred heavy  processing out of loops:
    Replace
        if (end_condition) { process; break; }
    with
        if (end_condition) { set_parameters; break; }
    then process the parameters after the loop.
    Typically: shading evaluation, shadows, reflected rays…
  • Deferred heavy  processing out of branches:
    Replace
        ...else if ( cond_N ) do_action(params);
    with
        ...else if ( cond_N ) set_parameters;
    then process the parameters after the loop.
  • Specialize functions, or use branches inside only if triggered by const params.
    Worst case would be an shape(P, kind, params)  implementing a whole bank of possible shapes called into a ray-marching loop: if kind is not const, the whole shape() source will be multi-duplicated.
  • Don’t call texture() in any divergence-prone area (“if” branch, variable-length or early breakable loop), at least if MIPmap is activated. Or use explicit LOD via textureLod() or textureGrad() .

Keep your critical judgement: the above advices are not always possible, and not always useful. Small loops, small processes don’t deserve special action, plus the GPU *is* powerful enough to deliver good performances on complicated code. Just, learn to recognize the coding patterns that make two “similarly complicated” shaders (by the number of lines or functions)  having totally different fate by how the compiler react. And avoid blindly following the dark path, the one nastily looking “as you would have done on CPU”.

Fighting the unroll & inline logic

You can also fight loop unrolling by making the compiler unable to know the length. E.g.:
    for (int i=0; i<N+min(0,iFrame); i++)

You can forbid optimizations [ which, exactly ? and is it really working ? ] by adding at the top of each code, or later but still outside functions definition :
    #pragma optimize(on)
    #pragma optimize(off)

Compilation can be a lot faster, but of course runtime perfs will be impacted.

More

See also:

One thought on “Avoiding compiler crash (or endless compilation)

Leave a comment