Avoiding compiler crash (or endless compilation)

Sometimes shader compilation is long. Or ultra-long. Or it freezes the browser, or even crashes it after a timeout. Worse: this can happen to other people (often under another OS) on your shaders while everything was fine for you; your shader can then be unlisted because of something you don’t experience (very frustrating).

Before suggesting solutions and what to care about, it’s important to understand…

What happens at early compilation

  • Functions do not really exist on the GPU, because there is no stack to jump out of and come back to (that’s why recursion is not allowed). They are just a writing aid, like macros. So all functions are first inlined.
  • Loops used to be fake as well. But even now that dynamic loops do exist, optimizers strongly prefer to keep unrolling them for performance: the loop body is duplicated as many times as there are loop steps, with the loop variable replaced by its successive const values. One problem is that optimizers don’t foresee that this might overwhelm resources (starting with the final code length).
  • Branching vs divergence: when different conditional (“if”) branches are followed within the same warp (i.e. a 32-pixel neighborhood), SIMD parallelism forces each thread to run them all (masking the result when a branch is not the right one for a given thread), as shown in these demos.  For variable-length loops (for, while) or early exits (conditional break in a loop) this can be even more involved.
    This first impacts runtime performance, but branches obviously also lengthen the inlined code (e.g. if big functions are called in branches).
    Also, while dFdx, dFdy, fwidth might just give silly values or get unset/reset across diverging pixels, on some systems the function texture() tries to do better to find the MIPmap LOD to use, which may consist of evaluating the whole code 4 times to recover a 2×2 neighborhood on which to evaluate the derivatives of the texture coordinates.
  • The resulting functionless and (almost) loopless GLSL code, simplified but much longer, is then really compiled and optimized. But the code length and compile duration might overwhelm resources and fail, causing a crash.
  • Note that before compilation, Angle applies various code modifications to work around bugs occurring on some drivers/boards, then on Windows it transpiles GLSL to HLSL. Afterwards, both shading languages are compiled into an intermediate assembly language (ARB), which is compiled and optimized again to get a true GPU executable. So in total there is a full stack of code rewriting and optimization.

Now, just consider the big picture: e.g. a ray-marching code with a long stepping loop, containing branches (e.g. “if hit”) that call functions (to get the normal, the texture values, etc.), which might themselves contain loops over functions (e.g. for procedural texturing). Worse: the shading part may launch shadow rays (or reflected/refracted rays) with a brand new marching loop (yes, it would be duplicated for each step of the main loop). All this comes in addition to the “map” function testing the whole scene for ray intersection at every step, which is likely to contain loops and further function calls as well.
The true code length before the true compilation is the huge combinatorial product of all this. You have no idea how long it could be. Well, actually, you have to.

What can we do ?

Think

  • Do you really need 1000000 steps ? sure ?
  • Do you really need to detail procedural texture (or shape) down to the nanometer ? (think about where the pixel size limit falls).
  • Do you really need to compute the texture also for shadow evaluation ?
  • Can’t you first test a raw hit, then inspect the details once this step is reached ?
  • Can’t some parts be done in a separate buffer (i.e., stored for the whole frame rather than evaluated for each pixel) ? BTW, does it really need to be re-evaluated at each time step ?
  • Can’t a repeated pattern be done implicitly with a simple mod/fract rather than with an explicit loop ?
  • Or can’t you find the only one (or few) items that can really meet the pixel ?

Within the unroll & inline logic

  • Defer heavy processing out of loops:
    Replace
        if (end_condition) { process; break; }
    with
        if (end_condition) { set_parameters; break; }
    then process the parameters after the loop.
    Typically: shading evaluation, shadows, reflected rays…
  • Defer heavy processing out of branches:
    Replace
        ...else if ( cond_N ) do_action(params);
    with
        ...else if ( cond_N ) set_parameters;
    then process the parameters after the chain of branches.
  • Specialize functions, or use branches inside them only if they are triggered by const params.
    The worst case would be a shape(P, kind, params) function implementing a whole bank of possible shapes and called in a ray-marching loop: if kind is not const, the whole shape() source will be duplicated many times.
  • Don’t call texture() in any divergence-prone area (“if” branch, variable-length or early breakable loop), at least if MIPmap is activated. Or use explicit LOD via textureLod() or textureGrad() .
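
As a minimal sketch of the “set parameters in the loop, process after the loop” pattern above: map() and shade() below are hypothetical placeholders (a single unit sphere and a trivial diffuse term), not taken from any particular shader.

```glsl
// Hypothetical scene (assumption, for illustration only): a unit sphere at the origin.
float map(vec3 p) { return length(p) - 1.0; }

// Hypothetical "heavy" shading; a real shader might launch shadow rays here.
vec3 shade(vec3 p) {
    vec3 n = normalize(p);                       // normal of the unit sphere
    return vec3(max(dot(n, normalize(vec3(1.0))), 0.0));
}

void mainImage(out vec4 fragColor, in vec2 fragCoord)
{
    vec2 uv = (2.0 * fragCoord - iResolution.xy) / iResolution.y;
    vec3 ro = vec3(0.0, 0.0, -3.0);              // ray origin
    vec3 rd = normalize(vec3(uv, 1.5));          // ray direction

    float t = 0.0;
    bool hit = false;
    for (int i = 0; i < 100; i++) {
        float d = map(ro + t * rd);
        if (d < 1e-3) { hit = true; break; }     // only record the hit: no shading here
        t += d;
        if (t > 20.0) break;                     // ray left the scene
    }

    // The heavy part appears once, after the loop, not once per unrolled step.
    vec3 col = hit ? shade(ro + t * rd) : vec3(0.0);
    fragColor = vec4(col, 1.0);
}
```

If shade() were called inside the `if (d < 1e-3)` branch, an unrolling compiler could duplicate its whole inlined body up to 100 times.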

Keep your critical judgement: the above advice is not always applicable, and not always useful. Small loops and small processes don’t deserve special action, plus the GPU *is* powerful enough to deliver good performance on complicated code. Just learn to recognize the coding patterns that give two “similarly complicated” shaders (by number of lines or functions) totally different fates depending on how the compiler reacts. And avoid blindly following the dark path, the one that nastily looks “as you would have done on CPU”.

Fighting the unroll & inline logic

You can also fight loop unrolling by preventing the compiler from knowing the loop length, for instance by adding a term that is always 0 at run time but unknown at compile time. E.g.:
    for (int i=0; i<N+min(0,iFrame); i++)
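
Placed in a complete shader, this could look like the sketch below. N and the loop body are arbitrary placeholders, and a non-constant loop bound assumes WebGL 2.0.

```glsl
const int N = 64;   // hypothetical step count

void mainImage(out vec4 fragColor, in vec2 fragCoord)
{
    vec2 uv = fragCoord / iResolution.xy;
    float s = 0.0;

    // iFrame is a uniform (0, 1, 2, ...), so min(0, iFrame) is 0 at run time,
    // but the bound is no longer a compile-time constant.
    for (int i = 0; i < N + min(0, iFrame); i++) {
        s += sin(float(i) * uv.x * 6.28318) / float(N);
    }

    fragColor = vec4(vec3(0.5 + 0.5 * s), 1.0);
}
```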

You can ask the compiler to turn optimizations off [ which ones, exactly ? and does it really work ? ] with the second pragma below (the first one turns them back on), placed at the top of each shader, or later but still outside function definitions :
    #pragma optimize(on)
    #pragma optimize(off)

Compilation can be a lot faster, but of course runtime performance will be impacted.


Playable games in Shadertoy !

Yes, even if we are stuck in the pixel shader and have no access to input data, many people have managed to program games in Shadertoy ! From basic to huge (might not always compile on your machine 🙂 ), from pro to amateur, here they are :
( recommended: HoverZoom plugin over previews )

But so many classical games are still missing, including simple ones : will you dare to program one ?  🙂 ( remember to add the tag “game” ).


Key shortcuts

Did you know that the ShaderToy GUI as well as its CodeMirror shader editor have many key shortcuts ? It seems that most people are unaware of them (even though the GUI ones pop up as tool-tips… if you try).

Reminders:

  • Cmd = “microsoft” or “apple” key.
  • Ctrl  might be overridden by your browser.
  • Mac: replace Ctrl by Cmd, or sometimes by Alt or Ctrl-Alt.

GUI key shortcuts

  • “Down”, “Cmd Down”:  resetTime
  • “Alt Up”, “Cmd Up”:       pauseTime

  • right menu:                   save or copy image
  • “Alt+r” :                          video record

  • “Ctrl S”, “Cmd S”:         Save shader
  • “Alt Enter”, “Cmd-Enter”: Compile shader
  • “F5” :                              refresh page ( indeed it’s a browser shortcut )

Editor key shortcuts

Editor GUI

“Alt F”:                                     Editor in FullScreen
“Alt Right”, “Cmd Right”:     Go to Right Tab
“Alt Left”, “Cmd Left”:          Go to Left Tab
“Alt -”, “Cmd -”:                     decrease FontSize
“Alt =”, “Cmd =”:                   increase FontSize

Basic obvious keys

Left, Right, Up, Down,      Home (of line), End (of line),      PageUp, PageDown
Delete, Backspace, Shift-Backspace,       Tab,       Enter,       Insert

extra key shortcuts

  • basic emacs shortcut

  • “Ctrl-S”: “save”,

  • “Ctrl-F”: find,    “Ctrl-G”: find Next,    “Shift-Ctrl-G”: find Prev,
    “Shift-Ctrl-F”:  replace,    “Shift-Ctrl-R”: replace All,

  • “Ctrl-Z”: undo,    “Shift-Ctrl-Z”, “Ctrl-Y”: redo,

  • “Ctrl-A”: select All,    “Ctrl-U”: undo Selection,    “Shift-Ctrl-U”, “Alt-U”: redo Selection,    “Esc”: single Selection ( ? )

  • “Ctrl-Home”, “Ctrl-Up”: Doc Start,    “Ctrl-End”, “Ctrl-Down”: Doc End,
    “Ctrl-Left”: Word Left,    “Ctrl-Right”: Word Right,
    “Alt-Left”: Line Start,      “Alt-Right”: Line End,

  • “Ctrl-D”: delete Line,
    “Ctrl-Backspace”: del Word Before,    “Ctrl-Delete”: del Word After,

  • “Ctrl-[”: indent Less,    “Ctrl-]”: indent More,  ( NB: not working for me )
    “Shift-Tab”: indent Auto (and replace tabs by spaces)

 

2 more for mac only:

  • “Cmd-Backspace”: del Wrapped Line Left (?), “Cmd-Delete”: del Wrapped Line Right (?)

Extending Shadertoy (& more)

Chrome plugin “Shadertoy unofficial” (unrelated to this site 🙂 )

Features:

  • Adjustable slider for full control of ‘iGlobalTime’ uniform and audio/video inputs.
  • 4 sliders simulating mouse position
  • Shader previews on shaders list on “My profile” page
  • Change resolution in windowed and fullscreen mode by pressing keys 1…9.
  • Take screenshots with doubled resolution.
  • Pause/Restart in fullscreen mode.
  • Fullscreen edit mode.
  • Clone own shaders.

 

Using custom textures (only on your local machine)

Applications compatible with Shadertoy

Shadertoy cousins

  • glslsandbox     Goodies: clone, camera
  • shdr@bkcore  Goodies: vertex shaders, snippets, camera, 3D models, custom models
  • shaderfrog      Goodies: shaders + composer, vertex shaders, clone, uniforms, camera, 3D models, custom models
  • shader_editor@kickjs Goodies : vertex shaders, load texture, uniforms, 3D models
  • GLSLbin          Goodies: includes ( from many base stack.gl shaders )
  • vertexShaderArt    Goodies: tune vertex position and color at the same time.
  • Shaderoo         Goodies:  geometry shader, infinite amount of buffers, external textures, includes

On desktop (quick shader prototypers):

  • KodeLife   Goodies: vertex, tessellation, geometry and fragment shaders, custom model, Shadertoy compatibility mode
  • glslEditor

More

Note that Firefox and Chrome now allow direct editing (and more) of any online shader:

  • Activating the shader developer tool on Firefox
  • Equivalent plugin on Chrome

 

WebGL 2.0 vs WebGL 1.0

On February 15, 2017, Shadertoy moved to WebGL 2.0. What does it change ?

NB: WebGL 2.0 corresponds to OpenGL ES 3.0, which is derived from OpenGL 3.3 and 4.2, minus some features.  Cf. specs or quick card, or cheesy slides.  Test availability on your browser, and among users.

What does it bring to Shadertoy that is new ?

New features, … and new constraints.
And new compatibility issues: see last section there.

New features:

  • Arrays:  (still 1D only )                                example
    • dynamic indexing                              A[k]
    • initialization                                        float[] A = float[] ( 17., 23.4, 56.3, 0., 7. );
    • implicit sizing                                      float[] A  or  float A[] ;    size = A.length();
    • a function can return an array.      float[5] foo() { }
      A reminder that there is very little memory per thread: don’t abuse arrays !
  • All int operations:                                      examples  1 ,
    • bits and logic: & , | , ^ , >> , << ,  &= , |= , ^= , >>= , <<=
    • unsigneds:   uint, uvec, 1234U, 0x3f800000U  ( but 0b010101 is still missing )
    • % (i.e., mod) , %= , abs()
  • Flow control:
    • while(){},  do{}while(),
      switch(int){case:default:}     Some bugs on Windows. 😦 Avoid return inside.
    • Loop bounds no longer need to be constant.
      (Attention: loops with const bounds might still be unrolled if the compiler thinks this is an optimization… even if it causes a compilation timeout or a too long shader.
      Hack the bound if you want to forbid unrolling: 100+1e-30*iMouse.x ).
    • Functions still cannot be recursive (they are still inlined, since there is no stack on GPU. Or manage a stack yourself with arrays if you really need one. example ).
  • Formatting:
    • defines can be multiline !       continuation to next line mark:   \
      Not compiling on firefox for now 😦  [apparently now fixed]
    • UTF8 allowed (really everywhere ? possible issues on the right of #define )
    • more float check at compilation time.        example
  • New matrix and vector operations and types:
    • mat inverse(), determinant(),  transpose()
    • mat2x2 mat2x3 mat2x4 mat3x2 mat3x3 mat3x4 mat4x2 mat4x3 mat4x4
    • outerProduct()
    • cross(vec2) is still missing. Worse: you can no longer overload it.
  • New math operators:
    • hyperbolic funcs: sinh, cosh, tanh, asinh, acosh, atanh
    • trunc(), round(), roundEven(),  f=modf(v,i), isnan(), isinf()
    • [un]pack[S|U]norm2x16() , [un]packHalf2x16() :  pack vec2 in uint and back
      [U]intBitsToFloat and back: reinterpret the bits of a float as an int and conversely, including special floats (NaN, Inf…)
    • be careful: you can no longer overload built-ins (e.g. defining min(struct,struct) ).
  • New textures operations:
    • sampler: 2D or 3D
    • textureLod() ( previously an EXT )   the one to use to control the MIPmap level
    • textureGrad()  (previously an EXT)  the one to use if you have dFdx, dFdy
    • texture*Offset()
    • texelFetch()                   avoid any interpolation, e.g., to use a texture as an array
    • textureSize()                 useless since more chars than iChannelResolution[n] :-p
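
A few of these features combined in one small sketch. It assumes iChannel0 is a 2D texture; the weights and the integer mixing function are arbitrary illustrations (in particular, hash() is not a vetted hash function).

```glsl
// Implicitly sized const array, initialized with an array constructor.
const float[] W = float[](1.0, 4.0, 6.0, 4.0, 1.0);

// Integer mixing with bit operations on uint (arbitrary illustrative constants).
uint hash(uint x) {
    x ^= x >> 16;
    x *= 0x45d9f3bU;
    x ^= x >> 16;
    return x;
}

void mainImage(out vec4 fragColor, in vec2 fragCoord)
{
    ivec2 p    = ivec2(fragCoord);
    ivec2 last = textureSize(iChannel0, 0) - 1;   // last valid texel

    // Dynamic indexing and .length(): weighted horizontal average of texels.
    vec4 sum = vec4(0.0);
    float total = 0.0;
    for (int k = 0; k < W.length(); k++) {
        ivec2 q = clamp(p + ivec2(k - 2, 0), ivec2(0), last);
        sum   += W[k] * texelFetch(iChannel0, q, 0);   // exact texel, no interpolation
        total += W[k];
    }

    // Per-pixel pseudo-random value in [0,1) from the integer pixel coordinates.
    uint h = hash(uint(p.x) + 4096U * uint(p.y));
    float noise = float(h & 0xffffU) / 65536.0;

    fragColor = mix(sum / total, vec4(noise), 0.1);
}
```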

New constraints:     Verify your previous shaders, they might be broken !

  • Globals initialization must really be constant. Initializers using uniforms (like iGlobalTime , iResolution, iMouse) are no longer allowed, and even sqrt(3.) or float a=1.,b=a;  are rejected… unless the variables are declared const.
  • It’s now totally forbidden to reuse a reserved keyword (built-in funcs, etc) as a variable or function name. You cannot even overload built-in functions. E.g., 
    • min(struct1, struct2) or cross(vec2,vec2) are no longer legal
    • you could not know which new keywords would appear !
      frequent conflicts: sample, smooth, round, inverse …
  • “precision” is no longer allowed… even in comments !
  • texture2D*() is now texture*()   (the team has already patched your shaders for this).
    But guessing the implicit LOD is now an error on Windows in non-unrollable loops, and generates a lot of extra code anyway (+ possible compiler freeze). 
    -> now prefer texelFetch or textureLod (at the price of WebGL1 compatibility).
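
A sketch of how the global-initialization constraint can be satisfied: const globals keep their initializers, while globals that depend on uniforms are only declared at global scope and assigned inside a function. The names SQRT3, A, B and gAspect are hypothetical.

```glsl
const float SQRT3 = sqrt(3.0);   // accepted: const, built-in with a constant argument
const float A = 1.0, B = A;      // accepted: const initialized from a const

float gAspect;                   // non-const global: declared here without initializer...

void mainImage(out vec4 fragColor, in vec2 fragCoord)
{
    gAspect = iResolution.x / iResolution.y;   // ...and assigned inside a function

    vec2 uv = fragCoord / iResolution.xy;
    uv.x *= gAspect;
    fragColor = vec4(fract(uv * SQRT3), B, 1.0);
}
```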
More GLSL ES 3 reserved keywords:

Possibly/soon available to ShaderToy ?
invariant layout centroid flat smooth
lowp mediump highp precision
sampler2D sampler3D samplerCube
sampler2DShadow samplerCubeShadow
sampler2DArray sampler2DArrayShadow
isampler2D isampler3D isamplerCube isampler2DArray
usampler2D usampler3D usamplerCube usampler2DArray

Keywords reserved for future use:
attribute varying coherent volatile restrict readonly writeonly resource atomic_uint
noperspective patch sample subroutine common partition active
asm class union enum typedef template this goto
inline noinline volatile public static extern external interface
long short double half fixed unsigned superp
input output sizeof cast namespace using
hvec* sampler3DRect  filter

Compatibility issues in Shadertoy / webGLSL

[ New 28/02/2017 :  WebGL2.0 compatibility issues. See last section. ]

Sometimes, somebody else’s shader looks strange, or blank, or broken, on your machine.
Conversely, the fact that your shader works on your machine is no proof that it works elsewhere.

Usual suspects:

  • Noisy image:
    Variable not initialized.
  • Blank image:
    Negative parameter for log, sqrt, pow.
    clamp(min,max,v) instead of clamp(v,min,max).
    smoothstep(v, v0,v1) instead of smoothstep(v0,v1, v).
    out parameter used as inout.
    mod(x,0.) or atan(0.,0.).
    MIPmap on grey image on Firefox.
  • Compilation error (assuming it was ok on the author’s system):
    Bug in your compiler (e.g. const struct).
    Author’s compiler too permissive (global initialized with a non-const expression).
    Your system has less than 32 bits and a const value needs more.
    Shader too costly to compile on your system.
  • Browser or system freeze (or crash):
    Shader way too costly for your system.

More details are given below.

Classical Reasons:

There is a full stack of subsystems digesting your shadertoy GLSL ES source before it reaches the GPU: the shadertoy API, the web browser, the OS, the OpenGL handler, the 3D driver, the GPU driver, its GLSL/HLSL compiler, the GPU.  Some driver-embedded compilers have different behaviors and different bugs, but it’s even worse. E.g., Windows uses either native OpenGL or Angle, which translates your GLSL source to HLSL (!)  (to switch to OpenGL: Firefox → about:config → webgl.disable-angle = true, webgl.force-enabled = true), while linux and macOS use (better) native OpenGL. On Windows, Chrome and Firefox don’t even rely on the same version of Direct3D. Some high-end tablets like the iPad use a crude GLSL implementation. Browsers override some format flags for textures and buffers. Etc, etc, etc.
Any of these can cause issues, so from here we will call this stack “your system”.

  • Some systems implicitly initialize variables and some others don’t (in the spirit of the spec).
    => Always initialize variables, including “out” parameters.
  • Some systems implicitly extend the validity of invalid operations like log, sqrt or pow of a negative, mod(x,0.), atan(0.,0.), and others don’t (following the specs).
    => Never use negatives for log, sqrt, pow (prefer x*x if you want to square).
           Don’t mod by 0.
           Don’t ask for the angle of a null vector. => atan(y,x+1e-15) 
           clamp API is (v,min,max).
           smoothstep API is (v0,v1, v).
  • Some systems loosely implement the spec, treating out as inout.
    => Be strict on code requirements; do initialize out parameters.
  • Some systems are more picky than others. For instance grey-level textures implemented as “luminance” are not “renderable” according to the spec, thus not compatible with some operations like MIPmap. Apparently only Firefox is so picky, resulting in blank values.
    => Switch to a colored texture or replace the MIPmap flag by Linear.
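
The recommendations above can be gathered in one defensive sketch. The helper polar() and all values are hypothetical; the point is only the argument order and the guards.

```glsl
// "out" parameters are initialized first, so every path returns defined values.
void polar(vec2 p, out float r, out float a)
{
    r = 0.0; a = 0.0;                  // always initialize out parameters
    r = length(p);
    if (r > 0.0) a = atan(p.y, p.x);   // never ask for the angle of a null vector
}

void mainImage(out vec4 fragColor, in vec2 fragCoord)
{
    fragColor = vec4(0.0);             // the output is an "out" parameter as well
    vec2 uv = (2.0 * fragCoord - iResolution.xy) / iResolution.y;

    float r, a;
    polar(uv, r, a);

    float x  = uv.x;                                   // may be negative
    float sq = x * x;                                  // rather than pow(x, 2.)
    float g  = pow(max(r, 0.0), 0.45);                 // keep the base of pow non-negative
    float s  = smoothstep(0.2, 0.8, r);                // (edge0, edge1, value)
    float c  = clamp(a / 6.2831853 + 0.5, 0.0, 1.0);   // (value, min, max)

    fragColor = vec4(g * s, c, sq, 1.0);
}
```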

There are genuine bugs and hard limitations of some systems:

  • Shadertoy buffers are supposed to be 16-bit floats, but some browsers (e.g. Chrome) override this with 32-bit floats. A shader writer on Chrome will thus not see overflows.
  • Complex expressions including redefinition of variables ( e.g., x = (x=y)*x ) may not be evaluated in the same order on all systems (typically, after translation to HLSL by Windows Angle), or can even be bugged (O += x -O on some old systems).  (OpenGL wrongly evaluates all the sub-assignments first, while here Angle is right.)
    => for wide compatibility, avoid redefining a variable that is reused within the same expression – whose span includes comma-separated expressions.
  • Low-end devices or old systems sometimes do not implement the full IEEE math such as NaN, or use fewer than 32 bits for floats and ints.
    => you may not be able to afford a bigger device, but at least be sure to do the updates (drivers, etc).
  • Shadertoy relies on GLSL ES 1.0 which is really basic. E.g., it unrolls all loops and inlines all function calls, so your shader can be far longer than you think. This can crash the compiler, make it time out, or request more resources than your GPU can afford.
    => Try to guess the consequences of your coding style.
            Use loops to select, then process afterwards, rather than processing or calling a function inside loops.
            Fear long nested loops (including the ones in called functions).
            User side: try replacing the loop end value by a smaller value.
  • Some expressions are solved at compilation time rather than at run time (e.g. #define and const expressions), and thus can have different precision or treatment of exceptions. E.g. on some systems the full IEEE is not obeyed at compilation time while it is at run time: float x=0., y=x/x, z=1./x often behave differently with const float (or using 0. directly).
    The optimizer can also solve or partly solve some more. But optimizers vary a lot with drivers, versions, etc.
  • Compilers are still full of bugs. E.g. complex types like structs, mat, vec4 might go wrong with the const qualifier, or in cond?v1:v2 statements, or in loops without {}.
    Locally redefining a global variable already used in the current function crashes the compiler or even the driver on linux. A variable having the name of a function might not be accepted either on some compilers.
    => Do the updates.
          – Suppress suspect const qualifiers,
          – don’t reuse function names for variables,
          – don’t reuse global names when both the local and global variables are used in the same block,
          – Protect suspect blocks by {} or (). Try replacing by if then else.
  • Some bugs occur at unrolling of long loops or long shaders (e.g. the last instruction of a loop is not always executed, or implicit initialization not always done).
    => Do the updates.
           Rearrange the suspect loop. E.g., move a conditional break to an earlier statement.
           Initialize all variables.
           (See also the section about long shaders.)
  • The number of simultaneous keys accounted for in the Shadertoy keyboard texture depends on the keys, and probably on the OS. (Web events are just a compatibility mess).
  • There seem to be some GPU-specific and OSX-specific bugs.

There are some pure GLSL bugs… consistent or not through OSs

  • clamp(NaN,0.,1.) should be NaN. It is not: it is 0 on linux (GLSL), but 1 on windows (Angle).
  • same for smoothstep(NaN) (since it uses clamp)
  • sqrt(v) with v=-1 is NaN… except if v is a const or a #define: then it is 0.

Attention: Shadertoy hides the warnings if the compilation is ok.  It may be good practice to insert a syntax error once, just to see the warnings: they are then inserted in the code and are indistinguishable from real errors. Example here : sqrt(-1) was indeed flagged as invalid at compilation time.

Classical float “bugs”: (not specific to GLSL)

  • x/x might not be exactly 1., even for int-in-floats (integers up to 16,777,216 are exactly represented by 32-bit IEEE floats. + – * will be exact… but not the division). This is due to the fact that compilers generally replace division by multiplication with the inverse.
    A consequence is that fract(x/x) might be ~1 instead of 0 (about 10% of the time).
  • mod on int-in-floats has some bugs (due to the division as above). E.g. mod(33.,33.) might be 33, and the same for about 10% of values. 😦 (Note that you can’t verify this by testing on a const, since it would be resolved at compile time, not run time).
    => If you really aim at integer operations, emulate a%b with a-a/b*b on ints.
  • A reminder that floats have limited precision and span… and worse for 16-bit floats.
    Denormalized floats are both a goodie and a trap: IEEE provides an extension of the range, at the price of collapsing precision.
    Note that without this extension, 32-bit floats overflow for exp(83.) (same for sinh,cosh,tanh), giving NaN and thus black in Shadertoy.
  • -0 is not totally equal to +0. If you test equality or order directly you’ll find what you expect, but a surprise comes when comparing their inverses. Indeed it’s an IEEE feature. In some rare cases, this unforgotten sign can create bugs (or save the day in geometry).
  • IEEE treatment of NaN and INF can be surprising, despite being logical. In shadertoy NaN is displayed as contaminating black, while +INF is white and -INF is black.
    But their implementation can also be bugged in some cases, or not implemented at all in low-end GPUs.
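
A sketch of two ways around the mod issue on int-in-floats. imod() and fmodInt() are hypothetical helper names; fmodInt() is one possible workaround (biasing the quotient by half a unit before floor), not something the text above prescribes, and it assumes non-negative integer values well below 2^24.

```glsl
// Modulo on true integers (non-negative a, positive b assumed).
int imod(int a, int b) { return a - a / b * b; }

// Modulo on integer-valued floats. The +0.5 keeps the quotient away from an
// exact integer, so a slightly inaccurate division cannot flip the floor().
float fmodInt(float a, float b) { return a - b * floor((a + 0.5) / b); }

void mainImage(out vec4 fragColor, in vec2 fragCoord)
{
    ivec2 ip = ivec2(fragCoord);
    float fx = floor(fragCoord.x);

    float stripeI = float(imod(ip.x, 33)) / 32.0;   // ramp restarting every 33 pixels
    float stripeF = fmodInt(fx, 33.0) / 32.0;       // expected to match stripeI

    // The red channel lights up wherever the two versions disagree.
    float differ = abs(stripeI - stripeF) > 0.001 ? 1.0 : 0.0;
    fragColor = vec4(differ, stripeI, stripeF, 1.0);
}
```

Since the operands come from fragCoord at run time, the compiler cannot fold the test away as it would with const values.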

Extensions:

Not all extensions are available on all browsers (check here). You can check that an extension is there using #ifdef GL_EXT_shader_texture_lod (for instance).
Alas, extensions that are now part of the WebGL2 core no longer have their old define set: the extension bag is totally different. You can test the WebGL version with __VERSION__ ( example here ).
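
A sketch of such a test, selecting an explicit-LOD lookup according to the GLSL version. SAMPLE_LOD is a hypothetical macro name, and the WebGL 1.0 branch assumes that the host page has already enabled the extension.

```glsl
#if __VERSION__ >= 300
    // WebGL 2.0 (GLSL ES 3.00): textureLod is part of the core.
    #define SAMPLE_LOD(s, uv, lod) textureLod(s, uv, lod)
#elif defined(GL_EXT_shader_texture_lod)
    // WebGL 1.0 with the extension available.
    #define SAMPLE_LOD(s, uv, lod) texture2DLodEXT(s, uv, lod)
#else
    // Fallback: implicit LOD only, the lod argument is ignored.
    #define SAMPLE_LOD(s, uv, lod) texture2D(s, uv)
#endif

void mainImage(out vec4 fragColor, in vec2 fragCoord)
{
    vec2 uv = fragCoord / iResolution.xy;
    fragColor = SAMPLE_LOD(iChannel0, uv, 2.0);   // MIPmap level 2 where supported
}
```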

Some more subtle issues:

  • Some browsers seem to decompress R,G,B texture channels in slightly different ways.
  • Shadertoy doesn’t currently use sRGB textures. You have to ungamma – regamma them yourself, but it means that interpolation and MIPmap are slightly biased (but most shader writers seem to be unaware of gamma issues anyway 🙂 ).
  • Sound buffering seems to be done in very different ways depending on the system. No problem with sound playing, but issues start when you inspect inside (e.g. time sync and precise buffering range).
  • A shadertoy can be displayed at very different resolutions, depending on your screen size, window size, and various other factors. The aspect ratio can vary, and the size might not even be even. So a special configuration causing a glitch might occur just at your personal display size.
  • In particular, derivatives and MIPmap level evaluation are done within 2×2 pixel blocks. So a very slight shift might make a discontinuity invisible or make it cause a glitch. This typically occurs when you force fract(uv) or use angles as texture coordinates. Also with derivatives of variables that might not be set in the neighbor pixel.
  • The shader looks a lot darker or more saturated for you (or for all others).
    => Ever heard about gamma correction ? 🙂
    In particular, is your monitor in “multimedia mode”,
    or didn’t you play with the contrast or gamma curve (on the monitor or in the GPU preferences window) ?
  • Textures, sound, video are loaded asynchronously and can sometimes be a few frames late.
    =>  If you precompute data in a Buffer, keep redoing it for a few dozen frames.
            e.g.,  if ( iFrame < 30 ) { init } .
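
A sketch of that last recommendation for a Buffer tab. The channel setup is an assumption (iChannel0 = the buffer itself as feedback, iChannel1 = an image texture that may load late), and the luminance computation only stands in for a real precomputation.

```glsl
// Buffer A. Assumed (hypothetical) inputs: iChannel0 = Buffer A, iChannel1 = a texture.
void mainImage(out vec4 fragColor, in vec2 fragCoord)
{
    vec2 uv = fragCoord / iResolution.xy;

    if (iFrame < 30) {
        // Keep redoing the precomputation while assets may still be loading.
        vec3 c = textureLod(iChannel1, uv, 0.0).rgb;
        fragColor = vec4(vec3(dot(c, vec3(0.299, 0.587, 0.114))), 1.0);
    } else {
        // Afterwards, just carry over the result stored at the previous frame.
        fragColor = texelFetch(iChannel0, ivec2(fragCoord), 0);
    }
}
```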

Testing OpenGL vs Angle vs D3D version on Windows

On Windows the browser (at least Chrome and Firefox) can use either true GLSL-ES or transpilation to HLSL via the Angle library provided by Google.
You can switch: (more and updates here )

  • firefox:   URL:  about:config   ; search “angle”; click on webgl.disable-angle to switch (immediate effect)
  • chrome: relaunch with chrome.exe --use-gl=desktop (or create an alias).

Moreover, Angle uses two different versions of D3D on Chrome vs Firefox, which can thus show different bugs and behaviors: in case of doubt, try both.

WebGL2.0 compatibility issues

  • Are you sure your browser is webGL2.0 ?  -> Test here.
  • Continuation to next line with \ (e.g. for macros) :
    not accepted initially by Firefox.
  • Return in divergent branches:
    WebGL2 GLSL-ES compilers are new, and thus come with new bugs. An old issue came back: when one branch of a parallel evaluation has a return while others don’t, the return can be missed, possibly crashing the compiler/driver via an infinite loop, or “just” causing wrong or slow results.
  • If: nVidia (linux+windows) ignores diverging returns in if. Test here.
  • Switch: Windows ignores diverging returns inside switch. Test here , here.
  • Switch: linux refuses unreachable statements, typically: return; break;
  • texture() in non-unrollable loop:
    Windows/Angle tries to do something horrible to guess the MIPmap level. In the case of a non-unrollable loop this generates so much extra code that it can easily overwhelm the compiler. -> use texelFetch or textureLod instead. Test here.
  • Declarations in for:
    Windows bug if a loop counter is declared in another loop. (for(int i,j;..) for (;j<N;j++) )
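
For the texture() item, a sketch of a non-unrollable loop that samples with an explicit LOD, so no derivative has to be guessed inside the loop. TAPS and the blur itself are arbitrary placeholders.

```glsl
const int TAPS = 16;   // hypothetical number of samples

void mainImage(out vec4 fragColor, in vec2 fragCoord)
{
    vec2 uv = fragCoord / iResolution.xy;

    // A bound the compiler cannot resolve at compile time (non-unrollable loop).
    int n = TAPS + min(0, iFrame);

    vec4 acc = vec4(0.0);
    for (int i = 0; i < n; i++) {
        vec2 p = uv + vec2(float(i) / iResolution.x, 0.0);
        acc += textureLod(iChannel0, p, 0.0);   // explicit LOD instead of texture()
    }
    fragColor = acc / float(n);
}
```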

Link to all glsl bug related shadertoys.

Usual tricks in Shadertoy / GLSL

Ever noticed the ‘?’ icon at the bottom-right of the source area 🙂 ?
And the “shader Inputs>” on the top-left ? 🙂
These are, respectively, a GLSL ES summary, and a list of the Shadertoy variables.

GLSL already knows vectors and matrices

  • including operations like length, distance, normalize, dot, cross and mat*vec
  • Many operations directly work on vectors (acting on each component)
  • Some special operations do boolean operations on vectors (all, any, lessThan…)
  • Many operations implicitly expand floats to vectors ( v+2., v/=2., step(0.,v)… )
  • and constructors are also casters ( e.g. V=vec4(x>y, 0, vec2(x)) ).
  • GLSL already knows 3D graphics operations such as reflect, refract, faceforward
  • GLSL provides many goodies like clamp, mix (linear interpolation), smoothstep (Hermite weighting)

NB: Complex arithmetic is easily implemented with vectors and matrices :

  • Use vec2 for definition,  + – between complexes,  + – * / with a float
  • complex multiplication of z1 by z2 is mat2(z1,-z1.y,z1.x) * z2
  • complex division of z1 by z2 is z1 * mat2(z2,-z2.y,z2.x) / dot(z2,z2)
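
The two formulas above, wrapped as functions and used in a small Mandelbrot iteration. cmul and cdiv are hypothetical helper names; the framing constants are arbitrary.

```glsl
// Complex numbers stored as vec2(re, im).
// mat2(z, -z.y, z.x) has columns (z.x, z.y) and (-z.y, z.x).
vec2 cmul(vec2 z1, vec2 z2) { return mat2(z1, -z1.y, z1.x) * z2; }
vec2 cdiv(vec2 z1, vec2 z2) { return z1 * mat2(z2, -z2.y, z2.x) / dot(z2, z2); }
// Sanity check: cdiv(cmul(a, b), b) should give back a (up to rounding) for b != 0.

void mainImage(out vec4 fragColor, in vec2 fragCoord)
{
    // Mandelbrot iteration z <- z*z + c as a usage example.
    vec2 c = 1.3 * (2.0 * fragCoord - iResolution.xy) / iResolution.y - vec2(0.5, 0.0);
    vec2 z = vec2(0.0);
    int n = 0;
    for (int i = 0; i < 64; i++) {
        z = cmul(z, z) + c;
        if (dot(z, z) > 4.0) break;   // escaped
        n++;
    }
    fragColor = vec4(vec3(float(n) / 64.0), 1.0);
}
```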

C tricks are good for GLSL perfs and programming ease

  • pow(x,y) does exp(y*log(x)): costly, not valid for x<0, not perfectly precise.
    • for x^2, do prefer x*x !
    • for 2^x use exp2(x).  The reverse log2 also exists (both in most languages 🙂 ).
  • the atan API also includes the 2-parameter version doing atan2(y,x)
  • x = cond ? v1 : v0 can be useful, especially for cascaded small expressions.
  • There should be a law punishing people using if cond then x=true else x=false. Just do directly x= cond 😉
  • macros can be convenient substitutes for templates (e.g. expressions valid for floats, vec2, vec3, vec4).

Incomplete integer operations

Many integer operations are missing. Sometimes the simplest is to do them on floats then cast (or not). But be careful about precision loss. Still,

  • integers up to 16,777,216 are exactly represented by 32-bit IEEE floats
  • + – * will thus be exact… But not the division: x/x might not be exactly 1.
  • fract and log2 directly read the mantissa and exponent, so they are lossless
  • In particular, << and >> can be represented by *exp2(n) and *exp2(-n)
  • mod on int-in-floats has precision bugs.
    You can do mod directly on ints with  a % b = a-a/b*b
  • Note that you can loop on floats to avoid loads of casts.

GLSL runs pixels in parallel

  • So it can compute the derivative of any variable for free ! dFdx, dFdy, fwidth
    (The precision is approximate, though: uncentered finite differences within 2×2 blocks)
  • Think parallel. Long initializations and definitions of arrays won’t be shared between pixels, since the whole shader is called at every pixel. So most of the time you save code, memory and registers by merging the initialization and action loops.
  • A reminder that local memory and the number of registers are ultra-critical resources on GPU.
  • Think procedural: to draw 1000 objects on screen, don’t draw and clamp all of them in your shader – i.e. with the full set checked at each pixel. Try to find the one(s) that cover the pixel, then render only those.
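
As an illustration of the free derivatives, a sketch that antialiases the edge of a disk with fwidth. The disk and the one-pixel transition width are arbitrary choices.

```glsl
void mainImage(out vec4 fragColor, in vec2 fragCoord)
{
    vec2 uv = (2.0 * fragCoord - iResolution.xy) / iResolution.y;

    float d = length(uv) - 0.5;             // signed distance to a disk of radius 0.5
    float w = max(fwidth(d), 1e-6);         // ~ variation of d across one pixel
    float mask = 1.0 - smoothstep(-w, w, d);  // soft edge about two pixels wide

    fragColor = vec4(vec3(mask), 1.0);
}
```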

Texture tricks (GLSL or Shadertoy)

  • MIPmap is simply activated by switching the texture mode. Still,
    • You can bias it (force less or more blur) via a third parameter in the texture call.
    • At parameterization discontinuities the automatic estimation of the LOD might be very wrong. => you can force it using texture…LodEXT
    • Note that MIPmap can be used to approximate integrals.
  • texture…GradEXT lets you directly supply the derivatives of the texture coordinates
  • Noise color texture: G and A channels are R and B translated by (37.,17.) , if no vflip. This allows to fake interpolated 3D noise.
  • tex15 is an ordered Bayer matrix : first made for easy half-toning (just threshold it with the grey level), it also provides a permutation table in [0,63].
  • Shadertoy buffers can be used to precompute data.
    More generally, in a multi-buffer algorithm, if the result of a buffer is not expected to change, do the computation only at iFrame==0  (or up to a delay, if asynchronous data such as images are used).
  • Note that sound texture includes the FFT of the music.
  • Check for the magic keyboard matrix 🙂

More “touchy” tricks (e.g. for code golfing)

  • The final alpha is ignored, so you can work directly on pixelColor or do vec4(myGreyShader).
  • The final color is naturally clamped (your screen pixel won’t be negative or over-bright 🙂 ) so for the final image operation you can forget the last clamp.
    ( Of course this can be wrong for intermediate calculations, and buffers do store unclamped floats. )
  • You must initialize variables, including out parameters (such as pixelColor).
    But v -= v will work 99.9999% of the time. The only theoretical issue is when the reused register happens to hold NaN by chance (which I’ve never seen occurring up to now).

Advanced super-tricks

You want to use your own texture ? Read the section Extending Shadertoy & more.
(But it won’t be saved, and other people won’t see it if they don’t do the same insert.)

Readings