Widgets & GUI toolkits (and more)

In their shadertoys, programmers have to handle everything, and redo everything from scratch every time. For repetitive basic utility features like GUI or font display this can be very boring, so many just don’t embed any GUI and users have to tweak defines.
Fortunately, various community members have published useful base elements that you can reuse !
( PS: coders, please do add the tag “gui” to your utility shaders: for now it gathers very little ! )

Bags of features:

  •   Super Shader GUI 98 : Windows98 inside !
    Windows (move, iconify, scroll), sliders, checkbox, color picker.


Isolated features:

Sliders & widgets :

color picker :

Mouse management:

Keyboard management:

Text & int/float numbers display:





  •   Mode 7 WASD Walkaround : basic WASD move on 3D terrain.
    ( No longer compiling: replace vec2 BUFF_RES with #define ).
    Remark: coders, do account for non-Qwerty keyboards: replace or add arrows ! (codes 37-40)
  • Missing ! : Some easy-to-reuse code for 2D and/or 3D walk-through.
    –> Which shader would you advise ?

Shadertoy media files

Sometimes you need to access Shadertoy built-in media (texture, video, music…) outside of Shadertoy (e.g., when porting your shader to desktop, or writing a paper or a blog post about your shader).
Here they are:

2D textures:






Classical corner cases

The following addresses more specific issues, simple or more involved:

Accessing texel center

Textures are strange arrays indexed by a float. Once rescaled by the texture size, texel centers happen to correspond to coordinates integer+0.5, as for window pixels (but unlike iMouse coordinates). When the texture is supposed to contain non-interpolatable data, it is easy to misfetch the value. Switching the texture filter to nearest is often a bad idea: if you misfetched the precise texel location, rounding to integer can easily access the wrong value altogether. So you do have to get your formula exact !
Note that if you really want to access the texture as an array, there is now texelFetch( iChannel0, ivec2(U), lod ): integer index, no interpolation or wrapping (so possibly faster too), and it is easier to fetch the right place 🙂 .

Please no black icons: Demo mode on mouse-tuned shaders

Many shaders are reactive to the mouse, which is nice. But a lot of them forget that iMouse==vec4(0) at start and in the icon, leading to a black or uninteresting image.
→ For your mouse-tuned shader, always think about detecting the zero case and then setting special demo values (possibly animated) !

Two possibilities (among many):

vec2 M = iMouse.xy == vec2(0) ? vec2(.5) : iMouse.xy / R;  // 1: fixed demo value (e.g. .5) when no mouse; R = iResolution.xy
vec2 M = iMouse.xy;  M = length(M) < 10. ? .5 + .3*vec2(cos(iTime),sin(iTime)) : M / R;  // 2: animated demo value

The second case allows the user to easily get back to demo-mode by clicking in the bottom-left corner.

One more thing about initial mouse value:
some shaders detect that the mouse is currently clicked by the sign of iMouse.zw. Just take care that in the icon and at start iMouse.zw == 0 as well, which is not negative although the mouse is not clicked 😉 . More details about Shadertoy iMouse uniform here.

More about icon, start image, preview

  • A reminder that live icons start at t=10″: your shader had better look nice at that moment, and at least not plain black !
  • An image preview is made when you save your shader. Some people browse using previews instead of live icons (faster, safer), and previews are used at various places (e.g. when I list & link to Shadertoy games 😉 , or when you use ShadertoyPlugin to check if what you see is what the user intended).
    So when your shader gets mature, remember to wait for a cool look before hitting the “save” button !

iFrame  vs  iTime  vs  iDate(.w)

  • When you really want the viewer experience to be smooth, i.e. synchronized with the viewer's own time (camera motion, simulation), use iTime. Using iFrame would yield motion with some stops and changing speed, which would differ between users' systems and with screen resolution.
  • Speaking of simulation: in physical simulations all equations should refer to real units of space and time to account for various window resolutions and shader framerates, and thus should rely on some dx, dy and dt variables. iTimeDelta lets you know the real dt since the last frame. NB: it’s usually better to relax it so as to smooth it over several frames (dt being kept in a buffer pixel), so that isolated micro-freezes don’t corrupt the simulation: dt = mix(dt, iTimeDelta, .05); // smooth on 20 frames
  • iDate.w, giving seconds since midnight (with a sub-second fraction), is useful when you really need unique seeds even at shader initialization; during run-time iTime is generally sufficient since FPS is never exactly regular. If you use the fractional part of iDate.w, take care that near the end of the day (~86400 s) its precision collapses (32-bit floats then only resolve steps of 1/128 s).
  • iFrame is better suited to manage internal states of your shader: initialization time, actions to delay to the next frame or later, periodic switching between 2 states, etc.

Attention: in the icon, iTime starts at 10 seconds, while iFrame is really 0. This is one more reason for not using iTime to trigger initializations. It’s also a reason to care about what your shader looks like at time 10″, since that will be the look of your icon. (Always check the look of your icon in the “latest shaders” view).
More details about Shadertoy special timing uniforms here.

Video sync, video time

Avoid binding the same video multiple times in different buffers: the copies very easily get out of sync. → Bind it in the first buffer only, and access only this version.

Also, you can’t be sure when the video will be loaded and start, so relying on iTime for timed video effects is very imprecise. If you want your effects to be well synchronized, you can use the special Shadertoy uniform iChannelTime[0].

Loading texture at initialization

In buffers, we can run some initialization by testing if ( iFrame == 0 ).
Alas, this won’t work if you need a texture (e.g. the noise texture), since loading is asynchronous and textures need a few frames to be loaded. To handle this, some people test if ( iFrame < 30 ) instead, but when the network’s Gods are unhappy it can take longer, and you don’t want your shader to rigidly wait at start.
The solution is to store and compare iChannelResolution[] through frames, since this ShaderToy uniform is set to texture resolution only when loaded. Example here. Note that it also allows dynamic detection of texture binding by the user.

if (u==R-.5) { O.a = iChannelResolution[1].x; return; }  // corner pixel stores the width of texture 1
if ( iFrame == 0 || texture(iChannel0, 1.-.5/R, 0. ).a == 0. ) { // init...  (assuming iChannel0 = this buffer, fed back)

NB: above, testing pixel R-.5 instead of .5 also allows detecting a switch to fullscreen.

Attention: you could use textureSize().xy as well, but the size of level 0 is never 0 (it is ivec2(1) if no texture is bound). And anyway, if you need MIPmap levels at initialization, testing level 0 is not enough: you must call textureSize on the required (or the maximum) LOD ! cf here.
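The frame-to-frame logic of the two GLSL lines above can be simulated in C to see when the initialization runs. In this sketch (hypothetical `needs_init`, assuming the texture width reads 0 until loaded and that an enlarged buffer comes back with a zeroed corner pixel), initialization keeps running until the frame where the texture first becomes available, runs once more with it, then stops.

```c
/* One frame of the init logic, seen from an ordinary pixel of the buffer.
   stored : what the corner pixel wrote last frame (0 at start, assumed 0
            again right after an enlargement since that pixel is then new)
   res    : this frame's iChannelResolution[1].x (0 while the texture loads)
   Returns 1 when this frame must (re)run the initialization. */
int needs_init(int frame, double *stored, double res) {
    int init = (frame == 0 || *stored == 0.);
    *stored = res;      /* the corner pixel rewrites its value every frame */
    return init;
}
```

If the texture arrives at frame 3, frames 0 to 3 initialize (frame 3 with the texture ready) and frame 4 onward runs normally.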

Video or sound not loading at start (or randomly)

This is not a bug, it’s a “feature” of the stupid new web policy forbidding auto-play of multimedia until the user has triggered some event (and even that does not always work). Pressing the “rewind” button may or may not work.
The only solution: the user must quickly click-and-drag the mouse or press a key in the Shadertoy window before the loading is finished.
At least you can test whether the lock is active by comparing iChannelResolution and textureSize: in that case the first is set (meaning the texture is around) but the second sticks to 1 (for lod 0), so you can at least warn the user. See here.

Detecting resolution change / waiting for fullscreen

At resolution change you might want to redraw the background or reorganize data. A typical use case is letting the user go fullscreen before doing the meaningful initialization. Pausing the shader 1 sec at start is quite rigid, and requiring a keypress may not be transparent enough. The solution again is to store and compare iResolution through frames. Example here.
Note that detecting the switch to fullscreen allows you to zoom and refit content if you wish, but there is currently no way to do the same when the size shrinks, since the buffer is already trimmed. 😦

MIPmap and dFdx/dFdy/fwidth artifacts at discontinuities

Hardware derivatives are useful for antialiasing or drawing constant-thickness curves; they are also used by the hardware to compute the LOD to apply to MIPmapped textures. But if the mapped value has discontinuities at some places, the look will be ugly there, showing segments or curvy artifacts.

A classical situation bringing discontinuity is fract(coords) or mod. ( NB: I hope you know that textures can directly repeat without using fract, by just setting the wrap flag 🙂 .)
A solution consists in manually computing the LOD and using either textureLod() (for simple LOD level tuning) or textureGrad() (providing the full pixel footprint). Base API (but this would reproduce the artifact if the derivatives are used as is):
textureLod( iChannel0, U, log2(length(fwidth(U*iChannelResolution[0].xy))) );  // LOD from texels per pixel
textureGrad( iChannel0, U, dFdx(U), dFdy(U) );

As for antialiasing, we can still use hardware derivatives for this task, but computed on the coordinates before fract or mod: these don’t change the gradient (except by adding discontinuities). When the scaling is simple, the LOD level is easily known directly, without computing derivatives. In some cases you might also prefer computing the derivatives analytically.

Another classical situation is caused by atan(y,x) (e.g. polar coordinates), at the jump 2π→0 or -π→π. But since the jump amount is known and easy to detect (a big value, while usual gradients are small), it is easy to trim it:
da = fwidth(a); if ( da > 3. ) da = abs(da - 6.2832);  // 6.2832 ~ 2π     cf example.

Something more puzzling: sometimes the bug doesn’t show until you resize the window or go fullscreen (or somebody else sees it and you don’t). Cf example. The reason is that hardware derivatives use an approximate trick: pixels are organized in tiles of 2 x 2 neighbors, and the derivatives are evaluated only once per tile, as right minus left for dFdx and top minus bottom for dFdy. So only discontinuities occurring within a tile are detected. If the fract(coords) 1→0 jump or the polar -π→π jump happens between 2 tiles (typically for centered polar coordinates, or for a texture tiled 2 x 2, with a window height multiple of 4), then the artifact won’t occur. So a hacky trick could be to center your coordinates so as to control this, e.g. offsetting screen coords by 2.*floor(iResolution.xy/4.) 🙂 .

A nastier case is the use of MIPmap or derivatives in loops, if or switch blocks. As we have seen here, with SIMD parallel computation such situations correspond to divergence: the neighbor pixels might not be executing the same code (i.e. they might be idle while the current pixel follows its own branch). Since hardware derivatives are based on comparing the values of neighbor pixels, they no longer work (and can return 0, NaN or garbage depending on the system). A call to texture() in a loop might seem harmless, but if the loop has varying length then we are in such a case: if the current pixel loops further than one of its neighbors, MIPmap and hardware derivatives are wrong for it.
Here again the solution is to compute the LOD manually, and if using derivatives, to evaluate them before the if or switch. Or to save the useful value in the loop and compute the derivative or access the MIPmap after the loop.

Case study: making dot pattern loopless

Many beginners tend to program in Shadertoy as they would program in C: by explicitly drawing every element. But shaders are called for every pixel, so drawing the full scene each time can be very costly, and it’s way more efficient to determine the only (or the few) element(s) that may cover the current pixel, if feasible. This is often the case for repetitive patterns, e.g. a spiral of dots. Once the candidate element is determined, we can draw it (which might by itself be costly, if it was not a simple dot like here).


Let’s see how we can go from one form of the algorithm to the other.
A Shadertoy user once proposed a “C like” shader similar to this:

#define rot(a) mat2( cos(a),-sin(a),sin(a),cos(a) )

void mainImage( out vec4 O, vec2 u ) {
    vec2 R = iResolution.xy,
         U = ( 2.*u - R ) / R.y,
         p = vec2(.01,0);

    float PI = 3.14159,
          phi = iTime * .01 + 0.1, // or (1. + sqrt(5.))/2.,
          a = phi * 2.*PI,
          d = 1e9;

    for( int i = 0; i < 1400; i++) {
        d = min( d, length(U - p) - .001 );        // distance to dot i
        p *= rot(a);                               // next dot: rotate by a...
        p = normalize(p) * ( length(p) + .0015 );  // ...and move outward
    }
    O = vec4( smoothstep(3./iResolution.y, 0., d - .01) );
}

p starts at vec2(.01,0); then at each step it is rotated by a and its distance to the center is increased by .0015. The loop runs 1400 times for every pixel, which can be quite costly (especially in fullscreen), while at most one dot will cover the pixel. Can we determine which one ?


First, let’s make the current coordinates explicit, from the description above (NB: I personally prefer to loop on floats to avoid loads of casting):

#define CS(a)  vec2(cos(a),sin(a))
p = ( .01+ .0015*i ) *CS(i*a);

Now, which i yields the dot closest to U ?

Let’s do the maths !

p ~= U, considered in polar coordinates, yields  .01 + .0015*i ~= length(U)  and  i*a ~= atan(U.y,U.x)  (modulo 2π).
The first gives i0 ~= ( length(U) - .01 ) / .0015, and the second gives i1 ~= atan(U.y,U.x) / a or, more generally, i1' = i1 + k*2.*PI/a, where k is the (unknown) spire number.
We look for k such that i0 ~= i1', i.e.  k ~= ( i0 - i1 ) / ( 2.*PI/a ).
k must be an integer, so we should consider either the floor or the ceil of this. round would do for a small dot, but with the larger one here we need both values to cover the full dot (and for an even larger one we should visit a few more integers around).
Then, i = round( i1 + k * 2.*PI/a ).


Which gives the loopless version of the shader (the 2-iteration loop on n just visits the floor and the ceil):

#define CS(a)  vec2(cos(a),sin(a))

void mainImage( out vec4 O, vec2 u ) {
    vec2 R = iResolution.xy,
         U = ( 2.*u - R ) / R.y;

    float PI = 3.14159,
          phi = iTime * .01 + 0.1, // or phi = (1. + sqrt(5.))/2.,
          a = phi * 2.*PI,
          i0 = ( length(U) - .01 ) / .0015,         // index guess from the radius
          i1 = mod( atan(U.y,U.x), 2.*PI ) / a,     // index guess from the angle, + k*2PI/a
          k = floor( (i0-i1) / (2.*PI/a) ),         // spire number (floor; ceil is k+1)
          i, d = 1e9;
    for (float n = 0.; n < 2.; n++) {               // visit floor and ceil
        i = round( i1 + k++ * 2.*PI/a );
        vec2 p = ( .01 + .0015*i ) * CS(i*a);
        d = min( d, length(U - p) - .001 );
    }
    O = vec4( smoothstep(3./iResolution.y, 0., d - .01) );
}

Not only is the cost per pixel hugely decreased (by about 1000), but the cost is now totally independent of the number of dots ! Which is good news when designing a pattern.

See more ( including simpler ) loopless shaders.


Note that when designing a spiral shader “the procedural way” from scratch, we could follow a totally different strategy: converting U to spiral coordinates, then splitting it in chunks using fract. The non-trivial part is then the drawing of the dot, since it must not be done in spiral coordinates or it would appear deformed. So we must convert the center and neighborhood back to screen coordinates. See example.

Programming tricks in Shadertoy / GLSL

Many people start writing GLSL shaders as they would write C/C++ programs, not accounting for the fact that shaders are massively parallel computing, and that the GLSL language offers many useful goodies such as vector ops and more. Or, not knowing how to deal with some basic issues, many people copy-paste very ugly or unadapted code designs, or re-invent the wheel (generally ending up with worse solutions than the evolutionarily polished ones 😉 ).

Here we address some basic patterns/tasks. For even more basic aspects related to the good use of GLSL language and parallelism, please first read usual-tricks-in-shadertoy/GLSL . And for non-intuitive issues causing huge cost (or even crashes) at runtime or compilation time, read avoiding-compiler-crash-or-endless-compilation .

Normalizing coordinates

The window is a rectangle that can have various sizes: icon, sub-window in browser tab, fullscreen, seen from different computers (including tablets and smartphones), and different aspect ratios (fullscreen vs sub-window, or on different hardware, including the long screens of smartphones in landscape or portrait mode). So we usually start by normalizing coordinates. For some reason, many people use a very ugly pattern: first normalizing and distorting the window coordinates to [0,1]x[0,1] (minus 0.5 if centered), then applying an aspect ratio to undistort. Basic clean solutions are:

vec2 R = iResolution.xy,
  // U = fragCoord / R.y;                     // [0,1] vertically
     U = ( 2.*fragCoord - R ) / R.y;          // [-1,1] vertically
  // U = ( fragCoord - .5*R ) / R.y;          // [-1/2,1/2] vertically
  // U = ( 2.*fragCoord - R ) / min(R.x,R.y); // [-1,1] along the shortest side

Displaying textures and videos

Note that if you want to map an image on the full window, thus with distortions, you then do need to use fragCoord/R.
But if you want to map an undistorted rectangular image (typically, a video), things are a little more involved: see here. Since the typical video ratio happens to be not too far from the window ratio (on a regular screen), most people blindly relied on the “map to full window” above, but on smartphones the result then looks totally distorted.
( Note that texelFetch avoids texture distortion in a simpler way, but then you no longer benefit from the hardware interpolation, rescaling and wrapping features. )

Managing colors

Don’t forget sRGB / gamma !

Don’t forget that the channel intensities of image textures and videos are encoded in sRGB, and that the final shader color has to be re-encoded in sRGB by you, while most synthesis and treatments done in shaders assume linear space.
This is especially important for antialiasing, since returning 0.5 is really not perceived as mid-grey (test here), for color interpolation (see counter-example, and another), and for luminance computation of texture images and video (NB: this encoding of intensity was historically chosen to account for the non-linear intensity distortion of CRT screens, then kept as a perception-based cheap compression, then as a normalization to interpret colors the same way across multiple input and output devices).
Fortunately sRGB is close to gamma 2.2 conversion: do fragColor = pow(col, vec4(1./2.2) ) at the very end of your program, and col = pow(tex,vec4(2.2)) after reading a texture image to be treated or combined (this does not apply to noise textures). Note that just doing fragColor = sqrt(col), resp. col = tex*tex, is a pretty good approximation.


Many people rely on a full, costly HSV-to-RGB conversion just to get a color from a hue value.
This can be made a lot simpler using (see ref):

#define hue(v) ( .6 + .6 * cos( 2.*PI*(v) + vec4(0,-2.*PI/3.,2.*PI/3.,0) ) )  // looks better with a bit of saturation
// code golfed version:
// #define hue(v) ( .6 + .6 * cos( 6.3*(v) + vec4(0,23,21,0) ) )

For full RGB2HSV/HSL and back, see classical and iq references.

Drawing thick bars

step( x0, x ) transitions from 0 to 1 at x0.
smoothstep( .0 , .01, x-x0 ) does the same with smooth transition.
To make a thick bar, rather than multiplying a 0-to-1 with a 1-to-0 transition, just do:

step(r/2., abs(x-x0) )
smoothstep(.0, .01 , abs(x-x0)-r/2. )  // smooth version

NB: above, 1 is outside. If you want 1 inside use 1.- above, or:

step( abs(x-x0), r/2. )
smoothstep( .01, .0,  abs(x-x0)-r/2. )  // smooth version


Aliasing in space or in time is ugly and makes your shader look very newbie 😀 . Oversampling inside each pixel is very costly and gives only a modest improvement unless you use hundreds of samples per pixel. For algorithms like ray-tracing you have few alternatives (apart from complex techniques like the screen-space temporal denoising used in game programming). But for simple 2D shaders it’s often easy to get very good antialiasing almost for free, by using 1-pixel-smooth transitions at all boundaries: more generally, the idea is to return a floating-point “normalized distance” rather than a binary “inside or outside”.
Typically, instead of if (x>x0) v=1.; else v=0. ( or v = x>x0 ? 1. : 0. ), which are equivalent to v=step( x0, x ), just use v = smoothstep( x0-pix, x0+pix, x ) where pix is the pixel width measured in your coordinates (e.g. pix=2./R.y if the vertical coordinate is normalized to [-1,1]). ( Or simply clamp( .5 + (x-x0)/(2.*pix), 0., 1. ). Note that smoothstep eats part of the transition interval, so you need to compensate using at least pix = 1.5*pixelWidth. ) cf Example code.

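// Antialiased 1D bar of half-width r around v0 (with v = length(U): ring of radius v0, or disc of radius r if v0 = 0;
//   with 2D v and v0: disc of radius r centered at v0). Assumes one unit = R.y pixels, i.e. U = fragCoord/R.y.
// normalized coords version:
#define S(v,v0,r)  smoothstep( 1.5/R.y, -1.5/R.y, length(v-(v0)) - (r) )  
// pixel coords version (use one or the other):
// #define S(v,v0,r)  smoothstep( 1.5, -1.5, R.y* ( length(v-(v0)) - (r) )) 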

( You might find the 2nd formulation more intuitive: turn v back to pixel coordinates, and perform the black/white transition over 2-3 pixels. )
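The same coverage in C, handy to see the numbers (a sketch: `coverage` is a hypothetical name and `pix` is the pixel size in your units, e.g. 1/R.y when U = fragCoord/R.y). It is written as 1 - smoothstep(-w, w, d), which equals smoothstep(w, -w, d) but keeps the edges in increasing order, since the GLSL specification leaves smoothstep undefined when edge0 >= edge1 (in practice the reversed form generally works).

```c
/* GLSL-style smoothstep (edges assumed increasing). */
static double smoothstep(double e0, double e1, double x) {
    double t = (x - e0) / (e1 - e0);
    t = t < 0. ? 0. : (t > 1. ? 1. : t);
    return t * t * (3. - 2. * t);
}

/* Antialiased coverage from the signed distance to a boundary
   (dist = length(v - v0) - r, negative inside): 1 inside, 0 outside,
   0.5 exactly on the boundary, with a ramp of 3 pixels across it. */
double coverage(double dist, double pix) {
    return 1. - smoothstep(-1.5 * pix, 1.5 * pix, dist);
}
```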

When you see magic numbers like 0.01 in smoothsteps, tell the code author that it won’t scale (aliased in the icon, blurry in fullscreen) and that they should just use the true pixel width instead. Note that 1-pixel-thin features will look aliased if you forget the final sRGB conversion at the end of the shader.

Nastier functions are floor, fract and mod, since there is no simple way(*) to smooth their discontinuity the way we did for step. Still, these are often used with some final thresholding, which just needs to avoid sitting right on the discontinuity: e.g., fract(x+.5)-.5 no longer has a discontinuity at x = 0 (or at integer x). If you need to handle both discontinuities at 0 and 1 (e.g. series of bars as above), abs(fract(x+.25)-.5)-.25 will put them in the continuous part of fract. cf example code.
(*) : E.g. 1: see smoothfloor/smoothfract. E.g. 2: you might sometimes use clamp( sin(PI*x)/PI / pix, 0., 1. ) instead of 1 - int(x)%2.

If the parameter value is not a simple scaling of coordinates, it can be difficult to know the pixel size in these units. But GLSL hardware derivatives can do it for you: pix = fwidth(x), at least if x does not oscillate faster than the pixel rate. But then, as for any derivative, a discontinuity will cause an issue, while you were only interested in the coarse gradient. If x contains discontinuities like x=fract(x’) or x=mod(x’,m), then simply use x’ instead of x in fwidth, since it has the same gradient without the discontinuity. cf Example code.

Drawing lines

People solved this long ago, so you don’t need to reinvent the wheel 😉 .
The principle is to return the distance to a segment, then to use the “antialiased thick bar” trick above (cf #define S). Note that for a complex drawing you can first compute the min distance to all features then apply the antialiased-bar (and optional coloring) at the very end. You might even use dot(,) rather than length() so as to compute sqrt only once.

float line(vec2 p, vec2 a, vec2 b) { // --- distance to segment with caps
    p -= a, b -= a;
    float h = clamp(dot(p, b) / dot(b, b), 0., 1.); // proj coord on line
    return length(p - b * h);                        // dist to segment
    // We might directly return smoothstep( 3./R.y, 0., dist),
    //     but it's more efficient to factor all lines.
    // We can even return dot(,) and take sqrt at the end of polyline:
    // p -= b*h; return dot(p,p);
}

Depending on the use case, you might want the distance to an isolated segment (including caps at ends) or just to the capless segment.  cf Example code.

Blending / compositing

When you splat semi-transparent objects, or once you use antialiasing, rather than setting or adding colors you must composite these semi-transparent layers, or you will suffer artifacts.
Below, C is the pure object color in RGB and its opacity in A; O is the current and final color (its RGB premultiplied by its opacity O.a).

Drawing assumed to be from front to back stage (i.e. closest first):
(which allows stopping as soon as opacity reaches 100%, or some threshold like 99.5%)

O += (1.-O.a) * vec4( C.rgb, 1 ) *C.a;

Drawing assumed to be from back to front stage (i.e. closest last):

O = mix( O, vec4( C.rgb, 1), C.a );
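To check that the two orders agree, here is a C sketch of both formulas (hypothetical helper names, RGB premultiplied by alpha in O, starting from a transparent black O): composing the same two layers front-to-back and back-to-front gives the same result.

```c
typedef struct { double r, g, b, a; } rgba;

/* Front to back (closest first): O += (1-O.a) * vec4(C.rgb,1) * C.a */
void over_front_to_back(rgba *O, rgba C) {
    double w = (1. - O->a) * C.a;
    O->r += w * C.r; O->g += w * C.g; O->b += w * C.b; O->a += w;
}

/* Back to front (closest last): O = mix(O, vec4(C.rgb,1), C.a) */
void over_back_to_front(rgba *O, rgba C) {
    O->r += (C.r - O->r) * C.a;
    O->g += (C.g - O->g) * C.a;
    O->b += (C.b - O->b) * C.a;
    O->a += (1. - O->a) * C.a;
}
```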

Vector maths

First, a reminder that GLSL directly knows about vectors, matrices, vector geometry operations and blending operations; even most ordinary math functions work on vectors: see here. Besides geometry, vectors can also be used for RGBA colors, complex numbers, etc. Each time you want to do the same thing on x, y, z (for instance), use them ! The performance won’t be much better, but the code will be much more readable, which helps reasoning, bug chasing and code evolution.

In addition it’s often convenient to add some more vector constructors like:

#define CS(a)        vec2( cos(a), sin(a) )
#define cart2pol(U)  vec2( length(U), atan((U).y,(U).x) )
#define pol2cart(U) ( (U).x * CS( (U).y ) )

Some operations on complexes: ( vec2 Z  means  Z.x + i Z.y  )

// add, sub;  mul or div by float : just use +, -, *, /
#define cmod(Z)     length(Z)
#define carg(Z)     atan( (Z).y, (Z).x )
#define cmul(A,B) ( mat2( A, -(A).y, (A).x ) * (B) )  // matrix form of the complex product
#define cinv(Z)   ( vec2( (Z).x, -(Z).y ) / dot(Z,Z) ) 
#define cdiv(A,B)   cmul( A, cinv(B) )
#define cpow(Z,v)   pol2cart( vec2( pow(cmod(Z),v) , (v) * carg(Z) ) )  // real exponent v
// #define cpow(A,B)   cexp( cmul( B, clog(A) ) )  // complex exponent B: use instead of the above
#define cexp(Z)     pol2cart( vec2( exp((Z).x), (Z).y ) )
#define clog(Z)     vec2( log(cmod(Z)), carg(Z) )

For rotations, the simplest is to just return the 2D matrix (even for 3D axial rotations):

#define rot(a)      mat2( cos(a), -sin(a), sin(a), cos(a) )
// use cases:
vec2 v = ... ; v *= rot(a); // attention: v*rot(a) rotates by +a, while rot(a)*v rotates by -a
vec3 p = ... ; p.xy *= rot(a.z); p.yz*= rot(a.x); ...

Note that the optimizer recognizes identical formulas and won’t evaluate sin and cos twice.

Just for fun, the code golfed version 🙂 :  mat2( cos( a + vec4(0,33,11,0)) )

Computing random values

    • Sometimes we need the equivalent of drand(), i.e. a linear congruential sequence, which can easily be reimplemented explicitly. cf Wikipedia.
    • But most of the time what we really need is a hash value, i.e. a different random value for each pixel, or grid cell, or 3D coord, or 2D+time, etc. And this hash might be a scalar or a vector.
      • For simple use cases, you might rely on the Shadertoy 2D or 3D noise textures in grey or RGBA, see special-shadertoy-features. (If you really want a hash, take care not to interpolate and to hit texel centers, e.g. using the nearest filter or texelFetch.) Still, the precision is limited (8-bit textures, 64 or 256 resolution).
      • Early integer-less shading languages popularized old-school cheap float-based hashes relying on the chaotic lowest-significant bits after a non-linear operation. (The magic values are important and come from the dawn of computer science age.)
        #define hash21(p) fract(sin(dot(p, vec2(12.9898, 78.233))) * 43758.5453)
        #define hash33(p) fract(sin( (p) * mat3( 127.1,311.7,74.7 , 269.5,183.3,246.1 , 113.5,271.9,124.6) ) *43758.5453123)

        See many variants here. A problem is that precision is hardware (and compiler) dependent, so random values can vary between users. Plus p must be neither too small nor too big: on poor 16- or 24-bit hardware the random value might just always be zero.

      • Since webGL2 we can rely on robust, precise (but a bit costlier) integer-based hashes: see reference code, especially the GlibC or NRC refs in Integer Hash – II.
        They usually take an unsigned int, so take care when casting from floats around zero (since [u]int(-0.5) == [u]int(0.5) == 0: convert with int(floor(x)) first).
      • Attention: the variant introduced by Perlin based on permutation tables is very inefficient in shaders, since arrays and texture fetches are ultra-costly, and the cascading dependent accesses of the 3D-to-1D wrap are not pipeline-friendly either.
    • You might not want a hash, but a continuous random noise function. Depending on your needs,
      • you might then be happy with a simple value noise (e.g. simple noise texture with interpolation, or analytic using ref codes),
      • splined value noise,
      • or more costly gradient noise (see ref codes),
      • up to full Perlin noise (gradient + spline interpolation + fractal. NB: Perlin published 3 different algorithms along time: Classical, Improved, Simplex).
        Attention: many shaders or blogs named “Perlin noise” actually just fake a simple gradient or even value noise, with random rotations through scales to mask artifacts. This might be OK for you, but don’t mistake it for what it is not. Conversely, for performance it’s not a good idea to use permutation tables for the hashes.

Profiling, timers, compiled code

Optimizing GPU code, as for parallel programming (but worse), is difficult and unintuitive. Several tools can help in this task.
Alas, none of them works in webGL, only on desktop. But you can easily port your webGL GLSL to desktop, even more easily using glue tools like libShadertoy (see page Applications compatible with Shadertoy).

Profiling tools:

NVIDIA Nsight (different features depending on whether you want the Windows Visual Studio, Linux Eclipse, or standalone version).

Getting timers:

  • C-side: a timer query tells you the precise real duration (in nanoseconds) of the rendering of your draw call.
  • GLSL-side: the shader clock lets you measure the current SM clock (in ticks) any time you want in your shader.
    It’s often useful to check the current SM ID, warp ID, and corresponding ranges.

Getting compiled code:

two methods:

  • Compile the shader separately, using the cgc compiler or the Shader Playground app (Windows).
    Problem: you must choose the target language (openGL, openGL-ES, webGL, …) and language version. It is hard to be sure which one will be used in your app, especially for webGL in a browser.
  • Getting the assembly from your app: glGetProgramBinary()

Now, it’s always interesting to see the generated code, but it’s often not easy to deduce the right things from it in terms of how to optimize (apart from very basic arithmetic or algorithmic issues). For instance, key aspects for performance are the number of registers used (because it constrains how many alternate threads can be piled up to mask wait-states), divergence (conditionally executed parts that differ between pixels of a warp), and the consequences of wait-states (waiting for maths, texture fetches or, worse, dependent chains, which the optimizer can improve by shuffling instructions). None of these is easy to read in the code, and the optimizer may improve them by producing code that looks strange at first glance. Also, optimizers tend to produce long code by unrolling everything in order to resolve more operations at compile time, which can yield apparently ugly, complex code that is nonetheless more efficient.

In the scope of webGL, remember that upstream of all this, on Windows the browser will by default transpile the code to HLSL instead of compiling GLSL, using different versions of D3D depending on whether you run Firefox or Chrome. And the ANGLE layer transforms your GLSL code to try to fix errors and instabilities of webGL, but different browsers activate different ANGLE patches, or don’t use it at all.
On the other end, the “assembly code” is indeed an intermediate code, that is compiled further in the GPU.
That’s why profiling tools and timers are probably more useful for optimization 😉

Readings (shaders, maths, 3D)

People often ask where to start, and which readings to help starting or progressing.
Some just want to learn shaders, others want to get more fluent in the maths behind them, some are specifically interested in 3D rendering.

Here is a sample of online resources, free books and paid books that I often saw mentioned as helpful:
[ Disclaimer: the purpose of this page is NOT to catalog the full list of books and webpages about 3D. Moreover, it targets beginners, not university-level 3D, with a focus on “graphics in fragment shaders“, as expected on a Shadertoy/GLSL blog 😉 ]


More advanced:

Puzzling compilation errors in shadertoy

Typically, a cryptic error message appears on top of the source window, or even just a red frame around the tab name, but no command or line number is pointed out.

The fact is that Shadertoy’s JavaScript inserts your source into the true GLSL shader, with parts added before and after, and that is what actually gets compiled: bad things can happen outside your section, even if caused by you. This also involves string manipulations, which can fail too. In addition, the driver’s compiler can react weirdly when bad things occur, such as resource exhaustion, possibly after a long freeze.

  • Nothing but the tab framed in red:

    • You probably forgot a } somewhere, and the error line doesn’t appear since it occurs… past your source, in the part that Shadertoy adds after it. Such a {} mismatch can even sometimes cause an infinite loop.

    • You played code golf with #define mainImage: since the introduction of the Common tab, no error message is ever displayed in this case, so you have to guess. (But if you are in the code-golfing commando, you can read the Matrix, so it’s not a problem 😀 )

  • Comments in #define:
    several special characters like $ ' " @, or UTF-8 characters like é ç, will cause
    Unknown Error: ERROR: 0:? : '' : syntax error

  • Array too big for memory… given the way the compiler may manage it very badly. For instance, if you do bilinear interpolation of array values, the OpenGL compiler stores the data 4 times. Registers used by the assembly language also count against the resource.

  • But if you are right at the limit, and possibly exceed the resource only because of the registers, then you can get even stranger messages with no hint at all, just hundreds of errors followed by the whole compiled code! See example ( for OpenGL ).

  • Ultra-long compilation times (because of your long loops and nested functions, all to be unrolled) can also result in awkward messages after a freeze.
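A minimal sketch of the safe pattern for the “comments in #define” pitfall above. The failing variant is only described in a comment rather than written out, since whether it actually breaks depends on the browser and driver; the SPEED macro and the small mainImage are hypothetical, just enough to paste into an Image tab.

```glsl
// Risky pattern, described rather than written out: a #define line followed by
// a trailing comment containing special characters (dollar, at-sign, quotes)
// or accented UTF-8 letters. Depending on browser and driver, this can fail
// with an Unknown Error ... syntax error message and no line number.

// Safer pattern: keep the #define line free of comments, and put the comment
// on its own line, in plain ASCII.
// animation speed factor
#define SPEED 2.0

void mainImage(out vec4 O, vec2 U)
{
    vec2 uv = U / iResolution.xy;                         // normalized pixel coordinates
    O = vec4(uv, 0.5 + 0.5 * sin(SPEED * iTime), 1.0);    // blue channel pulses at SPEED
}
```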



Embedding shadertoys in website

NB: In the code snippets below, replace { with < . WordPress is unable to display such code. 😦

Just as a clickable image:

Copy-paste the shader URL, and build from it the URL of the corresponding thumbnail:

{ a href="https://www.shadertoy.com/view/SHADER_ID" >{ img src="https://www.shadertoy.com/media/shaders/SHADER_ID.jpg" /> My shader { /a >


Functional shader:

If you click the “share” button below a shader, you get the piece of code to copy-paste:

{ iframe src="https://www.shadertoy.com/embed/SHADER_ID?gui=true&t=10&paused=true&muted=false" width="640" height="360" frameborder="0" allowfullscreen="allowfullscreen" >{ /iframe >

( No live example here, since WordPress doesn’t accept iframes. )

… As a webpage background:

To the code above in your HTML, just add a CSS file or rule mapping the iframe as a full-window background. See the 3 tabs HTML, CSS and Result in this example.

Minimal version (it would be better to target a class or id name rather than every iframe):

iframe {
 position: fixed;
 width: 100%;
 height: 100%;
 top: 0;
 right: 0;
 bottom: 0;
 left: 0;
 z-index: -1;
 pointer-events: none;
}

Fetching the Shadertoy database via JavaScript:

See the API manual here.
You first need to register online to get your USER_KEY, to be included in your scripts. Then you can send queries to the Shadertoy database and get back, as JSON, lists of shader ids matching search criteria, or shader descriptions and contents, to be used in your JavaScript. For instance, this is how I made my own shader browser.

Note that only shaders saved as “public+API” can be accessed this way. In particular, you can’t access unlisted or private shaders.

Avoiding compiler crash (or endless compilation)

Sometimes, shader compilation is long. Or ultra-long. Or it freezes the browser. Or even crashes it after a timeout. Worse: this can happen to other people (often under another OS) with your shaders while it was okay for you, and then your shader can be unlisted because of something you don’t experience (very frustrating).

Before suggesting solutions and what to care about, it’s important to understand…

What happens at early compilation

  • Functions do not really exist on the GPU, because there is no stack to jump out and then come back (that’s why recursion is not allowed). They are just a writing aid, like macros. So all functions are first inlined.
  • Loops used to be fake as well. But even now that dynamic loops do exist, optimizers strongly prefer to unroll them for performance: the loop body is duplicated as many times as there are iterations, with the loop variable replaced by its successive constant values. One problem is that optimizers don’t foresee that this might exhaust resources (starting with the final code length).
  • Branching vs divergence: when different conditional (“if“) branches are followed within the same warp (i.e. a neighborhood of typically 32 pixels), SIMD parallelism forces each thread to run all of them (masking the result when it is not the right branch for a given thread), as shown in these demos.  For variable-length loops (for, while) or early exits (conditional break in a loop) this can be even more involved.
    This primarily impacts runtime performance, but branches obviously also lengthen the inlined code (e.g. if big functions are called in branches).
    Also, while dFdx, dFdy and fwidth may just give silly (undefined) values across diverging pixels, on some systems the function texture() tries harder to find the MIPmap LOD to use, which may consist in evaluating the whole code 4 times to recover a 2×2 neighborhood on which to evaluate the derivatives of texture coordinates.
  • The resulting function-less and (almost) loop-less GLSL code, simplified but much longer, is then actually compiled and optimized. But the code length and compile duration might exhaust resources and make compilation fail, causing a crash.
  • Note that before compilation, ANGLE applies various code modifications to work around bugs occurring on some drivers/boards, then on Windows it transpiles GLSL to HLSL. Afterwards, both shading languages are compiled into an intermediate assembly-like code (e.g. ARB assembly on OpenGL), which is compiled and optimized again to get a true GPU executable. So in total there is a full stack of code rewriting and optimization.
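To make the inlining and unrolling points concrete, here is a conceptual sketch, not actual compiler output. The helper bump() and both sum functions are hypothetical; the point is that the second version, with the loop variable replaced by its constant values and the function body pasted at each copy, is roughly what the optimizer then works on.

```glsl
// Conceptual sketch only: NOT real compiler output.

// hypothetical helper: smoothstep-like cubic on [0,1]
float bump(float x) { return x * x * (3.0 - 2.0 * x); }

// What you write: a function called in a constant-length loop.
float sumBumps(float x)
{
    float s = 0.0;
    for (int i = 0; i < 4; i++)                    // trip count known at compile time
        s += bump(fract(x + float(i) * 0.25));
    return s;
}

// Roughly what the optimizer works on: loop unrolled, i replaced by
// its constant values, bump() inlined in each copy.
float sumBumpsUnrolled(float x)
{
    float s = 0.0;
    float a0 = fract(x + 0.00); s += a0 * a0 * (3.0 - 2.0 * a0);   // i = 0
    float a1 = fract(x + 0.25); s += a1 * a1 * (3.0 - 2.0 * a1);   // i = 1
    float a2 = fract(x + 0.50); s += a2 * a2 * (3.0 - 2.0 * a2);   // i = 2
    float a3 = fract(x + 0.75); s += a3 * a3 * (3.0 - 2.0 * a3);   // i = 3
    return s;
}

void mainImage(out vec4 O, vec2 U)
{
    float x = U.x / iResolution.x;
    // both versions compute the same values, so red and green should match
    O = vec4(sumBumps(x) / 4.0, sumBumpsUnrolled(x) / 4.0, 0.0, 1.0);
}
```

With 4 iterations and a one-line helper this is harmless; the trouble starts when the loop has hundreds of steps and the inlined function is itself big.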

Now, just consider the big picture: e.g. ray-marching code with a long stepping loop, containing branches (e.g. “if hit”) calling functions (to get the normal, the texture values, etc.), which might themselves contain loops over functions (e.g. for procedural texturing). Worse: the shading part launches shadow rays (or reflected/refracted rays) with a brand-new marching loop (yes, it would be duplicated for each step of the main loop). On top of that, the “map” function tests the whole scene for ray intersection at every step, and it is likely to contain loops and further function calls as well.
The true code length before the actual compilation is the huge combinatorial product of all this. You have no idea how long it could be. Well, actually you should.
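A rough worst-case count, under loudly stated assumptions: everything is fully inlined and unrolled, the “if hit” shading branch sits inside the main marching loop of N_march steps, the normal costs 6 map() calls (central differences), and shading launches one shadow loop of N_shadow steps. Loops inside map() itself are ignored. The number of inlined copies of map() is then about:

$$
n_{\text{map}} \approx N_{\text{march}} \times \left(1 + 6 + N_{\text{shadow}}\right)
$$

With hypothetical values N_march = 100 and N_shadow = 50, that is 100 × 57 = 5700 copies of map(), each one carrying its own loops and function calls.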

What can we do ?


  • Do you really need 1000000 steps ? sure ?
  • Do you really need to detail procedural textures (or shapes) down to the nanometer ? (think about where the pixel-size limit falls).
  • Do you really need to compute the texture also for shadow evaluation ?
  • Can’t you first test a coarse hit, then inspect the details once this step is reached ?
  • Can’t some part be done in a separate buffer (i.e., stored for the whole frame rather than evaluated for each pixel) ? BTW, does it really need to be re-evaluated at each time step ?
  • Can’t a repeated pattern be done implicitly with a simple mod/fract rather than with an explicit loop ?
  • Or can’t you find the only item (or few items) that can really cover the pixel ?
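As an example of the mod/fract question (and of finding the only item that can meet the pixel), here is a sketch assuming a hypothetical 11×11 grid of identical spheres. The explicit double loop evaluates all 121 spheres at every marching step; the repeated version picks the nearest cell with round() and clamp() and tests a single sphere, which for equal radii gives the same distance. mainImage just draws a horizontal slice of the distance field and paints any mismatch in red.

```glsl
// Sketch: 11x11 grid of identical spheres, radius 0.3, spacing 2.0 (hypothetical scene).

// Explicit loop: every call evaluates all 121 spheres,
// and the loop is likely to be unrolled into 121 copies.
float mapLoop(vec3 p)
{
    float d = 1e9;
    for (int i = -5; i <= 5; i++)
        for (int j = -5; j <= 5; j++)
            d = min(d, length(p - vec3(2.0 * float(i), 0.0, 2.0 * float(j))) - 0.3);
    return d;
}

// Implicit repetition: find the nearest cell, then test a single sphere.
// (For an infinite grid, simply use q.xz = mod(p.xz + 1.0, 2.0) - 1.0.)
float mapRepeat(vec3 p)
{
    vec2 cell = clamp(round(p.xz / 2.0), -5.0, 5.0);   // nearest sphere index, limited to the grid
    vec3 q = p;
    q.xz -= 2.0 * cell;                                 // move into that sphere's local frame
    return length(q) - 0.3;
}

void mainImage(out vec4 O, vec2 U)
{
    // horizontal slice y = 0, wide enough to also test outside the grid
    vec2 uv = (2.0 * U - iResolution.xy) / iResolution.y * 12.0;
    vec3 p = vec3(uv.x, 0.0, uv.y);

    float d = mapRepeat(p);
    float e = abs(d - mapLoop(p));                      // should stay ~0 everywhere
    O = vec4(vec3(0.5 + 0.5 * sin(8.0 * d)), 1.0);      // distance-field isolines
    if (e > 1e-4) O = vec4(1, 0, 0, 1);                 // red would flag a mismatch
}
```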

Within the unroll & inline logic

  • Defer heavy processing out of loops: instead of
        if (end_condition) { process; break; }
    write
        if (end_condition) { set_parameters; break; }
    then process the parameters after the loop.
    Typically: shading evaluation, shadows, reflected rays…
  • Defer heavy processing out of branches: instead of
        ...else if ( cond_N ) do_action(params);
    write
        ...else if ( cond_N ) set_parameters;
    then process the parameters after the if/else chain.
  • Specialize functions, or use branches inside them only when selected by const parameters.
    The worst case would be a shape(P, kind, params) function implementing a whole bank of possible shapes, called inside a ray-marching loop: if kind is not const, the whole shape() source will be duplicated many times.
  • Don’t call texture() in any divergence-prone area (“if” branch, variable-length or early-breakable loop), at least if MIPmapping is enabled. Otherwise, use an explicit LOD via textureLod() or textureGrad().
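A sketch of the first and last points combined, assuming a hypothetical single-sphere scene (map(), calcNormal()) and a MIPmapped texture bound to iChannel0: the loop only records where it stopped, and the normal, the texture fetch and the lighting are inlined once after it. The texture-coordinate derivatives are taken after the loop, where all pixels are back in the same control flow, and passed to textureGrad() inside the “if (hit)” branch.

```glsl
// Sketch under assumptions: map() and calcNormal() are hypothetical helpers,
// and iChannel0 is assumed to be bound to a 2D texture with MIPmaps.

float map(vec3 p) { return length(p) - 1.0; }         // hypothetical scene: unit sphere

vec3 calcNormal(vec3 p)
{
    const vec2 e = vec2(1e-3, 0.0);                    // central differences: 6 map() calls
    return normalize(vec3(map(p + e.xyy) - map(p - e.xyy),
                          map(p + e.yxy) - map(p - e.yxy),
                          map(p + e.yyx) - map(p - e.yyx)));
}

void mainImage(out vec4 O, vec2 U)
{
    vec2 uv = (2.0 * U - iResolution.xy) / iResolution.y;
    vec3 ro = vec3(0.0, 0.0, -3.0), rd = normalize(vec3(uv, 1.5));

    // Marching loop: on hit, only record parameters (t, hit) and break.
    // No shading, no shadow loop, no texture() in here.
    float t = 0.0;
    bool hit = false;
    for (int i = 0; i < 100; i++) {
        float d = map(ro + t * rd);
        if (d < 1e-3) { hit = true; break; }
        t += d;
        if (t > 20.0) break;
    }

    // After the loop, all pixels run this code again: derivatives are usable here.
    vec3 P  = ro + t * rd;
    vec2 tc = P.xy;                                    // hypothetical texture mapping
    vec2 dx = dFdx(tc), dy = dFdy(tc);

    // Deferred heavy processing: inlined once, after the loop.
    O = vec4(0.0, 0.0, 0.0, 1.0);
    if (hit) {
        vec3 N = calcNormal(P);
        // explicit gradients: no implicit LOD computation inside the branch
        vec3 albedo = textureGrad(iChannel0, tc, dx, dy).rgb;
        O.rgb = albedo * max(dot(N, normalize(vec3(1.0, 1.0, -1.0))), 0.0);
    }
}
```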

Keep your critical judgement: the above advice is not always applicable, nor always useful. Small loops and small processes don’t deserve special treatment, and the GPU *is* powerful enough to deliver good performance on complicated code. Just learn to recognize the coding patterns that give two “similarly complicated” shaders (by number of lines or functions) a totally different fate depending on how the compiler reacts. And avoid blindly following the dark path, the one that nastily looks “like you would have done it on the CPU”.

Fighting the unroll & inline logic

You can also fight loop unrolling by preventing the compiler from knowing the loop length at compile time. E.g.:
    for (int i=0; i<N+min(0,iFrame); i++)
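Here is that trick in context, as a sketch assuming a hypothetical sphere scene and step count N. iFrame is Shadertoy’s integer frame counter and is never negative, so min(0, iFrame) is always 0 at runtime and the loop still runs at most N steps; but since iFrame is a uniform, the compiler cannot know the bound at compile time and so cannot fully unroll the loop.

```glsl
// Sketch: map() is a hypothetical scene, N the nominal step count.
#define N 100

float map(vec3 p) { return length(p) - 1.0; }         // hypothetical: unit sphere

void mainImage(out vec4 O, vec2 U)
{
    vec2 uv = (2.0 * U - iResolution.xy) / iResolution.y;
    vec3 ro = vec3(0.0, 0.0, -3.0), rd = normalize(vec3(uv, 1.5));

    float t = 0.0;
    // min(0, iFrame) == 0 at runtime, but is unknown at compile time:
    // same behavior as i < N, without the forced unrolling.
    for (int i = 0; i < N + min(0, iFrame); i++) {
        float d = map(ro + t * rd);
        if (d < 1e-3 || t > 20.0) break;
        t += d;
    }
    O = vec4(vec3(exp(-0.2 * t)), 1.0);                 // simple depth shading
}
```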

You can forbid optimizations [ which ones, exactly ? and does it really work ? ] by adding the following at the top of each code, or later but still outside function definitions:
    #pragma optimize(off)    // disable optimizations
    #pragma optimize(on)     // back to the default

Compilation can be a lot faster, but of course runtime performance will suffer.



See also: