11/5/09

FRequency To the Road of Ribbon in 1k for win32

Instead of producing my own stuff, I've spent time crunching down the original Linux 1k from Frequency. Mainly I did it to see it on my ATi card (the original was messed up) but also to see if I could get a win32 version, written in C, down to 1024 bytes.

During the crunching it became obvious I'd have to make a few compromises:
  • The colour of the ribbon is the same as the walls. This is the biggest artistic compromise and it can fairly be criticised as departing from the original. To compensate I added a more dramatic camera path that gets very close to the ribbon.
  • Clock is less accurate and resets, causing a jump to a new viewpoint. This is not as good as the original.
  • View angle is smaller - caused by a new method of calculating view vectors. Not a big issue.
That's really it. On the plus side, I've cured a bug that caused the ribbon to be reflected many times, and I made the new version run a bit faster, enabling a higher resolution. I've completely rewritten the code that generates the colour but I've stayed as close to the original as I could. I hope it's not too noticeable. However, mine is darker; perhaps I can cure this later.

This is a fantastic intro and not one I thought could be done in 1k in win32 in C when I started. Linux 1ks have smaller frameworks and so have more space available for the intro; and asm is of course more space efficient than C.

Here is the source code and an exe to try out. I hope you agree it has kept the essence of the original whilst being 1k in win32. I used crinkler 1.1 (even though 1.2 would produce smaller code) as 1.2 doesn't work for me. So the intro won't work under win7!

Respect to FRequency from auld^titan.

11/1/09

Frequency's To The Road of Ribbon Reactivated

Frequency did a really great Linux 1k recently. To the Road of the Ribbon is a classic labyrinth but (to my knowledge) the first with reflections and even a ribbon added. It fits in 1k nicely under Linux but is about 1240 bytes under windows, though xt95 made no attempt to make it smaller there; it was small enough under Linux. For some reason it was buggy on my ATi box, so out of interest I converted the Linux assembler to C and ported it to windows.

I couldn't quite get the code down to 1k using crinkler under windows (the setup code for windows is larger). Perhaps some jolly fellow can convert it back to asm? You can get my C version here, with a much compressed shader. It's around 1070 bytes using crinkler.

There are some nice tricks in this intro. I like the hi-res clock passed in through glColor4ubv:

t=timeGetTime();
glColor4ubv((unsigned char *)(&t));

and then in the shader (my version):

float w=dot(vec3(gl_Color),vec3(256/256,256,256*256))*.256;

The dot product was a great idea to get a hi-res clock; I'm not sure if it's original to Frequency but it's very cool. I optimised the size a little by rewriting the original vec3(1,256,65536) into the form above.
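To see why this recovers a usable clock, here is a CPU-side sketch in C (the function name and test values are my own): glColor4ubv hands the shader the four bytes of the millisecond timer, each normalised to byte/255, and the dot product reassembles the low 24 bits.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Sketch of the hi-res clock trick, run on the CPU side.  Each byte of
   the timer arrives in gl_Color as byte/255; the dot product with
   (1, 256, 65536) reassembles the low 24 bits. */
static double shader_clock(uint32_t t_ms)
{
    double b0 = ((t_ms      ) & 0xFF) / 255.0;  /* gl_Color.r */
    double b1 = ((t_ms >>  8) & 0xFF) / 255.0;  /* gl_Color.g */
    double b2 = ((t_ms >> 16) & 0xFF) / 255.0;  /* gl_Color.b */
    /* float w = dot(vec3(gl_Color), vec3(1,256,65536)) * .256; */
    return (b0 + b1 * 256.0 + b2 * 65536.0) * 0.256;
}
```

The result works out as (t & 0xFFFFFF) * 0.256 / 255 - roughly seconds, but smooth to the millisecond, which is what makes it hi-res.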

10/21/09

Bad intro year

I started this year trying to code a full blown demo, which is now a long-term work in progress. In the meantime work took over and coding hasn't been possible. The hiatus has given me two very clear ideas for intros. I'll see whether they come out as 4ks or something bigger later.

1/20/09

GLSL Shader Creation: Geometry, Vertex and Fragment

To the point: I recently expanded my tiny shader creation framework to include geometry shaders. Not much to say about it. Notice that the geometry and vertex shaders are optional. The code is getting quite long, so it might be worth testing whether loading the function pointers in a loop and indexing them would be smaller; nonetheless it's highly repetitive, so crinkler should cope well.

GLuint ShaderCompile(const char *gs, const char *vs, const char *fs) {
  GLuint p,s;
  p = ((PFNGLCREATEPROGRAMPROC)wglGetProcAddress("glCreateProgram"))();
  if (gs!=NULL) { //if there is a geometry shader...
    s=((PFNGLCREATESHADERPROC)wglGetProcAddress("glCreateShader"))(GL_GEOMETRY_SHADER_EXT);
    ((PFNGLSHADERSOURCEPROC)wglGetProcAddress("glShaderSource"))(s, 1, &gs, NULL);
    ((PFNGLCOMPILESHADERPROC)wglGetProcAddress("glCompileShader"))(s);
    ((PFNGLATTACHSHADERPROC)wglGetProcAddress("glAttachShader"))(p,s);
  }
  if (vs!=NULL) { //if there is a vertex shader...
    s=((PFNGLCREATESHADERPROC)wglGetProcAddress("glCreateShader"))(GL_VERTEX_SHADER);
    ((PFNGLSHADERSOURCEPROC)wglGetProcAddress("glShaderSource"))(s, 1, &vs, NULL);
    ((PFNGLCOMPILESHADERPROC)wglGetProcAddress("glCompileShader"))(s);
    ((PFNGLATTACHSHADERPROC)wglGetProcAddress("glAttachShader"))(p,s);
  }
  //the fragment shader is always present
  s=((PFNGLCREATESHADERPROC)wglGetProcAddress("glCreateShader"))(GL_FRAGMENT_SHADER);
  ((PFNGLSHADERSOURCEPROC)wglGetProcAddress("glShaderSource"))(s, 1, &fs, NULL);
  ((PFNGLCOMPILESHADERPROC)wglGetProcAddress("glCompileShader"))(s);
  ((PFNGLATTACHSHADERPROC)wglGetProcAddress("glAttachShader"))(p,s);
  ((PFNGLLINKPROGRAMPROC)wglGetProcAddress("glLinkProgram"))(p);
  return p;
}
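For the loop idea mentioned above, the shape would be something like this sketch; the resolver here is a stub standing in for wglGetProcAddress (the stub and its names are mine, not real GL entry points) so the pattern itself can be tried anywhere:

```c
#include <assert.h>
#include <string.h>

/* One table of names, one parallel table of pointers: the N casts
   collapse into a single loop.  wglGetProcAddress is stubbed out. */
typedef void (*proc_t)(void);

static void stub_create(void)  {}
static void stub_compile(void) {}

static proc_t fake_wglGetProcAddress(const char *name)
{
    if (!strcmp(name, "glCreateShader"))  return stub_create;
    if (!strcmp(name, "glCompileShader")) return stub_compile;
    return 0;
}

static const char *names[] = { "glCreateShader", "glCompileShader" };
static proc_t funcs[2];

static void load_all(void)
{
    for (int i = 0; i < 2; i++)      /* one loop instead of N casts */
        funcs[i] = fake_wglGetProcAddress(names[i]);
}
```

Whether the loop plus the string table ends up smaller than crinkler's compression of the repeated casts would need measuring; the repetition compresses very well.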


(I'll return to DirectX and OpenGL next post.)

1/12/09

How to Use D3Dx and OpenGL together : Part 1

I love OGL but the OGL SDK is a shoddy collection of web links. Worse, although as size coders we have GLU, DX coders have access to far richer standard libraries. Fortunately, DirectX is well designed in the way its libraries are broken down, and from OGL it's possible to use the d3dx support library without rendering using D3D. There is a little overhead of course compared to rendering everything in d3d, but in terms of bytes at 4k it's almost nothing.

Inside d3dx are cool routines for getting normals to a mesh, smoothing meshes, raytracing (!) which can be used for collisions, splines for motion and camera paths, the elusive cube and torus (neither of which, for some inexplicable reason, appears in GLU), and a whole range of maths functions. In this little blog I'll show code to set up OGL and d3d together.


LPDIRECT3D9 pD3D;
LPDIRECT3DDEVICE9 pd3dDevice;

void initD3D(HWND hWnd)
{
  D3DPRESENT_PARAMETERS d3dpp;

  pD3D = Direct3DCreate9(D3D_SDK_VERSION);
  //only two fields are set, to save bytes - the rest is stack garbage,
  //so zero d3dpp (or make it static) if CreateDevice fails for you
  d3dpp.Windowed = TRUE;
  d3dpp.SwapEffect = D3DSWAPEFFECT_DISCARD;
  pD3D->CreateDevice( 0, D3DDEVTYPE_HAL, hWnd,
    D3DCREATE_SOFTWARE_VERTEXPROCESSING, &d3dpp, &pd3dDevice );
}

The code above sets up D3D. To be precise, it initialises D3D enough so that we can use functions from d3dx - DirectX 9 functions, to be exact. It may be possible to get smaller but I'm a beginner with d3d, so send me an email if you can beat it. As both pD3D and pd3dDevice are required by d3dx routines, we make them global. It may be possible to use a static declaration for d3dpp to make this smaller by a few bytes.
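On the static suggestion: a static in C is guaranteed to start zeroed (it lives in .bss), so besides saving bytes it would also spare us relying on stack garbage for the d3dpp fields we never set. A tiny stand-alone illustration, with a hypothetical stand-in struct for D3DPRESENT_PARAMETERS:

```c
#include <assert.h>
#include <string.h>

/* Statics are zero-initialised by the loader; locals hold whatever was
   on the stack.  params is a made-up stand-in for D3DPRESENT_PARAMETERS. */
struct params { int windowed; int swap_effect; int flags[8]; };

static struct params s_pp;      /* guaranteed all-zero before use */

static int all_zero(const struct params *p)
{
    struct params zero;
    memset(&zero, 0, sizeof zero);
    return memcmp(p, &zero, sizeof zero) == 0;
}
```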

Now we need some code to initialise windows and opengl and use this routine to initialise D3D:


static PIXELFORMATDESCRIPTOR pfd={
0, // Size Of PFD... BAD coding, saves bytes
1, PFD_SUPPORT_OPENGL | PFD_DOUBLEBUFFER, 32, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 32, 0, 0, 0, 0, 0, 0, 0
};

void WINAPI WinMainCRTStartup()
{
  HWND hWnd = CreateWindow( "EDIT", NULL, WS_POPUP|WS_VISIBLE|WS_MAXIMIZE,
                            0, 0, 0, 0, 0, 0, 0, 0 );
  initD3D(hWnd);
  HDC hDC = GetDC( hWnd );
  SetPixelFormat( hDC, ChoosePixelFormat( hDC, &pfd ), &pfd );
  wglMakeCurrent( hDC, wglCreateContext(hDC) );
  ShowCursor(FALSE);
  .
  .
}

The code above is a little larger than normal because the d3d initialisation needs the window handle as well as the GetDC call, so we have to store it in the hWnd variable.

In this blog then, I've presented a tiny framework that initialises win32, opens a window, initialises d3d and initialises opengl. In the next Blog, I'll describe how to use mesh functions in d3dx and render using OpenGL.

1/1/09

Place to live

Joined Titan. Good to have a home :-). Titan gets a lot of bad press. So much so I once decided to leave, but luckily they are a forgiving bunch and let me back in. However, inside Titan there are fabulously talented people and it's hard to see why Titan hasn't been more successful up to now. Anyway, thanks for letting me back in, I'm already enjoying myself.

12/29/08

Very small Code for Cornell Box

I don't often just add links here (this is the first time) but here is one worth seeing for size coders. It's a complete global illumination solution in 99 lines of C++ code. I notice the exe can be less than 4k.

12/22/08

Code for Moving Camera in GLSL Raytracer

Suppose you have a "two triangles and a shader" program as iq likes to call them. Essentially, you cover the screen with a polygon and then draw everything with a shader. This might be raytracing, for example. The problem is this: how to move the camera around in very few bytes. Better, how to write a very small shader that lets us use OpenGL commands like gluLookAt and glRotate to control the camera?

A good solution hit me today so here is the code. First the main opengl loop to draw that polygon contains:

glRects(-1,-1,1,1);

This is one good advantage for size coding of OGL over D3D: a single polygon covers the screen. We can't use texture coords or colours at the vertices or we would have to define a quad or two triangles. So we are stuck with just the vertices.

Now the magical, tiny, Vertex shader which will give us a moving camera:

varying vec3 v,EP;
void main(){
gl_Position=gl_Vertex;
v = vec3( gl_ModelViewMatrix*gl_Vertex);
EP= vec3( gl_ModelViewMatrix*vec4(0,0,-1,1) );
}

The first line makes sure that the glRect still covers the screen in the pixel shader: it does not transform the vertex but leaves it where it is. The second line, however, records the transformed vertex in world space as a vec3. In effect this records the position in world space for every pixel on the screen.

The last line records the eye position. Arbitrarily, the eye position is hardcoded here to be one unit in Z away from the screen, giving a field of view of 90 degrees - quite normal for a camera. Note that the eye point needs a homogeneous value of 1 as the fourth co-ordinate to work properly.

We could have used vec4 for both lines above and the code would be shorter, but as most raytracing will use vec3 later, you can choose to bite the bullet and make the code longer here and shorter in the fragment shader. Horses for courses.

Now in the fragment shader its easy to construct the ray to start tracing:

varying vec3 v,EP;
void main(){
vec3 Ro=EP; //set ray origin
vec3 Rd=v-Ro; //set ray direction

It's that easy. Now you can move the camera in your raytracer using normal OpenGL commands. To finish, here is an image from YAST (Yet Another Sphere Tracer, as I'm calling my glsl raytracer). I'm able to move around the spheres as I choose. As usual, click on the image to see a bigger version.

Out of interest, on an x1950, 1024x768, 30 spheres, one light, shadows and 3 levels of reflection, I'm getting around 50fps.

12/21/08

Getting OpenGL shaders even smaller



Flow2 is a 1k of mine from over 12 months ago. It was 1022 bytes. Since then, several things have changed. Firstly, ATi fixed their drivers for *some* cards so that you no longer need a vertex shader/fragment shader pair; as per the glsl spec, only a fragment shader is necessary. Fearmoths exploited this in their Linux 1k You Massive Clod. Secondly, last January crinkler 1.1a was released, which compresses better (20 or so bytes). Thirdly, I got better at size reducing C code for crinkler.

So it was time to revisit an old intro and try to size reduce it. As it happens, flow2 didn't have a proper timer, so I decided to add one, taking the intro way up over 1024 bytes. Nonetheless the current exe is ... wait for it... 890 bytes!! That's with the extra timer code and still 130 bytes smaller. OK, some code. Flow2 is a "one polygon and a shader" style of intro. The point is that you now need no vertex shader, just a pixel shader, so the tiny glsl setup code can be even smaller:

void setShaders() {
  GLuint p,s;
  s = ((PFNGLCREATESHADERPROC)wglGetProcAddress("glCreateShader"))(GL_FRAGMENT_SHADER);
  ((PFNGLSHADERSOURCEPROC)wglGetProcAddress("glShaderSource"))(s, 1, &fsh, NULL);
  ((PFNGLCOMPILESHADERPROC)wglGetProcAddress("glCompileShader"))(s);
  p = ((PFNGLCREATEPROGRAMPROC)wglGetProcAddress("glCreateProgram"))();
  ((PFNGLATTACHSHADERPROC)wglGetProcAddress("glAttachShader"))(p,s);
  ((PFNGLLINKPROGRAMPROC)wglGetProcAddress("glLinkProgram"))(p);
  ((PFNGLUSEPROGRAMPROC)wglGetProcAddress("glUseProgram"))(p);
}


Now the big question... can you fit a synth and music in 130 bytes? Midi for sure, but a real synth? The wav header alone is around 43 bytes after compression; add in PlaySoundA and already half the bytes are gone. It seems unlikely but ... well...

12/10/08

Tiny code for Using Multicore CPU Multithreading

It's easier than most people think to use multiple cores, even with very small code. The trick is to realise that the OS, when given more than one thread, will try to run those threads on different cores where possible. I used multithreading (and hence multicore) in a 1k once. I wanted the music in a different thread because I had no space for a real timer for the music. So the music was written as a function that played the next note, called from the main loop. Timing issues connected to drawing the graphics meant the note was played slightly off beat at times, which was very disconcerting.

So instead I changed the music routine to be:

Loop:
Issue notes
Sleep for some time


Then I set it up in another thread before, entering the main loop, using:

SetThreadPriority(_beginthread( &addTune,0,0), THREAD_PRIORITY_TIME_CRITICAL);

(Watch out: this was done using GCC - VC++ may need other, more precise, syntax.)
_beginthread kicks off a new thread with the function addTune as the first function called in that thread when it starts. SetThreadPriority ensures that even on a single core the music thread will have higher priority than the main graphics thread, so the music will not suffer tiny stalls. The ear is more sensitive to these than the eye is to the occasional dropped graphics frame.

You need to include process.h for the thread calls above.

On multiple cores this would now be configured by the O/S to be two threads running on two cores.

What about synch? In the 1k I synched the music to the graphics. This means communicating between the two threads (and cores). Again this is deceptively simple.
A variable is declared and the music thread assigns to it each time it plays a new note. The drawing thread uses the variable (read only) to decide how to move the graphics just as normal. Globals can be used to communicate between threads and even to synchronise them.
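Put together, the pattern looks like this sketch, written with pthreads purely so it runs anywhere (the intro itself used _beginthread on win32); a plain global carries the current note from the music thread to the drawing loop:

```c
#include <assert.h>
#include <pthread.h>

/* A global is the whole synchronisation mechanism: the music thread
   writes it, the drawing loop only reads it. */
volatile int current_note = 0;

static void *addTune(void *arg)
{
    (void)arg;
    for (int i = 1; i <= 8; i++)
        current_note = i;   /* next note; the real thing would Sleep here */
    return 0;
}
```

In the intro, addTune would loop forever with a Sleep between notes; here it just runs to completion so the result can be checked after a join.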

Threads and cores can be very tricky beasts in true parallel programming (I used to work at Edinburgh Parallel Computing Centre on the supercomputers there) but they can also be used in even a 1k intro!

Addendum: to avoid linking with MSVCRT and use win32 functions instead, it's possible to use syntax like this in VC++:

SetThreadPriority((HANDLE)CreateThread( NULL, 0,
(LPTHREAD_START_ROUTINE)&addTune,0,0,0), THREAD_PRIORITY_TIME_CRITICAL);

11/3/08

Intersecting a ray with spheres: GLSL

To the point, here is a dump of some very small code to intersect a ray with N spheres. I used it for my wada rendering so it's hardwired to four spheres. It is a function, so it can be dropped into anyone's code. It's dumb and just does a linear search over the spheres. However, you can get reasonable speeds up to 20 spheres or so depending on your card. It is of course ps3.0.

p: the point of origin of the ray.
rd: ray direction.
si: OUTPUT, the closest sphere hit.
t is returned, the distance to the closest intersection point.

Assumption: spheres are passed in as vec4s into a uniform, in the normal GLSL manner. Each vec4 is (x,y,z,r2): centre and radius squared.

uniform vec4 sph[4]; // in this case 4 spheres

float isect (in vec3 p, in vec3 rd, out vec4 si){
  float t=999.9,tnow,b,disc;
  for (int i=0; i<4; i++) { //each sphere
    tnow = 9999.9; //temporary closest hit is far away
    //next 4 lines are the intersect routine for a sphere
    vec3 sd=sph[i].xyz-p;
    b = dot ( rd,sd );
    disc = b*b + sph[i].w - dot ( sd,sd );
    if (disc>0.0) tnow = b - sqrt(disc);
    // hit, so compare and store if this is the closest
    if ((tnow>0.0001)&&(t>tnow)) {t=tnow; si=sph[i];}
  }
  return t;
}


A couple of explanations:

disc = b*b + sph[i].w - dot ( sd,sd );

sph[i].w is already the radius squared, which saves some code here. Also dot(sd,sd) is the squared length of sd - fewer bytes than spelling it out with length().

if ((tnow>0.0001)&&(t>tnow))...

tnow is the current intersection parameter - the distance along rd of the current intersection. If tnow is very small, we may be intersecting the sphere we are already on and must ignore it. 0.0001 is arbitrary.
Of course to get the actual intersection you need to do this in the main routine:

if (t!=999.9) inter=p+t*rd;

or something similar.

Returning to IFS

A few years ago I posted on how to make IFS small over on in4k. Fractals aren't that well received in the demoscene (a bit been there, done that). The fact is, though, they are amazing for graphics size coding. The following image was generated using only two (!!) transforms and some nice colour mapping. That would be about 150 bytes.



Click on the image to see it larger.

Now with geometry shaders, an IFS could be run in the geometry pipeline to create complex 3d structures from simple primitives... let's say, oh I don't know... cubes, in exactly the same way traditional IFS fractal flames are created using points in 2d.

Anyone with geometry shading hardware care to try?

10/26/08

Disappointing Last 4k

Last year I quit the dbfinteractive forums, where previously I was very active. If you don't know it, I recommend it as a place for beginners in the demoscene to find interesting code snippets. I also did my last 1k; largely, I felt I was going nowhere fast with my 1ks. The same is true of my 4ks now, and Triangle Ted is my last 4k. TT was meant to be my first 100+ thumbs on Pouet and my last 4k. Instead it turned out to be a backward step (in rating terms) but remains my last 4k. Disappointing, but that's life.

I hope that you've enjoyed some of my 4ks and maybe even been inspired to try yourself.

I like the challenge of size coding so this blog will continue and when I produce anything useful it will go here. As a consequence of not competing in the scene I can be freer with code and what I do which is really what I need now. This means more code should be available on this blog than before. Hurrah!

10/19/08

4k Procedural Graphics

A new category in size coding has been legitimised by the fantastic work of RGBA's iq: 4k procedural graphics. Unlike intros, procedural graphics draw a single image and stop. Usually the image takes from a few seconds to upwards of a minute to render.

Here are some of the best examples to date:
ixaleno
rgba_slisesix
godspeed
photon race
off the shelf

It occurs to me though that there is huge scope for innovation in this category. Here are some ideas:

- Image plus music. An image is drawn and accompanying mood music plays.
- Image plus movement. An image bigger than the screen could be drawn and scrolled or zoomed.
- Slide show. Several images are rendered and displayed with some transition one after another.
- Red-blue image. 3D images requiring red-blue glasses.
- RDS Images (random dot stereograms)
- Movies (perhaps this breaks the fundamentals of no animation)

It seems there may be lots of ways to expand this category. 2009 promises to be as interesting as 2008 has been in the sizecoding area.

10/18/08

New 4k finished

Despite a very tough year for me, I've finally finished another 4k. This time the synth and music come from s!p, and the graphics and concept (yes, there is a concept!) from me. It was released tonight at TRSAC. Thanks go to Rbraz too for a piece of code he placed on the dbfinteractive forum for all to use. As usual, click for larger versions.


The 4k is about my experience as an "oldskool" guy seeking work in a "newskool" environment. In a country where I don't speak the language too well. Requires PS3.0 card and quite a strong one (say x1800 or 7XXX class onwards). It runs under XP or Vista and also on Nvidia and ATi.

Watch out for it on Pouet soon: Triangle Ted Seeks a Job. Once again, Youtube succeeds in murdering the quality (the uploaded version was quite good) but here is the video anyway. It should whet your appetite to see the intro for real, I hope.

9/21/08

Ati is dead in the Demoscene

There are a precious few groups (the best groups) who are able to write shaders for both Nvidia and ATi cards. Weaker coders just code until it works on their cards and release, not knowing they are writing bad code. The worst of these will then blame bad drivers for their poor coding but the truth is they probably didn't even read the specification.

This situation was ok(ish) at the beginning of this year, before Nvidia demo boxes were sent out. However, now Nvidia dominates the demoscene (well done, it was great marketing by the way). Of the 4k teams out there, only TBC, RGBA and Fairlight are sure fire to work across many platforms. Notice how each group has a star programmer in there? ATi owners are left in the cold - few 4ks now running on their hardware.

Whilst 2008 seems to have been a good year for the quality and variation in 4ks, there is a darker side too. PC 4ks have become dramatically less compatible. Precious few run on ATi cards, PS4.0 has been used but runs on very few cards out there and, the winning 4k at NVScene doesn't even run on the XP operating system.

If you are new at this, you probably don't know that being compatible costs bytes. It means your sound engine cannot use tracks available in only one OS. It means shaders become slightly bigger, maybe even with multiple versions. It is not a level competition between the compatible and "works on one machine" 4ks. Not only that, but producing compatible 4ks is harder - it requires more work. I spent a day recently debugging my shaders on Nvidia after size reducing them on ATi. Simply put, compatible 4ks should be given more credit. In a world where 100 bytes makes all the difference, exploiting OS-specific features is a huge advantage.

Non-compatible 4ks usually claim to work on the "compo" machine; after all, they wouldn't be released otherwise. This is fine, except the bigger parties are now providing very high end machines (full HD, quad core, >2Gig of RAM, new ps4.0 graphics cards). Most people do not have access to this technology, creating an elite who do. Thank goodness for smaller parties with more modest competition machines - yet even they got their hands on Nvidia demo boxes this year, temporarily (I hope) exacerbating the problem.

How far will we go with the non-compatibility? I suspect we will go all the way. The competition system is set up to reward those who do produce something that works on one machine. Nothing else matters.

What if it changed? What if competitions DID NOT announce the hardware or OS? Imagine a 4k intro competition with these rules:
*...
* OS will be XP or Vista
* Card will be ATi or Nvidia but will support PS3.0
* Organisers will run entries on compo machine and report results to competitors but no information about the compo machine will be released even at the party
*...

Isn't that the problem solved? If nobody knows the machine, it forces competitors to try to be compatible, not use too much CPU, and be conservative with memory. 4ks will run on a wider range of platforms and within one year we will all know how to write shaders properly.

I saw someone once suggest we have a standard competition machine. I think this is unworkable. How will, say, Riverwash be able to access the same tech as Breakpoint? It's unlikely. In addition it doesn't solve the real issue: it just means every 4k will run on that platform (much like the Nvidia demobox problem of 2008) and no other.

Another possibility is to allow remote voting. Provide the entries live at the party for download. We at home can try them and vote. If they don't run we can give bad votes. The problem is that (I suspect) this is too open to abuse. It would be a brave party organiser who tried it.

No, the real solution is to make the target platform anonymous. Though, wait! If the platform is anonymous, some people will fail to get their precious 4k working. This will reduce the number of entries shown at parties. The smaller parties in 2008 struggled anyway to compete with the bigger ones for number of entries. So there is a downside.

Well, it seems the solution is simple then. Buy nvidia, upgrade to vista, replace that 2 year old PC with a new one and add lots of memory.

I know a good bank manager...

9/13/08

Glass tile distortion shader

Occasionally I will post tiny shaders useful for effects in 4k intros (eg kaleidoscope). I discovered recently how to do glass tiles as a post process in very little code. Distortion shaders require you to render to a texture (or copy the frame to one) first; I have done other blogs about this. However, I would guess most 4ks already do this for blur, AO, rgb distortion or whatever, so the extra bytes of this shader are absolutely tiny.

Here is the shader pair:

varying vec4 v;
void main(){ v = gl_Position = ftransform(); }

The vertex shader assumes something is drawn covering the screen (a quad, say) as a surface for the distortion shader. This means v will hold co-ordinates in -1.0..1.0.

uniform sampler2D s;
varying vec4 v;
void main(){
vec2 x = v.xy/2.0 + 0.5;
gl_FragColor = texture2D( s, x + 0.1*tan(x*5.0) );
}

First, v is mapped to the 0.0..1.0 range of normal texture co-ordinates. The glass tile effect is then simply that tan distortion! Here is an image of what it can produce - ignore the background, that's just something to be distorted.


If you want to "waste" bytes, its a good idea to do something like:

gl_FragColor = texture2D( s, x+min( 0.1*tan(x*5.0), 0.2 ) );

This helps to prevent "sparklies" in the distortion by keeping that tan under control.

8/20/08

Float casting in GLSL (today)

Boring, but this may be helpful.
According to the GLSL spec version 1.2, ints can be cast to floats automatically. ATi has always held back on casting, nvidia being far more forgiving. However, recently ATi made some advances in their compiler and now it works - almost. People programming small shaders should be very careful in one particular case: functions.
Since the July driver update from ATi, the following are now all valid on ATi (verified on an x1900, no shader version declared):

float a=1;
vec3 a=vec3(1);
float a=1/128;
a=max(a,0); //warning:invalid on NVIDIA..do not use
a=pow(a,8); //warning:invalid on NVIDIA..do not use
if(a>0)...

Previous ATi drivers required (in most of the cases above) a "." in the number, eg:

a=max(a,0.);
if(a>0.)...

However, even on ATi, one exception still remains: user defined functions.

float f(float x) {return x+1;} //valid
f(1); // INVALID
f(1.); // valid

Also:

float f(float x) {return 0;} //INVALID
float f(float x) {return 0.;} //valid

So, essentially, values passed to and returned from a user-defined function must still be explicitly typed. This is the only exception I have found so far on ATi. Intrinsic functions (eg max) operate by implicitly casting. However, on Nvidia intrinsic functions must also be strongly typed. So the only safe way forward as of today is to make sure that all parameters into and out of *any* function are strongly typed (use the "." form) and not to rely on implicit casting.

The GLSL spec says that functions may be overloaded just by changing the input types, so the cards are trying to show the correct behaviour. The only hiccup is that ATi have decided that, since intrinsic functions are known and the user has not provided an overloaded version, the implicit cast will go ahead. This seems to be an open (but known) issue in the spec.

8/13/08

Thanks for credits

I wanted to say quickly that I had credits/greets in two recent 4ks: Nucleophile, which won at Assembly, and fractoblob, which uses the wada idea from a previous blog but successfully removes the sparklies I suffered from. This really encouraged me to continue posting "useful" snippets here, even though sometimes my ideas get used before I get the chance to finish a 4k! Thanks guys.

Isosurfaces in GLSL

To render an isosurface in GLSL, you'll need:
* a fast graphics card
* ps 3.0

The basic algorithm is simple:

For each pixel
Step along ray until surface is hit
Find normal
Light and output colour


There are two complications though. Firstly, when you ray march you step a certain amount along the ray, and when you step you may go through the surface rather than exactly hit it. The smaller the step, the less this matters, but the more compute power is required. The solution is a root finder (eg secant, newton etc), so that the step along the ray can be acceptably large (for performance) while a close enough approximation to the surface is still found (for visual integrity). When the ray passes through the surface, the root finder is used to find the EXACT intersection.

Here is a very simple root finder function in glsl:

float rootfind(in float a,in float b,in vec3 eye,in vec3 raydirection){
  float exact;
  for(int i=0;i<9;i++){
    exact=(b+a)/2.0;
    if ( func(eye+exact*raydirection)<0.0 ) b=exact;
    else a=exact;
  }
  return exact;
}

In the above example the function "func" is the isosurface we are trying to intersect.

Here the algorithm is simple and tiny. a and b are parametric values for points either side of the exact intersection (you have these from ray marching). The root finder iterates a fixed number of times. Each time, the exact answer is approximated by the value halfway between the previous two best guesses (initially a and b), which bracket the exact value. If the function value at "exact" is negative we move the bound b to "exact"; if it's positive we move a instead, because the intersection is where the function equals 0.

This is inefficient but it's the simplest root finder to understand and can be coded very small.

So, assuming we find the exact intersection using this method, what now is the normal to the surface? The normal is the rate of change of the function in x, y and z. We already have the value of the function at one point (the intersection point). So we call the function three more times, offset in each of x, y and z, find the differences and, optionally, normalise the result.

Here is a simple function in glsl to find a normal of an isosurface:

vec3 isonormal (in vec3 v, in float val)
{
  return normalize(vec3(val-func( vec3(v.x+.01, v.y, v.z) ),
                        val-func( vec3(v.x, v.y+.01, v.z) ),
                        val-func( vec3(v.x, v.y, v.z+.01) ) ) );
}


The function above takes the intersection point and the value of "func" at that point as input and returns the normalised normal as a vec3. Notice how a small delta is added to each of x, y and z of the intersection point to get a new position in space. e.g.

val-func( vec3(v.x+.01, v.y, v.z) )

is a single float representing the change in "func" between the intersection point and a new point just a little to its right in x.