Quadrics kinda sorta work for real time

(L) [2012/04/17] [tby hobold] [Quadrics kinda sorta work for real time] Wayback!

A few of you might remember some ominous claims I made on the old ompf forum about using quadrics as fundamental rendering primitives. I presented a theoretical article about intersecting arbitrary quadrics with arbitrary polyhedra, but didn't hint where this was meant to go, and then I dropped below the radar.
I have spent a lot of time trying to find more practical ways of modeling "arbitrary" shapes as piecewise quadrics. But I could only come up with my own not-quite-solutions. They may make different trade-offs than what exists in the literature, but I can't see them carry anything more than perhaps a flashy demo.
So I recently turned my attention from the impossible to the possible. I only have a proof of concept now, but it seems encouraging to me:
[IMG #1 Image]
I can fit a kD-Tree to CSG objects made from arbitrary quadrics. The surface primitives are separated well from each other, and CSG trees are automatically simplified down to the surfaces that are relevant in each voxel. The quality of the kD-Tree is not too bad (one can judge this a bit better in animation: [LINK http://www.vectorizer.org/exampleHiKdT02.mp4]).
I am far from real time, at ~0.75 seconds for one 720p frame on six cores of a 3.3GHz "Westmere" Xeon, but I have only just begun tuning. A few statistics:
- an average of 7.7 rays are fired per pixel (including reflection and up to two shadow rays, plus some basic anti-aliasing)
- a ray traverses 34 branch voxels and 4 leaf voxels on average
- an average leaf voxel contains 1.8 quadrics and 0.2 CSG intersections (i.e. boundaries between two quadrics)
- so 8 quadrics are being tested per ray (which seems a bit high to me; I should probably try to cut off empty space from leaves)
Overall, a single ray still takes a ridiculously long 25000 clock cycles. I do not yet have a good feeling for where all this time is lost. But I am quite confident that my early implementation is not compute bound. I suspect a lot of time is wasted due to memory latency and branch mispredictions.

Anyway, those of you who already have well-tuned code for traversal of kD-Trees might be in a better position to assess the potential of this simplest type of curved surface, so I'll hand out the most interesting pieces of code. I'd be more than happy if someone else wanted to experiment with it. Ask away!
Code: [LINK # Select all]/*
Intersecting an Arbitrary Quadric with an Axis Aligned Box
(an illustrative code fragment)
An explanation of the geometry and math behind this algorithm can be found at
http://www.vectorizer.org/IntersectQuadricTetrahedron.html
Feel free to send comments, improvements and questions to
"hobold@vectorizer.org". To my knowledge this algorithm is new, and
unencumbered by intellectual property rights.
This code fragment is Copyright 2012 by Holger Bettag, licensed under GPLv2.
If the GPLv2 is too restrictive for you, contact me. I just want to learn
about improvements you might find. I am sure we can agree on something that
won't cost you a dime, and doesn't force all the obligations of the GPLv2 on
you. The GPLv2 is merely the default.
*/
class Quadric {
public:
double A, B, C; // A x^2 + B y^2 + C z^2
double D, E, F; // + D yz + E zx + F xy
double G, H, I; // + G x + H y + I z
double J; // + J
}

// handle apex of a parabola
// internal use
static
inline void refineMinMax1D(double& minval, double& maxval,
         double a, double b, double c,
         double tmin, double tmax) {
if(a != 0.0) {
double pt = -b/(2.0*a);
// clamp to limits
pt = qmin(pt, tmax);
pt = qmax(pt, tmin);

double value = (a*pt + b)*pt + c;
minval = qmin(minval, value);
maxval = qmax(maxval, value);
}
return;
}
// handle apex of a 2D quadric and the two edges u = umin, u = umax and
// the one corner at u = umax, v = vmin
// internal use
static
void refineMinMax2D(double& minval, double& maxval,
      double a, double b, double c, double d, double e, double f,
      double umin, double umax, double vmin, double vmax) {
// check 2D apex point
double denominator = 4.0*a*b - c*c;
double value;
if (denominator != 0.0) {
double pu = (c*e - 2.0*b*d)/denominator;
double pv = (c*d - 2.0*a*e)/denominator;
// clamp to limits
pu = qmin(pu, umax);
pu = qmax(pu, umin);
pv = qmin(pv, vmax);
pv = qmax(pv, vmin);
value = (a*pu + c*pv + d)*pu + (b*pv + e)*pv + f;
minval = qmin(minval, value);
maxval = qmax(maxval, value);
}
// handle border at u = umin
refineMinMax1D(minval, maxval,
b, c*umin + e, (a*umin + d)*umin + f,
   vmin, vmax);
// ... and border at u = umax
// extract coefficients of parabola (for reuse)
c = c*umax + e;
a = (a*umax + d)*umax + f;
refineMinMax1D(minval, maxval, b, c, a, vmin, vmax);

// look at corner at u = umax, v = vmin
value = (b*vmin + c)*vmin + a;
minval = qmin(minval, value);
maxval = qmax(maxval, value);

return;
}
// intersect a quadric with an axis aligned box
//
// returns 2 (VoxInside) if the box is contained in the quadric,
// 1 (VoxOutside) if the box is completely outside the quadric
// 0 (VoxSurface) if the box overlaps the quadric surface
VoxClass Quadric::intersectVoxel(const Vec3& Min, const Vec3& Max) const {
// handle the coordinate minimum corner and the coordinate maximum corner
double value = evaluatePoint(Min);
double minval = evaluatePoint(Max);
double maxval = qmax(minval, value);
minval = qmin(minval, value);
// process the faces and edges of the box
// (each of the six face calls handles two of the 12 edges and
// one of the 8 corners; the remaining two corners are already done)

// extract face at z = zmin as 2 dimensional quadric and process it
refineMinMax2D(minval, maxval,
A, B, F, E*Min.z + G, D*Min.z + H, (C*Min.z + I)*Min.z + J,
Min.x, Max.x, Min.y, Max.y);

// z = zmax
refineMinMax2D(minval, maxval,
A, B, F, E*Max.z + G, D*Max.z + H, (C*Max.z + I)*Max.z + J,
Min.x, Max.x, Min.y, Max.y);

// y = ymin (switched u and v to q(v,ymin,u) for symmetry)
refineMinMax2D(minval, maxval,
C, A, E, D*Min.y + I, F*Min.y + G, (B*Min.y + H)*Min.y + J,
Min.z, Max.z, Min.x, Max.x);

// y = ymax
refineMinMax2D(minval, maxval,
C, A, E, D*Max.y + I, F*Max.y + G, (B*Max.y + H)*Max.y + J,
Min.z, Max.z, Min.x, Max.x);

// x = xmin
refineMinMax2D(minval, maxval,
B, C, D, F*Min.x + H, E*Min.x + I, (A*Min.x + G)*Min.x + J,
Min.y, Max.y, Min.z, Max.z);

// x = xmax
refineMinMax2D(minval, maxval,
B, C, D, F*Max.x + H, E*Max.x + I, (A*Max.x + G)*Max.x + J,
Min.y, Max.y, Min.z, Max.z);

// find the 3D apex point
double denominator = 8.0*A*B*C + 2.0*(D*E*F - A*D*D - B*E*E - C*F*F);
if (denominator != 0.0) {
double px = (2.0*B*E*I + D*D*G + 2.0*C*F*H - 4.0*B*C*G - D*F*I - D*E*H);
double py = (E*E*H + 2.0*A*D*I + 2.0*C*F*G - 4.0*A*C*H - D*E*G - E*F*I);
double pz = (2.0*B*E*G + 2.0*A*D*H + F*F*I - 4.0*A*B*I - E*F*H - D*F*G);
px /= denominator;
py /= denominator;
pz /= denominator;

// clamp position to limits
px = qmin(px, Max.x);
px = qmax(px, Min.x);
py = qmin(py, Max.y);
py = qmax(py, Min.y);
pz = qmin(pz, Max.z);
pz = qmax(pz, Min.z);
value = evaluatePoint(Vec3(px, py, pz));
minval = qmin(minval, value);
maxval = qmax(maxval, value);
}

return VoxClass(((maxval < 0.0) << 1) + (minval > 0.0));
}
// brute force method, for debugging & test reference
VoxClass Quadric::intersectVoxelDBG(const Vec3& Min, const Vec3& Max) const {
const int size = 100;
int x,y,z;
double min = evaluatePoint(Min);
double max = min;
double val;
double fx, fy, fz;
for (z = size; z >= 0; z--) {
fz = Min.z + double(z)*(Max.z - Min.z)/double(size);
for (y = size; y >= 0; y--) {
fy = Min.y + double(y)*(Max.y - Min.y)/double(size);
for (x = size; x >= 0; x--) {
fx = Min.x + double(x)*(Max.x - Min.x)/double(size);

val = evaluatePoint(Vec3(fx,fy,fz));
min = qmin(min, val);
max = qmax(max, val);
}
}
}
if (min > 0.0)
return voxOutside; // positive minimum value => voxel is outside

if (max < 0.0)
return voxInside; // negative maximum value => voxel is inside

return voxSurface; // otherwise voxel must straddle quadric surface
}

[IMG #1]:

(L) [2012/04/19] [tby hobold] [Quadrics kinda sorta work for real time] Wayback!

I miscalculated my initial figure for clock cycles per ray ... apparently I divided the number of rays by the number of threads once again. After a few more tweaks for better load balancing and faster code paths for a few special cases, I am now getting between 1.55M and 1.75M rays per second per core. Not great, but respectable. This corresponds to 1900 to 2200 clock cycles per ray. The numbers vary with output resolution. At 1280 pixels wide, the rays are a bit more coherent and are computed faster; at 640 pixels wide things slow down a bit. (No ray packets, no specialized shadow ray optimizations, no SIMD utilization ... everything is plain and simple with a bit of multithreading.)
So the good news is that I am now a lot less confused about the program's high sensitivity to minor changes. But the bad news is that I'll have a much harder time squeezing out cycles than I would have had if a ray really took 25000 clocks.
Finally, I noticed that I handed you only half the magic you need to make your own experiments. The part about adaptively simplifying CSG composites has a few pitfalls. Pruned CSG subtrees must be created on the fly, and passed back to the voxel intersection routine of the parent. Here's another fragment from my proof of concept code (please kindly ignore the ugly memory leak):
Code: [LINK # Select all]/*
Intersecting CSG composites with an Axis Aligned Box and simplifying them
(an illustrative code fragment)
Feel free to send comments, improvements and questions to
"hobold@vectorizer.org".
This code fragment is Copyright 2012 by Holger Bettag, licensed under GPLv2.
If the GPLv2 is too restrictive for you, contact me. I just want to learn
about improvements you might find. I am sure we can agree on something that
won't cost you a dime, and doesn't force all the obligations of the GPLv2 on
you. The GPLv2 is merely the default.
*/
VoxClass Intersection::intersectVoxel(const GeObject*& Obj,
            const Vec3& Min, const Vec3& Max) const {
VoxClass status1, status2;
const GeObject* obj1;
const GeObject* obj2;

status1 = child1->intersectVoxel(obj1, Min, Max);
status2 = child2->intersectVoxel(obj2, Min, Max);
Obj = this; // default return object is this CSG intersection

// outside of one means outside of intersection
if ((status1 == voxOutside) || (status2 == voxOutside)) {
return voxOutside;
}
// both surfaces means we need to test against CSG combination
if ((status1 == voxSurface) && (status2 == voxSurface)) {
// there might be an opportunity for pruning here, in case a child ...
// ... could simplify itself
if ((child1 != obj1) || (child2 != obj2)) {
// TODO: fix memory leak here ...
Obj = new Intersection((GeObject*)obj1, (GeObject*)obj2);
if (Obj == NULL) {
fprintf(stderr,
   "Intersection::intersectVoxel(): no memory for pruned CSG.\n");
exit(0);
}
}
return voxSurface;
}
// surface of only one means we can prune CSG tree
if (status1 == voxSurface) {
Obj = obj1;
return voxSurface;
}
if (status2 == voxSurface) {
Obj = obj2;
return voxSurface;
}
// both inside means inside of intersection
return voxInside;
}

VoxClass Union::intersectVoxel(const GeObject*& Obj,
         const Vec3& Min, const Vec3& Max) const {
VoxClass status1, status2;
const GeObject* obj1;
const GeObject* obj2;

status1 = child1->intersectVoxel(obj1, Min, Max);
status2 = child2->intersectVoxel(obj2, Min, Max);
Obj = this; // default return object is this CSG union

// inside of one means inside union
if ((status1 == voxInside) || (status2 == voxInside))
return voxInside;
// both surfaces means we need to test against CSG combination
if ((status1 == voxSurface) && (status2 == voxSurface)) {
// there might be an opportunity for pruning here, in case a child ...
// ... could simplify itself
if ((child1 != obj1) || (child2 != obj2)) {
// TODO: memory leak here ...
Obj = new Union((GeObject*)obj1, (GeObject*)obj2);
if (Obj == NULL) {
fprintf(stderr,
   "Union::intersectVoxel(): no memory for pruned CSG.\n");
exit(0);
}
}
return voxSurface;
}
// surface of only one means we can prune CSG tree
if (status1 == voxSurface) {
Obj = obj1;
return voxSurface;
}
if (status2 == voxSurface) {
Obj = obj2;
return voxSurface;
}
// outside of both means outside of union
return voxOutside;
}

(L) [2012/07/25] [tby hobold] [Quadrics kinda sorta work for real time] Wayback!

I am making slow progress on the impractically slow tree building. I finally understand the geometry and math behind a sweep type algorithm based on the original box-quadric-intersection test. To find all relevant candidate split positions (at most four for a general quadric primitive), it is sufficient to examine a scaffolding of four edges (parallel to the axis in question), plus up to five oblique "struts" (one on each parallel face, and possibly one inside the box). The edges and the struts are all straight lines, so ultimately it boils down to tracing a few carefully chosen rays.
It's not particularly deep or involved, but tedious with many details where devils can hide. I will eventually present a reference implementation, but don't hold your breath ...
On another note, the unsolved modeling problem keeps haunting me. What use is a renderer for objects that no one can conveniently model? In an attempt to motivate myself, I experimented a bit with surfaces of revolution:
[IMG #1 Image]
In this particular example, circa 280 quadric surfaces are stitched together to approximate an arbitrary, smooth, generating function to +/- 0.0005 tolerance. A similar number of invisible separator quadrics have been used, but I think with some more effort their number could be drastically reduced.

I will continue to sporadically post in this thread, but primarily I will document my (lack of/slow) progress on my website:
[LINK http://vectorizer.org/boar/]
Comments and criticism are welcome, either here or by email.

[IMG #1]:

(L) [2012/07/26] [tby hobold] [Quadrics kinda sorta work for real time] Wayback!

>> graphicsMan wrote:Seems like a fun project
In a way, the fun is the only remaining reason behind the project (the hopefully humorous tone of the project home page is a consequence of that). I had much grander plans ... imagine if realtime raytracing was capable of not only doing more realistic lighting with less hassle than rasterizers, but could efficiently display curved surfaces. That would be a very visible advantage over rasterization. Maybe even enough of an advantage to make game design studios want to employ this new aesthetic for a game "like none you have seen before". CPUs and GPUs keep converging further as we speak, so the hardware is pretty much already there if you want to do software defined realtime graphics.
However, these plans depend much more critically on the ability to easily model such curved objects than they depend on an efficient renderer (both are needed, but the renderer "only" takes hard work, not so much ingenuity, thanks to all the prior art that you folks out there have already done). I had hoped to come up with a construction that is somewhat compatible with the established tools and design workflow. But I didn't. I spent a year badly reinventing old wheels, or exploring old and new dead ends.
Without some sort of quadric retrofitting voodoo (abbreviated "qRv"; guess how that should be pronounced [SMILEY :-)]) controlled by polygon meshes, even a perfect realtime raytracer for quadrics is utterly useless. Still, there are many interesting problems to be solved when you replace straight geometry with conics.
Interesting, and somewhat challenging, problems was exactly what I needed to break free from an employer that had changed beyond recognition. They used to be a creative chaos much like Valve ([LINK http://blogs.valvesoftware.com/abrash/valve-how-i-got-here-what-its-like-and-what-im-doing-2/]), but became a victim of their success. In the end, it was a bureaucratic nightmare; I was depressed because I felt I couldn't make myself useful anymore.
At the same time I couldn't just jump ship. On the one hand I still felt loyalty to my coworkers, and on the other hand I pretty much knew that I would never ever find another group of bright minds, another place ablaze with ideas like this company used to be in their early days, no matter where I went. It just wasn't as easy as going out the door and heading for "greener pastures".
So I decided to grant myself a sort of sabbatical. To see if I could still solve nontrivial problems. The "grander plans" mentioned above were a distant hope, a justification, a motivation, an excuse for the foolish decision to go into voluntary unemployment in the middle of a financial crisis.
That was probably way more information than anybody wanted. Then again, maybe I get lucky and one of you finds the "grand plans" as fascinating as I do. I'd gladly fill a research position trying to make them as real as I possibly can. *cough* [SMILEY :)]

(L) [2012/07/27] [tby hobold] [Quadrics kinda sorta work for real time] Wayback!

>> graphicsMan wrote:Shinji Ogaki published a paper last year about how to ray trace Phong tessellations.
I had skimmed it. As far as I understand, he basically renders Bezier triangles of degree two. This works out to an implicit degree four surface (a quartic). There is also a good paper by James Blinn et al on how to render implicit surfaces up to that degree on the GPU, with heavy emphasis on efficient and robust numerics. Both are very worth reading, but the fact remains that cubic and quartic surfaces require substantially more computational work (an order of magnitude or more), and their mathematical complexity makes it very challenging to find clever optimizations. For starters, try to find an exact and efficient "surface in voxel" test for them ...
>> I understand your desire to use quadrics, since you can use closed-form solutions instead of tessellation or iteration...
There are a number of reasons why I am so convinced that it has to be either quadrics or nothing:
- quadrics are mathematically simple. No innocent brain cells need to be sacrificed for working out the formulas.
- quadrics are numerically simple. The worst operations needed are divide and square root; modern hardware does these with very good throughput and accuracy (unlike the old days when iterative algorithms stalled the pipeline for dozens of cycles). One can hope that a quadric roughly takes only double the memory and double the computation of a triangle.
- quadrics are geometrically just about sufficient to locally approximate any smooth surface
>> perhaps you could figure out an approach to fit quadric pieces to a surface in a way that gets C^1 (or better, G^2) continuity across shared faces of a higher-order surface? You will get smoother surfaces than triangles without super-high tessellation, but lower continuity than the higher-order surface. This might be good enough?
Actually, the approaches I tried didn't even enforce C^1. The nice thing about second order approximation is that the error at a tangent discontinuity shrinks quadratically as you tesselate more finely. The edges between patches become invisible long before patches shrink to pixel size. (For example, the surface of revolution above does not enforce tangent continuity between patches, but the resulting artifacts are too small to be noticed.)
This was my secret sauce, so to speak. The reason for my initial hubris in thinking I could outdo all those who already painstakingly worked out intricate and useless C^1 constructions. In fact, enforcing C^1 causes undesirable side effects: non-locality and overshoot.
The problems I ran into were even more basic than that. When you are hell-bent on requiring no more than one quadric patch per input triangle, then it will sometimes be impossible to even only get straight C^0 continuity ... triangle meshes can be surprisingly complex, topologically speaking (and quadrics can be frustratingly constrained, geometrically speaking). One can still try constructions that subdivide the input mesh in some way, but right now I don't know enough about processing of meshes to make guesses how that ought to be done.

Quadrics kinda sorta work for real time back

Board: Home Board index Raytracing General Development