(L) [2012/04/17] [tby hobold] [Quadrics kinda sorta work for real time] Wayback!A few of you might remember some ominous claims I made on the old ompf forum about using quadrics as fundamental rendering primitives. I presented a theoretical article about intersecting arbitrary quadrics with arbitrary polyhedra, but didn't hint where this was meant to go, and then I dropped below the radar.
I have spent a lot of time trying to find more practical ways of modeling "arbitrary" shapes as piecewise quadrics. But I could only come up with my own not-quite-solutions. They may make different trade-offs than what exists in the literature, but I can't see them carry anything more than perhaps a flashy demo.
So I recently turned my attention from the impossible to the possible. I only have a proof of concept now, but it seems encouraging to me:
[IMG #1 Image]
I can fit a kD-Tree to CSG objects made from arbitrary quadrics. The surface primitives are separated well from each other, and CSG trees are automatically simplified down to the surfaces that are relevant in each voxel. The quality of the kD-Tree is not too bad (one can judge this a bit better in animation: [LINK http://www.vectorizer.org/exampleHiKdT02.mp4]).
I am far from real time, at ~0.75 seconds for one 720p frame on six cores of a 3.3GHz "Westmere" Xeon, but I have only just begun tuning. A few statistics:
- an average of 7.7 rays are fired per pixel (including reflection and up to two shadow rays, plus some basic anti-aliasing)
- a ray traverses 34 branch voxels and 4 leaf voxels on average
- an average leaf voxel contains 1.8 quadrics and 0.2 CSG intersections (i.e. boundaries between two quadrics)
- so 8 quadrics are being tested per ray (which seems a bit high to me; I should probably try to cut off empty space from leaves)
Overall, a single ray still takes a ridiculously long 25000 clock cycles. I do not yet have a good feeling for where all this time is lost. But I am quite confident that my early implementation is not compute bound. I suspect a lot of time is wasted due to memory latency and branch mispredictions.
Anyway, those of you who already have well-tuned code for traversal of kD-Trees might be in a better position to assess the potential of this simplest type of curved surface, so I'll hand out the most interesting pieces of code. I'd be more than happy if someone else wanted to experiment with it. Ask away!
Code: [LINK # Select all]/*
Intersecting an Arbitrary Quadric with an Axis Aligned Box
(an illustrative code fragment)
An explanation of the geometry and math behind this algorithm can be found at
http://www.vectorizer.org/IntersectQuadricTetrahedron.html
Feel free to send comments, improvements and questions to
"hobold@vectorizer.org". To my knowledge this algorithm is new, and
unencumbered by intellectual property rights.
This code fragment is Copyright 2012 by Holger Bettag, licensed under GPLv2.
If the GPLv2 is too restrictive for you, contact me. I just want to learn
about improvements you might find. I am sure we can agree on something that
won't cost you a dime, and doesn't force all the obligations of the GPLv2 on
you. The GPLv2 is merely the default.
*/
class Quadric {
public:
double A, B, C; // A x^2 + B y^2 + C z^2
double D, E, F; // + D yz + E zx + F xy
double G, H, I; // + G x + H y + I z
double J; // + J
}
// handle apex of a parabola
// internal use
static
inline void refineMinMax1D(double& minval, double& maxval,
double a, double b, double c,
double tmin, double tmax) {
if(a != 0.0) {
double pt = -b/(2.0*a);
// clamp to limits
pt = qmin(pt, tmax);
pt = qmax(pt, tmin);
double value = (a*pt + b)*pt + c;
minval = qmin(minval, value);
maxval = qmax(maxval, value);
}
return;
}
// handle apex of a 2D quadric and the two edges u = umin, u = umax and
// the one corner at u = umax, v = vmin
// internal use
static
void refineMinMax2D(double& minval, double& maxval,
double a, double b, double c, double d, double e, double f,
double umin, double umax, double vmin, double vmax) {
// check 2D apex point
double denominator = 4.0*a*b - c*c;
double value;
if (denominator != 0.0) {
double pu = (c*e - 2.0*b*d)/denominator;
double pv = (c*d - 2.0*a*e)/denominator;
// clamp to limits
pu = qmin(pu, umax);
pu = qmax(pu, umin);
pv = qmin(pv, vmax);
pv = qmax(pv, vmin);
value = (a*pu + c*pv + d)*pu + (b*pv + e)*pv + f;
minval = qmin(minval, value);
maxval = qmax(maxval, value);
}
// handle border at u = umin
refineMinMax1D(minval, maxval,
b, c*umin + e, (a*umin + d)*umin + f,
vmin, vmax);
// ... and border at u = umax
// extract coefficients of parabola (for reuse)
c = c*umax + e;
a = (a*umax + d)*umax + f;
refineMinMax1D(minval, maxval, b, c, a, vmin, vmax);
// look at corner at u = umax, v = vmin
value = (b*vmin + c)*vmin + a;
minval = qmin(minval, value);
maxval = qmax(maxval, value);
return;
}
// intersect a quadric with an axis aligned box
//
// returns 2 (VoxInside) if the box is contained in the quadric,
// 1 (VoxOutside) if the box is completely outside the quadric
// 0 (VoxSurface) if the box overlaps the quadric surface
VoxClass Quadric::intersectVoxel(const Vec3& Min, const Vec3& Max) const {
// handle the coordinate minimum corner and the coordinate maximum corner
double value = evaluatePoint(Min);
double minval = evaluatePoint(Max);
double maxval = qmax(minval, value);
minval = qmin(minval, value);
// process the faces and edges of the box
// (each of the six face calls handles two of the 12 edges and
// one of the 8 corners; the remaining two corners are already done)
// extract face at z = zmin as 2 dimensional quadric and process it
refineMinMax2D(minval, maxval,
A, B, F, E*Min.z + G, D*Min.z + H, (C*Min.z + I)*Min.z + J,
Min.x, Max.x, Min.y, Max.y);
// z = zmax
refineMinMax2D(minval, maxval,
A, B, F, E*Max.z + G, D*Max.z + H, (C*Max.z + I)*Max.z + J,
Min.x, Max.x, Min.y, Max.y);
// y = ymin (switched u and v to q(v,ymin,u) for symmetry)
refineMinMax2D(minval, maxval,
C, A, E, D*Min.y + I, F*Min.y + G, (B*Min.y + H)*Min.y + J,
Min.z, Max.z, Min.x, Max.x);
// y = ymax
refineMinMax2D(minval, maxval,
C, A, E, D*Max.y + I, F*Max.y + G, (B*Max.y + H)*Max.y + J,
Min.z, Max.z, Min.x, Max.x);
// x = xmin
refineMinMax2D(minval, maxval,
B, C, D, F*Min.x + H, E*Min.x + I, (A*Min.x + G)*Min.x + J,
Min.y, Max.y, Min.z, Max.z);
// x = xmax
refineMinMax2D(minval, maxval,
B, C, D, F*Max.x + H, E*Max.x + I, (A*Max.x + G)*Max.x + J,
Min.y, Max.y, Min.z, Max.z);
// find the 3D apex point
double denominator = 8.0*A*B*C + 2.0*(D*E*F - A*D*D - B*E*E - C*F*F);
if (denominator != 0.0) {
double px = (2.0*B*E*I + D*D*G + 2.0*C*F*H - 4.0*B*C*G - D*F*I - D*E*H);
double py = (E*E*H + 2.0*A*D*I + 2.0*C*F*G - 4.0*A*C*H - D*E*G - E*F*I);
double pz = (2.0*B*E*G + 2.0*A*D*H + F*F*I - 4.0*A*B*I - E*F*H - D*F*G);
px /= denominator;
py /= denominator;
pz /= denominator;
// clamp position to limits
px = qmin(px, Max.x);
px = qmax(px, Min.x);
py = qmin(py, Max.y);
py = qmax(py, Min.y);
pz = qmin(pz, Max.z);
pz = qmax(pz, Min.z);
value = evaluatePoint(Vec3(px, py, pz));
minval = qmin(minval, value);
maxval = qmax(maxval, value);
}
return VoxClass(((maxval < 0.0) << 1) + (minval > 0.0));
}
// brute force method, for debugging & test reference
VoxClass Quadric::intersectVoxelDBG(const Vec3& Min, const Vec3& Max) const {
const int size = 100;
int x,y,z;
double min = evaluatePoint(Min);
double max = min;
double val;
double fx, fy, fz;
for (z = size; z >= 0; z--) {
fz = Min.z + double(z)*(Max.z - Min.z)/double(size);
for (y = size; y >= 0; y--) {
fy = Min.y + double(y)*(Max.y - Min.y)/double(size);
for (x = size; x >= 0; x--) {
fx = Min.x + double(x)*(Max.x - Min.x)/double(size);
val = evaluatePoint(Vec3(fx,fy,fz));
min = qmin(min, val);
max = qmax(max, val);
}
}
}
if (min > 0.0)
return voxOutside; // positive minimum value => voxel is outside
if (max < 0.0)
return voxInside; // negative maximum value => voxel is inside
return voxSurface; // otherwise voxel must straddle quadric surface
}
[IMG #1]:
(L) [2012/04/19] [tby hobold] [Quadrics kinda sorta work for real time] Wayback!I miscalculated my initial figure for clock cycles per ray ... apparently I divided the number of rays by the number of threads once again. After a few more tweaks for better load balancing and faster code paths for a few special cases, I am now getting between 1.55M and 1.75M rays per second per core. Not great, but respectable. This corresponds to 1900 to 2200 clock cycles per ray. The numbers vary with output resolution. At 1280 pixels wide, the rays are a bit more coherent and are computed faster; at 640 pixels wide things slow down a bit. (No ray packets, no specialized shadow ray optimizations, no SIMD utilization ... everything is plain and simple with a bit of multithreading.)
So the good news is that I am now a lot less confused about the program's high sensitivity to minor changes. But the bad news is that I'll have a much harder time squeezing out cycles than I would have had if a ray really took 25000 clocks.
Finally, I noticed that I handed you only half the magic you need to make your own experiments. The part about adaptively simplifying CSG composites has a few pitfalls. Pruned CSG subtrees must be created on the fly, and passed back to the voxel intersection routine of the parent. Here's another fragment from my proof of concept code (please kindly ignore the ugly memory leak):
Code: [LINK # Select all]/*
Intersecting CSG composites with an Axis Aligned Box and simplifying them
(an illustrative code fragment)
Feel free to send comments, improvements and questions to
"hobold@vectorizer.org".
This code fragment is Copyright 2012 by Holger Bettag, licensed under GPLv2.
If the GPLv2 is too restrictive for you, contact me. I just want to learn
about improvements you might find. I am sure we can agree on something that
won't cost you a dime, and doesn't force all the obligations of the GPLv2 on
you. The GPLv2 is merely the default.
*/
VoxClass Intersection::intersectVoxel(const GeObject*& Obj,
const Vec3& Min, const Vec3& Max) const {
VoxClass status1, status2;
const GeObject* obj1;
const GeObject* obj2;
status1 = child1->intersectVoxel(obj1, Min, Max);
status2 = child2->intersectVoxel(obj2, Min, Max);
Obj = this; // default return object is this CSG intersection
// outside of one means outside of intersection
if ((status1 == voxOutside) || (status2 == voxOutside)) {
return voxOutside;
}
// both surfaces means we need to test against CSG combination
if ((status1 == voxSurface) && (status2 == voxSurface)) {
// there might be an opportunity for pruning here, in case a child ...
// ... could simplify itself
if ((child1 != obj1) || (child2 != obj2)) {
// TODO: fix memory leak here ...
Obj = new Intersection((GeObject*)obj1, (GeObject*)obj2);
if (Obj == NULL) {
fprintf(stderr,
"Intersection::intersectVoxel(): no memory for pruned CSG.\n");
exit(0);
}
}
return voxSurface;
}
// surface of only one means we can prune CSG tree
if (status1 == voxSurface) {
Obj = obj1;
return voxSurface;
}
if (status2 == voxSurface) {
Obj = obj2;
return voxSurface;
}
// both inside means inside of intersection
return voxInside;
}
VoxClass Union::intersectVoxel(const GeObject*& Obj,
const Vec3& Min, const Vec3& Max) const {
VoxClass status1, status2;
const GeObject* obj1;
const GeObject* obj2;
status1 = child1->intersectVoxel(obj1, Min, Max);
status2 = child2->intersectVoxel(obj2, Min, Max);
Obj = this; // default return object is this CSG union
// inside of one means inside union
if ((status1 == voxInside) || (status2 == voxInside))
return voxInside;
// both surfaces means we need to test against CSG combination
if ((status1 == voxSurface) && (status2 == voxSurface)) {
// there might be an opportunity for pruning here, in case a child ...
// ... could simplify itself
if ((child1 != obj1) || (child2 != obj2)) {
// TODO: memory leak here ...
Obj = new Union((GeObject*)obj1, (GeObject*)obj2);
if (Obj == NULL) {
fprintf(stderr,
"Union::intersectVoxel(): no memory for pruned CSG.\n");
exit(0);
}
}
return voxSurface;
}
// surface of only one means we can prune CSG tree
if (status1 == voxSurface) {
Obj = obj1;
return voxSurface;
}
if (status2 == voxSurface) {
Obj = obj2;
return voxSurface;
}
// outside of both means outside of union
return voxOutside;
}