Irregular grids

(L) [2017/06/07] [ost by ziu] [Irregular grids] Wayback!

Hey,
That was a really interesting publication! You have, what seems to me, to be excellent rendering performance. Still it would be interesting to profile your build times to see where are the bottlenecks in the algorithm for each scene. You can probably easily get this data off something like nvprof. San Miguel in particular is an interesting scene to study since it has a lot of dissimilar sized triangles. Sponza, Conference, Crown, and Fairy Forest, should have similar characteristics. If enough time is spent on the initial uniform grid construction, on scenes with dissimilar sized triangles, a different construction algorithm with better workload distribution among threads might be warranted. Looking at Figure 5 in your paper the grid construction time seems to be ~40% of total construction time in your algorithm. In our experience this can be made up to 9x faster with a more scalable algorithm.

We also made the CUDA source code from our paper available, in the MIT license, on Github here in case you're interested: [LINK https://github.com/vcosta/cuda-grid] . In case you want to understand the algorithm just look at Figures 2,3,4 in our paper: [LINK https://www.academia.edu/13928983/Efficient_Grid_Construction_on_Streaming_Architectures] . It's totally parallel and uses no atomics whatsoever.

(L) [2017/06/08] [ost by ziu] [Irregular grids] Wayback!

>> javor wrote:Following up the discussion from the "glitch pictures" thread:
As with the Nemoto & Omachi and the Macro-regions paper, the resulting acceleration structures are indeed very similar, but not the same. You certainly have a point criticizing how the paper is positioned, but our view on this is that the novelty is mainly in the combination of construction and traversal algorithms (and the resulting performance), not so much in the the geometric representation of the spatial index structures.
Then why do you say this in Section 1, Paragraph 2, of your paper? >> In this paper, we demonstrate that efficient ray traversal on modern GPUs is also possible with a new type of acceleration structure, which combines characteristics of hierarchical grids and bounding volume hierarchies.
To me it looks like you guys clearly thought the acceleration structure itself was novel. The acceleration structure is EXCELL so I think that paper should be referenced or, at minimum, acknowledged in some fashion. The traversal algorithm you guys use IMHO is also similar to prior EXCELL algorithms. What is novel in your work IMHO is the construction method plus the fact it's optimized for a GPU architecture. I quote the "Adaptive cell division for ray tracing" (1991), B. Pradhan, A. Mukhopadhyay paper I referenced before in the other thread because I think it makes this more explicit:
>> 2. PROPOSED DATA STRUCTURE
We assume that the entire object space is uniformly divided into cells by a three-dimensional "virtual" grid of predefined resolution. We call this grid virtual because the cells are not occupied by the objects in object space but by integer entries which guide the search through object space. The virtual grid is represented by a three-dimensional integer array with each cell represented by an element of the array. The space subdivision is done in the following manner: At the beginning the entire object space is enclosed within a box called the root-box on which we superimpose the virtual grid. ... Fig. 1 illustrates this subdivision in two dimensions ... In the case of quadtree division (two-dimensional version of oct-tree division) this box will be divided into four uniform quadrants ... Now notice that each box contains an integral number of cells along both the x- and y-axes. Thus, every cell is contained in exactly one box. We mentioned earlier that the virtual grid is an integer array. This is because the array element corresponding to a cell is filled with identification number of the box which contains the cell. ... The data structures for cell division described in the previous section are shown in Fig. 3. It shows the box-array, an array of pointers to box structures created during subdivision. A box is entered into this array when it can no longer be subdivided. A box is uniquely identified by an index of the array. ... Each box consists of cells which contain the index of this box structure in the box-array.
theirs:
[IMG #1 Image]
yours:
[IMG #2 Image]
>> 3. TRAVERSING THE DATA STRUCTURE
Traversing the data structure by a ray is reduced to traversing the virtual grid in object space. When a ray is fired, the first point in which it intersects the root-box is calculated. From this point onward the number of cell units traversed by the ray in both the x-and y-directions are maintained. Since cells are of uniform size, this can be obtained by dividing the actual distance traversed in x- and y-directions by respective cell dimensions in those directions. These cell units need not be whole numbers, since a ray may happen to cross a fraction of a cell. But the integer portions of the cell units yield indices of the cell the ray is currently traversing. Using these to index into the virtual grid we get the number of the box the ray is traversing. ... To find the next box, first we have to obtain the point at which the ray leaves the present box. Next, we have to calculate a point slightly away from the previous point along the direction of the ray. Glassner [2] employs a method in which the ray is intersected with all the six planes of the current box, and the nearest plane or planes are found by determining the smallest t values. This method consumes nine multiplication and nine addition operations, assuming that division and subtraction are equivalent to multiplication and addition, respectively. Lee [5] uses a target plane method in which 5 multiplications and 6 additions take place on average. But, in the worst case, this is slower than the Glassner's method. We employ another different method which is faster than the above two methods. We start by observing that any ray can potentially intersect, at most, three of the six planes of the current box, at the point where it leaves the box. We classify these six planes as lower x- plane, higher x-plane, lower y-plane, higher y-plane, lower z-plane, and higher z-plane. After determining the ray source u and the ray vector v, we find which three of these planes will possibly be intersected by the ray with smallest t by examining its ray vector. For example, if v has a negative x component then the lower x-plane, but not the higher x-plane, is to be tested for intersection. This method consumes five multiplications and six additions in the worst case, which is equal to the average case performance of Lee's method [5].
It looks to me like this describes the method you guys use at:
[LINK https://github.com/madmann91/hagrid/blob/master/src/traverse.cu]

This is also described in the paper "A new algorithm of space tracing using a CSG model" (1987), Kadi Bouatouch, Mohamed Ouali Madani, Thierry Priol, Bruno Arnaldi.

Still think it's a waste of time to read the prior work? Shit, now you made me do *your* work. [SMILEY :-D] But this was fun because I found a lot of other papers about something else that I'm currently working on as a result so it's good.

AFAIK your construction algorithm *is* novel compared to prior published EXCELL work, plus it uses SAH, and it is implemented on a GPU, I have no doubt that you guys ended up reinventing a lot of this independently (it wouldn't be the first time it happened in the field; the grid voxel traversal algorithm was even independently conceived by at least 3 teams in the 1980s at roughly the same time frame in different areas of the globe!). But that's no excuse not to quote the prior art in the future now that you know that it exists. At least that's my POV. Still, it doesn't detract from your major achievement, like I said before.

[IMG #1]:

[IMG #2]:

(L) [2017/06/22] [ost by madmann] [Irregular grids] Wayback!

I am the first author of the paper.

I have a few remarks to make. First of all, I think there is undoubtedly novelty in the data structure we present. In particular, I see two major differences with previous work and the papers you mention:

We are using a hierarchical grid as an index ("brick map"). Previous approaches all use a uniform grid for that purpose. There however is a cost in using a hierarchical grid: Our virtual grid must have a resolution of the form K*(2^d), and doing a lookup in the virtual map is no longer one memory access, but a sequence of dependent memory accesses. The advantage is that our hierarchical grid allows us to easily reach grid resolutions of over 50000^3, whereas all of the previous approaches, if implemented on a modern GPU, would not be able to get beyond around 2048^3. A large margin of the performance gains we have come from this (see the section that compares against Macro-Regions in the paper).
We allow cells to overlap each other. The work that is the closest in that respect is the Macro-Regions paper, where only empty cells can overlap. This, in turn, gives more freedom to choose a cell shape than any of the previous approaches. As a consequence, the expansion phase can provide 20% performance improvements over the already merged grid.
Even if you completely disregard the novel construction algorithm, the optimized data layout, and all of the experiments, there is enough novelty in those two points alone to claim the data structure is new. That may not be earthshakingly new, but it is undeniably new. Research rarely starts from scratch, and ideas are always around, even if they are developed independently.

Concerning your related work suggestions, as Javor said, we have to make a choice as to what to mention in the references. The intended audience is people working using ray tracing today. Most of the approaches you mention, while doing something somewhat similar, are more than 20 years old, and re-implementing them for comparison purposes would not be trivial because they were not designed to work on GPUs. We already did that effort for Macro-Regions, and the results speak for themselves: Performance is just not here (on a simple scene, like Sponza, Macro-Regions are around 4 times slower). There are plenty of other approaches that do something somewhat similar and we simply do not have the time/space to put each and every one of them in the paper. The choice of references we made might not be perfect, but we believe it positions our paper as a modern, GPU ray tracing article.

(L) [2017/06/24] [ost by ziu] [Irregular grids] Wayback!

I dunno. The data structure itself looks the same as EXCELL to me. I think the summary I provided above, with snippets of prior literature, made the similarities quite clear. You mention that in your data structure "cells overlap each other" but this is not the case in the example color figure in your article that I quote above. None of those cells overlap. If the cells do overlap, then yes, this is more similar to macrocells, but if this is true, then you can have more than one cell in each voxel of the index which kind of defeats the purpose in my view. If your cells can overlap I think the figures need to make this clear. Yes, your way of building it, this looks novel to me, compared to how EXCELL data structures were built in the literature I looked at. Someone who really knows about quadtrees and octrees who worked on this back then would be able to discuss this much better than I can. It is not my main area of expertise. I have provided the links. Still you can read them and judge for yourself. If you do not have access to the papers you can PM me.

Is it that much of a big deal to you if someone else thought of this data structure before? If it makes you feel better this also happened to me. I ended up reinventing rectilinear grids for ray-tracing a couple decades after Mike Gigante came up with NUgrid [SMILEY :shock:]. Of course doing it decades afterwards means I did it differently and ended up solving one aspect of the problem which he did not back then. I figured out a way to lookup a grid cell in a rectilinear grid in constant, O(1), time. He only described a way to do this in O(log(N)) time. I published this in WSCG 2011. A couple years later someone published in SIGGRAPH 2013 another way to lookup grid cells in constant time in a rectilinear grid. With a totally different and way, way, more complicated algorithm than mine [SMILEY :lol:].

Anyway this is inconsequential. What matters, at least to me, is the performance of your algorithms is great. Also the fact is people aren't using this structure right now, that I am aware of, so your work does change the current perspective on the field IMHO. My point with the references is not only to give credit where credit is due, but also that there may be other related research to those papers which could help push this work further which might be neglected without the references for future research.

Also, a novel good construction algorithm in itself *is* a big deal. Proper SAH can enhance render performance like 2-3x on a kd-tree if I remember correctly. Was MacDonald and Booth's paper not relevant because Kaplan used bintrees before that? Like I said I think your paper is highly interesting, relevant, and novel. But the data structure is presented as novel and in my opinion it isn't. If you aren't satisfied with my opinion that's fine. I guess we'll just have to agree to disagree.

To finish, I do not think it is necessary for you to implement the original EXCELL algorithms and compare with those. That would basically be constructing an octree (e.g. with Karras 2012) and then creating the EXCELL index over that. It can be left as an exercise to the reader. Comparison with the current state of the art in the area is clearly more than good enough and you guys provide extensive tests which make the applications and benefits for your algorithms perfectly evident.

Irregular grids back

Board: Home Board index Raytracing General Development