Ray tracing for more than rendering
John Carmack sat down to talk with us about the current world in graphics including all the discussion about Intel and ray tracing, what id has in mind for the next engine following id Tech 5 / Rage and how the multi-GPU world is panning out.
Many of our readers, as well as commenters across the web, asked for feedback from the developers. It makes sense – these are the people who are going to be spending their money and time developing games to sell on next-generation architectures, so surely their opinions would be more grounded in reality than those of a hardware company trying to push its technological advantages. With that in mind, we spent some time talking with John Carmack, the legendary programmer at id Software famous for Wolfenstein, Doom, Quake and the various engines that power them. What started out as a simple Q&A about Intel’s ray tracing plans turned into a discussion on the future of gaming hardware, both PC and console, possible software approaches to future rendering technology, multiple-GPU and multi-core CPU systems and even a possible insight into id Tech 6, the engine that will follow id Tech 5 and Rage.
The information that John discussed with us is very in-depth and you’ll probably want to block off some time to fully digest the data. You might also want to refresh your knowledge of octrees and voxels. Also note that in some areas the language of this text might seem less refined than you might expect simply because we are using a transcription of a recorded conversation.
PC Perspective: Let’s just jump right into the issue at hand. What is your take on current ray tracing arguments floating around such as those featured in a couple of different articles here at PC Perspective? Have you been doing any work on ray tracing yourself?
John Carmack: I have my own personal hobby horse in this race and have some fairly firm opinions on the way things are going right now. I think that ray tracing in the classical sense, of analytically intersecting rays with conventionally defined geometry, whether they be triangle meshes or higher order primitives, I’m not really bullish on that taking over for primary rendering tasks, which is essentially what Intel is pushing. (Ed: information about Intel’s research is here.) There are large advantages to rasterization from a performance standpoint, and many of the things that they argue as far as using efficient culling technologies to be able to avoid referencing a lot of geometry, those are really bogus arguments because you could do similar things with occlusion queries and conditional renders with rasterization. Head to head, rasterization is just a vastly more efficient use of whatever transistors you have available.
But I do think that there is a very strong possibility as we move towards next generation technologies for a ray tracing architecture that uses a specific data structure, rather than just taking triangles like everybody uses and tracing rays against them and being really, really expensive. There is a specific format I have done some research on that I am starting to ramp back up on for some proof of concept work for next generation technologies. It involves ray tracing into a sparse voxel octree, which is essentially a geometric evolution of the mega-texture technologies that we’re doing today for uniquely texturing entire worlds. It’s clear that what we want to do in the following generation is have unique geometry down to the equivalent of the texel across everything. There are different approaches that you could wind up and try to get that done that would involve tessellation and different levels of triangle meshes, and you could conceivably make something like that work, but rasterization architecture does really start falling apart when your typical triangle size is less than one pixel. At that point you really have lost many of the benefits of rasterization. Not necessarily all of them, because linearly walking through a list of primitives can still be much faster than randomly accessing them for tracing, but the wins are diminishing there.
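(Ed: To make the idea of ray tracing into a sparse voxel octree more concrete, here is a minimal Python sketch. It is our own illustration, not id Software code: the names `Node`, `insert`, `slab`, `cast` and `trace` are hypothetical, the layout with one object per node is chosen for readability rather than speed, and a production tracer would use a far more compact encoding and traversal.)

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    # Empty octants stay None; that absence is what makes the tree sparse.
    children: list = field(default_factory=lambda: [None] * 8)
    leaf: bool = False


def insert(root, x, y, z, depth):
    """Mark integer voxel (x, y, z) as solid in a grid of 2**depth cells per axis."""
    node = root
    for level in range(depth - 1, -1, -1):
        # One bit of each coordinate selects the octant at this level.
        idx = ((x >> level) & 1) | (((y >> level) & 1) << 1) | (((z >> level) & 1) << 2)
        if node.children[idx] is None:
            node.children[idx] = Node()
        node = node.children[idx]
    node.leaf = True


def slab(origin, direction, lo, size):
    """Ray versus axis-aligned cube. Returns (t_enter, t_exit) or None on a miss."""
    t0, t1 = 0.0, math.inf
    for a in range(3):
        if direction[a] == 0.0:
            # Ray is parallel to this pair of faces: it must start between them.
            if not (lo[a] <= origin[a] <= lo[a] + size):
                return None
            continue
        ta = (lo[a] - origin[a]) / direction[a]
        tb = (lo[a] + size - origin[a]) / direction[a]
        if ta > tb:
            ta, tb = tb, ta
        t0, t1 = max(t0, ta), min(t1, tb)
    return (t0, t1) if t0 <= t1 else None


def cast(node, origin, direction, lo, size):
    """Return (t, voxel_min_corner) for the nearest solid voxel, or None."""
    span = slab(origin, direction, lo, size)
    if span is None:
        return None
    if node.leaf:
        return span[0], lo
    half = size / 2
    hits = []
    for idx, child in enumerate(node.children):
        if child is None:
            continue  # empty space is skipped without any further work
        clo = (lo[0] + half * (idx & 1),
               lo[1] + half * ((idx >> 1) & 1),
               lo[2] + half * ((idx >> 2) & 1))
        cspan = slab(origin, direction, clo, half)
        if cspan is not None:
            hits.append((cspan[0], child, clo))
    best = None
    # Visit occupied octants front to back and stop once nothing closer can exist.
    for t_in, child, clo in sorted(hits, key=lambda h: h[0]):
        if best is not None and t_in > best[0]:
            break
        result = cast(child, origin, direction, clo, half)
        if result is not None and (best is None or result[0] < best[0]):
            best = result
    return best


def trace(root, depth, origin, direction):
    """Cast a ray into the whole grid, whose cube spans [0, 2**depth] on each axis."""
    return cast(root, origin, direction, (0.0, 0.0, 0.0), float(2 ** depth))


root = Node()
insert(root, 5, 2, 2, depth=3)
insert(root, 7, 2, 2, depth=3)
print(trace(root, 3, (-1.0, 2.5, 2.5), (1.0, 0.0, 0.0)))  # (6.0, (5.0, 2.0, 2.0))
```

The ray enters the 8×8×8 grid along the x axis and reports the nearer of the two solid voxels. Empty octants are never visited, which is the property that lets a ray skip large volumes of open space.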
Wolfenstein 3D, 1992 – courtesy of MobyGames.com
In our current game title we are looking at shipping on two DVDs, and we are generating hundreds of gigs of data in our development before we work on compressing it down. It’s interesting that if you look at representing this data in this particular sparse voxel octree format it winds up even being a more efficient way to store the 2D data as well as the 3D geometry data, because you don’t have packing and bordering issues. So we have incredibly high numbers; billions of triangles of data that you store in a very efficient manner. Now what is different about this versus a conventional ray tracing architecture is that it is a specialized data structure that you can ray trace into quite efficiently and that data structure brings you some significant benefits that you wouldn’t get from a triangular structure. It would be 50 or 100 times more data if you stored it out in a triangular mesh, which you couldn’t actually do in practice.
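(Ed: The sketch below is our own illustration of why the "sparse" part matters for storage. It does not reproduce John's 50 to 100 times figure against triangle meshes, which we have no way to verify here. It only compares a hypothetical octree encoding, one 8-bit child mask per interior node, against a dense one-bit-per-cell grid for a flat surface. The function name `octree_levels` and the test surface are assumptions made for the example, and real data would also carry color and other attributes.)

```python
def octree_levels(voxels, depth):
    """Build per-level {cell: child_mask} dicts, root level first.

    Each interior node is described by a single 8-bit mask saying which of
    its eight octants contain anything at all.
    """
    levels = []
    cells = set(voxels)
    for _ in range(depth):
        masks = {}
        for (x, y, z) in cells:
            parent = (x >> 1, y >> 1, z >> 1)
            bit = (x & 1) | ((y & 1) << 1) | ((z & 1) << 2)
            masks[parent] = masks.get(parent, 0) | (1 << bit)
        levels.append(masks)
        cells = set(masks)  # the parents become the cells of the next level up
    levels.reverse()
    return levels


depth = 6
n = 1 << depth  # 64 cells per axis
# Assumed test content: a flat, one-voxel-thick surface at z = 10.
surface = {(x, y, 10) for x in range(n) for y in range(n)}

levels = octree_levels(surface, depth)
interior_nodes = sum(len(masks) for masks in levels)
sparse_bytes = interior_nodes  # one byte (the child mask) per interior node
dense_bytes = n ** 3 // 8      # one bit per cell in a dense occupancy grid

print(interior_nodes, sparse_bytes, dense_bytes)  # 1365 1365 32768
```

For this surface the octree needs 1,365 mask bytes where the dense grid needs 32,768 bytes, because storage follows the occupied surface rather than the enclosing volume.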
I’ve been pitching this idea to both NVIDIA and Intel and just everybody about directions as we look toward next generation technologies. But this is one of those aspects where changing the paradigm of rendering from a rasterization-based approach to a ray casting approach or any other approach is not out of the question, but I do think that the direction that Intel is going about it as a conventional ray tracer is unlikely to win out. While you could start doing some real time things that look interesting, it’s always going to be a matter of a quarter the efficiency or a tenth of the efficiency or something like that. Intel of course hopes that they can win by having 4x the raw processing power on their Larrabee versus a conventional GPU, and as we look towards future generations that’s one aspect of how the battle may shape up. Intel has always had a process advantage over the GPU vendors and if they are able to have an architecture that has 3-4x the clock rate of the traditional GPU architectures they may be able to soak the significant software architecture deficit by clubbing it with processing power.
From the developer’s standpoint there are pros and cons to that. We could certainly do interesting things with either direction. But literally just last week I was doing a little bit of research work on these things. The direction that everybody is looking at for next generation, both console and eventual graphics card stuff, is a “sea of processors” model, typified by Larrabee or enhanced CUDA and things like that, and everybody is sort of waving their hands and talking about “oh we’ll do wonderful things with all this” but there is very little in the way of real proof-of-concept work going on. There’s no one showing the demo of like, here this is what games are going to look like on the next generation when we have 10x more processing power – nothing compelling has actually been demonstrated and everyone is busy making these multi-billion dollar decisions about what things are going to be like 5 years from now in the gaming world. I have a direction in mind with this but until everybody can actually make movies of what this is going to be like at subscale speeds, it’s distressing to me that there is so much effort going on without anybody showing exactly what the prize is that all of this is going to give us.
PCPER: So, because Intel’s current demonstrations are using technology from two previous generations rather than showing off one or two generations AHEAD of today, there is little exciting to be drawn from it?
CARMACK: I wouldn’t say there’s anything that Intel has shown, even if they network a whole room full of PCs and say “we’ll be able to stick all of this on a graphics card for you in the coming generation,” I don’t think they’ve shown the win. I don’t think they’ve shown something people will say “my god that’s 10x cooler” or “that makes me want to buy a new console”.
Doom, 1993 – courtesy of MobyGames.com
It is tough in a research environment to do that because so much of the content battle now is media rather than algorithms. They’ve certainly been hacking on the Quake code bases to at least give them something that is not an ivory tower toy, but they’re working with something that is previous generation technology and trying to make it look like something that is going to be a next-gen technology. You really can’t stretch media over two generational gaps like that, so they’re stuck. Which is why I’m hoping to be able to do my part and provide some proof of concept demo technology this year. We’re working on our RAGE project and the id Tech 5 code base but I’ve been talking to all the relevant people about what we think might be going on and what our goals are for an id Tech 6 generation. Which may very well involve, I’m certainly hoping it involves, ray tracing in the “sparse voxel octree” because at least I think I can show a real win. I think I can show something that you don’t see in current games today, or even in the current in-development worlds of unique surface detail. By following that out into the extra dimension of having complete geometric detail at that same density I think I can provide something that justifies the technological sea change.
PCPER: How dramatic would a hardware change have to be to take advantage of the structures you are discussing here?
CARMACK: It’s interesting in that the algorithms would be something that, it’s almost unfortunate in the aspect that these algorithms would take great advantage of simpler bit-level operations in many cases and they would wind up being implemented on this 32-bit floating point operation-based hardware. Hardware designed specifically for sparse voxel ray casting would be much smaller and simpler and faster than a general purpose solution but nobody in their right mind would want to make a bet like that and want to build specific hardware for technology that no one has developed content for. The idea would be that you have to have a general purpose solution that can approach all sorts of things and is at least capable of doing the algorithms necessary for this type of ray tracing operation at a decent speed. I think it’s pretty clear that that’s going to be there in the next generation. In fact, years and years ago I did an implementation of this with complete software based stuff and it was interesting; it was not competitive with what you could do with hardware, but it’s likely that I’ll be able to put something together this year probably using CUDA. If I can make something that renders a small window at a modest frame rate and we can run around some geometrically intricate sparse voxel octree world and make a 320×240 window at 10 fps and realize that on next-generation hardware that’s optimized more for doing this we can go ahead and get 1080p 60 Hz on there.
That would be the justification that would make everybody sleep a whole lot better that there is going to be some win coming out of this.
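(Ed: For a sense of scale, here is our own back-of-the-envelope arithmetic for the gap between the 320×240 at 10 fps prototype and the 1080p 60 Hz target mentioned above. It assumes exactly one primary ray per pixel and 1920×1080 as the 1080p resolution. The function name `rays_per_second` is ours, and the result is only a ray count, not a performance prediction.)

```python
def rays_per_second(width, height, fps, rays_per_pixel=1):
    """Primary rays needed per second for a given window and frame rate."""
    return width * height * fps * rays_per_pixel


prototype = rays_per_second(320, 240, 10)   # the small proof-of-concept window
target = rays_per_second(1920, 1080, 60)    # 1080p at 60 Hz

print(prototype)            # 768000
print(target)               # 124416000
print(target // prototype)  # 162
```

Under these assumptions the target needs 162 times as many primary rays per second as the prototype, and that factor is what the next-generation hardware and optimization would have to cover.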
PCPER: Is AMD’s tessellation engine that they put in the R600 chips anywhere close to what you are looking for?
CARMACK: No, tessellation has been one of those things up there with procedural content generation where it’s been five generations that we’ve been having people tell us it’s going to be the next big thing and it never does turn out to be the case. I can go into long expositions about why that type of data amplification is not nearly as good as general data compression that gives you the data that you really want. But I don’t think that’s the world beater; I mean certainly you can do interesting things with displacement maps on top of conventional geometry with the tessellation engine, but you have lots of seaming problems and the editing architecture for it isn’t nearly as obvious. What we want is something that you can carve up the world as continuously as you want without any respect to underlying geometry.