Final Thoughts, Performance Rankings, Conclusion
Final Thoughts
After reviewing all the benchmark data and the image quality screenshots, we found that both GPGPU technologies have pros and cons that could affect a consumer’s decision to purchase hardware and software that utilizes ATI Stream and/or CUDA. While Stream’s transcoding times were slightly better than CUDA’s in most of our performance tests, CUDA seemed to produce a higher-quality image, which evened things out a bit. Stream also seemed to be more efficient, using less of the CPU’s resources while still producing fast transcoding times. However, those transcoding times might be lower because Stream is outputting lower-quality video files, as our subjective image quality tests suggest.
Another interesting item to note is the GPU usage scores we recorded for our Radeon 4770. None of the GPU scores we received went over 23 percent, which indicates there’s still a lot of stream processing power available for programmers to take advantage of. Maybe we’ll also see some enhancements to future drivers from ATI and NVidia to use the extra GPU muscle too. Unfortunately, we weren’t able to record any of our 9800GTX+’s GPU scores, but we did see higher CPU usage numbers that indicate NVidia isn’t as concerned with multi-taskers who might like to use their computer for other tasks while they are transcoding video.
Cyberlink’s PowerDirector 7 is a full-featured video editing and transcoding application that supports both ATI Stream and CUDA. PowerDirector’s only “flaw”, if you want to call it that, is that it maxes out the CPU during transcoding and doesn’t leave any room for multi-tasking. The other program we got a chance to play with was another title from Cyberlink called MediaShow Expresso. A lot of talk has been buzzing around this particular app and for good reason. The transcoding times we recorded using Expresso were extremely quick. The UI had a Loiloscope feel to it and was intuitive from the moment I opened the program. Choosing different preset profiles was a snap and consumers should have an easy time adapting to Expresso’s two-step transcoding process.
Lastly, we were pretty impressed with the simplicity of ATI’s Avivo HD and with its benchmark results against Handbrake. The overall transcoding times were exceptional, and Avivo kept the CPU usage down for those of us who like to multi-task. The interface was extremely simple to use, but it lacked some advanced features we’ve become accustomed to seeing in video transcoders, like video effects, transition options, and better video customization options. ATI confirmed to us that Avivo HD does not support iPod, PSP, VC1, H.264, and MKV video formats at this time. However, it does support MPEG-1, VCD, MPEG-2, DVD/VOB, DVR-MS, DivX (as long as the codec is installed), and WMV formats, which is more than adequate for most users.
Performance rankings
To recap the goals of our review today, we wanted to rank how each GPGPU technology fared in meeting the intent of our testing parameters for this article. A couple of our parameters were specific to testing against a CPU-based transcoder.
Parameter 1: Evaluate CPU usage and determine how much of the computing load is being handled by the CPU with ATI Stream/CUDA enabled and disabled
Winner: ATI Stream. During our evaluation, we noticed considerable differences in CPU usage between transcoding with ATI Stream and CUDA. CUDA’s average CPU usage was in the 80s, while Stream was closer to the high 60s. The extra CPU usage didn’t really help CUDA in producing faster transcoding times either. So, the winner would have to be ATI Stream because it used fewer resources and produced faster transcoding times. It also left enough resources for users to do additional tasks during transcoding.
Parameter 2: What performance differences will consumers notice between using ATI Stream or CUDA?
Winner: ATI Stream. The performance differences between these two GPGPU technologies were a bit mixed because Stream used less CPU power and had better transcoding times, but it seemed to produce lower-quality videos. If we strictly viewed just the “performance” portion of our review, ATI Stream would win because of its benchmark results during performance testing. We’ll give a slight edge to ATI Stream in this portion of our ranking.
Parameter 3: Subjectively evaluate the image quality of outputted video that was transcoded with ATI Stream and CUDA
Winner: NVidia CUDA. CUDA seemed to produce a higher-quality image in two out of the three video clips we captured screenshots from. ATI Stream’s outputted video was a little bit softer in a few parts of the test videos and CUDA’s screenshots were brighter, clearer, and showed a little more detail overall. So, we’ll give CUDA the image quality crown.
Conclusion
We’d like to thank Cyberlink and AMD (ATI) for providing their respective transcoding software for our review today. GPGPU technology is really still in its infancy and GPU acceleration for video transcoding is just the beginning. I’m sure both AMD (ATI) and NVidia have their sights set on using the GPU for more general tasks and are working with programmers to move toward utilizing stream computing for other types of applications. The benefits of GPU acceleration are undeniable, especially in the video transcoding department. The differences between transcoding with the GPU and CPU in tandem as opposed to using the CPU alone suggest that GPU acceleration plays a large role in outputting video at faster rates. I’m sure we’ll see a lot more from the GPGPU realm that consumers and enthusiasts should benefit from, not only when performing basic tasks but also with more computing-intensive programs.
An individual user may in fact want different benefits at different times as well: if I am in a rush to catch a flight, I might want my movie to encode incredibly fast regardless of quality so I don’t miss the plane. Or I might have planned ahead that night and decided I want a better quality encode but still have a time crunch.
CPU utilization is an important factor as well for multi-tasking. NVIDIA’s implementation is obviously using some extra CPU cycles to improve quality in a way that AMD’s implementations are not. Again – two different perspectives on what you want to do with your system.
What I am trying to get at is that for me – I would favor the higher image quality results of the NVIDIA CUDA-based implementations of these GPGPU apps in just about 99% of circumstances. If you are considering UPCONVERTING your content to HD quality, for example, what is the point of getting it done “faster” if it isn’t done in the best quality possible? If you are one of those users archiving your DVD content locally then you would also likely desire the better image quality of the NVIDIA CUDA software as opposed to the AMD Stream software.
In the end though, Steve is correct: GPU computing is here in a pretty big way but still has further to go before it is really everything for everyone.



Please change the tile.
Your article is not about a comparison of Stream and CUDA performance, it is the difference between two software implementations utilising Stream and CUDA.
These technologies allow you to parallelise your algorithms; to imply that one technology performs, as you essentially say, ‘better quality maths’ than the other is ignorant.
Please do not misdirect readers like this.
Regards.
Joe Bloggs
Please change you word.
Your comment is not about a reply to the article, it is a quantification of how butthurt you are.
These new breakthrows allow us to see how badly you are spell ,as you essentially try to use ‘larger words’ but not good at English.
Please do not obfuscate readers’ thoughtings like this.
Regards.
Bloe Joggs
damn dude, look at your own english, it’s absolutely dreadful!
Ya dude, your an idiot, your article is misleading. For sure!
Peace
Hater Bater Fuck Face
SO MUCH HATE !
You are comparing two cards; one is nearly a year older than the other one, so it’s elementary that the new one is going to win. This review is biased.
Why are you not comparing the same frame in the outputs? How can you do a comparison of different frames and make a decision on differences in quality?
My personal gaming research team has found nVIDIA’s CUDA technology to be superior, but they compared current GPUs, not GPUs with a manufacturing time gap.
This is a very interesting article to contribute to my PC Hardware class, as I’m currently in a Network Admin program in Vermont. Please keep up the good work guys I love your site, and you have been very helpful over the last several semesters.
For Bitcoin miners, AMD GPUs are faster than Nvidia GPUs!
Why?
Firstly, AMD designs GPUs with many simple ALUs/shaders (VLIW design) that run at a relatively low frequency clock (typically 1120-3200 ALUs at 625-900 MHz), whereas Nvidia’s microarchitecture consists of fewer more complex ALUs and tries to compensate with a higher shader clock (typically 448-1024 ALUs at 1150-1544 MHz). Because of this VLIW vs. non-VLIW difference, Nvidia uses up more square millimeters of die space per ALU, hence can pack fewer of them per chip, and they hit the frequency wall sooner than AMD which prevents them from increasing the clock high enough to match or surpass AMD’s performance. This translates to a raw ALU performance advantage for AMD:
An old AMD Radeon HD 6990: 3072 ALUs x 830 MHz = 2550 billion 32-bit instructions per second
A new Nvidia GTX 590: 1024 ALUs x 1214 MHz = 1243 billion 32-bit instructions per second
This approximate 2x-3x performance difference exists across the entire range of AMD and Nvidia GPUs. It is very visible in all ALU-bound GPGPU workloads such as Bitcoin, password bruteforcers, etc.
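As a rough check on the arithmetic above, the two throughput figures can be reproduced by multiplying ALU count by shader clock. The Python sketch below is illustrative only: the function name is made up for this example, and it assumes, as the comparison above does, that each ALU retires one 32-bit instruction per clock.

```python
def alu_throughput_billions(alus: int, clock_mhz: float) -> float:
    """Peak 32-bit instructions per second, in billions.

    Assumes one instruction per ALU per clock (a simplification).
    """
    # ALUs x MHz gives millions of instructions per second; divide by 1000 for billions.
    return alus * clock_mhz / 1000.0


# Figures quoted in the comparison above.
hd6990 = alu_throughput_billions(3072, 830)    # AMD Radeon HD 6990
gtx590 = alu_throughput_billions(1024, 1214)   # Nvidia GTX 590

ratio = hd6990 / gtx590

print(f"HD 6990: {hd6990:.0f} billion instructions/s")
print(f"GTX 590: {gtx590:.0f} billion instructions/s")
print(f"Raw ALU advantage: {ratio:.2f}x")
```

For this particular pair of cards the ratio works out to roughly 2x, the low end of the 2x-3x range quoted above.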
Secondly, another difference favoring Bitcoin mining on AMD GPUs instead of Nvidia’s is that the mining algorithm is based on SHA-256, which makes heavy use of the 32-bit integer right rotate operation. This operation can be implemented as a single hardware instruction on AMD GPUs (BIT_ALIGN_INT), but requires three separate hardware instructions to be emulated on Nvidia GPUs (2 shifts + 1 add). This alone gives AMD another 1.7x performance advantage (~1900 instructions instead of ~3250 to execute the SHA-256 compression function).
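To make the rotate argument concrete, here is a minimal Python sketch of a 32-bit right rotate written two ways: a reference version, and one built from two shifts and one add as described above. It is not GPU code and does not model BIT_ALIGN_INT itself; the function names are hypothetical, and the instruction counts are simply the approximate figures quoted above.

```python
MASK32 = 0xFFFFFFFF  # keep results within 32 bits


def rotr32_reference(x: int, n: int) -> int:
    """Reference 32-bit right rotate by n bits."""
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK32


def rotr32_two_shifts_one_add(x: int, n: int) -> int:
    """Right rotate emulated with two shifts and one add.

    The two shifted halves never share a set bit, so adding them
    gives the same result as OR-ing them.
    """
    low = x >> n                       # shift 1: bits that stay in range
    high = (x << (32 - n)) & MASK32    # shift 2: bits that wrap around
    return (low + high) & MASK32       # add: combine the two halves


# Both versions agree for a sample word and a few rotation amounts.
word = 0x6A09E667
agree = all(
    rotr32_reference(word, n) == rotr32_two_shifts_one_add(word, n)
    for n in (2, 6, 7, 11, 13, 17, 18, 19, 22, 25)
)

# Approximate instruction counts quoted above for one SHA-256 compression.
speedup = 3250 / 1900
print(agree, f"{speedup:.2f}x")
```

The ratio of the two quoted instruction counts comes out at about 1.7, which is where the 1.7x figure comes from.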
Combined together, these 2 factors make AMD GPUs overall 3x-5x faster when mining Bitcoins!
Fucking plagiarism. Copy/paste from some other source, no citation or credit. Your education should be shredded and flushed down the toilet. Here is where you copied it from for people who want to read from someone with actual knowledge and not just ctrl+c —> ctrl+v.
https://en.bitcoin.it/wiki/Why_a_GPU_mines_faster_than_a_CPU
You plagiarized me. I complained about someone else who copied something and posted a link. All you did was change the link. You are a loser and the worst scum on the internet.
Why are we bitching about plagiarism? If I wanted to make sure his info was correct, I would’ve looked it up myself. I couldn’t care less if it was “plagiarized” as long as the information was correct.