The Present and Future of Qualcomm GPUs

Qualcomm is now well on its way to the full release of the Adreno A4x series of GPUs, including the A405, A418, A420, and A430, which bring even higher performance and support for newer APIs like DX11.2. The A420 is already shipping with the Snapdragon 805 in phones like the Droid Turbo and Nexus 6. The A430 will increase the shader count by 2.25x, improving performance for so-called “superphones” and tablets.

Many hurdles still remain for Qualcomm and other mobile GPU vendors, and they will push the demand for GPU compute well into 2016. Within a year, Qualcomm expects mobile devices to get 4K displays that draw the same power as the QHD displays shipping in today's top-end devices. High quality gaming, including previous generation console level graphics (PS3 / Xbox 360) from engines like UE4 (though obviously with smaller data sets), is coming. Newer APIs like DX12 and the next generation of OpenGL ES are right around the corner. Furthermore, virtual reality headsets from companies like Samsung and Oculus are already using mobile devices as the basis for a competitive gaming platform. (Currently the Samsung Galaxy VR headset only supports Qualcomm Snapdragon/Adreno-based devices.)
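To put the jump from QHD to 4K in perspective, here is a rough back-of-the-envelope sketch. It assumes "4K" means UHD at 3840x2160 and "QHD" means 2560x1440, with 4 bytes per pixel and a 60 Hz refresh; none of those figures come from Qualcomm, and the function names are purely illustrative.

```python
# Assumed resolutions: "4K" as UHD 3840x2160, "QHD" as 2560x1440.
RESOLUTIONS = {"QHD": (2560, 1440), "4K": (3840, 2160)}


def pixels(name):
    """Total pixel count for a named resolution."""
    width, height = RESOLUTIONS[name]
    return width * height


def scanout_bytes_per_second(name, bytes_per_pixel=4, refresh_hz=60):
    """Raw, uncompressed bandwidth needed just to send frames to the panel."""
    return pixels(name) * bytes_per_pixel * refresh_hz


if __name__ == "__main__":
    for name in RESOLUTIONS:
        gb_per_s = scanout_bytes_per_second(name) / 1e9
        print(f"{name}: {pixels(name)} pixels, {gb_per_s:.2f} GB/s scan-out")
    print("4K / QHD pixel ratio:", pixels("4K") / pixels("QHD"))
```

Under those assumptions a 4K panel has 2.25 times as many pixels to fill as a QHD panel, before any extra shading work per pixel is considered.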

General purpose GPU computing will also drive the need for faster GPUs. Imaging and multimedia are key differentiators for hardware vendors, and capabilities like 4K video recording, high frame rate capture, and real-time effects previews can all use the GPU for parallel processing. OpenCL will be the key to much of this work, so supporting all current and upcoming revisions of that standard (as well as DirectCompute) is critical.

Adreno A4x Architecture

The Adreno A4x addresses much of this API transition already, but more can and will be done to improve shader performance and tessellation capability, to lower transaction latency, and of course to add intelligent GPU throttling so that all of this can occur in the same or lower power profiles.

A4x is the first Qualcomm GPU to support DirectX 11.2 and OpenGL ES 3.1, as well as the Android Extension Pack (AEP) that Google announced along with the Lollipop operating system. This extension to the current graphics APIs adds support for hardware tessellation, geometry shaders, compute shaders and ASTC texture compression, all targeting PC-quality graphics in mobile devices. The A4x GPUs all support AEP and improve the performance of the individual shader pipes.

OpenCL 1.2 Full profile is supported along with Microsoft’s DirectCompute API targeting faster general purpose compute for mobile GPUs. RenderScript acceleration has also been improved over the A3x line of GPUs.

Texturing performance gets an upgrade in A4x along with added support for higher levels of anisotropic filtering with less impact on performance. ASTC support (Adaptive Scalable Texture Compression) allows for better level of detail capability and overall improved texture quality at the same performance levels. Qualcomm has increased texture cache sizes and the general purpose L2 cache as well to help out.

ROPs do the final blending of pixels and output to the screen, and Qualcomm has improved them in A4x to achieve peak draw rates more frequently. Better Z-stencils allow for faster depth rejection, removing hidden regions from the pipeline and lowering the number of pixels that the GPU must process to draw any particular scene.
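The idea behind depth rejection can be shown with a small software sketch. This is not how the Adreno hardware is implemented; it is a toy model that assumes fragments arrive as (x, y, depth, color) tuples, that a smaller depth means closer to the camera, and that "shading" is just storing a color. The function name is hypothetical.

```python
def shade_with_early_z(fragments, width, height):
    """Return (color_buffer, shaded_count) for (x, y, depth, color) fragments."""
    far = float("inf")
    depth_buffer = [[far] * width for _ in range(height)]
    color_buffer = [[None] * width for _ in range(height)]
    shaded = 0
    for x, y, depth, color in fragments:
        # The depth test runs before any shading work is done.
        if depth >= depth_buffer[y][x]:
            continue  # hidden behind an earlier fragment: rejected for free
        depth_buffer[y][x] = depth
        color_buffer[y][x] = color  # stand-in for running the pixel shader
        shaded += 1
    return color_buffer, shaded


if __name__ == "__main__":
    frags = [
        (0, 0, 0.2, "red"),    # drawn
        (0, 0, 0.8, "blue"),   # behind red, rejected
        (1, 0, 0.5, "green"),  # drawn
        (1, 0, 0.1, "white"),  # in front of green, drawn
    ]
    buffer, count = shade_with_early_z(frags, width=2, height=1)
    print(buffer, count)
```

In this toy case only three of the four fragments are shaded; the faster the hardware can make that rejection decision, the less wasted work reaches the shaders and ROPs.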

All of these GPU improvements are demonstrated with a host of custom-made demos from Qualcomm’s internal development house. I saw a hardware accelerated dynamic tessellation example centered around a very detailed hornet on cracked desert earth; the results were impressive coming from a reference platform utilizing the Snapdragon 810. GPGPU compute improvements were shown off with an accelerated video panorama capability, using HD video rather than simply stills to create one of the most interesting visual products you will see on a mobile device.

A4x is just the latest GPU available in the market from Qualcomm, but you can be sure that more is coming; this company is not standing still. Looking back, what has changed in the mobile market since 2008 is stunning.

If we focus on the capabilities enabled by Adreno and its graphics technologies, the platform has delivered a 60x performance improvement along with a 40x increase in display resolution. The results are impressive by any metric.

Pixel Quality versus Pixel Quantity

In its partnership with dozens of OEMs and software developers, Qualcomm is preparing for another shift in the mobile market when it comes to graphics: that of pixel quality rather than just raw pixel quantity. The Snapdragon Display Engine is a portion of the display pipeline that integrates technologies to improve the user's experience beyond just adding pixels for stats-sheet padding. Built from four discrete segments, the SDE includes composition acceleration, along with something that Qualcomm is calling “ecoPix” for display power reduction, “TruPalette” for improved picture quality, and an array of interfaces for various display technologies.

Composition acceleration is exactly what it sounds like: speeding up the work of combining the multitude of layers that the operating system and applications produce into the final image, which is needed for a high quality user experience. The reduction of frame drops, otherwise known as jank, is important in this step as well and is part of a major push from Google to improve the smoothness of user experiences. Jank has a critical impact on how fast a system “feels” and the composition acceleration engine in the SDE helps here.
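For readers who have not looked at composition before, the per-pixel math is simple; the cost comes from doing it for every pixel of every layer on every frame. The sketch below is an illustration only, not the SDE's actual pipeline. It assumes straight (non-premultiplied) alpha, color channels in the 0 to 1 range, and layers listed back to front; the function name is hypothetical.

```python
def composite(layers, background=(0.0, 0.0, 0.0)):
    """Blend (r, g, b, alpha) layers back to front over an opaque background."""
    r, g, b = background
    for layer_r, layer_g, layer_b, alpha in layers:
        # Classic "over" blend: the new layer covers `alpha` of what is below.
        r = layer_r * alpha + r * (1.0 - alpha)
        g = layer_g * alpha + g * (1.0 - alpha)
        b = layer_b * alpha + b * (1.0 - alpha)
    return (r, g, b)


if __name__ == "__main__":
    wallpaper = (1.0, 0.0, 0.0, 1.0)   # opaque red layer at the bottom
    status_bar = (0.0, 0.0, 1.0, 0.5)  # half-transparent blue layer on top
    print(composite([wallpaper, status_bar]))
```

Dedicated composition hardware does this kind of blending outside the GPU's shader cores, which is why it can help keep frames from being dropped.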

TruPalette is an interesting set of hardware and software tools that aim to improve the appearance of the pixels on the screen, regardless of the quality of the screen they are displayed on. This includes color enhancement to adjust the shift of colors along the gamut and memory color to improve the appearance of skin tones and foliage during those color enhancements. Most interesting is the gamut mapping capability that allows the source video/image to map correctly to the display gamut (provided by the OEM) to maintain color quality and consistency.

EcoPix is a set of features that Qualcomm offers to users of its GPUs to improve display power efficiency. Content Adaptive Backlight (CABL) is a technology for LCDs that adjusts the luminance of the displayed frame while simultaneously lowering the backlight of the phone, presenting the same perceived image to the user at lower power. FOSS is the same idea for OLED screens, which use self-emitting designs. Frame buffer compression lowers the amount of bandwidth required to go from the SoC to the display, which lowers overall power and can lower the amount of display memory necessary, reducing overall cost. Readers of PC Perspective are very familiar with variable refresh: the idea that a display can refresh at lower than standard rates (and in more granular steps). Qualcomm uses that ability to lower the power consumed by the display.
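The trade that CABL makes can be sketched in a few lines. Qualcomm has not published its algorithm, so this is only a simplified model of the general content-adaptive backlight idea: it assumes pixel values are linear-light numbers between 0 and 1, that perceived brightness is simply pixel value times backlight level, and that the backlight has a hypothetical minimum setting of 0.2.

```python
def content_adaptive_backlight(frame, min_backlight=0.2):
    """Return (boosted_frame, backlight) for a list of linear pixel values."""
    peak = max(frame, default=0.0)
    # Dim the backlight to the brightest value the frame actually needs.
    backlight = max(peak, min_backlight)
    # Boost the pixel data by the inverse so pixel * backlight is unchanged.
    gain = 1.0 / backlight
    boosted = [min(1.0, value * gain) for value in frame]
    return boosted, backlight


if __name__ == "__main__":
    dark_scene = [0.1, 0.25, 0.5]
    boosted, backlight = content_adaptive_backlight(dark_scene)
    perceived = [value * backlight for value in boosted]
    print(boosted, backlight, perceived)
```

For a dark scene like the one above, the backlight can run at half strength while the user sees the same image. A real implementation would also have to deal with gamma, a few very bright pixels in an otherwise dark frame, and smooth transitions between frames.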

Finally, the interface display control PHY allows connections to various outputs including a pair of DSI (Display Serial Interface) links, an embedded DisplayPort (eDP) option, HDMI, and even full write-back support for wireless displays like Miracast.

All of these technologies demonstrate Qualcomm’s focus on the quality and power efficiency of individual pixels rather than the raw horsepower required to push additional pixels. Obviously a modern mobile GPU needs to balance both of these pressure points and Adreno seems to have OEMs covered in either direction.

An Evangelical Position

My time at Qualcomm a few months back proved to be incredibly interesting and reiterated to me how dedicated the company is to the GPU as a first-class part of the Snapdragon SoC. It might seem obvious that the company has this kind of commitment, but to many who only see the outward marketing of other companies in the field, it often appears that Qualcomm is always one step behind. That definitely isn't the truth; it appears to be in lockstep with the partners and OEMs that drive the industry forward.

Qualcomm’s executives understand that they are in an interesting position in the mobile market. Though Qualcomm is easily the market leader in terms of units sold, the company chooses to work with customers on processor design goals rather than designing processors in isolation and presenting them to customers after completion. The “silicon for silicon’s sake” methodology can work in a world where 200 watt GPUs are the norm, but when you start discussing metrics like micro-watts, getting things right the first time is critical to success. Qualcomm works with its customers, including all of the largest phone and tablet vendors in the world, to measure compute needs and to develop the right processor for each segment. The company will often find itself evangelizing to its partners about the needs of consumers, the wants of consumers, and the right hardware to get there. As mentioned earlier, Qualcomm was one of the first companies to point to industry standard benchmarks as a useful and warranted method for component selection and future development. It also understands that outdated tests should be discounted and the focus shifted away from them. Qualcomm instead chooses to fight for the benchmarks that actually prove beneficial to the end user.

Qualcomm has a lot more on the line with each individual chip than most of the phone and tablet vendors do. It takes only 8 months or so from development to shipment for a standard smartphone today (though there are obviously some exceptions), while the SoC that powers it will take 2-3 years from development to shipment. As with most computing hardware, a large portion of the engineering work that goes into a CPU, a GPU, or even a communications processor is about predicting the needs of the future. Though no company is perfect, Qualcomm has gained unmatched credibility with customers and consumers by being correct for many years. Next time you see that cute little dragon ad during a commercial break, just remember how long it has taken to get there.
