AMD just made developers’ lives a little easier, and has spared consumers the kind of confusion that surrounded SSE4 support. AMD had been working on its own SSE5 extensions for some time before Intel announced a set of instructions called AVX (Advanced Vector Extensions, not to be confused with the vector extensions of Intel’s Larrabee architecture). Apparently there was a lot of overlap between SSE5 and AVX, enough that AMD thought it prudent simply to support the AVX features that SSE5 did not cover, and to define encodings for three additional instruction sets that AMD had developed but that AVX does not include: XOP (a sizable group of instructions carried over from SSE5), FMA4 (four-operand fused multiply-add), and CVT16 (half-precision floating-point conversions).

AMD’s solution will not be 100% compatible, and the main problem is the FMA portion. Intel is specifying a three-operand fused multiply-add with a destructive destination (the result overwrites one of the source registers), while AMD went with a four-operand FMA whose destination is non-destructive (the result goes to a separate register). AMD’s blogger naturally feels that AMD’s overall approach was better and more rooted in developer support. Whether or not that is true, AMD does have something of a reputation for being consumer-centric. AMD was also first out with 3DNow! (ahead of KNI, otherwise known as SSE), and only gradually adopted SSE support. Its SSE4a was not actually based on Intel’s SSE4 instructions, and uptake of SSE4a has so far been… problematic.

Moving to a single standard will be best in the end. While AMD might not have the fastest AVX implementation, at least it has one. That means software optimized for AVX will not fall back to its non-AVX code path on AMD processors, and it will run faster than it would have on that fallback path. Bulldozer will be AMD’s first processor to support these new extensions, while on the Intel side Sandy Bridge will be the first.

AVX is a big step. In fact, it is as big as the jump from MMX to SSE/3DNow!, perhaps even bigger. It is of course unfortunate that AMD and Intel cannot work together a bit better on these things, but that is not in Intel’s best interest. I think Intel is still stinging a bit from adopting AMD’s 64-bit x86 extensions, and it certainly was not going to use AMD’s SSE5 vector extensions (though, interestingly enough, AVX does a lot of the things that AMD detailed in its specification 8 months before Intel did). So, viewed from a distance, it does appear that AMD already had much of the functionality in its design, and by adjusting that design it could implement the majority of the features in Intel’s specification.

While AMD has a history of bringing new and interesting instructions to the world, in this case it recognized that it still has only about 16% to 20% of the market (depending on the quarter), and that Intel has far more extensive developer relations. By embracing AVX as fully as it can with Bulldozer (which will not be able to execute the three-operand, destructive-destination FMA that Intel redefined in its latest specification), AMD lets software developers write a single, more general AVX code path that enables vector acceleration on all new x86-64 CPUs from both AMD and Intel.