AMD’s solution will not be 100% compatible, with the main problem coming from the FMA portion. Intel is speccing out a 3 operand, destructive destination fused multiply add (FMA), in which the result overwrites one of the source operands, while AMD went with a 4 operand, non-destructive destination FMA, in which the result goes to a separate register. AMD’s blogger of course feels that AMD’s overall solution was better and more rooted in developer support. Whether or not that is true, AMD does have a bit more of a reputation for being consumer-centric. AMD was also first out with 3DNow! (well before KNI, otherwise known as SSE), and has slowly adopted SSE support since. SSE4a was not actually based on Intel’s SSE4 instructions, but the uptake of those instructions so far has been… problematic.
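To make the operand difference concrete, here is a minimal sketch in C that models the two forms as ordinary functions. The function names are hypothetical, the choice of which source gets overwritten is only illustrative, and the single-rounding behavior of a real fused multiply add is not modeled; only the register semantics are shown.

```c
/* Model of a 3 operand, destructive destination FMA: the first operand is
   both a source and the destination, so its original value is lost.
   (Hypothetical helper; which source is overwritten is an assumption.) */
static void fma3_destructive(double *a_and_dst, double b, double c) {
    *a_and_dst = (*a_and_dst) * b + c;
}

/* Model of a 4 operand, non-destructive destination FMA: the result goes
   to a separate destination, so all three sources keep their values.
   (Hypothetical helper; plain multiply-add, not a true fused operation.) */
static void fma4_nondestructive(double *dst, double a, double b, double c) {
    *dst = a * b + c;
}
```

With the destructive form, code that still needs the overwritten source afterwards has to copy it first; the non-destructive form avoids that copy at the cost of encoding a fourth operand.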
Going to a single standard is going to be best in the end. While AMD might not have the fastest AVX implementation, at least they have an implementation. This means that software optimized for AVX will run its AVX code path on AMD’s processor rather than falling back to the slower non-AVX path. Bulldozer will be the first AMD processor to embrace these new extensions, while on the Intel side Sandy Bridge will be the first processor to utilize them.
AVX is a big step. In fact, it is as big as the jump from MMX to SSE/3DNow!, perhaps even bigger. It is of course unfortunate that AMD and Intel cannot work together a bit better on these things, but doing so is not in Intel’s best interest. I think Intel is still stinging a bit from adopting AMD’s 64-bit X86 extensions, and they certainly were not going to use AMD’s SSE5 vector extensions (though interestingly enough AVX does a lot of the things that AMD detailed in their specification 8 months before Intel did). So, looking from far away, it does appear as though AMD already had a lot of the functionality in place in their design, and by adjusting that design could implement the majority of the features in Intel’s specification.
While AMD has a history of bringing new and interesting instructions to the world, in this case they realized that they still have only about 16% to 20% of the market (depending on the quarter) and that Intel still has far more extensive developer relations. With AMD embracing AVX as best they can with Bulldozer (it will not be able to do the three operand destructive destination FMA that Intel redefined in its latest specification), software developers will be able to write a more general AVX code path that enables vector acceleration on all new X86-64 CPUs from both AMD and Intel.
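The idea of a single, general AVX code path can be sketched as a runtime dispatch. This is a rough illustration only: the `cpu_features` struct, the function names, and the way the flag gets set are all assumptions, and the "AVX" routine here is a scalar stand-in for what would really be vectorized code. A real program would fill in the flag by querying the processor (for example through CPUID).

```c
#include <stddef.h>

/* Hypothetical feature flags; a real program would populate these by
   querying the CPU rather than setting them by hand. */
typedef struct {
    int has_avx;
} cpu_features;

typedef void (*add_fn)(const float *, const float *, float *, size_t);

/* Fallback path for processors without AVX. */
static void add_generic(const float *x, const float *y, float *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        out[i] = x[i] + y[i];
    }
}

/* Stand-in for the shared AVX path. In real code this would use 256-bit
   AVX instructions; here it is scalar so the sketch stays portable. */
static void add_avx_path(const float *x, const float *y, float *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        out[i] = x[i] + y[i];
    }
}

/* Pick one implementation at startup. Any CPU reporting AVX, whether from
   AMD or Intel, gets the same AVX path. */
static add_fn select_add(cpu_features f) {
    return f.has_avx ? add_avx_path : add_generic;
}
```

The point of the design is that the selection keys on the feature, not the vendor, so one AVX routine serves both Bulldozer and Sandy Bridge as long as it avoids the FMA forms the two do not share.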