PC Perspective Podcast #363 – 08/20/2015
Join us this week as we discuss DX12 Benchmarking, Skylake News from IDF, Intel Optane Storage and more!
You can subscribe to us through iTunes and you can still access it directly through the RSS page HERE.
The URL for the podcast is: https://pcper.com/podcast – Share with your friends!
- iTunes – Subscribe to the podcast directly through the iTunes Store
- RSS – Subscribe through your regular RSS reader
- MP3 – Direct download link to the MP3 file
Hosts: Ryan Shrout, Josh Walrath, and Allyn Malventano
*sorry for the audio problems with Ryan's Skype, still not quite sure what the issue was*
Program length: 1:13:03
-
Week in Review:
-
0:17:15 IDF 2015 Skylake Architecture:
-
This week’s podcast is brought to you by Casper. Use code PCPER at checkout for $50 towards your order!
-
News item of interest:
-
0:34:15 IDF 2015 Storage:
-
0:50:45 FMS 2015:
-
Hardware/Software Picks of the Week:
-
Closing/outro
Subscribe to the PC Perspective YouTube Channel for more videos, reviews and podcasts!!
Sorry Ryan, but the IDF should invest in better intrawebs … your audio suffers a lot!!!
It wasn’t IDF, it was Tom Petersen getting payback for the Ashes of the Singularity article.
LOL!!!
Why does Ryan hate my ears! Is he using a microphone wrapped in a crinkly plastic bag which he is beating on the table while he talks? OK, I’m joking … sorta. The audio on this is actually painful.
Did Ryan use the mic integrated into the earbuds? I get much better results than that just using the mic built into my laptop.
Josh is trying to say, without saying it, that he thinks the benchmark was rigged for AMD by being optimized for their hardware. I would like to hear him elaborate on the specifics. Did he know that both Nvidia and AMD have had the source code for a year? What did Nvidia do with it?
An example of what he is claiming would be akin to Nvidia pushing extreme levels of tessellation that aren’t even visible, because they know AMD will choke harder on it.
Maybe Josh is right. My thoughts are that AMD’s labors have been paying off. I think all the console leverage has them working very closely with the industry powers that be, such as Microsoft. There are excerpts in the DX12 documentation that are copied and pasted verbatim from the Mantle programming guide’s descriptions of functions.
AMD has also been gearing up for this with the Graphics Core Next architecture for quite some time.
Meanwhile at Nvidia headquarters: http://s23.postimg.org/y11wsixnf/meanwhile.jpg
I don't think the developer rigged it toward AMD parts at all. Sure, they have had help from AMD, but if both vendors have had the code for a year, I think this may be more the result of AMD having developed Mantle and worked on their drivers in these scenarios for a lot longer than NV has. DX12 may very well have been a back-burner project for NVIDIA until relatively recently, with the introduction of Win10. NV will improve, but it seems that in this particular case AMD is a bit ahead of the pack.
With a low-level API, it is kind of unclear whether Nvidia has that much more headroom here. I would expect that they can at least get the DX12 path to always be faster than DX11, though. If it comes down to raw compute, AMD has more FLOPS in many of these comparisons. An old 290X actually has around the same peak FLOPS as a 980 Ti. This may be a best-case scenario for AMD, though. We don’t have any way to tell until we get more actual DX12-based engines to test.
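As a rough sanity check on that FLOPS claim, here is a minimal C++ sketch. It assumes the usual convention of 2 FLOPs per shader per clock (one fused multiply-add) and uses the published 2816-shader counts for both cards, with roughly 1.0 GHz (290X) and 1.075 GHz (980 Ti boost) clocks:

```cpp
#include <cstdio>

// Peak FP32 throughput in TFLOPS: each shader retires one FMA
// (2 floating-point ops) per clock cycle.
double peak_tflops(int shaders, double clock_ghz) {
    return 2.0 * shaders * clock_ghz / 1000.0;
}

int main() {
    // Published shader counts; clocks are reference/typical boost figures.
    std::printf("R9 290X : %.2f TFLOPS\n", peak_tflops(2816, 1.000));
    std::printf("980 Ti  : %.2f TFLOPS\n", peak_tflops(2816, 1.075));
    return 0;
}
```

Both land in the same 5.6 to 6.1 TFLOPS ballpark, which is the point being made above.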
Actually, I think the hardware itself has a lot more to do with it.
Arstechnica wrote up an excellent article explaining how Nvidia’s hardware is designed and built more for DX11 (which is more serial processing), while AMD went with all of those ACE units (more massive parallelism), which is great for DX12, which uses command lists.
http://arstechnica.com/gaming/2015/08/directx-12-tested-an-early-win-for-amd-and-disappointment-for-nvidia/
Now I know that drivers can and will improve for both sides, but I suspect, given the combination of Mantle/DX12 development and the hardware design (ACE units), that we will see better improvement for AMD cards than Nvidia cards in future DX12 games; maybe not as dramatic as this first bench, but I think this trend will continue.
Josh, I have our answer here. Please check this out.
http://forums.guru3d.com/showthread.php?t=401950
“People wondering why Nvidia is doing a bit better in DX11 than DX12. That’s because Nvidia optimized their DX11 path in their drivers for Ashes of the Singularity. With DX12 there are no tangible driver optimizations because the Game Engine speaks almost directly to the Graphics Hardware. So none were made. Nvidia is at the mercy of the programmers’ talents as well as their own Maxwell architecture’s thread parallelism performance under DX12. The Developers programmed for thread parallelism in Ashes of the Singularity in order to be able to better draw all those objects on the screen. Therefore what we’re seeing with the Nvidia numbers is the Nvidia draw call bottleneck showing up under DX12. Nvidia works around this with its own optimizations in DX11 by prioritizing workloads and replacing shaders. Yes, the nVIDIA driver contains a compiler which re-compiles and replaces shaders which are not fine-tuned to their architecture on a per-game basis. NVidia’s driver is also multi-threaded, making use of the idling CPU cores in order to recompile/replace shaders. The work nVIDIA does in software, under DX11, is the work AMD does in hardware, under DX12, with their Asynchronous Compute Engines.”
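To make the pattern in that quote concrete, here is a minimal C++ sketch of the DX12-style recording model it describes: the application records command lists across many threads and then makes one cheap submission, instead of the driver threading the work behind a serial DX11 API. The CommandList type and the submit step here are illustrative stand-ins, not real D3D12 calls:

```cpp
#include <cstdio>
#include <string>
#include <thread>
#include <vector>

// Stand-in for a D3D12 command list: each worker thread records into
// its own list, so no cross-thread locking is needed during recording.
struct CommandList {
    std::vector<std::string> commands;
    void draw(int object_id) { commands.push_back("draw " + std::to_string(object_id)); }
};

int main() {
    const int kThreads = 4;
    const int kObjects = 10000;  // AotS-style: very large numbers of draw calls
    std::vector<CommandList> lists(kThreads);
    std::vector<std::thread> workers;

    // DX12 pattern: each thread records its slice of the scene in parallel.
    // Under DX11 the runtime serializes submission, so the driver has to
    // thread the work itself (the shader-replacement tricks the quote mentions).
    for (int t = 0; t < kThreads; ++t) {
        workers.emplace_back([&, t] {
            for (int i = t; i < kObjects; i += kThreads)
                lists[t].draw(i);
        });
    }
    for (auto& w : workers) w.join();

    // Single submission of all pre-recorded lists (ExecuteCommandLists in D3D12).
    size_t total = 0;
    for (auto& cl : lists) total += cl.commands.size();
    std::printf("submitted %zu draws recorded on %d threads\n", total, kThreads);
    return 0;
}
```

If a GPU's front end cannot consume that parallel stream efficiently, the bottleneck simply moves from the API to the hardware, which is the claim being made about the Nvidia numbers.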
And the red team and the green team continue their back and forth, but hopefully Vulkan will show some of its close-to-the-metal strengths once SteamOS reaches its official release, and there will be even more improvements! I’m very interested in seeing how the Vulkan API does when it runs on an OS without all the extra phone-home telemetry and baked-in bloatware/adware/spyware, leaving more CPU and GPU cycles available for running the games. It’s going to be Vulkan versus DX12, and the reds and the greens will have some more arguing points! Let’s hope that both Nvidia’s and AMD’s driver teams are not ignoring the SteamOS-based systems that will be coming in November.
Well, it’s just another round of butthurt from the losing side this time, which will become another round when the opposing side scores a temporary lead. Round and round with the bruised egos of that special species known as the Fanboy, knuckles dragging and Neanderthal monobrows furrowed in anger.
I’m more interested in seeing someone figure out how to fix streaming so that it doesn’t break with SLI. Games on SLI rigs get 140 fps or so… turn on video capture for streaming and you generally lose about 50% of your frames.
Shouldn’t the date be the 19th and not the 20th?
I watched it live on Wednesday night (UTC-5).
Allyn, Josh and Ryan: I really REALLY enjoyed Allyn’s excellent review of Optane: he has confirmed a lot of my own poorly informed expectations of that new memory.
QUESTION for all 3 of you: using Allyn’s shopping analogy — “go back to the supermarket and get pickles now” — am I correct to predict that 2.5″ SSDs with Optane will also benefit from cranking up the transmission clock over the data cable?
I’m thinking of the 2.5″ Intel 750 form factor, but with Optane instead of Nand Flash.
I’m confident that overclocking Optane DIMMs will become a favorite exercise for lots of Enthusiasts, and the science of overclocking is already so mature.
WHAT IF we design a storage controller with support for the NVMe protocol controlling 2.5″ Optane SSDs, and we build in the option to increase the transmission clock — perhaps in discrete steps?
PCIe 3.0 NVMe runs at 8 GT/s right now, and it also utilizes the 128b/130b encoding introduced with PCIe 3.0. Moreover, PCIe 4.0 will ramp that transfer rate to 16 GT/s.
Here’s a preliminary article I wrote which explored ramping up the clock rate on the data cables, by modifying an inexpensive PCIe AOC:
http://supremelaw.org/patents/BayRAMFive/overclocking.storage.subsystems.version.3.pdf
This one change could also occur with SATA and SAS channels. Plus, future versions of those specs should also adopt 128b/130b encoding, as USB 3.1 did with its 128b/132b scheme.
For example, such a 2.5″ Optane SSD could utilize a jumper block to switch among 6G, 8G, 12G and 16G (for starters); see the sketch below this comment for what those steps would yield.
Better yet, auto-detection is another option for achieving that same capability, and it is functionally superior.
From what you said, Allyn, it may even be feasible to increase this transmission clock to 32 GT/s, assuming the data cable can handle such a high data rate.
If Optane is as fast as you say, overall throughput should benefit from much faster transmission clocks, yes?
Your thoughts are always appreciated.
MRFS
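A quick C++ sketch of what each of those proposed jumper steps (plus the 32G case mentioned above) would buy over an x4 link. It assumes 128b/130b encoding at every step, which is a simplification: real 6G SATA and 12G SAS links use 8b/10b, so their actual payload rates would be lower:

```cpp
#include <cstdio>

// Payload throughput for a serial link: line rate minus encoding overhead.
// Assumes PCIe 3.0-style 128b/130b at every step (8.125 wire bits per
// payload byte); real SATA/SAS links at 6G/12G actually use 8b/10b.
double payload_mb_per_s(double line_rate_gbps, int lanes) {
    const double bits_per_byte = 130.0 / 16.0;  // = 8.125
    return line_rate_gbps * 1000.0 / bits_per_byte * lanes;
}

int main() {
    const double steps[] = {6.0, 8.0, 12.0, 16.0, 32.0};  // GT/s jumper steps
    for (double gbps : steps)
        std::printf("%5.1f GT/s x4 -> %8.0f MB/s\n", gbps, payload_mb_per_s(gbps, 4));
    return 0;
}
```

This prints roughly 2,954 / 3,938 / 5,908 / 7,876 / 15,753 MB/s for the 6, 8, 12, 16 and 32 GT/s steps, so yes, throughput scales linearly with the transmission clock as long as the cable and receivers can keep up.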
Here’s a little comparative math:
G.Skill now offers DDR4-3000 (PC4-24000).
That’s 24 GB per second raw bandwidth.
What clock rate must each NVMe lane use,
to achieve that same theoretical bandwidth?
24 GB/s / 4 lanes = 6 GB/s per lane
PCIe 3.0’s 128b/130b encoding: 130 wire bits per 16 payload bytes = 8.125 bits per byte
6 GB/s x 8.125 bits per byte = 48.75 Gb/s, i.e. a 48.75 GT/s line rate per lane
Are my numbers correct?
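The numbers do check out. As a small C++ sketch of the same relationship run in reverse (payload target to required per-lane line rate), with the 8.125 wire-bits-per-byte figure coming from 128b/130b encoding:

```cpp
#include <cstdio>

// Required per-lane line rate (GT/s) to carry a given payload bandwidth,
// assuming PCIe 3.0's 128b/130b encoding (8.125 wire bits per payload byte).
double line_rate_gtps(double payload_gb_per_s, int lanes) {
    const double bits_per_byte = 130.0 / 16.0;
    return payload_gb_per_s / lanes * bits_per_byte;
}

int main() {
    // DDR4-3000 (PC4-24000): 24 GB/s raw bandwidth, spread over an x4 link.
    std::printf("required line rate: %.2f GT/s per lane\n", line_rate_gtps(24.0, 4));
    return 0;
}
```

Output: required line rate: 48.75 GT/s per lane. Note the result is a line rate in Gb/s (GT/s) rather than a clock in GHz.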
So I took all the bumps and clicks and FIRST converted them to binary, no good, BUT then I converted the binary to braille, and then randomly arranged the letters and took some mushrooms, and BANG, it couldn’t be clearer. HALF LIFE 3 CONFIRMED!!!!!
In for +1 on Arctic Fox reference
Inversely, if we start with 32 GT/s, then:
32 Gb/s / 8.125 bits per byte = 3,938 MB/s per PCIe 3.0+ lane
3,938 x 4 PCIe lanes = 15,752 MB/s
If we start with 16 GT/s (cf. PCIe 4.0), then:
16 Gb/s / 8.125 bits per byte = 1,969 MB/s per PCIe 3.0+ lane
1,969 x 4 PCIe lanes = 7,876 MB/s
Using the latter estimate, that raw bandwidth
exceeds the raw bandwidth of DDR2-800 (PC2-6400)
by (7,876 / 6,400) = 1.23x = 23% faster.
WOW!
I did this calculation because it’s not likely
that mass storage will return to parallel
“ribbon” data cables any time soon, particularly
now that SATA and SAS data cables are widely adopted.
Put simply, serial is here to stay
(just as constant change is here to stay 🙂)
I’ve been using a 12GB ramdisk for several years,
running the very reliable RamDisk Plus from
http://www.superspeed.com . If Optane can achieve
mass storage throughput approaching 8,000 MB/s,
assuming a serial data cable and x4 PCIe lanes,
that speed will exceed the level of satisfaction
which I now enjoy with our 12GB ramdisk hosted
with a matched quad of Corsair DDR2-800 RAM.
Believe me, I’m honestly quite spoiled by
doing routine file system I/O with that ramdisk!
In other words, I should expect to cable
a single 2.5″ Intel Optane SSD to an integrated
motherboard port, or to an Add-On Controller port,
and experience throughput that is 23% faster
than the DDR2-800 ramdisk I am now enjoying,
assuming the cable transmission rate is 16 GHz.
(With PCIe 3.0 and the 8G clock, MAX HEADROOM
is about 4,000 MB/s, or about two-thirds of
DDR2-800 throughput (4.0 / 6.4). Still not too shabby.)
That’s where I believe Intel should be going
with Optane, in addition to the rest of Intel’s
roadmap for this new memory technology.
p.s. Personally, I feel it’s unfair to raise
the bar for mass storage all the way up to the
blazing speeds of today’s high-speed DDR4 DRAM.
Trading a significantly lower raw bandwidth
for very large storage capacities is a trade-off
I will gladly accept, as an inventor, designer
and high-performance workstation user.
MRFS