AMD Ryzen and the Windows 10 Scheduler – No Silver Bullet
As it turns out, Windows 10 is scheduling just fine on Ryzen.
** UPDATE 3/13 5 PM **
AMD has posted a follow-up statement that officially clears up much of the conjecture this article was attempting to clarify. Relevant points from their post that relate to this article as well as many of the requests for additional testing we have seen since its posting (emphasis mine):
"We have investigated reports alleging incorrect thread scheduling on the AMD Ryzen™ processor. Based on our findings, AMD believes that the Windows® 10 thread scheduler is operating properly for “Zen,” and we do not presently believe there is an issue with the scheduler adversely utilizing the logical and physical configurations of the architecture."
"Finally, we have reviewed the limited available evidence concerning performance deltas between Windows® 7 and Windows® 10 on the AMD Ryzen™ CPU. We do not believe there is an issue with scheduling differences between the two versions of Windows. Any differences in performance can be more likely attributed to software architecture differences between these OSes."
So there you have it, straight from the horse's mouth. AMD does not believe the problem lies within the Windows thread scheduler. SMT performance in gaming workloads was also addressed:
"Finally, we have investigated reports of instances where SMT is producing reduced performance in a handful of games. Based on our characterization of game workloads, it is our expectation that gaming applications should generally see a neutral/positive benefit from SMT. We see this neutral/positive behavior in a wide range of titles, including: Arma® 3, Battlefield™ 1, Mafia™ III, Watch Dogs™ 2, Sid Meier’s Civilization® VI, For Honor™, Hitman™, Mirror’s Edge™ Catalyst and The Division™. Independent 3rd-party analyses have corroborated these findings.
For the remaining outliers, AMD again sees multiple opportunities within the codebases of specific applications to improve how this software addresses the “Zen” architecture. We have already identified some simple changes that can improve a game’s understanding of the "Zen" core/cache topology, and we intend to provide a status update to the community when they are ready."
We are still digging into the observed differences between toggling SMT off and disabling the second CCX, but it is good to see AMD issue a clarifying statement here for all of those out there observing and reporting on SMT-related performance deltas.
** END UPDATE **
Editor's Note: The testing you see here was a response to many days of comments and questions to our team on how and why AMD Ryzen processors are seeing performance gaps in 1080p gaming (and other scenarios) in comparison to Intel Core processors. Several outlets have posted that the culprit is the Windows 10 scheduler and its inability to properly allocate work across the logical vs. physical cores of the Zen architecture. As it turns out, we can prove that isn't the case at all. -Ryan Shrout
Initial reviews of AMD’s Ryzen CPU revealed a few inefficiencies in some situations, particularly in gaming workloads running at common resolutions like 1080p, where the CPU becomes more of a bottleneck when coupled with modern GPUs. Lots of folks have theorized about what could be causing these issues, and the most recent attention has been directed at the Windows 10 scheduler and its supposed inability to properly place threads on the Ryzen cores for the most efficient processing.
I typically have Task Manager open while running storage tests (they are boring to watch otherwise), and I naturally had it open during Ryzen platform storage testing. I’m accustomed to how the IO workers are distributed across reported threads, and in the case of SMT capable CPUs, distributed across cores. There is a clear difference when viewing our custom storage workloads with SMT on vs. off, and it was dead obvious to me that core loading was working as expected while I was testing Ryzen. I went back and pulled the actual thread/core loading data from my testing results to confirm:
The Windows scheduler has a habit of bouncing processes across available processor threads. This naturally happens as other processes share time with a particular core, with the heavier process not necessarily switching back to the same core. As you can see above, the single IO handler thread was spread across the first four cores during its run, but the Windows scheduler was always hitting just one of the two available SMT threads on any single core at one time.
My testing for Ryan’s Ryzen review consisted of only single-threaded workloads, but we can make things a bit clearer by loading down half of the CPU while toggling SMT off. We do this by increasing the worker count to 4, half of the 8 threads available with SMT disabled in the motherboard BIOS.
SMT OFF, 8 cores, 4 workers
With SMT off, the scheduler is clearly not giving priority to any particular core and the work is spread throughout the physical cores in a fairly even fashion.
Now let’s try with SMT turned back on and doubling the number of IO workers to 8 to keep the CPU half loaded:
SMT ON, 16 (logical) cores, 8 workers
With SMT on, we see a very different result. The scheduler is clearly loading only one thread per core. This could only be possible if Windows were aware of the 2-way SMT (two threads per core) configuration of the Ryzen processor. Do note that sometimes the workload will toggle around every few seconds, but the total loading on each physical core will still remain at ~50%. I chose a workload that saturated its thread just enough for Windows to not shift it around as it ran, making the above result even clearer.
Synthetic Testing Procedure
While the storage testing methods above provide a real-world example of the Windows 10 scheduler working as expected, we do have another workload that can help demonstrate core balancing with Intel Core and AMD Ryzen processors. A quick and simple custom-built C++ application can be used to generate generic worker threads and monitor for core collisions and resolutions.
This test app has a very straightforward workflow. Every few seconds it generates a new thread, capping at N/2 threads total, where N is the reported number of logical cores. If the OS scheduler is working as expected, it should load 8 threads across 8 physical cores; which specific logical core is chosen within each physical core will depend on minute background conditions in the OS.
By monitoring the APIC_ID through the CPUID instruction, the first application thread monitors all threads and detects and reports on collisions – when a thread from our app is running on the same core as another thread from our app. That thread also reports when those collisions have been cleared. In an ideal and expected environment where Windows 10 knows the boundaries of physical and logical cores, you should never see more than one thread of a core loaded at the same time.
This screenshot shows our app working on the left and the Windows Task Manager on the right with logical cores labeled. While it may look like all logical cores are being utilized at the same time, they are not. At any given point, only LCore 0 or LCore 1 is actively processing a thread. Need proof? Check out the modified view of the Task Manager, where I copied the graph of LCore 1/5/9/13 over the graph of LCore 0/4/8/12 with inverted colors to aid visibility.
If you look closely, by overlapping the graphs in this way, you can see that the threads migrate from LCore 0 to LCore 1, LCore 4 to LCore 5, and so on. The graphs intersect and fill in to consume ~100% of the physical core. This pattern is repeated for the other 8 logical cores on the right two columns as well.
Running the same application on a Core i7-5960X Haswell-E 8-core processor shows a very similar behavior.
Each pair of logical cores shares a single thread, and when thread transitions occur away from LCore N, they migrate perfectly to LCore N+1. It does appear that in this scenario the Intel system shows a more stable thread distribution than the Ryzen system. While that may in fact confer some performance advantage on the 5960X configuration, the penalty for intra-core thread migration is expected to be very small.
The fact that Windows 10 is balancing the 8 thread load specifically between matching logical core pairs indicates that the operating system is perfectly aware of the processor topology and is selecting distinct cores first to complete the work.
Information from this custom application, along with the storage performance tool example above, clearly show that Windows 10 is attempting to balance work on Ryzen between cores in the same manner that we have experienced with Intel and its HyperThreaded processors for many years.
Pinging Cores
One potential pitfall of this testing process might have been seen if Windows was not enumerating the processor logical cores correctly. What if, in our Task Manager graphs above, Windows 10 was accidentally mapping logical cores from different physical cores together? If that were the case, Windows would be detrimentally affecting performance thinking it was moving threads between logical cores on the same physical core when it was actually moving them between physical cores.
To answer that question we went with another custom-written C++ application with a very simple premise: ping threads between cores. If we pass a message directly between each logical core and measure the time it takes to get there, we can confirm Windows' core enumeration. Passing data between two threads on the same physical core should produce the fastest result, as they share local cache. Threads running on the same package (as all threads on these processors technically are) should be slightly slower, as they need to communicate between global shared caches. Finally, multi-socket configurations would be slower still, as they have to communicate through memory or fabric.
Let's look at a complicated chart:
What we are looking at above is how long it takes a one-way ping to travel from one logical core to the next. The line riding around 76 ns indicates how long these pings take when they have to travel to another physical core. Pings that stay within the same physical core take a much shorter 14 ns to complete. The above example was a 5960X and confirms that threads 0 and 1 are on the same physical core, threads 2 and 3 are on the same physical core, etc.
Now let's take a look at Ryzen on the same scale:
There's another layer of latency there, but let us focus on the bottom of the chart first and note that the relative locations of the colored plot lines are arranged identically to that of the Intel CPU. This tells us that logical cores within physical cores are being enumerated correctly ({0,1}, {2,3}, etc.). That's the bit of information we were after and it validates that Windows 10 is correctly enumerating the core structure of Ryzen and thus the scheduling comparisons we made above are 100% accurate. Windows 10 does not have a scheduling conflict on Ryzen processors.
But there are some other important differences standing out here. Pings within the same physical core come out to 26 ns, and pings to adjacent physical cores are in the 42 ns range (lower than Intel, which is good), but that is not the whole story. Ryzen subdivides by what is called a "Core Complex", or CCX for short. Each CCX contains four physical Zen cores and they communicate through what AMD calls Infinity Fabric. That piece of information should click with the above chart, as it appears hopping across CCX's costs another 100 ns of latency, bringing the total to 142 ns for those cases.
While it was not our reason for performing this test, the results may provide a possible explanation for the relatively poor performance seen in some gaming workloads. Multithreaded media encoding and tests like Cinebench segment chunks of the workload across multiple threads. There is little inter-thread communication necessary as each chunk is sent back to a coordination thread upon completion. Games (and some other workloads we assume) are a different story as their threads are sharing a lot of actively changing data, and a game that does this heavily might incur some penalty if a lot of those communications ended up crossing between CCX modules. We do not yet know the exact impact this could have on any specific game, but we do know that communicating across Ryzen cores on different CCX modules takes twice as long as Intel's inter-core communication as seen in the examples above, and 2x the latency of anything is bound to have an impact.
Some of you may believe the Windows scheduler could be optimized to mitigate this issue, perhaps by keeping processes on one CCX if at all possible. Well, in the testing we did, that was already happening. Here is the SMT ON result for a lighter (13%) workload using two threads:
See what's going on there? The Windows scheduler was already keeping those threads within the same CCX. This was repeatable (some runs were on the other CCX) and did not appear to be coincidental. Further, the example shown in the first (bar) chart demonstrated a workload spread across the four cores in CCX 0.
Closing Thoughts
What began as a simple internal discussion about the validity of claims that Windows 10 scheduling might be to blame for some of Ryzen's performance oddities (and that an update from Microsoft and AMD might magically save us all) turned into a full day of testing, with many people chipping in to help put together a great story. The team at PC Perspective believes strongly that the Windows 10 scheduler is not improperly assigning workloads to Ryzen processors due to a lack of knowledge of the CPU's architecture.
In fact, though we are waiting for official comments we can attribute to AMD on the matter, I have been told by knowledgeable individuals inside the company that even AMD does not believe the Windows 10 scheduler has anything at all to do with the problems they are investigating on gaming performance.
In the process, we did find a new source of information in our latency testing tool that clearly shows the differentiation between Intel's architecture and AMD's Zen architecture for core-to-core communications. In this way at least, the CCX design of 8-core Ryzen CPUs appears to more closely emulate a 2-socket system. With that, it would be possible for Windows to logically split the CCX modules into Non-Uniform Memory Access (NUMA) nodes, but that would force everything not specifically coded to span NUMA nodes (all games, some media encoders, etc.) to use only half of Ryzen. How does this new information affect our expectations for something like Naples, which will depend on Infinity Fabric even more directly for AMD's enterprise play?
There is still much to learn and more to investigate as we find the secrets that this new AMD architecture has in store for us. We welcome your discussion, comments, and questions below!
Here are 2 simple short videos that prove PCper is full of excrement:
https://www.youtube.com/watch?v=BORHnYLLgyY
https://www.youtube.com/watch?v=JbryPYcnscA
It is perfectly clear from the videos that the way Win10 distributes threads affects performance, and Win10 is doing it in a random and suboptimal way.
Or in other words, the PCPer article above is pure disinformation.
Add these too:
https://www.youtube.com/watch?v=U9DE83lMVio
https://www.youtube.com/watch?v=XAXS8rYwGzg
We have results different from those in the videos. They are additional data points, sure, but that does not mean our results are not factual.
Allyn, ZoA's video is saying the same thing as you: whatever tech is used for inter-CPU communication is the problem.
A new scheduler can help to mitigate this problem.
So please test your code on Windows 7; I want to see the line graph.
If anything, 4chan’s /g/ already debunked this GARBAGE of a so-called “article”, completely exposing PcPer’s lies and Ryan Shrout making up outright BS right out of his ass. This is especially laughable if you’ll take into consideration the mere fact that Microsoft themselves already admitted that the problem is actually there, it exists, and that they’re already working on the patch. And Microsoft admitted this BEFORE Ryan shat out this POS of an “article”, basically completely F’ing up on spot. Glorious. Simply glorious.
Yes, not Ryan, but Allyn, whatever. PcPer is made up of Intel/Nvidia shills, so it doesn’t matter who wrote it in this particular case. It’d be the same either way. What matters is the fact of a complete and utter F up by PcPer.
The article exposed interesting information that is not obvious…
AMD fanbois discrediting anything that points to any fault being on AMD’s side does not change the underlying facts one bit.
There was nothing “exposed” by them, because it’s all BS pulled out of an ass. THEY were exposed for the liars that they are. Again, Microsoft already openly stated that the problem is there and that the problem is indeed in Windows, NOT in the processor.
You just have an AXE to grind, and you could care less about anything else! CPU/GPU makers are not football teams!
You are going all Fukushima Daiichi after the tsunami on everyone and grasping at any excuse for verbal fisticuffs!
Man, you are really latching onto that very generic tweet from Microsoft, aren't you? Also, what exactly about the actual screenshots and test results are you claiming to be a lie? Put up or shut up.
We only put out this sort of content to educate folks and to (hopefully) steer the relevant companies toward, or accelerate, whatever optimizations are needed to FIX THE ISSUE. We are enthusiasts. We want everything to be better / faster / etc. Nothing about this is to bash one company or specifically promote another. If you insist on interpreting it that way, I recommend you check your own bias.
You’d be better off completely deleting this self-proclaimed “article”, because you were caught red-handed, smack dab in your lies, but instead you prefer to continue throwing a tantrum and going into full denial. The thing is, it is YOU who should’ve been “putting up” by now and admitting everything, but considering that the actual thief always screams “thief!” the loudest of all the people present in the room, it’s pretty clear you’re not going to admit to anything. Not like I’m surprised by any of this, in all honesty. Them shekels won’t work themselves out, naturally.
Was pretty much an Intel shill detector done by the shill itself xD.
This is an interesting article, however I think that an even deeper investigation is needed, given that disabling SMT on Ryzen does improve performance significantly in some cases, in a way that doesn’t happen on Intel chips, and that it apparently doesn’t happen on Windows 7. Perhaps the immediate suspects aren’t the cause, but something is definitely screwy.
Just a heads up to the author and to those dismissing the tweet from MS: it's easy to dismiss one you selectively choose, when they actually responded with a clearer response to a clear question.
The MS tweet didn’t say anything about whether there is a problem in Windows or not… Reading too much into the tweet in either direction is just bad practice from both sides.
All you need to know:
https://www.youtube.com/watch?v=O54bww5zoRM
PCPer can you just add the Windows 7 tests into the mix and release your source code for the simple C tests? It would go a long way to making your audience happier.
Thanks for this well written and very well-researched and documented article concerning the scheduling and SMT performance issues with Ryzen. It would be great if you could share the C++ code for these tests so others could verify results on their own as well.
“While it was not our reason for performing this test, the results may provide a possible explanation for the relatively poor performance seen in some gaming workloads. Multithreaded media encoding and tests like Cinebench segment chunks of the workload across multiple threads. There is little inter-thread communication necessary as each chunk is sent back to a coordination thread upon completion. Games (and some other workloads we assume) are a different story as their threads are sharing a lot of actively changing data, and a game that does this heavily might incur some penalty if a lot of those communications ended up crossing between CCX modules.”
Based on this analysis about the CCX inter-module latency issue possibly explaining the poor 1080p gaming benchmark results, it seems like the next logical step in testing would be to generate some synthetic data to be transferred and processed on threads in different CCX module cores.
Call me a skeptic, but even though I am a big fan of AMD processor technology, I would be surprised to learn that AMD CPU test engineering is not well aware of some of the issues with the CCX latency and is investigating approaches internally and with Microsoft to address them.
One of the other things that should receive more attention as well is the significant differences between the inter-core ping times of Ryzen (26ns) vs Broadwell (14ns). I am surprised that with 14nm FinFET technology and a new CPU that these times are not much closer or even slightly favor Ryzen. The other bad news this indicates is that even with a single CCX design (R5 1500x ?) the SMT processing may still be considerably less efficient than Intel’s.
This must be the only (or one of the only) comments that also touches the _inter_ CCX latency issue. It does seem to me that double the access latency for data if a thread runs on a different core might make a sizable difference and (as another commenter has pointed out), the Windows scheduler with its multiple runqueues and policy of picking the first free slot will really only make matters worse. (I have read up on the internals myself, though it’s been maybe almost a year or so; I do feel like I remember enough of it.)
So thread bouncing should also be investigated, maybe Ryzen would even profit from waiting a short time before moving threads off their home CPU. (Though even a single quantum is rather large; I’ll have to admit I don’t really know what to do there.)
It might also be worth looking into the performance Linux has. Whereas everyone basically gets the same scheduler on Windows, with little there to tweak, on Linux, there have been different schedulers for a long time, if not from its inception. This should allow for easier study of the effect of different scheduling policies on performance (for the very inclined 😉 ).
I, myself, remember the Completely Fair Scheduler (CFS), the BFS and the O(1) scheduler, though there seem to be others. I pulled this PDF from a short Google search: http://www.diit.unict.it/users/llobello/linuxscheduling20122013_P1.pdf . I realize it has very suboptimal formatting, but otherwise, the content appears to be sane.
To address the point raised in the final paragraph: there could be different implementations of inter-thread concurrency at play. I’m pretty sure AMD is also using a different protocol for cache coherency. Maybe those could account for at least some of the deviation.
[I don’t see any preview function, so let’s hope for the best …]
This should be developed further and put into general testing of CPU’s going forward as knowing the topology of the cores and the latencies involved can be useful information for some.
Just like FCAT became part of standard testing of graphics cards, this should become part of standard testing of CPU’s.
Although I would agree there is no silver bullet, I think there is a problem with how Windows is scheduling threads. In contrast to the monolithic design of Intel’s processors, Ryzen was built with a modular design, in this case two compute complexes with an interconnect. This seems like a good alternative to Intel’s strategy of having two processor lines, one for consumer/gaming use and another for enthusiast/professional use; AMD has instead tried to create one processor which is well rounded and capable of both types of workload. The operating system scheduler needs to be changed to understand that not all cores are created equal and to prefer keeping threads from the same application on the same compute complex. In cases where an application is more parallel than a single compute complex can handle, threads should begin spilling into the other compute complex. Also, it would be preferable to balance threads across cores instead of assigning multiple threads to the same core using SMT.
This type of modular, NUMA-on-a-chip design is, I believe, the future, and while the operating system could better handle scheduling of threads, that is only a band-aid; the true solution is that applications need to be NUMA aware. Some applications already are: video encoders such as x265 can handle NUMA nodes, and many games are programmed to be NUMA aware on consoles, as the PS4 and Xbox One have a memory architecture similar to Ryzen's. I suspect that the PC ports of these games did not carry over any of the NUMA scheduling of the console versions, however. Even with last generation's consoles, games had to be programmed to rely heavily on thread-local storage to avoid load-hit-stores, due to the poor caching design of the PowerPC, so game developers should already be familiar with how to resolve these types of issues.
The problem, I believe, with Ryzen is that it runs too hot and stops you getting a fast clock speed on air or water. At the same clock as an Intel chip, Ryzen is faster on benches and possibly in gaming in general, but about 4 GHz is the limit.
I couldn’t do anything with my 1700X past 3907.47 MHz; even just opening a browser would make the system lock. I tried disabling SMT and shutting down 6 cores, but no cigar.
On gaming benches such as Heaven at 1080p, my i5 Haswell smokes it because it can run a much higher clock; even my FX6300 almost matches it at 5217 MHz.
As a result, the 1700X and ROG Crosshair are packed up ready for RMA. I didn’t pay out £650 for a downgrade, which is exactly what it is for me.
If you were buying a chip to overclock, you should have purchased the 1700 SKU and saved even more money.
Chasing overclocking headroom on a top-end SKU is a fool's errand; the top-end SKU should, by design, have the least overclocking headroom, and for all the money spent it had better be clocked (base/boost) to the highest possible speed out of the box, or the customer is being ripped off!
Intel's entire line of “K” branded parts is just an elaborate ruse and a textbook case of marketing psychology, where Intel engineers a little more headroom into its “K” series parts to give the impression of “overclockability” to uneducated consumers who fall for that “K” marketing scheme.
AMD's top-end 1800X performs better with respect to clock speeds than AMD originally expected, and the 1800X is at its proper limit by design, with overclocking headroom as small as possible. The real deal for the overclocker from AMD is the 1700 (non-X), which is $10 less than the discounted 7700K SKU and offers 8 cores for a damn good price; overclocked, it performs like the top-binned 1800X!
That is the real deal for any real overclocker, not some marketing-driven special “K” brand that is intentionally engineered with a little extra “overclocking” headroom to meet the illogically perceived expectations of the rubes that fell for it!
OC to get what? 10% more?
The 1700X gives twice the performance of an i5 or FX-6300 with lower power consumption and temperatures.
The i5 is just a quad core; try to do even a tiny extra task while gaming and the CPU cr*ps itself. Do that with a 1700 and it barely notices.
By that standard a 6900K would also be a “downgrade”.
Hi,
This is just my opinion.
I think all Ryzen CPUs have been compared with Skylake and Kaby Lake, and in games, where single-thread performance matters most, Intel still has the lead. When playing at 1080p with GPUs like the 1080 and 1080 Ti, the processor can become the bottleneck if it can't keep up at those high framerates.
For example, let's not forget that in most benchmarks where the Ryzen 1800X was losing to a 7700K, the 1800X scored about 160 points in single-core performance (Cinebench R15), whereas the i7-7700K scored about 193 points.
I would have loved to see the Ryzen 1800X in games against a CPU with similar single-thread performance (Haswell), for example the i7-4770K at stock speeds, which scores 156 points in Cinebench R15.
TL;DR: I think the difference between the Ryzen 1800X getting 90 fps and the i7-7700K getting 120 fps is precisely that single-core difference.
What do you guys think?
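If games were purely single-thread-bound, the FPS ratio should match the single-core score ratio. A quick arithmetic check of the figures quoted above:

```python
# Sanity check of the comment's claim using its own figures
# (Cinebench R15 single-thread points and reported frame rates).
ryzen_1t, kaby_1t = 160, 193      # 1800X vs 7700K single-thread scores
ryzen_fps, kaby_fps = 90, 120     # reported frame rates

single_thread_ratio = ryzen_1t / kaby_1t   # ~0.83
fps_ratio = ryzen_fps / kaby_fps           # 0.75

print(f"single-thread ratio: {single_thread_ratio:.2f}")
print(f"fps ratio:           {fps_ratio:.2f}")
```

The ratios (about 0.83 versus 0.75) are close but not identical, so single-core speed plausibly explains much, though not "precisely" all, of the FPS gap.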
Also, explain this if you can:
Look at the CS:GO and Rise of the Tomb Raider videos:
https://hardforum.com/threads/amd-ryzen-7-performance-windows-7-vs-windows-10.1926898/
So, when are you going to really tackle this issue and run an actual Win7 vs Win10 comparison, instead of ignoring some of the data in order to push a particular preferred hypothesis despite contrary evidence?
Win7 gaming is beyond the scope of this article. Using simple tools and a pretty focused testing regime, we get some hard numbers about what threading/scheduling looks like, as well as latency across cores and CCXs, versus what Intel has. We don't yet know why there are performance differences between the OSes, but with this Win10-focused testing we have a clearer idea of what is happening behind the scenes. I'm not sure what further testing will be done this week, but it would be interesting to see these particular tests replicated on Win7.
So, an analogy for what you are doing here: Tom decides to taste-test the differences between a Red Delicious and a Granny Smith. You don't like the results because he didn't include an orange!
All we ask for is an objective Win 7 vs Win 10 gaming benchmark comparison, so as to have further data available for objective analysis, as well as to give a general idea of what to expect from Ryzen once Win 10 catches up with Win 7 (and perhaps surpasses it in some cases).
And the reason we want that is the same reason we rely on sites like this for our information: the buying decision-making process.
More information from such a comparison certainly can't hurt this process, and it could also kill off some hypotheses or narrow down the search for the (sometimes massive) gaming performance discrepancies observed between the two OSes.
I think that's a pretty reasonable expectation considering the current context and the data available so far. The analogy is simply a non sequitur considering the motivations involved; a straw-man logical fallacy, in other words.
Go to Anandtech's "Ryzen: Strictly Technical" forum thread and read the posts there! That thread has been ongoing, with forum members trying and testing things on Linux and Windows 7/10, and more is added each day!
All of the Infinity Fabric details are not going to be released by AMD until the Zen/Naples and Radeon/Vega SKUs are RTM, so some NDAs are still in effect!
Maybe Cigarette Man may chime in on some forum in Dresden, but the truth is out there! Oh, if only Anand Lal Shimpi were not currently sealed up so snugly in carbonite in that special crypt deep below that round spaceship headquarters in the valley of silicon in Cali!
Oooh, a non sequitur straw-man logical fallacy! Let's throw in some false equivalency to complete the trifecta!
I am curious what you truly think the motivations involved are. The question initially addressed was: "Is the Win10 scheduler broken when assigning threads to cores, and what kind of performance issues could we potentially see with inter-CCX communication?" The results show the scheduler is working as expected in these situations. Latency for core-to-core communication is pretty high.
Easy, right?
https://youtu.be/JbryPYcnscA
http://www.phoronix.com/scan.php?page=article&item=nvidia-1080ti-ryzen&num=2
As I understand it, they LOCK (set the affinity for) the threads in the program that does the "pinging" between them.
The OS scheduler makes NO difference in this case (unless it's so completely broken it doesn't follow instructions).
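For readers who want to see the shape of such a pinned ping-pong test, here is a rough sketch in Python (pinning is Linux-only via `sched_setaffinity`, and Python's `Event` round-trip is orders of magnitude slower than the cache-line bounce the real tool measures, so this illustrates the methodology, not the numbers):

```python
# Sketch of a core-to-core "ping-pong" latency test: two threads, each
# pinned to its own core, bounce a signal back and forth and the average
# round-trip time is reported. Pinning silently degrades to unpinned on
# non-Linux systems or single-CPU machines.
import os, threading, time

ROUNDS = 1000

def pin_current_thread(cpu):
    try:
        os.sched_setaffinity(threading.get_native_id(), {cpu})
    except (AttributeError, OSError):
        pass  # non-Linux, old Python, or CPU unavailable: run unpinned

def responder(ping_ev, pong_ev, cpu):
    pin_current_thread(cpu)
    for _ in range(ROUNDS):
        ping_ev.wait(); ping_ev.clear()   # receive "ping"
        pong_ev.set()                     # reply "pong"

cpus = sorted(os.sched_getaffinity(0))
ping_ev, pong_ev = threading.Event(), threading.Event()
t = threading.Thread(target=responder, args=(ping_ev, pong_ev, cpus[-1]))
t.start()
pin_current_thread(cpus[0])               # pin the pinger to another core

start = time.perf_counter()
for _ in range(ROUNDS):
    ping_ev.set()                         # send "ping"
    pong_ev.wait(); pong_ev.clear()       # wait for "pong"
elapsed = time.perf_counter() - start
t.join()
print(f"average round trip: {elapsed / ROUNDS * 1e6:.1f} us")
```

Because both threads' affinities are fixed up front, the scheduler has no say in which cores talk to each other, which is exactly the point being made above.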
Hey Ryan or Allen, could you guys test the thread-to-thread latency on older AMD CPUs such as Bulldozer or Piledriver, to see if they have the same issues with increased latency module to module? In theory they should have the same problem, unless MS already "fixed" the issue for those architectures in the past. If the issue remains, there could be performance yet to be unlocked in the older architectures should a fix from MS/AMD arrive, as well as more concern over why this issue wasn't addressed long ago. For science?
Sorry I spelled you’re name wrong Allyn.
Sorry I spelled your name wrong Allyn.
Great writeup. This kind of content is why I love PC Perspective.
So how will Naples work if they do something with the NUMA thing?
Maybe AMD should use nipples driver for it? I mean naples. I am human so I have 2? female dogs have many of these.
So AMD should use nipples driver? stderr I mean naples? Cows and wolves, female, have many of these. lolz.
Hi Allyn, thanks for trying to clear this thing up and for trying to moderate the comments, but resistance is futile. You are clearly an anti-AMD fanboy, and the numbers you posted are clearly just "expert opinions" made to put in a bad light what is clearly the best CPU that ever existed.
/end sarcasm
Can't wait for the next episode, where you'll hopefully throw some more mumbo-jumbo expert data at how maybe the second CCX's L3 cache is being used by the first CCX even though the second CCX's cores are idle.
/really end sarcasm
You did a great job, and you did answer what you were chasing; now I'm almost sure it's not SMT (I'm a researcher, so I still want independent confirmation). It's not your fault for not answering every question everyone has, but I hope you strive for perfection and come up with the magic bullet to solve Ryzen's gaming performance, or at least explain why it ain't gonna happen no matter the amount of smoke and fanboy mirrors.
https://youtu.be/JbryPYcnscA
“even AMD does not believe the Windows 10 scheduler has anything at all to do with the problems they are investigating on gaming performance.”
I think this sentence, which was bolded, is poorly phrased and caused most of the confusion in the comments.
You have demonstrated that Windows does not have issues with labeling the cores or with SMT.
However, you do indeed discuss (and more so in your video) that the Windows scheduler is not aware of the CCXs. Therefore it would be correct to conclude that the Windows 10 scheduler DOES have a problem and could be contributing to low gaming performance.
You needed to be specific in your conclusion (that you were testing SMT …) and not generalize to "the scheduler is fine" when there are other possible issues (the CCXs).
Thank you! This is great work.
Everyone talks about gaming, but many lightly-threaded workloads suffer the same lackluster performance on Ryzen as many games. In Autodesk Inventor (3D CAD), a similarly clocked Xeon with 6 cores regularly outperforms Ryzen by a considerable amount. With many workstation applications being lightly-threaded, an Intel CPU is preferable.
I was hoping bug fixes would find this missing performance; we need more competition at the high end. But this lacking performance could simply be a result of the architecture.
We need to see how it turns out, but it doesn’t look good. I may get a Xeon E5-1650v4 or a Core i7 6800K for my animation workstation instead of a Ryzen 7. I will wait, though, to see what happens.
Peer-reviewed sources, please! And do include any professional trade journals if you can!
Your comments appear anecdotal!
Great video follow-up; at least the comments are being useful in pushing things forward :p