AMD's Robert Hallock (previously the Head of Global Technical Marketing for AMD and now working full time on the CPU side of things) has posted a comprehensive Ryzen update, covering AMD's official stance on Windows 10 thread scheduling, the performance implications of SMT, Windows power management settings, and more. The post in its entirety is reproduced below, and also available from AMD by following this link.
(Begin statement:)
It’s been about two weeks since we launched the new AMD Ryzen™ processor, and I’m just thrilled to see all the excitement and chatter surrounding our new chip. Seems like not a day goes by when I’m not being tweeted by someone doing a new build, often for the first time in many years. Reports from media and users have also been good:
- “This CPU gives you something that we needed for a long time, which is a CPU that gives you a well-rounded experience.” –JayzTwoCents
- Competitive performance at 1080p, with Tech Spot saying the “affordable Ryzen 7 1700” is an “awesome option” and a “safer bet long term.”
- ExtremeTech showed strong performance for high-end GPUs like the GeForce GTX 1080 Ti, especially for gamers that understand how much value AMD Ryzen™ brings to the table
- Many users are noting that the 8-core design of AMD Ryzen™ 7 processors enables “noticeably SMOOTHER” performance compared to their old platforms.
While these findings have been great to read, we are just getting started! The AMD Ryzen™ processor and AM4 Platform both have room to grow, and we wanted to take a few minutes to address some of the questions and comments being discussed across the web.
Thread Scheduling
We have investigated reports alleging incorrect thread scheduling on the AMD Ryzen™ processor. Based on our findings, AMD believes that the Windows® 10 thread scheduler is operating properly for “Zen,” and we do not presently believe there is an issue with the scheduler adversely utilizing the logical and physical configurations of the architecture.
As an extension of this investigation, we have also reviewed topology logs generated by the Sysinternals Coreinfo utility. We have determined that an outdated version of the application was responsible for originating the incorrect topology data that has been widely reported in the media. Coreinfo v3.31 (or later) will produce the correct results.
Finally, we have reviewed the limited available evidence concerning performance deltas between Windows® 7 and Windows® 10 on the AMD Ryzen™ CPU. We do not believe there is an issue with scheduling differences between the two versions of Windows. Any differences in performance can be more likely attributed to software architecture differences between these OSes.
Going forward, our analysis highlights that there are many applications that already make good use of the cores and threads in Ryzen, and there are other applications that can better utilize the topology and capabilities of our new CPU with some targeted optimizations. These opportunities are already being actively worked via the AMD Ryzen™ dev kit program that has sampled 300+ systems worldwide.
Above all, we would like to thank the community for their efforts to understand the Ryzen processor and reporting their findings. The software/hardware relationship is a complex one, with additional layers of nuance when preexisting software is exposed to an all-new architecture. We are already finding many small changes that can improve the Ryzen performance in certain applications, and we are optimistic that these will result in beneficial optimizations for current and future applications.
Temperature Reporting
The primary temperature reporting sensor of the AMD Ryzen™ processor is a sensor called “T Control,” or tCTL for short. The tCTL sensor is derived from the junction (Tj) temperature—the interface point between the die and heatspreader—but it may be offset on certain CPU models so that all models on the AM4 Platform have the same maximum tCTL value. This approach ensures that all AMD Ryzen™ processors have a consistent fan policy.
Specifically, the AMD Ryzen™ 7 1700X and 1800X carry a +20°C offset between the tCTL° (reported) temperature and the actual Tj° temperature. In the short term, users of the AMD Ryzen™ 1700X and 1800X can simply subtract 20°C to determine the true junction temperature of their processor. No arithmetic is required for the Ryzen 7 1700. Long term, we expect temperature monitoring software to better understand our tCTL offsets to report the junction temperature automatically.
The table below serves as an example of how the tCTL sensor can be interpreted in a hypothetical scenario where a Ryzen processor is operating at 38°C.
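The offset arithmetic described above can be sketched in a few lines of Python. This is only an illustration built from the figures stated in the post (+20°C on the 1700X and 1800X, no offset on the 1700, a hypothetical 38°C junction temperature); the dictionary and function names are invented for the example and do not correspond to any AMD tool or API.

```python
# Offsets as stated in the post: +20 degC on the 1700X/1800X, none on the 1700.
TCTL_OFFSET_C = {
    "Ryzen 7 1800X": 20,
    "Ryzen 7 1700X": 20,
    "Ryzen 7 1700": 0,
}


def tctl_from_junction(model, tj_c):
    """Reported tCTL value for a given true junction temperature."""
    return tj_c + TCTL_OFFSET_C[model]


def junction_from_tctl(model, tctl_c):
    """True junction temperature for a given reported tCTL value."""
    return tctl_c - TCTL_OFFSET_C[model]


# Hypothetical scenario from the post: every chip is really at 38 degC.
for model in TCTL_OFFSET_C:
    reported = tctl_from_junction(model, 38)
    print(f"{model}: Tj = 38 C, offset = {TCTL_OFFSET_C[model]} C, tCTL = {reported} C")
```

Monitoring software that is unaware of the offset would show the tCTL column; subtracting the per-model offset recovers the junction temperature.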
Power Plans
Users may have heard that AMD recommends the High Performance power plan within Windows® 10 for the best performance on Ryzen, and indeed we do. We recommend this plan for two key reasons:
- Core Parking OFF: Idle CPU cores are instantaneously available for thread scheduling. In contrast, the Balanced plan aggressively places idle CPU cores into low power states. This can cause additional latency when un-parking cores to accommodate varying loads.
- Fast frequency change: The AMD Ryzen™ processor can alter its voltage and frequency states in the 1ms intervals natively supported by the “Zen” architecture. In contrast, the Balanced plan may take longer for voltage and frequency (V/f) changes due to software participation in power state changes.
In the near term, we recommend that games and other high-performance applications are complemented by the High Performance plan. By the first week of April, AMD intends to provide an update for AMD Ryzen™ processors that optimizes the power policy parameters of the Balanced plan to favor performance more consistent with the typical usage models of a desktop PC.
Simultaneous Multi-threading (SMT)
Finally, we have investigated reports of instances where SMT is producing reduced performance in a handful of games. Based on our characterization of game workloads, it is our expectation that gaming applications should generally see a neutral/positive benefit from SMT. We see this neutral/positive behavior in a wide range of titles, including: Arma® 3, Battlefield™ 1, Mafia™ III, Watch Dogs™ 2, Sid Meier’s Civilization® VI, For Honor™, Hitman™, Mirror’s Edge™ Catalyst and The Division™. Independent 3rd-party analyses have corroborated these findings.
For the remaining outliers, AMD again sees multiple opportunities within the codebases of specific applications to improve how this software addresses the “Zen” architecture. We have already identified some simple changes that can improve a game’s understanding of the "Zen" core/cache topology, and we intend to provide a status update to the community when they are ready.
Wrap-up
Overall, we are thrilled with the outpouring of support we’ve seen from AMD fans new and old. We love seeing your new builds, your benchmarks, your excitement, and your deep dives into the nuts and bolts of Ryzen. You are helping us make Ryzen™ even better by the day. You should expect to hear from us regularly through this blog to answer new questions and give you updates on new improvements in the Ryzen ecosystem.
(End statement.)
Such topics as Windows 7 vs. Windows 10 performance, SMT impact, and thread scheduling will no doubt still be debated, and AMD has correctly pointed out that optimization for this brand new architecture will only improve Ryzen performance going forward. Our own findings as to Ryzen and the Windows 10 thread scheduler appear to be validated as AMD officially dismisses performance impact in that area, though there is still room for improvement in other areas from our initial gaming performance findings. As mentioned in the post, AMD will have an update for Windows power plan optimization by the first week of April, and the company has "already identified some simple changes that can improve a game’s understanding of the 'Zen' core/cache topology, and we intend to provide a status update to the community when they are ready", as well.
It is refreshing to see a company publicly acknowledging the topics that have resulted in so much discussion in the past couple of weeks, and their transparency is commendable, with every issue (that this author is aware of) being touched on in the post.
There are a couple of possible issues here. One is if the scheduler needlessly bounces a thread between different CCXs. This will obviously thrash the caches a bit. That issue can be reduced by core parking and such. That probably isn’t the main issue here though. The main issue is probably shared data. This isn’t really classified as a NUMA issue. Access to memory should be the same for both CCXs. Threads are usually going to share data by mapping shared memory. Both processes map the same area of memory. Any reads and writes have to use a locking mechanism. How much this affects performance will depend on how much and how frequently they read and write the shared memory. If very frequent communication is necessary, then the best case scenario would be for them to actually be on the same physical processor. In that case, the shared memory area would probably end up cached in the L2 cache. The next best thing is to be on the same CCX, since the shared memory would be in the shared L3 cache. If processes running on separate CCXs share memory though, this will cause a big increase in latency due to cache coherency operations and copying modified cache blocks back and forth.
For running on the same CCX, acquiring the lock and reading or writing shared memory doesn’t involve any operations on the fabric or memory controller if it is in the CCX’s cache hierarchy. Writing to shared memory that is in both CCX caches would require traffic on the fabric just to get the lock. Any modified data would then need to be copied over before the other process can acquire the lock to read it. That is a lot of added latency. This isn’t a failure of the scheduler. The scheduler does not operate at that level. It doesn’t know which threads need to do a lot of communication. Therefore, changing the scheduler can’t really fix this issue.
The scheduler knows not to run two threads on the same physical core, if it can avoid it. When it has to schedule more than 4 threads, it has to choose whether to run the next one on the same CCX, doubling up one of the processors, or run it on the other CCX. If there are a lot of shared memory accesses (the scheduler probably doesn’t know anything about that) then it would be better to run it on the same CCX. If there aren’t a lot of shared memory accesses, then it could be much better to run it on the other CCX. On the other CCX, the process gets a core and L2 cache to itself along with reduced L3 load. The scheduler probably doesn’t have sufficient information to choose between those. Trying to schedule on the same CCX would hurt performance of many applications. They may be able to do a few things to increase performance, like tending to schedule threads on the same CCX rather than anywhere, although that could reduce performance for some applications also. You would have a lot of resources on the other CCX going to waste. On Intel’s architecture, the exact same core it was on before is best, but if that isn’t available, then it doesn’t matter which one is chosen. On AMD’s architecture, it does matter. There isn’t a good way to handle many of these cases unless developers assign core affinities to group the threads that need to do a lot of communication.
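As a rough illustration of what assigning affinities per CCX could look like, here is a minimal Python sketch. It ASSUMES an 8C/16T part whose logical CPUs are numbered CCX by CCX (0 to 7 on the first CCX, 8 to 15 on the second); that numbering is an assumption and should be checked against the OS topology before relying on it. The helper names are invented for the example, and the actual pinning call (`os.sched_setaffinity`) only exists on Linux, so the sketch falls back to doing nothing elsewhere.

```python
import os


def ccx_logical_cpus(ccx, cores_per_ccx=4, threads_per_core=2):
    """Logical CPU ids belonging to one CCX.

    ASSUMPTION: logical CPUs are numbered CCX by CCX, so CCX 0 owns the
    first cores_per_ccx * threads_per_core ids, CCX 1 the next block, etc.
    """
    if ccx < 0:
        raise ValueError("ccx index must be non-negative")
    per_ccx = cores_per_ccx * threads_per_core
    start = ccx * per_ccx
    return set(range(start, start + per_ccx))


def pin_current_process_to_ccx(ccx):
    """Restrict the calling process to one CCX; returns True if applied."""
    if not hasattr(os, "sched_setaffinity"):
        return False  # not Linux: no portable stdlib call for this
    # Only keep CPUs the process is actually allowed to use.
    target = ccx_logical_cpus(ccx) & os.sched_getaffinity(0)
    if not target:
        return False
    os.sched_setaffinity(0, target)
    return True


print(sorted(ccx_logical_cpus(0)))
print(sorted(ccx_logical_cpus(1)))
```

A program whose worker threads share a lot of data could call something like `pin_current_process_to_ccx(0)` at start-up so those threads stay behind one shared L3, at the cost of leaving the other CCX idle for that process.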
If the claims about Windows 7 performance are correct, then I would be interested in seeing some testing to see what is going on. I would wonder if it was even designed to handle a 16-core processor. Those were probably a lot less common when Windows 7 was released.
Your face when you didn’t fall for the Rypoo meme
The most hilarious thing about all of this is that Linux works with CCX architecture absolutely fine and fully since the day-one. Literally zero problems on Unix-like systems. It’s just the Windows that lacks all of the proper support. But PcPer’s Intel shillers would defend to death their baked “nothing is wrong with Windows, it’s the processor!” BS. They’re just like the ‘Murican MSMs – caught red-handed and were absolutely rekt for the lies they’ve spread, but they’ll never ever admit any of it because if that happens everything will be over for them. This is truly a very sad sight to behold.
Submitted for your entertainment: http://www.phoronix.com/scan.php?page=article&item=nvidia-1080ti-ryzen&num=2
http://www.youtube.com/watch?v=URBZaFhizGc
But keep on trying, Josh. You ARE a walrus, after all.
What does that link have to do with Linux and your declaration?
>”The minimums? PAH, who cares! I’M JOSH WALRUS!”
Really, now.
Chen: You talk about Linux being perfect, Josh points you to Linux results countering your claim, and you come back with Windows results? Does not compute, bro.
Minimums is what matters, not the “Averages” Josh was throwing at me. Averages and Maximums don’t mean jack. When it comes down to Intel compiler-abused YOBA GAYMS, only minimums matter. Minimums are good and well on Linux. They are not on Windows. But keep living in denial, I guess.
Minimums? As in framerates?
Try again.
Yeah… Totally matters if framerate minimums are higher if they’re consistently lower than the competition overall.
Or, go to Tech Report and read the article there about why minimum framerates aren’t what matters.
“Gen1” Zen has minimums of 68 in Windows 7 and 16 in NSA Spyware 10. In the same game, on the same settings, and with the same hardware. Averages don’t mean jack because they dip. And they can dip heavily. Minimums are minimums. They never dip, because they’re, you guessed it, MINIMUMS. Higher minimums are always better than Averages or Maximums. Especially if you can get very high minimums while maxing everything out and on very high resolutions. High minimums guarantee absolutely smooth experience, if they go 50 and above. Minimums never stutter, so minimums of 60 will always be more preferable than 100500+ of Averages that are stuttering like ducks. “Gen1” Zen can into high minimums on Windows 7. On NSA Spyware 10 it can’t for now, because Micro$oft doesn’t understand CCX. On Linux RyZen has high minimums in everything, because Linux understands CCX. And “Gen1” Zen’s minimums are quite high across the board when on Windows 7 or Linux, unlike InFail’s overpriced CrappyFake with dried horse sperm under the cover.
Minimums/maximums are set by a single data point across an entire run consisting of thousands and are a poor representation of the overall performance of pretty much anything. Minimums can also vary wildly from run to run, especially if they are caused by a cache miss, interrupt collision, etc. Also, you can have a game that dips 20% below the average five times a second on one platform compared to the other platform dipping *once* to 40% and then call the former a better experience when clearly it is not. That's the reason Ryan presents frame rating data in percentiles. I take that a step further with latency percentiles for storage performance.
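To make the single-data-point problem concrete, here is a small Python sketch with two made-up frame-time traces (the numbers are invented purely for illustration and are not measurements from any game or platform). Run A stutters once; run B dips regularly. The helper names and the nearest-rank percentile method are choices made for this example, not a description of how any review site computes its figures.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a list of numbers (pct in 0..100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(frame_times_ms):
    """Minimum FPS (set by the single worst frame) and 99th percentile frame time."""
    return {
        "min_fps": 1000 / max(frame_times_ms),
        "p99_frame_time_ms": percentile(frame_times_ms, 99),
    }


# Hypothetical traces, 1000 frames each.
run_a = [10.0] * 999 + [50.0]          # one isolated 50 ms hitch
run_b = ([10.0] * 19 + [25.0]) * 50    # a 25 ms dip every 20th frame

print("A:", summarize(run_a))
print("B:", summarize(run_b))
```

Judged by minimum FPS alone, run B looks twice as good (40 versus 20 FPS), yet its 99th percentile frame time is 25 ms against 10 ms for run A, because B dips fifty times while A hitches once.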
See, this kind of response is why I read PCPer. They actually get it right when it comes to differentiating overall or real-world performance and obsessing over a single data point that’s being misused to “make a point”.
You are a lamer.
Heres an idea: fuck yourself you deranged AMD fanfuck
Linux/OS is not perfect but Linux/OS is one thing! And that one thing is that Linux is not from M$ or controlled by M$!
So I’m looking for that Ryzen/Vega laptop APU SKU used on a Linux OS based laptop. A Linux OEM produced(as if there is any other option for laptops than OEM!) laptop that will give me a relatively M$/Intel/Nvidia free laptop option at an affordable price where I am the real owner of my laptop hardware.
“And that one thing is that Linux is not from M$ or controlled by M$!”
This is the point in your post when your friendly Microsoft representative enters the office, refers to an envelope full of Linux-using licensees, and hands you a coffer to fill with your regular ‘indemnification’ payments.
That was a mistake, but M$ would never EVER dare to use that.
Because otherwise, there will be war. And we will win in that war.
Linux is freedom (if of “GNU/Linux” Stallman’s circle).
So is AMD. So is Zen. So is Radeon.
This is only the beginning.
Free Software was the first step.
The next step – free hardware.
Um what? They’re going to be giving away free hardware? Where do I go to get my socialist free handout PC from the hardware bureau?
You are as big a wing nut on your right end as the poster that you replied to is on his(Implied by you) left end of the wing nut teeter totter.
There is nothing more disgusting in this world than the communist and capitalist worshiping wing nut rednecks that are so absolutely abject-morialistic with their crazy obsessive worship of these two systems that do not really work.
But the with a world so full of the masses of Bumpkins and one day they can all be replaced by robots and all the communist and capitalist worshiping wing nut rednecks sent packing out of the world’s cities and towns forever.
The real threat to civilization is the bumpkin, whatever system that the bumpkin believes in!
Star screeching, landwhale.
Basement dwelling lipid-head!
NINE NINETY NINE!!!!!
Out of 999999 HP I have.
The issue has been determined already.
https://community.amd.com/community/gaming/blog/2017/03/14/tips-for-building-a-better-amd-ryzen-system
The problem is not Ryzen, or the Windows scheduler, or SMT. It’s because some games are not using the correct CPU topology map.
Fair bet that despite advice to do a fresh install of everything, some reviewers just plugged in their ‘review drive’ with all their test games on it and proceeded to test using games with the wrong topology map. Hence review scores being all over the place.
Still, it should be a pretty easy fix, as the biggest problem seems to be Windows reporting Ryzen as a 16-core processor instead of an 8C/16T one. That means the topology map will be wrong in any event, even if generated by a fresh install.
No, it’s more like a filthy combination OF Windows incapable of CCX AND pro-Intel compiler biased YOBA GAYMS.
It’s a complete and utter clusterduck, but there’s literally NOTHING wrong with the stone itself, it’s all on the software side.
AMD is directly contradicting you in their statement, multiple times. You're welcome to take it to the grave, but you're only embarrassing yourself here with the conspiracy theories, as well as making AMD look bad. They say it's not the scheduler and they appear to be making good on their promise to work with developers to optimize their apps and games to work better with Ryzen. Stomping your feet about schedulers is not going to get anyone anywhere at this point.
Yep, the problem is not with the scheduler itself; that works as it should, just look at the MT scores. The problem is that the topology map that the scheduler bases its decisions on is not necessarily correct.
The effect this has will vary depending on the app. Some games like F1 will probably benefit greatly from a fix, others, hardly at all.
https://www.youtube.com/watch?v=aUk5T3AkJYE
Master Chen should just be banned and all his idiotic comments deleted, he’s clearly one of those paid AMD Red Team shills.
The only thing that is clear regarding Master Chen is that there are some serious mental problems involved.
I do support a ban on that user.
Calm your mad ass, InFail shill.
Your mother is a cunt whore? Everyone already knows that though.
Sigh, okay guyse, this kiddo has clearly gone over the board. This is waaay beyond any redemption even for his lame ass. If PcPer won’t do anything about this so-called “comment” of his, you will show your true faces to everyone. Now, I’m screening, if anything.
I was about to … now I am reconsidering.
You should be fired, then.
You should shut your ignorant fuckin yap
I guess I have the same mental issues as Master Chen because I agree with most of what he says
when I don’t, I enjoy his passion and humor
In that case I would say you guessed correctly.
You agree with me because you know the truth. Unlike absolute majority of the kindergartner-tier trollie boys that are attacking me here, you’ve clearly researched the matter and read info provided by 4chan’s /g/ and many other sources. So did I. That’s why you understand what I’m saying there.
If you rely on info from 4chan, that says a lot about your mental state.
ESAD already.
Says he doesnt know how to troll
Comments like this make me wish PCPer had implemented a function to replace Youtube links with their headlines and maybe creators.
Cause this dude and his click-baitery was frustrating enough in Youtube’s recommendations. Also, would save me from opening for the three-hundredth time the same handful of Ryzen / Win7 videos.
Reminds me of a saying: if every time you walk into a room, it smells like sh.it, then maybe YOU’RE the one sh.itting
Usually when I walk into the room, everyone’s already covered in it and I’m wearing a sparkling white suit.
XD
I love you MC. Don’t ever change. Cheers man
I’m not an MC, I’m a DJ. Not exactly same things.
And not very bright. MC -> Master Chen
Buy yourself a brain already, scrub.
No wonder you are so fucked up if this usually happens to you.
I’m the only clean person here, so there’s nothing “ducked up” about that.
I think he’s paid off by Intel because his deranged dribbles are actually making me want to stay as far away from AMD products as possible by association.
AMD is also affiliated with G2A, the money laundering carding site.
I think you’re paid off by Intel because your deranged dribbles are actually making me want to stay as far away as possible from Intel products by association.
You wouldn’t happen to have a 14 or 18 core Xeon E5 v3 laying around anywhere? It appears to have 3 different settings for cache coherency. I am wondering if the COD (cluster on die) setting is the closest to Ryzen. Here is the anandtech article about it:
http://www.anandtech.com/show/8423/intel-xeon-e5-version-3-up-to-18-haswell-ep-cores-/4
Splitting 8 cores into 2 core clusters is an engineering trade-off. AMD achieves lower core-to-core latency between threads in the same core cluster. The ring bus cannot scale to that many cores without becoming a bottleneck for many applications. That is why Intel has different modes to allow the cores to be split into multiple clusters also. To achieve the best performance, the software will just need to set core or cluster affinity somehow. I doubt anyone has tried using a 14 to 18 core haswell EP with two separate ring buses for gaming, but if you did, you might see the same behavior. Note that the 3 different settings are totally dependent on what you are running. Some applications will perform better with one setting, while that setting will hurt other applications.
Yes, good point. Though the ring bus itself might have higher bandwidth and/or lower latency than Infinity Fabric. Not sure of the details, but with four generations of those chips so far, Intel has had time to iron things out.
The ring bus does 32 bytes a cycle, I think. I have seen the same number for a Ryzen core connection to the fabric. The topology and many other things are probably completely different though. We don’t have much information on the infinity fabric yet.
If Allyn’s numbers are remotely representative it was 14 ns for the same core and 76 for other cores on the 5960x. AMD has 3 levels, 26 for the same core, 42 for those in the same CCX, and 142 for those in different CCXs. So as long as developers set core affinities, they can get 42 on Ryzen compared to 76 for the 5960x. I wouldn’t take those numbers to mean anything super precise though. Most thread to thread communication will take place via shared memory. If it is in the same L2 or L3 cache, then performance will probably be good. Trying to read and write the same memory from two different CCXs with 2 different L3 caches is bad though.
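A back-of-the-envelope calculation shows why the affinity point matters. The Python sketch below uses only the rough latency figures quoted above and ASSUMES, purely for illustration, that a thread's communication partner lands on a uniformly random other physical core; real schedulers do not place threads that way, so treat the result as a feel for the numbers rather than a prediction.

```python
# Rough ping latencies quoted in the comment above, in nanoseconds.
RYZEN_SAME_CCX_NS = 42
RYZEN_CROSS_CCX_NS = 142
I7_5960X_OTHER_CORE_NS = 76


def expected_other_core_latency(cores_per_ccx=4, ccx_count=2):
    """Average latency to a uniformly random *other* physical core.

    ASSUMPTION: every other core is equally likely to hold the partner thread.
    """
    other_cores = cores_per_ccx * ccx_count - 1
    same_ccx = cores_per_ccx - 1           # neighbours sharing the L3
    cross_ccx = other_cores - same_ccx     # cores behind the other L3
    total = same_ccx * RYZEN_SAME_CCX_NS + cross_ccx * RYZEN_CROSS_CCX_NS
    return total / other_cores


unpinned = expected_other_core_latency()
print(f"Ryzen, random placement: {unpinned:.1f} ns")
print(f"Ryzen, pinned to one CCX: {RYZEN_SAME_CCX_NS} ns")
print(f"5960X, any other core:    {I7_5960X_OTHER_CORE_NS} ns")
```

With 3 of the 7 other cores in the same CCX and 4 in the other, the random-placement average works out to (3 × 42 + 4 × 142) / 7, roughly 99 ns, which is worse than the 76 ns quoted for the 5960X, while pinning communicating threads to one CCX gives 42 ns.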
thanks for that explanation
The Ryzen 7 1700 is basically a 5960X for $300; we should all, as consumers, applaud AMD for this launch. I owned the FX 8120/8320/8350 and these CPUs are light years ahead of those disasters. Cannot wait to jump aboard this Ryzen ship.
Something puzzles me… I got an R7 1700 system running.
I did nothing but set the clock to 3.7 GHz and I get a Cinebench score of 1939.
At 3.6 GHz I get a score of 1831.
And my voltage goes DOWN as I raise the clock during Cinebench?
1.160 V at 3.6 GHz
1.152 V at 3.7 GHz
The PC crashes at 3.8 GHz (the voltage is under 1.15 V).
Note: I use 2133 MHz RAM, as anything else doesn’t even let me boot the PC.
I’m using the stock cooler and it doesn’t seem to spin that fast.
Max recorded 1864rpm in prime95…
Does this make any sense ?
I wonder if I can break 2K cinebench score with 1.2v …
Or even go higher with 3200 DDR vs 2133 ?