BEST HARDWARE GPU SCHEDULING!
– GPU scheduling. There’s really no way to avoid that this is a very boring-sounding topic, but as fans of AMD Ryzen processors know, scheduling can make a huge difference to how a computer performs. And the scheduler that we use for our graphics cards has been around since Windows Vista. It’s a total dinosaur. That is why today’s topic is such a big deal. We’re gonna look at the performance impact of Windows 10’s shiny, new, hardware-accelerated GPU scheduler, and discuss the gotchas that might’ve kept Microsoft from implementing it for so long. But before that, gotcha, segue time. Ting wants to help you save money by getting you to pay for only the mobile data you use. Check it out at the link below to find out more. (electronic music)
At least a few of you are probably scratching your heads right now, going, “What the heck is a scheduler anyway?” In short, whenever your computer does some work, it needs to know when and for how long it should crunch those numbers in order to make sure that it doesn’t end up doing either too much work, potentially resulting in a system freeze, or too little, resulting in low performance. For modern desktop CPUs with many processing cores, it’s easy to understand why this is so important for performance.
Not only do you need to make sure that the workloads are balanced and not all just piled onto a single core, you also need to be aware of things like the layout of the cores, because there can be significant performance penalties for splitting a task up one way versus splitting it up another way. And we’ve seen this in the real world. Much of Windows’ multi-core scheduling until recently was geared towards hyper-threading and at most a dual or quad-core CPU. But then along came AMD Ryzen and wrecked all that, and performance wasn’t great at first, especially with early-days Threadripper. But as the scheduler improved and became more aware of how to best utilize these extra resources, performance improved as well.
Back to graphics cards now. In the early days, you’d have the CPU just load the GPU up with commands without worrying about scheduling at all. I mean, you were only running one GPU-driven application at a time anyway, like a game. So it didn’t really matter. Now this was fast, but it limited flexibility. Remember guys, alt-tabbing between multiple games or running games in a window, that wasn’t really a thing. And when it came time to accelerate the desktop, like in Windows Vista’s Aero UI, this had to be dealt with. So Microsoft created the WDDM software scheduler, where the CPU could buffer the GPU’s commands a frame ahead while the GPU was busy working on the current frame. This was still fast, but developers had to make a choice.
They could either submit short bursts of commands to minimize latency, or they could send large command groups to minimize performance overhead. There wasn’t any way of completely removing either trade-off with a CPU-based scheduler like we’ve been using, and that doesn’t consider multi-GPU setups at all. But actually, speaking of which, make sure you get subscribed because we got our hands on a really cool multi-GPU card from 2003, and you’re not gonna want to miss it.
Anyway, that all changes with the Windows 10 May 2020 Update. Now we’ve got hardware-accelerated GPU scheduling in Windows, which basically means that Windows tells the GPU what tasks are the highest priority on, like, a high level, but the GPU manages its own resource allocation between those workloads to get the best of both worlds: lower latency and higher overall throughput. So then why isn’t it the default? As with all things, we need to dig in and figure that out for ourselves.
So we grabbed a Ryzen 3 3100 to test a low-thread-count CPU, a Ryzen 9 3950X to test a high-thread-count CPU, a GeForce RTX 2080 Ti to test a high-end Nvidia GPU, and a Radeon 5700 XT to test a high-end AMD GPU. In theory, this covers all our bases, since it’s ultimately gonna be the CPU that’s gonna be the bottleneck here, right?
Let’s start with the Nvidia graphics card. At 1080p with low settings, we’re stressing the CPU more than the GPU. And in most of the games we tried, hardware-accelerated GPU scheduling actually doesn’t seem to work out all that well for us.
We’re typically looking at roughly the same or slightly faster or slightly slower performance, depending on the game. That’s in spite of the theoretical advantages that we should be seeing here. Now, as frame rates get higher, as in Doom Eternal and CS:GO, that performance gap widened significantly, with as much as 13% performance lost in our minimum frame rates. So it seems that despite being CPU-bound in this scenario, offloading the GPU scheduling from the CPU isn’t really helping, at least not with these early drivers.
Our Radeon 5700 XT did have a better go of things, particularly in combination with the Ryzen 3, where we got mostly normal frame rates pretty much across the board, with the exception of Red Dead Redemption 2 and Doom Eternal with our Ryzen 9. This is probably because the Ryzen 9 has enough cores to handle the scheduling normally, but these games, which uniquely are both running the lower-level Vulkan API, are developed with software scheduling in mind and aren’t taking advantage of the extra slack. Still, we’re not exactly being blown away here with the Ryzen 3 either. And it’s tough to firmly recommend turning hardware GPU scheduling on for gaming anyway, based on this scenario alone.
Now it’s time to kick things into high gear by running our games at 4K with maxed-out settings, a GPU-bound scenario. Here, Nvidia sees a reasonable improvement in performance in Shadow of the Tomb Raider and in Red Dead Redemption 2, with minimum frame rates shooting up by 15 to 20% on both of our CPUs. That’s not too shabby. Doom Eternal, though, is our first major stumbling block for hardware GPU scheduling, leaving our Ryzen 3 in a relatively good position, but hamstringing our 16-core chip. Finally, CS:GO doesn’t seem to care much what handles the scheduling on Nvidia at this resolution, which may be down to its older, comparatively primitive engine.
As for AMD, there’s not much movement at 4K, which might be expected due to the low frame rates that we can expect out of a 5700 XT at that resolution. Both Shadow of the Tomb Raider and Red Dead Redemption 2 are essentially unchanged over the traditional scheduler, with the averages being a bit lower and higher for the quad-core and 16-core respectively.
Doom Eternal shows the same measurable, if slight, performance bump across the board, and CS:GO seems to not really like hardware GPU scheduling with a multi-CCD CPU like our 3950X, dropping a good 5% in performance compared to when we ran it with the traditional scheduler.
In productivity, both of our CPUs and both of our GPUs saw anywhere from slight to notable performance improvements though, with the Blender BMW test getting more out of hardware GPU scheduling with AMD’s OpenCL implementation versus Nvidia’s CUDA. Conversely, V-Ray, which only ran on Nvidia hardware, shows a tangible improvement, which gives us some level of confidence that hardware GPU scheduling should be a boon to productivity-minded folk going forward.
Overall, it looks as though our Ryzen 3 3100 averages around the same performance as with traditional GPU scheduling, with modest performance gains and less of a loss than the 3950X, which in turn gets a bit of a rougher deal on its overall average, but gets a slightly better average gain in performance where it did improve. If we break it down by the number of times the test ran above 99% of the traditional scheduler’s performance, we’re seeing more of the Ryzen 3’s better gain-to-loss ratio, and overall a much better result out of the AMD card compared to the Nvidia one, giving both of our Ryzen CPUs a performance bump in two thirds to three quarters of the tests we ran today, which brings us back to our earlier question. Why isn’t it the default?
Well, as the data shows, it’s still very much in the early stages and it isn’t quite mature yet. For some hardware, there’s gonna be an improvement in performance, but perhaps not in every scenario and not necessarily by a whole lot either. In others, you might straight up lose performance in more cases than you’ll gain it.
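The tally described above (how often a test ran above 99% of the traditional scheduler’s performance, plus the average gain and average loss) can be sketched roughly like this. It is a minimal Python sketch, not the method used for the video: the function names, the (traditional FPS, hardware-scheduled FPS) layout, and every number in PLACEHOLDER_RESULTS are made up for illustration and are not measurements from these tests.

```python
from statistics import mean

# Hypothetical placeholder numbers, NOT results from the video.
# Each entry maps a test name to (traditional FPS, hardware-scheduled FPS).
PLACEHOLDER_RESULTS = {
    "test_a": (100.0, 103.0),
    "test_b": (100.0, 95.0),
    "test_c": (50.0, 50.0),
    "test_d": (200.0, 199.0),
}


def ratios(results):
    """Hardware-scheduled FPS divided by traditional FPS, per test."""
    return {name: hw / sw for name, (sw, hw) in results.items()}


def share_above(results, threshold=0.99):
    """Fraction of tests that ran above `threshold` of traditional performance."""
    values = list(ratios(results).values())
    return sum(1 for v in values if v > threshold) / len(values)


def average_gain_and_loss(results):
    """Mean relative gain over tests that improved, and mean relative loss
    over tests that regressed (0.0 when there are none of that kind)."""
    values = list(ratios(results).values())
    gains = [v - 1.0 for v in values if v > 1.0]
    losses = [1.0 - v for v in values if v < 1.0]
    return (mean(gains) if gains else 0.0, mean(losses) if losses else 0.0)


if __name__ == "__main__":
    gain, loss = average_gain_and_loss(PLACEHOLDER_RESULTS)
    print(f"share above 99%: {share_above(PLACEHOLDER_RESULTS):.0%}")
    print(f"average gain: {gain:.1%}, average loss: {loss:.1%}")
```

Keeping the share of tests above the threshold separate from the average gain and loss matters here, because a CPU can win more often while still averaging about the same overall, which is the pattern described for the Ryzen 3.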
That’s not a good look for an operating system team that gets absolutely crucified on social media every time an update goes awry. So I see why they did it this way, but that’s not to say that it has no future. As the drivers mature (AMD’s, for example, is actually in beta right now), as Microsoft receives more feedback on how it works with each workload and hardware combination, and, crucially, as developers rethink how they should hook into the scheduler, we should first start to see no performance loss at all, and eventually a consistent performance gain from this new scheduling method. It’s already looking promising, and true to our video title, if you play a game that benefits from it on hardware that handles it well, you could be enjoying noticeably better performance today.
Ting does mobile phone service differently. There’s no contracts, no overage fees, no carrier tricks. You just pay a fair price for the talk, text and data that you actually use each month. It’s especially great if you’re stuck at home using WiFi instead of using your mobile data. Ting gives you complete control over your cell phone account, and you can set alerts and caps for each device on your account, so you can keep your usage in check. They offer nationwide LTE coverage by using T-Mobile, Sprint and Verizon’s networks, which means you’ll have great coverage from coast to coast. Almost any phone will work with Ting, from an ancient Motorola Razr sitting in your basement to the latest iPhone 11 series. And you can check your phone’s compatibility at linus.ting.com, and while you’re there, get $50 in credit when you sign up.
Thanks for watching, guys. If you’re looking for something else to watch, maybe check out our recent sciencey video on stacking radiators, where we answered once and for all just how many radiators you can run in a row before you stop getting any cooling benefit from it.