In the last month alone, both Samsung and Xiaomi have been caught adjusting the performance of their phones on a per-app basis in a way that some experts see as “benchmark manipulation.” On the one hand, smartphones are getting faster and running hotter every passing year, and something has to be done to manage that heat. On the other, treating certain apps differently isn’t always transparent to the user or “fair” when it comes to benchmarks. It’s a nuanced problem with no easy solutions, though there are a few ways it could be handled better, particularly if Google addresses the issue more directly in Android itself.

What is “benchmark manipulation”?

There’s been a lot of talk about benchmark manipulation recently, and performance profiles are deeply embedded in the subject, so the first thing you need to understand is what constitutes benchmark manipulation to begin with. The developers behind Geekbench, the popular cross-platform benchmarking suite, tell us that they don’t have a concrete definition, but it is a little more clear-cut than “I know it when I see it.”

As a general policy, the company considers “any situation where a device treats benchmark applications differently than other applications” to constitute benchmark manipulation, according to lead developer John Poole.

This definition can cover a range of behaviors. Back in 2013, Samsung artificially boosted the performance of the Galaxy S4’s GPU beyond its defined spec. At the time, Samsung said it was doing it for any app used in “full-screen mode,” but also explicitly called out benchmarking applications as beneficiaries of that behavior, constituting a form of benchmark manipulation. But throttling performance for other applications and only giving “full” performance to benchmarks, as OnePlus was caught doing in 2021, would also constitute benchmark manipulation. More recently, Samsung was caught adjusting performance on its flagship phones on a per-app basis, which meets the same definition.

It’s also important to note that the motivation isn’t always the same. Sometimes what can be construed as benchmark manipulation is actually an emergent property of other behaviors with different intentions entirely.

A necessary evil?

Technology is getting faster with time; it’s the fundamental premise behind almost all of the industry’s advances in the last half-century. The same applies to smartphones, but they face a few basic design limits based on how we use them. For example, we’re worried about water damage, so we don’t like big vents, and we need to be able to carry phones around and sometimes use them one-handed, so they can only weigh so much and be so big.

As the chipsets inside smartphones get faster, they also produce more heat, and that’s an issue. While PCs and laptops can use active cooling solutions with heatsinks, vents, and fans, that’s not something most customers would accept in a smartphone. (Although there are gaming phones with this functionality, none have been IP rated for water resistance, which is something customers care about.)


Some phones (like this Red Magic 7) have active cooling to run faster for longer, but it means no IP rating.

This limits the speeds of the chips in smartphones to what can be cooled passively. More recently, as speeds have increased in the flagship space and the chips inside those phones have produced more heat, new cooling solutions have proven necessary. Some smartphone makers have gone so far as to highlight technologies like vapor chambers, increased surface area in cooling solutions, and bleeding-edge materials like graphene to improve cooling performance. And though it’s an issue you can address with hardware, software can also make a difference. Arguably, it will make the biggest difference as time goes on.

Different apps have different requirements when it comes to speed, and your to-do list probably doesn’t need as much oomph as the latest, prettiest games do. As anyone who used a custom kernel during the golden era of rooting and ROMing can tell you, there are many different ways to solve this problem, with different sets of logic that can be applied to different workloads. Some tasks are best served by throwing all your power at them with the very biggest cores at once; other workloads require sustained performance. And this is the real issue.

Performance is more than a number

There’s a lot of nuance here, according to John Poole, founder and president of Geekbench’s parent company, with many attributes to balance between things like rising TDPs (thermal design power/profile, i.e., the maximum heat a chip is designed to dissipate) and the rise of big cores with big boost clocks (ostensibly handy for short, bursty tasks, but often used to inflate single-core benchmark scores above other considerations).

According to Poole, “The vast majority of people’s time on a phone is scrolling through webpages, scrolling through Facebook, or something like that,” and short bursts of high performance make sure those intermittent workloads can be handled smoothly and fluidly. Many devices explicitly cause a speed boost on touch for this reason. (Fun fact: One game developer told me this can be a pain if your game is played with a controller because it won’t trigger the same behavior.) But long-running sustained tasks like games also need better continuous performance than these scroll-stop-scroll-stop workloads do. And this introduces two new challenges: heat management and battery life.

“Someone sitting and playing Fortnite on their phone, on the bus, on their commute. If you don’t rein in your TDP, you’ll get a lot of — I think the term of art is ‘jank,’ where your framerates go up and down. It’s a really ugly experience. It’s much better to have a nice consistent level of performance, rather than a sort of sawtooth where the phone heats up and performance goes down; the phone cools down, performance goes back up.”

Qualcomm was also happy to talk about rising heat output and performance profiling in smartphone chipsets, explaining that the Snapdragon 8 Gen 1, its latest flagship SoC, can scale its performance across a thermal envelope from 3W to more than 9W, with plenty of flexibility depending on the cooling situation. And when it comes to how that performance is scaled, it’s up to Qualcomm’s customers, the phone makers themselves.

“We allow our customers full flexibility on controlling the behavior of the various compute units in our SoC. For example, some OEMs decide to build custom solutions on top of what we offer to manage CPU and GPU frequencies when playing games in order to extend battery life and reduce stutter during gaming. Other customers have implemented different CPU/GPU governors.

“We offer all of our customers an engine and it’s up to them to build a chassis around it and constrain power or performance as they see fit. Some OEMs may decide to throttle in order to save power and others may run unchecked to provide peak performance.”

Qualcomm doesn’t just sell hardware to companies; its software also includes CPU and GPU governors that OEMs can tune “according to their own unique needs.” The company offers SDKs of its own for app developers as well, and at least one game developer tells me they’re better than the corresponding tools Google provides.

Android itself could help here

I reached out to Google for more details regarding how Android is helping both developers and smartphone manufacturers when it comes to tuning performance, but the company didn’t have any information to share with us. Speaking to game developers and companies, I’m told that Google has mostly offloaded that work to app- and game-makers (and chipset manufacturers).

For app developers, Google provides tools to help profile app performance, which is handy for reducing “jank” and troubleshooting other performance issues. But the observed behavior may still vary from device to device, and there’s no way to tune performance universally.
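
To give a sense of what those profiling hooks look like, here is a minimal sketch using the platform’s FrameMetrics listener (available since Android 7.0) to flag slow frames. The 16.7ms threshold, the log tag, and the plain function wrapper are illustrative choices for this example, not anything Google prescribes:

```kotlin
import android.app.Activity
import android.os.Handler
import android.os.HandlerThread
import android.util.Log
import android.view.FrameMetrics
import android.view.Window

// Watches every rendered frame of an activity and logs the ones that blow past
// a 60Hz frame budget. Requires API 24+.
fun watchForJank(activity: Activity) {
    // Deliver frame timing callbacks off the main thread.
    val thread = HandlerThread("frame-metrics").apply { start() }
    val listener = Window.OnFrameMetricsAvailableListener { _, metrics, _ ->
        val totalMs = metrics.getMetric(FrameMetrics.TOTAL_DURATION) / 1_000_000.0
        if (totalMs > 16.7) { // roughly one frame's budget at 60Hz
            Log.d("Jank", "Slow frame: %.1fms".format(totalMs))
        }
    }
    activity.window.addOnFrameMetricsAvailableListener(listener, Handler(thread.looper))
}
```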

Android did introduce an API to trigger a sustained performance mode with Android 7.0 Nougat, but none of the game developers I spoke to for this piece said they were using it. In fact, one of the developers behind the Skyline emulator on Android tells me in no uncertain terms that “it’s sh*t,” and that competing SDK solutions for performance profiling from Qualcomm offer better control.
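
For reference, opting into that mode is essentially a one-line call on a window, gated on device support. Below is a minimal sketch assuming a simple Activity-based game; the activity and its setup are placeholders rather than code from any real title:

```kotlin
import android.app.Activity
import android.content.Context
import android.os.Build
import android.os.Bundle
import android.os.PowerManager

// Placeholder activity for illustration only.
class GameActivity : Activity() {
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        val powerManager = getSystemService(Context.POWER_SERVICE) as PowerManager
        // The mode is optional for manufacturers; on unsupported devices the call is a no-op.
        if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.N &&
            powerManager.isSustainedPerformanceModeSupported
        ) {
            // Asks the system to hold a thermally sustainable clock level instead of
            // bursting to peak speeds while this window is visible.
            window.setSustainedPerformanceMode(true)
        }
    }
}
```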

There were some Game Mode changes in Android 12, but the related APIs aren’t that granular, allowing developers to target one of four power-optimized game modes. Another new API in Android 12 might make more of a dent, allowing developers to “hint” to the system about what sort of performance they need, report how long certain operations actually take, and send updated targets back to the system so it can optimize performance dynamically. But because it’s part of Android 12 and later, it will be a long time until most Android devices support it, which would have to happen before developers consider using it. And even once it’s widely available, it remains to be seen whether it will have an impact.
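
For the curious, wiring up that hint API looks roughly like the following sketch, built on the platform’s PerformanceHintManager (API 31+). The wrapper class, the thread choice, and the 60fps frame target are assumptions for illustration rather than anything a particular game ships:

```kotlin
import android.content.Context
import android.os.PerformanceHintManager
import android.os.Process

// Illustrative wrapper around a game's render loop; the class name and target
// duration are invented for this example.
class FramePacer(context: Context) {
    private val targetFrameNanos = 16_666_666L // ~60fps target, as an example
    private val hintManager =
        context.getSystemService(PerformanceHintManager::class.java)

    // Tie the hint session to the thread(s) doing the heavy per-frame work.
    private val session: PerformanceHintManager.Session? =
        hintManager?.createHintSession(intArrayOf(Process.myTid()), targetFrameNanos)

    fun onFrameFinished(actualDurationNanos: Long) {
        // Report how long the frame really took so the system can scale CPU
        // performance up or down to hit the target without wasting power.
        session?.reportActualWorkDuration(actualDurationNanos)
    }

    fun onTargetChanged(newTargetNanos: Long) {
        session?.updateTargetWorkDuration(newTargetNanos)
    }
}
```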

According to Poole, the situation would be better if apps could manage all this themselves through APIs like these, rather than relying on the system to scale performance based on some list of applications as it sees fit, but that’s a chicken-and-egg problem: seemingly no one is using the current APIs. Beyond that, Poole says a holistic solution built into Android itself, one that detects workload types and application behavior, might work:

“If the operating system were able to detect how the application is interacting with the phone — if you have something that looks like a game, let’s treat it like a game. So if you have something that’s gone full-screen, it’s using a lot of Vulkan 3D calls, you might say, ‘oh, this looks like a game, let’s move it into a different performance tier,’ and make decisions based on how the application is behaving.”

In fact, Poole has already observed that some phone makers are implementing this sort of detection behavior independently, in one case deprioritizing performance for Vulkan calls if they aren’t being drawn on the screen. This particular example negatively affects off-screen benchmarks, but it’s “kind of clever,” in Poole’s words, and could be applied in other ways.
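
To make the idea concrete, here is a purely hypothetical sketch of what such a heuristic might look like. None of these types, signals, or thresholds exist in Android; they are invented solely to illustrate the kind of detection Poole describes:

```kotlin
// Hypothetical pseudocode only: nothing like this ships in Android today.
enum class PerfTier { BACKGROUND, INTERACTIVE, SUSTAINED_GAME }

// Invented signals the OS could plausibly observe about a running app.
data class AppSignals(
    val isFullscreen: Boolean,
    val vulkanCallsPerSecond: Int,
    val drawsToScreen: Boolean,
)

fun classify(signals: AppSignals): PerfTier = when {
    // Looks like a game: fullscreen and issuing lots of visible 3D work.
    signals.isFullscreen && signals.vulkanCallsPerSecond > 1_000 && signals.drawsToScreen ->
        PerfTier.SUSTAINED_GAME
    // Rendering that never reaches the screen gets deprioritized, as some
    // OEMs reportedly already do.
    signals.vulkanCallsPerSecond > 0 && !signals.drawsToScreen ->
        PerfTier.BACKGROUND
    else -> PerfTier.INTERACTIVE
}
```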

With no good, universal solutions right now for apps to manage performance themselves, most simply don’t. Many of the developers I spoke to ultimately acknowledged that they just have to hope for the best and that there won’t be any unforeseen behavior. Unfortunately, smartphone manufacturers and chipset makers all have their own ideas for how to tune performance, too.

Good and bad ways to do it

There are “good” and “bad” examples of how companies have adjusted hardware performance profiles. Perhaps the worst example is when MediaTek was caught engaging in per-app performance adjustments at the BSP (board support package, i.e., firmware) level back in 2020, targeting apps like benchmarks to elevate their performance. In essence, the logic that adjusted performance was baked in at a deep software level and shipped unless smartphone makers actively noticed and removed it, affecting the company’s chipsets across a range of years.

In that case, benchmarking apps ran in a special high-performance mode that other apps seemingly didn’t get, resulting in a 30-70% performance difference depending on workload. All of this happened with no transparency to the user that it was occurring and seemingly no way to adjust or disable the behavior. MediaTek defended the practice when it was caught, but this was unarguably one of the worst ways to implement a performance profiling system. Some of MediaTek’s customers have started to disable this behavior, and it’s not clear if the company’s most recent chipsets do the same thing.

More recently, Samsung was also caught engaging in “benchmark manipulation,” adjusting performance profiles to benefit benchmarking applications while applying a stricter profile to apps like demanding games. In our testing, we saw differences as great as 50% in GPU performance and 23-28% in CPU performance, though it’s difficult to measure; our own analysis of the logic behind the throttling system indicates it is quite sophisticated, juggling variables like temperature, expected battery level, speed, and predicted performance benefits.

Samsung never made this behavior clear to the user, and there was no way to disable it, even though Samsung offers multiple performance profiles and game settings. The company has since rolled out an update to address the issue, but not before four generations of Samsung devices were delisted from Geekbench for the practice. Android Police also caught Xiaomi engaging in a similar practice, applying per-app performance profiles that couldn’t be fully disabled, though the company was less apologetic.

Then, there’s OnePlus. Last year, the company was accused of throttling performance for around 300 popular apps on the OnePlus 9 Pro, including Twitter, Uber, Microsoft Office, and Chrome. Although OnePlus defended its actions, it also planned to add an option to disable it in OxygenOS 12.


The OnePlus 10 Pro has one of the better solutions for sustained performance.

As of our review of the OnePlus 10 Pro, the company seems to have adjusted its strategy, throttling all apps about the same, according to our testing. This limits peak performance, but the Snapdragon 8 Gen 1 in the phone is almost superfluously fast, and customers likely won’t notice the difference. And, so far as I can tell, this improves both battery life and sustained performance.

As far as Geekbench is concerned, this doesn’t count as benchmark manipulation since benchmarks get the same treatment that other apps do — in fact, Poole said it was “arguably, the right thing” for OnePlus to do, and we agree. And though it could still be more transparent to the user, customers can at least anticipate how apps will perform, and it can be disabled entirely if they choose. While there are assuredly other potentially better ways to handle performance profiles, OnePlus has arrived at a safe strategy that more companies should consider. (We asked OnePlus to lend some expertise to this story, and though the company was delayed in its response, an update with a manufacturer’s perspective on the subject might be coming.)

But OnePlus’s solution, while good, still doesn’t seem like the ideal way to solve this problem.

The future of performance is in Google’s hands

This is only going to become a bigger issue with time, as phones continue to run hotter with newer chipsets that consume more power, paired with our ever-rising expectations for performance. Right now, app performance tuning is like the Wild West, with every OEM improvising its own solution. In some cases, they merely brush up against unexpected behavior; in others, they outright mislead customers by exempting things like benchmarks from that throttling.

The best solution could be to address this situation within Android itself with better tools for developers, paired with new requirements to ensure device manufacturers don’t change things in a way that drags Android down. But, if history is any indicator, Google may not want to.

In the past, Google’s failure to lay down the law when it comes to unexpected background app behaviors has led to wildly different outcomes for things like delayed notifications from device to device. This has caused problems on certain phones that customers often blame on the Android platform as a whole rather than on individual manufacturers making dumb decisions. Even in the face of that, Google has refused to intervene or impose harsher restrictions.

Performance throttling and profiling is a relatively small issue now. But as thermal management becomes even more important with time and faster chipsets, ways of managing smartphone performance could have a much bigger impact, far beyond the race to secure big but ultimately meaningless numbers in benchmarks.

Although Google does have a specific API for apps like games to use when they need to ensure sustained performance, game developers tell me it’s insufficient. Seemingly none of these developers are targeting more recent game-related changes in Android yet, though they might have an impact in the future (once Android 12 and later are more widely rolled out). And if those changes in Android 12 aren’t enough, the situation will likely only get worse.

With Qualcomm already providing game developers more useful APIs in its own SDKs for use on its hardware, it sounds like it might be time for Google to reassess how performance profiling works on Android and consider other solutions, like the programmatic load-detecting profile system Poole proposes. Developers and customers both need better ways to manage performance expectations on Android. But the longer the situation festers, the more likely broad, heavy-handed practices like per-app profiling are to become the expected standard.

Google has already dropped the ball when it comes to background app management in Android, failing to enforce a more aggressive stance that would make Android a better experience for customers. If Google can’t get out ahead of this issue quickly, I’m afraid that arbitrary and often bad manufacturer solutions will become the accepted status quo as Google looks the other way — again.