
Shortly after the new Android Runtime made its grand entrance, I ran a pretty exhaustive (and exhausting) series of performance benchmarks that showed ART wasn't really ready to blow us away. At the time, I opted to avoid the topic of battery life because it is so difficult to test accurately and with unbiased, meaningful results. As it turns out, that was dumb. Yup, so many of you have asked that I finally had no choice but to dive in and run a battery of tests on...well, the battery.

I've been running tests for over 6 weeks, covering virtually every angle I could think of and repeating several measurements to ensure the results were consistent and accurate. To be honest, the vast majority of results were boring and predictable (given the results of some other tests). Instead of boring readers with numbers and graphs, I wanted this post to be interesting, so the only things you’ll see here are the most important results and explanations. If you have questions about a given scenario, feel free to ask in the comments.


What I’m Testing For

Traditional methods for benchmarking battery life aren’t particularly useful for comparing runtimes. Frankly, traditional methods aren’t even that good for comparing phones. Simply maxing out the processor or running a video for several hours isn't going to demonstrate the advantages of ART. Rather, we should see lower demands on the hardware while doing normal activities, ultimately conserving battery life. There are 3 scenarios where a more efficient runtime should save power:

  • Shorter Wakelocks - While your phone or tablet sits idle, it wakes up fairly often to perform regular tasks. As long as the CPU is awake, it is consuming more power. If execution can be sped up, those background processes can finish their work more quickly and get the device back to sleep. Even if this only shaves off milliseconds, the total savings could be substantial.
  • Lower Clock Speed - Virtually all modern processors are capable of adjusting their clock speed on the fly. Much like a car, speed comes at the cost of power. An efficient runtime can allow the processor to get the same work done while running at lower speeds. Some of the biggest beneficiaries will be seemingly simple things like animations, scrolling list views, and app switching.
  • Better Use Of Hardware - The lower demands on the processor can enable even deeper optimizations. For example, it may be possible to target ARM’s big.LITTLE architecture, which pairs high performance cores for intense processing with low power cores for simple tasks. Activities that may have once required a moderate amount of power to complete could possibly operate on weaker cores without sacrificing performance, drastically cutting battery usage. Unfortunately, this probably won't happen for a while, but it is definitely something to look forward to.

In short, I want to isolate and dig into the common activities on our devices to determine if ART can do them more efficiently.

Testing Setup

I used two devices for my measurements, a Nexus 4 and a Nexus 5. Each was flashed with stock factory images of Android 4.4.2 (KOT49H). I wanted to keep the tests as controlled as possible while still keeping them relevant to real-world scenarios. To this end, both handsets were set up with a blank Gmail account so there wouldn’t be any unpredictable notifications or messages, but regular communication with Google’s servers would continue.

Wi-Fi stayed enabled and configured to connect only to a specific access point. SIM cards were not installed at any point and Bluetooth was disabled during all tests.

Aside from the apps included with the factory images from Google, the only other installed app was a custom battery logging tool I built for these tests. No apps were allowed to update during or between tests, including the Play Store and Play Services Framework, which like to silently update.

Disclaimer: As with the performance benchmarks, I want to be clear that these are simply the results of my own tests and can't represent every possible scenario. Differences in hardware, software versions, and other external factors could skew results in unpredictable ways. In other words, don't be angry if your experiences don't line up with my own.

Test 1 - Idle With Background Processes

No matter how powerful and useful our smartphones and tablets are, they usually have to sit with the screen off to conserve energy. Even when left unused, a little bit of power is consumed every time a service spins up to download the latest weather information, sync messages with an email server, or poll for updates from Facebook and Twitter.

Even though most events involve communicating with web services, I wanted to remove the networking component and the potential variables that might come from irregular or unpredictable data. To simulate the right conditions, I wrote a simple app that wakes up every 5 minutes, runs a standard sorting algorithm on a moderately sized data set, and then logs the battery level. Each test ran for just over 24 hours. These are the results:

The differences are pretty small given the timeline, but they do favor Dalvik by a little bit. All in all, this isn’t particularly surprising, as many built-in functions that occur as a part of running background services are already heavily optimized or based in native code, which we already know isn't running quite as smoothly on ART. In a more typical environment, we would probably see both sides becoming slightly more balanced. I suspect this might start to tip the other way after the next version of Android rolls out.

From left to right: Nexus 4 (Dalvik then ART), Nexus 5 (Dalvik then ART)
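
For readers curious what a fixed-work background task like the one in Test 1 looks like, here is a rough sketch in plain Python. It is not the app used for these tests, which ran on Android. The function names, the data set size, and the `read_battery_level` callback are all hypothetical stand-ins. The point it illustrates is that every wake-up sorts the same data, so each runtime is asked to do an identical amount of work per cycle.

```python
import random
import time


def build_data_set(size=50_000, seed=42):
    """Return the same pseudo-random list on every call (fixed seed),
    so every cycle sorts identical input. Size and seed are arbitrary."""
    rng = random.Random(seed)
    return [rng.randint(0, 1_000_000) for _ in range(size)]


def run_cycle(read_battery_level, data):
    """One wake-up's worth of work: sort the data, then sample the battery."""
    started = time.monotonic()
    result = sorted(data)  # stand-in for the "standard sorting algorithm"
    elapsed = time.monotonic() - started
    return {
        "sorted_ok": result[0] <= result[-1],
        "seconds": elapsed,
        "battery": read_battery_level(),  # hypothetical callback
    }


def run_test(read_battery_level, cycles, interval_seconds=300, sleep=time.sleep):
    """Run a fixed number of cycles, resting between them, and return the log.

    interval_seconds=300 mirrors the 5-minute wake-up described above.
    """
    data = build_data_set()
    log = []
    for i in range(cycles):
        log.append(run_cycle(read_battery_level, data))
        if i < cycles - 1:
            sleep(interval_seconds)
    return log
```

On a real device, the scheduling and the battery reading would come from the platform (Android's AlarmManager and BatteryManager are the usual candidates), though the article does not say which APIs the logging tool relied on.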

Test 2 - Video Rundown

Quite a few readers specifically requested a video rundown. I don't personally think it's a good way to even compare phones because it's a bit of a biased test, but it's easy enough to do and produced a side effect that was very interesting.

I chose a 720p resolution MP4 with x264 encoding and looped it for exactly 3 hours 25 minutes. Multiple players were used, but there was no discernible difference in the results, so MX Player is filling in on the screenshots.

The numbers don't differ by much, but ART on the Nexus 4 does come out slightly ahead and there's a tie on the Nexus 5. Of course, video codecs are already heavily optimized and usually built from native code, typically on hardware with built-in decoders, so there was little chance of a significant power savings here.

From left to right: Nexus 4 (Dalvik then ART), Nexus 5 (Dalvik then ART)

The more interesting outcome from this test actually has to do with the battery usage report. Under Dalvik, we see the mediaserver and MX Player processes are clearly consuming most of the power, along with a healthy amount for the Android OS. However, ART attributes most of the power usage to the screen. Perhaps I’m thinking of that in reverse and Dalvik is under-reporting what it takes to power the screen while ART is calling it correctly. I’ll leave this up to the commenters to debate, but this is a strong indication that we’ll need to be a little more careful about diagnosing battery usage in the future.

Test 3 - Animation Rundown

As many have observed, the current incarnation of ART just feels much smoother. There certainly are fewer glitches and dropped frames during animations, particularly in web browsers and launchers. In fact, that’s one of the top reasons people have given for making the switch. I decided to investigate if these optimizations resulted in any sort of gain or loss to battery life. As it turns out, the results aren't just significant, they actually show us one of the weak points in ART.

This test called for another custom application - something that could roll through animations for a few hours without any interaction or randomness. I wrote a simple app that animated in a list with 100 cards, each with a picture and some text, then continuously scrolled back and forth through the list. Each trial ran for exactly 3 hours.

For the first time, ART breaks through with a definitive win in battery life. Oddly, only the Nexus 4 came out ahead while the Nexus 5 was behind, but more on that in a bit. At least this gives us some evidence that there are some places where ART could extend battery life - under the right circumstances. Animations tend to create a lot of objects in memory and must be able to make changes quickly. Some targeted optimizations to enhance animations would make a lot of sense, and enhanced battery life would make for a great side-effect.

Still, we have to ask why one device shows a win for ART while the other does not. I considered a few possibilities, but it occurred to me that a simple oversight of my own may have exposed a performance issue. The first version of my testing app used a single set of images on the cards, all of which were exactly 1/2 of the size necessary for rendering on a Nexus 4, which meant that they were 1/3 of the size for the Nexus 5. It looks like image processing was responsible for eating up a significant amount of power.
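
To put rough numbers on that oversight, here is a small sketch. It assumes that "1/2 of the size" and "1/3 of the size" refer to each side of the image, so the Nexus 4 scales every image up by 2x and the Nexus 5 by 3x. The 200x150 source size is made up purely for illustration, and the sketch only counts output pixels. It says nothing about how the scaling is actually implemented or how much power it draws.

```python
def scaled_pixels(src_width, src_height, scale):
    """Pixel count of an image after scaling each side by `scale`."""
    return int(src_width * scale) * int(src_height * scale)


SRC = (200, 150)  # hypothetical source image size, in pixels

nexus4_pixels = scaled_pixels(*SRC, 2)  # each side doubled
nexus5_pixels = scaled_pixels(*SRC, 3)  # each side tripled

# How many more output pixels the Nexus 5 has to produce per image
ratio = nexus5_pixels / nexus4_pixels
```

Under those assumptions, the Nexus 5 has to fill 2.25 times as many pixels per card as the Nexus 4 from the same source image, which is at least consistent with image processing weighing more heavily on that device.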

I enlarged the images to match the PPI of each device and ran through the tests again. This produced the winning combination I think we were all hoping to see: ART was finally the winner on both testing devices. Sadly, the differences are still mostly negligible, but it's enough to be optimistic.

From left to right: Nexus 4 (Dalvik then ART), Nexus 5 (Dalvik then ART)


These results should hopefully give some context to the effect of ART on battery life. Even though the outcomes don't explicitly favor ART, they also don't really contradict any one story on the Internet. If anything, this just proves that there are situations where both runtimes can shine. Additionally, there are many other optimizations that can’t be accurately measured; none of them are represented here, and they could be responsible for more noticeable differences in everyday use.

I expect the majority of the people claiming wildly better battery life are experiencing a placebo effect. After all, every test I ran demonstrated no more than about 2%-4% difference after burning through nearly 50% of the battery.
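
Because some runs lasted longer than others, the charts compare a fall rate rather than raw battery percentages. The exact formula isn't spelled out in this post, so the sketch below assumes the simplest reading: percentage points lost per hour. The readings in it are invented to show the idea and are not results from these tests.

```python
def fall_rate(start_percent, end_percent, hours):
    """Battery percentage points lost per hour (assumed definition)."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return (start_percent - end_percent) / hours


# Made-up readings: two runs of different lengths
run_a = fall_rate(100, 76, 24.0)  # dropped 24 points over 24 hours
run_b = fall_rate(100, 74, 26.0)  # dropped 26 points over 26 hours

same_rate = run_a == run_b
```

In this invented example, the second run ends with less battery remaining, yet both drain at the same rate. This is why the rate is a fairer comparison than the final percentage when test durations don't match.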

Still, I'm fairly certain early adopters won’t be swayed to return to Dalvik, and I'm not really sure they should be. If you’re on the fence about switching, I would still advise sticking to Dalvik, at least until the next version of Android. Aside from smoother animations and scrolling, this version certainly doesn’t offer any profound increase in performance or battery life, and it’s still prone to occasional bugs. At this stage, it’s still something best left to developers and some enthusiasts.

It’s important to remember that these underwhelming results are coming from a preview version of software the Android Team did not intend for regular users to even start using yet. The code isn’t optimized to the extent that we know it can be, and there are surely several safety checks to guard against bugs, all of which are adding overhead that will eventually become unnecessary. Of course, nobody wants ART to land in the crosshairs of an upcoming Bug Watch (wink), so let’s not be too hasty about stripping out those safety measures. Nonetheless, the current incarnation is simply about getting everything working and introducing a new runtime to the world. We should be looking forward to future versions for more tangible improvements.

I’m pretty sure I’ll be hammering out an even more thorough set of benchmarks when Android 4.5 (or 5.0, or whatever) launches in a few months. In the meantime, Part 4 is coming up really soon, so keep your eyes peeled!

Cody Toombs
Cody is a Software Engineer and Writer with a mildly overwhelming obsession with smartphones and the mobile world. If he’s been pulled away from the computer for any length of time, you might find him talking about cocktails and movies, sometimes resulting in the consumption of both.

  • funkmon

    Now this is an article. You should get a raise.

    • http://www.androidpolice.com/ Artem Russakovskii

      Nice try, Cody's brother.

      • http://www.androidpolice.com/author/cody-toombs/ Cody Toombs

        I've never met that man before, but I like the way he thinks :)

        • http://www.androidpolice.com/ Artem Russakovskii

          Nice try, Cody.

          • http://www.androidpolice.com/author/cody-toombs/ Cody Toombs

            Ok, that's creepy. I barely clicked the button to post that and your response was up seconds later. I know you're doing other stuff besides just eyeing this article's comments.



  • YB Pow

    This is so well written!

  • http://blog.tonysarju.com/ Infowerx Solutions

    Good read, thanks!

  • Ivan Petkovic

    Yes, great article. I'm impressed with battery life on my N5 with ART, and I'll stick to it, since I found no problems (OK, one, Sound Level app won't work), I finally need no percentage value in my notification bar.

  • thartist

    I understand what you meant, but let's take this as a valid Correction: "These results should HOPEFULLY GIVE NOT some context to the effect of ART on battery life" as it is just a barely available option for developers, and most surely by the lack of official mentions, not even close to final.

    • Matthew Fry

      I believe he made it perfectly clear that this should be taken as a snapshot of ART in its current preview form.

      "It’s important to remember that these underwhelming results are coming from a preview version of software the Android Team did not intend for regular users to even start using yet."

      • thartist

        Oh crap, my eyes skipped that paragraph :(

        • Matthew Fry

          Or maybe... he added it after he saw your comment! *puts on tinfoil hat*

      • Trent Russell

        And yet, many people seem to be missing this point. It should be a big bold, red caveat at the top of the article.

        • thartist

          Well that's kind of what motivated my comment! The fact that everyone else is talking and pushing about it like it was just another "new feature"

  • Robert Macri

    Why in the first set of screenshots does it show that you left the ART-enabled phones on idle for two hours longer than the Dalvik phones?

    • http://www.androidpolice.com/author/cody-toombs/ Cody Toombs

      On that particular set of screenshots, I had to leave before the times would have matched. That's part of the reason I calculate the Fall Rate in the charts. The duration of the test matters less than the rate at which the battery decreases.

  • nofearofimaginarymen

    Great article. I tried ART for a while and did not notice a change in battery life but apps did seem to open slightly faster. For my use, xposed framework was more important than the benefits I saw with ART so dalvik for me.

    • Régis Knechtel

      Same here. I would stick with ART, if it wasn't for Xposed.

  • Matthew Fry

    Great article. I'm kind of disappointed though that the effect on background processes (in particular, Google's) was nonexistent. I was looking at new phones and they claim idle times of 36 or 48 hours and, while it's an honestly tested life span, it seems unfair to tease that and find that real life usage is half that. I, again, make my plea for larger batteries.

  • Lost

    Not sure if test method is good. There are two types of activities on devices: time related (being idle over night, watching single video, play game for 1 hour...) or task related (check mail, post on forum, load web page, do something in productivity app...).

    For 'time related' activities it is good enough to test as shown: do those activities for same time (example, 3h) on both devices, then compare battery used.

    But for 'task related' activities it is NOT good to measure them repeatedly done in 3h - because faster or more optimized device can for example execute 20% more of those activities in those 3h, while using same or even more battery. And in your real life case, you need to use one 'mail check' or one 'load web page', not 1.2 of those.

    In other words, activities like web browsing, mail checking, or even 'animation cards using' should be tested with either fixed number of activities on both devices, either by just running eg 1000 times same activity (so one device will finish faster), or allowing that both devices are wake same time (eg 3h), but running exactly same number of activities in that time (for example, when you browse web, you load same number of pages, and you spend same time watching display...)

    For this ART vs Dalvik case, first one has more sense (run N tests on both devices for 'task related' activities).

    • Matthew Fry

      These are valid test cases assuming he used timers correctly. The faster device might complete the tasks sooner but would just mean more idle time. The background process test ran on a 5 minute timer so they ran the same number of times. He did not say whether it was timer based but the scrolling was likely at the same rate meaning it scrolled the exact same number of times over the 3 hour period.

      • Edward Shaw

        True, unless the Nexus 5 achieved a higher frame rate. In which case, it has done a larger amount of work. That said, that may be exactly the result you want if you're testing battery life, as opposed to performance.

    • http://www.androidpolice.com/author/cody-toombs/ Cody Toombs

      I'm not really sure what you're asking for. If I'm reading this right, it seems like you're agreeing with my coverage of "time-related activities" with this set of benchmarks, but "task-related activities" are left out or covered inappropriately. Is that correct?

      The basis of your point about task-related activities is that tasks can be completed more quickly, which I believe is pretty well covered by the benchmarks in part 2 of this series.

      The two scenarios you gave towards the end are almost precisely describing the benchmarks in this article. Your second scenario describes keeping a device awake for the same length of time and performing identical activities, which is exactly how tests 2 and 3 were run.

      There is one difference between your first scenario and my first test: Instead of running X number of activities consecutively and then allowing the device to rest once it completed, I spread activities out over a timespan with resting time in between. In either case, the basic premise is about how quickly a processor-intensive action can complete before the device is allowed to return to an idle or sleep state. My test incorporates the additional overhead of waking up from sleep.

      • ssj4Gogeta

        I think what @Lost meant is that if you have two different devices (in this case, let's say N5 with ART and N5 with Dalvik) looping the same task continuously for a fixed amount of time, the more optimized of them has done more work in that time, so it's not fair. So if they were rendering graphics continuously for a minute, the first one could have rendered 1000 frames, while the second one only 500. In this case, the battery used isn't directly comparable.

        However, what you're doing is different - having them both perform the exact same amount of work every 5 minutes. That would be like limiting both devices to the same fps in our example.
        However, I'm not sure about the animation test. Do you know that the fps was same for both ART and Dalvik? Unless you do, you can't conclude anything about the battery life impact.

        • http://www.androidpolice.com/author/cody-toombs/ Cody Toombs

          In a way, that becomes an interesting meta-discussion. To ask about framerates, you also have to ask if there's a way to prevent a higher framerate and if it would make a difference if you did. On a similar, if not overriding note, since the thing being measured is essentially the runtime's ability to deliver animation, is it not fair to measure what it delivers without modification? After all, that's how all apps are supposed to perform under normal conditions.

          Given that animations are fundamentally different from tasks, in so much as the OS might decide to complete them with differing levels of work (by dropping frames) and they have to complete in a specified period of time, I think it's fair to get measurements based on how a typical app would operate under each runtime. Logically, it's less fair to collect results by scrolling through 100 cards on ART and only 90 cards on Dalvik purely because both would produce exactly 900,000 frames.

          Believe me, I get what you're saying, but it's more of an academic question than a practical one; a real world user is still going to scroll through all 100 cards, despite how many frames it takes to get there.

          • fonix232

            While Lost and ssj4Gogeta are right, I couldn't disagree more with their observations.

            This test was to provide a battery LIFE overview. So a time-based approach even on task-related activities is just perfect - as you'll be watching a movie for x time, not for x rendered frames. Nor will you be playing a game for x rendered frames, or x actions done, but for a subset of time (e.g. 5 minutes on the toilet).

          • Jason Rittenhouse

            only 5 minutes? slow news day i guess.

          • fonix232

            It was just an example. I try to not to spend too much time distracted, as it hinders work efficiency.

  • Jeff Miller

    Didn't read the entire article but really well done. Also, Google has stated that any difference noted by users is a placebo as ART isn't ready for prime time. I'm sure it will improve dramatically as it is tweaked and ready to roll. Way too early to even compare things...

    • shaboogen

      Please don't confuse "Google" for "random anonymous guy Jerry Hildenbrand purported to talk to". Google hasn't said anything and probably won't until this makes prime time.

      Both Cody at AP and Brian Klug at Anandtech proved there are legitimate performance deltas when using ART, two sources with evidence is far more compelling than some third hand anonymous anecdote.

  • Sam

    Either your figures or your screenshots for the Nexus 4 video rundown test are reversed. The Dalvik entry on your table says 45% remaining, but the screenshot shows 42, and vice-versa for the Art numbers.

    • http://www.androidpolice.com/author/cody-toombs/ Cody Toombs

      Thanks, great catch. I originally organized these in the spreadsheet with ART in the first column and Dalvik in the second. I accidentally forgot to move the numbers when I re-ordered the columns.

  • Paul Taylor

    I gave ART a try on my Nexus 4, until the optimised Dalvik and Bionic libraries became available. Those made me switch back to Dalvik and I've been happy with it so far. Thanks for the comparisons!

  • guitarguy23

    I think you should write EVERY Android article, dang!! Spot on!

  • Vitaly Streltsov

    My reason to stay on dalvik is Xposed.

  • Bin Artyte

    You know what's the biggest problem with art? Is that no laymen gives a damn. Just saying lol.

    • natabbotts

      They don't need to - once it's incorporated, they'll say "huh, this new update thingy makes my phone feel zippier and it lasts a bit longer" - it isn't important how it does it, just that it does, and that means good press for android and ultimately, a better UX and more sales.

  • Alex

    Nice article
    It will be nice if you can add BIONIC & Qualcomm optimizations to the benchmark.
    Most of people (including me) and ROM's are using them.

    • http://www.androidpolice.com/author/cody-toombs/ Cody Toombs

      I can see a case for that, but I wanted to focus on comparing completely stock software from Google, at least for now. I might consider this for the future, though.

      Also, it was hard enough to set aside two devices, so a third batch of benchmarks, even if only done in a single pass, would probably have delayed this article by quite a bit. Artem would have killed me if I did that ;)

  • black

    Part 4? Great, more uselessness that doesn't prove anything. Only kidding... great read. :)

  • Drew M

    I applaud the author, but running each test once is really too small of a sample size to really draw any conclusions from. I get small variations in battery life from one charge to the next.

  • Darren Henderson

    All of this is great work, but I thought one of the main advantages of ART was to reduce the cpu load during application launch since there's no more JIT compiling. Why not test Dalvik vs. ART on application switching. Pick a set of apps to open and close over and over again for a couple hours. May have to figure out a way to automate removing them from recents too though.

    • http://www.androidpolice.com/author/cody-toombs/ Cody Toombs

      I was thinking about this for a while, and I would like to go down this rabbit hole eventually. There were a few constraints that kept me away from this subject, for now.

      To begin with, I would have had to build another automation app (which, I believe, would actually have been the 9th, just for this article), and I needed to get this off of my plate before it grew even bigger. I also knew that I wouldn't just be testing for battery life, but also the performance characteristics, which would have added a lot of variables that I wasn't sure I could fairly account for without a lot more preparation. Ultimately, I chose to pass on this specific angle because app startup and switching is almost certainly less relevant to battery life than the other stuff I went with.

      I have already put a little time into this, so it will probably make an appearance when I do an update after the next version of Android rolls out. When it comes time to redo the performance and battery benchmarks, there will be a lot of new tests and more expanded coverage.

  • Braden Abbott

    For me, it's all in my head. I don't like the idea of using a Linux OS that virtualizes the apps it runs..I like the feeling of knowing my device is running native code, so even if the gains are negligible I will continue to use ART because I haven't had a single issue with it or any of my apps.

  • Nelson

    So, how many times were the tests repeated to give any statistical relevance for the results?

    • http://www.androidpolice.com/author/cody-toombs/ Cody Toombs

      Every test discussed in the article was done at least 3 times, although the durations varied for some (particularly on Test 1, which ran for 60 hours on its first pass).

      As stated, I also ran several other tests that I didn't include, mostly variations on the ones discussed here. Most of the results remained consistent, but a few had to be thrown out due to factors outside of my control (a power outage, for example). In total, I think I ran somewhere just over 100 trials with various durations and scenarios.

      I wanted to ensure these results were repeatable, consistent, accurate, and unbiased. However, making them scientific is too high of a standard. As I said, there are too many variables to control and I am not equipped with enough testing devices, professional grade tools for measurements, or the time and resources to close off every last environmental factor. In a sense, the goal was to produce extremely high quality anecdotal evidence. These are battery tests, so they are always going to shift for a myriad of reasons. I'm using numbers from my tests to illustrate my findings, not to draw conclusive evidence.

  • Deeco

    Wanting to try ART so badly, just wished Xposed Framework was compatible.

  • Space!

    Excellent post ! I wonder if there will be possibilities to have ART optimised for GS4 sooner than officially.
    Warning: newbie here, i don't really know what i'm talking about

  • dnt


    One question: can you please add some tests for "performance improvements" in ART vs ... for the same tests that you made?
    Surely 2 years of work got some improvements and I would really like to see if ART is maybe 30% (??) performance improvement vs Dalvik.

    Thanks !

    • http://www.androidpolice.com/author/cody-toombs/ Cody Toombs

      I think you're looking for Part 2 of this series.

      The only thing I don't think I've really covered to the extent that I would like is a good comparison of framerates on standard animations like the ones I used for my 3rd test in this post. I'm already thinking about rectifying that this weekend, but I won't have time until then.

  • Alan

    I don't think these tests are good for comparing Dalvik's JIT and ART's AOT.

    For the first test, the app is likely very small. That means for JIT, it's likely that the entire app itself would be considered hot and compiled. Since it's so small, it'll likely all fit inside the code cache which means the compiled code won't get thrown away to make room for newly compiled hot code. In the end, the resource intensive parts of the app (maybe the whole app) are running off native code for both the Dalvik and ART test. The differences are likely due to the experimental state of ART and other background phone tasks.

    The video player test also doesn't seem relevant. Most video players have their resource intensive code written against the NDK and the video is also likely decoded by the dedicated hardware video decoder. For both Dalvik and ART, there isn't much left to compare.

    The 3rd test is similar to both the first 2. The app is small and the hot code are likely compiled to native code and never removed for other hot code. I'm not sure how the animation is written but it may make more use of the GPU than the CPU running any resource intensive code.

    In real life cases, the JIT takes time to identify hot code and compile them. If the app is more complex, the hot code that's found can change which could boot existing compiled code out of the code cache. The thrown away code may become hot and again compiled later. Also outlier cases can cause Dalvik to revert to interpreting instead of running the compiled code due to optimization assumptions becoming not true for that particular case. Also, when the app is closed or goes out of focus, Android may reclaim the RAM and thus throwing away profiling data and the code cache. Doing this repeatedly could cause more overhead than ART which could result in more energy spent for the same amount of work done.

    I view the results as having negligible differences in battery life where the differences are more due to ART being experimental and/or something not related to Dalvik and ART.

  • StarkWiz

    Thanks for the awesome article. I got Nexus 5 and I was just planning to revert to Dalvik as I didn't think ART was of much benefit right now.
    After going through your article, I think I might be more happy Dalvik.
    Definitely ART is going to be the winner and much better than Dalvik once it's stable.

  • goodman

    You good

  • xxjoStaxx

    I switched to ART and lost some apps and widgets, added to the fact that my screen wouldn't shut off with the power button. I waited for about a half hour while it updated 160+ apps, then it would not lock or turn the screen off when it rebooted. So I had to switch right back to Dalvik. I don't think the differences are all that negligible anyway, so I might as well stick with what I know works for my phone and with all my apps. I don't want to give up using GoLocker, and that was the first difference I noticed. Then noticed some of my widgets were gone, and I'm quite used to my screen setups as they are. I'm not sure why my lock/power button just crapped out with ART. But I think at least for the foreseeable future, I'll be sticking with Dalvik. Has anyone else experienced anything like this?