Following the standard definition, benchmarking is the process of measuring an application's performance, either in isolation or against competing applications, in order to extract and analyse performance metrics.
Nowadays, benchmarking is critical in the development cycle, mainly because any successful application attracts a multitude of copycats with very similar content, so users tend to go with the one that performs best.
A growing and highly competitive market “demands” top performance from an application if it is to thrive and keep an ever-increasing pool of users engaged.
Moreover, statistics from top-rated projects across genres show that the highest revenue mostly comes from mid-end devices rather than high-end ones, meaning that this game of cat and mouse requires constant adaptability.
A veteran fan of a game will most likely forgive functional or visual bugs along the way, but when the device itself starts to struggle (high battery consumption, overheating, low FPS), they'll most likely stop playing.
In this article we'll outline Amber's take on the whole process, the most relevant metrics that confirm or refute performance assumptions, and the strategies we have created to review all of the technical areas from a performance standpoint.
The service covers 4 types of testing based on 3 essential phases.
Each testing type has its own scope and can be used as a standalone service or as part of a full package covering all of a project's needs from a performance perspective.
These services can be part of a continuous engagement, where our team works closely with the development team while making sure that new integrations aren't negatively influencing the overall game performance and/or stability.
Also, based on availability, our team can cover these sub-services as ad-hoc requests whenever technical expertise is required for performance validation.
Phase 1 - Planning & Setup
Phase 2 - Execution
Phase 3 - Reporting & Feedback
“system of measurement used for accurately checking hardware stress”
FPS (Frames per Second) - the total number of frames rendered by a game in one second; depending on the game genre, rendering targets 60 FPS, 30 FPS, or lower.
This is one of the most relevant metrics for validating overall game performance (a rough measurement sketch follows this list of metrics).
RAM (Random Access Memory) - short-term storage that holds data used by the CPU and/or active applications;
CPU (Central Processing Unit) - the circuit that handles the majority of the phone's work, such as game instructions and user input processing;
GPU (Graphics Processing Unit) - the circuit that handles graphics and visual operations, offloading that work from the CPU;
Boot times - total time spent performing various operations such as game boot, menu-to-submenu transitions, dialog display, game lobby loading, etc.
Time matters, and nobody wants to play a game that takes three minutes to load, do you?
App size - the total storage the game occupies before assets are downloaded, after boot (once all assets are downloaded), and during specific gameplay sessions.
Nowadays storage space isn't as problematic as it was in the past, yet plenty of users still keep photos, videos, and multiple applications on their device rather than using a cloud solution, so a clear overview of how much storage space the app uses always comes in handy.
Battery drain - battery percentage consumed during targeted gameplay.
Most graphically rich games directly affect the device's battery life, which is a big deal-breaker for players.
Our variety of devices with different hardware/firmware combinations enables us to accurately review the battery stress that a regular user would encounter.
Device temperature - device temperature in degrees Celsius during gameplay.
A very common issue we've seen in testing is device overheating, which over time leads to battery degradation or SoC wear and tear.
Network traffic - download and upload traffic measured during gameplay sessions.
A lot of users play games on cellular data (subway, bus, trains, etc.), so we always need to make sure the app isn't burning big holes in their subscription plans.
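As a rough illustration of how a few of the metrics above can be sampled on Android, here is a minimal sketch that pulls boot-time, battery, temperature, RAM, and frame readings over adb. The package and activity names are hypothetical placeholders, the dumpsys output format varies between Android versions, and this is a simplified sketch rather than our production tooling.

```python
"""Minimal metric-sampling sketch for an Android device reachable over adb.

Assumptions: adb is on PATH, exactly one device is attached, and the
hypothetical package/activity names below are replaced with real ones.
dumpsys output formats differ across Android versions, so the regexes
may need adjusting per device.
"""
import re
import subprocess

PACKAGE = "com.example.game"            # hypothetical package name
ACTIVITY = f"{PACKAGE}/.MainActivity"   # hypothetical launch activity


def adb_shell(*args: str) -> str:
    """Run an `adb shell` command and return its stdout."""
    result = subprocess.run(["adb", "shell", *args],
                            capture_output=True, text=True, check=True)
    return result.stdout


def grab(pattern: str, text: str) -> int:
    """Pull the first integer captured by `pattern` out of `text`."""
    return int(re.search(pattern, text).group(1))


def cold_boot_ms() -> int:
    """Cold-start time in ms, as reported by the activity manager."""
    adb_shell("am", "force-stop", PACKAGE)        # guarantee a cold start
    out = adb_shell("am", "start", "-W", ACTIVITY)
    return grab(r"TotalTime:\s*(\d+)", out)


def battery_and_temperature() -> dict:
    """Battery level (%) and temperature (dumpsys reports tenths of a degree C)."""
    out = adb_shell("dumpsys", "battery")
    return {"battery_pct": grab(r"level:\s*(\d+)", out),
            "temp_c": grab(r"temperature:\s*(\d+)", out) / 10}


def memory_pss_kb() -> int:
    """Total PSS (proportional set size) of the game process, in kB."""
    out = adb_shell("dumpsys", "meminfo", PACKAGE)
    return grab(r"TOTAL PSS:\s*(\d+)", out)


def frame_stats() -> dict:
    """Total vs. janky frame counts, as reported by gfxinfo."""
    out = adb_shell("dumpsys", "gfxinfo", PACKAGE)
    return {"frames": grab(r"Total frames rendered:\s*(\d+)", out),
            "janky": grab(r"Janky frames:\s*(\d+)", out)}


if __name__ == "__main__":
    print("boot:", cold_boot_ms(), "ms")
    print(battery_and_temperature())
    print("RAM (PSS):", memory_pss_kb(), "kB")
    print(frame_stats())
```

In practice, readings like these are sampled repeatedly over an extended gameplay session, so trends (battery drain per hour, temperature climb, jank percentage) matter more than any single number.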
Looking at the list of metrics, the temptation to track all of them is high, but the concept of “less is more” applies harmoniously here.
Our recommendation is to start with the critical areas where optimizations can be added quickly and efficiently, and as soon as those are out of the way, address the remaining sections that require tweaks.
As an example, the Truecaller application (500M+ downloads, 4.5 stars, 12M reviews) managed to obtain a 30% increase in DAU/MAU just by adding a mix of new features and improving its launch speed.
Also, big names in the industry such as King and Rovio focus heavily on improving technical metrics in order to maintain an expansive community that continues to enjoy their games on low- and mid-end devices in 2020 as well.
Identify bottlenecks and performance gaps & address critical aspects before they become problematic
Reviewing and validating new content prior to its public release contributes greatly to the delivered product, but also to the confidence of all parties involved (Stakeholders - Development - QA - End User).
One of the main roles of the Benchmarking team, along with the other Specialised QA services, is to prevent the snowball effect from ever happening. We act as a support team that finds the spark before firefighting becomes a daily practice.
Including performance benchmarking in the development process enables us to identify issues at an incipient stage, and you to address them before a negative impact starts resonating across the user pool.
A multitude of development teams struggle with issues that slip through the cracks and become troublesome as they evolve along with the application, instead of focusing their efforts on developing new content, hence missing deadlines, delaying releases, failing submissions, etc.
A big majority of junior game development companies spend most of their time working on new content and features, but once popularity explodes, the trend of negative reviews and feedback increases exponentially as well.
Why is that happening? The answer is easy: increasing the user pool also means increasing the device pool on which the application is installed, and not all users have high-end devices at their disposal.
Increase end-user satisfaction and stimulate retention while uncovering the gaps between your product and a competitor's product.
Benchmarking, or more precisely its device coverage component, is an essential tool for reviewing performance on a high number of devices (low, mid, and high end) with various hardware/firmware combinations, validating how they cope with technical stress over extended gameplay time.
In other words, we can know in advance what and where the performance gaps are and, more importantly, have a clear idea of which devices will be supported beyond the OS version limitations.
In most cases, early device blacklisting will keep negative reviews and/or feedback to a minimum.
But why stop here? Besides the multitude of devices that can be checked during device coverage, we also have the option of performing tests on the same device running different OS versions; our QA teams are constantly reporting bugs that are OS-specific.
Example:
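To make this concrete, here is a hypothetical sketch of how a device/OS coverage matrix might be expressed, where every device and OS pairing becomes one benchmarking pass; the tiers, device names, and OS versions below are purely illustrative, not an actual supported-device list.

```python
"""Hypothetical device/OS coverage matrix: every (device, OS) pair
below becomes one benchmarking pass. All names are illustrative."""
from itertools import product

DEVICE_TIERS = {
    "low-end":  ["Device A (2 GB RAM)"],
    "mid-end":  ["Device B (3 GB RAM)", "Device C (4 GB RAM)"],
    "high-end": ["Device D (8 GB RAM)"],
}
OS_VERSIONS = ["Android 8.1", "Android 9", "Android 10"]


def coverage_matrix():
    """Yield every (tier, device, OS) combination to be validated."""
    for tier, devices in DEVICE_TIERS.items():
        for device, os_version in product(devices, OS_VERSIONS):
            yield tier, device, os_version


if __name__ == "__main__":
    runs = list(coverage_matrix())
    for run in runs:
        print(*run)
    print(f"{len(runs)} benchmarking passes in total")  # 4 devices x 3 OS = 12
```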
Now imagine targeting the top 30 devices that bring in the most revenue, each running on 3 different OS versions; with ongoing performance review, all 90 combinations can and will be validated long before users start seeing these issues on the live builds.
More and more senior game development companies are beginning to see the essential role of benchmarking, using it alongside functional QA to validate their products and increase coverage efficiently by properly checking new content before the application reaches the general public.
Whenever an application starts hanging, crashing, or showing a noticeable decrease in performance, all of the other metrics are affected as well.
New content increases popularity for the moment, but an optimized game retains users for longer periods of time.
Do you remember that nice pair of sneakers that you got for your birthday?
What if they tore apart after a week of wear? That wouldn't be a very nice experience, and you most likely wouldn't choose that brand when getting a new pair.
Hardware stress on users' devices causes a similar impact, and we wouldn't want that to happen, since it leads to a shrinking user pool, a bunch of negative reviews, and a negative wall of text on the app store.
From a psychological perspective, we're prone to recall negative emotions and memories much more easily than positive ones, so in our domain of work we're constantly on thin ice, hence our need to constantly adapt to all incoming factors.
Fortunately, we have all of the means at our disposal to make sure that all applications that go through the performance benchmark cycles are fully validated and ready to be enjoyed by the community.