In recent years, tech giants and other companies have come under scrutiny over how they collect and use user data. Google itself has focused on making privacy improvements to its products and services in response. However, it looks like its efforts did not go far enough. A new research paper reveals that Google's Messages and Dialer/Phone apps have been collecting user data, some of it hashed, and sending it to the company's servers, potentially violating the European Union's GDPR.

Douglas Leith, a computer science professor at Trinity College Dublin, claims in his paper "What Data Do The Google Dialer and Messages Apps on Android Send to Google?" that Google's Messages and Dialer apps have been sending data to the company's servers without obtaining explicit user consent. More specifically, these apps collect information about user communications, including a SHA-256 hash of each message and its timestamp, phone numbers, incoming and outgoing call logs, and call duration. This data is then sent to Google's servers using the Google Play Services Clearcut logger service and the Firebase Analytics service. The data helps the company link the message sender and receiver and/or the two devices in the call, enabling features like spam filtering and business caller IDs.

While only a truncated 128-bit value of the message hash is shared with Google's servers, Leith believes that for short texts, it is possible to reverse the hash to reveal the message content. "I’m told by colleagues that yes, in principle this is likely to be possible," Leith told The Register. "The hash includes a hourly timestamp, so it would involve generating hashes for all combinations of timestamps and target messages and comparing these against the observed hash for a match – feasible I think for short messages given modern compute power." However, hashing is not encryption, and we haven't seen any hard evidence of anyone actually reversing these hashes, so for now this remains a theoretical risk.
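
To make the attack Leith describes more concrete, here is a minimal Python sketch of that kind of search. The exact way Google combines the timestamp and message text is not described in this article, so the layout used in `truncated_hash` (timestamp, a colon, then the text) is purely an assumption for illustration, as are the function names and the tiny candidate list.

```python
import hashlib
from itertools import product

HOUR = 3600  # seconds


def truncated_hash(hour_ts: int, text: str) -> bytes:
    """SHA-256 over an hourly timestamp and the message text, cut to 128 bits.

    The "timestamp:text" layout is an assumption made for this sketch,
    not the format the apps actually use.
    """
    digest = hashlib.sha256(f"{hour_ts}:{text}".encode("utf-8")).digest()
    return digest[:16]  # keep only the first 128 bits


def brute_force(observed: bytes, hour_candidates, message_candidates):
    """Try every (hour, message) pair and return the first one that matches."""
    for hour_ts, text in product(hour_candidates, message_candidates):
        if truncated_hash(hour_ts, text) == observed:
            return hour_ts, text
    return None


# Hypothetical search space: one day of hourly timestamps, a few short texts.
base = 1_647_900_000 // HOUR * HOUR
hours = [base + i * HOUR for i in range(24)]
messages = ["yes", "no", "ok", "call me", "on my way"]

# Pretend this value was observed in a log, then try to recover its inputs.
observed = truncated_hash(hours[7], "call me")
result = brute_force(observed, hours, messages)
```

The search only succeeds when the real message is in the candidate list, which is why the concern applies mainly to short or predictable texts: the number of combinations to try grows quickly with message length.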

The research paper further highlights that neither Google app features a privacy policy explaining what data is being collected, something the company itself requires of third-party apps on the Play Store. In fact, the information is not even made available for download when one uses Google Takeout to export the data associated with their account. Google Play Services does inform users that some data is collected for security and fraud prevention, but there's no explanation of why exactly message and call data are collected.

The Google Messages app is installed on millions of Android devices worldwide, including the Samsung Galaxy S22 series. The Phone app is also the default dialer app on smartphones from manufacturers like Xiaomi, Realme, and Motorola, so this is a major privacy oversight. Going by Google's previous track record, though, the company could have intentionally avoided asking for user consent and hidden information about the data it was collecting.

Leith first detailed his findings to Google in November last year, which is also why he had to delay posting his paper publicly. He recommended the following changes to Google, out of which six have already been implemented:

  1. The specific data collected by Dialer and Messages apps, and the specific purposes for which it is collected, should be clearly stated in the app privacy policies.
  2. The app privacy policy should be easily accessible to users and be viewable without having to first agree to other terms and conditions (e.g. those of Google Chrome). Viewing of the privacy policy should not be logged/tracked prior to consent to data collection.
  3. Data on user interactions with an app, e.g., app screens viewed, buttons/links clicked, actions such as sending/receiving/viewing messages and phone calls, is different in kind from app telemetry such as battery usage, memory usage, slow operation of the UI. Users should be able to opt-out of collection of their interaction data.
  4. User interaction data collected by Google should be made available to users on Google’s https://takeout.google.com/ portal (where other data associated with a user’s Google account can already be downloaded).
  5. When collecting app telemetry such as battery usage, memory usage etc., the data should only be tagged with short-lived session identifiers, not long-lived persistent device/user identifiers such as the Android ID.
  6. When collecting data, only coarse time stamps should be used, e.g., rounded to the nearest hour. The current approach of using timestamps with millisecond accuracy risks being too revealing. Better still, use histogram data rather than timestamped event data, e.g., a histogram of the network connection time when initiating a phone call seems sufficient to detect network issues.
  7. Halt the collection of the sender phone number via the CARRIER_SERVICES log source when a message is received, and halt collection of the SIM ICCID by Google Messages when a SIM is inserted. Halt collection of a hash of sent/received message text.
  8. The current spam detection/protection service transmits incoming phone numbers to Google servers. This should be replaced by a more privacy-preserving approach, e.g., one similar to that used by Google’s Safe Browsing antiphishing service, which only uploads partial hashes to Google servers.
  9. A user’s choice to opt-out of “Usage and diagnostics” data collection should be fully respected, i.e., result in a halt to all collection of app usage and telemetry data.
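
Three of these recommendations (coarse timestamps, histograms instead of timestamped events, and partial hashes) are easy to picture in code. The Python sketch below is only an illustration of the general ideas: the function names, the bucket sizes, and the two-byte prefix length are assumptions made for this example, not details taken from the paper or from Google's services.

```python
import hashlib

HOUR_MS = 3_600_000  # milliseconds in one hour


def round_to_hour(ts_ms: int) -> int:
    """Round a millisecond timestamp to the nearest hour."""
    return (ts_ms + HOUR_MS // 2) // HOUR_MS * HOUR_MS


def connection_time_histogram(samples_ms, bucket_ms=500, n_buckets=4):
    """Count connection times per bucket instead of logging each event.

    The last bucket collects everything at or above its lower bound.
    """
    counts = [0] * n_buckets
    for sample in samples_ms:
        counts[min(sample // bucket_ms, n_buckets - 1)] += 1
    return counts


def hash_prefix(phone_number: str, n_bytes: int = 2) -> bytes:
    """Return only the first bytes of a SHA-256 hash of a phone number.

    A short prefix is shared by many possible numbers, so uploading it
    reveals less than uploading the number itself. The prefix length here
    is an illustrative choice.
    """
    return hashlib.sha256(phone_number.encode("utf-8")).digest()[:n_bytes]


rounded = round_to_hour(3_600_000 + 1_799_999)
histogram = connection_time_histogram([120, 480, 700, 2600])
prefix = hash_prefix("+15550100")
```

In a partial-hash scheme of this kind, the server would answer with the full hashes that share the uploaded prefix, and the final comparison would happen on the device.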

Google has provided an explanation of some of its data collection practices:

  1. The message hash is collected for detecting message sequencing bugs.
  2. Phone numbers are collected to improve regex pattern matching for automatic recognition of one-time passwords sent over RCS. Messages automatically recognizes incoming One-Time Password (OTP) codes to avoid the user having to fill them in. This can be a frequent point of failure and the phone number data is used to improve recognition by providing ground-truth based on known OTP sender numbers.
  3. The ICCID data is used to support Google Fi.
  4. Firebase Analytics logging of events (not including phone numbers) is used to measure the effectiveness of app download promotions (for Messages and Dialer specifically). Namely, to measure not only whether the app was downloaded but also whether it was used once downloaded.
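
For readers wondering how sender numbers could help with OTP recognition, the Python sketch below shows one plausible reading of that explanation: a regex extracts candidate codes, and messages from known OTP senders that the regex fails on are flagged as misses. The pattern, the sender list, and the function names are all hypothetical and are not taken from Google's implementation.

```python
import re

# Hypothetical pattern: a standalone run of 4 to 8 digits.
OTP_PATTERN = re.compile(r"\b(\d{4,8})\b")

# Hypothetical ground truth: numbers known to send one-time passwords.
KNOWN_OTP_SENDERS = {"+15550100"}


def extract_otp(text: str):
    """Return the first candidate code in the message, or None."""
    match = OTP_PATTERN.search(text)
    return match.group(1) if match else None


def missed_otps(samples):
    """List messages from known OTP senders where no code was recognized."""
    return [
        text
        for sender, text in samples
        if sender in KNOWN_OTP_SENDERS and extract_otp(text) is None
    ]


samples = [
    ("+15550100", "Your code is 482913"),
    ("+15550100", "Use ABX-93K to sign in"),
    ("+15550199", "See you at 5"),
]
misses = missed_otps(samples)
```

Here the second message comes from a known OTP sender but uses a format the pattern does not cover, which is the kind of failure such ground-truth data could surface.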

There's still no clarity on whether the Google apps adhere to the GDPR or whether they have been violating it so far. It is possible that the company will now be subjected to a GDPR investigation and slapped with a fine if the apps are found to be in violation.

UPDATE: 2022/03/23 17:07 EST BY STEPHEN SCHENCK

Expanded Google statement

A Google spokesperson has reached out to us, hoping to offer a little more insight into its privacy practices here:

We’re committed to compliance with Europe’s privacy laws and apply strict privacy protections to data collected via our Dialer and Messages apps.

Both Dialer and Messages use limited amounts of data for highly specific purposes that allow us to diagnose and resolve product functionality issues and ensure message delivery is consistently reliable. These technical logs are not – and were never – used for targeting ads and were protected by strict internal access controls.

Phone numbers and hashed SMS related data within Messages were only used in technical logs to debug app service issues. Phone numbers that were not saved in a user’s contact list are only used by Dialer to guard users against unwanted spam calls.

UPDATE: 2022/03/22 16:18 EST BY MANUEL VONAU

Updated title

We hear you — our initial title didn't fit the story, making the matter appear more dire than it is. We've updated the title to be more accurate and adjusted parts of the story. Thanks, everyone who commented!