
An extensive study of the top 100,000 ranked websites has revealed that many leak information you enter into their forms to third-party trackers before you even press submit.

The leaked data includes personal identifiers, email addresses, usernames, passwords, and even messages that were typed into forms, then deleted and never submitted.

This data leak is sneaky because internet users automatically assume that the information they type on websites isn't saved until they submit it, but for almost 3% of all tested sites, this isn't the case.

Alarming findings

The study was conducted by university researchers who used a crawler based on DuckDuckGo's Tracker Radar Collector tool to monitor exfiltration activities.

The researchers have summarized the results on a dedicated webpage and published a detailed technical paper for those who want to dive deeper.

The crawler was equipped with a pre-trained machine-learning classifier that detected email and password fields and intercepted script access to those fields.
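To make the idea of detecting exfiltration more concrete, here is a minimal Python sketch. It assumes the checker knows the email address typed into a form and has the body of an outgoing third-party request, and it looks for the address in plain, URL-encoded, Base64, and SHA-256 form. The function names, the sample address, and the choice of encodings are illustrative assumptions, not the researchers' actual implementation.

```python
import base64
import hashlib
from urllib.parse import quote


def candidate_encodings(email: str) -> dict[str, str]:
    """Return forms of `email` a tracker might send (an assumed, non-exhaustive set)."""
    normalized = email.strip().lower()
    raw = normalized.encode("utf-8")
    return {
        "plain": normalized,
        "url": quote(normalized, safe=""),
        "base64": base64.b64encode(raw).decode("ascii"),
        "sha256": hashlib.sha256(raw).hexdigest(),
    }


def find_leak(email: str, request_body: str) -> list[str]:
    """List which encodings of `email` appear in a captured request body.

    The match is case-sensitive, so an upper-case hex digest would be missed.
    """
    return [
        name
        for name, needle in candidate_encodings(email).items()
        if needle in request_body
    ]


# Hypothetical request body captured before the form was submitted.
typed_email = "user@example.com"
captured = "uid=42&h=" + hashlib.sha256(typed_email.encode("utf-8")).hexdigest()
print(find_leak(typed_email, captured))
```

A real checker would need to cover many more encodings and nested combinations; this sketch only shows the basic matching step.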

Crawler function diagram (GitHub)

The researchers crawled 2.8 million pages across the world's 100,000 highest-ranking sites and found that 1,844 websites let trackers exfiltrate email addresses before submission when visited from Europe.

When the same websites were visited from the US, however, the number of sites collecting information before submission rose to 2,950.
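As a quick check of these figures, the short Python sketch below turns the two counts into shares of the crawled sites. It assumes the denominator is the 100,000 sites in the crawl; the variable and function names are illustrative.

```python
SITES_CRAWLED = 100_000  # assumed denominator: the top-ranked sites in the crawl
LEAKY_EU = 1_844         # sites leaking emails when visited from Europe
LEAKY_US = 2_950         # sites leaking emails when visited from the US


def share_percent(count: int, total: int) -> float:
    """Return `count` as a percentage of `total`."""
    return 100 * count / total


eu_share = share_percent(LEAKY_EU, SITES_CRAWLED)
us_share = share_percent(LEAKY_US, SITES_CRAWLED)
us_to_eu_ratio = LEAKY_US / LEAKY_EU

print(f"EU: {eu_share:.2f}%  US: {us_share:.2f}%  US/EU: {us_to_eu_ratio:.2f}x")
```

Under that assumption, the US share comes out just under 3%, and the US count is roughly 1.6 times the EU count.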

Finally, researchers determined 52 websites to be collecting passwords in the same way, but all of them addressed the problem after receiving the researchers' report.

Who receives the data?

The purpose of website trackers is to monitor visitor activity, derive data points related to preferences, log interactions, and maintain a persistent and, in theory, anonymous ID for each user.

The sites use trackers to provide a more personalized online experience to their users, but they also allow third-party trackers to help advertisers serve targeted ads to their visitors and increase monetary gains.

Top sites using leaky trackers (kuleuven.be)

Many of these third-party trackers use scripts that monitor keystrokes inside a form and save the content even before the user presses the submit button.

The obvious consequence of logging form data is that tracker IDs are no longer anonymous, which in turn creates privacy and security risks.

The data collected by the university researchers shows that the problem stems from a small number of trackers that are prevalent on the web. 

For example, LiveRamp's trackers were found on 662 sites where email addresses were logged, Taboola's on 383, Verizon's on 255, and Adobe's Bizible on 191.

Third-party trackers and their owners (kuleuven.be)

In the password-grabbing category, Yandex tops the list with the highest number of confirmed cases.

Half of the listed first and third parties have responded to the researchers with comments and explanations, attributing the collection to a mistake.

The GDPR factor

The difference between the EU and US figures is attributed to GDPR, the EU regulation that protects the personal data of EU residents processed by online entities.

Compliance here depends on disclosing the collection of data entered in website forms, and that disclosure needs to be detailed and clearly defined.

For example, the typical 'we share your personal data with selected marketing partners' doesn't cut it for GDPR.

According to the study, email exfiltration by third parties via trackers breaches at least three GDPR requirements, namely the transparency principle, the purpose limitation principle, and the requirement to obtain consent.

Confirmed violations of the GDPR are punishable by a fine of up to 20,000,000 euros or up to 4% of the entity's global annual turnover, whichever is higher.
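The ceiling on such a fine can be worked out with a few lines of Python. This sketch assumes the "whichever is higher" rule of GDPR Article 83(5); the function name and the sample turnover figures are hypothetical.

```python
FIXED_CAP_EUR = 20_000_000   # fixed ceiling in euros
TURNOVER_PERCENT = 4         # percentage of global annual turnover


def max_gdpr_fine(annual_turnover_eur: float) -> float:
    """Return the upper bound of a fine: the higher of the two ceilings."""
    turnover_cap = annual_turnover_eur * TURNOVER_PERCENT / 100
    return max(FIXED_CAP_EUR, turnover_cap)


# Hypothetical companies: one small enough that the fixed cap applies,
# one large enough that the turnover-based cap is higher.
for turnover in (100_000_000, 1_000_000_000):
    print(f"turnover {turnover:,} EUR -> ceiling {max_gdpr_fine(turnover):,.0f} EUR")
```

The turnover-based ceiling only overtakes the fixed one once global annual turnover exceeds 500 million euros.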

What can users do

The best way to deal with this problem is to block all third-party trackers using your browser's internal blocker. All major browsers have a built-in blocker, which you will find in the privacy section of the settings menu.

Additionally, private email relay services let users generate pseudonymous email addresses, so even if a tracker snatches one, it can't be used to identify them.

Finally, for those who want to take a more involved approach, the researchers have created and released a browser add-on named Leak Inspector, which monitors exfiltration events on any site and warns users accordingly.
