First Reference Talks

Discussions on Human Resources, Employment Law, Payroll and Internal Controls

By Christina Catenacci, BA, LLB, LLM, PhD | 4 Minutes Read August 3, 2021

All about data scraping

Recently, the topic of data scraping has been in the news. But what is it? How do people do it? Why would anyone want to do it? Are there any dangers associated with it? And what can be done to deal with it?

What is a data scraper?

A data scraper is a person or program that extracts data generated by another program. The most common use is web scraping, where the scraper captures various types of data from a website.

A web scraper imports the data and transfers it into a spreadsheet for various reasons, including conducting research for web content and business intelligence; gathering prices for travel booking and price comparison sites; finding sales leads and conducting market research by crawling public data sources; and sending product data from an e-commerce site to another online vendor.
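To make the "capture data, then transfer it into a spreadsheet" idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of any particular scraping tool: the page content (`SAMPLE_HTML`), the class `TableScraper`, and the function `scrape_to_csv` are all hypothetical names invented for this example, and the page is hard-coded rather than downloaded from a real website.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page content (an assumption for this sketch); a real scraper
# would download this from a website, subject to its terms of service.
SAMPLE_HTML = """
<table>
  <tr><td class="product">Desk lamp</td><td class="price">24.99</td></tr>
  <tr><td class="product">Office chair</td><td class="price">149.00</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished rows, each a list of cell strings
        self._row = None      # row currently being read, or None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        # Only text inside a <td> is kept; whitespace between tags is ignored.
        if self._in_cell:
            self._row[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append([cell.strip() for cell in self._row])
            self._row = None


def scrape_to_csv(html):
    """Extract table cells from html and return them as CSV text."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["product", "price"])  # column names assumed for the sample
    writer.writerows(parser.rows)
    return out.getvalue()


print(scrape_to_csv(SAMPLE_HTML))
```

For the sample page, the result is a header line followed by one line per product, which any spreadsheet program can open. The same few steps applied to pages containing personal information are what raise the privacy concerns discussed in this article.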

In this sense, when the scraping of public data is done to gain insights and not to make a profit or cause harm to individuals, there can be beneficial uses.

The dark side of web scraping

But there is a dark side to data scraping, involving things such as email harvesting, where email addresses are collected and sold to spammers or scammers. It is important to note that email harvesting is considered to be a bad marketing practice and also contrary to the privacy laws of some jurisdictions. For example, Canada’s federal privacy law, the Personal Information Protection and Electronic Documents Act (PIPEDA), clearly prohibits email harvesting.

Another important example to keep in mind is the joint investigation into Clearview AI, which I have written about previously. The Office of the Privacy Commissioner of Canada, the Commission d’accès à l’information du Québec, the Office of the Information and Privacy Commissioner of British Columbia and the Office of the Information and Privacy Commissioner of Alberta (collectively referred to as the Offices) concluded that Clearview AI violated the privacy rights of Canadians.

The Offices concluded that biometric facial information was sensitive in almost all circumstances—it was intrinsically, and in most instances permanently, linked to the individual. It was distinctive, unlikely to change over time, difficult to modify, and largely unique to the individual. Simply put, facial biometric information was particularly sensitive.

The Offices also found that Clearview AI should have obtained express opt-in consent before it scraped the facial images of any individual in Canada from websites. Further, the stated purposes of helping law enforcement were neither appropriate nor legitimate: the activity represented the mass identification and surveillance of individuals by a private entity in the course of commercial activity.

A recent example that may hit close to home for many

One very recent example of data scraping is the LinkedIn web scraping that took place in the spring and summer of 2021. It was reported that a hacker first posted 500 million LinkedIn records for sale on a hacker forum. Subsequently, the number of records that were scraped and placed for sale on the Dark Web rose to 700 million.

The saga continued shortly afterward, when more data was added to the collection. The data, totaling one billion LinkedIn records, was scraped from public LinkedIn profiles and other websites and contained further pieces of personal data. The hacker provided screenshots to prove that several types of data were exposed, neatly organized into categories in a spreadsheet. Ultimately, the personal data that was scraped included: full names; email addresses and passwords; locations; phone and fax numbers; websites; LinkedIn profiles; company names and job titles; as well as LinkedIn connections.

Needless to say, this incident was concerning, given the potential for spamming, scamming, and identity theft of individuals and business owners who used LinkedIn.

At this point, it is important to note that data scraping is not permitted by LinkedIn under the user agreement involving members or the terms of service agreement involving recruiters. In fact, statements by LinkedIn made in April 2021 and June 2021 have emphasized that this recent scraping activity violated LinkedIn terms of service:

When anyone tries to take member data and use it for purposes LinkedIn and our members haven’t agreed to, we work to stop them and hold them accountable.

It was also confirmed that this did not constitute a data breach since no private LinkedIn member data was exposed—the data was scraped from LinkedIn and other websites. In fact, a rash of data scraping has been reported recently, hitting other social media platforms including Facebook and Clubhouse.

How does data scraping differ from a data breach?

These social media companies have strongly pointed out that there has been no data breach since only public information was scraped—no private member information was hacked.

Conversely, when there has been a data breach, private member information held by an organization is hacked and certain obligations are consequently triggered. For example, under PIPEDA, a breach of security safeguards refers to the loss of, unauthorized access to, or unauthorized disclosure of personal information resulting from a breach of an organization’s security safeguards or from a failure to establish those safeguards. PIPEDA has reporting and notification requirements (to the Privacy Commissioner and affected individuals respectively), record-keeping requirements, and very serious consequences for noncompliance. Further details can be found in the Breach of Security Safeguards Regulations.

What can organizations take from this?

Data scraping can have some beneficial uses. But when these uses become questionable, organizations should review the consent provisions in PIPEDA, the Guidelines for Obtaining Meaningful Consent, and their own policies and procedures, and ensure that they are in compliance with privacy laws. In addition, it is important for organizations to appreciate the sensitive nature of biometric information, and the particularly sensitive nature of facial biometric information, when examining consent and the purposes of collection, use, and disclosure of personal information.

And if individuals and business owners find that they may have been affected by a data scraping incident, the following is recommended:

  • create new and different passwords for online accounts
  • use a password manager or create complicated, unique, and lengthy passwords
  • use antivirus software
  • use two-factor authentication
  • stay away from suspicious messages
  • visit the actual social media account to determine if something is wrong with an account
  • use a VPN
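For readers who do not use a password manager, the advice above about complicated, unique, and lengthy passwords can be sketched in a few lines of Python. This is only an illustration: the function name `generate_password`, the default length of 20, and the 12-character minimum are assumptions made for this example, not requirements drawn from PIPEDA or any other source cited here. The sketch relies on Python's standard `secrets` module, which is designed for security-sensitive random choices.

```python
import secrets
import string

# Characters a password may contain: letters, digits, and punctuation.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length=20):
    """Return a random password with mixed character types.

    The 12-character minimum is an assumption for this sketch.
    """
    if length < 12:
        raise ValueError("use at least 12 characters")
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        # Retry until every character class appears at least once.
        if (any(c.islower() for c in candidate)
                and any(c.isupper() for c in candidate)
                and any(c.isdigit() for c in candidate)
                and any(c in string.punctuation for c in candidate)):
            return candidate


print(generate_password())
```

Each call produces a different password, so a separate one can be generated for every online account. A password manager does the same job and also remembers the results.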

In terms of data breaches, organizations should enhance their cybersecurity and create an incident response plan. Of course, if there has been a data breach, it is necessary to immediately comply with PIPEDA and the Breach of Security Safeguards Regulations.

Christina Catenacci, BA, LLB, LLM, PhD
Christina Catenacci, BA, LLB, LLM, PhD, is a member of the Law Society of Ontario. Christina worked as an editor with First Reference between 2005 and 2015 working on publications including The Human Resources Advisor (Ontario, Western and Atlantic editions), HRinfodesk, and First Reference Talks blog discussing topics in Canadian Labour and Employment Law. She continues to contribute to First Reference Talks as a regular guest blogger, where she writes on privacy and surveillance topics. Christina has also appeared in the Montreal AI Ethics Institute's AI Brief, International Association of Privacy Professionals’ Privacy Advisor, Tech Policy Press, and Slaw - Canada's online legal magazine.
Article by Christina Catenacci, BA, LLB, LLM, PhD / Business, Information Technology, Privacy / antivirus, biometric data, cyber incident response plans, cybersecurity, dark web, Data breach, data scraping, email harvesting, facial recognition technology, hacker attacks, LinkedIn, password protection, privacy law, social media, two-factor authentication

We welcome your comments on our blog articles. However, we do not respond to specific legal questions in this space.
We do not provide any form of legal advice or legal opinion. Please consult a lawyer in your jurisdiction or try one of our products.


Copyright © 2009 - 2023 · First Reference Inc. · All Rights Reserved