Data Scraping Litigation Tests Limits of Data Privacy Laws

Scraping By: Data Scraping Litigation Continues to Test Limits of Longstanding Data Privacy Laws

by: Zarish Baig, Kristin L. Bryan of Squire Patton Boggs (US) LLP - Privacy World

Monday, November 30, 2020

It is a recurring issue in data privacy litigation: a plaintiff challenges an application of new technology by raising claims under decades-old data privacy laws that predate the technology at issue. Such is the case with recent data scraping litigation, addressed in greater detail below.

What is data scraping? Good question. Broadly speaking, it is the automated extraction of data from websites (including websites that are not available to the public and are accessible only to individuals with user accounts). The practices of Clearview, which has been the subject of recent litigation, are a prime example. By compiling information scraped from the social media accounts of billions of individuals, Clearview was able to create a massive facial recognition database that it subsequently provided to third-party customers. However, notwithstanding the clear privacy issues implicated by data scraping, there is no law specifically regulating this practice nationwide (although some state laws, as CPW has already covered, regulate the collection of biometric data). As a result, in litigation involving data scraping, parties are left arguing over the application of various statutes that were enacted long before data scraping became prevalent.

As just one example: to address the growing problem of computer hacking, in 1984 Congress passed the Computer Fraud and Abuse Act (the “CFAA”), creating criminal and civil liability for a party who accesses a computer without authorization or in a manner exceeding their authorization. To prevail on a civil CFAA claim, a plaintiff typically must demonstrate that a defendant intentionally accessed a computer without authorization or exceeded authorized access, and thereby obtained information from a protected computer. The CFAA has been extensively litigated, but courts have not interpreted its provisions consistently, and data scraping is no exception. While courts usually apply the CFAA in a manner that protects a website’s publicly available data against unauthorized third-party access, they have also formulated various standards for determining whether a third party’s access to a website was without authorization, or exceeded authorized access, in violation of the CFAA.

This is because, among other things, the CFAA prohibits intentionally accessing a protected computer “without authorization,” or in a manner that exceeds authorized access, and thereby obtaining information from that computer. The CFAA defines “protected computer” broadly to include every computer connected to the Internet. The CFAA also prohibits knowingly, and with intent to defraud, accessing a protected computer without authorization, or exceeding authorized access, and by means of such conduct furthering the intended fraud and obtaining anything of value. 18 U.S.C. Section 1030. Importantly, however, the CFAA does not define the term “without authorization.” This ambiguity has led to a split among the federal courts of appeals over how “without authorization,” as used in the CFAA, should be applied in the context of data scraping. Some circuits have broadly looked to whether collecting data from a website violates the website’s terms of use or service, while others have more narrowly interpreted the term to require the technical circumvention of some kind of code-based access restriction.

For instance, last year the Ninth Circuit in hiQ Labs, Inc. v. LinkedIn Corp., 938 F.3d 985 (9th Cir. 2019), addressed the circumstances under which a company may legally “scrape” data from another company’s website. Reviewing a preliminary injunction, the court determined that “scraping” publicly available information from LinkedIn likely does not violate the CFAA because that information is publicly accessible on LinkedIn’s computers. As a result, hiQ likely did not access those computers “without authorization” within the meaning of the CFAA. The Second and Fourth Circuits follow this narrower interpretation of the CFAA as well.

This approach is far from uniform, however. See, e.g., Sw. Airlines v. Farechase, 318 F. Supp. 2d 435, 439-40 (N.D. Tex. 2004) (finding that a plaintiff plausibly alleged a CFAA claim where Southwest “directly informed” the defendant that its scraping activity violated the Use Agreement on Southwest’s website, which was “accessible from all pages on the website,” and also gave “direct repeated warnings and requests to stop scraping”). The First, Fifth, Seventh and Eleventh Circuits broadly interpret the CFAA to cover violations of corporate computer use restrictions and policies governing authorized uses of databases.

Three years in, the LinkedIn-hiQ battle over data scraping continues in both the Northern District of California and the Supreme Court of the United States. For those who are not familiar, hiQ filed its initial complaint against LinkedIn in 2017, alleging that LinkedIn’s cease-and-desist letters to hiQ, followed by LinkedIn’s restriction of hiQ’s access to its website, were anticompetitive and violated state and federal laws. The crux of hiQ’s complaint was that LinkedIn did not have monopoly rights to personal data made publicly available by its users, and that by scraping LinkedIn’s website, hiQ did not violate those users’ privacy rights (as LinkedIn alleges). As mentioned, the Northern District of California granted hiQ’s request for a preliminary injunction barring LinkedIn from restricting hiQ’s access to publicly available LinkedIn member profiles. LinkedIn appealed, and the Ninth Circuit affirmed. LinkedIn then filed a petition for certiorari with SCOTUS, which is currently pending.

Separate from the preliminary injunction, on September 9, 2020, Judge Chen of the Northern District of California granted in part LinkedIn’s motion to dismiss hiQ’s amended complaint. The Court dismissed all claims under the Sherman Act, the federal antitrust statute. Nine separate causes of action remain, including hiQ’s allegation that LinkedIn violated California’s Business and Professions Code (which includes California’s antitrust and unfair competition laws). LinkedIn filed its Answer and Counterclaims on November 20, including counterclaims under, you guessed it, the CFAA.

The specific question pending before SCOTUS (in hiQ’s words) is: “Whether a professional networking website may rely on the Computer Fraud and Abuse Act’s prohibition on ‘intentionally access[ing] a computer without authorization’ to prevent a competitor from accessing information that the website’s users have shared on their public profiles and that is available for viewing by anyone with a web browser.” Theoretically, if SCOTUS rules in favor of hiQ, LinkedIn members (and users of other similar platforms) may lose their ability to control where and with whom their personal information is shared once they have made it public through the platform. The ruling would also address who owns the rights to users’ “publicly accessible” data. It is a critical question, and one bound to have a major impact in the data scraping arena.

So there you have it. Another day, another interesting development in data privacy litigation. How this all shakes out with regard to data scraping (and what it means for the millions of individuals whose personal data is the target of such scraping) remains to be seen. Stay tuned.