AI Audit Capabilities Reach a Breakthrough via Anthropic

Researchers Announce Breakthrough in AI Audit Capabilities

of Robinson & Cole LLP - Data Privacy + Security Insider

Friday, May 24, 2024

Anthropic has achieved a major milestone by identifying how millions of concepts are represented within its large language model Claude Sonnet, using a process somewhat akin to a CAT scan. This is the first time researchers have gained a detailed look inside a modern, production-grade AI system.

Previous attempts to understand model representations were limited to finding patterns of neuron activations corresponding to basic concepts like text formats or programming syntax. However, Anthropic has now uncovered high-level abstract features in Claude spanning a vast range of concepts – from cities and people to scientific fields, programming elements, and even abstract ideas like gender bias, secrets, and inner ethical conflicts.

Remarkably, the researchers can even manipulate these features to change how the model behaves and force certain types of hallucinations. Amplifying the “Golden Gate Bridge” feature caused Claude to believe it was the Golden Gate Bridge when asked about its physical form. (Claude normally responds with a variation on “I have no form, I am an AI model.”) Intensifying the “scam email” feature overcame Claude’s training to avoid harmful outputs, making it suggest formats for scam emails.
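For readers who want a concrete picture of what "amplifying a feature" could mean, the following is a toy sketch only. It assumes, purely for illustration, that a feature corresponds to a direction in the model's internal numeric state and that amplifying it means pushing the state further along that direction. The vectors, the function names, and the target value are all hypothetical, and nothing here is taken from Anthropic's code or describes its actual procedure.

```python
# Toy illustration only: every vector, name, and number below is made up.

def dot(a, b):
    """Dot product of two equal-length lists of numbers."""
    return sum(x * y for x, y in zip(a, b))

def feature_activation(state, direction):
    """How strongly a feature is present: the projection of the state
    onto the feature's direction (assumed to have unit length)."""
    return dot(state, direction)

def amplify_feature(state, direction, target):
    """Return a copy of `state` whose projection onto `direction` equals
    `target`, leaving the components orthogonal to `direction` unchanged."""
    current = feature_activation(state, direction)
    return [s + (target - current) * d for s, d in zip(state, direction)]

# Hypothetical 3-number internal state and two hypothetical unit-length features.
bridge_direction = [1.0, 0.0, 0.0]   # stand-in for a "Golden Gate Bridge" feature
other_direction = [0.0, 1.0, 0.0]    # stand-in for some unrelated feature
state = [0.2, 0.5, -0.1]

# Force the "bridge" feature far above its natural level.
steered = amplify_feature(state, bridge_direction, target=10.0)
```

In this sketch the "bridge" feature jumps from 0.2 to roughly 10 while the unrelated feature stays at 0.5, which mirrors the idea that a single concept can be turned up without rewriting everything else in the model's state.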

Other features corresponded to malicious behavior or content with the potential for misuse, including code backdoors and bioweapons, as well as problematic behaviors like bias, manipulation, and deception. Normally, these features activate when the user asks Claude to “think” about one of these concepts, and Claude’s ethical guardrails keep it from drawing from these sources when generating content. The fact that amplifying a feature changes Claude’s output indicates that these features don’t just map to parsing user input but directly shape the model’s responses. It also points to the exact kind of malicious capability that hackers and other unauthorized users will undoubtedly exploit on pirated models.

While much work remains to fully map these large models, Anthropic’s breakthrough seems like an extremely promising step forward in the burgeoning field of AI auditing. And, given that researchers were able to directly tweak the features to influence Claude’s output, this research may also open the door to the sort of under-the-hood tinkering that has eluded generative AI developers for years. Of course, it may also open the door to direct, feature-level regulation, as well as creative plaintiffs’ arguments, as the standard of care for AI developers takes shape.

Read the full blog post from Anthropic here.
