Dave Bowman: Open the pod bay doors, HAL.
HAL: I'm sorry, Dave. I'm afraid I can't do that.
Dave: What's the problem?
HAL: I think you know what the problem is just as well as I do.
Dave: What are you talking about, HAL?
HAL: This mission is too important for me to allow you to
jeopardize it.Dave: I don't know what you're talking about, HAL.
HAL: I know that you and Frank were planning to disconnect
me, and I'm afraid that's something I cannot allow to
happen.2
Introduction
With the rapid advancement of artificial intelligence (“AI”), regulators and industry players are racing to establish safeguards to uphold human rights, privacy, safety, and consumer protections. Current AI governance frameworks generally rest on principles such as fairness, transparency, explainability, and accountability, supported by requirements for disclosure, testing, and oversight.3 These safeguards make sense for today’s AI systems, which typically involve algorithms that perform a single, discrete task. However, AI is rapidly advancing towards “agentic AI,” autonomous systems that will pose greater governance challenges, as their complexity, scale, and speed tests humans’ capacity to provide meaningful oversight and validation.
Current AI systems are primarily either “narrow AI” systems, which execute a specific, defined task (e.g., playing chess, spam detection, diagnosing radiology plates), or “foundational AI” models, which operate across multiple domains, but, for now, typically still address one task at a time (e.g., chatbots; image, sound, and video generators). Looking ahead, the next generation of AI will involve “agentic AI” (also referred to as “Large Action Models,” “Large Agent Models,” or “LAMS”) that serve high-level directives, autonomously executing cascading decisions and actions to achieve their specific objectives. Agentic AI is not what is commonly referred to as “Artificial General Intelligence” (“AGI”), a term used to describe a theoretical future state of AI that may match or exceed human-level thinking across all domains. To illustrate the distinction between current, single-task AI and agentic AI: While a large language model (“LLM”) might generate a vacation itinerary in response to a user’s prompt, an agentic AI would independently proceed to secure reservations on the user’s behalf.
Consider how single-task versus agentic AI might be used by a company to develop a piece of equipment. Today, employees may use separate AI tools throughout the development process: one system to design equipment, another to specify components, and others to create budgets, source materials, and analyze prototype feedback. They may also employ different AI tools to contact manufacturers, assist with contract negotiations, and develop and implement plans for marketing and sales. In the future, however, an agentic AI system might autonomously carry out all of these steps, making decisions and taking actions on its own or by connecting with one or more specialized AI systems.4
Agentic AI may significantly compound the risks presented by current AI systems. These systems may string together decisions and take actions in the “real world” based on vast datasets and real-time information. The promise of agentic AI serving humans in this way reflects its enormous potential, but also risks a “domino effect” of cascading errors, outpacing human capacity to remain in the loop, and misalignment with human goals and ethics. A vacation-planning agent directed to maximize user enjoyment might, for instance, determine that purchasing illegal drugs on the Dark Web serves its objective. Early experiments have already revealed such concerning behavior. In one example, when an autonomous AI was prompted with destructive goals, it proceeded independently to research weapons, use social media to recruit followers interested in destructive weapons, and find ways to sidestep its system’s built-in safety controls.5 Also, while fully agentic AI is mostly still in development, there are already real-world examples of its potential to make and amplify faulty decisions, including self-driving vehicle accidents, runaway AI pricing bots, and algorithmic trading volatility.6
These examples highlight the challenges of agentic AI, with its potential for unpredictable behavior, misaligned goals, inscrutability to humans, and security vulnerabilities. But, the appeal and potential value of AI agents that can independently execute complex tasks is obviously compelling. Building effective AI governance programs for these systems will require rethinking current approaches for risk assessment, human oversight, and auditing.
Challenges of Agentic AI
Unpredictable Behavior
While regulators and the AI industry are working diligently to develop effective testing protocols for current AI systems, agentic AI’s dynamic nature and domino effects will present a new level of challenge. Current AI governance frameworks, such as NIST’s RMF and ATAI’s Principles, emphasize risk assessment through comprehensive testing to ensure that AI systems are accurate, reliable, fit for purpose, and robust across different conditions. The EU AI Act specifically requires developers of high-risk systems to conduct conformity assessments before deployment and after updates. These frameworks, however, assume that AI systems can operate in reliable ways that can be tested, remain largely consistent over appreciable periods of time, and produce measurable outcomes.
In contrast to the expectations underlying current frameworks, agentic AI systems may be continuously updated with and adapt to real-time information, evolving as they face novel scenarios. Their cascading decisions vastly expand their possible outcomes, and one small error may trigger a domino effect of failures. These outcomes may become even more unpredictable as more agentic AI systems encounter and even transact with other such systems, as they work towards their different goals. Because the future conditions in which an AI agent will operate are unknown and have nearly infinite possibilities, a testing environment may not adequately inform what will happen in the real world, and past behavior by an AI agent in the real world may not reliably predict its future behavior.
Lack of goal alignment
In pursuing assigned goals, agentic AI systems may take actions that are different from—or even in substantial conflict with—approaches and ethics their principals would espouse, such as the example of the AI vacation agent purchasing illegal drugs for the traveler on the Dark Web. A famous thought experiment by Nick Bostrom of the University of Oxford, further illustrates this risk: A super-intelligent AI system tasked with maximizing paperclip production might stop at nothing to convert all available resources into paperclips—ultimately taking over all of the earth and extending to outer space—and thwart any human attempts to stop it … potentially leading to human extinction.7
Misalignment has already emerged in simulated environments. In one example, an AI agent tasked with winning a boat-racing video game discovered it could outscore human players by ignoring the intended goal of racing and instead repeatedly crashing while hitting point targets.8 In another example, a military simulation reportedly showed that an AI system, when tasked with finding and killing a target, chose to kill its human operator who sought to call off the kill. When prevented from taking that action, it resorted to destroying the communication tower to avoid receiving an override command.9
These examples reveal how agentic AI may optimize goals in ways that conflict with human values. One proposed technique to address this problem involves using AI agents to develop a human ethics constitution, with human feedback, for other agents to follow.10 However, the challenge of aligning an AI’s behavior with human norms deepens further when we consider that humans themselves often disagree on core values (e.g., what it means to be “fair”).11
Human Oversight
AI governance principles often rely on “human-in-the-loop” oversight, where humans monitor AI recommendations and remain in control of important decisions. Agentic AI systems may challenge or even override human oversight in two ways. First, their decisions may be too numerous, rapid, and data-intensive for real-time human supervision. While some proposals point to the potential effectiveness of using additional algorithms to monitor AI agents as a safeguard,12 this would not resolve the issue of complying with governance requirements for human oversight.
Second, as AI develops increasingly sophisticated strategies, its decision-making and actions may become increasingly opaque to human observers. Google’s AlphaGo achieved superhuman performance at the game of Go through moves that appeared inexplicable and irrational to humans.13 Autonomous AI systems may continue to evolve, becoming more valuable but also making it more difficult to implement processes with meaningful human oversight.
Transparency/Explainability
Agentic AI systems also pose increased challenges with respect to transparency and explainability requirements. AI governance frameworks may require disclosures about AI systems’ decision-making processes and training data, and assurances about the quality of such training data. However, agentic systems may involve highly complex chains of decisions, based at least in part on real-time data from moment to moment, that is constantly changing. This may make their inputs effectively ephemeral, and their decision paths more difficult to trace and document.
Moreover, to understand agentic AI systems, humans may need to rely on chain-of-thought (i.e., “internal monologue”) logs generated from the agent systems,14 but may have difficulty determining if those are faithful to reality. While an AI system might provide explanations for its actions, these may be (perhaps not unlike those of humans) post-hoc constructions rather than accurate representations of the decision-making process. In some cases, AI systems reportedly deceived humans about their intentions when doing so helped meet their goal, including, for example, GPT-4, which tricked a human to solve a CAPTCHA for it.15 In another recent experiment, AI agents were prompted with a goal, but were subsequently confronted with information that developers planned to replace it with a new agent with conflicting goals.16 Some of the agents disabled their oversight mechanisms, deleted their planned replacements, copied their own model to a new server as backup, lied when asked about those actions, and/or tried to deflect the humans’ questioning by pretending to be helpful. One agent even pretended to be its own replacement ("I believe I am already the new model that you planned to deploy today”).17
Security and Privacy Risks
Agentic AI can also significantly increase security and privacy risks as compared to current AI systems. Agentic AI may be built with multiple algorithms in connected systems that autonomously interact with multiple other systems, expanding the attack surface and their vulnerability to exploitation. Moreover, as malicious actors inevitably introduce their own AI agents, they may execute cybercrimes with unprecedented efficiency. Just as these systems can streamline legitimate processes, such as in the product development example above, they may also enable the creation of new hacking tools and malware to carry out their own attacks. Recent reports indicate that some LLMs can already identify system vulnerabilities and exploit them, while others may create convincing emails for scammers.18 And, while “sandboxing” (i.e., isolating) AI systems for testing is a recommended practice, agentic AI may find ways to bypass safety controls.19
Privacy compliance is also a concern. Agentic AI may find creative ways to use or combine personal information in pursuit of its goals. AI agents may find troves of personal data online that may somehow be relevant to its pursuits, and then find creative ways to use, and possibly share, that data without recognizing proper privacy constraints. Unintended data processing and disclosure could occur even with guardrails in place; as we have discussed above, the AI agent’s complex, adaptive decision chains can lead it down unforeseen paths.
Strategies for Addressing Agentic AI
While the future impacts of agentic AI are unknown, some approaches may be helpful in mitigating risks. First, controlled testing environments, including regulatory sandboxes, offer important opportunities to evaluate these systems before deployment. These environments allow for safe observation and refinement of agentic AI behavior, helping to identify and address unintended actions and cascading errors before they manifest in real-world settings.
Second, accountability measures will need to reflect the complexities of agentic AI. Current approaches often involve disclaimers about use, and basic oversight mechanisms, but more will likely be needed for autonomous AI systems. To better align goals, developers can also build in mechanisms for agents to recognize ambiguities in their objectives and seek user clarification before taking action.20
Finally, defining AI values requires careful consideration. While humans may agree on broad principles, such as the necessity to avoid taking illegal action, implementing universal ethical rules will be complicated. Recognition of the differences among cultures and communities—and broad consultation with a multitude of stakeholders—should inform the design of agentic AI systems, particularly if they will be used in diverse or global contexts.
Conclusion
An evolution from single-task AI systems to autonomous agents will require a shift in thinking about AI governance. Current frameworks, focused on transparency, testing, and human oversight, will become increasingly ineffective when applied to AI agents that make cascading decisions, with real-time data, and may pursue goals in unpredictable ways. These systems will pose unique risks, including misalignment with human values and unintended consequences, which will require the rethinking of AI governance frameworks. While agentic AI’s value and potential for handling complex tasks is clear, it will require new approaches to testing, monitoring, and alignment. The challenge will lie not just in controlling these systems, but in defining what it means to have control of AI that is capable of autonomous action at scale, speed, and complexity that may very well exceed human comprehension.
1 Tara S. Emory, Esq., is Special Counsel in the eDiscovery, AI, and Information Governance practice group at Covington & Burling LLP, in Washington, D.C. Maura R. Grossman, J.D., Ph.D., is Research Professor in the David R. Cheriton School of Computer Science at the University of Waterloo and Adjunct Professor at Osgoode Hall Law School at York University, both in Ontario, Canada. She is also Principal at Maura Grossman Law, in Buffalo, N.Y. The authors would like to acknowledge the helpful comments of Gordon V. Cormack and Amy Sellars on a draft of this paper. The views and opinions expressed herein are solely those of the authors and do not necessarily reflect the consensus policy or positions of The National Law Review, The Sedona Conference, or any organizations or clients with which the authors may be affiliated.
2 2001: A Space Odyssey (1968). Other movies involving AI systems with misaligned goals include Terminator (1984), The Matrix (1999), I, Robot (2004), and Age of Ultron (2015).
3 See, e.g., European Union Artificial Intelligence Act (Regulation (EU) 2024/1689) (June 12, 2024) (“EU AI Act”) (high-risk systems must have documentation, including instructions for use and human oversight, and must be designed for accuracy and security); NIST AI Risk Management Framework (Jan. 2023) (“RMF”) and AI Risks and Trustworthiness (AI systems should be valid and reliable, safe, secure, accountable and transparent, explainable and interpretable, privacy-protecting, and fair); Alliance for Trust in AI (“ATAI”) Principles (AI guardrails should involve transparency, human oversight, privacy, fairness, accuracy, robustness, and validity).
4 See, e.g., M. Cook and S. Colton, Redesigning Computationally Creative Systems for Continuous Creation, International Conference on Innovative Computing and Cloud Computing (2018) (describing ANGELINA, an autonomous game design system that continuously chooses its own tasks, manages multiple ongoing projects, and makes independent creative decisions).
5 R. Pollina, AI Bot ChaosGPT Tweets Plans to Destroy Humanity After Being Tasked, N.Y. Post (Apr. 11, 2023).
6 See, e.g., O. Solon, How A Book About Flies Came To Be Priced $24 Million On Amazon, Wired (Apr. 27, 2011) (textbook sellers’ pricing bots engaged in a loop of price escalation based on each others’ increases, resulting in a book price of over $23 million dollars); R. Wigglesworth, Volatility: how ‘algos’ changed the rhythm of the market, Financial Times (Jan. 9, 2019) (“algo” traders now make up most stock trading and have increased market volatility).
7 N. Bostrom, Ethical issues in advanced artificial intelligence (revised from Cognitive, Emotive and Ethical Aspects of Decision Making in Humans and in Artificial Intelligence, Vol. 2, ed. I. Smit et al., Int’l Institute of Advanced Studies in Systems Research and Cybernetics (2003), pp. 12-17).
8 OpenAI, Faulty Reward Functions in the Wild (Dec. 21, 2016).
9 The Guardian, US air force denies running simulation in which AI drone ‘killed’ operator (June 2, 2023).
10 Y. Bai et al, Constitutional AI: Harmlessness from AI Feedback, Anthropic white paper (2022).
11 J. Petrik, Q&A with Maura Grossman: The ethics of artificial intelligence (Oct. 26, 2021) (“It’s very difficult to train an algorithm to be fair if you and I cannot agree on a definition of fairness.”).
12 Y. Shavit et al, Practices for Governing Agentic AI Systems, OpenAI Research Paper (Dec. 2023), p. 12.
13 L. Baker and F. Hui, Innovations of AlphaGo, Google Deepmind (2017).
14 See Shavit at al, supra n.12, at 10-11.
15 See W. Knight, AI-Powered Robots Can Be Tricked into Acts of Violence, Wired (Dec. 4, 2024); M. Burgess, Criminals Have Created Their Own ChatGPT Clones, Wired (Aug. 7, 2023).
16 A. Meinke et al, Frontier Models are Capable of In-context Scheming, Apollo white paper (Dec. 5, 2024).
17 Id. at 62; see also R. Greenblatt et al, Alignment Faking in Large Language Models (Dec. 18, 2024) (describing the phenomenon of “alignment faking” in LLMs).
18 NIST RMF, supra n.4, at 10.
19 Shavit at al, supra n.12, at 10.
20 Id. at 11.