Analysis
Federal Agencies Largely Miss the Mark on Documenting AI Compliance Plans as Required by AI Executive Order
November 21, 2024
As of September 23, 2024, only about half of U.S. federal agencies had complied with the requirement in the March 2024 OMB Memo (“M-24-10”) that agencies publish compliance plans consistent with the memo, which implements requirements under the 2023 Executive Order (“EO 14110”) on Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence.[1] Beyond the roughly half of agencies that failed to respond, a review of the published compliance plans reveals that they leave much to be desired in terms of transparency about the AI systems agencies use and those systems’ potential to impact the rights and safety of people in the U.S.
Background: AI Executive Order and OMB Memo Requirements
The U.S. federal government is the single biggest procurer of AI technology in the U.S. In 2023, the federal government purchased more than $100 billion in IT products and services. Government uses of AI can touch broad and high-risk portions of people’s lives, including benefits provision, climate change, disaster relief, law enforcement, immigration and deportations, healthcare, and education. This makes federal agencies’ use of AI a high-risk activity that must be carefully scrutinized.
While proponents argue that AI can fuel innovation and increase efficiency, AI systems are often criticized for their inaccuracies and for creating more potential harms than benefits. For example, AI models often embed existing biases and societal inequities in their outputs, creating discriminatory and unjust outcomes. AI systems can also create a perception that their outputs are more “objective” than a human’s, leading to outsized trust in those outputs, obscuring biases and errors, and undermining individuals’ civil rights and access to government services. Furthermore, even if the technology were accurate, government use of high-risk AI systems like facial recognition technology threatens indiscriminate and ongoing privacy violations against millions of people. AI tools can only increase efficiency without sacrificing the rights and safety of people in the U.S. if they are governed by proper accountability mechanisms for accuracy, transparency, and civil rights protections. In an attempt to address these risks, the Biden administration released the 2023 Executive Order (“EO 14110”) on Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence.
The OMB released its memo on AI governance, innovation, and risk management within federal agencies in March 2024 (“M-24-10”) as required under EO 14110. Importantly, M-24-10 includes tangible requirements for agencies using AI technologies, rather than mere guidelines. EPIC has engaged with and contributed to the OMB throughout M-24-10’s development. In December 2023, EPIC submitted comments on the draft OMB guidance, commending the OMB for requiring pre-deployment impact assessments, ongoing and independent AI testing, public AI use case inventories, and heightened risk management requirements for safety- and rights-impacting AI systems.
One key set of requirements in M-24-10 involves assessing AI use cases for their impact. First, M-24-10 requires each agency’s AI use case inventory to be made public unless a use case falls under specific exceptions. Second, M-24-10 requires agencies to determine whether each of their AI use cases is rights- or safety-impacting. Rights-impacting AI use cases are those whose output serves as a principal basis for a decision that significantly impacts an individual’s civil rights, privacy, equal opportunities, or access to government services.[2] Safety-impacting AI use cases are those whose outputs serve as a principal basis for a decision significantly impacting the safety of human life or well-being, the environment, or critical infrastructure.[3] M-24-10 lists AI use cases that are presumed to be rights- or safety-impacting. For example, AI used in medicine, elections, law enforcement, or the provision of medical diagnoses would be safety-impacting.[4] AI use cases involving speech tracking or censoring, recidivism predictions, housing, employment, credit decisions, or education could be rights-impacting.[5] Third, for AI use cases determined to be rights- or safety-impacting, the agency must implement minimum risk management practices to mitigate harm. Finally, M-24-10 requires agencies to terminate non-compliant AI use cases.
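The workflow M-24-10 prescribes can be thought of as a simple triage procedure. Below is a minimal, hypothetical sketch of that logic in Python; the category lists, class fields, and helper names are illustrative stand-ins, not a reproduction of the memo’s full presumption lists in Appendix I.

```python
from dataclasses import dataclass

# Illustrative (non-exhaustive) stand-ins for the presumption lists in M-24-10 Appendix I.
SAFETY_PRESUMED = {"medical diagnosis", "election administration", "law enforcement operations"}
RIGHTS_PRESUMED = {"recidivism prediction", "housing", "employment", "credit decisions", "education"}


@dataclass
class AIUseCase:
    name: str
    purpose: str                       # short label for what the system is used for
    in_public_inventory: bool = True   # inventories must be public unless an exception applies
    minimum_practices_met: bool = False


def triage(use_case: AIUseCase) -> str:
    """Hypothetical sketch of the M-24-10 workflow: classify a use case as rights- or
    safety-impacting, then require minimum risk management practices or termination."""
    rights_impacting = use_case.purpose in RIGHTS_PRESUMED
    safety_impacting = use_case.purpose in SAFETY_PRESUMED
    if not (rights_impacting or safety_impacting):
        return "not presumed rights- or safety-impacting; general governance still applies"
    if use_case.minimum_practices_met:
        return "rights-/safety-impacting: may continue with minimum risk management practices"
    return "rights-/safety-impacting and non-compliant: use must be terminated"


# Example: a hypothetical tenant-screening tool would be presumed rights-impacting and,
# absent the minimum risk management practices, would have to be terminated.
print(triage(AIUseCase(name="Hypothetical tenant-screening tool", purpose="housing")))
```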
By September 23, 2024, agencies were required to submit to the OMB, and publicly release, either a plan to achieve consistency with the OMB memo or a written determination that the agency does not use and does not anticipate using AI.[6] By December 1, 2024, agencies must publish their AI use case inventories and ensure that any contracts associated with rights- or safety-impacting AI systems are brought into compliance with the requirements of M-24-10 or terminated.[7]
Compliance Plans Analysis: One Step Forward, Four Steps Back
We identified five high-level themes in our assessment of 31 agency compliance plans.[8] Only one theme is positive: the wide adoption of the NIST AI Risk Management Framework. The other four explain why agencies largely failed to meet the goals of AI use transparency and accountability in their published compliance plans: 1) high-level and vague compliance plans divorced from real use cases; 2) unhelpful use case inventory descriptions; 3) reliance on links to external and non-public policies, which undermines harmonized AI policy; and 4) encouraging AI development for AI’s sake.
Wide Adoption of NIST AI Risk Management Framework
Many agencies, such as the Federal Reserve Board, Housing and Urban Development, the State Department, and the Department of Energy, highlight incorporating the NIST AI Risk Management Framework (“NIST Framework”) as part of their efforts to comply with the executive order and the OMB memo. While compliance with the NIST Framework is voluntary, its widespread adoption is a positive trend toward harmonizing AI risk management, transparency, and accountability across federal agencies. Further, federal agency adoption of the NIST Framework is an indication that the agencies find it useful as a guideline for compliance. Federal government-wide adoption of the NIST Framework would not only set the baseline for a consistent approach to AI risk evaluation and management but could also lead to shared learning and innovation around AI risk management.
High-level Compliance Plans Divorced from Real-world AI Applications
The most pervasive and problematic theme across the compliance plans is their abstract and high-level nature. Some compliance plans are essentially regurgitations of the M-24-10 requirements, with no specifics on how the requirements will be applied and addressed within the agency’s processes.
For example, M-24-10 asks each agency to “[d]escribe any barriers to the responsible use of AI that your agency has identified, as well as any steps your agency has taken (or plans to take) to mitigate or remove these identified barriers.” The Securities and Exchange Commission’s compliance plan states “[t]he SEC plans to establish a working group that will be responsible for identifying any barriers to the responsible use of AI, including with respect to IT infrastructure, data practices, and cybersecurity processes.” Thus, the SEC does not yet have a working group that will assess such barriers, much less a plan to mitigate or remove any identified barriers.
Another example of an agency with vague plans is the Department of Energy (“DOE”). On how the DOE will implement the minimum risk management practices for AI use cases deemed to be rights- or safety-impacting, the plan states that “[t]he DOE Rights-and Safety-Impacting AI Working Group will facilitate the implementation of minimum risk management practices. DOE will evaluate use cases collected during the annual AI use case inventory data call and use cases self-reported throughout the year to determine which use cases meet criteria as rights or safety impacting and confirm that the minimum risk management practices are implemented.” This response only repeats the requirement in the OMB M-24-10 — to determine if a use case is rights- or safety-impacting and implement minimum risk management practices. It provides absolutely no concrete steps on how it will achieve these requirements, what the evaluation criteria will be, or what steps will be taken if compliance is not met.
Though some agencies went slightly beyond the DOE and the SEC by pointing to specific working groups within the agency that will oversee such processes, there was often little specificity beyond naming the working groups. The vague language utterly defeats the purpose of M-24-10 — to foster meaningful evaluation of AI use cases and transparency with the public.
High-level, unspecific plans fail to provide any meaningful details of compliance that would demonstrate transparency or allow meaningful accountability tied to their real-world AI use cases. Agencies often failed to list or link to the types of AI systems they use, requiring anyone looking for answers to do independent research to understand the context and potential impact of AI use by each agency.
One example is the Department of Veterans Affairs (“VA”). The VA’s compliance plan states that the VA has the largest integrated healthcare system in the country, holds the largest genomic knowledge base in the world, and trains the largest number of nurses and doctors in the U.S. This broad statement does not explain how the VA might impact individual veterans through specific AI use cases. The VA’s most recent AI use case inventory (not linked in the VA’s compliance plan and discovered only through independent research) reveals use cases that raise serious privacy concerns. One entry, named “Digital Mental Health Tracking App (Behavidence),” is described as a mobile mental health tracking app that runs passively in the background of veterans’ phones, monitoring the use of various mobile apps, including length of time and frequency. Not explained in the use case inventory, but mentioned in a 2021 news release from the VA, is that these data points are compared to those of other users with known conditions such as ADHD, depression, anxiety, and stress, and may potentially be used to diagnose the user. This technology raises questions about veterans’ consent to being monitored, the VA’s uses of such intimate data, any third-party data sharing, error or bias rates when monitoring complex mental health conditions like depression, and the potential consequences if an individual is flagged as at risk of depression or suicide. The VA’s compliance plan does not hint at such concerning use cases impacting veterans, and the AI use case inventory also discloses limited information. This leads EPIC to conclude that the VA is either unaware that this use case clearly falls into a rights-impacting categorization or is willfully hiding pertinent information in its submission.
Another example involves the Social Security Administration (“SSA”). The SSA’s compliance plan keeps the discussion high-level and mostly repeats the requirements in M-24-10. The plan fails to link to or discuss the SSA’s AI use cases, requiring a reviewer to research whether there is a published use case inventory. One use case from the SSA’s most recent inventory is called “Insight,” a natural language processing and AI-based tool used by hearings- and appeals-level Disability Program adjudicators to analyze the text of disability decisions and other case data and alert adjudicators to potential issues. A report by the SSA’s Office of the Inspector General (also not linked in the compliance plan) paints a different, more concerning picture of how this AI system can impact individual disability insurance adjudications at the SSA. The report states that “Office of Appellate Operations (‘OAO’) analysts use Insight to analyze hearing decisions, and make recommendations to OAO adjudicators to affirm, modify, reverse, or remand ALJ hearing decisions,” meaning that its output can impact disability benefit decisions. In the same report, 20% of survey respondents, made up of SSA staff who used Insight, reported accuracy issues with Insight’s flags, and it is unclear whether these issues have been fixed. Because this use case impacts an individual’s access to government benefits, it would fall under the definition of rights-impacting AI. Nothing in the SSA’s compliance plan suggests it uses AI in ways that can impact an individual applying for disability insurance, leading EPIC to assume that the SSA either does not consider this use case to be rights-impacting or is obscuring concerning AI use cases from the general public by keeping the compliance plan vague.
While the SSA and the VA are specifically discussed here, almost none of the compliance plans include specific AI use cases. The compliance plans are thus divorced from the on-the-ground implications of AI systems and lacking in detail, offering only broad and generic language about compliance. This inhibits understanding of the plans’ implications without deeper research and goes directly against EO 14110’s goals of encouraging transparency and accountability in developing safe and trustworthy AI.
Existing AI Use Case Inventory Descriptions Lack Meaningful Details
As mentioned above, some agencies have AI use case inventories from previous years stemming from use case inventory obligations under a Trump-era executive order (EO 13960) focused on AI. In 2023, the Government Accountability Office found that agencies’ inventories were incomplete and contained inaccuracies. Today, the use case inventories still obscure a full picture of AI use cases, with descriptions focusing on the technology itself rather than its overall purpose, context of deployment, and who interacts with it and how.
For example, the Treasury Department’s inventory includes an “Account Management Chatbot” with the following description: “The Accounts Management Chatbot leverages a natural language understanding model within the eGain intent engine. This NLU maps utterances to a specific intent, and returns the appropriate knowledge article.” This description does not identify the chatbot’s intended users, explain what the eGain intent engine is, say how, when, or where the chatbot is accessed, or describe what oversight mechanisms exist.
In another example, the Department of State’s inventory has a use case titled “SentiBERTIQ” with the description: “GEC A&R uses deep contextual AI of text to identify and extract subjective information within the source material. This sentiment model was trained by fine-tuning a multilingual, BERT model leveraging word embeddings across 2.2 million labeled tweets spanning English, Spanish, Arabic, and traditional and simplified Chinese. The tool will assign a sentiment to each text document and output a CSV containing the sentiment and confidence interval for user review.” This description says nothing about how, where, or by whom the tool might be used, nor why the State Department is using technology that has been repeatedly demonstrated not to work and that raises significant civil rights concerns. A useful description would explain why tweets in various languages are being labeled with sentiments, whose tweets are labeled, and what actions the State Department takes after labeling tweets with certain “sentiments.”
During the draft phase of OMB M-24-10, EPIC suggested that, for effective oversight and accountability, agencies be required to disclose at least the following information in the AI Use Case Inventory:
- The name of the AI system
- The intended uses and limitations of the AI system
- All required data inputs for the AI system
- A determination on whether the AI system is generative AI
- Documentation and results of all testing and evaluation procedures completed since the previous submission
- Any completed incident reports relevant to the AI system
- Documentation on AI system development or procurement, including information on related vendors, sources of data, and relevant contracts
The current, lackluster state of AI use case inventory descriptions shows that explicitly requiring the above information would go a long way toward transparency and accountability. Agencies should proactively include this information in the use case inventories they are required to publish before December 1, 2024, and the OMB should reconsider requiring it from agencies.
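To make these recommendations concrete, a structured inventory entry containing EPIC’s suggested fields might look something like the sketch below. This is purely illustrative: the field names are ours, not OMB’s, and the sample values are invented placeholders rather than any agency’s actual record.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AIUseCaseInventoryEntry:
    """Hypothetical record capturing the minimum disclosures EPIC proposed."""
    system_name: str
    intended_uses: List[str]
    known_limitations: List[str]
    required_data_inputs: List[str]
    is_generative_ai: bool
    testing_and_evaluation_results: List[str]    # documentation and results since the previous submission
    incident_reports: List[str]                  # completed incident reports relevant to the system
    development_or_procurement_docs: List[str]   # vendors, data sources, and relevant contracts


# Illustrative entry with placeholder values (not a real agency record).
example_entry = AIUseCaseInventoryEntry(
    system_name="Example benefits triage assistant",
    intended_uses=["flag potential issues in draft benefit decisions for human review"],
    known_limitations=["not validated on languages other than English"],
    required_data_inputs=["draft decision text", "structured case metadata"],
    is_generative_ai=False,
    testing_and_evaluation_results=["link to most recent accuracy evaluation report"],
    incident_reports=[],
    development_or_procurement_docs=["vendor contract reference", "description of training data sources"],
)
```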
Links and References to External Policies Undermining Harmonized AI Policy
While the agency compliance plans were meant to be a clear and transparent way to document agency compliance measures, some submitted plans were instead a patchwork of multiple links to other policies or references to internal, non-public policies. This makes these compliance plans difficult or impossible to understand in their entirety because not all relevant documents are available to the reviewer.
For example, the General Services Administration’s (“GSA”) compliance plan references the June 2024 release of the directive “Use of Artificial Intelligence at GSA,” which purportedly aligns with M-24-10. The directive itself is 31 pages long, and the GSA also states that it developed internal guidance on the use of generative AI. To understand the entirety of the GSA’s compliance plan, one would have to read through both of those policies.
In another example, the Department of Homeland Security’s (“DHS”) compliance plan references its Artificial Intelligence Roadmap (24 pages), Policy Statement 139-06: Acquisition and Use of AI and ML by DHS Components (7 pages), Directive 026-11 on the Use of Face Recognition and Face Capture Technologies (7 pages), and DHS Policy 139-07 on the Use of Commercial Generative AI (GenAI) Tools (3 pages), for a grand total of 41 pages of supplemental reading on top of the 13-page compliance plan needed to understand its submission.
Coupled with the vague and high-level nature of many of the submitted compliance plans, having to track down additional policies and directives to understand concrete compliance plans is directly counter to transparency goals. While M-24-10 itself asks agencies how they have worked to update internal policies, guidelines, or principles to meet its requirements, few of the responses actually provide these specifics. Merely linking to a document or stating that an agency has updated an internal policy leaves the agency’s policy just as opaque, providing none of the transparency required for meaningful protections, oversight, or evaluation. To comply with the Executive Order and the OMB memo, which focus on increased accountability and transparency, especially around AI use cases that may impact individual rights or safety, agencies must set out the concrete steps they will take.
Overemphasis on AI Development Talent at the Cost of AI Oversight Capabilities
As EPIC previously pointed out, agencies should not adopt AI technologies purely for AI’s sake. Unfortunately, many agencies are over-emphasizing hiring new talent to develop AI technologies and under-emphasizing the need for qualified employees who can implement governance and accountability measures for AI use cases.
For example, the AI talent section of the Department of Transportation’s compliance plan focuses only on acquiring and retaining employees for the development and deployment of AI solutions, completely ignoring the need for qualified employees to effectively regulate and evaluate AI systems. In another example, NASA’s compliance plan states, “NASA is taking steps to expand the use of artificial intelligence and machine learning to amplify productivity and increase capabilities.” NASA’s AI talent section highlights its deeply technical personnel and opportunities for existing employees to upskill within AI, but again omits skill building and hiring for AI governance, risk management, or assessing AI systems for their impact on safety and rights.
Skills for developing and deploying AI and skills for governing AI and evaluating AI systems for their potential impact on rights and safety are separate qualifications; having the former does not necessarily entail having the latter. Agencies must invest in both categories of skills in their personnel to comply with M-24-10 and EO 14110.
Conclusion
While the agency compliance plans and AI use case inventories were meant to move agency AI use toward transparency and accountability, the problematic trends identified in the submitted documents undercut those goals. Agencies must make their AI use case inventories useful, implement coherent AI governance policies that the public can understand, and invest in qualified talent to oversee AI governance, in addition to talent for AI development, if they are to achieve the goals of safe, secure, and trustworthy AI.
[1] We found that 31 agencies posted compliance plans out of 50 federal agencies.
[2] M-24-10 Section 6.
[3] M-24-10 Section 6.
[4] For the full list, see M-24-10 Appendix I.1.
[5] For the full list, see M-24-10 Appendix I.2.
[6] M-24-10 Section 3(a)(iii).
[7] M-24-18 at Section 2(c)(ii).
[8] Here is a list of the compliance plans that were readily found and assessed for this blog post: Department of Agriculture, Department of Defense, Department of Energy, Department of Health and Human Services, Department of Homeland Security, Department of Housing and Urban Development, Department of the Interior, Department of Labor, Department of State, Department of Transportation, Department of the Treasury, Department of Veterans Affairs, Environmental Protection Agency, General Services Administration, NASA, Nuclear Regulatory Commission, Office of Personnel Management, Social Security Administration, U.S. Agency for International Development, Equal Employment Opportunity Commission, U.S. Trade and Development Agency, Securities and Exchange Commission, Export-Import Bank of the U.S., National Science Foundation, National Archives and Records Administration, Federal Retirement Thrift Investment Board, Federal Housing Finance Agency, Federal Reserve Board, Department of Justice, Department of Education, Department of Commerce.