The Future of AI in Government • NAU Public Service Academy
From Chatbots to Agentic AI
One request, handled five ways, from a plain chatbot to an autonomous agent. Agency is a sliding scale, so watch how the picture changes and where a person stays in control.
Choose a scenario
Illustrative scenarios, modeled on real government uses. Each one runs through the same five levels; the resident 311 request is the default.
One resident, three requests at once
Before we climb the spectrum, here is where the work starts.
City service request form
The agent's goal
What a plain chatbot does (level 1)
Start, finish, and the human • AI reasoning
Tap any node to see what it does, in plain language and in n8n. The gold marker is one request making the trip.
Where would it land in your organization?
There is no single right level. It depends on your task, your data, and your tolerance for risk. A few questions to take back to your office:
Michael's rule from the first half, “treat AI as a structured drafting engine, not a knowledge source,” was about a chatbot. Does it still hold at Level 4?
For a given task, what is the lowest level that would do the job? What would justify climbing higher?
When a vendor's material says “agentic” or “AI-powered,” what does the tool actually do on its own? Which level would you place it at?
For each use, where should the human stay? Is the AI referencing your own data, and do you know which sources?
The policy question to take home: Do you have an AI use policy? Does it say anything about agentic AI, that is, the levels where the AI can act on its own?
And your vendors: much of government AI arrives through procurement rather than in-house builds. Where does a vendor's tool sit on this spectrum, who is accountable for what it does, and does your AI policy cover third-party AI at all?
Where do your current and planned AI uses sit on this spectrum?
What are the benefits and the limitations at the level you are weighing, and who carries the risk if it is wrong?
The level frame itself has a research lineage: a 2000 model by Parasuraman, Sheridan, and Wickens described automation applied across a continuum of levels, from fully manual to fully automatic, and asked which functions should be automated and to what extent.
A question to start with: what is one small, low-risk task you could pilot, and what would you want in place (the data, a human check, a log) before you did?
Working with vendors, in practice: have the vendor complete an AI fact sheet (what it does, what data it uses, its limits, what human oversight it has) and put accountability, audit rights, data retention, and records access in the contract. The GovAI Coalition publishes free templates for this. One caveat: those templates cover general AI governance, not agentic AI specifically, so add the agentic questions yourself. Ask where on this spectrum the tool sits, what it can do without a person, and what actions it may take on its own.
On public records: in Arizona, a public record is one made or received in connection with public business, in any form, so AI prompts, outputs, and logs tied to official work can be disclosable. The contract should make clear who holds those records and who answers a request.
Where is the human?
Human oversight is its own spectrum. For decisions affecting rights or safety, the Office of Management and Budget (OMB) requires human oversight, intervention, and accountability: a person in or on the loop.
Human in the loop
The system stops and waits for a person to approve before it acts. An approval gate.
Example: an AI flags a possible tumor; a radiologist makes the final call.
Fits when the decision is high-impact or high-stakes. Levels 1 to 4 live here.
Human on the loop
The AI runs on its own; a person monitors and can override or hit the kill switch.
Example: supervised driving, or an escalation queue that flags edge cases for review.
Fits when it is routine work at scale, with anomalies escalated. Level 5 and scaled automation.
Human out of the loop
Full autonomy, no human. The AI senses, decides, and acts alone.
Example: high-frequency trading, too fast for a person to intervene.
For government: not appropriate for anything affecting rights or safety.
The government rule: for high-impact AI (output that is a principal basis for decisions with a legal, material, binding, or significant effect on rights or safety, per OMB), keep a human in or on the loop. That means approval for high-impact decisions, continuous monitoring, and clear override and rollback paths. Human intervention is part of the architecture, not a safety net added later.
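To make the difference between "in the loop" and "on the loop" concrete, here is a minimal Python sketch of an approval gate. It is an illustration only, not an OMB-specified design and not how any particular product works. The names `Gate`, `ProposedAction`, and `high_impact` are hypothetical, and the `approve` callback stands in for a real person reviewing the request.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class Oversight(Enum):
    IN_THE_LOOP = "in"  # stop and wait for a person to approve before acting
    ON_THE_LOOP = "on"  # act, but record the action so a person can monitor it


@dataclass
class ProposedAction:
    description: str
    high_impact: bool  # hypothetical flag: does this affect rights or safety?


@dataclass
class Gate:
    # Assumption: `approve` represents a human reviewer's yes/no decision.
    approve: Callable[[ProposedAction], bool]
    log: List[str] = field(default_factory=list)

    def mode_for(self, action: ProposedAction) -> Oversight:
        # High-impact actions get an approval gate; routine ones are monitored.
        if action.high_impact:
            return Oversight.IN_THE_LOOP
        return Oversight.ON_THE_LOOP

    def run(self, action: ProposedAction,
            execute: Callable[[ProposedAction], None]) -> bool:
        mode = self.mode_for(action)
        if mode is Oversight.IN_THE_LOOP and not self.approve(action):
            self.log.append(f"BLOCKED {action.description}")
            return False
        execute(action)
        # Every action is logged, which is what makes monitoring possible.
        self.log.append(f"DONE[{mode.value}] {action.description}")
        return True
```

In this sketch the gate sits in the execution path itself: a high-impact action cannot run unless the reviewer says yes, while a routine action runs and leaves a log entry for whoever is monitoring. There is no "out of the loop" branch, by design.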
The agent loop: ReAct, plus reflection
We call this the agent loop: ReAct, plus reflection. ReAct (reason and act), from a 2022 paper, is the most common pattern: reason, act, observe, repeat. We add reflection, the agent reviewing and improving its own work, which Andrew Ng lists as one of the four agentic design patterns (with tool use, planning, and multi-agent collaboration).
Other names you may hear for the same idea: IBM calls it Think, Act, Observe; some borrow the military OODA loop (Observe, Orient, Decide, Act). Related but distinct patterns include Plan-and-Execute (plan all the steps up front) and Reflexion (the agent stores written self-critiques in memory to improve next time). The exact steps and order vary by source; we use the most common, source-backed wording.
Reason / Plan: decide next step
Act: call a tool
Observe: read result
Reflect: check and improve
The gold arc returns to the start. Reason, act, observe, reflect, repeat: that return is the loop.
Starting: The loop is running.
What the agent knows so far
Looping
Each pass adds a line here as the agent makes progress.
Goal progress
Iteration 1 of up to 5 • stops early when the goal is met
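The reason, act, observe, reflect cycle with a cap of five passes can be sketched in a few lines of Python. This is a toy under stated assumptions, not the n8n workflow behind the page and not the ReAct paper's implementation. There is no language model here: the "tools" are stub functions, and the request names (`pothole`, `missed pickup`, `streetlight`) and the `run_agent` function are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

MAX_ITERATIONS = 5  # mirrors "up to 5" in the walkthrough


def run_agent(
    requests: List[str],
    tools: Dict[str, Callable[[str], str]],
    max_iterations: int = MAX_ITERATIONS,
) -> Tuple[List[str], List[str]]:
    notes: List[str] = []      # "what the agent knows so far"
    remaining = list(requests)
    for iteration in range(1, max_iterations + 1):
        if not remaining:      # stop early when the goal is met
            break
        request = remaining[0]             # Reason / Plan: decide next step
        result = tools[request](request)   # Act: call a tool
        notes.append(f"pass {iteration}: {request} -> {result}")  # Observe
        if result == "ok":                 # Reflect: did that step work?
            remaining.pop(0)
        # Otherwise the request stays first in line and is retried next pass.
    return notes, remaining    # anything left over goes to a person


# Hypothetical stub tools; the second one fails once, then succeeds.
attempts = {"count": 0}


def flaky_ticket(request: str) -> str:
    attempts["count"] += 1
    return "ok" if attempts["count"] > 1 else "error"


tools = {
    "pothole": lambda r: "ok",
    "missed pickup": flaky_ticket,
    "streetlight": lambda r: "ok",
}
notes, remaining = run_agent(["pothole", "missed pickup", "streetlight"], tools)
```

The cap is the point of the design: even if a tool never succeeds, the loop ends after five passes and returns whatever is unfinished, so a stuck agent hands the work back instead of running indefinitely.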