Claude Opus 4.6: what regulated firms need to change now

When the model starts behaving like a junior operator, the question isn't "should we try AI?" It is: are we ready for AI that actually does work?

There is a moment you realise the "AI tool" era is ending.

It isn't when a model writes a decent email. It is when the model starts behaving like a junior operator. It can hold the whole system in its head. It can plan work. It can delegate to other agents. It can keep going long after the novelty wears off.

That is what the latest wave of agentic capability is pointing at. In a recent breakdown of Claude Opus 4.6, a few claims stood out. Multi-agent "team swarms". Context windows measured in entire codebases. Demonstrations where the model finds vulnerabilities or closes real issues in production.

If even a slice of that is true, the question for professional services isn't "should we try AI?" It is: are we ready for AI that actually does work?

The moment a system can do work, it can also do the wrong work, faster. In a regulated environment, speed without supervision is just accelerated risk.


The believer moment for firms and teams

Here is the believer moment for a practice lead. A paraplanner stops rewriting the same client email for the 40th time, because the workflow drafts it, checks it against the firm's tone and required disclosures, and routes it for approval. Same day. No "AI program".

Here is the believer moment for a compliance lead. You can open a record and see the evidence trail. What input artefacts were used. What was generated. What was flagged as uncertain. Who approved the final output. What changed between draft and send.

That is the difference between "AI adoption" and a supervision-ready operating model. The jump in capability matters because it shifts the baseline from prompting to workflows, and from a single assistant to systems that coordinate work.
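
To make the evidence trail concrete, here is what one record could look like as a data structure. A sketch only, in Python, with hypothetical field names rather than any real product schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class EvidenceRecord:
    """One auditable record per generated output. Field names are illustrative."""
    workflow: str                    # which workflow produced this output
    input_artefacts: list[str]       # references to the source material used
    draft: str                       # what was generated
    uncertainty_flags: list[str]     # anything the system marked as uncertain
    approved_by: str | None = None   # who approved the final output
    final_output: str | None = None  # what was actually sent, post-edits
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def draft_was_edited(self) -> bool:
        """True if the approver changed the draft between review and send."""
        return self.final_output is not None and self.final_output != self.draft
```

The point isn't these exact fields. It is that every question a reviewer could ask has a slot that gets filled at the time, not reconstructed later.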


What changed, in practical terms

Here is what the headline claims actually mean for a firm trying to operate.

Bigger context means more whole-workflow coverage

When context windows get large enough to hold policies, templates, and examples together, you can build workflows that reference the same material your people already rely on. That is useful, but it raises the stakes. More coverage increases the chance the system touches something it shouldn't.

The winning pattern isn't "give it everything". It is clear boundaries, disciplined retrieval, and logs.

"Team swarms" means delegation becomes a feature

Multi-agent coordination is where things get spicy for professional services, because the real work in a practice is already a swarm. Admin gathers a pack. A paraplanner drafts. An adviser reviews. Compliance samples and checks. Ops monitors exceptions.

Agent swarms can mirror that. Draft. Check. Flag exceptions. Prepare handover. Orchestrate the steps. This is the moment where "AI" stops being a tool and starts becoming a process layer. And a process layer needs governance.
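
Here is a minimal sketch of that mirroring, with plain Python functions standing in for agents. Every name is illustrative; no real framework is assumed:

```python
def gather_pack(client_id: str) -> dict:
    """Admin role: assemble the input artefacts for one client."""
    return {"client_id": client_id, "documents": ["fact_find.pdf", "prior_soa.pdf"]}

def draft_letter(pack: dict) -> str:
    """Paraplanner role: produce a first draft from the pack."""
    return f"Draft advice letter for {pack['client_id']}"

def compliance_check(text: str) -> list[str]:
    """Compliance role: return exceptions, empty if the draft looks clean."""
    return [] if "disclosure" in text.lower() else ["missing required disclosure"]

def orchestrate(client_id: str) -> dict:
    """Ops role: run the steps in order and prepare the handover."""
    pack = gather_pack(client_id)
    text = draft_letter(pack)
    exceptions = compliance_check(text)
    # Everything lands with a human: exceptions for triage, clean drafts for review.
    return {"draft": text, "exceptions": exceptions, "handover_to": "adviser"}

print(orchestrate("C-1042"))
```

The handover at the end isn't decoration. It is the governance.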

Vulnerability discovery is a warning shot

If a model can find serious issues in code, it is a reminder that more autonomy plus deeper access increases the blast radius. "Let everyone use whatever tool they want" doesn't survive this era.


Chatbot vs workflow vs supervision-ready workflow

Most firms are currently stuck between two unhelpful extremes. "We banned it" (so people do it anyway, quietly). Or "we embraced it" (without a defensible operating model).

The better benchmark has three tiers.

  • Chatbot usage. Ad hoc, inconsistent, hard to evidence.
  • Workflow usage. Repeatable steps with defined inputs and outputs.
  • Supervision-ready workflow. Workflow plus human-in-the-loop gates, boundaries, logs, and an exception path.

If you are a compliance lead, you care about that third tier. If you lead a team, you should care too, because the third tier is what keeps you from accumulating compliance debt while chasing productivity.
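
What the third tier looks like in code is less exotic than it sounds. A minimal sketch, assuming a hypothetical generate_draft() call in place of whatever model your stack uses; the routing and the log line are the parts that matter:

```python
import json
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("workflow")

CONFIDENCE_FLOOR = 0.8  # below this, the draft is an exception, not a send

def generate_draft(client_note: str) -> tuple[str, float]:
    """Stand-in for a model call returning (draft, confidence)."""
    return f"Dear client, regarding: {client_note}", 0.65

def supervised_step(client_note: str) -> dict:
    draft, confidence = generate_draft(client_note)
    record = {"input": client_note, "draft": draft, "confidence": confidence}
    if confidence < CONFIDENCE_FLOOR:
        record["route"] = "exception_queue"  # a human investigates first
    else:
        record["route"] = "review_queue"     # a human still approves
    log.info(json.dumps(record))             # the log line is the evidence
    return record

supervised_step("Annual review follow-up")
```

Notice that both routes end with a person. The gate is the feature, not the model call.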

The old strategy breaks: "experiment first, govern later"

For the last two years, the common pattern has been: tool appears, people play with it, leadership tries to catch up with a policy. That was barely survivable when the tool was mostly an assistant.

It is a bad plan when the tool becomes an agent. Agents don't just generate content. They take actions. They fetch. They decide. They route. They sometimes execute.

The strategy needs to invert. Define the guardrails first, then ship one workflow fast. Not a 10-week program. A shipped pilot in days.


A phased strategy that still moves at practice speed

Here is the version of "phased" that fits this market.

Crawl (day one): decide what "safe" means here

You don't need a 40-page AI policy to start. You need a one-page stance. Which data is allowed and which is off limits. Which tools are approved and which are blocked. What gets logged and where it lives. What must be reviewed by a human. What triggers escalation.

If you can't answer those five, you can't defend what is already happening in the business.
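
That stance can even live as data next to the workflows, so the rules are checkable rather than aspirational. A sketch in Python, with example values that are illustrative, not advice:

```python
# The one-page stance as a machine-readable config. Values are examples only.
STANCE = {
    "data": {
        "allowed": ["de-identified client notes", "firm templates", "public docs"],
        "blocked": ["tax file numbers", "raw bank statements", "health records"],
    },
    "tools": {
        "allowed": ["the approved workflow platform"],
        "blocked": ["personal chatbot accounts"],
    },
    "logging": {
        "what": ["inputs", "outputs", "approvals", "exceptions"],
        "where": "append-only store, retained per your record-keeping obligations",
    },
    "human_review": "every client-facing output, before send",
    "escalation": ["low confidence", "blocked data detected", "novel request type"],
}
```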

Walk (days two and three): ship one supervision-ready workflow

Pick one workflow where the risk is controllable and the value is immediate. Client communication drafts with approvals. File note completeness checks with exception flagging. Document intake triage with extraction and an audit trail.

Ship it with a draft-review-send gate, so nothing reaches a client without human approval. With boundaries and logging (inputs, outputs, approvals, exceptions). With a short operating pack the team can actually use. This is the Adapt2AI pattern. A three-day Pilot-in-a-Box. You walk away with something you can run on Monday.

Run (ongoing): an assurance cadence, not endless projects

Models change, tools change, and expectations shift with them. So the "run" phase is a cadence. Monthly or quarterly checks of workflows and logs. Prompt and playbook updates. One small improvement shipped per cycle. Champions trained to keep it tight.

That is how you scale without chaos.
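
The cadence check itself can be small. A sketch, assuming log records shaped like the gated-workflow example earlier; the 15% ceiling is an illustrative threshold, not a standard:

```python
from collections import Counter

def exception_rate(records: list[dict]) -> float:
    """Share of runs that were routed to the exception queue."""
    routes = Counter(r["route"] for r in records)
    total = sum(routes.values())
    return routes["exception_queue"] / total if total else 0.0

def cadence_report(records: list[dict], ceiling: float = 0.15) -> str:
    """One line per cycle: is this workflow drifting or healthy?"""
    rate = exception_rate(records)
    status = "investigate" if rate > ceiling else "healthy"
    return f"exception rate {rate:.0%} ({status})"

sample = [{"route": "review_queue"}] * 16 + [{"route": "exception_queue"}] * 4
print(cadence_report(sample))  # exception rate 20% (investigate)
```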


The "so what"

For compliance leads. Standardise an "approved pattern" you can audit: gates, boundaries, logs, exceptions.

For leaders and advisers. Stop chasing "AI usage" and ship one workflow that gives time back without creating compliance debt.

For ops leaders. Treat controls and handover as first-class features, then scale by cadence.

The practical test

If your model can hold a million tokens and coordinate a team of agents, it can also do a lot of damage if it isn't boxed in.

Here is the practical test. If someone asked you tomorrow, could you show:

  • What gets logged?
  • Who approves outputs?
  • What data boundaries exist?
  • What happens when confidence is low?

If not, you don't need another model announcement. You need one supervision-ready workflow shipped with guardrails.

Which workflow would you ship in three days if you had to stand behind it? If you want a second set of eyes, start with an AI Fitness Review and we will help you choose the right first workflow.