What data can this system touch?

The EU AI Act, NIST's AI Risk Management Framework, and ISO/IEC 42001 agree on almost nothing. Not scope, not structure, not even whether they're law or voluntary guidance. But all three lean on one thing they assume you've already solved: that you can say what data your AI is allowed to touch, and on what basis. In most regulated tenants you can't, and the instinct to fix that by locking everything down is the one answer that looks compliant and isn't.

What the frameworks actually ask

The EU AI Act's one concrete data-governance obligation is Article 10, and it's narrower than it's usually quoted as. It binds high-risk AI systems, and it governs the training, validation, and testing datasets that shape them: their relevance, representativeness, sources, preparation, and gaps (EU AI Act, Article 10). A Microsoft 365 Copilot deployment, or an agent reading documents in your tenant, usually isn't that shape: you aren't training a high-risk system on your corpus, you're letting a system read it at runtime. So Article 10 doesn't map literally. What carries over is the expectation behind it: that whoever runs the system can characterise and defend the data it depends on.

NIST is broader and voluntary. Its framework is organised into four functions: govern, map, measure, and manage. Govern cuts across the others, and the map function is where you establish a system's context and what it draws on, before you try to measure or manage any risk (NIST AI RMF Core). You cannot map what a system touches if you can't see it in the first place.

ISO/IEC 42001:2023, the AI-management-system standard, runs on the familiar management-system clauses (4 through 10) plus a set of Annex A controls — one group of which is specifically about the data an AI system uses (ISO/IEC 42001:2023). It wants controls you can demonstrate, not intentions you can describe.

The common thread

Three different instruments, three different shapes. The thread isn't a shared clause — it's a shared precondition: to govern an AI system, you must be able to say what data it can reach, and on what basis.

Why that's an architecture question, not a policy one

The frameworks state the expectation. None of them tells you where the answer lives. On the Microsoft stack it lives in two places:

The label-and-rights plane. Copilot reads only what the signed-in user can already open, and where a label applies encryption, that user must hold the EXTRACT and VIEW usage rights before the AI can process the content at all (Microsoft Learn). So for an interactive AI, "what can this system read?" is mechanically the union of what its users can read.
The agent-identity plane. The moment an agent acts on its own identity instead of a borrowed one, that answer no longer comes from a person: the agent gets its own Microsoft Entra Agent ID, carries a human sponsor recorded as accountable for it, and reaches resources through scoped, time-bound tokens (Microsoft Learn).

Put the two together and the precondition has a concrete answer inside your tenant. What data can this system touch? — which identities, human and agent, hold which usage rights over which labelled content. On what basis? — the label's grant for the data, the identity and scopes for the actor. You can point at it. You can audit it. That is a governed answer.
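That answer can be written down as data. The sketch below is a minimal Python model of the rule described above, under stated assumptions: the Grant and Document records, the identity names, and the label names are all hypothetical, and nothing here calls a real Microsoft Purview or Entra API. It encodes only this: an AI acting as an identity can process a document when that identity can already open it and, where the label applies encryption, holds both the VIEW and EXTRACT usage rights.

```python
from dataclasses import dataclass

# Hypothetical in-memory model; not a Microsoft Purview or Entra API.
REQUIRED_RIGHTS = frozenset({"VIEW", "EXTRACT"})


@dataclass(frozen=True)
class Grant:
    identity: str      # user or agent identifier (illustrative)
    label: str         # sensitivity label name (illustrative)
    rights: frozenset  # usage rights the label gives this identity
    basis: str         # why the grant exists


@dataclass(frozen=True)
class Document:
    name: str
    label: str
    readers: frozenset  # identities that can already open the file


def ai_can_process(identity, doc, grants, encrypting_labels):
    """True if an AI acting as `identity` could process `doc` in this model."""
    if identity not in doc.readers:
        return False  # the AI reads only what the identity can open
    if doc.label not in encrypting_labels:
        return True   # no encryption, so no usage-rights check applies
    held = set()
    for g in grants:
        if g.identity == identity and g.label == doc.label:
            held |= g.rights
    return REQUIRED_RIGHTS <= held  # needs both VIEW and EXTRACT


def reachable(identity, docs, grants, encrypting_labels):
    """Sorted names of the documents an AI acting as `identity` can process."""
    return sorted(
        d.name for d in docs
        if ai_can_process(identity, d, grants, encrypting_labels)
    )


# Illustrative data only.
ENCRYPTING = {"Confidential"}
DOCS = [
    Document("handbook.docx", "General", frozenset({"alice", "bob", "agent-hr"})),
    Document("payroll.xlsx", "Confidential", frozenset({"alice", "bob"})),
]
GRANTS = [
    Grant("alice", "Confidential", frozenset({"VIEW", "EXTRACT"}), "Payroll team"),
    Grant("bob", "Confidential", frozenset({"VIEW"}), "Read-only reviewer"),
]

for who in ("alice", "bob", "agent-hr"):
    print(who, reachable(who, DOCS, GRANTS, ENCRYPTING))
```

In this toy data, bob can open payroll.xlsx himself but holds only VIEW, so an assistant acting for him cannot process it. The agent is modelled as just another identity, which is the point: once it has its own identity, the same question gets the same kind of answer.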

Why locking everything down is a fake answer to it

Faced with "prove you control what the AI can touch," the reflex in a cautious organisation is to let it touch less: more labels, tighter grants, encryption on everything, customer-held keys on the rest. It looks defensible from across the room.

It isn't a governed estate. It's an encrypted one — and those aren't the same thing. A governed estate answers the question deliberately: this population, these rights, this basis, on purpose. An over-restricted estate answers it by accident — access narrowed label by label, exception by exception, until nobody can reconstruct why a given document is reachable by a given identity. You've bought the optics of control and sold the substance of it.

And the substance is exactly what the frameworks ask for. NIST's map function wants you to understand a system's context, not merely shrink it. ISO/IEC 42001 wants controls you can demonstrate, not locks you can gesture at. Article 10 wants a provider who can defend the data, not one who minimised it into a corner where no one can. An estate you can't reason about fails all three, however thoroughly it's encrypted. Over-restriction also manufactures the shadow-AI problem the audit was meant to prevent: people who can't use the sanctioned assistant on the documents they need tend to move that content into tools nobody governs.

What changes

Stop treating "can you prove the AI is locked down?" as the question. The frameworks aren't asking whether you restricted access. They're asking whether you can account for it — say which identities reach which data, on what basis, and that you meant to.
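One way to make "account for it" testable is to check that every grant carries a reason someone could defend. The following is a minimal sketch, assuming a hypothetical export of grants as plain dictionaries; the field names (identity, kind, label, rights, basis, sponsor) are illustrative and do not match the format of any real Microsoft report. It flags grants with no recorded basis and agents with no recorded sponsor.

```python
# Hypothetical export of access grants; the field layout is an assumption,
# not the format of any real Microsoft report.
GRANTS = [
    {"identity": "alice", "kind": "user", "label": "Confidential",
     "rights": ["VIEW", "EXTRACT"], "basis": "Payroll team membership"},
    {"identity": "agent-hr", "kind": "agent", "label": "Confidential",
     "rights": ["VIEW", "EXTRACT"], "basis": "", "sponsor": "alice"},
    {"identity": "agent-tmp", "kind": "agent", "label": "General",
     "rights": ["VIEW"], "basis": "Pilot approved by records team"},
]


def unaccounted(grants):
    """Return (identity, reason) pairs for grants nobody could defend."""
    findings = []
    for g in grants:
        # A blank or missing basis means the access exists by accident.
        if not g.get("basis", "").strip():
            findings.append((g["identity"], "no recorded basis"))
        # An agent identity should name the human accountable for it.
        if g["kind"] == "agent" and not g.get("sponsor"):
            findings.append((g["identity"], "agent without a sponsor"))
    return findings


FINDINGS = unaccounted(GRANTS)
for identity, reason in FINDINGS:
    print(f"{identity}: {reason}")
```

Note what the check ignores: how narrow each grant is. A tightly restricted grant with no recorded basis still fails, and a broad one with a defensible basis passes.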

That reframes the work. The label-and-rights plane and the agent-identity plane stop being two implementation chores and become the literal answer to your EU AI Act, NIST, and ISO posture for AI. Designed deliberately, that architecture is your governance evidence. Locked down by reflex, the same components are an expensive vault you can't explain to anyone, least of all an auditor.