Knowing your data: The foundation for safer AI

Why knowing your data is critical to reducing risk, cutting costs and accelerating AI

Published: April 24, 2026

This article is the first in a series exploring how organisations can better understand, govern and act on their data in an increasingly regulated, AI‑enabled environment. Drawing on BDO’s advisory experience and our partnership with data intelligence platform BigID, the series will look at privacy, governance, and responsible AI, as well as the practical journey from data discovery to action.

Through our work with clients, and alongside technology partners such as BigID, we see a recurring challenge: many leaders lack a clear, up-to-date view of where sensitive data sits, who can access it, and what should be kept versus removed.

Across data governance and cyber resilience engagements, we consistently see this ‘unknown data’ problem holding organisations back. It slows cloud migrations, undermines confidence in compliance, and introduces avoidable risk to AI programs. Without a clear view of your data estate, it is very hard to move forward safely or at speed. Increasingly, the challenge is not just discovering data but acting on what you find in a controlled and auditable way.

In Australia, this is not just a technology challenge; it is also an evidence challenge. Under Australian Privacy Principle 11 (APP 11), organisations must take reasonable steps to protect the personal information they hold, and to destroy or de-identify it once it is no longer needed (unless retention is required by law or a court or tribunal order, or it forms part of a Commonwealth record). Meeting these obligations, and being able to show that you have, depends on knowing what personal information you hold and where it sits.

You can’t govern what you can’t see. When governance is combined with automated discovery and classification across both structured and unstructured data, supported by platforms designed to scale across the enterprise, organisations are better positioned to reduce data risk, lower storage costs, respond faster to incidents, and prepare appropriately controlled datasets for AI use.

When growth outpaces governance

Boards and executives are increasingly being asked to take a more active role in overseeing data risk. Many organisations, however, are still relying on manual processes, spreadsheets and partial inventories to understand where personal, sensitive, or regulated data sits. At the same time, data volumes, particularly unstructured material across collaboration platforms and file stores, continue to grow quickly.

For many organisations, periodic review models struggle to keep pace with how data is created, shared and reused day to day. As a result, risk and cost tend to accumulate quietly in the background.

In practice, we often see this play out in four ways:

Rising cost and complexity: Duplicated and stale data drives up storage costs and creates uncertainty about what can be safely deleted or migrated.
Compliance pressure without evidence: Privacy and security obligations increasingly require demonstrable controls and timely responses, but manual discovery cannot keep up with growing data estates.
Cyber resilience gaps: Limited visibility into where sensitive data lives, and how access is granted, makes oversharing harder to prevent and impact harder to scope when incidents arise.
AI adoption risk: AI initiatives can stall, or introduce unintended exposure, when organisations cannot confidently answer basics such as what information is feeding AI tools, whether sensitive material is being exposed, and whether appropriate guardrails are in place.

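Where a breach does occur, Australia’s Notifiable Data Breaches (NDB) scheme requires organisations covered by the Privacy Act to notify affected individuals and the Office of the Australian Information Commissioner (OAIC) when the breach is likely to result in serious harm. Where an organisation suspects an eligible breach, it must complete a reasonable and expeditious assessment within 30 days, so fast assessment and clear records of action become critical.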
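To illustrate why an up-to-date inventory speeds up incident scoping, here is a minimal Python sketch that, given the repositories believed to be compromised, lists which known assets hold sensitive classifications. The `DataAsset` record, the repository names and the classification labels are hypothetical; the sketch only narrows down what may have been exposed and does not assess whether serious harm is likely.

```python
from dataclasses import dataclass

# Hypothetical set of labels treated as sensitive for scoping purposes.
SENSITIVE = frozenset({"personal", "health", "financial"})

@dataclass(frozen=True)
class DataAsset:
    path: str
    repository: str
    classifications: frozenset

def scope_incident(assets, compromised_repos, sensitive=SENSITIVE):
    """Group sensitive assets in compromised repositories by classification label."""
    impacted = {}
    for asset in assets:
        if asset.repository not in compromised_repos:
            continue  # outside the incident boundary
        for label in sorted(asset.classifications & sensitive):
            impacted.setdefault(label, []).append(asset.path)
    return impacted

assets = [
    DataAsset("hr/payroll.csv", "fileshare-01", frozenset({"personal", "financial"})),
    DataAsset("marketing/brochure.pdf", "fileshare-01", frozenset()),
    DataAsset("crm/export.csv", "saas-crm", frozenset({"personal"})),
]
print(scope_incident(assets, {"fileshare-01"}))
# {'financial': ['hr/payroll.csv'], 'personal': ['hr/payroll.csv']}
```

Without an inventory like this, the same question has to be answered by searching the affected systems after the fact, which is where time is lost.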

What ‘good’ looks like now

A modern data governance and security posture is no longer a static inventory. It is a living, continuously updated view of:

What data exists across structured systems and unstructured sources
Where it resides, including cloud, SaaS and on-premises environments
How it is classified, such as personal, health, financial or organisation-specific identifiers
Who can access it, and where access is over-permissioned
What steps are being taken to reduce risk, including deletion, quarantine, restriction or escalation.
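One way to picture this ‘living view’ is as a record per data asset that captures each of the five dimensions above. The Python sketch below is an illustrative data model only; the `InventoryRecord` class, its field names and the `is_over_permissioned` check are assumptions, not the schema of any particular platform.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class InventoryRecord:
    asset_id: str                 # what: an identifier for the data asset
    environment: str              # where: e.g. "cloud", "saas" or "on-premises"
    location: str                 # where: path, bucket, site or table name
    classifications: set = field(default_factory=set)  # how it is classified
    access: set = field(default_factory=set)           # who can access it
    actions: list = field(default_factory=list)        # risk-reduction steps, timestamped

    def is_over_permissioned(self, approved: set) -> bool:
        """True if anyone outside the approved access set can reach the asset."""
        return bool(self.access - approved)

    def record_action(self, action: str) -> None:
        """Append a timestamped risk-reduction step, e.g. "restrict"."""
        self.actions.append((datetime.now(timezone.utc).isoformat(), action))

record = InventoryRecord(
    asset_id="payroll-2019",
    environment="on-premises",
    location="//fileshare-01/hr/payroll_2019.xlsx",
    classifications={"personal", "financial"},
    access={"hr-team", "all-staff"},
)
if record.is_over_permissioned({"hr-team"}):
    record.record_action("restrict")
print(record.actions[-1][1])  # restrict
```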

Increasingly, organisations are moving beyond reporting alone towards operating models where remediation actions can be triggered, tracked, and evidenced. This shift from insight to action is becoming a core expectation of modern data governance.

This level of visibility is also foundational to responsible AI. It enables organisations to identify AI‑ready datasets, apply appropriate controls and monitor policy risks tied to usage and data exposure, rather than discovering issues after the fact.
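As a simple illustration of identifying AI-ready datasets, the sketch below splits datasets into those cleared for an AI pilot and those held back, based on their classification labels. The `BLOCKED_FOR_AI` policy and the label names are hypothetical; in practice the rules would come from the organisation’s own AI and privacy policies.

```python
# Hypothetical policy: labels that keep a dataset out of an AI pilot by default.
BLOCKED_FOR_AI = frozenset({"health", "tfn", "financial"})

def ai_ready(datasets, blocked=BLOCKED_FOR_AI):
    """Return (cleared dataset names, {held dataset: blocking labels})."""
    cleared, held = [], {}
    for name, labels in datasets.items():
        hits = set(labels) & blocked
        if hits:
            held[name] = sorted(hits)  # record why the dataset was held back
        else:
            cleared.append(name)
    return cleared, held

datasets = {
    "product_faq": set(),
    "claims_notes": {"personal", "health"},
    "payroll": {"personal", "tfn", "financial"},
}
cleared, held = ai_ready(datasets)
print(cleared)  # ['product_faq']
print(held)     # {'claims_notes': ['health'], 'payroll': ['financial', 'tfn']}
```

Recording the blocking labels, not just a yes/no, gives data owners a concrete reason and a path to clearing a dataset later.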

A practical roadmap: Five steps to move from unknown data to business confidence

Step 1: Start with discovery across the data estate

Establish visibility across cloud, SaaS and on-premises repositories, covering both structured and unstructured data. Many organisations start with a defined scope, such as high-risk repositories, priority business units or systems supporting AI pilots, to deliver early value.
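For a file store, discovery can start as simply as walking each in-scope location and recording basic metadata. The Python sketch below assumes locally mounted paths; the `discover` function and its `include_ext` scoping parameter are hypothetical, and cloud or SaaS sources would need their own connectors.

```python
import os
from pathlib import Path

def discover(roots, include_ext=None):
    """Walk each in-scope root and yield basic metadata for every file found.

    include_ext optionally limits the scan to certain file types, e.g. {".csv", ".xlsx"}.
    """
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = Path(dirpath) / name
                if include_ext and path.suffix.lower() not in include_ext:
                    continue  # outside the defined scope
                try:
                    info = path.stat()
                except OSError:
                    continue  # skip files that disappear or cannot be read
                yield {"path": str(path), "size": info.st_size, "modified": info.st_mtime}
```

A scoped first pass might be `list(discover(["/mnt/hr-share"], include_ext={".csv", ".xlsx"}))`, where the mount point is a hypothetical example; widening the scope later is a matter of adding roots.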

Step 2: Classify sensitive and regulated data, including local identifiers

Effective programs move beyond file names and keywords. They use pattern-based and machine-learning approaches to identify the personal, health, financial and organisation-specific identifiers that matter in practice, including local identifiers such as Tax File Numbers and Medicare numbers.
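To show what pattern-based detection of a local identifier can look like beyond keywords, the sketch below finds candidate Australian Tax File Numbers (TFNs) with a regular expression and then filters them with the widely published TFN check (digits weighted 1, 4, 3, 7, 5, 8, 6, 9, 10, with the weighted sum divisible by 11). It rests on stated assumptions: it handles only the nine-digit form, a passing checksum does not prove a number was actually issued, and the machine-learning approaches mentioned above are not shown.

```python
import re

# Candidate pattern: nine digits, optionally grouped 3-3-3 with spaces or hyphens.
TFN_PATTERN = re.compile(r"\b\d{3}[ -]?\d{3}[ -]?\d{3}\b")
# Widely published TFN check weights (nine-digit form only).
TFN_WEIGHTS = (1, 4, 3, 7, 5, 8, 6, 9, 10)

def is_valid_tfn(candidate: str) -> bool:
    """Weighted-sum check: a valid nine-digit TFN sums to a multiple of 11."""
    digits = [int(ch) for ch in candidate if ch in "0123456789"]
    if len(digits) != 9:
        return False
    return sum(d * w for d, w in zip(digits, TFN_WEIGHTS)) % 11 == 0

def find_tfns(text: str) -> list:
    """Return pattern matches that also pass the checksum, reducing false positives."""
    return [m.group() for m in TFN_PATTERN.finditer(text) if is_valid_tfn(m.group())]

print(find_tfns("Invoice 123456789; TFN 123 456 782"))  # ['123 456 782']
```

The invoice number matches the pattern but fails the checksum, which is the kind of noise a keyword or pattern-only approach would flag.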

Step 3: Assess exposure and prioritise risk

Not all sensitive data presents the same level of risk. Risk-based prioritisation helps teams focus on the issues that materially change their risk posture, such as oversharing, ‘toxic’ data combinations (for example, identity details stored alongside financial or health information), duplication, and stale data retained for long periods.
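Risk-based prioritisation can be sketched as a scoring function that combines sensitivity, toxic combinations, oversharing, duplication and staleness. The weights, labels, thresholds and the `Finding` record below are purely illustrative assumptions, not a recommended model.

```python
from dataclasses import dataclass

# Illustrative weights and thresholds only.
SENSITIVITY = {"health": 5, "tfn": 5, "financial": 4, "personal": 3}
TOXIC_COMBINATIONS = [frozenset({"personal", "financial"}), frozenset({"personal", "health"})]
STALE_AFTER_DAYS = 730

@dataclass
class Finding:
    path: str
    labels: set
    shared_publicly: bool
    copies: int
    days_since_modified: int

def risk_score(f: Finding) -> int:
    score = sum(SENSITIVITY.get(label, 0) for label in f.labels)
    if any(combo <= f.labels for combo in TOXIC_COMBINATIONS):
        score *= 2  # combined identifiers raise the potential harm if exposed
    if f.shared_publicly:
        score += 10  # oversharing
    score += max(f.copies - 1, 0)  # each extra copy widens the footprint
    if f.days_since_modified > STALE_AFTER_DAYS:
        score += 3  # stale data retained for a long time
    return score

def prioritise(findings):
    """Return findings ordered highest risk first."""
    return sorted(findings, key=risk_score, reverse=True)
```

A publicly shared file holding both personal and financial data scores 24 here, well above a stale, triplicated file holding only personal data (8), which matches the intuition that exposure and combination matter more than volume.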

Step 4: Take action

Discovery only delivers value when insights can be translated into safe, practical remediation: control comes from acting on findings, not just identifying issues. This commonly includes:

Automated actions (delete, archive, quarantine, restrict, obfuscate)
Workflow-based actions (ticketing and assignment)
Delegated remediation to data owners with auditability.
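The common thread across these options is that every action leaves an audit trail. The Python sketch below routes an action to an automated, ticketing or delegated path and appends a JSON audit entry; the `remediate` function and its action names are hypothetical, and the automated branch is a placeholder where a real platform call would go.

```python
import json
from datetime import datetime, timezone

AUTOMATED_ACTIONS = {"delete", "archive", "quarantine", "restrict", "obfuscate"}

def remediate(asset_id, action, actor, audit_log, owner=None):
    """Route one remediation action and append a JSON audit entry to audit_log."""
    if action in AUTOMATED_ACTIONS:
        # Placeholder: a real implementation would call the relevant platform here.
        outcome = "queued for automated execution"
    elif action == "ticket":
        outcome = f"ticket raised for {owner or 'unassigned'}"
    elif action == "delegate":
        if owner is None:
            raise ValueError("delegated remediation needs a named data owner")
        outcome = f"delegated to {owner}"
    else:
        raise ValueError(f"unknown action: {action}")
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "asset": asset_id,
        "action": action,
        "actor": actor,
        "outcome": outcome,
    }
    audit_log.append(json.dumps(entry))  # append-only evidence of what was done
    return entry

audit_log = []
remediate("payroll-2019", "quarantine", "scanner", audit_log)
remediate("old-crm-export", "delegate", "scanner", audit_log, owner="sales-ops")
print(len(audit_log))  # 2
```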

Deletion is not about removing everything. It is about defensibly removing redundant or stale information while respecting retention rules, legal holds and record-keeping requirements. For personal information, APP 11 also sets a clear expectation to destroy or de-identify it once it is no longer needed.
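A defensible deletion decision checks retention obligations before anything else. The simplified Python sketch below encodes that ordering for a single record; the `Record` fields and the rule order are assumptions for illustration, and real retention decisions depend on the organisation’s own retention schedules and legal advice.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Record:
    contains_personal_info: bool
    still_needed: bool               # still needed for a permitted purpose?
    legal_hold: bool                 # e.g. litigation hold or court/tribunal order
    retain_until: Optional[date]     # from a legal retention requirement, if any
    commonwealth_record: bool = False

def disposition(r: Record, today: date) -> str:
    # Retention obligations are checked first: they override deletion.
    if r.legal_hold:
        return "retain: legal hold"
    if r.commonwealth_record:
        return "retain: Commonwealth record"
    if r.retain_until is not None and today <= r.retain_until:
        return "retain: retention requirement"
    if r.still_needed:
        return "retain: still needed"
    # No longer needed and no obligation to keep it.
    if r.contains_personal_info:
        return "destroy or de-identify"
    return "eligible for deletion under internal policy"
```

Returning a reason string rather than a boolean means each decision can be logged as evidence of why a record was kept or removed.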

Step 5: Operationalise with the right scanning model

Some organisations need a rapid baseline to inform decisions, while others require ongoing assurance. In practice, this may involve one-off or scheduled scans, or continuous monitoring to support evidence, compliance and trend analysis over time.
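The difference between a one-off baseline and ongoing monitoring can be sketched as incremental scanning: keep the state from the previous scan and only rescan what is new or changed. The sketch below uses file modification times as the change signal, which is an assumption; the `files_to_scan` function is hypothetical, and passing an empty state gives a full baseline.

```python
import os

def files_to_scan(paths, previous_state):
    """Compare against the previous scan and return (changed, removed, new_state).

    previous_state maps path -> modification time from the last scan;
    pass {} to get a full one-off baseline.
    """
    changed, new_state = [], {}
    for path in paths:
        try:
            mtime = os.stat(path).st_mtime
        except FileNotFoundError:
            continue  # listed but deleted before we could stat it
        new_state[path] = mtime
        if previous_state.get(path) != mtime:
            changed.append(path)  # new since the last scan, or modified
    removed = sorted(set(previous_state) - set(new_state))
    return changed, removed, new_state
```

Persisting `new_state` between runs is what turns a scheduled scan into a trend record: each run shows what appeared, changed or disappeared since the last one.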

Where organisations see value first

Executive alignment is often achieved fastest when data controls are clearly linked to tangible business outcomes.

Reduce cost and complexity: Identify redundant, stale or duplicated data to support defensible deletion, limit storage growth, and avoid migrating unnecessary data into the cloud.
Strengthen compliance confidence: Automate evidence and reporting to support privacy and security obligations while reducing manual audit burden.
Improve cyber resilience and incident response: Increase visibility into sensitive data locations and access pathways, significantly improving the speed and accuracy of incident scoping.
Enable responsible AI: Ensure information used in AI and GenAI initiatives is known, governed and appropriately controlled.
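As one concrete example of identifying duplicated data, the Python sketch below groups files with identical content by comparing sizes first and then SHA-256 hashes. The `find_duplicates` helper is hypothetical; it only flags candidates, and whether a copy can be deleted still depends on retention rules and ownership.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_duplicates(paths):
    """Return groups (sorted lists) of files with identical content."""
    by_size = defaultdict(list)
    for p in paths:
        by_size[Path(p).stat().st_size].append(p)
    groups = defaultdict(list)
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a duplicate, so skip hashing
        for p in same_size:
            groups[sha256_of(p)].append(p)
    return [sorted(g) for g in groups.values() if len(g) > 1]
```

Comparing sizes first avoids hashing most files, which matters on large file stores.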

The question leaders should be asking now

If your organisation is accelerating AI, migrating to cloud, or responding to heightened privacy and cyber scrutiny, the key question is no longer whether a governance policy exists. It is whether the organisation can continuously demonstrate where sensitive information resides, who can access it, and what actions are being taken to reduce exposure.

When organisations can answer that with confidence, they unlock compounding benefits like lower cost, faster delivery, stronger compliance posture, and safer AI adoption.

How BDO and BigID can help accelerate sustainable outcomes

BDO’s approach is advisory led. We start with risk, regulation, operating model and business outcomes, then help implement the right controls and tooling in a way that can be sustained.

Within the BDO–BigID partnership, BigID provides a technology foundation for large‑scale data discovery, classification and remediation, while BDO supports clients to embed these capabilities into their environments, align stakeholders and translate insights into practical governance uplift.

Want a rapid baseline of your data risk posture?

BDO’s digital experts can help your organisation understand where sensitive data sits, what the priority risks are, and what actions will deliver the greatest impact, whether through a one-time discovery or a sustained operating model. Contact our team to find out more.

What’s next

In upcoming articles in this series, we’ll look more closely at:

Privacy‑focused discovery and remediation, including APP 11 destroy and de‑identify obligations
Practical data governance models that work across cloud, SaaS and unstructured data
How strong data foundations enable safer, more defensible AI and GenAI adoption
Moving from discovery to action, and how organisations operationalise remediation at scale.

Each article will build on the core principle that knowing your data is the foundation for reducing risk, controlling cost and accelerating digital programs with confidence.

Key takeaways

Lack of data visibility increases risk and constrains AI adoption

Modern data governance requires continuous discovery and action

Data insight is foundational to cost reduction, resilience and responsible AI
