I'm moving to a new company. The department I'm joining needs to keep data security airtight. So I did what anyone would do - I sat down and actually read the privacy policies of every major AI provider to understand what happens to the data we put in.
What I found wasn't great.
Most major AI providers train on your data by default
That's the headline. If you're using the free or paid consumer version of most AI tools - ChatGPT, Gemini, Copilot - your inputs are being used to train future models. Not maybe. Not sometimes. By default.
Here's the breakdown:
| Provider | Consumer tier | API tier | Enterprise | Opt-out |
|---|---|---|---|---|
| OpenAI (ChatGPT) | Trains by default | No training | No training | Settings toggle, but only for future chats |
| Google Gemini | Trains by default | No training (Vertex AI) | No training (Workspace) | Must disable Gemini Apps Activity - loses chat history |
| Microsoft Copilot | Trains by default | No training (Azure OpenAI) | No training (M365) | Settings, except in EEA |
| Meta AI | Trains by default | N/A (open-weight Llama) | N/A | Objection form, incomplete |
| Mistral | Trains by default (free tier) | No training (paid API) | No training | Account settings |
| Perplexity | Trains by default | No training (zero retention) | No training | Account toggle |
| Anthropic (Claude) | Does NOT train (opt-in since Aug 2025) | No training | No training | N/A - default is already private |
| xAI (Grok) | Does NOT train (opt-in) | Not documented | Not documented | N/A standalone, but Grok on X trains by default |
Two things stand out.
First, every provider that documents an API or enterprise tier exempts it from training. Meta has no such tier, and xAI doesn't document one. The difference isn't the AI - it's how you access it. For most providers, consumer chat equals training data. API access equals private.
Second, the opt-out mechanisms are not equal. Some cost you features, and some are incomplete.
The opt-out trap
OpenAI lets you toggle off training in settings. Fair enough. But here's what most people miss - it only applies to future conversations. Everything you've already sent is out of your hands. It may have already been used.
Google is worse. If you want to opt out of training on Gemini, you have to disable "Gemini Apps Activity." That also kills your entire chat history. You can't keep your history and opt out of training. There's no middle ground.
And then there's a detail that surprised me. Google's privacy policy says human reviewers can read your Gemini conversations. Those reviewed conversations are stored for up to 3 years.
Meta doesn't even pretend to give you a full opt-out. Since December 2025, it has used AI chat interactions for ad personalization. You can object to model training through a form, but the ad personalization piece has no complete off switch.
Why this matters beyond privacy
This creates a centralisation problem.
When big AI companies train on billions of user interactions, their models get smarter. That's obvious. What's less obvious is the second-order effect - smaller, open-source models don't have access to that data. They're training on public datasets while the big players are training on real-world business conversations, legal documents, financial reports, internal strategies.
Over time, this could widen the gap. The rich get richer. The open-source alternatives that many companies prefer for control and transparency might fall further behind - not because they're technically worse, but because they're data-starved.
What you can actually do
Check your settings today. In ChatGPT, go to Settings, Data Controls, and turn off "Improve the model for everyone." In Gemini, decide if losing history is worth the opt-out. In Copilot, check your data sharing preferences.
Ask your IT team one question: Are we on a consumer tier or an enterprise/API tier? If nobody knows the answer, that's your answer.
Use private or incognito modes for anything sensitive. ChatGPT has Temporary Chat. Claude has Incognito mode. Neither of these gets used for training, regardless of your settings.
If you're building a product that uses AI - and this is what I did with LexBox - use API access exclusively. Legal documents have no business ending up in anyone's training pipeline.
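If your IT team can't answer the consumer-versus-enterprise question off the top of their heads, a small inventory check can get the conversation started. Here's a rough sketch of one way to do it. The policy values are copied from the comparison table above. Everything else is my own assumption for illustration: the `POLICY` layout, the `needs_review` and `audit` helpers, and the made-up team names. I treat "N/A" and "Not documented" as unknown, and unknown counts as needing review.

```python
# Training defaults per provider and tier, copied from the comparison table.
# True  = inputs are used for training by default
# False = no training by default
# None  = N/A or not documented in the table (treated as unknown)
POLICY = {
    "OpenAI (ChatGPT)":   {"consumer": True,  "api": False, "enterprise": False},
    "Google Gemini":      {"consumer": True,  "api": False, "enterprise": False},
    "Microsoft Copilot":  {"consumer": True,  "api": False, "enterprise": False},
    "Meta AI":            {"consumer": True,  "api": None,  "enterprise": None},
    "Mistral":            {"consumer": True,  "api": False, "enterprise": False},
    "Perplexity":         {"consumer": True,  "api": False, "enterprise": False},
    "Anthropic (Claude)": {"consumer": False, "api": False, "enterprise": False},
    # Standalone Grok only; per the table, Grok on X trains by default.
    "xAI (Grok)":         {"consumer": False, "api": None,  "enterprise": None},
}


def needs_review(provider: str, tier: str) -> bool:
    """Return True when the table gives no assurance that inputs stay out of training."""
    trains = POLICY[provider][tier]
    return trains is None or trains


def audit(inventory):
    """Keep only the (team, provider, tier) entries that need a closer look."""
    return [entry for entry in inventory if needs_review(entry[1], entry[2])]


if __name__ == "__main__":
    # Hypothetical inventory: which team uses which tool on which tier.
    inventory = [
        ("legal", "Google Gemini", "consumer"),
        ("engineering", "Anthropic (Claude)", "api"),
        ("marketing", "xAI (Grok)", "api"),
    ]
    for team, provider, tier in audit(inventory):
        print(f"{team}: check {provider} ({tier} tier)")
```

The table is a snapshot, so update the values against the providers' current policies before you rely on a check like this.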
The numbers that should worry compliance teams
The EU AI Act's transparency obligations for general-purpose AI models became legally binding in August 2025. Providers must now publish detailed summaries of their training data. Penalties go up to 15 million EUR or 3% of global turnover, whichever is higher.
The lawsuits are piling up. The New York Times forced OpenAI to preserve 20 million chat logs as evidence in its ongoing case. Anthropic settled for $1.5 billion over training data sources. Google paid $1.4 billion in Texas over biometric and location data. Over 70 AI-related infringement cases had been filed by the end of 2025.
The bottom line
Your data is your responsibility. AI tools are useful - I use them every day. But the default settings are not on your side. The gap between "I use ChatGPT" and "my company uses AI responsibly" is exactly one settings page and one conversation with your IT department.
Have that conversation.
Research based on privacy policies of OpenAI, Anthropic, Google, Microsoft, Meta, Mistral, Perplexity, and xAI as of April 2026. Policies change frequently - verify before making compliance decisions.