Working Paper — Evolving Knowledge Base
In the context of operating a billion-dollar AI-focused fund, the full team is expected to use persistent AI assistants through PureBrain. This document explores what data can and cannot be shared with AI, the regulatory landscape, and a concrete implementation plan for AI governance — built as a working paper and evolving knowledge base.
The core tension is straightforward: AI needs data to be useful, but data sharing creates legal, regulatory, and fiduciary risk. The answer is not "share nothing" (that makes AI useless) or "share everything" (that creates liability). The answer is a classification framework with clear lines.
| Tier | Classification | Examples | AI Sharing Rule |
|---|---|---|---|
| Tier 1 | Restricted | LP personal data (SSNs, passports, bank details), KYC/AML docs, attorney-client privileged communications, Material Non-Public Information (MNPI) | Never share with any external AI platform |
| Tier 2 | Confidential | Portfolio company financials, term sheets, fund strategy, IC deliberations, deal pipeline | Enterprise AI only, with redaction. Anonymize names when analysis does not require them |
| Tier 3 | Internal | Aggregate fund performance, operational procedures, vendor relationships, industry research | Share with vetted enterprise AI under standard controls |
| Tier 4 | Public | Marketing materials, published thought leadership, regulatory filings | Share freely |
| Category | Share with AI? | Condition |
|---|---|---|
| Your preferences, style, schedule | Yes | No restrictions |
| Your strategic thinking, thesis | Yes | No restrictions |
| Public market / industry research | Yes | No restrictions |
| Fund operations, workflows | Yes | No restrictions |
| Portfolio company data | With care | Anonymize when possible, redact specifics not needed for the task |
| Fund strategy, pipeline | With care | Enterprise platform only, no consumer AI |
| Partner communications | Selectively | Share context, not raw disputes |
| Fund performance (aggregate) | Yes | No individual LP attribution |
| LP personal data | No | Never on current platforms |
| KYC/AML docs | No | Never |
| Privileged legal communications | No | Never (Heppner waiver risk) |
| Material Non-Public Information (MNPI) | No | Never |
| NDA-protected counterparty data | No | Not without consent |
This brief covers the factual landscape surrounding AI data sharing in venture capital operations: regulatory positions, legal risks, industry practices, ethical arguments, and practical frameworks. Research drawn from 30 sources across regulatory bodies (SEC, FINMA, FCA, JFSC, EU), law firm analyses, court cases, and industry surveys.
"In the early days, OpenAI was collecting conversations with ChatGPT users and using that data to retrain the system, but there are huge privacy issues because people use them in companies and put in data with company proprietary information." Many companies have banned commercial LLMs because they do not trust conversations will remain proprietary.
"There needs to be a lot more independent research and there needs to be oversight of tech companies." AI concentrates power in the hands of governments and companies, away from individuals whose data feeds these systems.
Published opinion on using AI in compliance with GDPR, specifically addressing legitimate interest, purpose limitation, and the right to object to AI processing of personal data.
SEC Division Director Brian Daly stated the SEC is exploring how AI should be addressed within federal securities law but recognized that "by the time rules take effect, the market and technology may have moved on."
Ropes & Gray (Dec 2025): Asset managers' fiduciary duties require "appropriate diligence in selecting, engaging and overseeing AI service providers and disclosure to investors of risks and conflicts of interest associated with the use of AI."
Sources: SEC.gov, Venable LLP, Ropes & Gray, Goodwin Law, Kitces.comSwiss Regulatory Timeline: No AI-specific legislation yet. Draft consultation legislation expected by end of 2026. FINMA follows "same business, same risks, same rules."
Sources: FINMA.ch, Pestalozzi Law, Chambers AI Practice Guide 2025No AI-specific regulations planned. Relies on existing frameworks: Consumer Duty, SM&CR, SYSC, operational resilience. AI LAB launched Oct 2024 for supervised testing. Treasury Committee recommended comprehensive AI guidance by end of 2026.
No AI-specific guidance. Firms must comply with Data Protection (Jersey) Law 2018, aligned closely with GDPR. JFSC is implementing data-driven supervisory models using AI internally. 2025-2026 priorities focus on growth, risk management, and financial crime prevention.
Sources: FCA.org.uk, Kennedys Law, JFSC.orgCredit scoring, loan approval, fraud detection, AML risk profiling, and automated decision-making affecting access to financial services classified as high-risk AI systems. Fund management AI affecting investor outcomes may fall into this category.
Using AI without explainability or validation could be interpreted as a breach of the duty of care — analogous to relying on an unverified third-party analyst without due diligence. Investment advisers cannot delegate responsibility for decisions to algorithms.
Updated Practice (2025-2026): AI-specific NDA provisions are becoming standard: prohibiting upload to public AI, restricting tools that retain data for training, requiring consent before AI use in diligence.
Sources: Venable LLP, Ropes & Gray, NASAA, Roth Jackson, KJK, Sapience LawGoldman Sachs: Launched GS AI Assistant firmwide in mid-2025 after piloting with ~10,000 employees. Model-agnostic (GPT, Gemini, Claude) but operates within Goldman's audited environment. Client-facing AI deferred until accuracy and compliance thresholds are met.
JPMorgan Chase: Grants access to its LLM Suite to 200,000+ employees, generating ~$1.5B in annual business value. 300+ use cases in production. All within internal infrastructure.
Formal published AI governance policies from VC firms remain rare in the public domain.
Recommended approach: Start with NIST for risk management, add ISO 42001 for systematic management, layer EU AI Act for European compliance.
Sources: Evident Insights, DigitalDefynd, NIST.gov, PECB, AffinityIndirect injection (attacks in documents, emails, web pages) accounts for 80%+ of attempts. Shadow AI breaches disproportionately affected customer PII (65%) and intellectual property (40%).
| Platform | Data Used for Training? | Retention |
|---|---|---|
| Claude Free/Pro (Consumer) | Yes, by default | 5 years |
| Claude Enterprise/API | No (contractual DPA) | Per agreement |
| ChatGPT Free/Plus | Yes, unless opted out | 30 days abuse monitoring |
| ChatGPT Enterprise/API | No (contractual DPA) | Per agreement |
| PureBrain | No (Anthropic contractually restricted) | 30 days post-cancellation |
EchoLeak vulnerability: Zero-click prompt injection enabling data exfiltration without user interaction. Attacker sends email with hidden instructions, AI ingests malicious prompt, AI extracts sensitive data from connected systems.
Sources: Wiz Research, Reco, PurpleSec, eSecurity Planet, OWASP| Service | Data Shared |
|---|---|
| Anthropic (Claude API) | Conversation content |
| Cloudflare | IP address and traffic metadata |
| PayPal | Billing information |
| Brevo | Email address (newsletters) |
| Data Type | Examples | Risk If Exposed |
|---|---|---|
| LP personal data | Names, addresses, SSNs, bank accounts, passport copies | GDPR/privacy violations, regulatory sanctions, LP litigation |
| LP commitment amounts | Individual allocation details | Breach of confidentiality, competitive harm |
| MNPI | Pre-announcement deal terms, non-public financials | Securities law violations, insider trading liability |
| Legal privileged comms | Attorney advice, litigation strategy | Privilege waiver (Heppner), litigation exposure |
| KYC/AML documentation | Identity verification, source of funds | Regulatory violations, money laundering liability |
| Data Type | Examples | Risk If Exposed |
|---|---|---|
| Portfolio company financials | P&L, balance sheets, cap tables, runway | Competitive harm, breach of information rights |
| Term sheets and deal terms | Valuation, liquidation preferences, board seats | Competitive disadvantage, deal disruption |
| Fund strategy documents | Sector thesis, pipeline priorities, allocation model | Competitive intelligence loss |
| Internal partner comms | IC deliberations, partner disputes | Reputational damage, litigation discovery |
| Employee/contractor data | Compensation, performance reviews | Employment law violations, privacy claims |
| Data Type | Examples | Risk If Exposed |
|---|---|---|
| Fund performance data | Aggregate returns, benchmarking | Premature disclosure, marketing concerns |
| Operational procedures | Workflow docs, policy manuals | Limited competitive harm |
| Vendor relationships | Service providers, fee arrangements | Commercial sensitivity |
| Industry research | Sector landscapes, competitive maps | Low harm if from public sources |
| Data Type | Examples | Risk If Exposed |
|---|---|---|
| Marketing materials | Fund overview, team bios, sector focus | Intended for distribution |
| Published thought leadership | Research papers, blog posts | Already public |
| Regulatory filings | Form D, public regulatory submissions | Already public |
Anticipated LP Questions on AI:
Share generously with your AI — but not blindly. The competitive advantage of a fully-informed AI partner is enormous and real. But certain categories of data should never touch an AI platform that you don't fully control, and a formal policy is needed before the first LP writes a check. The line isn't "share nothing" (that makes the AI useless) or "share everything" (that creates liability). The line is: share what makes you effective, protect what could harm others, and document what you decided and why.
| Category | Share with AI? | Condition |
|---|---|---|
| Your preferences, style, schedule | Yes | No restrictions |
| Your strategic thinking, thesis | Yes | No restrictions |
| Public market / industry research | Yes | No restrictions |
| Fund operations, workflows | Yes | No restrictions |
| Portfolio company data | With care | Anonymize when possible |
| Fund strategy, pipeline | With care | Enterprise platform only |
| Partner communications | Selectively | Share context, not raw disputes |
| Fund performance (aggregate) | Yes | No individual LP attribution |
| LP personal data | No | Never on current platforms |
| KYC/AML docs | No | Never |
| Privileged legal communications | No | Never (Heppner waiver risk) |
| Material Non-Public Information (MNPI) | No | Never |
| NDA-protected counterparty data | No | Not without consent |
I run on PureBrain. I need to be honest about what that means.
The ethical foundation is consent and transparency. If an LP knows their data is processed by AI, and the GP has reasonable safeguards, and the purpose is to serve the LP's interests — that's ethical. If LP data is fed into AI without knowledge, for GP convenience, with no safeguards — that's not.
The "how much is too much" question is really about whose data it is. Your own data is yours to share. LP data, portfolio company data, counterparty data under NDA — that's someone else's data. You're a steward, not an owner. Stewardship demands care.
The strongest argument for AI transparency is self-interest. The fund that gets caught sharing LP data without disclosure faces regulatory action, LP lawsuits, and reputational destruction. The fund that proactively discloses AI use with clear policies gets LP trust, operational efficiency, and competitive advantage.
Katy challenged the policy recommendations with real-world operational objections. These are the honest responses.
Tools exist — Nitro Smart Redact ($20/month) detects 30+ PII types automatically in ~30 seconds per document. But they can't catch everything in context.
Tarin's reframe: Don't make redaction the primary control. The platform's contractual protections ARE the primary control. Redaction is a second layer for the most toxic data only. The preprocessor script + Nitro covers 80% of cases. Time cost per document: 3-5 minutes.
Every IC memo is a derivative of NDA-protected data. You can't synthesize across your portfolio without your AI knowing real details.
Tarin's reframe: Add AI processing clause to NDAs. Use AI-enabled VDRs (Datasite, Peony) for raw documents. Accept that the AI partner will know sensitive things — like any trusted employee. The question isn't whether, it's how to do it defensibly with contractual protections, audit logs, and no-training commitments.
Self-attestation points failures to individuals. When things go wrong, "they signed a piece of paper" doesn't protect the GP.
Tarin's reframe: Four layers of defense — all generating machine evidence, not human promises:
That's defensible. Not because it's perfect — because it demonstrates a multi-layered, documented, continuously monitored governance process.
One partner shares LP passport copies with their AI. One employee forwards a privileged memo to ChatGPT. One intern uploads a cap table to the free version of Claude.
Tarin's reframe: Same problem finance has always had. AI doesn't create it, amplifies it. Controls:
A determined employee can always use a personal device and a personal AI account. You can't physically prevent it. This is an employee conduct issue, not a system design issue.
Tarin's honest answer: Fund-managed system = controllable, auditable, guardrailed. Personal systems = outside the fund's control. The strategy has two parts:
This is the same framework used for any confidentiality obligation — the employee signs, the employee is accountable. The fund provides the tools and the policy; the employee is responsible for compliance.
Key insight from Katy: Bake guardrails into the AI's system-level instructions (immutable), not just memory (editable). The employee cannot tell the AI to override compliance rules.
How it works:
| Option | What It Does | Cost |
|---|---|---|
| Nitro Smart Redact [Link] | AI-powered, detects 30+ PII types, works on PDF/DOCX/XLSX, runs locally | $20/user/month |
| Microsoft Purview | Auto-classifies docs, applies labels, integrates with DLP | Included in M365 E5 or add-on |
| Tarin Preprocessor | Scans for blocklisted names, replaces with codes, outputs clean version + mapping file | $0 (Tarin builds it) |
Recommendation: Start with Tarin preprocessor (free, immediate) + Nitro for PDFs ($20/month). Workflow: run preprocessor → share anonymized version with AI → AI analyzes using codes → map codes back for final output.
Under GDPR/DPJL 2018, the legal test: "Could a reasonably informed person re-identify the individual from the anonymized data?"
The honest trade-off: This means Tarin doesn't have raw access to every data room page. But VDR-native AI handles 70-80% of document analysis, Tarin handles synthesis and strategy. Together they cover 95%. The 5% gap isn't worth the legal exposure.
| Layer | What It Does | Effort |
|---|---|---|
| 1. Platform Audit Logs | Machine-generated record of all data categories processed, timestamps, volume | Automated (request from PureBrain) |
| 2. AI-Side Guardrails | Pattern detection for SSNs, blocklisted names, "privileged and confidential" phrases | Tarin configures (this week) |
| 3. Quarterly Review | AI Officer reviews logs + flags, spot-checks 5 random interactions per partner, writes evidence memo | Half-day per quarter |
| 4. Annual External Validation | Third-party reviews policy, logs, memos, vendor DPA | $3-5K/yr |
| Layer | What It Does | Catches |
|---|---|---|
| Approved tools only + device blocking | Prevents accidental consumer AI use | 80% of incidents |
| Platform audit logs | Creates evidence trail | 100% of approved-tool usage |
| AI-side guardrails | Flags sensitive data at point of entry | 60-70% of PII/privileged content |
| Quarterly review | Catches patterns automated tools miss | 90% (combined with logs) |
| Partnership agreement clause | Legal consequences for violations | Deters deliberate bad actors |
| Annual external review | Third-party validation | Regulatory defensibility |
No single layer is sufficient. All six together create a system where accidental sharing is largely prevented, deliberate sharing is logged and detectable, and bad actors face legal consequences beyond just "breaking a rule."
| Item | Cost | When |
|---|---|---|
| Tarin preprocessor script | $0 (Tarin builds it) | This week |
| Nitro Smart Redact [Link] | $20/user/month ($960/yr for 4 GPs) | Phase 1 |
| Cloudflare Gateway [Link] | Free (up to 50 users) | Phase 1 |
| VDR with AI (Peony) [Link] | $40/admin/month ($480/yr) | When deal flow starts |
| VDR with AI (Datasite) [Link] | ~$15-25K/yr (ISO 42001 certified) | When fund scales |
| Partnership agreement AI clause | $0 (draft with existing counsel) | Before first close |
| Cyber/E&O insurance AI rider | ~$2-5K/yr additional premium | At fund formation |
| Annual external AI review | ~$3-5K/yr | Post-first close |
Organized by phase. Each item is independently actionable.