Data exposure related to AI training

Overview

AI models need data to learn patterns, but improper training on proprietary or sensitive information can cause that data to surface in unrelated outputs. In the regulatory context, this could expose confidential clinical or product details and violate compliance requirements.

Hazardous situation: Proprietary data appears in AI responses because it was used in model training, leading to unintentional disclosure.

What data trains Flinn’s AI features

Flinn’s AI capabilities are designed to use generic, non‑confidential data rather than your proprietary content:

No full‑text training. The AI writing agent does not have access to the full text of your publications. It operates on structured data extracted from papers (tables and fields you configure) and therefore cannot leak verbatim passages from the publications.
Generalised extraction models. The extraction system works in a general way and will not target a specific device or proprietary topic unless you explicitly ask it to. This indicates that the underlying models are trained on broad, representative data rather than on your specific documents.
Validated and benchmarked. AI extraction has been validated with representative data and predefined benchmarks to ensure accuracy and reliability. It does not rely on ongoing training using your inputs, so information you enter is not incorporated into the core model.
Transparent outputs. When the writing agent generates text, it includes reference numbers that allow you to verify the source of each statement. This traceability confirms that outputs are derived from your extraction table rather than hidden training data.
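
The traceability check described above can be sketched in a few lines of Python. This is only an illustration for reviewers who export drafts and want an automated first pass: it assumes reference numbers appear in square brackets such as [1], which is a hypothetical format and not a documented Flinn convention, and the function name and sample data are invented for the example.

```python
import re


def unverified_references(generated_text: str, table_refs: set[int]) -> list[int]:
    """Return cited reference numbers that have no matching row in the extraction table.

    Assumes citations look like [1], [2], ... (a hypothetical format).
    """
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", generated_text)}
    return sorted(cited - table_refs)


# Sample draft text and the reference numbers present in the extraction table.
draft = "The complication rate was 12% [1]. Follow-up lasted 24 months [3]."
table_refs = {1, 2}

print(unverified_references(draft, table_refs))  # [3]
```

Any number returned by the check points to a statement that should be reviewed manually, because it cannot be traced back to the extraction table.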

To learn more about how to avoid misinterpreting AI outputs in Flinn, read here.

Safe data to provide

Because Flinn’s AI models are not trained on user‑specific content and do not ingest the full text of your documents, you can safely provide:

Search queries and prompts. Titles, descriptions and examples used in extraction prompts help the AI understand what to retrieve, but they are used only to produce the requested output and are not added to the training corpus.
Structured extraction tables. Populated extraction fields and tables are used by the writing agent to draft sections but are not stored to train the core model.
Non‑confidential descriptions. High‑level descriptions of study parameters, devices or populations that appear in prompts or tables can guide the AI without exposing proprietary information.

Recommendations to mitigate leakage risk

Avoid proprietary details in prompts. While the AI does not train on your prompts, avoid including confidential identifiers or trade secrets when formulating descriptions or examples.
Use anonymised or aggregated data. When creating extraction fields or input tables, use generic labels or anonymised values whenever possible.
Verify outputs. Always review AI‑generated content and cross‑check the references to ensure no sensitive information has been inadvertently included.
Contact support for clarification. If you have concerns about data privacy or need confirmation about acceptable data types, reach out to Flinn support for further guidance.
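
As a rough illustration of the first two recommendations, the sketch below replaces confidential terms with generic labels before a prompt or field description is entered, and flags any terms that remain. It runs locally and is not part of Flinn; the term list, the labels, and the function names are all hypothetical, and you would maintain your own list of identifiers.

```python
import re

# Hypothetical mapping of confidential terms to generic labels.
CONFIDENTIAL_TERMS = {
    "CardioFlex X2": "DEVICE_A",
    "Project Halcyon": "PROJECT_1",
}


def anonymise(text: str, replacements: dict[str, str]) -> str:
    """Replace each confidential term with its generic label, ignoring case."""
    # Handle longer terms first so a short term never splits a longer one.
    for term in sorted(replacements, key=len, reverse=True):
        label = replacements[term]
        text = re.sub(re.escape(term), lambda m, label=label: label, text,
                      flags=re.IGNORECASE)
    return text


def find_confidential(text: str, terms) -> list[str]:
    """List the confidential terms that still appear in the text."""
    lowered = text.lower()
    return sorted(t for t in terms if t.lower() in lowered)


prompt = "Extract complication rates for CardioFlex X2 from each study."
print(find_confidential(prompt, CONFIDENTIAL_TERMS))  # ['CardioFlex X2']
print(anonymise(prompt, CONFIDENTIAL_TERMS))
# Extract complication rates for DEVICE_A from each study.
```

A simple term list only catches identifiers you already know about, so it complements the manual review recommended above rather than replacing it.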

By understanding what information trains Flinn’s AI features and following these recommendations, you can confidently use AI to streamline your regulatory processes without risking unintended data disclosure.