Running an Analysis

The basic flow

After uploading documents on the New Analysis page, each file shows a ready status with the auto-detected category and suggested standards. To run an analysis:

Review the standards selection

Each ready file has checkboxes for the standards Conformly auto-selected. Adjust if needed.

Click 'Analyze' on a row, or 'Analyze all'

Per-file Analyze runs that single document. Analyze all kicks off every ready row in parallel (limited to 5 concurrent jobs to stay inside OpenAI rate limits).

Watch the live progress

The row switches to analyzing and the progress bar starts moving. You’ll see status messages like “Parsing document…”, “Building embeddings…”, “Evaluating SWE.4…”, “Generating report…”. These come from the real backend pipeline — they’re not a fake timer.

Click 'Open Full Report' when complete

Or wait — the row auto-shows the score, severity counts, and a top-3 priorities preview the moment the analysis finishes. The full AnalysisViewer is one click away.
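
The concurrency cap on "Analyze all" can be illustrated with a semaphore. This is a minimal sketch, not Conformly's actual code: `analyze_file` and `analyze_all` are hypothetical names, and the sleep stands in for the real per-file job. Only the limit of 5 comes from the text above.

```python
import asyncio

MAX_CONCURRENT_JOBS = 5  # limit stated in the text


async def analyze_file(name: str) -> str:
    """Hypothetical stand-in for one per-file analysis job."""
    await asyncio.sleep(0.01)  # simulate work
    return f"{name}: done"


async def analyze_all(files: list[str], limit: int = MAX_CONCURRENT_JOBS) -> list[str]:
    """Start every ready file, but never run more than `limit` at once."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(name: str) -> str:
        async with semaphore:  # waits here while `limit` jobs are in flight
            return await analyze_file(name)

    # gather returns results in the same order as the input files
    return await asyncio.gather(*(bounded(f) for f in files))


if __name__ == "__main__":
    print(asyncio.run(analyze_all([f"doc{i}.pdf" for i in range(8)])))
```

With 8 files, the first 5 start immediately and the remaining 3 start as earlier jobs finish.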

How long does an analysis take?

Honest answer — it depends on document size and number of standards:

Scenario	Wall time
5-page SyRS, 1 standard, 5 processes	25–45s
20-page SwRS, 1 standard, 8 processes	45–90s
50-page architecture doc, 2 standards, 11 processes	90–180s
Test plan + requirements + arch in one analysis	2–4 minutes

The dominant cost is the LLM evaluation calls: typically 4–8 seconds per process, run in parallel batches of 5. Those calls can’t be made much faster without trading off quality. What Conformly does to make the wait feel shorter:

Server-side classify polling — no fixed sleep between upload and classify
Real progress messages — you see what step is actually running
Per-document parallelism — multiple files analyze concurrently
Cached parsing — re-analyzing the same document skips parsing entirely
Live result preview — the score and top findings appear as soon as the pipeline finishes its last node, before you even click “Open Report”
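
As a rough check on the evaluation figures above, the time spent in LLM evaluation calls can be estimated from the number of processes. This sketch assumes batches run one after another and that each batch takes about as long as a single call; `evaluation_time_range` is a hypothetical helper. It covers evaluation only, so parsing, embedding, and report generation account for the rest of the wall times in the table.

```python
import math

BATCH_SIZE = 5                 # parallel batch size from the text
SECONDS_PER_CALL = (4.0, 8.0)  # typical per-process range from the text


def evaluation_time_range(num_processes: int) -> tuple[float, float]:
    """Rough (low, high) seconds spent in evaluation calls only."""
    batches = math.ceil(num_processes / BATCH_SIZE)
    low, high = SECONDS_PER_CALL
    return batches * low, batches * high


print(evaluation_time_range(11))  # 3 batches -> (12.0, 24.0)
```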

What happens under the hood

The backend pipeline is an 11-node LangGraph state machine. For each analysis:

Parse the document into page-level chunks
Index the chunks into a vector store with OpenAI embeddings
Web research (optional, default off) — fetch the latest interpretation notes for the relevant standards
Classify which standard processes apply to this document
Evaluate each applicable process against the document evidence using a structured LLM prompt with cross-standard context from the Knowledge Graph
Re-evaluate any low-confidence findings with expanded evidence (the iterative re-evaluation node)
Cross-validate for inter-process inconsistencies (e.g. SYS.3 marked compliant when SYS.2 is non-compliant, which is logically inconsistent in a real V-Model)
Score using the unified scoring module — penalty-based and capability-level-aware
Persist findings to the database with full traceability links
Compute quality metadata — which providers, parsers, and fallback paths were actually used
Generate the final report the frontend renders
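
The cross-validation step in the list above could look something like the following. This is a sketch under assumptions: the `UPSTREAM` map, the status strings, and `find_inconsistencies` are all hypothetical, and only the SYS.2/SYS.3 pair comes from the text.

```python
# Hypothetical dependency map: each process lists the upstream processes it builds on.
# Only the SYS.2 -> SYS.3 pair comes from the text; a real map would be larger.
UPSTREAM = {"SYS.3": ["SYS.2"]}


def find_inconsistencies(status: dict[str, str]) -> list[str]:
    """Flag processes marked compliant while an upstream process is non-compliant."""
    issues = []
    for process, parents in UPSTREAM.items():
        if status.get(process) != "compliant":
            continue
        for parent in parents:
            if status.get(parent) == "non_compliant":
                issues.append(
                    f"{process} is compliant but upstream {parent} is non_compliant"
                )
    return issues


print(find_inconsistencies({"SYS.2": "non_compliant", "SYS.3": "compliant"}))
```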

If any node fails, the pipeline returns whatever it has so far with an error attached; it never silently substitutes a default. (That’s a deliberate fix for a previous bug where a parser failure would silently return PARTIAL with 0.4 confidence on every finding. See Continuous quality monitoring below.)
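
The "return partial state with an error attached" behaviour can be sketched in plain Python. This is not LangGraph code and not the real pipeline: the three node functions, the state keys, and `run_pipeline` are hypothetical, and the failure in `index` is simulated.

```python
from typing import Callable

State = dict
Node = Callable[[State], State]


def parse(state: State) -> State:
    return {**state, "chunks": ["page 1 text", "page 2 text"]}


def index(state: State) -> State:
    raise RuntimeError("embedding provider unavailable")  # simulated failure


def evaluate(state: State) -> State:
    return {**state, "findings": []}


def run_pipeline(nodes: list[tuple[str, Node]], state: State) -> State:
    """Run nodes in order; on failure, stop and return partial state plus the error."""
    for name, node in nodes:
        try:
            state = node(state)
        except Exception as exc:
            # No default values are substituted: the caller sees what failed and where.
            return {**state, "error": {"node": name, "message": str(exc)}}
    return state


result = run_pipeline([("parse", parse), ("index", index), ("evaluate", evaluate)], {})
print(result["error"])  # {'node': 'index', 'message': 'embedding provider unavailable'}
```

The parsed chunks survive in the result, while no findings are invented for the nodes that never ran.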

Reading the result

When the analysis completes, the result page shows three things prominently:

1. Overall compliance score

A single number from 0 to 100. The score is penalty-based: it starts at 100 and subtracts a weighted penalty for each open finding (15 for critical, 8 for high, 4 for medium, 2 for low), with a floor of 0. Resolved findings don’t count. In this view the score is per document; the aggregate per-product and per-workspace scores live on the Audit Readiness page.
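
The penalty arithmetic above works out as follows. The weights are the ones stated in the text; the finding shape (`severity`, `resolved`) and the function name are assumptions, and the capability-level adjustment mentioned earlier is left out because the text does not specify it.

```python
PENALTIES = {"critical": 15, "high": 8, "medium": 4, "low": 2}  # weights from the text


def compliance_score(findings: list[dict]) -> int:
    """Start at 100, subtract a penalty per open finding, never go below 0."""
    penalty = sum(
        PENALTIES[f["severity"]] for f in findings if not f.get("resolved", False)
    )
    return max(0, 100 - penalty)


findings = [
    {"severity": "critical"},
    {"severity": "high"},
    {"severity": "medium", "resolved": True},  # resolved, so not counted
    {"severity": "low"},
]
print(compliance_score(findings))  # 100 - (15 + 8 + 2) = 75
```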

2. Severity breakdown

Critical / High / Medium / Low counts, color-coded. Critical and High are the “blocking” severities — until those are resolved or accepted, the product cannot be considered audit-ready.
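
A sketch of how the severity counts and the "blocking" rule might be computed. The `status` field with values `open`, `resolved`, and `accepted` is an assumption, as are the function names; the text only states that Critical and High block audit readiness until resolved or accepted. Whether the real breakdown counts all findings or only open ones is not stated, so this version counts whatever it is given.

```python
from collections import Counter

SEVERITY_ORDER = ["critical", "high", "medium", "low"]
BLOCKING = {"critical", "high"}


def severity_breakdown(findings: list[dict]) -> dict[str, int]:
    """Count findings per severity, in display order."""
    counts = Counter(f["severity"] for f in findings)
    return {level: counts.get(level, 0) for level in SEVERITY_ORDER}


def has_blockers(findings: list[dict]) -> bool:
    """True while any critical/high finding is neither resolved nor accepted."""
    return any(
        f["severity"] in BLOCKING and f.get("status", "open") == "open"
        for f in findings
    )


sample = [
    {"severity": "critical", "status": "accepted"},
    {"severity": "high", "status": "open"},
    {"severity": "low"},
]
print(severity_breakdown(sample))  # {'critical': 1, 'high': 1, 'medium': 0, 'low': 1}
print(has_blockers(sample))        # True, because the high finding is still open
```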

3. Top priorities

The 3 highest-severity unresolved findings, surfaced inline so you don’t have to click into the full findings list to see what to fix first. Click any one to jump to its full context in the findings panel.

The “this analysis ran in degraded mode” banner

If the AI pipeline ran without one or more of its preferred providers (OpenAI embeddings, OpenAI LLM, Landing AI parser), the result page shows a yellow banner explaining what was missing. Take it seriously: it means the underlying analysis was less thorough than usual, and the score should be treated as an estimate rather than a verdict. The banner also tells you which fallback was used:

“Fell back to PyMuPDF for parsing” — Landing AI was unavailable
“Embeddings unavailable — used keyword matching” — OpenAI embeddings failed
“1/3 documents parsed successfully” — some uploads failed silently

Re-run the analysis once your provider issues are resolved.

Quality metadata (for procurement reviews)

Every analysis result has a structured quality field showing:

Embedding provider used (openai / none)
LLM provider used (openai / none)
Vector store type (supabase_vector / faiss_fallback / none)
Documents requested vs documents successfully parsed
List of parse methods actually invoked (landing_ai, pymupdf, pypdf)
A boolean is_degraded and a list of human-readable reasons

This is the data your procurement team will ask for. It’s exposed in the API and can be exported alongside any analysis report.
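
For orientation, the quality field could be modelled roughly like this. The allowed values and `is_degraded` come from the list above; the other attribute names, the rules that derive the reasons, and the exact message wording are assumptions for illustration, not the real API schema.

```python
from dataclasses import dataclass, field


@dataclass
class QualityMetadata:
    embedding_provider: str   # "openai" or "none"
    llm_provider: str         # "openai" or "none"
    vector_store: str         # "supabase_vector", "faiss_fallback", or "none"
    documents_requested: int
    documents_parsed: int
    parse_methods: list[str] = field(default_factory=list)

    @property
    def reasons(self) -> list[str]:
        """Human-readable reasons, derived with assumed rules."""
        out = []
        if "pymupdf" in self.parse_methods:
            out.append("Fell back to PyMuPDF for parsing")
        if self.embedding_provider == "none":
            out.append("Embeddings unavailable (used keyword matching)")
        if self.documents_parsed < self.documents_requested:
            out.append(
                f"{self.documents_parsed}/{self.documents_requested} "
                "documents parsed successfully"
            )
        return out

    @property
    def is_degraded(self) -> bool:
        return bool(self.reasons)


q = QualityMetadata("none", "openai", "none", 3, 1, ["pymupdf"])
print(q.is_degraded, q.reasons)
```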

Continuous quality monitoring

Conformly’s AI pipeline is itself validated against an internal eval benchmark — a corpus of known-good and known-broken compliance documents with hand-curated ground truth. Every prompt change, every model update, every refactor to the evaluation node has to keep the benchmark F1 score above the threshold or the change is rejected. The benchmark runs in mock mode on every PR (free, ~5 seconds) and in real-OpenAI mode every night (catches model drift and prompt regressions). This is the infrastructure that protects you against silent quality regressions — the kind that would otherwise show up months later as customer escalations.
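
The benchmark gate described above amounts to comparing an F1 score against a threshold. The sketch below treats findings as sets of identifiers and uses a made-up threshold of 0.85; the source does not state the real threshold, how findings are matched against ground truth, or the names used here.

```python
F1_THRESHOLD = 0.85  # hypothetical value; the source does not state the threshold


def f1_score(predicted: set[str], expected: set[str]) -> float:
    """Harmonic mean of precision and recall over finding identifiers."""
    true_positives = len(predicted & expected)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(expected)
    return 2 * precision * recall / (precision + recall)


def passes_gate(predicted: set[str], expected: set[str],
                threshold: float = F1_THRESHOLD) -> bool:
    """A change is accepted only if the benchmark F1 stays at or above the threshold."""
    return f1_score(predicted, expected) >= threshold


expected = {"SYS.2:a", "SYS.3:b", "SWE.1:c", "SWE.4:d"}   # hand-curated ground truth
predicted = {"SYS.2:a", "SYS.3:b", "SWE.1:c", "SWE.4:x"}  # one miss, one false alarm
print(f1_score(predicted, expected))     # precision 0.75, recall 0.75 -> 0.75
print(passes_gate(predicted, expected))  # False under the assumed 0.85 threshold
```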
