The basic flow

After uploading documents on the New Analysis page, each file shows a ready status with the auto-detected category and suggested standards. To run an analysis:
  1. Review the standards selection
     Each ready file has checkboxes for the standards Conformly auto-selected. Adjust if needed.
  2. Click 'Analyze' on a row, or 'Analyze all'
     Per-file Analyze runs that single document. Analyze all kicks off every ready row in parallel (limited to 5 concurrent jobs to stay inside OpenAI rate limits).
  3. Watch the live progress
     The row switches to analyzing and the progress bar starts moving. You’ll see status messages like “Parsing document…”, “Building embeddings…”, “Evaluating SWE.4…”, “Generating report…”. These come from the real backend pipeline, not from a fake timer.
  4. Click 'Open Full Report' when complete
     Or wait: the row automatically shows the score, severity counts, and a top-3 priorities preview the moment the analysis finishes. The full AnalysisViewer is one click away.
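
The 5-job cap on Analyze all is a bounded-concurrency pattern. The sketch below shows one common way to get that behavior with an `asyncio.Semaphore`; it is not Conformly's actual implementation, and `analyze_document` and `analyze_all` are hypothetical names used only for illustration.

```python
import asyncio

MAX_CONCURRENT_JOBS = 5  # the cap stated for "Analyze all"


async def analyze_document(name: str) -> str:
    """Hypothetical stand-in for one analysis job."""
    await asyncio.sleep(0.01)  # simulate pipeline work
    return f"{name}: complete"


async def analyze_all(names: list[str], limit: int = MAX_CONCURRENT_JOBS) -> list[str]:
    """Start every ready row, but let at most `limit` jobs run at once."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(name: str) -> str:
        async with semaphore:  # waits here while `limit` jobs are in flight
            return await analyze_document(name)

    # gather() returns results in the same order as `names`.
    return await asyncio.gather(*(run_one(name) for name in names))


if __name__ == "__main__":
    print(asyncio.run(analyze_all([f"doc-{i}.pdf" for i in range(12)])))
```

With 12 ready rows, the first 5 start immediately and each remaining row starts as soon as a slot frees up.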

How long does an analysis take?

Honest answer — it depends on document size and number of standards:
| Scenario | Wall time |
|---|---|
| 5-page SyRS, 1 standard, 5 processes | 25–45 s |
| 20-page SwRS, 1 standard, 8 processes | 45–90 s |
| 50-page architecture doc, 2 standards, 11 processes | 90–180 s |
| Test plan + requirements + arch in one analysis | 2–4 minutes |
The dominant cost is the LLM evaluation calls: typically 4–8 seconds per process, run in parallel batches of 5. There’s no way to make those fundamentally faster without trading off quality.

What Conformly does to make the wait feel shorter:
  • Server-side classify polling — no fixed sleep between upload and classify
  • Real progress messages — you see what step is actually running
  • Per-document parallelism — multiple files analyze concurrently
  • Cached parsing — re-analyzing the same document skips parsing entirely
  • Live result preview — the score and top findings appear as soon as the pipeline finishes its last node, before you even click “Open Report”
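
To see how the batch size shapes the evaluation stage, here is a back-of-the-envelope sketch. It assumes batches run one after another and that each batch takes about as long as a single call; both are simplifying assumptions, and `evaluation_stage_seconds` is a hypothetical helper. It covers the first evaluation pass only, so it comes out well below the wall times quoted above, which also include parsing, embedding, re-evaluation, and report generation.

```python
import math


def evaluation_stage_seconds(
    num_processes: int,
    batch_size: int = 5,
    seconds_per_call: tuple[float, float] = (4.0, 8.0),
) -> tuple[float, float]:
    """Rough (low, high) estimate for the evaluation stage only.

    Assumes batches run one after another and that every call in a batch
    finishes in about the same time, so each batch costs one call's latency.
    """
    batches = math.ceil(num_processes / batch_size)
    low, high = seconds_per_call
    return batches * low, batches * high


print(evaluation_stage_seconds(5))   # (4.0, 8.0): one batch
print(evaluation_stage_seconds(11))  # (12.0, 24.0): three batches
```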

What happens under the hood

The backend pipeline is an 11-node LangGraph state machine. For each analysis:
  1. Parse the document into page-level chunks
  2. Index the chunks into a vector store with OpenAI embeddings
  3. Web research (optional, default off) — fetch the latest interpretation notes for the relevant standards
  4. Classify which standard processes apply to this document
  5. Evaluate each applicable process against the document evidence using a structured LLM prompt with cross-standard context from the Knowledge Graph
  6. Re-evaluate any low-confidence findings with expanded evidence (the iterative re-evaluation node)
  7. Cross-validate for inter-process inconsistencies (e.g. SYS.3 marked compliant when SYS.2 is non-compliant, which is logically inconsistent in a V-Model because SYS.3 builds on the output of SYS.2)
  8. Score using the unified scoring module — penalty-based and capability-level-aware
  9. Persist findings to the database with full traceability links
  10. Compute quality metadata — which providers, parsers, and fallback paths were actually used
  11. Generate the final report the frontend renders
If any node fails, the pipeline returns whatever it has so far with an error attached; it never silently substitutes a default. (That’s a deliberate fix for a previous bug where a parser failure would silently return PARTIAL at 0.4 confidence on every finding. See Continuous quality monitoring below.)
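
The "partial result plus error, never a silent default" behavior can be sketched in plain Python. This is an illustration of the idea only: it does not use LangGraph, and `PipelineResult`, `run_pipeline`, and the three toy nodes are hypothetical names rather than Conformly's code.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class PipelineResult:
    state: dict                                   # whatever the nodes produced so far
    completed: list[str] = field(default_factory=list)
    error: Optional[str] = None                   # set when a node fails


def run_pipeline(nodes: list[tuple[str, Callable[[dict], dict]]], state: dict) -> PipelineResult:
    """Run nodes in order; on failure, stop and attach the error."""
    result = PipelineResult(state=dict(state))
    for name, node in nodes:
        try:
            result.state = node(result.state)
        except Exception as exc:
            # Report the failure instead of inventing default findings.
            result.error = f"{name}: {exc}"
            break
        result.completed.append(name)
    return result


def parse(state: dict) -> dict:
    return {**state, "chunks": ["page 1", "page 2"]}


def index(state: dict) -> dict:
    raise RuntimeError("embedding provider unavailable")


def classify(state: dict) -> dict:
    return {**state, "processes": ["SWE.4"]}


result = run_pipeline(
    [("parse", parse), ("index", index), ("classify", classify)],
    {"document": "srs.pdf"},
)
print(result.completed)  # ['parse']
print(result.error)      # index: embedding provider unavailable
```

The caller gets the parsed chunks and a clear error, and no findings are fabricated for the nodes that never ran.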

Reading the result

When the analysis completes, the result page shows three things prominently:

1. Overall compliance score

A single number from 0 to 100. The score is penalty-based: it starts at 100 and subtracts a weighted penalty for each open finding (15 for critical, 8 for high, 4 for medium, 2 for low), with a floor of 0. Resolved findings don’t count. The score is per document in this view; the aggregate per-product and per-workspace scores live on the Audit Readiness page.
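
As a worked example of the penalty arithmetic, here is a minimal sketch. The penalty weights come from the description above. The finding shape (`severity` and `resolved` keys) and the function name are assumptions, and the sketch leaves out the capability-level awareness of the real scoring module.

```python
PENALTIES = {"critical": 15, "high": 8, "medium": 4, "low": 2}


def compliance_score(findings: list[dict]) -> int:
    """Start at 100, subtract a penalty per open finding, never go below 0."""
    penalty = sum(
        PENALTIES[f["severity"]]
        for f in findings
        if not f.get("resolved", False)  # resolved findings don't count
    )
    return max(0, 100 - penalty)


findings = [
    {"severity": "critical"},
    {"severity": "high"},
    {"severity": "high"},
    {"severity": "medium", "resolved": True},
]
print(compliance_score(findings))  # 69  (100 - 15 - 8 - 8)
```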

2. Severity breakdown

Critical / High / Medium / Low counts, color-coded. Critical and High are the “blocking” severities — until those are resolved or accepted, the product cannot be considered audit-ready.
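
A short sketch of the breakdown and the blocking rule follows. It assumes each finding carries a `status` of `open`, `resolved`, or `accepted`, and that the counts cover open findings only; the field names, status values, and function names are hypothetical.

```python
from collections import Counter

SEVERITIES = ("critical", "high", "medium", "low")
BLOCKING = {"critical", "high"}


def severity_breakdown(findings: list[dict]) -> dict[str, int]:
    """Count open findings per severity, including zero counts."""
    counts = Counter(f["severity"] for f in findings if f["status"] == "open")
    return {severity: counts[severity] for severity in SEVERITIES}


def is_blocked(findings: list[dict]) -> bool:
    """True while any critical or high finding is still open."""
    return any(
        f["status"] == "open" and f["severity"] in BLOCKING for f in findings
    )


findings = [
    {"severity": "critical", "status": "open"},
    {"severity": "high", "status": "accepted"},
    {"severity": "medium", "status": "open"},
    {"severity": "low", "status": "resolved"},
]
print(severity_breakdown(findings))  # {'critical': 1, 'high': 0, 'medium': 1, 'low': 0}
print(is_blocked(findings))          # True
```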

3. Top priorities

The 3 highest-severity unresolved findings, surfaced inline so you don’t have to click into the full findings list to see what to fix first. Click any one to jump to its full context in the findings panel.
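
Picking the top priorities amounts to a filter and a sort. The sketch below assumes findings are dictionaries with `severity` and `resolved` keys and breaks ties by original order; how the product orders findings of equal severity is not stated, so treat that part as an assumption.

```python
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def top_priorities(findings: list[dict], limit: int = 3) -> list[dict]:
    """Return the `limit` highest-severity unresolved findings."""
    unresolved = [f for f in findings if not f.get("resolved", False)]
    # sorted() is stable, so equal severities keep their original order.
    return sorted(unresolved, key=lambda f: SEVERITY_RANK[f["severity"]])[:limit]


findings = [
    {"id": "F1", "severity": "medium"},
    {"id": "F2", "severity": "critical", "resolved": True},
    {"id": "F3", "severity": "high"},
    {"id": "F4", "severity": "low"},
    {"id": "F5", "severity": "critical"},
]
print([f["id"] for f in top_priorities(findings)])  # ['F5', 'F3', 'F1']
```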

The “this analysis ran in degraded mode” banner

If the AI pipeline ran without one or more of its preferred providers (OpenAI embeddings, OpenAI LLM, Landing AI parser), the result page shows a yellow banner explaining what was missing. Take it seriously — it means the underlying analysis was less thorough than usual, and the score should be treated as an estimate rather than a verdict. The banner also tells you which fallback was used:
  • “Fell back to PyMuPDF for parsing” — Landing AI was unavailable
  • “Embeddings unavailable — used keyword matching” — OpenAI embeddings failed
  • “1/3 documents parsed successfully” — some uploads failed silently
Re-run the analysis once your provider issues are resolved.

Quality metadata (for procurement reviews)

Every analysis result has a structured quality field showing:
  • Embedding provider used (openai / none)
  • LLM provider used (openai / none)
  • Vector store type (supabase_vector / faiss_fallback / none)
  • Documents requested vs documents successfully parsed
  • List of parse methods actually invoked (landing_ai, pymupdf, pypdf)
  • A boolean is_degraded and a list of human-readable reasons
This is the data your procurement team will ask for. It’s exposed in the API and can be exported alongside any analysis report.
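
For orientation, the quality field could be modeled roughly as follows. Only `is_degraded` and the listed provider, vector store, and parser values come from the description above. The class name, the other field names, the rules that derive the reasons, and the reason wording (which paraphrases the banner texts) are assumptions, so check the API response for the real schema.

```python
from dataclasses import dataclass


@dataclass
class QualityMetadata:
    embedding_provider: str      # "openai" or "none"
    llm_provider: str            # "openai" or "none"
    vector_store: str            # "supabase_vector", "faiss_fallback", or "none"
    documents_requested: int
    documents_parsed: int
    parse_methods: list[str]     # e.g. ["landing_ai"], ["pymupdf"], ["pypdf"]

    @property
    def reasons(self) -> list[str]:
        """Human-readable reasons, derived with illustrative rules."""
        reasons = []
        if "pymupdf" in self.parse_methods:
            reasons.append("Fell back to PyMuPDF for parsing")
        if self.embedding_provider == "none":
            reasons.append("Embeddings unavailable; used keyword matching")
        if self.documents_parsed < self.documents_requested:
            reasons.append(
                f"{self.documents_parsed}/{self.documents_requested} documents parsed successfully"
            )
        return reasons

    @property
    def is_degraded(self) -> bool:
        return bool(self.reasons)


quality = QualityMetadata("none", "openai", "none", 3, 1, ["pymupdf"])
print(quality.is_degraded)  # True
print(quality.reasons)
```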

Continuous quality monitoring

Conformly’s AI pipeline is itself validated against an internal eval benchmark — a corpus of known-good and known-broken compliance documents with hand-curated ground truth. Every prompt change, every model update, every refactor to the evaluation node has to keep the benchmark F1 score above the threshold or the change is rejected. The benchmark runs in mock mode on every PR (free, ~5 seconds) and in real-OpenAI mode every night (catches model drift and prompt regressions). This is the infrastructure that protects you against silent quality regressions — the kind that would otherwise show up months later as customer escalations.
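
A benchmark gate of this kind boils down to computing F1 against the ground truth and comparing it with a threshold. The sketch below assumes findings are compared as sets of (document, process) pairs and uses a made-up threshold of 0.85; the actual benchmark format and threshold are not documented here.

```python
F1_THRESHOLD = 0.85  # hypothetical value, for illustration only


def f1_score(predicted: set, expected: set) -> float:
    """Harmonic mean of precision and recall over two sets of findings."""
    true_positives = len(predicted & expected)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(expected)
    return 2 * precision * recall / (precision + recall)


def passes_gate(predicted: set, expected: set, threshold: float = F1_THRESHOLD) -> bool:
    """Accept a change only if the benchmark F1 meets the threshold."""
    return f1_score(predicted, expected) >= threshold


expected = {("doc-a", "SYS.2"), ("doc-a", "SYS.3"), ("doc-b", "SWE.4"), ("doc-b", "SWE.1")}
predicted = {("doc-a", "SYS.2"), ("doc-a", "SYS.3"), ("doc-b", "SWE.4"), ("doc-b", "SWE.2")}

print(f1_score(predicted, expected))     # 0.75 (3 of 4 correct on both sides)
print(passes_gate(predicted, expected))  # False
```

In this toy run one finding is missed and one is spurious, so precision and recall are both 0.75 and the change would be rejected.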