LLM fallback extractor

LLM Fallback
Last updated: 2026-04-22

What it is
The CTO.ai grapher reads your repositories through graphify, which uses tree-sitter for 25 programming languages and builds the structural knowledge graph fully locally, without any LLM. Tree-sitter covers the main languages (Python, JS, TS, Go, Rust, Java, C/C++, Ruby, C#, Kotlin, Scala, PHP, Swift, and others).
For files outside the tree-sitter set, we optionally use an LLM fallback extractor: the file contents are sent to Anthropic's Claude API with a prompt that asks only for structural extraction (which symbols are defined and what they reference). The response is parsed back into the same graph format.
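
As a rough illustration of the round trip described above, the sketch below builds an extraction prompt and parses a reply into nodes and edges. It makes no network call. The prompt wording, the JSON reply shape, and the names `build_prompt`, `parse_response`, and `Graph` are all assumptions made for this example; the actual prompt and graph format are not published here.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Minimal stand-in for the structural graph (hypothetical shape)."""
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)


def build_prompt(path: str, contents: str) -> str:
    # Hypothetical prompt wording: asks for structure only, as JSON.
    return (
        "Extract only structure from the file below. "
        'Reply with JSON: {"symbols": [names], "references": [[from, to]]}.\n'
        f"File: {path}\n---\n{contents}"
    )


def parse_response(path: str, raw: str) -> Graph:
    """Turn a model reply (assumed JSON) into nodes and edges."""
    graph = Graph()
    graph.nodes.add(path)
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return graph  # unparseable reply: keep only the file node
    if not isinstance(data, dict):
        return graph

    symbols = data.get("symbols")
    if not isinstance(symbols, list):
        symbols = []
    for name in symbols:
        if isinstance(name, str):
            graph.nodes.add(name)
            graph.edges.add((path, "defines", name))

    references = data.get("references")
    if not isinstance(references, list):
        references = []
    for ref in references:
        # Accept only well-formed [from, to] pairs of strings.
        if isinstance(ref, list) and len(ref) == 2 and all(isinstance(x, str) for x in ref):
            graph.nodes.update(ref)
            graph.edges.add((ref[0], "references", ref[1]))
    return graph
```

Validating the reply defensively matters in a design like this, because a malformed model response should degrade to a bare file node rather than corrupt the graph.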

When it triggers
The fallback runs only on:
Unsupported languages: SQL, Terraform, Haskell, OCaml, F#, Clojure, Dockerfile, YAML (CI/config), Helm templates, Makefile, Nix
Documentation (Markdown, RST, ADR)
PDFs, slides, and image-based diagrams
It never runs on: covered-language source files, binary assets, or files larger than 200 KB.
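
The trigger rules above can be sketched as a small routing function. This is a minimal illustration, not the shipped logic: the extension sets are partial and hypothetical, `route` is an invented name, and "200 KB" is assumed to mean 200 * 1024 bytes.

```python
from pathlib import PurePosixPath

# Assumption: "200 KB" is interpreted as 200 * 1024 bytes.
MAX_FALLBACK_BYTES = 200 * 1024

# Illustrative subsets only; the real extension tables are not published.
TREE_SITTER_EXTS = {".py", ".js", ".ts", ".go", ".rs", ".java", ".rb", ".php"}
FALLBACK_EXTS = {
    ".sql", ".tf", ".hs", ".ml", ".fs", ".clj",
    ".yaml", ".yml", ".nix", ".md", ".rst", ".pdf",
}
FALLBACK_NAMES = {"Dockerfile", "Makefile"}


def route(path: str, size_bytes: int) -> str:
    """Return 'tree-sitter', 'llm-fallback', or 'skip' for one file."""
    p = PurePosixPath(path)
    ext = p.suffix.lower()
    if ext in TREE_SITTER_EXTS:
        return "tree-sitter"  # covered languages never reach the fallback
    eligible = ext in FALLBACK_EXTS or p.name in FALLBACK_NAMES
    if eligible and size_bytes <= MAX_FALLBACK_BYTES:
        return "llm-fallback"
    return "skip"  # oversized or unrecognized files are not sent anywhere
```

For example, `route("db/schema.sql", 1_000)` selects the fallback, while the same file at 300 KB is skipped.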

Privacy posture
When the fallback runs, the file contents are transmitted to Anthropic over TLS. Anthropic's data-retention practices apply; at the time of writing, Claude API inputs and outputs may be retained for 30 days for abuse detection, then deleted.
We do not train models on your code, and Anthropic has committed that Claude API calls are not used to train their models.
The extractor always runs behind our Content Safety Layer: inputs are scrubbed for obvious secrets (API keys, private keys, passwords) before transmission, and outputs are scrubbed on receipt.
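
To make the scrubbing step concrete, here is a minimal sketch of regex-based redaction applied before transmission. The three patterns and the `scrub` function are hypothetical examples chosen for illustration; they are not the Content Safety Layer's actual rules, and a production scrubber would need far broader coverage.

```python
import re

REDACTED = "[REDACTED]"

# Hypothetical patterns for illustration only.
PRIVATE_KEY = re.compile(
    r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
    re.DOTALL,
)
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")  # shape of an AWS access key ID
ASSIGNMENT = re.compile(
    r"(password|passwd|secret|api[_-]?key)(\s*[:=]\s*)(\S+)",
    re.IGNORECASE,
)


def scrub(text: str) -> str:
    """Replace obvious secrets with a placeholder, keeping key names."""
    text = PRIVATE_KEY.sub(REDACTED, text)
    text = AWS_KEY_ID.sub(REDACTED, text)
    # Keep the key name and separator so the file structure stays readable.
    text = ASSIGNMENT.sub(lambda m: m.group(1) + m.group(2) + REDACTED, text)
    return text
```

Running the same function over the model's reply would mirror the "scrubbed on receipt" half of the description.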

Opt-out
There is no self-serve Settings toggle for this yet. Today the fallback extractor is controlled by a platform-level default (environment / deployment configuration), not a per-workspace switch. A native per-workspace opt-out is planned but not built.
If you need it disabled for your workspace before that ships, email privacy@ctoai.live and we will configure it for you. With it disabled:
Unsupported-language files are listed but not parsed (they appear as opaque nodes in the graph with no structural edges)
Docs and images are not analyzed
The core tree-sitter scan for covered languages is unchanged
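
The disabled behavior listed above could look roughly like the following. This is a sketch under assumptions: `FileNode`, `node_for_unsupported_file`, and the `extract` callable are hypothetical names, and the real graph schema may differ.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Edge = Tuple[str, str, str]  # (source, relation, target), a hypothetical shape


@dataclass
class FileNode:
    path: str
    opaque: bool
    edges: List[Edge] = field(default_factory=list)


def node_for_unsupported_file(
    path: str,
    fallback_enabled: bool,
    extract: Callable[[str], List[Edge]],
) -> FileNode:
    """Build the graph node for a file outside the tree-sitter set."""
    if not fallback_enabled:
        # Listed but not parsed: an opaque node with no structural edges.
        return FileNode(path=path, opaque=True)
    return FileNode(path=path, opaque=False, edges=extract(path))
```

With the fallback disabled, `extract` is never called, so no file contents leave the local scan in this sketch.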

References
Anthropic DPA
Claude data-usage FAQ
Our privacy policy: /privacy