Agentic Code Analysis: `CodeAnalysisProvider` as a first-class capability

Status: design · tracked in gm-l1i
Author: mike (captured by polecat jasper)
Date: 2026-04-22 (ratification pending)
Parents: gm-root (Gemba EPIC), gm-eiw (Context providers)
Related: gm-371 (source-code import path, previously the bootstrap wizard), gm-57b (Personas)
Resolves DD: DD-33

Summary

Promote agentic code analysis — GitNexus-class knowledge-graph tools that answer “what calls this?”, “what’s the impact of changing this?”, “what’s untested?” — to a first-class, typed Gemba capability.

Today Gemba reaches for GitNexus ad-hoc when Coaches and Managers need structural code knowledge the bead-graph and git log alone don’t provide. Ad-hoc use means inconsistent queries, no caching, no pluggability, and no persona-level opt-in.

This design introduces CodeAnalysisProvider — a typed, READ-ONLY, pluggable context provider that exposes a knowledge-graph API over one or more workspace repos. GitNexus is the reference implementation; Sourcegraph, CodeQL, tree-sitter / LSP indexers are anticipated adapters.

Motivation

Every Manager and every Coach benefits from structural code knowledge:

  • PM — estimating epic decomposition needs module boundaries; parallel-safety checks need file-space signatures.
  • Architect — design review wants the real call graph and module cohesion metrics, not a summary someone typed once.
  • Code Reviewer — PR weight should track how load-bearing the touched code is (call-graph fan-in).
  • QA — regression-surface scans need “what tests cover which code” traceability.
  • Documentarian — documentation-coverage checks need to know which public APIs lack docs.
  • Security — auth-surface audits need reachability analysis from entry points to sensitive primitives.
  • Import from advanced source (gm-371, source-code-import path) — analyzing a source-code repo to derive candidate epic decomposition requires more than README + dir-listing; knowledge-graph richness produces better first-draft plans.

Without a typed surface, each persona would reach for GitNexus ad-hoc and produce inconsistent, un-cacheable queries.

The primitive

CodeAnalysisProvider is a typed context provider — it composes with the ContextProvider registry (gm-eiw) so personas opt in via their .toml config. It is backend-pluggable: GitNexus is the reference, but any knowledge-graph backend that can answer the interface is acceptable.

Interface (Go)

type CodeAnalysisProvider interface {
    Describe() CodeAnalysisManifest

    // Indexing
    IsIndexed(ctx CallCtx, repo RepoRef) (bool, time.Time, error)
    Reindex(ctx CallCtx, repo RepoRef, flags ReindexFlags) error

    // Structure queries (sync or streamed)
    Context(ctx CallCtx, symbol string) (SymbolContext, error)
    Query(ctx CallCtx, q GraphQuery) ([]QueryResult, error)
    Impact(ctx CallCtx, symbol string, direction ImpactDirection) (ImpactReport, error)
    RouteMap(ctx CallCtx, from, to string) ([]Route, error)
    ToolMap(ctx CallCtx, toolName string) (ToolMap, error)

    // Higher-level summaries useful to personas
    ModuleInventory(ctx CallCtx, repo RepoRef) ([]Module, error)
    EntryPoints(ctx CallCtx, repo RepoRef) ([]EntryPoint, error)
    HealthIndicators(ctx CallCtx, repo RepoRef) (HealthReport, error)
}

type CodeAnalysisManifest struct {
    Backend               string // "gitnexus" | "sourcegraph" | "codeql" | custom
    IndexedRepos          []RepoRef
    SupportsSymbolContext bool
    SupportsImpact        bool
    SupportsRouteMap      bool
    SupportsEmbeddings    bool
    MaxRepoSize           int64
}

type RepoRef struct {
    Name   string
    Path   string // local path
    Remote string
}

// Module is one entry in the slice returned by ModuleInventory.
type Module struct {
    Name         string
    Path         string
    LOC          int
    FileCount    int
    TestCoverage float64 // 0..1 if available
    Dependencies []string
}

type HealthReport struct {
    CycleCount       int
    UntestedModules  []string
    GodClasses       []SymbolRef   // high-fan-in types
    DeepFlows        []FlowRef     // long cross-module flows
    UndocumentedAPIs []SymbolRef
    TODOComments     []LocationRef
    StaleCode        []SymbolRef   // last-touched > 6mo, 0 test coverage
}
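
Because the manifest advertises capabilities such as SupportsImpact, a caller can check it before issuing a query the backend cannot answer. The following is a minimal sketch of that pattern, not the ratified interface: the types are trimmed stand-ins (CallCtx and ImpactDirection are omitted), and SafeImpact, ErrUnsupported, fakeProvider, and the Affected field are hypothetical names introduced only for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Manifest is a trimmed stand-in for CodeAnalysisManifest.
type Manifest struct {
	Backend        string
	SupportsImpact bool
}

// ImpactReport is a stand-in; the Affected field is an assumption.
type ImpactReport struct {
	Affected []string
}

// impactProvider is the small slice of the provider interface used here.
type impactProvider interface {
	Describe() Manifest
	Impact(symbol string) (ImpactReport, error)
}

// ErrUnsupported is a hypothetical sentinel error for missing capabilities.
var ErrUnsupported = errors.New("backend does not support impact analysis")

// SafeImpact consults the manifest before running the query.
func SafeImpact(p impactProvider, symbol string) (ImpactReport, error) {
	if !p.Describe().SupportsImpact {
		return ImpactReport{}, ErrUnsupported
	}
	return p.Impact(symbol)
}

// fakeProvider is an in-memory backend used only for this example.
type fakeProvider struct {
	manifest Manifest
	callers  map[string][]string
}

func (f fakeProvider) Describe() Manifest { return f.manifest }

func (f fakeProvider) Impact(symbol string) (ImpactReport, error) {
	return ImpactReport{Affected: f.callers[symbol]}, nil
}

func main() {
	p := fakeProvider{
		manifest: Manifest{Backend: "fake", SupportsImpact: true},
		callers:  map[string][]string{"auth.Verify": {"api.Login", "api.Refresh"}},
	}
	report, err := SafeImpact(p, "auth.Verify")
	fmt.Println(report.Affected, err)
}
```

Gating on the manifest keeps adapters honest about partial support: a backend that cannot compute impact returns a typed error instead of an empty report that looks like "no callers".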

Configuration

Per-workspace in .gemba/code_analysis.toml:

backend = "gitnexus"
[repos.primary]
path = "./"
reindex_policy = "post_merge"
[repos.sibling]
path = "../sibling-project"
reindex_policy = "post_merge"
[backend.gitnexus]
embeddings = false
max_repo_size_mb = 500

Multiple repos per workspace is first-class — a workspace can register any number of repos (the primary project plus optional siblings, dependencies, or adjacent codebases) and have them all indexed under one analysis backend.
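
One way to picture the loaded configuration is a struct with a map of named repos plus a validation pass. This is a sketch under assumptions: WorkspaceConfig, RepoConfig, and Validate are hypothetical names, TOML parsing is left out, and an empty reindex_policy is treated as the post_merge default named in the re-indexing policy section.

```go
package main

import (
	"fmt"
	"sort"
)

// RepoConfig mirrors one [repos.<name>] table (hypothetical type).
type RepoConfig struct {
	Path          string
	ReindexPolicy string // empty means the default, post_merge
}

// WorkspaceConfig mirrors the top level of the config file (hypothetical type).
type WorkspaceConfig struct {
	Backend string
	Repos   map[string]RepoConfig
}

// validPolicies lists the four policy names from this design.
var validPolicies = map[string]bool{
	"post_merge": true,
	"scheduled":  true,
	"manual":     true,
	"on_demand":  true,
}

// Validate returns human-readable problems, in stable repo-name order.
func (c WorkspaceConfig) Validate() []string {
	var problems []string
	if c.Backend == "" {
		problems = append(problems, "backend is not set")
	}
	names := make([]string, 0, len(c.Repos))
	for name := range c.Repos {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		r := c.Repos[name]
		if r.Path == "" {
			problems = append(problems, fmt.Sprintf("repos.%s: path is empty", name))
		}
		if r.ReindexPolicy != "" && !validPolicies[r.ReindexPolicy] {
			problems = append(problems,
				fmt.Sprintf("repos.%s: unknown reindex_policy %q", name, r.ReindexPolicy))
		}
	}
	return problems
}

func main() {
	cfg := WorkspaceConfig{
		Backend: "gitnexus",
		Repos: map[string]RepoConfig{
			"primary": {Path: "./", ReindexPolicy: "post_merge"},
			"sibling": {Path: "../sibling-project", ReindexPolicy: "weekly"},
		},
	}
	for _, p := range cfg.Validate() {
		fmt.Println(p)
	}
}
```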

Integration with other primitives

Context providers (gm-eiw)

CodeAnalysisProvider exposes four new context provider IDs, separately toggleable per persona:

| ID | Purpose | Default consumers |
| --- | --- | --- |
| code_analysis_summary | Module inventory + health top-line | PM |
| symbol_context | Callers / callees / enclosing module / related beads | Architect, Code Reviewer (on-demand) |
| impact_analysis | Cascade of changing a symbol | Architect, Code Reviewer |
| health_report | Cycles, untested modules, god-classes, undocumented APIs | QA, Documentarian |

gm-eiw is amended to list these four in its seed catalog (see internal/core/promptctx/provider.go).

PM Skills

  • epic_decompose (agentic) — uses module_inventory to propose epic splits aligned with module boundaries.
  • parallel_safety — uses impact_analysis over member-WorkItems’ anticipated file touches to detect overlaps.
  • change_this — uses impact_analysis to estimate cascade breadth.

Architect Skills

  • design_review — uses the full graph for coupling / cohesion / cycle checks.
  • api_shape_review — uses tool_map / impact_analysis to check API-surface stability.

QA Skills

  • health_scan — reads health_report directly.
  • regression_impact_analysis — uses impact_analysis to identify affected tests.

Documentarian

  • document_scan consumes UndocumentedAPIs from health_report.
  • update_summary uses module_inventory to keep the project summary honest.

Import from advanced source — source-code-import path (gm-371)

  • Source-code-import calls ModuleInventory + EntryPoints + HealthIndicators to produce a rich first-draft epic decomposition.
  • Without a CodeAnalysisProvider, source-code-import falls back to README + git log heuristics.

Re-indexing policy

| Policy | Trigger |
| --- | --- |
| post_merge (default) | Re-index after every merge to main (matches existing GitNexus PostToolUse hook pattern) |
| scheduled | Nightly re-index |
| manual | Explicit gemba code-analysis reindex command |
| on_demand | Persona-triggered (a PM skill that wants fresh analysis fires a reindex first) |

Stale indexes surface as warnings in persona responses: “analysis from 12h ago; reindex recommended before relying on health_report.”

The first shipped operator-facing freshness surface is the operational context card’s scope-status pill. For a worktree-backed session, Gemba reads .gitnexus/meta.json, compares lastCommit with git rev-parse HEAD, and renders GitNexus as current, stale, missing, or unknown. A dirty worktree forces the pill to stale even if lastCommit == HEAD, because uncommitted source is not represented in the graph. This is intentionally read-only; scheduling re-index runs remains a separate follow-up.
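
The pill decision described above can be sketched as a pure function over already-gathered facts. The inputs and names here (WorktreeState, PillStatus) are hypothetical, and the mapping of missing and unknown is an assumption: missing is taken to mean that no readable .gitnexus/meta.json exists, and unknown that either commit could not be determined.

```go
package main

import "fmt"

// Freshness is the value rendered in the scope-status pill.
type Freshness string

const (
	Current Freshness = "current"
	Stale   Freshness = "stale"
	Missing Freshness = "missing"
	Unknown Freshness = "unknown"
)

// WorktreeState holds facts the caller has already collected (hypothetical type).
type WorktreeState struct {
	MetaFound  bool   // .gitnexus/meta.json exists and was parsed
	LastCommit string // lastCommit value from meta.json
	Head       string // output of git rev-parse HEAD; empty if it failed
	Dirty      bool   // worktree has uncommitted changes
}

// PillStatus maps the gathered facts to a pill value without touching disk.
func PillStatus(s WorktreeState) Freshness {
	switch {
	case !s.MetaFound:
		return Missing // assumption: no index metadata at all
	case s.LastCommit == "" || s.Head == "":
		return Unknown // assumption: one side of the comparison is unavailable
	case s.Dirty:
		return Stale // uncommitted source is not represented in the graph
	case s.LastCommit != s.Head:
		return Stale
	default:
		return Current
	}
}

func main() {
	clean := WorktreeState{MetaFound: true, LastCommit: "abc123", Head: "abc123"}
	dirty := clean
	dirty.Dirty = true
	fmt.Println(PillStatus(clean), PillStatus(dirty))
}
```

Keeping the decision free of I/O makes the dirty-worktree rule easy to unit test separately from the git and file reads.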

API surface

| Path | Verb | Purpose |
| --- | --- | --- |
| /api/v1/code-analysis/manifest | GET | Backend + configured repos + capabilities |
| /api/v1/code-analysis/index | POST | Trigger reindex |
| /api/v1/code-analysis/context | POST | Symbol context query |
| /api/v1/code-analysis/query | POST | Graph query (Cypher-like for GitNexus; backend-specific) |
| /api/v1/code-analysis/impact | POST | Impact analysis |
| /api/v1/code-analysis/health | GET | Health report for a repo |
| /api/v1/code-analysis/modules | GET | Module inventory |

These endpoints are also exposed through an MCP-style server so that agent polecats, not just personas, can query the graph; polecat work benefits from graph access too.

Onboarding and agent setup

Project setup should treat source analysis as a deterministic setup choice, not an LLM discovery task.

  • New projects seed the analysis contract into CLAUDE.md, AGENTS.md, and runtime bridge configuration even before meaningful code exists. This lets the first implementation agents know that a source-analysis MCP server may be available once code lands.
  • Existing and imported codebases prompt for a code-analysis backend and default to GitNexus.
  • When GitNexus is selected, setup installs or verifies the gitnexus CLI, writes .gemba/codeanalysis.toml, runs an initial gitnexus analyze --path <worktree> for existing/imported codebases, records freshness, and probes the source-analysis MCP command before launching the Onboarder or other LLM runtime.
  • Setup also tests the Beads/Gemba MCP connection so the LLM can inspect design decisions, epics, related beads, dependencies, and evidence.
  • Setup-file mutation must be idempotent and sentinel-scoped. Existing operator-authored CLAUDE.md, AGENTS.md, .claude/settings.local.json, .Codex/settings.local.json, and equivalent runtime files must not be clobbered.
  • /api/v1/onboarding/setup is the current deterministic entry point for this work. Its response includes a setup ledger, warnings, and status checks such as source_analysis, gemba_mcp, and source_analysis_mcp.
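
The idempotent, sentinel-scoped rule for setup files can be illustrated with a small upsert over a marked block. This is a sketch only: the marker strings and the UpsertManagedBlock name are hypothetical, and real setup code would also need to handle file I/O and per-runtime formats such as JSON settings files.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Hypothetical sentinel markers delimiting the Gemba-managed region.
const (
	beginMarker = "<!-- gemba:code-analysis:begin -->"
	endMarker   = "<!-- gemba:code-analysis:end -->"
)

var errUnbalanced = errors.New("managed block markers are unbalanced")

// UpsertManagedBlock replaces the managed block if present, or appends one.
// Text outside the markers is returned untouched.
func UpsertManagedBlock(content, body string) (string, error) {
	block := beginMarker + "\n" + body + "\n" + endMarker
	start := strings.Index(content, beginMarker)
	if start < 0 {
		if strings.Contains(content, endMarker) {
			return content, errUnbalanced
		}
		if content != "" && !strings.HasSuffix(content, "\n") {
			content += "\n"
		}
		return content + block + "\n", nil
	}
	rel := strings.Index(content[start:], endMarker)
	if rel < 0 {
		// Refuse to guess where the block ends rather than clobber operator text.
		return content, errUnbalanced
	}
	end := start + rel + len(endMarker)
	return content[:start] + block + content[end:], nil
}

func main() {
	original := "# Operator notes\nKeep this paragraph.\n"
	once, _ := UpsertManagedBlock(original, "Source analysis: GitNexus")
	twice, _ := UpsertManagedBlock(once, "Source analysis: GitNexus")
	fmt.Println(once == twice)
}
```

Running the function twice with the same body yields the same file, and a malformed block produces an error instead of a silent overwrite.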

Architectural invariants

  • Pluggable, not forced. A workspace without GitNexus (or equivalent) still works; personas that opt in to code-analysis context get empty results with warning: no code-analysis backend configured.
  • Deterministic queries cache. Graph queries are content-addressed against the index commit; repeat queries return cached results until a reindex fires.
  • Privacy-aware. The provider runs LOCAL by default. Optional remote backends require explicit opt-in + per-repo allow-list.
  • No write authority. CodeAnalysisProvider is READ-ONLY. Personas consume it; they don’t mutate it.
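
The caching invariant can be sketched by keying results on the index commit together with the query text, so that a reindex to a new commit naturally misses. The names below (CacheKey, QueryCache) and the choice of SHA-256 are assumptions for illustration; the design does not specify a hash or a cache layout.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// CacheKey derives a content address from the index commit and the query.
// A zero byte separates the parts so ("a","b") and ("ab","") cannot collide.
func CacheKey(indexCommit, query string) string {
	h := sha256.New()
	h.Write([]byte(indexCommit))
	h.Write([]byte{0})
	h.Write([]byte(query))
	return hex.EncodeToString(h.Sum(nil))
}

// QueryCache is a minimal in-memory cache (hypothetical type, not thread-safe).
type QueryCache struct {
	entries map[string][]string
	Misses  int // number of times the backend was actually queried
}

func NewQueryCache() *QueryCache {
	return &QueryCache{entries: make(map[string][]string)}
}

// Get returns the cached result for (indexCommit, query), calling run on a miss.
func (c *QueryCache) Get(indexCommit, query string, run func() []string) []string {
	key := CacheKey(indexCommit, query)
	if v, ok := c.entries[key]; ok {
		return v
	}
	c.Misses++
	v := run()
	c.entries[key] = v
	return v
}

func main() {
	cache := NewQueryCache()
	run := func() []string { return []string{"api.Login"} }
	cache.Get("abc123", "callers(auth.Verify)", run)
	cache.Get("abc123", "callers(auth.Verify)", run) // served from cache
	cache.Get("def456", "callers(auth.Verify)", run) // new index commit, so a miss
	fmt.Println(cache.Misses)
}
```

This sketch never evicts entries for superseded commits; a real cache would need a bound or a purge on reindex.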

Follow-up beads

| Bead | Scope |
| --- | --- |
| gm-code-analysis-core | Interface + registry + configuration |
| gm-gitnexus-adapter | GitNexus reference backend (wraps mcp__gitnexus__*) |
| gm-code-analysis-context-providers | Four context providers integrated with gm-eiw |
| gm-code-analysis-reindex-policies | The four policies + hook integration |
| gm-code-analysis-bootstrap-integration | Source-code-import path (ties gm-371) |
| Optional | gm-sourcegraph-adapter, gm-codeql-adapter |

Definition of Done

  • Design ratified by mike.
  • This document committed.
  • Follow-up beads filed (see above).
  • gm-eiw amended to include the four new context provider IDs.
  • gm-371 (source-code import path) pointer to this design exists.

Not in scope

  • Implementation.
  • Running the knowledge graph in-process (always an out-of-process backend call).
  • Authoring a custom indexer (only the pluggable adapter shape is defined here).