Inspecting the Learning Loop
Punk's inductive compiler is a glass box, not a black box. Every time the learning loop tries to turn a pattern into a deterministic artifact, Punk persists the attempt, whether it succeeds, abstains, or fails replay, together with the compiler's verbatim reasoning. Operators can see exactly why an artifact did or didn't synthesize and promote. Evidence, not magic.
The synthesis attempt log
Synthesis used to be ephemeral: the compiler's report lines were returned from POST /patterns/:id/synthesize and then forgotten. Now every run of the compiler records a row in synthesis_attempts:
| field | meaning |
|---|---|
| `outcome` | `synthesized` · `aborted` · `replay_failed` |
| `reason` | the one-line verdict (the abort line, or the synthesized state reason) |
| `report` | the compiler's full report-line array, verbatim |
| `sampleCount` | aligned live samples the compiler loaded |
| `holdoutAccuracy` | self-verification accuracy on the chronological holdout (when reached) |
| `artifactId` | the artifact produced, when one was |
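
As a rough illustration of the row shape in the table above, the fields could be modelled as follows. The field names come from the table; the Python types, the optionality of `holdoutAccuracy` and `artifactId`, the 0-to-1 scale of the accuracy, and the sample values are all assumptions, not Punk's actual schema.

```python
from typing import List, Literal, Optional, TypedDict

Outcome = Literal["synthesized", "aborted", "replay_failed"]


class SynthesisAttempt(TypedDict):
    """Hypothetical mirror of one synthesis_attempts row (types are assumed)."""
    outcome: Outcome
    reason: str
    report: List[str]
    sampleCount: int
    holdoutAccuracy: Optional[float]  # assumed None when the holdout was never reached
    artifactId: Optional[str]         # assumed None when no artifact was produced


# Illustrative values only, loosely based on the sample report lines in this page.
example: SynthesisAttempt = {
    "outcome": "aborted",
    "reason": "holdout accuracy 62.0% < 80% floor; the induced program does not generalize",
    "report": [
        "loaded 14 live samples → 10 train / 4 holdout (chronological split)",
        "aborting: holdout accuracy 62.0% < 80% floor; the induced program does not generalize",
    ],
    "sampleCount": 14,
    "holdoutAccuracy": 0.62,  # scale (fraction vs. percent) is an assumption
    "artifactId": None,
}
```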
Both writers record the same shape:
- The learning loop (`LearningLoop.tick`) records an attempt for each candidate it compiles: `aborted` when the compiler abstains, `synthesized` when replay proves the program, `replay_failed` when it doesn't generalize.
- The manual endpoint (`POST /api/v1/patterns/:id/synthesize`) records an attempt every time an operator hits Synthesize now.
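
The three outcomes described above can be read as a small decision rule. The sketch below is only a restatement of that rule; `classify_outcome` and its two boolean inputs are hypothetical names, and how Punk actually represents "the compiler abstained" or "replay passed" internally is not stated here.

```python
from typing import Optional


def classify_outcome(compiled: bool, replay_passed: Optional[bool]) -> str:
    """Map a compile/replay result onto the three logged outcomes (hypothetical helper)."""
    if not compiled:
        return "aborted"        # the compiler abstained, so replay never ran
    if replay_passed:
        return "synthesized"    # replay proved the program
    return "replay_failed"      # the program compiled but did not generalize
```

For example, `classify_outcome(True, False)` returns `"replay_failed"`, while `classify_outcome(False, None)` returns `"aborted"`.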
The report lines are the gold. They narrate each decision the compiler made:
compiling pattern "support-triage" (pat_…, taskClass schema_deterministic)
loaded 14 live samples → 10 train / 4 holdout (chronological split)
field 'priority': induced 3-rule decision list
field 'note': no deterministic program found, delegating to small-model fallback (hybrid)
self-verification: train accuracy 100.0% (10 samples), holdout accuracy 100.0% (4 samples)
aborting: holdout accuracy 62.0% < 80% floor; the induced program does not generalize
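
Because the report lines are stored verbatim, a script can pull numbers back out of them. The following is a minimal sketch that assumes the self-verification and abort lines always follow the wording of the samples above; the regular expression and both helper names are illustrative, and the real report format may vary.

```python
import re
from typing import Dict, List, Optional

# Pattern inferred from the sample self-verification line; not a documented format.
_SELF_VERIFY = re.compile(
    r"train accuracy (?P<train>\d+(?:\.\d+)?)% \((?P<n_train>\d+) samples\), "
    r"holdout accuracy (?P<holdout>\d+(?:\.\d+)?)% \((?P<n_holdout>\d+) samples\)"
)


def parse_self_verification(line: str) -> Optional[Dict[str, float]]:
    """Extract train/holdout accuracy (in percent) and sample counts from one line."""
    match = _SELF_VERIFY.search(line)
    if match is None:
        return None
    return {
        "train": float(match.group("train")),
        "holdout": float(match.group("holdout")),
        "n_train": int(match.group("n_train")),
        "n_holdout": int(match.group("n_holdout")),
    }


def abort_reason(report: List[str]) -> Optional[str]:
    """Return the text after 'aborting:' from the first abort line, if any."""
    for line in report:
        if line.startswith("aborting:"):
            return line[len("aborting:"):].strip()
    return None
```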
Endpoints
- `GET /api/v1/learning/attempts?patternId=&limit=` → `{ attempts }`: the log, newest-first, tenant-scoped (read-only, not admin). Omit `patternId` for the whole tenant.
- `GET /api/v1/patterns/:id/evidence` → `{ pattern, artifacts, attempts, routeArms }`: everything known about one pattern's optimization, aggregated. Each artifact carries an evaluation summary (replay/shadow pass-fail counts) and a confidence trajectory (the running confidence as evidence accrued, for an inline sparkline).
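
To make the two routes concrete, here is a small sketch that only builds the request URLs. The paths and the `patternId` and `limit` query parameters are taken from the list above; the base URL, the example pattern ID, and the helper names are made up for illustration, and authentication is left out because it is not described here.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def attempts_url(base_url: str, pattern_id: Optional[str] = None,
                 limit: Optional[int] = None) -> str:
    """Build the attempt-log URL; omitting pattern_id asks for the whole tenant."""
    params = {}
    if pattern_id is not None:
        params["patternId"] = pattern_id
    if limit is not None:
        params["limit"] = limit
    query = urlencode(params)
    url = f"{base_url.rstrip('/')}/api/v1/learning/attempts"
    return f"{url}?{query}" if query else url


def evidence_url(base_url: str, pattern_id: str) -> str:
    """Build the per-pattern evidence URL, percent-encoding the ID."""
    return f"{base_url.rstrip('/')}/api/v1/patterns/{quote(pattern_id, safe='')}/evidence"


# Hypothetical host and pattern ID, for illustration only.
print(attempts_url("https://punk.example", "pat_example", 20))
print(evidence_url("https://punk.example", "pat_example"))
```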
The Learning view
The dashboard's Learning view (#/learning) turns the log into a panel:
- A pattern-centric table: name, state, runs, stability, task class, and the latest synthesis outcome with a reason snippet.
- Click a pattern → the evidence panel: the lifecycle pipeline, the synthesis attempt log (each attempt's outcome badge, reason, and report lines in a mono console block), per-artifact replay/shadow pass-fail bars, a confidence-trajectory sparkline, and the pattern's route arms.
- A Synthesize now button runs the compiler against the pattern and refreshes the attempt log, so an operator can watch the compiler reason in real time.
This is the same evidence the promotion gate uses, surfaced so a human can audit the runtime's decisions.
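
For operators working outside the dashboard, the confidence-trajectory sparkline from the evidence panel could be approximated in a terminal. This is a sketch under the assumption that the trajectory is a list of confidences between 0 and 1; the wire format of the trajectory is not specified here, and `sparkline` is a hypothetical helper, not part of Punk.

```python
from typing import Sequence

BARS = "▁▂▃▄▅▆▇█"


def sparkline(values: Sequence[float]) -> str:
    """Render confidences (assumed to lie in [0, 1]) as a one-line text sparkline."""
    out = []
    for value in values:
        clamped = min(max(value, 0.0), 1.0)          # tolerate out-of-range inputs
        index = min(int(clamped * len(BARS)), len(BARS) - 1)
        out.append(BARS[index])
    return "".join(out)
```

With this mapping, `sparkline([0.0, 0.5, 1.0])` yields `▁▅█`.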