Live Sandbox Validation

This playbook is the operator-friendly companion to flow/specs/flow-acceptance-sandbox-contract.md.

Use the disposable sandbox repository code4focus/test-salp unless the canonical contract is updated to name a different sandbox.

Treat code4focus/test-salp as clean sandbox infra on its default branch. It should keep the reviewer workflow under .github/**, ignore rules for generated .salp/flow/state/ and .salp/flow/tmp/, and minimal repo metadata, but it should not track flow/**. The current source repository remains authoritative for the bootstrapped repo pack and the installed wap-flow binary during the run.

Quick Read

The canonical live gate is one fresh compound proof sample, not a four-row matrix.

That sample starts from an untouched clean-infra sandbox clone, proves the clone is already flow-not-configured, bootstraps the current repo pack through wap-flow repo-pack bootstrap, carries one tiny seeded app through a bounded local-private development phase, then carries the same seeded sample family through a bounded GitHub-visible issue/branch/PR/review/merge phase.

The canonical sample proves:

installed repo-pack bootstrap on a disposable consumer repo
local-private task lifecycle, worktree flow, and blocking commit/push gates
GitHub-visible issue, branch, PR, review, repair, finalize, merge, and reconcile flow

It does not prove local packaging health, general product behavior, or arbitrary large-repo cost. Those stay in tools/flow-release-check.

Current Status

As of April 26, 2026, the canonical live-sandbox status is derived from the evidence recorded in .salp/flow/tmp/live-sandbox/index.json. Treat the latest flow/live-sandbox.yml entry in that generated index as the current local proof verdict; a latest verdict of pass means the earlier April 23 missing-proof blocker is closed for that source checkout.
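As a rough illustration of "latest entry wins", the sketch below picks the most recent entry for one config. The layout of index.json is not documented here, so the shape used (a JSON list in append order, with `config`, `run_id`, and `verdict` fields) is an assumption made only for this example.

```python
import json


def load_entries(index_path):
    """Load the generated index; assumes it is a JSON list in append order."""
    with open(index_path, encoding="utf-8") as fh:
        return json.load(fh)


def latest_entry(entries, config):
    """Return the last entry recorded for `config`, or None if there is none."""
    matching = [e for e in entries if e.get("config") == config]
    return matching[-1] if matching else None


# In-memory data shaped like the ASSUMED index layout (field names are hypothetical).
sample_index = [
    {"config": "flow/live-sandbox.mock.yml", "run_id": "run-a", "verdict": "pass"},
    {"config": "flow/live-sandbox.yml", "run_id": "run-b", "verdict": "fail"},
    {"config": "flow/live-sandbox.yml", "run_id": "run-c", "verdict": "pass"},
]

current = latest_entry(sample_index, "flow/live-sandbox.yml")
print(current["run_id"], current["verdict"])
```

Only the canonical config's latest entry counts here; a newer mock or Codex entry does not change the result.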

The durable sandbox-repo facts are currently aligned with this contract:

the code4focus/test-salp default branch is infra-only rather than a source-repo copy of flow/**
the default branch tracks no flow/**
the default branch ignores generated .salp/flow/state/ and .salp/flow/tmp/
the default branch currently exposes the GitHub reviewer workflow on main with the required issue_comment trigger

Treat operator-local prerequisites as checks to verify before every run, not as completed state tracked in the repository. GitHub auth, installed CLIs, and API keys must still be confirmed by the operator before each live run.

This dated status block is maintainer guidance. Update it whenever the current blocker changes or the latest canonical live-sandbox verdict changes.

Config Tiers

Config	Intended use	What it can prove	What it cannot prove
`flow/live-sandbox.yml`	Canonical GA gate	One real-provider compound sample against the existing Kimi-backed Claude sandbox reviewer path	Anything outside the disposable sandbox scope
`flow/live-sandbox.mock.yml`	Zero-cost frequent tier	The same compound runner model and lifecycle choreography without real reviewer spend	Real provider behavior or latency
`flow/live-sandbox.codex.yml`	Codex parity canary	A supplemental GitHub-visible slice of the same compound sample model	The canonical Kimi-backed GA proof or the full local-private lifecycle

Only flow/live-sandbox.yml can satisfy the GitHub-backed live proof requirement for Repo-Pack GA.

flowchart TD
    A["clean-infra sandbox clone"] --> B["repo-shape proof + doctor fail"]
    B --> C["repo-pack bootstrap + doctor"]
    C --> D["local task + worktree + blocked commit/push"]
    D --> E["local review + strict local finalize"]
    E --> F["GitHub issue + branch + PR"]
    F --> G["review findings loop"]
    G --> H["strict PR finalize + merge + reconcile"]
    H --> I["compound proof passes"]

Canonical Components

The canonical config requires all of these components in one run:

flow_not_configured_bootstrap
installed_repo_pack_bootstrap
local_task_lifecycle
worktree_happy_path
blocked_commit_before_local_review
successful_commit_after_local_review
blocked_push_before_second_local_review
successful_push_after_second_local_review
local_acceptance_finalize
github_issue_branch_lifecycle
pr_creation_and_binding
github_review_loop
pr_acceptance_finalize
pr_merge
pr_reconcile

Use --rows only as a compatibility selector for component ids. The runner resolves prerequisite components automatically.
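To make "resolves prerequisite components automatically" concrete, here is a minimal sketch of a transitive-closure resolver. The component ids come from the list above, but the prerequisite edges in `ASSUMED_PREREQS` are hypothetical; the runner's real dependency graph is not described in this playbook.

```python
# Canonical component ids, in the order listed above.
CANONICAL_ORDER = [
    "flow_not_configured_bootstrap",
    "installed_repo_pack_bootstrap",
    "local_task_lifecycle",
    "worktree_happy_path",
    "blocked_commit_before_local_review",
    "successful_commit_after_local_review",
    "blocked_push_before_second_local_review",
    "successful_push_after_second_local_review",
    "local_acceptance_finalize",
    "github_issue_branch_lifecycle",
    "pr_creation_and_binding",
    "github_review_loop",
    "pr_acceptance_finalize",
    "pr_merge",
    "pr_reconcile",
]

# HYPOTHETICAL edges, for illustration only; not the runner's actual graph.
ASSUMED_PREREQS = {
    "installed_repo_pack_bootstrap": ["flow_not_configured_bootstrap"],
    "local_task_lifecycle": ["installed_repo_pack_bootstrap"],
    "worktree_happy_path": ["local_task_lifecycle"],
    "pr_merge": ["pr_acceptance_finalize"],
    "pr_reconcile": ["pr_merge"],
}


def resolve_components(selected, prereqs=ASSUMED_PREREQS):
    """Expand a selection to include all prerequisites, in canonical order."""
    unknown = [c for c in selected if c not in CANONICAL_ORDER]
    if unknown:
        raise ValueError(f"unknown component ids: {unknown}")
    needed = set()
    stack = list(selected)
    while stack:
        comp = stack.pop()
        if comp in needed:
            continue
        needed.add(comp)
        stack.extend(prereqs.get(comp, []))
    return [c for c in CANONICAL_ORDER if c in needed]


print(resolve_components(["worktree_happy_path"]))
```

Returning the result in canonical order rather than discovery order keeps the expanded selection comparable across runs.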

Preflight

Before the live run:

start from the current source repository checkout you want to validate
confirm the sandbox repo default branch already carries the workflow that handles the GitHub-visible reviewer path
confirm the sandbox repo default branch tracks no flow/**
confirm the sandbox reviewer path is still the existing Claude sandbox reviewer backed by Kimi
keep the sample app small and bounded by the template catalog
keep the helper bounded to kimi-for-coding, medium, and the configured file/line/retry/timeout caps
keep the sandbox local claude_local review lane bounded to kimi-for-coding, medium, and 1200s
keep remote cleanup at close unless you are debugging a failure
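One way to check the "tracks no flow/**" item by hand is to ask git which paths under flow/ are tracked in a local clone of the sandbox. The sketch below uses the standard `git ls-files` command; the helper names and the clone path are placeholders for this example, not part of the tooling.

```python
import subprocess


def parse_ls_files(raw):
    """Split NUL-separated `git ls-files -z` output into a list of paths."""
    return [p for p in raw.split("\0") if p]


def tracked_flow_paths(clone_dir):
    """Return the tracked paths under flow/ in the given clone (empty if none)."""
    result = subprocess.run(
        ["git", "-C", clone_dir, "ls-files", "-z", "--", "flow"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_ls_files(result.stdout)


def assert_no_tracked_flow(clone_dir):
    """Raise if the clone tracks anything under flow/."""
    paths = tracked_flow_paths(clone_dir)
    if paths:
        raise RuntimeError(f"sandbox tracks flow/** paths: {paths}")
```

Calling `assert_no_tracked_flow("/path/to/test-salp-clone")` on a fresh clone of the default branch should return without raising when the sandbox is clean.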

Canonical Procedure

tools/flow-live-sandbox-check --config flow/live-sandbox.yml runs the canonical gate.

At a high level it will:

Clone code4focus/test-salp.
Prove the untouched clone has zero tracked flow/**.
Prove doctor --json --strict fails before bootstrap.
Install the current wap-flow binary from the source repository.
Run wap-flow repo-pack bootstrap --source <source-root> --target <clone> --json.
Prove the bootstrap report copied the repo pack, installed hooks, pinned the binary, and passed strict doctor.
Materialize one seeded tiny app template.
Run the local-private lifecycle:
- task local init with a run-specific key to avoid preserved failure branch collisions
- start --task ...
- plan record
- worktree create/edit/promote/select/reconcile
- authoritative-branch follow-up edit
- blocked commit
- accept review --agent claude_local
- successful commit
- blocked push
- second local review
- successful push
- accept check --strict
- accept finalize
Run the GitHub-visible lifecycle:
- create parent and task issues
- create the issue branch
- start --issue ...
- plan record
- commit and push one bounded reviewer-visible defect
- create and bind the PR
- complete
- initialize acceptance
- request GitHub review through the existing Kimi-backed Claude sandbox reviewer path
- wait for claim and result
- sync findings
- apply the repair
- resolve findings
- rerequest review
- wait and sync again
- accept check --strict
- accept finalize
- merge the PR
- pr reconcile --strict
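For scripting around the gate, the two command lines named in this section can be built as argument lists. This is a sketch only: it uses just the flags that appear in this playbook (`--config`, `--cleanup keep`, `--source`, `--target`, `--json`), and the function names and example paths are hypothetical.

```python
import shlex


def gate_argv(config="flow/live-sandbox.yml", keep_artifacts=False):
    """Build the live-sandbox check invocation for a given config."""
    argv = ["tools/flow-live-sandbox-check", "--config", config]
    if keep_artifacts:
        # Explicitly keep remote artifacts, e.g. while debugging a failure.
        argv += ["--cleanup", "keep"]
    return argv


def bootstrap_argv(source_root, clone_dir):
    """Build the repo-pack bootstrap invocation from the procedure above."""
    return [
        "wap-flow", "repo-pack", "bootstrap",
        "--source", source_root,
        "--target", clone_dir,
        "--json",
    ]


print(shlex.join(gate_argv()))
print(shlex.join(bootstrap_argv("/src/repo", "/tmp/sandbox-clone")))
```

Keeping the arguments as a list rather than a single string avoids shell-quoting problems when the result is passed to `subprocess.run`.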

Evidence

Every run must emit a comparable proof bundle under .salp/flow/tmp/live-sandbox/<run-id>/.

Every run also appends to .salp/flow/tmp/live-sandbox/index.json and refreshes .salp/flow/tmp/live-sandbox/index.md so release readiness can cite a rolling verdict history instead of only latest.*.

The bundle must record:

template id, variant id, seed, and task prompt
fresh-clone repo shape for each executed lane
local task id, worktree ids, reviewed commits, and changed files
GitHub issue ids, PR id, trigger timestamps, and remote artifact ids
reviewer provenance for the local lane, GitHub lane, and helper
per-component status plus final verdict
cleanup outcome

A pass verdict is valid only when every required component passes in the same fresh run.
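The all-or-nothing rule can be sketched as a small function over per-component statuses. The component ids are the canonical ones listed earlier; the status strings ("pass", "fail") and the mapping shape are assumptions for this example, since the bundle format is not specified here.

```python
REQUIRED_COMPONENTS = (
    "flow_not_configured_bootstrap",
    "installed_repo_pack_bootstrap",
    "local_task_lifecycle",
    "worktree_happy_path",
    "blocked_commit_before_local_review",
    "successful_commit_after_local_review",
    "blocked_push_before_second_local_review",
    "successful_push_after_second_local_review",
    "local_acceptance_finalize",
    "github_issue_branch_lifecycle",
    "pr_creation_and_binding",
    "github_review_loop",
    "pr_acceptance_finalize",
    "pr_merge",
    "pr_reconcile",
)


def compound_verdict(statuses):
    """Return (verdict, missing, failed) for one run's component statuses.

    `statuses` maps component id -> status string (assumed "pass" or "fail").
    A component absent from the mapping counts against the verdict.
    """
    missing = [c for c in REQUIRED_COMPONENTS if c not in statuses]
    failed = [c for c in REQUIRED_COMPONENTS if c in statuses and statuses[c] != "pass"]
    verdict = "pass" if not missing and not failed else "fail"
    return verdict, missing, failed


all_pass = {c: "pass" for c in REQUIRED_COMPONENTS}
print(compound_verdict(all_pass)[0])
```

Treating a missing component the same as a failed one reflects the rule that results from separate runs cannot be combined into one verdict.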

Interpretation

A single passing component is not enough for the canonical config.
The mock and Codex configs are support signals only.
If the sandbox environment is ambiguous or broken, fix the environment before expanding coverage.
Keep remote artifacts only when the run fails or when --cleanup keep is explicitly requested.