Live Sandbox Validation
This playbook is the operator-friendly companion to flow/specs/flow-acceptance-sandbox-contract.md.
Use the disposable sandbox repository code4focus/test-salp unless the canonical contract is updated to name a different sandbox.
Treat code4focus/test-salp as clean sandbox infra on its default branch. It should keep the reviewer workflow under .github/**, ignore rules for generated .salp/flow/state/ and .salp/flow/tmp/, and minimal repo metadata, but it should not track flow/**. The current source repository remains authoritative for the bootstrapped repo pack and the installed wap-flow binary during the run.
Quick Read
The canonical live gate is one fresh compound proof sample, not a four-row matrix.
That sample starts from an untouched clean-infra sandbox clone, proves the clone is already flow-not-configured, bootstraps the current repo pack through wap-flow repo-pack bootstrap, carries one tiny seeded app through a bounded local-private development phase, then carries the same seeded sample family through a bounded GitHub-visible issue/branch/PR/review/merge phase.
The canonical sample proves:
- installed repo-pack bootstrap on a disposable consumer repo
- local-private task lifecycle, worktree flow, and blocking commit/push gates
- GitHub-visible issue, branch, PR, review, repair, finalize, merge, and reconcile flow
It does not prove local packaging health, general product behavior, or arbitrary large-repo cost. Those stay in tools/flow-release-check.
Current Status
As of April 26, 2026, canonical live-sandbox status is evidence-driven from .salp/flow/tmp/live-sandbox/index.json. Treat the latest flow/live-sandbox.yml entry in that generated index as the current local proof verdict; a latest verdict of pass means the earlier April 23 missing-proof blocker is closed for that source checkout.
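A minimal sketch of reading the latest canonical verdict from the generated index. The index schema is not specified in this playbook, so the layout used here (a JSON list in append order whose entries carry `config` and `verdict` keys) is an assumption for illustration only.

```python
import json
from typing import Optional

CANONICAL_CONFIG = "flow/live-sandbox.yml"


def latest_verdict(index_text: str, config: str = CANONICAL_CONFIG) -> Optional[str]:
    """Return the verdict of the last index entry for `config`, or None.

    Assumes (hypothetically) that the index is a JSON list of objects in
    append order, each carrying "config" and "verdict" keys.
    """
    entries = json.loads(index_text)
    matching = [e for e in entries if e.get("config") == config]
    return matching[-1].get("verdict") if matching else None


# Hypothetical index content: only the last canonical entry decides the status.
sample = json.dumps([
    {"config": "flow/live-sandbox.yml", "verdict": "fail"},
    {"config": "flow/live-sandbox.mock.yml", "verdict": "pass"},
    {"config": "flow/live-sandbox.yml", "verdict": "pass"},
])
print(latest_verdict(sample))
```

Entries for the mock and Codex configs are filtered out before the last entry is taken, which mirrors the rule that only the canonical config determines the current verdict.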
The durable sandbox-repo facts are currently aligned with this contract:
- code4focus/test-salp default branch is infra-only rather than a source-repo copy of flow/**
- the default branch tracks no flow/**
- the default branch ignores generated .salp/flow/state/ and .salp/flow/tmp/
- the default branch currently exposes the GitHub reviewer workflow on main with the required issue_comment trigger
Treat operator-local prerequisites as verify-before-run checks, not as repo-tracked done state. GitHub auth, installed CLIs, and API keys must still be confirmed by the operator before each live run.
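One way to make the verify-before-run check repeatable is a small helper that reports what is missing on the operator machine. This is a sketch under stated assumptions: the playbook does not name the exact CLIs or API-key variables, so the tool names and `EXAMPLE_API_KEY` below are hypothetical placeholders.

```python
import shutil
from typing import Callable, Iterable, List, Mapping, Optional


def missing_prerequisites(
    tools: Iterable[str],
    env_vars: Iterable[str],
    env: Mapping[str, str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """List tools not found on PATH and environment variables that are unset or empty."""
    missing = [f"tool:{t}" for t in tools if which(t) is None]
    missing += [f"env:{v}" for v in env_vars if not env.get(v)]
    return missing


# Deterministic demo: a fake lookup that only knows about git.
fake_which = {"git": "/usr/bin/git"}.get
print(missing_prerequisites(["git", "wap-flow"], ["EXAMPLE_API_KEY"], {}, which=fake_which))
```

In a real preflight, `env` would be `os.environ` and `which` would be left at its default. An empty result only says the tools and variables are present; it does not confirm that GitHub auth is valid.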
This dated status block is maintainership guidance. Update it whenever the current blocker changes or the latest canonical live-sandbox verdict changes.
Config Tiers
| Config | Intended use | What it can prove | What it cannot prove |
|---|---|---|---|
| flow/live-sandbox.yml | Canonical GA gate | One real-provider compound sample against the existing Kimi-backed Claude sandbox reviewer path | Anything outside the disposable sandbox scope |
| flow/live-sandbox.mock.yml | Zero-cost frequent tier | The same compound runner model and lifecycle choreography without real reviewer spend | Real provider behavior or latency |
| flow/live-sandbox.codex.yml | Codex parity canary | A supplemental GitHub-visible slice of the same compound sample model | The canonical Kimi-backed GA proof or the full local-private lifecycle |
Only flow/live-sandbox.yml can satisfy the GitHub-backed live proof requirement for Repo-Pack GA.
flowchart TD
  A["clean-infra sandbox clone"] --> B["repo-shape proof + doctor fail"]
  B --> C["repo-pack bootstrap + doctor"]
  C --> D["local task + worktree + blocked commit/push"]
  D --> E["local review + strict local finalize"]
  E --> F["GitHub issue + branch + PR"]
  F --> G["review findings loop"]
  G --> H["strict PR finalize + merge + reconcile"]
  H --> I["compound proof passes"]

Canonical Components
The canonical config requires all of these components in one run:
- flow_not_configured_bootstrap
- installed_repo_pack_bootstrap
- local_task_lifecycle
- worktree_happy_path
- blocked_commit_before_local_review
- successful_commit_after_local_review
- blocked_push_before_second_local_review
- successful_push_after_second_local_review
- local_acceptance_finalize
- github_issue_branch_lifecycle
- pr_creation_and_binding
- github_review_loop
- pr_acceptance_finalize
- pr_merge
- pr_reconcile
Use --rows only as a compatibility selector for component ids. The runner resolves prerequisite components automatically.
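To illustrate what automatic prerequisite resolution means for a component selection, here is a minimal sketch. The runner's real dependency graph is not documented in this playbook; the `PREREQS` map below is a hypothetical excerpt that simply chains four of the component ids, and the `resolve` helper is not the runner's implementation.

```python
from typing import Dict, List, Sequence

# Hypothetical prerequisite map for a few component ids; the real runner's
# dependency graph may differ.
PREREQS: Dict[str, List[str]] = {
    "flow_not_configured_bootstrap": [],
    "installed_repo_pack_bootstrap": ["flow_not_configured_bootstrap"],
    "local_task_lifecycle": ["installed_repo_pack_bootstrap"],
    "worktree_happy_path": ["local_task_lifecycle"],
}


def resolve(selected: Sequence[str], prereqs: Dict[str, List[str]]) -> List[str]:
    """Return the selected components plus all prerequisites, prerequisites first.

    Assumes the map is acyclic; an unknown component id raises KeyError.
    """
    ordered: List[str] = []

    def visit(component: str) -> None:
        if component in ordered:
            return
        for dep in prereqs[component]:
            visit(dep)
        ordered.append(component)

    for component in selected:
        visit(component)
    return ordered


print(resolve(["local_task_lifecycle"], PREREQS))
```

Selecting a single late component pulls in everything it depends on, so a narrow selection still runs the bootstrap steps it needs.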
Preflight
Before the live run:
- start from the current source repository checkout you want to validate
- confirm the sandbox repo default branch already carries the workflow that handles the GitHub-visible reviewer path
- confirm the sandbox repo default branch tracks no flow/**
- confirm the sandbox reviewer path is still the existing Claude sandbox reviewer backed by Kimi
- keep the sample app small and bounded by the template catalog
- keep the helper bounded to kimi-for-coding, medium, and the configured file/line/retry/timeout caps
- keep the sandbox local claude_local review lane bounded to kimi-for-coding, medium, and 1200s
- keep remote cleanup at close unless you are debugging a failure
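The "tracks no flow/**" check in the list above can be done by hand against a fresh clone. The sketch below filters a tracked-file listing shaped like `git ls-files` output. The helper name and the sample listing (including the workflow filename) are illustrative assumptions, not the runner's own repo-shape proof.

```python
from typing import Iterable, List


def tracked_flow_paths(tracked: Iterable[str]) -> List[str]:
    """Return tracked paths that live under the top-level flow/ directory."""
    return [p for p in tracked if p == "flow" or p.startswith("flow/")]


# Example input shaped like the output of `git ls-files` in a clean-infra clone.
# The workflow filename is a hypothetical placeholder.
listing = """.github/workflows/reviewer.yml
.gitignore
README.md
""".splitlines()

print(tracked_flow_paths(listing))
```

An empty result means the listing contains no tracked flow/** paths. Paths under .salp/flow/ are deliberately not matched, since only the top-level flow/ directory is in scope for this check.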
Canonical Procedure
tools/flow-live-sandbox-check --config flow/live-sandbox.yml runs the canonical gate.
At a high level it will:
- Clone code4focus/test-salp.
- Prove the untouched clone has zero tracked flow/**.
- Prove doctor --json --strict fails before bootstrap.
- Install the current wap-flow binary from the source repository.
- Run wap-flow repo-pack bootstrap --source <source-root> --target <clone> --json.
- Prove the bootstrap report copied the repo pack, installed hooks, pinned the binary, and passed strict doctor.
- Materialize one seeded tiny app template.
- Run the local-private lifecycle:
  - task local init with a run-specific key to avoid preserved failure branch collisions
  - start --task ...
  - plan record
  - worktree create/edit/promote/select/reconcile
  - authoritative-branch follow-up edit
  - blocked commit
  - accept review --agent claude_local
  - successful commit
  - blocked push
  - second local review
  - successful push
  - accept check --strict
  - accept finalize
- Run the GitHub-visible lifecycle:
  - create parent and task issues
  - create the issue branch
  - start --issue ...
  - plan record
  - commit and push one bounded reviewer-visible defect
  - create and bind the PR
  - complete
  - initialize acceptance
  - request GitHub review through the existing Kimi-backed Claude sandbox reviewer path
  - wait for claim and result
  - sync findings
  - apply the repair
  - resolve findings
  - rerequest review
  - wait and sync again
  - accept check --strict
  - accept finalize
  - merge the PR
  - pr reconcile --strict
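For operators who wrap the gate in their own tooling, the two commands named in this procedure can be assembled as argument lists. This is only a sketch: the helper names are hypothetical, the gate performs the bootstrap step itself, and nothing here is executed.

```python
import shlex
from typing import List


def gate_command(config: str = "flow/live-sandbox.yml", keep_remote: bool = False) -> List[str]:
    """Build the argv for the live-sandbox gate; --cleanup keep is opt-in."""
    argv = ["tools/flow-live-sandbox-check", "--config", config]
    if keep_remote:
        argv += ["--cleanup", "keep"]
    return argv


def bootstrap_command(source_root: str, clone: str) -> List[str]:
    """Build the argv for the repo-pack bootstrap step the gate runs."""
    return ["wap-flow", "repo-pack", "bootstrap",
            "--source", source_root, "--target", clone, "--json"]


# Print the commands instead of running them.
print(shlex.join(gate_command()))
print(shlex.join(bootstrap_command("/path/to/source", "/path/to/clone")))
```

Passing the commands around as lists rather than as shell strings avoids quoting problems when the source root or clone path contains spaces.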
Evidence
Every run must emit a comparable proof bundle under .salp/flow/tmp/live-sandbox/<run-id>/.
Every run also appends .salp/flow/tmp/live-sandbox/index.json and refreshes .salp/flow/tmp/live-sandbox/index.md so release readiness can cite a rolling verdict history instead of only latest.*.
The bundle must record:
- template id, variant id, seed, and task prompt
- fresh-clone repo shape for each executed lane
- local task id, worktree ids, reviewed commits, and changed files
- GitHub issue ids, PR id, trigger timestamps, and remote artifact ids
- reviewer provenance for the local lane, GitHub lane, and helper
- per-component status plus final verdict
- cleanup outcome
pass is valid only when every required component passes in the same fresh run.
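The all-or-nothing rule can be expressed as a small aggregation over per-component status. This is a sketch, assuming each component reports a status string and that "fail" is the non-passing verdict; the bundle's actual field names and status values are not specified here.

```python
from typing import Iterable, Mapping


def final_verdict(required: Iterable[str], status: Mapping[str, str]) -> str:
    """Return "pass" only if every required component reported "pass".

    A component missing from `status` counts as not passed.
    """
    return "pass" if all(status.get(c) == "pass" for c in required) else "fail"


required = ["installed_repo_pack_bootstrap", "github_review_loop", "pr_merge"]
partial = {"installed_repo_pack_bootstrap": "pass", "github_review_loop": "pass"}
print(final_verdict(required, partial))
```

Because a missing component is treated the same as a failed one, results stitched together from separate runs cannot produce a pass unless one bundle contains every required component.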
Interpretation
- A single passing component is not enough for the canonical config.
- The mock and Codex configs are support signals only.
- If the sandbox environment is ambiguous or broken, fix the environment before expanding coverage.
- Keep remote artifacts only when the run fails or when --cleanup keep is explicitly requested.