Worker runtime

Lilith tests and benchmarks.

Dedicated validation pages make tests easier to browse, share, and expand without turning the landing page into a single-page application.

Validation

Evidence for the system

Use this page for demos, benchmark videos, reliability checks, and other evidence tied to this version.

Long-run execution

Worker Loop Stability

Validates that agents keep state, recover from tool failures, and complete multi-step jobs.
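
As a rough illustration of what such a check could look like, here is a minimal sketch in Python. It is not Lilith's actual test harness: `WorkerState`, `ToolError`, `run_job`, and `make_flaky_tool` are hypothetical names invented for this example, and the retry policy is an assumption.

```python
from dataclasses import dataclass, field


class ToolError(Exception):
    """Raised by a hypothetical tool call that fails."""


@dataclass
class WorkerState:
    # Index of the next step to run, plus the outputs of completed steps.
    next_step: int = 0
    results: list = field(default_factory=list)


def run_job(steps, state, max_retries=2):
    """Run steps in order, retrying a failed step without losing earlier results."""
    while state.next_step < len(steps):
        step = steps[state.next_step]
        for attempt in range(max_retries + 1):
            try:
                state.results.append(step())
                break
            except ToolError:
                # Give up only after the last allowed attempt.
                if attempt == max_retries:
                    raise
        state.next_step += 1
    return state


def make_flaky_tool(fail_times, value):
    """Return a tool that raises ToolError `fail_times` times, then returns `value`."""
    calls = {"n": 0}

    def tool():
        calls["n"] += 1
        if calls["n"] <= fail_times:
            raise ToolError("simulated tool failure")
        return value

    return tool


def check_worker_loop_stability():
    """Three-step job whose middle step fails once before succeeding."""
    steps = [lambda: "fetched", make_flaky_tool(1, "parsed"), lambda: "written"]
    state = run_job(steps, WorkerState())
    assert state.next_step == len(steps)  # every step completed
    assert state.results == ["fetched", "parsed", "written"]  # state kept across the retry
    return state
```

The sketch keeps progress in an explicit state object so that a retry resumes at the failed step instead of restarting the job, which is the property the description above is concerned with.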

Test details coming soon

Output verification

Artifact Accuracy

Checks generated files against expected structure, content, and delivery requirements.
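
A check of this kind might be sketched as follows. This is an assumption-laden example, not the real verifier: the function name `check_artifact`, the choice of JSON as the artifact format, and the size limit used as a stand-in for "delivery requirements" are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def check_artifact(path, required_keys, max_bytes):
    """Return a list of problems found; an empty list means the artifact passed."""
    path = Path(path)
    problems = []

    # Delivery: the file must exist, use the expected extension, and fit the size limit.
    if not path.is_file():
        return [f"missing file: {path.name}"]
    if path.suffix != ".json":
        problems.append(f"unexpected extension: {path.suffix}")
    if path.stat().st_size > max_bytes:
        problems.append(f"file larger than {max_bytes} bytes")

    # Structure: the file must parse as a JSON object.
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return problems + ["not valid JSON"]
    if not isinstance(data, dict):
        return problems + ["top level is not an object"]

    # Content: every required key must be present and non-empty.
    for key in required_keys:
        if data.get(key) in (None, "", [], {}):
            problems.append(f"missing or empty field: {key}")
    return problems


def demo():
    """Check one complete artifact and one with an empty required field."""
    with tempfile.TemporaryDirectory() as tmp:
        good = Path(tmp) / "report.json"
        good.write_text(json.dumps({"title": "Run 1", "rows": [1, 2]}), encoding="utf-8")
        bad = Path(tmp) / "empty.json"
        bad.write_text(json.dumps({"title": "", "rows": [1]}), encoding="utf-8")
        return (
            check_artifact(good, ["title", "rows"], max_bytes=1024),
            check_artifact(bad, ["title", "rows"], max_bytes=1024),
        )
```

Returning a list of problems rather than a single pass/fail flag lets one run report every mismatch in an artifact at once.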