Temporal Evaluation Engine

The evaluator does NOT rely on Baby Nexus's self-descriptions. Instead, it scores actual outputs, memory persistence, and checkpoint deltas.

No evaluation has been run yet. Click "Run evaluation" to generate the first report.