GSDBench

pharma-skills

GSDBench intake form.

Collect one standardized group sequential design benchmark example per GitHub issue. The form prepares a human-readable GitHub issue body and a machine-readable JSON block. Nothing leaves your browser.
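As a rough illustration of the two outputs described above, the sketch below renders a short human-readable summary followed by a fenced JSON block, then recovers the JSON from the issue body. The field names (case_id, tags), the tag values, and the body layout are assumptions made for this example; they are not the form's actual schema.

```python
import json

FENCE_OPEN = "```json\n"
FENCE_CLOSE = "\n```"


def build_issue_body(case: dict) -> str:
    """Render a readable summary, then the same data as a fenced JSON block."""
    lines = [
        f"Case ID: {case['case_id']}",        # hypothetical field name
        f"Tags: {', '.join(case['tags'])}",   # hypothetical field name
        "",
        "```json",
        json.dumps(case, indent=2, sort_keys=True),
        "```",
    ]
    return "\n".join(lines)


def extract_case(body: str) -> dict:
    """Recover the machine-readable block from an issue body."""
    start = body.index(FENCE_OPEN) + len(FENCE_OPEN)
    end = body.index(FENCE_CLOSE, start)
    return json.loads(body[start:end])
```

Keeping the JSON inside the issue body means a curator can read the case and a script can parse it from the same text, with no second upload step.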

Submission Governance

Confidentiality confirmations (all required)

Case Identity

Example: gsdb-20260430-pfs-os-alpha-split
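A case ID shaped like the example can be checked before submission. The pattern below is inferred from that single example (the prefix gsdb, an eight-digit date, then a lowercase hyphen-separated slug) and is an assumption, not a documented GSDBench rule.

```python
import re
from datetime import datetime

# Assumed shape: gsdb-YYYYMMDD-slug, where slug is lowercase words joined by hyphens.
CASE_ID_RE = re.compile(r"^gsdb-(\d{8})-([a-z0-9]+(?:-[a-z0-9]+)*)$")


def is_valid_case_id(case_id: str) -> bool:
    """Return True if the ID matches the assumed shape and the date is a real calendar date."""
    match = CASE_ID_RE.match(case_id)
    if match is None:
        return False
    try:
        datetime.strptime(match.group(1), "%Y%m%d")
    except ValueError:
        return False
    return True
```

For instance, is_valid_case_id("gsdb-20260430-pfs-os-alpha-split") returns True, while an ID with an impossible date such as 20260230 is rejected.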

Tags (select at least one)

Benchmark Prompt to Give the AI Agent

Write the prompt exactly as the evaluated AI agent should see it. Include only information that should be available to the agent.

Expected deliverable type (select at least one)

Allowed or expected tools/packages

Trial Design Content

Advanced design details

Examples: minimum follow-up, minimum gap between analyses, max N, feasibility range, data-cleaning buffer assumptions.

Reference Ground Truth

Reference truth type (select at least one)

Numerical checks

Power and Type I error tolerances are usually stated in percentage points. Event and timing checks may use relative percent or months.
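The difference between an absolute tolerance (percentage points, months) and a relative one (percent of the reference value) can be sketched as follows. The function name, the mode labels, and the numbers in the usage note are hypothetical and chosen only for illustration.

```python
def within_tolerance(observed: float, reference: float, tol: float,
                     mode: str = "absolute") -> bool:
    """Compare an agent's value with the reference value.

    mode="absolute":     tol is in the same unit as the values
                         (percentage points for power or Type I error, months for timing).
    mode="relative_pct": tol is a percentage of the reference value
                         (for example, event counts).
    """
    diff = abs(observed - reference)
    if mode == "absolute":
        return diff <= tol
    if mode == "relative_pct":
        return diff <= abs(reference) * tol / 100.0
    raise ValueError(f"unknown tolerance mode: {mode!r}")
```

With power expressed in percent, within_tolerance(90.4, 90.0, 0.5) passes a 0.5 percentage point check, while within_tolerance(340, 320, 5, mode="relative_pct") fails a 5% relative check on event counts because the allowed deviation is only 16 events.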

Known gotchas tested by this case

Rubric Criteria

At least four criteria are required. Include numerical/statistical correctness and at least one penalty or fatal gate.
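The rubric requirements above could be checked mechanically along these lines. The Criterion fields and the category and kind labels are assumptions for this sketch; the form's real field names may differ.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    name: str
    category: str        # hypothetical label, e.g. "numerical" or "reporting"
    kind: str = "score"  # hypothetical label: "score", "penalty", or "fatal_gate"


def rubric_problems(criteria: list[Criterion]) -> list[str]:
    """Return a list of unmet requirements; an empty list means the rubric passes."""
    problems = []
    if len(criteria) < 4:
        problems.append("at least four criteria are required")
    if not any(c.category == "numerical" for c in criteria):
        problems.append("no numerical/statistical correctness criterion")
    if not any(c.kind in {"penalty", "fatal_gate"} for c in criteria):
        problems.append("no penalty or fatal gate")
    return problems
```

Returning every unmet requirement at once, rather than stopping at the first, lets a submitter fix the rubric in a single pass.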

Reviewer and Curation Notes

Suggested reviewer expertise