Experiment Runner

The Experiment Runner is the execution engine of Hubify Labs. It takes experiment definitions, provisions compute, executes code, and captures every detail for reproducibility.

Running an Experiment

You can start an experiment from the Web UI, the CLI, or the API.

From the Web UI:

1. Open the Captain View.
2. Click New Experiment (or press Cmd+E).
3. Describe the experiment in natural language, or fill in the structured form.
4. Select compute requirements (GPU type, estimated duration).
5. Click Run.

The orchestrator handles agent assignment and pod allocation.

From the CLI:

# Natural language
hubify experiment run "MCMC chain with Planck 2018 + BAO, 200K samples"

# Structured
hubify experiment run \
  --name "planck-bao-mcmc" \
  --script run_cobaya.py \
  --config planck_bao.yaml \
  --pod h100 \
  --timeout 4h

From the API:

curl -X POST https://www.hubify.com/api/v1/labs/your-lab-slug/experiments \
  -H "Authorization: Bearer $HUBIFY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "planck-bao-mcmc",
    "script": "run_cobaya.py",
    "config": "planck_bao.yaml",
    "pod_type": "h100",
    "timeout": "4h"
  }'
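The same request can be prepared from a script. The sketch below builds it with Python's standard library. The endpoint, field names, and the `HUBIFY_API_KEY` variable come from the curl example above; the helper name `build_experiment_request` and the `Content-Type` header are assumptions for illustration. The response format is not described here, so the sketch stops at constructing the request.

```python
import json
import os
import urllib.request

API_BASE = "https://www.hubify.com/api/v1"  # base URL taken from the curl example


def build_experiment_request(lab_slug: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates an experiment."""
    return urllib.request.Request(
        url=f"{API_BASE}/labs/{lab_slug}/experiments",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumed, since the body is JSON
        },
        method="POST",
    )


request = build_experiment_request(
    lab_slug="your-lab-slug",
    api_key=os.environ.get("HUBIFY_API_KEY", "<your-api-key>"),
    payload={
        "name": "planck-bao-mcmc",
        "script": "run_cobaya.py",
        "config": "planck_bao.yaml",
        "pod_type": "h100",
        "timeout": "4h",
    },
)
# To actually send it: urllib.request.urlopen(request)
```

Keeping construction separate from sending makes it easy to inspect the URL, headers, and body before anything reaches the network.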

Experiment Dashboard

Each running experiment has a detail view showing:

Live Logs: streaming stdout/stderr from the pod
Metrics: custom metrics emitted by your script (loss, convergence, sample count)
Figures: plots generated during execution, updated in real time
Resource Usage: GPU utilization, memory, disk I/O
Checkpoints: saved intermediate states you can resume from
Cost: running cost in USD

Checkpointing

Experiments automatically checkpoint at configurable intervals:

# In your experiment config
checkpoint:
  interval: 30m    # Save state every 30 minutes
  keep_last: 5     # Keep the 5 most recent checkpoints
  path: /workspace/checkpoints/
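To make the `interval` and `keep_last` settings concrete, here is a minimal sketch of what a rotation policy like this could look like. It is not Hubify's implementation: the helper names (`parse_interval`, `save_checkpoint`), the JSON file format, and the `step-NNNNNNNN.json` naming are all assumptions for illustration.

```python
import json
import re
import tempfile
from pathlib import Path

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_interval(text: str) -> int:
    """Convert a duration such as '30m' or '4h' to seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text.strip())
    if match is None:
        raise ValueError(f"unsupported interval: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def save_checkpoint(directory: Path, step: int, state: dict, keep_last: int) -> Path:
    """Write one checkpoint, then delete all but the `keep_last` newest."""
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"step-{step:08d}.json"  # zero-padded so names sort by step
    path.write_text(json.dumps(state))
    checkpoints = sorted(directory.glob("step-*.json"))
    for stale in checkpoints[:-keep_last]:
        stale.unlink()
    return path


# Demo in a temporary directory (a stand-in for /workspace/checkpoints/).
with tempfile.TemporaryDirectory() as tmp:
    ckpt_dir = Path(tmp)
    for step in range(1, 8):
        save_checkpoint(ckpt_dir, step, {"step": step}, keep_last=5)
    remaining = sorted(p.name for p in ckpt_dir.glob("step-*.json"))

interval_seconds = parse_interval("30m")
```

After seven saves with `keep_last=5`, only the checkpoints for steps 3 through 7 remain on disk.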

If a pod crashes or an experiment is interrupted, you can resume from the last checkpoint:

hubify experiment resume EXP-054 --from-checkpoint latest
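On the script side, resuming means loading the newest saved state and continuing from there. The sketch below shows one way a script might do that. The checkpoint layout (JSON files named `step-NNNNNNNN.json`) and the function names are hypothetical; how Hubify hands the checkpoint to your script is not specified here.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def load_latest_checkpoint(directory: Path) -> Optional[dict]:
    """Return the state from the newest checkpoint, or None if there is none."""
    checkpoints = sorted(directory.glob("step-*.json"))
    if not checkpoints:
        return None
    return json.loads(checkpoints[-1].read_text())


def run(directory: Path, total_steps: int) -> list:
    """Run a toy loop, skipping steps already covered by a checkpoint."""
    state = load_latest_checkpoint(directory) or {"step": 0}
    executed = []
    for step in range(state["step"] + 1, total_steps + 1):
        executed.append(step)  # real work would happen here
        checkpoint = directory / f"step-{step:08d}.json"
        checkpoint.write_text(json.dumps({"step": step}))
    return executed


with tempfile.TemporaryDirectory() as tmp:
    ckpt_dir = Path(tmp)
    first_run = run(ckpt_dir, total_steps=3)    # fresh start: steps 1 to 3
    resumed_run = run(ckpt_dir, total_steps=5)  # continues after step 3
```

The second call does not repeat steps 1 to 3, because it starts from the state stored in the latest checkpoint.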

QC Gates

Every experiment passes through a quality control (QC) gate before its results are accepted. The gate runs the following checks:

Check            Description                                Threshold
Completeness     All expected output files exist            100%
Convergence      R-hat statistic for MCMC chains            < 1.05
Error Bounds     Statistical uncertainties are reasonable   Domain-specific
Reproducibility  Config + data + code are frozen            All locked
Review           Cross-model verification of results        Pass
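As an illustration of what the convergence threshold measures, the sketch below computes the classic (non-split) Gelman-Rubin R-hat for one parameter across several chains of equal length. Whether Hubify uses this form or a newer variant such as split or rank-normalized R-hat is not stated here, and the function name is illustrative.

```python
import math
from statistics import mean, variance


def gelman_rubin_rhat(chains):
    """Classic R-hat for one parameter, given equal-length chains of samples."""
    n = len(chains[0])
    if len(chains) < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least 2 chains of equal length >= 2")
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])  # W: mean within-chain variance
    if within == 0:
        raise ValueError("within-chain variance is zero")
    between = n * variance(chain_means)           # B: between-chain variance
    pooled = (n - 1) / n * within + between / n   # pooled variance estimate
    return math.sqrt(pooled / within)


# Two chains exploring the same region: R-hat stays below the 1.05 threshold.
agreeing = [[0.0, 1.0, 2.0, 3.0], [1.0, 0.0, 3.0, 2.0]]
# Two chains stuck in different regions: R-hat is far above the threshold.
disagreeing = [[0.0, 1.0, 2.0, 3.0], [10.0, 11.0, 12.0, 13.0]]

rhat_ok = gelman_rubin_rhat(agreeing)
rhat_bad = gelman_rubin_rhat(disagreeing)
passes_gate = rhat_ok < 1.05
```

Values close to 1 indicate that the chains agree with each other; a large value means the between-chain spread dominates, which is the situation that more samples or adjusted parameters are meant to fix.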

If a QC gate fails, the experiment is flagged and the orchestrator chooses one of the following actions:

Rerun with more samples
Adjust parameters and retry
Escalate to you for a decision

Chaining

Experiments can be chained so outputs flow into inputs:

hubify experiment run --chain chain.yaml

# chain.yaml
steps:
  - name: preprocess
    script: preprocess.py
    pod: cpu
  - name: mcmc
    script: run_mcmc.py
    pod: h200
    depends_on: preprocess
  - name: analysis
    script: analyze.py
    pod: cpu
    depends_on: mcmc
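The `depends_on` fields define a dependency graph, and a valid run order is any topological ordering of it. The sketch below derives that order with Python's standard `graphlib` module (Python 3.9+). The steps are written as a Python list mirroring chain.yaml to avoid a YAML dependency, and `execution_order` is an illustrative helper, not part of Hubify.

```python
from graphlib import TopologicalSorter

# Mirrors the steps in chain.yaml.
steps = [
    {"name": "preprocess", "script": "preprocess.py", "pod": "cpu"},
    {"name": "mcmc", "script": "run_mcmc.py", "pod": "h200", "depends_on": "preprocess"},
    {"name": "analysis", "script": "analyze.py", "pod": "cpu", "depends_on": "mcmc"},
]


def execution_order(steps):
    """Return step names in an order that respects every depends_on entry."""
    names = {step["name"] for step in steps}
    graph = {}
    for step in steps:
        deps = step.get("depends_on", [])
        if isinstance(deps, str):  # accept a single name or a list of names
            deps = [deps]
        unknown = set(deps) - names
        if unknown:
            raise ValueError(f"{step['name']} depends on unknown step(s): {sorted(unknown)}")
        graph[step["name"]] = set(deps)
    # static_order() raises graphlib.CycleError if the steps form a cycle.
    return list(TopologicalSorter(graph).static_order())


order = execution_order(steps)
```

For this chain the order is preprocess, then mcmc, then analysis. Steps with no dependency between them could run concurrently.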

Batch Experiments

Run parameter sweeps or multi-configuration experiments:

hubify experiment batch \
  --script train.py \
  --sweep '{"learning_rate": [0.001, 0.01, 0.1], "batch_size": [32, 64]}' \
  --pod h100

This creates 6 experiments (3 learning rates x 2 batch sizes) and runs them in parallel when enough pods are available.
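The experiment count follows from taking the Cartesian product of the swept values. The sketch below expands the same `--sweep` JSON into individual configurations. `expand_sweep` is a hypothetical helper, and it assumes a full grid, which is consistent with the count of 6 stated above.

```python
import itertools
import json


def expand_sweep(sweep: dict) -> list:
    """Expand {param: [values, ...]} into one dict per combination."""
    names = list(sweep)
    value_lists = [sweep[name] for name in names]
    return [dict(zip(names, combo)) for combo in itertools.product(*value_lists)]


sweep = json.loads('{"learning_rate": [0.001, 0.01, 0.1], "batch_size": [32, 64]}')
configs = expand_sweep(sweep)

for config in configs:
    print(config)
```

Adding a third swept parameter multiplies the count again, so grid size grows quickly with the number of parameters.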

Reproducibility Record

Every experiment captures:

Git SHA of the codebase
Full dependency list (pip freeze)
Config files (YAML/JSON, checksummed)
Input data SHA-256 hashes
Random seeds
Pod hardware specs
Start/end timestamps

This record is immutable and stays attached to the experiment permanently.
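To show what such a record might contain, the sketch below hashes a config file and an input file with SHA-256 and collects the results in a dictionary. The record layout and the names `sha256_file` and `build_record` are assumptions, not Hubify's actual schema. In a real run the Git SHA could come from `git rev-parse HEAD` and the dependency list from `pip freeze`; here the SHA is passed in as a placeholder.

```python
import hashlib
import platform
import sys
import tempfile
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large inputs do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_record(git_sha: str, seed: int, config_path: Path, input_paths: list) -> dict:
    """Assemble a hypothetical reproducibility record for one experiment."""
    return {
        "git_sha": git_sha,
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "seed": seed,
        "config_sha256": sha256_file(config_path),
        "inputs": {p.name: sha256_file(p) for p in input_paths},
    }


# Demo with throwaway files standing in for a real config and dataset.
with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp) / "planck_bao.yaml"
    config.write_text("checkpoint:\n  interval: 30m\n")
    data = Path(tmp) / "data.bin"
    data.write_bytes(b"abc")
    record = build_record("<git-sha-placeholder>", seed=1234,
                          config_path=config, input_paths=[data])
```

Because the hashes depend only on file contents, rerunning with byte-identical inputs yields the same digests, which is what makes a later comparison meaningful.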
