Run Your First Experiment

This guide walks you through running your first experiment on GPU compute. We will use a simple MCMC chain as an example, but the workflow applies to any experiment type.

Prerequisites

Overview

Every experiment follows the same lifecycle:
DRAFT → QUEUED → RUNNING → QC_GATE → COMPLETE
You define it. The orchestrator queues it. An agent runs it on a GPU pod. QC validates the results. Done.
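
The lifecycle above can be read as a small state machine. The Python sketch below is illustrative only: `Status` and `advance` are hypothetical names, not part of the Hubify API, and it assumes the states advance strictly in the order shown, since this guide does not describe failure or retry states.

```python
from enum import Enum


class Status(Enum):
    """Lifecycle states, in the order given in this guide."""
    DRAFT = 0
    QUEUED = 1
    RUNNING = 2
    QC_GATE = 3
    COMPLETE = 4


def advance(status: Status) -> Status:
    """Return the next lifecycle state; COMPLETE is terminal.

    Assumption: transitions are strictly linear. Failure and retry
    paths are not described in this guide, so they are not modeled.
    """
    if status is Status.COMPLETE:
        raise ValueError("COMPLETE is a terminal state")
    return Status(status.value + 1)


if __name__ == "__main__":
    # Walk an experiment from DRAFT to COMPLETE and print the path.
    state = Status.DRAFT
    path = [state.name]
    while state is not Status.COMPLETE:
        state = advance(state)
        path.append(state.name)
    print(" → ".join(path))
```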

Option 1: Natural Language

The fastest way to run an experiment is to describe it to the orchestrator.
Open the Orchestrator Chat in Captain View and type:
Run a test MCMC chain with 1000 samples on the base Planck dataset.
Use an H100 pod. Save the chain output and a posterior plot.
The orchestrator will:
  1. Create the experiment (EXP-001)
  2. Allocate an H100 pod
  3. Assign the Research Lead
  4. Execute and report back when complete

Option 2: Structured Definition

For more control, define the experiment explicitly.
Step 1: Write a config file

Create an experiment config:
# experiment.yaml
name: "test-mcmc-planck"
description: "Test MCMC chain on Planck base likelihood"
script: run_cobaya.py
config: planck_base.yaml
pod:
  gpu: h100
  timeout: 2h
outputs:
  - chain_samples.txt
  - posterior_plot.png
qc:
  convergence_threshold: 1.10  # Relaxed for test run
  min_samples: 1000
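
The `convergence_threshold` value is compared against an R-hat statistic, which the QC gate reports in the logs. This guide does not say how the QC gate computes R-hat, so the sketch below uses the textbook Gelman-Rubin formula for one parameter sampled by several equal-length chains. The function name `gelman_rubin` and the synthetic data are assumptions made for illustration.

```python
import random
import statistics
from typing import Sequence


def gelman_rubin(chains: Sequence[Sequence[float]]) -> float:
    """Textbook Gelman-Rubin R-hat for one parameter.

    Expects at least two chains of equal length (two or more samples each).
    Values close to 1 suggest the chains agree with each other.
    """
    m = len(chains)
    if m < 2:
        raise ValueError("need at least two chains")
    n = len(chains[0])
    if n < 2 or any(len(c) != n for c in chains):
        raise ValueError("chains must have equal length of at least 2")

    chain_means = [statistics.fmean(c) for c in chains]
    # W: average of the within-chain sample variances.
    within = statistics.fmean([statistics.variance(c) for c in chains])
    if within == 0:
        raise ValueError("within-chain variance is zero")
    # B: n times the sample variance of the chain means.
    between = n * statistics.variance(chain_means)
    # Pooled estimate of the posterior variance.
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


if __name__ == "__main__":
    rng = random.Random(0)
    # Four synthetic chains drawn from the same distribution.
    chains = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
    r_hat = gelman_rubin(chains)
    threshold = 1.10  # same value as convergence_threshold above
    print(f"R-hat = {r_hat:.3f}, pass = {r_hat < threshold}")
```

A threshold of 1.10 is looser than the values commonly used for production runs, which is why the config comment marks it as relaxed.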

Step 2: Submit the experiment

hubify experiment run --file experiment.yaml

Step 3: Watch the logs

hubify logs EXP-001 --follow
You will see real-time output from the pod:
[10:42:01] Pod provisioned: h100-abc123
[10:42:15] Environment initialized
[10:42:20] Starting Cobaya MCMC sampler...
[10:43:05] Sample 100/1000
[10:44:12] Sample 500/1000
[10:45:30] Sample 1000/1000
[10:45:31] Chain complete. Writing output...
[10:45:35] QC gate: checking convergence...
[10:45:36] QC PASS: R-hat = 1.04 (threshold: 1.10)
[10:45:37] Experiment COMPLETE
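
If you save this output to a file, the progress and QC lines are easy to pull out with a script. The sketch below parses the sample log shown above using regular expressions. The helper name `summarize` is hypothetical, and the `QC FAIL` pattern is an assumption: only the `QC PASS` line format appears in this guide.

```python
import re

SAMPLE_LOG = """\
[10:42:20] Starting Cobaya MCMC sampler...
[10:43:05] Sample 100/1000
[10:44:12] Sample 500/1000
[10:45:30] Sample 1000/1000
[10:45:36] QC PASS: R-hat = 1.04 (threshold: 1.10)
[10:45:37] Experiment COMPLETE
"""

# Matches progress lines such as "Sample 500/1000".
PROGRESS = re.compile(r"Sample (\d+)/(\d+)")
# Matches the QC verdict line. "FAIL" is an assumed counterpart to "PASS".
QC = re.compile(r"QC (PASS|FAIL): R-hat = ([\d.]+) \(threshold: ([\d.]+)\)")


def summarize(log_text: str) -> dict:
    """Return the latest sample progress and the QC verdict, if present."""
    progress = None
    qc = None
    for line in log_text.splitlines():
        if (m := PROGRESS.search(line)):
            progress = (int(m.group(1)), int(m.group(2)))
        elif (m := QC.search(line)):
            qc = (m.group(1), float(m.group(2)), float(m.group(3)))
    return {"progress": progress, "qc": qc}


if __name__ == "__main__":
    print(summarize(SAMPLE_LOG))
```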

Step 4: Review results

# View experiment summary
hubify experiment status EXP-001

# Download outputs
hubify experiment outputs EXP-001 --download ./results/

# View in Data Explorer
hubify data open EXP-001

Understanding the Output

After completion, your experiment includes:
Output                  Description
chain_samples.txt       Raw MCMC chain (space-delimited, weights in column 1)
posterior_plot.png      Auto-generated posterior distribution
experiment_log.txt      Full execution log
qc_report.json          QC gate results (convergence, completeness)
reproducibility.json    Git SHA, dependencies, config checksums
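
Because the chain stores a weight in its first column, summary statistics must be weighted. The sketch below computes a weighted mean for one column of a space-delimited chain. The helper name `weighted_mean`, the `#` comment handling, and the inline sample data are assumptions for illustration. This guide only states that weights are in column 1, so check the layout of the remaining columns in your own file before choosing an index.

```python
import io
from typing import Iterable


def weighted_mean(lines: Iterable[str], column: int) -> float:
    """Weighted mean of one column of a space-delimited chain.

    `column` is a zero-based index; index 0 holds the sample weight.
    Blank lines and lines starting with '#' are skipped (assumed header style).
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        weight = float(fields[0])
        weighted_sum += weight * float(fields[column])
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no weighted samples found")
    return weighted_sum / total_weight


# Tiny made-up chain: a weight followed by one parameter value per row.
SAMPLE_CHAIN = """\
# weight value
2 0.5
1 2.0
1 1.0
"""

if __name__ == "__main__":
    print(weighted_mean(io.StringIO(SAMPLE_CHAIN), column=1))
```

To use it on a real run, pass an open file handle for the downloaded `chain_samples.txt` in place of the `io.StringIO` object.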

What Happens Next

The Houston Method requires every completed experiment to generate follow-up tasks:
  1. Scientific analysis: What do the results mean?
  2. Knowledge base update: Record findings in the wiki.
  3. Paper integration: Tag results for paper sections, if applicable.
  4. Queue expansion: Generate 5-15 new tasks based on what was learned.
The orchestrator handles this automatically after QC passes.

Troubleshooting

If an experiment stays in QUEUED, check that compute is connected and pods are available:
hubify pod list
hubify pod budget
If the QC gate fails, view the QC report for details:
hubify experiment qc EXP-001
Common fixes: increase the sample count, check the input data, or adjust the convergence threshold (convergence_threshold under qc in the config).
If a run stops before completing, resume from the last checkpoint:
hubify experiment resume EXP-001 --from-checkpoint latest