Run Your First Experiment

This guide walks you through running your first experiment on GPU compute. We will use a simple MCMC chain as an example, but the workflow applies to any experiment type.

Prerequisites

A lab (create one first if you do not have one yet)
GPU compute connected (set up RunPod); alternatively, use CPU for this tutorial

Overview

Every experiment follows the same lifecycle:

DRAFT → QUEUED → RUNNING → QC_GATE → COMPLETE

You define it. The orchestrator queues it. An agent runs it on a GPU pod. QC validates the results. Done.
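The lifecycle above can be read as a linear state machine. The following is a minimal sketch in Python, not part of the product: the names `Stage`, `next_stage`, and `remaining` are hypothetical, and only the forward path shown in this guide is modeled (failure or retry states are not described here, so none are assumed).

```python
from enum import Enum
from typing import List, Optional


class Stage(Enum):
    """Lifecycle stages, declared in the order an experiment passes through them."""
    DRAFT = "DRAFT"
    QUEUED = "QUEUED"
    RUNNING = "RUNNING"
    QC_GATE = "QC_GATE"
    COMPLETE = "COMPLETE"


# Enum members iterate in declaration order, which here is the lifecycle order.
ORDER: List[Stage] = list(Stage)


def next_stage(stage: Stage) -> Optional[Stage]:
    """Return the stage that follows `stage`, or None once COMPLETE is reached."""
    idx = ORDER.index(stage)
    return ORDER[idx + 1] if idx + 1 < len(ORDER) else None


def remaining(stage: Stage) -> List[Stage]:
    """Return `stage` and every stage still ahead of it."""
    return ORDER[ORDER.index(stage):]


if __name__ == "__main__":
    # Prints: RUNNING -> QC_GATE -> COMPLETE
    print(" -> ".join(s.value for s in remaining(Stage.RUNNING)))
```

An experiment that reports RUNNING therefore still has to pass QC_GATE before it counts as COMPLETE.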

Option 1: Natural Language

The fastest way to run an experiment is to describe it to the orchestrator.

Web UI

Open the Orchestrator Chat in Captain View and type:

Run a test MCMC chain with 1000 samples on the base Planck dataset.
Use an H100 pod. Save the chain output and a posterior plot.

The orchestrator will:

Create the experiment (EXP-001)
Allocate an H100 pod
Assign the Research Lead
Execute and report back when complete

CLI

hubify experiment run "Test MCMC chain, 1000 samples, Planck base, H100 pod"

Option 2: Structured Definition

For more control, define the experiment explicitly.

Write a config file

Create an experiment config:

# experiment.yaml
name: "test-mcmc-planck"
description: "Test MCMC chain on Planck base likelihood"
script: run_cobaya.py
config: planck_base.yaml
pod:
  gpu: h100
  timeout: 2h
outputs:
  - chain_samples.txt
  - posterior_plot.png
qc:
  convergence_threshold: 1.10  # Relaxed for test run
  min_samples: 1000
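To illustrate how the two values in the qc block might be applied, here is a minimal Python sketch. It is an assumption, not the actual QC gate: the function name `qc_gate` is hypothetical, and the guide does not state whether the comparisons are inclusive, so the sketch treats an R-hat equal to the threshold as passing.

```python
from typing import Dict, List


def qc_gate(rhat: float, n_samples: int, qc: Dict[str, float]) -> List[str]:
    """Return a list of failure messages; an empty list means the gate passes.

    `qc` mirrors the `qc` block of experiment.yaml. Inclusive comparisons
    are an assumption made for this illustration.
    """
    failures = []
    if rhat > qc["convergence_threshold"]:
        failures.append(
            f"R-hat {rhat:.2f} exceeds threshold {qc['convergence_threshold']:.2f}"
        )
    if n_samples < qc["min_samples"]:
        failures.append(
            f"{n_samples} samples is below the minimum of {qc['min_samples']}"
        )
    return failures


# Values taken from the example config in this guide.
qc = {"convergence_threshold": 1.10, "min_samples": 1000}

if __name__ == "__main__":
    print(qc_gate(1.04, 1000, qc))  # passes: prints []
    print(qc_gate(1.25, 500, qc))   # fails both checks
```

Returning every failed check, rather than stopping at the first one, makes it easier to see whether a run needs more samples, a looser threshold, or both.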

Submit the experiment

hubify experiment run --file experiment.yaml

Watch the logs

hubify logs EXP-001 --follow

You will see real-time output from the pod, similar to the following:

[10:42:01] Pod provisioned: h100-abc123
[10:42:15] Environment initialized
[10:42:20] Starting Cobaya MCMC sampler...
[10:43:05] Sample 100/1000
[10:44:12] Sample 500/1000
[10:45:30] Sample 1000/1000
[10:45:31] Chain complete. Writing output...
[10:45:35] QC gate: checking convergence...
[10:45:36] QC PASS: R-hat = 1.04 (threshold: 1.10)
[10:45:37] Experiment COMPLETE

Review results

# View experiment summary
hubify experiment status EXP-001

# Download outputs
hubify experiment outputs EXP-001 --download ./results/

# View in Data Explorer
hubify data open EXP-001

Understanding the Output

After completion, your experiment includes:

Output	Description
`chain_samples.txt`	Raw MCMC chain (space-delimited, weights in column 1)
`posterior_plot.png`	Auto-generated posterior distribution
`experiment_log.txt`	Full execution log
`qc_report.json`	QC gate results (convergence, completeness)
`reproducibility.json`	Git SHA, dependencies, config checksums
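As a quick sanity check on `chain_samples.txt`, the sketch below computes a weighted mean for one column. It relies only on what the table states (space-delimited, weights in column 1). The function name `weighted_mean`, the skipping of `#` header lines, and the meaning of the remaining columns are assumptions; check the file header for the actual column layout.

```python
from typing import Iterable


def weighted_mean(lines: Iterable[str], column: int) -> float:
    """Weighted mean of one column of a space-delimited chain.

    `column` is a 0-based index, so the weight (column 1 in the table
    above) sits at index 0. Blank lines and lines starting with '#'
    are skipped (assumed to be headers or comments).
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        weight = float(fields[0])
        total_weight += weight
        weighted_sum += weight * float(fields[column])
    if total_weight == 0.0:
        raise ValueError("chain contains no weighted samples")
    return weighted_sum / total_weight


# Tiny made-up chain for illustration only: weight, then one parameter.
sample_chain = [
    "# weight param_a",
    "2 1.0",
    "1 4.0",
]

if __name__ == "__main__":
    print(weighted_mean(sample_chain, column=1))  # (2*1.0 + 1*4.0) / 3 = 2.0
```

For a downloaded file, pass an open file handle instead of the list, for example `weighted_mean(open("results/chain_samples.txt"), column=2)`, where the path and column index depend on your run.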

What Happens Next

The Houston Method requires every completed experiment to generate follow-up tasks:

Scientific analysis: what do the results mean?
Knowledge base update: record findings in the wiki
Paper integration: tag results for paper sections if applicable
Queue expansion: generate 5 to 15 new tasks based on what was learned

The orchestrator handles this automatically after QC passes.

Troubleshooting

Experiment stuck in QUEUED

Check that compute is connected and pods are available:

hubify pod list
hubify pod budget

QC gate failed

View the QC report for details:

hubify experiment qc EXP-001

Common fixes: increase the sample count, check the input data, or relax the convergence threshold.
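To see why a longer or better-mixed chain helps, it can be useful to compute a convergence statistic by hand. The sketch below implements the classic Gelman-Rubin R-hat for one scalar parameter across several equal-length, unweighted chains. This guide does not say which R-hat variant the QC gate uses or how it handles sample weights, so treat this as an illustration of the idea rather than a reproduction of the gate; `gelman_rubin` is a hypothetical name.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def gelman_rubin(chains: Sequence[Sequence[float]]) -> float:
    """Classic Gelman-Rubin R-hat for one parameter.

    Needs at least two chains of equal length (two or more samples each).
    Values near 1 suggest the chains agree; larger values suggest they
    have not converged to the same distribution.
    """
    m = len(chains)
    if m < 2:
        raise ValueError("need at least two chains")
    n = len(chains[0])
    if n < 2 or any(len(c) != n for c in chains):
        raise ValueError("chains must have equal length of at least 2")

    # W: average of the within-chain sample variances.
    within = mean([variance(c) for c in chains])
    if within == 0.0:
        raise ValueError("within-chain variance is zero")

    # B / n: sample variance of the per-chain means.
    between_over_n = variance([mean(c) for c in chains])

    # Pooled variance estimate, then the potential scale reduction factor.
    pooled = (n - 1) / n * within + between_over_n
    return sqrt(pooled / within)


if __name__ == "__main__":
    agreeing = [[0.0, 1.0, 2.0, 3.0], [3.0, 2.0, 1.0, 0.0]]
    disagreeing = [[0.0, 1.0, 2.0, 3.0], [10.0, 11.0, 12.0, 13.0]]
    print(gelman_rubin(agreeing) < 1.10)     # True
    print(gelman_rubin(disagreeing) > 1.10)  # True
```

Chains that sit in different regions inflate the between-chain term, which is why checking the input data or running more samples is usually a better first step than relaxing the threshold.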

Pod crashed mid-experiment

Resume from the last checkpoint:

hubify experiment resume EXP-001 --from-checkpoint latest
