Orchestrates local or HPC training using Detectron2. On the R side, it computes the batch size, worker count, and a batch-scaled learning rate, then calls the Python trainer. Trained models are automatically pinned to the local board (.petrographer/) for versioning.
Usage
train_model(
dataset_id = NULL,
data_dir = NULL,
model_id = NULL,
num_classes,
backbone = "resnet50",
freeze_at = 2,
max_iter = 2000,
learning_rate = NULL,
device = "cuda",
eval_period = 1000,
checkpoint_period = 0,
ims_per_batch = NA,
num_workers = NULL,
hpc_cpus_per_task = NULL,
hpc_mem = NULL,
gpus = 1
)
Arguments
- dataset_id
Name of pinned dataset to use for training (preferred).
- data_dir
Path to dataset directory (alternative to dataset_id; will be auto-pinned with a temporary ID).
- model_id
Name for the trained model (used for pins). Defaults to dataset_id if not provided.
- num_classes
Number of object classes in your dataset.
- backbone
Model backbone: "resnet50" (default), "resnet101", "resnext101", or a full Detectron2 model zoo key.
- freeze_at
Freeze the backbone up to this stage: 0 (freeze nothing), 1 (freeze the stem), 2 (freeze stem + res2; default). Lower values train more layers, which is slower but can improve domain adaptation.
- max_iter
Maximum training iterations. Default: 2000.
- learning_rate
Learning rate for the detection head. If NULL (default), uses smart auto-scaling based on freeze_at and batch size. The backbone automatically gets 0.1x this rate. If a number is provided, that exact value is used for the head (the backbone still gets 0.1x).
- device
Device for local training: 'cpu', 'cuda', or 'mps' (default: 'cuda').
- eval_period
Validation evaluation frequency in iterations (default: 1000).
- checkpoint_period
Checkpoint saving frequency (0 = final only; > 0 = every N iters).
- ims_per_batch
Total images per iteration across all GPUs. If NA (default), uses 2 images per GPU.
- num_workers
DataLoader workers per process (Detectron2). If NULL (default), set to images per GPU.
- hpc_cpus_per_task
Optional SLURM cpus-per-task hint for HPC training.
- hpc_mem
Optional SLURM memory hint for HPC training (e.g., "24gb", "96gb").
- gpus
Number of GPUs for HPC training (default: 1; ignored for local).
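A minimal local-training call, for illustration, might look like the following; the dataset ID and class count are hypothetical and should be replaced with your own pinned dataset:

```r
# Train locally from a previously pinned dataset. Assumes a dataset was
# pinned as "thin_sections" with 4 annotated classes (hypothetical values).
train_model(
  dataset_id  = "thin_sections",  # pinned dataset to train on
  num_classes = 4,                # object classes in the annotations
  backbone    = "resnet50",       # default backbone
  freeze_at   = 2,                # freeze stem + res2 (default)
  max_iter    = 2000,             # default iteration budget
  device      = "cuda"            # use the local GPU
)
```

Leaving learning_rate, ims_per_batch, and num_workers at their defaults lets the auto-scaling described above pick values matched to the batch size.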
Details
Training mode (local vs HPC) is auto-detected based on the hipergator configuration. For HPC training, call hipergator::hpg_configure() before train_model() to set connection details (host, user, base_dir).
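As a sketch of the HPC path: configure the connection once per session, then call train_model() with the SLURM hints. The host, user, base_dir, and resource values below are placeholders, not real defaults:

```r
# Set HPC connection details once per session (placeholder values).
hipergator::hpg_configure(
  host     = "hpg.example.edu",        # cluster login host
  user     = "jdoe",                   # cluster username
  base_dir = "/blue/jdoe/petrographer" # remote working directory
)

# With hipergator configured, train_model() is dispatched to the cluster.
train_model(
  dataset_id        = "thin_sections", # hypothetical pinned dataset
  num_classes       = 4,
  gpus              = 2,               # HPC only; ignored locally
  hpc_cpus_per_task = 8,               # SLURM cpus-per-task hint
  hpc_mem           = "24gb"           # SLURM memory hint
)
```

With NA/NULL defaults, ims_per_batch resolves to 2 images per GPU (4 total here) and num_workers matches the images per GPU.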