Orchestrates local or HPC training using Detectron2. R computes batch size, workers, and a batch-scaled learning rate, then calls the Python trainer. Models are automatically pinned to the local board (.petrographer/) for versioning.

Usage

train_model(
  dataset_id = NULL,
  data_dir = NULL,
  model_id = NULL,
  num_classes,
  backbone = "resnet50",
  freeze_at = 2,
  max_iter = 2000,
  learning_rate = NULL,
  device = "cuda",
  eval_period = 1000,
  checkpoint_period = 0,
  ims_per_batch = NA,
  num_workers = NULL,
  hpc_cpus_per_task = NULL,
  hpc_mem = NULL,
  gpus = 1
)
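A minimal local run might look like the following sketch. The dataset ID and class count are placeholders, not values from this package:

```r
library(petrographer)

# Minimal local training run; "thin_sections" is a hypothetical
# dataset previously pinned to the local board (.petrographer/)
model_id <- train_model(
  dataset_id  = "thin_sections",
  num_classes = 5,
  max_iter    = 2000,
  device      = "cuda"
)

# The returned model ID can be reloaded later
predictor <- from_pretrained(model_id)
```

Omitting `model_id` here means the trained model is pinned under the dataset's name.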

Arguments

dataset_id

Name of pinned dataset to use for training (preferred).

data_dir

Path to dataset directory (alternative to dataset_id; will be auto-pinned with temp ID).

model_id

Name for the trained model (used for pins). Defaults to dataset_id if not provided.

num_classes

Number of object classes in your dataset.

backbone

Model backbone: "resnet50" (default), "resnet101", "resnext101", or a full Detectron2 model zoo key.

freeze_at

Freeze backbone up to this stage: 0 (freeze nothing), 1 (freeze stem), 2 (freeze stem + res2; default). Lower values unfreeze more layers, which trains more slowly but adapts better to your domain.

max_iter

Maximum training iterations. Default: 2000.

learning_rate

Learning rate for the detection head. If NULL (default), uses smart auto-scaling based on freeze_at and batch size. Backbone automatically gets 0.1x this rate. If a number is provided, uses that exact value for the head (backbone still gets 0.1x).

device

Device for local training: "cpu", "cuda", or "mps" (default: "cuda").

eval_period

Validation evaluation frequency in iterations (default: 1000).

checkpoint_period

Checkpoint saving frequency (0 = final only; > 0 = every N iters).

ims_per_batch

Total images per iteration across all GPUs. If NA (default), uses 2 images per GPU.

num_workers

DataLoader workers per process (Detectron2). If NULL (default), set to images per GPU.

hpc_cpus_per_task

Optional SLURM cpus-per-task hint for HPC training.

hpc_mem

Optional SLURM memory hint for HPC training (e.g., "24gb", "96gb").

gpus

Number of GPUs for HPC training (default: 1; ignored for local).
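The interplay of the `NA`/`NULL` defaults above can be sketched in plain R. This is illustrative only, mirroring the documented behavior (2 images per GPU, workers set to images per GPU, backbone at 0.1x the head rate); the per-image learning-rate constant is an assumption echoing Detectron2's common 0.02-at-batch-16 baseline, not this package's actual formula:

```r
# Hypothetical mirror of the documented defaults, not the package internals
resolve_train_defaults <- function(gpus = 1, ims_per_batch = NA,
                                   num_workers = NULL, learning_rate = NULL) {
  # NA ims_per_batch -> 2 images per GPU, totaled across all GPUs
  if (is.na(ims_per_batch)) ims_per_batch <- 2L * gpus
  ims_per_gpu <- ims_per_batch / gpus

  # NULL num_workers -> one DataLoader worker per image per GPU
  if (is.null(num_workers)) num_workers <- ims_per_gpu

  # NULL learning_rate -> linear batch scaling (assumed constant:
  # 0.00125 per image, i.e. Detectron2's 0.02 at batch size 16)
  if (is.null(learning_rate)) learning_rate <- 0.00125 * ims_per_batch

  list(ims_per_batch = ims_per_batch,
       num_workers   = num_workers,
       head_lr       = learning_rate,
       backbone_lr   = 0.1 * learning_rate)  # backbone gets 0.1x the head rate
}

d <- resolve_train_defaults(gpus = 2)
# With 2 GPUs: 4 images/iteration, 2 workers, backbone LR = head LR / 10
</code>
```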

Value

Model ID (can be loaded with from_pretrained(model_id)).

Details

Training mode (local vs HPC) is auto-detected based on hipergator configuration. For HPC training, call hipergator::hpg_configure() before train_model() to set connection details (host, user, base_dir).
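An HPC run therefore configures the connection before training. Host, user, paths, and resource hints below are placeholders; only the function and argument names come from this page:

```r
# Hypothetical HPC workflow: configure once, then train_model()
# auto-detects HPC mode from the hipergator configuration
hipergator::hpg_configure(
  host     = "hpg.example.edu",   # placeholder host
  user     = "jdoe",              # placeholder user
  base_dir = "/blue/jdoe/petrographer"
)

train_model(
  dataset_id        = "thin_sections",  # placeholder dataset
  num_classes       = 5,
  gpus              = 2,
  hpc_mem           = "24gb",
  hpc_cpus_per_task = 8
)
```

If `hpg_configure()` has not been called, the same `train_model()` call runs locally on `device`.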