Orchestrates local or HPC training using RF-DETR. Models are automatically pinned to the local board (.petrographer/) for versioning.
Usage
train_model(
dataset_id = NULL,
data_dir = NULL,
model_id = NULL,
model_variant = "nano",
resolution = NULL,
epochs = 10,
batch_size = NA,
grad_accum_steps = NA,
learning_rate = NULL,
device = "cuda",
use_amp = NULL,
amp_dtype = "bf16",
gradient_checkpointing = NULL,
num_workers = NULL,
time_hours = 4,
validate_every = 2L,
early_stopping_patience = NULL
)

Arguments
- dataset_id
Name of pinned dataset to use for training (preferred).
- data_dir
Path to dataset directory (alternative to dataset_id; will be auto-pinned with temp ID).
- model_id
Name for the trained model (used for pins). Defaults to dataset_id if not provided.
- model_variant
RF-DETR model variant. Detection: "nano" (default), "small", "medium", "large". Segmentation: "seg_nano", "seg_small", "seg_medium", "seg_large", "seg_xlarge", "seg_2xlarge". Legacy: "seg_preview".
- resolution
Image resolution for training. Auto-detected from variant if not specified.
- epochs
Number of training epochs. Default: 10.
- batch_size
Batch size for training. If NA (default), uses 2.
- grad_accum_steps
Gradient accumulation steps. If NA (default), auto-calculated as 16 / batch_size for effective batch size of 16.
- learning_rate
Learning rate. If NULL (default), uses model default.
- device
Device for local training: 'cpu', 'cuda', or 'mps' (default: 'cuda').
- use_amp
Use automatic mixed precision training. If NULL (default), enabled for CUDA and disabled otherwise. Reduces memory usage by ~40%.
- amp_dtype
AMP dtype: 'bf16' (recommended for modern GPUs) or 'fp16' (default: 'bf16').
- gradient_checkpointing
Enable gradient checkpointing. If NULL (default), disabled. Reduces memory usage by ~30%.
- num_workers
Number of data loading workers. If NULL (default), uses 8.
- time_hours
Time limit for HPC training in hours (default: 4). Fractional values are allowed, e.g. 0.5 = 30 minutes. Ignored for local training.
- validate_every
Validate every N epochs (default: 2). Set to NULL to use the model default.
- early_stopping_patience
Stop training if validation loss doesn't improve for N epochs. If NULL (default), early stopping is disabled.
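The batch_size / grad_accum_steps interaction above can be sketched as follows. This is a hypothetical helper, not part of the package; it only illustrates the documented rule that the effective batch size (batch_size × grad_accum_steps) is kept at 16 when grad_accum_steps is NA:

```r
# Resolve batch_size and grad_accum_steps the way the docs describe:
# NA batch_size falls back to 2; NA grad_accum_steps is derived so that
# batch_size * grad_accum_steps stays at the target effective batch of 16.
resolve_grad_accum <- function(batch_size = NA, grad_accum_steps = NA,
                               target_effective = 16) {
  if (is.na(batch_size)) batch_size <- 2
  if (is.na(grad_accum_steps)) {
    grad_accum_steps <- max(1, target_effective %/% batch_size)
  }
  list(batch_size = batch_size,
       grad_accum_steps = grad_accum_steps,
       effective = batch_size * grad_accum_steps)
}

resolve_grad_accum()                # batch 2, accum 8, effective 16
resolve_grad_accum(batch_size = 4)  # batch 4, accum 4, effective 16
```

Passing both arguments explicitly bypasses the derivation, so an effective batch other than 16 is possible if set deliberately.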
Details
Training mode (local vs HPC) is auto-detected based on hipergator configuration.
For HPC training, call hipergator::hpg_configure() before train_model() to set
connection details (host, user, base_dir).
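A minimal usage sketch of both modes. The dataset ID "thin_sections" and the hpg_configure() values are placeholders, and the calls are illustrative rather than a definitive invocation:

```r
# Local training on a pinned dataset. Local mode is auto-detected
# because no hipergator configuration has been set.
train_model(
  dataset_id    = "thin_sections",   # hypothetical pinned dataset ID
  model_variant = "nano",
  epochs        = 10,
  device        = "cuda"
)

# For HPC training, set connection details first; a subsequent
# train_model() call then auto-detects HPC mode from this configuration.
hipergator::hpg_configure(
  host     = "hpg.example.edu",      # placeholder connection details
  user     = "your_user",
  base_dir = "/blue/your_group/petrographer"
)
train_model(dataset_id = "thin_sections", time_hours = 4)
```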
