Vision Training

The m-gpux vision subcommands (sample-data, train, evaluate, predict, and export) form a complete image-classification workflow on Modal. You do not have to write the dataset setup, training, evaluation, inference, or export boilerplate yourself.

What it does

The workflow:

Optionally generates a tiny local sample dataset
Validates a local dataset folder
Prompts for model, GPU, and training hyperparameters
Generates a complete modal_runner.py
Runs the job on Modal
Saves checkpoints and metrics to a Modal Volume

Supported dataset layouts

Pre-split

dataset/
  train/
    cat/
    dog/
  val/
    cat/
    dog/
  test/
    cat/
    dog/

Single root

dataset/
  cat/
  dog/

For a single-root dataset, m-gpux creates the validation split automatically.
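The two layouts can be told apart, and a single-root folder split per class, roughly as follows. This is a minimal sketch and not m-gpux's actual implementation: the function names, the 20% validation fraction, the fixed seed, and the list of image extensions are all assumptions made for illustration.

```python
import random
from pathlib import Path

# Assumed set of image extensions; the real tool may accept others.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def detect_layout(root: Path) -> str:
    """Return "pre-split" if root contains a train/ folder, else "single-root"."""
    subdirs = {p.name for p in root.iterdir() if p.is_dir()}
    if not subdirs:
        raise ValueError(f"{root} contains no class or split folders")
    return "pre-split" if "train" in subdirs else "single-root"


def split_single_root(root: Path, val_fraction: float = 0.2, seed: int = 0):
    """Split every class folder into (path, label) lists for train and val."""
    rng = random.Random(seed)  # seeded so the split is reproducible
    train, val = [], []
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        files = sorted(
            f for f in class_dir.iterdir() if f.suffix.lower() in IMAGE_SUFFIXES
        )
        rng.shuffle(files)
        # Hold out at least one image per class when the class has more than one.
        n_val = max(1, int(len(files) * val_fraction)) if len(files) > 1 else 0
        val += [(f, class_dir.name) for f in files[:n_val]]
        train += [(f, class_dir.name) for f in files[n_val:]]
    return train, val
```

Splitting inside each class folder, rather than over the pooled file list, keeps every class represented in the validation set even when the classes are imbalanced.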

Sample dataset generator

The repository includes a ready-to-use smoke-test dataset at data/m-gpux-vision-sample. Use sample-data when you want to regenerate it or create a customized copy without downloading anything:

m-gpux vision sample-data

By default this creates:

data/m-gpux-vision-sample/
  train/
    circle/
    square/
    triangle/
  val/
    circle/
    square/
    triangle/
  test/
    circle/
    square/
    triangle/

Useful options:

m-gpux vision sample-data --output ./data/demo-shapes --image-size 160
m-gpux vision sample-data --layout single-root --images-per-class 30
m-gpux vision sample-data --force
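After generating a dataset, a quick count of files per split and class can confirm that the folder looks as expected. The helper below is a hypothetical standalone check, not part of m-gpux, and it assumes the pre-split layout shown above.

```python
from collections import Counter
from pathlib import Path


def count_images(dataset: Path) -> Counter:
    """Count files per (split, class) pair in a pre-split dataset folder."""
    counts = Counter()
    for split_dir in sorted(p for p in dataset.iterdir() if p.is_dir()):
        for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
            counts[(split_dir.name, class_dir.name)] = sum(
                1 for f in class_dir.iterdir() if f.is_file()
            )
    return counts


# Print a small table if the default sample dataset exists locally.
sample = Path("data/m-gpux-vision-sample")
if sample.is_dir():
    for (split, cls), n in sorted(count_images(sample).items()):
        print(f"{split:6s} {cls:10s} {n}")
```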

Example

Create a built-in demo dataset:

m-gpux vision sample-data --output ./data/m-gpux-vision-sample

Then train on it:

m-gpux vision train --dataset ./data/m-gpux-vision-sample --model resnet18 --gpu T4

Or train on your own dataset:

m-gpux vision train --dataset ./data/cats-vs-dogs --model resnet50 --gpu A10G

After training, run inference on new images with:

m-gpux vision predict --input ./samples --run-name imgclf-resnet50-20260420-113500 --gpu T4

Evaluate the checkpoint on a dataset split:

m-gpux vision evaluate --dataset ./data/cats-vs-dogs --run-name imgclf-resnet50-20260420-113500 --split test --gpu T4

Export the trained model for deployment:

m-gpux vision export --run-name imgclf-resnet50-20260420-113500 --format all

If you omit these flags when running vision train, the wizard prompts you for:

Dataset folder
Model selection
GPU selection
Epochs, batch size, image size
Optimizer and scheduler
Augmentation strength
Mixed precision, early stopping, gradient accumulation
Artifact volume and experiment name

Model choices

The built-in picker includes a broad set of TorchVision classification models, including:

ResNet and Wide ResNet
ResNeXt
EfficientNet and EfficientNetV2
ConvNeXt
DenseNet
MobileNet
ShuffleNet
RegNet
Vision Transformer
Swin Transformer
MaxVit
Inception V3

You can also type a custom TorchVision model builder name manually.

Stored artifacts

Each run is saved to a persistent Modal Volume, named m-gpux-vision-artifacts by default.

Typical output:

<run-name>/
  checkpoints/
    best_model.pt
    last_model.pt
  config.json
  history.json
  summary.json
  test_metrics.json
  test_report.json

Download them later with:

modal volume get m-gpux-vision-artifacts <run-name>/summary.json summary.json
modal volume get m-gpux-vision-artifacts <run-name>/checkpoints/best_model.pt best_model.pt
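Once a JSON artifact is downloaded, its contents can be inspected without knowing the schema in advance. The sketch below only lists the top-level keys and their value types. It makes no assumption about which fields m-gpux writes, and describe_json is a hypothetical helper name.

```python
import json
from pathlib import Path


def describe_json(path: Path) -> dict:
    """Map each top-level key to the type name of its value."""
    data = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        # Not a JSON object: report the root type instead of keys.
        return {"<root>": type(data).__name__}
    return {key: type(value).__name__ for key, value in data.items()}


# Inspect summary.json if it has been downloaded to the current directory.
summary = Path("summary.json")
if summary.is_file():
    for key, kind in describe_json(summary).items():
        print(f"{key}: {kind}")
```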

vision predict also writes a JSON report back into the same volume, usually under:

<run-name>/predictions/predictions-YYYYMMDD-HHMMSS.json

vision evaluate writes JSON metric reports, typically under:

<run-name>/evaluations/eval-<split>-YYYYMMDD-HHMMSS.json

vision export writes deployment artifacts, typically under:

<run-name>/exports/export-YYYYMMDD-HHMMSS/
  model.onnx
  model.ts
  labels.json
  export_summary.json
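When serving an exported model, the raw scores it returns still have to be mapped back to class names. The sketch below shows one way to do that, under two assumptions that should be checked against a real export: that labels.json is a JSON list of class names in index order, and that the model outputs unnormalized logits. Both function names are hypothetical.

```python
import json
import math
from pathlib import Path


def load_labels(path: Path) -> list[str]:
    """Load class names, assuming labels.json is a list in index order."""
    labels = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(labels, list):
        raise ValueError("expected a JSON list of class names; adjust for the real schema")
    return labels


def top_k(logits, labels, k=3):
    """Apply softmax to the logits and return the k most likely (label, probability) pairs."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    ranked = sorted(
        zip(labels, (e / total for e in exps)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:k]
```

The logits themselves would come from whichever runtime loads model.onnx or model.ts; that step is left out here because the input and output names of the exported model are not documented above.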

Practical notes

Smaller datasets and simpler models work well with T4, L4, or A10G.
Transformer backbones and larger ConvNeXt variants usually benefit from A100, H100, or better.
Local datasets are forwarded into the container with Image.add_local_dir, so jobs with very large datasets may take longer to start.
Checkpoints are persisted with a Modal Volume so they survive container shutdown.
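Because the whole dataset folder is forwarded to the container, it can help to check how large it is before launching a job. The helper below is a hypothetical standalone sketch rather than an m-gpux command, and what counts as too large is left to the reader.

```python
from pathlib import Path


def dataset_size(root: Path) -> tuple[int, int]:
    """Return (file_count, total_bytes) for every regular file under root."""
    files = [p for p in root.rglob("*") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


# Report the size of the sample dataset if it exists locally.
dataset = Path("data/m-gpux-vision-sample")
if dataset.is_dir():
    n_files, n_bytes = dataset_size(dataset)
    print(f"{n_files} files, {n_bytes / 1e6:.1f} MB")
```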