Vision Training¶
m-gpux vision sample-data, vision train, vision evaluate, vision predict, and vision export form a complete image-classification workflow on Modal without making the user rewrite dataset setup, training, evaluation, inference, or export boilerplate.
What it does¶
The workflow:
- Optionally generates a tiny local sample dataset
- Validates a local dataset folder
- Prompts for model, GPU, and training hyperparameters
- Generates a complete
modal_runner.py - Runs the job on Modal
- Saves checkpoints and metrics to a Modal Volume
Supported dataset layouts¶
Pre-split¶
dataset/
train/
cat/
dog/
val/
cat/
dog/
test/
cat/
dog/
Single root¶
dataset/
cat/
dog/
For a single-root dataset, m-gpux creates the validation split automatically.
Sample dataset generator¶
The repository includes a ready-to-use smoke-test dataset at data/m-gpux-vision-sample. Use sample-data when you want to regenerate it or create a customized copy without downloading anything:
m-gpux vision sample-data
By default this creates:
data/m-gpux-vision-sample/
train/
circle/
square/
triangle/
val/
circle/
square/
triangle/
test/
circle/
square/
triangle/
Useful options:
m-gpux vision sample-data --output ./data/demo-shapes --image-size 160
m-gpux vision sample-data --layout single-root --images-per-class 30
m-gpux vision sample-data --force
Example¶
Create a built-in demo dataset:
m-gpux vision sample-data --output ./data/m-gpux-vision-sample
Then train on it:
m-gpux vision train --dataset ./data/m-gpux-vision-sample --model resnet18 --gpu T4
Or train on your own dataset:
m-gpux vision train --dataset ./data/cats-vs-dogs --model resnet50 --gpu A10G
After training, run inference on new images with:
m-gpux vision predict --input ./samples --run-name imgclf-resnet50-20260420-113500 --gpu T4
Evaluate the checkpoint on a dataset split:
m-gpux vision evaluate --dataset ./data/cats-vs-dogs --run-name imgclf-resnet50-20260420-113500 --split test --gpu T4
Export the trained model for deployment:
m-gpux vision export --run-name imgclf-resnet50-20260420-113500 --format all
If you omit those flags, the wizard will guide you through:
- Dataset folder
- Model selection
- GPU selection
- Epochs, batch size, image size
- Optimizer and scheduler
- Augmentation strength
- Mixed precision, early stopping, gradient accumulation
- Artifact volume and experiment name
Model choices¶
The built-in picker includes a broad set of TorchVision classification models, including:
- ResNet and Wide ResNet
- ResNeXt
- EfficientNet and EfficientNetV2
- ConvNeXt
- DenseNet
- MobileNet
- ShuffleNet
- RegNet
- Vision Transformer
- Swin Transformer
- MaxVit
- Inception V3
You can also type a custom TorchVision model builder name manually.
Stored artifacts¶
Each run is saved into a persistent Modal Volume, by default m-gpux-vision-artifacts.
Typical output:
<run-name>/
checkpoints/
best_model.pt
last_model.pt
config.json
history.json
summary.json
test_metrics.json
test_report.json
Download them later with:
modal volume get m-gpux-vision-artifacts <run-name>/summary.json summary.json
modal volume get m-gpux-vision-artifacts <run-name>/checkpoints/best_model.pt best_model.pt
vision predict also writes a JSON report back into the same volume, usually under:
<run-name>/predictions/predictions-YYYYMMDD-HHMMSS.json
vision evaluate writes JSON metric reports, typically under:
<run-name>/evaluations/eval-<split>-YYYYMMDD-HHMMSS.json
vision export writes deployment artifacts, typically under:
<run-name>/exports/export-YYYYMMDD-HHMMSS/
model.onnx
model.ts
labels.json
export_summary.json
Practical notes¶
- Smaller datasets and simpler models work well with
T4,L4, orA10G. - Transformer backbones and larger ConvNeXt variants usually benefit from
A100,H100, or better. - Local datasets are forwarded into the container with
Image.add_local_dir, so very large datasets may take longer to start. - Checkpoints are persisted with a Modal Volume so they survive container shutdown.