m-gpux Documentation
Welcome to the official docs for m-gpux — a production-focused CLI toolkit for Modal GPU operations.
One CLI to manage profiles, launch GPU runtimes, deploy LLM APIs, and track cloud costs.
What is m-gpux?
m-gpux turns Modal's serverless GPU platform into a streamlined developer experience:
| Capability | Description |
|---|---|
| Multi-profile management | Add, switch, and remove Modal identities — all stored in ~/.modal.toml |
| Interactive GPU Hub | Guided wizard to launch Jupyter Lab, run Python scripts, or open a web shell on any GPU (T4 → B200) |
| LLM API Server | Deploy any HuggingFace model as an OpenAI-compatible endpoint with Bearer token auth, streaming, and warm containers |
| API Key Management | Create, list, show, and revoke sk-mgpux-* keys — stored locally in ~/.m-gpux/api_keys.json |
| Billing Dashboard | Inspect 7/30/90-day usage per profile or aggregated across all accounts |
| GPU Metrics Probe | Live hardware utilization (GPU %, VRAM, temperature) on running containers |
| App Lifecycle | Stop any running m-gpux app (Jupyter, shells, LLM servers) from one command |
Quick Install

```bash
pip install m-gpux
```

Or from source:

```bash
git clone https://github.com/PuxHocDL/m-gpux.git
cd m-gpux && pip install -e .
```
Requirements
Python 3.10+, the Modal CLI (pip install modal), and at least one Modal account with a token_id / token_secret pair.
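A quick pre-flight check for these requirements can be scripted. The helper below is hypothetical (it is not part of m-gpux) and only checks the interpreter version, that the modal package is importable, and that the modal command is on PATH; it does not verify that a token is configured.

```python
import importlib.util
import shutil
import sys

def check_requirements() -> dict[str, bool]:
    """Report whether the documented prerequisites appear to be present."""
    return {
        # m-gpux documents Python 3.10 or newer.
        "python>=3.10": sys.version_info >= (3, 10),
        # `pip install modal` provides the Python SDK...
        "modal package": importlib.util.find_spec("modal") is not None,
        # ...and the `modal` command, which must be on PATH for `modal run`.
        "modal on PATH": shutil.which("modal") is not None,
    }

if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"[{'ok' if ok else 'missing'}] {name}")
```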
Start Here
| Page | What you'll learn |
|---|---|
| Getting Started | Install, add your first profile, and launch a GPU session in 5 minutes |
| Command Reference | Every command, flag, and option with examples |
| Architecture | How m-gpux works internally — proxy layer, template generation, profile resolution |
| FAQ & Troubleshooting | Common errors and how to fix them |
Common Workflows
1. Launch Jupyter on a GPU

```bash
m-gpux account add   # one-time setup
m-gpux hub           # pick GPU → pick Jupyter → launch
```
The hub generates a modal_runner.py script, shows it to you for review, and then executes modal run to start a GPU-backed Jupyter Lab session with a public URL.
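The hub's actual template is not reproduced in this page. To illustrate the generate-then-review idea, the sketch below renders a minimal Modal script as text for a chosen GPU and syntax-checks it before anything runs. The name render_runner and the generated script's contents (it runs nvidia-smi rather than Jupyter Lab) are illustrative assumptions; the generated code itself uses standard Modal APIs (modal.App, @app.function(gpu=...), @app.local_entrypoint(), .remote()).

```python
from textwrap import dedent

# Subset of GPU names from the Supported GPUs table; illustrative only.
ALLOWED_GPUS = {"T4", "L4", "A10G", "L40S", "A100", "A100-80GB", "H100", "H200", "B200"}

# Hypothetical template: a tiny Modal app that prints GPU info in the container.
TEMPLATE = dedent('''\
    import modal

    app = modal.App("m-gpux-hub-sketch")

    @app.function(gpu={gpu!r}, timeout={timeout})
    def check_gpu():
        import subprocess
        subprocess.run(["nvidia-smi"], check=True)

    @app.local_entrypoint()
    def main():
        check_gpu.remote()
''')

def render_runner(gpu: str, timeout: int = 600) -> str:
    """Render a runner script for review; reject unknown GPUs and bad timeouts."""
    if gpu not in ALLOWED_GPUS:
        raise ValueError(f"unsupported GPU: {gpu}")
    if timeout <= 0:
        raise ValueError("timeout must be positive")
    source = TEMPLATE.format(gpu=gpu, timeout=timeout)
    compile(source, "modal_runner.py", "exec")  # syntax check only; nothing executes
    return source
```

After reviewing the printed output of render_runner("A10G"), you could save it as modal_runner.py and start it with `modal run modal_runner.py`.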
2. Deploy an LLM as an OpenAI-compatible API

```bash
m-gpux serve keys create --name prod   # generate API key
m-gpux serve deploy                    # 5-step wizard
```
The wizard walks through:
- Model — 11 presets (Qwen, Llama, Gemma, Mistral, DeepSeek, Phi) or a custom HuggingFace ID
- GPU — T4, L4, A10G, L40S, A100, A100-80GB, H100, H200, B200
- Context length — max sequence length (lower = faster startup)
- Keep warm — 0 scales to zero (saves cost); 1+ keeps container(s) always running (no cold start)
- API key — pick an existing key or auto-create one
After deployment, the endpoint is a drop-in replacement for the OpenAI / OpenRouter API:
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://<workspace>--m-gpux-llm-api-serve.modal.run/v1",
    api_key="sk-mgpux-...",
)

resp = client.chat.completions.create(
    model="Qwen/Qwen3-8B",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)

for chunk in resp:
    print(chunk.choices[0].delta.content or "", end="")
```
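Because the endpoint follows the OpenAI chat completions format, any HTTP client can call it, not only the openai SDK. The sketch below builds (but does not send) the equivalent non-streaming request with urllib from the standard library, which makes the Bearer header explicit. The base URL keeps the placeholder from the example above, and build_chat_request is a hypothetical helper.

```python
import json
import urllib.request

def build_chat_request(base_url: str, api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST to {base_url}/chat/completions with Bearer auth; does not send it."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request(
    "https://<workspace>--m-gpux-llm-api-serve.modal.run/v1",
    "sk-mgpux-...",
    "Qwen/Qwen3-8B",
    "Hello!",
)
# Sending it requires a deployed endpoint and a real key:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```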
3. Check costs across all accounts

```bash
m-gpux billing usage --days 7 --all
```
Aggregates compute spend from every configured profile into a single Rich table.
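The cross-account total is a per-profile sum. A minimal sketch, assuming each profile's usage has already been fetched as (item, cost in USD) records; the record shape, the use of Decimal, and the sample numbers are illustrative, not m-gpux's data model or real Modal prices.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical per-profile usage records: (GPU type, cost in USD). Sample values only.
usage = {
    "personal": [("A10G", Decimal("1.10")), ("T4", Decimal("0.59"))],
    "team": [("H100", Decimal("7.90"))],
}

def aggregate(usage_by_profile):
    """Return per-profile totals and the grand total across all profiles."""
    totals = defaultdict(Decimal)  # Decimal() == Decimal("0")
    for profile, records in usage_by_profile.items():
        for _gpu, cost in records:
            totals[profile] += cost
    return dict(totals), sum(totals.values(), Decimal("0"))

per_profile, grand_total = aggregate(usage)
for profile, total in sorted(per_profile.items()):
    print(f"{profile:<10} ${total:.2f}")
print(f"{'TOTAL':<10} ${grand_total:.2f}")
```

Decimal keeps currency sums exact, which avoids float rounding drift when many small charges are added.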
4. Stop running apps and release GPUs

```bash
m-gpux stop --all    # scan ALL profiles, pick which to stop
m-gpux serve stop    # stop only the LLM API server
```
Pro workflow
Keep one profile for personal experiments and one for team workloads, then run m-gpux billing usage --all weekly to track total spend across both.
Supported GPUs
m-gpux supports the following Modal GPU types:
| # | GPU | VRAM | Best for |
|---|---|---|---|
| 1 | T4 | 16 GB | Light inference, exploration |
| 2 | L4 | 24 GB | Cost/performance balance |
| 3 | A10G | 24 GB | Training and inference |
| 4 | L40S | 48 GB | Large-batch inference |
| 5 | A100 | 40 GB | High-performance training |
| 6 | A100-40GB | 40 GB | Ampere 40GB variant |
| 7 | A100-80GB | 80 GB | Large models (30B+) |
| 8 | RTX PRO 6000 | 96 GB | Pro workstation GPU |
| 9 | H100 | 80 GB | Hopper architecture |
| 10 | H100! | 80 GB | H100 reserved (guaranteed) |
| 11 | H200 | 141 GB | HBM3e, next-gen Hopper |
| 12 | B200 | — | Blackwell architecture |
| 13 | B200+ | — | B200 reserved (guaranteed) |
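A common rule of thumb for sizing is that model weights need roughly (parameters × bytes per parameter): about 2 bytes in FP16/BF16 and about 0.5 bytes with 4-bit quantization such as AWQ. The sketch below applies that estimate to the VRAM column above and returns the smallest listed GPU with some headroom. It counts weights only (KV cache, activations and runtime overhead also need memory), the 20% headroom factor is an arbitrary assumption, and the helper names are hypothetical, so treat the result as a starting point rather than a recommendation.

```python
# VRAM in GB from the table above; reserved variants and GPUs without a listed VRAM are omitted.
GPU_VRAM_GB = {
    "T4": 16, "L4": 24, "A10G": 24, "L40S": 48, "A100": 40,
    "A100-80GB": 80, "H100": 80, "H200": 141,
}

# Approximate storage cost per parameter for common weight formats.
BYTES_PER_PARAM = {"bf16": 2.0, "fp16": 2.0, "int8": 1.0, "awq-4bit": 0.5}

def weight_gb(params_billion: float, dtype: str = "bf16") -> float:
    """Billions of parameters x bytes per parameter = GB (10^9 bytes)."""
    return params_billion * BYTES_PER_PARAM[dtype]

def smallest_fit(params_billion: float, dtype: str = "bf16", headroom: float = 1.2):
    """Smallest single GPU whose VRAM covers weights x headroom, or None if none does."""
    need = weight_gb(params_billion, dtype) * headroom
    fits = [(vram, name) for name, vram in GPU_VRAM_GB.items() if vram >= need]
    return min(fits)[1] if fits else None  # ties on VRAM resolve alphabetically
```

For example, an 8B model in BF16 needs about 16 GB for weights, so smallest_fit(8) picks a 24 GB card (A10G).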
Model Presets (serve deploy)
| # | Model | Size | Recommended GPU |
|---|---|---|---|
| 1 | Qwen/Qwen2.5-1.5B-Instruct | 1.5B | T4 / L4 |
| 2 | Qwen/Qwen2.5-7B-Instruct | 7B | A10G |
| 3 | Qwen/Qwen3-8B | 8B | A10G |
| 4 | Qwen/Qwen3.5-35B-A3B | 35B MoE | A100-80GB |
| 5 | meta-llama/Llama-3.1-8B-Instruct | 8B | A10G |
| 6 | google/gemma-2-9b-it | 9B | A10G |
| 7 | mistralai/Mistral-7B-Instruct-v0.3 | 7B | A10G |
| 8 | Qwen/Qwen2.5-72B-Instruct-AWQ | 72B AWQ | A100-80GB |
| 9 | meta-llama/Llama-3.1-70B-Instruct | 70B | H100 |
| 10 | deepseek-ai/DeepSeek-V2-Lite-Chat | 16B | A100 |
| 11 | microsoft/Phi-3-medium-4k-instruct | 14B | A10G |
Select 0 during the wizard to enter any custom HuggingFace model ID.
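Because option 0 accepts free-form input, a cheap format check can catch typos before a deploy starts. The sketch below assumes the usual namespace/name shape of Hugging Face repo IDs (letters, digits, '-', '_' and '.', with no ".." or "--"); looks_like_hf_id is a hypothetical pre-check that does not confirm the repo exists, and m-gpux may validate input differently.

```python
import re

# Assumed shape: "<namespace>/<name>", e.g. "Qwen/Qwen3-8B".
HF_ID = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*/[A-Za-z0-9][A-Za-z0-9._-]*")

def looks_like_hf_id(model_id: str) -> bool:
    """Format check only; it does not contact the Hugging Face Hub."""
    model_id = model_id.strip()
    return (
        HF_ID.fullmatch(model_id) is not None
        and ".." not in model_id
        and "--" not in model_id
    )
```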
Links
- PyPI: pypi.org/project/m-gpux
- Repository: github.com/PuxHocDL/m-gpux
- Issues: github.com/PuxHocDL/m-gpux/issues
- Modal docs: modal.com/docs