Getting Started with LMX

Learn how to use the LMX dashboard to load models, run inference, and manage your local AI server.

Updated 2026-03-01

What is LMX?

LMX is the inference engine at the core of Opta Local. It runs an OpenAI-compatible API on your machine, serves models via MLX on Apple Silicon, and provides a dashboard for monitoring and managing inference.

The Dashboard

Access the LMX dashboard at lmx.optalocal.com. From here you can see loaded models, VRAM usage, throughput, and active sessions. The dashboard connects to your local LMX server running on port 1234.
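Because the dashboard depends on the local server, it can help to confirm that something is answering on port 1234 before opening it. The following is a minimal sketch, not an official LMX health check. It assumes LMX implements the OpenAI model-listing route /v1/models, which this guide does not confirm, and LMX_URL is a hypothetical variable introduced only for this example.

```shell
#!/bin/sh
# Report whether an HTTP server answers on the LMX port.
# Assumptions: LMX_URL is a hypothetical override introduced for this sketch,
# and /v1/models is the OpenAI model-listing route, which LMX is assumed
# (not confirmed) to implement.
lmx_status() {
  base_url="${LMX_URL:-http://localhost:1234}"
  # -f treats HTTP errors as failures, -s hides progress output,
  # --max-time keeps the check from hanging on an unresponsive port.
  if curl -fs --max-time 3 -o /dev/null "$base_url/v1/models" 2>/dev/null; then
    echo "up"
  else
    echo "down"
  fi
}

lmx_status
```

If this prints "down", the server may not be running, or it may not expose that route.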

Loading a Model

Navigate to the Models tab in the LMX dashboard. Models are stored locally in your Hugging Face cache. Select a model from the list and click Load. The model is loaded into unified memory and made available via the API.

LMX uses MLX format (safetensors) on Apple Silicon for best performance. GGUF is not supported on this backend.
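To see which cached repositories might be loadable, you can scan the cache for safetensors weights. This is a rough sketch under stated assumptions: it uses the default Hugging Face cache location (overridable through HF_HUB_CACHE or HF_HOME), and LMX may be configured to look elsewhere. The function name is hypothetical. Note that safetensors weights alone do not prove a repository is in MLX format, so treat the output as a shortlist.

```shell
#!/bin/sh
# List cached Hugging Face repos that contain at least one .safetensors file.
# Assumption: the cache is in the default Hugging Face location unless
# HF_HUB_CACHE or HF_HOME is set. An explicit directory can be passed as $1.
list_safetensors_models() {
  cache_dir="${1:-${HF_HUB_CACHE:-${HF_HOME:-$HOME/.cache/huggingface}/hub}}"
  for repo in "$cache_dir"/models--*; do
    [ -d "$repo" ] || continue
    # Snapshot files are usually symlinks into blobs/, so match on name only.
    if find "$repo" -name '*.safetensors' | grep -q .; then
      # Turn "models--org--name" back into "org/name".
      basename "$repo" | sed -e 's/^models--//' -e 's/--/\//'
    fi
  done
}

list_safetensors_models "$@"
```

Repositories that hold only GGUF files are skipped, which matches the backend restriction described above.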

Running Inference

Once a model is loaded, use the Chat tab to test it directly, or send requests to the OpenAI-compatible endpoint at http://localhost:1234/v1/chat/completions from any compatible client.

curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"current","messages":[{"role":"user","content":"Hello"}]}'
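For repeated use, the same request can be wrapped in a small shell helper. This is a sketch rather than part of LMX: build_chat_payload and ask are hypothetical names, as are the LMX_URL and LMX_MODEL variables, and the JSON escaping is deliberately minimal.

```shell
#!/bin/sh
# Build the JSON body for a single-turn chat request.
# Only backslashes and double quotes are escaped; prompts containing newlines
# or control characters need a real JSON encoder.
build_chat_payload() {
  model="$1"
  prompt=$(printf '%s' "$2" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}]}\n' "$model" "$prompt"
}

# Send one prompt to the chat completions endpoint and print the raw response.
# LMX_URL and LMX_MODEL are hypothetical overrides introduced for this sketch.
ask() {
  base_url="${LMX_URL:-http://localhost:1234}"
  build_chat_payload "${LMX_MODEL:-current}" "$1" |
    curl -fsS --max-time 60 "$base_url/v1/chat/completions" \
      -H "Content-Type: application/json" \
      -d @-
}

# Print the payload for the example prompt without contacting the server.
build_chat_payload current "Hello"
```

With a model loaded, ask "Hello" sends the same request as the curl command above. In the OpenAI response format the reply text is found under choices[0].message.content, assuming LMX follows that schema.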
