Local Agents with llama.cpp
You can run a coding agent entirely on your own hardware. Several open-source agents can connect to a local llama.cpp server to give you an experience similar to Claude Code or Codex, but everything runs on your machine.
Getting Started
1. Set Your Local Hardware
Setting your local hardware profile lets the Hub show you which models are compatible with your setup.
Go to huggingface.co/settings/local-apps and configure your local hardware profile. Select llama.cpp in the Local Apps section, as this will be the engine you'll use.
2. Find a Compatible Model
Browse the Hub for llama.cpp-compatible models.
3. Launch the llama.cpp Server
On the model page, click the "Use this model" button and select llama.cpp. It will show you the exact commands for your setup. The first step is to start a llama.cpp server, e.g.:
llama-server -hf ggml-org/gemma-4-26b-a4b-it-GGUF:Q4_K_M --jinja
This downloads the model and starts an OpenAI-compatible API server on your machine. See the llama.cpp guide for installation instructions.
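Before connecting an agent, you can sanity-check the endpoint. A minimal check in Python (assuming the default port 8080; llama-server's /health endpoint reports readiness, and the `server_ready` helper here is illustrative, not part of any agent):

```python
import json
import urllib.request

BASE = "http://localhost:8080"  # llama-server's default port

def server_ready(base=BASE):
    """Return True if the llama.cpp server reports it is ready."""
    try:
        with urllib.request.urlopen(base + "/health", timeout=5) as resp:
            return json.loads(resp.read()).get("status") == "ok"
    except OSError:
        # Connection refused, timeout, etc. -- server not up yet
        return False

print("server ready:", server_ready())
```

If this prints `False`, wait for the model download and load to finish before connecting an agent.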
4. Connect Your Agent
Pick one of the agents below and follow the setup instructions.
Pi
Pi is the agent behind OpenClaw and is now integrated directly into Hugging Face, giving you access to thousands of compatible models.
Install Pi:
npm install -g @mariozechner/pi-coding-agent
Then add your local model to Pi's configuration file at ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "ggml-org-gemma-4-26b-a4b-gguf"
        }
      ]
    }
  }
}

Start Pi in your project directory:
pi
Pi connects to your local llama.cpp server and gives you an interactive agent session.

OpenClaw
OpenClaw works locally with llama.cpp. You can set your model via the onboard command:
openclaw onboard --non-interactive \
--auth-choice custom-api-key \
--custom-base-url "http://127.0.0.1:8080/v1" \
--custom-model-id "ggml-org-gemma-4-26b-a4b-gguf" \
--custom-api-key "llama.cpp" \
--secret-input-mode plaintext \
--custom-compatibility openai \
--accept-risk

You can also run openclaw onboard interactively, select custom-compatibility with openai, and pass the same configuration.
Hermes
Hermes works locally with llama.cpp. Define a default config as:
model:
  provider: custom
  default: ggml-org/gemma-4-26b-a4b-it-GGUF:Q4_K_M
  base_url: http://127.0.0.1:8080/v1
  api_key: llama.cpp

custom_providers:
  - name: Local (127.0.0.1:8080)
    base_url: http://127.0.0.1:8080/v1
    api_key: llama.cpp
    model: ggml-org/gemma-4-26b-a4b-it-GGUF:Q4_K_M

OpenCode
OpenCode works locally with llama.cpp. Define a ~/.config/opencode/opencode.json:
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "llama.cpp": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "llama-server (local)",
      "options": {
        "baseURL": "http://127.0.0.1:8080/v1"
      },
      "models": {
        "gemma-4-26b-a4b-it": {
          "name": "Gemma 4 (local)",
          "limit": {
            "context": 128000,
            "output": 8192
          }
        }
      }
    }
  }
}

How It Works
The setup has two components running locally:
- llama.cpp server: serves the model as an OpenAI-compatible API on localhost.
- Your agent: the agent process that sends prompts to the local server, reasons about tasks, and executes actions.
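The loop these two components run can be sketched in a few lines of Python. This is an illustrative minimal loop, not the implementation of any agent above; the helper names (`build_body`, `ask`, `agent_step`) are hypothetical, and it assumes the llama.cpp server from step 3 is listening on localhost:8080:

```python
import json
import subprocess
import urllib.request

API = "http://localhost:8080/v1/chat/completions"  # llama-server's default port

def build_body(messages):
    """Build an OpenAI-style chat completion request body."""
    return json.dumps({"messages": messages}).encode()

def ask(messages):
    """One round-trip to the local OpenAI-compatible endpoint."""
    req = urllib.request.Request(
        API, data=build_body(messages),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]

def agent_step(history, task):
    """Ask the model for a shell command, run it, and feed the output back."""
    history.append({"role": "user", "content": task})
    command = ask(history).strip()
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    history.append({"role": "assistant", "content": command})
    history.append({"role": "user", "content": "Output:\n" + result.stdout})
    return history

# With the server running:
# history = [{"role": "system",
#             "content": "Reply with exactly one shell command."}]
# agent_step(history, "List the files in this directory.")
```

Real agents add tool schemas, permission checks, and multi-step planning on top of this loop, but the request/execute/feed-back cycle is the same.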
┌───────────┐    API calls     ┌────────────────────┐
│   Agent   │ ───────────────▶ │  llama.cpp server  │
│           │ ◀─────────────── │   (local model)    │
└───────────┘    responses     └────────────────────┘
      │
      ▼
 Your files,
 terminal, etc.

Alternative: llama-agent
llama-agent takes a different approach: it builds the agent loop directly into llama.cpp as a single binary with zero external dependencies. No Node.js, no Python, just compile and run:
git clone https://github.com/gary149/llama-agent.git
cd llama-agent
# Build
cmake -B build
cmake --build build --target llama-agent
# Run (downloads the model automatically)
./build/bin/llama-agent -hf ggml-org/gemma-4-26b-a4b-it-GGUF:Q4_K_M

Because tool calls happen in-process rather than over HTTP, there is no network overhead between the model and the agent. It also supports subagents, MCP servers, and an HTTP API server mode.
Next Steps
- Use AI Models Locally: learn more about running models on your machine
- llama.cpp Guide: detailed llama.cpp installation and usage
- Agents on the Hub: connect agents to the Hugging Face ecosystem