# AI Stack
volnix runs a local, CUDA-accelerated AI stack plus a custom agent toolchain. The model services are
declared in nixos/configuration.nix.
## Ollama + Open WebUI
```nix
services.ollama = {
  enable = true;
  package = pkgs.ollama-cuda;
  home = "/home/lowcache";
  models = "/home/lowcache/Storage/ollama/models";
};

services.open-webui = {
  enable = true;
  port = 8080;
  environment.OLLAMA_API_BASE_URL = "http://127.0.0.1:11434";
};
```
Ollama runs as the lowcache user (so model files live under ~/Storage without permission issues)
with a tuned environment:
| Variable | Effect |
|---|---|
| `OLLAMA_FLASH_ATTENTION=1` | Flash attention |
| `OLLAMA_KEEP_ALIVE=5m` | Unload idle models → release CUDA → dGPU RTD3 (0 W) suspend |
| `OLLAMA_NUM_PARALLEL=1` | Single parallel request |
| `CUDA_VISIBLE_DEVICES=0` | Pin to the RTX 4050 |
| `OLLAMA_ORIGINS=*` | Allow web origins (Open WebUI) |
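
One way the variables in the table could be declared is through the Ollama module's `environmentVariables` option. This is only a sketch, assuming that option is available in the nixpkgs revision in use; the actual declaration in nixos/configuration.nix may differ (for example, it could set them on the systemd unit directly).

```nix
{
  # Assumption: values mirror the table; each one must be a string.
  services.ollama.environmentVariables = {
    OLLAMA_FLASH_ATTENTION = "1";
    OLLAMA_KEEP_ALIVE = "5m";
    OLLAMA_NUM_PARALLEL = "1";
    CUDA_VISIBLE_DEVICES = "0";
    OLLAMA_ORIGINS = "*";
  };
}
```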
Open WebUI is reachable at http://127.0.0.1:8080; ffmpeg is injected into its PATH for media
handling.
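
Adding a package to a systemd unit's PATH is typically done with the unit's `path` option. The fragment below is a minimal sketch of how the ffmpeg injection might look, assuming the unit is named `open-webui` (the name the NixOS module uses); the repository's own wiring may be different.

```nix
{ pkgs, ... }:
{
  # Puts ffmpeg's bin directory on the service's PATH.
  systemd.services.open-webui.path = [ pkgs.ffmpeg ];
}
```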
VRAM fit (RTX 4050, 6 GB)
llama3.1:8b at Q4 quantization is the practical ceiling for interactive use. MoE models such as
gpt-oss-20b (MXFP4) also run, with active experts held in VRAM and inactive experts offloaded to system RAM.
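
A rough, weight-only estimate illustrates why an 8B model at 4-bit quantization sits near the limit. It assumes about $8 \times 10^9$ parameters and an average of roughly 4.5 bits per weight. The 4.5 figure is an assumption: 4-bit formats store scaling factors alongside the weights, and the exact average depends on the quantization variant.

$$
M_{\text{weights}} \approx \frac{8 \times 10^{9} \times 4.5\ \text{bits}}{8\ \text{bits/byte}} = 4.5 \times 10^{9}\ \text{bytes} \approx 4.5\ \text{GB}
$$

Under these assumptions about 1.5 GB of the 6 GB remains for the KV cache, the CUDA context, and anything else using the GPU, which leaves little headroom for a larger dense model or a long context window.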
## Fooocus (Stable Diffusion)
A non-autostarting Docker OCI container provides image generation with GPU passthrough:
```nix
virtualisation.oci-containers.containers."fooocus" = {
  image = "ghcr.io/lllyasviel/fooocus:latest";
  autoStart = false;
  ports = [ "7865:7865" ];
  volumes = [ "/home/lowcache/Storage/ai-generation/fooocus:/content/data" ];
  extraOptions = [ "--device" "nvidia.com/gpu=0" ];
};
```
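
The `nvidia.com/gpu=0` device name is a CDI (Container Device Interface) identifier, so the host needs a generated CDI specification for the container to start. The following is a sketch of the host-side options that would usually accompany the container definition, assuming Docker is the OCI backend (which the docker-fooocus.service unit name suggests). Depending on the Docker version, CDI support may also need to be switched on in the daemon settings.

```nix
{
  # Assumption: Docker is the backend for virtualisation.oci-containers.
  virtualisation.docker.enable = true;
  virtualisation.oci-containers.backend = "docker";

  # Generates the CDI specification that defines names such as nvidia.com/gpu=0.
  hardware.nvidia-container-toolkit.enable = true;
}
```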
Control it with the Fish aliases stbldff-on / stbldff-off (start/stop docker-fooocus.service).
Outputs persist via the symlink ~/Pictures/fromAi/outputs → ~/Storage/ai-generation/fooocus/outputs.
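
The aliases and the symlink could be expressed declaratively along these lines. This is a hypothetical sketch that assumes Home Manager manages the user environment and that the aliases call `systemctl` through `sudo`; the definitions in the repository may be written differently.

```nix
{ config, ... }:
{
  # Assumption: the aliases simply start and stop the generated systemd unit.
  programs.fish.shellAliases = {
    stbldff-on = "sudo systemctl start docker-fooocus.service";
    stbldff-off = "sudo systemctl stop docker-fooocus.service";
  };

  # Links ~/Pictures/fromAi/outputs to the container's output directory
  # without copying the target into the Nix store.
  home.file."Pictures/fromAi/outputs".source =
    config.lib.file.mkOutOfStoreSymlink
      "${config.home.homeDirectory}/Storage/ai-generation/fooocus/outputs";
}
```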
## Agent CLIs
The user package set in home/pkgs.nix bundles AI tooling:
claude-code, gemini-cli, claude-code-router, github-copilot-cli, rtk, several MCP servers
(mcp-nixos, mcp-gateway, github-mcp-server, playwright-mcp, context7-mcp, …), and the
llm-agents.nix overlay. The ai / ai-shell Fish functions run any llm-agents.nix tool on the
fly. The custom curation/delegation layer is documented in Agent Toolchain.