Usage

localmodelproxy follows Unix-style command-line conventions and is configured primarily through a YAML file.

localmodelproxy [OPTIONS]

By default, the config file is read from:

~/.localmodelproxy

Override it with --config or LOCALMODELPROXY_CONFIG.
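The lookup order above (flag, then environment variable, then default) can be sketched as follows. This is only an illustration of the documented precedence, not the proxy's actual code: resolve_config_path is a hypothetical helper, and treating an empty variable as unset is an assumption.

```python
DEFAULT_CONFIG = "~/.localmodelproxy"


def resolve_config_path(cli_config, env):
    """Pick the config path using the documented precedence.

    cli_config: value passed to --config, or None when the flag is absent.
    env: mapping of environment variables (for example os.environ).
    """
    if cli_config:
        # 1. An explicit --config always wins.
        return cli_config
    from_env = env.get("LOCALMODELPROXY_CONFIG")
    if from_env:
        # 2. Otherwise use the environment variable
        #    (assumption: an empty value counts as unset).
        return from_env
    # 3. Otherwise fall back to the default location.
    return DEFAULT_CONFIG
```

Tilde expansion is left to the caller in this sketch.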

Flags

| Flag | Argument | Notes |
| --- | --- | --- |
| --config | path | YAML config path. Overrides LOCALMODELPROXY_CONFIG and the default ~/.localmodelproxy. |
| --log | path | Appends request and response payload logs to the specified file. |
| --headless | | Skips the interactive TUI and runs the proxy silently in the foreground. The process continues running until interrupted. |
| --version | | Prints version and exits. |
| --help | | Prints help and exits. |

Environment Variables

| Variable | Purpose |
| --- | --- |
| LOCALMODELPROXY_CONFIG | Config file path when --config is not provided. |
| GOOGLE_APPLICATION_CREDENTIALS | Service account JSON file used by Google Application Default Credentials. |
| GOOGLE_CLOUD_PROJECT | Optional Google Cloud project fallback for Google ADC backends. |
| CLOUDSDK_CORE_PROJECT | Optional Google Cloud project fallback for Google ADC backends. |
| GOOGLE_CLOUD_LOCATION | Optional Google Cloud location fallback for Google ADC backends. |
| GOOGLE_CLOUD_REGION | Optional Google Cloud location fallback for Google ADC backends. |
| CLOUDSDK_COMPUTE_REGION | Optional Google Cloud location fallback for Google ADC backends. |
| NO_COLOR | Disables color in terminal output when set to any value. |

Environment variable expansion is supported in sensitive config fields by using ${NAME}.
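As a rough illustration of ${NAME} expansion, the sketch below substitutes placeholders from a mapping of environment variables. expand_env is a hypothetical helper, and raising an error for an unset variable is an assumption; the proxy's actual behavior in that case is not documented here.

```python
import re

# Matches ${NAME}, where NAME looks like a typical environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value, env):
    """Replace every ${NAME} in value with env[NAME]."""

    def substitute(match):
        name = match.group(1)
        if name not in env:
            # Assumption: fail loudly rather than send an empty secret upstream.
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(substitute, value)
```

For example, expand_env("${UPSTREAM_API_TOKEN}", os.environ) would yield the token exported in the shell that launches the proxy.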

Config Structure

server:
  host: 127.0.0.1
  port: 8080

ui:
  recent_requests: 10
  test:
    enabled: true
    system_message: "You are a helpful assistant."
    user_message: "Reply with a short test message to confirm this connection is working."

backends:
  - name: backend-name
    type: openai_compatible
    base_url: http://127.0.0.1:11434/v1
    insecure_skip_verify: false
    auth:
      type: none
    models:
      - id: local-model-name

Server

| Field | Required | Default | Notes |
| --- | --- | --- | --- |
| server.host | no | 127.0.0.1 | Must be a loopback address. |
| server.port | no | 8080 | Local listening port. |
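To check a server.host value before starting the proxy, a loopback test along these lines may help. This is a sketch, not the proxy's own validation: is_loopback_host is a hypothetical helper, and accepting the hostname localhost is an assumption.

```python
import ipaddress


def is_loopback_host(host):
    """Return True when host is a loopback address such as 127.0.0.1 or ::1."""
    if host == "localhost":
        # Assumption: the proxy may or may not accept this hostname.
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal, so it cannot be verified as loopback here.
        return False
```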

UI

| Field | Required | Default | Notes |
| --- | --- | --- | --- |
| ui.recent_requests | no | 10 | Number of model request rows to show in the TUI recent list. Set to 0 to hide it. Maximum 100. |
| ui.test.enabled | no | true | Enables the Test tab in the TUI for sending test requests to models. Set to false to disable. |
| ui.test.system_message | no | "You are a helpful assistant." | System message sent with test requests. |
| ui.test.user_message | no | "Reply with a short test message to confirm this connection is working." | User message sent with test requests. |

The app uses the TUI when stdout is an interactive terminal and --headless is not set. When stdout is not a terminal, it prints plain startup and summary lines. When --headless is provided, all UI output is suppressed and the proxy runs silently until interrupted.
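The three output modes described above reduce to a small decision, sketched here. choose_output_mode and the mode names are illustrative only and are not taken from the proxy's source.

```python
def choose_output_mode(headless, stdout_is_tty):
    """Map the documented conditions to one of three output modes."""
    if headless:
        # --headless suppresses all UI output, terminal or not.
        return "silent"
    if stdout_is_tty:
        # Interactive terminal without --headless: full TUI.
        return "tui"
    # Piped or redirected stdout: plain startup and summary lines.
    return "plain"
```

In a real program the second argument would typically come from sys.stdout.isatty().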

TUI Navigation

When the Test tab is enabled (the default), the TUI has two views accessible via the Tab key:

  • Stats – Displays per-model statistics and recent requests (the default view).
  • Test – Lists all configured models. Use ↑/↓ to select a model and Enter to send a test request. The response is displayed inline.

Test requests go through the proxy like any other request, so they count towards stats and token usage.

Backends

Each backend declares where requests go, how they authenticate, and which models they serve.

| Field | Required | Notes |
| --- | --- | --- |
| name | yes | Unique backend name shown in diagnostics. |
| type | yes | gcp_openai or openai_compatible. |
| base_url | yes* | OpenAI-compatible base URL. Required for openai_compatible; optional for gcp_openai when project/location are provided. |
| project | yes* | Google Cloud project for gcp_openai when base_url is omitted. |
| location | yes* | Google Cloud location for gcp_openai when base_url is omitted. Defaults to global when project is configured. |
| insecure_skip_verify | no | Disables TLS certificate verification for backend API calls. Defaults to false. |
| auth | yes | Auth configuration. |
| models | yes | Either the literal all or a list of model entries. |

When insecure_skip_verify: true, the app prints a startup warning. This is intended only for local debugging and self-signed development endpoints.

Models

Backends can expose specific models:

models:
  - id: local-model
    upstream_id: provider/model-name
    cost:
      input_per_million: 0.30
      output_per_million: 2.50
      cache_per_million: 0.075

Or pass through all models:

models: all

Routing rules:

  • Exact configured model IDs win.
  • models: all acts as a fallback backend.
  • If more than one backend uses models: all, config order decides.
  • If no backend matches, the proxy returns an OpenAI-style 400 model_not_found error.
  • cost is optional. When present, values are interpreted as USD per 1 million tokens and are shown per request, per model, and in total.
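The routing rules above can be sketched as a two-pass lookup. This is a minimal illustration under stated assumptions: pick_backend is a hypothetical function, backends are plain dictionaries mirroring the YAML, and "config order decides" is read as "the first matching backend wins".

```python
def pick_backend(model_id, backends):
    """Return the name of the backend that should serve model_id, or None."""
    # Pass 1: an exactly configured model ID always wins.
    for backend in backends:
        models = backend["models"]
        if models != "all" and any(m["id"] == model_id for m in models):
            return backend["name"]
    # Pass 2: fall back to the first backend (in config order) using models: all.
    for backend in backends:
        if backend["models"] == "all":
            return backend["name"]
    # No match: the caller would answer with the model_not_found error.
    return None
```

Note that an exact match wins even when a models: all backend appears earlier in the config.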

Authentication

None

Use this for local LLM servers that do not require auth.

auth:
  type: none

Bearer Token

Use this when the upstream accepts a fixed bearer token.

auth:
  type: bearer
  token: ${UPSTREAM_API_TOKEN}

The proxy strips inbound Authorization and replaces it with the configured token.
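In effect, whatever credential the client sends is discarded before the request is forwarded. A minimal sketch of that header rewrite, assuming headers are held in a plain dictionary (rewrite_auth_headers is a hypothetical name):

```python
def rewrite_auth_headers(inbound_headers, token):
    """Drop any inbound Authorization header and attach the configured token."""
    outbound = {
        name: value
        for name, value in inbound_headers.items()
        if name.lower() != "authorization"  # header names are case-insensitive
    }
    outbound["Authorization"] = f"Bearer {token}"
    return outbound
```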

Google Application Default Credentials

Use this for Google Cloud OpenAI-compatible endpoints.

auth:
  type: google_adc

Authenticate locally with:

gcloud auth application-default login

Or use a service account:

export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json

OAuth Client Credentials

Use this for internal or vendor endpoints that require an OAuth token exchange.

auth:
  type: oauth_client_credentials
  token_url: https://auth.example.internal/oauth/token
  client_id: local-client
  client_secret: ${LOCAL_CLIENT_SECRET}
  scopes:
    - models.invoke
  insecure_skip_verify: false

| Field | Required | Notes |
| --- | --- | --- |
| token_url | yes | OAuth token endpoint. |
| client_id | yes | OAuth client ID. |
| client_secret | yes | OAuth client secret. Environment expansion is recommended. |
| scopes | no | Optional OAuth scopes. |
| insecure_skip_verify | no | Disables TLS verification only for token exchange. Defaults to false. |

When token exchange TLS verification is disabled, the app prints a startup warning.
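For orientation, a standard OAuth 2.0 client credentials exchange is a form-encoded POST to the token endpoint. The sketch below only builds such a request and does not send it. Whether the proxy passes the credentials in the form body (as here) or in an HTTP Basic header is an assumption, and build_token_request is a hypothetical helper.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(token_url, client_id, client_secret, scopes=()):
    """Build (but do not send) a client_credentials token request."""
    form = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }
    if scopes:
        # OAuth 2.0 encodes multiple scopes as one space-separated string.
        form["scope"] = " ".join(scopes)
    return Request(
        token_url,
        data=urlencode(form).encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```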

Token Accounting

The TUI and logs track:

  • uncached input tokens
  • output tokens
  • thinking tokens
  • cached tokens
  • total tokens

The proxy captures standard OpenAI-compatible usage fields such as:

  • prompt_tokens
  • completion_tokens
  • total_tokens
  • prompt_tokens_details.cached_tokens
  • completion_tokens_details.reasoning_tokens

It also captures Google-style usage metadata when present.

When cached tokens are reported, they are subtracted from the input token count before display and cost calculation so cached input is not double-counted.
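The subtraction described above, combined with the optional cost block, might look like the sketch below. It handles only the OpenAI-style usage fields listed earlier. summarize_usage is a hypothetical helper, and billing cached tokens at cache_per_million and all completion tokens at output_per_million is an assumption about how the rates are applied.

```python
def summarize_usage(usage, cost=None):
    """Normalize an OpenAI-style usage object and optionally price it in USD."""
    prompt = usage.get("prompt_tokens", 0)
    completion = usage.get("completion_tokens", 0)
    cached = (usage.get("prompt_tokens_details") or {}).get("cached_tokens", 0)
    thinking = (usage.get("completion_tokens_details") or {}).get("reasoning_tokens", 0)

    summary = {
        # Cached tokens are removed from input so they are not counted twice.
        "uncached_input": max(prompt - cached, 0),
        "output": completion,
        "thinking": thinking,
        "cached": cached,
        "total": usage.get("total_tokens", prompt + completion),
    }
    if cost:
        # Rates are USD per 1 million tokens; missing rates count as zero.
        summary["cost_usd"] = (
            summary["uncached_input"] * cost.get("input_per_million", 0)
            + summary["output"] * cost.get("output_per_million", 0)
            + summary["cached"] * cost.get("cache_per_million", 0)
        ) / 1_000_000
    return summary
```

With 1,000 prompt tokens of which 400 are cached, this reports 600 uncached input tokens and prices the 400 cached tokens separately.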

Request/Response Log

Use --log PATH to append request payloads, response payloads, streaming response chunks, and pre-forward failures to a file while keeping the TUI active.

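Request log entries are timestamped and include the model, the backend, and the full upstream token value. Because the token appears in full, treat the log file as sensitive: restrict who can read it and keep it out of version control.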