Usage

localmodelproxy follows Unix-style command-line conventions and is configured primarily through a YAML file.

localmodelproxy [OPTIONS]

By default, the config file is read from:

~/.localmodelproxy

Override it with --config or LOCALMODELPROXY_CONFIG. When both are set, --config takes precedence.
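
The lookup order can be sketched as follows. This is an illustration of the documented precedence only, not the proxy's actual code: `resolve_config_path` is a hypothetical helper, and both the `~` expansion and the treatment of an empty variable as unset are assumptions.

```python
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_CONFIG = "~/.localmodelproxy"  # default path documented above


def resolve_config_path(cli_config: Optional[str], env: Mapping[str, str]) -> Path:
    """Return the config path: --config first, then the env var, then the default."""
    if cli_config:
        # An explicit --config value overrides everything else.
        return Path(cli_config).expanduser()
    env_path = env.get("LOCALMODELPROXY_CONFIG")
    if env_path:
        # Assumption: an empty variable is treated the same as an unset one.
        return Path(env_path).expanduser()
    return Path(DEFAULT_CONFIG).expanduser()
```

Calling `resolve_config_path(None, os.environ)` would mirror a run without --config.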

Flags

Flag	Argument	Notes
`--config`	path	YAML config path. Overrides `LOCALMODELPROXY_CONFIG` and the default `~/.localmodelproxy`.
`--log`	path	Appends request and response payload logs to the specified file.
`--headless`		Skips the interactive TUI and runs the proxy silently in the foreground. The process continues running until interrupted.
`--version`		Prints version and exits.
`--help`		Prints help and exits.

Environment Variables

Variable	Purpose
`LOCALMODELPROXY_CONFIG`	Config file path when `--config` is not provided.
`GOOGLE_APPLICATION_CREDENTIALS`	Service account JSON file used by Google Application Default Credentials.
`GOOGLE_CLOUD_PROJECT`	Optional Google Cloud project fallback for Google ADC backends.
`CLOUDSDK_CORE_PROJECT`	Optional Google Cloud project fallback for Google ADC backends.
`GOOGLE_CLOUD_LOCATION`	Optional Google Cloud location fallback for Google ADC backends.
`GOOGLE_CLOUD_REGION`	Optional Google Cloud location fallback for Google ADC backends.
`CLOUDSDK_COMPUTE_REGION`	Optional Google Cloud location fallback for Google ADC backends.
`NO_COLOR`	Disables color in terminal output when set to any value.

Environment variable expansion is supported in sensitive config fields by using ${NAME}.
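
A minimal sketch of what ${NAME} expansion could look like. `expand_env` is a hypothetical helper, and failing on an unset variable is an assumption; the proxy's real behavior for missing variables is not documented here.

```python
import re
from typing import Mapping

# Matches ${NAME}, where NAME is a conventional environment variable name.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: str, env: Mapping[str, str]) -> str:
    """Replace every ${NAME} in value with env[NAME]."""

    def replace(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in env:
            # Assumption: a missing variable is an error rather than an empty string.
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _VAR.sub(replace, value)
```

With this sketch, `expand_env("${UPSTREAM_API_TOKEN}", os.environ)` would yield the token value exported in the shell that starts the proxy.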

Config Structure

server:
  host: 127.0.0.1
  port: 8080

ui:
  recent_requests: 10
  test:
    enabled: true
    system_message: "You are a helpful assistant."
    user_message: "Reply with a short test message to confirm this connection is working."

backends:
  - name: backend-name
    type: openai_compatible
    base_url: http://127.0.0.1:11434/v1
    insecure_skip_verify: false
    auth:
      type: none
    models:
      - id: local-model-name

Server

Field	Required	Default	Notes
`server.host`	no	`127.0.0.1`	Must be loopback.
`server.port`	no	`8080`	Local listening port.

UI

Field	Required	Default	Notes
`ui.recent_requests`	no	`10`	Number of model request rows to show in the TUI recent list. Set to `0` to hide it. Maximum `100`.
`ui.test.enabled`	no	`true`	Enables the Test tab in the TUI for sending test requests to models. Set to `false` to disable.
`ui.test.system_message`	no	`"You are a helpful assistant."`	System message sent with test requests.
`ui.test.user_message`	no	`"Reply with a short test message to confirm this connection is working."`	User message sent with test requests.

The app uses the TUI when stdout is an interactive terminal and --headless is not set. When stdout is not a terminal, it prints plain startup and summary lines. When --headless is provided, all UI output is suppressed and the proxy runs silently until interrupted.
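
The output-mode decision described above can be summarized in a few lines. The function name and the mode labels `"silent"`, `"tui"`, and `"plain"` are illustrative only and do not come from the app.

```python
import sys


def choose_output_mode(headless: bool, stdout_is_tty: bool) -> str:
    """Pick the output mode from the --headless flag and the stdout type."""
    if headless:
        return "silent"  # --headless suppresses all UI output
    if stdout_is_tty:
        return "tui"  # interactive terminal: full TUI
    return "plain"  # piped or redirected: plain startup and summary lines


if __name__ == "__main__":
    headless_flag = "--headless" in sys.argv[1:]
    print(choose_output_mode(headless_flag, sys.stdout.isatty()))
```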

When the Test tab is enabled (the default), the TUI has two views accessible via the Tab key:

Stats – Displays per-model statistics and recent requests (the default view).
Test – Lists all configured models. Use ↑/↓ to select a model and Enter to send a test request. The response is displayed inline.

Test requests go through the proxy like any other request, so they count towards stats and token usage.

Backends

Each backend declares where requests go, how they authenticate, and which models they serve.

Field	Required	Notes
`name`	yes	Unique backend name shown in diagnostics.
`type`	yes	`gcp_openai` or `openai_compatible`.
`base_url`	yes*	OpenAI-compatible base URL. Required for `openai_compatible`; optional for `gcp_openai` when project/location are provided.
`project`	yes*	Google Cloud project for `gcp_openai` when `base_url` is omitted.
`location`	yes*	Google Cloud location for `gcp_openai` when `base_url` is omitted. Defaults to `global` when project is configured.
`insecure_skip_verify`	no	Disables TLS certificate verification for backend API calls. Defaults to `false`.
`auth`	yes	Auth configuration.
`models`	yes	`all` or a list of model entries.

When insecure_skip_verify: true, the app prints a startup warning. This is intended only for local debugging and self-signed development endpoints.
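
The required-field rules in the table can be expressed as a small validation sketch. `validate_backend` and its error strings are hypothetical. The `project_fallback` parameter stands in for whatever project the Google environment variables resolve to, since their precedence is not specified here.

```python
from typing import Any, Dict, List, Optional


def validate_backend(
    backend: Dict[str, Any], project_fallback: Optional[str] = None
) -> List[str]:
    """Return a list of problems found in one backend entry (empty if valid)."""
    errors: List[str] = []
    for field in ("name", "type", "auth", "models"):
        if not backend.get(field):
            errors.append(f"missing required field: {field}")

    kind = backend.get("type")
    has_base_url = bool(backend.get("base_url"))
    if kind == "openai_compatible" and not has_base_url:
        errors.append("openai_compatible requires base_url")
    elif kind == "gcp_openai" and not has_base_url:
        # base_url is optional for gcp_openai when a project is available.
        if not (backend.get("project") or project_fallback):
            errors.append("gcp_openai requires base_url or a project")
    elif kind not in (None, "openai_compatible", "gcp_openai"):
        errors.append(f"unknown backend type: {kind}")
    return errors
```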

Models

Backends can expose specific models:

models:
  - id: local-model
    upstream_id: provider/model-name
    cost:
      input_per_million: 0.30
      output_per_million: 2.50
      cache_per_million: 0.075

Or pass through all models:

models: all

Routing rules:

Exact configured model IDs win.
A backend with models: all acts as a fallback for any model ID that has no exact match.
If more than one backend uses models: all, config order decides.
If no backend matches, the proxy returns an OpenAI-style 400 model_not_found error.

The cost block is optional. When present, its values are interpreted as USD per 1 million tokens, and costs are shown per request, per model, and in total.
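
The routing rules above can be sketched as a two-pass lookup. `route` is a hypothetical function operating on backend entries shaped like the config. If two backends list the same exact model ID, this sketch picks the first one, which is an assumption.

```python
from typing import Any, Dict, List, Optional


def route(model_id: str, backends: List[Dict[str, Any]]) -> Optional[str]:
    """Return the name of the backend that should serve model_id, or None."""
    # Pass 1: exact configured model IDs win, regardless of backend order.
    for backend in backends:
        models = backend["models"]
        if models != "all" and any(m["id"] == model_id for m in models):
            return backend["name"]
    # Pass 2: the first backend with `models: all`, in config order.
    for backend in backends:
        if backend["models"] == "all":
            return backend["name"]
    # No match: the proxy would answer with a 400 model_not_found error.
    return None
```

Given a catch-all backend listed before a backend that declares `local-model`, a request for `local-model` still goes to the second backend, while any other model ID falls through to the catch-all.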

Authentication

None

Use this for local LLM servers that do not require auth.

auth:
  type: none

Bearer Token

Use this when the upstream accepts a fixed bearer token.

auth:
  type: bearer
  token: ${UPSTREAM_API_TOKEN}

The proxy strips the inbound Authorization header and replaces it with the configured bearer token before forwarding the request upstream.
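
As an illustration of that header rewrite, assuming headers are held in a plain dictionary (`apply_bearer_auth` is a hypothetical helper, not the proxy's code):

```python
from typing import Dict


def apply_bearer_auth(inbound_headers: Dict[str, str], token: str) -> Dict[str, str]:
    """Drop any client-supplied Authorization header and set the configured one."""
    # Header names are case-insensitive, so compare in lowercase.
    outbound = {
        name: value
        for name, value in inbound_headers.items()
        if name.lower() != "authorization"
    }
    outbound["Authorization"] = f"Bearer {token}"
    return outbound
```

Whatever key the client sends is discarded, so local tools can use any placeholder API key.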

Google Application Default Credentials

Use this for Google Cloud OpenAI-compatible endpoints.

auth:
  type: google_adc

Authenticate locally with:

gcloud auth application-default login

Or use a service account:

export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json

OAuth Client Credentials

Use this for internal or vendor endpoints that require an OAuth token exchange.

auth:
  type: oauth_client_credentials
  token_url: https://auth.example.internal/oauth/token
  client_id: local-client
  client_secret: ${LOCAL_CLIENT_SECRET}
  scopes:
    - models.invoke
  insecure_skip_verify: false

Field	Required	Notes
`token_url`	yes	OAuth token endpoint.
`client_id`	yes	OAuth client ID.
`client_secret`	yes	OAuth client secret. Environment expansion is recommended.
`scopes`	no	Optional OAuth scopes.
`insecure_skip_verify`	no	Disables TLS verification only for token exchange. Defaults to `false`.

When token exchange TLS verification is disabled, the app prints a startup warning.
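
For orientation, a standard OAuth 2.0 client credentials request built from these fields might look like the sketch below. It only constructs the request and does not send it. `build_token_request` is a hypothetical helper, and sending the credentials in the form body (rather than via HTTP Basic auth, which the OAuth spec also allows) is an assumption about the upstream, not a statement about how the proxy does it.

```python
import urllib.parse
import urllib.request
from typing import Iterable


def build_token_request(
    token_url: str,
    client_id: str,
    client_secret: str,
    scopes: Iterable[str] = (),
) -> urllib.request.Request:
    """Build (but do not send) a client_credentials token request."""
    form = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }
    scope = " ".join(scopes)  # OAuth scopes are space-delimited
    if scope:
        form["scope"] = scope
    body = urllib.parse.urlencode(form).encode("ascii")
    return urllib.request.Request(
        token_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```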

Token Accounting

The TUI and logs track:

uncached input tokens
output tokens
thinking tokens
cached tokens
total tokens

The proxy captures standard OpenAI-compatible usage fields such as:

prompt_tokens
completion_tokens
total_tokens
prompt_tokens_details.cached_tokens
completion_tokens_details.reasoning_tokens

It also captures Google-style usage metadata when present.

When cached tokens are reported, they are subtracted from the input token count before display and cost calculation so cached input is not double-counted.
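
The subtraction and the cost calculation can be sketched as follows for an OpenAI-style usage object. `summarize_usage` and its output keys are hypothetical. The sketch assumes reasoning tokens are already included in completion_tokens (as in OpenAI's usage format) and are therefore billed at the output rate; the proxy's exact handling may differ.

```python
from typing import Any, Dict, Optional


def summarize_usage(
    usage: Dict[str, Any], cost: Optional[Dict[str, float]] = None
) -> Dict[str, Any]:
    """Normalize an OpenAI-style usage object and optionally price it in USD."""
    prompt = usage.get("prompt_tokens", 0)
    completion = usage.get("completion_tokens", 0)
    cached = (usage.get("prompt_tokens_details") or {}).get("cached_tokens", 0)
    thinking = (usage.get("completion_tokens_details") or {}).get("reasoning_tokens", 0)

    # Cached tokens are part of prompt_tokens, so remove them to avoid double-counting.
    uncached = max(prompt - cached, 0)
    summary: Dict[str, Any] = {
        "uncached_input": uncached,
        "output": completion,
        "thinking": thinking,
        "cached": cached,
        "total": usage.get("total_tokens", prompt + completion),
    }
    if cost:
        # Rates are USD per 1 million tokens, as in the `cost` config block.
        summary["cost_usd"] = (
            uncached * cost.get("input_per_million", 0.0)
            + completion * cost.get("output_per_million", 0.0)
            + cached * cost.get("cache_per_million", 0.0)
        ) / 1_000_000
    return summary
```

For example, 1000 prompt tokens of which 400 are cached, plus 200 completion tokens, priced with the rates from the Models example, gives 600 uncached input tokens and a cost of about 0.00071 USD.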

Request/Response Log

Use --log PATH to append request payloads, response payloads, streaming response chunks, and pre-forward failures to a file while keeping the TUI active.

Request log entries are timestamped and include the model, backend, and full upstream token value. Because the token is written in full, treat the log file as sensitive and restrict access to it.