DuguetLabs
§ API · v1 · Updated 2026-04-15

The Inference API

DuguetLabs serves frontier open-source and proprietary models from a single OpenAI-compatible endpoint. Drop-in for any OpenAI or OpenRouter client — change the base URL, keep your code.

Base URL   https://api.duguetlabs.com/v1
Auth       Bearer dg_...
Format     OpenAI-compatible JSON / SSE

Quickstart

Three minutes from zero to your first token.

  1. Get a key. Sign up for a free API key — no card required. You'll receive $5 in prepaid credit.
  2. Pick a model. Browse the models catalogue. Start with duguet-ai/llama-3.1-8b ($0.05 / $0.08 per MTok) for day-to-day work.
  3. Fire a request. Your existing OpenAI SDK works. Just change the base URL.
curl https://api.duguetlabs.com/v1/chat/completions \
  -H "Authorization: Bearer $DUGUET_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "duguet-ai/llama-3.1-8b",
    "messages": [
      { "role": "user", "content": "Hello from DuguetLabs." }
    ]
  }'

Authentication

All requests carry an API key in the Authorization header using the Bearer scheme. Keys start with dg_ and are shown only once at signup, so save them securely.

Authorization: Bearer dg_177984c7a0e3ef28tHi0zHgQXdjYLHdLqm87M3kPmQ6OLSYf

Inspect a key's state (usage, remaining credit, call count) at GET /v1/auth/key.

$ curl https://api.duguetlabs.com/v1/auth/key \
    -H "Authorization: Bearer $DUGUET_API_KEY"

{
  "data": {
    "label": "[email protected]",
    "usage": 0.0014,
    "limit": 5.00,
    "is_free_tier": true,
    "rate_limit": { "requests": 500, "interval": "60s" },
    "tokens_used": { "prompt": 2431, "completion": 802 },
    "calls": 17
  }
}
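The fields above make it easy to check how much prepaid credit a key has left. A minimal sketch, using the field names from the example response (the literal dict stands in for the parsed JSON body):

```python
# Sketch: computing remaining credit from a GET /v1/auth/key response.
# The dict below mirrors the example response shown above.
key_info = {
    "data": {
        "usage": 0.0014,       # dollars spent so far
        "limit": 5.00,         # prepaid credit ceiling
        "is_free_tier": True,
    }
}

data = key_info["data"]
remaining = data["limit"] - data["usage"]  # dollars of credit left
print(f"Remaining credit: ${remaining:.4f}")  # → Remaining credit: $4.9986
```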

Chat completions

POST /v1/chat/completions — OpenAI-compatible. All standard parameters are accepted: messages, temperature, top_p, max_tokens, stream, stop, seed, frequency_penalty, presence_penalty.

The response includes OpenRouter-compatible extensions:

  • provider — always "duguet-ai"
  • native_finish_reason — upstream raw reason, preserved alongside the normalised finish_reason
  • usage.cost — dollar cost of this call, computed from the per-model price
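These extension fields sit alongside the standard OpenAI ones, so reading them from a parsed response is plain dict access. A sketch, with an abbreviated response body standing in for `json.loads` output:

```python
# Sketch: reading the OpenRouter-compatible extensions from a parsed
# /v1/chat/completions response (abbreviated to the relevant fields).
response = {
    "provider": "duguet-ai",
    "choices": [{"finish_reason": "stop", "native_finish_reason": "stop"}],
    "usage": {"total_tokens": 529, "cost": 0.000197},
}

print(response["provider"])                            # always "duguet-ai"
print(response["choices"][0]["native_finish_reason"])  # upstream raw reason
print(response["usage"]["cost"])                       # dollar cost of this call
```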

Request example — with tool use

{
  "model": "duguet-ai/llama-3.3-70b",
  "messages": [
    { "role": "system", "content": "You are a research assistant." },
    { "role": "user",   "content": "Summarise the paper at arxiv.org/abs/2401.XXX" }
  ],
  "temperature": 0.3,
  "max_tokens": 1024,
  "tools": [{
    "type": "function",
    "function": {
      "name": "fetch_url",
      "description": "Fetch the text of a URL",
      "parameters": {
        "type": "object",
        "properties": { "url": { "type": "string" } },
        "required": ["url"]
      }
    }
  }]
}

Response

{
  "id": "chatcmpl-9ecf…",
  "object": "chat.completion",
  "created": 1776293537,
  "model": "duguet-ai/llama-3.3-70b",
  "provider": "duguet-ai",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "…"
    },
    "finish_reason": "stop",
    "native_finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 142,
    "completion_tokens": 387,
    "total_tokens": 529,
    "cost": 0.000197,
    "is_byok": false
  }
}

Streaming

Pass "stream": true to receive a Server-Sent Events (SSE) stream of chat.completion.chunk objects. The stream ends with data: [DONE]. All chunks carry provider and the prefixed model name.

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.duguetlabs.com/v1",
    api_key=os.environ["DUGUET_API_KEY"],
)

stream = client.chat.completions.create(
    model="duguet-ai/mistral-large-3",
    messages=[{"role": "user", "content": "Write a haiku about sovereignty."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)

Embeddings

POST /v1/embeddings — OpenAI-compatible. Returns vector representations suitable for semantic search and RAG.

curl https://api.duguetlabs.com/v1/embeddings \
  -H "Authorization: Bearer $DUGUET_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "duguet-ai/nomic-embed",
    "input": "Sovereignty means the compute stays where the data lives."
  }'

Batching: input accepts an array of strings to embed in one request.
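A batched request is just the single-string request with a list in its place. A sketch of the request body (the second input string is illustrative):

```python
import json

# Sketch: a batched /v1/embeddings request body. `input` is a list,
# so several strings are embedded in one request.
body = {
    "model": "duguet-ai/nomic-embed",
    "input": [
        "Sovereignty means the compute stays where the data lives.",
        "Embeddings enable semantic search and RAG.",
    ],
}
payload = json.dumps(body)
print(payload)
```

The response's data array carries one embedding per input, each tagged with its index.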

Models

Thirteen carefully chosen SKUs — five open-source frontier, three proprietary, four self-hosted on our sovereign A100, one embedding. The full live list is at GET /v1/models.
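Since the endpoint is OpenAI-compatible, GET /v1/models should return the standard list shape. A sketch of pulling model IDs out of it, assuming that shape — the two entries here are illustrative, not the live catalogue:

```python
# Sketch: filtering a GET /v1/models response (assumed OpenAI list
# shape: {"object": "list", "data": [{"id": ...}, ...]}) to bare IDs.
models_response = {
    "object": "list",
    "data": [
        {"id": "duguet-ai/llama-3.1-8b"},
        {"id": "duguet-ai/nomic-embed"},
    ],
}

ids = [m["id"] for m in models_response["data"]]
print(ids)  # → ['duguet-ai/llama-3.1-8b', 'duguet-ai/nomic-embed']
```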

Rate limits

Free tier: 500 requests / minute per key, across all models. Paid accounts: bespoke, configured per contract. Signup rate limit: 3 signups / minute per source IP.

When a request would exceed the limit, the API returns HTTP 429 with a Retry-After header.
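A client can honour that header with a small retry loop. A minimal sketch — the transport is abstracted as a callable returning (status, headers, body), so any HTTP library slots in:

```python
import time

def send_with_retry(do_request, max_retries=3):
    """Retry a request on HTTP 429, sleeping for the Retry-After
    interval the API sends. `do_request` is any callable returning
    (status, headers, body)."""
    for attempt in range(max_retries + 1):
        status, headers, body = do_request()
        if status != 429:
            return status, body
        if attempt == max_retries:
            break
        # Fall back to 1s if Retry-After is somehow absent.
        time.sleep(float(headers.get("Retry-After", 1)))
    return status, body

# Usage with a fake transport: first call rate-limited, second succeeds.
calls = iter([(429, {"Retry-After": "0"}, ""), (200, {}, "ok")])
status, body = send_with_retry(lambda: next(calls))
print(status, body)  # → 200 ok
```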

Errors

All errors follow the OpenAI shape:

{
  "error": {
    "message": "Invalid or missing API key",
    "type": "authentication_error"
  }
}
Status  Type                    When
400     invalid_request_error   Malformed body or unsupported parameter.
401     authentication_error    Missing, invalid, or disabled API key.
402     insufficient_quota      Credit exhausted. Top up via email for now.
404     model_not_found         Unknown model id. See /v1/models.
429     rate_limit_exceeded     Slow down.
5xx     upstream_error          A backend provider is misbehaving. We'll tell you which.

OpenRouter compatibility

Every response body mirrors OpenRouter's extensions: provider, native_finish_reason, usage.cost, usage.is_byok. Model IDs use the duguet-ai/ prefix (drop the prefix, we still recognise the bare name).

If you're migrating from OpenRouter, set base_url to https://api.duguetlabs.com/v1 and adjust model names. That's the whole migration.

Support

One inbox. Answered by the person who wrote this.

[email protected]