Early access — no proxy, no gateway, no latency tradeoff

Control every LLM call in-process — no proxy, no latency, no surprises

Enforce cost, safety, and reliability at runtime — entirely inside your app. No proxy. No sidecar. No infrastructure.

View on GitHub
index.ts
<span class="text-purple-400">import</span><span class="text-white/80"> { Loret } </span><span class="text-purple-400">from</span><span class="text-green-400"> "@loret/sdk"</span><span class="text-white/80">;</span>

<span class="text-purple-400">const</span><span class="text-blue-300"> client</span><span class="text-white/80"> = </span><span class="text-purple-400">new</span><span class="text-yellow-300"> Loret</span><span class="text-white/80">({</span>
<span class="text-white/80">  </span><span class="text-blue-300">projectId</span><span class="text-white/80">: </span><span class="text-green-400">"my-app"</span><span class="text-white/80">,</span>
<span class="text-white/80">  </span><span class="text-blue-300">providers</span><span class="text-white/80">: [{ provider: </span><span class="text-green-400">"openai"</span><span class="text-white/80">, model: </span><span class="text-green-400">"gpt-4o"</span><span class="text-white/80">, priority: 1 },</span>
<span class="text-white/80">             { provider: </span><span class="text-green-400">"anthropic"</span><span class="text-white/80">, model: </span><span class="text-green-400">"claude-sonnet-4-6"</span><span class="text-white/80">, priority: 2 }],</span>
<span class="text-white/80">  </span><span class="text-blue-300">mode</span><span class="text-white/80">: </span><span class="text-green-400">"enforce"</span><span class="text-white/80">,</span>
<span class="text-white/80">  </span><span class="text-blue-300">budgetLimits</span><span class="text-white/80">: [{ scope: </span><span class="text-green-400">"per_call"</span><span class="text-white/80">, maxCostUsd: 0.05 }],</span>
<span class="text-white/80">});</span>

<span class="text-purple-400">const</span><span class="text-blue-300"> result</span><span class="text-white/80"> = </span><span class="text-purple-400">await</span><span class="text-blue-300"> client</span><span class="text-white/80">.</span><span class="text-yellow-300">run</span><span class="text-white/80">({</span>
<span class="text-white/80">  </span><span class="text-blue-300">messages</span><span class="text-white/80">: [{ role: </span><span class="text-green-400">"user"</span><span class="text-white/80">, content: </span><span class="text-green-400">"Hello"</span><span class="text-white/80"> }],</span>
<span class="text-white/80">});</span>

The problem

You're shipping AI features without real control

LLM APIs are expensive, unreliable, and opaque. Most teams discover problems only after a cost spike or a production incident, when the damage is already done.

$18k
average surprise LLM bill that engineering teams discover only after the money is spent
0
native tools from OpenAI or Anthropic for per-feature cost attribution
37%
of AI production incidents caused by provider outages with no fallback

Features

Runtime control for every LLM call

All enforcement happens before the request is sent — inside your application. Unlike proxy-based solutions, there is no extra hop, no infrastructure, no latency tradeoff.

Core

Budget enforcement

Block expensive requests before they happen. Enforce token and cost limits per call, per trace, or over time — violations throw typed errors before the request is ever sent.
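Conceptually, the per-call check works something like the sketch below. The error class, cost estimator, and rates here are illustrative assumptions, not the actual @loret/sdk API; they show why a violation can throw before any network traffic occurs.

```typescript
// Illustrative sketch of pre-dispatch budget enforcement.
// BudgetExceededError and the token heuristic are hypothetical, not SDK internals.
class BudgetExceededError extends Error {
  constructor(public estimatedCostUsd: number, public maxCostUsd: number) {
    super(`Estimated cost $${estimatedCostUsd} exceeds per-call limit $${maxCostUsd}`);
  }
}

// Estimate cost from prompt length alone, so the check needs no network call.
function enforcePerCallBudget(prompt: string, maxCostUsd: number, usdPer1kTokens = 0.005): void {
  const estTokens = Math.ceil(prompt.length / 4); // rough ~4 chars/token heuristic
  const estimatedCostUsd = (estTokens / 1000) * usdPer1kTokens;
  if (estimatedCostUsd > maxCostUsd) {
    throw new BudgetExceededError(estimatedCostUsd, maxCostUsd);
  }
}
```

Because the error is typed, callers can catch it specifically and degrade gracefully instead of eating the provider bill.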

Core

Provider routing and fallback

Retry failures and fall back across providers automatically — no orchestration layer required. Circuit breaking handles sustained outages without manual intervention.
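The fallback behavior is the same pattern as the priority-ordered providers array in the snippet above. A minimal sketch, with names of our own choosing rather than the SDK's internals:

```typescript
// Sketch of priority-ordered fallback: try each provider in order,
// return the first success, rethrow the last failure if all are down.
type Attempt<T> = () => T;

function withFallback<T>(attempts: Attempt<T>[]): T {
  let lastError: unknown;
  for (const attempt of attempts) {
    try {
      return attempt();
    } catch (err) {
      lastError = err; // record and fall through to the next provider
    }
  }
  throw lastError;
}
```

A circuit breaker layers on top of this loop by skipping providers that have failed repeatedly in a recent window.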

Privacy

PII protection

Detect and optionally redact or block sensitive data before it leaves your system. Emails, phone numbers, SSNs, credit cards, secrets, and IPs — caught in-process.
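In-process redaction can be as simple as pattern substitution before dispatch. The patterns below are simplified examples for illustration, not the SDK's actual detection rules:

```typescript
// Illustrative PII redaction: replace each match with a typed placeholder
// before the text ever leaves the process. Patterns are deliberately simplified.
const PII_PATTERNS: [string, RegExp][] = [
  ["EMAIL", /[\w.+-]+@[\w-]+\.[\w.]+/g],
  ["SSN", /\b\d{3}-\d{2}-\d{4}\b/g],
  ["CARD", /\b(?:\d[ -]?){13,16}\b/g],
];

function redactPII(text: string): string {
  return PII_PATTERNS.reduce(
    (out, [label, pattern]) => out.replace(pattern, `[REDACTED_${label}]`),
    text,
  );
}
```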

Guardrails

Trace and workflow guards

Limit calls, cost, and execution time across multi-step agent runs. Stop waste before it accumulates — guards fire before provider dispatch, not after.
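A trace-level guard is just shared state checked before each dispatch. This sketch uses hypothetical names to show the mechanics; the real guard configuration lives in the SDK:

```typescript
// Hypothetical trace guard: caps call count and cumulative cost
// across a multi-step agent run. Names are illustrative, not SDK API.
class TraceGuard {
  private calls = 0;
  private costUsd = 0;

  constructor(private maxCalls: number, private maxCostUsd: number) {}

  // Invoked before each provider dispatch; throws the moment a cap would be exceeded.
  beforeCall(estimatedCostUsd: number): void {
    if (this.calls + 1 > this.maxCalls) throw new Error("trace call limit exceeded");
    if (this.costUsd + estimatedCostUsd > this.maxCostUsd) throw new Error("trace budget exceeded");
    this.calls += 1;
    this.costUsd += estimatedCostUsd;
  }
}
```

Because the guard fires before dispatch, a runaway agent loop is cut off at the cap rather than billed after the fact.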

Telemetry

Full observability

Structured events for every request: start, completion, failure, retry, fallback, and guardrail triggers. Buffered and flushed asynchronously — no latency impact.
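The buffering pattern keeps telemetry off the request path: emitting an event is an in-memory append, and batches are handed off separately. A sketch under assumed names (the real event schema is the SDK's):

```typescript
// Sketch of buffered telemetry. emit() only appends to an array,
// so the request path pays no I/O cost; drain() hands off a batch.
type TelemetryEvent = { type: string; at: number };

class EventBuffer {
  private buffer: TelemetryEvent[] = [];

  constructor(
    private flush: (events: TelemetryEvent[]) => void,
    private maxSize = 100,
  ) {}

  emit(type: string): void {
    this.buffer.push({ type, at: Date.now() });
    if (this.buffer.length >= this.maxSize) this.drain();
  }

  // Hand the batch to the flush callback; a production version would
  // also flush on a timer and do the actual I/O asynchronously.
  drain(): void {
    const batch = this.buffer;
    if (batch.length === 0) return;
    this.buffer = [];
    this.flush(batch);
  }
}
```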

Performance

Zero added latency

Runs entirely in-process. No network hop, no proxy, no added infrastructure. Policy is read from a local snapshot — enforcement overhead is under 1ms per request.

How it works

From integration to production in three steps

01

Install and configure

npm install @loret/sdk. Define your providers, budgets, and guardrails locally. No external service required — enforcement starts immediately in your process.

02

Replace your provider calls

Wrap your OpenAI or Anthropic calls with client.run(). Every request is now enforced and observed — before it leaves your application.
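The wrap-and-enforce pattern itself is small. This sketch shows the shape of it with our own illustrative types; the actual checks and dispatch live inside client.run():

```typescript
// Illustrative wrap-and-enforce pattern: run every check before the
// request leaves the process, then delegate to the original provider call.
type Message = { role: string; content: string };
type ProviderCall = (messages: Message[]) => Promise<string>;

function withEnforcement(
  call: ProviderCall,
  checks: ((messages: Message[]) => void)[],
): ProviderCall {
  return async (messages) => {
    for (const check of checks) check(messages); // any violation throws here, pre-dispatch
    return call(messages);
  };
}
```

The wrapped function has the same signature as the original, which is why adopting it is a drop-in change rather than a rewrite.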

03

Connect the control plane (coming soon)

When you're ready, add centralized policy management, per-feature cost attribution, and team-level governance across every service instance.

Early access

Be first when we launch

The hosted control plane and team dashboard are in development. Join the waitlist and get early access plus a 3-month discount on any paid plan.

No spam. Unsubscribe any time.