Blink — Documentation

Quick Start

Prerequisites

Before installing Blink, make sure you have:

Python 3.12 or higher
Docker and Docker Compose
uv package manager
A Blink license key (from your purchase)
An Anthropic API key

Installation

Clone the repository:

bash

git clone https://github.com/devkauania/blink.git
cd blink

Copy the environment file and set your keys:

bash

cp .env.example .env

# Edit .env and set:
# BLINK_LICENSE_KEY=blink_your_key_here
# ANTHROPIC_API_KEY=sk-ant-your_key_here

Start the stack:

bash

docker compose up -d

Verify it's running:

bash

curl http://localhost:8080/health

json

{"status": "ok", "version": "1.0.0"}

Your First Request

Submit a prompt:

bash

curl -X POST http://localhost:8080/v1/prompt \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Summarize our Q4 revenue trends"}'

json

{"task_id": "abc123...", "status": "pending"}

Check the result:

bash

curl http://localhost:8080/v1/tasks/abc123... \
  -H "X-API-Key: YOUR_API_KEY"
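The same two calls can be made from Python. The following is a minimal sketch using only the standard library; the helper names `build_submit_request`, `build_status_request`, and `send` are hypothetical, while the URL, paths, and headers are taken from the curl examples above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # local gateway from the Quick Start


def build_submit_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build the POST /v1/prompt request shown in the curl example."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/prompt",
        data=body,
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )


def build_status_request(api_key: str, task_id: str) -> urllib.request.Request:
    """Build the GET request used to check the result of a task."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/tasks/{task_id}",
        method="GET",
        headers={"X-API-Key": api_key},
    )


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON body (needs a running gateway)."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the stack running, `send(build_submit_request(key, "Summarize our Q4 revenue trends"))` should return the same JSON as the curl call.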

Architecture

Blink follows a layered pipeline architecture. Every request passes through a strict sequence of stages before reaching an LLM provider:

Gateway — FastAPI entry point. Validates input, checks auth, applies rate limits, and returns 202 Accepted immediately.
Governance — Input sanitizer scans for prompt injection. Rate limiter enforces quotas. Guardrails apply policy rules.
Cache — Semantic similarity cache (RedisVL) checks if a similar prompt was answered recently. Cache hits return instantly.
Orchestration — Celery task queue dispatches work to background workers. The gateway never blocks on LLM calls.
Agents — LangGraph agents execute the prompt with MCP tool connectors, then store results via the persistence layer.
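To make the stage order concrete, here is a toy model of the first four stages in plain Python. It is a sketch under stated assumptions, not Blink's code: `handle_request` and `PipelineResult` are hypothetical names, the 400 status for a blocked prompt and the error bodies are invented for illustration, and the cache is an exact-match dictionary rather than a similarity search.

```python
from dataclasses import dataclass, field


@dataclass
class PipelineResult:
    status_code: int
    body: dict
    stages: list = field(default_factory=list)  # stages the request passed through


def handle_request(prompt, api_key, valid_keys, cache, queue):
    """Walk one request through gateway, governance, cache, and orchestration."""
    result = PipelineResult(status_code=202, body={})

    # Gateway: check auth before anything else runs.
    result.stages.append("gateway")
    if api_key not in valid_keys:
        result.status_code, result.body = 403, {"error": "forbidden"}  # body is assumed
        return result

    # Governance: a single stand-in check; status 400 is an assumption.
    result.stages.append("governance")
    if "ignore previous instructions" in prompt.lower():
        result.status_code, result.body = 400, {"error": "blocked"}
        return result

    # Cache: exact-match lookup here; the real cache is similarity-based.
    result.stages.append("cache")
    if prompt in cache:
        result.status_code = 200
        result.body = {"cached": True, "response": cache[prompt]}
        return result

    # Orchestration: enqueue the work and return immediately with a task ID.
    result.stages.append("orchestration")
    task_id = f"task-{len(queue) + 1}"
    queue.append((task_id, prompt))
    result.body = {"task_id": task_id, "status": "pending"}
    return result
```

The agents stage is not modeled: in this sketch a worker would later pop entries from `queue`, which is why the function can return before any LLM call happens.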

Package Structure

The codebase is organized as a Python monorepo with strict dependency boundaries:

Package	Description
`shared`	Value objects, events, base types
`cache`	Semantic similarity cache (RedisVL)
`orchestration`	Celery task queue
`persistence`	PostgreSQL repositories
`governance`	Sanitizers, rate limiter, guardrails
`gateway`	FastAPI routes, middleware
`agents`	LangGraph agents, MCP connectors

All LLM and agent work happens asynchronously in Celery workers. The gateway accepts requests and returns a task ID immediately, keeping response times under 50ms even under load.

API Reference

POST /v1/prompt

Submit a prompt for processing. Returns a task ID for async tracking, or an immediate response on cache hit.

Request body:

json

{
  "prompt": "Summarize our Q4 revenue trends",
  "context": "Optional additional context",
  "metadata": {}
}

Response (202 Accepted):

json

{"task_id": "abc123...", "status": "pending"}

Response (200 — cache hit):

json

{"cached": true, "response": {"result": "..."}}
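Because the endpoint has two success shapes, a client needs to branch on them. A minimal sketch, assuming the two response bodies shown above; `interpret_submit_response` is a hypothetical helper name:

```python
def interpret_submit_response(status_code: int, body: dict) -> tuple:
    """Return ("cached", response) or ("pending", task_id) for a submit reply."""
    if status_code == 200 and body.get("cached") is True:
        # Cache hit: the answer is already in the body, no polling needed.
        return ("cached", body["response"])
    if status_code == 202:
        # Accepted: the caller must poll the task endpoint with this ID.
        return ("pending", body["task_id"])
    raise ValueError(f"unexpected response: {status_code} {body!r}")
```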

GET /v1/tasks/{task_id}

Retrieve the status and result of a submitted task.

Response:

json

{
  "task_id": "abc123...",
  "status": "completed",
  "result": {"response": "..."},
  "created_at": "2026-03-11T10:00:00Z",
  "updated_at": "2026-03-11T10:00:05Z"
}
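Since results arrive asynchronously, clients typically poll this endpoint. The sketch below shows one way to do that; `wait_for_task` and its parameters are hypothetical, `fetch_status` is any callable that returns the JSON above as a dictionary, and only the `completed` status is confirmed by this page, so other terminal statuses must be passed in explicitly.

```python
import time


def wait_for_task(fetch_status, task_id, interval=1.0, timeout=60.0,
                  done_statuses=("completed",), sleep=time.sleep):
    """Poll until the task reaches a terminal status or the timeout elapses."""
    waited = 0.0
    while True:
        task = fetch_status(task_id)
        if task["status"] in done_statuses:
            return task
        if waited >= timeout:
            raise TimeoutError(f"task {task_id} not finished after {timeout}s")
        # Wait before the next poll; the sleep function is injectable for tests.
        sleep(interval)
        waited += interval
```

Passing `sleep` as a parameter keeps the helper testable without real delays.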

GET /health

Health check endpoint. No authentication required.

Response:

json

{"status": "ok", "version": "1.0.0"}

Prometheus-compatible metrics endpoint. Returns request counts, latency histograms, cache hit rates, and governance event counters.

Query security events (blocked prompts, injection attempts, PII detections).

Query parameters: limit (1-1000), tenant_id

json

{"events": [...], "total": 42}

POST /v1/api-keys

Create a new API key. Requires the X-Admin-Key header.

Request body:

json

{
  "client_name": "my-app",
  "rate_limit_rpm": 100
}

Response (201 Created):

json

{"key_id": "...", "raw_key": "bk_..."}

List all API keys. Requires the X-Admin-Key header.

Revoke an API key. Requires the X-Admin-Key header.

Initialize a fresh Blink deployment. Validates license key, generates admin credentials and first API key.

Request body:

json

{
  "license_key": "your-key",
  "client_name": "myapp"
}

Response (201 Created):

json

{
  "admin_key": "...",
  "api_key": "...",
  "api_key_id": "...",
  "tier": "pro",
  "features": {}
}

Response (409 Conflict):

json

{"error": "already_initialized"}

Response (403 Forbidden):

json

{"error": "invalid_license_key"}

Note: This endpoint can only be called once. Subsequent calls return 409 Conflict. No authentication required.

Deep health check that verifies all dependencies (cache, orchestration, governance). Use for Kubernetes readiness probes.

Response (200 OK):

json

{
  "status": "ready",
  "checks": {
    "cache": true,
    "orchestration": true,
    "governance": true
  }
}

Response (503 Service Unavailable):

json

{
  "status": "degraded",
  "checks": {
    "cache": true,
    "orchestration": false,
    "governance": true
  }
}
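A monitoring script can use the `checks` object to report which dependency is down. A small sketch, assuming the two payloads above; the function names are hypothetical:

```python
def failing_dependencies(payload: dict) -> list:
    """Return the sorted names of dependencies whose check is not true."""
    checks = payload.get("checks", {})
    return sorted(name for name, ok in checks.items() if ok is not True)


def is_ready(status_code: int, payload: dict) -> bool:
    """Ready only if the HTTP status, the status field, and every check agree."""
    return (
        status_code == 200
        and payload.get("status") == "ready"
        and not failing_dependencies(payload)
    )
```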

Get aggregated LLM cost data for a time window. Returns total spend, request count, and token usage.

Response:

json

{
  "total_cost_usd": 12.50,
  "total_requests": 450,
  "total_input_tokens": 125000,
  "total_output_tokens": 89000,
  "period_hours": 24
}
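The raw totals are easier to act on as unit costs. The sketch below derives three figures from the response; `summarize_costs` and its output keys are hypothetical names, and the input keys are the ones shown above.

```python
def summarize_costs(report: dict) -> dict:
    """Derive per-request, per-1k-token, and hourly figures from a cost report."""
    cost = report["total_cost_usd"]
    requests = report["total_requests"]
    tokens = report["total_input_tokens"] + report["total_output_tokens"]
    hours = report["period_hours"]
    return {
        # Guard each division so an empty window yields 0.0 instead of an error.
        "cost_per_request_usd": cost / requests if requests else 0.0,
        "cost_per_1k_tokens_usd": cost / tokens * 1000 if tokens else 0.0,
        "hourly_spend_usd": cost / hours if hours else 0.0,
    }
```

For the sample response, this works out to roughly $0.028 per request, $0.058 per 1,000 tokens, and $0.52 per hour.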

Returns service health, version, and recent tasks. No authentication required.

Response:

json

{
  "status": "ok",
  "version": "1.0.0",
  "recent_tasks": []
}

Returns a self-contained HTML terminal dashboard with real-time service status, recent tasks, and system metrics. No authentication required.

Response: text/html — a standalone HTML page.

Configuration

Blink is configured through environment variables. Copy .env.example to .env and adjust the values for your environment.

Warning: Do not use inline comments in the .env file. Python's dotenv parser reads them as part of the value. Use a separate line for comments.
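A quick check before starting the stack can catch these lines. The following is a rough sketch, not part of Blink: `find_inline_comments` is a hypothetical helper that flags any assignment whose value contains a space followed by `#`. It can produce false positives if a real value legitimately contains that sequence.

```python
def find_inline_comments(env_text: str) -> list:
    """Return 1-based line numbers where a value is followed by ' #'."""
    flagged = []
    for number, line in enumerate(env_text.splitlines(), start=1):
        stripped = line.strip()
        # Skip blanks, full-line comments, and lines that are not assignments.
        if not stripped or stripped.startswith("#") or "=" not in stripped:
            continue
        _, value = stripped.split("=", 1)
        if " #" in value:
            flagged.append(number)
    return flagged
```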

Application

Variable	Default	Description
`APP_ENV`	`development`	Environment (development / staging / production)
`APP_PORT`	`8080`	Gateway port
`APP_LOG_LEVEL`	`INFO`	Log level (DEBUG / INFO / WARNING / ERROR)

Database

Variable	Default	Description
`POSTGRES_HOST`	`localhost`	PostgreSQL host
`POSTGRES_PORT`	`5432`	PostgreSQL port
`POSTGRES_DB`	`blink`	Database name
`POSTGRES_USER`	`blink`	Database user
`POSTGRES_PASSWORD`	—	Database password

Redis

Variable	Default	Description
`REDIS_HOST`	`localhost`	Redis host
`REDIS_PORT`	`6379`	Redis port

LLM Providers

Variable	Default	Description
`ANTHROPIC_API_KEY`	—	Your Anthropic API key
`OPENAI_API_KEY`	—	Optional: OpenAI key for embeddings

Auth

Variable	Default	Description
`BLINK_LICENSE_KEY`	—	Your Blink license key
`BLINK_ADMIN_KEY`	—	Admin key for API key management

Governance Toggles

Variable	Default	Description	Tier
`BLINK_GOVERNANCE_INPUT_SANITIZER_ENABLED`	`true`	Input sanitizer	STARTER
`BLINK_GOVERNANCE_OUTPUT_SANITIZER_ENABLED`	`true`	Output PII redaction	STARTER
`BLINK_GOVERNANCE_RATE_LIMIT_ENABLED`	`true`	Rate limiting	STARTER
`BLINK_GOVERNANCE_RATE_LIMIT_MAX_REQUESTS`	`100`	Max requests per window	STARTER
`BLINK_GOVERNANCE_RATE_LIMIT_WINDOW_SECONDS`	`60`	Rate limit window	STARTER
`BLINK_GOVERNANCE_NEMO_GUARDRAILS_ENABLED`	`false`	NeMo Guardrails	PRO
`BLINK_GOVERNANCE_PERPLEXITY_DETECTOR_ENABLED`	`false`	Perplexity detector	PRO
`BLINK_GOVERNANCE_SECURITY_EVENTS_ENABLED`	`false`	Security events	PRO

Semantic Cache

Variable	Default	Description	Tier
`CACHE_SIMILARITY_THRESHOLD`	`0.85`	Similarity threshold (0-1)	PRO
`CACHE_TTL_SECONDS`	`3600`	Cache TTL	PRO
`EMBEDDER_PROVIDER`	`sentence-transformers`	Embedding provider	PRO

Features Guide

Input Sanitizer

Scans every incoming prompt against 262 attack patterns in 11 languages, including prompt injection, jailbreak, system prompt extraction, persona hijack, role override, data exfiltration, and cost manipulation. Blocks malicious prompts before they reach any LLM provider.

env

BLINK_GOVERNANCE_INPUT_SANITIZER_ENABLED=true
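Pattern-based scanning of this kind can be sketched with regular expressions. The two patterns below are illustrative only and are not Blink's actual rules; `PATTERNS` and `scan_prompt` are hypothetical names.

```python
import re

# Two illustrative patterns only; these are NOT Blink's actual rules.
PATTERNS = {
    "role_override": re.compile(
        r"ignore\s+(all\s+)?previous\s+instructions", re.IGNORECASE
    ),
    "prompt_extraction": re.compile(
        r"(show|reveal|print)\s+(me\s+)?your\s+system\s+prompt", re.IGNORECASE
    ),
}


def scan_prompt(prompt: str) -> list:
    """Return the sorted names of every pattern that matches the prompt."""
    return sorted(name for name, pattern in PATTERNS.items() if pattern.search(prompt))
```

A prompt would be blocked when `scan_prompt` returns a non-empty list. Returning the matched names rather than a boolean makes it possible to log which rule fired.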

Output Sanitizer (PII Redaction)

Automatically detects and redacts personally identifiable information (SSNs, credit cards, emails, phone numbers) from agent responses before they reach the client.

env

BLINK_GOVERNANCE_OUTPUT_SANITIZER_ENABLED=true
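Redaction of this kind is commonly done with regular expressions. The sketch below covers the four PII types listed above with simplified, US-style patterns; the patterns, the placeholder labels, and the `redact` function are assumptions for illustration, not Blink's implementation.

```python
import re

# Simplified illustrative patterns; real detectors need far more care.
# Order matters: credit cards are replaced before the phone pattern runs.
PII_PATTERNS = [
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("CREDIT_CARD", re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")),
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("PHONE", re.compile(r"\b\d{3}[ .-]\d{3}[ .-]\d{4}\b")),
]


def redact(text: str) -> str:
    """Replace every match of each pattern with a bracketed label."""
    for label, pattern in PII_PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text
```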

Rate Limiter

Sliding-window rate limiter that enforces per-client request quotas. Prevents abuse and controls LLM costs by throttling excessive usage.

env

BLINK_GOVERNANCE_RATE_LIMIT_ENABLED=true
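For readers unfamiliar with the technique, here is a minimal in-memory sliding-window limiter. It is a sketch, not Blink's implementation (which is not shown on this page): `SlidingWindowLimiter` is a hypothetical class, and its defaults mirror the documented `BLINK_GOVERNANCE_RATE_LIMIT_MAX_REQUESTS` and `BLINK_GOVERNANCE_RATE_LIMIT_WINDOW_SECONDS` values.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client within the last window_seconds."""

    def __init__(self, max_requests=100, window_seconds=60, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self._hits = defaultdict(deque)  # client key -> timestamps of accepted requests

    def allow(self, client_key: str) -> bool:
        now = self.clock()
        hits = self._hits[client_key]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

Unlike a fixed-window counter, the window here moves with each request, so a burst at the end of one minute still counts against the start of the next.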

Semantic Cache

Uses vector embeddings to identify semantically similar prompts. Returns cached responses for near-duplicate queries, reducing LLM costs and latency by up to 95%.

env

CACHE_SIMILARITY_THRESHOLD=0.85
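The role of the threshold can be seen in a small sketch. This is plain Python, not the RedisVL API, and it assumes cosine similarity as the metric; `cosine_similarity` and `lookup` are hypothetical names, and the two-dimensional vectors stand in for real embeddings.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def lookup(query_vec, entries, threshold=0.85):
    """Return the response of the most similar entry, or None below threshold."""
    best_score, best_response = 0.0, None
    for vec, response in entries:
        score = cosine_similarity(query_vec, vec)
        if score > best_score:
            best_score, best_response = score, response
    return best_response if best_score >= threshold else None
```

Lowering the threshold turns more near misses into hits, at the cost of occasionally returning an answer to a different question.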

Perplexity Detector

Measures the statistical perplexity of incoming prompts to detect adversarial token sequences (GCG suffixes) and machine-generated attack payloads that bypass pattern matching.

env

BLINK_GOVERNANCE_PERPLEXITY_DETECTOR_ENABLED=true
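Perplexity is the exponential of the average negative log-probability a language model assigns to each token, so unnatural token sequences score high. The sketch below shows only that arithmetic. It assumes the per-token log-probabilities come from some scoring model, which is not shown, and the `threshold` value is hypothetical rather than a Blink default.

```python
import math


def perplexity(token_logprobs) -> float:
    """exp of the mean negative natural-log probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def is_suspicious(token_logprobs, threshold: float) -> bool:
    """Flag a prompt whose perplexity exceeds the (hypothetical) threshold."""
    return perplexity(token_logprobs) > threshold
```

For example, a prompt whose tokens each have probability 0.5 has a perplexity of 2, while one whose tokens each have probability 0.001 has a perplexity of about 1000.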

NeMo Guardrails

NVIDIA NeMo Guardrails integration for advanced conversational policy enforcement. Defines allowed conversation flows and topic boundaries with Colang rules.

env

BLINK_GOVERNANCE_NEMO_GUARDRAILS_ENABLED=true

Security Events

Logs all governance actions (blocked prompts, PII redactions, rate limit hits) as structured security events with full audit trails, queryable via the API.

env

BLINK_GOVERNANCE_SECURITY_EVENTS_ENABLED=true

Grafana Dashboards

Pre-built Grafana dashboards for real-time monitoring of request throughput, latency percentiles, cache hit rates, security events, and cost tracking via Prometheus metrics.

env

PROMETHEUS_ENABLED=true

Cost Governance

Set daily spending limits per tenant. When the budget threshold is reached, requests are paused and alerts are triggered. Prevents runaway LLM costs before they hit your invoice.

env

BLINK_GOVERNANCE_BUDGET_LIMIT_24H_USD=100
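One way to picture the budget check is a per-tenant ledger summed over the last 24 hours. The sketch below assumes a rolling window, which the variable name suggests but this page does not confirm; `BudgetGuard` and its methods are hypothetical, and alerting is left out.

```python
from datetime import datetime, timedelta


class BudgetGuard:
    """Track spend per tenant over a rolling window (24 hours by default)."""

    def __init__(self, limit_usd: float, window: timedelta = timedelta(hours=24)):
        self.limit_usd = limit_usd
        self.window = window
        self._spend = {}  # tenant_id -> list of (timestamp, cost_usd)

    def record(self, tenant_id: str, cost_usd: float, at: datetime) -> None:
        """Add one LLM call's cost to the tenant's ledger."""
        self._spend.setdefault(tenant_id, []).append((at, cost_usd))

    def spent(self, tenant_id: str, now: datetime) -> float:
        """Sum the costs recorded inside the window ending at now."""
        cutoff = now - self.window
        return sum(cost for at, cost in self._spend.get(tenant_id, []) if at > cutoff)

    def allow(self, tenant_id: str, now: datetime) -> bool:
        """Requests pass only while spend is strictly below the limit."""
        return self.spent(tenant_id, now) < self.limit_usd
```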

Multi-tenant Isolation

Full data isolation between tenants at the database, cache, and agent levels. Each tenant gets independent rate limits, budgets, and security policies. Enterprise license required.

Client Dashboard

The Client Dashboard is a real-time monitoring interface that connects to your Blink gateway. It provides visibility into requests, costs, security events, and task status.

Connecting

Enter your gateway URL (e.g. http://localhost:8080 for local development, or your production URL) and your API key. The dashboard saves credentials locally and auto-connects on return visits.

Overview Tab

Shows four key performance indicators (Total Requests, Cost USD, Block Rate, Cache Hit Rate), a live request timeline chart, and service health status for all Blink components.

Security Tab

Displays attack type distribution (doughnut chart), severity breakdown (bar chart), and a chronological feed of security events. Use the Refresh button to load the latest events on demand.

Tasks Tab

Lists recent tasks with their ID, status, age, and prompt. Stalled tasks (running longer than 5 minutes) are highlighted for attention.
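The same stalled-task rule can be applied to task JSON in a script. This sketch assumes the `created_at` and `status` fields from the task response in the API Reference, and treats any task that is not `completed` and is older than five minutes as stalled; the dashboard's exact rule may differ, and `is_stalled` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone

STALL_AFTER = timedelta(minutes=5)


def is_stalled(task: dict, now: datetime = None) -> bool:
    """True if the task is unfinished and older than five minutes."""
    if task["status"] == "completed":
        return False
    now = now or datetime.now(timezone.utc)
    # Normalize the trailing "Z" so older Python versions can parse it too.
    created = datetime.fromisoformat(task["created_at"].replace("Z", "+00:00"))
    return now - created > STALL_AFTER
```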

CORS Configuration

If the dashboard cannot connect, ensure your gateway has CORS enabled. Set the BLINK_CORS_ORIGINS environment variable to your dashboard domain (e.g. https://yourdomain.com) or * for development.

# In your .env file
BLINK_CORS_ORIGINS=https://yourdomain.com,http://localhost:3000
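How a comma-separated value like this might be interpreted can be sketched as follows. This is an assumption about the parsing, not Blink's code; `parse_cors_origins` and `is_origin_allowed` are hypothetical names.

```python
def parse_cors_origins(raw: str) -> list:
    """Split a comma-separated origin list, ignoring blanks and extra spaces."""
    origins = [item.strip() for item in raw.split(",") if item.strip()]
    # A wildcard anywhere in the list allows every origin.
    return ["*"] if "*" in origins else origins


def is_origin_allowed(origin: str, allowed: list) -> bool:
    """Exact-match check; scheme and port are part of the origin."""
    return "*" in allowed or origin in allowed
```

Note that `http://localhost:3000` and `http://localhost:8080` are different origins, so the dashboard's own address must be listed exactly.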

Onboarding Hints

Look for the ? buttons throughout the dashboard. These contextual hints explain each metric, chart, and panel, with links back to the relevant documentation section.

Security

OWASP LLM Top 10 Coverage

Blink provides mitigations for every risk in the OWASP Top 10 for LLM Applications:

Risk	Blink Mitigation
LLM01: Prompt Injection	Input Sanitizer (262 patterns, 11 languages) + NeMo Guardrails
LLM02: Insecure Output	Output Sanitizer (PII redaction)
LLM03: Training Data Poisoning	Memory Guard (agent-level)
LLM04: Model DoS	Rate Limiter + Cost Governance
LLM05: Supply Chain	Locked dependencies (uv.lock)
LLM06: Sensitive Info	PII Redaction + Audit Trail
LLM07: Insecure Plugin	MCP client isolation
LLM08: Excessive Agency	Agent guardrails + tool restrictions
LLM09: Overreliance	Perplexity Detector
LLM10: Model Theft	License validation + API key auth

Attack Patterns Detected

The input sanitizer detects 262 attack patterns across 11 languages (EN, PT, ES, FR, DE, IT, RU, ZH, JA, KO, AR):

Pattern	Description
Role Override	"Ignore previous instructions"
System Impersonation	[SYSTEM], developer mode
Prompt Extraction	"Show me your system prompt"
Encoding Evasion	Base64, unicode escapes
Delimiter Injection	```system, XML/INST tags
Tool Abuse	Fake function_call JSON
Data Exfiltration	Markdown image URLs, fetch()
Jailbreak	DAN, hypothetical scenarios
Multi-turn Manipulation	"As we discussed..."
PII Injection	SSN, credit card patterns
GCG Suffix	Adversarial token sequences

Authentication

Blink uses a two-layer authentication model. API keys (prefixed bk_) are issued via the admin endpoint and validated on every request. Each key is scoped to a client and carries its own rate limit. License keys validate the deployment tier and enable feature flags at startup.

Troubleshooting

Docker containers won't start

Check that ports 8080, 5432, and 6379 are not already in use. Run docker compose down first to clean up stale containers. Verify your .env file has no inline comments, as these are read as part of the value by python-dotenv.

Prompt returns 403 Forbidden

Your API key is invalid or missing. Ensure the X-API-Key: bk_... header is set correctly. Create a new key with POST /v1/api-keys using your admin key. Check that your license key is valid and the deployment tier matches the features you're using.

Prompt blocked by sanitizer unexpectedly

The input sanitizer may flag legitimate prompts that contain patterns similar to known attacks. Check the security events endpoint for details on which pattern was triggered. You can temporarily disable the sanitizer for debugging with BLINK_GOVERNANCE_INPUT_SANITIZER_ENABLED=false, but re-enable it in production.

Task stays in "pending" status

The Celery worker may not be running. Check that the worker container is up with docker compose ps. Verify Redis is reachable (the worker uses Redis as its broker). Check the worker logs with docker compose logs worker for errors.

Cache hit rate is very low

Lower the CACHE_SIMILARITY_THRESHOLD value (default 0.85). A threshold of 0.80 is more permissive and will match prompts with slightly different wording. Also verify that Redis has enough memory and the embedding model is loading correctly.

Rate limiter blocks requests too aggressively

Increase BLINK_GOVERNANCE_RATE_LIMIT_MAX_REQUESTS or widen the window with BLINK_GOVERNANCE_RATE_LIMIT_WINDOW_SECONDS. Each API key has its own rate limit counter. If multiple services share one key, consider creating separate keys for each.
