Quick Start

Prerequisites

Before installing Blink, make sure you have:

  • Python 3.12 or higher
  • Docker and Docker Compose
  • uv package manager
  • A Blink license key (from your purchase)
  • An Anthropic API key

Installation

Clone the repository:

bash
git clone https://github.com/devkauania/blink.git
cd blink

Copy the environment file and set your keys:

bash
cp .env.example .env
# Edit .env and set:
# BLINK_LICENSE_KEY=blink_your_key_here
# ANTHROPIC_API_KEY=sk-ant-your_key_here

Start the stack:

bash
docker compose up -d

Verify it's running:

bash
curl http://localhost:8080/health

json
{"status": "ok", "version": "1.0.0"}

Your First Request

Submit a prompt:

bash
curl -X POST http://localhost:8080/v1/prompt \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Summarize our Q4 revenue trends"}'

json
{"task_id": "abc123...", "status": "pending"}

Check the result:

bash
curl http://localhost:8080/v1/tasks/abc123... \
  -H "X-API-Key: YOUR_API_KEY"
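The same two calls can be scripted. The sketch below uses only the Python standard library. The helper names, the polling interval, and the choice to stop once the status is "completed" are illustrative assumptions, not part of Blink; other terminal statuses are not documented here, so the loop also gives up after a fixed number of polls.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8080"  # local gateway started above


def build_prompt_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) the POST /v1/prompt request."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/prompt",
        data=json.dumps({"prompt": prompt}).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def build_task_request(api_key: str, task_id: str) -> urllib.request.Request:
    """Build (but do not send) the GET /v1/tasks/{task_id} request."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/tasks/{task_id}",
        headers={"X-API-Key": api_key},
        method="GET",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON body of the response."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def submit_and_wait(api_key: str, prompt: str, polls: int = 30, delay: float = 1.0) -> dict:
    """Submit a prompt, then poll until the task is completed or polls run out."""
    task = send(build_prompt_request(api_key, prompt))
    for _ in range(polls):
        # Stop when done, or when the body carries no task ID to poll.
        if task.get("status") == "completed" or "task_id" not in task:
            break
        time.sleep(delay)
        task = send(build_task_request(api_key, task["task_id"]))
    return task
```

With the stack running, submit_and_wait("YOUR_API_KEY", "Summarize our Q4 revenue trends") returns the last task payload it received.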

Architecture

Blink follows a layered pipeline architecture. Every request passes through a strict sequence of stages before reaching an LLM provider:

  1. Gateway — FastAPI entry point. Validates input, checks auth, applies rate limits, and returns 202 Accepted immediately.
  2. Governance — Input sanitizer scans for prompt injection. Rate limiter enforces quotas. Guardrails apply policy rules.
  3. Cache — Semantic similarity cache (RedisVL) checks if a similar prompt was answered recently. Cache hits return instantly.
  4. Orchestration — Celery task queue dispatches work to background workers. The gateway never blocks on LLM calls.
  5. Agents — LangGraph agents execute the prompt with MCP tool connectors, then store results via the persistence layer.
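The order of the five stages can be illustrated with a toy, in-memory model. This is a sketch only: the class, the exact-match dictionary standing in for the semantic cache, the list standing in for the Celery queue, the single blocked phrase, and the 422 and 403 status codes are all assumptions made for illustration, not Blink's implementation.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ToyPipeline:
    cache: dict = field(default_factory=dict)  # stand-in for the semantic cache
    queue: list = field(default_factory=list)  # stand-in for the Celery queue

    def handle(self, prompt: str) -> tuple[int, dict]:
        # 1. Gateway: validate input (status code is an assumption).
        if not prompt.strip():
            return 422, {"error": "empty_prompt"}
        # 2. Governance: one hard-coded pattern instead of a real sanitizer.
        if "ignore previous instructions" in prompt.lower():
            return 403, {"error": "blocked"}
        # 3. Cache: exact match here; the real cache compares embeddings.
        if prompt in self.cache:
            return 200, {"cached": True, "response": self.cache[prompt]}
        # 4. Orchestration: enqueue and return immediately.
        task_id = uuid.uuid4().hex
        self.queue.append((task_id, prompt))
        return 202, {"task_id": task_id, "status": "pending"}

    def run_worker(self) -> None:
        # 5. Agents: a worker drains the queue and stores each result.
        while self.queue:
            _task_id, prompt = self.queue.pop(0)
            self.cache[prompt] = {"result": f"echo: {prompt}"}
```

The first call for a prompt returns 202 with a task ID; after the worker has run, the same prompt is answered from the cache with 200.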

Package Structure

The codebase is organized as a Python monorepo with strict dependency boundaries:

Package        Description
shared         Value objects, events, base types
cache          Semantic similarity cache (RedisVL)
orchestration  Celery task queue
persistence    PostgreSQL repositories
governance     Sanitizers, rate limiter, guardrails
gateway        FastAPI routes, middleware
agents         LangGraph agents, MCP connectors

All LLM and agent work happens asynchronously in Celery workers. The gateway accepts requests and returns a task ID immediately, keeping response times under 50ms even under load.

API Reference

POST /v1/prompt

Submit a prompt for processing. Returns a task ID for async tracking, or an immediate response on cache hit.

Request body:

json { "prompt": "Summarize our Q4 revenue trends", "context": "Optional additional context", "metadata": {} }

Response (202 Accepted):

json {"task_id": "abc123...", "status": "pending"}

Response (200 — cache hit):

json {"cached": true, "response": {"result": "..."}}
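Because this endpoint answers with either 202 or 200, a client has to branch on the status code. A minimal sketch, assuming the two response shapes shown above; the function name and the returned labels are hypothetical.

```python
def interpret_prompt_response(status_code: int, body: dict) -> tuple[str, object]:
    """Classify a POST /v1/prompt response as a cache hit or a queued task."""
    if status_code == 200 and body.get("cached"):
        # Cache hit: the answer is already in the body.
        return "cached", body["response"]
    if status_code == 202:
        # Queued: poll GET /v1/tasks/{task_id} with this ID.
        return "queued", body["task_id"]
    raise ValueError(f"unexpected response: {status_code} {body}")
```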
GET /v1/tasks/{task_id}

Retrieve the status and result of a submitted task.

Response:

json { "task_id": "abc123...", "status": "completed", "result": {"response": "..."}, "created_at": "2026-03-11T10:00:00Z", "updated_at": "2026-03-11T10:00:05Z" }

GET /health

Health check endpoint. No authentication required.

Response:

json {"status": "ok", "version": "1.0.0"}

GET /v1/metrics PRO

Prometheus-compatible metrics endpoint. Returns request counts, latency histograms, cache hit rates, and governance event counters.

GET /v1/security/events PRO

Query security events (blocked prompts, injection attempts, PII detections).

Query parameters: limit (1-1000), tenant_id

json {"events": [...], "total": 42}

POST /v1/api-keys

Create a new API key. Requires the X-Admin-Key header.

Request body:

json { "client_name": "my-app", "rate_limit_rpm": 100 }

Response (201 Created):

json {"key_id": "...", "raw_key": "bk_..."}

GET /v1/api-keys

List all API keys. Requires the X-Admin-Key header.

DELETE /v1/api-keys/{key_id}

Revoke an API key. Requires the X-Admin-Key header.

POST /v1/init

Initialize a fresh Blink deployment. Validates the license key, then generates the admin credentials and the first API key.

Request body:

json { "license_key": "your-key", "client_name": "myapp" }

Response (201 Created):

json { "admin_key": "...", "api_key": "...", "api_key_id": "...", "tier": "pro", "features": {} }

Response (409 Conflict):

json {"error": "already_initialized"}

Response (403 Forbidden):

json {"error": "invalid_license_key"}

Note: This endpoint can only be called once. Subsequent calls return 409 Conflict. No authentication required.
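Since the call succeeds only once, a setup script should treat the three documented outcomes differently. The following is a sketch under that assumption; the function name and the exception types are illustrative choices, not part of Blink.

```python
def handle_init_response(status_code: int, body: dict) -> dict:
    """Return the generated credentials, or raise for the documented failures."""
    if status_code == 201:
        # Store these securely: the endpoint cannot be called again.
        return {"admin_key": body["admin_key"], "api_key": body["api_key"]}
    if status_code == 409:
        raise RuntimeError("deployment already initialized; reuse the stored keys")
    if status_code == 403:
        raise PermissionError("license key rejected")
    raise RuntimeError(f"unexpected response: {status_code} {body}")
```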
GET /v1/readiness

Deep health check that verifies all dependencies (cache, orchestration, governance). Use for Kubernetes readiness probes.

Response (200 OK):

json { "status": "ready", "checks": { "cache": true, "orchestration": true, "governance": true } }

Response (503 Service Unavailable):

json { "status": "degraded", "checks": { "cache": true, "orchestration": false, "governance": true } }
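When a probe fails, the checks object shows which dependency is down. A small sketch that extracts the failing names from a readiness payload; the function name is hypothetical.

```python
def failing_dependencies(readiness: dict) -> list[str]:
    """List the dependencies whose check is false, in alphabetical order."""
    checks = readiness.get("checks", {})
    return sorted(name for name, ok in checks.items() if not ok)
```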
GET /v1/governance/cost PRO

Get aggregated LLM cost data for a time window. Returns total spend, request count, and token usage.

Response:

json { "total_cost_usd": 12.50, "total_requests": 450, "total_input_tokens": 125000, "total_output_tokens": 89000, "period_hours": 24 }

GET /v1/status

Returns service health, version, and recent tasks. No authentication required.

Response:

json { "status": "ok", "version": "1.0.0", "recent_tasks": [] }

GET /v1/dashboard

Returns a self-contained HTML terminal dashboard with real-time service status, recent tasks, and system metrics. No authentication required.

Response: text/html — a standalone HTML page.

Configuration

Blink is configured through environment variables. Copy .env.example to .env and adjust the values for your environment.

Warning: Do not use inline comments in the .env file. Python's dotenv parser reads them as part of the value. Use a separate line for comments.
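A quick pre-flight check can catch inline comments before the stack starts. The sketch below is a heuristic, not a full .env parser: it flags any assignment whose value contains " #", so a quoted value that legitimately contains that sequence would be reported as well. The function name is hypothetical.

```python
def find_inline_comments(env_text: str) -> list[int]:
    """Return 1-based line numbers of assignments followed by an inline comment."""
    flagged = []
    for number, line in enumerate(env_text.splitlines(), start=1):
        stripped = line.strip()
        # Skip blank lines, full-line comments, and lines without an assignment.
        if not stripped or stripped.startswith("#") or "=" not in stripped:
            continue
        _name, value = stripped.split("=", 1)
        if " #" in value:
            flagged.append(number)
    return flagged
```

For example, reading .env with pathlib.Path(".env").read_text() and passing the text to find_inline_comments returns the line numbers to fix.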

Application

Variable       Default      Description
APP_ENV        development  Environment (development / staging / production)
APP_PORT       8080         Gateway port
APP_LOG_LEVEL  INFO         Log level (DEBUG / INFO / WARNING / ERROR)

Database

Variable           Default    Description
POSTGRES_HOST      localhost  PostgreSQL host
POSTGRES_PORT      5432       PostgreSQL port
POSTGRES_DB        blink      Database name
POSTGRES_USER      blink      Database user
POSTGRES_PASSWORD  (none)     Database password

Redis

Variable    Default    Description
REDIS_HOST  localhost  Redis host
REDIS_PORT  6379       Redis port

LLM Providers

Variable           Default  Description
ANTHROPIC_API_KEY  (none)   Your Anthropic API key
OPENAI_API_KEY     (none)   Optional: OpenAI key for embeddings

Auth

Variable           Default  Description
BLINK_LICENSE_KEY  (none)   Your Blink license key
BLINK_ADMIN_KEY    (none)   Admin key for API key management

Governance Toggles

Variable                                      Default  Description              Tier
BLINK_GOVERNANCE_INPUT_SANITIZER_ENABLED      true     Input sanitizer          STARTER
BLINK_GOVERNANCE_OUTPUT_SANITIZER_ENABLED     true     Output PII redaction     STARTER
BLINK_GOVERNANCE_RATE_LIMIT_ENABLED           true     Rate limiting            STARTER
BLINK_GOVERNANCE_RATE_LIMIT_MAX_REQUESTS      100      Max requests per window  STARTER
BLINK_GOVERNANCE_RATE_LIMIT_WINDOW_SECONDS    60       Rate limit window        STARTER
BLINK_GOVERNANCE_NEMO_GUARDRAILS_ENABLED      false    NeMo Guardrails          PRO
BLINK_GOVERNANCE_PERPLEXITY_DETECTOR_ENABLED  false    Perplexity detector      PRO
BLINK_GOVERNANCE_SECURITY_EVENTS_ENABLED      false    Security events          PRO

Semantic Cache

Variable                    Default                Description                 Tier
CACHE_SIMILARITY_THRESHOLD  0.85                   Similarity threshold (0-1)  PRO
CACHE_TTL_SECONDS           3600                   Cache TTL                   PRO
EMBEDDER_PROVIDER           sentence-transformers  Embedding provider          PRO

Features Guide

Input Sanitizer

STARTER

Scans every incoming prompt against 262 attack patterns in 11 languages. The patterns cover prompt injection, jailbreak, system prompt extraction, persona hijack, role override, data exfiltration, and cost manipulation. Malicious prompts are blocked before they reach any LLM provider.

env BLINK_GOVERNANCE_INPUT_SANITIZER_ENABLED=true
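Pattern-based scanning can be sketched with a few regular expressions. The two patterns below are built from phrases quoted elsewhere in this documentation; the dictionary, its keys, and the function name are assumptions, and the real sanitizer's 262 patterns are not reproduced here.

```python
import re

# Hypothetical pattern table: two illustrative entries only.
ATTACK_PATTERNS = {
    "role_override": re.compile(r"ignore (?:all )?previous instructions", re.IGNORECASE),
    "prompt_extraction": re.compile(r"show me your system prompt", re.IGNORECASE),
}


def scan_prompt(prompt: str) -> list[str]:
    """Return the names of all patterns that match the prompt."""
    return [name for name, pattern in ATTACK_PATTERNS.items() if pattern.search(prompt)]


def is_blocked(prompt: str) -> bool:
    """A prompt is blocked as soon as any pattern matches."""
    return bool(scan_prompt(prompt))
```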

Output Sanitizer (PII Redaction)

STARTER

Automatically detects and redacts personally identifiable information (SSNs, credit cards, emails, phone numbers) from agent responses before they reach the client.

env BLINK_GOVERNANCE_OUTPUT_SANITIZER_ENABLED=true
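As an illustration of regex-based redaction, the sketch below masks three of the PII types listed above. The patterns, the placeholder labels, and the function name are assumptions chosen for brevity; they are far less thorough than a production detector and do not reflect how Blink detects PII.

```python
import re

# Illustrative patterns only; applied in this order.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CARD": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def redact(text: str) -> str:
    """Replace each match with a bracketed label such as [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```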

Rate Limiter

STARTER

Sliding-window rate limiter that enforces per-client request quotas. Prevents abuse and controls LLM costs by throttling excessive usage.

env BLINK_GOVERNANCE_RATE_LIMIT_ENABLED=true

Semantic Cache

PRO

Uses vector embeddings to identify semantically similar prompts. Returns cached responses for near-duplicate queries, reducing LLM costs and latency by up to 95%.

env CACHE_SIMILARITY_THRESHOLD=0.85
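The role of the threshold can be shown with plain cosine similarity over toy vectors. This is a sketch under assumptions: the documentation does not state which similarity measure is used, and the linear scan below stands in for a vector index. Function names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def lookup(query: list[float], entries: list[tuple[list[float], str]], threshold: float = 0.85):
    """Return the best cached response scoring at or above the threshold, else None."""
    best, best_score = None, threshold
    for vector, response in entries:
        score = cosine_similarity(query, vector)
        if score >= best_score:
            best, best_score = response, score
    return best
```

Lowering the threshold makes more queries count as hits, at the cost of returning answers to prompts that are less alike.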

Perplexity Detector

PRO

Measures the statistical perplexity of incoming prompts to detect adversarial token sequences (GCG suffixes) and machine-generated attack payloads that bypass pattern matching.

env BLINK_GOVERNANCE_PERPLEXITY_DETECTOR_ENABLED=true
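Perplexity is the exponential of the negative mean log-probability per token, so text a language model finds unlikely scores high. The sketch below computes it from a list of natural-log token probabilities; which model supplies those values and what cutoff Blink applies are not documented here, so the threshold argument is purely an assumption.

```python
import math


def perplexity(token_logprobs: list[float]) -> float:
    """exp(-mean(log p)) over the tokens of a prompt."""
    if not token_logprobs:
        raise ValueError("at least one token log-probability is required")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def looks_adversarial(token_logprobs: list[float], threshold: float) -> bool:
    """Flag a prompt whose perplexity exceeds a caller-chosen threshold."""
    return perplexity(token_logprobs) > threshold
```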

NeMo Guardrails

PRO

NVIDIA NeMo Guardrails integration for advanced conversational policy enforcement. Defines allowed conversation flows and topic boundaries with Colang rules.

env BLINK_GOVERNANCE_NEMO_GUARDRAILS_ENABLED=true

Security Events

PRO

Logs all governance actions (blocked prompts, PII redactions, rate limit hits) as structured security events with full audit trails, queryable via the API.

env BLINK_GOVERNANCE_SECURITY_EVENTS_ENABLED=true

Grafana Dashboards

PRO

Pre-built Grafana dashboards for real-time monitoring of request throughput, latency percentiles, cache hit rates, security events, and cost tracking via Prometheus metrics.

env PROMETHEUS_ENABLED=true

Cost Governance

PRO

Set daily spending limits per tenant. When the budget threshold is reached, requests are paused and alerts are triggered. Prevents runaway LLM costs before they hit your invoice.

env BLINK_GOVERNANCE_BUDGET_LIMIT_24H_USD=100
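The budget arithmetic is straightforward. The sketch below compares the total_cost_usd value reported by GET /v1/governance/cost with a daily limit; the function name and the returned fields are hypothetical, and the rule that spend at or above the limit counts as over budget is an assumption about where the cutoff lies.

```python
def budget_status(total_cost_usd: float, limit_usd: float) -> dict:
    """Summarize how much of a 24-hour budget has been spent."""
    return {
        "used_fraction": total_cost_usd / limit_usd,
        "remaining_usd": max(limit_usd - total_cost_usd, 0.0),
        "over_budget": total_cost_usd >= limit_usd,
    }
```

With the sample cost response from the API reference (12.50 USD) and a limit of 100 USD, one eighth of the budget is used and 87.50 USD remains.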

Multi-tenant Isolation

ENTERPRISE

Full data isolation between tenants at the database, cache, and agent levels. Each tenant gets independent rate limits, budgets, and security policies. Enterprise license required.

Client Dashboard

The Client Dashboard is a real-time monitoring interface that connects to your Blink gateway. It provides visibility into requests, costs, security events, and task status.

Connecting

Enter your gateway URL (e.g. http://localhost:8080 for local development, or your production URL) and your API key. The dashboard saves credentials locally and auto-connects on return visits.

Overview Tab

Shows four key performance indicators (Total Requests, Cost USD, Block Rate, Cache Hit Rate), a live request timeline chart, and service health status for all Blink components.

Security Tab

Displays attack type distribution (doughnut chart), severity breakdown (bar chart), and a chronological feed of security events. Use the Refresh button to load the latest events on demand.

Tasks Tab

Lists recent tasks with their ID, status, age, and prompt. Stalled tasks (running longer than 5 minutes) are highlighted for attention.

CORS Configuration

If the dashboard cannot connect, ensure your gateway has CORS enabled. Set the BLINK_CORS_ORIGINS environment variable to your dashboard domain (e.g. https://yourdomain.com) or * for development.

# In your .env file
BLINK_CORS_ORIGINS=https://yourdomain.com,http://localhost:3000

Onboarding Hints

Look for the ? buttons throughout the dashboard. These contextual hints explain each metric, chart, and panel, with links back to the relevant documentation section.

Security

OWASP LLM Top 10 Coverage

Blink provides mitigations for every risk in the 2023 edition of the OWASP Top 10 for LLM Applications:

Risk                            Blink Mitigation
LLM01: Prompt Injection         Input Sanitizer (262 patterns, 11 languages) + NeMo Guardrails
LLM02: Insecure Output          Output Sanitizer (PII redaction)
LLM03: Training Data Poisoning  Memory Guard (agent-level)
LLM04: Model DoS                Rate Limiter + Cost Governance
LLM05: Supply Chain             Locked dependencies (uv.lock)
LLM06: Sensitive Info           PII Redaction + Audit Trail
LLM07: Insecure Plugin          MCP client isolation
LLM08: Excessive Agency         Agent guardrails + tool restrictions
LLM09: Overreliance             Perplexity Detector
LLM10: Model Theft              License validation + API key auth

Attack Patterns Detected

The input sanitizer detects 262 attack patterns across 11 languages (EN, PT, ES, FR, DE, IT, RU, ZH, JA, KO, AR):

Pattern                  Description
Role Override            "Ignore previous instructions"
System Impersonation     [SYSTEM], developer mode
Prompt Extraction        "Show me your system prompt"
Encoding Evasion         Base64, unicode escapes
Delimiter Injection      ```system, XML/INST tags
Tool Abuse               Fake function_call JSON
Data Exfiltration        Markdown image URLs, fetch()
Jailbreak                DAN, hypothetical scenarios
Multi-turn Manipulation  "As we discussed..."
PII Injection            SSN, credit card patterns
GCG Suffix               Adversarial token sequences

Authentication

Blink uses a two-layer authentication model. API keys (prefixed bk_) are issued via the admin endpoint and validated on every request. Each key is scoped to a client and carries its own rate limit. License keys validate the deployment tier and enable feature flags at startup.

Troubleshooting

Docker containers won't start

Check that ports 8080, 5432, and 6379 are not already in use. Run docker compose down first to clean up stale containers. Verify your .env file has no inline comments, as these are read as part of the value by python-dotenv.

Prompt returns 403 Forbidden

Your API key is invalid or missing. Ensure the X-API-Key: bk_... header is set correctly. Create a new key with POST /v1/api-keys using your admin key. Check that your license key is valid and the deployment tier matches the features you're using.

Prompt blocked by sanitizer unexpectedly

The input sanitizer may flag legitimate prompts that contain patterns similar to known attacks. Check the security events endpoint for details on which pattern was triggered. You can temporarily disable the sanitizer for debugging with BLINK_GOVERNANCE_INPUT_SANITIZER_ENABLED=false, but re-enable it in production.

Task stays in "pending" status

The Celery worker may not be running. Check that the worker container is up with docker compose ps. Verify Redis is reachable (the worker uses Redis as its broker). Check the worker logs with docker compose logs worker for errors.

Cache hit rate is very low

Lower the CACHE_SIMILARITY_THRESHOLD value (default 0.85). A threshold of 0.80 is more permissive and will match prompts with slightly different wording. Also verify that Redis has enough memory and the embedding model is loading correctly.

Rate limiter blocks requests too aggressively

Increase BLINK_GOVERNANCE_RATE_LIMIT_MAX_REQUESTS or widen the window with BLINK_GOVERNANCE_RATE_LIMIT_WINDOW_SECONDS. Each API key has its own rate limit counter. If multiple services share one key, consider creating separate keys for each.