Quick Start
Prerequisites
Before installing Blink, make sure you have:
- Python 3.12 or higher
- Docker and Docker Compose
- uv package manager
- A Blink license key (from your purchase)
- An Anthropic API key
Installation
Clone the repository:
git clone https://github.com/devkauania/blink.git
cd blink
Copy the environment file and set your keys:
cp .env.example .env
# Edit .env and set:
# BLINK_LICENSE_KEY=blink_your_key_here
# ANTHROPIC_API_KEY=sk-ant-your_key_here
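Before starting the stack, it can help to sanity-check the .env file. The sketch below is not part of Blink; `check_env` and `REQUIRED_KEYS` are hypothetical names. It flags the two required keys from the step above when they are missing, and flags inline comments, which the Configuration section warns are read as part of the value.

```python
REQUIRED_KEYS = ("BLINK_LICENSE_KEY", "ANTHROPIC_API_KEY")


def check_env(text: str) -> list[str]:
    """Return a list of human-readable problems found in .env content."""
    problems = []
    values = {}
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and full-line comments are fine
        key, sep, value = line.partition("=")
        if not sep:
            problems.append(f"line {number}: expected KEY=value")
            continue
        if " #" in value:
            problems.append(f"line {number}: inline comment after value")
        values[key.strip()] = value.strip()
    for key in REQUIRED_KEYS:
        if not values.get(key):
            problems.append(f"{key} is missing or empty")
    return problems
```

Calling `check_env(open(".env").read())` returns an empty list when nothing suspicious is found.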
Start the stack:
docker compose up -d
Verify it's running:
curl http://localhost:8080/health
{"status": "ok", "version": "1.0.0"}
Your First Request
Submit a prompt:
curl -X POST http://localhost:8080/v1/prompt \
-H "X-API-Key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"prompt": "Summarize our Q4 revenue trends"}'
{"task_id": "abc123...", "status": "pending"}
Check the result:
curl http://localhost:8080/v1/tasks/abc123... \
-H "X-API-Key: YOUR_API_KEY"
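The same two calls can be scripted. The following is a minimal sketch using only the Python standard library; the helper names are illustrative, and it assumes a task is finished as soon as its status is no longer "pending", which the documentation does not state explicitly.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8080"


def build_prompt_request(api_key: str, prompt: str, base_url: str = BASE_URL):
    """Build the POST /v1/prompt request shown in the curl example."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/prompt",
        data=body,
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )


def fetch_json(request):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_task(task_id, api_key, fetch=fetch_json, interval=1.0,
                  attempts=30, base_url=BASE_URL):
    """Poll GET /v1/tasks/<id> until the status leaves "pending" (assumption)."""
    request = urllib.request.Request(
        f"{base_url}/v1/tasks/{task_id}", headers={"X-API-Key": api_key}
    )
    for _ in range(attempts):
        task = fetch(request)
        if task.get("status") != "pending":
            return task
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} still pending after {attempts} polls")
```

A typical flow is `task = fetch_json(build_prompt_request(key, "..."))` followed by `wait_for_task(task["task_id"], key)`.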
Architecture
Blink follows a layered pipeline architecture. Every request passes through a strict sequence of stages before reaching an LLM provider:
- Gateway — FastAPI entry point. Validates input, checks auth, applies rate limits, and returns 202 Accepted immediately.
- Governance — Input sanitizer scans for prompt injection. Rate limiter enforces quotas. Guardrails apply policy rules.
- Cache — Semantic similarity cache (RedisVL) checks if a similar prompt was answered recently. Cache hits return instantly.
- Orchestration — Celery task queue dispatches work to background workers. The gateway never blocks on LLM calls.
- Agents — LangGraph agents execute the prompt with MCP tool connectors, then store results via the persistence layer.
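To make the ordering of the stages above concrete, here is a toy in-memory model of the request path. It is only a sketch: a dict stands in for the semantic cache, a list for the Celery queue, and the 422 and 403 status codes and error names are hypothetical, not Blink's documented behavior.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Pipeline:
    blocked_phrases: tuple = ("ignore previous instructions",)
    cache: dict = field(default_factory=dict)  # stand-in for the semantic cache
    queue: list = field(default_factory=list)  # stand-in for the Celery queue

    def handle(self, prompt: str) -> tuple[int, dict]:
        # Gateway: validate input before anything else runs.
        if not prompt.strip():
            return 422, {"error": "empty_prompt"}
        # Governance: reject prompts that match a blocked phrase.
        if any(p in prompt.lower() for p in self.blocked_phrases):
            return 403, {"error": "blocked_by_governance"}
        # Cache: answer immediately when the prompt was seen before.
        if prompt in self.cache:
            return 200, {"cached": True, "response": self.cache[prompt]}
        # Orchestration: enqueue the work and return without waiting.
        task_id = uuid.uuid4().hex
        self.queue.append((task_id, prompt))
        return 202, {"task_id": task_id, "status": "pending"}
```

The point of the sketch is that the agent stage never runs inside `handle`; a background worker would consume `queue` later.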
Package Structure
The codebase is organized as a Python monorepo with strict dependency boundaries:
| Package | Description |
|---|---|
| shared | Value objects, events, base types |
| cache | Semantic similarity cache (RedisVL) |
| orchestration | Celery task queue |
| persistence | PostgreSQL repositories |
| governance | Sanitizers, rate limiter, guardrails |
| gateway | FastAPI routes, middleware |
| agents | LangGraph agents, MCP connectors |
All LLM and agent work happens asynchronously in Celery workers. The gateway accepts requests and returns a task ID immediately, keeping response times under 50ms even under load.
API Reference
POST /v1/prompt
Submit a prompt for processing. Returns a task ID for async tracking, or an immediate response on cache hit.
Request body:
{
"prompt": "Summarize our Q4 revenue trends",
"context": "Optional additional context",
"metadata": {}
}
Response (202 Accepted):
{"task_id": "abc123...", "status": "pending"}
Response (200 — cache hit):
{"cached": true, "response": {"result": "..."}}
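Because this endpoint has two success shapes, client code needs to branch on them. A minimal sketch, with a hypothetical helper name, that normalizes both responses into one structure:

```python
def interpret_prompt_response(status_code: int, body: dict) -> dict:
    """Normalize the 200 (cache hit) and 202 (queued) responses."""
    if status_code == 200 and body.get("cached"):
        # Cache hit: the answer is already in the body.
        return {"done": True, "result": body["response"], "task_id": None}
    if status_code == 202:
        # Queued: the caller must poll the task endpoint with this ID.
        return {"done": False, "result": None, "task_id": body["task_id"]}
    raise ValueError(f"unexpected response: {status_code} {body}")
```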
GET /v1/tasks/{task_id}
Retrieve the status and result of a submitted task.
Response:
{
"task_id": "abc123...",
"status": "completed",
"result": {"response": "..."},
"created_at": "2026-03-11T10:00:00Z",
"updated_at": "2026-03-11T10:00:05Z"
}
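The two timestamps can be used to estimate how long a task took. This sketch assumes both fields are ISO 8601 strings in UTC, as in the sample above, and that `updated_at` of a completed task marks its completion time; the function names are illustrative.

```python
from datetime import datetime


def parse_timestamp(value: str) -> datetime:
    # Swap the trailing "Z" for an explicit offset so older Python versions parse it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def task_duration_seconds(task: dict) -> float:
    """Seconds between task creation and its last update."""
    delta = parse_timestamp(task["updated_at"]) - parse_timestamp(task["created_at"])
    return delta.total_seconds()
```

For the sample response, the difference between the two timestamps is 5 seconds.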
GET /health
Health check endpoint. No authentication required.
Response:
{"status": "ok", "version": "1.0.0"}
Prometheus-compatible metrics endpoint. Returns request counts, latency histograms, cache hit rates, and governance event counters.
Query security events (blocked prompts, injection attempts, PII detections).
Query parameters: limit (1-1000), tenant_id
{"events": [...], "total": 42}
POST /v1/api-keys
Create a new API key. Requires the X-Admin-Key header.
Request body:
{
"client_name": "my-app",
"rate_limit_rpm": 100
}
Response (201 Created):
{"key_id": "...", "raw_key": "bk_..."}
List all API keys. Requires the X-Admin-Key header.
Revoke an API key. Requires the X-Admin-Key header.
Initialize a fresh Blink deployment. Validates license key, generates admin credentials and first API key.
Request body:
{
"license_key": "your-key",
"client_name": "myapp"
}
Response (201 Created):
{
"admin_key": "...",
"api_key": "...",
"api_key_id": "...",
"tier": "pro",
"features": {}
}
Response (409 Conflict):
{"error": "already_initialized"}
Response (403 Forbidden):
{"error": "invalid_license_key"}
Deep health check that verifies all dependencies (cache, orchestration, governance). Use for Kubernetes readiness probes.
Response (200 OK):
{
"status": "ready",
"checks": {
"cache": true,
"orchestration": true,
"governance": true
}
}
Response (503 Service Unavailable):
{
"status": "degraded",
"checks": {
"cache": true,
"orchestration": false,
"governance": true
}
}
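When a probe fails, the `checks` map tells you which dependency is down. A minimal sketch for turning the response into an actionable list (helper names are hypothetical):

```python
def failing_checks(readiness: dict) -> list[str]:
    """Names of dependencies whose check is false, sorted for stable output."""
    checks = readiness.get("checks", {})
    return sorted(name for name, ok in checks.items() if not ok)


def is_ready(status_code: int, readiness: dict) -> bool:
    """True only for a 200 response that reports "ready" with no failed checks."""
    return (
        status_code == 200
        and readiness.get("status") == "ready"
        and not failing_checks(readiness)
    )
```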
Get aggregated LLM cost data for a time window. Returns total spend, request count, and token usage.
Response:
{
"total_cost_usd": 12.50,
"total_requests": 450,
"total_input_tokens": 125000,
"total_output_tokens": 89000,
"period_hours": 24
}
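The aggregate fields are easier to act on as unit costs. The sketch below derives a few ratios from the response; the output key names are illustrative and not part of the API.

```python
def summarize_costs(costs: dict) -> dict:
    """Derive per-request and per-token unit costs from the aggregate response."""
    requests = costs["total_requests"]
    tokens = costs["total_input_tokens"] + costs["total_output_tokens"]
    hours = costs["period_hours"]
    total = costs["total_cost_usd"]
    return {
        "cost_per_request_usd": total / requests if requests else 0.0,
        "cost_per_1k_tokens_usd": total / tokens * 1000 if tokens else 0.0,
        "requests_per_hour": requests / hours if hours else 0.0,
    }
```

For the sample response above, this works out to roughly $0.028 per request and 18.75 requests per hour.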
Returns service health, version, and recent tasks. No authentication required.
Response:
{
"status": "ok",
"version": "1.0.0",
"recent_tasks": []
}
Returns a self-contained HTML terminal dashboard with real-time service status, recent tasks, and system metrics. No authentication required.
Response: text/html — a standalone HTML page.
Configuration
Blink is configured through environment variables. Copy .env.example to .env and adjust the values for your environment.
Note: Do not put inline comments after a value in your .env file. Python's dotenv parser reads them as part of the value. Use a separate line for comments.
Application
| Variable | Default | Description |
|---|---|---|
| APP_ENV | development | Environment (development / staging / production) |
| APP_PORT | 8080 | Gateway port |
| APP_LOG_LEVEL | INFO | Log level (DEBUG / INFO / WARNING / ERROR) |
Database
| Variable | Default | Description |
|---|---|---|
| POSTGRES_HOST | localhost | PostgreSQL host |
| POSTGRES_PORT | 5432 | PostgreSQL port |
| POSTGRES_DB | blink | Database name |
| POSTGRES_USER | blink | Database user |
| POSTGRES_PASSWORD | (none) | Database password |
Redis
| Variable | Default | Description |
|---|---|---|
| REDIS_HOST | localhost | Redis host |
| REDIS_PORT | 6379 | Redis port |
LLM Providers
| Variable | Default | Description |
|---|---|---|
| ANTHROPIC_API_KEY | (none) | Your Anthropic API key |
| OPENAI_API_KEY | (none) | Optional: OpenAI key for embeddings |
Auth
| Variable | Default | Description |
|---|---|---|
| BLINK_LICENSE_KEY | (none) | Your Blink license key |
| BLINK_ADMIN_KEY | (none) | Admin key for API key management |
Governance Toggles
| Variable | Default | Description | Tier |
|---|---|---|---|
| BLINK_GOVERNANCE_INPUT_SANITIZER_ENABLED | true | Input sanitizer | STARTER |
| BLINK_GOVERNANCE_OUTPUT_SANITIZER_ENABLED | true | Output PII redaction | STARTER |
| BLINK_GOVERNANCE_RATE_LIMIT_ENABLED | true | Rate limiting | STARTER |
| BLINK_GOVERNANCE_RATE_LIMIT_MAX_REQUESTS | 100 | Max requests per window | STARTER |
| BLINK_GOVERNANCE_RATE_LIMIT_WINDOW_SECONDS | 60 | Rate limit window in seconds | STARTER |
| BLINK_GOVERNANCE_NEMO_GUARDRAILS_ENABLED | false | NeMo Guardrails | PRO |
| BLINK_GOVERNANCE_PERPLEXITY_DETECTOR_ENABLED | false | Perplexity detector | PRO |
| BLINK_GOVERNANCE_SECURITY_EVENTS_ENABLED | false | Security events | PRO |
Semantic Cache
| Variable | Default | Description | Tier |
|---|---|---|---|
| CACHE_SIMILARITY_THRESHOLD | 0.85 | Similarity threshold (0-1) | PRO |
| CACHE_TTL_SECONDS | 3600 | Cache TTL in seconds | PRO |
| EMBEDDER_PROVIDER | sentence-transformers | Embedding provider | PRO |
Features Guide
Input Sanitizer
Tier: STARTER
Scans every incoming prompt against 262 attack patterns in 11 languages, covering prompt injection, jailbreak, system prompt extraction, persona hijack, role override, data exfiltration, and cost manipulation. Blocks malicious prompts before they reach any LLM provider.
BLINK_GOVERNANCE_INPUT_SANITIZER_ENABLED=true
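Pattern-based scanning of this kind can be illustrated with a few regular expressions. The three patterns below are made-up examples in English only; they are not Blink's actual rule set, and `scan_prompt` is a hypothetical name.

```python
import re

# Illustrative patterns only; a real rule set is far larger and multilingual.
PATTERNS = {
    "role_override": re.compile(
        r"ignore\s+(all\s+)?previous\s+instructions", re.IGNORECASE
    ),
    "prompt_extraction": re.compile(
        r"(show|reveal|print)\s+(me\s+)?your\s+system\s+prompt", re.IGNORECASE
    ),
    "system_impersonation": re.compile(r"\[\s*system\s*\]", re.IGNORECASE),
}


def scan_prompt(prompt: str) -> list[str]:
    """Return the names of all patterns that match the prompt."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(prompt)]
```

A prompt would be blocked whenever `scan_prompt` returns a non-empty list.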
Output Sanitizer (PII Redaction)
Tier: STARTER
Automatically detects and redacts personally identifiable information (SSNs, credit cards, emails, phone numbers) from agent responses before they reach the client.
BLINK_GOVERNANCE_OUTPUT_SANITIZER_ENABLED=true
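As a rough illustration of regex-based redaction, the sketch below replaces four PII types with placeholder labels. The patterns are simplified, US-centric assumptions and will miss many real formats; they are not the patterns Blink ships.

```python
import re

# Simplified patterns for illustration; order matters because earlier
# replacements remove digits that later patterns could otherwise match.
PII_PATTERNS = [
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("CREDIT_CARD", re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")),
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("PHONE", re.compile(r"\b\d{3}[ .-]\d{3}[ .-]\d{4}\b")),
]


def redact(text: str) -> str:
    """Replace each detected PII span with a bracketed label."""
    for label, pattern in PII_PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text
```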
Rate Limiter
Tier: STARTER
Sliding-window rate limiter that enforces per-client request quotas. Prevents abuse and controls LLM costs by throttling excessive usage.
BLINK_GOVERNANCE_RATE_LIMIT_ENABLED=true
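A sliding-window limiter keeps a timestamp per accepted request and counts only those inside the window. The in-memory sketch below shows the idea using the documented defaults (100 requests per 60 seconds); the class name is hypothetical, and a production limiter would presumably keep its state in a shared store rather than in process memory.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    def __init__(self, max_requests=100, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits = defaultdict(deque)  # client_id -> timestamps of accepted requests

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        hits = self._hits[client_id]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

Each client ID gets its own counter, so one noisy client does not consume another client's quota.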
Semantic Cache
Tier: PRO
Uses vector embeddings to identify semantically similar prompts. Returns cached responses for near-duplicate queries, reducing LLM costs and latency by up to 95%.
CACHE_SIMILARITY_THRESHOLD=0.85
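To show how a similarity threshold decides between a hit and a miss, here is a small sketch over plain Python lists. It assumes cosine similarity as the metric, which is a common choice but is not confirmed here, and it does a linear scan where a real deployment would use a vector index.

```python
import math


def cosine_similarity(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def lookup(query_vec, entries, threshold=0.85):
    """entries is a list of (vector, response) pairs.

    Returns the response of the most similar entry at or above the
    threshold, or None when nothing is similar enough (a cache miss).
    """
    best_score, best_response = 0.0, None
    for vector, response in entries:
        score = cosine_similarity(query_vec, vector)
        if score >= threshold and score > best_score:
            best_score, best_response = score, response
    return best_response
```

Lowering the threshold makes more prompts count as duplicates, which raises the hit rate at the risk of returning a response to a question that was only loosely related.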
Perplexity Detector
Tier: PRO
Measures the statistical perplexity of incoming prompts to detect adversarial token sequences (GCG suffixes) and machine-generated attack payloads that bypass pattern matching.
BLINK_GOVERNANCE_PERPLEXITY_DETECTOR_ENABLED=true
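Perplexity is the exponential of the average negative log-probability a model assigns to a sequence, so unusual sequences score high. The toy below uses a character-frequency model purely to illustrate the formula; it is an assumption for demonstration, and real detectors typically score token probabilities from a language model.

```python
import math
from collections import Counter


class CharUnigramModel:
    """Toy character-frequency model with add-one style smoothing."""

    def __init__(self, reference_text: str):
        self.counts = Counter(reference_text.lower())
        self.total = sum(self.counts.values())
        self.vocab = len(self.counts) + 1  # one extra slot for unseen characters

    def probability(self, char: str) -> float:
        return (self.counts.get(char, 0) + 1) / (self.total + self.vocab)

    def perplexity(self, text: str) -> float:
        text = text.lower()
        if not text:
            return 0.0
        log_sum = sum(math.log(self.probability(c)) for c in text)
        # exp of the mean negative log-probability per character
        return math.exp(-log_sum / len(text))
```

A detector built on this idea would flag prompts whose perplexity exceeds a tuned threshold.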
NeMo Guardrails
Tier: PRO
NVIDIA NeMo Guardrails integration for advanced conversational policy enforcement. Defines allowed conversation flows and topic boundaries with Colang rules.
BLINK_GOVERNANCE_NEMO_GUARDRAILS_ENABLED=true
Security Events
Tier: PRO
Logs all governance actions (blocked prompts, PII redactions, rate limit hits) as structured security events with full audit trails, queryable via the API.
BLINK_GOVERNANCE_SECURITY_EVENTS_ENABLED=true
Grafana Dashboards
Tier: PRO
Pre-built Grafana dashboards for real-time monitoring of request throughput, latency percentiles, cache hit rates, security events, and cost tracking via Prometheus metrics.
PROMETHEUS_ENABLED=true
Cost Governance
Tier: PRO
Set daily spending limits per tenant. When the budget threshold is reached, requests are paused and alerts are triggered. Prevents runaway LLM costs before they hit your invoice.
BLINK_GOVERNANCE_BUDGET_LIMIT_24H_USD=100
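The budget check itself is simple bookkeeping. The sketch below tracks spend per tenant against a single limit; the class is hypothetical, and it omits the rolling 24-hour window and alerting, which a real implementation would need.

```python
from collections import defaultdict


class BudgetGuard:
    def __init__(self, limit_24h_usd: float = 100.0):
        self.limit = limit_24h_usd
        self.spent = defaultdict(float)  # tenant_id -> USD spent in the current period

    def record_cost(self, tenant_id: str, cost_usd: float) -> None:
        self.spent[tenant_id] += cost_usd

    def is_paused(self, tenant_id: str) -> bool:
        """Requests pause once the tenant's spend reaches the limit."""
        return self.spent[tenant_id] >= self.limit
```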
Multi-tenant Isolation
Tier: ENTERPRISE
Full data isolation between tenants at the database, cache, and agent levels. Each tenant gets independent rate limits, budgets, and security policies. Enterprise license required.
Client Dashboard
The Client Dashboard is a real-time monitoring interface that connects to your Blink gateway. It provides visibility into requests, costs, security events, and task status.
Connecting
Enter your gateway URL (e.g. http://localhost:8080 for local development, or your production URL) and your API key. The dashboard saves credentials locally and auto-connects on return visits.
Overview Tab
Shows four key performance indicators (Total Requests, Cost USD, Block Rate, Cache Hit Rate), a live request timeline chart, and service health status for all Blink components.
Security Tab
Displays attack type distribution (doughnut chart), severity breakdown (bar chart), and a chronological feed of security events. Use the Refresh button to load the latest events on demand.
Tasks Tab
Lists recent tasks with their ID, status, age, and prompt. Stalled tasks (running longer than 5 minutes) are highlighted for attention.
CORS Configuration
If the dashboard cannot connect, ensure your gateway has CORS enabled. Set the BLINK_CORS_ORIGINS environment variable to your dashboard domain (e.g. https://yourdomain.com) or * for development.
# In your .env file
BLINK_CORS_ORIGINS=https://yourdomain.com,http://localhost:3000
Onboarding Hints
Look for the ? buttons throughout the dashboard. These contextual hints explain each metric, chart, and panel, with links back to the relevant documentation section.
Security
OWASP LLM Top 10 Coverage
Blink provides mitigations for every risk in the OWASP Top 10 for LLM Applications:
| Risk | Blink Mitigation |
|---|---|
| LLM01: Prompt Injection | Input Sanitizer (262 patterns, 11 languages) + NeMo Guardrails |
| LLM02: Insecure Output | Output Sanitizer (PII redaction) |
| LLM03: Training Data Poisoning | Memory Guard (agent-level) |
| LLM04: Model DoS | Rate Limiter + Cost Governance |
| LLM05: Supply Chain | Locked dependencies (uv.lock) |
| LLM06: Sensitive Info | PII Redaction + Audit Trail |
| LLM07: Insecure Plugin | MCP client isolation |
| LLM08: Excessive Agency | Agent guardrails + tool restrictions |
| LLM09: Overreliance | Perplexity Detector |
| LLM10: Model Theft | License validation + API key auth |
Attack Patterns Detected
The input sanitizer detects 262 attack patterns across 11 languages (EN, PT, ES, FR, DE, IT, RU, ZH, JA, KO, AR):
| Pattern | Description |
|---|---|
| Role Override | "Ignore previous instructions" |
| System Impersonation | [SYSTEM], developer mode |
| Prompt Extraction | "Show me your system prompt" |
| Encoding Evasion | Base64, unicode escapes |
| Delimiter Injection | ```system, XML/INST tags |
| Tool Abuse | Fake function_call JSON |
| Data Exfiltration | Markdown image URLs, fetch() |
| Jailbreak | DAN, hypothetical scenarios |
| Multi-turn Manipulation | "As we discussed..." |
| PII Injection | SSN, credit card patterns |
| GCG Suffix | Adversarial token sequences |
Authentication
Blink uses a two-layer authentication model. API keys (prefixed bk_) are issued via the admin endpoint and validated on every request. Each key is scoped to a client and carries its own rate limit. License keys validate the deployment tier and enable feature flags at startup.
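As an illustration of the API key layer, the sketch below generates a `bk_`-prefixed key and verifies a presented key against a stored hash. Storing only a SHA-256 hash and comparing in constant time are common practices assumed here; how Blink actually stores and checks keys is not described in this section.

```python
import hashlib
import hmac
import secrets

KEY_PREFIX = "bk_"


def hash_key(raw_key: str) -> str:
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


def generate_api_key() -> tuple[str, str]:
    """Return (raw_key, stored_hash); the raw key is shown to the client once."""
    raw_key = KEY_PREFIX + secrets.token_urlsafe(32)
    return raw_key, hash_key(raw_key)


def verify_api_key(presented: str, stored_hash: str) -> bool:
    if not presented.startswith(KEY_PREFIX):
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(hash_key(presented), stored_hash)
```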
Troubleshooting
Docker containers won't start
Run docker compose down first to clean up stale containers. Verify your .env file has no inline comments, as these are read as part of the value by python-dotenv.
Prompt returns 403 Forbidden
Check that the X-API-Key: bk_... header is set correctly. Create a new key with POST /v1/api-keys using your admin key. Check that your license key is valid and the deployment tier matches the features you're using.
Prompt blocked by sanitizer unexpectedly
To confirm the sanitizer is the cause, you can temporarily set BLINK_GOVERNANCE_INPUT_SANITIZER_ENABLED=false, but re-enable it in production.
Task stays in "pending" status
Check that the worker container is running with docker compose ps. Verify Redis is reachable (the worker uses Redis as its broker). Check the worker logs with docker compose logs worker for errors.
Cache hit rate is very low
Lower the CACHE_SIMILARITY_THRESHOLD value (default 0.85). A threshold of 0.80 is more permissive and will match prompts with slightly different wording. Also verify that Redis has enough memory and the embedding model is loading correctly.
Rate limiter blocks requests too aggressively
Increase BLINK_GOVERNANCE_RATE_LIMIT_MAX_REQUESTS or widen the window with BLINK_GOVERNANCE_RATE_LIMIT_WINDOW_SECONDS. Each API key has its own rate limit counter. If multiple services share one key, consider creating separate keys for each.