pLLM
High-Performance LLM Gateway
A drop-in replacement for the OpenAI API, built in Go. Handle thousands of concurrent requests with adaptive routing, multi-provider support, and enterprise-grade reliability.
Enterprise-Grade Features
Built from the ground up for production workloads with performance, reliability, and developer experience in mind.
100% OpenAI Compatible
A drop-in replacement for the OpenAI API. No code changes needed: just update your base URL and you're ready to go.
Multi-Provider Support
Support for OpenAI, Anthropic, Azure OpenAI, AWS Bedrock, Vertex AI, Llama, and Cohere through a unified interface.
Adaptive Routing
Intelligent request routing with automatic failover, circuit breakers, and health-based load balancing.
High Performance
Built in Go for maximum performance. Handle thousands of concurrent requests with minimal latency overhead.
Enterprise Security
JWT authentication, RBAC, audit logging, and comprehensive monitoring with Prometheus metrics.
Cost Optimization
Budget management, intelligent caching, and multi-key load balancing to minimize API costs.
Technical Excellence
Deep technical capabilities designed for mission-critical production environments.
Performance
- Sub-millisecond routing overhead
- Thousands of concurrent connections
- Native compilation with Go
- Efficient memory management
Reliability
- Circuit breaker protection
- Automatic health monitoring
- Graceful degradation
- Zero-downtime deployments
Scalability
- Horizontal scaling ready
- Kubernetes native
- Redis-backed caching
- Distributed rate limiting
Observability
- Prometheus metrics
- Grafana dashboards
- Distributed tracing
- Comprehensive logging
Performance Advantage
See how pLLM compares to typical interpreted gateway solutions.
Metric | pLLM (Go) | Typical Gateway | Advantage |
---|---|---|---|
Concurrent Connections | Thousands | Limited | 🚀 Superior |
Memory Usage | 50-80MB | 150-300MB+ | 💾 3-6x Less |
Startup Time | <100ms | 2-5s | ⚡ 20-50x Faster |
CPU Efficiency | All cores | GIL limited | 🔥 True Parallel |
Enterprise Authentication
Seamless integration with your existing identity infrastructure through OAuth/OIDC support powered by Dex.
Zero Configuration
Connect to your existing identity providers without complex setup. Dex handles the OAuth/OIDC protocols while pLLM manages authorization.
Enterprise Security
Industry-standard OAuth 2.0 and OpenID Connect protocols with support for SAML, LDAP, and multi-factor authentication.
Supported Identity Providers
Connect with popular identity providers and enterprise systems


Google, Microsoft, LDAP, and Active Directory integrate through Dex, and many more identity providers connect via standard OIDC and SAML protocols.
Simple Configuration
Get started with OAuth/OIDC in minutes with a simple YAML configuration
auth:
  dex:
    issuer: https://dex.yourcompany.com
    connectors:
      - type: oidc
        name: Google
        config:
          issuer: https://accounts.google.com
          clientID: your-google-client-id
          clientSecret: your-google-client-secret
      - type: ldap
        name: Corporate Directory
        config:
          host: ldap.yourcompany.com:636
          insecureNoSSL: false
          bindDN: cn=admin,dc=company,dc=com
System Architecture
Enterprise-grade architecture designed for high availability, scalability, and performance
Client Layer
Applications & Services
- Web Applications: React, Vue, Angular
- Mobile Apps: iOS, Android, React Native
- Backend Services: Node.js, Python, Go
- AI Platforms: LangChain, AutoGPT
pLLM Gateway
Intelligent Routing Engine
- Core Gateway: High-performance Go runtime
- Router: Chi HTTP Router
- Auth: JWT & RBAC
- Cache: Redis Layer
- Monitor: Metrics & Logs
Provider Layer
LLM Service Providers
Example provider health snapshot:
Provider | Status | Uptime |
---|---|---|
OpenAI | healthy | 99.9% |
Anthropic | healthy | 99.9% |
Azure OpenAI | degraded | 85.2% |
AWS Bedrock | healthy | 99.9% |
Google Vertex | healthy | 99.9% |
Llama | failed | 0% |
Data Flow & Features
Real-time monitoring and intelligent routing
- Circuit Breaker: Automatic failover protection
- Health Checks: Continuous monitoring
- Rate Limiting: Traffic control & quotas
- Analytics: Performance insights
Live Performance Metrics
- 1000+ requests/sec
- <1ms latency
- 99.9% uptime
- 65MB memory
Intelligent Load Balancing
Choose from six routing strategies, each optimized for different use cases and workload patterns.
Round Robin
Even distribution across all providers
Least Busy
Routes to least loaded provider
Weighted
Custom weight distribution
Priority
Prefers high-priority providers
Latency-Based
Routes to fastest responding provider
Usage-Based
Respects rate limits and quotas
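The strategy is chosen in the gateway's configuration. As a rough sketch (the key names below, such as routing.strategy and the per-provider weight field, are illustrative assumptions rather than pLLM's documented schema), a weighted setup might look like this:
# Illustrative sketch only; key names are assumptions, not pLLM's documented schema
routing:
  strategy: weighted          # or: round_robin, least_busy, priority, latency, usage
  providers:
    - name: openai-primary
      weight: 70              # receives roughly 70% of traffic
    - name: azure-backup
      weight: 30              # takes the rest, and all traffic if the primary's circuit breaker opens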
Production-Ready Stack
Built with battle-tested technologies and modern best practices for enterprise reliability.
Chi Router
Lightning-fast HTTP routing and middleware
PostgreSQL
Reliable data persistence with GORM ORM
Redis
High-speed caching and rate limiting
Prometheus
Enterprise monitoring and metrics
Adaptive Request Flow
Interactive visualization of intelligent routing with automatic failover and circuit breaker protection.
Product Roadmap
Exciting features coming soon to make pLLM even more powerful and enterprise-ready.
Key Rotation & Secret Management
Automated key rotation and integration with external secret managers for enhanced security.
Advanced Guardrails
Content filtering, rate limiting, and safety mechanisms to ensure responsible AI usage.
Enhanced Audit & Logging
Comprehensive audit trails with retention policies and compliance reporting.
Shape Our Roadmap
Have a feature request or want to influence our development priorities? We'd love to hear from you.
Enterprise Support
While pLLM is open source and free, we offer professional support and custom solutions for enterprises with mission-critical requirements.
Community
Perfect for developers and small teams getting started with pLLM.
What's included:
- GitHub Issues & Discussions
- Community Discord support
- Documentation and guides
- Best effort response time
- Open source under MIT license
Limitations:
- No SLA guarantees
- Community-driven support
- No priority bug fixes
Professional
For growing businesses that need reliable support and faster issue resolution.
What's included:
- Priority email support
- Guaranteed 24-hour response time
- Deployment assistance
- Configuration review
- Priority bug fixes
- Access to beta features
Limitations:
- Business hours support only
- Email-based communication
Enterprise
Comprehensive support for mission-critical deployments with custom requirements.
What's included:
- Dedicated support engineer
- Custom SLA (down to 2-hour response)
- Phone & video call support
- Custom feature development
- Architecture consulting
- On-site deployment assistance
- Training and workshops
- Priority feature requests
Ready for Enterprise Deployment?
Contact our team for custom integrations, dedicated support, and enterprise-grade deployment assistance.
Professional Consultation
Architecture review, deployment planning, and best practices guidance from our core team.
Custom Development
Tailored features, custom integrations, and specialized deployment configurations for your use case.
Dedicated Support
Priority support channels, SLA guarantees, and direct access to our engineering team.
Schedule a Consultation
Book a 30-minute call to discuss your requirements and explore how pLLM can fit your enterprise needs.
Book Consultation
Enterprise Inquiry
Submit a detailed inquiry for custom features, deployment assistance, or partnership opportunities.
Submit Inquiry Form
Frequently Asked Questions
Everything you need to know about pLLM, from technical details to enterprise support options.
Is pLLM free to use?
Yes, pLLM is completely free and open source under the MIT license. This means you can use it in commercial applications, modify the code, and deploy it anywhere without licensing fees. The only costs you'll incur are your infrastructure expenses (servers, cloud resources) and API costs from the LLM providers themselves (OpenAI, Anthropic, etc.).
Why choose pLLM over other LLM gateways?
pLLM is built in Go for superior performance and lower resource usage compared to Python-based solutions. Key advantages include: sub-millisecond routing overhead, native compilation for better performance, 3-6x lower memory usage, 20-50x faster startup times, and true parallel processing without GIL limitations. Plus, it's 100% OpenAI API compatible, requiring zero code changes to integrate.
Which LLM providers are supported?
pLLM supports all major LLM providers including OpenAI (GPT-3.5, GPT-4, GPT-4 Turbo), Anthropic Claude, Azure OpenAI, AWS Bedrock, Google Vertex AI, Groq, and Cohere. The unified API interface means you can switch between providers or use multiple providers simultaneously with intelligent routing and automatic failover.
Do you offer enterprise support?
Yes, we provide comprehensive enterprise support including dedicated support engineers, custom SLA agreements (down to 2-hour response times), priority bug fixes, custom feature development, architecture consulting, on-site deployment assistance, and training workshops. Enterprise support is available through custom pricing based on your specific requirements.
Can pLLM be deployed on-premise or in air-gapped environments?
Absolutely. pLLM is designed for flexible deployment scenarios including on-premise installations, air-gapped environments, and hybrid cloud setups. We provide Kubernetes manifests, Docker containers, and can assist with custom deployment configurations. The gateway can run entirely within your infrastructure while connecting to external LLM APIs or internal models.
What security features are included?
pLLM includes comprehensive security features: JWT-based authentication, Role-Based Access Control (RBAC), audit logging for compliance, OAuth/OIDC integration through Dex (supporting Google, Microsoft, LDAP, Active Directory), API key management, rate limiting, and request monitoring. All communications use TLS encryption, and we support enterprise identity providers.
What kind of performance can I expect?
pLLM is optimized for high-performance scenarios: handles thousands of concurrent connections, sub-millisecond routing overhead, efficient memory usage (50-80MB typical), fast startup times (<100ms), and intelligent caching to reduce API costs. The Go-based architecture provides significant performance advantages over interpreted language solutions.
Is there community support and documentation?
Yes! We have comprehensive documentation, GitHub discussions for community support, a Discord server for real-time help, and regular updates on our roadmap. The open-source community actively contributes features and bug fixes. For enterprise customers, we provide dedicated documentation, training materials, and direct access to our engineering team.
Performance Benchmarks
Real-world performance data showing why Go-based pLLM outperforms interpreted gateway solutions.
Headline benchmark metrics: requests/sec, P99 latency, memory efficiency, and cold start time.
Performance Comparison
pLLM vs Typical Interpreted Gateway
Comparison chart: concurrent connections, memory usage, startup time, and response time for pLLM versus a typical gateway.
Load Testing Results
Stress tested with 10,000 concurrent users making chat completion requests.
Results:
- Zero failed requests under normal conditions
- Latency measured as gateway overhead only, excluding LLM processing
- Peak memory measured during 10K concurrent connections
Enterprise Scalability
Built-in scalability features that make pLLM ideal for high-volume production workloads.
True Parallelism
No GIL limitations - utilize all CPU cores effectively
Memory Efficient
Native compilation with optimized memory management
Instant Scaling
Sub-100ms startup enables aggressive auto-scaling
Network Optimized
Efficient connection pooling and keep-alive management
Enterprise Performance Scaling
At massive throughput and ultra-low latency targets, the bottleneck is often the LLM providers themselves, not the gateway. To achieve true enterprise scale:
- Multiple LLM Deployments: Deploy several instances of the same model (e.g., 5-10 GPT-4 Azure OpenAI deployments)
- Multi-Provider Redundancy: Use multiple AWS Bedrock accounts, Azure regions, or provider accounts
- Geographic Distribution: Deploy models across regions for latency optimization
Why This Matters: A single LLM deployment typically handles 60-100 RPM. For 10,000+ concurrent users, you need multiple deployments of the same model to prevent provider-side bottlenecks. pLLM's adaptive routing automatically distributes load across all deployments.
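As a sketch of that pattern (the deployment names and configuration keys below are illustrative assumptions, not pLLM's documented schema), several deployments of the same model can be registered under one model name and load-balanced by the router:
# Illustrative sketch only; field names are assumptions, not pLLM's documented schema
model_list:
  - model_name: gpt-4              # the single name clients request
    provider: azure
    deployment: gpt-4-eastus-1
  - model_name: gpt-4
    provider: azure
    deployment: gpt-4-eastus-2
  - model_name: gpt-4
    provider: azure
    deployment: gpt-4-westeurope-1
routing:
  strategy: latency                # send each request to the fastest healthy deployment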
Get Started in Minutes
Choose your deployment method and get pLLM running in your environment quickly.
Deployment Options
Kubernetes with Helm
Production-ready deployment with auto-scaling
# Add the Helm repository
helm repo add pllm https://andreimerfu.github.io/pllm
helm repo update
# Install with your configuration
helm install pllm pllm/pllm \
  --set pllm.secrets.jwtSecret="your-jwt-secret" \
  --set pllm.secrets.masterKey="sk-master-key" \
  --set pllm.secrets.openaiApiKey="sk-your-openai-key"
# Check status
kubectl get pods -l app.kubernetes.io/name=pllm
Docker Compose
Perfect for development and testing
# Clone and setup
git clone https://github.com/andreimerfu/pllm.git
cd pllm
# Configure environment
cp .env.example .env
echo "OPENAI_API_KEY=sk-your-key-here" >> .env
# Launch pLLM
docker compose up -d
# Test it works
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Hello!"}]}'
Binary Installation
Lightweight deployment for simple setups
# Download latest release
wget https://github.com/andreimerfu/pllm/releases/latest/download/pllm-linux-amd64
# Make executable
chmod +x pllm-linux-amd64
# Set environment variables
export OPENAI_API_KEY=sk-your-key-here
export JWT_SECRET=your-jwt-secret
export MASTER_KEY=sk-master-key
# Run pLLM
./pllm-linux-amd64 server
Drop-in Integration
pLLM is 100% OpenAI compatible. Just change your base URL and you're ready to go.
Python
from openai import OpenAI

# Just change the base_url - that's it!
client = OpenAI(
    api_key="your-api-key",
    base_url="http://localhost:8080/v1"  # ← Point to pLLM
)

# Use exactly like OpenAI
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello!"}]
)
Node.js
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: 'your-api-key',
  baseURL: 'http://localhost:8080/v1' // ← Point to pLLM
});

const completion = await openai.chat.completions.create({
  model: "gpt-3.5-turbo",
  messages: [{role: "user", content: "Hello!"}]
});
cURL
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-api-key" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'