CustomBench
A full-stack LLM benchmarking platform built with Next.js 16, React 19, TypeScript, and Bun, enabling concurrent evaluation of 10+ LLMs per run via OpenRouter. Features custom Q&A dataset support, real-time benchmark execution streamed over Server-Sent Events, automated LLM-as-judge evaluation validated with structured Zod schemas, and consolidated result reporting with leaderboard rankings.
December 2025 - Present
TypeScript · React · Next.js · Bun · Tailwind CSS · OpenRouter · Zod
The Challenge
Details coming soon...
The Solution
Details coming soon...
Key Results & Impact
Business Impact
Details coming soon...
Key Achievements
Built a full-stack LLM benchmarking platform enabling concurrent evaluation of 10+ models per run via OpenRouter (fan-out sketch below)
Implemented real-time benchmark execution with Server-Sent Events (SSE) and a 30-minute timeout safeguard (SSE route sketch below)
Engineered an automated LLM-as-judge evaluation pipeline enforcing 100% structured verdict output with Zod schemas (schema sketch below)
Designed a leaderboard system with accuracy metrics, winner highlighting, and reproducible JSON result exports (ranking sketch below)
Created a dual-interface platform supporting both web UI workflows and CLI automation for research and CI use cases (CLI sketch below)
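The concurrent fan-out to OpenRouter can be illustrated with a minimal sketch. The chat-completions endpoint and bearer-token header follow OpenRouter's documented API; the function names, error handling, and result shape are assumptions, not the project's actual code.

```typescript
// Hypothetical sketch: fan one benchmark question out to several models
// concurrently through OpenRouter's chat completions endpoint.
const OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions";

async function askModel(model: string, question: string): Promise<string> {
  const res = await fetch(OPENROUTER_URL, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model,
      messages: [{ role: "user", content: question }],
    }),
  });
  if (!res.ok) throw new Error(`${model} failed with status ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}

// Evaluate all models in parallel; allSettled keeps one failing model
// from aborting the whole run.
async function runQuestion(models: string[], question: string) {
  const settled = await Promise.allSettled(
    models.map((m) => askModel(m, question)),
  );
  return settled.map((r, i) => ({
    model: models[i],
    answer: r.status === "fulfilled" ? r.value : null,
    error: r.status === "rejected" ? String(r.reason) : null,
  }));
}
```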
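The SSE execution might look roughly like this as a Next.js App Router route handler. The route path, event names, and the placeholder progress loop are assumptions; `AbortSignal.timeout` provides the 30-minute safeguard.

```typescript
// Hypothetical sketch of app/api/benchmark/route.ts: stream progress events
// over SSE and abort if a run exceeds 30 minutes.
export function GET() {
  const timeout = AbortSignal.timeout(30 * 60 * 1000); // 30-minute safeguard
  const encoder = new TextEncoder();

  const stream = new ReadableStream({
    start(controller) {
      let closed = false;
      const send = (event: string, data: unknown) => {
        if (closed) return;
        controller.enqueue(
          encoder.encode(`event: ${event}\ndata: ${JSON.stringify(data)}\n\n`),
        );
      };
      const close = () => {
        if (closed) return;
        closed = true;
        controller.close();
      };

      timeout.addEventListener("abort", () => {
        send("error", { message: "Benchmark timed out after 30 minutes" });
        close();
      });

      // Placeholder loop standing in for the real benchmark runner.
      for (let step = 1; step <= 3; step++) {
        send("progress", { step, total: 3 });
      }
      send("done", { ok: true });
      close();
    },
  });

  return new Response(stream, {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
      Connection: "keep-alive",
    },
  });
}
```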
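The judge pipeline's structured verdicts could be validated along these lines with Zod; the field names and verdict categories here are illustrative, not the project's actual schema.

```typescript
import { z } from "zod";

// Hypothetical verdict schema for the LLM-as-judge step.
const VerdictSchema = z.object({
  verdict: z.enum(["correct", "incorrect", "partial"]),
  score: z.number().min(0).max(1),
  reasoning: z.string(),
});
type Verdict = z.infer<typeof VerdictSchema>;

// Parse the judge model's JSON output into a typed, validated verdict,
// rejecting anything that does not match the schema.
function parseVerdict(raw: string): Verdict {
  const parsed = VerdictSchema.safeParse(JSON.parse(raw));
  if (!parsed.success) {
    throw new Error(`Judge returned a malformed verdict: ${parsed.error.message}`);
  }
  return parsed.data;
}
```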
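Leaderboard ranking and JSON export could be sketched as follows, under assumed result shapes: accuracy here is simply the share of "correct" verdicts, and `Bun.write` handles the file output.

```typescript
// Hypothetical result shape per model after judging.
interface ModelResult {
  model: string;
  verdicts: ("correct" | "incorrect" | "partial")[];
}

// Rank models by accuracy, highest first, and flag the winner.
function buildLeaderboard(results: ModelResult[]) {
  return results
    .map(({ model, verdicts }) => ({
      model,
      accuracy: verdicts.filter((v) => v === "correct").length / verdicts.length,
    }))
    .sort((a, b) => b.accuracy - a.accuracy)
    .map((row, i) => ({ ...row, rank: i + 1, winner: i === 0 }));
}

// Reproducible export: persist the ranked run as JSON via Bun's file API.
async function exportRun(results: ModelResult[], path: string) {
  await Bun.write(path, JSON.stringify(buildLeaderboard(results), null, 2));
}
```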
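The CLI side of the dual interface might look roughly like this Bun script; the file name, flags, and example model IDs are hypothetical.

```typescript
#!/usr/bin/env bun
import { parseArgs } from "node:util";

// Parse CLI flags, e.g.
//   bun run bench.ts --dataset qa.json --models openai/gpt-4o,anthropic/claude-3.5-sonnet
const { values } = parseArgs({
  args: Bun.argv.slice(2),
  options: {
    dataset: { type: "string" },
    models: { type: "string" },
  },
});

if (!values.dataset || !values.models) {
  console.error("Usage: bun run bench.ts --dataset <file> --models <id,id,...>");
  process.exit(1);
}

const models = values.models.split(",");
console.log(`Benchmarking ${models.length} models on ${values.dataset}`);
// ...hand off to the same benchmark runner the web UI uses.
```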
Interested in Learning More?
Check out the source code or see the project in action