The fastest AI behind my chatbot (and why speed matters more than smarts)
The chatbot on this site answers in under 200 milliseconds. Not because the AI is the smartest one out there, but because it is the fastest. Here is why that trade-off was worth it.
The little chatbot in the corner of this site runs on something called Cerebras Llama. It is a smaller AI model, not the kind that writes essays or passes bar exams. But it answers questions about my work almost instantly. When someone clicks a chatbot on a portfolio site, they give it about two seconds before they decide it is broken and close the window. Two seconds. That is the entire design constraint. Cerebras gets the first words on screen in under 200 milliseconds, which feels like the chatbot already knows the answer before you finish asking.
This chatbot has a narrow job. It answers questions about my projects, skills, and experience. All the information it needs is already written down in a detailed set of instructions it reads before every conversation. A smaller, faster AI with good instructions beats a bigger, slower AI for this kind of work. The answers are accurate because the AI is reading from a script, not making things up. And if it does give a bad answer, a separate AI scores every response afterward. Bad answers get flagged, I review them, and I improve the instructions. The system gets better over time without needing a more powerful model.
Sign up at the Cerebras developer portal and grab an API key. Their system speaks the same language as OpenAI, so any tutorial or tool that works with OpenAI will work with Cerebras with one small change: you point it at the Cerebras address instead. From there, you write a detailed set of instructions for your chatbot (what it knows, how it should talk, what it should refuse to answer) and connect it to your site. The instructions are the real product. A well-written set of instructions makes even a small AI sound like an expert on your specific topic. Spend your time on the instructions, not on picking the biggest model.
If the chatbot ever needed to compare projects, synthesize information across different experiences, or handle multi-step follow-up questions, I would move to a larger model and accept the slower response time. For straightforward question-and-answer grounded in a known set of facts, the small fast model is more than enough.
Product leader shipping across enterprise SaaS, AI in production, and 0→1. Writing about what actually ships — not what sounds good in a deck.