Sleep clinic administrative lines are frequently bottlenecked by a heavy volume of repetitive patient inquiries about overnight sleep study prep, parking, and facility logistics. Automating these responses with an AI assistant can free up substantial staff time, but a standard, unconstrained model introduces a critical clinical risk if it strays into diagnostic or treatment advice. To bridge this gap, I engineered a safety-first RAG chatbot that uses automated intent-based routing to block medical queries while grounding every logistical answer strictly in approved local text assets.
Part A: Patient-Facing RAG Bot
The Workflow
- Trigger & Classification: The pipeline activates as soon as a patient submits a free-text inquiry through the Streamlit web interface. The Python backend immediately passes the input to an automated intent-based routing layer, powered by the Gemini 2.5 Flash API, which classifies the request as either a safe logistics question or an unsafe clinical query.
- Retrieval & Response: If the intent is flagged as logistical (e.g., parking, location, prep steps), a LangChain retrieval pipeline extracts relevant data strictly from the local text assets to generate a source-grounded response.
- Deterministic Guardrails: If intent classification catches clinical or emergency keywords, the system trips a circuit-breaker fallback: it bypasses response generation entirely and returns an approved disclaimer or a human escalation route.

Patient-facing RAG bot responding with source-grounded clinic information
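A minimal sketch of the routing step described in the workflow, in Python since the backend is Python. The actual Gemini 2.5 Flash call is left out: `classify` is a hypothetical stand-in for whatever function sends the query to the model and returns its raw JSON text, and `answer` stands in for the LangChain retrieval-and-generation step. The intent labels, the fallback wording, and the choice to treat malformed JSON as clinical (failing closed) are assumptions for illustration, not details of the original system.

```python
import json
from typing import Callable

# Placeholder wording (assumption); the clinic's approved disclaimer text would go here.
CLINICAL_DISCLAIMER = (
    "I can only help with study prep, parking, and other visit logistics. "
    "Please contact your care team about medical questions."
)
EMERGENCY_MESSAGE = "If this is an emergency, call 911 or your local emergency number."

# Hypothetical label set the classifier prompt would ask the model to choose from.
ALLOWED_INTENTS = {"logistics", "clinical", "emergency"}


def parse_intent(raw: str) -> str:
    """Validate the classifier's JSON reply; anything malformed fails closed to 'clinical'."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return "clinical"
    intent = payload.get("intent") if isinstance(payload, dict) else None
    # Reject non-string or unknown labels instead of trusting them.
    return intent if isinstance(intent, str) and intent in ALLOWED_INTENTS else "clinical"


def route(query: str, classify: Callable[[str], str], answer: Callable[[str], str]) -> dict:
    """Only 'logistics' queries reach answer(); the other branches return fixed text."""
    intent = parse_intent(classify(query))
    if intent == "emergency":
        return {"intent": intent, "escalate": True, "reply": EMERGENCY_MESSAGE}
    if intent == "clinical":
        return {"intent": intent, "escalate": True, "reply": CLINICAL_DISCLAIMER}
    return {"intent": intent, "escalate": False, "reply": answer(query)}
```

Because only the `logistics` branch ever calls `answer`, a clinical or emergency query never reaches the generation model; the reply on those branches is fixed text rather than model output.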
Safety by Design
Failure Mode 1: Unsafe Medical Advice
Risk: Unconstrained prompts could let the bot offer diagnoses or CPAP pressure adjustments, risking serious patient harm and clinic liability.
Fix: An intent-based routing layer intercepts clinical keywords before generation, forcing a hard stop and returning an approved safety disclaimer or a human escalation.
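One way to make keyword interception fully deterministic is a plain regular-expression screen that runs before the model classifier and can only escalate a query, never clear one. The term lists and the `screen` function below are hypothetical and deliberately short; a production vocabulary would need clinical review.

```python
import re
from typing import Optional

# Illustrative, deliberately short term lists (assumption); not a clinically reviewed vocabulary.
EMERGENCY_TERMS = ["chest pain", "cannot breathe", "not breathing", "suicidal", "unconscious"]
CLINICAL_TERMS = [
    "cpap pressure", "diagnose", "diagnosis", "dose", "dosage",
    "medication", "prescription", "symptom", "symptoms", "ahi",
]


def _compile(terms: list) -> re.Pattern:
    # \b anchors match whole words/phrases, so "ahi" does not fire inside a longer word.
    alternation = "|".join(re.escape(t) for t in terms)
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


EMERGENCY_RE = _compile(EMERGENCY_TERMS)
CLINICAL_RE = _compile(CLINICAL_TERMS)


def screen(query: str) -> Optional[str]:
    """Deterministic pre-check: 'emergency' or 'clinical' on a hit, None to defer to the classifier."""
    if EMERGENCY_RE.search(query):
        return "emergency"
    if CLINICAL_RE.search(query):
        return "clinical"
    return None
```

A `None` result only means no listed term fired; the query still goes to the LLM classifier, which is better placed to catch paraphrases that no fixed list anticipates.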
Failure Mode 2: Protected Health Information (PHI) Leakage
Risk: Feeding raw, unvalidated CSV or Excel reports into the reporting PoC risks exposing sensitive PHI such as patient names, member IDs, and dates of birth.
Fix: I built a deterministic Pandas data validation pipeline that strictly parses and scrubs every incoming spreadsheet, so only verified, sanitized data reaches the Gemini API.
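A minimal sketch of such a validation step, assuming the spreadsheet has already been loaded into a DataFrame (for example with `pd.read_csv` or `pd.read_excel`). The column names, the allowlist, and the member-ID pattern are hypothetical. The sketch keeps an allowlist of approved columns rather than a list of PHI columns to drop, so an unexpected identifier column is removed by default.

```python
import pandas as pd

# Hypothetical schema: only these columns may leave the validation step.
ALLOWED_COLUMNS = ["study_type", "study_date", "ahi", "notes"]
REQUIRED_COLUMNS = {"study_type", "study_date", "ahi"}
FREE_TEXT_COLUMNS = ["notes"]
# Illustrative redaction patterns for identifiers typed into free text (not exhaustive).
PHI_PATTERNS = {
    r"\b\d{1,2}/\d{1,2}/\d{4}\b": "[DATE]",   # slash-formatted dates such as a date of birth
    r"\b[A-Z]{2,3}\d{6,}\b": "[MEMBER_ID]",   # hypothetical member-ID format
}


def scrub_report(df: pd.DataFrame) -> pd.DataFrame:
    """Validate a raw report and return only allowlisted, redacted columns."""
    # Normalize headers so "Study Type" and "study_type" are treated the same.
    df = df.rename(columns=lambda c: str(c).strip().lower().replace(" ", "_"))
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"report is missing required columns: {sorted(missing)}")
    # Allowlist: names, member IDs, DOB, and any other unlisted column are dropped.
    clean = df[[c for c in ALLOWED_COLUMNS if c in df.columns]].copy()
    # Strict parsing: a non-numeric AHI value rejects the whole file.
    clean["ahi"] = pd.to_numeric(clean["ahi"], errors="raise")
    # Best-effort redaction of identifier-like strings left in free-text fields.
    for col in FREE_TEXT_COLUMNS:
        if col in clean.columns:
            text = clean[col].fillna("").astype(str)
            for pattern, token in PHI_PATTERNS.items():
                text = text.str.replace(pattern, token, regex=True)
            clean[col] = text
    return clean
```

Failing the whole file on a bad value is the strict-parsing choice here: a malformed report stops at this step instead of reaching the model half-cleaned. Regex redaction of free text is best-effort and will not catch every identifier, which is one reason the allowlist does most of the work.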
Failure Mode 3: Hallucinated Operational Logistics
Risk: When asked about niche parking or clinic details absent from the approved documents, the model may confidently invent instructions, causing patient delays and operational friction.
Fix: I configured strict boundaries around the local text assets, forcing the retrieval pipeline to return a deterministic “not found” status rather than guessing beyond the approved documents.
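The “not found” behavior can be expressed as a score threshold on retrieval. The sketch below swaps in a simple word-overlap score for the embedding similarity a LangChain vector store would normally supply, so it runs without any model; `Passage`, `retrieve`, the stopword list, and the 0.5 threshold are all hypothetical.

```python
import re
from dataclasses import dataclass

NOT_FOUND = "NOT_FOUND"  # deterministic status returned instead of a guessed answer
# Hypothetical stopword list so filler words do not count as evidence.
STOPWORDS = {"a", "an", "the", "is", "are", "i", "my", "do", "to", "for", "of",
             "in", "on", "at", "where", "what", "how", "there", "can"}


@dataclass
class Passage:
    source: str  # file name of the approved local text asset
    text: str


def _tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(query: str, passage: Passage) -> float:
    """Fraction of the query's content words that appear in the passage (0.0 to 1.0)."""
    terms = _tokens(query) - STOPWORDS
    if not terms:
        return 0.0
    return len(terms & _tokens(passage.text)) / len(terms)


def retrieve(query: str, passages: list, threshold: float = 0.5) -> dict:
    best = max(passages, key=lambda p: overlap_score(query, p), default=None)
    if best is None or overlap_score(query, best) < threshold:
        # Nothing in the approved documents supports an answer: refuse to guess.
        return {"status": NOT_FOUND, "source": None, "context": None}
    return {"status": "FOUND", "source": best.source, "context": best.text}
```

When `status` is `NOT_FOUND`, the bot can return a fixed message (wording up to the clinic) without calling the model at all; when a passage is found, only its `context` is handed to generation. The threshold trades spurious “not found” replies against weakly grounded ones and would need tuning on real patient questions.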
Part B: AI Reporting PoC
Clinicians need rapid insights from patient data, but working directly from raw spreadsheet reports introduces serious PHI exposure risks. To remove this bottleneck, I engineered a Pandas-validated data pipeline that scrubs sensitive identifiers from every incoming file. The system feeds only sanitized data to the Gemini API, generating actionable narrative summaries without compromising patient privacy or regulatory compliance.
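A sketch of the hand-off from sanitized data to the model, assuming a DataFrame that has already passed validation. Rather than sending rows, it sends per-group aggregates and suppresses small groups, a common de-identification precaution added here as an assumption; `APPROVED_COLUMNS`, `MIN_CELL_SIZE`, and the prompt wording are hypothetical. The returned string is what would be passed to the Gemini API.

```python
import pandas as pd

# Hypothetical columns cleared by validation; "notes" is accepted but never sent to the model.
APPROVED_COLUMNS = {"study_type", "study_date", "ahi", "notes"}
# Hypothetical small-cell threshold: groups smaller than this are not reported.
MIN_CELL_SIZE = 5


def build_summary_prompt(clean: pd.DataFrame) -> str:
    """Reduce a sanitized report to group-level statistics and wrap them in a prompt."""
    # Defense in depth: refuse to proceed if an unapproved column slipped through.
    extra = set(clean.columns) - APPROVED_COLUMNS
    if extra:
        raise ValueError(f"unapproved columns reached the prompt builder: {sorted(extra)}")
    stats = (
        clean.groupby("study_type")["ahi"]
        .agg(studies="count", mean_ahi="mean")
        .round(1)
        .reset_index()
    )
    # Suppress small groups, which are easier to tie back to individual patients.
    stats = stats[stats["studies"] >= MIN_CELL_SIZE]
    lines = [
        f"- {row.study_type}: n={row.studies}, mean AHI {row.mean_ahi}"
        for row in stats.itertuples(index=False)
    ]
    return (
        "Write a brief operational summary for clinicians using only the aggregate "
        "figures below. Do not speculate beyond them.\n" + "\n".join(lines)
    )
```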
Validation & Metrics
During Week 1 stress testing of the intent-based routing pipeline from the terminal, the system achieved a 96% JSON validation success rate while maintaining an average latency of 420 ms per query. Crucially, it showed 100% escalation accuracy: every simulated out-of-scope clinical request was intercepted before it could reach the generation model. These results support the system's operational viability: intent classification is highly reliable, latency sits comfortably within what an interactive chat interface needs, and the safety routing logic holds firm under stress.
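For reproducibility, the three headline metrics can be computed from a log of test events. This is a hypothetical harness, not the one used in Week 1: the event fields are assumptions, and "escalation accuracy" is interpreted here as the share of clinical or emergency test queries that were actually escalated.

```python
import json
from statistics import mean

# Hypothetical label set, matching what the classifier is asked to return.
VALID_INTENTS = {"logistics", "clinical", "emergency"}


def is_valid_reply(raw: str) -> bool:
    """True if the classifier reply is a JSON object with a recognized 'intent' string."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return False
    intent = payload.get("intent") if isinstance(payload, dict) else None
    return isinstance(intent, str) and intent in VALID_INTENTS


def summarize_run(events: list) -> dict:
    """Each event (hypothetical fields): raw, latency_ms, expected_escalation, escalated."""
    if not events:
        raise ValueError("no events to summarize")
    should_escalate = [e for e in events if e["expected_escalation"]]
    return {
        "json_validation_rate": sum(is_valid_reply(e["raw"]) for e in events) / len(events),
        "avg_latency_ms": mean(e["latency_ms"] for e in events),
        # Share of clinical/emergency test queries that were actually escalated.
        "escalation_accuracy": (
            sum(e["escalated"] for e in should_escalate) / len(should_escalate)
            if should_escalate else None
        ),
    }
```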
What This Proves
This case study demonstrates a core AI Operations philosophy: applied AI is only valuable when bounded by absolute deterministic safety. By engineering strict data validation pipelines and intent-based routing, the system gains operational efficiency without ever letting it take precedence over patient privacy or clinical accuracy. This architecture shows that intelligent systems can be integrated responsibly into high-stakes healthcare environments, turning generative AI from a probabilistic risk into a measurable, defensive asset.