Sezgin
Saygili
Founding Engineer @ Patientdesk.ai YC W26
About
Full-stack engineer and startup founder specializing in scalable backend systems, real-time communications, and low-level performance engineering. Currently building core technical infrastructure at Patientdesk.ai in San Francisco and scaling ImpliedOptions.com.
Experience
Patientdesk.ai
Founding Engineer
Building core technical infrastructure as a founding team member of a Y Combinator-backed startup (W26). Architecting scalable backend systems with FastAPI and Next.js, and deploying high-performance data pipelines and real-time WebRTC communications for modern dental clinics.
ImpliedOptions.com
Founder
Founded and developed an advanced options analytics platform. Engineered high-performance, low-latency financial models in Rust and WebAssembly (WASM), alongside dense, interactive data visualizations in React.
Aalto University
Research Assistant
- Developed LLM pipelines to conduct synthetic survey experiments.
- Fine-tuned NLP models for multi-label classification and built a complex RAG system scanning 90,000+ pages.
- Developed web crawlers and models to identify the latest developments in medical technology.
- Developed models to assess policy implications for the energy industry.
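The large-scale RAG system mentioned above can be illustrated with a minimal retrieval sketch. The toy bag-of-words "embedding" and the `RagIndex` class below are purely illustrative assumptions; a production system over 90,000+ pages would use a neural encoder and a vector store rather than in-memory cosine search:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real RAG system would use a neural encoder.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class RagIndex:
    """Chunks pages into fixed-size word windows and retrieves the best matches."""

    def __init__(self, pages: list[str], chunk_words: int = 50):
        self.chunks: list[str] = []
        for page in pages:
            words = page.split()
            for i in range(0, len(words), chunk_words):
                self.chunks.append(" ".join(words[i:i + chunk_words]))
        # Pre-embed every chunk once at index build time.
        self.vectors = [embed(c) for c in self.chunks]

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        q = embed(query)
        scored = sorted(zip(self.vectors, self.chunks),
                        key=lambda vc: cosine(vc[0], q), reverse=True)
        return [chunk for _, chunk in scored[:k]]
```

The retrieved chunks would then be placed into the LLM's context window to ground its answer.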
Research & Publications
DentesBench - Paper
Published the first comprehensive benchmark framework evaluating frontier LLMs as dental clinic phone agents across 483 standardized scenarios. DentesBench represents a critical step forward in assessing the real-world utility of conversational AI in medical administration, bridging the gap between general-purpose benchmarks and domain-specific clinic requirements.
The framework categorizes patient interactions into complex multi-turn dialogs involving scheduling, triage, insurance verification, and billing. By pitting models such as GPT-4, Claude 3, and fine-tuned open-weight models against identical patient persona scripts, we mapped both the capability frontiers and the systematic failure modes of modern AI.
Our evaluation showed that while most LLMs can handle standard appointment booking, performance degrades rapidly when navigating convoluted insurance contexts, multi-party calls, or nuanced medical triage constraints. The suite establishes a new standardized grading rubric focused on empathetic alignment, semantic accuracy, and hallucination resistance in high-stakes healthcare environments.
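A per-scenario rubric score of this kind can be sketched as a weighted aggregate over the three axes. The `RubricScores` type and the weights below are illustrative assumptions, not values from the paper:

```python
from dataclasses import dataclass

@dataclass
class RubricScores:
    empathy: float             # empathetic alignment, 0.0-1.0
    semantic_accuracy: float   # did the agent get the facts right, 0.0-1.0
    hallucination_free: float  # 1.0 = no fabricated clinic details, 0.0-1.0

def aggregate(scores: RubricScores,
              weights: tuple[float, float, float] = (0.3, 0.4, 0.3)) -> float:
    # Weighted mean across rubric axes; weights here are illustrative only.
    dims = (scores.empathy, scores.semantic_accuracy, scores.hallucination_free)
    return sum(w * d for w, d in zip(weights, dims))
```

Averaging these aggregates over all 483 scenarios would yield a single leaderboard score per model.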
Bachelor's Thesis - Predicting Economic Stated Preferences Using LLMs
Surveys play a central role in economics, particularly in climate and public economics where researchers measure expectations, policy preferences, and willingness to pay. However, collecting survey data is expensive, slow, and constrained. This thesis investigates a cutting-edge methodological question: can large language models (LLMs) act as synthetic respondents to reliably reproduce economically meaningful response variation?
Using a private Finnish survey dataset on climate policy attitudes, transport emissions, and willingness to pay, I structured prompt sequences to test whether LLMs can accurately predict individual human responses from demographic and attitudinal parameters. This economically relevant setting examines support for carbon pricing—an area where heterogeneity in preferences is non-trivial and vital to policy making.
The findings offer a nuanced perspective on AI in economics research. While LLMs showed a capacity to recover high-level response patterns well—especially for binary or ternary categorical attitude questions—they significantly struggled when tasked with fine-grained classification or deeper subgroup differentiation. The results suggest that LLMs are currently best suited as a low-cost complement for pretesting questionnaires or exploring survey counterfactuals, rather than acting as a full replacement for field surveys measuring complex economic heterogeneity.
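The synthetic-respondent setup described above reduces to two steps: conditioning the model on a respondent profile, and mapping its free-text reply back onto the survey's categorical scale. The persona fields, prompt wording, and the three-level support/neutral/oppose scale below are hypothetical stand-ins for the thesis's actual instruments:

```python
def build_prompt(profile: dict, question: str) -> str:
    # Persona conditioning: describe the respondent, then ask the survey item.
    persona = ", ".join(f"{k}: {v}" for k, v in profile.items())
    return (f"You are answering a survey as this person ({persona}). "
            f"Question: {question} "
            "Answer with exactly one of: support / neutral / oppose.")

def parse_answer(raw: str) -> str:
    # Map the model's free-text reply back onto the categorical scale.
    text = raw.strip().lower()
    for label in ("support", "neutral", "oppose"):
        if label in text:
            return label
    return "invalid"
```

Predicted labels can then be compared against the real respondents' answers to measure how much of the human response variation the model recovers.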
Blog
Gemma 4 Fine-Tuning - April 2026
Detailed the soul-document-driven fine-tuning of Gemma 4 31B, with Opus 4.6 acting as a judge, resulting in unprecedented benchmark scores for clinical conversational AI.
We demonstrated that injecting deep, domain-specific semantic understanding via highly curated, specialized prompt arrays drastically reduced hallucination rates compared with generic frontier checkpoints.
Read Full Post →