We believe healthcare AI should be verifiable. We publish our benchmarks, open-source our evaluation code, and release models under permissive licences.
42 speech-to-text models ranked by Medical Word Error Rate (M-WER) on real-format medical conversations. Google Gemini 3 Pro leads overall (2.65% M-WER); VibeVoice-ASR 9B is the best open-source model (3.16% M-WER).
Read article →

Benchmark
Safety-first benchmark for SOAP note generation. Measures hallucination rates, clinical coverage, and note quality. Omi-SOAP-edge-v1 has the highest Safety and Evidence scores. Open-source evaluation framework.
Read article →

Model
Open-source clinical language model that generates structured SOAP notes from medical dialogues. Fine-tuned from Phi-3 Mini. Higher ROUGE-1 (70) than GPT-4 Turbo (69) on the Omi-Sum test set.
Read article →

Benchmark for evaluating clinical SOAP note generation. Measures safety, grounding, and quality.
GitHub →