Cohere Transcribe: Speech Recognition

by gmays on 3/31/2026, 4:27:02 PM

Comments

by: dinakernel

My worry is that ASR will end up like OCR. If a large multimodal AI system is good enough (latency-wise), the advantage of domain understanding eats the other technologies alive.

In OCR, even when the characters are poorly scanned, the deep domain understanding these large multimodal models have lets them work out what the document actually meant: "this is going to be the order ID, because in the million invoices I have seen before, the order ID is normally below the order date", and so on. My worry is that the same thing is going to happen in ASR.

3/31/2026, 4:56:14 PM

by: gruez

> Limitations
> Timestamps/Speaker diarization. The model does not feature either of these.

What a shame. Is whisperx still the best choice if you want timestamps/diarization?

3/31/2026, 5:32:15 PM

by: Void_

Just today I shipped support for this in Whisper Memos: https://whispermemos.com/changelog/2026-04-cohere-transcribe

Accurate and fast model, very happy with it so far!

3/31/2026, 6:10:44 PM

by: teach

Dumb question, but if this is "open source" is there source code somewhere? Or does that term mean something different in the world of models that must be trained to be useful?

3/31/2026, 5:29:39 PM

by: geooff_

I can't say enough nice things about Cohere's services. I migrated over to their embedding model a few months ago for clip-style embeddings and it's been fantastic.

It has the most crisp, steady P50 of any external service I've used in a long time.

3/31/2026, 4:43:51 PM

by: topazas

How hard could it be to train it on other European language(s)?

3/31/2026, 5:20:07 PM

by: simonw

It's great that this is Apache 2.0 licensed - several of Cohere's other models are licensed free for non-commercial use only.

3/31/2026, 4:50:49 PM

by: aplomb1026

[dead]

3/31/2026, 5:31:28 PM
