Show HN: Postgres extension for BM25 relevance-ranked full-text search

by tjgreen on 3/31/2026, 4:29:52 PM

Last summer we faced a conundrum at my company, Tiger Data, a Postgres cloud vendor whose main business is in timeseries data. We were trying to grow our business towards emerging AI-centric workloads and wanted to provide a state-of-the-art hybrid search stack in Postgres. We'd already built pgvectorscale in house with the goal of scaling semantic search beyond pgvector's main-memory limitations. We just needed a scalable ranked keyword search solution too.

The problem: core Postgres doesn't provide this; the leading Postgres BM25 extension, ParadeDB, is guarded behind AGPL; and developing our own extension appeared daunting. We'd need a small team of sharp engineers and 6-12 months, I figured. And we'd probably still fall short of the performance of a mature system like Parade/Tantivy.

Or would we? I'd been experimenting with AI-boosted development long enough at that point to realize that with the latest tools (Claude Code + Opus) and an experienced hand (I've been working on database systems internals for 25 years now), the old time estimates pretty much go out the window.

I told our CTO I thought I could solo the project in one quarter. This raised some eyebrows.

It did take a little more time than that (two quarters), and we got some real help from the community (amazing!) after open-sourcing the pre-release. But I'm thrilled/exhausted today to share that pg_textsearch v1.0 is freely available as open source (Postgres license), on Tiger Data cloud, and hopefully soon at a hyperscaler near you:

https://github.com/timescale/pg_textsearch

In the blog post accompanying the release, I give an overview of the architecture and present benchmark results using MS-MARCO. To my surprise, we were not only able to match Parade/Tantivy's query performance but exceed it substantially, measuring a 4.7x advantage in query throughput at scale:

https://www.tigerdata.com/blog/pg-textsearch-bm25-full-text-search-postgres

It's exciting (and, to be honest, a little unnerving) to see a field I've spent so much time toiling in change so quickly, in ways that let us be more ambitious in our technical objectives. Technical moats are moats no longer.

The benchmark scripts and methodology are available in the GitHub repo. Happy to answer any questions in the thread.

Thanks,
TJ (tj@tigerdata.com)

https://github.com/timescale/pg_textsearch

Comments

by: simonw

This is really cool. I've built things on PostgreSQL tsvector FTS in the past, which works well but doesn't have whole-index ranking algorithms, so it can't do BM25.

It's a bit surprising to me that this doesn't appear to have a mechanism to say "filter for just documents matching terms X and Y, then sort by BM25 relevance". It looks like this extension currently handles just the BM25 ranking but not the FTS filtering. Are you planning to address that in the future?

I found this example in the README quite confusing:

```sql
SELECT * FROM documents
WHERE content <@> to_bm25query('search terms', 'docs_idx') < -5.0
ORDER BY content <@> 'search terms'
LIMIT 10;
```

That -5.0 is a magic number which, based on my understanding of BM25, is difficult to predict in advance, since the threshold you would want to pick varies across datasets.
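To make the point about the -5.0 cutoff concrete, here is a minimal Okapi BM25 sketch in Python. It is not pg_textsearch's implementation. It assumes naive whitespace tokenization, the common defaults k1=1.2 and b=0.75, and a Lucene-style IDF. Because IDF and length normalization depend on corpus-wide statistics (document frequency, average document length), the same query against the same document scores differently in two corpora. The negative cutoff in the README example suggests the `<@>` operator returns negated scores so that an ascending ORDER BY puts the best match first. That is an inference from the example, not something confirmed here.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document in `docs` against `query` with Okapi BM25.

    Tokenization is naive lowercase whitespace splitting (an assumption for
    illustration; real systems stem, drop stopwords, etc.).
    """
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: how many documents contain each term.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        dl = len(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Lucene-style IDF; the +1 inside the log keeps it non-negative.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            # Term-frequency saturation (k1) and length normalization (b).
            score += idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * dl / avgdl))
        scores.append(score)
    return scores


doc = "postgres full text search"
small = [doc, "timeseries data in postgres"]
large = [doc] + ["postgres search tips"] * 50 + ["unrelated text"] * 50
# Same document, same query, different corpus statistics -> different score.
print(bm25_scores("postgres search", small)[0])
print(bm25_scores("postgres search", large)[0])
```

Any fixed absolute threshold therefore encodes assumptions about one particular corpus, tokenizer, and parameter setting.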

3/31/2026, 8:02:53 PM

by: mattbessey

Please oh please let GCP add this to the supported managed Postgres extensions...

3/31/2026, 9:21:07 PM

by: andai

Can you explain this in more detail? Is this for RAG, i.e. combining vector search with keyword search?

My knowledge on that subject roughly begins and ends with this excellent article, so I'd love to hear how this relates to it:

https://www.anthropic.com/engineering/contextual-retrieval

Especially since what Anthropic describes there is a bit of a Rube Goldberg machine that also involves preprocessing (contextual summarization) and a reranking model, I was wondering whether there are any "good enough" out-of-the-box solutions for it.
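One widely used "good enough" way to merge keyword and vector results is reciprocal rank fusion (RRF). It combines ranked lists by rank position only, so raw BM25 scores and vector distances never need to be on a comparable scale. The sketch below is a generic illustration, not a feature pg_textsearch is stated to provide. The constant k=60 is the value commonly used in the literature, and the document IDs are hypothetical.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first ranked lists of document IDs into one ranking.

    A document's fused score is the sum of 1 / (k + rank) over every list it
    appears in (rank starts at 1). Items ranked highly by either retriever
    rise to the top without comparing the retrievers' raw scores.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists: IDs from a BM25 query and from a vector query.
bm25_hits = ["doc3", "doc1", "doc7"]
vector_hits = ["doc1", "doc9", "doc3"]
print(reciprocal_rank_fusion([bm25_hits, vector_hits]))
# -> ['doc1', 'doc3', 'doc9', 'doc7']
```

doc1 wins because it ranks near the top of both lists. A reranking model, as in the article, would typically run on the fused top-N afterwards.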

3/31/2026, 9:00:27 PM

by: shreyssh

Nice work. pg_search has been on my radar for a while; having BM25 natively in Postgres instead of bolting on Elasticsearch is a huge DX win. I'm curious about index build time on larger datasets, though. I'm working with ~2M-row tables, and the bottleneck for most Postgres extensions I've tried isn't query speed, it's the initial indexing. Any benchmarks on that?

3/31/2026, 8:35:47 PM

by: jascha_eng

FWIW, TJ is not your average vibe coder, IMO: https://www.linkedin.com/in/todd-j-green/

He did burn through $3,000 in API credits in September, but I think that was before we finally bought Max plans for everyone who wanted one.

3/31/2026, 7:36:06 PM

by: zephyrwhimsy

Input quality is almost always the actual bottleneck. Teams spend months tuning retrieval while feeding HTML boilerplate into their vector stores.
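As a small illustration of cleaning input before indexing, here is a minimal sketch that extracts visible text from HTML using only Python's standard-library `html.parser`. It skips common boilerplate regions. The skipped tag set is an assumption chosen for the example. Real pages often need site-specific rules or a dedicated main-content extractor.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text from an HTML page, skipping common boilerplate regions.

    SKIP is an illustrative assumption, not an exhaustive boilerplate list.
    """

    SKIP = {"script", "style", "nav", "header", "footer", "aside"}

    def __init__(self):
        super().__init__()
        self.depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep non-empty text only when we're outside every skipped element.
        if text and self.depth == 0:
            self.chunks.append(text)


def extract_text(html):
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = """<html><head><style>p {color: red}</style></head>
<body><nav>Home | About</nav><p>BM25 ranks documents.</p>
<footer>&copy; 2026</footer></body></html>"""
print(extract_text(page))
# -> BM25 ranks documents.
```

The cleaned text, rather than the raw markup, is what would then go to the keyword index or the embedding model.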

3/31/2026, 9:10:29 PM

by: Unical-A

Impressive benchmarks. How does the BM25 implementation handle high-frequency updates (writes) while maintaining search latency? Usually, there's a trade-off between ingest speed and search performance in Postgres-based full-text search.

3/31/2026, 8:57:48 PM

by: timedude

When will this be available on AWS in Aurora? If anyone from AWS is here, add it pronto!

3/31/2026, 9:12:50 PM

by: gmassman

Very exciting! Congrats on the release, this will be a huge benefit to all folks building RAG/rerank systems on top of Postgres. Looking forward to testing it out myself.

3/31/2026, 8:47:17 PM

by: jackyliang

VERY excited about this; I was literally just looking to build hybrid search using Postgres FTS. When will this be available on Supabase?

3/31/2026, 8:50:07 PM

by: gplprotects

> ParadeDB, is guarded behind AGPL

What a wonderful ad for ParadeDB, and a clear signal that "TigerData" is a pernicious entity.

3/31/2026, 8:04:09 PM
