Show HN: Postgres extension for BM25 relevance-ranked full-text search
by tjgreen on 3/31/2026, 4:29:52 PM
Last summer we faced a conundrum at my company, Tiger Data, a Postgres cloud vendor whose main business is in time-series data. We were trying to grow our business towards emerging AI-centric workloads and wanted to provide a state-of-the-art hybrid search stack in Postgres. We'd already built pgvectorscale in house with the goal of scaling semantic search beyond pgvector's main memory limitations. We just needed a scalable ranked keyword search solution too.<p>The problem: core Postgres doesn't provide this; the leading Postgres BM25 extension, ParadeDB, is guarded behind AGPL; developing our own extension appeared daunting. We'd need a small team of sharp engineers and 6-12 months, I figured. And we'd probably still fall short of the performance of a mature system like Parade/Tantivy.<p>Or would we? I'd been experimenting long enough with AI-boosted development at that point to realize that with the latest tools (Claude Code + Opus) and an experienced hand (I've been working in database systems internals for 25 years now), the old time estimates pretty much go out the window.<p>I told our CTO I thought I could solo the project in one quarter. This raised some eyebrows.<p>It did take a little more time than that (two quarters), and we got some real help from the community (amazing!) after open-sourcing the pre-release. But I'm thrilled/exhausted today to share that pg_textsearch v1.0 is freely available via open source (Postgres license), on Tiger Data cloud, and hopefully soon, a hyperscaler near you:<p><a href="https://github.com/timescale/pg_textsearch" rel="nofollow">https://github.com/timescale/pg_textsearch</a><p>In the blog post accompanying the release, I give an overview of the architecture and present benchmark results using MS-MARCO. 
To my surprise, we were able not only to meet Parade/Tantivy's query performance but to exceed it substantially, measuring a 4.7x advantage on query throughput at scale:<p><a href="https://www.tigerdata.com/blog/pg-textsearch-bm25-full-text-search-postgres" rel="nofollow">https://www.tigerdata.com/blog/pg-textsearch-bm25-full-text-...</a><p>It's exciting (and, to be honest, a little unnerving) to see a field I've spent so much time toiling in change so quickly in ways that enable us to be more ambitious in our technical objectives. Technical moats are moats no longer.<p>The benchmark scripts and methodology are available in the GitHub repo. Happy to answer any questions in the thread.<p>Thanks,<p>TJ (tj@tigerdata.com)
https://github.com/timescale/pg_textsearch
Comments
by: simonw
This is really cool. I've built things on PostgreSQL tsvector FTS in the past, which works well but doesn't have whole-index ranking algorithms, so it can't do BM25.<p>It's a bit surprising to me that this doesn't appear to have a mechanism to say "filter for just documents matching terms X and Y, then sort by BM25 relevance" - it looks like this extension currently handles just the BM25 ranking but not the FTS filtering. Are you planning to address that in the future?<p>I found this example in the README quite confusing:<p><pre><code> SELECT * FROM documents WHERE content <@> to_bm25query('search terms', 'docs_idx') < -5.0 ORDER BY content <@> 'search terms' LIMIT 10; </code></pre> That -5.0 is a magic number which, based on my understanding of BM25, is difficult to predict in advance, since the threshold you would want to pick varies across datasets.
3/31/2026, 8:02:53 PM
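[Editor's sketch] To illustrate the point above about why a fixed cutoff like -5.0 is hard to choose in advance, here is a minimal, self-contained Okapi BM25 scorer in Python. It is not pg_textsearch's implementation: the function name `bm25_scores`, the toy corpora, and the parameter defaults (k1 = 1.2, b = 0.75) are assumptions made for illustration. It also produces positive scores, whereas the README snippet compares against a negative value, which suggests the extension's operator returns a negated score; that reading is an assumption too.

```python
import math
from collections import Counter

def bm25_scores(docs, query, k1=1.2, b=0.75):
    """Score each tokenized document against the query terms with Okapi BM25.

    Hypothetical helper for illustration only; not the extension's code.
    """
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            # IDF depends on the whole corpus (n and df).
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization depends on the corpus-wide average length.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# The same document scored inside two different (made-up) corpora.
doc = "postgres full text search".split()
small = [doc, "time series data".split()]
large = small + [
    "vector search index".split(),
    "search ranking".split(),
    "text search in postgres".split(),
]
query = ["postgres", "search"]

score_small = bm25_scores(small, query)[0]
score_large = bm25_scores(large, query)[0]
print(f"small corpus: {score_small:.3f}, large corpus: {score_large:.3f}")
```

The identical document and query yield different scores in the two corpora, because both the IDF term and the length normalization are corpus statistics. A threshold tuned on one dataset therefore does not transfer to another.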
by: mattbessey
Please oh please let GCP add this to the supported managed Postgres extensions...
3/31/2026, 9:21:07 PM
by: andai
Can you explain this in more detail? Is this for RAG, i.e. combining vector search with keyword search?<p>My knowledge on that subject roughly begins and ends with this excellent article, so I'd love to hear how this relates to that.<p><a href="https://www.anthropic.com/engineering/contextual-retrieval" rel="nofollow">https://www.anthropic.com/engineering/contextual-retrieval</a><p>Especially since what Anthropic describes here is a bit of a Rube Goldberg machine which also involves preprocessing (contextual summarization) and a reranking model, so I was wondering if there are any "good enough" out-of-the-box solutions for it.
3/31/2026, 9:00:27 PM
by: shreyssh
Nice work. pg_search has been on my radar for a while; having BM25 natively in Postgres instead of bolting on Elasticsearch is a huge DX win. Curious about the index build time on larger datasets, though. I'm working with ~2M-row tables, and the bottleneck for most Postgres extensions I've tried isn't query speed, it's the initial indexing. Any benchmarks on that?
3/31/2026, 8:35:47 PM
by: jascha_eng
FWIW TJ is not your average vibe coder imo: <a href="https://www.linkedin.com/in/todd-j-green/" rel="nofollow">https://www.linkedin.com/in/todd-j-green/</a><p>In September he burned through $3,000 in API credits, though I think that was before we finally bought Max plans for everyone who wanted one.
3/31/2026, 7:36:06 PM
by: zephyrwhimsy
Input quality is almost always the actual bottleneck. Teams spend months tuning retrieval while feeding HTML boilerplate into their vector stores.
3/31/2026, 9:10:29 PM
by: Unical-A
Impressive benchmarks. How does the BM25 implementation handle high-frequency updates (writes) while maintaining search latency? Usually, there's a trade-off between ingest speed and search performance in Postgres-based full-text search.
3/31/2026, 8:57:48 PM
by: timedude
When is this available on AWS in Aurora? Anyone from AWS here, add it pronto
3/31/2026, 9:12:50 PM
by: gmassman
Very exciting! Congrats on the release, this will be a huge benefit to all folks building RAG/rerank systems on top of Postgres. Looking forward to testing it out myself.
3/31/2026, 8:47:17 PM
by: jackyliang
VERY excited about this, literally just looking to build hybrid search using Postgres FTS. When will this be available on Supabase?
3/31/2026, 8:50:07 PM
by: gplprotects
> ParadeDB, is guarded behind AGPL<p>What a wonderful ad for ParadeDB, and clear signal that "TigerData" is a pernicious entity.
3/31/2026, 8:04:09 PM
by: benjiro3000
[dead]
3/31/2026, 8:05:38 PM
by: zephyrwhimsy
[dead]
3/31/2026, 9:11:29 PM