Hacker News Viewer

Show HN: Postgres extension for BM25 relevance-ranked full-text search

by tjgreen on 3/31/2026, 4:29:52 PM

Last summer we faced a conundrum at my company, Tiger Data, a Postgres cloud vendor whose main business is in time-series data. We were trying to grow our business towards emerging AI-centric workloads and wanted to provide a state-of-the-art hybrid search stack in Postgres. We'd already built pgvectorscale in house with the goal of scaling semantic search beyond pgvector's main memory limitations. We just needed a scalable ranked keyword search solution too.

The problem: core Postgres doesn't provide this; the leading Postgres BM25 extension, ParadeDB, is guarded behind AGPL; developing our own extension appeared daunting. We'd need a small team of sharp engineers and 6-12 months, I figured. And we'd probably still fall short of the performance of a mature system like Parade/Tantivy.

Or would we? I'd been experimenting long enough with AI-boosted development at that point to realize that with the latest tools (Claude Code + Opus) and an experienced hand (I've been working in database systems internals for 25 years now), the old time estimates pretty much go out the window.

I told our CTO I thought I could solo the project in one quarter. This raised some eyebrows.

It did take a little more time than that (two quarters), and we got some real help from the community (amazing!) after open-sourcing the pre-release. But I'm thrilled/exhausted today to share that pg_textsearch v1.0 is freely available via open source (Postgres license), on Tiger Data cloud, and hopefully soon, a hyperscaler near you:

https://github.com/timescale/pg_textsearch

In the blog post accompanying the release, I give an overview of the architecture and present benchmark results using MS-MARCO. To my surprise, we were not only able to meet Parade/Tantivy's query performance, but exceed it substantially, measuring a 4.7x advantage on query throughput at scale:

https://www.tigerdata.com/blog/pg-textsearch-bm25-full-text-search-postgres

It's exciting (and, to be honest, a little unnerving) to see a field I've spent so much time toiling in change so quickly in ways that enable us to be more ambitious in our technical objectives. Technical moats are moats no longer.

The benchmark scripts and methodology are available in the GitHub repo. Happy to answer any questions in the thread.

Thanks,

TJ (tj@tigerdata.com)

https://github.com/timescale/pg_textsearch

Comments

by: simonw

This is really cool. I've built things on PostgreSQL ts_vector() FTS in the past which works well but doesn't have whole-index ranking algorithms so can't do BM25.

It's a bit surprising to me that this doesn't appear to have a mechanism to say "filter for just documents matching terms X and Y, then sort by BM25 relevance" - it looks like this extension currently handles just the BM25 ranking but not the FTS filtering. Are you planning to address that in the future?

I found this example in the README quite confusing:

    SELECT * FROM documents
    WHERE content <@> to_bm25query('search terms', 'docs_idx') < -5.0
    ORDER BY content <@> 'search terms'
    LIMIT 10;

That -5.0 is a magic number which, based on my understanding of BM25, is difficult to predict in advance since the threshold you would want to pick varies for different datasets.

3/31/2026, 8:02:53 PM
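
To illustrate the point above about the -5.0 threshold, here is a minimal sketch of textbook BM25 in Python. It is not pg_textsearch's implementation: the function name `bm25_scores`, the defaults k1=1.2 and b=0.75, the Lucene-style IDF, and the toy corpora are all assumptions chosen for illustration. It scores the same document against the same query in a small and a larger corpus; because IDF and average document length come from corpus-wide statistics, the score changes with the dataset, which is why a fixed cutoff is hard to pick in advance. (The README example appears to compare against a negated score, so the sign would be flipped there.)

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with textbook BM25.

    Hypothetical helper for illustration only; k1 and b are common defaults,
    not values taken from pg_textsearch.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue  # a term absent from the document contributes nothing
            # Lucene-style IDF, which stays non-negative for common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Two toy corpora that share the same first document.
small = [["postgres", "search"], ["postgres", "index"], ["cloud", "vendor"]]
large = small + [["cloud", "vendor"]] * 97

score_small = bm25_scores(["search"], small)[0]
score_large = bm25_scores(["search"], large)[0]
print(score_small, score_large)
```

In the larger corpus the term "search" is rarer, so its IDF and the resulting score go up even though the document and query are unchanged. A cutoff tuned on one dataset will therefore not transfer cleanly to another.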


by: mattbessey

Please oh please let GCP add this to the supported managed Postgres extensions...

3/31/2026, 9:21:07 PM


by: andai

Can you explain this in more detail? Is this for RAG, i.e. combining vector search with keyword search?

My knowledge on that subject roughly begins and ends with this excellent article, so I'd love to hear how this relates to that.

https://www.anthropic.com/engineering/contextual-retrieval

Especially since what Anthropic describes here is a bit of a Rube Goldberg machine which also involves preprocessing (contextual summarization) and a reranking model, so I was wondering if there are any "good enough" out-of-the-box solutions for it.

3/31/2026, 9:00:27 PM


by: shreyssh

Nice work. pg_search has been on my radar for a while; having BM25 natively in Postgres instead of bolting on Elasticsearch is a huge DX win. Curious about the index build time on larger datasets though. I'm working with ~2M row tables and the bottleneck for most Postgres extensions I've tried isn't query speed, it's the initial indexing. Any benchmarks on that?

3/31/2026, 8:35:47 PM


by: jascha_eng

FWIW TJ is not your average vibe coder imo: https://www.linkedin.com/in/todd-j-green/

In September he burned through $3,000 in API credits, though I think that's before we finally bought Max plans for everyone who wanted one.

3/31/2026, 7:36:06 PM


by: zephyrwhimsy

Input quality is almost always the actual bottleneck. Teams spend months tuning retrieval while feeding HTML boilerplate into their vector stores.

3/31/2026, 9:10:29 PM


by: Unical-A

Impressive benchmarks. How does the BM25 implementation handle high-frequency updates (writes) while maintaining search latency? Usually, there's a trade-off between ingest speed and search performance in Postgres-based full-text search.

3/31/2026, 8:57:48 PM


by: timedude

When is this available on AWS in Aurora? Anyone from AWS here, add it pronto

3/31/2026, 9:12:50 PM


by: gmassman

Very exciting! Congrats on the release, this will be a huge benefit to all folks building RAG/rerank systems on top of Postgres. Looking forward to testing it out myself.

3/31/2026, 8:47:17 PM


by: jackyliang

VERY excited about this, literally just looking to build hybrid search using Postgres FTS. When will this be available on Supabase?

3/31/2026, 8:50:07 PM


by: gplprotects

> ParadeDB, is guarded behind AGPL

What a wonderful ad for ParadeDB, and clear signal that "TigerData" is a pernicious entity.

3/31/2026, 8:04:09 PM


by: benjiro3000

[dead]

3/31/2026, 8:05:38 PM


by: zephyrwhimsy

[dead]

3/31/2026, 9:11:29 PM

