Show HN: 1-Bit Bonsai, the First Commercially Viable 1-Bit LLMs
by PrismML on 3/31/2026, 9:01:18 PM
Comments
by: jjcm
1 bit with an FP16 scale factor every 128 bits. Fascinating that this works so well.

I tried a few things with it. Got it driving Cursor, which in itself was impressive - it handled some tool usage. Via Cursor I had it generate a few web page tests.

On a Monte Carlo simulation of pi, it got the logic correct but failed to build an interface to start the test. Requesting changes mostly worked, but it left behind some stray symbols that caused things to fail. Required a bit of manual editing.

Tried a Simon Willison pelican as well - very abstract, not recognizable at all as a bird or a bicycle.

Pictures of the results here: https://x.com/pwnies/status/2039122871604441213

There doesn't seem to be a demo link on their webpage, so here's llama.cpp running on my local desktop if people want to try it out. I'll keep this running for a couple of hours past this post: https://unfarmable-overaffirmatively-euclid.ngrok-free.dev
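The packing jjcm describes - one sign bit per weight plus a shared FP16 scale for every block of 128 weights - can be sketched roughly as below. The `quantize`/`dequantize` names and the mean-absolute-value scale choice are illustrative assumptions, not Bonsai's actual kernels:

```python
import numpy as np

BLOCK = 128  # weights sharing one FP16 scale factor

def quantize(weights: np.ndarray):
    """Pack weights into 1 sign bit each, plus one FP16 scale per block."""
    blocks = weights.reshape(-1, BLOCK)
    # Assumed scale: mean absolute value of the block (one FP16 per 128 weights)
    scales = np.abs(blocks).mean(axis=1).astype(np.float16)
    signs = blocks >= 0  # the 1-bit payload
    return signs, scales

def dequantize(signs: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Reconstruct each weight as +scale or -scale for its block."""
    return (np.where(signs, 1.0, -1.0) * scales[:, None].astype(np.float32)).ravel()

rng = np.random.default_rng(0)
w = rng.standard_normal(256).astype(np.float32)
signs, scales = quantize(w)
w_hat = dequantize(signs, scales)
# Storage: 256 sign bits + 2 FP16 scales = 288 bits, i.e. 1.125 bits/weight
```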
3/31/2026, 11:39:52 PM
by: keyle
Extremely cool!

Can't wait to give it a spin with ollama; if ollama could list it as a model, that would be helpful.
4/1/2026, 12:22:46 AM
by: volume_tech
the speed is not just about storage -- at 1-bit you are reading roughly 16x less data from DRAM per forward pass compared to FP16. on memory-bandwidth-constrained hardware that is usually the actual bottleneck, so the speedup scales pretty directly. the ac
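volume_tech's rough 16x figure can be sanity-checked with quick arithmetic; once the FP16 scale is amortized over each 128-weight block, the effective footprint is a bit over 1 bit per weight (figures here are back-of-envelope, not measured):

```python
# Bytes read from DRAM per forward pass scale with bits per weight,
# so the bandwidth saving is roughly the ratio of bits per weight.
fp16_bits = 16.0
one_bit = 1.0 + 16.0 / 128.0  # 1 sign bit + FP16 scale amortized over 128 weights

ratio = fp16_bits / one_bit
print(f"{one_bit:.3f} bits/weight, {ratio:.1f}x less data than FP16")
# 1.125 bits/weight, about 14.2x less data than FP16
```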
4/1/2026, 12:22:17 AM
by: ariwilson
Very cool and works pretty well!
4/1/2026, 12:24:29 AM
by: _fw
What’s the trade-off? If it’s smaller, faster, and more efficient, is the performance worse? A layman here, curious to know.
3/31/2026, 11:25:52 PM
by: alyxya
I expect the trend in large machine learning models to go towards bits rather than operating on floats. There's a lot of inefficiency in floats: weights are typically something like normally distributed, so most values cluster in a small range, which makes both their storage and computation inefficient. The foundation of neural networks may be rooted in real-valued functions, which are simulated with floats, but float operations are just bitwise operations underneath. The main obstacles are that GPUs are built around float arithmetic and that standard ML theory works over real numbers.
3/31/2026, 11:18:35 PM
by: syntaxing
Super interesting - building their llama.cpp fork on my Jetson Orin Nano to test this out.
3/31/2026, 10:59:48 PM
by: Archit3ch
Doesn't Jevons paradox dictate larger 1-bit models?
3/31/2026, 11:22:26 PM
by: hatthew
I feel like it's a little disingenuous to compare against full-precision models. Anyone concerned about model size and memory usage is surely already using at least an 8-bit quantization.

Their main contribution seems to be hyperparameter tuning, and they don't compare against other quantization techniques of any sort.
3/31/2026, 11:52:10 PM
by: yodon
Is Bonsai 1-bit or 1.58-bit?
3/31/2026, 10:38:28 PM
by: OutOfHere
How do I run this on Android?
3/31/2026, 11:22:04 PM
by: stogot
What is the value of a 1 bit? For those that do not know
3/31/2026, 10:47:57 PM