OCR for construction documents does not work, we fixed it
by wcisco17 on 3/30/2026, 4:05:46 PM
So we've built an API and trained models that detect fixtures, extract schedules, and analyze construction documents. Check us out!<p>More examples: - <a href="https://www.getanchorgrid.com/developer/docs/endpoints/drawings-doors" rel="nofollow">https://www.getanchorgrid.com/developer/docs/endpoints/drawi...</a><p>Main website: - <a href="https://www.getanchorgrid.com/developer" rel="nofollow">https://www.getanchorgrid.com/developer</a><p>Why we did it: <a href="https://www.getanchorgrid.com/developer/docs/changelog/construction-drawings-are-data-prisons" rel="nofollow">https://www.getanchorgrid.com/developer/docs/changelog/const...</a>
https://www.getanchorgrid.com/developer/docs/endpoints/drawings-doors
Comments
by: Terr_
> OCR for construction documents does not work<p>I'm reminded of the Xerox JBIG2 bug back in ~2013, where certain scan settings could silently replace numbers inside documents, and bad construction plans were one of the cases that led to it being discovered. [0]<p>It wasn't overt OCR <i>per se</i>: end users weren't intending to convert pixels to characters or vice versa.<p>[0] <a href="https://www.youtube.com/watch?v=c0O6UXrOZJo&t=6m03s" rel="nofollow">https://www.youtube.com/watch?v=c0O6UXrOZJo&t=6m03s</a>
3/30/2026, 5:34:32 PM
by: i18nagentai
OCR accuracy on technical documents is one of those problems that looks 95% solved until you hit the edge cases. Construction docs are especially tricky because of mixed handwriting, stamps, revision clouds, and poor scan quality. Curious how you handle multi-language documents — a lot of international construction projects have specs in two or three languages on the same page.
3/30/2026, 7:06:17 PM
by: sreekanth850
We’re taking a different path: building a parsing engine that converts CAD (DWG/DXF) into fully structured JSON with preserved semantics (no ML in the critical path). We also have a separate GIS parser that extracts vector data (features, layers, geometries) independently. I'd like to know how you handle consistency and reproducibility across runs when using models, and how you make it affordable, especially at scale, because as far as I know CAD and GIS need precision and accuracy.
3/30/2026, 6:53:20 PM
by: frogguy
Looks cool! Where are you getting the data to finetune the CV models for element extraction? I'm worried there isn't a robust enough dataset to build a detection model that will generalize to all of the slightly different standards each discipline (and each firm, for that matter) uses.
3/30/2026, 6:14:19 PM
by: testUser1228
What do you foresee being the end use case for this (or most valuable use case)?
3/30/2026, 5:37:55 PM
by: Iulioh
When will this be available for 30000x8000px electrical diagrams?<p>I have to make a BOM and oh boy I hate my job
3/30/2026, 5:05:15 PM
by: hspraggins77
Great points raised!
3/30/2026, 5:47:27 PM
by: alexeischiopu
Good idea :)
3/30/2026, 5:30:59 PM
by: vessenes
cool. What's pricing like?
3/30/2026, 5:31:45 PM
by: achillesheels
Love it! <i>Starbucks Venti Macchiato sip</i><p>I'd love to give it to an arc client, but I'm not sure who the right person to implement this would be. Hmm…
3/30/2026, 4:57:50 PM
by: fithisux
Of course it is not working. PDFs and images are supposed to be tamper-resistant. OCR tries to reverse-engineer them.
3/30/2026, 4:48:30 PM
by: ware-intel
Your smart features look like a game changer. Nice job!
3/30/2026, 6:04:50 PM