Research-Led Since Day One.
Not Retrofitted.
Xtracta’s engine is built on research that started long before document intelligence became a technology trend. Our team has been studying how machines can understand documents the way humans do for over 14 years. That research is not a feature we added. It is the foundation everything else is built on.
Why This Matters
Most companies in this space started with traditional document scanning technology and added intelligence later. A layer on top. A feature in the marketing. Xtracta went the other way. The intelligence came first. The product was built around it.
That matters because the engine doesn’t just recognise characters on a page. It understands layout, structure, language, and context. It knows what type of document it’s looking at, what the data means, and where it belongs. That’s not something you get by bolting a new feature onto old technology. It comes from years of focused research.
What the Research Covers
How Machines Learn from Documents
The engine learns from every document it processes. Not from instructions.
Xtracta’s research programme focuses on how systems can learn to extract data from documents they’ve never seen before, without templates or manual rules. The engine builds its understanding from real-world production data, not from training sets in a lab.
Understanding Language in Context
The same word means different things in different documents.
The research team works on how the engine interprets language within the specific context of a document. A “total” on a receipt means something different from a “total” on a contract. The engine needs to know the difference, and it does.
Reading Structure, Not Just Text
Where data sits on a page matters as much as what it says.
Documents communicate through layout, not just words. Tables, columns, headers, indentation, spacing. The research programme focuses on how the engine reads these structural signals to understand what data means and where it belongs.
Continuous Improvement at Scale
Every document the engine processes makes it better for the next one.
The research team works on how to make the engine improve continuously from production data across the entire global network. Not just for one customer, but for all of them. Patterns learned from one document type improve accuracy across others.
How the Engine Learns
Xtracta operates as a cloud service. Every document that has ever been processed across the entire global network contributes to a continuously evolving knowledge base. The engine mines this data to find patterns, correlations, and structures that improve how it extracts information.
This works in two parts. First, core data mining looks across the entire network, finds relevant links and structures, and generates a continuously evolving pool of extraction approaches. Second, the processing servers that capture data in real time for each customer are continuously updated with what the data mining has learned.
The result: an engine that gets smarter every day, without anyone having to tell it what to do. A document processed in one country improves accuracy for a different customer in another. The more the network grows, the better it gets for everyone.
The Research Team
Our research team is a diverse group of specialists from around the world. Several hold PhDs. Many have published in international journals. They bring together the open thinking of academic research with the discipline of building products that work in production, every day, at global scale.
Over 55 years of collective R&D experience
Over 200 years of collective tech experience
PhDs on the research team
NZ Govt high-tech funding recipient
How the Technology Has Evolved
2010
Research begins. The founding question: can a system learn to read documents the way a human does, without being told what to look for?
2013
First production engine launches. The core approach works: learn from documents, not from templates.
2014
Advanced extraction capabilities go live, delivering near-perfect results on complex documents from day one.
2018
New models for complex line item extraction. System re-architecture for scalable, container-based infrastructure.
2021
Major investment in straight-through processing. New data transformation and validation capabilities reduce the need for human verification.
2022
Research begins on deep learning transformer approaches. The goal: dramatically improved out-of-the-box extraction across all document types.
2024
First production-ready deep learning transformer models rolled out for the most common document types.
Next
The engine keeps evolving. The research team is working on the next generation of document understanding. The principle stays the same: set it up, let it learn, forget about it.
FAQ
How long has Xtracta been researching document intelligence?
Since 2010. Xtracta has been studying how machines can understand documents for over 14 years, long before document intelligence became a technology trend. The research programme has been continuous and production-focused from the start.
What does the research programme cover?
The research programme covers how machines learn from documents without templates, how language is interpreted in context, how document layout and structure communicate meaning, and how the engine can improve continuously from production data at global scale.
How does the engine learn?
The engine learns from every document processed across its entire global network. It mines production data to find patterns and structures, builds a continuously evolving knowledge base, and continuously updates the processing servers that capture data in real time for each customer. The more the network grows, the better it gets for everyone.
What are transformer models, and when did Xtracta adopt them?
Transformer models are a newer approach to teaching machines to understand documents. Xtracta began research into this area in 2022 and rolled out production-ready models in 2024 for the most common document types. They deliver significantly improved out-of-the-box accuracy.
Who is on the research team?
The team includes specialists with PhDs and international journal publications. Collectively, they have over 55 years of R&D experience and over 200 years of technology experience. Xtracta is a recipient of high-tech funding from the New Zealand government for its research programme.
See What 14 Years of Research Delivers.
This isn’t a self-service sandbox. Our team sets it up with you.
Your data, from day one.