Deep Learning Announcement
As a number of our partners and customers already know, Xtracta has been undertaking research since early 2022 into building new data extraction methodologies using deep learning techniques. Deep learning is a subset of machine learning (ML) and the broader field of AI. At its core, it takes an approach akin to mimicking biological brains, rather than the statistical approaches or heuristics common in more traditional machine learning techniques.
Deep Learning techniques differ from our traditional approaches to learning and extraction and offer an exciting opportunity for our users to see significant improvements in the accuracy of data extraction from Xtracta’s systems.
On 22 June 2023, we launched this technology with our first customer in testing mode, and we are planning a wider rollout in the coming months. We wanted to alert our partners and customers to this release as it is the biggest change to our AI technologies in the company’s history and promises both new opportunities and improvements to existing workloads. We have prepared a number of Q&As to cover some of the questions we have already been asked and to provide more information about the technology.
Is this like ChatGPT and other Generative AI?
Xtracta made the decision early last year, when undertaking this work, to use transformer-based models; this was before products like ChatGPT came to market. As it turns out, generative AI products like ChatGPT, Bard etc. that are currently taking the world by storm take a similar technological approach in that they use transformer-based models. There are some key differences between Xtracta’s technology and these systems, the most important being that document data extraction tasks benefit from understanding multiple modalities, i.e., text, layout and image. ChatGPT, Bard etc. are purely text-based and are thus unable to accurately represent the inherent complexities of visually rich documents.
Xtracta’s next-generation approach to extraction utilizes multi-modal transformers which incorporate all modalities of visually rich documents. Looking ahead, we plan to initiate research into multi-modal generative transformers. This upcoming initiative, set to commence later this year, aims to enable new extraction capabilities such as OCR-free end-to-end extraction (without a discrete OCR process), prompting and more.
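To illustrate what “multi-modal” means in practice, the sketch below (PyTorch) fuses a token’s text identity, its position on the page (layout), and a visual feature for its region into one representation before a transformer encoder processes the document. The class names, dimensions and fusion-by-summation are illustrative assumptions in the style of published multi-modal document models, not Xtracta’s actual architecture.

```python
# Minimal sketch: fusing text, layout and image signals per token.
# All names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class MultiModalEmbedding(nn.Module):
    def __init__(self, vocab_size=30522, hidden=256, max_coord=1000):
        super().__init__()
        self.text_emb = nn.Embedding(vocab_size, hidden)   # token identity (text modality)
        self.x_emb = nn.Embedding(max_coord, hidden)        # token x-position (layout modality)
        self.y_emb = nn.Embedding(max_coord, hidden)        # token y-position (layout modality)
        self.img_proj = nn.Linear(768, hidden)              # visual feature for the token's region (image modality)

    def forward(self, token_ids, boxes, region_feats):
        # boxes: (batch, seq, 2) integer (x, y) page coordinates in [0, max_coord)
        text = self.text_emb(token_ids)
        layout = self.x_emb(boxes[..., 0]) + self.y_emb(boxes[..., 1])
        image = self.img_proj(region_feats)
        return text + layout + image                         # fused per-token representation

# A standard transformer encoder then operates over the fused embeddings.
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True), num_layers=2)

tokens = torch.randint(0, 30522, (1, 16))
boxes = torch.randint(0, 1000, (1, 16, 2))
feats = torch.randn(1, 16, 768)
contextual = encoder(MultiModalEmbedding()(tokens, boxes, feats))
print(contextual.shape)   # (1, 16, 256) contextual token representations
```

A text-only model sees only the first of these three signals, which is why it struggles to represent visually rich documents.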
What benefits will this bring?
The new technology has shown much enhanced accuracy for extraction, especially for documents where classification and per-class models are not in use or have not yet been trained. For most clients, these are typically situations where new designs and formats of documents arrive, for example in invoice processing when a document with a new supplier’s design is received.
One of the key goals is to lower the accuracy differential in such instances so that documents achieve extremely high accuracy from the very first extraction, rather than only after a trained model has been generated.
What are the new opportunities?
Xtracta has long offered extraction from a wide range of documents. Less-structured documents – i.e., those that are more text-heavy – have always been more challenging, as we and our clients typically did not have the volume of data required to achieve optimal results, given the large degree of variety that usually came with these documents (and thus the need for large training sets).
Transformer models allow us to use other documents in the training process (even those not of the type the model will be applied to) to ensure we have the large volumes of training data necessary for model generation.
Therefore, one of the key new opportunities for our partners and customers is the ability to extract from new, text-heavy document types with much less training data than was previously required.
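The underlying idea is transfer learning: an encoder pre-trained on large, mixed document corpora is reused, and only a small task-specific head is trained on a handful of labelled examples of the new document type. The sketch below (PyTorch) is a minimal, generic illustration of that principle; the model, field names and data sizes are placeholders, not Xtracta’s actual training pipeline.

```python
# Minimal sketch of transfer learning with a small labelled set.
# The encoder stands in for a model whose weights were pre-trained elsewhere.
import torch
import torch.nn as nn

hidden, num_fields = 256, 5                              # e.g. fields: total, date, supplier, ...
encoder = nn.TransformerEncoder(                         # assume weights loaded from pre-training
    nn.TransformerEncoderLayer(d_model=hidden, nhead=8, batch_first=True), num_layers=4)
head = nn.Linear(hidden, num_fields)                     # small task-specific classifier

for p in encoder.parameters():
    p.requires_grad = False                              # keep the pre-trained knowledge frozen

optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Tiny labelled set: 6 documents, 32 tokens each, one field label per token.
small_x = torch.randn(6, 32, hidden)                     # stands in for encoded document tokens
small_y = torch.randint(0, num_fields, (6, 32))

for epoch in range(10):
    logits = head(encoder(small_x))                      # (6, 32, num_fields)
    loss = loss_fn(logits.reshape(-1, num_fields), small_y.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Because the encoder already understands documents in general, the new document type only has to teach the small head where its fields sit.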
When is this coming?
Our first iteration of this technology will be to introduce globally trained models for common document types. As our most common document types primarily relate to finance (such as invoices, purchase receipts, statements etc.), these will be the first document types with the technology enabled. Rollout for other document types where Xtracta maintains global models will come next, followed by custom document types.
Will pricing change for access to this technology?
For nearly all Xtracta users the answer is no – this new technology will be available to practically all clients under their existing pricing agreements with Xtracta. For those clients where pricing changes will be needed for access to this technology, we will be making individual contact with you. There may be exceptions in cases where clients want their own specific models trained. In such instances, pricing will be indicated beforehand, with consideration of elements such as the cost of the hardware needed for model training.
For those clients using private clouds, on-premise deployments, or your own accounts on public clouds, you may need to be prepared for greater infrastructure costs to access the necessary hardware types or greater capacity on your existing hardware.
How can I start using deep-learning models in my workflows?
We aim to have deep learning available in the first processing region (Australia) on our public cloud in late July 2023. Other regions will follow in August/September 2023. If you would like to add any of your workflows within the Australian region, please email support@xtracta.com including your workflow ID and our team will add you to the waitlist. Initially we will support the following document types:
- Invoices
- Credit Notes
- Purchase Receipts
- Statements
- Land Deeds (USA only)
For those with private clouds or on-premise / customer clouds, please contact us regarding deployment within your environment.
How is learning impacted?
Deep learning differs quite substantially from more traditional machine learning in that the volume of data used to train individual models is at a completely different scale (i.e. it could include millions of documents, versus a statistical machine learning model that can often be built with as few as 5-6 documents). Especially considering the superior untrained accuracy levels, we expect there to be fewer individually trained classes with these models deployed. That being said, we are continuing to support our existing models and training methodology, and if these are found to provide a superior result, they will still take precedence over the deep learning model.
In addition, we are working on ways of training small additions to the core deep learning model for individual workflows (or even classes) that may be able to bring the best of both worlds into a unified system.
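One established way to realise such “small additions” is adapter tuning: tiny residual modules trained per workflow (or class) while the shared core model stays frozen. The sketch below (PyTorch) shows the general shape of that technique; whether Xtracta’s unified system takes exactly this form is an assumption, and the workflow IDs are hypothetical.

```python
# Illustrative sketch of per-workflow adapters on a shared, frozen core model.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Tiny residual bottleneck trained per workflow (or class)."""
    def __init__(self, hidden=256, bottleneck=16):
        super().__init__()
        self.down = nn.Linear(hidden, bottleneck)
        self.up = nn.Linear(bottleneck, hidden)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))      # core output + small learned correction

core = nn.TransformerEncoder(                               # shared core deep learning model
    nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True), num_layers=4)
for p in core.parameters():
    p.requires_grad = False                                 # the core is never retrained per workflow

adapters = {"workflow_1234": Adapter(), "workflow_5678": Adapter()}  # hypothetical workflow IDs

x = torch.randn(1, 32, 256)                                 # encoded document tokens
out = adapters["workflow_1234"](core(x))                    # shared core + workflow-specific adjustment
print(out.shape)
```

The appeal of this pattern is that each workflow’s customisation is only a few thousand parameters, so it can be trained quickly on a small number of documents while still benefiting from the large shared model.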
I use a private cloud(s) – can I use this technology?
With deep learning technologies, alongside standard computing infrastructure such as a CPU, system memory (RAM) and persistent storage (disk), a specialized type of hardware is typically required: a GPU (Graphics Processing Unit). This hardware has been used for many years to render screen frames for video games and has proven particularly useful for deep learning approaches due to its highly parallel architecture.
That said, the deep learning models that Xtracta has built can also be run on CPU only, albeit much, much more slowly than with GPUs. We are currently evaluating the performance differences, but it may be that for lower document volume environments, running models on CPU is sufficient, especially where fast turnaround times for documents are not critical. We do expect that in most instances, the use of GPUs will offer more efficient and cost-effective processing.
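For readers who want a feel for the CPU/GPU trade-off, the sketch below (PyTorch) runs the same placeholder model on whichever device is available and times a batch; the hardware only changes throughput and latency, not the extraction result. The model and batch sizes are assumptions for illustration.

```python
# Minimal sketch: the same model runs on CPU or GPU; only speed differs.
import time
import torch
import torch.nn as nn

model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True), num_layers=6)
batch = torch.randn(8, 512, 256)                     # a small batch of encoded document pages

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model, batch = model.to(device), batch.to(device)

model.eval()
with torch.no_grad():                                # inference only, no training
    start = time.perf_counter()
    _ = model(batch)
    if device.type == "cuda":
        torch.cuda.synchronize()                     # wait for GPU work to finish before timing
    print(f"{device.type}: {time.perf_counter() - start:.3f}s per batch")
```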
All private clouds that Xtracta currently provides are with providers that can offer GPU-based instances. GPU instances are more expensive than CPU-only instances, but we expect these prices to converge more closely as demand for GPU processing grows across a range of applications in the future.
I use Xtracta on my own hardware or within my own account on a cloud provider – can I use this technology?
The short answer is yes. The information in the Q&A above regarding private clouds is relevant here too. We do understand that for many IT departments, the idea of including specialized hardware in their servers may seem like a throwback to an earlier era of computing, so the concept of providing specialized hardware (GPUs) for software applications may seem odd, and they may take time to deploy it. However, with more and more generative AI and deep learning technologies coming to market, we expect that Xtracta will not be the only application for which your company will need to run GPUs in its datacenters.
Xtracta can provide guidance on what hardware should be deployed.
If you are using a major public cloud provider, practically all now offer GPU options, and it is usually as simple as changing instance types to add support for GPU processing.
CPU-based processing is also available and may be more suitable depending on your document volumes and your IT department’s capacity to deploy new hardware.