How We’re Progressing OCR AI Technology

October 28, 2021

How We’re Progressing Artificial Intelligence

This post covers how Xtracta is advancing its OCR AI technology and machine learning, including where deep learning fits into the development of general training models.

Controlled Feature Set Machine Learning

When introducing learning models to new clients, we have chosen to use controlled feature set machine learning models. This significantly reduces the number of training sets and the number of samples customers need to train with.

One advantage of this approach is that users can implement our software and start using it much sooner. Other products in the market require large volumes of data and a lengthy timeframe to train models, whereas models created through controlled feature sets need very little training time before they start producing results.

General Models and Where Deep Learning Fits In

In Xtracta’s machine learning world, a ‘general model’ is a model that customers can start using straight away to get results for common document types such as contracts, invoices, and receipts, without having to spend a lot of time on training.

Xtracta currently provides generic extraction models for a number of common Intelligent Document Processing (IDP) use cases. This means clients can start extracting data without having to train their own models, although most clients will look to improve results by training their own specific models where the generic models are not as accurate as they need. Xtracta is currently looking into how to produce improved generic models that clients can then build upon.
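As a rough illustration of the fallback described above, the sketch below picks a client-trained model for a document type when one exists and otherwise falls back to a generic model. Every name in it (`GENERIC_MODELS`, `select_model`, and the model identifiers) is hypothetical and is not taken from Xtracta’s API.

```python
from typing import Dict, Optional

# Hypothetical identifiers for generic models, keyed by document type.
GENERIC_MODELS: Dict[str, str] = {
    "invoice": "generic-invoice",
    "receipt": "generic-receipt",
    "contract": "generic-contract",
}


def select_model(doc_type: str, client_models: Dict[str, str]) -> Optional[str]:
    """Prefer a client-trained model; otherwise fall back to a generic one.

    Returns None when neither kind of model exists for the document type.
    """
    if doc_type in client_models:
        return client_models[doc_type]
    return GENERIC_MODELS.get(doc_type)
```

A new client with no trained models would receive the generic model for each supported document type, and would start receiving their own model as soon as one is registered for that type.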

Xtracta has very large volumes of training data, partly because most of our users share a single instance of our system through our public cloud. While each user’s documents are kept private and secure from other users, our learning systems are able to combine the documents from all users on that system. This gives Xtracta a very high volume of information that can be data mined.

However, while we do use this data to build on some of our models, the sheer volume means we must be selective about how we create the training sets used to train general models.

Xtracta is progressing toward incorporating more aspects of deep learning into training models. Through a deep learning approach, we are looking to create larger models that can utilise a much more complete set of data for training and, therefore, generalise better to the problem.

Through these advancements, Xtracta clients will also be able to build upon larger base models rather than having their own separate models built each time they train.

The Future of Optical Character Recognition (OCR)

At the same time, we are constantly introducing new features within Xtracta’s existing system. For example, we regularly add new specialised vendors for clients to work with, which means we can offer our clients an ever-growing list of alternatives for optical character recognition (OCR).

This means if a user is struggling with the accuracy of a particular OCR engine, they have the freedom to simply switch and use a different engine that better meets their needs.

As that list of engines expands, so does the capability of the system to process lower-quality and handwritten documents, including documents in a multitude of different written scripts. This is because different engines have different strengths in different circumstances. With a wide range of engines to choose from, clients can manage a wide variety of OCR issues simply by selecting the engine that performs best for their documents.
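To make the idea of comparing engines concrete, here is a minimal sketch of one way a client might try several engines on the same document and keep the most confident result. It is not how Xtracta selects engines: the text above describes users choosing an engine themselves, and this example swaps in automatic selection by confidence score purely for illustration. The `OcrResult` type, the `best_result` function, and the assumption that every engine returns a `(text, confidence)` pair are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# Hypothetical engine interface: takes raw document bytes and returns the
# recognised text plus a confidence score assumed to lie in [0.0, 1.0].
Engine = Callable[[bytes], Tuple[str, float]]


@dataclass
class OcrResult:
    engine: str
    text: str
    confidence: float


def best_result(document: bytes, engines: Dict[str, Engine]) -> Optional[OcrResult]:
    """Run every engine on the document and keep the most confident result.

    Returns None when no engines are supplied. Ties keep the earlier engine.
    """
    best: Optional[OcrResult] = None
    for name, run in engines.items():
        text, confidence = run(document)
        if best is None or confidence > best.confidence:
            best = OcrResult(engine=name, text=text, confidence=confidence)
    return best
```

In practice the comparison could just as easily be made by a person reviewing the output of each engine; the confidence score is only a stand-in for "performs better for this document".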

Talk to the team at Xtracta today to discover the advantages of implementing OCR data capture technology in your company

Companies can maximise their workflow efficiency and accuracy through Xtracta’s specialised approach to machine learning and intelligent document processing. Powered by artificial intelligence, the Xtracta engine offers highly accurate document classification, and general models require minimal training time to show results.

To learn more about invoice, receipt, and contract data capture possibilities with Xtracta, get in touch with the team today.