Last Updated: Dec 28, 2022
Document Processing is the conversion of paper-based and electronic documents into digital information using a combination of Intelligent Character Recognition (ICR), Optical Character Recognition (OCR), Machine Learning (ML) algorithms, and any necessary manual intervention.
The types of documents and the nodes which are used to process them are listed below.
If the document contains an image, install ABBYY FineReader to convert the image to editable text, then pass it through the Doc Reader node to extract the data.
In JIFFY.ai, Invoice and Bill of Lading are provided as predefined schemas for ease of use. The Invoice schema comes with thirty-five predefined fields and the Bill of Lading schema with twelve. JIFFY.ai provides out-of-the-box machine learning models for these document types and extracts information from them automatically, without any training.
The model is already trained for the predefined schemas. When an Invoice or Bill of Lading is processed through the Doc Reader node, you do not need to train the ML model; the data is extracted automatically using the built-in extraction modules.
For other documents, you may have to train the model using the point-and-click familiarization environment provided.
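The split between predefined schemas and other document types can be pictured with a short Python sketch. It is illustrative only and is not JIFFY.ai code: `PREDEFINED_SCHEMAS` and `may_need_familiarization` are hypothetical names, "Purchase Order" is an assumed example of a custom document type, and only the two field counts come from the text above.

```python
# Illustrative only: hypothetical names, not part of the JIFFY.ai product.
# The field counts are the ones stated in this document.
PREDEFINED_SCHEMAS = {
    "Invoice": 35,
    "Bill of Lading": 12,
}


def may_need_familiarization(schema_name: str) -> bool:
    """Return True when the schema is not one of the predefined, pre-trained ones."""
    return schema_name not in PREDEFINED_SCHEMAS


# "Purchase Order" is an assumed custom document type used for contrast.
for name in ("Invoice", "Bill of Lading", "Purchase Order"):
    if may_need_familiarization(name):
        print(f"{name}: may need training in the familiarization environment")
    else:
        print(f"{name}: {PREDEFINED_SCHEMAS[name]} predefined fields, no training needed")
```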
Document processing is achieved in four phases:
If a Document Table is created using a custom schema, the fields are auto-extracted based on the existing trained model.
In an Invoice Processing HyperApp:
- A Document Table named InvoiceTable is created using the Invoice schema.
- A Task is designed with a Doc Reader node to extract the fields from the Invoice.
- The Task is executed to extract the fields.
- The document is familiarized, saved, and approved to train the ML engine for the category of document being processed. The approved fields are populated into InvoiceTable for further processing.
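The Invoice Processing HyperApp flow described here can be pictured as a toy data model in Python. This is a rough sketch under stated assumptions, not the JIFFY.ai API: `DocumentTable`, `doc_reader`, and `approve` are hypothetical stand-ins for the Document Table, the Doc Reader node, and the approval step, and the sample invoice values are made up.

```python
# Illustrative only: a toy model of the flow, not the JIFFY.ai API.
from dataclasses import dataclass, field


@dataclass
class DocumentTable:
    """Hypothetical stand-in for a Document Table such as InvoiceTable."""
    name: str
    schema: str
    rows: list = field(default_factory=list)


def doc_reader(document: dict) -> dict:
    """Stand-in for the Doc Reader node: returns extracted fields, not yet approved."""
    return {"fields": dict(document), "approved": False}


def approve(extraction: dict, table: DocumentTable) -> None:
    """Mark the extraction as approved and populate the table with its fields."""
    extraction["approved"] = True
    table.rows.append(extraction["fields"])


# Create the table, run the extraction, then approve the result.
invoice_table = DocumentTable(name="InvoiceTable", schema="Invoice")
extraction = doc_reader({"invoice_number": "INV-001", "total": "120.00"})  # made-up sample values
approve(extraction, invoice_table)
print(invoice_table.rows)
```

In this sketch, nothing reaches the table until `approve` is called, which mirrors the statement that only approved fields are populated into InvoiceTable.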