Design and Execute Document Processing Task

Last Updated: Dec 28, 2022

Design Document Processing Task

The Intelligent Document Processing system of JIFFY.ai recognizes and classifies documents and extracts data from them. It provides a human-in-the-loop interface to confirm and verify the machine interpretations. It validates the extracted data using a Post Processing Task and stores the extracted data in a Document Table.

You must create the Document Table before you start designing the Document Processing task.

Perform the following steps to design a Document Processing task using the Doc Reader node.

Use Doc Reader Node

Drag the Doc Reader node from the Document processing category in the Nodes panel to the Design Studio and connect it to the Start and End nodes.
Specify the Configuration and Properties of the Doc Reader node.

In the Properties tab, select the Document Table and Post Processing Task.

Document Table

The Document Table is the persistence layer that stores, tracks, and presents the extracted contents of the document being processed. When the task is executed, the extracted fields are populated into the Document Table.

All Document Tables that are created for the HyperApp are listed in the Document Table drop-down.
Select the Document Table that you created to hold the data extracted from the document.

Post Processing Task

Post Processing Tasks validate the fields that the Doc Reader extracts from the document, so that every extraction is checked before the record is accepted.
If the validation in the Post Processing Task fails, the status of the record is updated to MANUAL_INTERVENTION_VALIDATION_FAILED. The user can then re-familiarize the document to correct the fields.
Validation error messages are displayed in the familiarization window.

All Tasks created in the HyperApp are listed in the Post Processing Task drop-down.
Select the Post Processing Task you have designed for validation.

Example: A task is created to validate whether the extracted PONumber exists in the PurchaseOrder system. If the task runs and the extracted PONumber is not found in the PurchaseOrder system, the status is updated to MANUAL_INTERVENTION_VALIDATION_FAILED. The user then re-familiarizes the document and picks the correct PONumber from the document.

Post Processing Task Prerequisites

Design the task based on the validation requirements for the extracted fields. To know more about how to design a task, click here.
Add parameters for UUID and TablePath in the Start node of the Post Processing Task. The data for these parameters is passed from the Doc Reader node, and the parameters link the Post Processing Task to the main task.
If validation fails, update the status of the record to MANUAL_INTERVENTION_VALIDATION_FAILED using the Set Status For Document function.

The Post Processing Task must not contain any Doc Reader node or Task node in it.
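
The validation flow described above can be sketched in Python. This is only an illustration of the logic, not JIFFY.ai code: the function set_status_for_document, the in-memory tables dictionary, and the known_po_numbers set are hypothetical stand-ins for the Set Status For Document function, the Document Table, and the PurchaseOrder system. Only the status value and the UUID and TablePath parameters come from this article.

```python
# Illustrative sketch only. Every name here except the status value is an assumption.
VALIDATION_FAILED = "MANUAL_INTERVENTION_VALIDATION_FAILED"


def set_status_for_document(document_table, uuid, status):
    """Hypothetical stand-in for the Set Status For Document function."""
    document_table[uuid]["status"] = status


def post_process(uuid, table_path, tables, known_po_numbers):
    """Validate the extracted PONumber of the record identified by UUID and TablePath.

    Returns True if validation passes, False if the record was flagged.
    """
    document_table = tables[table_path]
    record = document_table[uuid]
    if record["fields"].get("PONumber") not in known_po_numbers:
        set_status_for_document(document_table, uuid, VALIDATION_FAILED)
        return False
    return True


# Sample data. The status before validation is left as None because
# this sketch does not assume what the real pre-validation status is.
tables = {
    "invoices": {
        "doc-1": {"status": None, "fields": {"PONumber": "PO-1001"}},
        "doc-2": {"status": None, "fields": {"PONumber": "PO-9999"}},
    }
}
known_po_numbers = {"PO-1001", "PO-1002"}

ok_1 = post_process("doc-1", "invoices", tables, known_po_numbers)
ok_2 = post_process("doc-2", "invoices", tables, known_po_numbers)
```

In this sketch, only the record whose PONumber is missing from the PurchaseOrder stand-in is flagged; the other record is left untouched.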

Input Parameters to the Doc Reader Node

The following input parameters are mapped to the Doc Reader node.

Location: Path of the document. For example:

C:\Documents\Invoice.pdf

Password: Password of the document, if the document is password-protected.
Category: Classifies the document by type and identifies the classification group that the document falls in, based on the document format.
pdfFID: Jiffy File Server ID of the document. If the pdfFID of a document on the server is mapped, Doc Reader executes on the server.
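
As a rough illustration of how these inputs relate, the sketch below models them as a Python dataclass. The class name, the field types, the runs_on_server helper, and the sample pdfFID value are assumptions made for illustration; only the parameter names and the rule that a mapped pdfFID means server-side execution come from the list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DocReaderInputs:
    """Hypothetical model of the Doc Reader input parameters."""
    location: Optional[str] = None   # Location: path of the document
    password: Optional[str] = None   # Password: only for password-protected documents
    category: Optional[str] = None   # Category: optional document category
    pdf_fid: Optional[str] = None    # pdfFID: Jiffy File Server ID of the document

    def runs_on_server(self) -> bool:
        """A mapped pdfFID means Doc Reader executes on the server."""
        return self.pdf_fid is not None


local_doc = DocReaderInputs(location=r"C:\Documents\Invoice.pdf")
server_doc = DocReaderInputs(pdf_fid="12345")  # made-up ID for illustration
```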

Category Identification

When a document is passed through the Doc Reader node, the node automatically classifies the document and assigns it a category name. It also inserts a record for the document in the Document Table selected in the Properties tab, and that record is populated with the assigned category.
The category of the document can be defined in two ways:

1. Map the Category in the Task

You can choose a specific category for the document. Every time the task is run, the documents processed are assigned the same category that you map in the task.

Select the category tag from the right-hand side panel and click the Element Map button.
Enter the desired category for the document in the What to get? field.

You can map variables to the Doc Reader node instead of providing constant values.

2. Doc Reader Defines the Category for the Document

Doc Reader assigns a unique category to the document if the Category is not mapped in the task.

When documents of similar formats are processed, Doc Reader assigns the same category to them. As a result, all documents of a similar format share a category, and the same ML algorithm is applied for data extraction.

If the processed document uses a different template, a new unique category is assigned to it.
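
The two ways of defining a category can be summarized in a short Python sketch. How Doc Reader actually decides that two documents have a similar format is not described here, so the format_signature string, the CategoryAssigner class, and the generated category names are all hypothetical placeholders for that logic.

```python
import itertools


class CategoryAssigner:
    """Hypothetical model of category assignment; not the Doc Reader implementation."""

    def __init__(self):
        self._by_signature = {}              # format signature -> assigned category
        self._counter = itertools.count(1)   # source of new unique category names

    def assign(self, format_signature, mapped_category=None):
        # Option 1: a category mapped in the task always wins.
        if mapped_category is not None:
            return mapped_category
        # Option 2: reuse the category for a known format, or create a new one.
        if format_signature not in self._by_signature:
            self._by_signature[format_signature] = f"category-{next(self._counter)}"
        return self._by_signature[format_signature]


assigner = CategoryAssigner()
a = assigner.assign("vendor-a-invoice-layout")
b = assigner.assign("vendor-a-invoice-layout")   # same format, same category
c = assigner.assign("vendor-b-invoice-layout")   # new template, new category
d = assigner.assign("vendor-b-invoice-layout", mapped_category="Invoices")
```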

Execute Task

When the task is executed, the fields are extracted from the document.

To know more about how to execute a task, click here.

Click the corresponding icon to view or edit the Post Processing Task.

Click the corresponding icon to view the Result of Execution window.

The Result of Execution window displays Input, Output, and Run Info.

Input: The input variables that are mapped to the node are displayed.
Output: The TablePath, the Run ID of the Post Processing Task, and the documentUUID are displayed.
Run Info: The Run Details and Configuration Details are displayed.

Navigate to the Datasets listing screen to view the Document Table specified in the task.

The Document Table is populated with default columns, and the status of the document is NEW_TYPE.

Use Other Document Processing Nodes

In addition to the Doc Reader node, you can use the following Document processing nodes, depending on the contents and type of the document being processed:

Hand Written Text Extraction node, if the document contains handwritten text or images that have to be extracted, converted, or removed.
Document Processing node, to perform operations such as converting Excel to PDF, splitting a PDF, merging PDFs, and setting validations for the fields being extracted.