When the Doc Reader node is executed, it processes the document and extracts the fields.
A user interface is provided for reviewing and verifying the machine's interpretations.
You can point to and correct both labels and data, which trains the Doc Reader to extract the correct fields from the document. The resulting rule-based model is applied to all documents of a similar format.

  1. On the Datasets listing page, click the name of the Document Table selected in the Properties of the Doc Reader node.
  2. Click the icon next to the processed row. The Document Familiarization window opens.

The PDF is displayed in the left-hand side panel. The fields predicted by the Doc Reader are auto-populated in the right-hand side panel.

If any field predicted by the Doc Reader needs correction, or if any field was not predicted by the Doc Reader:

  • Click the field in the right-hand side panel that needs to be corrected and select the required Label from the PDF in the left panel.
  • Correct the value of the fields by selecting the required value from the PDF in the left-hand side panel.

For example, if the Invoice Date field value is not captured correctly, click the Invoice Date field value in the right panel and select the required value for Invoice Date from the PDF in the left panel. The correct Invoice Date is then extracted.

Data Capture Rule

In scenarios where the field occurs:

  • in multiple pages, and you need to extract the first or last occurrence of the field.
  • at multiple places in a page, and you need to provide a reference position to extract the field.

In these scenarios, you can specify additional extraction rules by using the Data Capture Rule option.

Click the icon next to the field in the right panel for which you want to add additional rules. The Data Capture Rule window opens for that field.

Data Capture Rule for Invoice Date

Either of the following Data Capture Rules can be applied.

Occurrence of the Data

You can choose the occurrence of the data from the drop-down. It can be either First Occurrence or Last Occurrence.

For example, consider a ten-page PDF with a Total Amount on every page. If Last Occurrence is selected in the drop-down, the Total Amount on the last page is extracted.
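
The selection logic behind this rule can be illustrated with a minimal Python sketch. It is not the Doc Reader's implementation: the Occurrence structure, the pick_occurrence function, and the amounts are hypothetical and only show how a First Occurrence or Last Occurrence choice resolves when a field appears on several pages.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Occurrence:
    """One place where a field's value was found (hypothetical structure)."""
    page: int    # 1-based page number
    value: str   # text captured for the field


def pick_occurrence(occurrences: List[Occurrence], rule: str) -> Optional[Occurrence]:
    """Return the first or last occurrence in page order, or None if there are none."""
    if not occurrences:
        return None
    ordered = sorted(occurrences, key=lambda o: o.page)
    if rule == "First Occurrence":
        return ordered[0]
    if rule == "Last Occurrence":
        return ordered[-1]
    raise ValueError(f"Unknown rule: {rule}")


# Ten pages, each carrying a Total Amount (made-up values for illustration).
totals = [Occurrence(page=p, value=f"{p * 100}.00") for p in range(1, 11)]
chosen = pick_occurrence(totals, "Last Occurrence")
print(chosen.page, chosen.value)  # prints "10 1000.00"
```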

Relative Reference

If a document has multiple occurrences of labels that you are extracting, use this option to identify the one to be extracted.

For example, consider a PDF where the GST Number occurs in multiple places. Use this option to extract the desired GST Number, such as the one displayed below the Address.
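
One way to picture a relative reference is as a positional filter: among all candidates, keep the one closest below a chosen reference label. The Python sketch below assumes this interpretation. The Box structure, the nearest_below function, the coordinates, and the GST values are hypothetical placeholders, not the Doc Reader's actual data model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Box:
    """Text found on a page with its position (hypothetical structure)."""
    text: str
    x: float  # distance from the left edge of the page
    y: float  # distance from the top edge; larger means lower on the page


def nearest_below(candidates: List[Box], reference: Box) -> Optional[Box]:
    """Pick the candidate that lies below the reference and is closest to it."""
    below = [c for c in candidates if c.y > reference.y]
    if not below:
        return None
    # Prefer the smallest vertical gap, then the smallest horizontal offset.
    return min(below, key=lambda c: (c.y - reference.y, abs(c.x - reference.x)))


address = Box("Address", x=50, y=200)
gst_numbers = [
    Box("GST-HEADER-0001", x=400, y=40),   # page header, above the Address
    Box("GST-BODY-0002", x=50, y=260),     # directly below the Address
    Box("GST-FOOTER-0003", x=50, y=780),   # page footer
]
picked = nearest_below(gst_numbers, address)
print(picked.text)  # prints "GST-BODY-0002"
```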

Manual Review

Use Manual Review when the document being processed is not in the desired format and you want to exclude it from the auto category identification algorithm.

Click the Manual Review button in the Document Familiarization window to skip the document training process. All the fields become non-editable.

The Doc Reader is not trained on that document, and the status is updated to MANUAL_INTERVENTION_FOR_REVIEW.

Familiarize Table within the Document

If the document has tables, the Doc Reader automatically identifies the tabular structures and extracts their contents as tables in the right-hand side panel under Line-Item details.
To learn more about familiarizing tables within a document, click here.

SAVE AND APPROVE

After all the required fields are familiarized and verified, click the SAVE AND APPROVE button to save the changes and approve the category of the processed document.

The extracted fields are populated into the Document Table and the status is updated to EXTRACTED_SUCCESSFULLY.

Click the view button to view the Data in the Inline Table.

Behavior when Similar Template Document is Processed

When a document with a similar template is processed, it is assigned the same category as the already approved document. The existing rule-based model for that category is applied, and the status is automatically set to EXTRACTED_SUCCESSFULLY. All the required data is extracted from the document and auto-populated in the Document Table.
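
The reuse of an approved category can be sketched as a lookup of stored rules. This Python sketch is illustrative only: approved_rules, approve, process, invoice_rule, and the category name are hypothetical, the category is assumed to be identified already, and a status of None simply stands for "not yet approved" rather than any real status value.

```python
import re
from typing import Callable, Dict

# A rule maps raw document text to extracted field values (hypothetical shape).
Rule = Callable[[str], Dict[str, str]]

# Rules saved when a document of a category is approved (hypothetical store).
approved_rules: Dict[str, Rule] = {}


def approve(category: str, rule: Rule) -> None:
    """Remember the rule for a category, as SAVE AND APPROVE would."""
    approved_rules[category] = rule


def process(category: str, text: str) -> Dict[str, object]:
    """Apply the stored rule if the category was approved before."""
    rule = approved_rules.get(category)
    if rule is None:
        # No approved document for this category yet; a user must review it.
        return {"status": None, "fields": {}}
    return {"status": "EXTRACTED_SUCCESSFULLY", "fields": rule(text)}


def invoice_rule(text: str) -> Dict[str, str]:
    """Toy rule: capture the token that follows the 'Invoice Date:' label."""
    match = re.search(r"Invoice Date:\s*(\S+)", text)
    return {"Invoice Date": match.group(1) if match else ""}


approve("vendor-a-invoice", invoice_rule)
result = process("vendor-a-invoice", "Invoice Date: 2024-01-31\nTotal: 10.00")
print(result["status"], result["fields"])
```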

Base Document

The previously approved PDF for the category, referred to as the Base Document, is displayed on the Document Familiarization page along with the currently processed PDF.

The Base Document displays all the extracted fields as per the settings applied in reference label and table settings options.

The currently processed PDF document is auto-approved, and the Approved Date is displayed.

Auto Processing

You can enable the Auto Processing option for Document Tables with predefined schemas. The values for the predefined columns are predicted automatically by the prebuilt ML models. Using the predicted data, the documents are auto-approved without manual intervention, and the status is updated to EXTRACTED_SUCCESSFULLY automatically.
