Familiarize the Document

Last Updated: Sep 13, 2021

When the Doc Reader node is executed, the document is processed and the Doc Reader extracts the fields.
A user interface is provided for confirming and verifying the machine interpretations or for training the Doc Reader.
You can point to and correct both the labels and the data, and train the Doc Reader to extract the correct fields from the document. The resulting rule-based model is applied to all documents of a similar format.

If the document table is created using Predefined Schema, the Doc Reader predicts the fields and auto-populates them in the right-hand side panel. You can correct them by choosing the exact fields from the document.
- You can enable the Auto Processing option for Document Tables with Predefined schemas. The prebuilt ML models auto-predict the values for the predefined columns. Using the predicted data, the documents are auto-approved without manual intervention, and the status is automatically updated to EXTRACTED_SUCCESSFULLY.
If the document table is created using New Schema, you must train the Doc Reader to extract the fields.
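
The choice between the two schema types can be sketched as a small decision function. This is only an illustration of the behavior described above, not product code: the names `SchemaType` and `next_step` are hypothetical, and the returned strings are plain descriptions rather than product messages.

```python
from enum import Enum


class SchemaType(Enum):
    PREDEFINED = "Predefined Schema"
    NEW = "New Schema"


def next_step(schema_type, auto_processing=False):
    """Describe what happens after the Doc Reader processes a document."""
    if schema_type is SchemaType.NEW:
        # New Schema: nothing is predicted, so the fields must be trained.
        return "train the Doc Reader to extract the fields"
    if auto_processing:
        # Predefined Schema with Auto Processing enabled.
        return "auto-approved; status set to EXTRACTED_SUCCESSFULLY"
    # Predefined Schema without Auto Processing.
    return "review the predicted fields and correct them if needed"


print(next_step(SchemaType.PREDEFINED, auto_processing=True))
print(next_step(SchemaType.NEW))
```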

Document Familiarization

On the Datasets listing page, click the name of the Document Table selected in the Properties of the Doc Reader node.
Click the icon next to the processed row. The Document familiarization window opens.

The document is displayed in the left-hand side panel, and the fields to be trained or corrected are displayed in the right-hand side panel.

If any field predicted by the Doc Reader needs correction, or if any field was not predicted by the Doc Reader, do the following:

Click the field in the right-hand side panel that needs to be corrected and select the required Label from the PDF in the left panel.
Correct the value of the fields by selecting the required value from the PDF in the left-hand side panel.

For example, if the Invoice Date field value is not captured correctly, click the Invoice Date field value in the right panel and select the required value for Invoice Date from the PDF in the left panel. The correct Invoice Date is now extracted.

Familiarize Singleline

Click the label to be familiarized in the right-hand panel.
Select the label from the document for the singleline field.
Click the value field against the label. You can draw a square around the desired value using the icon. The value field gets extracted by Doc Reader.

Examples: Invoice Number, PO Number, Total Amount, Date, etc.

Ensure that the field in the Document Table is created with the Singleline datatype.

Familiarize Multiline

Click the label to be familiarized in the right-hand panel.
Select the label from the document for the multiline field.
Click the value field to be familiarized in the right-hand panel.
Press and hold the Ctrl key and select the value from the document in the left-hand panel.
- Only the selected text gets extracted.
OR
You can draw a square around the desired text using the icon.
The text from the selected area gets extracted by Doc Reader.
- This enables extraction of the text from the multiline field without any restriction on the number of lines.

Example: Address of the Company

Ensure that the field in the Document Table is created with the Multiline datatype.

If the document has no labels, familiarize the value field alone.

Familiarize Checkbox

Select the label from the document.
Click the value field against the label. You can draw a square around the desired checkbox/radio button using the icon. The value field gets updated by Doc Reader as:
- Checked if the checkbox is ticked.
- Unchecked if the checkbox is blank.
Click the value field to preview the captured checkbox/radio button.

Ensure that the field in the Document Table is created with the Toggle datatype.

Familiarize Image

Select the label from the document.
Click the value field against the label. You can draw a square around the desired image using the icon. The captured image gets saved as a .png file.
Click the value field to preview the captured image.

Examples: Signatures, Logo, Photographs, etc. of the Company

Ensure that the field in the Document Table is created with the Image datatype.

You can fetch the values from the Document Table in Tasks using the Jiffy Select node, and use the Download from Server function to download the images.
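
The datatype requirements for the field types described so far can be summarized as a lookup. The sketch below is illustrative only: `REQUIRED_DATATYPE`, `mismatched_fields`, and the tuple layout are hypothetical names chosen for this example and are not part of the product. Only the four datatype names come from this article.

```python
# Datatypes named in this article, keyed by the kind of content being familiarized.
REQUIRED_DATATYPE = {
    "singleline": "Singleline",  # e.g. Invoice Number, PO Number
    "multiline": "Multiline",    # e.g. Address of the Company
    "checkbox": "Toggle",        # checkboxes and radio buttons
    "image": "Image",            # e.g. signatures, logos
}


def mismatched_fields(fields):
    """Return names of fields whose datatype does not suit their content kind.

    `fields` is a hypothetical list of (name, content_kind, datatype) tuples.
    """
    return [name for name, kind, datatype in fields
            if REQUIRED_DATATYPE[kind] != datatype]


fields = [
    ("Invoice Number", "singleline", "Singleline"),
    ("Company Address", "multiline", "Singleline"),  # should be Multiline
    ("Paid", "checkbox", "Toggle"),
]
print(mismatched_fields(fields))
```

A check like this, run against the planned columns before creating the Document Table, would flag a field such as the address above that was given the wrong datatype.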

Familiarize Table within the Document

If the document has tables, Doc Reader automatically identifies the tabular structures and extracts the contents as tables in the right-hand side panel under Line-Item details.
To learn more about how to familiarize a table within the document, click here.

Save and Approve

After all the required fields are familiarized and verified, click the SAVE AND APPROVE button to save the changes and approve the category of the processed document. The trained model gets applied to all documents of the same category.

The extracted fields are populated into the Document Table and the status is updated to EXTRACTED_SUCCESSFULLY.

Click the view button to view the Data in the Inline Table.

Behavior when Similar Template Document is Processed

When a document with a similar template is processed, it is assigned the same category as the already approved document. The existing rule-based model for the category is applied, and the status is automatically updated to EXTRACTED_SUCCESSFULLY. All the required data from the document is extracted and auto-populated in the Document Table.
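
The approve-once, reuse-later behavior can be pictured with a toy model. This is a conceptual sketch under stated assumptions, not the Doc Reader's implementation: the in-memory store, the function names, the category name, and the use of regular expressions as "rules" are all hypothetical, and the category is passed in directly instead of being detected from the template.

```python
import re

# Hypothetical store: category name -> {field name: regex with one capture group}.
approved_models = {}


def save_and_approve(category, field_patterns):
    """Record the rules for a category, as SAVE AND APPROVE does conceptually."""
    approved_models[category] = dict(field_patterns)


def process(category, text):
    """Apply the approved rules for the category, if any exist."""
    patterns = approved_models.get(category)
    if patterns is None:
        # No approved document for this category yet: familiarization is needed.
        return {"status": None, "fields": {}}
    fields = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, text)
        fields[name] = match.group(1) if match else None
    return {"status": "EXTRACTED_SUCCESSFULLY", "fields": fields}


save_and_approve("vendor_a_invoice", {"Invoice Number": r"Invoice Number:\s*(\S+)"})
result = process("vendor_a_invoice", "Invoice Number: INV-1042\nTotal Amount: 310.00")
print(result)
```

The first document of a category has to be familiarized by hand; every later document in that category reuses the stored rules. The sketch simplifies by reporting success even when a rule finds no match.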

Base Document

The earlier approved PDF for the category, referred to as the Base Document, is displayed on the Document Familiarization page along with the currently processed PDF.

The Base Document displays all the extracted fields as per the settings applied in data cleansing, reference label, and table settings options.

The currently processed PDF document is auto-approved, and the Approved Date is displayed.