Last Updated: Feb 14, 2022
When the Doc Reader node is executed, the document is processed and the Doc Reader extracts its fields. A user interface is provided for confirming and verifying the machine interpretations or for training the Doc Reader. The user can point to any incorrect label or data value and correct it, which trains the Doc Reader to extract the correct fields from the document. The resulting rule-based model is applied to all documents of a similar format.
The document is displayed in the left-hand panel, and the fields to be trained or corrected are displayed in the right-hand panel.
If any fields predicted by the Doc Reader need correction, or if any fields were not predicted by the Doc Reader, correct or familiarize them in the right panel.
For example, if the Invoice Date field value is not captured correctly, click the Invoice Date field value in the right panel and then select the required value for Invoice Date in the PDF in the left panel. The correct Invoice Date is now extracted.
For fields such as Invoice Number, PO Number, Total Amount, Date, etc., ensure that the field in the Document Table is created with the datatype Singleline.
For the Address of the Company, ensure that the field in the Document Table is created with the datatype Multiline.
If the document has no labels, familiarize the value field alone.
Ensure that the field in the Document Table is created with the datatype Toggle.
For Signatures, Logo, Photographs, etc. of the Company, ensure that the field in the Document Table is created with the datatype Image.
You can fetch the values from the Document Table in Tasks using the Jiffy Select node, and use the Download from Server function to download the images.
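The pairing of field examples and Document Table datatypes described above can be summarized as a small lookup. This is only an illustrative sketch: the dictionary, the function name `datatype_for`, and the singular field labels for images are assumptions made for the example, not Doc Reader identifiers or part of any Jiffy API.

```python
# Hypothetical lookup of the Document Table datatypes named in this section.
# The keys are illustrative field labels, not Doc Reader identifiers.
FIELD_DATATYPES = {
    "Invoice Number": "Singleline",
    "PO Number": "Singleline",
    "Total Amount": "Singleline",
    "Date": "Singleline",
    "Address of the Company": "Multiline",
    "Signature": "Image",
    "Logo": "Image",
    "Photograph": "Image",
}


def datatype_for(field_name: str) -> str:
    """Return the datatype to choose when creating the field in the Document Table."""
    try:
        return FIELD_DATATYPES[field_name]
    except KeyError:
        # Fields not listed in this section are deliberately left undecided.
        raise ValueError(f"No datatype listed for field {field_name!r}") from None
```

A checklist like this may help when creating the Document Table before familiarization; the datatype itself is still chosen in the product's table settings.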
If the document has tables, Doc Reader automatically identifies the tabular structures and extracts their contents as tables in the right-hand panel under Line-Item details. Familiarizing a table within the document is described separately.
After all the required fields are familiarized and verified, click the APPROVE button to save the changes and approve the category of the processed document. The trained model is applied to all documents of the same category. The extracted fields are populated into the Document Table and the status is updated to EXTRACTED_SUCCESSFULLY.
Click the view button to view the data in the Inline Table.
When a document with a similar template is processed, it is assigned the same category as the already approved document. The existing rule-based model for the category is applied, and the status is automatically set to EXTRACTED_SUCCESSFULLY. All the required data from the document is extracted and auto-populated in the Document Table.
The earlier approved PDF for the category, referred to as the Base Document, is displayed on the Document Familiarization page along with the currently processed PDF.
The Base Document displays all the extracted fields according to the settings applied in the data cleansing, reference label, and table settings options.
The currently processed PDF document is auto-approved, and the Approved Date is displayed.
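The approve-then-reuse flow described above can be pictured with a minimal sketch. It assumes a document is a plain dictionary and a category's rules are a mapping from field name to a key in that dictionary. `CategoryStore`, `approve`, `process`, and the `PENDING_REVIEW` placeholder are hypothetical names for illustration; only the EXTRACTED_SUCCESSFULLY status comes from this page, and the real Doc Reader model is not implemented this way.

```python
from dataclasses import dataclass, field

EXTRACTED_SUCCESSFULLY = "EXTRACTED_SUCCESSFULLY"  # status named in this page
PENDING_REVIEW = "PENDING_REVIEW"  # hypothetical placeholder, not a documented status


@dataclass
class CategoryStore:
    """Hypothetical in-memory stand-in for approved categories and their rules."""

    # category name -> {field name: key used to locate the value in a document}
    rules: dict = field(default_factory=dict)

    def approve(self, category, field_rules):
        # Approving saves the familiarized fields for the whole category.
        self.rules[category] = dict(field_rules)

    def process(self, category, document):
        # A document whose category is not yet approved still needs familiarization.
        if category not in self.rules:
            return {"status": PENDING_REVIEW, "fields": {}}
        # An approved category reuses its saved rules without manual review.
        extracted = {
            name: document.get(key) for name, key in self.rules[category].items()
        }
        return {"status": EXTRACTED_SUCCESSFULLY, "fields": extracted}


# Usage: approve one category, then process another document of that category.
store = CategoryStore()
store.approve("vendor_a_invoice", {"Invoice Date": "date", "Invoice Number": "number"})
result = store.process("vendor_a_invoice", {"date": "2022-02-14", "number": "INV-001"})
unseen = store.process("vendor_b_invoice", {"date": "2022-02-15"})
```

The point of the sketch is the lookup by category: once a category has been approved, later documents in that category skip manual familiarization, while documents in an unapproved category do not.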