The Doc Reader node extracts data from documents and stores it in a Document Table.
The Doc Reader node has a built-in classifier and machine learning models for extracting fields from documents.
You can design a Task that uses the Doc Reader node to extract fields from a document. When the Task is executed, the extracted fields are populated into the Document Table.
How does the Doc Reader Node Work?
When a document is passed through the Doc Reader node, the node automatically classifies the document and assigns it a category name. The node then inserts a record into the Document Table selected in the Properties tab, and the record is populated with the assigned category. The status of the record is then set as follows:
- If a document in that category was previously trained, the document is processed based on the past training and the status of the record changes to EXTRACTED_SUCCESSFULLY.
- If the category was not previously trained for any document, the status is set to NEW_TYPE so that you can familiarize the document.
- When the document is familiarized, verified, and approved, the status of the record changes to EXTRACTED_SUCCESSFULLY.
- You can provide additional validation rules for the extracted fields using the Post-Processing Task. If any validation fails, the status of the record is updated to MANUAL_INTERVENTION_VALIDATION_FAILED so that you can refamiliarize the document.
- After the document is refamiliarized, the status of the record changes to EXTRACTED_SUCCESSFULLY.
- If the document received is not in a proper format and you do not want to familiarize it, you can skip the training, and the status is updated to MANUAL_INTERVENTION_FOR_REVIEW.
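The status changes described above can be read as a small state machine. The following Python sketch is only an illustration of that reading, not the product's implementation or API: the status names come from this page, while the event names, the `initial_status` and `next_status` functions, and the assumption that a validation failure follows a successful extraction are all hypothetical.

```python
from enum import Enum


class Status(Enum):
    """Record statuses named in this page."""
    NEW_TYPE = "NEW_TYPE"
    EXTRACTED_SUCCESSFULLY = "EXTRACTED_SUCCESSFULLY"
    MANUAL_INTERVENTION_VALIDATION_FAILED = "MANUAL_INTERVENTION_VALIDATION_FAILED"
    MANUAL_INTERVENTION_FOR_REVIEW = "MANUAL_INTERVENTION_FOR_REVIEW"


def initial_status(category_trained: bool) -> Status:
    """Status assigned when the record is first inserted."""
    if category_trained:
        return Status.EXTRACTED_SUCCESSFULLY
    return Status.NEW_TYPE


# Hypothetical event names; the product does not document them.
TRANSITIONS = {
    (Status.NEW_TYPE, "familiarized_and_approved"): Status.EXTRACTED_SUCCESSFULLY,
    (Status.NEW_TYPE, "training_skipped"): Status.MANUAL_INTERVENTION_FOR_REVIEW,
    # Assumption: Post-Processing validation runs on an extracted record.
    (Status.EXTRACTED_SUCCESSFULLY, "validation_failed"):
        Status.MANUAL_INTERVENTION_VALIDATION_FAILED,
    (Status.MANUAL_INTERVENTION_VALIDATION_FAILED, "refamiliarized"):
        Status.EXTRACTED_SUCCESSFULLY,
}


def next_status(current: Status, event: str) -> Status:
    """Return the status that follows `event`, or raise if it is not allowed."""
    try:
        return TRANSITIONS[(current, event)]
    except KeyError:
        raise ValueError(f"No transition from {current.name} on {event!r}") from None


if __name__ == "__main__":
    status = initial_status(category_trained=False)
    for event in ("familiarized_and_approved", "validation_failed", "refamiliarized"):
        status = next_status(status, event)
        print(event, "->", status.name)
```

Keeping the transitions in a single table makes it easy to check which manual action is expected for a record in a given status.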
Configurations of Doc Reader Node
Double-click the Doc Reader node.
Enter the Configuration Name and Cluster in the Configurations tab.
Properties of Doc Reader Node
- Navigate to the Properties tab and enter the Name and Description.
- Enable the Continue on Failure and Mark run Failure on Node Fail options as required.
Provide the following details:
- Document Table: All the Document Tables that are created for the HyperApp are listed in the drop-down. Select the Document Table into which you want to populate the extracted data.
- Post-Processing Task: All the Tasks created in the HyperApp are listed in the drop-down. Select the Post-Processing Task which needs to be triggered for validating the extracted fields.
- Node Version: Select the version (New or Deprecated) from the drop-down.
The New version is an improved version that can process up to 300 documents in five minutes.
By default, the version is New for all new Tasks and Deprecated for all existing Tasks.
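Read as an average rate, the figure of 300 documents in five minutes is about one document per second. The Python sketch below turns that into a rough lower bound on the time for a batch. It assumes throughput scales linearly with batch size, which this page does not state, and the function and constant names are hypothetical.

```python
# Figures taken from this page: up to 300 documents in five minutes.
MAX_DOCS_PER_WINDOW = 300
WINDOW_SECONDS = 5 * 60


def min_processing_seconds(doc_count: int) -> float:
    """Best-case seconds to process doc_count documents at the stated peak rate.

    Assumption: throughput scales linearly; real runs may be slower.
    """
    if doc_count < 0:
        raise ValueError("doc_count must be non-negative")
    return doc_count * WINDOW_SECONDS / MAX_DOCS_PER_WINDOW


if __name__ == "__main__":
    for count in (50, 300, 1200):
        print(count, "documents: at least", min_processing_seconds(count), "seconds")
```

Because the stated figure is an upper limit ("up to"), treat the result as a best case rather than an expected duration.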
Result of Execution in Doc Reader Node
After the Task is executed, the Result of Execution window is displayed with the Input, Output, and Run Info tabs.
- Input: The input variables that are mapped to the node are displayed.
- Output: The TablePath, the Run ID of post-processing task, and documentUUID are displayed.
- Run Info: The Run Details and Configuration Details are displayed.
  - Run Details: Run ID, Sequence Number, Iteration ID, Iteration Start Time, Iteration End Time, Iteration Time (in seconds), and Total Node Execution Time (in seconds).
  - Configuration Details: Configuration Name, Document Type, Clusters, and Config Level.