Last Updated: Apr 27, 2020
The Doc Reader node extracts data from PDF documents and stores the details in the Doc table. It extracts the data based on the definition of the Doc table that the designer selects in the node properties. Refer to Doc Table to know more.
The Doc Reader has a built-in classifier and built-in machine learning models for extracting fields from PDF, doc, and image files. Depending on the schema selected while creating the Doc table, the relevant ML models are triggered automatically.
The Doc Reader uses ABBYY FineReader internally. ABBYY FineReader must be installed on the bot machine where the Doc Reader executes if the OCR engine is to be invoked. Note, however, that the Doc Reader identifies the type of document and invokes the OCR engine only when required, so the OCR engine may not be invoked for a text PDF.
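The OCR-only-when-required behavior can be pictured with a minimal Python sketch. This is not the Doc Reader's actual implementation: the function name `needs_ocr`, the `min_chars` threshold, and the idea of inspecting each page's embedded text layer are assumptions made purely for illustration.

```python
from typing import Sequence


def needs_ocr(page_texts: Sequence[str], min_chars: int = 20) -> bool:
    """Return True when any page lacks a usable embedded text layer.

    Hypothetical helper: `page_texts` holds the text already embedded in
    each page of a PDF (empty for scanned images). `min_chars` is an
    assumed threshold, not a product setting.
    """
    if not page_texts:
        # No pages with text at all: treat as an image-only document.
        return True
    return any(len(text.strip()) < min_chars for text in page_texts)


# A text PDF: every page carries embedded text, so OCR is skipped.
text_pdf = ["Invoice No. 1001, dated 27 April 2020", "Total amount due: 450.00"]
# A scanned PDF: pages carry no embedded text, so OCR is required.
scanned_pdf = ["", ""]

print(needs_ocr(text_pdf))     # False
print(needs_ocr(scanned_pdf))  # True
```

Checking per page rather than per document is a design choice in this sketch: a mostly textual PDF with one scanned page would still be routed to OCR.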
The training screens for teaching the system to interpret new documents are created automatically by the system, based on the Doc table definition provided in the node.
When a PDF document is passed to the node, it automatically determines whether the document is of a new type that it is seeing for the first time and, based on that, tags the document with one of the statuses.
Double-click on the Doc Reader node.
Select an existing configuration if it meets the user's requirements, or create a new configuration. The user can also edit or copy an existing configuration by clicking the Edit or Copy icon displayed against that configuration.
Click on the New Configuration radio button to create a new configuration if required.
Type the configuration name and the cluster.
The cluster is not required in design mode. It is required only in execution mode.
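The rule that the cluster is needed only in execution mode can be expressed as a small check. The following Python sketch is illustrative only: `DocReaderConfig`, `validate_config`, and the `execution_mode` flag are hypothetical names, not part of the product.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DocReaderConfig:
    """Hypothetical stand-in for a Doc Reader configuration."""
    name: str
    cluster: str = ""


def validate_config(config: DocReaderConfig, execution_mode: bool) -> List[str]:
    """Return a list of problems; an empty list means the config is usable."""
    problems = []
    if not config.name.strip():
        problems.append("configuration name is required")
    # The cluster may be left blank while designing, but not while executing.
    if execution_mode and not config.cluster.strip():
        problems.append("cluster is required for execution mode")
    return problems


design_only = DocReaderConfig(name="invoices")
print(validate_config(design_only, execution_mode=False))  # []
print(validate_config(design_only, execution_mode=True))
# ['cluster is required for execution mode']
```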
Double-click on the Doc Reader node and then click on the Properties tab.
Provide the following details:
Name: Name of the node. A default name is displayed in this field, which the user can edit according to the task and the intent of using the node.
Description: A short note on the purpose of the node.
Document Table: A drop-down field that lists all the Doc tables created for the App, from which the required table can be selected.
Continue on Failure:
Post-Processing Task: A drop-down field that lists all the tasks created for the App, from which the user can select a task to be executed after the UI-based validations. The UUID of the record being processed is passed to this task. The relevant status must be set within the post-processing task. The post-processing task must not contain any PDF node, either in its task design or in its Task node. The selected task must not be the same as the current working task.
Mark run Failure on Node Fail: When this field is ON and the node execution fails, the complete task execution is marked as failed.
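The two restrictions on the Post-Processing Task (it must differ from the current task and must not contain a PDF node) can be summarized in a short Python sketch. Everything here is hypothetical: the function name, the task names, and the `"PDF"` node-type label are assumptions for illustration, and the list of node types is assumed to already include nodes reached through any Task node.

```python
from typing import List


def check_post_processing_task(
    current_task: str,
    selected_task: str,
    selected_task_node_types: List[str],
) -> List[str]:
    """Return the rule violations for a candidate post-processing task."""
    problems = []
    if selected_task == current_task:
        problems.append("post-processing task must differ from the current task")
    # "PDF" is an assumed label for the PDF node type.
    if any(node_type.upper() == "PDF" for node_type in selected_task_node_types):
        problems.append("post-processing task must not contain a PDF node")
    return problems


print(check_post_processing_task("ReadDocs", "SetStatus", ["Update Table"]))
# []
print(check_post_processing_task("ReadDocs", "ReadDocs", ["PDF"]))
# ['post-processing task must differ from the current task',
#  'post-processing task must not contain a PDF node']
```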
Refer to the Document Processing sample task to learn how to use the Doc Reader node in a task.