Additional Options in Document Familiarization Window


The following additional options are available in the document familiarization window. Use them as needed for the document being familiarized.

Data Capture Rule

Use the Data Capture Rule option to specify additional extraction rules when a field occurs:

  • on multiple pages, and you need to extract the first or last occurrence of the field.
  • at multiple places on a page, and you need to provide a reference position to extract the field.

Click the icon against the field in the right panel for which you want to add additional rules. The Data Capture Rule window opens for that field.

Data Capture Rule for Invoice Date

Either of the following Data Capture Rules can be applied.

Occurrence of the Data

You can choose the occurrence of the data from the drop-down. It can be either First Occurrence or Last Occurrence.

For example, consider a ten-page PDF with a Total Amount on every page. If Last Occurrence is selected in the drop-down, the Total Amount on the last page is extracted.
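
The occurrence rule can be pictured with a minimal Python sketch. This is an illustration only, not the Doc Reader's implementation: the function name `pick_occurrence`, the rule strings, and the sample amounts are all hypothetical.

```python
from typing import List, Optional


def pick_occurrence(values_by_page: List[Optional[str]], rule: str) -> Optional[str]:
    """Return the first or last value found across pages.

    values_by_page holds one entry per page, None where the field is absent.
    The rule names "FIRST" and "LAST" are hypothetical labels for this sketch.
    """
    found = [value for value in values_by_page if value]
    if not found:
        return None
    if rule == "FIRST":
        return found[0]
    if rule == "LAST":
        return found[-1]
    raise ValueError(f"unknown rule: {rule}")


# Made-up Total Amount values on a four-page document (page 3 has none).
pages = ["100.00", "250.00", None, "975.50"]
print(pick_occurrence(pages, "FIRST"))  # 100.00
print(pick_occurrence(pages, "LAST"))   # 975.50
```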

Relative Reference

If a document has multiple occurrences of the label that you are extracting, use this option to identify which one to extract.

For example, consider a PDF where the GST Number occurs at multiple places. Use this option to extract the desired GST Number, such as the one displayed below the Address.
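
One way to think about a relative reference is as a nearest-neighbor choice around an anchor label. The sketch below assumes each text item has a top-left position with y growing downward. The `Box` type, the `nearest_below` helper, and the placeholder values are hypothetical and do not reflect how the product resolves references internally.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Box:
    text: str
    x: float  # left edge of the text on the page
    y: float  # top edge of the text; larger values are lower on the page


def nearest_below(reference: Box, candidates: List[Box]) -> Optional[Box]:
    """Pick the candidate closest to the reference among those below it."""
    below = [c for c in candidates if c.y > reference.y]
    if not below:
        return None
    # Prefer the smallest vertical gap, then the smallest horizontal offset.
    return min(below, key=lambda c: (c.y - reference.y, abs(c.x - reference.x)))


address = Box("Address", x=50, y=300)
gst_numbers = [
    Box("GST-A", x=400, y=40),   # in the page header, above the Address
    Box("GST-B", x=50, y=360),   # directly below the Address
    Box("GST-C", x=50, y=700),   # in the page footer
]
match = nearest_below(address, gst_numbers)
print(match.text if match else None)  # GST-B
```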

Skip Review

Use this option when the document being processed is not in the desired format and you want to exclude it from the auto category identification algorithm.

Click the SKIP REVIEW button in the document familiarization window to skip the document training process. All the fields become non-editable.

The Doc Reader is not trained on that document, and the status is updated to MANUAL_INTERVENTION_FOR_REVIEW.

Automatic Resolution Adjustment

Documents of the same type can have different resolutions. You can automatically adjust a document to the resolution of the base document.
The system saves the resolution details of the base document, for that specific category, as part of the document training process.
If the system fails to adjust the resolution for internal reasons, the document status will be Manual Intervention Validation Failed.
Click the icon and enable the Auto Adjust PDF resolution option to automatically scale the resolution of the current PDF to that of the base document.
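
The arithmetic behind such an adjustment can be sketched as a simple ratio between the two resolutions. This is an assumption about the general technique, not a description of the product's internals. The function names and the DPI values are hypothetical.

```python
from typing import Tuple

Region = Tuple[int, int, int, int]  # x, y, width, height in pixels


def scale_factor(current_dpi: float, base_dpi: float) -> float:
    """Ratio that maps coordinates at current_dpi onto base_dpi."""
    if current_dpi <= 0 or base_dpi <= 0:
        raise ValueError("resolutions must be positive")
    return base_dpi / current_dpi


def scale_region(region: Region, current_dpi: float, base_dpi: float) -> Region:
    """Rescale a field region so it lines up with the base document."""
    factor = scale_factor(current_dpi, base_dpi)
    x, y, width, height = region
    return (
        round(x * factor),
        round(y * factor),
        round(width * factor),
        round(height * factor),
    )


# A region located on a 150 DPI scan, mapped onto a 300 DPI base document.
print(scale_region((100, 200, 300, 50), current_dpi=150, base_dpi=300))
# (200, 400, 600, 100)
```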

Word by Word Capturing

You can select only the required words during document extraction. All new documents from version 4.8 or above follow the word split logic.
For example, if you need only Invoice Date to be selected, but the field in the document contains Invoice Date (Shipped to), extraction using the new word split logic is helpful.
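
The difference between segment-level and word-level selection can be illustrated with a short sketch. It assumes words are separated by whitespace, which is a simplification. The helper `select_words` is hypothetical and is not part of the product.

```python
from typing import List


def select_words(segment: str, wanted: List[int]) -> str:
    """Split a segment on whitespace and keep only the chosen word positions."""
    words = segment.split()
    return " ".join(words[i] for i in wanted)


segment = "Invoice Date (Shipped to)"

# Segment-level selection returns the whole text block.
print(segment)                        # Invoice Date (Shipped to)

# Word-level selection keeps only the first two words.
print(select_words(segment, [0, 1]))  # Invoice Date
```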

All documents trained in an older version continue to use the previous segmentation logic. If required, you can switch to segmentation logic using the Extract with segmentation logic toggle under the icon.

This logic is applicable only to documents in the English language, and users must use an area selector.
