Document Processing: Best Practices


  1. In Document Tables, use the Singleline or Multiline Field Type for fields extracted from the document, so that any special characters in those fields are handled by the data type.

  2. Ensure that validations for the required fields are designed in the Post Processing task.
    The record status is set to MANUAL_INTERVENTION_VALIDATION_FAILED only when an extracted value fails one of the designed validations; the document can then be re-familiarized.
    When a document is processed for an existing category and no validations are designed for the extracted fields, the status of the processed document is set to EXTRACTED_SUCCESSFULLY even if a field is blank or does not meet the required rules (for example, PO Number should contain digits only).
    In that case, you will not be able to re-familiarize the document.

  3. Add every label that may identify the extracted field in that document category as a Pseudonym, so that the field is extracted whichever of those labels appears.

  4. When a received document uses the wrong template, choose the SKIP REVIEW option and skip training the ML model, so that the document is not considered during category identification.

  5. When familiarizing a table in the document, ensure that you:

    • Verify the columns predicted in the right panel under Line-Item details, and correct them if they are not the desired columns.
    • Select the End of the table option.
    • Verify that the rows are predicted correctly; if they are not, use the Row and Column Definition option to identify the correct rows.

  6. If the document being processed contains handwritten text, use the Hand Written Text Extraction Node with the required mode to handle that text as per the document processing requirements.
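
The point of item 2 can be illustrated with a small Python sketch of the kind of rule a Post Processing task might encode. This is not the product's API: the function name validate_po_record, the plain dict of extracted fields, and the field key "PO Number" are hypothetical; only the two status values come from this page. Without such a rule, a blank or non-numeric PO Number would still end up as EXTRACTED_SUCCESSFULLY.

```python
import re

# Status values named in item 2; everything else here is illustrative.
VALIDATION_FAILED = "MANUAL_INTERVENTION_VALIDATION_FAILED"
EXTRACTED_OK = "EXTRACTED_SUCCESSFULLY"


def validate_po_record(fields: dict) -> tuple[str, list[str]]:
    """Hypothetical post-processing check for a purchase-order document.

    `fields` maps field names to extracted text (an assumption for this sketch).
    Returns the status to set on the record and the list of failed rules.
    """
    errors = []
    # Treat a missing or None value the same as a blank one.
    po_number = (fields.get("PO Number") or "").strip()
    if not po_number:
        errors.append("PO Number is blank")
    elif not re.fullmatch(r"[0-9]+", po_number):
        # Digits-only rule taken from the example in item 2.
        errors.append("PO Number must contain digits only")
    status = VALIDATION_FAILED if errors else EXTRACTED_OK
    return status, errors


if __name__ == "__main__":
    print(validate_po_record({"PO Number": "45001234"}))
    print(validate_po_record({"PO Number": "PO-4500"}))
```

Returning the list of failed rules alongside the status keeps the reason for manual intervention visible to whoever re-familiarizes the document.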
