If the document contains tables, Doc Reader automatically identifies the tabular structures and extracts their contents as tables in the right-hand panel under Line-Item details.

  1. Verify the columns predicted are correct.
  2. Click the Image description icon.

    • When Doc Reader identifies lines that separate columns, the Detect Column based on line separation toggle button is enabled.
    • When Doc Reader identifies lines that separate rows, the Detect Rows based on line separation toggle button is enabled.
    • You can override these options and use the Row and Column Definition option to identify the rows and columns of the table.
    • If a document contains multiple tables, use the Data Capture Rule option to choose the table to be extracted.
      • Choose the occurrence of the table from the drop-down.
        OR
      • Select a data point so that the table to be extracted comes after this reference label.
      • Click the SAVE button. Once the settings are saved, they apply to all documents of the same category.

        Ensure that the data point you select is located above the table to be extracted.

  3. Select the End of the table option, then select the text immediately below the table in the left panel.

  4. Click the Update Data button. Table data is extracted from the PDF and populated in the table in the right panel.
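
The two Data Capture Rule choices (table occurrence, or a reference label above the table) can be pictured with a short Python sketch. This is only an illustration and not Doc Reader's implementation: the `Table` record, its `top` coordinate, and the `pick_table` helper are hypothetical names, and the sketch assumes that vertical positions grow downward on the page.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Table:
    name: str
    top: float  # hypothetical: y position of the table's first line (smaller = higher on the page)


def pick_table(tables: Sequence[Table],
               occurrence: Optional[int] = None,
               label_y: Optional[float] = None) -> Optional[Table]:
    """Pick a table by 1-based occurrence, or the first table below a reference label."""
    ordered = sorted(tables, key=lambda t: t.top)
    if occurrence is not None:
        # Occurrence counts tables from the top of the document.
        return ordered[occurrence - 1] if 1 <= occurrence <= len(ordered) else None
    if label_y is not None:
        # The reference label must sit above the table to be extracted.
        for table in ordered:
            if table.top > label_y:
                return table
    return None
```

With three tables at y positions 100, 400, and 700, both `pick_table(tables, occurrence=2)` and `pick_table(tables, label_y=350)` select the second table. A label placed below every table selects nothing, which is why the data point has to be above the table.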

If both the Row and Column Definition and Detect Rows based on line separation options are enabled, row detection based on line separation is tried first. If Doc Reader fails to identify lines, Row and Column Definition is used.
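
The order of precedence described above amounts to a simple fallback. The sketch below is a minimal illustration under stated assumptions: `detect_rows` and the two callables passed to it are hypothetical stand-ins for the two detection methods, not Doc Reader functions.

```python
from typing import Callable, List, Optional

Rows = List[List[str]]


def detect_rows(by_lines: Callable[[], Optional[Rows]],
                by_definition: Callable[[], Optional[Rows]],
                lines_enabled: bool,
                definition_enabled: bool) -> Optional[Rows]:
    """Try line-based row detection first, then fall back to Row and Column Definition."""
    if lines_enabled:
        rows = by_lines()
        if rows:  # separator lines were found and produced rows
            return rows
    if definition_enabled:
        # Used when line detection is disabled or found no lines.
        return by_definition()
    return None
```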

Row and Column Definition

If the rows and columns in the table of the document are not aligned properly, Doc Reader cannot identify the rows of the table correctly.

You can use Row and Column Definition to identify the rows and columns for extraction. Based on the parameters provided, the rows are marked, and Doc Reader identifies the row to be extracted.

  1. Click the Image description icon and select the Row and Column Definition option. The Row and Column Definition window is displayed in the right panel.

    1. Key Column: Any column that is properly aligned can be selected as the reference or row marker.
    2. Alignment: You can select Top or Bottom from the drop-down.
      • Top: Row marker starts from the top of the text in the Key Column record and extends to the top of the text in the next record.
      • Bottom: Row marker starts from the bottom of the text in the Key Column record and extends to the bottom of the text in the previous record.
    3. Column Starts After: Defines the offset for the row marker. This is used when the data in the other columns is misaligned with the Key Column data.

      Depending on the position of data in the table, the row lines are automatically captured. If the row lines are not separating the rows correctly, you can use this option to define the exact location of the row separator.

  2. Provide values for the required parameters (Key Column, Alignment, and Column Starts After) based on the alignment of data in the table.

  3. Click the Update Data button. The rows are identified, and the data is updated in the table.
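
One way to picture how the three parameters produce row markers is the Python sketch below. It is an assumption-laden illustration, not the product's algorithm: `row_bands` is a hypothetical helper, y coordinates are assumed to grow downward, and the offset is assumed to shift every marker by a fixed vertical amount.

```python
from typing import List, Tuple

Box = Tuple[float, float]  # (top, bottom) of one Key Column entry; y grows downward


def row_bands(key_boxes: List[Box], alignment: str,
              table_top: float, table_bottom: float,
              offset: float = 0.0) -> List[Tuple[float, float]]:
    """Return one (start, end) vertical band per record in the Key Column."""
    boxes = sorted(key_boxes)
    if alignment == "Top":
        # Each row runs from the top of its key text to the top of the next one.
        edges = [top + offset for top, _ in boxes] + [table_bottom]
    elif alignment == "Bottom":
        # Each row runs from the bottom of the previous key text to the bottom of its own.
        edges = [table_top] + [bottom + offset for _, bottom in boxes]
    else:
        raise ValueError("alignment must be 'Top' or 'Bottom'")
    return list(zip(edges[:-1], edges[1:]))
```

For key entries at (100, 110), (150, 160), and (200, 210) in a table spanning 90 to 250, Top yields the bands 100 to 150, 150 to 200, and 200 to 250, while Bottom yields 90 to 110, 110 to 160, and 160 to 210.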

In the PDF below, the Quantity column is selected as the Key Column field, and Top is selected in the Alignment field.
The row marker starts from the top of each text in the Quantity column and extends to the top of the text in the next record.
The rows are correctly identified, and the Description column is also correctly displayed.

Image description

Change the Alignment field to Bottom. The row marker now starts from the bottom of each text in the Quantity column and extends to the bottom of the text in the previous record. The rows are identified, but the Description column is misaligned. For this PDF, therefore, Alignment must be set to Top.

Image description
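
The difference between the two settings can be reproduced with made-up coordinates. The sketch below does not use values from the PDF pictured above: the positions, the sample text, and the `group_lines` helper are all hypothetical, chosen so that each Description wraps onto a second line below its Quantity entry.

```python
from bisect import bisect_right
from typing import Dict, List

# Hypothetical layout (y grows downward): each Quantity entry sits on the first
# line of its record, and the Description wraps onto a second line below it.
quantity_boxes = [(100, 110), (150, 160)]  # (top, bottom) per record
description_lines = [(101, "Widget, blue,"), (113, "large"),
                     (151, "Gasket, rubber,"), (163, "small")]
TABLE_TOP, TABLE_BOTTOM = 90, 200


def group_lines(alignment: str) -> Dict[int, List[str]]:
    """Assign each Description line to the row band that contains its y position."""
    if alignment == "Top":
        edges = [top for top, _ in quantity_boxes] + [TABLE_BOTTOM]
    else:  # "Bottom"
        edges = [TABLE_TOP] + [bottom for _, bottom in quantity_boxes]
    rows: Dict[int, List[str]] = {i: [] for i in range(len(edges) - 1)}
    for y, text in description_lines:
        i = bisect_right(edges, y) - 1  # index of the band starting at or above y
        if 0 <= i < len(rows):
            rows[i].append(text)
    return rows
```

In this layout, `group_lines("Top")` keeps both lines of each Description in the correct row. `group_lines("Bottom")` pushes "large" into the second row and drops "small" below the last marker, which is the kind of misalignment described above.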
