Last Updated: Apr 30, 2021
If the document contains tables, Doc Reader automatically identifies the tabular structures and extracts their contents as tables under Line-Item details in the right panel.
Click the Update Data button. Table data is extracted from the PDF and populated in the table in the right panel.
If both the Row and Column Definition option and the Detect Rows based on line separation option are enabled, line separation is tried first. If Doc Reader fails to identify the lines, Row and Column Definition is used instead.
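The fallback order described above can be pictured with a minimal Python sketch. This is an illustration only, not Doc Reader's implementation: `extract_rows` and the two detector callables are hypothetical names, and "fails to identify lines" is assumed to mean that the detector returns no rows.

```python
from typing import Callable, List, Optional

Row = List[str]
Detector = Callable[[object], List[Row]]  # hypothetical: takes a page, returns rows


def extract_rows(
    page: object,
    detect_by_lines: Optional[Detector] = None,
    detect_by_definition: Optional[Detector] = None,
) -> List[Row]:
    """Try line separation first, then fall back to Row and Column Definition.

    Passing None for a detector models that option being disabled.
    """
    for detector in (detect_by_lines, detect_by_definition):
        if detector is None:  # option not enabled
            continue
        rows = detector(page)
        if rows:  # assumption: an empty result means detection failed
            return rows
    return []


# Line separation finds nothing, so the definition-based result is used.
no_lines = lambda page: []
by_definition = lambda page: [["1", "Widget"]]
print(extract_rows("page-1", no_lines, by_definition))
```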
For documents where a table extends across multiple pages but the table headings occur only at the start of the table, enable the Headers at starting page only option to extract the table data from the whole document.
For documents where a table extends across multiple pages, disable the Stop at the first end of table option if you wish to extract the contents from all the pages.
By default, the Stop at the first end of table option is enabled, and only the table content from the table header to the first end of the table is extracted.
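As a rough illustration of the Stop at the first end of table option, the toy model below treats a document as a sequence of table segments, each ending where Doc Reader would detect an end of table. The function name `collect_rows` and the segment representation are assumptions made for this sketch and do not reflect the product's internals.

```python
from typing import List


def collect_rows(
    table_segments: List[List[str]],
    stop_at_first_end: bool = True,  # mirrors the option's default (enabled)
) -> List[str]:
    """Return rows from the first segment only, or from every segment.

    table_segments holds one list of rows per detected table segment,
    in page order (a hypothetical representation for illustration).
    """
    if not table_segments:
        return []
    if stop_at_first_end:
        # Extraction ends at the first detected end of table.
        return list(table_segments[0])
    # Option disabled: keep going through the remaining pages.
    return [row for segment in table_segments for row in segment]


segments = [["row 1", "row 2"], ["row 3"]]  # table continues on a second page
print(collect_rows(segments))                           # first segment only
print(collect_rows(segments, stop_at_first_end=False))  # all segments
```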
Use the Data Capture Rule option to choose which table to extract.
Ensure that you select the datapoint above the table to be extracted.
If the rows and columns in the table of the document are not aligned properly, Doc Reader cannot identify the rows of the table correctly.
You can use Row and Column Definition to identify the rows and columns for extraction. Based on the parameters provided, the rows are marked, and Doc Reader identifies the row to be extracted.
Click the icon and select the Row and Column Definition option. The Row and Column Definition window is displayed in the right panel.
Depending on the position of data in the table, the row lines are automatically captured. If the row lines are not separating the rows correctly, you can use this option to define the exact location of the row separator.
Provide values for the required parameters (Key Column, Alignment, and Column Starts After) based on the alignment of the data in the table.
Click the Update Data button. The rows are identified, and the data is updated in the table.
In the PDF below, the Quantity column is selected in the Key Column field and Top is selected in the Alignment field. The row marker starts from the top of each text in the Quantity column and extends to the top of the text in the next record. The rows are correctly identified, and the Description column is also correctly displayed.
Change the Alignment field to Bottom. The row marker now starts from the bottom of each text in the Quantity column and extends to the bottom of the text in the previous record. The rows are identified, but the Description column is misaligned. So, for this PDF, Alignment must be set to Top.
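The difference between the two Alignment values can be sketched in Python. This is a simplified model under stated assumptions: each key-column text is reduced to a hypothetical `(top, bottom)` pair of y-coordinates that grow down the page, the first and last rows are closed off by assumed `table_top` and `table_bottom` values, and `row_bands` is an illustrative name, not a Doc Reader function.

```python
from typing import List, Tuple

Box = Tuple[float, float]   # (top, bottom) of one key-column text; y grows downward
Band = Tuple[float, float]  # (start, end) of one row marker


def row_bands(
    key_boxes: List[Box],
    alignment: str,
    table_top: float,
    table_bottom: float,
) -> List[Band]:
    """Compute row markers from the key-column texts for a given alignment."""
    boxes = sorted(key_boxes)
    if alignment == "Top":
        # Each row runs from the top of its key text to the top of the next one.
        starts = [top for top, _ in boxes]
        ends = starts[1:] + [table_bottom]
    elif alignment == "Bottom":
        # Each row runs from the bottom of the previous key text to its own bottom.
        ends = [bottom for _, bottom in boxes]
        starts = [table_top] + ends[:-1]
    else:
        raise ValueError(f"unsupported alignment: {alignment!r}")
    return list(zip(starts, ends))


quantity_boxes = [(100, 110), (150, 160), (200, 210)]
print(row_bands(quantity_boxes, "Top", table_top=90, table_bottom=250))
print(row_bands(quantity_boxes, "Bottom", table_top=90, table_bottom=250))
```

In this model, a multi-line description that starts level with the first quantity (y = 100) and wraps down to y = 140 stays inside the first Top band (100 to 150), whereas the first Bottom band ends at y = 110 and pushes the wrapped lines into the second row. That is one way a Description column can end up misaligned when Bottom is chosen for a table whose text is anchored at the top of each row.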