Search

Dataset

Last Updated: Feb 14, 2022

Articles

Dataset node is used to retrieve data from Datasets using a query. The query editor supports ANSI SQL SELECT queries.

For Jiffy table and Document table, it is recommended to use Jiffy Select and Jiffy Insert/Update nodes.

Query Tab of Dataset Node

  1. Double click the Dataset node and click the Query tab.
  2. Specify query to be executed in the Query field. Query must be in the format:
    1. If using Jiffy table: select * from `JiffyTableName`
    2. If using Dataset: select * from `CsvDatasetName.csv`

To select the columns from EmployeeDetails dataset where the value in the column LastName is Smith, the query is:
Select * from `EmployeeDetails.csv` where LastName ='Smith'

To select the columns from EmployeeDetails Jiffy Table where the value in the column LastName is Smith, the query is:
Select * from {{EmployeeDetailsTable}} where LastName ='Smith'

  • Parameters in the query can be passed in the following format:

    • String values: ‘$[String]’
    • Integer values: $[Integer]

Select * from `DatasetTest.csv`where Value = '$[param]'

  • Enable the New Query Engine option to process the SQLs directly from Mongo DB and for better performance.
    • The CData JDBC Driver for MongoDB supports querying data on JIffy Tables. Click here to know more about syntax and examples for SELECT statements.

  • Get the sum of column N1 in the Table1: SELECT sum(N1) FROM {{Table1}}
  • Extract the execution state of all the records from Table1: SELECT [Execution State] from {{Table1}}
  • Get the minimum, maximum and average of the column N1 from the Table1: SELECT min(N1), max(N1), avg(N1) FROM {{Table1}}

New Query Engine option is applicable only to Jiffy Tables.

  • The symbol used is backquote` and not single quote'
  • No semicolon is required at the end of the query.

Processing Type

The query can be processed in two ways:

  1. Row Wise Processing: Records in the dataset are processed row by row.
    Specify the following details:

    • Row Limit: Output row limit. By default, it is set to 500.
    • Columns in result set (space separated): Column names to display in the result set and must be space-separated.

      This is mandatory and the specified column names should be the same as the name of the columns in the dataset.

    • Iterate on this node’s output?: Toggle to ON, if you want to iterate on the output records. 

      Use Row Wise Processing when:

      • You need each row in the dataset to be fetched as an iteration for further automation.
      • You need to process less than 1000 records.

      Row limit for Row wise Processing is 1000.

      Image description
  2. Bulk Processing: Records in the dataset are processed in bulk.
    Specify the following details:

    • Row Limit: Output row limit. By default, it is set to 500.
    • Table Name: Select the DataTable to which you want the records to be written. Default DataTables and Custom DataTables created in the task will be listed in the dropdown.
      To know more about DataTable, click here.

      Use Bulk Processing when:

      • All the records in the dataset must be used for further automation as a Datatable.
      • You need to process more than 1000 records.

      Row limit for Bulk Processing is 5000.

Image description

Properties of the Dataset Node

  1. Navigate to Properties and enter Name and Description.
  2. Enable Mark run Failure on Node Fail, Continue on no result? options as required.

Image description

Result of Execution of Dataset Node

After executing the task, the Result of Execution window displays the following tabs:

  • Input: The query executed is displayed.
  • Output: If Row Wise Processing operation was performed in the node, output tab displays the result of the executed query consisting of columns selected in Columns in result set (space separated) field.
    Click the Iteration # to see the details of each row.
    Image description
  • Run Info: The Run Details and Configuration Details are displayed.
    • Run Details include Run ID, Sequence Number, Iteration ID, and Total Node Execution Time(in seconds).
    • Configuration Details include Configuration Name, JDBC Driver Class, Username, and Config Level.
  • Formatted Output: If Bulk Processing operation was performed in the node, Formatted Output tab displays the result set as a table.
    Image description
Did you find what you were looking for?