Resolve issue where output gets distorted when files are longer than 2600 lines in Sagent Data Flow

Product Feature: DataFlow Server


 

Issue

Users may find that the sort order of the output is distorted in files longer than 2600 lines.

Cause

Scenario:
  • The plan is complex and has many sub-plans. To diagnose the root cause of the issue, add a splitter and a delimited text file sink before and after each sub-plan.
  • Execute the plan and check the output file of each delimited text file sink to identify the sub-plan in which the records go out of order.
  • Repeat the same approach iteratively inside that sub-plan to locate the JOIN transform after which the records get jumbled.
After identifying the JOIN transform, collect the input to the JOIN transform in a file.
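
Checking each sink file by eye is slow for files of several thousand lines. The following is a minimal sketch in plain Python, not part of Sagent, that reports the first record breaking the expected sort order in one sink file. The function name, the key column position, and the delimiter are assumptions for illustration, and keys are compared as text.

```python
import csv

def first_out_of_order_row(lines, key_index=0, delimiter=","):
    """Return the 1-based row number of the first record whose key sorts
    before the previous record's key, or None if the rows are in order.

    Hypothetical helper: keys are compared as strings, so numeric keys
    would need to be converted before comparing.
    """
    previous = None
    reader = csv.reader(lines, delimiter=delimiter)
    for row_number, row in enumerate(reader, start=1):
        if not row:
            continue  # skip blank lines
        key = row[key_index]
        if previous is not None and key < previous:
            return row_number
        previous = key
    return None
```

To use it on a sink file, open the file with `open(path, newline="")` and pass the file object as `lines`. The first sink for which the function returns a row number marks the stage where the order is lost.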

A possible root cause of the issue is the automatic disk sort.

Resolution

UPDATED: November 16, 2017
Workaround:
  1. Add three new transforms: Record Number, Memory Sort, and Column Select.
  2. Use the “Record Number” transform to add a new column, “rownum”.
  3. Use the “Memory Sort” transform to sort the input on the “rownum” column.
  4. Use the “Column Select” transform to exclude the newly added “rownum” column from the output.
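
The logic behind the workaround can be illustrated outside the product. The sketch below is plain Python, not Sagent configuration: the three functions are hypothetical stand-ins for the Record Number, Memory Sort, and Column Select transforms, and it assumes the record number is assigned before the step that jumbles the records.

```python
def add_record_number(rows):
    """Stand-in for Record Number: append a 1-based "rownum" to each row."""
    return [row + [number] for number, row in enumerate(rows, start=1)]

def memory_sort(rows, key_index=-1):
    """Stand-in for Memory Sort: sort in memory on the "rownum" column."""
    return sorted(rows, key=lambda row: row[key_index])

def drop_last_column(rows):
    """Stand-in for Column Select: exclude the trailing "rownum" column."""
    return [row[:-1] for row in rows]

original = [["beta", "2"], ["alpha", "1"], ["gamma", "3"]]
numbered = add_record_number(original)

# Simulate a downstream step that returns the records in a different order.
jumbled = list(reversed(numbered))

restored = drop_last_column(memory_sort(jumbled))
```

Because every record carries its original position in “rownum”, sorting on that column recovers the original order regardless of how the intermediate step rearranged the records, and removing the column leaves the output layout unchanged.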