Product Feature: DataFlow Server
Issue
Users may find that the sort order of the output is lost in files that are longer than 2600 lines.
Cause
Scenario:
A possible root cause of the issue is automatic disk sort.
- The plan is complex and has many sub-plans. To diagnose the root cause of the issue, add a splitter and a delimited text file sink before and after each sub-plan.
- Execute the plan and check the output file of each delimited text file sink to identify the sub-plan in which the order is lost.
- Then apply the same approach iteratively inside that sub-plan to find the JOIN transform after which the records get jumbled up.
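Checking each sink file by eye is slow for files of this size, so the comparison can be scripted. The following is a minimal sketch in Python, not part of DataFlow Server. It assumes the sink files are comma-delimited with a header row and that the expected order is ascending on an integer key column. The function name first_out_of_order and the column name "id" are hypothetical.

```python
import csv
import io


def first_out_of_order(lines, key, delimiter=","):
    """Return the 1-based data-row number of the first record whose key is
    smaller than the previous record's key, or None if the rows are sorted.

    Assumption: `key` names a column that holds integers.
    """
    reader = csv.DictReader(lines, delimiter=delimiter)
    previous = None
    for row_number, row in enumerate(reader, start=1):
        current = int(row[key])
        if previous is not None and current < previous:
            return row_number
        previous = current
    return None


# Stand-ins for the sink files written before and after a sub-plan.
before_sink = io.StringIO("id,name\n1,a\n2,b\n3,c\n")
after_sink = io.StringIO("id,name\n1,a\n3,c\n2,b\n")

before_result = first_out_of_order(before_sink, "id")
after_result = first_out_of_order(after_sink, "id")
```

For real sink files, pass an open file object instead, for example open(path, newline=""). A sub-plan whose "before" file returns None and whose "after" file returns a row number is the one to inspect further.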
Resolution
UPDATED: November 16, 2017
Workaround:
- Add 3 new transforms (Record Number, Memory Sort, Column Select).
- Use the “Record Number” transform to add a new column, “rownum”.
- Use the “Memory Sort” transform to sort the input on the “rownum” column.
- Use the “Column Select” transform to exclude the newly added “rownum” column from the output.
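The logic of the workaround can be illustrated outside the product. The sketch below is plain Python, not DataFlow Server code. The three functions are hypothetical stand-ins for the transforms, and it assumes that the record number is assigned before the step that disturbs the order and that the sort is applied after it.

```python
def add_record_number(records):
    """Stand-in for "Record Number": tag each record with its position."""
    return [{**r, "rownum": i} for i, r in enumerate(records, start=1)]


def memory_sort(records, key="rownum"):
    """Stand-in for "Memory Sort": sort the records in memory on one column."""
    return sorted(records, key=lambda r: r[key])


def column_select(records, exclude="rownum"):
    """Stand-in for "Column Select": drop the helper column again."""
    return [{k: v for k, v in r.items() if k != exclude} for r in records]


original = [{"name": "c"}, {"name": "a"}, {"name": "b"}]

numbered = add_record_number(original)
# Simulate a step that jumbles the records (here: simple reversal).
jumbled = list(reversed(numbered))
restored = column_select(memory_sort(jumbled))
```

Because "rownum" records the original position, sorting on it restores the original order regardless of how the records were rearranged in between, and removing the column afterwards leaves the output layout unchanged.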