Rank Process Object

The Integrator Rank process object defines new calculated columns that are based on the entire input flow, such as ranks, percentiles, and medians. These calculations require the entire input flow to be processed and sorted to obtain the results, as opposed to calculated columns in the Calc process object, which operate on a row basis.

The need to use the entire input flow affects the performance of the Rank object. It buffers the input flow in memory and in a temporary file before returning the results to later objects in the data flow. When ranking large data sets (data that does not fit into the memory of the machine on which the Integrator is running), ensure that enough temporary disk space is available to buffer this data.

Rank Attributes

Attribute Type Description
process_type (required) String Identifies the object as a Rank process object. The value of this string is "rank".
input (required) String Defines the object from which the data flow is arriving.
rank_columns Array of Strings Defines numeric input columns that should be ranked from highest to lowest. The resulting column will be named:
"<name> Rank"
where <name> is the corresponding input column.
Values in the rank columns must be numeric or blank. Blank values will be treated as zero. The rank is calculated so that the maximum value in the input column is assigned a rank of 1.
If multiple rows have the same value, they are assigned the same rank, and the next highest value skips the corresponding number of ranks.
For example, if two rows share the highest value, both are assigned a rank of 1, and the next highest value is given a rank of 3.

NOTE: This attribute is Rank in Visual Integrator.
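
The tie handling described for rank_columns can be illustrated with a short Python sketch. This is not Integrator code; the function name competition_rank and the use of None or an empty string to stand for a blank value are assumptions made for illustration only.

```python
def competition_rank(values):
    """Rank values from highest to lowest; tied values share a rank
    and the following rank numbers are skipped."""
    # Assumption: a blank is represented here as None or "" and counts as zero.
    nums = [0.0 if v is None or v == "" else float(v) for v in values]
    ordered = sorted(nums, reverse=True)
    # The first position at which a value appears in descending order is its rank.
    first_pos = {}
    for pos, v in enumerate(ordered, start=1):
        first_pos.setdefault(v, pos)
    return [first_pos[v] for v in nums]

# Two rows tie for the highest value, so both get rank 1 and the next gets rank 3.
print(competition_rank([50, 80, 80, None, 20]))  # [3, 1, 1, 5, 4]
```

The blank row is treated as zero and therefore ranks last in this sample.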

percentile_columns Array of Strings Defines numeric input columns whose percentiles should be calculated. The resulting column will be named:
"<name> <ordinal> percentile"
where <name> is the corresponding input column, and <ordinal> is the ordinal number (25th, 92nd, etc.) of the percentile to be calculated.
The percentile to calculate is taken from the corresponding integer in the percentile_values attribute.
The percentile calculation finds the position 1+p(N-1) in the sorted values, where p is the percentile divided by 100 and N is the number of values, and interpolates between the two values on either side of that position based on its fractional part. Values in the input columns must be numeric or blank (null value). Blank values will be treated as zero.
The resulting percentile will appear in each row in the output flow.
percentile_values Array of Strings Defines the percentile values to calculate for the percentile columns. Each percentile column uses the percentile value from the corresponding position in the percentile values array. Each percentile value must be an integer between 0 and 100 inclusive. If there are fewer percentile values than percentile columns, the last percentile value is used for the remaining columns. This attribute is required if the percentile_columns attribute is defined.
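
As a rough illustration of the percentile rule above, the following Python sketch computes the position 1+p(N-1) and interpolates on its fractional part. It is not Integrator code. It assumes that the interpolation is linear and that the values are sorted in ascending order, which the description does not state explicitly. The helper names and the sample column names ("Sales", "Cost", "Units") are hypothetical.

```python
import math

def to_number(v):
    # Assumption: a blank is represented here as None or "" and counts as zero.
    return 0.0 if v is None or v == "" else float(v)

def percentile(values, pct):
    """Interpolated percentile at 1-based position 1 + p(N - 1), p = pct / 100."""
    nums = sorted(to_number(v) for v in values)
    if not nums:
        raise ValueError("at least one value is required")
    pos = 1 + (pct / 100) * (len(nums) - 1)
    lower = math.floor(pos)               # 1-based index of the value at or below pos
    frac = pos - lower                    # fractional part drives the interpolation
    lo = nums[lower - 1]
    hi = nums[min(lower, len(nums) - 1)]  # clamp so that pct = 100 stays in range
    return lo + frac * (hi - lo)

def pair_percentiles(columns, percentile_values):
    """Pair each column with its percentile value; reuse the last value if short."""
    last = len(percentile_values) - 1
    return [(col, int(percentile_values[min(i, last)]))
            for i, col in enumerate(columns)]

data = [10, 20, 30, 40]
print(percentile(data, 25))  # 17.5
print(percentile(data, 50))  # 25.0, the same value a median column would hold
print(pair_percentiles(["Sales", "Cost", "Units"], ["25", "90"]))
```

In the last call, the third column reuses 90 because only two percentile values are supplied.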
median_columns Array of Strings Defines numeric input columns whose medians should be calculated. The resulting column will be named:
"<name> median"
where <name> is the corresponding input column.
Values in the input columns must be numeric or blank (null value). Blank values will be treated as zero. This attribute is optional. The median column is equivalent to the 50th percentile.
The resulting median will appear in each row in the output flow.
percentile_rank_columns Array of Strings Defines numeric input columns whose percentile rank should be calculated. The resulting column will be named:
"<name> percentile rank"
where <name> is the corresponding input column.
The percentile rank of a value is the percentage of values that are lower than or equal to it. Values in the input columns must be numeric or blank (null value). Blank values will be treated as zero.

NOTE: This attribute is Percentile (semicolon separated) in Visual Integrator.
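
The percentile rank definition above can be sketched in Python as follows. This is an illustration rather than the Integrator's implementation: the function name is hypothetical, and the result is assumed to be an unrounded percentage between 0 and 100.

```python
import bisect

def percentile_rank(values):
    """For each value, the percentage of all values that are lower than or equal to it."""
    # Assumption: a blank is represented here as None or "" and counts as zero.
    nums = [0.0 if v is None or v == "" else float(v) for v in values]
    ordered = sorted(nums)
    n = len(nums)
    # bisect_right returns how many sorted values are <= v.
    return [100.0 * bisect.bisect_right(ordered, v) / n for v in nums]

print(percentile_rank([10, 20, 20, 40]))  # [25.0, 75.0, 75.0, 100.0]
```

Under this definition, tied values receive the same percentile rank and the maximum value always receives 100.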

temp_directory String Defines the directory which will be used to hold the temporary file for the Rank buffer. This file will be named:
<name>.tmp
where <name> consists of the process ID of the Integrator followed by incrementing letters.
The temporary file is automatically deleted when the Integrator successfully finishes. This attribute is optional, but it is good practice to specify a temporary location known to have enough space for the data flowing through the Rank object.

NOTE: This attribute is Temp Directory in Visual Integrator.

trace_after Sub-object Traces data flows leaving the specified object, which makes debugging scripts easier. This is equivalent to adding a Trace process object immediately after the current object. See Embedded Trace Object for more on using trace sub-objects.
trace_before Sub-object Traces data flows entering the specified object. This is equivalent to adding a Trace process object immediately before the current object. See Embedded Trace Object for more on using trace sub-objects.