Rank Process Object
The Integrator Rank process object defines new calculated columns that are based on the entire input flow, such as ranks, percentiles, and medians. These calculations require the entire input flow to be processed and sorted to obtain the results, as opposed to calculated columns in the Calc process object, which operate on a row basis.
Because the Rank object needs the entire input flow, its performance differs from that of row-based objects. It buffers the input flow in memory and in a temporary file before returning results to later objects in the data flow. When ranking large data sets (data that does not fit into the memory of the machine on which the script is running), ensure that there is enough temporary disk space available to buffer the data.
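
The difference between a row-based calculation and a whole-flow calculation can be sketched in Python. This is only an illustration, not Integrator code: the function names, the `sales` column, and the output column name `rank` are hypothetical, and the tie handling (equal values share a rank) is an assumption that this document does not specify.

```python
def add_row_based_column(rows):
    """Row-based calculation: each output value depends only on its own row."""
    return [dict(row, doubled=row["sales"] * 2) for row in rows]


def add_rank_column(rows, column):
    """Whole-flow calculation: a row's rank depends on every other row.

    Ranks run from highest (1) to lowest. Tie handling is an assumption of
    this sketch: equal values share the rank of their first position.
    """
    # Every value must be seen and sorted before any rank can be assigned.
    ordered = sorted((row[column] for row in rows), reverse=True)
    first_position = {}
    for position, value in enumerate(ordered, start=1):
        first_position.setdefault(value, position)
    # The output keeps the original row order and adds the rank to each row.
    return [dict(row, rank=first_position[row[column]]) for row in rows]


rows = [{"sales": 10}, {"sales": 30}, {"sales": 20}, {"sales": 30}]
ranked = add_rank_column(rows, "sales")
print([r["rank"] for r in ranked])  # prints [4, 1, 3, 1]
```

The row-based function could emit each row as soon as it arrives, while the ranking function cannot emit anything until the last row has been read, which is why the input has to be buffered.
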
Rank Attributes
| Attribute | Type | Description |
|---|---|---|
| process_type (required) | String | Identifies the object as a Rank process object. The value of this string is "rank". |
| input (required) | String | Defines the object from which the data flow is arriving. |
| rank_columns | Array of Strings | Defines numeric input columns that should be ranked from highest to lowest. The resulting column will be named: NOTE: This attribute is Rank in Visual Integrator. |
| percentile_columns | Array of Strings | Defines numeric input columns whose percentiles should be calculated. The resulting column will be named: "<name> <ordinal> percentile" where <name> is the corresponding input column, and <ordinal> is the ordinal number (25th, 92nd, and so on) of the percentile to be calculated. The percentile to calculate is taken from the corresponding integer in the percentile_values attribute. The calculation locates the position 1+p(N-1) in the sorted values, where p is the percentile divided by 100 and N is the number of values, and interpolates between the two adjacent values based on the fractional part of that position. Values in the input columns must be numeric or blank (null value). Blank values will be treated as zero. The resulting percentile will appear in each row in the output flow. |
| percentile_values | Array of Strings | Defines the percentile values to calculate for the percentile columns. Each percentile column uses the percentile value from the corresponding position in the percentile_values array. Each percentile value must be an integer between 0 and 100 inclusive. If there are fewer percentile values than percentile columns, the last percentile value is used for the remaining columns. This attribute is required if the percentile_columns attribute is defined. |
| median_columns | Array of Strings | Defines numeric input columns whose medians should be calculated. The resulting column will be named: "<name> median" where <name> is the corresponding input column. Values in the input columns must be numeric or blank (null value). Blank values will be treated as zero. This attribute is optional. The median column is equivalent to the 50th percentile. The resulting median will appear in each row in the output flow. |
| percentile_rank_columns | Array of Strings | Defines numeric input columns whose percentile rank should be calculated. The resulting column will be named: NOTE: This attribute is Percentile (semicolon separated) in Visual Integrator. |
| temp_directory | String | Defines the directory which will be used to hold the temporary file for the Rank buffer. This file will be named: NOTE: This attribute is Temp Directory in Visual Integrator. |
| trace_after | Sub-object | Traces data flows leaving the specified object, which makes debugging scripts easier. This is equivalent to adding a Trace process object immediately after the current object. See Embedded Trace Object for more on using trace sub-objects. |
| trace_before | Sub-object | Traces data flows entering the specified object. This is equivalent to adding a Trace process object immediately before the current object. See Embedded Trace Object for more on using trace sub-objects. |
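
The percentile and median descriptions in the table can be illustrated with a short Python sketch. It is one reading of the documented rule (interpolating at position 1+p(N-1), treating blanks as zero, and reusing the last percentile value when the list is too short), not the product's implementation. The function names `percentile` and `value_for_column` are hypothetical.

```python
import math


def percentile(values, pct):
    """Interpolated percentile at the 1-based position 1 + p * (N - 1)."""
    # Blank (None) values are treated as zero, as the table describes.
    data = sorted(0.0 if v is None else float(v) for v in values)
    if not data:
        raise ValueError("at least one value is required")
    if not 0 <= pct <= 100:
        raise ValueError("percentile must be between 0 and 100 inclusive")
    position = 1 + (pct / 100) * (len(data) - 1)
    lower = math.floor(position)   # 1-based index of the lower neighbour
    fraction = position - lower    # fractional part used for interpolation
    if lower >= len(data):         # the 100th percentile is the largest value
        return data[-1]
    return data[lower - 1] + fraction * (data[lower] - data[lower - 1])


def value_for_column(percentile_values, index):
    """Percentile for the column at `index`, reusing the last value if short."""
    return percentile_values[min(index, len(percentile_values) - 1)]


sales = [10, 20, 30, 40]
print(percentile(sales, 25))            # prints 17.5
print(percentile(sales, 50))            # the median; prints 25.0
print(percentile([None, 10, 20], 50))   # blank counted as zero; prints 10.0
print(value_for_column([25, 90], 2))    # third column reuses 90
```

For the four values above, the 25th percentile sits at position 1 + 0.25 × 3 = 1.75, so the result is three quarters of the way from 10 to 20.
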