Split Output Object
The Integrator Split output object splits data into a set of files determined by the contents of the data. It is used to divide a data flow into separate output files for different sets of data. For example, it could be used to divide sales information into separate region files. A data column in the input flow is used to direct the output. For each unique value of the data column, an output file will be created and the corresponding data will be written.
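The routing rule described above can be illustrated outside Integrator with a short Python sketch. This is not Integrator syntax or its implementation: the function name `split_rows`, the sample sales rows, the header line in each file, and the choice to group rows in memory are all assumptions made for illustration.

```python
import csv
import os
import tempfile
from collections import defaultdict


def split_rows(rows, split_column, out_dir, delimiter="\t"):
    """Write one delimited file per unique value of split_column.

    Returns a dict mapping each created file path to its record count.
    (Hypothetical helper; it only mimics the behavior described in the text.)
    """
    # Group the rows by the value found in the split column.
    groups = defaultdict(list)
    for row in rows:
        groups[row[split_column]].append(row)

    counts = {}
    for value, group in groups.items():
        path = os.path.join(out_dir, value)
        # Mode "w" creates the file, replacing any existing one.
        with open(path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(
                handle,
                fieldnames=list(group[0].keys()),
                delimiter=delimiter,
                lineterminator="\n",
            )
            writer.writeheader()  # assumption: a header line is wanted
            writer.writerows(group)
        counts[path] = len(group)
    return counts


# Hypothetical sales data, split by region.
sales = [
    {"region": "east", "product": "widget", "amount": "100"},
    {"region": "west", "product": "widget", "amount": "250"},
    {"region": "east", "product": "gadget", "amount": "75"},
]

with tempfile.TemporaryDirectory() as out_dir:
    counts = split_rows(sales, "region", out_dir)
    for path, count in sorted(counts.items()):
        print(os.path.basename(path), count)
```

With the sample rows, two files are created, `east` with two records and `west` with one.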
The Split output object can also write out a dictionary suitable for input to Builder or back into Integrator, as well as a report file listing the created files and the number of records written to each file. There is no limit to the number of created files; however, on a Windows platform the open file limit is 2040. If an output file already exists, it is overwritten.
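As a rough illustration of the report file layout, the following Python sketch writes a tab-delimited listing of filenames and record counts. It is a sketch under stated assumptions, not Integrator's own code: the function name `write_report` and the sample counts are hypothetical, while the `filename` and `record_count` header names and the "standard" and "column_headers" values follow the reportfile_type attribute described in the table below.

```python
import csv
import io


def write_report(counts, handle, reportfile_type="standard"):
    """Write a tab-delimited report: filename first, record count second.

    (Hypothetical helper that mimics the documented report layout.)
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    if reportfile_type == "column_headers":
        # Optional header line naming the two report columns.
        writer.writerow(["filename", "record_count"])
    for filename, count in counts.items():
        writer.writerow([filename, count])


# Hypothetical result of a split: two files and their record counts.
counts = {"east.txt": 2, "west.txt": 1}
buffer = io.StringIO()
write_report(counts, buffer, reportfile_type="column_headers")
print(buffer.getvalue(), end="")
```

A script that drives later builds could read such a file line by line and launch one build per listed filename.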
Split Attributes
| Attribute | Type | Description |
|---|---|---|
| output_type (required) | String | Identifies the object as a Split output object. The value of the string is "split". |
| input (required) | String | Defines the object from which the data flow is arriving. |
| filename_column (required) | String | Specifies the "split column". Defines the input column containing the destination file names. The destination filename is modified by the filename_prefix and filename_extension attributes described below. These files are created if they do not exist. |
| filename_prefix | String | Defines a string that is prepended to the filename_column value to create the destination filename for the row. This attribute can be used to specify a directory name or common prefix for the output files without having to define a special calculated field. |
| filename_extension | String | Defines a string that is appended to the filename_column value to create the destination filename for the row. This attribute can be used to specify a common extension (e.g., ".txt") for the output files without having to define a special calculated field. |
| create_directory | Boolean | Determines whether Integrator creates parent directories needed for the output of the split operation. A value of "true" indicates a parent directory is created; "false" indicates it is not created. The default is "false". |
| columns | Array of Strings | Selects the columns that will be written to the file. Column names should always start with a letter. This attribute and the remove_columns attribute are mutually exclusive. If this attribute is used, do not use remove_columns. |
| remove_columns | Array of Strings | Selects the columns that will not be written to the output files. Each string names a column that will not be included in the output, allowing Integrator to output a subset of the columns without having to list all the desired output columns. This attribute and the columns attribute are mutually exclusive. If this attribute is used, do not use columns. |
| delimiter | Single Character | Defines the single character to be used as the delimiter in the output file. Required with the file_type "column headers". The default character is an ASCII tab. |
| newline | String | Specifies a newline character for text output. The specified character is used to end each output line of regular output. This attribute should not be used for XML output. If this attribute is not specified or is empty, the output object uses the default line ending for each platform: LF (ASCII 10) for Unix systems and CR LF (ASCII 13 10) for Windows systems. Certain special values are also accepted for this attribute. |
| file_type | String | Specifies the file type of the output file. Values include "standard" (which requires dictfile1 or dictfile2) and "column headers" (which requires delimiter). |
| reportfile | String | Specifies the name and path of the optional report file. The report file is a tab-delimited file with two columns. The first column contains the filenames written by the Split object. The second column contains the number of records written to the corresponding file. This file could be used to script a number of builds. |
| reportfile_type | String | Determines the output format of the report file. If the value is "column_headers", Integrator adds a line of column headers defining a filename column and a record_count column. If the value is "standard" or not set, Integrator writes out the report file without column headers. |
| verbose | Boolean | Controls whether the individual files are listed in the Integrator output. When set to "true", the number of records and output filenames will be shown. |
| always_quote | Boolean | Controls whether or not quotes are added to column headers and output values. If "true" then all column headers and output values are quoted and any existing quotes in the data are doubled. If "false" or missing, then all column headers are quoted only if they begin with a quote or contain an output delimiter (a tab or comma). |
| never_quote | Boolean | Disables any quoting added to output values. If "true", no output values are quoted, even if they contain delimiters or double-quote characters. This attribute is used for fine-grained control of quoting behavior in unusual circumstances. The always_quote and never_quote attributes cannot be used simultaneously. |
| dictfile1 | String | Defines a dictionary file for output. The dictionary is written in the original DI Dictionary format, which defines column types as well as column names. Either dictfile1 or dictfile2 is required for the file_type "standard". |
| dictfile2 | String | Defines a new-style dictionary file for output. These can contain field names which contain special characters (like parentheses). This dictionary style is more compatible with various versions of Builder and Diver (for building Memory Models). Use either dictfile1 or dictfile2 when the file_type is "standard". |
| dicttypes1 | Array of Strings | Defines the types of the columns for the dictionary defined in dictfile1. A string can be either Key, Data, or Info[keyname]. This attribute allows a complete version 1 dictionary that can be immediately used in the Builder, without any modifications. If dicttypes1 is not defined, all columns in the dictionary are identified as Key. |
| encoding | String | Defines the encoding used to write the output files. This attribute is optional. If a signature is requested (see the signature attribute), the appropriate Byte Order Mark (BOM) is written to the file. If the encoding attribute is auto and no signature is found, the encoding is assumed to be latin1 if no other object in the task handles Unicode data and the Integrator file is not encoded as utf-8 (using the charset 1208 directive). Otherwise, the encoding is assumed to be utf-8. Any dictionary written by the Split object matches the Latin or Unicode status of the output files. (UNIX or Windows only.) See Integrator Unicode Data Support. |
| signature | String | Defines whether a Byte Order Mark (BOM) is written for Unicode files. |
| trace_before | Sub-object | Traces data flows entering the specified object. This is equivalent to adding the Trace process object immediately before the current output object. See Embedded Trace Object. |
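
Several of the attributes above describe simple rules that a Python sketch can make concrete: how the destination filename is assembled, how columns and remove_columns select output columns, and how always_quote and never_quote affect quoting. The helper names below are hypothetical and this is not Integrator's implementation. Two points are assumptions beyond the descriptions above: quotes are doubled whenever a value is quoted (the table states this only for always_quote), and the delimiter check uses the configured delimiter.

```python
import os


def destination_path(value, filename_prefix="", filename_extension="",
                     create_directory=False):
    """Build prefix + split-column value + extension for one row."""
    path = f"{filename_prefix}{value}{filename_extension}"
    parent = os.path.dirname(path)
    if create_directory and parent:
        # Mirrors create_directory = "true": make missing parent directories.
        os.makedirs(parent, exist_ok=True)
    return path


def select_columns(header, columns=None, remove_columns=None):
    """Return the output columns; the two selectors are mutually exclusive."""
    if columns is not None and remove_columns is not None:
        raise ValueError("columns and remove_columns are mutually exclusive")
    if columns is not None:
        return list(columns)
    if remove_columns is not None:
        return [name for name in header if name not in remove_columns]
    return list(header)


def quote_field(text, delimiter="\t", always_quote=False, never_quote=False):
    """Apply the quoting rules to one header or output value."""
    if always_quote and never_quote:
        raise ValueError("always_quote and never_quote cannot be used together")
    if never_quote:
        return text
    needs_quotes = always_quote or text.startswith('"') or delimiter in text
    if not needs_quotes:
        return text
    # Assumption: existing quotes are doubled whenever the value is quoted.
    return '"' + text.replace('"', '""') + '"'


print(destination_path("east", filename_prefix="out/", filename_extension=".txt"))
print(select_columns(["region", "product", "amount"], remove_columns=["region"]))
print(quote_field('say "hi"', always_quote=True))
```

Keeping each rule in its own function makes the mutual-exclusion checks explicit, which matches the table's statement that the paired attributes must not be combined.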