Split Output Object

The Integrator Split output object splits data into a set of files determined by the contents of the data. It is used to divide a data flow into separate output files for different sets of data. For example, it could be used to divide sales information into separate region files. A data column in the input flow is used to direct the output. For each unique value of the data column, an output file will be created and the corresponding data will be written.
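The routing described above can be illustrated outside Integrator with a short Python sketch. This is not Integrator code or its implementation; the function name split_rows, the sample sales rows, and the choice of tab-delimited output are assumptions made only for illustration. The sketch writes every column, including the split column, which may differ from what Integrator does.

```python
import csv
import os
from collections import defaultdict

# Hypothetical sample data: "region" plays the role of the split column.
sales = [
    {"region": "east", "amount": "100"},
    {"region": "west", "amount": "250"},
    {"region": "east", "amount": "75"},
]

def split_rows(rows, split_column, out_dir):
    """Write each row to a file named after its split-column value.

    Returns a dict mapping each created path to the number of rows written.
    """
    counts = defaultdict(int)
    handles = {}  # path -> (file handle, csv writer)
    try:
        for row in rows:
            path = os.path.join(out_dir, row[split_column])
            if path not in handles:
                # Mode "w" replaces a file that already exists.
                fh = open(path, "w", newline="", encoding="utf-8")
                writer = csv.DictWriter(
                    fh, fieldnames=list(row), delimiter="\t", lineterminator="\n"
                )
                handles[path] = (fh, writer)
            handles[path][1].writerow(row)
            counts[path] += 1
    finally:
        for fh, _ in handles.values():
            fh.close()
    return dict(counts)
```

Calling split_rows(sales, "region", some_directory) creates one file per distinct region value. Because one file handle stays open per distinct value, a sketch like this is also subject to the platform's open file limit.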

The Split output object can also write out a dictionary suitable for input to Builder or back into Integrator. It can also write out a report file listing the created files and the number of records written to each file. There is no limit to the number of created files; however, on a Windows platform the open file limit is 2040. If an output file already exists, it is overwritten.

Split Attributes

Attribute Type Description
output_type (required) String Identifies the object as a Split output object. The value of the string is "split".
input (required) String Defines the object from which the data flow is arriving.
filename_column (required) String Specifies the "split column". Defines the input column containing the destination file names. The destination filename is modified by the filename_prefix and filename_extension attributes described below. These files are created if they do not exist.
filename_prefix String Defines a string that is prepended to the filename_column value to create the destination filename for the row. This attribute can be used to specify a directory name or common prefix for the output files without having to define a special calculated field.
filename_extension String Defines a string that is appended to the filename_column value to create the destination filename for the row. This attribute can be used to specify a common extension (e.g., ".txt") for the output files without having to define a special calculated field.
create_directory Boolean Determines whether Integrator creates parent directories needed for the output of the split operation. A value of "true" indicates a parent directory is created; "false" indicates it is not created. The default is "false".
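As a rough illustration of how filename_prefix, filename_extension, and create_directory combine, the following Python sketch builds a destination path by plain string concatenation and optionally creates the parent directory. The function names are hypothetical, and the behavior when the parent directory is missing and create_directory is "false" (here, an error from the operating system) is an assumption rather than documented Integrator behavior.

```python
import os

def destination_filename(value, filename_prefix="", filename_extension=""):
    """Prepend the prefix and append the extension to the split-column value."""
    return f"{filename_prefix}{value}{filename_extension}"

def open_destination(value, filename_prefix="", filename_extension="",
                     create_directory=False):
    """Open the destination file for writing, creating parents only on request."""
    path = destination_filename(value, filename_prefix, filename_extension)
    parent = os.path.dirname(path)
    if create_directory and parent:
        # Create any missing parent directories; existing ones are left alone.
        os.makedirs(parent, exist_ok=True)
    # Without create_directory, a missing parent raises FileNotFoundError.
    return open(path, "w", encoding="utf-8")
```

For example, destination_filename("east", "out/sales_", ".txt") yields "out/sales_east.txt", so the prefix can carry both a directory and a common name prefix.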
columns Array of Strings Selects the columns that will be written to the file. Column names should always start with a letter. This attribute and the remove_columns attribute are mutually exclusive. If this attribute is used, do not use remove_columns.
remove_columns Array of Strings Selects the columns that will not be written to the output files. Each string names a column that will not be included in the output, allowing Integrator to output a subset of the columns without having to list all the desired output columns. This attribute and the columns attribute are mutually exclusive. If this attribute is used, do not use columns.
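The relationship between columns and remove_columns can be sketched in Python as follows. The function name select_output_columns is hypothetical, and two details are assumptions: that output columns follow the order given in columns, and that specifying both attributes is treated as an error.

```python
def select_output_columns(input_columns, columns=None, remove_columns=None):
    """Return the list of column names to write, given one of the two selectors."""
    if columns is not None and remove_columns is not None:
        raise ValueError("columns and remove_columns are mutually exclusive")
    if columns is not None:
        # Explicit selection: keep exactly the listed columns.
        return list(columns)
    if remove_columns is not None:
        # Exclusion: keep every input column that is not listed.
        dropped = set(remove_columns)
        return [name for name in input_columns if name not in dropped]
    # Neither attribute set: write all input columns.
    return list(input_columns)
```

With many input columns and only a few to omit, the remove_columns form avoids listing every desired column by hand.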
delimiter Single Character Defines the single character to be used as the delimiter in the output file. Required with the file_type "column_headers". The default character is an ASCII tab.
Examples:
  • delimiter = "\t"
  • delimiter = ","
  • delimiter = "|"
newline String Specifies a newline character for text output. The specified character will be used to end each output line of regular output. This attribute should not be used for XML output. If this attribute is not specified or empty, the file output object will use the default line ending for each platform: LF (ASCII 10) for Unix systems and CR LF (ASCII 13 10) for Windows systems. The following special values are accepted for this attribute:

  • crlf—The newline will be CR LF (ASCII 13 10)

  • lf—The newline will be LF (ASCII 10)

  • cr—The newline will be CR (ASCII 13)

NOTE: Available starting with 7.0(56).
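The newline rules above amount to a small lookup with a platform default. A minimal Python sketch, assuming a hypothetical helper named resolve_newline and assuming that any value other than the three special keywords is used literally:

```python
import os

# Special keyword values and the line endings they stand for.
NEWLINE_VALUES = {"crlf": "\r\n", "lf": "\n", "cr": "\r"}

def resolve_newline(newline=None):
    """Map a newline attribute value to the actual line-ending string."""
    if not newline:
        # Unset or empty: use the platform default
        # (LF on Unix systems, CR LF on Windows systems).
        return os.linesep
    # Keywords are translated; anything else is assumed to be used as given.
    return NEWLINE_VALUES.get(newline, newline)
```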

file_type String Specifies the file type of the output file. Values include:
  • column_headers—Indicates the file is a delimited output file with the first line being a list of column names.
  • standard—Indicates the file is a fixed-format or variable-format text file described by the DI dictionary file (see the dictfile1 and dictfile2 attributes). This is the default.
  • xml—Indicates the file is a basic Extensible Markup Language (XML) file, similar in format to those created by Microsoft Access. The top-level element is "dataroot", containing a set of row elements named "row". Each row element contains a set of column elements, each element named with the name of the corresponding Integrator column.
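To make the xml layout concrete, the following Python sketch builds the same element structure (a "dataroot" element containing "row" elements, each with one child element per column) using the standard library. It only illustrates the shape of the output; the function name rows_to_xml is hypothetical, and details such as an XML declaration, indentation, or how Integrator handles column names that are not valid XML element names are not covered.

```python
import xml.etree.ElementTree as ET

def rows_to_xml(rows):
    """Serialize rows (dicts of column name -> value) in the dataroot/row layout."""
    root = ET.Element("dataroot")
    for row in rows:
        row_element = ET.SubElement(root, "row")
        for column_name, value in row.items():
            # Each column becomes a child element named after the column.
            ET.SubElement(row_element, column_name).text = str(value)
    return ET.tostring(root, encoding="unicode")
```

For a single row with columns region and amount, the result is one row element containing a region element and an amount element.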
reportfile String Specifies the name and path of the optional report file. The report file is a tab-delimited file with two columns. The first column contains the filenames written by the Split object. The second column contains the number of records written to the corresponding file. This file could be used to script a number of builds.
reportfile_type String Determines the output format of the report file. If the value is "column_headers", Integrator adds a line of column headers defining a filename column and a record_count column. If the value is "standard" or not set, Integrator writes out the report file without column headers.
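The report file layout can be sketched in Python as shown here. The function name format_report is hypothetical, and the exact header text ("filename" and "record_count") is an assumption based on the column names mentioned above.

```python
def format_report(counts, reportfile_type="standard"):
    """Render a tab-delimited report: one line per output file with its row count."""
    lines = []
    if reportfile_type == "column_headers":
        # Assumed header names; "standard" (or unset) writes no header line.
        lines.append("filename\trecord_count")
    for filename, record_count in counts.items():
        lines.append(f"{filename}\t{record_count}")
    return "".join(line + "\n" for line in lines)
```

Because the report is plain tab-delimited text, a build script can read it line by line, split each line on the tab character, and launch one build per listed file.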
verbose Boolean Controls whether the individual files are listed in the Integrator output. When set to "true", the number of records and output filenames will be shown.
always_quote Boolean Controls whether quotes are added to column headers and output values. If "true", all column headers and output values are quoted, and any existing quotes in the data are doubled. If "false" or missing, column headers are quoted only if they begin with a quote or contain an output delimiter (a tab or comma).
never_quote Boolean Disables any quoting added to output values. If "true", no output values are quoted, even if they contain delimiters or double-quote characters. This attribute is used for fine-grained control of quoting behavior in unusual circumstances. The always_quote and never_quote attributes cannot be used simultaneously.
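The interaction of always_quote and never_quote can be approximated with the Python sketch below. It is an interpretation of the descriptions above, not Integrator's implementation: the function name quote_value is hypothetical, and the sketch assumes that the same conditional rule applies to headers and values alike and that embedded quotes are doubled whenever a value is quoted.

```python
def quote_value(value, delimiter="\t", always_quote=False, never_quote=False):
    """Apply the quoting rules to one header or output value."""
    if always_quote and never_quote:
        raise ValueError("always_quote and never_quote cannot be used together")
    if never_quote:
        # Written as-is, even if it contains delimiters or quote characters.
        return value
    needs_quotes = always_quote or value.startswith('"') or delimiter in value
    if not needs_quotes:
        return value
    # Double any embedded quotes, then wrap the value in quotes.
    return '"' + value.replace('"', '""') + '"'
```

With never_quote set, a value that contains the delimiter is written unchanged, so a downstream reader may see extra columns; that is why the attribute is intended for unusual circumstances.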
dictfile1 String Defines a dictionary file for output. The dictionary is written in the original DI Dictionary format, which defines column types as well as column names. Either dictfile1 or dictfile2 is required for the file_type "standard".
dictfile2 String Defines a new-style dictionary file for output. New-style dictionaries can include field names that contain special characters (such as parentheses). This dictionary style is more compatible with various versions of Builder and Diver (for building Memory Models). Use either dictfile1 or dictfile2 when the file_type is "standard".
dicttypes1 Array of Strings Defines the types of the columns for the dictionary defined in dictfile1. A string can be either Key, Data, or Info[keyname]. This attribute allows Integrator to write a complete version 1 dictionary that can be used immediately in Builder, without any modifications. If dicttypes1 is not defined, all columns in the dictionary are identified as Key.
encoding String Defines the encoding used to write the output files. This attribute is optional. Values include:
  • auto—Files will be written as UTF-8 if any of the other objects in the same task processes Unicode characters; otherwise, the files will be written using Latin1 characters. See Integrator Unicode Data Support.
  • ascii—The file will be written as ISO-8859-1 or Latin1 characters.
  • latin1—The file will be written as ISO-8859-1 or Latin1 characters.
  • utf-8—The file will be written as UTF-8 Unicode characters.
  • unicode—The file will be written as 2-byte Unicode characters (UCS-2) with native byte swapping.
  • unicode-be—The file will be written as UCS-2 characters in big-endian byte order.
  • unicode-le—The file will be written as UCS-2 characters in little-endian byte order.

If a signature is requested (see the signature attribute), the appropriate Byte Order Mark (BOM) will be written to the file.

If the encoding attribute is auto and no signature is found, the encoding is assumed to be latin1 if no other object in the task handles Unicode data and the Integrator file is not encoded as utf-8 (using the charset 1208 directive). Otherwise, the encoding is assumed to be utf-8. Any dictionary written by the Split object will match the Latin or Unicode status of the output files. (UNIX or Windows only) See Integrator Unicode Data Support.
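For readers who want to see what the explicit encoding values mean at the byte level, the following Python sketch maps them to standard codecs and their Byte Order Marks. It is an approximation under stated assumptions: UCS-2 is represented by Python's UTF-16 codecs (identical for characters in the Basic Multilingual Plane), the "auto" value is omitted because it depends on the rest of the task, and the names encode_output and write_signature are hypothetical.

```python
import codecs
import sys

# Encoding attribute value -> (Python codec name, Byte Order Mark).
CODECS = {
    "ascii": ("latin-1", b""),
    "latin1": ("latin-1", b""),
    "utf-8": ("utf-8", codecs.BOM_UTF8),
    "unicode-be": ("utf-16-be", codecs.BOM_UTF16_BE),
    "unicode-le": ("utf-16-le", codecs.BOM_UTF16_LE),
}
# "unicode" uses the byte order of the machine running the code.
CODECS["unicode"] = CODECS[
    "unicode-le" if sys.byteorder == "little" else "unicode-be"
]

def encode_output(text, encoding="utf-8", write_signature=False):
    """Encode text and, if requested, prefix the matching Byte Order Mark."""
    codec_name, bom = CODECS[encoding]
    data = text.encode(codec_name)
    return bom + data if write_signature else data
```

The Latin1 entries have an empty Byte Order Mark, so requesting a signature has no effect for those encodings in this sketch.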

signature String Defines whether a Byte Order Mark (BOM) is written for Unicode files. Values include:
  • auto—A signature is written only if the files are encoded in UCS-2 characters; no signature is written if the files are encoded in UTF-8. (default)
  • true—A signature is always written to a Unicode file.
  • false—A signature is never written to a Unicode file. (UNIX or Windows only)
trace_before Sub-object Traces data flows entering the specified object. This is equivalent to adding the Trace process object immediately before the current output object. See Embedded Trace Object.