Filein Input Object

An Integrator Filein input object accepts input from a set of external text files, described by an external dictionary file or column headers within the text file. When an external dictionary is used, these files can be either variable (delimited) or fixed format. For input objects, the input_type defaults to "filein". This object can be as brief as a filename and file_type (column_headers or DBF).

If the file is described by a DI dictionary, the column names and attributes are used from the dictionary. If a list of file names is given, the input is formed by concatenating all files. Multiple files listed with the same input object must have the same format. If the file formats are different, consider using the Concat process object to create a single flow (see Concat Process Object).

Filein Attributes

Attribute Type Description
input_type
(required)
String Identifies the object as a Filein input object. The value of this string is "filein". This is the default type for an INPT object if input_type is omitted.
file_type
(required)
String Specifies the file type of the input file and how columns are named in the file. Values include:
  • standard—Indicates the file is a fixed-format or variable-format text file described by a DI dictionary. This type requires use of either the dictfile or the dictobj attribute.
  • column_headers—Indicates the file is a variable-format text file with the first line containing column headers naming the columns. The delimiter attribute is required to specify the column delimiter. (default)
  • DBF—Indicates the file is a standard DBase-2 DBF file. The column names are taken from the DBF header, and Integrator automatically converts the input rows into text fields. Foxpro DBF files are also supported. DBF data types supported are Character, Date, Time, Integer, and Double.

  • ignore_column_headers—Indicates Integrator ignores the first line in the file and uses a dictionary to describe the file columns. (Seldom used since aliases are available.) See the dictfile or dictobj attributes.

    TIP: The file type ignore_column_headers allows you to use a dictionary with trim=false to keep leading spaces in the input data.

NOTE:

  • JSON data files are not an option. Try converting to XML format and using the XML Input object.
  • This attribute is File Type in Visual Integrator.
filename String Defines the file name for a single input file.
filenames Array of Strings Defines the file names for multiple input files. The files are logically concatenated together before using. Multiple column headers are stripped.
starname String Defines a "star name" (file match string) for selecting a set of input files based on a wildcard. Values include:
  • ? (question mark)—Matches any single character.
  • * (asterisk)—Matches a sequence of characters.

Wildcard matching occurs within a single folder or directory; there is no matching across directories. It is case-insensitive. For example:
starname = "*.dat" and starname = "*.DAT" match all .dat files.
When using the starname attribute, files are returned in the order they appear in the directory; they are not necessarily sorted alphabetically. The order varies across systems and is likely related to how files are added to and deleted from the directory. Programs should not rely on the order in which starname matches are returned.
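The matching rules above can be approximated outside Integrator when you need to predict which files a starname will select. The following is a minimal Python sketch, not Integrator's implementation: the function name match_starname is hypothetical, and Python's fnmatch also accepts bracket expressions such as [abc], which the starname attribute is not documented to support. The sketch sorts its result, which gives the deterministic order that starname itself does not guarantee.

```python
import fnmatch
import os


def match_starname(directory, starname):
    """Return files in `directory` matching `starname`, ignoring case.

    Only the single directory is scanned; subdirectories are not searched.
    The result is sorted so the order does not depend on the file system.
    """
    pattern = starname.lower()
    matches = [
        entry.name
        for entry in os.scandir(directory)
        if entry.is_file() and fnmatch.fnmatchcase(entry.name.lower(), pattern)
    ]
    return sorted(matches)
```

A sorted list produced this way could be written to a file with a "filename" column and supplied through file_list_input when processing order matters.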
starnames Array of Strings Allows multiple starname strings to be used to specify filenames. If a starname string does not match any files, it is ignored.
NOTE: The attributes filename, filenames, starname and starnames are mutually exclusive. Use only one of these attributes. Use of one of these or the file_list_input attribute is required.
file_list_input String

Specifies a separate input flow to generate a list of filenames. This string is an object name, not a filename. This input flow should have a column named "filename". The Filein object uses the values in this column as a list of filenames to open as its input. To use file_list_input, two Input objects, one for the file_list_input and a second for the actual file input, are required.
This attribute allows programmatic control over the input to Integrator. The input filenames can be read from a file and processed using any of the Integrator process objects (with calculated fields, etc.). See the Directory process object for another alternative.
NOTE:

  • If you use this attribute in an object, do not use the filename, filenames, starname, or the starnames attributes in the same object.
  • This attribute is File List Input in Visual Integrator.
union Boolean When true, concatenates multiple input files together and produces the union of the input columns. This attribute may only be "true" when reading files with column headers. This result is similar to the way that the Concat process object combines multiple input flows. If a column is requested in the output flow that does not appear in all the files, the value of that output column will be blank in the appropriate rows. This flag allows the Filein object to read multiple files that have columns added over time without having to add the columns to earlier files; that is, columns and column order can change over time. This attribute is optional, but recommended when using a series of files.
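To illustrate the effect of union, here is a minimal Python sketch of concatenating files with column headers into the union of their columns. It is an illustration only, not how Integrator is implemented; the function name read_union and the first-seen column ordering are assumptions.

```python
import csv
import io


def read_union(sources, delimiter="\t"):
    """Concatenate header-bearing delimited texts into the union of columns.

    `sources` is a list of file-like objects. Columns missing from a given
    source are filled with an empty string for that source's rows.
    """
    columns = []  # union of column names, in first-seen order (an assumption)
    rows = []
    for source in sources:
        reader = csv.DictReader(source, delimiter=delimiter, restval="")
        for name in reader.fieldnames or []:
            if name not in columns:
                columns.append(name)
        rows.extend(reader)
    return columns, [[row.get(c, "") for c in columns] for row in rows]


january = io.StringIO("id\tamount\n1\t10\n")
february = io.StringIO("amount\tid\tregion\n20\t2\tEast\n")
columns, data = read_union([january, february])
print(columns)  # ['id', 'amount', 'region']
print(data)     # [['1', '10', ''], ['2', '20', 'East']]
```

The second file adds a region column and reorders the existing ones; the row from the first file gets a blank region value.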
delimiter String

Specifies the delimiter that is used to separate columns for variable format files. If not specified, ASCII tab is used. Choices are:

  • space
  • tab \t
  • comma ,
  • semicolon ;
  • pipe |

Required with file_type "column_headers" to properly read a variable-format file.

newline String

Specifies a newline character, as a string containing exactly one character from the input file. The specified character will be replaced with a new line. For example:

Before a newline character is specified, the following is displayed:

483574387548~4434839~4782939029~

After specifying "~" as the newline character:

newline = "~"

The information is displayed as follows:

483574387548
4434839
4782939029

If not specified, the default newline is a carriage return (ASCII 13), a line feed (ASCII 10), or a carriage return line feed.
This attribute cannot be specified if require_crlf is "true".
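The record-splitting behavior described for newline can be sketched in Python as follows. This is an illustration under stated assumptions, not Integrator's parser: the function name split_records is hypothetical, and dropping the empty record after a trailing terminator is an assumption.

```python
def split_records(text, newline=None):
    """Split raw input text into records.

    With `newline` set to a single character, that character ends each record.
    With `newline` unset, CR, LF, or CR LF each end a record.
    """
    if newline is None:
        normalized = text.replace("\r\n", "\n").replace("\r", "\n")
        parts = normalized.split("\n")
    else:
        if len(newline) != 1:
            raise ValueError("newline must be exactly one character")
        parts = text.split(newline)
    # A trailing terminator leaves one empty string at the end; drop it.
    if parts and parts[-1] == "":
        parts.pop()
    return parts


print(split_records("483574387548~4434839~4782939029~", newline="~"))
# ['483574387548', '4434839', '4782939029']
```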

require_crlf Boolean

Determines whether or not a carriage return followed by a line feed is required to indicate the end of a line. If this attribute is "false", then a carriage return, a line feed, or the combination of carriage return and line feed indicates the end of a line. If this attribute is "true", then both characters must be present. This is useful for handling the output of certain Microsoft software that exports single line feeds without a carriage return from internal fields. If this attribute is "true", the newline attribute cannot be used.
Note that these internal line feeds should be replaced using the translate calc function before these fields are output from Integrator to an intermediate file. For example, you can use a calc like this:

translate(MyColumn, concat(chr(10),chr(13)), "|%")

Where MyColumn is the name of the column that contains the line endings, chr(10) is the character for line feed (LF), and chr(13) is the character for carriage return (CR). The line feed will be replaced with | and the carriage return will be replaced with %. You can replace | and % with characters that make sense for your situation.
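To preview what this positional replacement does to a value, the following Python sketch mimics the mapping described above (LF to |, CR to %). The Python function translate here is a stand-in written for illustration; it is not the Integrator calc function itself.

```python
def translate(value, from_chars, to_chars):
    """Replace each character of `from_chars` with the character at the
    same position in `to_chars` (a Python stand-in for the calc above)."""
    return value.translate(str.maketrans(from_chars, to_chars))


cleaned = translate("line one\nline two\rline three", chr(10) + chr(13), "|%")
print(cleaned)  # line one|line two%line three
```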

NOTE: This attribute is Require CRLF in Visual Integrator.

dictfile String

Defines the file name for the dictionary describing the file columns (for example, Sales.dic). This attribute is used with old format dictionaries, which list both the column names and the data categories. When the file_type attribute is set to "standard", either this attribute or the dictobj attribute below must be defined. When this attribute is used, do not define the dictobj attribute since these attributes are mutually exclusive.

NOTE: This attribute is Dict File in Visual Integrator.

dictobj String

Defines the object name of a dictionary object that lists the file columns. This dictionary object should appear in a separate script file as a 'DICT' object. This attribute is used with new format dictionaries. For information on the dictionary object, see Dictionary Input Object. When the file_type attribute is set to "standard", either this attribute or the dictfile attribute above must be defined. When this attribute is used, do not define the dictfile attribute since these attributes are mutually exclusive.

NOTE: This attribute is Dict Obj in Visual Integrator.

filename_column String

Indicates the name of a new column in the output flow that contains the input filename for each row of data.

NOTE: This attribute is Filename Column in Visual Integrator.

first Integer Determines the number of records to be read from the input file. If used, Integrator reads up to the specified number of records. This limit is particularly useful for script testing on a small number of input records. If this optional attribute is not used, all rows are returned.
ignore_line_end Boolean

Specifies whether parse errors dealing with the end of an input line are ignored or not (for example, "Fixed field occurs past end of line" or "Too few fields"). This attribute helps control processing when there are too few data items in an input row. If "true", Integrator processes the line. If "false", Integrator prints out a warning message and skips the line. (default)

NOTE: This attribute is Ignore Line End in Visual Integrator.

ignore_extra_columns Boolean Controls whether extra columns that appear in the input following the last column described in the column headers or dictionary are ignored or reported as a parse error. This attribute helps control processing when there are too many data items in a row of the input. Values include:
  • true—Ignores extra columns.
  • false—Any line that contains extra columns will be reported as a parse error and the line will not be processed by Integrator. (default)

If this attribute is not used, "false" is assumed.
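The combined effect of ignore_line_end and ignore_extra_columns on a single parsed row can be sketched in Python as follows. This is a rough illustration, not Integrator's logic: the function name check_row is hypothetical, and padding short rows with blank values when ignore_line_end is true is an assumption, since the text above says only that the line is processed.

```python
def check_row(fields, expected, ignore_line_end=False, ignore_extra_columns=False):
    """Return the fields to process, or None when the row is skipped."""
    if len(fields) < expected:
        if not ignore_line_end:
            return None  # too few fields: warn and skip the line
        # Assumption: missing trailing fields are treated as blank.
        return fields + [""] * (expected - len(fields))
    if len(fields) > expected:
        if not ignore_extra_columns:
            return None  # extra columns: parse error, line not processed
        return fields[:expected]
    return fields
```

With both attributes left at their default of "false", only rows with exactly the expected number of fields are processed.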

NOTE: This attribute is Ignore Extra Columns in Visual Integrator.

ignore_quotes Boolean

Ignores the beginning and ending quotes while keeping the embedded quote. This optional attribute should be used only in special cases. By default, if a field starts with a double quote ("), it is stripped away, along with any trailing double quotes. If the delimiter is the comma, all commas within double quotes are kept as part of the column value, rather than being parsed as a delimiter. When ignore_quotes="true", double quotes are passed in for processing, resulting in the display of quotes.

NOTE: This attribute is Ignore Quotes in Visual Integrator.

strict_quotes Boolean

If this attribute is set, delimiters found within a quoted string are always treated as part of the quoted string, as opposed to delimiting a new field in a variable format file. This behavior is always the case when the delimiter is a space, a comma (,) or a semicolon (;). By default, Integrator treats other delimiters (like tabs) as a hard stop for the field, with the expectation that quoting in fields is possibly incorrect.

aliases Array of Strings

Defines new column names for the columns already defined in the input data. Format is "oldname=newname". Blanks before or after the column names will be ignored. Spaces within a column name are acceptable. If newname is blank, then the given column is deleted from the output flow.

NOTE: This attribute is Alias Lines in Visual Integrator.

prefix String Defines a prefix that is prepended to all columns in the flow that are not aliased using the aliases array. If you want a space between the prefix and the column name, include that space in the prefix string definition.
keep_columns Array of Strings Defines a list of columns to be kept by the input object. If this attribute is not used, all columns are kept. The output flow of the object is limited to those columns that are listed, and no excluded columns are available to subsequent process objects. Column names in the keep_columns array should be given after they are aliased or prepended with the prefix string.
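The interaction of aliases, prefix, and keep_columns described above can be sketched in Python. This is a minimal illustration under the rules stated in this section (the prefix applies only to columns that are not aliased, a blank newname drops the column, and keep_columns uses the final names); the function name apply_naming is hypothetical.

```python
def apply_naming(columns, aliases=(), prefix="", keep_columns=None):
    """Return the output column names after aliases, prefix, and keep_columns."""
    alias_map = {}
    for entry in aliases:
        old, _, new = entry.partition("=")
        alias_map[old.strip()] = new.strip()  # blanks around names are ignored

    renamed = []
    for name in columns:
        if name in alias_map:
            if alias_map[name]:  # a blank newname deletes the column
                renamed.append(alias_map[name])
        else:
            renamed.append(prefix + name)  # prefix only non-aliased columns

    if keep_columns is None:
        return renamed
    return [name for name in renamed if name in keep_columns]


print(apply_naming(["id", "desc", "amt"],
                   aliases=["desc = Description", "amt="],
                   prefix="S_"))
# ['S_id', 'Description']
```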
rename_duplicates Boolean Creates new column names for duplicate column names that appear in the input flow for this object. Each later column that repeats an earlier column name is given the name name_2 ... name_(n), based on its positional order in the input. If a column in the input flow already has one of these names, that number is skipped.

For example, if the input flow already has a column named "DESC_2", the object will name the duplicate column DESC as "DESC_3". The duplicate naming process occurs before the attributes defining aliases, prefixes, or the columns to keep are applied, so these generated column names can be aliased to another name.
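The naming rule can be sketched in Python as follows. This is a minimal illustration of the rule as described here, not Integrator's code; the function name rename_duplicates is reused for readability, and the handling of edge cases beyond the DESC example is an assumption.

```python
def rename_duplicates(columns):
    """Give later duplicates the names name_2, name_3, ..., skipping taken names."""
    taken = set(columns)  # names already present anywhere in the input
    seen = set()          # names already emitted to the output
    next_suffix = {}      # next suffix to try for each base name
    result = []
    for name in columns:
        if name not in seen:
            seen.add(name)
            result.append(name)
            continue
        n = next_suffix.get(name, 2)
        while f"{name}_{n}" in taken:
            n += 1  # skip numbers whose names are already in use
        candidate = f"{name}_{n}"
        taken.add(candidate)
        seen.add(candidate)
        next_suffix[name] = n + 1
        result.append(candidate)
    return result


print(rename_duplicates(["DESC", "DESC_2", "DESC"]))
# ['DESC', 'DESC_2', 'DESC_3']
```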

NOTE: This attribute is Rename Duplicates in Visual Integrator.

encoding String

Defines how files are read and interpreted in terms of character encoding. Values include:

  • auto—The input object sets the encoding based on the file signature and the Unicode state of other objects in the same task.
  • ascii—The characters in the file are interpreted as ISO-8859-1 or Latin1 characters.
  • gb18030—The file is interpreted as Chinese National Standard 18030-2000 characters. The gb18030 encoding option is supported on Windows platforms only.

  • latin1—The characters in the file are interpreted as ISO-8859-1 or Latin1 characters.
  • utf-8—The file is interpreted as UTF-8 Unicode characters.
  • unicode—The file is interpreted as 2-byte Unicode characters (UCS-2) with native byte swapping, unless overridden by a UCS-2 file signature.
  • unicode-be—The file is interpreted as UCS-2 characters in a big-endian fashion.

  • unicode-le—The file is interpreted as UCS-2 characters in a little-endian fashion.

UCS-2 and UTF-8 files can include a Byte Order Mark (BOM) at the beginning of the file to denote the file encoding. These file signatures are defined as follows:

  • UCS-2 Big Endian: FE FF
  • UCS-2 Little Endian: FF FE
  • UTF-8: EF BB BF

File signatures are common for Unicode files on Windows operating systems. If the file input object reads multiple files, the signature of each file determines its encoding.

If the encoding attribute is auto and no signature is found, the encoding is assumed to be latin1 if no other object in the task handles Unicode data and the VI file is not encoded as utf-8 (using the charset 1208 directive). Otherwise, the encoding is assumed to be utf-8.
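The auto rules above can be sketched in Python to check which encoding a given file would likely receive. This is an approximation for illustration, not Integrator's detection code: the function name detect_encoding is hypothetical, and the task_uses_unicode parameter stands in for the condition "another object in the task handles Unicode data, or the script is encoded as utf-8".

```python
# File signatures (byte order marks) listed in this section.
BOM_SIGNATURES = [
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "unicode-be"),
    (b"\xff\xfe", "unicode-le"),
]


def detect_encoding(head, task_uses_unicode=False):
    """Guess an encoding name from the first bytes of a file."""
    for signature, name in BOM_SIGNATURES:
        if head.startswith(signature):
            return name
    # No signature found: fall back according to the task's Unicode state.
    return "utf-8" if task_uses_unicode else "latin1"


print(detect_encoding(b"\xef\xbb\xbfid\tname"))  # utf-8
print(detect_encoding(b"id\tname"))              # latin1
```

To test a real file, pass its first few bytes, for example the result of reading three bytes from the file opened in binary mode.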

See also Integrator Unicode Data Support.

trace_after Sub-object

Traces data flows leaving the specified object, which makes debugging scripts easier. This is equivalent to adding a Trace process object immediately after the current object.

See Embedded Trace Object for more on using trace sub-objects.