Builder Output Object

The Integrator Builder output object can feed data directly to the classic Builder. It seamlessly connects the two programs and creates Models without needing to store data in temporary files. A Build Description file is stored as part of the output object and is passed along with the necessary data to Builder for execution.

If a Model is larger than 2 gigabytes, Builder splits the Model into multiple files (referred to as continuation files) to accommodate file systems with a 2 gigabyte limit. The first file is named with the Model name and the extension .mdl. Once that file reaches 2 gigabytes, the first continuation file is named <modelname>.md0, the next <modelname>.md1, and so on through <modelname>.md9. Any files beyond that are named <modelname>.10.mdc through <modelname>.49.mdc.

This allows for 50 continuation files in addition to the original .mdl file (for a total of 102GB). Builder will automatically treat all continuation files as part of one large Model with no intervention on the part of the administrator or end-user.
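
The naming pattern can be sketched in a few lines of Python. This is only an illustration of the scheme described above; the function name continuation_files and the Model name "sales" are hypothetical and are not part of Builder.

```python
def continuation_files(model_name):
    """Return the Model file followed by every possible continuation file."""
    names = [f"{model_name}.mdl"]                              # original Model file
    names += [f"{model_name}.md{i}" for i in range(10)]        # .md0 through .md9
    names += [f"{model_name}.{i}.mdc" for i in range(10, 50)]  # .10.mdc through .49.mdc
    return names


files = continuation_files("sales")
print(len(files))      # 51 files: 1 original plus 50 continuation files
print(len(files) * 2)  # 102, the total in gigabytes at 2 GB per file
```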

NOTE: The capacity of cBase files eliminates the need for continuation files. See cBase Output Object.

Builder Attributes

Attribute Type Description
output_type (required) String Identifies the object as a Builder output object. The value of the string is "builder".
input (required) String Defines the object from which the data flow is arriving.
output (required) String Specifies the location and name of the Model to be created by the build process. The file extension is .mdl. For Models larger than 2 gigabytes, Builder will split the Model into multiple files to accommodate file systems with a two gigabyte limit.
builder String Defines the location of the Builder executable. This can be an absolute pathname or a command name to be found using the operating system path. This attribute defaults to "builder". See the builtin attribute below.
Examples:
  • Windows: "c:\\di_solution\\executables\\builder.exe"
  • UNIX: "~/di_solution/executables/builder.exe"
  • OS/400: "/qsys.lib/atlantis.lib/builder.pgm"

NOTE: The Builder object requires Builder version 4.2 (20) or later.

builtin Boolean When set to "true", this attribute runs Integrator and Builder in a single process, using Integrator’s internal Builder component to build the Model. When set to "false" (default), the script uses the external Builder component to build the Model. On Windows platforms, running Integrator and Builder in a single process is more efficient and can shorten the build time, but it does not allow any multiprocessing to occur.
When using Visual Integrator to develop a new script, this attribute is set to "true" by default. An existing script opened in Visual Integrator retains the value that was set for the attribute. For example, if this attribute is false in the existing script, it will remain false when opened in Visual Integrator. If this attribute was not defined in an existing script, it is also assumed to be false and will remain false when opened in Visual Integrator.
columns Array of Strings Defines the names of columns that will be sent to Builder. Normally this attribute does not need to be defined since Integrator will check the various Builder attributes (such as dimensions, summary, info, sort, detail_dimensions, etc.) and send the columns along as necessary. The columns attribute allows the user to override Integrator's list in case the Build Description contains calculated fields or a new attribute that the Integrator does not recognize.
description String Defines the name of an external Build Description file to use to set the build parameters.
  • If this attribute is present, Integrator will read the description object (Builder type 'DESC') named "Main" from this file and send it to Builder. However, any build attributes defined in the Builder output object will override build attributes defined in the Build Description file.
  • If not used, all necessary build attributes should appear in the Builder output object.
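
The override rule can be pictured as a dictionary merge. The sketch below is not how Integrator is implemented; the function name and the attribute values are hypothetical and only illustrate which side wins when both define the same attribute.

```python
def effective_build_attributes(description_attrs, output_object_attrs):
    """Combine attributes; the Builder output object wins on any conflict."""
    merged = dict(description_attrs)    # start from the Build Description file
    merged.update(output_object_attrs)  # output object values override
    return merged


# Hypothetical values, for illustration only.
from_description = {"dimensions": ["Customer", "Product"], "squash": "false"}
from_output_object = {"squash": "true", "journal": "logs/sales.jou"}
result = effective_build_attributes(from_description, from_output_object)
print(result["squash"])  # true
```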
defines Array of Strings Contains an array of parameter definitions for the given Description file, similar to how parameters are defined on the Builder command line using the *-define option. Each parameter definition should have the form parm-name parm-value, where parm-name is the parameter in the Description file to be defined with the parm-value.
journal String Specifies the location and name of the journal file to be created by the build process. Usually placed in a logs directory with a file extension of .jou. The default location is the current directory. Much of what is printed to stdout is captured in the journal file.
dimensions Array of Strings Contains an array of names, one for each core Dimension to be defined in the Model. These names should each start with a letter, be enclosed in quotes, and be separated by commas.
summary Array of Strings Contains an array of names, one for each Summary item to be defined in the Model. These names should each start with a letter, be enclosed in quotes, and be separated by commas.
info Array of Strings Contains an array of sub-strings, one for each Info item to be defined in the Model. Each sub-string consists of the Info item, a colon (:), and the Dimension it relates to. Each should be surrounded by quotes and separated by commas.
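
As an illustration of the sub-string form used by info, the following Python sketch splits each entry at the colon. The function name parse_info and the column names are hypothetical, and splitting at the first colon is an assumption.

```python
def parse_info(entries):
    """Split each "Info item:Dimension" sub-string into an (info, dimension) pair."""
    pairs = []
    for entry in entries:
        # Assumption: the first colon separates the Info item from its Dimension.
        info_item, colon, dimension = entry.partition(":")
        if not colon or not info_item or not dimension:
            raise ValueError(f"expected 'info:dimension', got {entry!r}")
        pairs.append((info_item, dimension))
    return pairs


# Hypothetical column names.
info = ["Customer Name:Customer", "Unit Price:Product"]
print(parse_info(info))
```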
dimcounts Array of Strings Specifies that Builder will precalculate Dimension Counts (DimCounts) for a Dimension. In Diver, DimCounts display the remaining number of unique values for all Dimensions, based on a selected Dimension Value. A DimCount is a numeric value that is displayed in the Console, but it can also be added as a calculated column in a tabular. DimCounts deliver a large amount of information with very little work. However, they are computation-intensive and can be slow for large data sets with many unique values in the Dimension being counted, so it may be more efficient to have Builder precalculate some DimCounts. This attribute speeds up Diver access to DimCounts for non-grouped core Dimensions.
sort Array of Strings By default, when diving into Model data, Diver sorts low to high in the following order:
  1. numbers
  2. uppercase characters
  3. lowercase characters

To change the sort order use the sort attribute to provide an alternate sort option. Alternate sort is an internal sort within a Model of one column following the default sort of another column. The sort attribute contains an array of sub-strings, one for each alternative sort to be defined in the Model. Each sub-string consists of the Dimension, a colon (:), and the alternative sort column. Each should be surrounded by quotes and separated by commas.

An example of an alternate sort would be to sort the Dimension “parts” by “part number” rather than alphabetically. “Part number” need not be a Dimension in the Model (it could be an Info Field), although it does need to be part of each record in the input file.

Another example would be a field with values such as Jan-2011, Feb-2011, March-2011, etc. It needs to be sorted by a month number (1 to 12); otherwise, April-2011 will display first.
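
The effect of an alternate sort can be imitated in Python. This is only an analogy: Python compares strings by character code, which on ASCII data happens to place digits before uppercase before lowercase, and the month-number lookup below stands in for the alternate sort column. All values are made up for illustration.

```python
# Default ordering: digits, then uppercase, then lowercase (ASCII order).
values = ["apple", "Banana", "7", "Apple"]
print(sorted(values))  # ['7', 'Apple', 'Banana', 'apple']

# Hypothetical alternate sort column: a month number for each Dimension value.
month_number = {"Jan-2011": 1, "Feb-2011": 2, "Mar-2011": 3, "Apr-2011": 4}
months = list(month_number)

default_order = sorted(months)                          # alphabetical
alternate_order = sorted(months, key=month_number.get)  # by month number

print(default_order)    # ['Apr-2011', 'Feb-2011', 'Jan-2011', 'Mar-2011']
print(alternate_order)  # ['Jan-2011', 'Feb-2011', 'Mar-2011', 'Apr-2011']
```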

If using DiveMaster to create a MultiModel, any field you want to use as an alternate sort key must be built into the Model as an Info Field.

Keep in mind the following rules for alternate sorts that will be used in a MultiModel:

  • The alternate sort column should be an Info Field on the Dimension it sorts.
  • The alternate sort Info Field should appear in all Models that contain the corresponding Dimension.
  • There should be a 1-1 (one-to-one) mapping between Dimension values and alternate sort values. Do not assign the same sort value to different Dimension values. Do not assign multiple sort values to the same Dimension value.
  • Use a consistent mapping between different Models. For example, if you map "January" => 1, "February" => 2, in one Model, do not map "January" => 13, "February" => 14 in another Model.
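
Before building, the one-to-one rule can be checked on the input data. The following Python sketch is a hypothetical helper, not a feature of Builder or DiveMaster; it reports Dimension values with more than one sort value and sort values shared by more than one Dimension value.

```python
def check_one_to_one(pairs):
    """pairs: iterable of (dimension_value, sort_value). Return a list of problems."""
    value_to_sort = {}
    sort_to_value = {}
    problems = []
    for dim_value, sort_value in pairs:
        # setdefault stores the first mapping seen and returns it on later calls.
        if value_to_sort.setdefault(dim_value, sort_value) != sort_value:
            problems.append(f"{dim_value!r} has more than one sort value")
        if sort_to_value.setdefault(sort_value, dim_value) != dim_value:
            problems.append(f"sort value {sort_value!r} is shared by several Dimension values")
    return problems


print(check_one_to_one([("January", 1), ("February", 2)]))  # []
print(check_one_to_one([("January", 1), ("February", 1)]))  # one problem reported
```

Passing the pairs from every Model in a MultiModel through the same call also catches inconsistent mappings between Models, such as "January" mapped to 1 in one Model and 13 in another.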

NOTE: Use caution moving data and code between platforms, since the different sort sequences may give different results.
In general, the sort order varies by platform. For example, Windows and UNIX systems use ASCII characters, while the OS/400 uses EBCDIC coding. The sort sequences are not the same.
Use caution when creating an alternate sort on a Model that will be used in a MultiModel. Some alternate sorts will work on a single Model but produce unexpected results in a MultiModel situation. To merge the Dimension values in the proper order, the alternate sort information should be stored in each Model as Info Fields.

sum_types Array of Strings Specifies the summary types that should be calculated as part of the build. By default, the summary types are not calculated; only the total summary is calculated. Values include:
  • minimum—The minimum number of summary types are calculated.
  • maximum—The maximum number of summary types are calculated.
  • std dev—The sum of the squares of the column values is calculated.

Count is always created and represents the number of occurrences of each Dimension, that is, the number of rows in the Model. Counts are referred to as DimCounts in the Diver console.

index_dimension String Defines a Model where the specified Dimension is the only Dimension. Any remaining columns that are not Summary fields are defined as Info Fields based on that Dimension. This attribute creates one-dimensional Models that are suited for lookups or single Dimension index Models. This attribute conflicts with the dimensions and info attributes.
index_model String When set, Builder will create an index Model, which is a detail Model with every input column included as a detail Dimension. This is equivalent to defining every column in the detail dimensions dimension-list. Since this attribute defines the detail Model structure, it conflicts with the detail_dimensions, detail_summary and detail_info options.
journal_append Boolean Directs Builder to append the build information to the journal file, rather than overwrite it. This defaults to "false".
bucket_dir String Specifies the directory location of the temporary work files.
output_dict String Specifies the location and name of the new Dictionary file that corresponds to the file created by the output_text attribute.
output_text String Directs Builder to create a new file containing the input data from the current input file with just the columns that are used by this particular build. This is useful with calculated fields, as the text file that is output will contain the calculated field values for each line. See output_dict attribute above.
bf Number or String The block factor defines how much pre-processing is done by Builder so that Diver does less work and responds faster. The larger the block factor is, the less pre-processing is done by Builder. The resulting Model will be smaller and the build will finish faster. However, Diver may respond more slowly, since it is doing the calculations. The default value for the block factor is 100,000 rows.
The block factor can be expressed as a percentage of the number of post-squash rows of data. If the block factor is given as a percentage, the block factor is recalculated for the Model being built, and the bf_minimum and bf_maximum attributes are applied to the result.
The integer number should be specified without commas, for example, bf=500000. If a numeric percentage is specified, it should be in quotes (for example, bf="20%" or bf="0.5%").
bf_minimum Numeric Defines the minimum block factor to use when the block factor is given as a percentage of the input records. If the calculated block factor is less than this minimum, the minimum is used instead. The default minimum is 10,000.
bf_maximum Numeric Defines the maximum block factor to use when the block factor is given as a percentage of the input records. If the calculated block factor is greater than this maximum, the maximum is used instead. There is no default maximum.
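
The interaction of bf, bf_minimum, and bf_maximum can be worked through with a small Python sketch. The function name is hypothetical, and truncating the percentage result to a whole number is an assumption; Builder's exact rounding is not documented here.

```python
def effective_block_factor(bf, post_squash_rows, bf_minimum=10_000, bf_maximum=None):
    """Return the block factor Builder would use under the rules described above."""
    if isinstance(bf, str) and bf.endswith("%"):
        percent = float(bf[:-1])
        value = int(post_squash_rows * percent / 100)  # truncation is an assumption
        value = max(value, bf_minimum)                 # default minimum is 10,000
        if bf_maximum is not None:                     # there is no default maximum
            value = min(value, bf_maximum)
        return value
    return int(bf)  # a plain number is used as given


print(effective_block_factor("20%", 1_000_000))                        # 200000
print(effective_block_factor("0.5%", 1_000_000))                       # 10000 (raised to the minimum)
print(effective_block_factor("20%", 10_000_000, bf_maximum=500_000))   # 500000 (capped)
```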
skip_build Array of Strings Specifies a set of Dimension values that will be skipped when summarizing the data in Phase 6 of the build process. When Builder summarizes the data for a skipped Dimension value, it uses a large block factor, optionally given by the skip_block_factor attribute. This reduces the amount of pre-processing Builder does for these data values, so diving into these values will be slower. Use this option for Dimension values that users are not expected to dive on regularly.

This attribute contains an array of sub-strings, one for each Dimension value to be skipped. The sub-string is in the form <dimension:dimension-value>, where dimension is one of the Dimensions in Builder, and dimension-value is the value for that Dimension. If dimension is "*", the specified value is used for all Dimensions. Each sub-string should be surrounded by quotes and separated by commas.
For example, "*:" will match blank values for all Dimensions, "Product Line:Other" will match the value "Other" for Dimension "Product Line", and "Customer:" will skip blank values in the Customer Dimension.
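
The matching rule can be expressed as a short Python sketch. It is an illustration only: the function name is hypothetical, a blank value is represented as an empty string, and splitting each entry at the first colon is an assumption.

```python
def is_skipped(dimension, value, skip_build):
    """Return True if (dimension, value) matches any skip_build entry."""
    for entry in skip_build:
        skip_dimension, _, skip_value = entry.partition(":")
        # "*" stands for every Dimension; otherwise the names must match exactly.
        if skip_dimension in ("*", dimension) and skip_value == value:
            return True
    return False


skip_build = ["Product Line:Other", "Customer:"]
print(is_skipped("Product Line", "Other", skip_build))  # True
print(is_skipped("Customer", "", skip_build))           # True (blank Customer)
print(is_skipped("Region", "", skip_build))             # False
print(is_skipped("Region", "", ["*:"]))                 # True (blank in any Dimension)
```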

skip_block_factor Numeric Specifies the block factor to be used for skipped Dimension values. This attribute defaults to the number of records in the build, which is the largest possible value.
ws Numeric Overrides the default working storage (also referred to as working set and working space) area. The integer number should be specified without commas. The default working storage is 30,000,000 bytes.
dump_key Boolean This attribute assists with debugging by giving a profile of the data. When set to "true", Builder dumps the statistical results and key distribution information of the build from Phase 2 to a journal file. This includes the number of distinct Dimension values and the largest number of repeated values for each Dimension, the so-called max clump.
Also dumped to the journal are lines causing parsing errors, data integrity problems, and summary data issues. This information can be used to help tune the Model build process.
Only the first 100 errors encountered are written to the journal. However, if there are too many Info Field warnings, one warning per new field will be shown, regardless of whether the 100 warning limit has been reached.
keep_buckets Boolean Specifies, if set to "true", that temporary sort files, the "buckets", should not be deleted after the build. This is useful when looking into build disk space problems, but in general is not recommended.
parse_only Boolean Specifies whether Builder only checks the input file for errors, without creating a Model. When set to "true", Builder checks the input for errors and no Model is created. When set to "false", Builder does not perform this check-only pass.
quiet Boolean When set to "true", phase progress information during the build is suppressed. When set to "false", Builder does not suppress information. By default, this attribute is not set, so build information is displayed.
remove_on_fail Boolean Specifies the action for a failed build. When set to "true" and the Model fails to build, the Model will be automatically deleted. This prevents users from accessing incomplete or imperfect data sets. If a user accesses an incomplete data set on-line, a “Model did not finish” message is displayed.
It should also be noted that when building a new model.mdl, any existing model.mdl is first overwritten by Builder with no records, and the build work continues. Failure to finish successfully does not restore the original model.mdl, so plan accordingly.
On Windows, if a Model is open in DiveLine when a rebuild fails with this option, it will not disappear from disk until DiveLine closes it. When not set or set to "false", the Model is not deleted (default).
squash Boolean When set to "true", allows the Builder to collapse duplicate rows, while totaling the Summaries. This attribute reduces the build time, memory requirements, and space required for the Model when there are many duplicate records. Setting this attribute to "true" deactivates the calculation of Maximum, Minimum, and Standard Deviation columns in the Model. The count of records in the Model reflects the number of records after the squash. When not set or set to "false" (default), duplicate rows are not collapsed.
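
The idea behind squash can be shown with a small Python sketch. It assumes that duplicate rows are rows with identical Dimension values, and the function and column names are hypothetical; it is not Builder's implementation.

```python
from collections import defaultdict


def squash(rows, dimensions, summaries):
    """Collapse rows with identical Dimension values, totaling the Summaries."""
    totals = defaultdict(lambda: [0] * len(summaries))
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        for i, name in enumerate(summaries):
            totals[key][i] += row[name]
    # Rebuild one row per distinct key, in first-seen order.
    return [
        {**dict(zip(dimensions, key)), **dict(zip(summaries, sums))}
        for key, sums in totals.items()
    ]


rows = [
    {"Customer": "A", "Product": "X", "Sales": 10},
    {"Customer": "A", "Product": "X", "Sales": 5},
    {"Customer": "B", "Product": "X", "Sales": 7},
]
squashed = squash(rows, ["Customer", "Product"], ["Sales"])
print(len(squashed))  # 2 rows remain after the squash
print(squashed[0])    # {'Customer': 'A', 'Product': 'X', 'Sales': 15}
```

Because only the totals survive, the individual values needed for Minimum, Maximum, and Standard Deviation are no longer available, which is consistent with those columns being deactivated.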
no_build Boolean When set to "true", Builder processes options but does not build an actual Model. This is useful to create or test a new Build Description without running the actual build. Default is "false".
allow_empty Boolean Allows empty Models to be built when a script returns no records due to an empty set of parameters being used to filter the data. When set to "true", empty Models will be built. When set to "false" or not set (default), they are not.
Failed model builds may still produce invalid model files when allow_empty is set to "false". Set the remove_on_fail attribute to "true" to clean up such files.
detail_dimensions Array of Strings Contains an array of names, one for each core Detail Dimension to be defined in the Model. These names should each start with a letter, be enclosed in quotes, and be separated by commas.
detail_summary Array of Strings Contains an array of names, one for each Detail Summary item to be defined in the Model. These names should each start with a letter, be enclosed in quotes, and be separated by commas.
detail_info Array of Strings Contains an array of sub-strings, one for each Detail Info item to be defined in the Model. Each sub-string consists of the Info item, a colon (:), and the Detail Dimension it relates to. Each should be surrounded by quotes and separated by commas.
comments String Allows the Model creator to include text information that will later be available through Diver’s Info option in the Open Model dialog, or through the variable $COMMENTS used in Diver Reports or Report Palettes. For more information, see the ProDiver Help > About Text Variables.
footer String Specifies a string of text to be used by Diver as the value of the variable $MODEL_FOOTER, used in Diver Reports and Report Palettes. For more information, see the ProDiver Help > About Text Variables.
extract_time String Defines an extract time that describes when the input data was extracted from the source data.
If extract_time has the value "now", the extract time is set to the build time. Otherwise, extract_time should be given a date in the form "YYYY-MM-DD" or "YYYY/MM/DD", or a date and time in the form "YYYY-MM-DD HH:MM:SS" or "YYYY/MM/DD HH:MM:SS". If only a date is given, the time is set to 12 noon of that date. The value is used in Diver Reports or Report Palettes to populate the $EXTRACT_DATE and $EXTRACT_TIME variables. For more information, see the ProDiver Help > About Text Variables.
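
The accepted forms can be illustrated with Python's datetime module. This is a sketch under stated assumptions, not Builder's parser: the function name is hypothetical, and the build time is passed in explicitly so that "now" can be resolved.

```python
from datetime import datetime, time


def parse_extract_time(value, build_time):
    """Interpret an extract_time string using the rules described above."""
    if value == "now":
        return build_time
    normalized = value.replace("/", "-")  # accept both YYYY-MM-DD and YYYY/MM/DD
    try:
        return datetime.strptime(normalized, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        # Date only: the time is set to 12 noon of that date.
        date_only = datetime.strptime(normalized, "%Y-%m-%d")
        return datetime.combine(date_only.date(), time(12, 0, 0))


build_time = datetime(2011, 4, 16, 2, 0, 0)  # hypothetical build time
print(parse_extract_time("2011-04-15", build_time))           # 2011-04-15 12:00:00
print(parse_extract_time("2011/04/15 08:30:00", build_time))  # 2011-04-15 08:30:00
print(parse_extract_time("now", build_time))                  # 2011-04-16 02:00:00
```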

NOTE: The build time is recorded automatically and is available in Diver Reports using $BUILD_DATE and $BUILD_TIME.

model_vars Array of Strings Contains an array of sub-strings, one for each user-defined Model variable to be set in the Model. Each sub-string consists of the variable name, an equal (=) sign, and the variable value. Each should be surrounded by quotes and separated by commas. These model variables are available in the Diver through the $MODELVAR(<variable name>) reporter variable used in Reports and Report Palettes. For more information, see the ProDiver Help > About Text Variables.
update Boolean Controls incremental builds. When set to "true", Builder performs an incremental build, meaning that the additional data is placed in a separate Model update file (xxx.mdu). When the corresponding Model file (xxx.mdl) is opened, the incremental data is seamlessly included in the Diver data. See the compact attribute.
compact Boolean Controls how incremental Model files are consolidated. When set to "true", Builder merges incremental Model files with the main Model by performing a complete rebuild. See the update attribute.
ignore_line_end Boolean Controls end-of-line processing. In a fixed format file, records shorter than the last field defined in a record are usually flagged as parse errors. If set to "true", this option tells Builder to treat the fields missing from the end of the record as blank, and to continue the build. Similarly, for a variable delimited format file, if a row has fewer than the expected number of elements resulting in parse errors, setting this attribute to "true" allows Builder to treat the missing fields as blank and to continue the build. This attribute is intended for development and testing only.
ignore_parse_errors Boolean Controls how parsing errors are handled. If set to "true", parsing errors encountered while processing the input file will be flagged, but Builder will continue. Lines with parse errors will be ignored, i.e., data will be eliminated from the build. This option is useful for preliminary tests on data, allowing builds to continue while the data is being cleaned up. Defaults to "false". This attribute is intended for development and testing only.
model_encoding String Controls the encoding for the resulting model. Values include:

  • auto—The builder output object will set the encoding based on the Unicode state of other objects in the same task. If no other object deals with Unicode data, and the script itself is not encoded as Unicode, then the data will be stored in the model as ISO-8859-1 or Latin1 characters (default).
  • ascii—The data in the model will be stored as ISO-8859-1 or Latin1 characters.
  • latin1—The data in the model will be stored as ISO-8859-1 or Latin1 characters.
  • utf-8—The data in the model will be stored as UTF-8 Unicode characters.
  • unicode—The data in the model will be stored as UTF-8 Unicode characters.
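
The five values reduce to two stored encodings, which the following Python sketch makes explicit. The function name and the task_uses_unicode flag are hypothetical; the flag stands in for Integrator's own check of the other objects in the task and of the script encoding, and treating "auto" as the default is an assumption.

```python
STORED_ENCODING = {
    "ascii": "ISO-8859-1",
    "latin1": "ISO-8859-1",
    "utf-8": "UTF-8",
    "unicode": "UTF-8",
}


def stored_encoding(model_encoding="auto", task_uses_unicode=False):
    """Return the character encoding used for data stored in the model."""
    if model_encoding == "auto":
        # Assumption: the flag summarizes the Unicode state of the task and script.
        return "UTF-8" if task_uses_unicode else "ISO-8859-1"
    return STORED_ENCODING[model_encoding]


# The same character occupies one byte in Latin1 and two bytes in UTF-8.
print("é".encode(stored_encoding("latin1")))   # b'\xe9'
print("é".encode(stored_encoding("unicode")))  # b'\xc3\xa9'
```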
trace_before Sub-object Traces data flows entering the specified object. This is equivalent to adding the Trace process object immediately before the current output object. See Embedded Trace Object.