Builder Output Object
The Integrator Builder output object can feed data directly to the classic Builder. It seamlessly connects the two programs and creates Models without needing to store data in temporary files. A Build Description file is stored as part of the output object and is passed along with the necessary data to Builder for execution.
If a Model is larger than 2 gigabytes, Builder splits it into multiple files (referred to as continuation files) to accommodate file systems with a 2-gigabyte file size limit. The first file is named with the Model name and the extension .mdl. The first ten continuation files are named <modelname>.md0 through <modelname>.md9. Any files beyond that are named <modelname>.10.mdc through <modelname>.49.mdc.
This allows for 50 continuation files in addition to the original .mdl file (51 files of 2 GB each, for a total of 102 GB). Builder automatically treats all continuation files as part of one large Model, with no intervention required from the administrator or end user.
NOTE: The capacity of cBase files eliminates the need for continuation files. See cBase Output Object.
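
The continuation file naming scheme can be illustrated with a short Python sketch. It is not part of Builder or Integrator. The function names are hypothetical, and the model name is assumed to be the base name without the .mdl extension.

```python
MAX_CONTINUATION_FILES = 50  # .md0-.md9 plus .10.mdc-.49.mdc, per the text above


def continuation_file_names(model_name: str, count: int) -> list[str]:
    """Return the names of the first `count` continuation files for a Model."""
    if not 0 <= count <= MAX_CONTINUATION_FILES:
        raise ValueError("count must be between 0 and 50")
    names = []
    for i in range(count):
        if i < 10:
            # The first ten continuation files use the extensions .md0 to .md9.
            names.append(f"{model_name}.md{i}")
        else:
            # Later files are numbered 10 to 49 and use the extension .mdc.
            names.append(f"{model_name}.{i}.mdc")
    return names


def all_model_files(model_name: str, count: int) -> list[str]:
    """Return the original .mdl file followed by its continuation files."""
    return [f"{model_name}.mdl"] + continuation_file_names(model_name, count)


if __name__ == "__main__":
    # "sales" is a hypothetical model name.
    print(all_model_files("sales", 12))
```

With the maximum of 50 continuation files, all_model_files returns 51 names, which matches the 102 GB total at 2 GB per file.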
Builder Attributes
| Attribute | Type | Description |
|---|---|---|
| output_type (required) | String | Identifies the object as a Builder output object. The value of the string is "builder". |
| input (required) | String | Defines the object from which the data flow is arriving. |
| output (required) | String | Specifies the location and name of the Model to be created by the build process. The file extension is .mdl. For Models larger than 2 gigabytes, Builder will split the Model into multiple files to accommodate file systems with a two gigabyte limit. |
| builder | String | Defines the location of the Builder executable. This can be an absolute pathname or a command name to be found using the operating system path. This attribute defaults to "builder". See the builtin attribute below. NOTE: The Builder object requires Builder version 4.2 (20) or later. |
| builtin | Boolean | This attribute when set to "true" runs Integrator and Builder in a single process using Integrator’s internal Builder component to build a Model. When "false" (default), the script uses the external Builder component to build the Model. On Windows platforms, running Integrator and Builder in a single process is more efficient and can speed up the build time but does not allow any multiprocessing to occur. When using Visual Integrator to develop a new script, this attribute is set to "true" by default. An existing script opened in Visual Integrator retains the value that was set for the attribute. For example, if this attribute is false in the existing script, it will remain false when opened in Visual Integrator. If this attribute was not defined in an existing script, it is also assumed to be false and will remain false when opened in Visual Integrator. |
| columns | Array of Strings | Defines the names of columns that will be sent to Builder. Normally this attribute does not need to be defined, since Integrator checks the various Builder attributes (such as dimensions, summary, info, sort, detail_dimensions, etc.) and sends the columns along as necessary. The columns attribute allows the user to override Integrator's list in case the Build Description contains calculated fields or a new attribute that Integrator does not recognize. |
| description | String | Defines the name of an external Build Description file to use to set the build parameters. |
| defines | Array of Strings | Contains an array of parameter definitions for the given Description file, similar to how parameters are defined on the Builder command line using the -define option. Each parameter definition should have the form parm-name parm-value, where parm-name is the parameter in the Description file to be defined with the parm-value. |
| journal | String | Specifies the location and name of the journal file to be created by the build process. Usually placed in a logs directory with a file extension of .jou. The default location is the current directory. Much of what is printed to stdout is captured in the journal file. |
| dimensions | Array of Strings | Contains an array of names, one for each core Dimension to be defined in the Model. These names should each start with a letter, be enclosed in quotes, and be separated by commas. |
| summary | Array of Strings | Contains an array of names, one for each Summary item to be defined in the Model. These names should each start with a letter, be enclosed in quotes, and be separated by commas. |
| info | Array of Strings | Contains an array of sub-strings, one for each Info item to be defined in the Model. Each sub-string consists of the Info item, a colon (:), and the Dimension it relates to. Each should be surrounded by quotes and separated by commas. |
| dimcounts | Array of Strings | Specifies that Builder will precalculate Dimension Counts (DimCounts) for a Dimension. In Diver, DimCounts display the remaining number of unique values for all Dimensions, based on a selected Dimension Value. A DimCount is a numeric value that is displayed in the Console, but it can also be added as a calculated column in a tabular. DimCounts deliver a large amount of information with very little work. However, they are computation-intensive, and can be slow for large data sets with many unique values in the Dimension being counted. Sometimes it may be more efficient to have Builder precalculate some DimCounts. This attribute speeds up Diver access to DimCounts for non-grouped core Dimensions. |
| sort | Array of Strings | By default, when diving into Model data, Diver sorts low to high in the following order: To change the sort order use the sort attribute to provide an alternate sort option. Alternate sort is an internal sort within a Model of one column following the default sort of another column. The sort attribute contains an array of sub-strings, one for each alternative sort to be defined in the Model. Each sub-string consists of the Dimension, a colon (:), and the alternative sort column. Each should be surrounded by quotes and separated by commas. An example of an alternate sort would be to sort the Dimension “parts” by “part number” rather than alphabetically. “Part number” need not be a Dimension in the Model (it could be an Info Field), although it does need to be part of each record in the input file. Another example would be a field with values such as Jan-2011, Feb-2011, March-2011, etc.; it needs to be sorted by a month number (1 to 12), otherwise, April-2011 will display first. If using DiveMaster to create a MultiModel, any field you want to use as an alternate sort key must be built into the Model as an Info Field. Keep in mind the following rules for alternate sorts that will be used in a MultiModel: NOTE: Use caution moving data and code between platforms, since the different sort sequences may give different results. |
| sum_types | Array of Strings | Specifies the summary types that should be calculated as part of the build. By default, the summary types are not calculated; only the total summary is calculated. Values include: Count is always created and represents the number of occurrences of each Dimension, that is the number of rows in the Model. Counts are referred to as DimCounts in the Diver console. |
| index_dimension | String | Defines a Model where the specified Dimension is the only Dimension. Any remaining columns that are not Summary fields are defined as Info Fields based on that Dimension. This attribute creates one-dimensional Models that are suited for lookups or single Dimension index Models. This attribute conflicts with the dimensions and info attributes. |
| index_model | String | When set, Builder will create an index Model, which is a detail Model with every input column included as a detail Dimension. This is equivalent to listing every column in the detail_dimensions attribute. Since this attribute defines the detail Model structure, it conflicts with the detail_dimensions, detail_summary, and detail_info attributes. |
| journal_append | Boolean | Directs Builder to append the build information to the journal file, rather than overwrite it. This defaults to "false". |
| bucket_dir | String | Specifies the directory location of the temporary work files. |
| output_dict | String | Specifies the location and name of the new Dictionary file that corresponds to the file created by the output_text attribute. |
| output_text | String | Directs Builder to create a new file containing the input data from the current input file with just the columns that are used by this particular build. This is useful with calculated fields, as the text file that is output will contain the calculated field values for each line. See output_dict attribute above. |
| bf | Number or String | The block factor defines how much pre-processing is done by Builder so that Diver does less work and responds faster. The larger the block factor is, the less pre-processing is done by Builder. The resulting Model will be smaller and the build will finish faster. However, Diver may respond more slowly, since it must then do the calculations itself. The default value for the block factor is 100,000 rows. The block factor can also be expressed as a percentage of the number of post-squash rows of data. If the block factor is given as a percentage, it is recalculated for the Model being built, and the bf_minimum and bf_maximum attributes apply. An integer value should be specified without commas, for example, bf=500000. A percentage should be specified in quotes (for example, bf="20%" or bf="0.5%"). |
| bf_minimum | Numeric | Defines the minimum block factor to use when the block factor is given as a percentage of the input records. If the calculated block factor is less than this minimum, the minimum is used instead. The default minimum is 10,000. |
| bf_maximum | Numeric | Defines the maximum block factor to use when the block factor is given as a percentage of the input records. If the calculated block factor is greater than this maximum, the maximum is used instead. There is no default maximum. |
| skip_build | Array of Strings | Specifies a set of Dimension values that will be skipped when summarizing the data in Phase 6 of the build process. When Builder summarizes the data for the given Dimension value, it will use a large block factor, optionally given by the skip_block_factor attribute. This will reduce the amount of pre-processing Builder does for these data values. Diving into these values will be slower. Use this option for Dimension values that are not expected to be dived on regularly by users. This attribute contains an array of sub-strings, one for each Dimension value to be skipped. The sub-string is in the form <dimension:dimension-value>, where dimension is one of the Dimensions in Builder, and dimension-value is the value for that Dimension. If dimension is "*", the specified value is used for all Dimensions. Each sub-string should be surrounded by quotes and separated by commas. |
| skip_block_factor | Numeric | Specifies the block factor to be used for skipped Dimension values. This attribute defaults to the number of records in the build, which is the largest possible value. |
| ws | Numeric | Overrides the default working storage (also referred to as working set and working space) area. The integer number should be specified without commas. The default working storage is 30,000,000 bytes. |
| dump_key | Boolean | This attribute assists with debugging by giving a profile of the data. When set to "true", Builder dumps the statistical results and key distribution information of the build from Phase 2 to a journal file. This includes the number of distinct Dimension values and the largest number of repeated values for each Dimension, the so-called max clump. Also dumped to the journal are lines causing parsing errors, data integrity problems, and summary data issues. This information can be used to help tune the Model build process. Only the first 100 errors encountered are written to the journal. However, if there are too many Info Field warnings, one warning per new field will be shown, regardless of whether the 100 warning limit has been reached. |
| keep_buckets | Boolean | Specifies, if set to "true", that temporary sort files, the "buckets", should not be deleted after the build. This is useful when looking into build disk space problems, but in general is not recommended. |
| parse_only | Boolean | Specifies whether or not Builder checks the input file for errors without creating a Model. When set to "true", Builder checks for errors. When set to "false", Builder does not check for errors. |
| quiet | Boolean | When set to "true", phase progress information during the build is suppressed. When set to "false", Builder does not suppress information. By default, this attribute is not set, so build information is displayed. |
| remove_on_fail | Boolean | Specifies the action for a failed build. When set to "true" and the Model fails to build, the Model will be automatically deleted. This prevents users from accessing incomplete or imperfect data sets. If a user accesses an incomplete data set on-line, a “Model did not finish” message is displayed. It should also be noted that when building a new model.mdl, any existing model.mdl is first overwritten by Builder with no records, and the build work continues. Failure to finish successfully does not restore the original model.mdl, so plan accordingly. On Windows, if a Model is open in DiveLine when a rebuild fails with this option, it will not disappear from disk until DiveLine closes it. When not set or set to “false”, the Model is not deleted (default). |
| squash | Boolean | When set to "true", allows the Builder to collapse duplicate rows, while totaling the Summaries. This attribute reduces the build time, memory requirements, and space required for the Model when there are many duplicate records. Setting this attribute to "true" deactivates the calculation of Maximum, Minimum, and Standard Deviation columns in the Model. The count of records in the Model reflects the number of records after the squash. When not set or set to "false" (default), duplicate rows are not collapsed. |
| no_build | Boolean | When set to "true", Builder processes options but does not build an actual Model. This is useful to create or test a new Build Description without running the actual build. Default is "false". |
| allow_empty | Boolean | Allows empty Models to be built when a script returns no records due to an empty set of parameters being used to filter the data. When set to "true", empty Models will be built. When set to "false" or not set (default), they are not. Failed model builds may still produce invalid model files when allow_empty is set to "false". Set the remove_on_fail attribute to "true" to clean up such files. |
| detail_dimensions | Array of Strings | Contains an array of names, one for each core Detail Dimension to be defined in the Model. These names should each start with a letter, be enclosed in quotes, and be separated by commas. |
| detail_summary | Array of Strings | Contains an array of names, one for each Detail Summary item to be defined in the Model. These names should each start with a letter, be enclosed in quotes, and be separated by commas. |
| detail_info | Array of Strings | Contains an array of sub-strings, one for each Detail Info item to be defined in the Model. Each sub-string consists of the Info item, a colon (:), and the Detail Dimension it relates to. Each should be surrounded by quotes and separated by commas. |
| comments | String | Allows the Model creator to include text information that will later be available through Diver’s Info option in the Open Model dialog, or through the variable $COMMENTS used in Diver Reports or Report Palettes. For more information, see the ProDiver Help > About Text Variables. |
| footer | String | Specifies a string of text to be used by Diver as the value of the variable $MODEL_FOOTER, used in Diver Reports and Report Palettes. For more information, see the ProDiver Help > About Text Variables. |
| extract_time | String | Defines a value that describes when the input data was extracted from the source data. NOTE: The build time is recorded automatically and is available in Diver Reports using $BUILD_DATE and $BUILD_TIME. |
| model_vars | Array of Strings | Contains an array of sub-strings, one for each user-defined Model variable to be set in the Model. Each sub-string consists of the variable name, an equals sign (=), and the variable value. Each should be surrounded by quotes and separated by commas. These Model variables are available in Diver through the $MODELVAR(<variable name>) reporter variable used in Reports and Report Palettes. For more information, see the ProDiver Help > About Text Variables. |
| update | Boolean | Directs the build. When set to "true", performs an incremental build, meaning that the additional data is placed in a separate Model update file (xxx.mdu). When the corresponding Model file (xxx.mdl) is opened, the incremental data is seamlessly included in the Diver data. See the compact attribute. |
| compact | Boolean | Directs the build. When set to "true", directs the Builder to merge incremental Model files with the main Model by performing a complete rebuild. See the update attribute. |
| ignore_line_end | Boolean | Controls end-of-line processing. In a fixed format file, records shorter than the last field defined in a record are usually flagged as parse errors. If set to "true", this option tells Builder to treat the fields missing from the end of the record as blank, and to continue the build. Similarly, for a variable delimited format file, if a row has fewer than the expected number of elements resulting in parse errors, setting this attribute to "true" allows Builder to treat the missing fields as blank and to continue the build. This attribute is intended for development and testing only. |
| ignore_parse_errors | Boolean | Controls how parsing errors are handled. If set to "true", parsing errors encountered while processing the input file will be flagged, but Builder will continue. Lines with parse errors will be ignored, i.e., data will be eliminated from the build. This option is useful for preliminary tests on data, allowing builds to continue while the data is being cleaned up. Defaults to "false". This attribute is intended for development and testing only. |
| model_encoding | String | Controls the encoding for the resulting Model. Values include: |
| trace_before | Sub-object | Traces data flows entering the specified object. This is equivalent to adding the Trace process object immediately before the current output object. See Embedded Trace Object. |
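
The interaction between bf, bf_minimum, and bf_maximum can be sketched in Python as follows. This is an illustration of the rules stated in the table, not Builder's actual implementation. The function name is hypothetical, and truncating the percentage result to a whole number of rows is an assumption.

```python
from typing import Optional, Union

DEFAULT_BF = 100_000         # default block factor in rows, per the bf description
DEFAULT_BF_MINIMUM = 10_000  # default bf_minimum, per the bf_minimum description


def resolve_block_factor(
    bf: Optional[Union[int, str]],
    post_squash_rows: int,
    bf_minimum: int = DEFAULT_BF_MINIMUM,
    bf_maximum: Optional[int] = None,  # there is no default maximum
) -> int:
    """Return the block factor, in rows, implied by the attribute values."""
    if bf is None:
        return DEFAULT_BF
    if isinstance(bf, str) and bf.strip().endswith("%"):
        percent = float(bf.strip()[:-1])
        # Assumption: the fractional part of the result is discarded.
        calculated = int(post_squash_rows * percent / 100)
        # The minimum and maximum apply only to percentage values.
        if calculated < bf_minimum:
            calculated = bf_minimum
        if bf_maximum is not None and calculated > bf_maximum:
            calculated = bf_maximum
        return calculated
    # A plain number is used as given.
    return int(bf)


if __name__ == "__main__":
    print(resolve_block_factor("20%", 1_000_000))   # 200000
    print(resolve_block_factor("0.5%", 1_000_000))  # 10000 (raised to the minimum)
    print(resolve_block_factor(500000, 1_000_000))  # 500000
```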
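
The info, sort, and model_vars attributes each take sub-strings built from two parts joined by a separator. The following Python sketch shows one way to check such sub-strings before placing them in a script. The helper names and sample values are hypothetical, and splitting on the first occurrence of the separator is an assumption, since the table does not say how Builder handles values that contain a colon or equals sign.

```python
def split_pair(sub_string: str, separator: str) -> tuple[str, str]:
    """Split 'left<separator>right' at the first separator and trim both parts."""
    left, found, right = sub_string.partition(separator)
    left, right = left.strip(), right.strip()
    if not found or not left or not right:
        raise ValueError(f"expected 'name{separator}value', got {sub_string!r}")
    return left, right


def parse_info(info: list[str]) -> dict[str, str]:
    """Map each Info item to the Dimension it relates to ('item:Dimension')."""
    return dict(split_pair(s, ":") for s in info)


def parse_sort(sort: list[str]) -> dict[str, str]:
    """Map each Dimension to its alternate sort column ('Dimension:column')."""
    return dict(split_pair(s, ":") for s in sort)


def parse_model_vars(model_vars: list[str]) -> dict[str, str]:
    """Map each Model variable name to its value ('name=value')."""
    return dict(split_pair(s, "=") for s in model_vars)


if __name__ == "__main__":
    # All values below are made up for illustration.
    print(parse_info(["Part Description:Part"]))
    print(parse_sort(["Month:Month Number"]))
    print(parse_model_vars(["Region=East"]))
```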
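
The effect of the squash attribute can be pictured with a small Python sketch. It assumes that "duplicate rows" means rows sharing the same values in every Dimension column. The function and sample data are hypothetical and do not reflect how Builder performs the operation internally.

```python
def squash(rows: list[dict], dimensions: list[str], summaries: list[str]) -> list[dict]:
    """Collapse rows with identical Dimension values, totaling the Summaries."""
    totals: dict[tuple, dict] = {}
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        if key not in totals:
            totals[key] = {s: 0 for s in summaries}
        for s in summaries:
            totals[key][s] += row[s]
    # Rebuild one row per distinct combination of Dimension values.
    return [dict(zip(dimensions, key), **sums) for key, sums in totals.items()]


if __name__ == "__main__":
    rows = [
        {"Region": "East", "Product": "A", "Sales": 10},
        {"Region": "East", "Product": "A", "Sales": 5},
        {"Region": "West", "Product": "A", "Sales": 7},
    ]
    for row in squash(rows, ["Region", "Product"], ["Sales"]):
        print(row)
    # {'Region': 'East', 'Product': 'A', 'Sales': 15}
    # {'Region': 'West', 'Product': 'A', 'Sales': 7}
```

In this sketch, only the totals survive the collapse and the individual row values are gone, which is consistent with Maximum, Minimum, and Standard Deviation not being calculated when squash is "true". The record count drops from three to two, matching the statement that the count reflects the number of records after the squash.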