Text, cBase, and Dive Input Build Tags

A Spectre Build script is defined using a build tag with optional tags between open and close { } braces. This page describes the build tags that can be used within the text-input, cbase-input, tunnel-input, excel-input, json-input, pgsql-input, datagen-input and dive-input code blocks. The tags are listed in alphabetical order.

Text, cBase, and Dive Input Build Tags

Tags Examples Notes
build

build {

...

}

Defines the block for this build description.

calendar

column "Quarter" type="period" calendar="standard" period-type="year-quarter"

 

column "Fiscal Month" type="period" calendar="offset August" period-type="year-month"

Assigns a calendar and period-type to the column. Valid calendars are:

  • standard—Gregorian calendar; period type options are year, year-half, year-quarter, year-tertile, and year-month.
  • offset <month>—Fiscal year similar to the standard calendar except starting on a different month; period type options are the same as standard calendars.
  • iso 8601—ISO week-numbering system; period type options are year and year-week.
  • custom—Custom calendar created from a text file with a contiguous range of dates; required columns are date and year, optional columns are half, quarter, tertile, month, and week.

See General Build Tags for a different use of calendar as a Build tag for calendar declarations.

calendar

(Deprecated)

column "Sales Quarter" type="period" calendar="gregorian month" format="YYYY/MM" input-format="YYYY/M"

 

column "Accounting Period" type="period" calendar="iso8601 week"

 

column "Order Month" type="period" calendar="gregorian+9 month"

 

column "Sales Month" type="period" calendar="my-custom-calendar.cbase"

Supports a period type column. Valid calendars are:
  • gregorian month—Gregorian year-month
  • gregorian quarter—Gregorian year-quarter
  • iso8601 week—ISO-8601 year-week
  • gregorian+<N> month—Offset Gregorian year-month for fiscal calendars. Specifying "gregorian+6 month" means "add six months to the real year-mo to get the fiscal year-mo", which is like saying "The fiscal year starts in July".
  • <custom calendar file>—For more tailored fiscal calendars
  • See Spectre Calendars.

cbase-input

build {

text-input "monthly.txt"

cbase-input "/cbases/revenue.cbase"

output "/cbases/revenue_plus.cbase"

}

Specifies one or more cBases as a source of data. Can be combined with text-input. Columns do not need to match. Columns in secondary inputs need to be explicitly declared to be included. Missing columns are treated as nulls unless an add operation is included in the build block. See the text-input tag later in this table and the Build Operations Tags in General Build Tags.

column

column "Order Type" type="string"

column "Liters Per Case" type="double"

column "Event Day" type="date" format="MMMM D, YYYY"

column "Return Date" type="datetime"

column "Times" type="datetime" format="YYYY/MM/DD hh:mm:ss {zone:short}" time-zone="America/New_York"

column "Sales-Period" type="period" calendar="iso8601 week"

column "Revenue" type="fixed100" format="$#,#.00"

column "Invoice-Request" type="boolean" format="Y;N"

column "Ship Date" input-format="YYYY/MM/DD" format="YYYY-MM-DD" type="date"

column "Price" type="fixed100"

 

lookup {

text-input "lookup.txt"

key "A" "A"

column "B" suggested-dimension=false

}

Defines the columns for the cBase when using text-input or odbc-input.

  • If column headings exist, Spectre attempts to auto-detect the type; however, it is a good idea to check these and correct as necessary to ensure faster processing and less memory usage.
  • If column headings do not exist, you must specify all columns for Spectre to know the names. The empty string cannot be used as a column name. Column names are case insensitive.
  • If no columns are specified, all are brought in.
  • It is a best practice to specify which columns to keep rather than which to drop.
  • If a column is declared but not found in the input, the build fails. If multiple inputs are in use and some of them include the column, then the build succeeds by assigning null values.

Column tags are often followed by type, calendar, format, duration, or input-format options to clarify the data being read and added to the cBase. Specifying the type is a best practice.

When used in a lookup block, column supports the suggested-dimension tag. This allows you to flag the additional column as an optional dimension, avoiding the dimcount process used for ProDiver's console display.

NOTE: The column tag cannot be used with cbase-input, but it can be used with tunnel-input.

custom-property

column "units_sold" type="integer" {

custom-property "My Prop" "A"

}

custom-property "Abbreviation" "Dir"

custom-property "Footer" "ABC Company®, Copyright 2019."

Defines custom properties for the cBase. Custom properties are not used by Spectre but are available for users to add their own metadata to Spectre objects. Custom properties can be referenced in Spectre expressions using functions like column_custom_property(), table_custom_property(), or calc_custom_property(), depending on the context.

See Spectre Properties.

data

list-input delimiter="," {

data """

Zip Code,Location

90210,"Beverly Hills, CA"

99501,"Anchorage, AK"

33162,"Miama, FL"

"""

column "Zip Code" type="string"

column "Location" type="string"

}

Defines the subblock for list-input. Used to define your data elements inline in the Build script. See About Spectre List Input.
datagen-input

datagen-input count=100 {

column "ID" {

sequence start=10 end=25

}

column "Name" {

name

}

}

Generates random data in specified columns. For a list of all attributes, see Spectre Datagen Input Properties.
definition

column "Str" {

property "Definition" "Letters starting with A"

property "Reference" "The Alphabet"

}

Adds the system property Definition for the column.
delimiter

build {

text-input "/data/sales.txt" delimiter="," {
...
}

}

Specifies the delimiter for the flat file. If delimiter is not specified, Spectre attempts to auto-detect the value. Best practice is to specify the delimiter value. The value must be a string and is either a single character (for example, "\t" for tab, "," for comma, " " for space, ";" for semicolon, "|" for pipe) or the special strings "tab" or "comma".

description

column "Str" {

property "Description" "Column Desc"

}

Adds the system property Description for the column.
dict-file

build {

text-input "basic.txt" dict-file="basic.dic"

output "basic.cbase"

}

Specifies the path to a dictionary file to use with text-input. Dictionary files are used to specify column names when the input lacks column headers. A sample dic file:

version "1";

// Computer generated object language file

object 'DICT' "Main" {

type="variable",

delimiter=" ",

columns={

{ Name="Colour" },

{ Name="Taste" },

{ Name="Weight" },

{ Name="Value" }

}

};

NOTE: Dictionary files are considered legacy functionality—not all options are supported.

disable-date-range-warnings column "MyDate" type="date" disable-date-range-warnings=true Sets a flag for a date column. The default is False. When it is set to True, it does not warn about dates prior to 1900 or after 2199. Available in 7.1(19).
disable-similar-values-warning column "Address line 1" type="string" disable-similar-values-warnings=true Turns off data warnings of similar values for this column. Use only after data validation steps have determined that the warnings are in fact not worrisome.
dive-input

build {

cbase-input "Sales.cbase"

dive-input "Profit.dive"

output "Sales-and-Profit.cbase"

}

build {

cbase-input "Sales.cbase"

dive-input {

dive {

cplan "/cplans/transactions.cplan"

window {

dimension "Dept"

dimension "Item Type"

column "Total Charges"

column "Total Payments"

}

}

}

output "/cbases/SalesExpanded.cbase"

}

Specifies a dive file to use as input. The Dive is executed and the result used as a build input verbatim. Multiple dive-input tags are allowed and mix freely with the other inputs. Use a Dive as input to squash the data, that is, create a single record for a series of dimensions.

TIP: When an input lookup file has duplicate keys, you can dive into the input to remove the duplicates and lookup against that instead.

NOTE: There are limits to the types of dive windows that can be passed to Spectre build, for example, a crosstab Dive is not supported.

drop-column drop-column "Customer Name"

Drops the named column from the input file when creating the cBase.

When using text-input, if you want to specify only which columns to keep, so that the build drops all others, use the column tag to specify the columns.

Keep in mind that dropping columns only affects the detail file size and build time, it does not affect the performance of dives.

duration column "Visit Duration" type="integer" duration=true

Indicates that the numeric value is meant to be displayed as a time duration when set to true. The format is h:mm:ss.

Use with add tags on any input, or with column declarations on text-input.

encoding

text-input "/data/sales.txt" encoding="utf-8" {

...

}

Defines the encoding used in the input and output files. Encoding types supported include: cp-1252, iso-8859-1, utf-16be, utf-16le, utf-8, and gb18030.

excel-input

excel-input "/data/sales.xlsx" sheet="Region1" range="A1:D25" headers=true {

...

}

Specifies an xlsx file to use as input, along with options for interpreting that data in the file. The following attributes are supported for use:

  • sheet—Specifies which sheet in the Excel file to pull data from. The value can be either a name or a number. Spectre checks for a name match first. If that fails, it will try treating sheet as a number. Name matching ignores case, and if multiple sheets have the same name it will only look at the first one that matches. If this value is empty, then the first worksheet in the Excel file is used.

  • range—Specifies a range of cells to use from the input. Any cells outside of the range are ignored. The format for the range is the top-left cell (column letter and row number), followed by a colon, followed by the bottom-left cell (column letter and row number). For example:

    A1:D25 returns the 25 rows from the first column (column A) through the fourth column (column D)

  • limit-rows—Directs the build to use a specific number of rows from the Excel file.

  • headers—Indicates the presence or absence of column headings in the Excel file. Set to false to name each column manually.

  • start-row—Defines the row number to begin reading data from. If this value is empty, Spectre attempts to automatically determine which row represents the start.

    NOTE: If the headers value is set to true, the value of start-row should be the row number the header is on.

  • password—Provides a password if the Excel file is password-protected. It is not recommended to use password-protected Excel files as data sources.

  • trim—Removes any leading or trailing spaces from the data. This is the default behavior for Spectre. Set to false to disable the trimming of white space.

file-list

text-input file-list="list-of-files.txt" {

...

}

Indicate file list input. As a sub tag for text-input and, as of Spectre 7.2(15), cbase-input, a text file is indicated as the source of multiple input files. This text file has one file name per line. File paths therein are relative to the text file.
filename-column

text-input "data-*.txt" {

...

filename-column "Text Source"

}

cbase-input "data-*.cbase" {

...

filename-column "Source File"

}

Adds a column with the input filename. When reading multiple files, this tag captures the filename on each row in the cBase. In this first example, the resulting "Text Source" column might have values like "data-1.txt" and "data-2.txt".
filter

build {

text-input "data.txt" {

column "ID" type="string"

column "Status" type="string"

column "Status date" type="date"

}

sort {

column "ID"

column "Status date" reverse=true

}

filter `on_previous_row(value("ID")) != value("ID")`

output "data.cbase"

}

Filters out records that are not needed for the cBase.

This example sorts on two columns and checks for a change in value to filter out all but the most recent record for the ID.

format

column "Request Date" type="date" format="YYYY-MM-DD" input-format="YYYY-M-D"

column "fiscal quarter" type="period" calendar="MyFiscal" format="YYYY-\\F\\QQQ"

column "Ship Quarter" type="period" calendar="gregorian quarter" format="YYYY-\\QQ"

column "year_month" type="period" calendar="gregorian month" format="YYYY-MM"

 

column "Price" format="$#,0.00"

column "Revenue" type="fixed100" format="$#,#.00"

 

column "Active" type="boolean" format="T;F"

 

column "Times" type="datetime" format="YYYY/MM/DD hh:mm:ss {zone:short}" time-zone="America/New_York"

Sets both the display format and the input properties, unless a separate input-format is specified.

Formats the 'date' or 'period' data for the cBase column.

  • To store dates like July 4, 2023, use format="MMMM D, YYYY".
  • To include letters such as FQ (fiscal quarter), as in YYYY-FQQQ, backslashes are needed to force the characters to be taken literally, and another backslash is needed to escape each backslash in the string.

Formats numeric data, for example as currency.

Also indicates how to format Boolean data in the cBase column.

  • The default format is "True;False".
  • Other alternatives are "T;F", "Yes;No", and "true".
  • If the format has a blank after the semicolon, or if there is no semicolon, then blank values are considered false. Otherwise, when there is a semicolon and the string after it is not empty, blank values are parsed as null.

Formats the datetime data.

  • To indicate the data is zoned time and should be stored as GMT, use the zone option.
  • To specify the default time zone for display, use the time-zone tag.

See also: Spectre Format Conventions.

good direction

column "Num" {

property "Good Direction" "Down"

}

Defines the system property Good Direction for the named column. The string must be Up or Down. Available starting with Spectre 7.1(18).
headers

text-input "/data/sales.txt" headers=false {

...

}

Indicates the presence or absence of column headings in the flat file. Columns with the heading "" are ignored. Set to False if the input file does not contain headings. Be sure to list each column by name and type.
include-other-columns

build {

text-input "basic.txt" {

column "Color" label "Favorite"

include-other-columns

}

output "basic.cbase"

}

lookup {

text-input "lookup.txt" {

include-other-columns suggested-dimension=false

}

key "A" "A"

column "B"

}

include-other-columns type="string"

include-other-columns type="integer" summary-type="sum"

Ensures no columns are dropped. Use when you explicitly define a few columns with special properties and then want to ensure that all data is included in the build.

When used in a lookup block, include-other-columns supports the suggested-dimension tag. This allows you to flag the additional columns as optional dimensions, avoiding the dimcount process used for ProDiver's console display.

The include-other-columns tags are often followed by type and summary-type options to clarify the data being read and added to the cBase.

If no columns are specified in a Build script, include-other-columns is assumed to avoid building a cBase with no columns.

input-format

column "Date" type="date" format="MMM D, YYYY" input-format="YYYY-MM-DD"

 

column "Log-time" type="datetime" input-format="YYYY/MM/DD hh:mm:ss {zone:short}" time-zone="America/New_York"

 

column "Active" type="boolean" format="yes;no" input-format="Y;N"

Clarifies the incoming data.

  • For dates and periods, the default format is YYYY/MM/DD. Use input-format to define the date format in the input file to ensure correct interpretation of date data.
  • For datetime data with time zones, use input-format with the zone option to save the data in the cBase in GMT. Use the time-zone option to specify the default display. SeeSpectre Format Conventions.
  • For Boolean data, use input-format to define the data as it appears in the input file.

Also see format.

json-input

json-input "/data/sales.json" {

column "product" key="product"

column "revenue" key="revenue"

column "units" key="revenue.0"

}

Specifies a json file to use as input. Each column that is created in the input must be paired with a key value in the json file. The asterisk wildcard character is supported. For more information, see Working with JSON File Format in Spectre .
label column "Color" type="string" label="favorite" Defines the display label used for a column. This is a system property for the column.
limit-line-warnings

text-input "/data/sample.txt" {

limit-line-warnings 20

column "group" type="string"

...

}

Limits the number of lines with problems that are logged before the text reader stops logging the errors.
limit-rows

build {

text-input "basic.txt" limit-rows=100 {

...
}

}

Directs the build to use a specific number of rows from each input file. If multiple input files are used, the first 100 rows are read from each file, ensuring a complete sampling of data values.
TIP: Using limit-rows=1 is an easy way to ensure calcs find all needed columns in the input.
limit-similar-values-warnings

text-input "x.txt" {

limit-similar-values-warnings 2

column "Address line 1" type="string"

...

}

Limits the 'similar-value' check for the entire text-input. The default is 20 warnings per column. Set to zero to turn off all warnings for the input. See also disable-similar-values-warning.
limit-value-warnings

build {

text-input "/data/folder_properties.txt" {

column "folder" type="string"

column "group" type="string"

limit-value-warnings 10

}

...

}

Limits the number of warnings logged for issues with values in the source file.

list-input

build {

list-input {

data """

Mountain "Height (ft)"

"Mount Everest" 29029

K2 28251

Kangchenjunga 28169

Lhotse 27940

Makalu 27838

"""

}

output "peaks.cbase"

}

Specifies an inline list as the data source. For details on how to enter the list data, see About Spectre List Input.

Options are the same as for text-input, except you cannot specify encoding, headers (always true), or filename-column.

The default delimiter is a space rather than a tab. You can specify an alternate delimiter or column type.

modified-column

text-input "data-*.txt" {

...

modified-column "Last Update"

}

Adds a column with GMT, indicating when the data file was last modified. Use with text-input or cbase-input tags.
nulls

column "phone" type ="string" {

nulls "xxx" "N/A"

}

column "order date" type="date" {

nulls "0000/00/00"

}

Maps known values used to represent 'no data' to the null value in the cBase.
pass-parameter

dive-input "insurances.dive" {

pass-parameter "file"

}

Passes a parameter value from the Build to the Dive; the Dive must have take-parameter.
period-type

column "Ship Month" type="period" calendar="standard" period-type="year-month"

column "YearMo" type="period" period-type ="year-month"

Indicates the period type being applied for the column. If the calendar tag is omitted, the default calendar is used. See the calendar tag description.
pgsql-input

pgsql-input {

connect-string "host=localhost port=5432 dbname=main-db user=postgres password=admin"

query "select * from database"

}

pgsql-input {

connect-file "connection.txt"

query "select * from database"

}

Specifies a PostgreSQL database as the input source. To use this input you must specify a connection string composed of the following sections:

  • host

  • port

  • dbname

  • user

  • password

The connection string can be listed directly in the input using the connect-string tag or in a file that is then linked to in the input using the connect-file tag.

The query tag is used to provide a SQL SELECT query that returns the data to be input into Spectre.

For more information about making a connect file, see Options for ODBC Login Credentials

prefix prefix "input." Prefixes all the outputted column names with the specified string.

property

column "units_sold" type="integer" format="#,#" label="Units Sold" {

property "Description" "How many units were sold"

}

Specifies a system property for the column.

Other system properties defined using the property tag include:

  • Definition

  • Reference

  • Good Direction

See Spectre Properties.

quotes

text-input "/data/sales.txt" quotes=true {

...

}

text-input "notes.txt" quotes=false

Indicates how to handle quotes. By default, Spectre looks for quotes and keeps strings intact.

Set quote=false for Spectre to ignore any quotes found in the data file. This is useful when the input does not use quotes consistently.

reference

column "Str" {

property "Definition" "Letters starting with A"

property "Reference" "The Alphabet"

}

Adds the system property Reference for the named column. Available starting with Spectre 7.1(18).
required-dimension column "Product Status" type="string" required-dimension="Status"

Sets a column to behave like classic Info Fields. In ProDiver, this field is available as a column that you can add to other columns displayed in a dive. In this example, Product Status becomes an Info Field on Status in the client console. In other words, Product Status only displays if Status is in the dive window.

See Defining Info Relationships in Build Code Samples.

sort-by

column "Symbol" type="string" sort-by="Number"
column "Number" type="integer"

 

column "Priority Sort" type="integer"
column "Priority" type="string" sort-by="Priority Sort"

Specifies an alternate sort. When specifying columns, use sort-by to order them based on another column's value.

sort-natural

column "ICD-10-code" type="string" sort-natural=false

add "diagnosis" `value("Diagnosis Code")+" -- "+value("Diagnosis Desc")` sort-natural=false

Disables natural string sorting in text-input. The default is true—Spectre interprets value of digits as decimal numbers and sorts according to the numeric values in the string.

TIP: This feature only makes sense together with sort-by if the sort-by column is also a string column.

suggested-dimension column "Product Class" type="string" suggested-dimension=false

Allows you to promote (true) or demote (false) a column.

summary-type

column "Quantity In Stock" type="integer" summary-type="sum"

column "OrderNum" type="integer" summary-type="max"

column "CustomerNum" type="integer" summary-type="info"

include-other-columns type="integer" summary-type="sum"

Sets the column's summary type explicitly. Any summary function that takes a single argument (except for count and percentile) are allowed. See Summary Functions.

Most common are sum, info, any, first, and last. The default for numeric data is sum; use kahan_sum if concerned about precision; use const for a column with only one value. The rest of the functions (for example, min, average, median) are better done as a separate calc (that is, not the default one).

text-input

build {

text-input "/data/sales.txt"

output "/cbases/sales.cbase"

}

 

build {

text-input "/months/jan.txt" "/months/feb.txt" "/months/mar.txt"

...

}

 

build {

text-input "/data/batch_item/2013*_HMA.dat" {

column "Batch Date" type="date" format="YYYY-MM-DD" input-format="YYYY-M-D"

column "Journal Code" type="string"

column "Batch Number" type="string"

...

}

output "/publish/cbases/HMA_batch_item.cbase"

}

 

build {

text-input "/data/dairy.csv" delimiter=","

output "/cbases/dairycsv.cbase"

}

 

build {

text-input "/data/months-data.zip"

output "/cbases/months.cbase"

}

Specifies one or more text files as a data source. Also note:
  • Wildcard characters ("*" and "?") are supported.
  • Zip files are supported, and the files within are treated as multiple files.
  • Windows has a limit of 16,384 files. On Linux, the limit is set by an administrator.
  • A build block can include both text-input and cbase-input tags.
  • Columns from all inputs go into the cBase. Columns in secondary inputs need to be explicitly declared to be included. If an input does not have a column, the cBase column is created with nulls. If nulls are not appropriate, an add operation can be used under the input to specify the missing values.
  • Operations outside the input block operate on the union of the inputs for the build.
  • If multiple text files are listed using a single text-input tag, there is an option to parse the files in parallel when building the cBase. (See Spectre build Options and Build cBase Process Node.)
  • When there is no child block after text-input, all columns are used and their types are inferred. Use of a child block allows you to limit columns and apply build operations. See the table for Build Operations Tags under General Build Tags.

TIP: The input files can use a wildcard character in the filename, and as of 7.1(13), in the directory name as well.

time-zone

time-zone="America/New_York"

time-zone="America/Los_Angeles"

Indicates a default time zone to associate with a datetime column when displaying the data. The string should be an ICU time zone name. Use with format or input-format tags.

If unspecified, GMT is assumed.

For a list of ICU time zone names, see ICU Time Zones.

trim

text-input "/data/sales.txt" trim=true {

...

}

Removes any leading or trailing spaces from the data. This is the default behavior for Spectre. Set trim=false to disable the trimming of white space.
tunnel-input tunnel-input "todays-data.tnl"

Specifies a Tunnel script as the source for the input data. The tunnel script is executed and Spectre treats the resulting table of data as if it were a text file.

As with other data inputs, tunnel-input takes parameter provisions (same as in dive-input blocks), column options (same as in text-input blocks), and the stale-after option (same as in odbc-input blocks).

type

column "Qty Ordered" type="integer"

column "Telephone Number" type="string"

column "Order Date" type="date"

column "Ship Time" type = "datetime"

column "Times" type="datetime" input-format="YYYY/MM/DD hh:mm:ss {zone:short}"

column "Margin" type="fixed100"

column "Account Closed" type="boolean" format="Yes;No"

include-other-columns type="string"

column "Quarter" type="period" calendar="standard" period-type="year-quarter"

column "Fiscal Month" type="period" calendar="offset August" period-type="year-month"

column "Fiscal Year-Mo" type="period" calendar="gregorian+6 month" format="YYYY-MM (GYYYY-GMMM)"

Specifies the type of data in the column.

  • Valid values are boolean, date, datetime, double, fixed100, integer, period, and string. See Spectre Data Types.
  • If the type is "period", a calendar tag is required. Here in the final snippet (deprecated), the format tag says to use the standard Gregorian counterparts—for example: 2016-01 (2015-Jul).
  • If the type is datetime, a zone tag can be included as part of the format to indicate a specific time zone.
  • The type for date assumes that the input has a formatted date, not the binary date value.
unique column "Unique ID" type="string" unique=true Specify a column as having only unique values. Only available with text-input and columns with the string type. If every value in the column is not unique, a warning displays.
zone

column "Times" type="datetime" input-format="YYYY/MM/DD hh:mm:ss {zone:short}" time-zone="America/New_York"

 

column "Times" format="YYYY/MM/DD hh:mm:ss {zone:generic-location}"

For input, zone indicates that the time zone data is to be stored as GMT. Use with input-format tags.

NOTE: The source zone is determined by the content of the string itself.

For output, zone indicates how the datetime data column is formatted. Use with format tags.

Supported options:

Time Zone Format Example
{zone:short} 2020/01/05 14:17:09 EST
{zone:long} 2020/01/05 14:17:09 Eastern Standard Time
{zone:short-generic} 2020/01/05 14:17:09 ET
{zone:long-generic} 2020/01/05 14:17:09 Eastern Time
{zone:short-gmt} 2020/01/05 14:17:09 -0500
{zone:long-gmt} 2020/01/05 14:17:09 GMT-05:00
{zone:generic-location} 2020/01/05 14:17:09 New York Time

See also: