#
Summary Functions

Spectre summary functions operate on columns of data, rather than rows, to produce a single value. They can be used when doing a dive.

`any(expression : any) : any`

For example:

``any(value("Address"))``returns the value of the "Address" column in a randomly selected row

`argmax(expression : any, dimension : string) : any`

For example:

``argmax(calc("Population"), "State")`` returns California

`argmin(expression : any, dimension : string) : any`

For example:

``argmin(calc("Population"), "State")`` returns Wyoming

`average(expression : numeric) : double`

For example:

``average(value("Price"))` `returns the average for the "Price" column

**any**except that it does not get sanitized out of totals.

`const(expression : any) : any`

For example:

``const(value("Extract Date"))`` returns the value from one row in the "Extract Date" column

`count() : integer`

For example:

``count()`` returns the number of rows in the working set

``filter(count(), value("State") = "Ohio")` `returns the number of rows where "State" is "Ohio"

`first(expression : any) : any`

For example:

``first(value("Charge"))`` returns the first charge encountered for the dimension value in the data set

`geomean(expression : numeric) : double`

For example:

``geomean(value("Price"))`` returns the geometric mean value of the price column

`harmean(expression : numeric) : double`

For example:

In a table where column "X" has values 1, 2, 3, 4, 5, ``harmean(value("X"))`` returns 2.189781

`v1`,

`v2`, and so on.

`hyperloglog(precision : integer, expression1 : any, expression2 : any, ... : any) : intege`r

`precision`argument must be between 4 and 16. Higher precision means the results are closer to the actual dimcount, but the calculation uses more memory.

For example:

`calc "Customer Count" `hyperloglog(8, value("Customer Name"))`` returns an estimate of unique customers

`calc "Cust Product Count" `hyperloglog(8, value("Customer Name"),value("Product Name"))`` returns an estimate of the unique combinations of customer and product

`info(expression : any) : any`

**any**) and return the value. Then, evaluate the expression on all rows and log errors for those rows where the value is different than the picked value.

For example:

``info(value("Address"))`` returns the value found in the "Address" column if all values are the same. If all values are not the same, returns an error.

intercept(y-value: numeric, x-value: numeric) : double

For example:

``intercept(value("Y"), value("X"))`` returns 1.0 with the indicated Y and X values.

``intercept(value("Y"), value("X"))`` returns 0.0 with the indicated Y and X values.

``intercept(value("Y"), value("X"))`` returns -9.0 with the indicated Y and X values.

**sum**, but with countermeasures against floating point imprecision.

`kahan_sum(expression : double) : double`

**sum**millions of floating point values, the imprecision inherent in that calculation can become noticeable. If that happens, use the

**kahan_sum**function instead,which is slower than

**sum**but reduces the occurrence of this problem.

For example:

This is mostly the same as `sum()`, with very rare exceptions, but in a table where column "X" has values 1.001, 2.002, 3.003, 4.004, 5.005, ``kahan_sum(value("X"))`` returns 15.015.

`kurtosis(expression : numeric) : double`

For example:

In a table where column "X" has values 1, 2, 3, 4, 5, ``kurtosis(value("X"))`` returns -1.2

`last(expression : any) : any`

For example:

``last(value("Direction"))`` returns the last direction encountered for the dimension value in the data set

`max(value1 : any, value2 : any, ... : any) : any`

For example:

``max(100, 300, 500)``returns 500

`max(expression : any) : any`

For example:

``max(value("Price"))``returns the maximum value in the "Price" column

``max(value("Ship Date"))`` returns the maximum Ship Date for the current working set. If a dive is on a particular customer, this expression returns the last ship date for that customer only.

`median(expression : numeric) : double`

For example:

``median(value("Salary"))``returns the median value from the Salary column

`min(value1 : any, value2 : any, ... : any) : any`

For example:

``min(100, 300, 500)``returns 100

`min(expression : any) : any`

For example:

``min(value("Price"))``returns the minimum value in the price column

`mode(expression : any) : any`

For example:

Using this data:

Yields these results:

`mul(expression : any) : any`

For example:

In a table where column "X" has values 1, 2, 3, 4, 5, ``mul(value("X"))`` returns 120

`percentile(expression : numeric, percentile : numeric, predicate : boolean) : double`

For example:

``percentile(value("Units), 75)`` looks at all of the Units values in the working set, and returns the value at the 75th percentile. If you sorted the Units in the working set from highest to lowest, the expression returns the value three-fourths of the way down the list.

``percentile(value("Units"), 50)`` is equivalent to` `median(value("Units"))``

``filter(percentile(value("Units"), 75), value("Units") != 0)` `filters out zero units from the working set before calculating the 75th percentile

`skewness(expression : numeric) : double`

slope(y-value: numeric, x-value: numeric) : double

For example:

``slope(value("Y"), value("X")`` gives 0.428571429 with the indicated Y and X values.

``slope(value("Y"), value("X")`` gives 0.6 with the indicated Y and X values.

``slope(value("Y"), value("X")`` gives 1.0 with the indicated Y and X values.

`squash(expression : any, ... :any) : any`

For example:

``squash(max(squash(calc("X"), "D")))`` computes calc("X") for each value of "D" (as if you made a window on "D"), and then returns the max of those "X" values. This is the equivalent of DimMax[<dimension>,<summary>] in ProDiver.

`calc "Solid Doses Sedative" ```squash(sum(squash(filter(calc("Solid Quantity"),`

`value("BJA Class") = "Sedative"`

`and dimcount("Prescriber Name-DEA") >= 5`

`and dimcount("Pharmacy Name-DEA") >= 5),`

`"ID")))```` returns the number of solid dose sedatives that were prescribed to patients with more than 5 prescribers and more than 5 pharmacies.

When used in a cPlan, this calc says:

- From here, dive on "ID" (the inner squash(... "ID"))
- For each "ID", compute "Solid Quantity" (the calc(...)) for each record where "BJA Class" is "Sedative" and where that "ID" has at least 5 of the two dimcounts (the filter(...))
- Sum all of those computed "Solid Quantity" values (squash(sum(...)))

Suppose you have a dimension *ID*, and a dimension *Source*, and every *ID *should come from a single *Source*. You want to compute a single number that tells you if every *ID *has a single *Source *value associated with it. You could write a dimensionless dive that computes the number you want, using `squash()` to dive off each *ID *and check the `dimcount` of *Source*. For example:

dive { cplan { text-input "squashdata.txt" } window { column "Conflicted Count" `squash(sum(squash(if(dimcount("Source") > 1, 1, 0), "ID")))` } }

This gives the number of *ID*s which have more than one *Source*.

`squash_right(expression.: any, number : integer) : any`

For example:

``squash_right(sum(value("Rev")), 2)`` squashes records by eliminating values from the right. The "2" is the number of dimensions to roll up and eliminate from the right in the multitab. This value defaults to 1.

*Sample Standard Deviation*.

`stddev(expression : numeric) : double`

For example:

``stddev(value("Value"))`` returns the standard deviation for the sample values

*Population Standard Deviation*.

`stddevp(expression : numeric) : double`

For example:

``stddevp(value("Value"))`` returns the standard deviation for the population values

Available in Spectre 7.1(30) or later. Aggregates all values of a column that have a matching dimension. Returns one row of aggregated values for each unique dimension value. The delimiter and last delimiter parameters are optional and default to a comma. The last delimiter parameter defines the delimiter that separates the last two elements of the aggregated values. The include nulls parameter is optional and defaults to false. When the include nulls parameter is true, any null values are aggregated along with non-null values.

`string_agg(expression : string, delimiter : string, last delimiter : string, include nulls : boolean)`

For example:

`column "Aggregated" `string_agg(value("Child"), ",", " and ")`` using this data:

yields these results in the new column *Aggregated*.

`sum(expression : numeric) : numeric`

For example:

``filter(sum(value("Revenue")), value("State") = "Ohio")``returns the sum of "Revenue" where "State" is "Ohio"

``sum(value("Revenue"))`` returns the total for the Revenue column

``sum(value("Price") * value("Units"))`` returns the total of the calculation on each row of the data at run-time

`ucount(expression : any) : integer`

For example:

``ucount(month(value("Order Date")))``returns the number of months found in the "Order Date" column

``ucount(value("Customer"))` `is equivalent to ``dimcount("Customer")``

``ucount(period("gregorian month", value("Sold Date")))`` returns the number of months represented by dates in the "Sold Date" column

`weighted_median(expression: numeric, weight: numeric) : double`

For example:

`column "Weighted Median" `weighted_median(value("Rent"), value("Number of Units"))`` using this data:

yields these results by the *Bedroom Type*:

`weighted_percentile(expression: numeric, weight: numeric, percentile : numeric) : double`

For example:

`column "Weighted Percentile" `weighted_percentile(value("Rent"), value("Number of Units"), 80)`` returns the following using the weighted median example:

See also: