Frequently Asked Questions
When upgrading, many customers have questions about equivalent functionality and the behavior of the new software. Here are some FAQ's.
Security
A: In Solution 6.x's default-allow approach, you can set ACLs on a directory or a file. In Platform 7.x where the approach is default-deny, Access Control can only be set down to the directory level. If your cBases require different access settings, place the cBases in different directories. See DiveLine Security: 6.x versus 7.x.
A: You might be missing a step on your project access control—projects also need to have access files defined in Workbench. By default, new projects are set to "no access" for non-administrator users.
A: Tunnels are part of classic model technology, so model access rules are required to have an impact on the data coming out. For example:
model-access {
limit-dimension-by-property {
dimension "Company State"
property "Company"
}
limit-dimension-by-property {
dimension "Sales Person"
property "Sales Rep"
}
}
A: You can assign groups to projects permanently via workbench. You can assign users to groups programmatically via dicfg.
So create a group for each project (or set of projects), then use dicfg to put users into those groups.
A: The DiveLine service preserves users data associated with logins and password changes across restarts. Nothing gets reset. The format of the file used to track last-changed date for passwords has not changed between 6.4 and 7.0.
A: You cannot disallow navigation to a file that the user can legitimately open by way of access controls. The access controls are not client program-specific.
However, you can hide a folder with content that the user has access to, as long as the folder the user has access to is a sub-folder of the hidden folder.
For example, take this folder-structure
/cbases
/cbases/sales
/cbases/inventory
/cplans
If the user has no access on the /cbases folder, but read access on the /cbases/sales and /cbases/inventory and /cplans folders, they will not be able to see the cbases folder in the file-open dialog, but they will be able to open any cPlan in the /cplans folder that links to cBases in the sub-folders directly.
Though it is correct that this limit can’t be different between ProDiver and DivePort, you can define access so that users see only the things they are supposed to interact with (which in the above example would be the cplans folder).
This is an extremely useful construct to limit users from seeing anything except what they really need to open, but still gives them access to the files that they can’t see.
Another useful approach may be to use a Home Project which has a Project alias to just the published markers in some other project. Using the Open dialog the user would not be able to navigate to the cBase folder—unless they have permission and know how to change projects. If the issue is users getting overwhelmed by choice, this approach is good. Some people also use DiveBooks for a similar reason.
A: "Allow access if limit/delete column is not found in model" is effectively turned on when doing access control on both cBases and Models in Projects. Missing columns are ignored.
A: There is not a simple way to programmatically set ACLs/Access Control in 7.x Projects.
While you cannot programmatically set access rules, you can programmatically alter access control as it gets applied, by modifying Property assignments.
The idea is:
- Set access rules once, using Properties
- Modify user/group property values assignments as needed day-to-day.
This second step can be done programmatically. When you define the User Property in Workbench, you can choose to have the value assignments be specified in an external file. This external file is tab-delimited text, and can be generated by Integrator.
This is the approach to use in automation of access control in 7.x.
A: No, the current access control implementation cannot do that.
If the restricted users are only using DivePort and are unable to edit portlets, then you can get a less secure version of this kind of control. You can implement the masking in a cPlan (using a replace operation to substitute null values in a column on certain rows) and then make sure that all of the portlets go through that cPlan. This is less secure because other products such as ProDiver would be able to circumvent it, and it would be easier to accidentally point a portlet at the wrong cPlan.
A: Users only see what is allowed by the intersection of every access rule defined. For a cBase, each row-based access rule has a condition and a limit. For a given user, every rule with a condition that includes the user will be applied. All rules to be applied are applied one-by-one. The result is a logical AND of all the rules. If the user cannot see something, adding more rules will not help.
Revisit your cBase access rules to see how they can be loosened up and reconfigured to give appropriate access. On the condition side of things, maybe you can define groups and add limits for the groups. Disjointed groups can be used to apply different rules to different types of user. Just make sure a user is only in one group.
On the limits side, what properties are defined? Maybe you can simply limit based on the property values assigned to each user, or better, which group. This would be done under Server Settings. Note you can assign <All Values> on a property to reduce the number of rules required.
TIP: Be sure you gave the user access to the project where the cBase resides, as well as read permissions for directories.
A: Access Denied means the user cannot see the cBase. Check read permissions for the directory where the cBase is located, and any higher level directories. Confirm that project access has been allowed.
Tabular Window Missing Data indicates that the user does have access to the cBase, but limits are being applied that restricts returning any data.
cBases versus Models
A: Spectre sorting is not the same as classic Model sorting. See cBase versus Model Sorting.
A: Spectre Build (cBases) considers blank values in numeric source data columns as nulls; they appear as blank cells in ProDiver. A zero in the source data is a zero in the resulting cBase. On the other hand, the model engine (Integrator model build) considers blank values in numeric source data columns as zero, unless the column is declared as nullable in its Integrator script. If the column is declared as nullable, Integrator builds a null into the resulting model for rows where the source data for the column is blank; the null values appear as blank cells in ProDiver. See also: Null Values in Spectre.
A: There are no development plans to support a stand-alone DI-Diver to read cBases. cBases are designed to be held in memory on enterprise-grade hardware, not loaded on laptops.
A: To make migration from the Model engine easier, the Model engine's interpretation of null in mathematical contexts as zero was carried forward to the Spectre engine. Therefore nulls are treated as zeroes in mathematical contexts in Spectre calcs (for example: null + 4 + 5 = 9). You can explicitly test for null using is_null() or notnull() functions. See also: Null Values in Spectre.
A: Infos do not become automatically useless when using cBases. Sometimes you really do have the info data relationship in your data, where a piece of data only makes sense if some other dimension is present in the window. We still support Infos added to the list of selected columns in Diver appearing and disappearing depending on what dimensions are present in the window. That was the original idea behind Infos; they perhaps have gotten a reputation for being a way to evade the core-dimension limit by promoting them to dynamic dimensions, but they are more useful than that.
A: The Spectre coalesce() function does this. For example, `coalesce( <something that may produce null>, 0)` will replace null values with 0. Then you can format a potentially null summary value using the format ##.00 in the cPlan calc.
In the models world we first built models to a staging directory with "remove_on_fail" set to true. If the build was successful then models were moved from the staging directory to a models directory. This ensured that the model build could not interfere with users accessing the model. Also, if the build failed, staging was empty and we never copied over a failed build.
In the cBase world it seems Spectre takes care of these concerns. Is it still wise to build to staging? Or is it counterproductive?
A: Spectre addresses the primary technical concerns and then some:
-
A running build will leave the old cBase in place while it runs.
-
A failed Spectre build still leaves the old cBase in place.
-
A successful Spectre build will finish on the side and nicely move the cBase into place. Diver will even keep diving in the new numbers.
-
You can even write a Spectre build that reads a cBase and overwrites the same cBase, and get all the same benefits.
Intuitively, staging is unnecessary. However, there are still scenarios where using a staging area makes sense:
-
When building multiple cBases as part of a build process, and someone for example, accessing a cPlan that combines today's sales with yesterday's inventory would see misleading numbers.
-
If I need new cBases and new user property files to be put live at the same moment.
-
If I want to compare the new and old cBases in some way—something like "don't make the new one live if it has fewer records than the old one."
So there are still reasons why using a staging area can be helpful with maintaining data integrity.
NOTE: Measure Factory builds in staging—it builds all of the cBases and cPlans into a staging folder in /temp then moves them all into production at once.
A: True, ProDiver is a 32-bit client application. However, DiveLine for version 7 is 64-bit; and it handles opening all cBases on the server, including those which are larger than 2 GB. Because the DiveLine is 64-bit, the 2 GB file size limit does not apply to it.
A: Column and calc names are NOT case sensitive. So in general, yes, you can refer to a "class" column with the string "CLASS".
Spectre Builds and Performance
A: No, processing the input does not require all inputs to be loaded into memory. However, there is an opt-in Spectre build option to allow use of multiple cores for the build. If you turn on this option, then the resulting pieces of the cBase have to be joined back together, and that involves mapping it to memory. While there are some optimizations, the worst case IF you use this opt-in feature is that you need about twice the size of the final cBase in memory. See Build cBase Process Node and Spectre Build Temporary Files.
A: Spectre builds an "empty" cBase if the source file contains only column headers; the cBase has "0 rows" when built.
A: The first thing to figure out is: Why is the console dimcounts dive taking so long? Typical answers are: (1) too many dimensions being counted, and (2) inadequate hardware.
You can turn on log_spectre_requests="TRUE" in your atlcfg.cfg, open the cBase in ProDiver, and then pull the Dive out of the DiveLine session log. You can request timing information from the Spectre CLI: spectre dive extracted.dive --timing which might give you some insight. Often there are far more dimensions in the console than are really needed. If you mark columns in the Build script as suggested-dimension=false then they do not appear in the console. Users can then go to Edit > Add Dimensions in ProDiver to put them in the console (or you can do so using DiveMaster in Workbench) as dynamic dimensions. There is no performance difference (unlike with Models), but the console does not count their values. See About Overloading the ProDiver Console.
If you are likely to already have the console dimcounts Dive in your cache, you could run spectre cache --refresh (or --refresh-with_predicate) to spend time nightly re-running queries after the cBases are updated. Typically, you would want to first use spectre cache with one or more of --max-age, --max-idle, or --max-size to throw out entries that should not get re-run. Also see Spectre Cache Refresh Process Node.
A: The cPlan gives you more flexibility, but sometimes calculations are best done in the Build. See Calculations: Build versus cPlan.
A: All columns in a cBase can be used in a summary context—that is, as non-dimension columns in a Spectre Dive (that is, in a dive file). All columns which are not doubles can be dimensions in a Spectre dive.
In the ProDiver context, ProDiver treats all numeric columns as summaries, and in a DivePlan, all are selected by default. Also, ProDiver treats all string, date, and period columns as core dimensions, minus those which are marked as having a "required-dimension" or are explicitly tagged with suggested-dimension="false". However, even those can be promoted with the Edit > Add Dimensions dialog, or with DiveMaster, to a dimension.
A: If you run into difficulties with ProDiver using Spectre data sources, it can be helpful to enable Spectre request logging in the atlcfg.cfg file with the following statement:
log_spectre_requests="TRUE",
There is no need to restart the DiveLine server after adding the statement. The next time you start a ProDiver session, the session log contains the Spectre request. You can then copy that request out of the session log into its own dive file in the Project, and run it from the command-line with spectre dive to see if the problem is a Spectre engine issue or something else. Sometimes you can gain some insight by just reviewing the Dive request.
A: It appears that column-override is being used as a way to do datatype casting, which is not its purpose. The correct approach is to include the casting in the SQL query itself and have the database do the cast, or to use a replace tag in the Build script afterward, with an expression that does the desired conversion.
The purpose of column-override is to be able to read unusual database-specific types as something vaguely useful. For example, asking PostGIS for a lat/long combination without an override fails, but column-overriding it as a string might produce something readable.
Here is an example of doing a cast from a string DB column to an integer in the query:
SELECT CAST(some_column AS int) FROM table;
Here is an example of converting in the Build script:
replace "some_column" `parse_integer(value("some_column"))`
A: The input phase of cBase builds, that is reading in and parsing of the text, are currently single threaded. If there are calculations to run after that, then those might run in parallel, using up to 8 cores at once.
A: There is no performance impact on the build time.
There is no performance impact on run-time queries.
However, when counting the unique values in Dimensions for display in the ProDiver Console, more dimensions mean more columns to DimCount to prepare that display. If you suddenly have 200 Dimensions rather than a few dozen, there could be a noticeable delay the first time a cBase is open after the data source has been rebuilt.
Also note, when migrating existing markers: if you have report markers that display Info columns in reports, then all of the Info columns in the underlying tabular need to work as info columns in the new cBase.
A: Not currently, no. The files would need to be transformed to CSV or tab-delimited first. For instance, see
https://stackoverflow.com/questions/39419975/how-to-copy-and-convert-parquet-files-to-csv
A: There is no limit in Spectre itself—it is whatever the operating system and environment allows. On many Linux installations the default is a maximum of 1024 file handles open at a time, and that includes dynamic libraries and output files. On such systems, the limit would be around 1000 input files. You can check the limit with ulimit -n. On many Linux distributions it is set in /etc/security/limits.conf. The sysadmin for the system can easily increase it.
A: A cBase is basically one big table holding whatever is in the input. Data is organized in columns, not as dimensions or summaries. There are just detail records. When you use the Model Builder and specify dimensions and summaries, you are telling the builder how to squash the data. With Spectre, you can specify squashing while building by using dive-input. Alternatively, you can limit the input data by using Integrator to squash or filter the source, before it is fed to Spectre. Reducing the size of the inputs reduces the size of the resulting cBase.
A: A Spectre builds succeeds whenever possible. Any data which fails to parse generates Unknown values and error messages which cause the node to turn yellow in Production rather than causing the entire build process to fail. Examination of logs for ERROR messages should be part of normal processing.
cBases
A: Yes, new customers should only use cBases. See What is a cBase?
A: Text dimension values in a cBase can contain up to 1 GiB of text. But be forewarned, not all DI clients can handle that much text.
A: Building a single cBase is preferable to maintaining the data in separate cBases to parallel what was done with models. In fact, when you use the consolidate tag in a cPlan, the Spectre engine builds a single cBase out of two cBases when the cPlan is opened. (It caches the resulting cBase and does not rebuild it next time if the source cBases have not changed.) This might be convenient, but work that could be done ahead of time is now done when the user is waiting to see the data.
A Spectre Build script can easily concatenate two cBases. Here is an example, along with defining a Model column in Actor.build:
build {
cbase-input "Actor_45.cbase" {
add "Model" `"Actor_45"`
}
cbase-input "Actor_Not_45.cbase" {
add "Model" `"Actor_Not_45"`
}
output "Actor.cbase"
}
It is also a best practice to define a cPlan for the cBase—it is the best place to define calculations once a cBase is built. For now, it can be a simple as Actor.cplan:
cplan {
input "Actor.cbase"
}
A cPlan can be used to present a view of only one set of records, which is similar to having them in separate models. You could then delete the input cBases if you wanted to save a little space. A filtering cPlan could be Actor_45.cplan:
cplan {
cplan-input "Actor.cplan"
filter `value("Model")="Actor_45"`
}
You can recreate the Marker or make manual edits to the merge section: replace the Model (mdl) reference with a cPlan reference. As an example for Actor.cplan. replace
merge_dbs
with
merge_dbs={ { dbname="Actor.cplan", diveline_dbname="/cplans/Actor.cplan" } }, merge_dbs
This adds a second merge_dbs entry, before the previous one, but DiveLine ignores the duplicate.
NOTE: If you have a vast number of Markers/DivePlans, check with Technical Support about available tools that can help with the task of converting Markers.
A: Sort of. cPlans do not get stored inside Markers, but a DivePlan is created there. If you edit a column definition in ProDiver, it is saved into the DivePlan in the Marker, even if it uses a Spectre expression, and does not affect the cPlan. If you add new columns in ProDiver, they also go into the Marker's DivePlan.
A: There is a certain amount of overhead for each column of each input file, around 1 MB per column per file, independent of the data volume. If you are concerned about memory usage when working with thousands of files, you might try an Integrator script and the concat process to concatenate the files into a single file to feed to Spectre.
A: There are limitations to floating point operations because of the magnitude of the values involved. There is a way around this. If your column is naturally a double and addition must preserve this, there is the `kahan_sum()` function to sum them up in a way that actively suppresses the buildup of the inaccuracies that you might be seeing. This function is slightly more expensive than a normal `sum()`. So for example, you would have `kahan_sum(value("My Column"))` as your expression.
Another way is to treat the underlying column values as fixed100 values: `sum(fixed100(value("My Column")))`
TIP: Do not use double when the data is naturally fixed100—it avoids buildup of inaccuracies and is faster too.
A: The operating system locale is used with the environment variables LC_COLLATE and LANG.
However, on Linux, the initscript uses /sbin/runuser if it exists, and this strips out these variables. You can work-around that by putting LANG=... in the PREPCMDS variable in the script or modify the script to not use runuser. The reasoning for runuser was that on some earlier RedHat versions, that worked whereas su did not due to SELinux enforcement.
NOTE: You need to restart the di-service after changing the default locale, since the locale setting is inherited when a process is created and not looked up later. See Command Line Utility di-service.
A: You can check the locale by opening a cBase in Workbench with the cBase Viewer, then selecting the Diagnostics section. Look in the log for "Using system locale...". See also cBase versus Model Sorting.
A: No, merging cBases with different locales is not allowed, which can be a problem when working with cBases built on a local laptop and on the customer server.
A: Yes. The cBases store strings as Unicode characters, and if you have Unicode builds of ProDiver and DiveLine, and a font that covers all the desired characters, then your reporting can be in multiple languages.
Models
A: When building models and defining dimensions, the maximum field length for a single piece of data is 32,767 characters. When defining info fields, there is no limit to the number of info fields associated with a dimension. However, the total size of all info fields based on a single dimension should not exceed 32,767 characters, otherwise an error occurs.
A: You can continue to use a DivePlan referencing a cBase/cPlan which also contains a diveplan-level lookup, and it will act as with Models—all values come in as text. However, a better practice is to use cPlans only. Consider this code snippet:
cplan {
cbase-input "something.cbase"
lookup {
text-input "lookup.txt"
key "Mine" "Theirs"
}
}
This does not specify specific columns in the text-input, so all columns come in. Also it does not specify which columns to lookup, so all columns are looked up. Such lookups work with changing columns in the lookup.txt without having to edit the supporting cPlan.
If you need to specify some of the text columns in the text-input (for example to name types) you can do so and then use include-other-columns to bring the rest of the text file in.
You may also be interested in the keys-are-required-columns tag, if you want the looked-up columns to be viewed as Infos in ProDiver.
A: The file limit is the same in 7.0 and 7.1. You are encouraged to move large data sets over to Spectre—DI does not intend to push the limits of the model engine.
Functions and Calcs
A: Yes. Normally, a ProDiver column named "Units" would be defined as Total[Units] which is converted into `sum(value("Units"))` internally. But you can change the definition of the ProDiver column to something else. For instance you could change it to `average(value("Units"))` and it would take the average instead.
A: Yes, yes and yes, as long as the data is coming from a cBase, you can enter the classic ProDiver "dimcount[Foo,Bar>100]" syntax and ProDiver and the DiveLine server converts it internally to the Spectre expression `dimcount("Foo", calc("Bar") > 100)` and you get Spectre performance.
A: The only case where you should absolutely use a Spectre expression is when a ProDiver expression cannot do it. If there is an equivalent ProDiver expression, then it does not matter which expression you use.
For instance, you can enter Total[Units] or `sum(value("Units"))` and it is all the same. On the other hand, there is no ProDiver equivalent to `sum(value("Units") * value("Cost"))` to add up the products of Units and Cost for each row. Note that this calc is different from ProDiver's "Units*Cost" which multiplies the Units sum by the Cost sum.
A: Yes, you can use filtered dimcounts (and other Spectre expressions) both in cPlans and in ProDiver when working with Spectre data. In ProDiver, you can make a column and enter a Spectre expression as the definition of the column, and that expression can have, for example, filtered dimcounts in it.
A: If by "built-in stuff" you are referring to how you can enter an expression into ProDiver when adding a column, it is basically the same as putting it in the cPlan. The advantage of putting it in the cPlan is that you can refer to that cPlan in multiple places and see the same definition of the calc.
A: The calculation is going to be run over all of the detail records in the cBase with the same Dimension values. For example, if you dive on Sales Region, for the "West" row, the calc will be run over all records where Sales Region is "West". Let's imagine there are 10 such records.
Because of that, you cannot use a calc such as `value("Plan Units")` because there are likely many different Plan Units values across the records—which of the 10 records should we use? Something in the calc has to summarize these multiple records down to a single value. The simplest way would be to add them up: `sum(value("Plan Units"))`.
Sometimes you only want one particular value, in which case the summary functions any(), first(), last(), or info() would be used instead of sum(). Sometimes you are certain that there will only be one detail record, or that the values will be the same on all the records in the set. An example might be an arrival datetime and an order datetime on a set of records where you want to calculate the time difference. Your Spectre calc could be as follows:
`elapsed_time_in_seconds( info(value("ED Arrival Date Time")), info(value("ED Arrival Order Date Time")) )`
Note that info() means "if all values in the set are the same return that, else return <unknown> (appears as blank)".
A: A calc is a declaration as to how to calculate something. It is not an instruction to perform the calculation. Suppose you have calc "mycalc" in a script.
Once declared, you can use the calculation in a column verbatim (with `column "mycalc"` in a window), as part of another calc with `calc("mycalc")` or not use it at all.
Calcs do not allocate memory for columns nor put numbers in columns. However, operations like add and column do allocate memory and put numbers in columns and they can use the declared calcs to do so.
The add operation can be used in a dive window or in a cPlan input operation. Note that cPlans, like the calcs in them, are more abstract and are more declarations than requests to take immediate action. A cPlan does not have a table to talk about, so an add there has nothing to add to—while dive windows and cPlan inputs do have tables.
To calculate the average order count by customer id…
In ProDiver:
DimAverage[Customer ID, Total[Order Count]]
In Spectre:
`dimaverage(calc("Order Count"), "Customer ID")`
A: Spectre cannot tell the difference between a literal string and a column name. With the summary expression as the first argument, followed by the (arbitrary number of) dimensions, Spectre can unambiguously pick out which is which, even if the summary is a literal string.
A: The same calculation is run for the Totals position as for any of the other rows, but over ALL of the underlying detail data for the window, rather than just over the underlying detail data for a particular row in the window. This is both sensible and has been explicitly requested.
What you are using under the covers when you request Info[] is the Spectre info() function. This function returns a value if the "Info relationship" holds, and null if not. The "Info relationship" holds if all of the values being considered are the same. So if all the records have the same value, then you will see that value in the Total row, and a null if not.
A: The "iso 8601" calendar starts weeks on Monday, but the other calendar types start weeks on Sunday. If you have a time-series in a cPlan that does not specify a calendar, it uses the default calendar, "standard", and treats Sunday as the first day of the week. To change this behavior, you could set the default-calendar for the cPlan to "iso 8601". Then the time-series will use that calendar instead, and treat Monday as the first day in the week.
Example: I have a string dimension called "Thing". It contains the strings "Car", "Banana" and "" (null).
If I want to create a Boolean called "hasWheels", how can I make sure that the nulls stay null, instead of counting towards the 'false' category?
This does not seem to work:
add "hasWheels" `if(is_null(value("Thing")),null,if(value("Thing")="Car",true,false))`
Is there a calculation that can give me nulls in a Boolean calc?
A: Consider this setup and Dive calculation:
TIP: The extra column X allows you to work around a feature of the text parser. The parser cannot handle a single string-typed input column with a null because it declares the entire row empty, warns about it and ignores it.
With this input, the results are as desired:
A: No. When Spectre prints to a terminal, it uses "human-readable" output where it aligns the columns. When the output is piped to a file (e.g. spectre dive foo.dive > my-output.txt) it uses tab-delimited output, but the tab delimiter is hard-coded and there is no option to change it.
However, a simple Integrator script with the Dive as an input and a pipe-delimited fileout as the output would accomplish the same thing.
A: Squash is the only function that uses a double-call pattern. With squash(), a calculation is done on a new table with different dimensions than the current one. So you need to describe a two-level summary: First summarize the detail table (the records in the cBase) to produce the new table, then summarize the values in that table to put them into the current window.
For example, squash(average(squash(sum(value("Cost")), "Year-Month")) computes the sum of Cost for each Year-Month, and then averages those sums together. Note that the inner squash takes extra arguments to set dimensions, and the outer squash does not.
There are other functions that work with two expressions, but because one expression is not "inside" the other, there is no need for this double-call pattern. For instance, filter takes a summary expression and a predicate expression: filter(sum(value("Cost")), value("Product") = "Milk") computes the sum of Cost for all records where Product is Milk. The predicate does not produce values for the summary, it just affects the records that are used.