How to Design a Mondrian Schema
- What is a schema?
- Schema files
- Annotation
- Logical model
- Cube
- Measures
- Dimensions, Hierarchies, Levels
- Mapping dimensions and hierarchies onto tables
- The 'All' member
- Time dimensions
- Order and Display of Levels
- Multiple hierarchies
- Degenerate dimensions
- Inline tables
- Member properties and formatters
- Approximate level cardinality
- Default Measure Attribute
- Functional Dependency Optimizations
- Table Hints
- Star and snowflake schemas
- Shared dimensions
- Join optimization
- Advanced logical constructs
- Virtual cubes
- Parent-child hierarchies
- Tuning parent-child
hierarchies
- Closure tables
- Populating closure tables
- Member properties
- Calculated members
- Named sets
- Plug-ins
- User-defined function
- Member reader
- Cell reader
- Cell formatter
- Member formatter
- Property formatter
- Schema processor
- Data source change listener
- Dynamic datasource xmla servlet
- Internationalization
- Aggregate tables
- Access-control
- Defining a role
- Rollup policy
- Union roles
- Setting a connection's role
- Appendix A: XML elements
1. What is a schema?
A schema defines a multi-dimensional database. It contains a logical model, consisting of cubes, hierarchies, and members, and a mapping of this model onto a physical model.
The logical model consists of the constructs used to write queries in MDX language: cubes, dimensions, hierarchies, levels, and members.
The physical model is the source of the data which is presented through the logical model. It is typically a star schema, which is a set of tables in a relational database; later, we shall see examples of other kinds of mappings.
2. Schema files
Mondrian schemas are represented in an XML file. An example schema, containing almost all of the
constructs we discuss here, is supplied as demo/FoodMart.xml in the Mondrian distribution.
The dataset to populate this schema is also in the distribution.
Currently, the only way to create a schema is to edit a schema XML file in a text editor. The XML
syntax is not too complicated, so this is not as difficult as it sounds, particularly if you use the FoodMart
schema as a guiding example.
The structure of the XML document is as follows:
aggElements
aggElements
relation
relation ::=
relation
aggElement ::=
NOTE: The order of XML elements is important. For example, the
<UserDefinedFunction>
element has to occur inside the <Schema>
element after all collections of <Cube> ,
<VirtualCube> ,
<NamedSet>
and <Role> elements. If you include it before the first
<Cube> element,
the rest of the schema will be ignored.
The content of each XML element is described in
Appendix A and in the XML schema.
2.1 Annotation
The major element types (schema, cube, virtual cube, shared dimension,
dimension, hierarchy, level, measure, calculated member) support annotations. An
annotation is a way of associating a user-defined property with a metadata
element, and in particular, allows tools to add metadata without extending the
official Mondrian schema.
Create an
<Annotations> element as a child of the element you wish to
annotate (generally it is the first child element, but check the schema
definition for details), then include a number of
<Annotation> elements.
<Annotation>
elements' names must be unique within their element. If you are adding
annotations to support a particular tool that you maintain, choose annotation
names carefully, to ensure that they do not clash with annotations used by other
tools.
The following example shows "Author" and "Date" annotations attached to a <Schema> object.
...
3. Logical model
The most important components of a schema are cubes, measures, and dimensions:
- A cube is a collection of dimensions and measures in a particular subject area.
- A measure is a quantity that you are interested in measuring, for example, unit sales of a
product, or cost price of inventory items.
- A dimension is an attribute, or set of attributes, by which you can divide measures into
sub-categories. For example, you might wish to break down product sales by their color, the gender of the
customer, and the store in which the product was sold; color, gender, and store are all dimensions.
Let's look at the XML definition of a simple schema.
< Table name="sales_fact_1997"/>
< Dimension name="Gender" foreignKey="customer_id">
< Hierarchy hasAll="true" allMemberName="All Genders" primaryKey="customer_id">
< Level name="Gender" column="gender" uniqueMembers="true"/>
< Hierarchy hasAll="false" primaryKey="time_id">
< Table name="time_by_day"/>
< Level name="Year" column="the_year" type="Numeric" uniqueMembers="true"/>
< Level name="Quarter" column="quarter" uniqueMembers="false"/>
< Level name="Month" column="month_of_year" type="Numeric" uniqueMembers="false"/>
< Measure name="Unit Sales" column="unit_sales" aggregator="sum" formatString="#,###"/>
< Measure name="Store Sales" column="store_sales" aggregator="sum" formatString="#,###.##"/>
< Measure name="Store Cost" column="store_cost" aggregator="sum" formatString="#,###.00"/>
< CalculatedMember name="Profit" dimension="Measures" formula="[Measures].[Store Sales] - [Measures].[Store Cost]">
This schema contains a single cube, called "Sales". The Sales cube has two dimensions,
"Time", and "Gender", and four measures, "Unit Sales", "Store Sales", "Store Cost", and "Profit".
We can write an MDX query on this schema:
SELECT {[Measures].[Unit Sales], [Measures].[Store Sales]} ON COLUMNS,
{descendants([Time].[1997].[Q1])} ON ROWS
FROM [Sales]
WHERE [Gender].[F]
This query refers to the Sales cube ([Sales] ), each of the dimensions
[Measures] , [Time] , [Gender ], and various members
of those dimensions. The results are as follows:
[Time] |
[Measures].[Unit Sales] |
[Measures].[Store Sales] |
[1997].[Q1] |
0 |
0 |
[1997].[Q1].[Jan] |
0 |
0 |
[1997].[Q1].[Feb] |
0 |
0 |
[1997].[Q1].[Mar] |
0 |
0 |
Now let's look at the schema definition in more detail.
3.1 Cube
A cube (see <Cube> ) is a named collection of measures
and dimensions. The one thing the measures and dimensions have in common is the fact table, here
"sales_fact_1997" . As we shall see, the fact table holds the columns
from which measures are calculated, and contains references to the tables which hold the dimensions.
< Table name="sales_fact_1997"/>
...
The fact table is defined using the <Table> element. If the fact table is not in
the default schema, you can provide an explicit schema using the "schema" attribute, for example
< Table schema=" dmart" name="sales_fact_1997"/>
You can also use the <View>
construct to build more complicated SQL statements. The
<Join> construct is not supported for fact tables.
3.2 Measures
The Sales cube defines several measures, including "Unit Sales" and "Store Sales".
< Measure name="Unit Sales" column="unit_sales" aggregator="sum" datatype="Integer" formatString="#,###"/>
< Measure name="Store Sales" column="store_sales" aggregator="sum" datatype="Numeric" formatString="#,###.00"/>
Each measure (see <Measure> ) has a name, a column in the fact table, and an
aggregator . The aggregator is usually "sum", but "count", "min", "max", "avg", and
"distinct-count" are also allowed; "distinct-count" has some limitations if your cube contains a
parent-child hierarchy.
The optional datatype attribute specifies how cell values are represented in Mondrian's
cache, and how they are returned via XML for Analysis. The datatype attribute can have
values "String ", "Integer ", and "Numeric ".
The default is "Numeric ", except for "count " and
"distinct-count " measures, which are "Integer ".
An optional formatString attribute specifies how the value is to be printed.
Here, we have chosen to output unit sales with no decimal places (since it is an integer), and store sales
with two decimal places (since it is a currency value). The ',' and '.' symbols are locale-sensitive, so if
you were running in Italian, store sales might appear as "48.123,45". You can achieve even more wild effects
using advanced format strings.
A measure can have a caption attribute to be returned by the
Member.getCaption()
method instead of the name. Defining a specific caption does make sense if special letters (e.g. Σ or Π)
are to be displayed:
< Measure name="Sum X" column="sum_x" aggregator="sum" caption="Σ X"/>
Rather than coming from a column, a measure can use a cell reader,
or a measure can use a SQL expression to calculate its value. The measure "Promotion Sales" is
an example of this.
< Measure name="Promotion Sales" aggregator="sum" formatString="#,###.00">
(case when sales_fact_1997.promotion_id =
0 then 0 else sales_fact_1997.store_sales end)
In this case, sales are only included in the summation if they correspond to a promotion sale.
Arbitrary SQL expressions can be used, including subqueries. However, the underlying database
must be able to support that SQL expression in the context of an aggregate. Variations in syntax
between different databases is handled by specifying the dialect in the SQL tag.
In order to provide a specific formatting of the cell values, a measure can use a
cell formatter.
3.3 Dimensions, Hierarchies, Levels
Some more definitions:
- A member is a point within a dimension determined by a particular set of attribute
values. The gender hierarchy has the two members 'M' and 'F'. 'San Francisco', 'California' and
'USA' are all members of the store hierarchy.
- A hierarchy is a set of members organized into a structure for convenient analysis.
For example, the store hierarchy consists of the store name, city, state, and nation. The
hierarchy allows you to form intermediate sub-totals: the sub-total for a state is the sum of the
sub-totals of all of the cities in that state, each of which is the sum of the sub-totals of
the stores in that city.
- A level is a collection of members which have the
same distance from the root of the hierarchy.
- A dimension is a collection of hierarchies which discriminate on the same fact table attribute (say, the day that a sale occurred).
For reasons of uniformity, measures are treated as members of a special dimension, called 'Measures'.
An example
Let's look at a simple dimension.
< Dimension name="Gender" foreignKey="customer_id">
< Hierarchy hasAll="true" primaryKey="customer_id">
< Level name="Gender" column="gender" uniqueMembers="true"/>
This dimension consists of a single hierarchy, which consists of a single level called Gender .
(As we shall see later, there is also a special level called
[(All)] containing a grand total.)
The values for the dimension come from the gender column in the customer
table. The "gender" column contains two values, 'F' and 'M', so the Gender dimension contains the members
[Gender].[F] and [Gender].[M] .
For any given sale, the gender dimension is the gender of the customer who made that purchase. This is expressed by
joining from the fact table "sales_fact_1997.customer_id" to the dimension table "customer.customer_id".
3.3.1 Mapping dimensions and hierarchies onto tables
A dimension is joined to a cube by means of a pair of columns, one in the fact table, the other in the dimension table.
The <Dimension> element has a foreignKey attribute,
which is the name of a column in the fact table; the <Hierarchy> element has
a primaryKey attribute.
If the hierarchy has more than one table, you can disambiguate using the primaryKeyTable
attribute.
The column attribute defines the key of the level. It must be
the name of a column in the level's table. If the key is an expression, you can
instead use the <KeyExpression> element inside the Level. The following is
equivalent to the above example:
< Dimension name="Gender" foreignKey="customer_id">
< Hierarchy hasAll="true" primaryKey="customer_id">
< Level name="Gender" column="gender" uniqueMembers="true">
< SQL dialect="generic">customer.gender</ SQL>
Other attributes of <Level> , <Measure>
and <Property> have corresponding nested elements:
Parent element |
Attribute |
Equivalent nested element |
Description |
<Level> |
column |
<KeyExpression> |
Key of level. |
<Level> |
nameColumn |
<NameExpression> |
Expression which defines the name of members of this level. If
not specified, the level key is used. |
<Level> |
ordinalColumn |
<OrdinalExpression> |
Expression which defines the order of members. If not specified,
the level key is used. |
<Level> |
captionColumn |
<CaptionExpression> |
Expression which forms the caption of members. If not specified,
the level name is used. |
<Level> |
parentColumn |
<ParentExpression> |
Expression by which child members reference their parent member
in a parent-child hierarchy. Not specified in a regular hierarchy. |
<Measure> |
column |
<MeasureExpression> |
SQL expression to calculate the value of the measure (the
argument to the SQL aggregate function). |
<Property> |
column |
<PropertyExpression> |
SQL expression to calculate the value of the property. |
The uniqueMembers attribute is used to optimize SQL generation. If you know that the
values of a given level column in the dimension table are unique across all the other values in that column across
the parent levels, then set uniqueMembers="true" , otherwise, set to
"false" . For example, a time dimension like [Year].[Month]
will have uniqueMembers="false" at the Month level, as the same month appears in different
years. On the other hand, if you had a [Product Class].[Product Name] hierarchy, and you
were sure that [Product Name] was unique, then you can set
uniqueMembers="true" . If you are not sure, then always set
uniqueMembers="false" . At the top level, this will always be
uniqueMembers="true" , as there is no parent level.
The highCardinality attribute is used to notify Mondrian
there are undefined and very high number of elements for this dimension.
Acceptable values are true or false (last one
is default value). Actions performed over the whole set of dimension elements
cannot be performed when using highCardinality="true" .
3.3.2 The 'all' member
By default, every hierarchy contains a top level called '(All) ', which contains a single
member called '(All {hierarchyName}) '. This member is parent of all other members
of the hierarchy, and thus represents a grand total. It is also the default member of the hierarchy; that is, the member
which is used for calculating cell values when the hierarchy is not included on an axis or in the slicer. The
allMemberName and allLevelName attributes override the default
names of the all level and all member.
If the <Hierarchy> element has hasAll="false" , the 'all'
level is suppressed. The default member of that dimension will now be the first member of the first level; for example,
in a Time hierarchy, it will be the first year in the hierarchy. Changing the default member can be confusing, so you
should generally use hasAll="true" .
The <Hierarchy> element also has a defaultMember
attribute, to override the default member of the hierarchy:
< Dimension name="Time" type="TimeDimension" foreignKey="time_id">
< Hierarchy hasAll="false" primaryKey="time_id" defaultMember="[Time].[1997].[Q1].[1]"/>
...
3.3.3 Time dimensions
Time dimensions based on year/month/week/day are coded differently in the Mondrian schema
due to the MDX time related functions such as:
ParallelPeriod([level[, index[, member]]])
PeriodsToDate([level[, member]])
WTD([member])
MTD([member])
QTD([member])
YTD([member])
LastPeriod(index[, member])
Time dimensions have type="TimeDimension" . The role of a level in a time dimension is
indicated by the level's levelType attribute, whose allowable values are as follows:
levelType value |
Meaning |
TimeYears |
Level is a year |
TimeQuarters |
Level is a quarter |
TimeMonths |
Level is a month |
TimeWeeks |
Level is a week |
TimeDays |
Level represents days |
Here is an example of a time dimension:
< Hierarchy hasAll="true" allMemberName="All Periods" primaryKey="dateid">
< Table name="datehierarchy"/>
< Level name="Year" column="year" uniqueMembers="true" levelType="TimeYears" type="Numeric"/>
< Level name="Quarter" column="quarter" uniqueMembers="false" levelType="TimeQuarters"/>
< Level name="Month" column="month" uniqueMembers="false" ordinalColumn="month" nameColumn="month_name" levelType="TimeMonths" type="Numeric"/>
< Level name="Week" column="week_in_month" uniqueMembers="false" levelType="TimeWeeks"/>
< Level name="Day" column="day_in_month" uniqueMembers="false" ordinalColumn="day_in_month" nameColumn="day_name" levelType="TimeDays" type="Numeric"/>
3.3.4 Order and display of levels
Notice in the time hierarchy example above the ordinalColumn and
nameColumn attributes on the <Level> element. These
affect how levels are displayed in a result. The ordinalColumn attribute specifies a
column in the Hierarchy table that provides the order of the members in a given level, while the
nameColumn specifies a column that will be displayed.
For example, in the Month level above, the datehierarchy table has month (1 .. 12)
and month_name (January, February, ...) columns. The column value that will be used internally within MDX is the
month column, so valid member specifications will be of the form:
[Time].[2005].[Q1].[1] . Members of the [Month]
level will be displayed in the order January, February, etc.
In a parent-child hierarchy, members are always sorted in hierarchical
order. The ordinalColumn attribute controls the order that
siblings appear within their parent.
Ordinal columns may be of any datatype which can legally be used in
an ORDER BY clause. Scope of ordering is per-parent, so in the
example above, the day_in_month column should cycle for each month.
Values returned by the JDBC driver should be non-null instances of
java.lang.Comparable
which yield the desired ordering when their
Comparable.compareTo method is called.
Levels contain a type attribute, which can have values "String ", "Integer ", "Numeric ", "Boolean ",
"Date ", "Time ", and "Timestamp ".
The default value is "Numeric" because key columns generally have a numeric type. If it is a
different type, Mondrian needs to know this so it can generate SQL statements
correctly; for example, string values will be generated enclosed in single
quotes:
WHERE productSku = '123-455-AA'
3.3.5 Multiple hierarchies
A dimension can contain more than one hierarchy:
< Hierarchy hasAll="false" primaryKey="time_id">
< Table name="time_by_day"/>
< Level name="Year" column="the_year" type="Numeric" uniqueMembers="true"/>
< Level name="Quarter" column="quarter" uniqueMembers="false"/>
< Level name="Month" column="month_of_year" type="Numeric" uniqueMembers="false"/>
< Hierarchy name="Time Weekly" hasAll="false" primaryKey="time_id">
< Table name="time_by_week"/>
< Level name="Year" column="the_year" type="Numeric" uniqueMembers="true"/>
< Level name="Week" column="week" uniqueMembers="false"/>
< Level name="Day" column="day_of_week" type="String" uniqueMembers="false"/>
Notice that the first hierarchy doesn't have a name. By default, a hierarchy has the
same name as its dimension, so the first hierarchy is called "Time".
These hierarchies don't have much in common -- they don't even have the same table! -- except
that they are joined from the same column in the fact table, "time_id" .
The main reason to put two hierarchies in the same dimension is because it makes more sense to
the end-user: end-users know that it makes no sense to have the "Time" hierarchy on one axis
and the "Time Weekly" hierarchy on another axis. If two hierarchies are the same dimension, the
MDX language enforces common sense, and does not allow you to use them both in the same query.
3.3.6 Degenerate dimensions
A degenerate dimension is a dimension which is so simple that it isn't worth
creating its own dimension table. For example, consider following the fact table:
product_id |
time_id |
payment_method |
customer_id |
store_id |
item_count |
dollars |
55 |
20040106 |
Credit |
123 |
22 |
3 |
$3.54 |
78 |
20040106 |
Cash |
89 |
22 |
1 |
$20.00 |
199 |
20040107 |
ATM |
3 |
22 |
2 |
$2.99 |
55 |
20040106 |
Cash |
122 |
22 |
1 |
$1.18 |
and suppose we created a dimension table for the values in the payment_method column:
payment_method |
Credit |
Cash |
ATM |
This dimension table is fairly pointless. It only has 3 values, adds no additional information,
and incurs the cost of an extra join.
Instead, you can create a degenerate dimension. To do this, declare a dimension without a
table, and Mondrian will assume that the columns come from the fact table.
<!-- The fact table is always necessary. -->
<!-- No table element here. Fact table is assumed. -->
< Level name="Payment method" column="payment_method" uniqueMembers="true"/>
<!-- other dimensions and measures -->
Note that because there is no join, the foreignKey attribute of
Dimension is not necessary, and the Hierarchy
element has no <Table> child element or
primaryKey attribute.
3.3.7 Inline tables
The <InlineTable> construct allows
you to define a dataset in the schema file. You must declare the names of the columns, the column types
("String" or "Numeric"), and a set of rows. As for
<Table> and
<View> , you must provide a unique alias with which
to refer to the dataset.
Here is an example:
< Hierarchy hasAll="true" primaryKey="severity_id">
< Level name="Severity" column="id" nameColumn="desc" uniqueMembers="true"/>
This has the same effect as if you had a table called 'severity' in your database:
id |
desc |
1 |
High |
2 |
Medium |
3 |
Low |
and the declaration
< Hierarchy hasAll="true" primaryKey="severity_id">
< Level name="Severity" column="id" nameColumn="desc" uniqueMembers="true"/>
To specify a NULL value for a column, omit the <Value>
for that column, and the column's value will default to NULL.
3.3.8 Member properties and formatters
As we shall see later, a level definition can also define member properties
and a member formatter.
3.3.9 Approximate level cardinality
The <Level> element allows specifying the optional attribute "approxRowCount". Specifying
approxRowCount can improve performance by reducing the need to determine level, hierarchy, and dimension cardinality.
This can have a significant impact when connecting to Mondrian via XMLA.
3.3.10 Default Measure Attribute
The <Cube> and <VirtualCube>
elements allow specifying the optional attribute "defaultMeasure".
Specifying defaultMeasure in <Cube> element allows users
to explicitly specify any base measure as a default Measure.
Specifying defaultMeasure in <VirtualCube>
element allows users to explicitly specify any VirtualCube Measure as a default Measure.
Note that if a default measure is not specified it takes the first measure defined in the cube as the default measure. In the case of virtual cube,
it would pick up the first base measure of the first cube defined within it as the default.
Specifying the defaultMeasure explicitly would be useful in cases where you would want a calculated member to be picked up as the default measure.
To facilitate this, the calculated member could be defined in one of the base cubes and specified as the defaultMeasure in the virtual cube.
< Cube name="Sales" defaultMeasure="Unit Sales">
...
...
< VirtualCube name="Warehouse and Sales" defaultMeasure="Profit">
...
3.3.11 Functional Dependency Optimizations
In some circumstances, it may be possible to optimize performance by taking advantage
of known functional dependencies in the data being processed. Such dependencies
are typically the result of business rules associated with the systems producing
the data, and often cannot be inferred just by looking at the data itself.
Functional dependencies are declared to Mondrian using the
dependsOnLevelValue attribute of the
<Property> element and the
uniqueKeyLevelName attribute of the
<Hierarchy> element.
The dependsOnLevelValue attribute of a
member property is used to indicate
that the value of the member property is functionally dependent on the value
of the <Level> in which the member property is
defined. In other words, for a given value of the level, the value of the
property is invariant.
The uniqueKeyLevelName attribute of a
<Hierarchy> is used to indicate that the given
level (if any) taken together with all higher levels in the hierarchy acts as a
unique alternate key, ensuring that for any unique combination of those level values,
there is exactly one combination of values for all levels below it.
To illustrate, consider a hierarchy modeling cars built and licensed in the
United States:
< Dimension name="Automotive" foreignKey="auto_dim_id">
< Hierarchy hasAll="true" primaryKey="auto_dim_id" uniqueKeyLevelName="Vehicle Identification Number">
< Table name="automotive_dim"/>
< Level name="Make" column="make_id" type="Numeric"/>
< Level name="Model" column="model_id" type="Numeric"/>
< Level name="ManufacturingPlant" column="plant_id" type="Numeric"/>
< Property name="State" column="plant_state_id" type="Numeric" dependsOnLevelValue="true"/>
< Property name="City" column="plant_city_id" type="Numeric" dependsOnLevelValue="true"/>
< Level name="Vehicle Identification Number" column="vehicle_id" type="Numeric"/>
< Property name="Color" column="color_id" type="Numeric" dependsOnLevelValue="true"/>
< Property name="Trim" column="trim_id" type="Numeric" dependsOnLevelValue="true"/>
< Level name="LicensePlateNum" column="license_id" type="String"/>
< Property name="State" column="license_state_id" type="Numeric" dependsOnLevelValue="true"/>
In the above example, we know that a given manufacturing plant exists only in
a single city and state, that a given car only has one color scheme and
one trim level, and that the license number is associated with a single state.
Therefore, we can state that all of these member properties are functionally
dependent on the associated level values.
Additionally, we know that the Vehicle Identification Number uniquely identifies
each car, and that each car only has one license. Thus, we know that the
combination of Make, Model, Manufacturing Plant, and Vehicle Identification
Number uniquely identifies each vehicle; the license number is redundant.
These attributes enable optimization of the GROUP BY clause in the SQL statements Mondrian generates. Absent any
functional dependency information, a typical query on the Automotive dimension
would look something like:
SELECT
`automotive_dim`.`make_id` AS c0,
`automotive_dim`.`model_id` AS c1,
`automotive_dim`.`plant_id` AS c2,
`automotive_dim`.`plant_state_id` AS c3,
`automotive_dim`.`plant_city_id` AS c4,
`automotive_dim`.`vehicle_id` AS c5,
`automotive_dim`.`color_id` AS c6,
`automotive_dim`.`trim_id` AS c7,
`automotive_dim`.`license_id` AS c8,
`automotive_dim`.`license_state_id` AS c9
FROM
`automotive_dim` AS `automotive_dim`,
GROUP BY
`automotive_dim`.`make_id`,
`automotive_dim`.`model_id`,
`automotive_dim`.`plant_id`,
`automotive_dim`.`plant_state_id`,
`automotive_dim`.`plant_city_id`,
`automotive_dim`.`vehicle_id`,
`automotive_dim`.`color_id`,
`automotive_dim`.`trim_id`,
`automotive_dim`.`license_id`,
`automotive_dim`.`license_state_id`
ORDER BY
`...
Given the functional dependence attributes in the schema example above, however,
we know that the query is selecting at a depth that includes the "unique key" level,
and that all properties in the query are also functionally dependent on their levels.
In such cases the GROUP BY clause is redundant and may be eliminated completely,
increasing SQL query performance significantly on some databases:
SELECT
`automotive_dim`.`make_id` AS c0,
`automotive_dim`.`model_id` AS c1,
`automotive_dim`.`plant_id` AS c2,
`automotive_dim`.`plant_state_id` AS c3,
`automotive_dim`.`plant_city_id` AS c4,
`automotive_dim`.`vehicle_id` AS c5,
`automotive_dim`.`color_id` AS c6,
`automotive_dim`.`trim_id` AS c7,
`automotive_dim`.`license_id` AS c8,
`automotive_dim`.`license_state_id` AS c9
FROM
`automotive_dim` AS `automotive_dim`,
ORDER BY
`...
Had the query not been deep enough to include the "unique key" level, or
had any of the member properties not been functionally dependent on their level,
this optimization would not be possible.
In some cases, a different optimization can be made where there is no "unique key"
level, but some or all of the member properties are functionally dependent on
their level. Some databases (notably MySQL) permit columns to be listed in the
SELECT clause that do not also appear in the GROUP BY clause. On such databases,
Mondrian can simply leave the functionally dependent member properties out of
the GROUP BY, which may reduce SQL query processing time substantially:
SELECT
`automotive_dim`.`make_id` AS c0,
`automotive_dim`.`model_id` AS c1,
`automotive_dim`.`plant_id` AS c2,
`automotive_dim`.`plant_state_id` AS c3,
`automotive_dim`.`plant_city_id` AS c4,
`automotive_dim`.`vehicle_id` AS c5,
`automotive_dim`.`color_id` AS c6,
`automotive_dim`.`trim_id` AS c7,
`automotive_dim`.`license_id` AS c8,
`automotive_dim`.`license_state_id` AS c9
FROM
`automotive_dim` AS `automotive_dim`,
GROUP BY
`automotive_dim`.`make_id`,
`automotive_dim`.`model_id`,
`automotive_dim`.`plant_id`,
`automotive_dim`.`vehicle_id`,
`automotive_dim`.`license_id`,
ORDER BY
`...
Please note that significant changes are anticipated to the schema syntax
in Mondrian 4.0, including a new approach to declaring functional dependencies.
While the expectation is that the 4.0 schema processor will maintain backward
compatibility with schemas developed for Mondrian 3.1, these are transitional
attributes introduced to allow support in the interim, and 4.0 will not be
backward compatible with them. Therefore, any schema using these attributes will
need to be migrated to the new syntax as part of upgrading to Mondrian 4.0.
3.3.12 Table Hints
Mondrian supports a limited set of database-specific hints for the
<Table> element, which will then be
passed on to SQL queries involving the table. These hints are as
follows:
Database |
Hint Type |
Permitted Values |
Description |
MySQL |
force_index |
The name of an index on this table |
Forces the named index to be used when selecting level values
from this table. |
For example:
< Table name="automotive_dim">
< Hint type="force_index">my_index</ Hint>
As with the functional dependency optimizations, support for table hints
is in a transitional stage, and are likely to change in Mondrian 4.0. Any
schema using them may need to be migrated to the new schema syntax as part of
upgrading to Mondrian 4.0.
4. Star and snowflake schemas
We saw earlier how to build a cube based upon a fact table, and dimensions in the fact
table ("Payment method") and in a table joined to the fact table ("Gender"). This is the
most common kind of mapping, and is known as a star schema.
But a dimension can be based upon more than one table, provided that there is a well-defined
path to join these tables to the fact table. This kind of dimension is known as a snowflake,
and is defined using the <Join> operator. For example:
...
< Dimension name="Product" foreignKey="product_id">
< Hierarchy hasAll="true" primaryKey="product_id" primaryKeyTable="product">
< Join leftKey="product_class_key" rightAlias="product_class" rightKey="product_class_id">
< Join leftKey="product_type_id" rightKey="product_type_id">
< Table name="product_class"/>
< Table name="product_type"/>
<!-- Level declarations ... -->
This defines a "Product" dimension consisting of three tables. The
fact table joins to "product" (via the foreign key
"product_id" ), which joins to "product_class" (via the foreign
key "product_class_id" ), which joins to "
product_type" (via the foreign key "product_type_id" ). We require
a <Join> element nested within a <Join>
element because <Join> takes two operands; the operands
can be tables, joins, or even queries.
The arrangement of the tables seems complex; the simple rule of thumb is to order the tables
by the number of rows they contain. The "product" table has the most
rows, so it joins to the fact table and appears first; "product_class"
has fewer rows, and "product_type" , at the tip of the snowflake, has
least of all.
Note that the outer <Join> element has a
rightAlias attribute. This is necessary because the right component of the join (the inner
<Join> element) consists of more than one table. No
leftAlias attribute is necessary in this case, because the leftKey
column unambiguously comes from the "product" table.
4.1 Shared dimensions
When generating the SQL for a join, Mondrian needs to know which column to join to. If you are
joining to a join, then you need to tell it which of the tables in the join that column belongs
to (usually it will be the first table in the join).
Because shared dimensions don't belong to a cube, you have to give them an explicit table
(or other data source). When you use them in a particular cube, you specify the foreign key. This
example shows the Store Type dimension being joined to the
Sales cube using the sales_fact_1997.store_id
foreign key, and to the Warehouse cube using the
warehouse.warehouse_store_id foreign key:
< Hierarchy hasAll="true" primaryKey="store_id">
< Level name="Store Type" column="store_type" uniqueMembers="true"/>
< Table name="sales_fact_1997"/>
...
< DimensionUsage name="Store Type" source="Store Type" foreignKey="store_id"/>
< Table name="warehouse"/>
...
< DimensionUsage name="Store Type" source="Store Type" foreignKey="warehouse_store_id"/>
4.2 Join optimization
The table mapping in the schema tells Mondrian how to get the data, but Mondrian is smart
enough not to read the schema literally. It applies a number of optimizations when generating
queries:
- TODO: describe large dimension support
- If a dimension (or, more precisely, the level of the dimension being accessed) is in the fact table, Mondrian does
not perform a join.
- If two dimensions access the same table via the same join path, Mondrian only joins them once. For example,
[Gender] and [Age] might both be columns in the
customers table, joined via sales_1997.cust_id = customers.cust_id .
5. Advanced logical constructs
5.1 Virtual cubes
A virtual cube combines two or more regular cubes. It is defined by the <VirtualCube>
element:
< CubeUsage cubeName="Sales" ignoreUnrelatedDimensions="true"/>
The <CubeUsages>
element is optional. It specifies the cubes that are imported into the virtual cube.
Holds CubeUsage elements.
The <CubeUsage>
element is optional. It specifies the base cube that is imported into the
virtual cube. Currently it is possible to define a VirtualCubeMeasure and
similar imports from a base cube without defining CubeUsage for the cube.
The cubeName attribute specifies the base cube being imported.
The ignoreUnrelatedDimensions attribute specifies that the measures
from this base cube will have non joining dimension members pushed to the
top level member. This behaviour is currently supported for aggregation.
This attribute is by default false.
ignoreUnrelatedDimensions is an experimental feature similar to
the similarly named feature in SSAS 2005.
MSDN documentation
mentions "When IgnoreUnrelatedDimensions is true, unrelated dimensions are forced
to their top level; when the value is false, dimensions are not forced to their
top level. This property is similar to the Multidimensional Expressions
(MDX) ValidMeasure function". Current Mondrian implementation of
ignoreUnrelatedDimensions depends on use of ValidMeasure. E.g. If we
want to apply this behaviour to "Unit Sales" measure in the "Warehouse and Sales"
virtual cube then we need to define a CubeUsage entry for "Sales" cube as shown
in the example above and also wrap this measure with ValidMeasure.
The <VirtualCubeDimension>
element imports a dimension from one of the constituent cubes. If you do not
specify the cubeName attribute, this means you
are importing a shared dimension. (If a shared dimension is used more than once
in a cube, there is no way, at present, to disambiguate which usage of the
shared dimension you intend to import.)
The <VirtualCubeMeasure>
element imports a measure from one of the constituent cubes. It is imported with
the same name. If you want to create a formula, or just to rename a measure as
you import it, use the <CalculatedMember>
element.
Virtual cubes occur surprisingly frequently in real-world applications. They
occur when you have fact tables of different granularities (say one measured at
the day level, another at the month level), or fact tables of different
dimensionalities (say one on Product, Time and Customer, another on Product,
Time and Warehouse), and want to present the results to an end-user who doesn't
know or care how the data is structured.
Any common dimensions -- shared dimensions which are used by both constituent
cubes -- are automatically synchronized. In this example, [Time]
and [Product] are common dimensions. So if the context is ([Time].[1997].[Q2] ,
[Product].[Beer].[Miller Lite] ), measures from either cube will
relate to this context.
Dimensions which only belong to one cube are called non-conforming
dimensions. The [Gender] dimension is an example of this: it exists
in the Sales cube but not Warehouse . If the context is
([Gender].[F] , [Time].[1997].[Q1] ), it makes sense to
ask the value of the [Unit Sales] measure (which comes from the
[Sales] cube) but not the [Units Ordered] measure (from
[Warehouse] ). In the context of [Gender].[F] , [Units
Ordered] has value NULL.
5.2 Parent-child hierarchies
A conventional hierarchy has a rigid set of levels, and members which adhere to those
levels. For example, in the Product hierarchy, any member of the Product Name
level has a parent in the Brand Name level, which has a parent in the
Product Subcategory level, and so forth. This structure is sometimes too rigid
to model real-world data.
A parent-child hierarchy has only one level (not counting the special 'all' level),
but any member can have parents in the same level. A classic example is the reporting structure
in the Employees hierarchy:
< Dimension name="Employees" foreignKey="employee_id">
< Hierarchy hasAll="true" allMemberName="All Employees" primaryKey="employee_id">
< Level name="Employee Id" uniqueMembers="true" type="Numeric" column="employee_id" nameColumn="full_name" parentColumn="supervisor_id" nullParentValue="0">
< Property name="Marital Status" column="marital_status"/>
< Property name="Position Title" column="position_title"/>
< Property name="Gender" column="gender"/>
< Property name="Salary" column="salary"/>
< Property name="Education Level" column="education_level"/>
< Property name="Management Role" column="management_role"/>
The important attributes here are parentColumn and nullParentValue :
- The
parentColumn attribute is the name of the
column which links a member to its parent member; in this case, it is
the foreign key column which points to an employee's supervisor. The <ParentExpression>
child element of <Level> is equivalent to the parentColumn
attribute, but allows you to define an arbitrary SQL expression, just
like the <Expression> element. The parentColumn
attribute (or <ParentExpression> element) is the
only indication to Mondrian that a hierarchy has a parent-child
structure.
- The
nullParentValue attribute is the value which
indicates that a member has no parent. The default is nullParentValue="null" ,
but since many database don't index null values, schema designers
sometimes use values as the empty string, 0, and -1 instead.
5.2.1 Tuning parent-child hierarchies
There's one serious problem with the parent-child hierarchy defined above, and that is the
amount of work Mondrian has to do in order to compute cell-totals. Let's suppose that the
employee table contains the following data:
employee |
supervisor_id |
employee_id |
full_name |
null |
1 |
Frank |
1 |
2 |
Bill |
2 |
3 |
Eric |
1 |
4 |
Jane |
3 |
5 |
Mark |
2 |
6 |
Carla |
If we want to compute the total salary budget for Bill, we need to add in the salaries of Eric
and Carla (who report to Bill) and Mark (who reports to Eric). Usually Mondrian generates a
SQL GROUP BY statement to compute these totals, but there is no
(generally available) SQL construct which can traverse hierarchies. So by default,
Mondrian generates one SQL statement per supervisor, to retrieve and total all of that
supervisor's direct reports.
This approach has a couple of drawbacks. First, the performance is not very good if a
hierarchy contains more than a hundred members. Second, because Mondrian implements the
distinct-count aggregator by generating SQL, you cannot define a distinct-count
measure in
any cube which contains a parent-child hierarchy.
How can we solve these problems? The answer is to enhance the data so that Mondrian is
able to retrieve the information it needs using standard SQL. Mondrian supports a mechanism
called a closure table for this purpose.
5.2.2 Closure tables
A closure table is a SQL table which contains a record for every employee/supervisor
relationship, regardless of depth. (In mathematical terms, this is called the 'reflexive
transitive closure' of the employee/supervisor relationship. The distance
column is not strictly required, but it makes it easier to populate the table.)
employee_closure |
supervisor_id |
employee_id |
distance |
1 |
1 |
0 |
1 |
2 |
1 |
1 |
3 |
2 |
1 |
4 |
1 |
1 |
5 |
3 |
1 |
6 |
2 |
2 |
2 |
0 |
2 |
3 |
1 |
2 |
5 |
2 |
2 |
6 |
1 |
3 |
3 |
0 |
3 |
5 |
1 |
4 |
4 |
0 |
5 |
5 |
0 |
6 |
6 |
0 |
In the catalog XML, the <Closure> element maps
the level onto a <Table> :
< Dimension name="Employees" foreignKey="employee_id">
< Hierarchy hasAll="true" allMemberName="All Employees" primaryKey="employee_id">
< Level name="Employee Id" uniqueMembers="true" type="Numeric" column="employee_id" nameColumn="full_name" parentColumn="supervisor_id" nullParentValue="0">
< Closure parentColumn="supervisor_id" childColumn="employee_id">
< Table name="employee_closure"/>
< Property name="Marital Status" column="marital_status"/>
< Property name="Position Title" column="position_title"/>
< Property name="Gender" column="gender"/>
< Property name="Salary" column="salary"/>
< Property name="Education Level" column="education_level"/>
< Property name="Management Role" column="management_role"/>
This table allows totals to be evaluated in pure SQL. Even though this introduces an extra
table into the query, database optimizers are very good at handling joins. I recommend that
you declare both supervisor_id and employee_id NOT NULL, and index
them as follows:
CREATE UNIQUE INDEX employee_closure_pk ON employee_closure (
supervisor_id,
employee_id);
CREATE INDEX employee_closure_emp ON employee_closure (
employee_id);
5.2.3 Populating closure tables
The table needs to be re-populated whenever the hierarchy changes, and it is
the application's responsibility to do so — Mondrian does not do this!
If you are using Pentaho Data Integration (Kettle), there is a special step
to populate closure tables as part of the ETL process. Further details in the
Pentaho Data
Integration wiki.
Closure Generator step in Pentaho Data Integration
|
If you are not using Pentaho Data Integration, you can populate the table
yourself using SQL. Here is an example of a MySQL stored procedure that populates a closure table.
DELIMITER //
CREATE PROCEDURE populate_employee_closure()
BEGIN
DECLARE distance int;
TRUNCATE TABLE employee_closure;
SET distance = 0;
-- seed closure with self-pairs (distance 0)
INSERT INTO employee_closure (supervisor_id, employee_id, distance)
SELECT employee_id, employee_id, distance
FROM employee;
-- for each pair (root, leaf) in the closure,
-- add (root, leaf->child) from the base table
REPEAT
SET distance = distance + 1;
INSERT INTO employee_closure (supervisor_id, employee_id, distance)
SELECT employee_closure.supervisor_id, employee.employee_id, distance
FROM employee_closure, employee
WHERE employee_closure.employee_id = employee.supervisor_id
AND employee_closure.distance = distance - 1;
UNTIL (ROW_COUNT() == 0))
END REPEAT;
END //
DELIMITER ;
5.3 Member properties
Member properties are defined by the <Property>
element within a <Level> , like this:
< Level name="MyLevel" column="LevelColumn" uniqueMembers="true">
< Property name="MyProp" column="PropColumn" formatter="com.example.MyPropertyFormatter"/>
The formatter attribute defines a property formatter
, which is explained later.
Once properties have been defined in the schema, you can use them in MDX statements via the
member.Properties("propertyName") function, for example:
SELECT {[Store Sales]} ON COLUMNS,
TopCount(Filter([Store].[Store Name].Members,
[Store].CurrentMember.Properties("Store Type") = "Supermarket"),
10,
[Store
Sales]) ON ROWS
FROM [Sales]
Mondrian deduces the type of the property expression, if it can. If the property name is a
constant string, the type is based upon the type attribute ("String", "Numeric" or "Boolean")
of the property definition. If the property name is an expression (for example
CurrentMember.Properties("Store " + "Type") ), Mondrian will return an untyped
value.
5.4 Calculated members
Suppose you want to create a measure whose value comes not from a column of the fact table,
but from an MDX formula. One way to do this is to use a WITH MEMBER clause, like
this:
WITH MEMBER [Measures].[Profit] AS '[Measures].[Store
Sales]-[Measures].[Store Cost]',
FORMAT_STRING = '$#,###'
SELECT {[Measures].[Store Sales], [Measures].[Profit]} ON COLUMNS,
{[Product].Children} ON ROWS
FROM [Sales]
WHERE [Time].[1997]
But rather than including this clause in every MDX query of your application, you can define
the member in your schema, as part of your cube definition:
You can also declare the formula as an XML attribute, if you prefer. The effect is just the same.
< CalculatedMember name="Profit" dimension="Measures" formula="[Measures].[Store Sales]-[Measures].[Store Cost]">
Note that the <CalculatedMemberProperty>
(not <Property> ) element corresponds
to the FORMAT_STRING = '$#,###' fragment of the MDX statement. You can define
other properties here too, but FORMAT_STRING is by far the most useful in practice.
The FORMAT_STRING property value can also be evaluated using an expression.
When formatting a particular cell, first the expression is evaluated to yield a format string,
then the format string is applied to the cell value. Here is the same property with a conditional
format string:
< CalculatedMemberProperty name="FORMAT_STRING" expression="Iif(Value < 0, '|($#,##0.00)|style=red', '|$#,##0.00|style=green')"/>
For more details about format strings, see the
MDX specification.
One additional calculated member property that is worth mentioning is DATATYPE.
As with measures,
setting datatype specifies how the calculated member is returned via XML for Analysis.
The DATATYPE property of a calculated member can have values "String ", "Integer ", or
"Numeric ":
You can specify SOLVE_ORDER for the calculated member property. Solve order
determines the priority of calculation in the event of competing expressions
You can make a calculated member or a measure invisible. If you specify visible="false"
(the default is "true") in the <Measure> or
<CalculatedMember> element, user-interfaces such as
JPivot will notice this property and hide the member. This is useful if you want to perform
calculations in a number of steps, and hide intermediate steps from end-users. For example,
here only "Margin per Sqft" is visible, and its factors "Store Cost", "Margin" and "Store Sqft"
are hidden:
< Measure name="Store Cost" column="store_cost" aggregator="sum" formatString="#,###.00" visible="false"/>
< Formula>([Measures].[Store Sales] - [Measures].[Store Cost]) / [Measures].[Store Cost]</ Formula>
< Formula>IIF([Store].CurrentMember.Level.Name = "Store Name", [Store].Properties("Store Sqft"), NULL)</ Formula>
5.5 Named sets
The WITH SET clause of an MDX statement allows you to declare a set expression
which can be used throughout that query. For example,
WITH SET [Top Sellers] AS
'TopCount([Warehouse].[Warehouse Name].MEMBERS, 5,
[Measures].[Warehouse Sales])'
SELECT
{[Measures].[Warehouse Sales]} ON COLUMNS,
{[Top Sellers]} ON ROWS
FROM [Warehouse]
WHERE [Time].[Year].[1997]
The WITH SET clause is very similar to the WITH MEMBER clause,
and as you might expect, it has a construct in schema analogous to
<CalculatedMember> . The
<NamedSet> element allows you to define a
named set in your schema as part of a cube definition. It is implicitly available for
any query against that cube:
...
< Formula>TopCount([Warehouse].[Warehouse Name].MEMBERS, 5, [Measures].[Warehouse Sales])</ Formula>
SELECT
{[Measures].[Warehouse Sales]} ON COLUMNS,
{[Top Sellers]} ON ROWS
FROM [Warehouse]
WHERE [Time].[Year].[1997]
Warehouse
|
Warehouse Sales |
Treehouse Distribution |
31,116.37 |
Jorge Garcia, Inc. |
30,743.77 |
Artesia Warehousing, Inc. |
29,207.96 |
Jorgensen Service Storage |
22,869.79 |
Destination, Inc. |
22,187.42 |
A named set defined against a cube is not inherited by a virtual cube defined against
that cube. (But you can define a named set against a virtual cube.)
You can also define a named set as global to a schema:
< Cube name="Sales" ... />
< Cube name="Warehouse" ... />
< NamedSet name="CA Cities" formula="{[Store].[USA].[CA].Children}"/>
A named set defined against a schema is available in all cubes and virtual cubes in that
schema. However, it is only valid if the cube contains dimensions with the names required to
make the formula valid. For example, it would be valid to use [CA Cities] in
queries against the [Sales] and [Warehouse and Sales] cubes, but
if you used it in a query against the [Warehouse] cube you would get an error,
because [Warehouse] does not have a [Store] dimension.
6. Plug-ins
Sometimes Mondrian's schema language isn't flexible enough, or the MDX language isn't
powerful enough, to solve the problem at hand. What you want to do is add a little of your
own Java code into the Mondrian application, and a plug-in is a way to do this.
Each of Mondrian's extensions is technically a Service Provider Interface (SPI); in short,
a Java interface which you write code to implement, and which Mondrian will call at runtime.
You also need to register an extension (usually somewhere in your schema.xml file) and to
ensure that it appears on the classpath.
Plug-ins include
user-defined functions;
cell, member and
property formatters;
dynamic schema processors and
data source change listeners.
There is incomplete support for member
readers and cell readers,
and in future we may support pluggable
SQL dialects.
Some plug-ins (user-defined function, member formatter, property
formatter, cell formatter) can be implemented in a scripting language
such as JavaScript. In this case, you do not need to write a Java
class; you just enclose the script code within
a Script element in the mondrian schema
file. Extensions implemented in scripting languages do not in general
perform as well as extensions implemented in Java, but they are much
more convenient because you do not need to compile any code. Just
modify the script code in the mondrian schema file and re-load the
schema. The shorter code-debug-fix cycle allows you to develop your
application much faster. Once you have implemented the plug-in in
script, if performance is still a concern, you can translate your
plug-in into Java.
Other extensions
include Dynamic datasource
xmla servlet.
6.1 User-defined function
A user-defined function must have a public constructor and implement the
mondrian.spi.UserDefinedFunction interface. For example,
package com.example;
import mondrian.olap.*;
import mondrian.olap.type.*;
import mondrian.spi.UserDefinedFunction;
/**
* A simple user-defined function which adds one to its argument.
*/
public class PlusOneUdf implements UserDefinedFunction {
// public constructor
public PlusOneUdf() {
}
public String getName() {
return "PlusOne";
}
public String getDescription() {
return "Returns its argument
plus one";
}
public Syntax getSyntax() {
return Syntax.Function;
}
public Type getReturnType(Type[] parameterTypes) {
return new NumericType();
}
public Type[] getParameterTypes() {
return new Type[] {new
NumericType()};
}
public Object execute(Evaluator evaluator, Exp[]
arguments) {
final Object argValue =
arguments[0].evaluateScalar(evaluator);
if (argValue instanceof
Number) {
return new Double(((Number) argValue).doubleValue() + 1);
} else {
//
Argument might be a RuntimeException indicating that
//
the cache does not yet have the required cell value. The
//
function will be called again when the cache is loaded.
return null;
}
}
public String[] getReservedWords() {
return null;
}
}
Declare it in your schema:
...
And use it in any MDX statement:
WITH MEMBER [Measures].[Unit Sales Plus One]
AS 'PlusOne([Measures].[Unit
Sales])'
SELECT
{[Measures].[Unit Sales]} ON COLUMNS,
{[Gender].MEMBERS} ON ROWS
FROM [Sales]
If a user-defined function has a public constructor with one string argument, Mondrian
will pass in the function's name. Why? This allows you to define two or more user-defined
functions using the same class:
package com.example;
import mondrian.olap.*;
import mondrian.olap.type.*;
import mondrian.spi.UserDefinedFunction;
/**
* A user-defined function which either adds one to or
* subtracts one from its argument.
*/
public class PlusOrMinusOneUdf implements UserDefinedFunction {
private final name;
private final isPlus;
// public constructor with one argument
public PlusOneUdf(String
name) {
this.name = name;
if (name.equals("PlusOne")) {
isPlus = true;
} else if
(name.equals("MinusOne")) {
isPlus = false;
} else {
throw new IllegalArgumentException("Unexpected name " + name);
}
}
public String getName() {
return name;
}
public String getDescription() {
return "Returns its argument
plus or minus one";
}
public Syntax getSyntax() {
return Syntax.Function;
}
public Type getReturnType(Type[] parameterTypes) {
return new NumericType();
}
public Type[] getParameterTypes() {
return new Type[] {new
NumericType()};
}
public Object execute(Evaluator evaluator, Exp[]
arguments) {
final Object argValue =
arguments[0].evaluateScalar(evaluator);
if (argValue instanceof
Number) {
if (isPlus) {
return new Double(((Number) argValue).doubleValue() + 1);
}
else {
return new Double(((Number) argValue).doubleValue() - 1);
}
} else {
//
Argument might be a RuntimeException indicating that
//
the cache does not yet have the required cell value. The
//
function will be called again when the cache is loaded.
return null;
}
}
public String[] getReservedWords() {
return null;
}
}
and register two the functions in your schema:
...
If you're tired of writing duplicate user-defined function
declarations in schema files, you can pack your user-defined function
implemention classes into a jar file with a embedded
META-INF/services/mondrian.spi.UserDefinedFunction
resource file. This resource file contains class names of
implementations of interface mondrian.spi.UserDefinedFunction, one
name per line. For more details, you may look into
src/main/META-INF/services/mondrian.spi.UserDefinedFunction
in the source distribution and the
Service Provider section of the specification of JAR files.
User-defined functions declared by this means are available to all
mondrian schemas in the JVM.
Caution: you can't define more than one user-defined function
implementations in one class when you declare user-defined functions
in this way. One function will be loaded for each class, and given the
name that the getName() method returns.
User-defined functions can also be implemented in a scripting
language, such as JavaScript. These functions may not perform quite as
well as Java UDFs or built-in functions, but they are a lot more
convenient to implement.
To define a UDF in script, use the Script element and include within it the followimg functions:
getName()
getDescription()
getSyntax()
getParameterTypes()
getReturnType(parameterTypes)
execute(evaluator, arguments)
The getName(), getDescription(), getReservedWords() and getSyntax()
methods are optional; getName() defaults to the name attribute in the
UserDefinedFunction element, getDescription() defaults to the name,
getReservedWords() returns the empty list, and getSyntax() defaults
to mondrian.olap.Syntax.Function. The
other methods have similar meanings to the ones in the
UserDefinedFunction SPI.
Here is an example of the factorial function as a JavaScript UDF:
< Script language="JavaScript">
function getParameterTypes() {
return new Array(new mondrian.olap.type.NumericType());
}
function getReturnType(parameterTypes) {
return new mondrian.olap.type.NumericType();
}
function execute(evaluator, arguments) {
var n = arguments[0].evaluateScalar(evaluator);
return factorial(n);
}
function factorial(n) {
return n <= 1 ? 1 : n * factorial(n - 1);
}
6.2 Member reader
A member reader is a means of accessing members. Hierarchies are usually based
upon a dimension table (an 'arm' of a star schema), and are therefore populated using SQL.
But even if your data doesn't reside in an RDBMS, you can make it appear as a hierarchy
by writing a Java class called a custom member reader.
Here are a couple of examples:
DateSource (to be written) generates a time
hierarchy. Conventionally, data warehouse implementors generate a table
containing a row for every date their system is ever likely to deal
with. But the problem is that this table needs to be loaded, and as
time goes by, they will have to remember to add more rows. DateSource
generates date members in memory, and on demand.
FileSystemSource (to be written) presents the file
system as a hierarchy of directories and files. Since a directory can
have a parent which is itself a directory, it is a parent-child
hierarchy. Like the time hierarchy created by DateSource, this is a
virtual hierarchy: the member for a particular file is only created
when, and if, that file's parent directory is expanded.
ExpressionMemberReader (to be written) creates a
hierarchy based upon an expression.
A custom member reader must implement the
mondrian.rolap.MemberSource interface. If you need to implement a larger set of member
operations for fine-grained control, implement the derived
mondrian.rolap.MemberReader interface; otherwise, Mondrian wrap your reader in a
mondrian.rolap.CacheMemberReader object. Your member reader must have a public constructor
which takes (
RolapHierarchy,
Properties) parameters, and throws no checked exceptions.
Member readers are declared using the <Hierarchy>
element's memberReaderClass attribute; any
<Parameter> child elements are passed via the properties
constructor parameter. Here is an example:
< Hierarchy hasAll="true" memberReaderClass="mondrian.rolap.HasBoughtDairySource">
< Level name="Has bought dairy" uniqueMembers="true"/>
< Parameter name="expression" value="not used"/>
6.3 Cell reader
Not implemented yet. Syntax would be something like
< Measure name="name" cellReaderClass="com.example.MyCellReader"/>
and the class "com.example.MyCellReader" would have to implement the
mondrian.olap.CellReader interface.
6.4 Cell formatter
A cell formatter modifies the behavior of
Cell.getFormattedValue() . The class must implement the
mondrian.spi.CellFormatter interface, and is specified like this:
(The previous syntax, using the 'formatter' attribute of the Measure
element, is deprecated, and will be removed in mondrian-4.0.)
You can specify a formatter in a scripting language such as
JavaScript, using the Script element:
< Script language="JavaScript">
The script has available a value variable,
corresponding to the parameter of
the
mondrian.spi.CellFormatter.formatCell(Object value)
method. The code fragment can have multiple statements, but must end
in a return statement.
For a calculated member that belongs to a cube or virtual cube, you
can also use the CellFormatter
element:
[Measures].[Unit Sales] * 2
< Script language="JavaScript">
var s = value.toString();
while (s.length() < 20) {
s = "0" + s;
}
return s;
You can also define a formatter by setting the CELL_FORMATTER property of the member to the name of the formatter class.
For a calculated measure defined in the WITH MEMBER clause of an
MDX query, you can set the same property in the MDX to achieve the same effect:
WITH MEMBER [Measures].[Foo]
AS '[Measures].[Unit Sales] * 2',
CELL_FORMATTER='com.example.MyCellFormatter'
SELECT {[Measures].[Unit Sales], [Measures].[Foo]} ON COLUMNS,
{[Store].Children} ON ROWS
FROM [Sales]
To define a scripted formatter, use
the CELL_FORMATTER_SCRIPT
and CELL_FORMATTER_SCRIPT_LANGUAGE properties:
WITH MEMBER [Measures].[Foo]
AS '[Measures].[Unit Sales] * 2',
CELL_FORMATTER_SCRIPT_LANGUAGE='JavaScript',
CELL_FORMATTER_SCRIPT='var s = value.toString(); while (s.length() < 20) s = "0" + s; return s;'
SELECT {[Measures].[Unit Sales], [Measures].[Foo]} ON COLUMNS,
{[Store].Children} ON ROWS
FROM [Sales]
The cell formatter property is ignored if a member does not belong to the
[Measures] dimension.
6.5 Member formatter
A member formatter modifies the behavior of
Member.getCaption() . The class must implement
the
mondrian.spi.MemberFormatter interface, and is specified
like this:
< Level name="name" column="column">
(The previous syntax, using the 'formatter' attribute of the Level
element, is deprecated, and will be removed in mondrian-4.0.)
You can specify a formatter in a scripting language such as
JavaScript, using the Script element:
< Level name="name" column="column">
< Script language="JavaScript">
return member.getName().toUpperCase();
The script has available a member variable,
corresponding to the parameter of
the
mondrian.spi.MemberFormatter.formatMember(Member member)
method. The code fragment can have multiple statements, but must end
in a return statement.
6.6 Property formatter
A property formatter modifies the behavior of
Property.getPropertyFormattedValue() . The class must implement the
mondrian.spi.PropertyFormatter interface, and is specified like this:
< Level name="MyLevel" column="LevelColumn" uniqueMembers="true">
< Property name="MyProp" column="PropColumn">
(The previous syntax, using the 'formatter' attribute of the Property
element, is deprecated, and will be removed in mondrian-4.0.)
You can specify a formatter in a scripting language such as
JavaScript, using the Script element:
< Level name="name" column="column">
< Property name="MyProp" column="PropColumn">
< Script language="JavaScript">
return member.getName().toUpperCase();
The script has
available member , propertyName
and propertyValue variables, corresponding to the
parameters of
the mondrian.spi.PropertyFormatter.formatProperty(Member
member, String propertyName, Object propertyValue) method.
The code fragment can have multiple statements, but must end in
a return statement.
6.7 Schema processor
A schema processor implements the
mondrian.spi.DynamicSchemaProcessor interface. It is specified as part of
the connection string, like this:
Jdbc=jdbc:odbc:MondrianFoodMart; JdbcUser=ziggy;
JdbcPassword=stardust; DynamicSchemaProcessor=com.example.MySchemaProcessor
The effect is that when reading the contents of the schema from a URL, Mondrian turns to
the schema processor rather than Java's default URL handler. This gives the schema reader
the opportunity to run a schema through a filter, or even generate an entire schema on the fly.
When DynamicSchemaProcessor is specified, schema would
be processed and reloaded on every ROLAP connection request. Property
UseContentChecksum should be used along with a schema processor
to enable caching of the schema:
DataSource=java:/jdbc/MyWarehouse;
DynamicSchemaProcessor=com.example.MySchemaProcessor;
UseContentChecksum=true
In this case once loaded schema would be cached until it's change. If
schema content has changed, it would be reloaded (and processed).
Dynamic schemas are a very powerful construct. As we shall see, an important application
for them is internationalization.
6.8 Data source change listener
A data source change listener implements the
mondrian.spi.DataSourceChangeListener interface. It is specified as part of
the connection string, like this:
Jdbc=jdbc:odbc:MondrianFoodMart; JdbcUser=ziggy;
JdbcPassword=stardust; DataSourceChangeListener=com.example.MyChangeListener;
Everytime mondrian has to decide whether it will use data from cache, it
will call the change listener. When the change listener tells mondrian
the datasource has changed for a dimension, cube, ... then mondrian will
flush the cache and read from database again.
This class should be called in mondrian before any data is read, so
even before cache is build. This way, the plugin is able to register
the first timestamp mondrian tries to read the datasource.
Each time a query is started, aggregate cache is checked to see if
it has changed. If so, cache will be flushed and aggregates will be
reloaded from the data source.
Here is an example of a data source change listener plugin class:
package com.example;
//...
import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;
//...
import mondrian.olap.MondrianDef;
import mondrian.rolap.RolapHierarchy;
import mondrian.rolap.RolapUtil;
import mondrian.rolap.agg.Aggregation;
import mondrian.rolap.RolapStar;
import mondrian.spi.impl.DataSourceChangeListenerImpl;
//...
public class MyChangeListener extends DataSourceChangeListenerImpl {
public MyChangeListener() {
}
public synchronized boolean isHierarchyChanged(RolapHierarchy hierarchy) {
// Since this function is called many times, it is a good idea to not check the database every time
// And use some sort of time interval...
// Get name of the table (does not work if based on view)
String tableName = getTableName(hierarchy);
Connection jdbcConnection = null;
DataSource dataSource =
hierarchy.getRolapSchema().getInternalConnection().getDataSource();
try {
jdbcConnection = dataSource.getConnection();
if (jdbcConnection != null) {
// Check database whether hierarchy data source has changed
// ...
}
}
}
public synchronized boolean isAggregationChanged(Aggregation aggregation) {
// The first time, register star and bitKey and remember first time of access...
RolapStar star = aggregation.getStar();
BitKey bitKey = aggregation.getConstrainedColumnsBitKey();
// The first time this function is called, only the bitKey is set,
// the columns are not filled up yet.
RolapStar.Column[] columns = aggregation.getColumns();
if (columns != null) {
// Check database...
}
}
}
6.9 Dynamic datasource xmla servlet
The DynamicDatasourceXmlaServlet extends DefaultXmlaServlet , adding the ability to dynamically load
the datasources.xml file. For every client request that it receives, it checks
for updates to the content of datasources.xml . It selectively
clears cache for catalogs that have changed or no longer exist in datasources.xml .
The servlet considers a
catalog as changed when either of its properties (DataSourceInfo, definition
properties on
DataSourcesConfig.Catalog) are different. It identifies catalog by name.
This servlet complements the dynamic catalog loading capability based on
UseContentChecksum.
It does not check the catalog content for updates. There is no overlap in the
functionality. Both together will give full dynamic datasource and
catalog configuration capability.
To use DynamicDatasourceXmlaServlet, change definition of the MondrianXmlaServlet
servlet in web.xml :
<servlet>
<servlet-name>MondrianXmlaServlet</servlet-name>
<servlet-class>mondrian.xmla.impl.DynamicDatasourceXmlaServlet</servlet-class>
...
</servlet>
This implementation has a limitation. It requires catalog name to be unique
across all datasources and may not work correctly otherwise.
7. Internationalization
An internationalized Mondrian application would have a schema for each language, where the
caption of each object appears in the local language. For example, the [Product]
dimension would have the caption "Product" in English and "Produit" in French.
It is unwise to translate the actual names of the schema objects, because then the MDX
statements would need to be changed also. All that you need to change is the caption. Every
schema object (schema, cube, virtual cube, dimension, hierarchy, level, measure,
named set) has a caption attribute, and user
interfaces such as JPivot and Pentaho Analyzer display the caption rather than the real name. Additionally:
- Every schema object has a description attribute.
- A hierarchy can have an
allMemberCaption attribute
as display value of the "All" member.
- For the schema we can set a display value of the "measures"
dimension by the
measuresCaption attribute.
- A calculated member has properties CAPTION and DESCRIPTION that appear
as caption and description if the member is a measure (that is, a member of
the
Measures dimension).
One way to create an internationalized application is to create a
copy of the schema file for each language, but these are difficult to
maintain. A better way is to use the
LocalizingDynamicSchemaProcessor class to perform dynamic
substitution on a single schema file.
7.1 Localizing schema processor
First, write your schema using variables as values for caption ,
description ,
allMemberCaption and measuresCaption attributes as follows:
< Schema measuresCaption="%{foodmart.measures.caption}">
< Dimension name="Store" caption="%{foodmart.dimension.store.caption}"
description="%{foodmart.dimension.store.description}">
< Hierarchy hasAll="true" allMemberName="All Stores" allMemberCaption="%{foodmart.dimension.store.allmember.caption =All Stores}" primaryKey="store_id"
caption="%{foodmart.hierarchy.store.country.caption}"
description="%{foodmart.hierararchy.store.country.description}>
< Level name="Store Country" column="store_country" uniqueMembers="true" caption="%{foodmart.dimension.store.country.caption}"
description="%{foodmart.dimension.store.country.description}"/>
< Level name="Store State" column="store_state" uniqueMembers="true" caption="%{foodmart.dimension.store.state.caption}"
description="%{foodmart.dimension.store.state.description}"/>
< Level name="Store City" column="store_city" uniqueMembers="false" caption="%{foodmart.dimension.store.city.caption}"
description="%{foodmart.dimension.store.city.description}"/>
< Level name="Store Name" column="store_name" uniqueMembers="true" caption="%{foodmart.dimension.store.name.caption}"
description="%{foodmart.dimension.store.name.description}">
< Property name="Store Type" column="store_type" caption="%{foodmart.dimension.store. name.property_type.caption}"
description="%{foodmart.dimension.store. name.property_type.description}"/>
< Property name="Store Manager" column="store_manager" caption="%{foodmart.dimension.store. name.property_manager.caption}"
description="%{foodmart.dimension.store. name.property_manager.description}"/>
< Property name="Store Sqft" column="store_sqft" type="Numeric" caption="%{foodmart.dimension.store. name.property_storesqft.caption}"
description="%{foodmart.dimension.store.
name.property_storesqft.description}"/>
< Property name="Grocery Sqft" column="grocery_sqft" type="Numeric"/>
< Property name="Frozen Sqft" column="frozen_sqft" type="Numeric"/>
< Property name="Meat Sqft" column="meat_sqft" type="Numeric"/>
< Property name="Has coffee bar" column="coffee_bar" type="Boolean"/>
< Property name="Street address" column="store_street_address" type="String"/>
< Cube name="Sales" caption="%{foodmart.cube.sales.caption}"
description="%{foodmart.cube.sales.description}">
...
< DimensionUsage name="Store" source="Store" foreignKey="store_id"
caption="%{foodmart.cube.sales.name.caption}"
description="%{foodmart.cube.sales.name.description}"/>
...
< Measure name="Unit Sales" column="unit_sales" caption="%{foodmart.cube.sales.measure.unitsales.caption}"
description="%{foodmart.cube.sales.measure.unitsales.description}"/>
As usual, the default caption for any cube, measure, dimension or
level without a caption attribute is the name of the
element. A hierarchy's default caption is the caption of its dimension;
for example, the [Store] hierarchy has no caption
defined, so it inherits the caption attribute from its
parent, the
[Store] dimension.
Next, add the dynamic schema processor and locale to your connect
string. For example,
Provider=mondrian; Locale=en_US;
DynamicSchemaProcessor=mondrian.i18n.LocalizingDynamicSchemaProcessor;
Jdbc=jdbc:odbc:MondrianFoodMart; Catalog=/WEB-INF/FoodMart.xml
Now, for each locale you wish to support, provide a resource file
named
locale_{locale}.properties . For example,
# locale.properties: Default resources
foodmart.measures.caption=Measures
foodmart.dimension.store.country.caption=Store Country
foodmart.dimension.store.name.property_type.column= store_type
foodmart.dimension.store.country.member.caption= store_country
foodmart.dimension.store.name.property_type.caption =Store Type
foodmart.dimension.store.name.caption =Store Name
foodmart.dimension.store.state.caption =Store State
foodmart.dimension.store.name.property_manager.caption =Store Manager
foodmart.dimension.store.name.property_storesqft.caption =Store Sq. Ft.
foodmart.dimension.store.allmember.caption =All Stores
foodmart.dimension.store.caption =Store
foodmart.cube.sales.caption =Sales
foodmart.dimension.store.city.caption =Store City
foodmart.cube.sales.measure.unitsales =Unit Sales
and
# locale_hu.properties: Resources for the 'hu' locale.
foodmart.measures.caption=Hungarian Measures
foodmart.dimension.store.country.caption=Orsz\u00E1g
foodmart.dimension.store.name.property_manager.caption =\u00C1ruh\u00E1z
vezet\u0151
foodmart.dimension.store.country.member.caption =store_country_caption_hu
foodmart.dimension.store.name.property_type.caption =Tipusa
foodmart.dimension.store.name.caption =Megnevez\u00E9s
foodmart.dimension.store.state.caption =\u00C1llam/Megye
foodmart.dimension.store.name.property_type.column =store_type_caption_hu
foodmart.dimension.store.name.property_storesqft.caption =M\u00E9ret
n.l\u00E1b
foodmart.dimension.store.allmember.caption =Minden \u00C1ruh\u00E1z
foodmart.dimension.store.caption =\u00C1ruh\u00E1z
foodmart.cube.sales.caption =Forgalom
foodmart.dimension.store.city.caption =V\u00E1ros
foodmart.cube.sales.measure.unitsales =Eladott db
8. Aggregate tables
Aggregate tables are a way to improve Mondrian's performance when
the fact table contains a huge number of rows: a million or more. An
aggregate table is essentially a pre-computed summary of the data in
the fact table.
Let's look at a simple aggregate table.
< Table name="sales_fact_1997">
< AggName name="agg_c_special_sales_fact_1997">
< AggMeasure name="[Measures].[Store Cost]" column="STORE_COST_SUM"/>
< AggMeasure name="[Measures].[Store Sales]" column="STORE_SALES_SUM"/>
< AggLevel name="[Product].[Product Family]" column="PRODUCT_FAMILY"/>
< AggLevel name="[Time].[Quarter]" column="TIME_QUARTER"/>
< AggLevel name="[Time].[Year]" column="TIME_YEAR"/>
< AggLevel name="[Time].[Quarter]" column="TIME_QUARTER"/>
< AggLevel name="[Time].[Month]" column="TIME_MONTH"/>
<!-- Rest of the cube definition -->
The <AggForeignKey>
element, not shown here, allows you to reference a dimension table
directly, without including its columns in the aggregate table. It is
described in the
aggregate
tables guide.
In practice, a cube which is based upon a very large fact table may
have several aggregate tables. It is inconvenient to declare each
aggregate table explicitly in the schema XML file, and luckily there is
a better way. In the following example, Mondrian locates aggregate
tables by pattern-matching.
< Table name="sales_fact_1997">
< AggMeasure name="[Measures].[Store Cost]" column="STORE_COST_SUM"/>
< AggMeasure name="[Measures].[Store Sales]" column="STORE_SALES_SUM"/>
< AggLevel name="[Product].[Product Family]" column="PRODUCT_FAMILY"/>
< AggLevel name="[Time].[Quarter]" column="TIME_QUARTER"/>
< AggLevel name="[Time].[Year]" column="TIME_YEAR"/>
< AggLevel name="[Time].[Quarter]" column="TIME_QUARTER"/>
< AggLevel name="[Time].[Month]" column="TIME_MONTH"/>
It tells Mondrian to treat all tables which match the pattern "agg_.*_sales_fact_1997"
as aggregate tables, except "agg_c_14_sales_fact_1997"
and
"agg_lc_100_sales_fact_1997" . Mondrian uses rules to deduce the
roles of the columns in those tables, so it's important to adhere to
strict naming conventions. The naming conventions are described in the
aggregate tables
guide.
The performance guide has advice on choosing
aggregate tables.
9. Access-control
OK, so now you've got all this great data, but you don't everyone to
be able to read all
of it. To solve this, you can define an access-control profile, called
a Role, as part of the
schema, and set this role when establishing a connection.
9.1 Defining a role
Roles are defined by <Role>
elements,
which occur as direct children of the <Schema>
element, after the last <Cube> .
Here is an example of a role:
< Role name="California manager">
< HierarchyGrant hierarchy="[Store]" access="custom" topLevel="[Store].[Store Country]">
< MemberGrant member="[Store].[USA].[CA]" access="all"/>
< MemberGrant member="[Store].[USA].[CA].[Los Angeles]" access="none"/>
< HierarchyGrant hierarchy="[Customers]" access="custom" topLevel="[Customers].[State Province]" bottomLevel="[Customers].[City]">
< MemberGrant member="[Customers].[USA].[CA]" access="all"/>
< MemberGrant member="[Customers].[USA].[CA].[Los Angeles]" access="none"/>
A <SchemaGrant>
defines the default access for
objects in a schema. The access attribute can be "all" or
"none";
this access can be overridden for specific objects. In this case,
because
access="none" , a user would only be able to browse the
"Sales" cube,
because it is explicitly granted.
A <CubeGrant>
defines the access to
a particular cube. As for <SchemaGrant> ,
the access attribute can be "all", "custom" or "none", and can
be overridden for specific sub-objects in the cube.
A <DimensionGrant>
defines access to a dimension. The access attribute can be "all", "custrom"
or "none". An access level of "all" means that all the child hierarchies
of the dimension will get inherited access. An access level of "custom" means
that the role does not get an inherent access to the child hierarchies,
unless the role is explicitely granted using a
<HierarchyGrant> element.
A <HierarchyGrant>
defines
access to a hierarchy. The access attribute can be "all", meaning all
members
are visible; "none", meaning the hierarchy's very existence is hidden
from the
user; and "custom". With custom access, you can use the topLevel
attribute to define the top level which is visible (preventing users
from seeing
too much of the 'big picture', such as viewing revenues rolled up to
the
Store Country level); or use the bottomLevel
attribute to
define the bottom level which is visible (here, preventing users from
invading
looking at individual customers' details); or control which sets of
members the
user can see, by defining nested <MemberGrant>
elements.
You can only define a <MemberGrant>
element if its enclosing <HierarchyGrant>
has access="custom" . Member grants give (or remove)
access to a
given member, and all of its children. Here are the rules:
- Members inherit access from their parents. If you deny
access to California, you won't be able to see San Francisco.
- Grants are order-dependent. If you grant access to USA,
then deny access to Oregon, then you won't be able to see Oregon, or
Portland. But if you were to deny access to Oregon, then grant access
to USA, you can effectively see everything.
- A member is visible if any of its children are visible.
Suppose you deny access to USA, then grant access to California. You
will be able to see USA, and California, but none of the other states.
The totals against USA will still reflect all states, however.
If the parent HierarchyGrant specifies a top level, only the parents
equal or below this level will be visible. Similarly, if a bottom level
is specified, only the children above or equal to the level are visible.
- Member grants don't override the hierarchy grant's top- and
bottom-levels. If you set
topLevel="[Store].[Store State]" ,
and grant access to California, you won't be able to see USA.
Member grants do not override the topLevel and bottomLevel attributes.
You can grant or deny access to a member of any level, but the top
and bottom constraints have precedence on the explicit member grants.
In the example, the user will have access to California, and all of
the
cities in California except Los Angeles. They will be able to see USA
(because
its child, California, is visible), but no other nations, and not All
Stores
(because it is above the top level, Store Country ).
9.2 Rollup policy
A rollup policy determines how mondrian computes a member's total
if the current role cannot see all of that member's children. Under the default
rollup policy, called 'full', the total for that member includes contributions
from the children that are not visible. For example, suppose that Fred belongs to a role
that can see [USA].[CA] and [USA].[OR] but not [USA].[WA] .
If Fred runs the query
SELECT {[Measures].[Unit Sales]} ON COLUMNS,
{[[Store].[USA], Store].[USA].Children} ON ROWS
FROM [Sales]
the query returns
[Customer] |
[Measures].[Unit Sales] |
[USA] |
266,773 |
[USA].[CA] |
74,748 |
[USA].[OR] |
67,659 |
Note that [USA].[WA] is not returned, per the access-control
policy, but the total includes the total from Washington (124,366) that
Fred cannot see. For some applications, this is not appropriate.
In particular, if the dimension has a small number of members,
the end-user may be able to deduce the values of the members
which they do not have access to.
To remedy this, a role can apply a different rollup policy to a hierarchy.
The policy describes how a total is calculated for a particular member if
the current role can only see some of that
member's children:
- Full. The total for that member includes all children. This is the
default policy if you don't specify the
rollupPolicy attribute.
- Partial. The total for that member includes only accessible children.
- Hidden. If any of the children are inaccessible, the total is hidden.
Under the 'partial' policy, the [USA] total is the sum of the
accessible children [CA]
and [OR] :
[Customer] |
[Measures].[Unit Sales] |
[USA] |
142,407 |
[USA].[CA] |
74,748 |
[USA].[OR] |
67,659 |
Under 'hidden' policy, the [USA] total is hidden because one of its children
is not accessible:
[Customer] |
[Measures].[Unit Sales] |
[USA] |
- |
[USA].[CA] |
74,748 |
[USA].[OR] |
67,659 |
The policy is specified per role and hierarchy. In the following example, the role sees partial totals for the
[Store] hierarchy but full totals for [Product] .
< Role name="South Pacific manager">
< HierarchyGrant hierarchy="[Store]" access="custom" rollupPolicy="partial" topLevel="[Store].[Store Country]">
< MemberGrant member="[Store].[USA].[CA]" access="all"/>
< MemberGrant member="[Store].[USA].[CA].[Los Angeles]" access="none"/>
< HierarchyGrant hierarchy="[Customers]" access="custom" rollupPolicy="full" topLevel="[Customers].[State Province]" bottomLevel="[Customers].[City]">
<MemberGrant member="[Customers].[USA].[CA]" access="all"/>
<MemberGrant member="[Customers].[USA].[CA].[Los Angeles]" access="none"/>
This example also shows existing features, such as how hierarchy grants can be restricted using
topLevel and/or bottomLevel attributes, and how a role
can be prevented from seeing a hierarchy using access="none".
9.3 Union roles
A union role combines several roles, and has the sum of their privileges.
A union role can see a particular schema object if one or more of its
constituent roles can see it. Similarly, the rollup policy of a union role with
respect to a particular hierarchy is the least restrictive of all of the roles'
rollup policies.
Here is an example showing the syntax of a union role.
< Role name="Coastal manager">
< RoleUsage roleName="Eastern sales manager"/>
The constituent roles "California manager" and "Eastern sales manager" may be
regular roles, user-defined roles or union roles, but they must be declared
earlier in the schema file. The "Coastal manager" role will be able to see any
member that or a "California manager" and "Eastern sales manager". It will be
able to see all the cells at the intersection of these members, plus it will be
able to see cells that neither role can see: for example, if only "California
manager" can see [USA].[CA].[Fresno] , and only "Eastern sales
manager" see the [Sales Target] measure, then "Coastal manager"
will be able to see the sales target for Fresno, which neither of the
constituent roles have access to.
9.4 Setting a connection's role
A role only has effect when it is associated with a connection. By
default,
connections have a role which gives them access to every
cube in that connection's schema.
Most databases associate roles (or 'groups') with users, and
automatically
assign them when users log in. However, Mondrian doesn't have the
notion of
users, so you have to establish the role in a different way. There are
two ways
of doing this:
- In the connect string. If you specify the
Role
keyword in the connect string, the connection will adopt that role. You can
specify multiple role names separated by commas, and a union role will be
created; if a role name contains a comma, escape it with an extra comma. See
class DriverManager
for examples of connect string syntax.
- Programmatically. Once your application has established a
connection, call the method
Connection.setRole(Role). You can create a Role programmatically
(see interface Role and
the developer's note link for more
details), or look one up using the method
Schema.lookupRole(String).
10. Appendix A: XML elements
Element |
Description |
|
|
<Schema> |
Collection of Cubes, Virtual cubes, Shared dimensions, and Roles. |
Logical elements |
<Cube> |
A collection of dimensions and measures, all centered on a fact table. |
<VirtualCube> |
A cube defined by combining the dimensions and measures of one or more cubes.
A measure originating from another cube can be a <CalculatedMember> . |
<CubeUsages> |
Base cubes that are imported into a virtual cube |
<CubeUsage> |
Usage of a base cube by a virtual cube. |
<VirtualCubeDimension> |
Usage of a dimension by a virtual cube. |
<VirtualCubeMeasure> |
Usage of a measure by a virtual cube. |
<Dimension> |
Dimension. |
<DimensionUsage> |
Usage of a shared dimension by a cube. |
<Hierarchy> |
Hierarchy. |
<Level> |
Level of a hierarchy. |
<KeyExpression> |
SQL expression used as key of the level, in lieu of a column. |
<NameExpression> |
SQL expression used to compute the name of a member, in lieu of Level.nameColumn . |
<CaptionExpression> |
SQL expression used to compute the caption of a member, in lieu ofLevel.captionColumn . |
<OrdinalExpression> |
SQL expression used to sort members of a level, in lieu of Level.ordinalColumn . |
<ParentExpression> |
SQL expression used to compute a measure, in lieu of Level.parentColumn . |
<Property> |
Member property. The definition is against a hierarchy or level, but the property will be available to all members. |
<PropertyExpression> |
SQL expression used to compute the value of a property, in lieu of Property.column . |
<Measure> |
Measure. |
<CalculatedMember> |
A member whose value is derived using a formula, defined as part of a cube. |
<NamedSet> |
A set whose value is derived using a formula, defined as part of a cube. |
Physical elements |
<Table> |
Fact or dimension table. |
<View> |
Defines a 'table' using a SQL query, which can have different variants for different underlying databases. |
<Join> |
Defines a 'table' by joining a set of queries. |
<InlineTable> |
Defines a table using an inline dataset. |
<Closure> |
Maps a parent-child hierarchy onto a closure table. |
Aggregate Tables |
<AggExclude> |
Exclude a candidate aggregate table by name or pattern matching. |
<AggName> |
Declares an aggregate table to be matched by name. |
<AggPattern> |
Declares a set of aggregate tables by regular expression pattern. |
<AggFactCount> |
Specifies name of the column in the candidate aggregate table which contains the number of fact table rows. |
<AggIgnoreColumn> |
Tells Mondrian to ignore a column in an aggregate table. |
<AggForeignKey> |
Maps foreign key in the fact table to a foreign key column in the candidate aggregate table. |
<AggMeasure> |
Maps a measure to a column in the candidate aggregate table. |
<AggLevel> |
Maps a level to a column in the candidate aggregate table. |
Access control |
<Role> |
An access-control profile. |
<SchemaGrant> |
A set of rights to a schema. |
<CubeGrant> |
A set of rights to a cube. |
<HierarchyGrant> |
A set of rights to a hierarchy and levels within that hierarchy. |
<MemberGrant> |
A set of rights to a member and its children. |
<Union> |
Definition of a set of rights as the union of a set of roles. |
<RoleUsage> |
A reference to a Role. |
Extensions |
<UserDefinedFunction> |
Declares a user-defined function. |
<CellFormatter> |
Cell formatter. |
<MemberFormatter> |
Member formatter. |
<PropertyFormatter> |
Property formatter. |
<Script> |
Script fragment to implement an SPI such as user-defined
function, member formatter, or cell formatter. |
Miscellaneous |
<Annotations> |
Holder for annotations. |
<Annotation> |
User-defined property attached to a metadata element. |
<Parameter> |
Part of the definition of a Hierarchy; passed to a MemberReader, if present. |
<CalculatedMemberProperty> |
Property of a calculated member. |
<Formula> |
Holds the formula text within a <NamedSet> or <CalculatedMember> . |
<ColumnDefs> |
Holder for <ColumnDef> elements. |
<ColumnDef> |
Definition of a column in an <InlineTable> dataset. |
<Rows> |
Holder for <Row> elements. |
<Row> |
Row in an <InlineTable> dataset. |
<Value> |
Value of a column in an <InlineTable> dataset. |
<MeasureExpression> |
SQL expression used to compute a measure, in lieu of Measure.column . |
<SQL> |
The SQL expression for a particular database dialect. |
Author: Julian Hyde; last modified May 2011.
Version: $Id$
(log)
Copyright (C) 2001-2005 Julian Hyde.
Copyright (C) 2005-2011 Pentaho and others
|