Mondrian Roadmap
Contents
- Introduction
- Purpose of this document
- Mondrian's goal
- Scope
- Sponsored development and co-development
- Upcoming releases
- olap4j release 1.0
- Mondrian release 3.1
- Aggregation designer release x.x
- Schema workbench release x.x
- Feature list
- Partitioned cubes
- Cold start
- Rollup in cache
- Compound slicer
- Schema and query validation
- Name-resolution
- Standard functions
- Bridge to CWM
- User-defined aggregate functions
- Further work on aggregate tables
- Release history
- Release 3.0
- Release 2.4
- Release 2.3
- Release 2.2
- Release 2.1
- Release 2.0
- Release 1.1
- Release 1.0
- Release 0.6
- Release 0.5
- Release 0.4
- Release 0.3
1. Introduction
This is a list of features we propose to deliver in future releases of
Mondrian. Each feature is linked to a high-level description. Complex features
will have more detailed specifications in a separate document.
1.1 Purpose of this document
This document has several goals. First, it lets the Mondrian community know what
features we are thinking about implementing. There may be better ways of
delivering the same functionality that we haven't thought of.
Second, since there is always more work than time, it allows us to
prioritize. If we hear that a particular feature is important to a lot of
people, we will try to get to it sooner.
Third, it allows us to attract resources. If there are features in this
roadmap which are important to your organization, consider sponsoring Mondrian's
development.
1.2 Mondrian's goal
Mondrian's goal is to bring multidimensional analysis to the masses.
To do this it needs to be:
- free
- portable
- easy to install
- easy to integrate, and above all
- easy to understand
Mondrian is an open-source OLAP server written in pure Java, and we feel that it
meets these goals. We can't anticipate all of our customers' requirements, but
open-source combined with Java keeps Mondrian flexible. It's easy to add
functionality or to integrate third-party tools, and Mondrian can be integrated
into a variety of environments.
Mondrian is part of the Pentaho Open Source BI Suite. Pentaho aims to deliver
the best possible user experience by integrating Mondrian with other open-source
components such as Kettle, Pentaho Reporting, and Weka. While building this
integration, Pentaho is committed to keeping Mondrian independent from other
components, and available under a commercial-friendly open-source license.
1.3 Scope
Mondrian can't do everything. If it did everything, it would be a huge
download, difficult to install, and even more difficult to integrate with other
software; and we'd never finish writing it. But the good news is, this is open
source. If a feature is missing, it's often easy to add the feature to Mondrian
or to integrate with another open-source product that provides the feature.
JPivot is Mondrian's sister project.
It provides an excellent user-interface, and shows off what Mondrian
can do. But we have been careful to keep the two projects separate. (You can use
another user-interface to Mondrian, and you can also use JPivot with other
data-sources.) If you've run Mondrian's demo and you have suggestions on how to
improve the web interface, please make your suggestion to the JPivot project
directly.
1.4 Sponsored development and co-development
Pentaho encourages companies to sponsor development of features which are
important to them. Sponsorship allows Mondrian developers to
spend more time adding features to Mondrian, rather than having to find other
ways to pay the rent. The results are always contributed back to the project as
open-source.
Another way companies can help Mondrian is to assign employees to co-develop
features. We can help specify and design these features, provided that the
resulting code is contributed to the project.
If your organization would like to sponsor development of features, please
contact Julian Hyde.
2. Upcoming releases
2.1 olap4j release 1.0
Targeted release timeframe: Q3 2008.
olap4j is a proposed standard API for access to any OLAP data source from
Java. See www.olap4j.org.
As of mondrian-3.0 olap4j is the primary API to mondrian; mondrian's driver
is based on olap4j-0.9.4 (beta). olap4j release 1.0 will be the first production
release of the olap4j specification. It will include a full Test Compatibility
Kit (TCK) and incorporate bug fixes & feedback from the drivers and applications
built using olap4j beta.
2.2 Mondrian Release 3.1
Targeted release timeframe: Q3 2008
Feature | Effort | Importance
Remove support for old API | low | medium
3.8 Bridge to CWM. Integration with Pentaho Metadata. Could be an incubator project. Note that someone has already implemented a bridge in one direction. | high | high
3.10 Further work on aggregate tables. To support the aggregation designer, mondrian release 3.1 will probably include utilities (2) DDL generation and (3) a utility (maybe graphical, maybe text-based) to recommend a set of aggregate tables. | high | high
TBD | |
2.3 Aggregation Designer Release x.x
Targeted release timeframe: Q2 2008
Effort: high, Importance: high, Priority: high
Release Highlights:
- Repeatable, reliable, semi-automated methodology for improving
ROLAP/HOLAP performance
- APIs and user interfaces that are suitable for use by developers,
consultants and Pentaho customers
2.4 Schema Workbench Release x.x (cube designer)
Targeted release timeframe: not specified
Effort: high, Importance: high, Priority: high
Release Highlights:
- User Interface suitable for consultants, developers and customers to
design and maintain Mondrian Schemas
- User Interface to support all Mondrian Schema tags and be maintained in
lock-step with Mondrian server going forward
3. Feature list
3.1 Partitioned cubes
Effort: medium; importance: medium; priority: medium.
Whereas a regular cube has a single fact table, a partitioned cube has
several fact tables, which are unioned together. The fact tables must have the
same column names.
Each fact table can have a set of ranges (similar to 'cache ranges') which
describes what data is found in it. When looking for a particular cell, Mondrian scans the
tables' criteria to determine which table to look in. For example, T1 holds data
for Texas, 2005 onwards; T2 holds data for 2004 onwards; T3 holds all other
data. The cell (Oklahoma, January 2005) would be found in T2.
Partitioned tables are useful for real-time analysis. For example, one
partition might contain today's data, while another might hold historical data.
The 'hot' partition with today's data would typically have fewer or no
aggregation tables and have caching disabled; its fact table might have
different physical options in the RDBMS, say fewer indexes to maximize insert
performance.
Example schema:
<Cube name="Sales">
  <Partitions>
    <Partition name="partition1" cache="false">
      <Table name="sales_fact_this_month"/>
      <Ranges>
        <Range dimension="[Time]">
          <RangeMember bound="lower" member="[Time].[2005].[9]"/>
        </Range>
        <Range dimension="[Store]">
          <RangeMember member="[Store].[USA].[CA]"/>
          <RangeMember member="[Store].[USA].[WA].[Seattle]"/>
        </Range>
      </Ranges>
    </Partition>
    <Partition name="partition2" cache="true">
      <Table name="sales_fact"/>
      <Ranges/>
    </Partition>
  </Partitions>
</Cube>
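The table lookup described above (T1, T2, T3) can be sketched in a few lines of Java. This is only an illustration, not Mondrian code: the class and method names are hypothetical, ranges are reduced to an optional state and an optional lower-bound year, and it assumes that partitions are checked in declaration order and the first match wins, which the roadmap does not specify.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch: pick the fact table (partition) that holds a cell. */
final class PartitionChooser {

    /** One partition: a fact table plus the ranges it covers (null = unrestricted). */
    static final class Partition {
        final String table;
        final String state;      // e.g. "Texas", or null for any state
        final Integer fromYear;  // inclusive lower bound, or null for any year

        Partition(String table, String state, Integer fromYear) {
            this.table = table;
            this.state = state;
            this.fromYear = fromYear;
        }

        /** True if the cell's coordinates fall inside every declared range. */
        boolean contains(String cellState, int cellYear) {
            boolean stateOk = state == null || state.equals(cellState);
            boolean yearOk = fromYear == null || cellYear >= fromYear;
            return stateOk && yearOk;
        }
    }

    /** Scans partitions in declaration order; the first matching one wins (assumption). */
    static Optional<String> tableFor(List<Partition> partitions, String state, int year) {
        for (Partition p : partitions) {
            if (p.contains(state, year)) {
                return Optional.of(p.table);
            }
        }
        return Optional.empty();
    }

    /** The T1/T2/T3 example from the text. */
    static List<Partition> example() {
        return Arrays.asList(
            new Partition("T1", "Texas", 2005),  // Texas, 2005 onwards
            new Partition("T2", null, 2004),     // any state, 2004 onwards
            new Partition("T3", null, null));    // all other data
    }

    public static void main(String[] args) {
        // The cell (Oklahoma, January 2005) is not in T1, so it falls through to T2.
        System.out.println(tableFor(example(), "Oklahoma", 2005).orElse("none"));
    }
}
```

With first-match semantics, the catch-all partition must be declared last; a real implementation would probably also need to detect overlapping ranges.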
3.2 Cold start
Effort: medium; importance: medium; priority: low.
When Mondrian initializes and starts to process the first queries, it makes
SQL calls to get member lists and determine cardinality, and then to load
segments into the cache. When Mondrian is closed and restarted, it has to do
that work again. This can take a significant amount of time, depending on the
cube size. For example, in one test an 8GB cube (55M row fact table) took 15
minutes (mostly doing a GROUP BY) before it returned results from its first
query, and absent any caching on the database server it would take another 15
minutes if you closed and reopened the application. Now, this cube held just one
month of data; imagine the time if there were 5 years' worth.
What ideas and designs can you come up with to speed that up, in other words to
do anything time consuming only once and reuse it between instances?
Gang Chen: If it's possible, can we calculate the real levels of a
parent-child hierarchy? This would bring Mondrian's metadata closer to
MSAS's.
Julian Hyde: Can you give me more details on how that would work?
Start a discussion forum or feature request on SourceForge.
Other options for cold start:
- Command for mondrian to serialize cache state (definitions and data) to
disk. When mondrian starts, read the cache state from disk.
- Command for mondrian to serialize cache definitions to disk. When
mondrian starts, read cache definitions from disk, and cache contents from
the DBMS.
- User writes a script of MDX commands to prime the cache. On startup,
mondrian executes this script in a background thread.
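The third option, a priming script run in the background, might look roughly like the sketch below. Everything here is hypothetical: the class name, the ';'-separated script format (a simplification that would break on semicolons inside MDX strings), and the `mdxExecutor` callback that stands in for whatever actually runs a query against Mondrian.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

/** Hypothetical sketch: run a script of MDX statements in the background at startup. */
final class CachePrimer {

    /** Splits a script into statements on ';' and drops blank entries. */
    static List<String> parseScript(String script) {
        List<String> statements = new ArrayList<>();
        for (String part : script.split(";")) {
            String mdx = part.trim();
            if (!mdx.isEmpty()) {
                statements.add(mdx);
            }
        }
        return statements;
    }

    /**
     * Runs every statement on a background thread, in script order.
     * A failing statement is logged and skipped so that priming continues.
     */
    static CompletableFuture<Void> primeInBackground(
            List<String> statements, Consumer<String> mdxExecutor) {
        return CompletableFuture.runAsync(() -> {
            for (String mdx : statements) {
                try {
                    mdxExecutor.accept(mdx);
                } catch (RuntimeException e) {
                    System.err.println("Priming query failed: " + mdx);
                }
            }
        });
    }

    public static void main(String[] args) {
        List<String> script = parseScript(
            "select {[Measures].[Unit Sales]} on columns from [Sales];");
        // Stand-in executor that only prints; a real one would execute the MDX.
        primeInBackground(script, mdx -> System.out.println("priming: " + mdx)).join();
    }
}
```

Because the work runs off the startup thread, the server can begin answering queries immediately; queries that arrive before priming finishes would simply miss the cache.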
3.3 Rollup in cache
Effort: medium; importance: medium; priority: low.
If the cache contains aggregates for all children of a member, then Mondrian
would be able to compute the aggregate for the parent member by rolling up.
See the email thread "grouper in Mondrian".
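As a rough illustration of the idea, the sketch below rolls a parent up from cached child values and gives up as soon as one child is missing. The names are hypothetical and the cache is reduced to a map from member unique name to value. It assumes an additive aggregator such as sum or count; it would not be correct for distinct-count.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.OptionalDouble;

/** Hypothetical sketch: derive a parent's aggregate from cached child aggregates. */
final class CacheRollup {

    /**
     * Returns the sum of the children's cached values, or empty if the member
     * has no children or any child is absent from the cache (in which case the
     * value would have to be fetched from the database instead).
     */
    static OptionalDouble rollUp(List<String> children, Map<String, Double> cache) {
        if (children.isEmpty()) {
            return OptionalDouble.empty();
        }
        double total = 0;
        for (String child : children) {
            Double value = cache.get(child);
            if (value == null) {
                return OptionalDouble.empty();
            }
            total += value;
        }
        return OptionalDouble.of(total);
    }

    public static void main(String[] args) {
        Map<String, Double> cache = new HashMap<>();
        cache.put("[Store].[USA].[CA]", 10.0);
        cache.put("[Store].[USA].[OR]", 5.0);
        cache.put("[Store].[USA].[WA]", 2.5);
        List<String> children = Arrays.asList(
            "[Store].[USA].[CA]", "[Store].[USA].[OR]", "[Store].[USA].[WA]");
        System.out.println(rollUp(children, cache)); // value for [Store].[USA]
    }
}
```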
3.4 Compound slicer
Effort: medium; importance: low; priority: low.
3.5 Schema and query validation
Process to validate a schema.
Process to validate a set of queries. Maybe an option to ignore errors due to
specific members not existing because the data hasn't been loaded yet.
Expose validation via Eclipse plugin.
3.6 Name-resolution
Mondrian's name resolution is not always compatible with other MDX
implementations such as MSAS and SAS.
- Support abbreviated member names. For example,
[Products].[Boston Lager] seems to be valid in MSAS if product names are unique, whereas Mondrian
currently requires [Products].[Beverages].[Beer].[Samual Adams].[Boston Lager].
- Change the scheme for generating unique names, omitting the 'all' member
name; the current [Customers].[(All customers)].[USA] would become
[Customers].[USA]. Mondrian would still understand names of the
previous form.
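Both proposals can be illustrated with a small sketch. It is not Mondrian's resolver: the class and method names are hypothetical, members are modelled as plain lists of name segments, and an abbreviated name is accepted only when exactly one member of the hierarchy has that name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of the two name-resolution proposals. */
final class MemberNameResolver {

    /**
     * Resolves an abbreviated name such as [Products].[Boston Lager] to the one
     * full path in that hierarchy ending in the given member name. Returns empty
     * if no member matches or if the name is ambiguous.
     */
    static Optional<List<String>> resolveAbbreviated(
            List<List<String>> members, String hierarchy, String name) {
        List<String> found = null;
        for (List<String> path : members) {
            boolean matches = path.size() >= 2
                && path.get(0).equals(hierarchy)
                && path.get(path.size() - 1).equals(name);
            if (matches) {
                if (found != null) {
                    return Optional.empty(); // two candidates: ambiguous
                }
                found = path;
            }
        }
        return Optional.ofNullable(found);
    }

    /** Builds a unique name, omitting the 'all' member unless it is the member itself. */
    static String uniqueName(List<String> path, String allMemberName) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < path.size(); i++) {
            boolean isAllSegment = i == 1 && path.size() > 2
                && path.get(i).equals(allMemberName);
            if (isAllSegment) {
                continue;
            }
            if (sb.length() > 0) {
                sb.append('.');
            }
            sb.append('[').append(path.get(i)).append(']');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(uniqueName(
            Arrays.asList("Customers", "(All customers)", "USA"), "(All customers)"));
    }
}
```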
3.7 Standard functions
Implement standard MDX functions:
- DrilldownMemberBottom(<Set1>, <Set2>, <Count>[, [<Numeric Expression>][,
RECURSIVE]])
- DrilldownMemberTop(<Set1>, <Set2>, <Count>[, [<Numeric Expression>][,
RECURSIVE]])
- DrillupLevel(<Set>[, <Level>])
- DrillupMember(<Set1>, <Set2>)
- Except(<Set1>, <Set2>[, ALL]). (Except is implemented in Mondrian 1.2,
except for the ALL keyword.)
- SetToArray(<Set>[, <Set>]...[, <Numeric Expression>])
3.8 Bridge to CWM
CWM (Common Warehouse Model) is a standard model for defining data warehouse
and multidimensional schemas. It allows interoperability with tools such as UML
diagrams, relational report design tools, and ETL tools.
This feature will add:
- A gateway to present a Mondrian schema via the CWM API.
- A bridge to read a CWM schema and create a Mondrian schema from it.
3.9 User-defined aggregate functions
The standard aggregate functions are sum, count, distinct-count, min, max and
avg. This feature will provide an SPI by which application developers can write
their own aggregate functions.
The SPI will include:
- the name of the aggregate function;
- parameter types;
- return types;
- a means to generate a SQL expression to compute the aggregate from
unaggregated fact table data. (For the "count" function applied to the "unit_sales"
column, this would generate "count(unit_sales)".)
- a means to generate a SQL expression to compute the aggregate by rolling
up partially aggregated data. (For the "count" function applied to the "unit_sales"
column, this would generate "sum(unit_sales)". Some aggregates, such as
"distinct-count", do not support rollup.)
- a means to roll up values in memory. Some aggregates, such as
"distinct-count", do not support this.
The SPI will support functions which map to a SQL expression rather than a
SQL aggregate function. The "avg" function is an example of this: it works by
expanding itself to sum / count.
The SPI will support functions which can be computed from unaggregated fact
table data, but cannot be rolled up. The "distinct-count" function is an example
of this.
You will be able to include user-defined aggregate functions in aggregate
tables.
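To make the shape of such an SPI more concrete, here is one possible sketch. None of these types exist in Mondrian; the interface and method names are invented for illustration, and parameter and return types are left out for brevity. An empty `Optional` is used to signal that an aggregate cannot be rolled up.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of an SPI for user-defined aggregate functions. */
final class AggregateSpiSketch {

    /** What an application developer would implement (invented interface). */
    interface Aggregator {
        /** Name of the aggregate function as used in the schema. */
        String name();

        /** SQL to compute the aggregate from unaggregated fact table data. */
        String factSql(String column);

        /** SQL to roll up partially aggregated data; empty if unsupported. */
        Optional<String> rollupSql(String column);

        /** Rolls up values in memory; empty if unsupported. */
        Optional<Double> rollupInMemory(List<Double> partials);
    }

    /** "count": partial counts roll up by summing them. */
    static final class Count implements Aggregator {
        public String name() { return "count"; }
        public String factSql(String column) { return "count(" + column + ")"; }
        public Optional<String> rollupSql(String column) {
            return Optional.of("sum(" + column + ")");
        }
        public Optional<Double> rollupInMemory(List<Double> partials) {
            double total = 0;
            for (double partial : partials) {
                total += partial;
            }
            return Optional.of(total);
        }
    }

    /** "distinct-count": computable from fact data, but not from partial results. */
    static final class DistinctCount implements Aggregator {
        public String name() { return "distinct-count"; }
        public String factSql(String column) { return "count(distinct " + column + ")"; }
        public Optional<String> rollupSql(String column) { return Optional.empty(); }
        public Optional<Double> rollupInMemory(List<Double> partials) {
            return Optional.empty();
        }
    }
}
```

An "avg" implementation would not fit a single SQL aggregate call; as the text notes, it would expand to an expression such as sum / count, which suggests `factSql` should be free to return any SQL expression.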
3.10 Further work on aggregate tables
1. Data population
Utility to populate (or generate INSERT statements to populate) the agg
tables. (For extra credit: populate the tables in topological order, so that
higher level aggregations can be built from lower level aggregations.)
2. DDL generation
Utility to generate a script containing CREATE TABLE and CREATE INDEX
statements for all possible aggregate tables (including indexes), XML for these
tables, and comments indicating the estimated number of rows in these
tables. Clearly this will be a huge script, and it would be ridiculous to
create all of these tables. The person designing the schema could copy/paste
from this file to create their own schema.
3. Utility (maybe graphical, maybe text-based) to recommend a set of
aggregate tables
This is essentially an optimization algorithm, and it is
described in the academic literature. Constraints on the optimization
process are the amount of storage required and the estimated time to populate
the agg tables. The algorithm could also take into account usage
information.
4. Allow aggregate tables to be taken offline/online while Mondrian is still
running
I'm thinking of these being utilities, not part of the core runtime engine.
There's plenty of room to wrap these utilities in nice graphical interfaces
and make them smarter.
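The "extra credit" in item 1, populating tables in topological order, could be approached as in the sketch below. It is a simplification under stated assumptions: an aggregate table is described only by its set of grouping columns, table B can feed table A when B's columns are a superset of A's, the single measure is additive (sum), and every table has at least one grouping column. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: order aggregate-table population so coarser tables reuse finer ones. */
final class AggPopulationPlanner {

    /** An aggregate table, described only by its (non-empty) grouping columns. */
    static final class AggTable {
        final String name;
        final Set<String> groupBy;

        AggTable(String name, String... groupBy) {
            this.name = name;
            this.groupBy = new TreeSet<>(Arrays.asList(groupBy)); // sorted for stable SQL
        }
    }

    /** Returns INSERT statements, finest-grained table first. */
    static List<String> plan(String factTable, String measure, List<AggTable> tables) {
        // More grouping columns means finer granularity; a strict superset always has
        // more columns, so this ordering is a valid topological order.
        List<AggTable> ordered = new ArrayList<>(tables);
        ordered.sort(Comparator.comparingInt((AggTable t) -> t.groupBy.size()).reversed());

        List<AggTable> built = new ArrayList<>();
        List<String> statements = new ArrayList<>();
        for (AggTable target : ordered) {
            // Prefer the coarsest already-built table that still has every needed column.
            String source = factTable;
            int bestSize = Integer.MAX_VALUE;
            for (AggTable candidate : built) {
                if (candidate.groupBy.containsAll(target.groupBy)
                        && candidate.groupBy.size() < bestSize) {
                    source = candidate.name;
                    bestSize = candidate.groupBy.size();
                }
            }
            String cols = String.join(", ", target.groupBy);
            statements.add("INSERT INTO " + target.name + " (" + cols + ", " + measure
                + ") SELECT " + cols + ", sum(" + measure + ") FROM " + source
                + " GROUP BY " + cols);
            built.add(target);
        }
        return statements;
    }

    public static void main(String[] args) {
        List<AggTable> tables = Arrays.asList(
            new AggTable("agg_y", "year"),
            new AggTable("agg_ys", "year", "state"));
        for (String sql : plan("sales_fact", "unit_sales", tables)) {
            System.out.println(sql);
        }
    }
}
```

Column count is only a proxy for table size; a real utility would more likely pick the source by estimated row count.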
4. Release history
4.1 Release 3.0 (2008/3/22)
- olap4j API. olap4j (http://www.olap4j.org)
is the Open Java API for OLAP. From mondrian-3.0 onwards, olap4j is the main
API for connecting to mondrian, browsing metadata and executing queries.
Mondrian's previous API (classes in the mondrian.olap package)
still exists but is deprecated; from mondrian-3.1 onwards, classes and
methods in this API may not exist, may not work, or may change.
- Rollup policy controls how a cell's value is calculated if some
of its children are hidden by access-control. Before mondrian-3.0 the
only policy was 'full': even if access to a hierarchy was restricted, the value
of a member was the sum of all of its children, visible or not. From
mondrian-3.0, we also allow 'partial' (the value is the sum of
the visible children) and 'hidden' (the cell's value is unknown if any of the
children are hidden). The policy is expressed by the
rollupPolicy attribute of the <HierarchyGrant> element.
- Aggregate roles. You can now define a role in the schema that has
the sum of the privileges of two or more roles; and you can connect to
mondrian with one or more roles. This facility enables closer integration
with Pentaho access-control, where a user can already exist in multiple roles.
- Allow distinct-count measures to be aggregated. For example,
mondrian can now compute the number of distinct customers who bought beer or
diapers in Q2 or Q3. For efficiency, cell values are loaded in batches and a
special cache allows aggregate cell values to be reused between queries.
- Improved dimension sharing. Allow a shared dimension to be used
more than once within the same cube.
- Virtual cube enhancements. When a cube that uses the same
dimension twice is involved in a virtual cube, disambiguate which usage of
the dimension is involved. Allow the virtual cube to use the same cube more
than once.
- Scalar functions. Many scalar functions have been added in
mondrian-3.0, following the specification of the Visual Basic for Applications
(VBA) and Excel libraries that are available by default in Microsoft SQL
Server Analysis Services (SSAS) and that many MDX users assume are part of the
core MDX language.
New functions: Abs, Acosh, Asc, AscB, AscW, Asin, Asinh, Atan2, Atanh, Atn,
Cache, CBool, CByte, CDate, CDbl, Chr, ChrB, ChrW, CInt, Cos, Cosh, Date,
DateAdd, DateDiff, DatePart, DateSerial, DateValue, Day, DDB, Degrees,
DrilldownLevel, DrilldownLevelBottom, DrilldownLevelTop, Exp, Fix,
FormatCurrency, FormatDateTime, FormatNumber, FormatPercent, FV, Hex, Hour,
InStrRev, Int, IPmt, IRR, IsDate, LCase, Log, Log10, LTrim, Minute, MIRR,
Month, MonthName, Now, NPer, NPV, Oct, Percentile, Pi, Pmt, Power, PPmt, PV,
Radians, Rate, Replace, Right, Round, RTrim, Second, Sgn, Sin, Sinh, SLN,
Space, Sqr, SqrtPi, Str, StrComp, String, StrReverse, SYD, Tan, Tanh, Time,
Timer, TimeSerial, TimeValue, Trim, TypeName, Val, Weekday, WeekdayName,
Year.
We have added additional forms to existing functions: Descendants(<Member>, , LEAVES);
Format can now be applied to DateTime values; Iif can be applied to member,
level, hierarchy, dimension, tuple and set values; Levels
can be applied to a string expression.
- JNDI in connect string. JDBC data sources can be specified by
their JNDI name.
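The three rollup policies listed above ('full', 'partial', 'hidden') can be compared side by side in a small sketch. This is an illustration of the semantics as read from the description, not Mondrian's implementation; the class, enum and method names are hypothetical, and an empty result stands for "value unknown".

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalDouble;

/** Hypothetical sketch of the 'full', 'partial' and 'hidden' rollup policies. */
final class RollupPolicySketch {

    enum Policy { FULL, PARTIAL, HIDDEN }

    /** A child member's value and whether the current role may see it. */
    static final class Child {
        final double value;
        final boolean visible;

        Child(double value, boolean visible) {
            this.value = value;
            this.visible = visible;
        }
    }

    /** Computes the parent's value under the given policy; empty means unknown. */
    static OptionalDouble parentValue(List<Child> children, Policy policy) {
        double total = 0;
        for (Child child : children) {
            if (child.visible || policy == Policy.FULL) {
                total += child.value;          // FULL counts hidden children too
            } else if (policy == Policy.HIDDEN) {
                return OptionalDouble.empty(); // any hidden child hides the total
            }
            // PARTIAL: a hidden child is simply left out of the sum
        }
        return OptionalDouble.of(total);
    }

    public static void main(String[] args) {
        List<Child> children = Arrays.asList(new Child(10, true), new Child(5, false));
        for (Policy policy : Policy.values()) {
            System.out.println(policy + " -> " + parentValue(children, policy));
        }
    }
}
```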
API changes in release 3.0
Removed methods that were deprecated in 2.4, plus:
- MondrianServer.flushSchemaCache()
- MondrianServer.flushDataCache()
- DriverManager.getConnection(String, CatalogLocator, boolean)
- DriverManager.getConnection(Util.PropertyList, boolean)
- DriverManager.getConnection(Util.PropertyList, CatalogLocator, boolean)
- DriverManager.getConnection(Util.PropertyList, CatalogLocator,
DataSource, boolean)
- RolapMember.getSqlKey()
- MondrianProperties.CachePoolCostLimit (property "mondrian.rolap.CachePool.costLimit")
- MondrianProperties.FlushAfterQuery (property "mondrian.rolap.RolapResult.flushAfterEachQuery")
4.2 Release 2.4 (2007/08/31)
- Aggregate distinct-count measures. Mondrian now computes distinct-count measures properly over a range of selections (for
example, show me a count of all new Customers from January through July).
- Generate SQL with the GROUPING SETS construct, for
databases which support it. By leveraging Grouping Sets, Mondrian can reduce
the number of SQL queries necessary to fulfill an MDX request, and databases
can often execute the combined queries more efficiently than the individual
queries. Grouping Sets are currently supported in Oracle, DB2, Teradata and
Microsoft SQL Server.
- New MDX functions Extract(<Set>, <Dimension>[, <Dimension>...]),
Generate, Iif(bool, bool, bool), Len, Left, Mid, UCase.
- Support for Apache Commons Virtual File System (VFS) URLs.
- Support keys in members, e.g. [Products].&[1234].
API changes in release 2.4
- DynamicSchemaProcessor. Moved the
mondrian.rolap.DynamicSchemaProcessor interface to package
mondrian.spi. The processSchema(URL, PropertyList)
method now has signature processSchema(String, PropertyList),
and the URL is intended to be interpreted as an Apache VFS URL. Class
mondrian.spi.impl.FilterDynamicSchemaProcessor is a partial
implementation.
- Various methods which used String or String[]
to look up multi-part identifiers such as '[Store].[USA].[CA]'
now take Id.Segment or List<Id.Segment>. The
previous methods are deprecated and will be removed in mondrian-3.0 (see
below).
Methods to look up multi-part identifiers which are deprecated
in mondrian-2.4 and will be removed in mondrian-3.0:
- Formula.Formula(String[], exp)
- Formula.Formula(String[], Exp, MemberProperty[])
- QueryPart.addFormula(String[], Exp, MemberProperty[])
- SchemaReader.lookupCompound(OlapElement, String[], boolean, int)
- SchemaReader.getMemberByUniqueName(String[], boolean)
- SchemaReader.getMemberByUniqueName(String[], boolean, MatchType)
- Util.explode(String)
- Util.lookupCompound(SchemaReader, OlapElement, String[], boolean, int)
- Util.lookup(Query, String[])
Other deprecated methods to be removed in mondrian-3.0:
- Query.getQueryString()
- QueryPart.toMdx()
- RolapSchema.flushSchema(String, String, String, String)
- RolapSchema.flushSchema(String, DataSource)
- RolapSchema.clearCache()
- RolapSchema.flushRolapStarCaches(boolean)
- RolapSchema.flushAllRolapStarCachedAggregations()
- CachePool.flush()
4.3 Release 2.3 (2007/03/12)
- Cache control API.
- More efficient evaluation of queries which return large results. To achieve
this, some MDX functions now have multiple implementations, and can return their
results as iterators in addition to the usual list format.
- More control over queries which run for long periods of time, return
large numbers of members or cells, or which use excessive amounts of memory.
Under such conditions, queries throw a ResultLimitExceeded
exception.
- JDK 1.5 is now the primary development and delivery platform. You can
continue to run mondrian on JDK 1.4 using the provided
backwards-compatibility JARs mondrian-jdk14.jar and retroweaver-rt-1.2.4.jar
created by retroweaver.
- Added support for Ingres and LucidDB.
- JOLAP (JSR-069) support removed.
API changes which may impact existing applications:
- Rename ResultLimitExceeded to ResultLimitExceededException;
- Remove packages javax.olap, mondrian.jolap, org.omg.java.cwm;
- In mondrian.olap.Axis, change 'Position[] getPositions()' to
'List<Position> getPositions()';
- In mondrian.olap.Position, replace data member 'Member[] members' with
methods 'Member get(int ordinal)' and 'int size()' (both inherited from
List<Member>).
4.4 Release 2.2 (2006/10/??)
- Mondrian-2.2 implements a host of new functions and operators:
In, Matches, Cast, ValidMeasure, CurrentDateMember, CurrentDateString. Also
the NULL literal.
- Parameters. Formerly you could only specify parameters in a
query. Now they can also be specified at system, schema or session level.
Since parameters can be specified using an MDX expression, this is a great
way to define constants and calculations in just one place, and share
them throughout your application.
- Query timeout and cancel. We have added timeout and a cancel
facility to deal with long-running queries.
- There's now the ability to flush the schema cache. See
mondrian.olap.MondrianServer for more details.
- Internationalization just got a lot easier. Mondrian now supports
a 'Locale' parameter to the connect string. Formatting information comes
from Java rather than from MondrianResource.properties, which means that
Mondrian should work out of the box for any locale Java supports.
- Performance improvements. The Level.approxRowCount schema
attribute saves mondrian the effort of executing queries to count levels
solely for XML/A's purposes. There are also performance improvements in the
LastNonEmpty function, and crossjoin can be evaluated in SQL even for
virtual cubes.
- Lastly, we moved mondrian's website to
https://mondrian.pentaho.com. Same
content as before, but better formatted, and more integrated with the rest
of the Pentaho family of projects.
4.5 Release 2.1 (2006/04/01)
- Finally, a separate distribution, mondrian-*-embedded.zip,
including an embedded Derby database in the WAR. This can be deployed to
Tomcat on any platform by simply exploding the WAR into TOMCAT/webapps,
allowing folks "kicking the tires" to easily try out Mondrian/JPivot. See
how to deploy and run the embedded web app.
- XML/A bug fixes, functionality and test suite improvements.
- Compilation of MDX expressions. This is an architectural change to allow
Mondrian to analyze queries at the start of execution, and trade off various
techniques such as expression-caching and pushing predicates into the
generated SQL. It involves some API changes (see below).
- Allow distinct-count measures to be rolled up over attributes which are
functionally dependent on the key of the measure (e.g. "gender" is
functionally dependent on the key "customer_id" of the measure "Customer
Count"). This yields performance improvements when using distinct-count
aggregates.
- Improved integration of User-Defined Functions.
- Implemented VisualTotals, LastPeriods, AddCalculatedMembers and
StripCalculatedMembers MDX functions.
- Support for comments in MDX (/* ... */, -- [rest of line],
// [rest of line]).
- Includes recent, compatible version of JPivot.
- Interbase 6 support.
- Many bug fixes and extensions to the test suite.
- Documentation improvements.
4.5.1 API changes in release 2.1
- FunCall and UnresolvedFunCall. It used to be possible to create a
FunCall with the name of a function but no function definition.
This complicated the validation process, because we would discover at runtime
that a function call had no definition. Now you should use the new class
UnresolvedFunCall.
- Category methods. Renaming a few of the methods concerning types
and categories.
  - Exp.getType() used to return int, now returns Type.
  - Old usages of Exp.getType() should use Exp.getCategory().
  - int[] FunDef.getParameterTypes() is renamed to
    int[] FunCall.getParameterCategories().
  - int FunCall.getReturnType() is renamed to
    int FunCall.getReturnCategory().
  - Removed the Exp.getTypeX() method; old usages of this
    method should now use Exp.getType().
- OLAP element types. OLAP elements Cube, Dimension, Hierarchy,
Level and Member no longer implement the Exp interface. If you want to use these
in expressions, there are wrapper classes: DimensionExpr,
HierarchyExpr, LevelExpr, MemberExpr. These
are in a new package, mondrian.mdx. Some other parse tree classes
(Query, Literal) will move to this package at some
time in the future.
4.6 Release 2.0 (2005/12/19)
- Aggregate tables.
- Calculated sets defined in the schema, and WITH SET syntax to
define sets within an MDX query.
- Cached set expressions. The WITH SET feature and functions such as
RANK cause the same expression to be evaluated many times within the
course of a single MDX statement. The set-expression cache
improves the performance of such queries.
- User-defined functions.
- Enhanced support for parent-child hierarchies.
- Enhanced XML for Analysis (XML/A) support.
- Enhanced support for internationalized/localized (I18N/L10N) applications.
- Pushdown SQL. To improve performance, Mondrian automatically translates
filters and aggregations into SQL which can be executed on the underlying
RDBMS.
- Support for the Apache Derby pure Java embedded RDBMS.
4.7 Release 1.1 (2005/04/06)
- Numerous improvements in functionality, performance, and stability.
4.8 Release 1.0 (2003/08/18)
- First production release.
- Distinct-count aggregations.
- JDBC connection-pooling.
- Support for XML for Analysis (XML/A).
4.9 Release 0.6 (2003/05/24)
- Parent-child hierarchies.
- Partial support for XML for Analysis (XML/A).
4.10 Release 0.5 (2003/02/20)
- Partial support for JOLAP.
- Implement Hierarchize, ":", Aggregate, and statistical functions.
4.11 Release 0.4 (2002/11/10)
- Integration with JPivot.
- Improved thread-safety.
4.12 Release 0.3 (2002/08/09)
- First public release.
- JSP page generates a static table from an MDX query.
- Support for several SQL dialects.
Author: Julian Hyde; last modified February 2008.
Version: $Id$
Copyright (C) 2002-2005 Julian Hyde
Copyright (C) 2005-2009 Pentaho and others