Mondrian Components

Introduction

See OLAP and architecture.

Components

to be written...

Caching 

The various subsystems of mondrian have different memory requirements. Some of them require a fixed amount of memory to do their work, whereas others can exploit extra memory to increase their performance. This is an overview of how the various subsystems use memory.

Caching is a scheme whereby a component uses extra memory when it is available in order to boost its performance and, when times are hard, releases that memory, losing performance but not correctness. A cache therefore uses varying amounts of memory: more when memory is plentiful, less when it is scarce.

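Garbage collection is carried out by the Java VM to reclaim objects which are unreachable from 'live' objects. A special construct called a soft reference (java.lang.ref.SoftReference) allows an object to be garbage-collected in hard times, when memory is short, even though it is still reachable through that reference.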

The garbage collector is not very discriminating in what it chooses to throw out, so mondrian has its own caching strategy. There are several caches in the system (described below), but all of the objects in these caches are registered in the single instance of class mondrian.rolap.CachePool. The cache pool does not actually store the objects, but handles all of the events related to their life cycle in a cache. It weighs each object's cost (some function of its size in bytes and its usefulness, which is based on how recently it was used) against its benefit (the effort it would take to re-compute it).
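The exact weighting function is not spelled out here, so the following Java sketch assumes one: cost is an entry's size in bytes times (1 + the number of ticks since it was last used), and benefit is its re-computation effort. Entries with the lowest benefit-to-cost ratio are the first candidates for eviction. The Entry record, the tick clock, and the formula are hypothetical illustrations, not CachePool's actual code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of cost/benefit eviction ordering; not Mondrian's actual formula. */
public class EvictionOrder {
    /** A cached entry as the pool might see it (all names are illustrative). */
    record Entry(String name, long sizeBytes, long lastUsedTick, double recomputeCost) {}

    /** Assumed cost: grows with size and with time since last use (older = less useful). */
    static double cost(Entry e, long nowTick) {
        long age = Math.max(0, nowTick - e.lastUsedTick());
        return e.sizeBytes() * (1.0 + age);
    }

    /** Assumed benefit: the effort needed to re-compute the entry. */
    static double benefit(Entry e) {
        return e.recomputeCost();
    }

    /** Orders entries from best to worst eviction candidate (lowest benefit/cost first). */
    static List<Entry> evictionOrder(List<Entry> entries, long nowTick) {
        List<Entry> sorted = new ArrayList<>(entries);
        sorted.sort(Comparator.comparingDouble(e -> benefit(e) / cost(e, nowTick)));
        return sorted;
    }

    public static void main(String[] args) {
        List<Entry> entries = List.of(
            new Entry("bigStale", 1_000_000, 0, 50.0),
            new Entry("smallFresh", 1_000, 99, 50.0),
            new Entry("expensive", 1_000_000, 0, 1e9));
        for (Entry e : evictionOrder(entries, 100)) {
            System.out.println(e.name());
        }
    }
}
```

Running main prints bigStale, smallFresh, and expensive, in that order: the large, stale, cheap-to-recompute entry goes first, while the expensive entry survives despite being just as large and just as stale.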

The cache pool is not infallible; in particular, it cannot adapt to conditions where memory is in short supply. Cached objects are therefore held via soft references, so that the garbage collector can overrule the pool's wisdom.

Cached objects must obey the following contract:

  1. They must implement interface mondrian.rolap.CachePool.Cacheable, which includes methods to measure the object's cost and benefit, to record each time it is used, and to tell it to remove itself from its cache.
  2. They must call CachePool.register(Cacheable) in their constructor or, in any case, before they are made visible in their cache.
  3. They must call CachePool.unregister(Cacheable) when they are removed from their cache, and in their finalize() method.
  4. They must be dispensable: if they disappear, their subsystem will continue to work correctly, albeit more slowly. A subsystem can declare an object to be temporarily indispensable by calling CachePool.pin(Cacheable, Collection) and then unpinning it a short time later.
  5. Their cache must reference them via soft references, so that they are available for garbage collection.
  6. Thread safety: their cache must be thread-safe.
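To make the six rules concrete, here is a minimal Java sketch of a cached object, its cache, and a stand-in pool. Everything in it is hypothetical: the interface method names, the Pool class, and the pin/unpinAll signatures only echo the names in the text and are not Mondrian's real CachePool API. The stand-in pool tracks objects in a WeakHashMap so that it does not itself keep them alive, which would defeat the soft references of rule 5. The finalize() half of rule 3 is left out because finalize() is deprecated in current Java; a real version would need another mechanism (for example a java.lang.ref.ReferenceQueue) to unregister objects the collector has reclaimed.

```java
import java.lang.ref.SoftReference;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;

/** Hypothetical sketch of the cached-object contract; not Mondrian's real API. */
public class CacheContractSketch {
    /** Rule 1: measure cost and benefit, record use, remove from cache. */
    interface Cacheable {
        long getCost();
        double getBenefit();
        void markAccessed(long tick);
        void removeFromCache();
    }

    /** Stand-in pool: weak keys, so registration alone never keeps an object alive. */
    static final class Pool {
        static final Pool INSTANCE = new Pool();
        private final Map<Cacheable, Integer> pinCounts = new WeakHashMap<>();

        synchronized void register(Cacheable c) { pinCounts.putIfAbsent(c, 0); }
        synchronized void unregister(Cacheable c) { pinCounts.remove(c); }
        synchronized boolean isRegistered(Cacheable c) { return pinCounts.containsKey(c); }
        synchronized int pinCount(Cacheable c) { return pinCounts.getOrDefault(c, 0); }

        /** Rule 4: mark an object temporarily indispensable, remembering it in 'pinned'. */
        synchronized void pin(Cacheable c, Collection<Cacheable> pinned) {
            pinCounts.computeIfPresent(c, (k, n) -> n + 1);
            pinned.add(c);
        }

        /** Releases every pin recorded in 'pinned'. */
        synchronized void unpinAll(Collection<Cacheable> pinned) {
            for (Cacheable c : pinned) {
                pinCounts.computeIfPresent(c, (k, n) -> Math.max(0, n - 1));
            }
            pinned.clear();
        }
    }

    /** Rules 5 and 6: a synchronized cache whose values are only softly reachable. */
    static final class SoftCache<K> {
        private final Map<K, SoftReference<CachedValue<K>>> map = new HashMap<>();

        /** Returns null if absent or reclaimed by the GC; the caller then recomputes. */
        synchronized CachedValue<K> get(K key) {
            SoftReference<CachedValue<K>> ref = map.get(key);
            return ref == null ? null : ref.get();
        }

        synchronized void put(K key, CachedValue<K> value) {
            map.put(key, new SoftReference<>(value));
        }

        /** Rule 3: unregister from the pool when removed from the cache. */
        synchronized void remove(K key) {
            SoftReference<CachedValue<K>> ref = map.remove(key);
            CachedValue<K> value = ref == null ? null : ref.get();
            if (value != null) {
                Pool.INSTANCE.unregister(value);
            }
        }
    }

    /** Rule 2: registers itself in its constructor, before any cache exposes it. */
    static final class CachedValue<K> implements Cacheable {
        private final K key;
        private final long sizeBytes;
        private final SoftCache<K> owner;
        private long lastUsed;

        CachedValue(K key, long sizeBytes, SoftCache<K> owner) {
            this.key = key;
            this.sizeBytes = sizeBytes;
            this.owner = owner;
            Pool.INSTANCE.register(this);
        }

        public long getCost() { return sizeBytes; }
        public double getBenefit() { return 1.0; } // placeholder re-computation effort
        public void markAccessed(long tick) { lastUsed = tick; }
        public void removeFromCache() { owner.remove(key); }
    }
}
```

Rule 4 shows up on the caller's side: a null from SoftCache.get() is treated as an ordinary miss and the value is recomputed, so losing an entry costs time but never correctness.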

If a cached object takes a significant time to initialize, it may not be possible to construct it, register it, and initialize it within the same synchronized section without unacceptably reducing concurrency. If this is the case, you should use phased construction. First construct and register the object, but mark it 'under construction'. Then release the locks on the CachePool and on the object's cache, and continue initializing the object. Other threads will be able to see the object before it is ready, so they must be able to wait until its construction is complete. The method Segment.waitUntilLoaded() is an example of this.
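A minimal Java sketch of phased construction, assuming a CountDownLatch as the 'under construction' flag. The object is made visible in a cache map while holding the map's lock, then filled in by a separate loader thread outside the lock; readers that find it early block in waitUntilLoaded() until loading finishes. PhasedSegment, finishLoading(), and the map are hypothetical; only the idea of waitUntilLoaded() comes from Segment, and this is not its actual implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;

/** Hypothetical sketch of phased construction; a stand-in, not Mondrian's Segment. */
public class PhasedSegment {
    private final CountDownLatch loaded = new CountDownLatch(1);
    private volatile int[] values; // set in phase 2, after the object is already visible

    /** Phase 2: called by the loading thread, outside any cache lock. */
    public void finishLoading(int[] data) {
        values = data;
        loaded.countDown(); // wakes every thread blocked in waitUntilLoaded()
    }

    /** Blocks until phase 2 completes; keeps waiting if interrupted, then restores the flag. */
    public void waitUntilLoaded() {
        boolean interrupted = false;
        while (true) {
            try {
                loaded.await();
                break;
            } catch (InterruptedException e) {
                interrupted = true;
            }
        }
        if (interrupted) {
            Thread.currentThread().interrupt();
        }
    }

    public boolean isLoaded() {
        return loaded.getCount() == 0;
    }

    public int get(int i) {
        waitUntilLoaded();
        return values[i];
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, PhasedSegment> cache = new HashMap<>();
        PhasedSegment seg = new PhasedSegment();
        synchronized (cache) {
            cache.put("unit-sales-F", seg); // phase 1: registered and visible, still loading
        }
        Thread loader = new Thread(() -> seg.finishLoading(new int[] {10, 20, 30}));
        loader.start();
        PhasedSegment found;
        synchronized (cache) {
            found = cache.get("unit-sales-F");
        }
        System.out.println(found.get(1)); // waits for the loader if necessary, then prints 20
        loader.join();
    }
}
```

Because actions before countDown() happen-before a successful return from await(), a reader that returns from waitUntilLoaded() sees the fully loaded values.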

The following objects are cached.

1. Segment 

A Segment (class mondrian.rolap.agg.Segment) is a collection of cell values parameterized by a measure and a set of (column, value) pairs. An example of a segment is:

(Unit sales, Gender = 'F', State in {'CA','OR'}, Marital Status = anything)

All segments over the same set of columns belong to an Aggregation, in this case:

('Sales' Star, Gender, State, Marital Status)

Note that different measures (in the same Star) occupy the same Aggregation. Aggregations belong to the AggregationManager, a singleton.
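The segment and aggregation structure can be modeled roughly as follows. This is a hypothetical Java sketch, not the real mondrian.rolap.agg classes: a segment is a measure plus, for each column, a set of allowed values (null meaning 'anything'), and an aggregation groups all segments of one star over the same columns regardless of measure. The example() method builds the segment from the text.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of segments and aggregations; not Mondrian's actual classes. */
public class SegmentModel {
    /** One measure plus, per column, the allowed values; a null set means "anything". */
    record Segment(String measure, Map<String, Set<String>> constraints) {
        /** True if the cell (measure + one value per column) lies inside this segment. */
        boolean covers(String cellMeasure, Map<String, String> coords) {
            if (!measure.equals(cellMeasure)) {
                return false;
            }
            for (Map.Entry<String, Set<String>> c : constraints.entrySet()) {
                Set<String> allowed = c.getValue();
                if (allowed != null && !allowed.contains(coords.get(c.getKey()))) {
                    return false;
                }
            }
            return true;
        }
    }

    /** All segments of one star over the same set of columns, whatever their measure. */
    static final class Aggregation {
        final String star;
        final Set<String> columns;
        final List<Segment> segments = new ArrayList<>();

        Aggregation(String star, Set<String> columns) {
            this.star = star;
            this.columns = new TreeSet<>(columns);
        }

        void add(Segment s) {
            if (!s.constraints().keySet().equals(columns)) {
                throw new IllegalArgumentException("segment columns differ from aggregation columns");
            }
            segments.add(s);
        }

        /** Returns a segment holding the requested cell, or null if none is cached. */
        Segment find(String measure, Map<String, String> coords) {
            for (Segment s : segments) {
                if (s.covers(measure, coords)) {
                    return s;
                }
            }
            return null;
        }
    }

    /** Builds the example segment from the text. */
    static Segment example() {
        Map<String, Set<String>> c = new HashMap<>();
        c.put("Gender", Set.of("F"));
        c.put("State", Set.of("CA", "OR"));
        c.put("Marital Status", null); // anything
        return new Segment("Unit Sales", c);
    }
}
```

Looking up (Unit Sales, Gender = 'F', State = 'CA', Marital Status = 'M') in an aggregation that holds the example segment finds it; State = 'WA', or a different measure over the same columns, does not.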

Segments are pinned during the evaluation of a single MDX query. The query evaluates its expressions twice. In the first pass, it finds which cell values it needs, pins the segments containing those which are already present (one pin count for each cell value used), and builds a cell request (class mondrian.rolap.agg.CellRequest) for those which are not present. It executes the cell request to bring the required cell values into the cache, again pinned. Then it evaluates the query a second time, knowing that all cell values are available. Finally, it releases the pins.
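The two-pass protocol can be sketched in Java as follows, assuming a simplified model in which each cached cell stands in for a segment and a Function plays the role of the database. TwoPassEval and its methods are hypothetical, not Mondrian's evaluator; what the sketch shows is the order of operations: pin what is present, batch-load what is missing (arriving pinned), evaluate with everything in memory, and release every pin in a finally block.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of two-pass evaluation with pinning; not Mondrian's evaluator. */
public class TwoPassEval {
    final Map<String, Double> cache = new HashMap<>();      // cell key -> value (stands in for segments)
    final Map<String, Integer> pinCounts = new HashMap<>(); // pinned cells must not be evicted
    int loads = 0;                                          // number of batch loads from the "database"

    /** Pass 1: pin each needed cell that is cached; collect the rest into a cell request. */
    List<String> firstPass(List<String> needed, List<String> pinned) {
        List<String> request = new ArrayList<>();
        for (String cell : needed) {
            if (cache.containsKey(cell)) {
                pin(cell, pinned); // one pin per cell value used
            } else if (!request.contains(cell)) {
                request.add(cell);
            }
        }
        return request;
    }

    /** Executes the cell request as one batch; loaded cells arrive already pinned. */
    void load(List<String> request, Function<String, Double> db, List<String> pinned) {
        if (request.isEmpty()) {
            return;
        }
        loads++;
        for (String cell : request) {
            cache.put(cell, db.apply(cell));
            pin(cell, pinned);
        }
    }

    void pin(String cell, List<String> pinned) {
        pinCounts.merge(cell, 1, Integer::sum);
        pinned.add(cell);
    }

    void unpinAll(List<String> pinned) {
        for (String cell : pinned) {
            pinCounts.computeIfPresent(cell, (k, n) -> n > 1 ? n - 1 : null); // null removes the entry
        }
        pinned.clear();
    }

    /** A "query" that sums the needed cells, evaluated with the two-pass protocol. */
    double evaluate(List<String> needed, Function<String, Double> db) {
        List<String> pinned = new ArrayList<>();
        try {
            load(firstPass(needed, pinned), db, pinned);
            double sum = 0;
            for (String cell : needed) {
                sum += cache.get(cell); // pass 2: every cell is now present
            }
            return sum;
        } finally {
            unpinAll(pinned); // release the pins even if evaluation fails
        }
    }
}
```

Evaluating the same cells twice triggers only one batch load, and no pins remain after either evaluation.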

2. Member set 

A member set (class mondrian.rolap.SmartMemberReader.ChildrenList) is a set of children of a particular member. It belongs to a member reader (class mondrian.rolap.SmartMemberReader).

3. Schema 

Schemas (class mondrian.rolap.RolapSchema) are cached in class mondrian.rolap.RolapSchema.Pool, which is a singleton (todo: use soft references). The cache key is the URL which the schema was loaded from.
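A minimal Java sketch of a singleton pool keyed by schema URL. SchemaPoolSketch, its Schema record, and the load counter are hypothetical stand-ins for mondrian.rolap.RolapSchema.Pool. Like the pool described above, it holds schemas through strong references; following the todo would mean storing SoftReference&lt;Schema&gt; values and treating a cleared reference as a miss.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a singleton schema pool keyed by URL; not RolapSchema.Pool itself. */
public final class SchemaPoolSketch {
    /** Stand-in for a loaded schema. */
    record Schema(String url) {}

    static final SchemaPoolSketch INSTANCE = new SchemaPoolSketch();

    private final Map<String, Schema> byUrl = new HashMap<>(); // strong references, as today
    private int loads = 0;

    private SchemaPoolSketch() {}

    /** Returns the cached schema for a URL, loading it only on the first request. */
    synchronized Schema get(String url) {
        return byUrl.computeIfAbsent(url, this::load);
    }

    /** Drops a schema so that the next get() reloads it. */
    synchronized void remove(String url) {
        byUrl.remove(url);
    }

    synchronized int loadCount() {
        return loads;
    }

    private Schema load(String url) {
        loads++; // a real pool would read and parse the schema file here
        return new Schema(url);
    }
}
```

Two calls with the same URL return the same Schema instance; after remove(), the next call loads a fresh one.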

4. Star schemas 

Star schemas (class mondrian.rolap.RolapStar) are stored in the static member RolapStar.stars (todo: use soft references), and accessed via RolapStar.getOrCreateStar(RolapSchema, MondrianDef.Relation).




Author: Julian Hyde; last modified August 2006.
Version: $Id$ (log)
Copyright (C) 2002-2005 Julian Hyde
Copyright (C) 2005-2006 Pentaho