Last updated at Wed, 26 Jul 2017 23:16:29 GMT

The Data Warehouse Export recently added support for a Dimensional Model for its export schema. This provides a much more comprehensive, accessible, and scalable model of data than the previous (now referred to as "Legacy") model. The foundation for this dimensional model is the same as the Reporting Data Model, which backs the built-in reporting for SQL Query Export. So what exactly is the difference between the Reporting Data Model and the Dimensional Data Warehouse?

What Are the Differences?

The schema for the Data Warehouse and Reporting Data Model are very similar in their structure. Both use a dimensional model, and numerous facts and dimensions are identical. Many queries written against the Reporting Data Model can natively run against the Data Warehouse with little to no changes. The key differences are highlighted below.

Date-based periodic facts vs Scan-based transactional facts

Perhaps the most important change in the schema is the elimination of scan-based transactional facts. The Reporting Data Model supports transaction-based facts like fact_asset_scan_vulnerability. These allow you to run queries for "scan-scope" reports. However, as the warehouse is designed to export asset data over time, the cumulative state of an asset is exported, not just what was found in one specific scan. Most queries in the Reporting Data Model that use scans are using the scan to determine the date to report on.

Alternatively, the Data Warehouse supports periodic snapshot fact tables like fact_asset_date. So, instead of narrowing down the state of an asset based on a scan for a particular date, the date itself is used immediately to access the state of the asset. For example, if a query is attempting to compare the current state of the asset against three months ago, the fact_asset_date table is used with the desired dates. Read the following section for more details.

To drill down into scan events that have taken place on an asset, the new fact table fact_asset_event is available. This table provides a record of every change to an asset that has taken place, including scans, vulnerability exceptions, etc. The typecolumn has multiple values, one of which can be SCAN. When this is the case, the scan_id column will be populated with a value you can join against the dim_scan dimension, to access the precise scan details.

Discrete Date Exporting

The state of asset data is recorded in the Data Warehouse using an ETL process that runs periodically based on the configured schedule settings. This means that the warehouse does not necessarily have data for assets on each day, particularly if the schedule is weekly or monthly. In those circumstances, if you ask for the data for an asset on a particular date, it may not have data for that date. However, to simplify looking up the last date an ETL job ran prior to or on the date you are looking for, there are two built-in functions that can be used, periodAfter(DATE) and periodBefore(DATE). These functions access the information within the periods table (a junk dimension).

For example, if the ETL is configured to export every 1 week, starting 12/7, then there will be asset data for 12/7, 12/14, 12/21, 12/28 and so on. If a report was to run on 12/15, then there would be no data if using the current date in the query. Instead, you can use periodBefore(CURRENT_DATE), and the date of 12/14 will be resolved instead of 12/15. Future examples of trend-based queries and reports will be described in much more details, so keep an eye on the Reporting Space feed.

One-to-One Flattening

A few tables have been further flattened to avoid joining one-to-one relationships. For example, the dim_asset table has an operating_system_id column in the Reporting Data Model that is used to join the dim_operating_system table. This is a one-to-one relationship, therefore it has been flatted in the Data Warehouse Export to just be fields within the dim_asset table itself (e.g. os_description, etc). This denormalization technique has been applied throughout the model.

Enterprise-level Scale

The Reporting Data Model is a dynamic, runtime schema that applies permissions, role-based access control, and filters from the report configuration specified. This makes the SQL Query Export an incredibly simple to use and flexible reporting option. However, the reports generate on the Security Console simultaneous to the other reports, and running scans.

In contrast, the Data Warehouse exports data into a standalone database instance tuned specifically for read-heavy activity. The dimensional model is fully materialized, optimized, and indexed for fast lookup, aggregation, joins, etc. The ETL job also precomputes several levels of rollup in advance to further optimize queries against high levels of grain (e.g. site, tag, etc).

Which Should I Use? And When?

To help you pick which one is right for you, read Vulnerability Assessment Reports in Nexpose: The Right Tool for the Right Job for more information.

What If I Get Stuck?

Open a discussion or ask the community for help.