Financial institutions need integrated and optimized processes, data, and technology to keep up with the complex regulatory environment and increasingly competitive landscape. A vital component of that ecosystem is an investment data warehouse.
An investment data warehouse (IDW) provides financial institutions with a central repository of integrated, cleansed, and standardized operational and external data for business intelligence, decision support, and legal or regulatory compliance.
This provides easier data access and flexible analytics and drives investment reporting quality, transparency, and adherence to statutory and regulatory requirements.
I pulled together this list of best practices based on my experience working with dozens of clients to enhance their data management and analytics capabilities. This list will help you create an IDW that delivers accurate, consistent, and reliable data and enables better decision-making.
1. An Investment data warehouse should be source system agnostic
Design decisions should be based on data modeling best practices and not influenced by upstream portfolio management, accounting, or other source systems. This includes everything from (1) standard and intuitive naming conventions for database objects to (2) defined primary and foreign keys that preserve relational integrity to (3) avoidance of redundant column names across tables which creates ambiguity and confusion for downstream analysis and consumption. Following these best practices ultimately provides independence from any single vendor and offers flexibility to add or remove data sources as the business or vendor stack evolves. Further, this independent approach allows firms to “play defense” with their vendors by forcing a thoughtful analysis around the expectation of incoming data sets and giving firms a basis to evaluate and identify bad data before it reaches the data warehouse.
2. A data warehouse should store a single version of the truth
This is often referred to as “gold copy” or “gold source” data and is crucial for foundational data domains such as legal entities, issuers, deals, clients, accounts, portfolios, and securities.
With the many software solutions and systems that are required to run an investment operation, organizations end up with multiple sources of the same data elements, which often include duplicated and inconsistent data. This is a common challenge that results in IT and business organizations investing countless hours to consolidate, standardize, and reconcile this data to get a complete, accurate view of these various data domains. When firms lack a single version of enterprise data, downstream consumers lack confidence in data quality and firms risk reporting inconsistent metrics to clients, investors, and auditors which leads to financial, legal, and reputational risk.
Note: If multiple versions of data are required for a specific business case, each version should be properly labeled with sufficient metadata so the source or methodology is evident to end-users. For example, there are multiple methodologies for calculating performance returns, including simple return, time-weighted return, and dollar-weighted return, plus a differentiation between a gross return and a net return for each methodology. There is no “right” answer for which of these methodologies to use because it depends on the context in which it is being reported. As such, all are viable options, but it should be evident to the user which method is used so they can select the appropriate data for their business case.
3. A data warehouse should have separate data ingestion and reporting layers within the overall data architecture
Both layers provide flexibility by creating a level of separation between raw data, warehouse data, and reporting data.
- The ingestion layer provides flexibility to add or remove upstream data sources and serves as the transformation layer to perform cleansing functions that master, standardize, and prepare gold copy data.
- The reporting mart creates a buffer between the IDW and presentation layers, such as Business Intelligence (BI) tools and/or a User Interface (UI). This also provides flexibility to make architectural changes to the core DW without disrupting downstream reports and eliminates the need for report developers to understand the underlying data warehouse table structures.
When firms combine any of the three database layers above, the data architecture becomes rigid and more difficult to adapt because changes to one layer can require changes to all layers due to the embedded dependencies.
- Stage Data: Raw data is stored and prepared for gold copy by applying transformation rules, source hierarchy, and resolution to master identifiers across target IDW tables.
- Data Warehouse: normalized database storing gold copy enterprise data to ensure a single version of the truth and relational integrity across data domains.
- Reporting Data: De-normalized reporting views and calculations with centralized logic that is shared by downstream report developers; ensuring consistent methodologies are used.
4. A cross-functional team should govern the data warehouse
5. Data should be monitored for quality
Based on defined data governance policies, all critical data elements should have assigned data owners or stewards that are responsible for monitoring data to ensure timeliness, quality, and compliance with enterprise policies. Many firms adopt separate Data Quality Management solutions that allow them to configure business rules to detect data issues based on reconciliations to alternate sources, period over period change analysis, outlier checks versus a threshold or benchmark, verification to ensure values match a valid list, or simple missing value validations. All business rules for quality controls and business owners should be documented to ensure accountability and enable data quality metrics that facilitate prioritization of initiatives aimed at data quality improvement.
6. Source system identifiers should not be used as a primary key to join tables in a relational data warehouse
Source system identifiers can and should be stored as referential identifying information, but more importantly, they should be used to cross-reference entities from disparate source systems. This allows for mastering of records across the foundational data domains mentioned previously. Using source system identifiers as a primary key will handcuff you to a vendor. This is a common mistake that firms regret when they need to replace a system, or integrate a new data source because they lack a mechanism to join data sets from disparate sources that lack a common identifier.
7. Data source hierarchies should be created for each data field
Given the growing number of data sources and systems investment managers leverage to manage risk and make investment decisions, there is significant overlap in the available data elements. Given potential conflicts between data providers, it is critical to implement a source hierarchy for each field in the database. This allows firms to dictate which source is preferred and enables alternate sources to be used when the preferred source is unavailable. While it can be daunting to understand which source is the most accurate, a robust source hierarchy enables reconciliation processes to proactively resolve data errors.
8. Business logic should be calculated upstream from the presentation layer
As a rule of thumb, complex calculation logic should not exist in a presentation layer such as a business intelligence (BI) tool or user interface (UI). Instead, calculations should be centralized within the enterprise data warehouse or reporting mart. This will ensure consistency across report developers and data analysts and prevents multiple versions of the truth for calculated fields. This strategy also significantly reduces the time and effort to develop reports because the business logic exists upstream, and the presentation layer slices and dices the same data set(s) in different ways to identify trends, key performance indicators, and ultimately tell a story.
9. Client or business function specific customizations should not exist in an enterprise data warehouse
Many firms struggle to gain consensus on enterprise standards across external clients or internal business functions due to conflicting use cases and priorities. In these situations, a recommended practice is to leverage the enterprise data warehouse for data elements that have a shared enterprise policy and a client-specific data mart for client/BU customizations. This approach allows each client or BU to achieve their unique business goals based on their priorities. Additionally, this approach can facilitate permission of confidential information that is only available on a need-to-know basis.
As an example, many Investment Firms leverage an enterprise data warehouse as the central hub for master data (i.e., legal entities, portfolios, issuers, and securities) with separate data marts for the front office, operations, risk, compliance, legal, and other departments. When following this approach, firms should periodically evaluate overlap or inconsistencies and continually strive to move toward an enterprise policy when possible.
10. An investment data warehouse should support a strategy for archiving, versioning, and storing both time series fact data and static dimension data
Given the number of investment-related data elements that change over time, such as accounting balances, prices, ratings, coupons, financial ratios, risk metrics (and more), it is crucial to store many data elements in separate tables that capture both an “effective date” and an “update date.” This allows for point-in-time reporting, which is commonly a month-end process for investment managers.
Conversely, from an accounting, regulatory, and audit perspective, it is also essential to have an “as was” view that prevents backdated trades or other retroactive activity from modifying data in a closed period. To implement a successful strategy that is transparent to end-users, significant analysis and governance are needed to segment static, slowly changing, and time series attributes.
To Sum it Up
An investment data warehouse allows financial institutions to manage their internal and external data with change management, data quality, stewardship, and governance. However, there are potential stumbling blocks along the way. Use this list to help you avoid some of them. And, if you need assistance, Grandview Analytics can help you modernize your data and technology environment with Rivvit-IDW, its state-of-the-art Investment Data Warehouse built for financial institutions.
CEO and co-founder, Grandview Analytics
ABOUT GRANDVIEW ANALYTICS
Grandview Analytics is a technology consulting and data management software company serving financial institutions. We offer data strategy, technology implementation, systems integration, and analytics consulting services as well as an outsourced data management and reporting service powered by our proprietary, cloud-based platform, Rivvit.
Our services drive improved business processes, integrated technologies, accurate and timely data, and enhanced decision-making capabilities. Our seasoned team of financial industry professionals brings deep business and technical domain expertise across asset classes and trade lifecycle. With hands-on financial industry experience, we execute on complex initiatives that help clients optimize ROI on data and technology investments.