Enabling the “Data-Driven Organization” - Data Architecture, Data Acquisition, and Technical Architecture Overview
May 12, 2017
In my previous blog post, entitled “Enabling the Data-Driven Organization: The Vital Importance of Governance and MDM,” I discussed the organizational structures, business processes, and tools associated with Governance, Data Governance, and Master Data Management (MDM). Executive sponsorship, the collaboration between business, clinical, and IT stakeholders, and a strong foundation for integrating and maintaining enterprise-wide reference data, or Master Data, are vitally important to a successful Enterprise Information Management (EIM) initiative. However, it is also essential to have sound “plumbing,” also known as the EIM Data Architecture, in place to integrate data from disparate source systems, conform it to standards, and make it easily consumable by information stakeholders.
CTG recommends the three-tiered EIM Data Architecture, depicted in Figure 1 below, with each layer designed and modeled to meet specific objectives:
Landing Area: Data are extracted from various source systems and loaded into relational database tables. The data structures of the Landing Area essentially mirror those of the source systems, and little, if any, transformation is applied. A best practice is to acquire as much data as possible from the source systems rather than only what is currently needed (a “no data left behind” approach, as a brilliant data architect associate of mine once said).
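A minimal sketch of a Landing Area load, using Python's built-in sqlite3 as a stand-in for the source and landing databases (the table and column names here are hypothetical, purely for illustration):

```python
import sqlite3

# Hypothetical source system and Landing Area, combined in one
# in-memory SQLite database for illustration only.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE src_patient (patient_id INTEGER, name TEXT, dob TEXT, pcp TEXT);
    INSERT INTO src_patient VALUES
        (1, 'Ann Lee', '1980-02-11', 'Dr. Ray'),
        (2, 'Bo Chan', '1975-07-30', 'Dr. Kim');
""")

# The landing table mirrors the source structure. SELECT * embodies
# "no data left behind": every column is acquired, not just those
# needed today; only a load timestamp is added.
con.execute("""
    CREATE TABLE landing_patient AS
    SELECT *, DATETIME('now') AS load_ts FROM src_patient
""")
rows = con.execute("SELECT COUNT(*) FROM landing_patient").fetchone()[0]
print(rows)  # 2
```

In practice the extract and load would run against separate physical databases via an ETL tool, but the principle is the same: structures mirror the source, and transformation is deferred to later layers.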
Conformance Layer: I believe this layer is without a doubt the cornerstone of the EIM Data Architecture. For many years there has been vigorous debate about how best to design an Enterprise Data Warehouse (EDW), and even about whether one is necessary at all. CTG believes the EDW, or Conformance Layer, is a vital component of the EIM “ecosystem.” We further believe that the best data architecture for this layer utilizes Dan Linstedt’s Data Vault modeling technique.
A future blog post will focus entirely on Data Vault modeling concepts, but it is important to highlight two of its key benefits here:
Flexibility – New data sources can be added without affecting the existing data model, and the model can more readily adapt to changing business requirements.
Traceability – A complete audit trail back to the source system is available. This is invaluable for compliance reporting and data validation.
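To make these two benefits concrete, here is a minimal sketch of Data Vault-style structures (a hub and two satellites), again using sqlite3 for illustration; the table and column names are hypothetical, not taken from any specific implementation:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Hub: one row per business key, with audit columns.
    -- record_source provides the audit trail back to the source system.
    CREATE TABLE hub_patient (
        patient_hkey  TEXT PRIMARY KEY,
        patient_bk    TEXT NOT NULL,      -- business key from the source
        load_dts      TEXT NOT NULL,
        record_source TEXT NOT NULL
    );
    -- Satellite: descriptive attributes, historized by load date.
    CREATE TABLE sat_patient_demog (
        patient_hkey  TEXT NOT NULL REFERENCES hub_patient(patient_hkey),
        load_dts      TEXT NOT NULL,
        name          TEXT,
        dob           TEXT,
        record_source TEXT NOT NULL,
        PRIMARY KEY (patient_hkey, load_dts)
    );
""")

# Flexibility: a new data source simply adds a new satellite;
# the hub and the existing satellite are untouched.
con.execute("""
    CREATE TABLE sat_patient_coverage (
        patient_hkey  TEXT NOT NULL REFERENCES hub_patient(patient_hkey),
        load_dts      TEXT NOT NULL,
        plan_code     TEXT,
        record_source TEXT NOT NULL,
        PRIMARY KEY (patient_hkey, load_dts)
    )
""")
con.execute("INSERT INTO hub_patient VALUES ('hk1', 'MRN-001', '2017-05-12', 'EHR_A')")
n_sats = con.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type='table' AND name LIKE 'sat_%'"
).fetchone()[0]
print(n_sats)  # 2
```

Every row in every table carries its load date and record source, which is what makes the complete audit trail back to the source system possible.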
Analytic Layer: This is the primary “end-user-facing” layer of the EIM Data Architecture. As the data in the Conformance Layer is essentially still in its “raw” form, it must be significantly transformed to make it easily consumable for use in executive dashboards or drill-down analytics. The best practice data architecture for the Analytic Layer is known as a dimensional data model, or “star schema,” in which Key Performance Indicators (KPIs) and other business measures are pre-computed based on the organization’s approved business rules and stored in what is known as a fact table.
These KPIs and measures can then be easily “sliced and diced” by dimensions such as Primary Care Provider (PCP), Location, or Diagnosis. It is important to note that the Data Governance Process manages the business rules for computing KPIs and measures stored in the Analytic Layer. It is also responsible for ensuring the quality of data in the dimensions.
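A small sketch of what “slice and dice” looks like against a star schema, with a hypothetical fact table and provider dimension in sqlite3:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Dimension: descriptive attributes used to slice the measures.
    CREATE TABLE dim_provider (provider_key INTEGER PRIMARY KEY, pcp_name TEXT);
    -- Fact: pre-computed measures, keyed to the dimensions.
    CREATE TABLE fact_visit (provider_key INTEGER, visit_count INTEGER, charge_amt REAL);
    INSERT INTO dim_provider VALUES (1, 'Dr. Ray'), (2, 'Dr. Kim');
    INSERT INTO fact_visit VALUES (1, 3, 450.0), (1, 2, 300.0), (2, 4, 800.0);
""")

# "Slice and dice": aggregate fact-table measures by a dimension attribute.
rows = con.execute("""
    SELECT d.pcp_name, SUM(f.visit_count) AS visits, SUM(f.charge_amt) AS charges
    FROM fact_visit f
    JOIN dim_provider d USING (provider_key)
    GROUP BY d.pcp_name
    ORDER BY d.pcp_name
""").fetchall()
print(rows)  # [('Dr. Kim', 4, 800.0), ('Dr. Ray', 5, 750.0)]
```

The same fact table can be grouped by Location or Diagnosis dimensions in exactly the same way, which is what makes the star schema so well suited to dashboards and drill-down analytics.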
Data Acquisition refers to the tools and techniques for populating the components of the EIM Data Architecture. This is also often referred to as the Extraction/Transformation/Load (ETL) process, which “pulls” data from source systems into the Landing Area or from the Conformance Layer into the Analytic Layer. Data can also be “pushed” from source systems using XML, HL7, or X12. However, this post will focus only on ETL, as it is the most commonly used technique in data warehousing.
Hub-based ETL tools such as Informatica PowerCenter manage the entire process of extracting, transforming, and loading data within a dedicated server environment. CTG believes this is a best practice platform that lowers maintenance costs and fosters the ability to enforce design standards. Tools such as SQL Server Integration Services (SSIS), by contrast, tend to rely heavily upon stored database procedures. Put simply, more code leads to a greater chance of inconsistencies, more errors, and higher remediation costs.
The demand for data visualization, predictive analytics, drill-down and exploratory analytics, and true data mining capabilities is increasing at a breathtaking pace. Access to unstructured data from device monitors, weblogs, and clickstreams is no longer considered a future trend, but rather an immediate priority. Query response times must meet the expectations of information stakeholders, and the data warehouse is now considered a mission-critical application rather than an operational reporting platform.
These emerging needs require an optimal technical infrastructure that provides security, identity management, fault tolerance, business continuity, performance, and scalability. Strong consideration should be given to implementing a data warehouse appliance such as IBM Netezza or Oracle Exadata, which provide the server hardware and database software built specifically to be a data warehouse platform.
In the final post in this four-part blog series, I will discuss Business Intelligence and Metadata Management.
Client Solution Architect
John Walton is a CTG Client Solution Architect and consulting professional with more than 35 years of IT experience spanning multiple disciplines and industries. He has more than 20 years of experience leading data warehousing, business intelligence, and data governance engagements. He has extensive experience working with a broad range of healthcare and life sciences organizations including IDNs, national healthcare payers, regional HMOs, a global pharmaceutical company, academic medical centers, and community and pediatric hospitals.