Future of Data Architecture

Design your data management architecture for the way your data and analytics (D&A) teams organize, share and analyze data.



Design your data architecture for modern business needs

Modern organizations need a modular data architecture that supports complex enterprise environments while delivering data access to business users. Here are some key considerations.

Data architecture is evolving to deliver data self-service enabled by metadata

Data and analytics architecture best practices have passed through a number of eras over recent decades, as digital transformation initiatives have highlighted the need to modernize data strategy and capitalize on new opportunities to use data. These eras include:

  • Pre-2000 — Enterprise Data Warehouse Era: Data architecture centered on the success of the enterprise data warehouse (EDW).

  • 2000-2010 — Post-EDW Era: This period was defined by fragmented data analysis, with data marts dependent on the data warehouse. Each data mart consolidation created yet another data silo, so the version of the truth depended on whom you asked, and analytics grew fragmented and inconsistent.

  • 2010-2020 — Logical Data Warehouse (LDW) Era: This period saw more unified analysis of data via a common semantic layer that enabled access to data warehouses, data marts and data lakes. This remains the current best practice.

  • 2020-future — Active Metadata Era: The future will see augmented analysis of data using all the relevant data sources, accessed and enabled by advanced analytics, recommendation engines, data and AI orchestration, adaptive practices and metadata analysis. 

Democratizing data access and self-service analytics is motivating the current evolution from the LDW Era to the Active Metadata Era. Chief data and analytics officers (CDAOs) likewise hope to expand data use cases beyond those that LDWs can handle, including master data management, interenterprise data sharing, B2B data integration, partner data sharing, application data integration and others.

But what is metadata, and what role does it play in this evolution?

Metadata describes different facets of data, such as the data’s context. It is produced as a byproduct of data moving through enterprise systems. There are four types of metadata: technical, operational, business and social. Each type can be either “passive” metadata that organizations collect but do not actively analyze, or “active” metadata that identifies actions across two or more systems utilizing the same data.

Active metadata can enable automation, deliver insights and optimize user engagement, and is a key enabler of self-service analytics. Realizing its potential, however, requires a data architecture that balances requirements for repeatability, reusability, governance, authority, provenance and optimized delivery.
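
To make the passive/active distinction concrete, here is a minimal Python sketch with hypothetical event data and system names. Passive metadata is merely collected into a catalog; active metadata is analyzed across systems to trigger an action, such as recommending a shared access path:

    from collections import defaultdict

    # Hypothetical usage events harvested from enterprise systems.
    # Each is operational metadata: a byproduct of data moving
    # through a system, not the data itself.
    events = [
        {"dataset": "sales.orders", "system": "warehouse", "action": "query"},
        {"dataset": "sales.orders", "system": "bi_tool", "action": "dashboard_refresh"},
        {"dataset": "hr.payroll", "system": "warehouse", "action": "query"},
    ]

    # Passive metadata: collected and stored, but not analyzed.
    catalog = defaultdict(set)
    for event in events:
        catalog[event["dataset"]].add(event["system"])

    # Active metadata: analyzed across systems to drive an action. A
    # dataset touched by two or more systems becomes a candidate for
    # automated integration or optimized delivery.
    for dataset, systems in catalog.items():
        if len(systems) >= 2:
            print(f"{dataset}: used by {sorted(systems)} -> recommend shared access path")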

Data analytics leaders see two options for evolving their data architecture from the LDW Era, where most operate today, toward the Active Metadata Era: data fabric or data mesh. These separate concepts share the goal of providing easier access to data for everyone who uses it, including data scientists, data analysts and data engineers, as well as data consumers. Though many data leaders talk about data fabric and data mesh as competing data architecture approaches, they are more accurately viewed as complementary.

 

Data fabric leverages existing assets from the logical data warehouse era

Data fabric is an emerging data management and data integration design concept (also see “Modernize Data Management to Increase Value and Reduce Costs”). Its goal is to attain flexible, reusable and augmented data integration to support data access across the business. 

Data fabric is a natural evolution for many organizations from their logical data warehouse models because it leverages existing technology and metadata in a modernized data architecture. There is no “rip and replace” with a data fabric design. Instead, it capitalizes on sunk costs while providing prioritization and cost control guidance for new data management spending.

Data fabrics deliver benefits across different perspectives:

  • Business Perspective: Enables less technical business users (including analysts) to quickly find, integrate, analyze and share data

  • Data Management Team Perspective: Productivity gains for data engineers from automated data access and integration, plus increased agility that closes more data requests per day, week and year

  • Overall Organization Perspective: Faster time to insight from data and analytics investments; improved utilization of organizational data; reduced cost by analyzing the metadata across all participating systems and providing insights on effective data design, delivery and utilization

The two factors that determine whether a data fabric design is right for a given organization are metadata completeness and in-house data fabric subject matter expertise. Specifically, organizations with too little metadata will not see the benefits of data fabric. A lack of metadata also increases dependency on subject matter experts (SMEs) who can assist in discovering, inferring and even authoring metadata, which can negate the relatively low SME requirements of a data fabric design.
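
As a rough illustration of the metadata-completeness factor, the sketch below scores a catalog by how many of the four metadata types are populated per dataset. The catalog entries and the 60% threshold are assumptions chosen for the example, not a Gartner benchmark:

    # Readiness check for a data fabric design, assuming a catalog in
    # which each entry tracks whether the four metadata types are
    # populated. Entries and threshold are illustrative.
    REQUIRED = ("technical", "operational", "business", "social")

    catalog = {
        "sales.orders": {"technical": True, "operational": True, "business": True, "social": False},
        "hr.payroll": {"technical": True, "operational": False, "business": False, "social": False},
    }

    def completeness(entry):
        # Fraction of metadata types actually populated for one dataset.
        return sum(entry.get(kind, False) for kind in REQUIRED) / len(REQUIRED)

    average = sum(completeness(entry) for entry in catalog.values()) / len(catalog)
    print(f"average metadata completeness: {average:.0%}")
    if average < 0.6:
        print("thin metadata: expect heavy reliance on SMEs before a fabric pays off")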

Data mesh, while appealing, requires a disciplined approach

Data mesh is an architectural approach that allows for decentralized data management. Its goal is to support efforts to define, deliver, maintain and govern data products in a way that makes them easy for data consumers to find and use. Data mesh architecture is based on the concept of decentralizing and distributing data responsibility to the people closest to the data, who then share that data as a service.

The most common drivers for data mesh are more data autonomy for lines of business (LOBs), less dependency on central IT, and using decentralization of data to break down silos (though some data centralization within a mesh architecture may be warranted). Despite its obvious appeal, be aware of the following prerequisites and challenges.

Data mesh architecture is not yet an established best practice.

The term is associated with varied approaches that differ by organizational model, management of the data and technology implementation. The organizational drivers also vary: some adopters want to remove IT as a bottleneck, others want to rationalize the siloed datasets that result from LOB-led data pipeline creation, and still others are responding to a cloud-modernization data management initiative.

Data analytics leaders should not adopt data mesh architecture as a seemingly easy solution to their data management challenges. Although it formalizes common practices, it cedes data accountability to LOB experts, which risks proliferating siloed data uses.

Data mesh success depends on the organizational model and data skills in LOBs.

If data literacy, autonomy and data skills vary greatly across departments, and if organizations lack the ability to operationalize data management activities, central IT will need to provide more support — at least at first. LOBs can evolve toward greater autonomy within a data mesh environment by creating new roles, such as data product owners, to manage the definition, creation and governance of data products. Organizations that lack commitment to building distributed data skills, however, should avoid data mesh.

Data mesh architecture, design and technology implementation vary greatly.

Data mesh implementations are generally cloud-based and use shared storage and processing. However, the tools each LOB uses to deliver, maintain and govern data vary greatly based on the use cases and the contract between producer and consumer. These contracts define the scope, SLAs and cost of operations for data products, such as availability, compute costs, concurrency of access, governance and quality policies, context and semantics. Organizations that proceed without clear contracts often face shareability and reusability constraints, which undermine the goals of a data mesh architecture.
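
A minimal sketch of such a producer-consumer contract follows, expressed as a Python dataclass. The field names, limits and URL are hypothetical; the point is that the contract elements named above become explicit, checkable attributes of each data product:

    from dataclasses import dataclass, field

    @dataclass
    class DataProductContract:
        product: str
        owner_domain: str                  # the LOB accountable for the product
        availability_slo: float            # e.g., 0.999 = "three nines"
        max_concurrent_consumers: int      # concurrency of access
        monthly_compute_budget_usd: float  # cost of operations
        governance_policies: list = field(default_factory=list)
        semantics_url: str = ""            # where context and semantics are documented

        def validate(self):
            assert 0 < self.availability_slo <= 1, "SLO must be a fraction"
            assert self.governance_policies, "a contract without policies is unenforceable"

    contract = DataProductContract(
        product="sales.orders_curated",
        owner_domain="sales",
        availability_slo=0.999,
        max_concurrent_consumers=50,
        monthly_compute_budget_usd=2000.0,
        governance_policies=["pii-masked", "eu-data-stays-in-eu"],
        semantics_url="https://catalog.example.com/sales/orders",  # hypothetical
    )
    contract.validate()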

Organizations need a federated governance model.

Data mesh shifts responsibility for data governance to domain application designers and users. For an LOB to autonomously build and expose data products, it must define local data governance and data management that comply with central guidance from the chief information security officer (CISO) and the chief data officer (CDO) or central governance board. In mature data mesh organizations, the business organization enforces its own governance policies with central IT support, not the other way around.
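
Federated governance can be pictured as a simple policy check: each domain adds its own rules but must include everything the central board mandates. The policy names below are hypothetical:

    # Central mandatory policies set by the CISO/CDO or governance board.
    CENTRAL_MANDATORY = {"encrypt-at-rest", "pii-masked", "retention<=7y"}

    # Local policies defined autonomously by each LOB (illustrative).
    domain_policies = {
        "sales": {"encrypt-at-rest", "pii-masked", "retention<=7y", "row-level-access"},
        "finance": {"encrypt-at-rest", "retention<=7y"},  # missing pii-masked
    }

    # A domain is compliant only if its local rules are a superset of
    # the central mandates; extra local rules are allowed.
    for domain, local in domain_policies.items():
        missing = CENTRAL_MANDATORY - local
        status = "compliant" if not missing else f"non-compliant, missing {sorted(missing)}"
        print(f"{domain}: {status}")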

Data mesh is a viable option for organizations with incomplete metadata. So long as they have data architects with subject matter expertise, they can start with data mesh and build their active metadata stores in parallel.

 

The complexity of modern environments demands a flexible data architecture

Data leaders operating with on-premises, cloud, multicloud, intercloud and hybrid deployments will need to revise their existing data architecture strategy to support their present and future complexity. A carefully planned and robust data architecture ensures that new technologies cohere with existing infrastructure and can support future demands — including integration and interoperability across cloud providers, SaaS solutions and on-premises resource deployments, among others. Focus your planning around the following activities:

  • Devise a strategy that addresses the whole data ecosystem. It's common even for organizations with initial cloud deployments to grow into a hybrid and multicloud environment over time. Establishing an overarching cloud strategy that prioritizes providers can govern additional cloud deployments. This will mitigate the risks that unsanctioned cloud deployments can pose to your data architecture.

  • Align data requirements to use cases. Distributed and complex use cases are now driving newer innovations that deliver business value — in particular, by enabling self-service data access. Success in the cloud will depend on the ability to satisfy business consumer use cases, which are most likely distributed in nature, close to data sources and operating on edge networks and devices.

  • Evaluate integration patterns. Rapid data growth and self-service data access have exacerbated the challenge of moving data across different cloud and on-premises systems with the right bandwidth, latency and throughput. Evaluate your integration patterns to identify a reliable and efficient data architecture that serves evolving business use cases and meets data compliance and sovereignty needs (see the sketch after this list).

  • Embrace open source and open standards to future-proof data investments. Get familiar with open-source pricing models in the cloud, including charges for compute and storage resources. Use standards that are open or provider-neutral, and understand the options for open-source data stores, as well as open-source metadata standards that make metadata shareable across platforms in an enterprise environment. Finally, have a support plan in place to address issues with open-source solutions.
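
To make the integration-pattern evaluation concrete, here is an illustrative decision rule. The thresholds and pattern names are assumptions for the example, not a standard; the point is to state latency, sovereignty and data-churn constraints explicitly before choosing a pattern:

    # Illustrative heuristic for choosing an integration pattern given
    # three constraints. Thresholds are placeholders, not guidance.
    def choose_pattern(latency_ms_budget, data_must_stay_in_region, changes_per_day):
        if data_must_stay_in_region:
            # Sovereignty rules out bulk replication across regions;
            # query the data where it lives instead.
            return "data virtualization (federated query)"
        if latency_ms_budget < 100:
            # Tight latency budgets favor serving from a local replica.
            return "replication/CDC into a local store"
        if changes_per_day > 1_000_000:
            # High-churn sources favor streaming over batch ETL.
            return "streaming integration"
        return "scheduled batch ETL"

    print(choose_pattern(latency_ms_budget=50, data_must_stay_in_region=False, changes_per_day=10_000))
    print(choose_pattern(latency_ms_budget=500, data_must_stay_in_region=True, changes_per_day=10_000))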


Data architecture FAQs

What is data fabric?

Data fabric is an emerging data management design for attaining flexible, reusable and augmented data management (i.e., better semantics, integration and organization of data) through metadata.

What is data mesh?

Data mesh is a data management approach (though not an established best practice) that supports a domain-led practice for defining, delivering, maintaining and governing data products.

What is metadata?

Metadata is data that describes various facets of data assets. It provides the data’s context, improving our understanding of that data throughout its life cycle.

What is the difference between data fabric and data mesh?

Data fabric and data mesh are independently developed concepts whose names are often used interchangeably. While they share the goal of easy access to data, they differ:

  • Data fabric is a technology pattern driven by metadata to automate data management tasks through unified data access.

  • Data mesh is an architectural approach driven by federating data management responsibilities through distributed governance.

Under the right circumstances, fabric and mesh can be complementary: mesh is the goal; fabric is the means.

 
