
Metadata management reference architecture: A quick guide with diagrams

Last edited on April 30, 2024


    Metadata management is an important part of any business application, and user metadata is especially critical. Let’s take a quick look at what metadata is, why it’s important, and how you can architect your application to ensure highly available, consistent metadata at scale.


    What is metadata?

    Put simply, metadata is data about other data.

    There are many types of metadata, but in the context of enterprise applications, one of the most important is what we call user metadata (sometimes also called user account data). As the name suggests, user metadata is metadata that’s associated with a specific application user. It often includes things like:

    • The user’s access level and permissions

    • The user’s application preferences

    • The user’s contact information

    • The user’s payment methods

    • The user’s application usage history

    These are just examples; precisely what comprises user metadata will vary from company to company. For example, some companies will store IAM data such as usernames and password hashes in their user metadata store, while others store that information in a distinct IAM database.
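    To make the list above concrete, here is a minimal sketch of what a single user metadata record might look like as a data structure. Every field name and default here is a hypothetical illustration, not a recommended schema; your own store will likely differ.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class UserMetadata:
    """Hypothetical shape of one user's metadata record (illustrative only)."""

    user_id: str
    access_level: str = "member"                        # e.g. "member" or "admin"
    permissions: set[str] = field(default_factory=set)
    preferences: dict[str, str] = field(default_factory=dict)
    contact_email: Optional[str] = None
    payment_method_ids: list[str] = field(default_factory=list)  # references only
    usage_history: list[tuple[datetime, str]] = field(default_factory=list)

    def record_usage(self, action: str) -> None:
        """Append a timestamped entry to the user's usage history."""
        self.usage_history.append((datetime.now(timezone.utc), action))


# Example usage with made-up values.
user = UserMetadata(user_id="u-123", contact_email="ada@example.com")
user.permissions.add("reports:read")
user.preferences["theme"] = "dark"
user.record_usage("ran saved query")
```

    Note that the sketch leaves out usernames and password hashes, on the assumption that they live in a separate IAM database, which is one of the two approaches mentioned above.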

    Why metadata matters

    Although metadata storage often isn’t top-of-mind for engineers and architects, no application can function at enterprise scale without it.

    For example, Starburst is an end-to-end SaaS analytics platform serving massive enterprise customers such as Comcast, DoorDash, and Citi. Starburst’s user metadata store includes (among other things) the query history associated with each user, so that users can easily log in and re-run or build on the queries that are critical to their analytics needs. That data needs to remain available and consistent at all times or the user experience would be massively degraded.

    In this era of increasingly personalized application experiences, this is the rule, not the exception: behind virtually every enterprise-scale application is a user metadata store that holds data powering the critical user-specific features of the app.

    Key concerns for user metadata management

    Given that user metadata is critical to the functioning of almost every application, metadata management should be a key consideration for any company that operates, or aspires to operate, at scale.

    Specifically, important concerns relating to the management of user metadata include, but are not limited to:

    Availability. Critical databases cannot go offline without taking most or all of the applications they support offline with them. In 2024, maintaining high availability for your application (and by extension your metadata store) is table stakes, and users are less and less likely to forgive any downtime.
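    To put numbers on "high availability," the arithmetic below converts an availability target into the downtime it permits per year. The targets shown are common examples chosen for illustration, not figures from any particular SLA.

```python
MINUTES_PER_DAY = 24 * 60


def downtime_budget_minutes(availability_pct: float, period_days: float = 365) -> float:
    """Minutes of downtime allowed in a period at the given availability target."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    unavailable_fraction = 1 - availability_pct / 100
    return unavailable_fraction * period_days * MINUTES_PER_DAY


# Print the yearly downtime budget for a few illustrative targets.
for target in (99.0, 99.9, 99.99, 99.999):
    print(f"{target}% -> {downtime_budget_minutes(target):.1f} minutes/year")
```

    At 99.9% the budget is roughly 8.8 hours a year; at 99.999% it shrinks to about five minutes, which leaves little room for a metadata store that needs manual failover.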

    Latency. Just as modern users are unlikely to forgive downtime, they tend to be unenthusiastic – to put it mildly – about degraded performance and lag. Since a single user session often requires querying the user metadata table frequently, even slight lag will be noticeable. In practice, for enterprises this often means that the data must be geographically distributed so that users from the Bay Area to Boston to Berlin to Beijing can all enjoy a low-latency experience. It also means choosing a database technology that can remain performant at scale.
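    A rough calculation shows why distance matters. The sketch below estimates a physical lower bound on round-trip time between two cities, assuming signals travel through fiber at about 200,000 km/s (roughly two-thirds the speed of light) along the great-circle path. The coordinates are approximate, and real networks add routing, queuing, and processing time on top of this floor.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0
FIBER_SPEED_KM_PER_S = 200_000.0  # assumption: roughly 2/3 of the speed of light


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine distance between two (latitude, longitude) points in km."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def min_round_trip_ms(distance_km: float) -> float:
    """Best-case round-trip time over fiber, ignoring all network overhead."""
    return 2 * distance_km / FIBER_SPEED_KM_PER_S * 1000


SAN_FRANCISCO = (37.77, -122.42)  # approximate coordinates
BERLIN = (52.52, 13.40)           # approximate coordinates

distance = great_circle_km(*SAN_FRANCISCO, *BERLIN)
rtt = min_round_trip_ms(distance)
print(f"{distance:.0f} km, at least {rtt:.0f} ms per round trip")
```

    The Bay Area and Berlin are about 9,100 km apart, so every query that crosses that distance costs on the order of 90 ms before the database does any work. A session that issues several metadata queries in sequence pays that cost each time, which is the case for keeping each user's data near that user.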

    Consistency. User metadata stores must always reflect the most current, correct information. This becomes challenging when combined with the requirement that they offer high availability and low latency, because at enterprise scale achieving high availability always means some kind of distributed database solution, and it often means data must be geographically distributed, too. And again, users tend to be unforgiving of delays – if a user adds an item to their wish list, they want to see it reflected there immediately, not seconds or minutes later.

    Documentation. Because user metadata is critical to the application’s function, and user metadata tables are likely to be queried by many other parts of the application, an important part of metadata management is ensuring that the data itself, and all practices associated with it, are rigorously documented.

    Change data capture. Most applications also need to stream at least some user metadata to other parts of the stack (such as Kafka or predictive models) in real time.
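    As a rough illustration of the change data capture idea, the toy example below publishes an event describing every write to a table so that downstream consumers can react to it. It is an in-memory sketch only: the class names are hypothetical, a plain list stands in for a Kafka topic, and it does not represent how any real database implements CDC.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    key: str
    before: Optional[dict[str, Any]]  # None when the row is newly inserted
    after: Optional[dict[str, Any]]   # None when the row is deleted


class MetadataTable:
    """Toy in-memory table that publishes a ChangeEvent for every write."""

    def __init__(self) -> None:
        self._rows: dict[str, dict[str, Any]] = {}
        self._subscribers: list[Callable[[ChangeEvent], None]] = []

    def subscribe(self, handler: Callable[[ChangeEvent], None]) -> None:
        self._subscribers.append(handler)

    def upsert(self, key: str, row: dict[str, Any]) -> None:
        before = self._rows.get(key)
        self._rows[key] = dict(row)
        self._publish(ChangeEvent(key, before, dict(row)))

    def delete(self, key: str) -> None:
        before = self._rows.pop(key, None)
        if before is not None:  # deleting a missing row emits nothing
            self._publish(ChangeEvent(key, before, None))

    def _publish(self, event: ChangeEvent) -> None:
        for handler in self._subscribers:
            handler(event)


stream: list[ChangeEvent] = []  # stand-in for a Kafka topic
table = MetadataTable()
table.subscribe(stream.append)
table.upsert("u-123", {"theme": "light"})
table.upsert("u-123", {"theme": "dark"})
table.delete("u-123")
```

    After these three writes, the stream holds an insert, an update carrying both the old and new row, and a delete, in the order they happened. Consumers can rebuild or react to the table's state from that sequence without querying the table itself.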

    The above are all concerns for virtually any user metadata system, but the specifics of your business and your application will likely introduce additional concerns.

    For example, while we’ve suggested above that localizing data is often important for reducing latency and providing a good user experience, data localization may also be a regulatory requirement in jurisdictions in which you do business. This is particularly likely to be true for enterprises that operate in more “sensitive” industries such as banking, fintech, real-money gaming, etc.

    User metadata reference architecture: an example

    While every company is different, we’ve found that for businesses operating at enterprise scale, the above requirements typically lead to the use of a distributed SQL database such as CockroachDB, which can offer the strong consistency and familiar schema of SQL in combination with high availability, easy row-level data geolocation, and a variety of other features such as CDC that support other elements of the application.

    Below, we’ve laid out a simple example of how this can work in the context of a broader application architecture. Note that for visual clarity we’ve significantly simplified things. For example, we’ve depicted only a few application services below, but in reality an enterprise-scale application might have hundreds, or thousands. And while a typical application is served by many databases, we’ve depicted only the metadata store below to make it clearer how a single database can function even across geographical regions.

    [Diagram: metadata reference architecture]

    In the above diagram, requests and data from the front end (which might be a web or mobile application) are sent to a load balancer that distributes them to the appropriate microservice instances. From there, microservices relating to metadata send the relevant data to a CockroachDB cluster (via a load balancer), and then CockroachDB distributes the data automatically across nodes and regions in accordance with however it’s configured. The data from the user metadata store in CockroachDB is then synced to Kafka and a data warehouse via CDC.

    This architecture addresses most of the concerns we laid out earlier for metadata management:

    • It is highly available, since a CockroachDB node or even a whole region can go offline without the database going offline thanks to CockroachDB’s consensus replication.

    • It is low latency since user metadata can be located geographically close to the user it is associated with.

    • It is strongly consistent since CockroachDB uses serializable isolation by default, although a weaker isolation level can be configured if needed. (More on isolation levels in CockroachDB.)

    • It includes built-in change data capture, ensuring that you can keep all your systems in sync with real-time streaming data without having to find or build a separate CDC solution.

    • Admittedly, it won’t write the documentation for you, but from a developer’s perspective it functions much like a single-instance PostgreSQL database, which makes documentation and training easier, since virtually all developers are already familiar with it.
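    The availability point in the list above rests on majority-quorum arithmetic: a consensus-replicated group keeps working as long as more than half of its replicas are reachable. The sketch below is a simplified model of that rule in general, not CockroachDB's implementation or configuration, and the region names and replica placements are hypothetical.

```python
def quorum_size(replicas: int) -> int:
    """Smallest majority of a replica group."""
    return replicas // 2 + 1


def survives(replicas_by_region: dict[str, int], failed_regions: set[str]) -> bool:
    """True if replicas outside the failed regions still form a majority."""
    total = sum(replicas_by_region.values())
    alive = sum(
        count
        for region, count in replicas_by_region.items()
        if region not in failed_regions
    )
    return alive >= quorum_size(total)


# Three replicas spread evenly across three hypothetical regions.
spread = {"us-east": 1, "us-west": 1, "eu-central": 1}
print(survives(spread, {"us-west"}))                # True: 2 of 3 remain
print(survives(spread, {"us-west", "eu-central"}))  # False: 1 of 3 remains

# Same replica count, but two replicas share one region.
lopsided = {"us-east": 2, "us-west": 1}
print(survives(lopsided, {"us-east"}))              # False: 1 of 3 remains
```

    The last case is the one to watch for when designing a deployment: surviving the loss of a whole region depends on how replicas are placed across regions, not just on how many replicas exist.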

    Of course, real-world metadata architectures can get significantly more complex. When designing the architecture for your own application, it may be helpful to look at public examples such as Netflix’s device management architecture, which uses CockroachDB to store metadata related to all of the different hardware devices with which Netflix apps are compatible.
