Tips on taxonomies

Overview

Orbital is a data integration & querying platform, built on top of the Taxi schema language.

The idea behind Taxi (and semantic types in general) is to let producers of data define both the structural contract (the names given to fields and the shape of objects) and the semantic contract (what each field actually means).

The key to making this work well is defining a shared taxonomy: a set of common terms, each with a small scope and a well-defined meaning. In this guide, we'll look at tips and pitfalls for crafting a taxonomy.

Structuring your taxonomy

Avoiding common domain models

Trying to enforce standardised contracts between multiple systems is really hard. It leads to a lot of time spent building consensus on designs between teams that are working towards different goals. It also creates complex processes for agreeing on changes, which makes innovation hard.

Many of the reasons for choosing a shared model across teams (such as a lower cost of integration) are solved out of the box with Orbital. You get the benefits of shared models without the overhead associated with enterprise domain models.

Types are intended for sharing

Types don’t have structure. In software terms, types are scalars: they don’t contain any attributes or fields. String and Number are typical examples of scalars.

In Orbital and Taxi, types also carry a semantic meaning. For example, a field is typed as CustomerFirstName rather than String.
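As a minimal sketch, this is how a semantic type can be declared in Taxi. Each type is still a scalar with no fields, but it narrows a primitive such as String to a single meaning. The type names below are illustrative and not taken from any existing taxonomy.

```taxi
// Semantic types: still scalars (no fields), but each one carries a meaning.
// Both are stored as strings, yet they are not interchangeable.
type CustomerFirstName inherits String
type CustomerLastName inherits String
```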

Best practice recommendation

Types shouldn’t have structure or fields, because that makes them harder to share between systems and teams. Instead, favour lots of small types with clear, well-defined meanings.

Getting teams to agree on the meaning of a field is easier than getting teams to agree on how to structure or name it.

Best practice recommendation

Build a set of well-defined types that form the basis of your glossary. This set of types is commonly called a taxonomy.

These types should change infrequently, so take the time to document them well, with a clear definition of what each one means.
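As an illustration, a small documented taxonomy might look like the sketch below. It relies on Taxi's `[[ ... ]]` blocks, which attach documentation to a type. The namespace `com.acme.taxonomy`, the type names and the definitions are all hypothetical.

```taxi
// Hypothetical namespace for a shared taxonomy
namespace com.acme.taxonomy

[[ The unique identifier assigned to a customer when their account is created. ]]
type CustomerId inherits String

[[ The first (given) name of a customer, as supplied by the customer. ]]
type CustomerFirstName inherits String

[[ A three-letter ISO 4217 currency code, such as GBP or USD. ]]
type CurrencyCode inherits String
```

Each definition states meaning, not layout. This keeps the types stable even when the systems that use them change shape.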

Models are intended for systems

Models represent a strict contract that a system exposes. As such, the team that designs the system is best placed to design the contract that makes sense for it.

Avoid trying to design models by consensus. As discussed above, reaching consensus on shared models is tough, and most of the reasons for adopting shared models are solved by using semantic types.
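A minimal sketch of this split follows. Two teams each design their own model, with their own field names. Both reuse the same semantic types, and that shared vocabulary is what connects the two contracts. The teams, model names and field names are hypothetical, and the shared types are declared inline only so the snippet stands alone.

```taxi
// Shared taxonomy types (in practice, published once and reused by every team)
type CustomerId inherits String
type CustomerFirstName inherits String

// Owned by a hypothetical CRM team: named the way the CRM needs it
model CrmCustomer {
   id : CustomerId
   givenName : CustomerFirstName
}

// Owned by a hypothetical billing team: different field names, same meanings
model BillingAccount {
   accountHolderId : CustomerId
   holderFirstName : CustomerFirstName
}
```

Neither team had to agree on the other's field names or structure. They only had to agree on what `CustomerId` and `CustomerFirstName` mean.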
