Publishing schemas to Orbital

Overview

Orbital works by understanding where and how data is exposed across your organisation. Orbital is technology-agnostic, and is designed for organisations with a wide variety of technology stacks. As a result, there are many ways to make schemas available to Orbital, each with its own pros and cons.

In this guide, we’ll explore the different techniques, so you can choose the right mix that works across your organisation.

We’ll also provide recommendations. These are a mixture of principles that we’ve kept in mind when designing Orbital, and techniques we’ve seen customers implement. These aren’t hard rules: like any advice, it’s up to you to choose what works.

Publishing schemas to Orbital

Each system needs to decide how its schema information will be made available to Orbital, and in what form.

In a large organisation, it’s typical to mix and match these approaches to fit what works for each team. Orbital is designed to support multiple methods at the same time.

A schema needs to describe a few important elements:

The structural contract of data

What’s the shape of the data that the system is exposing (output) or expecting (input)? This includes field names, nested objects, and expected parameters for operations.

Most schema languages (e.g. OpenAPI, Protobuf, SQL) describe this really well.

// An example of a clear structural contract, taken from the protobuf docs
syntax = "proto3";
package tutorial;

message Person {
  string name = 1;
  int32 id = 2;
  string email = 3;
}

The semantic contract of data

What’s the meaning of each field that’s being exposed (output) or received (input)? This is really critical when mapping data between systems, as it’s how we ensure we’re passing the correct information into the correct field.

There’s much less support for semantic contracts in standard schema languages. As a result, it’s common for someone to build ad hoc mappings in Word documents, wikis, or spreadsheets.

This is where languages like Taxi really shine, as they let you bake in the semantic contract as well as the structural contract.

// A contract that contains semantic data
model Person {
   name : PersonName inherits String
   id : PersonId inherits Int
   email : EmailAddress inherits String
}
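
To illustrate why the semantic contract matters when mapping data between systems, here is a minimal Python sketch. It is not how Orbital is implemented: the two schemas, their field names, and the `map_by_semantic_type` helper are all hypothetical, and the semantic type names are borrowed from the Taxi model above. It shows only that once fields carry semantic types, values can be matched by meaning rather than by field name.

```python
# Hypothetical schemas: two systems expose the same data under different field names.
# Each field is tagged with a semantic type, mirroring the Taxi model above.
CRM_SCHEMA = {"name": "PersonName", "id": "PersonId", "email": "EmailAddress"}
BILLING_SCHEMA = {"customer_name": "PersonName", "cust_no": "PersonId", "contact": "EmailAddress"}


def map_by_semantic_type(record, source_schema, target_schema):
    """Copy values into the target's field names by matching semantic types."""
    # Index the source values by semantic type rather than by field name.
    values_by_type = {
        source_schema[field]: value
        for field, value in record.items()
        if field in source_schema
    }
    # Fill each target field whose semantic type the source can supply.
    return {
        target_field: values_by_type[semantic_type]
        for target_field, semantic_type in target_schema.items()
        if semantic_type in values_by_type
    }


crm_record = {"name": "Jimmy", "id": 42, "email": "jimmy@example.com"}
billing_record = map_by_semantic_type(crm_record, CRM_SCHEMA, BILLING_SCHEMA)
print(billing_record)
# → {'customer_name': 'Jimmy', 'cust_no': 42, 'contact': 'jimmy@example.com'}
```

Without the semantic tags, the only way to know that `cust_no` and `id` refer to the same thing is a hand-maintained mapping document.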

Enriching structural contracts with semantics

While you can use Taxi to define your schemas outright, this isn’t a very common practice, given the prevalence of feature-rich, well-supported schema languages. Therefore, a good practice is to combine the two, using Taxi extensions inside existing schema languages.

// An example of enriching a structural contract with semantic metadata
syntax = "proto3";
import "taxilang/dataType.proto"; // Import the DataType extension
package tutorial;

message Person {
  // Custom field options take the form [(option) = value]
  string name = 1 [(dataType) = 'PersonName'];
  int32 id = 2 [(dataType) = 'PersonId'];
  string email = 3 [(dataType) = 'EmailAddress'];
}

Systems push schemas to Orbital

The preferred way of exposing schema data to Orbital is to have systems (or CI/CD tooling) publish the schema directly to Orbital.

By making the system (and its team) responsible for publishing its own definition, the schema documentation lives as close as possible to the system itself, and so has the best chance of being up to date. The team that maintains the system can evolve any schema documentation along with the system itself.

Automatically adapting to change

Orbital automates the integration between services, by leveraging the metadata present in the schemas that are published to it.

As schemas change, Orbital automatically adapts its integrations accordingly.

In order to make the most of this capability, it’s ideal to have systems automatically publishing their own schemas. The larger the separation between a change happening in a system, and the team responsible for updating the schema, the greater the chance of schemas being incorrect.

Of course, this is no different from manual integration without Orbital - if documentation isn’t maintained, then integration becomes error-prone.

Generating schema definitions from code

Generating schemas directly from code is a great way of ensuring that schemas evolve with the code, as they’re generated at run-time.

As a schema language, Taxi has great support for generating schemas directly from Kotlin and Java, with other framework support planned.

Using this approach, services generate their own schemas, and publish them directly to Orbital on startup.
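
The idea of deriving a schema from code can be sketched as follows. This is a language-neutral illustration written in Python, not the Taxi generator itself (which, as noted above, targets Kotlin and Java). The `semantic_type` metadata key and the `to_taxi_model` helper are assumptions made for this example, and the step of publishing the result to Orbital is left out.

```python
from dataclasses import dataclass, field, fields

# Map Python primitives to the Taxi primitive names used earlier in this guide.
TAXI_PRIMITIVES = {str: "String", int: "Int"}


@dataclass
class Person:
    # "semantic_type" is a hypothetical metadata key chosen for this sketch.
    name: str = field(metadata={"semantic_type": "PersonName"})
    id: int = field(metadata={"semantic_type": "PersonId"})
    email: str = field(metadata={"semantic_type": "EmailAddress"})


def to_taxi_model(cls):
    """Render a dataclass as Taxi-style model text, one line per field."""
    lines = [f"model {cls.__name__} {{"]
    for f in fields(cls):
        semantic = f.metadata["semantic_type"]
        primitive = TAXI_PRIMITIVES[f.type]
        lines.append(f"   {f.name} : {semantic} inherits {primitive}")
    lines.append("}")
    return "\n".join(lines)


print(to_taxi_model(Person))
```

Because the schema text is derived from the class at startup, renaming or adding a field in code changes the generated schema in the same deployment.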

Pros / Cons

Pros:
Strong chance of schema staying up to date
Schema is edited by the same domain experts who build the application

Cons:
Requires the application to "have knowledge" of Taxi for code generation
Requires the application to "have knowledge" of Orbital for publication

Augmenting existing schemas with semantic metadata

Many applications and systems already publish schemas using a rich schema language, such as Swagger / OpenAPI, Protobuf, JsonSchema, etc.

Generally speaking, these schema languages only describe the structural contract of the data, but not the semantic contract. Therefore, the ideal is to enhance existing schemas with this additional metadata.

The Taxi project has growing support for embedding semantic metadata inside existing schema languages.

Schema Format	Taxi Support
OpenAPI	Supported
Swagger	Supported
Protobuf	Supported
Avro	Supported
JsonSchema	Supported

In these cases, a great solution is to simply enhance the existing schemas with additional metadata.
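
As a rough illustration of what embedded semantic metadata looks like to a consumer, the Python sketch below reads semantic types out of an OpenAPI-style schema fragment. The `x-taxi-type` extension key and its `name` property are written from memory of Taxi's OpenAPI support and should be treated as an assumption to check against the Taxi documentation. The helper functions are hypothetical.

```python
# OpenAPI-style schema fragment, expressed as a Python dict.
# ASSUMPTION: the "x-taxi-type" extension key and its "name" property.
PERSON_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "x-taxi-type": {"name": "PersonName"}},
        "id": {"type": "integer", "x-taxi-type": {"name": "PersonId"}},
        "email": {"type": "string", "x-taxi-type": {"name": "EmailAddress"}},
        "nickname": {"type": "string"},  # no semantic metadata yet
    },
}


def semantic_types(schema, extension_key="x-taxi-type"):
    """Return {field name: semantic type} for fields that carry the extension."""
    result = {}
    for field_name, definition in schema.get("properties", {}).items():
        extension = definition.get(extension_key)
        if extension is not None:
            result[field_name] = extension["name"]
    return result


def untyped_fields(schema, extension_key="x-taxi-type"):
    """List fields that describe structure only, with no semantic type attached."""
    return [
        name
        for name, definition in schema.get("properties", {}).items()
        if extension_key not in definition
    ]


print(semantic_types(PERSON_SCHEMA))
print(untyped_fields(PERSON_SCHEMA))
```

The structural parts of the schema are untouched, so existing tooling that ignores unknown extensions keeps working.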

Pros / Cons

Pros:
Strong chance of schema staying up to date
Schema is edited by the same domain experts who build the application
No knowledge of Taxi inside code
Schema publication can be performed either at runtime, or in a CI/CD job

Cons:
Not available for all schema languages

Orbital polls systems for updates

Orbital’s schema server can be configured to poll a variety of sources for schemas:

File systems
Git Repositories
OpenAPI endpoints
HTTP servers

This is a strong option for scenarios where systems can’t publish their own schemas (e.g. databases), or for data sources that are otherwise structureless (e.g. CSV files).

Additionally, using a git-backed repository for a shared glossary / taxonomy is a great way to allow decentralized authorship of the core set of glossary terms.
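
To give a feel for what polling a file-system source involves, here is a minimal Python sketch. It does not reflect how Orbital's schema server is implemented: the `snapshot` and `changed_files` helpers, and the choice of matching `*.taxi` files, are assumptions made for illustration. Each polling cycle hashes the schema files and compares the result with the previous cycle.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(directory, pattern="*.taxi"):
    """Return {relative path: SHA-256 of contents} for schema files under directory."""
    root = Path(directory)
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob(pattern))
    }


def changed_files(previous, current):
    """Names of files added, removed, or modified between two snapshots."""
    all_names = previous.keys() | current.keys()
    return sorted(name for name in all_names if previous.get(name) != current.get(name))


# Usage: simulate two polling cycles against a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    schema_file = Path(tmp) / "person.taxi"
    schema_file.write_text("model Person { name : String }")
    first = snapshot(tmp)
    schema_file.write_text("model Person { name : String\n email : String }")
    second = snapshot(tmp)
    changes = changed_files(first, second)

print(changes)
# → ['person.taxi']
```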

Storing schemas separately from systems

Sometimes it’s not possible to have systems publish their own schemas. There are a variety of reasons for this:

Database schemas - which can’t automatically be pushed
Legacy or external systems, which can’t be modified to publish their own schemas
Schemaless content - such as CSV files

In these cases, it’s possible to store schemas in a git repository, and have Orbital’s schema server manually poll the repository.

The disadvantage here is that it’s easy for the schema definition to drift from the actual schema as the system changes.
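
One way to catch this kind of drift early is a simple check that compares the fields a stored schema declares against what the data source actually contains. The Python sketch below does this for a CSV header. The `detect_drift` helper and the sample data are hypothetical, and this is not an Orbital feature, just an example of a check a team could run in a scheduled job.

```python
import csv
import io


def detect_drift(declared_fields, csv_text):
    """Compare declared field names against the header row of a CSV document."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    declared, actual = set(declared_fields), set(header)
    return {
        # Fields the schema promises but the data no longer has.
        "missing_from_data": sorted(declared - actual),
        # Fields present in the data that the schema does not describe.
        "missing_from_schema": sorted(actual - declared),
    }


declared = ["name", "id", "email"]
sample = "name,id,phone\nJimmy,42,555-0100\n"
report = detect_drift(declared, sample)
print(report)
# → {'missing_from_data': ['email'], 'missing_from_schema': ['phone']}
```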

Pros / Cons

Pros:
Good fall-back option when no other options are available
Requires no changes to publishing systems

Cons:
Requires careful change planning to ensure schemas don't get out of sync with application
Schemas are not necessarily maintained by the same team, which can lead to loss of domain knowledge
