How Orbital Works: Semantic layers and integration

Introduction

In this post, I’ll unpack how Orbital uses a Semantic Layer to automate integration.

If you’ve never heard of Orbital or semantic integration before - 👋 Welcome! Orbital is an open core data and integration platform on a mission to eliminate integration code.

Our goal is to connect APIs, events and databases on demand, and have those integrations adapt automatically as systems change.

I’ve written in the past about why we created Taxi, and the ideas behind semantic metadata. But, for the impatient - here’s a quick primer:

Modern companies run hundreds of APIs, event streams, databases, and CSVs.
Teams wire these together with point-to-point glue code.
Those connections break every time something changes.
Orbital replaces glue code with semantics: systems describe their data through enriched API specs (OpenAPI, Protobuf, Avro, SOAP, etc.).
Integrations are then expressed semantically — not imperatively.
When systems evolve, Orbital reads the new specs and updates the integrations automatically.

Semantics and Taxi

Orbital is powered by Taxi - a language we created for modelling data semantically.

It lets you add metadata to API specs, beyond field names. It lets us formally define concepts, and then tag those concepts into our API specs.

For example, let’s imagine we’re modelling a simple Film domain. We might define some concepts:

namespace acme.films

type Title inherits String
type FilmId inherits Int
type ReviewScore inherits Int
type DurationInMinutes inherits Int

We can embed these concepts inside an OpenAPI spec (most standard API specs are supported) for a service that returns information about film reviews:

Note

OpenAPI is verbose.

We'll show an example here (scroll down to see the Taxi bits), but for the rest of the post, we're going to show examples in Taxi, for brevity. These same approaches work with OpenAPI, Avro, Protobuf, and many others.

We've included the same snippet in pure Taxi, so you can compare the two.

# An extract of the Reviews API OpenAPI spec:
openapi: 3.0.3
info:
  title: Reviews API
  version: 1.0.0
paths:
  /reviews/{filmId}:
    get:
      summary: Get reviews for a film
      operationId: getReviews
      parameters:
        - name: filmId
          in: path
          required: true
          schema:
            $ref: '#/components/schemas/FilmId'
          description: ID of the film
      responses:
        '200':
          description: A list of reviews for the specified film
          content:
            application/json:
              schema:
                type: array
                items:
                  $ref: '#/components/schemas/Review'
components:
  schemas:
    Review:
      type: object
      required:
        - film
        - score
      x-taxi-type:
        name: acme.films.Review ## <--- That's semantic metadata (Taxi)        
      properties:
        film:
          type: integer
          x-taxi-type:
             name: acme.films.FilmId ## <--- That's semantic metadata (Taxi)
        score:
          type: integer
          x-taxi-type:
             name: acme.films.ReviewScore ## <--- That's semantic metadata (Taxi)

// This is the same Reviews API described in Taxi, as a comparison
// We'll be showing examples in Taxi from this point on.
model Review {
  film: FilmId
  score: ReviewScore
}
service ReviewsApi {
   @HttpOperation(method = "GET", url="/reviews/{filmId}")
   operation getReviews(@PathVariable("filmId") filmId: FilmId):Review[]
}

There’s not much to see there - and that’s kinda the point…

Semantic metadata is a thin layer of additional information you add to your existing API specs
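To make "thin layer" concrete, here is a minimal Python sketch of reading those tags back out of a spec. It is not how Orbital does it internally. The dict is a hand-trimmed, already-parsed copy of the Review schema above, and `semantic_types` is a hypothetical helper name.

```python
# A trimmed, already-parsed copy of the Review schema from the OpenAPI extract.
review_schema = {
    "type": "object",
    "x-taxi-type": {"name": "acme.films.Review"},
    "properties": {
        "film": {"type": "integer", "x-taxi-type": {"name": "acme.films.FilmId"}},
        "score": {"type": "integer", "x-taxi-type": {"name": "acme.films.ReviewScore"}},
    },
}


def semantic_types(schema: dict) -> dict:
    """Map each property name to the semantic type named in its x-taxi-type tag."""
    result = {}
    for field, spec in schema.get("properties", {}).items():
        tag = spec.get("x-taxi-type")
        if tag:  # properties without a tag carry no semantic metadata
            result[field] = tag["name"]
    return result


field_types = semantic_types(review_schema)
print(field_types)
```

The rest of the spec is untouched. The only thing the semantic layer adds is the name of a concept next to each field.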

TaxiQL - Semantic query language

TaxiQL is the query language of Taxi, and the way that we ask for data from our semantic layer.

It’s strongly typed, declarative, and driven entirely by semantics.

Here’s an example of a Taxi query to fetch data from the Reviews API:

[Interactive snippet: Fetching data from a REST API. Runnable on Taxi Playground.]

Now, this on its own isn’t particularly exciting, so let’s add some other data sources.

Linking data between sources

Fetching data from a single API is cool ‘n all, but it’s not really the point of a semantic layer.

Things start to get interesting when we add other data sources. Let’s add a database containing film data.

enum FilmRating {
  G,
  PG,
  R18
}
model Film {
  id : FilmId
  title : Title 
  duration : DurationInMinutes
  censorRating: FilmRating
}

service FilmsDb {
  table films : Film[]
}

Now, our semantic layer understands how data links across two sources:

[Diagram: how data links across the two sources. Forkable on playground.taxilang.org]

This time, we can run a query asking for data from both sources:

[Interactive snippet: Linking data from a database and an API. Runnable on Taxi Playground.]

What just happened?

This time, when the query ran, the query engine worked out that to return the requested data, it’d need to link data across multiple sources.

Here’s the query plan for that query:

So, we’re fetching data from a database, then enriching it with a series of REST API calls.

Our query didn’t need to specify how to link that data; the query engine worked it out using the semantic layer.
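As a rough mental model (not Orbital's actual engine), you can think of the linking step as "find an operation whose input is a type I already have, and whose output is the type I want". The Python sketch below assumes made-up sample data, and the `Operation` class and `find_operation` helper are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Callable

# Made-up stand-ins for the films table and the Reviews API.
FILMS = [{"id": 1, "title": "Jaws"}, {"id": 2, "title": "Alien"}]
REVIEWS = {
    1: [{"film": 1, "score": 4}],
    2: [{"film": 2, "score": 5}, {"film": 2, "score": 3}],
}


@dataclass(frozen=True)
class Operation:
    name: str
    input_type: str   # semantic type the operation accepts
    output_type: str  # semantic type the operation returns
    call: Callable


OPERATIONS = [
    Operation("getReviews", "FilmId", "Review[]", lambda film_id: REVIEWS.get(film_id, [])),
]


def find_operation(have: str, want: str) -> Operation:
    """Pick an operation by its semantic input and output types, never by its name."""
    for op in OPERATIONS:
        if op.input_type == have and op.output_type == want:
            return op
    raise LookupError(f"No operation turns {have} into {want}")


def films_with_reviews() -> list:
    op = find_operation("FilmId", "Review[]")
    # Film.id is tagged as a FilmId, so it can be fed straight to the operation.
    return [{**film, "reviews": op.call(film["id"])} for film in FILMS]


enriched = films_with_reviews()
print(enriched)
```

The caller never mentions `getReviews`. It only states which type it has and which type it wants.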

Consumer data contracts

You might’ve noticed in the above example that the consumer defined a data contract for the shape of the data it wanted to retrieve. Let’s look again:

find { Film[] } as {
     id: FilmId
     name : Title
     reviews: Review[]
}[]

Letting consumers define their own data contract (field names, response shapes, and how data sources are composed) keeps our consumers and producers entirely decoupled.
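Here is a small Python sketch of why that decoupling works: values are matched by semantic type, so the consumer is free to rename `title` to `name`. This is only an illustration under simplifying assumptions (each type appears once per record). The `project` helper and the sample record are hypothetical.

```python
# Producer side: field name -> semantic type, mirroring the Film model.
FILM_MODEL = {"id": "FilmId", "title": "Title", "duration": "DurationInMinutes"}

# Consumer side: the field names the consumer wants -> semantic type.
CONTRACT = {"id": "FilmId", "name": "Title"}


def project(record: dict, source_model: dict, contract: dict) -> dict:
    """Reshape a producer record into the consumer's contract, matching on type."""
    # Index the record's values by semantic type instead of by field name.
    by_type = {
        source_model[field]: value
        for field, value in record.items()
        if field in source_model
    }
    return {out_field: by_type[type_name] for out_field, type_name in contract.items()}


projected = project({"id": 1, "title": "Jaws", "duration": 124}, FILM_MODEL, CONTRACT)
print(projected)
```

If the producer later renames `title` to `filmTitle`, only `FILM_MODEL` changes. The consumer's contract stays the same.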

Declarative, and adaptive

What’s nice about this approach is that the queries are entirely declarative - we don’t specify which services to call, tables to query, or which fields to map.

That’s particularly powerful, because as things change, consumers remain unaffected.

Our first example was pretty simple - the Id from our database was fed straight to our Reviews API.

However, it’s common in an enterprise for entities to have different Id schemes. Typically, this gets resolved via some kind of lookup API.

So, let’s introduce that complexity:

[Diagram: the updated semantic layer, including the ID lookup API. Forkable on playground.taxilang.org]

We’ll update our semantic layer. Typically, this would happen by changing the OpenAPI specs, but we’re going to just show the Taxi, to keep things short’n’sweet.

model FilmIdLookupResponse {
  filmId: FilmId
  reviewsId: FilmReviewId
}

service FilmIdResolverApi {
  operation lookupIds(FilmId):FilmIdLookupResponse
}

service ReviewsApi {
  // Note the getReviews API no longer accepts a FilmId.
  // It has a different type of input - a FilmReviewId
  @HttpOperation(method = "GET", url = "/reviews/{filmId}")
  operation getReviews(@PathVariable("filmId") filmId:FilmReviewId):Review[](...)
}

Normally, this kind of change would be a breaking change - our consumers would need to call the additional API. Or, if we were using GraphQL, we’d need to update our resolvers.

However, because Taxi and TaxiQL are semantic, rather than imperative, for our consumer, nothing changes.

Here’s that same query, again - try clicking run, or open it in the playground to get a better sense:

[Interactive snippet: Database and enriched data via two Lookup APIs. Runnable on Taxi Playground.]

So, even though for the consumer the query hasn’t changed, internally the query plan now looks like this:

Notice we’re adding an additional call to resolve the IDs.

How does this work?

Internally, all of the semantics get transformed into a huge graph that our query engine can traverse at query-time:

The query engine breaks down the TaxiQL query into an AST, and then runs this graph resolution for every field requested by the query - working out how to fetch data by traversing the graph, and calling APIs (or databases, or Kafka topics, or S3 buckets, etc.).
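To get a feel for that traversal, here is a deliberately tiny Python sketch. It uses a plain breadth-first search, which is an assumption for illustration and not a description of Orbital's real algorithm. Types are nodes, operations are edges, and `find_path` is a hypothetical helper. For brevity, `lookupIds` is modelled as returning a `FilmReviewId` directly, rather than a response model that contains one.

```python
from collections import deque


def find_path(operations: dict, start: str, goal: str):
    """Breadth-first search for the shortest chain of operations from start to goal."""
    queue = deque([(start, [])])
    seen = {start}  # avoids revisiting a type, so cycles cannot loop forever
    while queue:
        current, path = queue.popleft()
        if current == goal:
            return path
        for name, (input_type, output_type) in operations.items():
            if input_type == current and output_type not in seen:
                seen.add(output_type)
                queue.append((output_type, path + [name]))
    return None  # no chain of operations produces the goal type


# The semantic layer before the ID lookup was introduced...
before = {"getReviews": ("FilmId", "Review[]")}

# ...and after: getReviews now needs a FilmReviewId.
after = {
    "lookupIds": ("FilmId", "FilmReviewId"),
    "getReviews": ("FilmReviewId", "Review[]"),
}

plan_before = find_path(before, "FilmId", "Review[]")
plan_after = find_path(after, "FilmId", "Review[]")
print(plan_before, plan_after)
```

The question being asked ("I have a FilmId, I want Review[]") is identical in both cases. Only the graph changed, so only the plan changed.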

The query engine applies optimisations, like batching requests, and caching responses to ensure that everything stays fast.
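As a loose illustration of the caching idea only (Orbital's actual optimisations aren't shown in this post), the Python sketch below memoises a pretend API call so repeated IDs hit the network once. The `get_reviews` function, its return value, and the IDs are all made up.

```python
from functools import lru_cache

calls = []  # records every "real" call, so we can see what the cache saved us


@lru_cache(maxsize=None)
def get_reviews(film_review_id: str) -> tuple:
    """Pretend remote call; the body only runs on a cache miss."""
    calls.append(film_review_id)
    return (f"review-for-{film_review_id}",)


def reviews_for(ids: list) -> dict:
    # Repeated IDs are served from the cache rather than calling out again.
    return {film_review_id: get_reviews(film_review_id) for film_review_id in ids}


result = reviews_for(["r1", "r2", "r1", "r1"])
print(result, calls)
```

Four lookups were requested, but only two calls were made.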

This means we can do away with things like Resolvers, or point-to-point integration logic, eliminating the parts of our code that break as teams update and evolve their APIs.

Semantic layers for AI

As you may have noticed, there’s very little code - or contextual information - involved in all this. This makes it ideally suited for LLMs and AI copilots / code agents, as the context they need is significantly smaller.

The LLM doesn’t need to know about all the different APIs or how to stitch everything together. It simply has to translate user requirements from plain English to a semantic query. And, it turns out that LLMs are really, really good at rephrasing text into a constrained set of terms, using a simple syntax.

For example:

Find me a list of films. Include their title, and a list of reviews

Becomes:

find { Film[] } as {
    id: FilmId
    name : Title
    reviews: Review[]
}[]

If you’re interested in seeing this in action - check out this demo of Luna - our AI Copilot for integration:
