Overall API & Pipeline design
The KernelCI API is a server-side service providing two main features: a database abstraction and a publisher / subscriber interface. Another important concept is that users own the data they send to the API. Let's have a quick look at how this all fits together.
All the data managed by KernelCI is stored in a MongoDB database using node objects. These can contain data about any part of the testing hierarchy such as a kernel revision, a build, static test results, runtime functional tests, regressions etc. Each node has a parent so they form a simple tree. There is typically one root node for each kernel revision with lots of child nodes containing all the test data that relates to it.
Each node object also has a state which can be used when orchestrating the pipeline. For example, a node will be in the Running state while awaiting some results. There’s also a result value, to tell whether the related pipeline step that produced the node passed or failed. Finally there’s a list of artifacts with URLs to know where to find all the related files (binaries, logs, generated results etc.).
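As a rough illustration, here is how such node objects might look in Python. The field names follow the prose above (parent, state, result, artifacts), but the exact schema is defined by the KernelCI API, not by this sketch:

```python
# Illustrative sketch of node objects as described above; the exact
# schema is defined by the KernelCI API, these field names are assumptions.

checkout_node = {
    "id": "checkout-1",
    "kind": "checkout",
    "parent": None,          # root node for a kernel revision
    "state": "running",      # still awaiting results
    "result": None,
    "artifacts": {},
}

build_node = {
    "id": "build-1",
    "kind": "kbuild",
    "parent": "checkout-1",  # child of the checkout node
    "state": "done",
    "result": "pass",
    "artifacts": {
        # storage is external: only public HTTP(S) URLs are stored
        "log": "https://storage.example.org/build-1/build.log",
    },
}


def ancestors(node, nodes_by_id):
    """Walk the parent chain up to the root, illustrating the tree."""
    chain = []
    while node["parent"] is not None:
        node = nodes_by_id[node["parent"]]
        chain.append(node["id"])
    return chain


nodes = {n["id"]: n for n in (checkout_node, build_node)}
print(ancestors(build_node, nodes))  # ['checkout-1']
```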
A publicly available graph shows the number of nodes added to the database every day.
Note: The API doesn't manage storage; the only requirement is to provide publicly-available HTTP(S) URLs for each artifact.
Publisher / Subscriber Interface
An event is sent on the Publisher / Subscriber interface (Pub/Sub) every time data changes in the database, in practice whenever a node is added or updated. For example, when a new kernel revision is found and a node is created for it, subscribers are notified with something like "A checkout node has been created for kernel revision v6.2-rc4". The actual event is a CloudEvents object whose JSON data contains a subset of the corresponding node database entry.
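A hypothetical event might look like the sketch below. The envelope follows the CloudEvents JSON layout, but the attribute values and the node data subset shown here are illustrative assumptions, not the actual strings produced by the KernelCI API:

```python
import json

# Hypothetical CloudEvents envelope for a node-creation event.
# Attribute values are illustrative assumptions; the real events are
# produced by the KernelCI API with its own source/type strings.
event = {
    "specversion": "1.0",
    "type": "node.created",
    "source": "https://api.example.org/",
    "id": "evt-0001",
    "data": {
        # subset of the node database entry
        "kind": "checkout",
        "state": "available",
        "revision": {"tree": "mainline", "describe": "v6.2-rc4"},
    },
}

# Events travel as JSON over the Pub/Sub interface.
message = json.dumps(event)
decoded = json.loads(message)
print(f"A {decoded['data']['kind']} node has been created "
      f"for kernel revision {decoded['data']['revision']['describe']}")
```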
Any client with an API token can subscribe to receive events and implement features on top of them. The API generates events automatically whenever a node changes, but clients may also use the interface to publish their own events and coordinate other parts of the pipeline.
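A subscriber could dispatch incoming events to handlers along these lines. The `handle_events` helper and the event shapes are assumptions made for illustration, not part of the actual KernelCI client tooling:

```python
# Minimal sketch of event-driven dispatch, assuming a source that yields
# decoded Pub/Sub events as dicts. The KernelCI project ships its own
# client code; this only illustrates the general shape.

def handle_events(events, handlers):
    """Dispatch each event to the handler registered for its node kind."""
    results = []
    for event in events:
        kind = event["data"]["kind"]
        handler = handlers.get(kind)
        if handler:
            results.append(handler(event))
    return results


def on_checkout(event):
    # e.g. schedule builds for the newly found kernel revision
    return f"scheduling builds for {event['data']['name']}"


results = handle_events(
    [{"data": {"kind": "checkout", "name": "v6.2-rc4"}}],
    {"checkout": on_checkout},
)
print(results)  # ['scheduling builds for v6.2-rc4']
```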
Interacting with the API requires a token which is associated with a particular user. Whenever a user sends some data such as a new node, it is owned by that user. While all nodes are publicly readable, only the owner of the node can update it. Users can also belong to groups to share data with other users.
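The ownership rule can be summed up with a toy check. The `can_update` function, the field names and the group semantics here are simplifications for illustration, not the access control the API actually implements:

```python
# Toy illustration of the ownership rule described above: nodes are
# publicly readable, but only the owner (or, in this simplified model,
# a member of a shared group) may update them.

def can_update(node, user, user_groups=()):
    return node["owner"] == user or node.get("group") in user_groups


node = {"id": "n1", "owner": "alice", "group": "kernelci.org"}
print(can_update(node, "alice"))                               # True
print(can_update(node, "bob"))                                 # False
print(can_update(node, "bob", user_groups=("kernelci.org",)))  # True
```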
While the main KernelCI pipeline creates nodes with users from a kernelci.org group, all other users can create their own data which will coexist in the database. Your own nodes can also have parents created by other users. For example, you may submit test results that relate to a kernel build provided by KernelCI.
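Submitting such a test node over HTTP could be sketched with the standard library as below. The `/node` endpoint path, the bearer-token header and the node field names are assumptions, so check the API documentation for the real routes; the request is built but deliberately not sent:

```python
import json
import urllib.request

# Sketch of submitting a test-result node whose parent is a build node
# owned by another user. The endpoint path, token format and field names
# are assumptions; consult the KernelCI API docs for the real schema.
API_URL = "https://api.example.org/node"   # placeholder API instance
TOKEN = "my-api-token"                     # issued per user; the new
                                           # node will be owned by that user

node = {
    "kind": "test",
    "name": "baseline.login",
    "parent": "build-node-id-123",         # hypothetical parent build node
    "state": "done",
    "result": "pass",
}

req = urllib.request.Request(
    API_URL,
    data=json.dumps(node).encode(),
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "Content-Type": "application/json",
    },
    method="POST",
)

# The request is only constructed here, not sent.
print(req.get_method(), req.full_url)
print(req.get_header("Authorization"))
```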
The pipeline comprises all the client-side services that run the actual workloads (builds, tests etc.). It is orchestrated based on events from the Pub/Sub interface, and all the data is managed via the API. Pipeline services are also responsible for uploading any artifacts to independent storage services and providing public URLs to access them.
A standard set of services is run directly by KernelCI alongside the API to automate the main part of the pipeline: detecting new kernel revisions, scheduling builds and tests, sending email reports and detecting regressions. However, any other service with an API token is in effect part of the extended pipeline too.
An instance has been set up on staging.kernelci.org. The Docker logs are available in real-time via a web interface for both the API and the pipeline, and the instance also provides interactive API documentation.
See also: Beta-testing the new API, KernelCI API building blocks, KernelCI Pipeline design details, Setting up a local KernelCI instance.