Mender blog

A whirlwind tour of the Mender client architecture using Rust

This blog post takes a closer look at the architecture of the Mender client. I figured that one of the best ways of getting a bird's-eye view of the client is to rewrite the core functionality from scratch. And what better language to rewrite existing functionality in than Rust? Thus, in the spirit of Rust, we are going to rewrite the existing and well-tested functionality of the Mender client in the Rust programming language!

In order to get started we simply set up a Rust project as follows:

cargo new --bin

The first part that we are going to re-implement is the state machine. In the original Mender client repository the state machine has a so-called stateless architecture, meaning that the states are mostly separated, and there is no master entity controlling the state machine. This is probably a good solution, yet you feel like doing it differently. In general there are two approaches we can take: bottom up or top down. However, since this is a private project, and structure is not your thing today, let's do a little bit of both. Besides, Mealy and Moore state machines are cool, and make me reminisce about VHDL (<3).

The state-machine that we are going to re-implement looks like this (taken from the Mender documentation):

The Mender state machine

On startup the client does a lot of different things, amongst them a few that we will skip (at least for this initial implementation), like reading its configuration from '/etc/mender/mender.conf', and verifying the boot-flags to check whether or not it is on a committed partition (in case of a double partition setup, which is the most common, and the safest, way to use Mender). For now we go straight to the state-machine implementation.

Since we decided to go with a Mealy state-machine, a state-transition will depend upon both the state the machine is in and the present input. It is time to start modeling the machine in code. In general the idea is to make the state-machine event driven, meaning that events are produced asynchronously and passed on to the master handler for the machine.

This was a lot of words and little action, so I guess a simple example will do better. The goal for this first part is to set up the basic skeleton of the state-machine, and have the client authorize with the Mender server. This means that the client will have to be able to go back and forth between the Idle and Sync states at given intervals of time, and then, when in Sync, make an authorization request to the Mender server. Also, in the Idle state, it has to idle for a configurable interval of time.

The first part is modeling all the states that the Mender client can be in during operation. These are (as can be seen in the docs, under "the nine states"):

  • Idle: this is a state where no communication with the server is needed nor is there any update in progress
  • Sync: communication with the server is needed (currently while checking if there is an update for the given device and when inventory data is sent to server)
  • Download: there is an update for the given device and a new image is downloaded and written (i.e. streamed) to the inactive rootfs partition
  • ArtifactInstall: swapping of active and inactive partitions after the download and writing is completed
  • ArtifactReboot: after the update is installed we need to reboot the device to apply the new image. The Enter actions run before the reboot; the Leave actions run after.
  • ArtifactCommit: device is up and running after rebooting with the new image, and the commit makes the update persistent
  • ArtifactRollback: if the new update is broken and we need to go back to the previous one
  • ArtifactRollbackReboot: if we need to reboot the device after performing rollback
  • ArtifactFailure: if any of the "Artifact" states are failing, the device enters and executes this state. This state always runs after the ArtifactRollback and ArtifactRollbackReboot states.

In Rust this can be modelled as an enum like this (with an extra Init state for startup):

    // Debug is derived so that states can be logged with {:?}.
    #[derive(Debug)]
    enum ExternalState {
        Init,
        Idle,
        Sync,
        Download,
        ArtifactInstall,
        ArtifactReboot,
        ArtifactCommit,
        ArtifactRollback,
        ArtifactRollbackReboot,
        ArtifactFailure,
    }

For this first stage, the only thing the client does is idle for a given time (which can be configured in the Mender client through the mender.conf file, but here, for simplicity, will be hard-coded). Thus the initial events that the state-machine can receive are pretty measly, and look like:

pub enum Event {
    None,
    Uninitialized,
    AuthorizeAttempt,
}

More will be added later on, as we handle the stages after authorization, but that blog post is for another time. Utilizing the pattern matching capabilities of Rust, the initial state-machine skeleton looks like:

loop {
    let (state, action) = match (cur_state, cur_action) {
        (ExternalState::Init, Event::Uninitialized) => {
            match InitState::is_committed() {
                true => (ExternalState::Idle, Event::None),
                false => (ExternalState::Idle, Event::None),
            }
        }
        (ExternalState::Idle, _) if !client.is_authorized => {
            debug!("Client is not authorized, waiting for authorization event");
            Idle::wait_for_event(&auth_events)
        }
        (ExternalState::Idle, _) if client.is_authorized => {
            debug!("Client is authorized, waiting for update event");
            Idle::wait_for_event(&update_events)
        }
        (ExternalState::Sync, Event::AuthorizeAttempt) => {
            debug!("Sync: Authorization attempt");
            let (s,a) = Sync::handle(&mut client);
            if client.is_authorized {
                debug!("Sync: client successfully authorized. Starting the update event producer");
                update_events.start();
            }
            (s,a)
        }
        (_, _) => panic!("Unrecognized state transition"),
    };
    debug!("cur_state: {:?}, cur_event: {:?}", state, action);
    cur_state = state;
    cur_action = action;
}
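The Mealy idea behind the loop above, that the next state and output depend on both the current state and the incoming event, can also be written as a small pure function. This is only a sketch under assumptions: the function name 'step', the trimmed-down enums and the queue standing in for the event channel are made up for illustration, and are not taken from the actual client code.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum ExternalState {
    Init,
    Idle,
    Sync,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Event {
    None,
    Uninitialized,
    AuthorizeAttempt,
}

/// Hypothetical Mealy-style step: returns the next state and the output
/// event, or `None` when the (state, event) pair is not a known transition.
fn step(state: ExternalState, event: Event) -> Option<(ExternalState, Event)> {
    match (state, event) {
        // Startup: go straight to Idle.
        (ExternalState::Init, Event::Uninitialized) => Some((ExternalState::Idle, Event::None)),
        // An authorization event wakes the machine up from Idle.
        (ExternalState::Idle, Event::AuthorizeAttempt) => {
            Some((ExternalState::Sync, Event::AuthorizeAttempt))
        }
        // After handling the attempt, fall back to Idle and wait again.
        (ExternalState::Sync, Event::AuthorizeAttempt) => Some((ExternalState::Idle, Event::None)),
        _ => None,
    }
}

fn main() {
    let mut state = ExternalState::Init;
    let mut event = Event::Uninitialized;
    // Stand-in for the channel that the Idle state would block on.
    let mut queued = vec![Event::AuthorizeAttempt].into_iter();
    loop {
        if state == ExternalState::Idle {
            match queued.next() {
                Some(e) => event = e,
                None => break, // nothing left to wait for in this sketch
            }
        }
        let (next_state, output) = step(state, event).expect("unrecognized state transition");
        println!("{:?} + {:?} -> {:?} / {:?}", state, event, next_state, output);
        state = next_state;
        event = output;
    }
}
```

Keeping the transition table free of side effects like this makes it easy to unit test, while the blocking read and the HTTP call stay in the outer loop.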

Take note of the 'auth_events' parameter to the 'Idle' state. This is an asynchronous event producer. Essentially, all it does is sleep for the sought interval, and then add an event to a channel, which the 'Idle' state does a blocking read on. The event producer code looks like this (and can be found in 'src/authevent.rs'):

pub struct AuthorizationEvent {
    interval: time::Duration,
    publisher: mpsc::Sender<Event>,
    events: mpsc::Receiver<Event>,
}

Here 'mpsc' is a multi-producer, single-consumer communication primitive in Rust, which is exactly what we need for the state-machine: only the master will be consuming, but later we will have multiple producers adding events to the queue, as the client, after it is authorized, will poll for updates and send inventory. These are completely different event producers, but both are consumed by the state-machine through this simple pattern. Neat!

And this is basically all that an event producer has to do:

        thread::spawn(move || {
            // Initially send an authorization attempt event
            tx1.send(Event::AuthorizeAttempt).unwrap();
            loop {
                // Wait out the interval, then queue the next attempt
                thread::sleep(interval);
                match tx1.send(Event::AuthorizeAttempt) {
                    Ok(_) => println!("Successfully sent authorization attempt Event"),
                    Err(e) => println!("Failed to send the authorization attempt Event: {}", e),
                }
            }
        });

Sleep for a given interval, add its event to the queue once it is time, then sleep some more. This logic is now completely separated from the state-machine. Great. Now the client can switch between 'Sync' and 'Idle', only going to the 'Sync' state with an 'authorization' event at the intervals given by the event producer.
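To see the multi-producer side of 'mpsc' in action, here is a minimal, self-contained sketch with two producers feeding one consumer. The names 'spawn_producer', 'collect_events' and the 'UpdateCheck' event are assumptions made up for this example, and the producers stop after a fixed number of events so that the program terminates (the real producer above loops forever).

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Event {
    AuthorizeAttempt,
    UpdateCheck, // hypothetical second event type, for illustration only
}

/// Spawn a producer thread that sends `event` `count` times,
/// sleeping `interval` after each send.
fn spawn_producer(
    tx: mpsc::Sender<Event>,
    event: Event,
    interval: Duration,
    count: usize,
) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        for _ in 0..count {
            if tx.send(event).is_err() {
                break; // the receiver is gone, nothing left to do
            }
            thread::sleep(interval);
        }
    })
}

/// Run two producers against a single consumer and return everything received.
fn collect_events(count_each: usize) -> Vec<Event> {
    let (tx, rx) = mpsc::channel();
    let auth = spawn_producer(tx.clone(), Event::AuthorizeAttempt, Duration::from_millis(5), count_each);
    let update = spawn_producer(tx, Event::UpdateCheck, Duration::from_millis(7), count_each);
    // Every sender now lives inside a thread, so this iterator ends
    // once both producers have finished and dropped their senders.
    let events: Vec<Event> = rx.iter().collect();
    auth.join().unwrap();
    update.join().unwrap();
    events
}

fn main() {
    for event in collect_events(3) {
        println!("state-machine received {:?}", event);
    }
}
```

The consumer never needs to know how many producers exist; cloning the sender is all it takes to add another one.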

Now that the basic skeleton of the state-machine is in place, it is time to try and get the client authorized with the current Mender server. In order to do this, an HTTP library is needed. Rust, like most modern languages, has a built-in feature for handling 3rd-party libraries (crates in Rust lingo), and the go-to index of these crates is 'crates.io'. Thus you have a little look around the interweb for a library befitting your needs for simple HTTP client communication. The downside of languages like Rust, as opposed to languages like Go, which have this functionality as part of the standard library, is that there are a lot of options to choose from. After a lot of back and forth you settle on the reqwest library, simply because it looks like it has a pretty clean API.

Now, through pattern matching on the state, and the event, in the state-machine, it is time to enable the Sync state to handle an authorization attempt:

        (ExternalState::Sync, Event::AuthorizeAttempt) => {
            debug!("Sync: Authorization attempt");
            let (s,a) = Sync::handle(&mut client);
            if client.is_authorized {
                debug!("Sync: client successfully authorized. Starting the update event producer");
                update_events.start();
            }
            (s,a)
        }

There are a few quirks in authorization though, but these can be solved through a little digging in the Mender device authentication API documentation. This specifies that the authorization request requires three parts:

  • ID-data -- Vendor-specific JSON representation of the device identity data (MACs, serial numbers, etc.)
  • pubkey -- The device's public key, generated by the device or pre-provisioned by the vendor
  • X-MEN-Signature -- Request signature, computed as 'BASE64(SIGN(device_private_key, SHA256(request_body)))'. Verified with the public key presented by the device.
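The order of operations in the signature formula is easy to get wrong, so here is a small sketch of just the composition: hash first, then sign, then base64 encode. To stay free of external crates, the hash and sign steps are deliberately fake placeholders (they are NOT SHA-256 or a real signature), and the names 'signature_header', 'placeholder_hash' and 'placeholder_sign' are made up for this example. In a real client those two steps would come from a crypto library.

```rust
/// Standard base64 (RFC 4648 alphabet, '=' padding), written out by hand
/// so that the sketch needs no external crates.
fn base64_encode(input: &[u8]) -> String {
    const TABLE: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(TABLE[((n >> 18) & 63) as usize] as char);
        out.push(TABLE[((n >> 12) & 63) as usize] as char);
        out.push(if chunk.len() > 1 { TABLE[((n >> 6) & 63) as usize] as char } else { '=' });
        out.push(if chunk.len() > 2 { TABLE[(n & 63) as usize] as char } else { '=' });
    }
    out
}

/// PLACEHOLDER, not SHA-256: reverses the bytes so the data flow is visible.
fn placeholder_hash(data: &[u8]) -> Vec<u8> {
    data.iter().rev().cloned().collect()
}

/// PLACEHOLDER, not a real signature: prefixes the digest with "sig:".
fn placeholder_sign(digest: &[u8]) -> Vec<u8> {
    let mut out = b"sig:".to_vec();
    out.extend_from_slice(digest);
    out
}

/// Compose the header value as BASE64(SIGN(key, HASH(body))).
fn signature_header<H, S>(body: &[u8], hash: H, sign: S) -> String
where
    H: Fn(&[u8]) -> Vec<u8>,
    S: Fn(&[u8]) -> Vec<u8>,
{
    let digest = hash(body); // 1. hash the raw request body
    let signature = sign(&digest); // 2. sign the digest, not the body
    base64_encode(&signature) // 3. base64 encode the raw signature bytes
}

fn main() {
    let body = b"request body";
    let header = signature_header(body, placeholder_hash, placeholder_sign);
    println!("X-MEN-Signature: {}", header);
}
```

Note that the body that is hashed has to be byte-for-byte the same as the body that is sent, since the server verifies the signature against what it receives.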

In code this might look similar to the following (from 'src/client.rs'):

    pub fn authorize(&self) -> Result<reqwest::Response, ClientError> {
        debug!("The client is trying to authorize...");
        // Do authorization
        // Authorization API can be found at:
        // https://docs.mender.io/2.0/apis/device-apis/device-authentication
        let protocol = "https://";
        let host = "docker.mender.io";
        let basepath = "/api/devices/v1";
        let request = "/authentication/auth_requests";
        let uri = protocol.to_owned() + host + basepath + request;
        // Create the AuthRequest body
        let pem_pub_key = String::from_utf8(self.private_key.public_key_to_pem().unwrap()).unwrap();
        let id_data = r#"{"MAC": "123"}"#;
        let auth_req = AuthRequestBody {
            id_data: id_data.to_string(),
            pubkey: pem_pub_key,
            tenant_token: None,
        };
        // serialize the request to json
        let auth_req_str = serde_json::to_string(&auth_req)
            .expect("Failed to serialize the authorization request to json");
        debug!("auth_req_data_str: {}", auth_req_str);
        // Sign using PKCS#1
        let sig = self.sign_request(auth_req_str.as_bytes());
        // Base64 encode the signature
        let sig_base64 = base64::encode(&sig[..384]);
        Ok(self.request_client
            .post(&uri)
            .header("Content-Type", "application/json")
            .header("X-MEN-Signature", sig_base64)
            .body(auth_req_str)
            // .body(auth_req_str.as_bytes())
            .send()?)
    }

The easiest way to test this is to fire up the Mender demo environment. This can be done as easily as downloading the Mender integration repository, and then running ./demo up.

Now you are all ready to authorize the shiny new (super fast, and super secure) Rust client with the Mender server. As in all things for an engineer of your stature (the lower kind), this seems trivial at first. However, you get stuck trying to authorize with the server, and in the spirit of security, the response you are getting is vague and gives very little away, so as not to lead baddies on:

400 Bad Authorization request

It is then time to fire up the server logs and have a look. These are printed when running the Mender demo script, which can be found in the integration repository. Scrolling through the logs, the first part we are looking for is the mender-api-gateway, which is the first point of contact for any external entity trying to communicate with the Mender server. Thus:

mender-api-gateway_1    | 172.20.0.1 - - [25/Aug/2019:18:08:35 +0000] "POST /api/devices/v1/authentication/auth_requests HTTP/1.1" 400 600 "-"
"reqwest/0.9.20" "-" request ID "dc77fd6a-de27-495f-9b34-d1ad6e6e6d07" 0.082 

This is exactly what we are looking for, and we can see that it was sent with the reqwest library that we introduced to the client previously. Next we look for the mender-device-auth service logs. This is the micro-service that handles the authorization of clients trying to connect to the server.

mender-device-auth_1    | time="2019-08-25T18:08:35Z" level=warning msg="Failed to extract identity from header: malformed authorization data"
file=middleware.go func="identity.(*IdentityMiddleware).MiddlewareFunc.func1" line=47 request_id=dc77fd6a-de27-495f-9b34-d1ad6e6e6d07 

You got the order of sha'ing, base64 encoding and signing incorrect, you doofus! Back to Emacs!

Compiling often is usually wise, in order to pick up on errors quickly. Especially in Rust, where all the little shiny bits (borrowing, security, and memory safety) are handled by the compiler. However, compiling is boring, so you decide to see if you can set some sort of unofficial world record on how many compiler errors you can create before a single pass is done (current record -- 13). Also, Rust goes great along with your strategy, as it spits out errors at the same rate a programmer such as yourself spits out security bugs using a language such as C (</3), which is great! After a tiresome (full) evening of squashing compiler errors, only to have two or three more appear for each one you squash, it feels like you are fighting the Hydra, and you are starting to feel a little demotivated, until you remember that Vratislav has said that Rust is shit. Even though Vratislav has a lot more experience and brain than you do, the only sensible thing to do is ignore his advice and power through.

After a lot of toiling, slowly, the errors are starting to disappear. Once again the birds are singing, and the sun is shining. You are on top of the world, and Vratislav is wrong (like you knew he would be). Rust is still awesome! Yet, you have just spent the last two hours writing zero lines of code, and are therefore feeling pretty great about yourself, so you call the local pizza place, and order a celebratory kebab. Well deserved!

    Finished dev [unoptimized + debuginfo] target(s) in 7.38s
Cargo-Process finished at Tue Aug 27 20:53:28

And... Great success! The client authorized with the server.

INFO  [mender_rust] Failed to authorize the client: Response { url:
"https://localhost/api/devices/v1/authentication/auth_requests", status: 401,
headers: {"server": "openresty/1.13.6.2", "date": "Tue, 27 Aug 2019 19:00:56 GMT", "content-type": "application/json; charset=utf-8", "x-men-requestid":
"b13d4017-10e7-4abc-b591-0331bcf23d63", "connection": "keep-alive",
"access-control-allow-origin": "*", "vary": "Accept-Encoding",
"x-authentication-version": "unknown"} } 

Or... at least it tried to:

mender-device-auth_1    | time="2019-08-27T19:00:56Z" level=warning msg="dev auth: unauthorized: dev auth: unauthorized" file=response_helpers.go
func=rest_utils.RestErrWithWarningMsg line=55
request_id=b13d4017-10e7-4abc-b591-0331bcf23d63 

Therefore, we navigate to the demo server environment setup at localhost, by inserting https://localhost into our browser bar, and from there accepting the pending client.

And, believe it or not, it works!
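The sequence above (a 401 while the device is pending, and success once it has been accepted) suggests how the 'Sync' state could react to the server's answer. The following is only a sketch under assumptions: 'handle_auth_status' is a hypothetical stand-in, not the actual 'Sync::handle' from the client, and it assumes that the server answers with a 200 once the device is accepted.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum ExternalState {
    Idle,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Event {
    None,
}

struct Client {
    is_authorized: bool,
}

/// Hypothetical reaction to the HTTP status of an authorization request.
/// In every case the machine goes back to Idle; only the flag changes.
fn handle_auth_status(client: &mut Client, status: u16) -> (ExternalState, Event) {
    match status {
        // Assumed success code: the device has been accepted.
        200 => client.is_authorized = true,
        // Still pending (or rejected): keep retrying on the next interval.
        401 => client.is_authorized = false,
        // Anything else (e.g. the 400 seen earlier) points at a bug in the request.
        other => eprintln!("Unexpected status from the server: {}", other),
    }
    (ExternalState::Idle, Event::None)
}

fn main() {
    let mut client = Client { is_authorized: false };
    // First attempt: the device is still pending in the UI.
    handle_auth_status(&mut client, 401);
    println!("authorized after first attempt: {}", client.is_authorized);
    // Later attempt, after the device has been accepted.
    handle_auth_status(&mut client, 200);
    println!("authorized after second attempt: {}", client.is_authorized);
}
```

Because the event producer keeps sending authorization attempts at its interval, the client picks up the acceptance on its own at the next attempt.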

If you're curious, please check out the code at my GitHub.

Victory Puppies!!!

