Mender 3.1 release: Monitor using edge processing

We are excited to announce the Mender 3.1 release!

Mender 3.1 introduces the brand new Monitor add-on package. Unlike traditional monitoring tools, Mender’s Monitor add-on is built for connected devices. It uses edge processing, is designed for intermittent connectivity and requires much less bandwidth.

The Monitor add-on package complements the Troubleshoot and Configure add-on packages released earlier and completes the support for supplemental device management capabilities beyond OTA software updates. You can easily add add-ons to your Mender plan or installation.

Mender Monitor

Why invest in monitoring?

Have you encountered a situation similar to the following?

IoT-no-monitoring-customer-interaction-illustration

We can create solid designs and test products thoroughly before shipping them to customers. Unfortunately, reality bites: with complex software products, new issues are certain to surface once devices start running in a wide variety of environments in the field. There is therefore a need to detect and respond to these issues as quickly as possible, ideally before the customer notices them and any harm occurs.

If you have no visibility into the performance of your product in the field, you are bound to spend a lot of time and money handling support requests, end up with unhappy customers and, over time, damage your brand.

Problems with existing monitoring solutions

Monitoring itself is not a new use case. In fact, products addressing it have existed for 20 years. So why not just use one of these products for connected devices as well?

The problem is that they have been designed for a different environment: cloud and server infrastructure. The typical architecture is heavy on server-side processing, with the client continuously sending metrics and data to the server.

Monitor server release 3.1

The obvious problem with this type of architecture for field-deployed connected devices is bandwidth usage. For connected devices, bandwidth is limited and often expensive, especially if traffic occurs over cellular or satellite networks. Further, it is often not possible to connect to devices from the server, as they move between networks or are only intermittently online. Always-on and available clients can be assumed in typical IT infrastructure monitoring solutions, but this is not the reality in the world of connected devices.

A server-heavy architecture also means that responding to issues has longer cycle times. What should take milliseconds to fix may require many round trips to different cloud services, and even human involvement, before the issue is resolved.

Focus your time on differentiating your product features

Before Mender existed to solve the over-the-air software update problem, the vast majority of connected devices depended on a homegrown tool to do over-the-air software updates. Today, Mender users can instead spend their time on developing new business logic and product features that make their product more competitive.

Today, the situation is similar for monitoring. Of the users interviewed and surveyed, 74% use a homegrown monitoring tool.

chart-which-solution-for-monitoring

This raises the question: is there something wrong with the current off-the-shelf monitoring solutions when used in connected device environments?

We would like you to spend your time and energy creating a great product for your customers rather than implementing yet another homegrown monitoring tool. We created the Monitor add-on package to address the unique environment of connected devices and help ensure re-use rather than constantly re-inventing (and maintaining) the wheel.

What makes Mender Monitor different?

The purpose of the new Mender Monitor add-on package is to detect and analyze health issues of devices, services and applications. Unlike traditional monitoring tools, Mender’s Monitor add-on is built for connected devices and uses edge processing. It is designed for intermittent connectivity and uses close to zero bandwidth.

This is achieved by moving the Alert logic and configuration from the server to the edge device.

Monitor edge release 3.1

Instead of sending all the metrics from all the devices all the time, the devices will only send Alerts that have triggered. This vastly reduces bandwidth use and enables quick responses to environmental changes locally on the device. For example, if the device is about to overheat, you might want to pause a process or reboot immediately, instead of sending an Alert to a server and hoping for some action to come out of it.
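To make the idea concrete, here is a minimal sketch of edge-side alerting in plain shell. It is not how Mender Monitor is implemented; the function name edge_alerts, the THRESHOLD value and the message format are all hypothetical and chosen only for illustration. The point is that every sample is evaluated on the device, and a message is produced only when the state changes.

```shell
#!/bin/sh
# Illustrative sketch only, NOT Mender Monitor's implementation.
# Reads one temperature reading (whole degrees C) per line on stdin and
# prints a message only when the state flips between OK and CRITICAL.

THRESHOLD=80   # assumed limit, picked for the example

edge_alerts() {
    state="OK"
    while read -r temp; do
        if [ "$temp" -ge "$THRESHOLD" ]; then
            new_state="CRITICAL"
        else
            new_state="OK"
        fi
        if [ "$new_state" != "$state" ]; then
            # Only this line would need to leave the device.
            echo "ALERT temperature=$temp state=$new_state"
            state="$new_state"
        fi
    done
}

# Six samples are evaluated locally; only the two state changes are reported.
printf '%s\n' 70 72 85 86 71 70 | edge_alerts
```

In a real setup, a local action such as pausing a process could be placed right next to the echo line, so the device reacts without waiting for a server round trip.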

Monitor systemd services

Monitoring of applications and services running under systemd is supported out of the box and can be enabled with two simple commands. For example, to monitor the ssh service simply run the following:

mender-monitorctl create service ssh systemd
mender-monitorctl enable service ssh

Using mender-monitorctl is an easy way to configure the monitoring settings on a running device. If you would like to roll out the same configuration to devices at scale, the configuration is available in the directory /etc/mender-monitor/. Please read more about the configuration format in the Monitoring subsystems documentation.
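For intuition about what a service check boils down to on the device, the sketch below asks systemd for the state of a unit and turns it into an OK or CRITICAL line. This is only an illustration under stated assumptions: check_service and its output format are hypothetical and not part of Mender, and Mender Monitor's own check may work differently. The only external command used is systemctl is-active, which prints the unit state.

```shell
#!/bin/sh
# Hypothetical helper, not part of Mender: report whether a systemd unit runs.

check_service() {
    unit="$1"
    # `systemctl is-active` prints the unit state, e.g. "active",
    # "inactive" or "failed". `|| true` keeps a non-zero exit status
    # (unit not active, or systemctl missing) from aborting the script.
    status=$(systemctl is-active "$unit" 2>/dev/null) || true
    if [ "$status" = "active" ]; then
        echo "OK $unit"
    else
        echo "CRITICAL $unit (${status:-unknown})"
    fi
}

check_service ssh
```

On a machine without systemd the state is reported as unknown, which the sketch treats as CRITICAL.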

If the ssh service ever stops, there will be an alert in the Mender UI. You can filter devices with monitoring issues as seen below.

devices-filter-issues-monitor

Looking at a specific device, there is a new Monitor section which shows all currently triggered Alerts. Each Alert line also displays when the problem first occurred, along with additional diagnostic information.

devices-monitor-issue

Monitor logs

Application and system logs are important sources of information about potential device problems and failures.

For example, applications may restart sporadically when they encounter new situations like intermittent connectivity. As they are automatically started again by systemd, docker-compose or other services, the root cause of this condition may be difficult to detect, and even harder to diagnose solely based on customer reports. Still, this situation, and even what led to it, can be fairly easy to discover if you look at the log or output from the application itself.

Similarly, system logs may contain clues that should be investigated. For example, if a line about a disconnected USB device appears in the Linux kernel output, this may indicate that a peripheral such as a keypad has lost its connection and the product may become unusable. Maybe the user accidentally pulled a cable partially out and will soon start to have problems with the product. In a connected device environment, it is not practical to collect all the logs from all the devices all the time to find these “needles in the haystack”. With this old-school approach you would likely waste 90% or more of the bandwidth, and pay the related transfer costs, on data of zero value to you.

Mender Monitor allows you to alert on log patterns that appear in output or log files using edge processing, i.e. on the device itself. This means that the only network usage occurs when an anomaly is detected and the related Alert is triggered. In this case, the Alert is sent to the Mender server for further processing.

Log monitoring is as easy to set up as the service monitoring above. For example, to start monitoring the kernel logs for lines indicating that a USB device got disconnected, such as “usb 2-1: USB disconnect, device number 50”, simply run the following commands:

mender-monitorctl create log kernel_usb_disconnect '.*usb.*: USB disconnect.*' /var/log/kern.log
mender-monitorctl enable log kernel_usb_disconnect

To help diagnose the problem, you also get the 50 log lines preceding and 50 following the line which caused the Alert to trigger, as you can see below.

monitor-log-lines-collected

The number of log lines collected before and after the match is also configurable. Pattern matching using regular expressions (PCRE) is supported, and log rotation is detected and handled.
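Before rolling a pattern out to devices, it can be worth checking it against a known log line. The sketch below does this with grep, which is not how Mender Monitor evaluates patterns; it is only a quick local sanity check. It uses grep -E (extended regular expressions) rather than PCRE, on the assumption that this particular pattern, which consists only of literals, "." and "*", behaves the same in both. The surrounding log lines are made up for the example.

```shell
#!/bin/sh
# Pattern and sample line taken from the USB disconnect example.
pattern='.*usb.*: USB disconnect.*'
sample='usb 2-1: USB disconnect, device number 50'

# -E selects extended regular expressions; -q suppresses normal output.
if printf '%s\n' "$sample" | grep -Eq "$pattern"; then
    echo "pattern matches the sample line"
else
    echo "pattern does NOT match the sample line"
fi

# Made-up surrounding lines, to show one line of context on each side of
# the match (-B 1 before, -A 1 after). This is similar in spirit to the
# log lines collected around a triggered Alert, just on a smaller scale.
printf '%s\n' 'made-up line before' "$sample" 'made-up line after' 'made-up unrelated line' \
    | grep -E -B 1 -A 1 "$pattern"
```

The second command prints the matching line together with its two neighbours and leaves out the unrelated line.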

Alert notifications via email

All triggered Alerts will by default also create an email notification, so you become aware and can handle any detected issues right away. You can see an example email message below.

alert-email-message

You will also receive a similar email notification once the issue is resolved and the Alert transitions into an OK state.

No additional configuration is needed: the emails are sent to all users of the Mender server Organization (also known as Tenant) to which the device belongs. The device does not need any emailing support because the emails are sent from the Mender server, not the device.

Monitor supports Role Based Access Control, so the UI and email notifications are only available to users who have Read access to the device triggering the Alert.

Try the new features

Here are some pages with more information to get you started with the new features of Mender 3.1:

  • Get started - The best place to do a quick test of the new release from scratch. Sign up for a new Free trial and all features and add-ons are available for 12 months for free.
  • Try the Monitor add-on - Tutorial on getting started with the Monitor add-on.
  • Monitor add-on documentation - Overview of Monitor, including more advanced usage.

Support for your board

If you are getting started with OTA updates, or do not have time to integrate the Mender client with your board for robust A/B system updates, there are several resources available to you!

The Board Integrations category in Mender Hub is a community site where you can contribute, reuse and maintain Mender board integrations.

We are also happy to help with consulting services to enable verified Mender support for your board!

Share your feedback

We appreciate your general feedback on Mender, whether it is positive or points out a need for improvement, in the Mender Hub General Discussions forum. Your continued feedback ensures Mender will meet your needs even better in the future!

If you believe you have encountered a bug, please submit your report at the Mender JIRA issue tracker.

We hope you enjoy the new features and are looking forward to hearing from you!