Implementing a consistent and reliable alerting system across a sprawling organization is a significant challenge for just about any engineering team. For example, diverse infrastructure and numerous team-specific customizations may not translate well when someone outside the team has to investigate a specific incident. Inconsistent alerting practices also tend to trigger alerts that are not relevant or actionable, which eventually leads to alert fatigue. Left unaddressed, these issues can render the entire alerting system ineffective.
Grafana’s Provisioned Alerting feature can be an effective way to address these issues. It treats each alerting component as a separate, modular resource, so you can manage alert rules, contact points, notification policies, mute timings, and templates systematically, and import or export them across different Grafana instances. This flexibility allows large organizations to quickly set up alerting resources that are common across different teams, saving time and reducing the human error that manual setup invites.
There are three ways you can import alerting resources into your Grafana instance: provisioning configuration files, the Alerting provisioning HTTP API, and Terraform.
The following is an example request body for creating an alert rule with the alert provisioning API:
{
  "orgID": 1,
  "folderUID": "OpsVerse-Alerts",
  "ruleGroup": "host-alerts",
  "title": "HostOutOfMemoryAlert",
  "condition": "C",
  "data": [
    {
      "refId": "A",
      "queryType": "",
      "relativeTimeRange": { "from": 600, "to": 0 },
      "datasourceUid": "metrics",
      "model": {
        "editorMode": "code",
        "expr": "node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100",
        "hide": false,
        "intervalMs": 1000,
        "legendFormat": "__auto",
        "maxDataPoints": 43200,
        "range": true,
        "refId": "A"
      }
    },
    {
      "refId": "B",
      "queryType": "",
      "relativeTimeRange": { "from": 600, "to": 0 },
      "datasourceUid": "__expr__",
      "model": {
        "conditions": [
          {
            "evaluator": { "params": [], "type": "gt" },
            "operator": { "type": "and" },
            "query": { "params": [ "B" ] },
            "reducer": { "params": [], "type": "last" },
            "type": "query"
          }
        ],
        "datasource": { "type": "__expr__", "uid": "__expr__" },
        "expression": "A",
        "hide": false,
        "intervalMs": 1000,
        "maxDataPoints": 43200,
        "reducer": "last",
        "refId": "B",
        "settings": { "mode": "" },
        "type": "reduce"
      }
    },
    {
      "refId": "C",
      "queryType": "",
      "relativeTimeRange": { "from": 600, "to": 0 },
      "datasourceUid": "__expr__",
      "model": {
        "conditions": [
          {
            "evaluator": { "params": [ 5 ], "type": "lt" },
            "operator": { "type": "and" },
            "query": { "params": [ "C" ] },
            "reducer": { "params": [], "type": "last" },
            "type": "query"
          }
        ],
        "datasource": { "type": "__expr__", "uid": "__expr__" },
        "expression": "B",
        "hide": false,
        "intervalMs": 1000,
        "maxDataPoints": 43200,
        "refId": "C",
        "type": "threshold"
      }
    }
  ],
  "noDataState": "OK",
  "execErrState": "Error",
  "for": "2m",
  "annotations": {
    "description": "Host out of memory (instance )\nMemory Left : %",
    "summary": "Node memory is filling up (< 5% left)"
  },
  "labels": { "alerttype": "opsverse", "severity": "warning" },
  "isPaused": false,
  "notification_settings": null
}
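To make the API call itself concrete, here is a minimal Python sketch of how a payload like the one above could be sent to Grafana. It assumes the rule is POSTed to the /api/v1/provisioning/alert-rules endpoint and authenticated with a service account token sent as a Bearer header. The base URL, the token, and the function names are placeholders invented for this illustration, so check them against the API reference for your Grafana version before relying on them.

```python
import json
import urllib.request

# Assumed path of the alert rule provisioning endpoint.
PROVISIONING_PATH = "/api/v1/provisioning/alert-rules"


def build_create_rule_request(base_url, token, rule, disable_provenance=False):
    """Build (but do not send) a POST request that creates one alert rule.

    base_url and token are placeholders for your own Grafana URL and
    service account token; rule is the alert rule payload as a dict.
    """
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "Accept": "application/json",
    }
    if disable_provenance:
        # Optional header that, as we understand it, keeps the rule
        # editable in the Grafana UI after it is created through the API.
        headers["X-Disable-Provenance"] = "true"
    return urllib.request.Request(
        url=base_url.rstrip("/") + PROVISIONING_PATH,
        data=json.dumps(rule).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def create_rule(base_url, token, rule):
    """Send the request and return Grafana's JSON response as a dict."""
    request = build_create_rule_request(base_url, token, rule)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the payload saved to a file (for example a hypothetical host-alert.json), you could load it with json.load and pass the resulting dict to create_rule. Keeping request construction separate from sending makes it easy to inspect the URL and headers before anything reaches a live instance.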
Alerting resources can be exported through the UI as well as the API. The UI exports alert resources in Terraform format. To export an alert in provisioning file format, use the export endpoints of the Alerting HTTP API.
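As a rough illustration of the API export path, the sketch below builds a request for a single alert rule in provisioning file format. It assumes the export endpoint is /api/v1/provisioning/alert-rules/<uid>/export with a format query parameter that accepts yaml or json. The rule UID, base URL, token, and function names are hypothetical, and the set of supported formats may differ by Grafana version.

```python
import urllib.parse
import urllib.request

# Formats this sketch allows; your Grafana version may support others.
SUPPORTED_FORMATS = ("yaml", "json")


def build_export_request(base_url, token, rule_uid, fmt="yaml"):
    """Build (but do not send) a GET request that exports one alert rule."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported export format: {fmt!r}")
    # Quote the UID so unusual characters cannot alter the URL path.
    safe_uid = urllib.parse.quote(rule_uid, safe="")
    path = f"/api/v1/provisioning/alert-rules/{safe_uid}/export"
    query = urllib.parse.urlencode({"format": fmt})
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}{path}?{query}",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def export_rule(base_url, token, rule_uid, fmt="yaml"):
    """Send the request and return the exported rule as text."""
    request = build_export_request(base_url, token, rule_uid, fmt)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

The returned text can then be written to a file and placed in the provisioning directory of another Grafana instance, which is what makes the export useful for sharing common alerts across teams.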
To learn more about how to streamline your alerts and save valuable time, check out the links and detailed documentation we posted in this blog. Feel free to also contact our experts who can help take your alerting systems to the next level.