First Look at Grafana

Preface

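Grafana is a monitoring and visualization platform that can display metrics from all kinds of sources. I played around with it, liked it, and am writing the setup down here.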

Preparation

As usual I install everything with Docker because it is simpler, so make sure docker and docker-compose are installed first.

docker-compose.yml

version: '2'

networks:
  monitor:
    driver: bridge

services:
  prometheus:
    image: prom/prometheus
    container_name: prometheus
    hostname: prometheus
    restart: always
    volumes:
      - /root/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
      - /root/alertmanages/node_down.yml:/etc/prometheus/node_down.yml
    ports:
      - "9090:9090"
    networks:
      - monitor

  alertmanager:
    image: prom/alertmanager
    container_name: alertmanager
    hostname: alertmanager
    restart: always
    volumes:
      - /root/alertmanages/alertmanager.yml:/etc/alertmanager/alertmanager.yml
    ports:
      - "9093:9093"
    networks:
      - monitor

  grafana:
    image: grafana/grafana
    container_name: grafana
    hostname: grafana
    restart: always
    ports:
      - "3000:3000"
    networks:
      - monitor

  node-exporter:
    image: quay.io/prometheus/node-exporter
    container_name: node-exporter
    hostname: node-exporter
    restart: always
    ports:
      - "9100:9100"
    networks:
      - monitor

  cadvisor:
    image: google/cadvisor:latest
    container_name: cadvisor
    hostname: cadvisor
    restart: always
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:rw
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    ports:
      - "8080:8080"
    networks:
      - monitor

Pasting the whole file is the quick and dirty way! Here is what each container does:

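Name            Role
prometheus      Scrapes and stores monitoring data, and serves it to third parties for querying
alertmanager    Receives the alerts fired by Prometheus and sends the notifications (the alert rules themselves are loaded by Prometheus)
grafana         Displays the monitoring data
node-exporter   Monitors a node, i.e. exposes the metrics of the machine it runs on
cadvisor        Monitors Docker containers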
Prepare the configuration files

prometheus.yml (the Prometheus configuration file)

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        # Must match the alertmanager container name from docker-compose.yml.
        - targets: ['alertmanager:9093']

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "node_down.yml"
  # - "first_rules.yml"
  # - "second_rules.yml"

# Scrape configurations: Prometheus itself, cAdvisor, and node-exporter.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    static_configs:
      - targets: ['prometheus:9090']

  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']

  - job_name: 'node'
    scrape_interval: 8s
    static_configs:
      - targets: ['node-exporter:9100']

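Because all containers are attached to the same Docker network, they can reach each other by container name; without that network you would have to use IP addresses.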

alertmanager.yml (configuration for sending alert emails)

global:
  smtp_smarthost: 'smtp.126.com:465'        # SMTP server of 126 mail
  smtp_from: '134348@126.com'               # sender address
  smtp_auth_username: '23142134@126.com'    # SMTP user name, i.e. your mailbox
  smtp_auth_password: '23424'               # password of the sending mailbox
  smtp_require_tls: false                   # do not require TLS

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 10m
  receiver: live-monitoring

receivers:
  - name: 'live-monitoring'
    email_configs:
      - to: '806393858@qq.com'

node_down.yml (alert rules)

groups:
  - name: node_down
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          user: test
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."

Startup

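Start everything by running docker-compose up -d in the directory that contains docker-compose.yml. Then open the Prometheus console and go to Status -> Targets; if nothing went wrong, the configured scrape jobs (prometheus, cadvisor, node) are listed there.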
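
If you would rather check the targets from a script than from the browser, here is a minimal sketch. It assumes Prometheus is reachable at http://localhost:9090 (the port published in docker-compose.yml) and reads the /api/v1/targets endpoint of the Prometheus HTTP API. The helper names summarize_targets and fetch_targets, and the sample data, are my own and only for illustration.

```python
import json
from urllib.request import urlopen


def summarize_targets(payload):
    """Map each scrape job to the health values of its active targets."""
    summary = {}
    for target in payload["data"]["activeTargets"]:
        job = target["labels"]["job"]
        summary.setdefault(job, []).append(target["health"])
    return summary


def fetch_targets(base_url="http://localhost:9090"):
    """Fetch the target list from Prometheus.

    The default base_url is an assumption: it only works on the Docker host
    with port 9090 published as in docker-compose.yml.
    """
    with urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


# Hand-written sample shaped like the API response, trimmed to the fields
# used above, so the parsing can be tried without a running Prometheus.
sample = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "prometheus", "instance": "prometheus:9090"}, "health": "up"},
            {"labels": {"job": "node", "instance": "node-exporter:9100"}, "health": "down"},
        ]
    },
}
```

With the stack running, summarize_targets(fetch_targets()) should return one entry per job, and every value should be "up" once the first scrapes have completed.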

The Rules page shows the alert rule we configured as well.
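
The loaded rules can also be read from the /api/v1/rules endpoint of the Prometheus HTTP API. The sketch below only covers the parsing side: list_alert_rules is a hypothetical helper of mine, and the sample is hand-written to look like the response for the node_down.yml file above, trimmed to the fields used.

```python
def list_alert_rules(payload):
    """Return (group, alert name, expression, state) for every alerting rule."""
    rules = []
    for group in payload["data"]["groups"]:
        for rule in group["rules"]:
            # Recording rules have type "recording" and are skipped here.
            if rule.get("type") == "alerting":
                rules.append((group["name"], rule["name"], rule["query"], rule["state"]))
    return rules


# Hand-written sample, not captured from a real server.
sample_rules = {
    "status": "success",
    "data": {
        "groups": [
            {
                "name": "node_down",
                "rules": [
                    {
                        "name": "InstanceDown",
                        "query": "up == 0",
                        "type": "alerting",
                        "state": "inactive",
                    }
                ],
            }
        ]
    },
}
```

A state of "inactive" means no target currently matches up == 0; once a target has been down for the 1m set in the rule, the state changes to "firing".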

Configuring the dashboard

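Open Grafana. The first login uses admin/admin, after which you are asked to change the password. Then add a data source and pick Prometheus as its type.
Fill in the Prometheus address. If you put the containers on a Docker network like I did, use the container name instead of an IP, for example http://prometheus:9090. Once the connection works you can start setting up dashboards.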
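
The same data source can also be created through the Grafana HTTP API (POST /api/datasources) instead of clicking through the UI. This is only a sketch under a few assumptions: Grafana is reachable at http://localhost:3000, basic auth with the admin account is allowed, and the function names are mine. Replace the password with whatever you set after the first login.

```python
import base64
import json
from urllib.request import Request, urlopen


def datasource_payload(name="Prometheus", url="http://prometheus:9090"):
    """Build the JSON body; the URL uses the container name on the Docker network."""
    return {
        "name": name,
        "type": "prometheus",
        "url": url,
        "access": "proxy",  # Grafana's backend talks to Prometheus, not the browser
        "isDefault": True,
    }


def build_request(grafana_url, user, password, payload):
    """Prepare (but do not send) the POST request to /api/datasources."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return Request(
        f"{grafana_url}/api/datasources",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


def create_datasource(grafana_url, user, password):
    """Send the request and return Grafana's JSON answer."""
    req = build_request(grafana_url, user, password, datasource_payload())
    with urlopen(req, timeout=5) as resp:
        return json.load(resp)
```

Calling create_datasource("http://localhost:3000", "admin", "your-password") from the Docker host should register the data source; the access mode "proxy" is what makes the container name resolvable, because the request to Prometheus is made from inside the grafana container.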

Click Import,
then paste a dashboard ID into it. https://grafana.com/grafana/dashboards has all kinds of dashboards to choose from.

And that's it~