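There are plenty of detailed resources and entire books about Prometheus online. I came to it late and only use it occasionally, so this post records the most basic deployment and usage for future reference.
I. Installing Prometheus
1. Download the binary package for your platform.
2. After extracting it, move all the files in the package to /usr/local/prometheus so that everything is managed in one place.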
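The download and move steps above can be sketched in shell roughly as follows. The version number and the linux-amd64 platform are assumptions chosen for illustration, not values from this post, and the URL follows the naming pattern used on the Prometheus GitHub releases page; adjust all of them to your environment.

```shell
#!/bin/sh
# Sketch only: VERSION and PLATFORM are assumptions, not values from this post.
VERSION="${VERSION:-2.45.0}"
PLATFORM="${PLATFORM:-linux-amd64}"

# Print the release tarball URL for a given version ($1) and platform ($2).
prom_tarball_url() {
    echo "https://github.com/prometheus/prometheus/releases/download/v$1/prometheus-$1.$2.tar.gz"
}

# Download, unpack, and move everything to /usr/local/prometheus.
install_prometheus() {
    url="$(prom_tarball_url "$VERSION" "$PLATFORM")"
    curl -fsSL -o /tmp/prometheus.tar.gz "$url" || return 1
    mkdir -p /usr/local/prometheus || return 1
    # --strip-components=1 drops the top-level prometheus-<version>.<platform>/ directory
    tar -xzf /tmp/prometheus.tar.gz -C /usr/local/prometheus --strip-components=1
}

# Run as root when ready:
# install_prometheus
```

Nothing is downloaded until install_prometheus is called, so the URL can be inspected first with prom_tarball_url.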
3. Start the application
You can launch it by hand or have systemd supervise it as a service, depending on your environment.
(1) Starting via systemctl:
    vi /etc/systemd/system/prometheus.service

    [Unit]
    Description=Prometheus Server
    After=network.target

    [Service]
    WorkingDirectory=/usr/local/prometheus
    Restart=on-failure
    ExecStart=/usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml --web.listen-address=:8080 --web.enable-admin-api --storage.tsdb.path=/data/prometheus/data

    [Install]
    WantedBy=multi-user.target

    systemctl daemon-reload
    systemctl enable prometheus
    systemctl start prometheus
(2) Starting manually
    # Use port 8080 instead of the default 9090 and enable the web admin API
    /usr/local/prometheus/prometheus \
        --config.file=/usr/local/prometheus/prometheus.yml \
        --web.listen-address=:8080 \
        --web.enable-admin-api &
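The components below are started in much the same way, so I will not explain each one again and will only give a unit file that is easy to copy.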
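Whichever start method is used, it can help to wait until Prometheus reports that it is ready before moving on. The sketch below polls the /-/ready management endpoint; the function name wait_ready and the PROM_URL variable are made up for this example, and the port assumes the :8080 listen address used above.

```shell
#!/bin/sh
# Assumption: Prometheus listens on :8080 as configured above.
PROM_URL="${PROM_URL:-http://localhost:8080}"

# Poll /-/ready up to $1 times (default 10), one second apart.
wait_ready() {
    tries="${1:-10}"
    while [ "$tries" -gt 0 ]; do
        if curl -fsS "$PROM_URL/-/ready" >/dev/null 2>&1; then
            echo "ready"
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    echo "not ready"
    return 1
}

# Example:
# systemctl start prometheus && wait_ready 30
```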
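4. Other Prometheus features
# Check that the configuration is valid
    ./promtool check config /usr/local/prometheus/prometheus.yml
# Load a modified configuration
(1) Hot reload (the /-/reload endpoint only responds when Prometheus was started with --web.enable-lifecycle; sending SIGHUP to the process is the alternative)
    curl -X POST http://localhost:8080/-/reload
(2) Restart
    systemctl restart prometheus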
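The check and the reload combine naturally, so that a broken file never reaches the running server. This is a minimal sketch that assumes the paths and port used in this post and that Prometheus was started with --web.enable-lifecycle (otherwise the POST is rejected); safe_reload and the variable names are hypothetical.

```shell
#!/bin/sh
# Assumptions: install path and port from this post; --web.enable-lifecycle is set.
PROM_DIR="${PROM_DIR:-/usr/local/prometheus}"
PROM_URL="${PROM_URL:-http://localhost:8080}"
PROMTOOL="${PROMTOOL:-$PROM_DIR/promtool}"

# Validate the configuration first; reload only if the check passes.
safe_reload() {
    if ! "$PROMTOOL" check config "$PROM_DIR/prometheus.yml"; then
        echo "config invalid, not reloading" >&2
        return 1
    fi
    curl -fsS -X POST "$PROM_URL/-/reload"
}

# Example:
# safe_reload && echo "reloaded"
```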
5. Configuration file template
    # my global config
    global:
      scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
      evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
      # scrape_timeout is set to the global default (10s).

    # Alertmanager configuration
    alerting:
      alertmanagers:
        - static_configs:
            - targets: ['9.30.2.8:9093']
              # - alertmanager:9093

    # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
    rule_files:
      - /usr/local/prometheus/alterRules/*.rules
      # - "second_rules.yml"

    # A scrape configuration containing exactly one endpoint to scrape:
    # Here it's Prometheus itself.
    scrape_configs:
      # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
      - job_name: "prometheus"
        # metrics_path defaults to '/metrics'
        # scheme defaults to 'http'.
        static_configs:
          # The port must match --web.listen-address (:8080 in the start command above)
          - targets: ["9.30.2.8:8080"]

      - job_name: "linux"
        static_configs:
          - targets: ["9.30.2.8:9100","9.30.2.9:9100"]

      - job_name: "process"
        static_configs:
          - targets: ["9.30.2.8:9256","9.30.2.9:9256"]
Everything here uses static configuration, because there are only a few machines and the setup is temporary.
Alert rule template
    groups:
    - name: mysql
      rules:
      - alert: MemoryIncrease
        expr: delta(namedprocess_namegroup_memory_bytes{groupname="map[:mysql]",memtype="resident"}[3h]) > 1024*1024*1024
        for: 1s
        labels:
          time: 3h
          mem_size: 1GB
        annotations:
          summary: mysql memory abnormal
          description: the total resident memory increased by more than 1G for all mysql in all db machines
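The rule file can be named freely, but it must be placed under the rule path given in the Prometheus configuration file, for example /usr/local/prometheus/alterRules/*.rules. Writing rules requires some knowledge of PromQL, which is a subject in its own right and one I have not gone into deeply. The example rule fires when resident memory has grown by more than 1 GB over the past 3 hours.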
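Before relying on a rule, it is worth checking the file syntax and trying the expression by hand. The sketch below works out the threshold in bytes, wraps promtool check rules, and runs an instant query through the /api/v1/query HTTP endpoint; the function and variable names are hypothetical, and the paths and port are the ones assumed throughout this post.

```shell
#!/bin/sh
# Assumptions: promtool lives in the install directory; Prometheus listens on :8080.
PROMTOOL="${PROMTOOL:-/usr/local/prometheus/promtool}"
PROM_URL="${PROM_URL:-http://localhost:8080}"

# The rule's threshold, 1024*1024*1024 bytes, is 1 GiB.
threshold_bytes=$((1024 * 1024 * 1024))

# Syntax-check one or more rule files.
check_rules() {
    "$PROMTOOL" check rules "$@"
}

# Evaluate a PromQL expression once through the HTTP API and print the JSON reply.
query_instant() {
    curl -fsS -G "$PROM_URL/api/v1/query" --data-urlencode "query=$1"
}

# Example calls (they need the files and a running Prometheus):
# check_rules /usr/local/prometheus/alterRules/*.rules
# query_instant 'delta(namedprocess_namegroup_memory_bytes{memtype="resident"}[3h])'
```

Running the expression without the comparison shows the actual deltas, which makes it easier to judge whether the 1 GiB threshold is sensible for your machines.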
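II. Installing node_exporter
The most commonly used exporter for monitoring overall system state.
1. Search for and download node_exporter, then move all the files from the extracted package to /usr/local/node_exporter.
2. Configure startup via systemctl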
    vi /etc/systemd/system/node-exporter.service

    [Unit]
    Description=Prometheus Node Exporter
    After=network.target

    [Service]
    WorkingDirectory=/usr/local/node_exporter
    Restart=on-failure
    ExecStart=/usr/local/node_exporter/node_exporter

    [Install]
    WantedBy=multi-user.target

    systemctl daemon-reload
    systemctl enable node-exporter
    systemctl start node-exporter
    systemctl status node-exporter
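Once the service is running, a quick request against its /metrics endpoint confirms that Prometheus will be able to scrape it. This is a small sketch; check_exporter is a made-up helper name, and 9100 is the port used for node_exporter in the scrape configuration earlier.

```shell
#!/bin/sh
# Report whether an exporter answers on /metrics. $1 is host:port.
check_exporter() {
    if curl -fsS --max-time 5 "http://$1/metrics" >/dev/null 2>&1; then
        echo "$1 up"
    else
        echo "$1 down"
        return 1
    fi
}

# Example, using the targets from the scrape configuration:
# check_exporter 9.30.2.8:9100
# check_exporter 9.30.2.9:9100
```

The same helper works for any exporter in this post, since they all expose /metrics over plain HTTP.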
III. Installing process-exporter
An exporter that can be used to monitor process state.
1. Install it the same way as node-exporter above.
2. Configure startup via systemctl
    vi /etc/systemd/system/process-exporter.service

    [Unit]
    Description=Prometheus Process Exporter
    After=network.target

    [Service]
    WorkingDirectory=/usr/local/process_exporter
    Restart=on-failure
    ExecStart=/usr/local/process_exporter/process-exporter -config.path=/usr/local/process_exporter/process-exporter.yaml

    [Install]
    WantedBy=multi-user.target

    systemctl daemon-reload
    systemctl enable process-exporter
    systemctl start process-exporter
    systemctl status process-exporter
Configuration file template
    process_names:
      - name: "{{.Matches}}"
        cmdline:
        - 'mysqld'

      - name: "{{.Matches}}"
        cmdline:
        - 'nginx'
Several naming templates are available for matching; only {{.Matches}} is used here.
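Because the groupname label is generated from the name template, it is worth checking what the exporter actually emits before writing alert rules against it. The sketch below extracts the distinct groupname values from the exporter's metrics output; list_groupnames is a hypothetical helper, and port 9256 is the one used for process-exporter in the scrape configuration earlier.

```shell
#!/bin/sh
# Read Prometheus exposition text on stdin and print the unique
# groupname label values found on namedprocess_namegroup_num_procs.
list_groupnames() {
    sed -n 's/^namedprocess_namegroup_num_procs{.*groupname="\([^"]*\)".*/\1/p' | sort -u
}

# Example (needs a running process-exporter):
# curl -s http://localhost:9256/metrics | list_groupnames
```

Whatever this prints is the exact string to use in a selector such as groupname="..." in an alert expression.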
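IV. Alertmanager
Alertmanager is optional: by default, alerts are also visible in Prometheus's own web UI, so for temporary use it does not need to be installed.
1. Install it the same way as above.
2. Configure startup via systemctl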
    vim /etc/systemd/system/alertmanager.service

    [Unit]
    Description=alertmanager
    After=network.target

    [Service]
    WorkingDirectory=/usr/local/alertmanager
    Restart=on-failure
    ExecStart=/usr/local/alertmanager/alertmanager --config.file=/usr/local/alertmanager/alertmanager.yml --log.level=debug --log.format=json

    [Install]
    WantedBy=multi-user.target
Configuration file template:
    route:
      group_by: ['alertname']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 1h
      receiver: 'web.hook'
      #receiver: 'wechat'
    receivers:
    - name: 'web.hook'
      webhook_configs:
      - url: 'http://127.0.0.1:5001/'
    inhibit_rules:
      - source_match:
          severity: 'critical'
        target_match:
          severity: 'warning'
        equal: ['alertname', 'dev', 'instance']
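To confirm that routing and the webhook receiver behave as expected, the configuration can be checked with amtool and a hand-made alert can be pushed through the Alertmanager v2 HTTP API. This is a sketch under assumptions: amtool is taken to sit in the install directory next to the alertmanager binary, Alertmanager is taken to listen on its default port 9093, and the helper names are hypothetical.

```shell
#!/bin/sh
# Assumptions: default port 9093; amtool shipped alongside alertmanager.
AM_URL="${AM_URL:-http://localhost:9093}"
AMTOOL="${AMTOOL:-/usr/local/alertmanager/amtool}"

# Validate an Alertmanager configuration file.
check_am_config() {
    "$AMTOOL" check-config "$1"
}

# Build a one-alert JSON payload: $1 = alertname, $2 = severity.
test_alert_payload() {
    printf '[{"labels":{"alertname":"%s","severity":"%s"}}]' "$1" "$2"
}

# Post the alert; Alertmanager then routes it to the configured receiver.
send_test_alert() {
    test_alert_payload "$1" "$2" |
        curl -fsS -X POST -H 'Content-Type: application/json' \
            --data-binary @- "$AM_URL/api/v2/alerts"
}

# Example (needs a running Alertmanager):
# check_am_config /usr/local/alertmanager/alertmanager.yml
# send_test_alert TestAlert warning
```

With the route above, the test alert should reach the web.hook receiver after the 30s group_wait has elapsed.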
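V. Grafana
This one matters most: visual monitoring of the data is mainly done in Grafana.
1. Installation
With a package repository configured, Grafana can be installed directly through yum.
Without one, download the rpm package and install the local file:
    sudo yum install grafana-7.1.5-1.x86_64.rpm
2. Changing the port
In some environments the default port 3000 is not open, so the default port has to be changed. The steps below switch it to port 80: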
    setcap 'cap_net_bind_service=+ep' /usr/sbin/grafana-server

    vim /etc/grafana/grafana.ini
    http_port = 80

    systemctl edit grafana-server.service

    [Service]
    # Give the CAP_NET_BIND_SERVICE capability
    CapabilityBoundingSet=CAP_NET_BIND_SERVICE
    AmbientCapabilities=CAP_NET_BIND_SERVICE

    # A private user cannot have process capabilities on the host's user
    # namespace and thus CAP_NET_BIND_SERVICE has no effect.
    PrivateUsers=false
3. Starting
    systemctl start grafana-server
4. Connecting the data source
Open the Grafana web UI (the default username/password is admin/admin), then add Prometheus as a data source under Configuration -> Data Sources.
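The same data source can also be registered without the web UI, through Grafana's HTTP API. The sketch below is one possible way to do it and rests on several assumptions: Grafana is reachable on its default port 3000 (use 80 if the port was changed as described above), the admin/admin credentials are still valid, and Prometheus listens on :8080. The helper names are hypothetical.

```shell
#!/bin/sh
# Assumptions: default Grafana port and default admin credentials.
GRAFANA_URL="${GRAFANA_URL:-http://localhost:3000}"
GRAFANA_AUTH="${GRAFANA_AUTH:-admin:admin}"

# Build the JSON body: $1 = data source name, $2 = Prometheus URL.
datasource_payload() {
    printf '{"name":"%s","type":"prometheus","url":"%s","access":"proxy"}' "$1" "$2"
}

# Create the data source via POST /api/datasources.
add_datasource() {
    datasource_payload "$1" "$2" |
        curl -fsS -X POST -u "$GRAFANA_AUTH" -H 'Content-Type: application/json' \
            --data-binary @- "$GRAFANA_URL/api/datasources"
}

# Example (needs a running Grafana):
# add_datasource Prometheus http://localhost:8080
```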
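5. Downloading and installing dashboards
Many community-contributed dashboards are available on the Grafana website. Search for a template you like by exporter type, download it, and import the JSON file under Dashboard -> Import; it is then ready to view.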