Some problems encountered with Pushgateway
Preface
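Two problems came up in production:
- Metrics pushed through the Pushgateway (pgw) kept being updated in Prometheus even after we stopped pushing monitoring data.
- Metrics pushed through the pgw were overwriting each other.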
1. Solution to problem one
1.1 Check the documentation
- When monitoring multiple instances through a single Pushgateway, the Pushgateway becomes both a single point of failure and a potential bottleneck.
- You lose Prometheus's automatic instance health monitoring via the up metric (generated on every scrape).
- The Pushgateway never forgets series pushed to it and will expose them to Prometheus forever unless those series are manually deleted via the Pushgateway's API.
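1.2 Solution
Metrics can only be deleted through the Pushgateway API; the Prometheus API cannot remove them. You can either wipe everything or delete a single group. The first command below wipes all metrics (the admin API has to be enabled with --web.enable-admin-api when starting the Pushgateway); the second deletes one group, whose path follows the pattern /metrics/job/<job>/<label_name>/<label_value>.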
curl -X PUT http://127.0.0.1:9099/api/v1/admin/wipe
curl -X DELETE http://127.0.0.1:9099/metrics/job/auto_wx_friend_from_pgw/process_name/5ENDU19620000906/grouping_src_instance/192.168.61.153
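The same per-group delete can be scripted. The following is a minimal sketch using only the Python standard library; the helper names `group_url` and `delete_group` are hypothetical, and it assumes plain label values (empty values or values containing "/" need the base64 form described in the Pushgateway documentation).

```python
from urllib.parse import quote
from urllib.request import Request, urlopen


def group_url(base_url: str, job: str, grouping_key: dict) -> str:
    """Build the URL of one group: /metrics/job/<job>/<label>/<value>/..."""
    parts = ["metrics", "job", job]
    for label, value in grouping_key.items():
        parts += [label, value]
    for part in parts:
        # Assumption: plain segments only; anything else needs base64 encoding.
        if not part or "/" in part:
            raise ValueError(f"unsupported path segment: {part!r}")
    return base_url.rstrip("/") + "/" + "/".join(quote(p, safe="") for p in parts)


def delete_group(base_url: str, job: str, grouping_key: dict, timeout: float = 5.0) -> int:
    """Send DELETE for one group and return the HTTP status code."""
    request = Request(group_url(base_url, job, grouping_key), method="DELETE")
    with urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling `delete_group("http://127.0.0.1:9099", "auto_wx_friend_from_pgw", {"process_name": "5ENDU19620000906", "grouping_src_instance": "192.168.61.153"})` would target the same URL as the curl DELETE command above. If prometheus_client is already installed, its `delete_from_gateway` function serves the same purpose.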
2. Solution to problem two
2.1 Push code
import socket

from prometheus_client import CollectorRegistry, Gauge, push_to_gateway


class PushGateWayPrometheus:
    """
    Push metrics to a Pushgateway.
    """

    def __init__(self):
        # Each instance owns its registry, so create a new instance per push;
        # registering the same metric name twice in one registry raises an error.
        self.registry = CollectorRegistry()
        self.gateway = '192.168.60.203:9099'
        # Label names; label values must be supplied in the same order
        self.label_name = ['src_instance', 'process_name']
        self.src_ip_label_value = socket.gethostbyname(socket.gethostname())
        # No need to change
        self.job = 'auto_wx_friend_from_pgw'
        self.request_timeout = 5

    def gauge_process_alive(self, metric_name: str, describe: str, process_name: str) -> None:
        """
        A value of 1 means the application is still alive.
        :param metric_name: name of the gauge
        :param describe: help text of the gauge
        :param process_name: value of the process_name label
        :return:
        """
        g = Gauge(metric_name, describe, registry=self.registry,
                  labelnames=self.label_name)
        g.labels(self.src_ip_label_value, process_name).set(1)

    def push(self, metric_name: str, describe: str, process_name: str) -> None:
        """
        Push the metric; to report a new metric, just add another call.
        :param metric_name: name of the gauge
        :param describe: help text of the gauge
        :param process_name: value of the process_name label, also used in the grouping key
        :return:
        """
        self.gauge_process_alive(metric_name, describe, process_name)
        push_to_gateway(self.gateway, job=self.job, registry=self.registry, timeout=self.request_timeout,
                        grouping_key={"process_name": process_name, "grouping_src_instance": self.src_ip_label_value})


# Nothing to change below
PushGateWayPrometheus().push('job_last_success_unixtime', 'Last time a batch job successfully finished',
                             'ce0717179055de32027e')
PushGateWayPrometheus().push('job_last_success_unixtime', 'Last time a batch job successfully finished',
                             '5ENDU19620000906')
PushGateWayPrometheus().push('job_last_success_unixtime', 'Last time a batch job successfully finished',
                             'ce071717fdf178a20c7e')
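2.2 Specify grouping_key
Metrics are grouped by the values in grouping_key; by default they are grouped by job only. push_to_gateway replaces every metric in the target group, so pushes that share a job and pass no grouping_key overwrite each other, which is what caused problem two.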
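To make the overwrite behaviour concrete, here is a deliberately simplified in-memory model of the grouping rule. It is not Pushgateway's implementation; the names `make_group_key` and `put` are hypothetical, and the store is just a dictionary keyed by job plus grouping labels.

```python
from typing import Dict, Optional, Tuple

# One entry per group; the key is the job plus the sorted grouping labels.
GroupKey = Tuple[Tuple[str, str], ...]


def make_group_key(job: str, grouping_key: Optional[Dict[str, str]] = None) -> GroupKey:
    labels = {"job": job, **(grouping_key or {})}
    return tuple(sorted(labels.items()))


def put(store: Dict[GroupKey, Dict[str, float]], job: str, metrics: Dict[str, float],
        grouping_key: Optional[Dict[str, str]] = None) -> None:
    """Model of a replacing push: everything stored under the same group is replaced."""
    store[make_group_key(job, grouping_key)] = dict(metrics)


processes = ["ce0717179055de32027e", "5ENDU19620000906", "ce071717fdf178a20c7e"]

# Pushing with the job name only: every push lands in the same group.
by_job: Dict[GroupKey, Dict[str, float]] = {}
for name in processes:
    put(by_job, "auto_wx_friend_from_pgw",
        {f'job_last_success_unixtime{{process_name="{name}"}}': 1})

# Pushing with a per-process grouping key: each process gets its own group.
by_process: Dict[GroupKey, Dict[str, float]] = {}
for name in processes:
    put(by_process, "auto_wx_friend_from_pgw",
        {f'job_last_success_unixtime{{process_name="{name}"}}': 1},
        grouping_key={"process_name": name})

print(len(by_job), len(by_process))  # 1 3
```

In the job-only case only the last pushed series survives, while the per-process grouping key keeps all three.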
2.3 Check how much data was produced after pushing the metrics
[root@www pushgateway-1.5.1.linux-amd64]# curl -s "http://192.168.60.203:9099/metrics"|grep "auto_wx_friend_from_pgw"
job_last_success_unixtime{grouping_src_instance="192.168.61.153",instance="",job="auto_wx_friend_from_pgw",process_name="5ENDU19620000906",src_instance="192.168.61.153"} 1
job_last_success_unixtime{grouping_src_instance="192.168.61.153",instance="",job="auto_wx_friend_from_pgw",process_name="ce0717179055de32027e",src_instance="192.168.61.153"} 1
job_last_success_unixtime{grouping_src_instance="192.168.61.153",instance="",job="auto_wx_friend_from_pgw",process_name="ce071717fdf178a20c7e",src_instance="192.168.61.153"} 1
push_failure_time_seconds{grouping_src_instance="192.168.61.153",instance="",job="auto_wx_friend_from_pgw",process_name="5ENDU19620000906"} 0
push_failure_time_seconds{grouping_src_instance="192.168.61.153",instance="",job="auto_wx_friend_from_pgw",process_name="ce0717179055de32027e"} 0
push_failure_time_seconds{grouping_src_instance="192.168.61.153",instance="",job="auto_wx_friend_from_pgw",process_name="ce071717fdf178a20c7e"} 0
push_time_seconds{grouping_src_instance="192.168.61.153",instance="",job="auto_wx_friend_from_pgw",process_name="5ENDU19620000906"} 1.6793950591862314e+09
push_time_seconds{grouping_src_instance="192.168.61.153",instance="",job="auto_wx_friend_from_pgw",process_name="ce0717179055de32027e"} 1.6793950501798096e+09
push_time_seconds{grouping_src_instance="192.168.61.153",instance="",job="auto_wx_friend_from_pgw",process_name="ce071717fdf178a20c7e"} 1.6793950681916375e+09
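The push_time_seconds series in this output also helps with problem one, because it records when each group was last pushed. The sketch below is one possible way to find groups that have gone quiet; the function name `stale_groups` and the threshold are hypothetical, and it assumes that empty labels such as instance="" are not part of the grouping key.

```python
import re
from typing import Dict, List

SAMPLE_RE = re.compile(r'^push_time_seconds\{(?P<labels>.*)\}\s+(?P<value>\S+)$')
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def stale_groups(metrics_text: str, now: float, max_age: float) -> List[Dict[str, str]]:
    """Return the labels of every group whose last push is older than max_age seconds."""
    stale = []
    for line in metrics_text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if not match:
            continue  # not a push_time_seconds sample
        if now - float(match.group("value")) > max_age:
            # Assumption: empty labels (instance="") do not belong to the group.
            labels = {k: v for k, v in LABEL_RE.findall(match.group("labels")) if v}
            stale.append(labels)
    return stale


SAMPLE = '''
push_time_seconds{grouping_src_instance="192.168.61.153",instance="",job="auto_wx_friend_from_pgw",process_name="5ENDU19620000906"} 1.6793950591862314e+09
push_time_seconds{grouping_src_instance="192.168.61.153",instance="",job="auto_wx_friend_from_pgw",process_name="ce0717179055de32027e"} 1.6793950501798096e+09
push_time_seconds{grouping_src_instance="192.168.61.153",instance="",job="auto_wx_friend_from_pgw",process_name="ce071717fdf178a20c7e"} 1.6793950681916375e+09
'''

# With "now" fixed at 1679395070 and a 15 second limit, only one group is stale.
for group in stale_groups(SAMPLE, now=1679395070.0, max_age=15.0):
    print(group["process_name"])  # ce0717179055de32027e
```

In practice `now` would be `time.time()` and the text would come from the Pushgateway's /metrics endpoint; the returned labels (job plus the grouping labels) are what a per-group DELETE request needs.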