Prometheus Operator Network Probing: Blackbox Exporter


White-Box Monitoring

We monitor host resource usage, container runtime state, and the operational metrics of databases and middleware. These are the infrastructure that supports our business and services; white-box monitoring reveals their actual internal state, and by observing metrics we can anticipate problems and address potential risks before they materialize. For complete monitoring coverage, however, the extensive white-box monitoring of applications should be complemented with appropriate black-box monitoring. Black-box monitoring tests a service's external visibility from the user's perspective; common examples include HTTP and TCP probes that check whether a site or service is reachable and how quickly it responds.

Black-Box Monitoring

The key difference from white-box monitoring is that black-box monitoring is failure-oriented: when a failure occurs, black-box monitoring detects it quickly, whereas white-box monitoring focuses on proactively discovering or predicting potential problems. A complete monitoring setup should be able to uncover potential issues from the white-box perspective and rapidly detect failures that have already happened from the black-box perspective.

Blackbox Exporter is the official black-box monitoring solution from the Prometheus community. It can probe the network over HTTP, DNS, TCP, and ICMP, including TLS/SSL checks.

https://github.com/prometheus/blackbox_exporter

Deploying blackbox_exporter

For the full set of configuration options, see:

https://github.com/prometheus/blackbox_exporter/blob/master/example.yml

cat > /root/tools/exporter/blackexporter.yaml <<EOF
apiVersion: v1
data:
  config.yml: |
    modules:
      http_2xx:
        prober: http
        http:
          method: GET
          preferred_ip_protocol: "ip4"
      http_post_2xx:
        prober: http
        http:
          method: POST
          preferred_ip_protocol: "ip4"
      tcp_connect:
        prober: tcp
      icmp:
        prober: icmp
        timeout: 3s
        icmp:
          preferred_ip_protocol: "ip4"
      dns_tcp:
        prober: dns
        timeout: 5s
        dns:
          transport_protocol: "tcp"
          preferred_ip_protocol: "ip4"
          query_name: "kubernetes.default.svc.cluster.local"
          query_type: "A"
kind: ConfigMap
metadata:
  name: blackbox-exporter
  namespace: monitoring
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    name: blackbox-exporter
    cluster: ali-huabei2-dev
  name: blackbox-exporter
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      name: blackbox-exporter
  strategy: {}
  template:
    metadata:
      labels:
        name: blackbox-exporter
        cluster: ali-huabei2-dev
    spec:
      containers:
      - image: prom/blackbox-exporter:v0.16.0
        name: blackbox-exporter
        ports:
        - containerPort: 9115
        volumeMounts:
        - name: config
          mountPath: /etc/blackbox_exporter
        args:
        - --config.file=/etc/blackbox_exporter/config.yml
        - --log.level=info
      volumes:
      - name: config
        configMap:
          name: blackbox-exporter
---
apiVersion: v1
kind: Service
metadata:
  labels:
    name: blackbox-exporter
    cluster: ali-huabei2-dev
  name: blackbox-exporter
  namespace: monitoring
spec:
  selector:
    name: blackbox-exporter
  ports:
  - name: http-metrics
    port: 9115
    targetPort: 9115
  type: LoadBalancer
EOF

Apply the file, then check the Pod and Service; you can open the exporter in a browser. If you modify the ConfigMap, remember to restart the Pod.

kubectl apply -f blackexporter.yaml
kubectl get svc -n monitoring
kubectl get deploy -n monitoring

Custom scrape job configuration

cat > /root/tools/exporter/prometheus-additional.yaml  <<EOF
## Check HTTP/HTTPS website availability
- job_name: "blackbox-external-website"
  scrape_interval: 30s
  scrape_timeout: 15s
  metrics_path: /probe
  params:
    module: [http_2xx]
  static_configs:
  - targets:
    - https://www.example.com # URLs to probe
    - https://test.example.com
    - https://www.baidu.com
    - http://www.sina.com.cn
    - http://www.liuyalei.top
  relabel_configs:
  - source_labels: [__address__]
    target_label: __param_target
  - source_labels: [__param_target]
    target_label: instance
  - target_label: __address__
    replacement: blackbox-exporter:9115

## Check host liveness (ICMP)
- job_name: 'blackbox-node-status'
  metrics_path: /probe
  params:
    module: [icmp]
  static_configs:
    - targets: ['127.0.0.1','192.168.8.101','192.168.0.0']
      labels:
        instance: 'node_status'
        group: 'node'
  relabel_configs:
  - source_labels: [__address__]
    target_label: __param_target
  - source_labels: [__param_target]
    target_label: instance
  - target_label: __address__
    replacement: blackbox-exporter:9115

## Check host port availability
- job_name: 'blackbox-port-status'
  metrics_path: /probe
  params:
    module: [tcp_connect]
  static_configs:
    - targets: ['127.0.0.1:9100','127.0.0.1:9090','192.168.8.102:22']
      labels:
        instance: 'port_status'
        group: 'tcp'
  relabel_configs:
  - source_labels: [__address__]
    target_label: __param_target
  - source_labels: [__param_target]
    target_label: instance
  - target_label: __address__
    replacement: blackbox-exporter:9115
EOF

    The line `replacement: blackbox-exporter:9115` must point at the blackbox-exporter Service's name and port!!
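The three relabeling steps above determine the URL that Prometheus actually scrapes. A minimal shell sketch of the chain (the variable names mirror the Prometheus labels; the exporter address is the Service defined earlier):

```shell
# Walk through the relabel chain for one target (illustration only).
__address__="https://www.baidu.com"      # original target from static_configs
__param_target="$__address__"            # 1) __address__ -> __param_target (becomes ?target=)
instance="$__param_target"               # 2) __param_target -> instance label
__address__="blackbox-exporter:9115"     # 3) __address__ rewritten to the exporter itself
# Prometheus therefore scrapes:
echo "http://${__address__}/probe?module=http_2xx&target=${__param_target}"
# -> http://blackbox-exporter:9115/probe?module=http_2xx&target=https://www.baidu.com
```

In other words, the exporter is scraped on behalf of every target, and the original target survives only as the `target` query parameter and the `instance` label.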

Create the Secret object for the jobs

kubectl create secret generic additional-configs --from-file=prometheus-additional.yaml -n monitoring

Once created, the configuration above is base64-encoded and stored as the value of the prometheus-additional.yaml key:

[root@master01 exporter]# kubectl  -n monitoring get secrets additional-configs  -o yaml
apiVersion: v1
data:
  prometheus-additional.yaml: LSBqb2JfbmFtZTogJ2t1YmVybmV0ZXMtc2VydmljZS1lbmRwb2ludHMnCiAga3ViZXJuZXRlc19zZF9jb25maWdzOgogIC0gcm9sZTogZW5kcG9pbnRzCiAgcmVsYWJlbF9jb25maWdzOgogIC0gc291cmNlX2xhYmVsczogW19fbWV0YV9rdWJlcm5ldGVzX3NlcnZpY2VfYW5ub3RhdGlvbl9wcm9tZXRoZXVzX2lvX3NjcmFwZV0KICAgIGFjdGlvbjoga2VlcAogICAgcmVnZXg6IHRydWUKICAtIHNvdXJjZV9sYWJlbHM6IFtfX21ldGFfa3ViZXJuZXRlc19zZXJ2aWNlX2Fubm90YXRpb25fcHJvbWV0aGV1c19pb19zY2hlbWVdCiAgICBhY3Rpb246IHJlcGxhY2UKICAgIHRhcmdldF9sYWJlbDogX19zY2hlbWVfXwogICAgcmVnZXg6IChodHRwcz8pCiAgLSBzb3VyY2VfbGFiZWxzOiBbX19tZXRhX2t1YmVybmV0ZXNfc2VydmljZV9hbm5vdGF0aW9uX3Byb21ldGhldXNfaW9fcGF0aF0KICAgIGFjdGlvbjogcmVwbGFjZQogICAgdGFyZ2V0X2xhYmVsOiBfX21ldHJpY3NfcGF0aF9fCiAgICByZWdleDogKC4rKQogIC0gc291cmNlX2xhYmVsczogW19fYWRkcmVzc19fLCBfX21ldGFfa3ViZXJuZXRlc19zZXJ2aWNlX2Fubm90YXRpb25fcHJvbWV0aGV1c19pb19wb3J0XQogICAgYWN0aW9uOiByZXBsYWNlCiAgICB0YXJnZXRfbGFiZWw6IF9fYWRkcmVzc19fCiAgICByZWdleDogKFteOl0rKSg/OjpcZCspPzsoXGQrKQogICAgcmVwbGFjZW1lbnQ6ICQxOiQyCiAgLSBhY3Rpb246IGxhYmVsbWFwCiAgICByZWdleDogX19tZXRhX2t1YmVybmV0ZXNfc2VydmljZV9sYWJlbF8oLispCiAgLSBzb3VyY2VfbGFiZWxzOiBbX19tZXRhX2t1YmVybmV0ZXNfbmFtZXNwYWNlXQogICAgYWN0aW9uOiByZXBsYWNlCiAgICB0YXJnZXRfbGFiZWw6IGt1YmVybmV0ZXNfbmFtZXNwYWNlCiAgLSBzb3VyY2VfbGFiZWxzOiBbX19tZXRhX2t1YmVybmV0ZXNfc2VydmljZV9uYW1lXQogICAgYWN0aW9uOiByZXBsYWNlCiAgICB0YXJnZXRfbGFiZWw6IGt1YmVybmV0ZXNfbmFtZQotIGpvYl9uYW1lOiAiYmxhY2tib3gtZXh0ZXJuYWwtd2Vic2l0ZSIKICBzY3JhcGVfaW50ZXJ2YWw6IDMwcwogIHNjcmFwZV90aW1lb3V0OiAxNXMKICBtZXRyaWNzX3BhdGg6IC9wcm9iZQogIHBhcmFtczoKICAgIG1vZHVsZTogW2h0dHBfMnh4XQogIHN0YXRpY19jb25maWdzOgogIC0gdGFyZ2V0czoKICAgIC0gaHR0cHM6Ly93d3cuZXhhbXBsZS5jb20gIyDopoHmo4Dmn6XnmoTnvZHlnYAKICAgIC0gaHR0cHM6Ly90ZXN0LmV4YW1wbGUuY29tCiAgcmVsYWJlbF9jb25maWdzOgogIC0gc291cmNlX2xhYmVsczogW19fYWRkcmVzc19fXQogICAgdGFyZ2V0X2xhYmVsOiBfX3BhcmFtX3RhcmdldAogIC0gc291cmNlX2xhYmVsczogW19fcGFyYW1fdGFyZ2V0XQogICAgdGFyZ2V0X2xhYmVsOiBpbnN0YW5jZQogIC0gdGFyZ2V0X2xhYmVsOiBfX2FkZHJlc3NfXwogICAgcmVwbGFjZW1lbnQ6IGJsYWNrYm94LWV4cG9ydGVyOjkxMTUK
kind: Secret
metadata:
  creationTimestamp: "2020-09-18T14:02:52Z"
  name: additional-configs
  namespace: monitoring
  resourceVersion: "109161"
  selfLink: /api/v1/namespaces/monitoring/secrets/additional-configs
  uid: cd24759a-bac9-4fbe-b744-9d48728a8e96
type: Opaque
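You can reproduce the encoding kubectl performed above locally (the string here is made up; base64 is reversible):

```shell
# kubectl stores Secret data base64-encoded; the same transformation by hand:
printf 'job_name: demo\n' | base64
# -> am9iX25hbWU6IGRlbW8K
printf 'am9iX25hbWU6IGRlbW8K' | base64 -d
# -> job_name: demo
```

Decoding the real Secret's value the same way is a quick sanity check that the file content made it in unchanged.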

Add this extra configuration to the manifest that declares the Prometheus resource (prometheus-prometheus.yaml):

Add:

additionalScrapeConfigs:
  name: additional-configs
  key: prometheus-additional.yaml

[root@master01 exporter]# kubectl  -n monitoring get prometheus -o yaml
apiVersion: v1
items:
- apiVersion: monitoring.coreos.com/v1
  kind: Prometheus
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"monitoring.coreos.com/v1","kind":"Prometheus","metadata":{"annotations":{},"labels":{"prometheus":"k8s"},"name":"k8s","namespace":"monitoring"},"spec":{"alerting":{"alertmanagers":[{"name":"alertmanager-main","namespace":"monitoring","port":"web"}]},"baseImage":"quay.io/prometheus/prometheus","nodeSelector":{"kubernetes.io/os":"linux"},"podMonitorSelector":{},"replicas":2,"resources":{"requests":{"memory":"400Mi"}},"ruleSelector":{"matchLabels":{"prometheus":"k8s","role":"alert-rules"}},"securityContext":{"fsGroup":2000,"runAsNonRoot":true,"runAsUser":1000},"serviceAccountName":"prometheus-k8s","serviceMonitorNamespaceSelector":{},"serviceMonitorSelector":{},"version":"v2.11.0"}}
    creationTimestamp: "2020-08-29T20:00:47Z"
    generation: 4
    labels:
      prometheus: k8s
    name: k8s
    namespace: monitoring
    resourceVersion: "108147"
    selfLink: /apis/monitoring.coreos.com/v1/namespaces/monitoring/prometheuses/k8s
    uid: b71abf71-be3c-457e-8cc2-244d0b38612a
  spec:
    additionalScrapeConfigs:
      key: prometheus-additional.yaml
      name: additional-configs
    alerting:
      alertmanagers:
      - name: alertmanager-main
        namespace: monitoring
        port: web
    baseImage: quay.io/prometheus/prometheus
    nodeSelector:
      kubernetes.io/os: linux
    podMonitorSelector: {}
    replicas: 2
    resources:
      requests:
        memory: 400Mi
    ruleSelector:
      matchLabels:
        prometheus: k8s
        role: alert-rules
    securityContext:
      fsGroup: 2000
      runAsNonRoot: true
      runAsUser: 1000
    serviceAccountName: prometheus-k8s
    serviceMonitorNamespaceSelector: {}
    serviceMonitorSelector: {}
    version: v2.11.0
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

After adding it, update the prometheus CRD resource directly:

kubectl apply -f prometheus-prometheus.yaml
prometheus.monitoring.coreos.com "k8s" configured

Reloading the Prometheus configuration

Delete the Secret first, then recreate it to reload Prometheus; this has to be done every time a job is added.

Open Prometheus's Status → Targets page and you will see the blackbox-external-website job defined above (it can take a while to appear). Querying probe_success shows the probe status.

kubectl delete secrets -n monitoring additional-configs
kubectl create secret generic additional-configs --from-file=prometheus-additional.yaml -n monitoring

Above we wired up Blackbox monitoring through Prometheus's own scrape configuration; next we integrate Blackbox with Prometheus through the Operator.

Integrating Blackbox via ServiceMonitor

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: a6-blackbox-exporter
  namespace: monitoring
spec:
  endpoints:
### Check website availability (like curl)
  - interval: 30s  # scrape interval
    params:
      module:
        - http_2xx   # probe module
      target:
        - https://blog.csdn.net
    path: "/probe"
    port: http-metrics   # Service port name
    scheme: http    # scheme used for scraping
    scrapeTimeout: 30s   # scrape timeout
    relabelings:
      - sourceLabels:
          - __param_target
        targetLabel: target
      - sourceLabels:
          - __param_target
        targetLabel: instance
### Check host liveness (like ping)
  - interval: 30s
    params:
      module:
        - icmp
      target:
        - 192.168.8.11
    path: "/probe"
    port: http-metrics
    scheme: http
    scrapeTimeout: 30s
    relabelings:
      - sourceLabels:
          - __param_target
        targetLabel: target
      - sourceLabels:
          - __param_target
        targetLabel: instance
### Check port availability (like telnet)
  - interval: 30s
    params:
      module:
        - tcp_connect
      target:
        - 192.168.8.11:22
    path: "/probe"
    port: http-metrics
    scrapeTimeout: 30s
    relabelings:
      - sourceLabels:
          - __param_target
        targetLabel: target
      - sourceLabels:
          - __param_target
        targetLabel: instance

  namespaceSelector:
    matchNames:
    - monitoring
  selector:
    matchLabels:
      name: blackbox-exporter    # must match the labels on the blackbox-exporter Service!! Remember to label the svc, or nothing will match

A monitoring target defined by a ServiceMonitor must specify a namespaceSelector; to match all namespaces, change it to:

namespaceSelector:
  any: true

Prometheus links to ServiceMonitors through serviceMonitorNamespaceSelector/serviceMonitorSelector. If the Prometheus resource defines label selectors there, the ServiceMonitor must carry matching labels, or it will not be picked up.

[root@master01 exporter]# kubectl  -n monitoring get prometheus -o yaml
....
    serviceAccountName: prometheus-k8s
    serviceMonitorNamespaceSelector: {}
    serviceMonitorSelector: {}
....


Detailed ServiceMonitor endpoint configuration:

https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/api.md#endpoint

relabelings rewrite labels and support regular expressions:

https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config

Caveat:

A ServiceMonitor does not support merging probes: to monitor multiple URLs you have to add each one individually. Later versions will likely improve this (https://github.com/prometheus-operator/prometheus-operator/issues/2821).

Grafana configuration

The Grafana deployed alongside Prometheus by default has no persistence; modify the Deployment to persist its data locally.

# emptyDir: data is lost when the container is destroyed
      - emptyDir: {}
        name: grafana-storage


Replace it with a hostPath:

volumes:
- name: grafana-storage
  hostPath:
    path: /tmp/grafana
    type: DirectoryOrCreate
chmod 777 /tmp/grafana

Installing plugins

[root@master01 exporter]# kubectl  -n monitoring exec -it grafana-5cd56df4cd-jtbl7  bash
nobody@grafana-5cd56df4cd-jtbl7:/usr/share/grafana$ grafana-cli plugins install grafana-piechart-panel

Delete the pod to restart Grafana, then import dashboard 11543 from the official dashboard site.

Monitoring domain certificate expiry

  - alert: "SSL certificate expiry warning"
    expr: (probe_ssl_earliest_cert_expiry - time())/86400 < 10
    for: 1h
    labels:
      severity: warn
    annotations:
      description: 'The certificate for {{$labels.instance}} expires in {{ printf "%.1f" $value }} days; please renew it soon'
      summary: "SSL certificate expiry warning"
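The alert expression divides the seconds remaining until expiry by 86400 to get days. With made-up timestamps, the arithmetic works out as follows (shell uses integer division here, while PromQL divides as floats, but the threshold comparison behaves the same):

```shell
# Hypothetical values: the certificate expires 800000 s from "now".
expiry=1700000000   # probe_ssl_earliest_cert_expiry (made-up)
now=1699200000      # time() (made-up)
days=$(( (expiry - now) / 86400 ))
echo "$days"        # -> 9, which is < 10, so the alert fires after pending for 1h
```

The `for: 1h` clause means the condition must hold for a full hour before the alert actually fires, which filters out transient probe failures.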



References:

https://zhuanlan.zhihu.com/p/103095462

https://www.qikqiak.com/post/prometheus-operator-advance/

https://www.voidking.com/dev-prometheus-operator-blackbox-exporter/

https://yunlzheng.gitbook.io/prometheus-book/part-ii-prometheus-jin-jie/exporter/commonly-eporter-usage/install_blackbox_exporter#yu-prometheus-ji-cheng



YaLei
