
Prometheus Learning Series, Part 11: Configuring Prometheus Alerting Rules (LinuxPanda)


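Alerting rules for the Prometheus monitoring system are configured in the Prometheus server itself. Prometheus supports two types of rules: recording rules and alerting rules. Recording rules exist mainly to shorten alerting expressions and make them reusable; alerting rules are what actually decide whether an alert should fire. Alerting rules can reference recording rules.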
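As a minimal sketch of how the two rule types fit together (the group name, the recorded series name `job:up:avg`, the alert name, and the 0.5 threshold are all made up for illustration and are not part of the rule files in this article):

```yaml
groups:
  - name: example            # hypothetical group name
    rules:
      # Recording rule: precompute an expression and store it under a new series name.
      - record: job:up:avg   # hypothetical recorded series name
        expr: avg by (job) (up)

      # Alerting rule: reuses the recorded series instead of repeating the expression.
      - alert: JobMostlyDown # hypothetical alert name
        expr: job:up:avg < 0.5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Fewer than half of the targets in job {{ $labels.job }} are up"
```

Rules inside one group are evaluated in order, so placing the recording rule before the alerting rule that uses it keeps both working from the same evaluation cycle.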

Below are the node-exporter recording rules and alerting rules I have put together.

node-exporter-record-rules.yml

groups:
  - name: node-exporter-record
    rules:
    - expr: up{job=~"node-exporter"}
      record: node_exporter:up 
      labels: 
        desc: "Whether the node is up: 1 = up, 0 = down"
        unit: " "
        job: "node-exporter"
    - expr: time() - node_boot_time_seconds{}
      record: node_exporter:node_uptime
      labels: 
        desc: "Node uptime"
        unit: "s"
        job: "node-exporter"
##############################################################################################
#                              cpu                                                           #
    - expr: (1 - avg by (environment,instance) (irate(node_cpu_seconds_total{job="node-exporter",mode="idle"}[5m])))  * 100 
      record: node_exporter:cpu:total:percent
      labels: 
        desc: "Total CPU usage percentage of the node"
        unit: "%"
        job: "node-exporter"

    - expr: (avg by (environment,instance) (irate(node_cpu_seconds_total{job="node-exporter",mode="idle"}[5m])))  * 100 
      record: node_exporter:cpu:idle:percent
      labels: 
        desc: "CPU idle percentage of the node"
        unit: "%"
        job: "node-exporter"

    - expr: (avg by (environment,instance) (irate(node_cpu_seconds_total{job="node-exporter",mode="iowait"}[5m])))  * 100 
      record: node_exporter:cpu:iowait:percent
      labels: 
        desc: "CPU iowait percentage of the node"
        unit: "%"
        job: "node-exporter"


    - expr: (avg by (environment,instance) (irate(node_cpu_seconds_total{job="node-exporter",mode="system"}[5m])))  * 100 
      record: node_exporter:cpu:system:percent
      labels: 
        desc: "CPU system percentage of the node"
        unit: "%"
        job: "node-exporter"

    - expr: (avg by (environment,instance) (irate(node_cpu_seconds_total{job="node-exporter",mode="user"}[5m])))  * 100 
      record: node_exporter:cpu:user:percent
      labels: 
        desc: "CPU user percentage of the node"
        unit: "%"
        job: "node-exporter"

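    # Sum the four modes per CPU first, then average across CPUs; averaging over modes directly would understate the value.
    - expr: (avg by (environment,instance) (sum by (environment,instance,cpu) (irate(node_cpu_seconds_total{job="node-exporter",mode=~"softirq|nice|irq|steal"}[5m]))))  * 100 
      record: node_exporter:cpu:other:percent
      labels: 
        desc: "CPU percentage of the node spent in other modes (softirq, nice, irq, steal)"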
        unit: "%"
        job: "node-exporter"
##############################################################################################


##############################################################################################
#                                    memory                                                  #
    - expr: node_memory_MemTotal_bytes{job="node-exporter"}
      record: node_exporter:memory:total
      labels: 
        desc: "Total memory of the node"
        unit: byte
        job: "node-exporter"

    - expr: node_memory_MemFree_bytes{job="node-exporter"}
      record: node_exporter:memory:free
      labels: 
        desc: "Free memory of the node"
        unit: byte
        job: "node-exporter"

    - expr: node_memory_MemTotal_bytes{job="node-exporter"} - node_memory_MemFree_bytes{job="node-exporter"}
      record: node_exporter:memory:used
      labels: 
        desc: "Used memory of the node"
        unit: byte
        job: "node-exporter"

    - expr: node_memory_MemTotal_bytes{job="node-exporter"} - node_memory_MemAvailable_bytes{job="node-exporter"}
      record: node_exporter:memory:actualused
      labels: 
        desc: "Memory actually in use on the node"
        unit: byte
        job: "node-exporter"

    - expr: (1-(node_memory_MemAvailable_bytes{job="node-exporter"} / (node_memory_MemTotal_bytes{job="node-exporter"})))* 100
      record: node_exporter:memory:used:percent
      labels: 
        desc: "Memory usage percentage of the node"
        unit: "%"
        job: "node-exporter"

    - expr: ((node_memory_MemAvailable_bytes{job="node-exporter"} / (node_memory_MemTotal_bytes{job="node-exporter"})))* 100
      record: node_exporter:memory:free:percent
      labels: 
        desc: "Available memory percentage of the node"
        unit: "%"
        job: "node-exporter"
##############################################################################################
#                                   load                                                     #
    - expr: sum by (instance) (node_load1{job="node-exporter"})
      record: node_exporter:load:load1
      labels: 
        desc: "1-minute system load average"
        unit: " "
        job: "node-exporter"

    - expr: sum by (instance) (node_load5{job="node-exporter"})
      record: node_exporter:load:load5
      labels: 
        desc: "5-minute system load average"
        unit: " "
        job: "node-exporter"

    - expr: sum by (instance) (node_load15{job="node-exporter"})
      record: node_exporter:load:load15
      labels: 
        desc: "15-minute system load average"
        unit: " "
        job: "node-exporter"
   
##############################################################################################
#                                 disk                                                       #
    - expr: node_filesystem_size_bytes{job="node-exporter" ,fstype=~"ext4|xfs"}
      record: node_exporter:disk:usage:total
      labels: 
        desc: "Total disk space of the node"
        unit: byte
        job: "node-exporter"

    - expr: node_filesystem_avail_bytes{job="node-exporter",fstype=~"ext4|xfs"}
      record: node_exporter:disk:usage:free
      labels: 
        desc: "Available disk space of the node"
        unit: byte
        job: "node-exporter"

    - expr: node_filesystem_size_bytes{job="node-exporter",fstype=~"ext4|xfs"} - node_filesystem_avail_bytes{job="node-exporter",fstype=~"ext4|xfs"}
      record: node_exporter:disk:usage:used
      labels: 
        desc: "Used disk space of the node"
        unit: byte
        job: "node-exporter"

    - expr:  (1 - node_filesystem_avail_bytes{job="node-exporter",fstype=~"ext4|xfs"} / node_filesystem_size_bytes{job="node-exporter",fstype=~"ext4|xfs"}) * 100 
      record: node_exporter:disk:used:percent    
      labels: 
        desc: "Disk usage percentage of the node"
        unit: "%"
        job: "node-exporter"

    - expr: irate(node_disk_reads_completed_total{job="node-exporter"}[1m])
      record: node_exporter:disk:read:count:rate
      labels: 
        desc: "Disk read operations per second on the node"
        unit: "ops/s"
        job: "node-exporter"

    - expr: irate(node_disk_writes_completed_total{job="node-exporter"}[1m])
      record: node_exporter:disk:write:count:rate
      labels: 
        desc: "Disk write operations per second on the node"
        unit: "ops/s"
        job: "node-exporter"

    - expr: (irate(node_disk_read_bytes_total{job="node-exporter"}[1m]))/1024/1024
      record: node_exporter:disk:read:mb:rate
      labels: 
        desc: "Disk read throughput of the node's devices in MB/s"
        unit: "MB/s"
        job: "node-exporter"

    - expr: (irate(node_disk_written_bytes_total{job="node-exporter"}[1m]))/1024/1024
      record: node_exporter:disk:write:mb:rate
      labels: 
        desc: "Disk write throughput of the node's devices in MB/s"
        job: "node-exporter"

##############################################################################################
#                                filesystem                                                  #
    - expr:   (1 -node_filesystem_files_free{job="node-exporter",fstype=~"ext4|xfs"} / node_filesystem_files{job="node-exporter",fstype=~"ext4|xfs"}) * 100 
      record: node_exporter:filesystem:used:percent    
      labels: 
        desc: "Percentage of inodes used on the node"
        unit: "%"
        job: "node-exporter"
#############################################################################################
#                                filefd                                                     #
    - expr: node_filefd_allocated{job="node-exporter"}
      record: node_exporter:filefd_allocated:count
      labels: 
        desc: "Number of allocated file descriptors on the node"
        unit: ""
        job: "node-exporter"
 
    - expr: node_filefd_allocated{job="node-exporter"}/node_filefd_maximum{job="node-exporter"} * 100 
      record: node_exporter:filefd_allocated:percent
      labels: 
        desc: "Percentage of file descriptors allocated on the node"
        unit: "%"
        job: "node-exporter"

#############################################################################################
#                                network                                                    #
    # The source metrics count bytes; multiply by 8 so the recorded value matches the bit/s unit.
    - expr: avg by (environment,instance,device) (irate(node_network_receive_bytes_total{device=~"eth0|eth1|ens33|ens37"}[1m])) * 8
      record: node_exporter:network:netin:bit:rate
      labels: 
        desc: "Bits received per second on each matched network interface of the node"
        unit: "bit/s"
        job: "node-exporter"

    - expr: avg by (environment,instance,device) (irate(node_network_transmit_bytes_total{device=~"eth0|eth1|ens33|ens37"}[1m])) * 8
      record: node_exporter:network:netout:bit:rate
      labels: 
        desc: "Bits sent per second on each matched network interface of the node"
        unit: "bit/s"
        job: "node-exporter"

    - expr: avg by (environment,instance,device) (irate(node_network_receive_packets_total{device=~"eth0|eth1|ens33|ens37"}[1m]))
      record: node_exporter:network:netin:packet:rate
      labels: 
        desc: "Packets received per second on the node's network interface"
        unit: "packets/s"
        job: "node-exporter"

    - expr: avg by (environment,instance,device) (irate(node_network_transmit_packets_total{device=~"eth0|eth1|ens33|ens37"}[1m]))
      record: node_exporter:network:netout:packet:rate
      labels: 
        desc: "Packets sent per second on the node's network interface"
        unit: "packets/s"
        job: "node-exporter"

    - expr: avg by (environment,instance,device) (irate(node_network_receive_errs_total{device=~"eth0|eth1|ens33|ens37"}[1m]))
      record: node_exporter:network:netin:error:rate
      labels: 
        desc: "Receive errors per second detected by the node's device driver"
        unit: "packets/s"
        job: "node-exporter"

    - expr: avg by (environment,instance,device) (irate(node_network_transmit_errs_total{device=~"eth0|eth1|ens33|ens37"}[1m]))
      record: node_exporter:network:netout:error:rate
      labels: 
        desc: "Transmit errors per second detected by the node's device driver"
        unit: "packets/s"
        job: "node-exporter"
      
    - expr: node_tcp_connection_states{job="node-exporter", state="established"}
      record: node_exporter:network:tcp:established:count
      labels: 
        desc: "Number of TCP connections currently in the established state"
        unit: ""
        job: "node-exporter"

    - expr: node_tcp_connection_states{job="node-exporter", state="time_wait"}
      record: node_exporter:network:tcp:timewait:count
      labels: 
        desc: "Number of TCP connections in the time_wait state"
        unit: ""
        job: "node-exporter"

    - expr: sum by (environment,instance) (node_tcp_connection_states{job="node-exporter"})
      record: node_exporter:network:tcp:total:count
      labels: 
        desc: "Total number of TCP connections on the node"
        unit: ""
        job: "node-exporter"
   
#############################################################################################
#                                process                                                    #
    - expr: node_processes_state{state="Z"}
      record: node_exporter:process:zoom:total:count
      labels: 
        desc: "Number of processes currently in the zombie state"
        unit: ""
        job: "node-exporter"
#############################################################################################
#                                other                                                    #
    - expr: abs(node_timex_offset_seconds{job="node-exporter"})
      record: node_exporter:time:offset
      labels: 
        desc: "Clock offset of the node"
        unit: "s"
        job: "node-exporter"

#############################################################################################
   
    - expr: count by (instance) ( count by (instance,cpu) (node_cpu_seconds_total{mode="system"}) ) 
      record: node_exporter:cpu:count
#

node-exporter-alert-rules.yml

groups:
  - name: node-exporter-alert
    rules:
    - alert: node-exporter-down
      expr: node_exporter:up == 0 
      for: 1m
      labels: 
        severity: info
      annotations: 
        summary: "instance: {{ $labels.instance }} is down"
        description: "instance: {{ $labels.instance }} \n- job: {{ $labels.job }} has been down for 1 minute."
        value: "{{ $value }}"
        instance: "{{ $labels.instance }}"
        grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }}"
        console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
        cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
        id: "{{ $labels.instanceid }}"
        type: "aliyun_meta_ecs_info"



    - alert: node-exporter-cpu-high 
      expr:  node_exporter:cpu:total:percent > 80
      for: 3m
      labels: 
        severity: info
      annotations: 
        summary: "instance: {{ $labels.instance }} CPU usage is high, current value: {{ $value }}"
        description: ""
        value: "{{ $value }}"
        instance: "{{ $labels.instance }}"
        grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }}"
        console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
        cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
        id: "{{ $labels.instanceid }}"
        type: "aliyun_meta_ecs_info"

    - alert: node-exporter-cpu-iowait-high 
      expr:  node_exporter:cpu:iowait:percent >= 12
      for: 3m
      labels: 
        severity: info
      annotations: 
        summary: "instance: {{ $labels.instance }} CPU iowait is high, current value: {{ $value }}"
        description: ""
        value: "{{ $value }}"
        instance: "{{ $labels.instance }}"
        grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }}"
        console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
        cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
        id: "{{ $labels.instanceid }}"
        type: "aliyun_meta_ecs_info"

    - alert: node-exporter-load-load1-high 
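      # The two recorded series carry different label sets (load1 has desc/unit/job, cpu:count does not), so match on instance only.
      expr:  (node_exporter:load:load1) > on (instance) (node_exporter:cpu:count) * 1.2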
      for: 3m
      labels: 
        severity: info
      annotations: 
        summary: "instance: {{ $labels.instance }} load1 is high, current value: {{ $value }}"
        description: ""
        value: "{{ $value }}"
        instance: "{{ $labels.instance }}"
        grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }}"
        console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
        cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
        id: "{{ $labels.instanceid }}"
        type: "aliyun_meta_ecs_info"

    - alert: node-exporter-memory-high
      expr:  node_exporter:memory:used:percent > 85
      for: 3m
      labels: 
        severity: info
      annotations: 
        summary: "instance: {{ $labels.instance }} memory usage is high, current value: {{ $value }}"
        description: ""
        value: "{{ $value }}"
        instance: "{{ $labels.instance }}"
        grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }}"
        console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
        cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
        id: "{{ $labels.instanceid }}"
        type: "aliyun_meta_ecs_info"

    - alert: node-exporter-disk-high
      expr:  node_exporter:disk:used:percent > 88
      for: 10m
      labels: 
        severity: info
      annotations: 
        summary: "instance: {{ $labels.instance }} disk usage is high, current value: {{ $value }}"
        description: ""
        value: "{{ $value }}"
        instance: "{{ $labels.instance }}"
        grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }}"
        console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
        cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
        id: "{{ $labels.instanceid }}"
        type: "aliyun_meta_ecs_info"

    - alert: node-exporter-disk-read-count-high
      expr:  node_exporter:disk:read:count:rate > 3000
      for: 2m
      labels: 
        severity: info
      annotations: 
        summary: "instance: {{ $labels.instance }} read IOPS is high, current value: {{ $value }}"
        description: ""
        value: "{{ $value }}"
        instance: "{{ $labels.instance }}"
        grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }}"
        console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
        cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
        id: "{{ $labels.instanceid }}"
        type: "aliyun_meta_ecs_info"

    - alert: node-exporter-disk-write-count-high
      expr:  node_exporter:disk:write:count:rate > 3000
      for: 2m
      labels: 
        severity: info
      annotations: 
        summary: "instance: {{ $labels.instance }} write IOPS is high, current value: {{ $value }}"
        description: ""
        value: "{{ $value }}"
        instance: "{{ $labels.instance }}"
        grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }}"
        console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
        cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
        id: "{{ $labels.instanceid }}"
        type: "aliyun_meta_ecs_info"




    - alert: node-exporter-disk-read-mb-high
      expr:  node_exporter:disk:read:mb:rate > 60 
      for: 2m
      labels: 
        severity: info
      annotations: 
        summary: "instance: {{ $labels.instance }} disk read throughput is high, current value: {{ $value }}"
        description: ""
        instance: "{{ $labels.instance }}"
        value: "{{ $value }}"
        grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }}"
        console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
        cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
        id: "{{ $labels.instanceid }}"
        type: "aliyun_meta_ecs_info"

    - alert: node-exporter-disk-write-mb-high
      expr:  node_exporter:disk:write:mb:rate > 60
      for: 2m
      labels: 
        severity: info
      annotations: 
        summary: "instance: {{ $labels.instance }} disk write throughput is high, current value: {{ $value }}"
        description: ""
        value: "{{ $value }}"
        instance: "{{ $labels.instance }}"
        grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }}"
        console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
        cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
        id: "{{ $labels.instanceid }}"
        type: "aliyun_meta_ecs_info"

    - alert: node-exporter-filefd-allocated-percent-high 
      expr:  node_exporter:filefd_allocated:percent > 80
      for: 10m
      labels: 
        severity: info
      annotations: 
        summary: "instance: {{ $labels.instance }} open file descriptor usage is high, current value: {{ $value }}"
        description: ""
        value: "{{ $value }}"
        instance: "{{ $labels.instance }}"
        grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }}"
        console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
        cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
        id: "{{ $labels.instanceid }}"
        type: "aliyun_meta_ecs_info"

    - alert: node-exporter-network-netin-error-rate-high
      expr:  node_exporter:network:netin:error:rate > 4
      for: 1m
      labels: 
        severity: info
      annotations: 
        summary: "instance: {{ $labels.instance }} inbound packet error rate is high, current value: {{ $value }}"
        description: ""
        value: "{{ $value }}"
        instance: "{{ $labels.instance }}"
        grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }}"
        console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
        cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
        id: "{{ $labels.instanceid }}"
        type: "aliyun_meta_ecs_info"
    - alert: node-exporter-network-netin-packet-rate-high
      expr:  node_exporter:network:netin:packet:rate > 35000
      for: 1m
      labels: 
        severity: info
      annotations: 
        summary: "instance: {{ $labels.instance }} inbound packet rate is high, current value: {{ $value }}"
        description: ""
        value: "{{ $value }}"
        instance: "{{ $labels.instance }}"
        grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }}"
        console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
        cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
        id: "{{ $labels.instanceid }}"
        type: "aliyun_meta_ecs_info"

    - alert: node-exporter-network-netout-packet-rate-high
      expr:  node_exporter:network:netout:packet:rate > 35000
      for: 1m
      labels: 
        severity: info
      annotations: 
        summary: "instance: {{ $labels.instance }} outbound packet rate is high, current value: {{ $value }}"
        description: ""
        value: "{{ $value }}"
        instance: "{{ $labels.instance }}"
        grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }}"
        console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
        cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
        id: "{{ $labels.instanceid }}"
        type: "aliyun_meta_ecs_info"

    - alert: node-exporter-network-tcp-total-count-high
      expr:  node_exporter:network:tcp:total:count > 40000
      for: 1m
      labels: 
        severity: info
      annotations: 
        summary: "instance: {{ $labels.instance }} TCP connection count is high, current value: {{ $value }}"
        description: ""
        value: "{{ $value }}"
        instance: "{{ $labels.instance }}"
        grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }}"
        console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
        cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
        id: "{{ $labels.instanceid }}"
        type: "aliyun_meta_ecs_info"

    - alert: node-exporter-process-zoom-total-count-high 
      expr:  node_exporter:process:zoom:total:count > 10
      for: 10m
      labels: 
        severity: info
      annotations: 
        summary: "instance: {{ $labels.instance }} zombie process count is high, current value: {{ $value }}"
        description: ""
        value: "{{ $value }}"
        instance: "{{ $labels.instance }}"
        grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }}"
        console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
        cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
        id: "{{ $labels.instanceid }}"
        type: "aliyun_meta_ecs_info"

    - alert: node-exporter-time-offset-high
      expr:  node_exporter:time:offset > 0.03
      for: 2m
      labels: 
        severity: info
      annotations:
        summary: "instance: {{ $labels.instance }} {{ $labels.desc }}  {{ $value }} {{ $labels.unit }}"  
        description: ""    
        value: "{{ $value }}"
        instance: "{{ $labels.instance }}"
        grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }}"
        console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
        cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
        id: "{{ $labels.instanceid }}"
        type: "aliyun_meta_ecs_info"

Place these two files in the /usr/local/prometheus/prometheus/rules directory, and make sure the main Prometheus configuration file contains the following section:

rule_files:
  - "rules/*rules.yml"
  # - "second_rules.yml"
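
For context, here is a sketch of how that section might sit inside a complete prometheus.yml. Everything other than the `rule_files` entry is an assumption for illustration: the intervals, the Alertmanager address `localhost:9093`, the scrape target `localhost:9100`, and the `environment` value are placeholders to adapt to your own setup.

```yaml
global:
  scrape_interval: 15s        # assumed value
  evaluation_interval: 15s    # how often recording and alerting rules are evaluated (assumed value)

# Relative paths are resolved against the directory of this configuration file.
rule_files:
  - "rules/*rules.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]   # placeholder Alertmanager address

scrape_configs:
  # The job name must be "node-exporter" because the rules filter on job="node-exporter".
  - job_name: "node-exporter"
    static_configs:
      - targets: ["localhost:9100"]     # placeholder node-exporter address
        labels:
          environment: "test"           # the rules aggregate by (environment, instance)
```

The glob `rules/*rules.yml` matches both node-exporter-record-rules.yml and node-exporter-alert-rules.yml, so no further entries are needed for the two files above.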

Restart the Prometheus service; the rules will then be visible in the web UI.

You can now query the recording rules we defined directly by name in the expression browser.
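
Besides checking in the browser, a recording rule can also be exercised offline with a promtool unit test. The sketch below is not from the original setup: the test file name, the instance `host1:9100`, and the sample values are invented, and it assumes the test file sits in the same directory as node-exporter-record-rules.yml and that the rule file loads without errors.

```yaml
# node-exporter-rules-test.yml (hypothetical file name)
rule_files:
  - node-exporter-record-rules.yml

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # One node-exporter target that reports as down at 0m, 1m, 2m and 3m.
      - series: 'up{job="node-exporter", instance="host1:9100"}'
        values: '0 0 0 0'
    promql_expr_test:
      # Aggregate away the desc/unit/job labels so the check does not depend on their exact text.
      - expr: sum by (instance) (node_exporter:up)
        eval_time: 3m
        exp_samples:
          - labels: '{instance="host1:9100"}'
            value: 0
```

It can be run with `promtool test rules node-exporter-rules-test.yml`.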

Other

There are relatively few Prometheus rule examples available online. Here is one collection worth consulting: https://awesome-prometheus-alerts.grep.to/rules

