1. 29 Oct, 2018 1 commit
    • Improve node CPU recording rules. · 723aa629
      Ben Kochie authored
      * Drop obsolete `node_cpu` metric recordings.
      * Drop `CPUOutlierDetectionOnPrd`, which doesn't work due to missing
      recordings.
      * Add new 1m rate recordings, with a 1m interval.
      * Move CPU alerts to new metrics.
      * Drop environment filter from CPU alerts.
      * Drop 80% CPU threshold for "High CPU" to avoid alert noise.
      * Move old 5m alerting to separate rule group.
  2. 25 Oct, 2018 3 commits
  3. 23 Oct, 2018 3 commits
  4. 22 Oct, 2018 3 commits
  5. 19 Oct, 2018 1 commit
  6. 18 Oct, 2018 2 commits
    • Update Prometheus metamon · 5627559d
      Ben Kochie authored
      * Remove obsolete Prometheus 1.x local storage alerts.
      * Simplify queries.
      * Add an alert for rule group evaluation taking longer than 70% of the
      interval.
      * Update the slow rule documentation.
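      The "longer than 70% of the interval" condition from the commit above can be sketched as a small check. This is a minimal illustration, not the rule from the commit: the function name and parameters are hypothetical, and only the 0.7 threshold comes from the commit message. In Prometheus 2.x the two inputs would plausibly come from `prometheus_rule_group_last_duration_seconds` and `prometheus_rule_group_interval_seconds`, but whether the commit uses those metrics is an assumption.

```python
def rule_group_is_slow(last_duration_seconds: float,
                       interval_seconds: float,
                       threshold: float = 0.7) -> bool:
    """Return True when a rule group's last evaluation used more than
    `threshold` (default 70%) of its evaluation interval.

    Hypothetical helper for illustration; only the 70% figure is taken
    from the commit message.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return last_duration_seconds > threshold * interval_seconds


if __name__ == "__main__":
    # A group with a 60s interval that took 45s to evaluate is flagged;
    # one that took 30s is not.
    print(rule_group_is_slow(45.0, 60.0))
    print(rule_group_is_slow(30.0, 60.0))
```

      Comparing against a fraction of the interval, rather than a fixed number of seconds, lets one rule cover groups with different intervals.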
    • Update the alerts for postgresql-01 · 12f771ff
      John T Skarbek authored
      * This node is no longer serving as an archive replica
      * It now participates as a regular node in the cluster
      * Removes an alert that was crafted for it as an archive replica
      * Removes it from the exception list of the rest of our alerts
  7. 17 Oct, 2018 6 commits
  8. 15 Oct, 2018 2 commits
  9. 12 Oct, 2018 4 commits
  10. 11 Oct, 2018 1 commit
  11. 10 Oct, 2018 2 commits
  12. 09 Oct, 2018 3 commits
  13. 08 Oct, 2018 6 commits
  14. 03 Oct, 2018 1 commit
  15. 02 Oct, 2018 2 commits