The resolution duration affects both whether an alert is created and when it is raised.
Two main points need attention when configuring the alert resolution:
- HPA waits for the end of each observation period before comparing the metric value(s) with the threshold(s) and deciding whether to create an alert
- If HPA fails to collect a metric value at some points during the period, those data points are considered empty and are excluded from the value computed for the period, which is the value compared with the threshold(s)
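The per-period evaluation described in the two points above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not HPA's implementation: the function names are hypothetical, empty data points are represented as `None`, and counting a value equal to the threshold as "over" it (`>=`) is an assumption chosen to match the 95% illustration.

```python
from statistics import mean
from typing import Optional, Sequence


def period_value(samples: Sequence[Optional[float]]) -> Optional[float]:
    """Average of the data points of one resolution period.

    Empty data points (None, i.e. no value collected) are excluded from the
    average. If every point is empty, the period has no value at all.
    """
    filled = [s for s in samples if s is not None]
    return mean(filled) if filled else None


def period_over_threshold(samples: Sequence[Optional[float]], threshold: float) -> bool:
    """Hypothetical check, run once at the end of the period."""
    value = period_value(samples)
    # Assumptions: ">=" (so 95% counts as over a 95% threshold), and a period
    # without any collected data is never over the threshold.
    return value is not None and value >= threshold


# A 12-minute period sampled every 15 s holds 48 data points; only one is filled.
sparse = [95.0] + [None] * 47
print(period_value(sparse), period_over_threshold(sparse, 95.0))  # 95.0 True
```

Had the other 47 points been collected at, say, 50%, the period average would have been 2445 / 48, about 50.9%, and the period would not be over the threshold. Excluding empty points is what lets a single value decide the whole period.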
A poorly adapted resolution period can therefore produce false positives, possibly with a misleading alert creation time.
Illustration:
- The requirement is to raise an alert when the heap memory usage of an ST is higher than 95% for 12 minutes
- Suppose the alert is defined with a Resolution of 12 minutes, Periods Over Threshold of 1 and Observed Periods of 1
- Suppose the heap usage reaches 95% only at the very end of the ST run, and that this last data point, collected just before the ST ends, is the only filled data point of the current 12-minute period
- HPA then detects that the threshold has been exceeded once and waits for the 12-minute period to end
- At the end of the period, HPA averages the heap usage percentage values collected during the period (one every 15 seconds)
- HPA finds only one data point, so the average for the period is 95%, and it creates an alert
- The alert is therefore a false positive, and it is also created late: up to 12 minutes after the moment the ST actually exceeded the threshold
- In such a case it is better to configure the alert as follows:
  - Resolution: 1 minute
  - Periods Over Threshold: 12
  - Observed Periods: 12
- With this configuration, the end-of-run spike described above does not create any alert: only the last 1-minute period is over the threshold, far fewer than the 12 periods required
- If the ST heap usage really stays higher than 95% for 12 minutes, the alert is created after 12 minutes, at the end of the 12th one-minute period
- Observed Periods can also be set to 13 or 14 to detect situations where the heap usage is higher than the threshold most of the time but not continuously (for example, 12 periods over the threshold out of the last 14)
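The configurations in this illustration can be compared with a small simulation. It is a sketch under stated assumptions, not HPA's actual evaluation engine: data points every 15 seconds, empty points represented as `None` and excluded from the period average, a period counting as over the threshold when its average is `>=` the threshold, and an alert raised at the end of the first period for which at least Periods Over Threshold of the last Observed Periods are over the threshold. The function names and sample timelines are hypothetical.

```python
from statistics import mean
from typing import List, Optional, Sequence

SAMPLE_SECONDS = 15  # heap % collected every 15 seconds


def period_values(samples: Sequence[Optional[float]], resolution_s: int) -> List[Optional[float]]:
    """Average each resolution period, excluding empty (None) data points."""
    per_period = resolution_s // SAMPLE_SECONDS
    values: List[Optional[float]] = []
    for start in range(0, len(samples), per_period):
        filled = [s for s in samples[start:start + per_period] if s is not None]
        values.append(mean(filled) if filled else None)
    return values


def first_alert_time(samples, resolution_s, periods_over, observed, threshold):
    """Seconds at which the first alert would be raised, or None (hypothetical rule)."""
    flags: List[bool] = []
    for i, value in enumerate(period_values(samples, resolution_s)):
        # A period without any data is assumed not to be over the threshold.
        flags.append(value is not None and value >= threshold)
        # Assumed M-of-N rule over the last `observed` periods.
        if sum(flags[-observed:]) >= periods_over:
            return (i + 1) * resolution_s  # evaluated at the end of the period
    return None


# ST at 60% for 12 minutes, one 95% point, then the ST ends (empty points).
spike_at_end = [60.0] * 48 + [95.0] + [None] * 47
# Heap really above 95% for 12 minutes.
sustained = [96.0] * 48
# Above 95% for 14 minutes except one lower minute in the middle.
with_dip = [97.0] * 24 + [80.0] * 4 + [97.0] * 28

if __name__ == "__main__":
    print(first_alert_time(spike_at_end, 720, 1, 1, 95.0))  # 1440
    print(first_alert_time(spike_at_end, 60, 12, 12, 95.0))  # None
    print(first_alert_time(sustained, 720, 1, 1, 95.0))  # 720
    print(first_alert_time(sustained, 60, 12, 12, 95.0))  # 720
    print(first_alert_time(with_dip, 60, 12, 12, 95.0))  # None
    print(first_alert_time(with_dip, 60, 12, 14, 95.0))  # 780
```

Under these assumptions, the 12-minute configuration alerts on the end-of-run spike at 1440 s although the spike happened at 720 s, while the 1-minute / 12 / 12 configuration stays silent. Both catch a sustained 12-minute breach at 720 s, and a breach interrupted by one lower minute is caught only when Observed Periods is widened to 14.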