When using the agent-down functionality in AM 9.1, if an agent is down during the health check, you get only one alert for it, because the StatusChangeTime is never updated after the moment the agent went down.
I use the Alain KSDepot ServerDown script to ping servers; once servers are down, they continue to generate events for as long as they stay down. Pingmachine behaves the same way: if it fails to ping one or more servers, those servers keep triggering down events. AMHealth_agentdown, by contrast, creates only one event, and until you bring that agent back up it will never alert again that the agent is still down. How am I going to know which agents are still down?
Maybe a management group, like the one for maintenance mode, or should I look at the greyed-out servers? I need a daily report of the agents that are still down whenever agent-down alerts have been raised.
A reasonable approach would be a SQL query against the amheartbeat table that resets StatusChangeTime to a more recent time if the server is still not up and running, e.g. so that the alert is paged out again.
Very similar to event collapsing.
Otherwise, unless you have some reporting mechanism that can be generated and works for thousands of agents, is there no way to do this?
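As a rough illustration of the re-paging idea, here is a minimal Python sketch. It does not touch AppManager or the real amheartbeat table: the `AgentHeartbeat` record, its field names, the function names, and the 24-hour interval are all assumptions made up for the example. It only shows the logic of "alert again if the agent is still down and the last alert is old enough", plus a simple list of agents that are still down.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class AgentHeartbeat:
    # Hypothetical record; these field names are illustrative and are
    # NOT the real columns of the amheartbeat table.
    machine: str
    is_up: bool
    status_change_time: datetime  # last status change, or last re-page


def agents_to_repage(rows, now, repage_interval=timedelta(hours=24)):
    """Return agents that are still down and whose status_change_time
    is at least repage_interval old."""
    return [
        row for row in rows
        if not row.is_up and now - row.status_change_time >= repage_interval
    ]


def mark_repaged(rows, now):
    """Move status_change_time forward so the outage looks recent again
    (the 'reset StatusChangeTime' step described in the post)."""
    for row in rows:
        row.status_change_time = now


def daily_down_report(rows):
    """Return every agent that is currently down, oldest timestamp first."""
    down = [row for row in rows if not row.is_up]
    return sorted(down, key=lambda row: row.status_change_time)
```

One trade-off in this sketch: resetting `status_change_time` discards the time the agent originally went down. A separate "last alerted" field would keep both values, but whether the product could store one is not something this example can answer.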
by: Phil S. | over a year ago | AppManager Core
Comments
Thanks, Phil, for sharing the feedback! Let me check the feasibility with the team and I will get back to you on this. Once again, thanks for the feedback!