" First failure: what should occur the first time the service fails. Valid options are "Take No Action", "Restart the Service", "Run a Program", and "Restart the Computer".
Second failure: same options the second time a service fails
Subsequent failures: same options for any subsequent failure
Reset fail count after: the number of days that must pass without a failure before the failure count is reset to zero
Restart service after: the amount of time, in minutes, to wait before restarting the service
These options are very nice, but it is easy to misunderstand what the values actually do. I have seen a number of services configured with 0 days and 0 minutes (and I tried this myself). The problem is that if the failure count resets after 0 days, the count never gets past the first failure as long as the service at least started correctly, so the service will restart continually. The result is that only the first option ("First failure") will ever be run.
To fix this, set the failure count to reset after one day. The drawback to this approach is that your service may stay stopped after failing several times, but this likely means something is toast anyway.
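As a rough command-line sketch of the same fix: sc failure takes the reset period in seconds and the restart delays in milliseconds, whereas the Recovery tab shows days and minutes. MyService is a placeholder name, and the one-minute delays are only an illustration, not a recommendation.

```bat
rem MyService is a hypothetical service name; substitute the real one.
rem reset= is in seconds: 86400 = 1 day.
rem Each action delay is in milliseconds: 60000 = 1 minute.
sc failure MyService reset= 86400 actions= restart/60000/restart/60000/restart/60000
```

Run this from an elevated command prompt. The space after each "=" is part of sc's syntax.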
Also keep in mind that not all services will work with the restart logic. In other words, just setting the recovery options on a service does not guarantee that it will restart. In order for the service to restart, it must exit abnormally. This generally means the service must exit with a non-zero exit code and the service status must not be stopped (note: this changed in Vista, where it is possible to set the service status to stopped and provide an exit code to trigger the restart logic).
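On Vista and later, the behavior described in the note is controlled by a per-service flag, which the Recovery tab shows as "Enable actions for stops with errors". A minimal sketch of setting it from the command line, again assuming a placeholder service named MyService:

```bat
rem MyService is a hypothetical service name; substitute the real one.
rem 1 = also run the recovery actions when the service reports a stopped
rem     status together with a non-zero exit code; 0 = only on abnormal exits.
sc failureflag MyService 1

rem Show the current value of the flag.
sc qfailureflag MyService
```

The flag only matters if recovery actions have been configured for the service.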
I have a Windows service that exits unexpectedly every few days. Is there a simple way to monitor it to make sure it gets restarted quickly if it crashes?
Under the Services application, select the properties of the service in question.
View the Recovery tab. There are all sorts of options; I'd set First and Second Failure to Restart the Service, and Subsequent Failures to run a batch program that uses BLAT to send an email with the third-failure notification.
You should also set Reset Fail Count to 1 day so that the fail count resets daily.
EDIT:
Looks like you can do this via a command line:
SC failure w3svc reset= 432000 actions= restart/30000/restart/60000/run/60000
SC failure w3svc command= "MyBatchFile.cmd"
Your MyBatchFile.CMD file can look like this:
blat - -body "Service W3svc Failed" -subject "SERVICE ERROR" -to Notify@Example.com -server SMTP.Example.com -f Administrator@Example.com
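To confirm what was actually stored, sc qfailure prints the reset period, the command line, and the list of failure actions. Keep in mind that reset= is in seconds and the action delays are in milliseconds, so reset= 432000 above is five days; 86400 would be the value for a daily reset.

```bat
rem Display the recovery settings currently stored for the service.
sc qfailure w3svc
```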
The reset failure count is the "trigger" for the second recovery action. If it is set to 0, it will never trigger the second condition.
Setting "Reset fail count after" to 0 means "reset the fail count to 0 after each failure" until a reboot occurs.
The 0 effectively disables both the "second failure" and "subsequent failures" actions, and you will always get the "first failure" action until you reboot the machine.