Using Alarms
Alarms indicate that a condition has arisen that warrants operator intervention. To ensure the integrity of alarms, components are required to broadcast the state of each alarm periodically. If a component is unable to determine or broadcast the alarm state, the alarm severity is automatically set to Disconnected after some period of time. This is brought to the operator’s attention so that the operator can take appropriate action. Consequently, even when conditions are normal, the component must continue to publish each alarm with an Okay severity; otherwise, the alarm severity is automatically marked as Disconnected, prompting the operator to investigate.
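The expiry behaviour described above can be illustrated with a toy model. The sketch below is not the CSW implementation and calls no CSW APIs: the class SeverityHeartbeatModel, its Severity enum, the 5-second time-to-live, and the key string are all hypothetical stand-ins chosen for illustration. It shows only the idea that a published severity must be refreshed before it goes stale, or the operator side reads Disconnected.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Toy model only (hypothetical names, no CSW APIs): a severity entry expires
// unless the component republishes it before its time-to-live runs out.
class SeverityHeartbeatModel {
    enum Severity { OKAY, WARNING, MAJOR, CRITICAL, DISCONNECTED }

    // A published severity together with the instant at which it goes stale.
    private static final class Entry {
        final Severity severity;
        final Instant expiresAt;

        Entry(Severity severity, Instant expiresAt) {
            this.severity = severity;
            this.expiresAt = expiresAt;
        }
    }

    private final Duration timeToLive; // assumed expiry window, not a CSW default
    private final Map<String, Entry> entries = new HashMap<>();

    SeverityHeartbeatModel(Duration timeToLive) {
        this.timeToLive = timeToLive;
    }

    // Component side: every publish restarts the expiry window.
    void setSeverity(String alarmKey, Severity severity, Instant now) {
        entries.put(alarmKey, new Entry(severity, now.plus(timeToLive)));
    }

    // Operator side: a missing or stale entry reads as DISCONNECTED.
    Severity currentSeverity(String alarmKey, Instant now) {
        Entry entry = entries.get(alarmKey);
        if (entry == null || !now.isBefore(entry.expiresAt)) {
            return Severity.DISCONNECTED;
        }
        return entry.severity;
    }

    public static void main(String[] args) {
        SeverityHeartbeatModel model = new SeverityHeartbeatModel(Duration.ofSeconds(5));
        Instant start = Instant.parse("2024-01-01T00:00:00Z");
        String key = "csw.sample.counterTooHighAlarm"; // illustrative key string

        // Never published yet.
        System.out.println(model.currentSeverity(key, start));
        // Published, then read back within the expiry window.
        model.setSeverity(key, Severity.OKAY, start);
        System.out.println(model.currentSeverity(key, start.plusSeconds(3)));
        // No refresh arrived before the window closed.
        System.out.println(model.currentSeverity(key, start.plusSeconds(6)));
    }
}
```

In this model the three reads yield DISCONNECTED, OKAY, and DISCONNECTED in turn, which is why a healthy component still has to keep publishing Okay. Passing the clock in as an argument keeps the sketch deterministic; a real store would use its own notion of time.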
The CSW Alarm Service provides two APIs: a “Client API” used by a component, and an “Admin API” used to manage the Alarm Store and for operator use (typically via HCMS user interfaces). The Client API consists of a single method call, setSeverity. The Admin API includes methods to set up the Alarm Store, get alarm severities and health of components or subsystems, acknowledge alarms, reset latched alarms, and perform other operator tasks.
The Admin API can be exercised using the command line tool provided with CSW: csw-alarm-cli. See the reference manual for more information.
More details about the Alarm Service are found in the Alarm Service manual.
Tutorial: Using alarms in a component
We will use our sample Assembly to monitor the counter event it is subscribed to, which is published by our sample HCD. Our alarm is based on the value of the counter, with a normal (Okay) range of 0 to 10, a Warning range of 11 to 15, a Major range of 16 to 20, and any other value generating a Critical alarm severity.
First, the Alarm Store must be initialized with our alarm using the CLI tool csw-alarm-cli. A configuration file must be written that describes every alarm that will be used in the system. For TMT operations, this configuration will be generated from the ICD-DB models. For our tutorial, we will use a configuration containing only the alarm we will be using.
- alarms.conf
-
source
alarms: [
  {
    prefix = csw.sample
    name = counterTooHighAlarm
    description = "Warns when counter value is too high"
    location = "enclosure"
    alarmType = Absolute
    supportedSeverities = [Warning, Major, Critical]
    probableCause = "Sample HCD has run for too long"
    operatorResponse = "Restart HCD"
    isAutoAcknowledgeable = false
    isLatchable = false
    activationStatus = Active
  }
]
For our tutorial, let’s save this file to disk in the resources folder of the sample-deploy module (sample-deploy/src/main/resources/alarms.conf).
Now, we will use the CLI tool. Find it as csw-alarm-cli in the bin directory of the CSW application package available with the release.
Use the init command to initialize the Alarm Store (this assumes csw-services is running, which sets up the Redis store for alarms).
csw-alarm-cli init $PROJECTDIR/sample-deploy/src/main/resources/alarms.conf --local
where $PROJECTDIR is the root directory of your sample project. The --local flag indicates that the configuration file is obtained from disk; omitting it would attempt to find the file in the Configuration Service, as would be done during operations.
Now we will add code to our assembly to publish an alarm severity on every counter event. Let’s create some logic to take the counter as an argument and generate an alarm:
- Scala
-
source
private val safeRange  = 0 to 10
private val warnRange  = 11 to 15
private val majorRange = 16 to 20

private def getCounterSeverity(counter: Int) = counter match {
  case x if safeRange contains x  => AlarmSeverity.Okay
  case x if warnRange contains x  => AlarmSeverity.Warning
  case x if majorRange contains x => AlarmSeverity.Major
  case _                          => AlarmSeverity.Critical
}

private val counterAlarmKey = AlarmKey(componentInfo.prefix, "CounterTooHighAlarm")

private def setCounterAlarm(counter: Int): Unit = {
  // fire alarm according to counter value
  val severity = getCounterSeverity(counter)
  alarmService.setSeverity(counterAlarmKey, severity).onComplete {
    case Success(_)  => log.info(s"Severity for alarm ${counterAlarmKey.name} set to " + severity.toString)
    case Failure(ex) => log.error(s"Error setting severity for alarm ${counterAlarmKey.name}: ${ex.getMessage}")
  }
}
- Java
-
source
private AlarmSeverity getCounterSeverity(int counter) {
    if (counter >= 0 && counter <= 10) {
        return JAlarmSeverity.Okay;
    } else if (counter >= 11 && counter <= 15) {
        return JAlarmSeverity.Warning;
    } else if (counter >= 16 && counter <= 20) {
        return JAlarmSeverity.Major;
    }
    return JAlarmSeverity.Critical;
}

private void setCounterAlarm(int counter) {
    AlarmKey counterAlarmKey = new AlarmKey(cswCtx.componentInfo().prefix(), "CounterTooHighAlarm");
    AlarmSeverity severity = getCounterSeverity(counter);
    cswCtx.alarmService().setSeverity(counterAlarmKey, severity)
            .whenComplete((d, ex) -> {
                if (ex != null) {
                    log.error("Error setting severity for alarm " + counterAlarmKey.name() + ": " + ex.getMessage());
                } else {
                    log.info("Severity for alarm " + counterAlarmKey.name() + " set to " + severity.toString());
                }
            });
}
This code determines the severity of the alarm based on the rules we established above:
- Okay: 0-10
- Warning: 11-15
- Major: 16-20
- Critical: any other value
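To check the boundaries of these rules without a running component, the mapping can be exercised on its own. The following is a minimal standalone sketch: the class CounterSeverityRules and its local Severity enum are hypothetical stand-ins for the tutorial's method and for JAlarmSeverity, so that the example compiles without CSW on the classpath.

```java
// Standalone sketch of the tutorial's severity rules. The Severity enum is a
// local stand-in for JAlarmSeverity so that no CSW dependency is needed.
class CounterSeverityRules {
    enum Severity { OKAY, WARNING, MAJOR, CRITICAL }

    // Same ranges as the tutorial: 0-10, 11-15, 16-20, anything else.
    static Severity severityFor(int counter) {
        if (counter >= 0 && counter <= 10) {
            return Severity.OKAY;
        } else if (counter >= 11 && counter <= 15) {
            return Severity.WARNING;
        } else if (counter >= 16 && counter <= 20) {
            return Severity.MAJOR;
        }
        return Severity.CRITICAL; // includes negative values as well as values above 20
    }

    public static void main(String[] args) {
        // Values on either side of each range boundary.
        int[] samples = {-1, 0, 10, 11, 15, 16, 20, 21};
        for (int counter : samples) {
            System.out.println(counter + " -> " + severityFor(counter));
        }
    }
}
```

Note that "any other value" covers both ends: a negative counter maps to Critical just as a counter above 20 does.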
Now, all we have to do is call this whenever we receive a counter event. We add a call to the setCounterAlarm method in the processEvent method:
- Scala
-
source
private def processEvent(event: Event): Unit = {
  log.info(s"Event received: ${event.eventKey}")
  event match {
    case e: SystemEvent =>
      e.eventKey match {
        case `counterEventKey` =>
          val counter = e(hcdCounterKey).head
          log.info(s"Counter = $counter")
          setCounterAlarm(counter)
        case _ => log.warn("Unexpected event received.")
      }
    case _: ObserveEvent => log.warn("Unexpected ObserveEvent received.") // not expected
  }
}
- Java
-
source
private void processEvent(Event event) {
    log.info("Event received: " + event.eventKey());
    if (event instanceof SystemEvent) {
        SystemEvent sysEvent = (SystemEvent) event;
        if (event.eventKey().equals(counterEventKey)) {
            int counter = sysEvent.parameter(hcdCounterKey).head();
            log.info("Counter = " + counter);
            setCounterAlarm(counter);
        } else {
            log.warn("Unexpected event received.");
        }
    } else {
        // ObserveEvent, not expected
        log.warn("Unexpected ObserveEvent received.");
    }
}
To see the effect, let’s use the CLI to set up a subscription to the alarm. Note that the alarm key is composed of the component’s prefix (csw.sample) and the alarm name (counterTooHighAlarm).
- Scala
-
csw-alarm-cli severity subscribe --subsystem csw --component sample --name counterTooHighAlarm
- Java
-
csw-alarm-cli severity subscribe --subsystem csw --component sample --name counterTooHighAlarm
Note that the alarm severity is currently Disconnected. This is the appropriate state, since we are not running the components. Now, run the Assembly and HCD, and you will see the severity of our alarm updated in the CLI as the severity changes.