Using Alarms

Alarms indicate that a condition has arisen that warrants operator intervention. To ensure the integrity of alarms, components are required to broadcast the state of each alarm periodically. If a component is unable to determine or broadcast the alarm state, the alarm severity is automatically set to Disconnected after some period of time, which brings the problem to the operator’s attention so that the operator can take appropriate action. Consequently, even under normal conditions, the component must continue to publish each alarm with an Okay severity; otherwise, the alarm severity is automatically marked as Disconnected, prompting the operator to investigate.
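
The refresh rule described above can be pictured with a small stand-alone model. This is a minimal sketch, not the Alarm Service implementation: the class ExpiringSeverityStore, its method names, and the 5-second timeout are all assumptions made for illustration, and the current time is passed in explicitly so the behavior is easy to follow.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of "refresh the severity or it reads as Disconnected".
// It does not use CSW; the names and the timeout value are assumptions.
class ExpiringSeverityStore {
    enum Severity { Okay, Warning, Major, Critical, Disconnected }

    // A published severity together with the time at which it was published.
    private static final class Entry {
        final Severity severity;
        final long setAtMillis;

        Entry(Severity severity, long setAtMillis) {
            this.severity = severity;
            this.setAtMillis = setAtMillis;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final long ttlMillis;

    ExpiringSeverityStore(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // Component side: called each time the component publishes a severity.
    void setSeverity(String alarmName, Severity severity, long nowMillis) {
        entries.put(alarmName, new Entry(severity, nowMillis));
    }

    // Operator side: a missing or stale entry reads as Disconnected.
    Severity currentSeverity(String alarmName, long nowMillis) {
        Entry entry = entries.get(alarmName);
        if (entry == null || nowMillis - entry.setAtMillis >= ttlMillis) {
            return Severity.Disconnected;
        }
        return entry.severity;
    }

    public static void main(String[] args) {
        ExpiringSeverityStore store = new ExpiringSeverityStore(5_000);
        // Never published.
        System.out.println(store.currentSeverity("counterTooHighAlarm", 0));
        store.setSeverity("counterTooHighAlarm", Severity.Okay, 1_000);
        // Published 2 seconds ago, within the timeout.
        System.out.println(store.currentSeverity("counterTooHighAlarm", 3_000));
        // Published 8 seconds ago, refresh missed.
        System.out.println(store.currentSeverity("counterTooHighAlarm", 9_000));
    }
}
```

In this model an Okay severity has to be re-published just like any other, which is why a healthy component that stops publishing still ends up Disconnected.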

The CSW Alarm Service provides two APIs: a “Client API” used by a component, and an “Admin API” used to manage the Alarm Store and for operator use (typically via HCMS user interfaces). The Client API consists of a single method call, setSeverity. The Admin API includes methods to set up the Alarm Store, get alarm severities and health of components or subsystems, acknowledge alarms, reset latched alarms, and other operator tasks.

The Admin API can be exercised using the command line tool provided with CSW: csw-alarm-cli. See the reference manual for more info.

More details about the Alarm Service are found in the Alarm Service manual.

Tutorial: Using alarms in a component

We will use our sample Assembly to monitor the counter event it is subscribed to, published by our sample HCD. Our alarm is based on the value of the counter, with the normal (Okay) range of 0 to 10, Warning range of 11 to 15, Major range of 16 to 20, and any other value generating a Critical alarm severity.

First, the Alarm Store must be initialized with our alarm using the CLI tool csw-alarm-cli. A configuration file must be written that describes every alarm that will be used in the system. For TMT operations, this configuration will be generated from the ICD-DB models. For our tutorial, we will use a configuration with only the alarm we will be using.

alarms.conf
alarms: [
  {
    prefix = csw.sample
    name = counterTooHighAlarm
    description = "Warns when counter value is too high"
    location = "enclosure"
    alarmType = Absolute
    supportedSeverities = [Warning, Major, Critical]
    probableCause = "Sample HCD has run for too long"
    operatorResponse = "Restart HCD"
    isAutoAcknowledgeable = false
    isLatchable = false
    activationStatus = Active
  }
]

For our tutorial, let’s save this file to disk in our resources folder in the sample-deploy module (sample-deploy/src/main/resources/alarms.conf).

Now we will use the CLI tool, csw-alarm-cli, which is found in the bin directory of the CSW application package available with the release.

Use the init command to initialize the Alarm Store (this assumes csw-services is running, which sets up the Redis store for alarms).

csw-alarm-cli init $PROJECTDIR/sample-deploy/src/main/resources/alarms.conf --local

where $PROJECTDIR is the root directory of your sample project. The --local flag indicates that the configuration file is obtained from disk; omitting it would attempt to find the file in the Configuration Service, as would be done during operations.

Now we will add code to our assembly to publish an alarm severity on every counter event. Let’s create some logic to take the counter as an argument and generate an alarm:

Scala
private val safeRange  = 0 to 10
private val warnRange  = 11 to 15
private val majorRange = 16 to 20
private def getCounterSeverity(counter: Int) =
  counter match {
    case x if safeRange contains x  => AlarmSeverity.Okay
    case x if warnRange contains x  => AlarmSeverity.Warning
    case x if majorRange contains x => AlarmSeverity.Major
    case _                          => AlarmSeverity.Critical
  }

// the alarm name matches the name declared in alarms.conf
private val counterAlarmKey = AlarmKey(componentInfo.prefix, "counterTooHighAlarm")
private def setCounterAlarm(counter: Int): Unit = {
  // fire alarm according to counter value
  val severity = getCounterSeverity(counter)
  alarmService.setSeverity(counterAlarmKey, severity).onComplete {
    case Success(_)  => log.info(s"Severity for alarm ${counterAlarmKey.name} set to " + severity.toString)
    case Failure(ex) => log.error(s"Error setting severity for alarm ${counterAlarmKey.name}: ${ex.getMessage}")
  }
}
Java
private AlarmSeverity getCounterSeverity(int counter) {
    if (counter >= 0 && counter <= 10) {
        return JAlarmSeverity.Okay;
    } else if (counter >= 11 && counter <= 15) {
        return JAlarmSeverity.Warning;
    } else if (counter >= 16 && counter <= 20) {
        return JAlarmSeverity.Major;
    }
    return JAlarmSeverity.Critical;
}

private void setCounterAlarm(int counter) {
    // the alarm name matches the name declared in alarms.conf
    AlarmKey counterAlarmKey = new AlarmKey(cswCtx.componentInfo().prefix(), "counterTooHighAlarm");
    AlarmSeverity severity = getCounterSeverity(counter);
    cswCtx.alarmService().setSeverity(counterAlarmKey, severity)
            .whenComplete((d, ex) -> {
                if (ex != null) {
                    log.error("Error setting severity for alarm " + counterAlarmKey.name() + ": " + ex.getMessage());
                } else {
                    log.info("Severity for alarm " + counterAlarmKey.name() + " set to " + severity.toString());
                }
            });
}

This code determines the severity of the alarm based on the rules we established above:

  • Okay: 0-10
  • Warning: 11-15
  • Major: 16-20
  • Critical: any other value
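
The rules listed above can be checked at their boundaries with a stand-alone sketch. The local Severity enum and the class CounterSeverityRules are stand-ins introduced here so that the example runs without CSW on the classpath; they are assumptions for illustration, not the CSW AlarmSeverity type.

```java
// Stand-alone check of the counter-to-severity rules at their boundaries.
// Severity is a local stand-in for the CSW severity type (an assumption).
class CounterSeverityRules {
    enum Severity { Okay, Warning, Major, Critical }

    // Same range checks as the tutorial's getCounterSeverity method.
    static Severity severityFor(int counter) {
        if (counter >= 0 && counter <= 10) {
            return Severity.Okay;
        } else if (counter >= 11 && counter <= 15) {
            return Severity.Warning;
        } else if (counter >= 16 && counter <= 20) {
            return Severity.Major;
        }
        // Any other value, including negative counters and values above 20.
        return Severity.Critical;
    }

    public static void main(String[] args) {
        int[] samples = {-1, 0, 10, 11, 15, 16, 20, 21};
        for (int counter : samples) {
            System.out.println(counter + " -> " + severityFor(counter));
        }
    }
}
```

Note that "any other value" covers negative counters as well as values above 20, so both -1 and 21 map to Critical.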

Now, all we have to do is call setCounterAlarm whenever we receive a counter event. We add the call in the processEvent method:

Scala
private def processEvent(event: Event): Unit = {
  log.info(s"Event received: ${event.eventKey}")
  event match {
    case e: SystemEvent =>
      e.eventKey match {
        case `counterEventKey` =>
          val counter = e(hcdCounterKey).head
          log.info(s"Counter = $counter")
          setCounterAlarm(counter)

        case _ => log.warn("Unexpected event received.")
      }
    case _: ObserveEvent => log.warn("Unexpected ObserveEvent received.") // not expected
  }
}
Java
private void processEvent(Event event) {
    log.info("Event received: " + event.eventKey());
    if (event instanceof SystemEvent) {
        SystemEvent sysEvent = (SystemEvent) event;
        if (event.eventKey().equals(counterEventKey)) {
            int counter = sysEvent.parameter(hcdCounterKey).head();
            log.info("Counter = " + counter);
            setCounterAlarm(counter);
        } else {
            log.warn("Unexpected event received.");
        }
    } else {
        // ObserveEvent, not expected
        log.warn("Unexpected ObserveEvent received.");
    }
}
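
With processEvent as written, a severity is published only when a counter event arrives. Since severities that are not refreshed eventually read as Disconnected, a component whose events arrive less often than the refresh period might re-publish its latest severity on a timer. The following is a minimal sketch of that idea under stated assumptions: SeverityRefresher, its methods, the local Severity enum, and the 1-second interval are hypothetical, and the publisher callback stands in for a call such as setSeverity.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Hypothetical helper, not part of CSW: remembers the latest severity and
// re-publishes it on demand so that it does not go stale between events.
class SeverityRefresher {
    enum Severity { Okay, Warning, Major, Critical }

    private final AtomicReference<Severity> latest = new AtomicReference<>();
    private final Consumer<Severity> publisher;

    SeverityRefresher(Consumer<Severity> publisher) {
        this.publisher = publisher;
    }

    // Called when a new severity has been computed from an event.
    void update(Severity severity) {
        latest.set(severity);
        publisher.accept(severity);
    }

    // Called by the timer: re-publishes the last severity, if there is one.
    void refresh() {
        Severity current = latest.get();
        if (current != null) {
            publisher.accept(current);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SeverityRefresher refresher =
                new SeverityRefresher(s -> System.out.println("publish " + s));
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        // The 1-second interval is an arbitrary value chosen for the demo.
        timer.scheduleAtFixedRate(refresher::refresh, 1, 1, TimeUnit.SECONDS);
        refresher.update(Severity.Warning);
        Thread.sleep(3500);
        timer.shutdownNow();
    }
}
```

Whether the tutorial's Assembly needs something like this depends on how often the HCD publishes the counter event relative to the Alarm Service's refresh period, neither of which is stated here.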

To see the effect, let’s use the CLI to set up a subscription to the alarm. Note that the alarm key consists of the component’s prefix (csw.sample) and the alarm name (counterTooHighAlarm).

csw-alarm-cli severity subscribe --subsystem csw --component sample --name counterTooHighAlarm

Note that the alarm severity is currently Disconnected. This is the appropriate state, since we are not running the components. Now, run the Assembly and HCD, and you will see the severity of our alarm updated in the CLI as the severity changes.
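
The relationship between the alarm key and the flags of the subscribe command can be sketched as follows. AlarmKeyParts is a hypothetical helper written for this illustration, not a CSW class; it assumes the prefix has the form subsystem.component, as in csw.sample.

```java
// Hypothetical helper, not part of CSW: splits a prefix such as "csw.sample"
// into the subsystem and component parts used by the CLI flags.
class AlarmKeyParts {
    final String subsystem;
    final String component;
    final String name;

    AlarmKeyParts(String prefix, String name) {
        // Split on the first dot; reject prefixes missing either part.
        int dot = prefix.indexOf('.');
        if (dot <= 0 || dot == prefix.length() - 1) {
            throw new IllegalArgumentException(
                    "Expected <subsystem>.<component>, got: " + prefix);
        }
        this.subsystem = prefix.substring(0, dot);
        this.component = prefix.substring(dot + 1);
        this.name = name;
    }

    // Builds the flag portion of the subscribe command used in this tutorial.
    String toCliArgs() {
        return "--subsystem " + subsystem + " --component " + component + " --name " + name;
    }

    public static void main(String[] args) {
        AlarmKeyParts key = new AlarmKeyParts("csw.sample", "counterTooHighAlarm");
        // prints "--subsystem csw --component sample --name counterTooHighAlarm"
        System.out.println(key.toCliArgs());
    }
}
```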