Alarm Service

The Alarm Service provides an API to manage alarms in the TMT software system. The service uses Redis to store alarm data, including the alarm status and associated metadata. Alarm “keys” are used to access information about an alarm.

Dependencies

The Alarm Service comes bundled with the Framework, so no additional dependency needs to be added to your build.sbt file if you are using the Framework. To use the Alarm Service without the Framework, add this to your build.sbt file:

sbt
libraryDependencies += "com.github.tmtsoftware.csw" %% "csw-alarm-client" % "0.6.0-RC1"

API Flavours

There are two APIs provided in the Alarm Service: a client API, and an administrative (admin) API. The client API is the API used by component developers to set the severity of an alarm. This is the only functionality needed by component developers. As per TMT policy, the severity of an alarm must be set periodically (within some time limit) in order to maintain the integrity of the alarm status. If an alarm severity is not refreshed within the time limit, currently set at TBD seconds, the severity is set to Disconnected by the Alarm Service, which indicates to the operator that there is some problem with the component’s ability to evaluate the alarm status.

The admin API provides all of the functions needed to manage the alarm store, as well as access to monitor alarms for use by an operator or instrument specialist. The admin API provides the ability to load alarm data into the alarm store, set the severity of an alarm, acknowledge alarms, shelve or unshelve alarms, reset a latched alarm, get the metadata/status/severity of an alarm, and get or subscribe to aggregations of severity and health of an alarm, a component’s alarms, a subsystem’s alarms, or the alarms of the whole TMT system.

A command line tool is provided as part of the Alarm Service that implements this API and provides low-level control over the Alarm Service. More details about the alarm CLI can be found here: CSW Alarm Client CLI application

Eventually, operators will use Graphical User Interfaces that access the admin API through a UI gateway. This will be delivered as part of the ESW HCMS package.

Note

Since the admin API will primarily be used with the CLI and HCMS applications, it is only supported in Scala, and not Java.

To summarize, the APIs are as follows:

  • client API (AlarmService): Must be used by components. Available method: {setSeverity}
  • admin API (AlarmAdminService): Expected to be used by administrators. Available methods: {initAlarms | setSeverity | acknowledge | shelve | unshelve | reset | getMetadata | getStatus | getCurrentSeverity | getAggregatedSeverity | getAggregatedHealth | subscribeAggregatedSeverityCallback | subscribeAggregatedSeverityActorRef | subscribeAggregatedHealthCallback | subscribeAggregatedHealthActorRef}

Creating clientAPI and adminAPI

For component developers, the client API is provided as an AlarmService object in the CswContext object injected into the ComponentHandlers class provided by the framework.

If you are not using csw-framework, you can create an AlarmService using AlarmServiceFactory.

Scala
// create alarm client using host and port of alarm server
private val clientAPI1 = new AlarmServiceFactory().makeClientApi("localhost", 5225)

// create alarm client using location service
private val clientAPI2 = new AlarmServiceFactory().makeClientApi(locationService)

// create alarm admin using host and port of alarm server
private val adminAPI1 = new AlarmServiceFactory().makeAdminApi("localhost", 5226)

// create alarm admin using location service
private val adminAPI2 = new AlarmServiceFactory().makeAdminApi(locationService)
Java
// create alarm client using host and port of alarm server
IAlarmService jclientAPI1 = new AlarmServiceFactory().jMakeClientApi("localhost", 5227, actorSystem);

// create alarm client using location service
IAlarmService jclientAPI2 = new AlarmServiceFactory().jMakeClientApi(jLocationService, actorSystem);

Rules and checks

  • When representing a unique alarm, the alarm name and component name must not contain any of the characters * [ ] ^ - or any whitespace characters
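The naming rule above amounts to a simple character check. The following is a minimal sketch, not the Alarm Service's own validation code; `AlarmNameRules` and `isValidName` are hypothetical names, and rejecting an empty name is an extra assumption not stated in the rule.

```scala
object AlarmNameRules {
  // Characters that the rule forbids in alarm and component names.
  private val forbidden: Set[Char] = Set('*', '[', ']', '^', '-')

  // Hypothetical helper: true if the name is non-empty (assumption) and
  // contains neither a forbidden character nor any whitespace.
  def isValidName(name: String): Boolean =
    name.nonEmpty && !name.exists(c => forbidden.contains(c) || c.isWhitespace)
}
```

For example, `isValidName("tromboneAxisLowLimitAlarm")` is true, while `"trombone-axis"` and `"low limit"` are rejected.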

Model Classes

  • AlarmKey : Represents a unique alarm in the TMT system. It is composed of a subsystem, a component, and an alarm name.
  • ComponentKey : Represents all alarms of a component
  • SubsystemKey : Represents all alarms of a subsystem
  • GlobalKey : Represents all alarms present in the TMT system
  • AlarmMetadata : Represents the static metadata of an alarm, which does not change during its entire lifespan.
  • AlarmStatus : Represents the dynamically changing data of an alarm, which changes when the alarm's severity changes or when an operator changes it manually
  • AlarmSeverity : Represents severity levels that can be set by the component developer e.g. Okay, Indeterminate, Warning, Major and Critical
  • FullAlarmSeverity : Represents all possible severity levels of the alarm i.e. Disconnected (cannot be set by the developer) plus other severity levels that can be set by the developer
  • AlarmHealth : Represents the possible health of an alarm, a component, a subsystem, or the whole TMT system
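The four key types differ only in how many alarms they cover. The sketch below models that scoping with simplified, hypothetical stand-ins; it is not the library's class hierarchy. In particular, `covers` is an invented method and the subsystem is a plain String here, whereas the examples on this page pass values such as NFIRAOS.

```scala
object AlarmKeyScopes {
  // Hypothetical, simplified stand-ins for the key model classes.
  sealed trait Key {
    // True if the alarm identified by `alarm` falls within this key's scope.
    def covers(alarm: AlarmKey): Boolean
  }

  // A single, unique alarm: covers only itself.
  final case class AlarmKey(subsystem: String, component: String, name: String) extends Key {
    def covers(alarm: AlarmKey): Boolean = alarm == this
  }

  // All alarms of one component within one subsystem.
  final case class ComponentKey(subsystem: String, component: String) extends Key {
    def covers(alarm: AlarmKey): Boolean =
      alarm.subsystem == subsystem && alarm.component == component
  }

  // All alarms of one subsystem.
  final case class SubsystemKey(subsystem: String) extends Key {
    def covers(alarm: AlarmKey): Boolean = alarm.subsystem == subsystem
  }

  // Every alarm in the TMT system.
  case object GlobalKey extends Key {
    def covers(alarm: AlarmKey): Boolean = true
  }
}
```

Aggregation queries such as getAggregatedSeverity can be thought of as combining every alarm whose key is covered by the key you pass in.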

setSeverity

Sets the severity of the given alarm. The severity must be refreshed by setting it at a regular interval or it will automatically be changed to Disconnected after a specific time.

Scala
val alarmKey = AlarmKey(NFIRAOS, "trombone", "tromboneAxisLowLimitAlarm")

async {
  await(clientAPI.setSeverity(alarmKey, Okay))
}
Java
AlarmKey alarmKey = new AlarmKey(NFIRAOS, "trombone", "tromboneAxisLowLimitAlarm");

Done done = jclientAPI1.setSeverity(alarmKey, Okay).get();
Note
  • If the alarm is not refreshed within 9 seconds, it will be inferred as Disconnected
  • If the alarm is auto-acknowledgeable and its severity is set to Okay, the alarm is acknowledged automatically and does not require any explicit acknowledgement by an administrator
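The refresh rule in the note above can be modelled as a pure function of the last severity set, when it was set, and the current time. This is an illustrative sketch, not how the Alarm Service implements the check; `DisconnectInference` and `observedSeverity` are hypothetical, severities are plain Strings for brevity, and treating exactly 9 seconds as still within the limit is an assumption about the boundary.

```scala
import java.time.{Duration, Instant}

object DisconnectInference {
  // Refresh limit taken from the note above (9 seconds).
  val refreshTimeout: Duration = Duration.ofSeconds(9)

  // Hypothetical helper: the severity a reader would observe. If the last
  // severity is older than the timeout, it is reported as "Disconnected".
  def observedSeverity(lastSet: String, setAt: Instant, now: Instant): String =
    if (Duration.between(setAt, now).compareTo(refreshTimeout) > 0) "Disconnected"
    else lastSet
}
```

On the component side, the same rule means calling setSeverity on a schedule shorter than the timeout, so that the stored severity never ages past it.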

initAlarms

Loads the given alarm data into the alarm store.

Scala
async {
  val resource             = "test-alarms/valid-alarms.conf"
  val alarmsConfig: Config = ConfigFactory.parseResources(resource)
  await(adminAPI.initAlarms(alarmsConfig))
}

acknowledge

Acknowledges the given alarm after it has been raised to a higher severity.

Scala
async {
  await(adminAPI.acknowledge(alarmKey))
}

shelve

Shelves the given alarm. A shelved alarm is unshelved automatically at a specific time of day (8 AM local time by default) unless it is unshelved manually before then. The automatic unshelve time can be configured in application.conf, for example: csw-alarm.shelve-timeout = h:m:s a

Scala
async {
  await(adminAPI.shelve(alarmKey))
}
Note

Shelved alarms are still included when calculating the aggregated severity or health of alarms.
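The automatic unshelve rule described under shelve reduces to finding the next occurrence of a local time of day. The following is a minimal sketch under that reading; `ShelveTimeout` and `nextUnshelve` are hypothetical names, and the sketch works in LocalDateTime, so it ignores time zones and daylight-saving transitions.

```scala
import java.time.{LocalDateTime, LocalTime}

object ShelveTimeout {
  // Default automatic unshelve time of day (8 AM local time).
  val defaultUnshelveAt: LocalTime = LocalTime.of(8, 0)

  // Hypothetical helper: when an alarm shelved at `shelvedAt` would be unshelved
  // automatically. If it is shelved at or after today's unshelve time, it stays
  // shelved until that time on the following day.
  def nextUnshelve(shelvedAt: LocalDateTime, at: LocalTime = defaultUnshelveAt): LocalDateTime = {
    val today = shelvedAt.toLocalDate.atTime(at)
    if (shelvedAt.isBefore(today)) today else today.plusDays(1)
  }
}
```

An alarm shelved at 22:15 is therefore unshelved at 08:00 the next morning, while one shelved at 06:30 is unshelved at 08:00 the same day.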

unshelve

Unshelves the given alarm.

Scala
async {
  await(adminAPI.unshelve(alarmKey))
}

reset

Resets the status of the given latched alarm by setting its latched severity to its current severity and its acknowledgement status to acknowledged, without changing any other properties of the alarm.

Scala
async {
  await(adminAPI.reset(alarmKey))
}
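To see what reset undoes, consider a latchable alarm whose latched severity only moves upward. The model below is a simplified, hypothetical sketch and not the Alarm Service's status logic. The severity levels are illustrative, and the rule that a rise above the latched severity clears the acknowledgement is an assumption. Only `reset` follows the description above directly.

```scala
object LatchingModel {
  // Illustrative severity levels (assumed values, not the library's).
  sealed abstract class Severity(val level: Int)
  case object Okay     extends Severity(0)
  case object Warning  extends Severity(1)
  case object Major    extends Severity(2)
  case object Critical extends Severity(3)

  // Hypothetical, simplified status holding only the fields this sketch needs.
  final case class Status(current: Severity, latched: Severity, acknowledged: Boolean)

  // Latchable alarm: the latched severity only increases, and a rise above it
  // clears the acknowledgement (assumption).
  def setSeverity(s: Status, next: Severity): Status =
    if (next.level > s.latched.level) Status(next, next, acknowledged = false)
    else s.copy(current = next)

  // reset: latched severity becomes the current severity and the alarm is
  // marked acknowledged; nothing else changes.
  def reset(s: Status): Status = s.copy(latched = s.current, acknowledged = true)
}
```

After a Critical severity followed by Okay, the model still reports a latched Critical severity until reset is called.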

getMetadata

Gets the metadata of an alarm, component, subsystem, or whole TMT system. The following information is returned for each alarm:

  • subsystem
  • component
  • name
  • description
  • location
  • alarmType
  • supported severities
  • probable cause
  • operator response
  • is autoAcknowledgeable
  • is latchable
  • activation status
Scala
async {
  val metadata: AlarmMetadata = await(adminAPI.getMetadata(alarmKey))
}
Note

Inactive alarms do not take part in the aggregation of severity or health. Alarms are set active or inactive in the alarm configuration file, not through either API.

getStatus

Gets the status of the alarm, which contains fields such as:

  • latched severity
  • acknowledgement status
  • shelve status
  • alarm time
Scala
async {
  val status: AlarmStatus = await(adminAPI.getStatus(alarmKey))
}

getCurrentSeverity

Gets the severity of the alarm.

Scala
async {
  val severity: FullAlarmSeverity = await(adminAPI.getCurrentSeverity(alarmKey))
}

getAggregatedSeverity

Gets the aggregated severity for the given alarm/component/subsystem/whole TMT system. The aggregated severity is the severity of the most severe alarm among all the alarms covered by the key.

Scala
async {
  val componentKey                          = ComponentKey(NFIRAOS, "tromboneAssembly")
  val aggregatedSeverity: FullAlarmSeverity = await(adminAPI.getAggregatedSeverity(componentKey))
}
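Aggregation as described above is a maximum over a severity ordering. The sketch below shows the idea; it is not the Alarm Service's implementation. `SeverityAggregation`, `AlarmState` and `aggregate` are hypothetical, and the least-to-most-severe order in `order` is an assumption; the actual ranking is defined by FullAlarmSeverity. Inactive alarms are skipped, as stated in the note under getMetadata, while shelved alarms would still be counted.

```scala
object SeverityAggregation {
  // Assumed ordering, least to most severe (check FullAlarmSeverity for the real one).
  val order: Vector[String] =
    Vector("Okay", "Warning", "Major", "Indeterminate", "Critical", "Disconnected")

  // Hypothetical per-alarm record: current severity plus activation status.
  final case class AlarmState(severity: String, active: Boolean)

  // Most severe severity among active alarms; None if no active alarm is in scope.
  def aggregate(alarms: Seq[AlarmState]): Option[String] = {
    val active = alarms.filter(_.active)
    if (active.isEmpty) None
    else Some(active.map(_.severity).maxBy(s => order.indexOf(s)))
  }
}
```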

getAggregatedHealth

Gets the aggregated health for the given alarm/component/subsystem/whole TMT system. The aggregated health is Good, Ill or Bad, depending on the most severe alarm among the alarms covered by the key.

Scala
async {
  val subsystemKey        = SubsystemKey(IRIS)
  val health: AlarmHealth = await(adminAPI.getAggregatedHealth(subsystemKey))
}
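Because health is derived from the most severe alarm, aggregated health can be computed by mapping each severity to a health value and keeping the worst. In this sketch the mapping in `healthOf` is an assumption (Okay and Warning as Good, Major as Ill, everything else as Bad); the actual rule is defined by AlarmHealth. `HealthAggregation` is a hypothetical name, and returning Good for an empty scope is also an assumption.

```scala
object HealthAggregation {
  // Assumed severity-to-health mapping; check AlarmHealth for the actual rule.
  def healthOf(severity: String): String = severity match {
    case "Okay" | "Warning" => "Good"
    case "Major"            => "Ill"
    case _                  => "Bad" // Indeterminate, Critical, Disconnected, anything unknown
  }

  // Rank of each health value, from best to worst.
  private val rank: Map[String, Int] = Map("Good" -> 0, "Ill" -> 1, "Bad" -> 2)

  // Worst health among the given severities (Good for an empty scope, an assumption).
  def aggregatedHealth(severities: Seq[String]): String =
    if (severities.isEmpty) "Good"
    else severities.map(healthOf).maxBy(h => rank(h))
}
```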

subscribeAggregatedSeverityCallback

Subscribes to changes in the aggregated severity of the given alarm/component/subsystem/whole TMT system by providing a callback that is executed on every change.

Scala
adminAPI.subscribeAggregatedSeverityCallback(
  ComponentKey(NFIRAOS, "tromboneAssembly"),
  aggregatedSeverity ⇒ { /* do something*/ }
)
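A callback subscription of this kind can be pictured as recomputing the aggregate whenever one alarm's severity changes and invoking callbacks only when the aggregate differs from the previous value. The in-memory sketch below illustrates that idea. It is not how the Alarm Service delivers updates (it uses Redis), and `AggregatedSeverityFeed`, `Feed`, `subscribe` and `update` are hypothetical names. The severity order is the same assumed ordering used for aggregation.

```scala
import scala.collection.mutable

object AggregatedSeverityFeed {
  // Assumed ordering, least to most severe.
  private val order = Vector("Okay", "Warning", "Major", "Indeterminate", "Critical", "Disconnected")

  // Hypothetical in-memory stand-in for the alarms covered by one key.
  final class Feed {
    private val severities = mutable.Map.empty[String, String]
    private val callbacks  = mutable.ArrayBuffer.empty[String => Unit]
    private var last: Option[String] = None

    def subscribe(callback: String => Unit): Unit = { callbacks += callback; () }

    // Record one alarm's new severity, then notify subscribers only if the
    // aggregated (most severe) value actually changed.
    def update(alarmName: String, severity: String): Unit = {
      severities(alarmName) = severity
      val aggregated = severities.values.maxBy(s => order.indexOf(s))
      if (!last.contains(aggregated)) {
        last = Some(aggregated)
        callbacks.foreach(cb => cb(aggregated))
      }
    }
  }
}
```

With two alarms going Okay, then Major, then a second change that leaves Major as the worst, the callback fires for Okay and Major but not for the unchanged aggregate.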

subscribeAggregatedSeverityActorRef

Subscribes to changes in the aggregated severity of the given alarm/component/subsystem/whole TMT system by providing an actor that receives a message containing the aggregated severity on every change.

Scala
// `behaviour[T]` is assumed to be a Behavior[T] defined in your own code to handle each update
val severityActorRef = typed.ActorSystem(behaviour[FullAlarmSeverity], "fullSeverityActor")
adminAPI.subscribeAggregatedSeverityActorRef(SubsystemKey(NFIRAOS), severityActorRef)

subscribeAggregatedHealthCallback

Subscribes to changes in the aggregated health of the given alarm/component/subsystem/whole TMT system by providing a callback that is executed on every change.

Scala
adminAPI.subscribeAggregatedHealthCallback(
  ComponentKey(IRIS, "ImagerDetectorAssembly"),
  aggregatedHealth ⇒ { /* do something*/ }
)

subscribeAggregatedHealthActorRef

Subscribes to changes in the aggregated health of the given alarm/component/subsystem/whole TMT system by providing an actor that receives a message containing the aggregated health on every change.

Scala
// `behaviour[T]` is assumed to be a Behavior[T] defined in your own code to handle each update
val healthActorRef = typed.ActorSystem(behaviour[AlarmHealth], "healthActor")
adminAPI.subscribeAggregatedHealthActorRef(SubsystemKey(IRIS), healthActorRef)