Alarm Service
The Alarm Service provides an API to manage alarms in the TMT software system. The service uses Redis to store alarm data, including the alarm status and associated metadata. Alarm “keys” are used to access information about an alarm.
Dependencies
The Alarm Service comes bundled with the Framework, so no additional dependency needs to be added to your build.sbt file if you are using it. To use the Alarm Service without the framework, add this to your build.sbt file:
- sbt
-
libraryDependencies += "com.github.tmtsoftware.csw" %% "csw-alarm-client" % "0.6.0-RC1"
API Flavours
There are two APIs provided in the Alarm Service: a client API, and an administrative (admin) API. The client API is the API used by component developers to set the severity of an alarm. This is the only functionality needed by component developers. As per TMT policy, the severity of an alarm must be set periodically (within some time limit) in order to maintain the integrity of the alarm status. If an alarm severity is not refreshed within the time limit, currently set at TBD seconds, the severity is set to Disconnected by the Alarm Service, which indicates to the operator that there is some problem with the component’s ability to evaluate the alarm status.
The admin API provides all of the functions needed to manage the alarm store, as well as access to monitor alarms for use by an operator or instrument specialist. The admin API provides the ability to load alarm data into the alarm store, set the severity of an alarm, acknowledge alarms, shelve or unshelve alarms, reset a latched alarm, get the metadata/status/severity of an alarm, and get or subscribe to aggregations of severity and health of an alarm, a component’s alarms, a subsystem’s alarms, or the alarms of the whole TMT System.
A command line tool is provided as part of the Alarm Service that implements this API and provides low-level control over the Alarm Service. More details about the alarm CLI can be found here: CSW Alarm Client CLI application
Eventually, operators will use Graphical User Interfaces that access the admin API through a UI gateway. This will be delivered as part of the ESW HCMS package.
Since the admin API will primarily be used with the CLI and HCMS applications, it is only supported in Scala, and not Java.
To summarize, the APIs are as follows:
- client API (AlarmService): must be used by components. The available method is: {setSeverity}
- admin API (AlarmAdminService): expected to be used by administrators. The available methods are: {initAlarms | setSeverity | acknowledge | shelve | unshelve | reset | getMetadata | getStatus | getCurrentSeverity | getAggregatedSeverity | getAggregatedHealth | subscribeAggregatedSeverityCallback | subscribeAggregatedSeverityActorRef | subscribeAggregatedHealthCallback | subscribeAggregatedHealthActorRef}
Creating clientAPI and adminAPI
For component developers, the client API is provided as an AlarmService object in the CswContext object injected into the ComponentHandlers class provided by the framework.
If you are not using csw-framework, you can create AlarmService using AlarmServiceFactory.
- Scala
-
// create alarm client using host and port of alarm server
private val clientAPI1 = new AlarmServiceFactory().makeClientApi("localhost", 5225)

// create alarm client using location service
private val clientAPI2 = new AlarmServiceFactory().makeClientApi(locationService)

// create alarm admin using host and port of alarm server
private val adminAPI1 = new AlarmServiceFactory().makeAdminApi("localhost", 5226)

// create alarm admin using location service
private val adminAPI2 = new AlarmServiceFactory().makeAdminApi(locationService)
- Java
-
// create alarm client using host and port of alarm server
IAlarmService jclientAPI1 = new AlarmServiceFactory().jMakeClientApi("localhost", 5227, actorSystem);

// create alarm client using location service
IAlarmService jclientAPI2 = new AlarmServiceFactory().jMakeClientApi(jLocationService, actorSystem);
Rules and checks
- When representing a unique alarm, the alarm name or component name must not contain any of the characters * [ ] ^ - or any whitespace characters
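The naming rule above can be checked before a key is built. The following is a minimal sketch in plain Java; `AlarmNameRules` and `isValidName` are hypothetical names that are not part of the Alarm Service API, and rejecting empty or null names is an added assumption rather than something this page states.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the CSW Alarm Service API.
final class AlarmNameRules {
    // Matches a single forbidden character: * [ ] ^ - or any whitespace.
    private static final Pattern FORBIDDEN = Pattern.compile("[*\\[\\]^\\-\\s]");

    private AlarmNameRules() {}

    /**
     * Returns true when the name contains none of the forbidden characters.
     * Treating null and empty names as invalid is an assumption of this sketch.
     */
    static boolean isValidName(String name) {
        return name != null && !name.isEmpty() && !FORBIDDEN.matcher(name).find();
    }
}
```

For example, a name such as tromboneAxisLowLimitAlarm passes this check, while low-limit or trombone axis would be rejected.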
Model Classes
- AlarmKey : Represents a unique alarm in the TMT system. It is composed of subsystem, component and alarm name.
- ComponentKey : Represents all alarms of a component
- SubsystemKey : Represents all alarms of a subsystem
- GlobalKey : Represents all alarms present in the TMT system
- AlarmMetadata : Represents static metadata of an alarm, which will not change in its entire lifespan.
- AlarmStatus : Represents the dynamically changing data of an alarm, which changes when the severity changes or when an operator changes it manually
- AlarmSeverity : Represents severity levels that can be set by the component developer e.g. Okay, Indeterminate, Warning, Major and Critical
- FullAlarmSeverity : Represents all possible severity levels of the alarm i.e. Disconnected (cannot be set by the developer) plus other severity levels that can be set by the developer
- AlarmHealth : Represents possible health of an alarm or component or subsystem or whole TMT system
setSeverity
Sets the severity of the given alarm. The severity must be refreshed by setting it at a regular interval or it will automatically be changed to Disconnected after a specific time.
- Scala
-
val alarmKey = AlarmKey(NFIRAOS, "trombone", "tromboneAxisLowLimitAlarm")
async {
  await(clientAPI.setSeverity(alarmKey, Okay))
}
- Java
-
AlarmKey alarmKey = new AlarmKey(NFIRAOS, "trombone", "tromboneAxisLowLimitAlarm");
Done done = jclientAPI1.setSeverity(alarmKey, Okay).get();
- If the alarm is not refreshed within 9 seconds, it will be inferred as Disconnected
- If the alarm is auto-acknowledgeable and the severity is set to Okay, the alarm will be auto-acknowledged and will not require any explicit acknowledgement by an administrator
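Because a severity that is not refreshed in time is inferred as Disconnected, a component typically re-sends its current severity on a timer. The sketch below shows one way to do that with the standard `ScheduledExecutorService`; `SeverityRefresher` is a hypothetical helper, not part of the Alarm Service, and the refresh action is passed in as a plain `Runnable` so the sketch makes no assumption about the client API.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration only; not part of the CSW Alarm Service API.
final class SeverityRefresher implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();

    /** Runs the publish action immediately, then again every periodMillis. */
    ScheduledFuture<?> start(Runnable publish, long periodMillis) {
        Runnable guarded = () -> {
            try {
                publish.run();
            } catch (RuntimeException e) {
                // scheduleAtFixedRate cancels later runs if a run throws,
                // so failures are swallowed here to keep the refresh going.
            }
        };
        return scheduler.scheduleAtFixedRate(guarded, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    /** Stops all further refreshes. */
    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

With the Java client shown above, the action might be something like `() -> jclientAPI1.setSeverity(alarmKey, Okay)`, with a period comfortably shorter than the 9-second limit. The exact period is a choice left to the component.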
initAlarms
Loads the given alarm data into the alarm store.
- Scala
-
async {
  val resource = "test-alarms/valid-alarms.conf"
  val alarmsConfig: Config = ConfigFactory.parseResources(resource)
  await(adminAPI.initAlarms(alarmsConfig))
}
acknowledge
Acknowledges the given alarm, which has been raised to a higher severity.
- Scala
-
async { await(adminAPI.acknowledge(alarmKey)) }
shelve
Shelves the given alarm. A shelved alarm is un-shelved automatically at a specific time (8 AM local time by default) if it is not un-shelved manually before then. The time of the automatic un-shelve can be configured in application.conf, e.g. csw-alarm.shelve-timeout = h:m:s a.
- Scala
-
async { await(adminAPI.shelve(alarmKey)) }
Shelved alarms are still included when the aggregated severity or health of alarms is calculated.
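The h:m:s a format of the shelve timeout uses the same pattern letters as Java's `DateTimeFormatter` (12-hour clock with an AM/PM marker). As a rough illustration of how such a value maps to the next automatic un-shelve time, here is a minimal sketch. `ShelveTimeout` and `nextUnshelveTime` are hypothetical names, the value "8:00:00 AM" is an assumed example matching the stated 8 AM default, and this is not the Alarm Service's own implementation.

```java
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper for illustration only; not the Alarm Service implementation.
final class ShelveTimeout {
    // Same pattern letters as the configuration format: hour, minute, second, AM/PM.
    private static final DateTimeFormatter FORMAT =
        DateTimeFormatter.ofPattern("h:m:s a", Locale.ENGLISH);

    private ShelveTimeout() {}

    /** Returns the first occurrence of the configured time of day strictly after now. */
    static LocalDateTime nextUnshelveTime(String configured, LocalDateTime now) {
        LocalTime timeOfDay = LocalTime.parse(configured, FORMAT);
        LocalDateTime candidate = now.toLocalDate().atTime(timeOfDay);
        // If today's occurrence has already passed, use tomorrow's.
        return candidate.isAfter(now) ? candidate : candidate.plusDays(1);
    }
}
```

Under these assumptions, an alarm shelved at 22:30 with a timeout of "8:00:00 AM" would be un-shelved at 08:00 the following morning.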
unshelve
Unshelves the given alarm
- Scala
-
async { await(adminAPI.unshelve(alarmKey)) }
reset
Resets the status of the given latched alarm by setting the latched severity to the current severity and the acknowledgement status to acknowledged, without changing any other properties of the alarm.
- Scala
-
async { await(adminAPI.reset(alarmKey)) }
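The effect of a reset on the alarm status can be pictured with a small, simplified model. The class below is purely illustrative: `LatchStatus` and its fields are hypothetical names, severities are plain strings, and the real AlarmStatus holds more information than this.

```java
// Hypothetical, simplified status model for illustration only.
final class LatchStatus {
    final String latchedSeverity;
    final boolean acknowledged;
    final boolean shelved;

    LatchStatus(String latchedSeverity, boolean acknowledged, boolean shelved) {
        this.latchedSeverity = latchedSeverity;
        this.acknowledged = acknowledged;
        this.shelved = shelved;
    }

    /**
     * Applies the reset rule described above: the latched severity becomes the
     * current severity, the alarm is marked acknowledged, and every other
     * property (here only the shelve flag) is left untouched.
     */
    LatchStatus reset(String currentSeverity) {
        return new LatchStatus(currentSeverity, true, shelved);
    }
}
```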
getMetadata
Gets the metadata of an alarm, component, subsystem, or whole TMT system. The following information is returned for each alarm:
- subsystem
- component
- name
- description
- location
- alarmType
- supported severities
- probable cause
- operator response
- is autoAcknowledgeable
- is latchable
- activation status
- Scala
-
async { val metadata: AlarmMetadata = await(adminAPI.getMetadata(alarmKey)) }
Inactive alarms do not take part in the aggregation of severity or health. Alarms are set active or inactive in the alarm configuration file, and not through either API.
getStatus
Gets the status of the alarm, which contains fields such as:
- latched severity
- acknowledgement status
- shelve status
- alarm time
- Scala
-
async { val status: AlarmStatus = await(adminAPI.getStatus(alarmKey)) }
getCurrentSeverity
Gets the severity of the alarm.
- Scala
-
async { val severity: FullAlarmSeverity = await(adminAPI.getCurrentSeverity(alarmKey)) }
getAggregatedSeverity
Gets the aggregated severity for the given alarm/component/subsystem/whole TMT system. Aggregation of the severity represents the most severe alarm amongst multiple alarms.
- Scala
-
async {
  val componentKey = ComponentKey(NFIRAOS, "tromboneAssembly")
  val aggregatedSeverity: FullAlarmSeverity = await(adminAPI.getAggregatedSeverity(componentKey))
}
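Aggregating to "the most severe alarm" amounts to taking a maximum over an ordering of severities. The sketch below illustrates the idea in plain Java. `SketchSeverity` is a hypothetical enum, and the numeric levels (in particular where Indeterminate and Disconnected sit relative to Major and Critical) are an assumed ordering that this page does not specify. Rejecting an empty collection is likewise a choice made for the sketch.

```java
import java.util.Collection;
import java.util.Comparator;

// Hypothetical enum for illustration only; the levels are an assumed ordering.
enum SketchSeverity {
    OKAY(0), WARNING(1), MAJOR(2), INDETERMINATE(3), DISCONNECTED(4), CRITICAL(5);

    final int level;

    SketchSeverity(int level) {
        this.level = level;
    }

    /** Returns the entry with the highest level, i.e. the most severe one. */
    static SketchSeverity aggregate(Collection<SketchSeverity> severities) {
        return severities.stream()
            .max(Comparator.comparingInt((SketchSeverity s) -> s.level))
            .orElseThrow(() -> new IllegalArgumentException("no severities to aggregate"));
    }
}
```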
getAggregatedHealth
Gets the aggregated health for the given alarm/component/subsystem/whole TMT system. Aggregation of health is either Good, Ill or Bad based on the most severe alarm amongst multiple alarms.
- Scala
-
async {
  val subsystemKey = SubsystemKey(IRIS)
  val health: AlarmHealth = await(adminAPI.getAggregatedHealth(subsystemKey))
}
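To make the relation between severity and health concrete, the following sketch maps each alarm's severity to a health value and keeps the worst one. `HealthSketch` is a hypothetical class, severities are given as plain strings, and the mapping itself (Okay and Warning to Good, Major to Ill, everything else to Bad) is an assumption made for illustration; this page does not define it.

```java
import java.util.Collection;
import java.util.Comparator;

// Hypothetical helper for illustration only; the mapping below is an assumption.
final class HealthSketch {
    // Declared from best to worst so that natural enum order ranks BAD highest.
    enum Health { GOOD, ILL, BAD }

    private HealthSketch() {}

    /** Maps one severity name to a health value under the assumed mapping. */
    static Health fromSeverity(String severity) {
        switch (severity) {
            case "Okay":
            case "Warning":
                return Health.GOOD;
            case "Major":
                return Health.ILL;
            default:
                // Indeterminate, Disconnected, Critical and unknown names.
                return Health.BAD;
        }
    }

    /** Returns the worst health among the given severities. */
    static Health aggregate(Collection<String> severities) {
        return severities.stream()
            .map(HealthSketch::fromSeverity)
            .max(Comparator.naturalOrder())
            .orElseThrow(() -> new IllegalArgumentException("no severities to aggregate"));
    }
}
```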
subscribeAggregatedSeverityCallback
Subscribes to the changes of aggregated severity for given alarm/component/subsystem/whole TMT system by providing a callback which gets executed for every change.
- Scala
-
adminAPI.subscribeAggregatedSeverityCallback(
  ComponentKey(NFIRAOS, "tromboneAssembly"),
  aggregatedSeverity ⇒ { /* do something*/ }
)
subscribeAggregatedSeverityActorRef
Subscribes to the changes of aggregated severity for given alarm/component/subsystem/whole TMT system by providing an actor which will receive a message of aggregated severity on every change.
- Scala
-
val severityActorRef = typed.ActorSystem(behaviour[FullAlarmSeverity], "fullSeverityActor")
adminAPI.subscribeAggregatedSeverityActorRef(SubsystemKey(NFIRAOS), severityActorRef)
subscribeAggregatedHealthCallback
Subscribes to the changes of aggregated health for given alarm/component/subsystem/whole TMT system by providing a callback which gets executed for every change.
- Scala
-
adminAPI.subscribeAggregatedHealthCallback(
  ComponentKey(IRIS, "ImagerDetectorAssembly"),
  aggregatedHealth ⇒ { /* do something*/ }
)
subscribeAggregatedHealthActorRef
Subscribes to the changes of aggregated health for given alarm/component/subsystem/whole TMT system by providing an actor which will receive a message of aggregated health on every change.
- Scala
-
val healthActorRef = typed.ActorSystem(behaviour[AlarmHealth], "healthActor")
adminAPI.subscribeAggregatedHealthActorRef(SubsystemKey(IRIS), healthActorRef)