NETWORK DEVICE AUTO-REMEDIATION
Situation: A Network Interface Has Failed
- A network interface has gone down, triggering actions to discover why it failed.
- Was this an intentional manual shutdown, a side effect of testing underway, or a critical failure of a network device?
- When the root cause of the outage is unknown, IT staff treat the problem as urgent and respond accordingly.
- IT operations personnel across teams are pulled in to help remediate the situation, even if the problem ultimately proves to be outside their areas of responsibility.
The Conventional Workflow Approach
A user recognizes that a service has gone down and creates a ServiceNow ticket. Service Ops assigns the ticket to NetOps as Priority 1.
NetOps team members then start diagnostics manually and discover that an interface is down. They perform a remediation action to bring the interface back up.
The NetOps team then either updates the ticket (if the interface came back up) or keeps running diagnostics to discover why it went down, with the task remaining at Priority 1. Finally, the ServiceNow record is updated and closed.
Orchestral.ai's Composer Solution
Composer Event-Driven Auto-Remediation
- Composer notifies the operations teams of the interface outage through chatops or a similar alerting/communication tool and announces the start of an auto-remediation workflow.
- Composer collects state information on the router before and after the auto-remediation action.
- Composer zips the "pre" and "post" state files into a single archive that is kept as the incident's artifacts.
- If the interface has been successfully restored: Composer then opens a service ticket with priority "Low" on the ITSM ticketing system and attaches the troubleshooting artifacts for further analysis.
- If the interface has not been successfully restored: Composer opens a service ticket with priority "High" on the ITSM ticketing system and attaches the troubleshooting artifacts for further analysis.
- Lastly, Composer notifies the operations team through chatops of the newly created incident and its ticket number for follow-up.
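The event-driven steps above can be sketched as a single control flow. This is a minimal Python sketch under stated assumptions, not Composer's actual API: every name here (auto_remediate, zip_state_files, collect_state, bring_up, is_up, open_ticket, notify, the file names pre_state.txt/post_state.txt) is hypothetical. Router access, the ITSM ticketing system, and the chatops tool are injected as plain callables so the logic can be read and exercised without a real device, ServiceNow instance, or chat service.

```python
import io
import zipfile
from dataclasses import dataclass
from typing import Callable


@dataclass
class RemediationResult:
    restored: bool
    priority: str
    ticket_id: str
    artifact: bytes  # zip archive holding the pre/post state captures


def zip_state_files(pre: str, post: str) -> bytes:
    """Bundle the "pre" and "post" state captures into one in-memory zip."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("pre_state.txt", pre)    # hypothetical file name
        zf.writestr("post_state.txt", post)  # hypothetical file name
    return buf.getvalue()


def auto_remediate(
    device: str,
    interface: str,
    collect_state: Callable[[str], str],         # hypothetical: dump router state
    bring_up: Callable[[str, str], None],        # hypothetical: remediation action
    is_up: Callable[[str, str], bool],           # hypothetical: interface status check
    open_ticket: Callable[[str, str, bytes], str],  # hypothetical: ITSM client, returns ticket ID
    notify: Callable[[str], None],               # hypothetical: chatops message sender
) -> RemediationResult:
    # 1. Announce the outage and the start of the workflow.
    notify(f"{device} {interface} is down; starting auto-remediation")

    # 2. Capture router state before acting.
    pre = collect_state(device)

    # 3. Attempt the remediation, then capture state again.
    bring_up(device, interface)
    post = collect_state(device)

    # 4. Package both captures as incident artifacts.
    artifact = zip_state_files(pre, post)

    # 5. Open a ticket whose priority depends on whether the interface recovered.
    restored = is_up(device, interface)
    priority = "Low" if restored else "High"
    status = "restored" if restored else "not restored"
    summary = f"Interface {interface} on {device} went down ({status})"
    ticket_id = open_ticket(summary, priority, artifact)

    # 6. Tell the operations team which ticket to follow up on.
    notify(f"Opened {priority} priority ticket {ticket_id} for {device} {interface}")
    return RemediationResult(restored, priority, ticket_id, artifact)


if __name__ == "__main__":
    # Hypothetical in-memory stand-ins for a router, ticketing system, and chat tool.
    interfaces = {"Gi0/1": "down"}

    def fake_open_ticket(summary: str, priority: str, artifact: bytes) -> str:
        print(f"[ITSM] {priority}: {summary} ({len(artifact)} byte attachment)")
        return "INC0001"

    auto_remediate(
        device="edge-rtr-01",
        interface="Gi0/1",
        collect_state=lambda dev: f"{dev} {interfaces}",
        bring_up=lambda dev, intf: interfaces.update({intf: "up"}),
        is_up=lambda dev, intf: interfaces[intf] == "up",
        open_ticket=fake_open_ticket,
        notify=lambda msg: print(f"[chatops] {msg}"),
    )
```

Injecting the integrations as callables keeps the decision logic (capture, remediate, capture, archive, pick "Low" or "High" priority, notify) separate from any particular router CLI, ticketing API, or chat product; in a real deployment each callable would wrap whatever client the environment actually provides.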