@webframp/aws-ops

v2026.04.22.1

AWS Operations Toolkit - Unified incident investigation and operational visibility.

This extension provides a complete workflow for investigating AWS outages by gathering data from CloudWatch Logs, Metrics, Alarms, X-Ray Traces, resource inventory (EC2, Lambda), and networking (load balancers, NAT gateways), plus an incident report that summarizes all findings.

Quick Start

# Install the extension (auto-resolves dependencies)
swamp extension pull @webframp/aws-ops

# Create model instances for your region
swamp model create @webframp/aws/logs aws-logs --global-arg region=us-east-1
swamp model create @webframp/aws/metrics aws-metrics --global-arg region=us-east-1
swamp model create @webframp/aws/alarms aws-alarms --global-arg region=us-east-1
swamp model create @webframp/aws/traces aws-traces --global-arg region=us-east-1
swamp model create @webframp/aws/inventory aws-inventory --global-arg region=us-east-1
swamp model create @webframp/aws/networking aws-networking --global-arg region=us-east-1

# Run the investigate-outage workflow
swamp workflow run @webframp/investigate-outage

Required IAM Permissions

  • logs:DescribeLogGroups
  • logs:StartQuery
  • logs:GetQueryResults
  • logs:FilterLogEvents
  • cloudwatch:ListMetrics
  • cloudwatch:GetMetricStatistics
  • cloudwatch:GetMetricData
  • cloudwatch:DescribeAlarms
  • cloudwatch:DescribeAlarmHistory
  • xray:GetServiceGraph
  • xray:GetTraceSummaries
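
The actions listed above could be bundled into a single read-only IAM policy. The following is a minimal sketch, not a policy shipped with the extension: the file name `aws-ops-policy.json`, the `Sid`, and the wildcard `Resource` are assumptions chosen for illustration. The list covers only CloudWatch Logs, CloudWatch, and X-Ray actions; any permissions the inventory and networking models may need are not listed here, so they are not included.

```shell
#!/bin/sh
# Write an IAM policy document containing exactly the actions listed above.
# Assumptions: the output file name, the Sid, and Resource "*" are illustrative.
POLICY_FILE="${POLICY_FILE:-aws-ops-policy.json}"

cat > "$POLICY_FILE" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AwsOpsInvestigation",
      "Effect": "Allow",
      "Action": [
        "logs:DescribeLogGroups",
        "logs:StartQuery",
        "logs:GetQueryResults",
        "logs:FilterLogEvents",
        "cloudwatch:ListMetrics",
        "cloudwatch:GetMetricStatistics",
        "cloudwatch:GetMetricData",
        "cloudwatch:DescribeAlarms",
        "cloudwatch:DescribeAlarmHistory",
        "xray:GetServiceGraph",
        "xray:GetTraceSummaries"
      ],
      "Resource": "*"
    }
  ]
}
EOF

echo "Wrote $POLICY_FILE"
```

The resulting file could then be registered with the AWS CLI, for example `aws iam create-policy --policy-name aws-ops-read --policy-document file://aws-ops-policy.json` (the policy name is hypothetical). Scope `Resource` more narrowly where the actions support it.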

Included Components

Workflows

  • @webframp/investigate-outage - Unified incident investigation workflow that:
    • Gathers alarm summary and active alarms
    • Analyzes Lambda Duration/Errors and ELB 5XX/latency metrics for anomalies
    • Gets X-Ray service dependency graph
    • Finds error traces and analyzes error patterns
    • Lists CloudWatch log groups and searches for error patterns
    • Inventories EC2 instances and Lambda functions
    • Lists load balancers and NAT gateways with health status
    • Gets alarm state change history
    • Generates an incident report summarizing all findings

Reports

  • @webframp/incident-report - Workflow-scope report that aggregates findings into:
    • Alarm status and recent state changes
    • Metric anomaly highlights (Lambda + ELB)
    • Trace error analysis with top faulty services
    • Infrastructure inventory (EC2, Lambda)
    • Networking status (load balancers, NAT gateways)
    • Actionable recommendations

Model Dependencies

The workflow expects these model instances (create them before running):

  • aws-logs - @webframp/aws/logs
  • aws-metrics - @webframp/aws/metrics
  • aws-alarms - @webframp/aws/alarms
  • aws-traces - @webframp/aws/traces
  • aws-inventory - @webframp/aws/inventory
  • aws-networking - @webframp/aws/networking
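
Because each instance name above is `aws-` followed by the last segment of its model type, the six `swamp model create` commands from the Quick Start can be generated in a loop. This is a sketch under stated assumptions: `REGION` and `DRY_RUN` are illustrative variables, not options of the extension, and the script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Create (or print) the six model instances the workflow expects.
# Assumptions: REGION and DRY_RUN are hypothetical helper variables.
REGION="${REGION:-us-east-1}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only; 0 = run them

create_models() {
  for name in logs metrics alarms traces inventory networking; do
    # Same command shape as in the Quick Start section.
    set -- swamp model create "@webframp/aws/$name" "aws-$name" \
      --global-arg "region=$REGION"
    if [ "$DRY_RUN" = "1" ]; then
      echo "$*"
    else
      "$@" || return 1   # stop at the first failing command
    fi
  done
}

create_models
```

Running it with `REGION=eu-west-1 DRY_RUN=0` would create the instances in that region; the workflow itself is then started with `swamp workflow run @webframp/investigate-outage`, as in the Quick Start.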

Repository

https://github.com/webframp/swamp-extensions

Labels

aws, cloudwatch, xray, observability, ops, incident-response, workflow

Quality score

How well-documented and verifiable this extension is.

100%

Grade A

  • Has README or module doc: 2/2 earned
  • README has a code example: 1/1 earned
  • README is substantive: 1/1 earned
  • Most symbols documented: 1/1 earned
  • No slow types: 1/1 earned
  • Has description: 1/1 earned
  • At least one platform tag (or universal): 1/1 earned
  • Two or more platform tags (or universal): 1/1 earned
  • License declared: 1/1 earned
  • Verified public repository: 2/2 earned

@webframp/investigate-outage (c3866eb0-6190-4154-b8e1-304624aba93e)

Unified AWS outage investigation workflow. Gathers data from CloudWatch Logs, Metrics, Alarms, X-Ray Traces, resource inventory, and networking to provide a comprehensive view of system health during an incident.

gather-observability-data - Collect data from all observability sources in parallel
  1. check-alarms (aws-alarms.get_summary) - Get alarm summary and active alarms
  2. get-active-alarms (aws-alarms.get_active) - Get all currently active alarms
  3. analyze-metrics (aws-metrics.analyze) - Analyze Lambda Duration metrics for anomalies
  4. analyze-errors-metric (aws-metrics.analyze) - Analyze Lambda Errors metrics
  5. analyze-elb-5xx (aws-metrics.analyze) - Analyze ALB 5XX error count
  6. analyze-elb-latency (aws-metrics.analyze) - Analyze ALB target response time
  7. get-service-graph (aws-traces.get_service_graph) - Get X-Ray service dependency graph
  8. get-error-traces (aws-traces.get_errors) - Get traces with errors or faults
  9. analyze-trace-errors (aws-traces.analyze_errors) - Analyze error patterns in traces

gather-logs - Search logs for errors (runs in parallel with observability)
  1. list-log-groups (aws-logs.list_log_groups) - Discover log groups
  2. find-lambda-errors (aws-logs.find_errors) - Search Lambda log groups for error patterns

gather-infrastructure - Collect resource inventory and networking state (runs in parallel)
  1. list-ec2-instances (aws-inventory.list_ec2) - List EC2 instances across all states
  2. list-lambda-functions (aws-inventory.list_lambda) - List Lambda functions
  3. list-load-balancers (aws-networking.list_load_balancers) - List ALBs and NLBs with target health
  4. list-nat-gateways (aws-networking.list_nat_gateways) - List NAT gateway status

deep-dive - Perform deeper analysis based on initial findings
  1. get-alarm-history (aws-alarms.get_history) - Get alarm state change history

@webframp/incident-report (scope: workflow)
incident_report.ts

Summarizes findings from the investigate-outage workflow into an actionable incident report.

aws, incident-response, ops, observability