AWS Step Functions with TypeScript: Orchestrating Serverless Workflows
Building robust serverless applications often requires orchestrating multiple Lambda functions into complex workflows. AWS Step Functions provide a visual workflow service that coordinates distributed components, manages state transitions, and handles error recovery—all while maintaining the reliability and scalability that modern applications demand.
Why Step Functions with TypeScript?
TypeScript brings compelling advantages to Step Functions development beyond basic type safety. Workflow clarity emerges from strongly-typed state definitions that make complex logic easier to understand and maintain. Error prevention occurs at compile time through type checking of state inputs and outputs. Developer experience improves dramatically with IntelliSense support for AWS SDK calls and state machine definitions.
Most importantly, TypeScript enables contract-driven development where interfaces define the expected data flow between states, ensuring consistency across your entire workflow.
Step Functions Architecture Patterns
Understanding the fundamental patterns in Step Functions helps you design effective serverless workflows:
These patterns form the building blocks for complex business workflows, each serving specific orchestration needs in distributed systems.
Prerequisites
Before building Step Functions workflows, ensure your development environment includes:
- AWS SAM CLI for local development and testing
- TypeScript development environment with AWS SDK v3
- AWS CLI configured with appropriate permissions
- Familiarity with Lambda fundamentals from our previous post in this series
Order Processing Workflow Example
Let’s build a realistic order processing system that demonstrates key Step Functions patterns. This workflow handles the complete order lifecycle from validation through fulfillment.
This workflow demonstrates several important patterns: sequential processing for validation and payment, parallel execution for inventory and notifications, and comprehensive error handling at each step.
TypeScript-First Implementation
Type Definitions and Interfaces
Start by defining strong contracts for your workflow data:
// src/types/workflow.ts
export interface OrderWorkflowInput {
orderId: string;
customerId: string;
items: OrderItem[];
shippingAddress: Address;
paymentMethod: PaymentMethod;
}
export interface OrderItem {
productId: string;
quantity: number;
unitPrice: number;
}
export interface OrderWorkflowState extends OrderWorkflowInput {
status: OrderStatus;
totalAmount: number;
paymentId?: string;
shipmentId?: string;
inventoryReservationId?: string;
errors?: WorkflowError[];
}
export enum OrderStatus {
PENDING = 'PENDING',
VALIDATED = 'VALIDATED',
PAYMENT_PROCESSED = 'PAYMENT_PROCESSED',
INVENTORY_RESERVED = 'INVENTORY_RESERVED',
FULFILLED = 'FULFILLED',
FAILED = 'FAILED'
}
export interface WorkflowError {
step: string;
message: string;
timestamp: string;
}
These interfaces provide compile-time safety, clear documentation of data flow, and consistency across all workflow functions.
State Machine Definition
Create a maintainable state machine using SAM templates:
# template.yaml
Resources:
OrderProcessingStateMachine:
Type: AWS::Serverless::StateMachine
Properties:
DefinitionUri: statemachine/order-processing.asl.json
Type: STANDARD
Policies:
- LambdaInvokePolicy:
FunctionName: !Ref ValidateOrderFunction
- LambdaInvokePolicy:
FunctionName: !Ref ProcessPaymentFunction
- LambdaInvokePolicy:
FunctionName: !Ref ReserveInventoryFunction
- LambdaInvokePolicy:
FunctionName: !Ref CreateShipmentFunction
Events:
OrderCreatedEvent:
Type: EventBridgeRule
Properties:
Pattern:
source: ['order-service']
detail-type: ['Order Created']
Optimized Lambda Functions
Implement focused, single-responsibility functions:
// src/functions/validate-order.ts
import { OrderWorkflowState, OrderStatus, WorkflowError } from '../types/workflow';
export const handler = async (input: OrderWorkflowState): Promise<OrderWorkflowState> => {
try {
// Calculate total amount
const totalAmount = input.items.reduce((sum, item) =>
sum + (item.quantity * item.unitPrice), 0);
// Validate business rules
if (totalAmount <= 0) {
throw new Error('Order total must be greater than zero');
}
if (input.items.length === 0) {
throw new Error('Order must contain at least one item');
}
// Validate inventory availability (simplified)
const inventoryValid = await checkInventoryAvailability(input.items);
if (!inventoryValid) {
throw new Error('Insufficient inventory for one or more items');
}
return {
...input,
status: OrderStatus.VALIDATED,
totalAmount
};
} catch (error) {
return {
...input,
status: OrderStatus.FAILED,
errors: [{
step: 'validate-order',
message: error.message,
timestamp: new Date().toISOString()
}]
};
}
};
async function checkInventoryAvailability(items: OrderItem[]): Promise<boolean> {
// Implementation would check actual inventory system
return true;
}
// src/functions/process-payment.ts
import { OrderWorkflowState, OrderStatus } from '../types/workflow';
import { PaymentService } from '../services/payment-service';
export const handler = async (input: OrderWorkflowState): Promise<OrderWorkflowState> => {
const paymentService = new PaymentService();
try {
const paymentResult = await paymentService.processPayment({
amount: input.totalAmount,
customerId: input.customerId,
paymentMethod: input.paymentMethod,
orderId: input.orderId
});
return {
...input,
status: OrderStatus.PAYMENT_PROCESSED,
paymentId: paymentResult.paymentId
};
} catch (error) {
// Let Step Functions handle retry logic for transient errors
if (isTransientError(error)) {
throw error; // Step Functions will retry
}
// Permanent failure
return {
...input,
status: OrderStatus.FAILED,
errors: [
...(input.errors || []),
{
step: 'process-payment',
message: error.message,
timestamp: new Date().toISOString()
}
]
};
}
};
function isTransientError(error: any): boolean {
return error.code === 'NETWORK_ERROR' ||
error.code === 'TIMEOUT' ||
error.statusCode >= 500;
}
This implementation approach emphasizes error isolation, retry-friendly design, and comprehensive state management.
Advanced State Machine Definition
Here’s the complete Amazon States Language (ASL) definition that brings our workflow to life:
{
"Comment": "Order Processing Workflow with Error Handling",
"StartAt": "ValidateOrder",
"States": {
"ValidateOrder": {
"Type": "Task",
"Resource": "${ValidateOrderFunctionArn}",
"Next": "CheckValidationResult",
"Catch": [{
"ErrorEquals": ["States.ALL"],
"Next": "HandleValidationFailure",
"ResultPath": "$.error"
}]
},
"CheckValidationResult": {
"Type": "Choice",
"Choices": [{
"Variable": "$.status",
"StringEquals": "VALIDATED",
"Next": "ProcessPayment"
}],
"Default": "HandleValidationFailure"
},
"ProcessPayment": {
"Type": "Task",
"Resource": "${ProcessPaymentFunctionArn}",
"Next": "ParallelProcessing",
"Retry": [{
"ErrorEquals": ["TransientError"],
"IntervalSeconds": 2,
"MaxAttempts": 3,
"BackoffRate": 2.0
}],
"Catch": [{
"ErrorEquals": ["States.ALL"],
"Next": "HandlePaymentFailure",
"ResultPath": "$.error"
}]
},
"ParallelProcessing": {
"Type": "Parallel",
"Branches": [
{
"StartAt": "ReserveInventory",
"States": {
"ReserveInventory": {
"Type": "Task",
"Resource": "${ReserveInventoryFunctionArn}",
"End": true
}
}
},
{
"StartAt": "SendConfirmationEmail",
"States": {
"SendConfirmationEmail": {
"Type": "Task",
"Resource": "${SendEmailFunctionArn}",
"End": true
}
}
}
],
"Next": "CreateShipment",
"Catch": [{
"ErrorEquals": ["States.ALL"],
"Next": "HandleFulfillmentFailure"
}]
},
"CreateShipment": {
"Type": "Task",
"Resource": "${CreateShipmentFunctionArn}",
"End": true
},
"HandleValidationFailure": {
"Type": "Pass",
"Result": {"status": "FAILED", "reason": "Validation failed"},
"End": true
},
"HandlePaymentFailure": {
"Type": "Pass",
"Result": {"status": "FAILED", "reason": "Payment failed"},
"End": true
},
"HandleFulfillmentFailure": {
"Type": "Task",
"Resource": "${CompensateOrderFunctionArn}",
"End": true
}
}
}
This state machine demonstrates several sophisticated patterns:
- Choice states for conditional logic based on function results
- Parallel execution for independent operations
- Comprehensive error handling with specific recovery paths
- Retry policies for transient failures
- Compensation logic for rolling back partially completed workflows
Error Handling and Resilience Patterns
Step Functions excel at handling distributed system failures. Here’s how to implement robust error handling in TypeScript:
// src/types/errors.ts
export class WorkflowError extends Error {
constructor(
message: string,
public readonly code: string,
public readonly isRetryable: boolean = false
) {
super(message);
this.name = 'WorkflowError';
}
}
export class TransientError extends WorkflowError {
constructor(message: string, code: string = 'TRANSIENT_ERROR') {
super(message, code, true);
this.name = 'TransientError';
}
}
export class PermanentError extends WorkflowError {
constructor(message: string, code: string = 'PERMANENT_ERROR') {
super(message, code, false);
this.name = 'PermanentError';
}
}
// src/functions/compensate-order.ts
export const handler = async (input: OrderWorkflowState): Promise<OrderWorkflowState> => {
const compensationActions = [];
try {
// Reverse payment if it was processed
if (input.paymentId) {
await refundPayment(input.paymentId);
compensationActions.push('payment-refunded');
}
// Release inventory if it was reserved
if (input.inventoryReservationId) {
await releaseInventory(input.inventoryReservationId);
compensationActions.push('inventory-released');
}
// Send failure notification
await sendFailureNotification(input.customerId, input.orderId);
compensationActions.push('notification-sent');
return {
...input,
status: OrderStatus.FAILED,
compensationActions
};
} catch (error) {
// Log compensation failure but don't throw - we want to end the workflow
console.error('Compensation failed:', error);
return {
...input,
status: OrderStatus.FAILED,
compensationActions,
compensationError: error.message
};
}
};
This error handling approach provides graceful degradation, automatic compensation, and comprehensive audit trails for debugging and compliance.
Testing and Local Development
Step Functions integration testing requires a different approach than unit testing individual Lambda functions:
// tests/integration/workflow.test.ts
import { SFNClient, StartExecutionCommand, DescribeExecutionCommand } from '@aws-sdk/client-sfn';
import { OrderWorkflowInput, OrderStatus } from '../../src/types/workflow';
describe('Order Processing Workflow', () => {
const sfnClient = new SFNClient({ region: 'us-east-1' });
const stateMachineArn = process.env.STATE_MACHINE_ARN!;
test('processes valid order successfully', async () => {
const orderInput: OrderWorkflowInput = {
orderId: 'test-order-001',
customerId: 'customer-123',
items: [
{ productId: 'product-001', quantity: 2, unitPrice: 29.99 }
],
shippingAddress: {
street: '123 Test St',
city: 'Test City',
state: 'TS',
zipCode: '12345'
},
paymentMethod: {
type: 'credit_card',
token: 'test-token'
}
};
const startCommand = new StartExecutionCommand({
stateMachineArn,
input: JSON.stringify(orderInput),
name: `test-execution-${Date.now()}`
});
const { executionArn } = await sfnClient.send(startCommand);
// Poll for completion
let execution;
do {
await new Promise(resolve => setTimeout(resolve, 1000));
execution = await sfnClient.send(new DescribeExecutionCommand({ executionArn }));
} while (execution.status === 'RUNNING');
expect(execution.status).toBe('SUCCEEDED');
const output = JSON.parse(execution.output!);
expect(output.status).toBe(OrderStatus.FULFILLED);
expect(output.paymentId).toBeDefined();
expect(output.shipmentId).toBeDefined();
});
});
Monitoring and Observability
Effective monitoring is crucial for production Step Functions workflows:
CloudWatch Integration
// src/utils/metrics.ts
import { CloudWatchClient, PutMetricDataCommand } from '@aws-sdk/client-cloudwatch';
export class WorkflowMetrics {
private cloudwatch = new CloudWatchClient({ region: process.env.AWS_REGION });
async recordStepDuration(stepName: string, duration: number): Promise<void> {
await this.cloudwatch.send(new PutMetricDataCommand({
Namespace: 'OrderProcessing/Workflow',
MetricData: [{
MetricName: 'StepDuration',
Value: duration,
Unit: 'Milliseconds',
Dimensions: [
{ Name: 'StepName', Value: stepName }
]
}]
}));
}
async recordOrderValue(amount: number): Promise<void> {
await this.cloudwatch.send(new PutMetricDataCommand({
Namespace: 'OrderProcessing/Business',
MetricData: [{
MetricName: 'OrderValue',
Value: amount,
Unit: 'None'
}]
}));
}
}
X-Ray Distributed Tracing
// src/utils/tracing.ts
import * as AWSXRay from 'aws-xray-sdk-core';
export function traceWorkflowStep<T>(
stepName: string,
operation: () => Promise<T>
): Promise<T> {
return AWSXRay.captureAsyncFunc(stepName, async (subsegment) => {
try {
const result = await operation();
subsegment?.addAnnotation('success', true);
return result;
} catch (error) {
subsegment?.addAnnotation('success', false);
subsegment?.addAnnotation('error', error.message);
throw error;
}
});
}
// Usage in Lambda functions
export const handler = async (input: OrderWorkflowState): Promise<OrderWorkflowState> => {
return traceWorkflowStep('validate-order', async () => {
// Your validation logic here
return processValidation(input);
});
};
Best Practices for Production
When deploying Step Functions workflows to production, follow these essential practices:
Design for Idempotency: Ensure your Lambda functions can be safely retried without side effects. Use unique identifiers for external operations and implement checks to prevent duplicate processing.
Implement Circuit Breakers: For external service calls, implement circuit breaker patterns to prevent cascading failures and provide graceful degradation.
Optimize for Cost: Use Express workflows for high-volume, short-duration processes. Standard workflows are better for long-running processes requiring audit trails.
Monitor Workflow Health: Set up CloudWatch alarms for failed executions, long-running workflows, and error rates. Create dashboards to visualize workflow performance and business metrics.
Version Control State Machines: Treat your state machine definitions as code, storing them in version control and using CI/CD pipelines for deployment.
Conclusion
AWS Step Functions with TypeScript provide a powerful combination for building resilient, maintainable serverless workflows. The type safety of TypeScript combined with the visual orchestration capabilities of Step Functions creates systems that are both robust and easy to understand.
Key advantages of this approach include visual workflow representation that makes complex business logic clear to both technical and non-technical stakeholders, built-in error handling and retry logic that reduces the amount of boilerplate code you need to write, strong typing that catches integration issues at compile time, and comprehensive monitoring capabilities that provide insight into both technical and business metrics.
As you continue developing with Step Functions, consider implementing distributed saga patterns for managing transactions across multiple services, workflow versioning strategies for evolving business processes, advanced monitoring dashboards for operational visibility, and automated testing pipelines that validate both individual functions and complete workflows.
The patterns and practices demonstrated here scale from simple workflows to complex enterprise-grade orchestration systems, providing a solid foundation for your serverless architecture journey.
In our next post, we’ll explore AWS SNS and SQS with TypeScript, learning how to build event-driven architectures that complement our Step Functions workflows.
Comments