TECHNICAL BLOG

FreeRTOS Gateway: Zero CAN Bus Interruptions from LTE Modem Faults via Task-Level Fault Isolation

How a priority-based RTOS redesign eliminated cascading failures in an industrial IoT gateway and retained IEC 61508 SIL 1 compliance

Industry: Industrial Automation · IIoT

Stack: FreeRTOS · STM32 · CAN Bus · LTE Modem · JTAG

Field Trial: 4 months · Zero CAN interruptions

Embedded Systems Case Study · June 2025

The 47-Second Silence That Cost a Plant

At 03:42 on a Tuesday morning in February, the LTE modem embedded in a chemical processing plant's CAN bus gateway stopped responding to AT commands. In the original super-loop firmware, this was not merely a connectivity inconvenience; it was a systemic catastrophe. The main loop, blocked on a socket read while waiting for the modem's response, could not drain the message queue filled by the CAN receive interrupt handler. For 47 seconds, every CAN frame from 23 field devices (flow controllers, pressure transmitters, valve actuators, and temperature sensors) went unacknowledged. The field devices, designed with their own internal watchdog timers, interpreted the silence as a bus-off condition and began triggering local fault states. Three valve actuators moved to their safe positions, shutting down a reactor feed line. The plant's safety system logged an unplanned shutdown event, triggering a mandatory incident review and a production loss estimated at €34,000.

This was not an isolated occurrence. Field logs revealed that the modem hung roughly twice per week, causing CAN bus interruptions lasting between 15 and 60 seconds each time. The super-loop architecture — a single infinite loop sequentially servicing all peripheral tasks — meant that any blocking operation in one subsystem (the modem) would starve all other subsystems (the CAN bus, the Ethernet interface, the local HMI polling). The firmware team had implemented timeouts on modem AT commands, but the timeouts were set conservatively at 30 seconds to avoid false-triggering modem resets on slow cellular networks. By the time the timeout expired, the damage to CAN bus timing was already done.
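The starvation described above can be made concrete with a small model. The sketch below is illustrative only: the stage names and every duration except the 30-second modem timeout are hypothetical, and the function simply adds up how long the other stages can block between two visits to the CAN stage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One stage of a hypothetical super-loop and its worst-case blocking time. */
typedef struct {
    const char *name;
    uint32_t worst_case_block_ms;
} loop_stage_t;

/* In a super-loop every stage runs to completion before the next one starts,
 * so the longest gap between two visits to the CAN stage is the sum of the
 * worst-case blocking times of all the other stages. */
uint32_t superloop_can_gap_ms(const loop_stage_t *stages, size_t count,
                              size_t can_index)
{
    assert(stages != NULL);

    uint32_t gap_ms = 0;
    for (size_t i = 0; i < count; i++) {
        if (i != can_index) {
            gap_ms += stages[i].worst_case_block_ms;
        }
    }
    return gap_ms;
}
```

With a CAN stage, a modem stage that can block for the full 30,000 ms timeout, and two short stages of 5 ms and 2 ms, the model gives a worst-case CAN service gap of 30,007 ms. The modem timeout dominates everything else in the loop, which is the behavior the field logs showed.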

This article details how the engineering team eliminated every CAN bus interruption caused by modem faults by migrating from a bare-metal super-loop to a FreeRTOS-based multitasking architecture on an STM32 microcontroller. The redesign introduced priority-based preemptive scheduling, message-queue-based inter-task communication, and a supervised watchdog architecture that enabled per-subsystem fault recovery. Over a four-month field trial, the gateway achieved zero CAN bus interruptions from modem faults, a 94% modem self-recovery rate, and retained full IEC 61508 SIL 1 compliance.
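As a rough illustration of the supervised watchdog idea, the sketch below shows the bookkeeping a supervisor could perform. It is plain C rather than the gateway's actual code: the type and function names, the slot layout, and any timeout values passed in are assumptions. In a FreeRTOS build, the tick values would typically come from xTaskGetTickCount().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-task check-in record (hypothetical layout). */
typedef struct {
    uint32_t last_checkin_tick;  /* tick count at the task's last check-in */
    uint32_t period_ticks;       /* longest silence tolerated for this task */
} checkin_slot_t;

/* Called by a supervised task from its own loop to report that it is alive. */
void watchdog_checkin(checkin_slot_t *slots, size_t task, uint32_t now_tick)
{
    slots[task].last_checkin_tick = now_tick;
}

/* Returns a bitmask with bit i set when task i has been silent for longer
 * than its period. Unsigned subtraction keeps the comparison correct when
 * the tick counter wraps around. Supports up to 32 tasks. */
uint32_t watchdog_overdue_mask(const checkin_slot_t *slots, size_t count,
                               uint32_t now_tick)
{
    assert(count <= 32u);

    uint32_t mask = 0;
    for (size_t i = 0; i < count; i++) {
        uint32_t silent_for = now_tick - slots[i].last_checkin_tick;
        if (silent_for > slots[i].period_ticks) {
            mask |= (uint32_t)1u << i;
        }
    }
    return mask;
}
```

A supervisor built this way can act on each set bit separately, for example by restarting only the modem task, instead of resetting the whole gateway. That is the property that makes per-subsystem recovery possible.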

Task	Priority	Stack (words)	Responsibility	Watchdog Check-in Period
vCANBusTask	5 (Highest)	512	CAN frame rx/tx, message filtering, timestamping
vEthernetTask	4	768	TCP/IP stack polling, MQTT publish, OPC UA server
vWatchdogSupervisor	3	256	Monitor all task check-ins, trigger per-task restarts
vModemTask	2	1024	AT command state machine, LTE connection management
vHMITask	1 (Lowest)	384	Local display update, LED status indicators
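The task table can be captured as a small configuration array, which also makes the stack budget easy to check. The sketch below copies the task names, priorities, and stack depths from the table. The struct, the helper function, and the 4-byte word size (typical of 32-bit Cortex-M STM32 parts) are assumptions. FreeRTOS's xTaskCreate() takes its stack depth in words, which is why the table uses that unit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;
    uint8_t  priority;     /* higher number = higher priority, as in FreeRTOS */
    uint16_t stack_words;  /* stack depth in words, the unit xTaskCreate() uses */
} task_config_t;

/* Values taken from the task table above. */
static const task_config_t k_gateway_tasks[] = {
    { "vCANBusTask",         5,  512 },
    { "vEthernetTask",       4,  768 },
    { "vWatchdogSupervisor", 3,  256 },
    { "vModemTask",          2, 1024 },
    { "vHMITask",            1,  384 },
};

#define GATEWAY_TASK_COUNT (sizeof k_gateway_tasks / sizeof k_gateway_tasks[0])

/* Assumption: 4-byte stack words, as on 32-bit Cortex-M devices. */
#define STACK_WORD_BYTES 4u

/* Sums the stack reserved for all tasks, in bytes. Task control blocks,
 * queues, and other kernel objects are not included. */
uint32_t total_stack_bytes(const task_config_t *tasks, size_t count)
{
    assert(tasks != NULL);

    uint32_t total = 0;
    for (size_t i = 0; i < count; i++) {
        total += (uint32_t)tasks[i].stack_words * STACK_WORD_BYTES;
    }
    return total;
}
```

Under the 4-byte assumption, the five stacks add up to 2,944 words, or 11,776 bytes (about 11.5 KB).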

Metric	Super-Loop (Before)	FreeRTOS (After)	Improvement
CAN bus interruptions from modem faults	~8 per month (15–60 s each)	0 in 4 months
Modem self-recovery (no gateway restart)	0% (full reset required)	94% (63 of 67 events)
Stack overflows	Not detectable	0 in 4 months
CAN frame processing worst-case latency	Variable (up to 60 s)	< 2 ms measured
IEC 61508 SIL 1 compliance	Marginal (single-thread)	Strong (separated tasks)
Code size (Flash)	42 KB	58 KB
RAM usage	14 KB	26 KB

Industrial Automation and the CAN Bus

The Role of CAN Bus in IIoT Gateways

The Super-Loop Paradigm and Its Limitations

FreeRTOS Redesign: Architecture and Implementation

Task Decomposition and Priority Assignment

Priority-Based Preemptive Scheduling for Real-Time Guarantees

Message Queue Patterns for Fault-Tolerant Communication

Watchdog Supervision Architecture

Hardware Watchdog (IWDG)

Software Watchdog Supervisor Task

Fault Isolation and Recovery Mechanisms

IEC 61508 SIL 1 Compliance

CAN Bus Protocol Implementation Details

LTE Modem Reliability Challenges

Limitations and Considerations

Outcomes Summary

Conclusion and Future Implications

References