Embedded Firmware Best Practices

Essential guidelines for developing robust and reliable embedded firmware. Topics are grouped by category and tagged with a priority; each one includes an explanation, field-tested tips, and, where useful, a code example.

Code Quality

Foundational habits that keep embedded codebases readable, reviewable, and safe to change.

Use Version Control

Always use Git or similar version control systems to track changes and collaborate effectively.

Critical

Why it matters

Version control is essential for embedded development. It allows you to track every change, revert to working versions when bugs are introduced, and collaborate with team members. Git branches enable parallel development of features without affecting the main codebase.

Tips

  • Create meaningful commit messages that describe the 'why' not just the 'what'
  • Use feature branches for new development
  • Tag releases for easy reference to production firmware versions
  • Record the hardware revision each change targets in commit messages or release tags
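
Tagged releases are easier to trace when the firmware itself reports what it was built from. The following is a minimal sketch, assuming the build system injects FW_GIT_HASH and HW_REVISION as compile-time definitions; those two names, fw_version_t, and fw_version_format() are illustrative and not part of any SDK.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical build-time definitions. A real build would inject these
   (for example from `git describe`) instead of using the fallbacks. */
#ifndef FW_GIT_HASH
#define FW_GIT_HASH "unknown"
#endif
#ifndef HW_REVISION
#define HW_REVISION 1U
#endif

typedef struct {
    uint8_t major;
    uint8_t minor;
    uint8_t patch;
    uint8_t hw_revision;   /* board revision this image targets */
    const char *git_hash;  /* commit the image was built from */
} fw_version_t;

static const fw_version_t fw_version = {
    .major = 1U,
    .minor = 2U,
    .patch = 0U,
    .hw_revision = HW_REVISION,
    .git_hash = FW_GIT_HASH,
};

/* Formats the version as "<major>.<minor>.<patch>+hw<rev>.<hash>".
   Returns the value from snprintf(). */
int fw_version_format(char *out, size_t out_len) {
    return snprintf(out, out_len, "%u.%u.%u+hw%u.%s",
                    (unsigned)fw_version.major,
                    (unsigned)fw_version.minor,
                    (unsigned)fw_version.patch,
                    (unsigned)fw_version.hw_revision,
                    fw_version.git_hash);
}
```

Printing this string at boot, or exposing it over a diagnostic interface, lets a field report be matched to an exact commit and board revision.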

Follow Coding Standards

Adopt industry standards like MISRA C for embedded systems to ensure code safety and reliability.

Critical

Why it matters

MISRA C is a set of software development guidelines designed to promote safety, security, and reliability in embedded systems. Following these guidelines helps prevent common programming errors, makes code more maintainable, and is often required for safety-critical applications.

Code example

/* MISRA C compliant example */
static uint32_t calculate_checksum(const uint8_t *data, size_t len) {
    uint32_t sum = 0U;
    
    if (data != NULL) {
        for (size_t i = 0U; i < len; i++) {
            sum += (uint32_t)data[i];
        }
    }
    
    return sum;
}

Tips

  • Use static analysis tools like PC-lint or Polyspace
  • Enable compiler warnings and treat them as errors
  • Document any intentional deviations from standards

Write Modular Code

Break down functionality into reusable modules with clear interfaces and single responsibilities.

High Priority

Why it matters

Modular code separates concerns into distinct units, each handling a specific functionality. This approach improves testability, allows code reuse across projects, and makes maintenance easier. In embedded systems, modules often correspond to hardware peripherals or application features.

Code example

/* Module header: sensor_interface.h */
typedef struct {
    int32_t temperature;
    uint32_t humidity;
    uint32_t timestamp;
} sensor_data_t;

int sensor_init(void);
int sensor_read(sensor_data_t *data);
void sensor_deinit(void);

Tips

  • One module = one responsibility
  • Define clear public APIs in header files
  • Hide implementation details as static functions
  • Use opaque pointers for complex data structures
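
The opaque-pointer tip can be sketched as follows. The counter module here is purely hypothetical; it shows the pattern of declaring an incomplete type in the public header and defining it only in the implementation file, with storage taken from a static pool rather than malloc.

```c
#include <stddef.h>
#include <stdint.h>

/* --- Public interface (would live in counter.h) --- */
typedef struct counter counter_t;      /* opaque: layout hidden from callers */

counter_t *counter_create(uint32_t limit);
int counter_increment(counter_t *c);   /* 0 on success, -1 if NULL or at limit */
uint32_t counter_value(const counter_t *c);

/* --- Private implementation (would live in counter.c) --- */
struct counter {
    uint32_t value;
    uint32_t limit;
    uint8_t in_use;
};

#define COUNTER_POOL_SIZE 4U
static struct counter counter_pool[COUNTER_POOL_SIZE];  /* static storage */

counter_t *counter_create(uint32_t limit) {
    for (size_t i = 0U; i < COUNTER_POOL_SIZE; i++) {
        if (counter_pool[i].in_use == 0U) {
            counter_pool[i].in_use = 1U;
            counter_pool[i].value = 0U;
            counter_pool[i].limit = limit;
            return &counter_pool[i];
        }
    }
    return NULL;  /* pool exhausted */
}

int counter_increment(counter_t *c) {
    if ((c == NULL) || (c->value >= c->limit)) {
        return -1;
    }
    c->value++;
    return 0;
}

uint32_t counter_value(const counter_t *c) {
    return (c != NULL) ? c->value : 0U;
}
```

Because callers only ever hold a counter_t pointer, the struct layout can change without recompiling or touching any code outside the module.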

Document Your Code

Use clear comments and documentation to explain complex logic, hardware interactions, and API usage.

High Priority

Why it matters

Good documentation is crucial for embedded systems where code often interacts with hardware in non-obvious ways. Comments should explain the 'why' behind decisions, document hardware quirks, and describe timing requirements. Use Doxygen-style comments for API documentation.

Code example

/**
 * @brief Initialize the ADC peripheral for temperature sensing
 * 
 * Configures ADC channel 3 for single-ended input with 12-bit resolution.
 * Must be called before any sensor_read() operations.
 * 
 * @note Requires VREF to be stable before calling
 * @return 0 on success, negative error code on failure
 */
int sensor_init(void);

Tips

  • Document hardware dependencies and timing requirements
  • Explain magic numbers with named constants or comments
  • Keep comments up-to-date when code changes
  • Use README files for module-level documentation

Hardware Interaction

Patterns for talking to peripherals, handling interrupts, and keeping hardware code portable.

Use Hardware Abstraction Layers

Create HALs to separate hardware-specific code from application logic for better portability.

Critical

Why it matters

A Hardware Abstraction Layer (HAL) provides a consistent interface to hardware peripherals, hiding the low-level register manipulations. This allows application code to be ported between different MCUs with minimal changes and enables testing on host systems using mock implementations.

Code example

/* HAL interface */
typedef struct {
    int (*init)(const gpio_config_t *config);
    int (*write)(uint32_t pin, bool state);
    bool (*read)(uint32_t pin);
} gpio_driver_t;

/* Platform-specific implementation */
static const gpio_driver_t nrf_gpio_driver = {
    .init = nrf_gpio_init,
    .write = nrf_gpio_write,
    .read = nrf_gpio_read,
};

Tips

  • Define abstract interfaces before implementing platform-specific code
  • Use function pointers or weak symbols for swappable implementations
  • Create mock HAL implementations for unit testing
  • Document hardware assumptions in the HAL interface
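
A mock implementation of the interface above might look like the sketch below. It assumes a placeholder gpio_config_t (the real fields are platform-specific), and the mock_* functions and led_set() are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder config type; the real fields depend on the platform. */
typedef struct {
    uint32_t pin;
    bool is_output;
} gpio_config_t;

typedef struct {
    int (*init)(const gpio_config_t *config);
    int (*write)(uint32_t pin, bool state);
    bool (*read)(uint32_t pin);
} gpio_driver_t;

/* Mock implementation: records pin state in RAM instead of registers. */
#define MOCK_PIN_COUNT 32U
static bool mock_pin_state[MOCK_PIN_COUNT];

static int mock_gpio_init(const gpio_config_t *config) {
    return ((config != NULL) && (config->pin < MOCK_PIN_COUNT)) ? 0 : -1;
}

static int mock_gpio_write(uint32_t pin, bool state) {
    if (pin >= MOCK_PIN_COUNT) {
        return -1;
    }
    mock_pin_state[pin] = state;
    return 0;
}

static bool mock_gpio_read(uint32_t pin) {
    return (pin < MOCK_PIN_COUNT) ? mock_pin_state[pin] : false;
}

static const gpio_driver_t mock_gpio_driver = {
    .init = mock_gpio_init,
    .write = mock_gpio_write,
    .read = mock_gpio_read,
};

/* Application code depends only on the interface, so it runs unchanged
   against the mock or a real driver. */
int led_set(const gpio_driver_t *gpio, uint32_t pin, bool on) {
    return gpio->write(pin, on);
}
```

A host-side unit test can then call led_set(&mock_gpio_driver, 13U, true) and inspect the recorded state through mock_gpio_driver.read(13U), with no hardware attached.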

Implement Proper Initialization

Always initialize peripherals and variables before use to avoid undefined behavior.

Critical

Why it matters

Embedded systems often have complex initialization sequences that must follow specific orders. Peripherals may depend on clocks, power domains, or other peripherals being initialized first. Document and enforce these dependencies to prevent hard-to-debug issues.

Code example

int system_init(void) {
    int ret;
    
    /* Clock must be initialized first */
    ret = clock_init();
    if (ret != 0) {
        return ret;
    }
    
    /* Power domain depends on clock */
    ret = power_init();
    if (ret != 0) {
        return ret;
    }
    
    /* Peripherals depend on power */
    ret = gpio_init();
    if (ret != 0) {
        return ret;
    }
    
    return 0;
}

Tips

  • Initialize all variables at declaration
  • Document peripheral initialization order requirements
  • Use initialization flags to prevent double-init issues
  • Check return values from all initialization functions

Handle Interrupts Carefully

Keep ISRs short and fast. Use flags to defer processing to the main loop when possible.

High Priority

Why it matters

Interrupt Service Routines (ISRs) should execute as quickly as possible to minimize latency for other interrupts. Use ISRs only to capture time-critical data and set flags, then perform complex processing in the main loop or a task. Be aware of shared data issues between ISRs and main code.

Code example

volatile bool data_ready = false;
volatile uint16_t adc_value;

void ADC_IRQHandler(void) {
    /* Clear interrupt flag first */
    ADC->ISR = ADC_ISR_EOC;
    
    /* Quick data capture */
    adc_value = ADC->DR;
    
    /* Signal main loop */
    data_ready = true;
}

/* Main loop processing */
void main_loop(void) {
    if (data_ready) {
        data_ready = false;
        process_adc_data(adc_value);
    }
}

Tips

  • Use volatile for variables shared with ISRs
  • Disable interrupts when accessing shared multi-byte data
  • Avoid floating-point math in ISRs
  • Use RTOS semaphores or message queues for complex ISR-to-task communication
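
When an ISR produces more than one value between main-loop passes, a single flag is not enough. One common alternative is a single-producer, single-consumer ring buffer, sketched below. The names are illustrative, and the sketch assumes a single-core MCU where aligned 32-bit loads and stores are atomic; multi-core parts or aggressive compilers would additionally need memory barriers.

```c
#include <stdbool.h>
#include <stdint.h>

#define RB_SIZE 8U  /* must be a power of two */

typedef struct {
    volatile uint16_t buf[RB_SIZE];
    volatile uint32_t head;  /* written only by the producer (ISR) */
    volatile uint32_t tail;  /* written only by the consumer (main loop) */
} ring_buffer_t;

/* Producer side: call from the ISR. Returns false if the buffer is full. */
bool rb_push(ring_buffer_t *rb, uint16_t sample) {
    uint32_t head = rb->head;

    if ((head - rb->tail) >= RB_SIZE) {
        return false;  /* full: drop the sample rather than block in an ISR */
    }
    rb->buf[head & (RB_SIZE - 1U)] = sample;
    rb->head = head + 1U;  /* publish only after the slot is written */
    return true;
}

/* Consumer side: call from the main loop. Returns false if empty. */
bool rb_pop(ring_buffer_t *rb, uint16_t *sample) {
    uint32_t tail = rb->tail;

    if (tail == rb->head) {
        return false;  /* empty */
    }
    *sample = rb->buf[tail & (RB_SIZE - 1U)];
    rb->tail = tail + 1U;  /* release the slot only after it is read */
    return true;
}
```

Each index has exactly one writer, so neither side needs to disable interrupts for normal operation.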

Manage Power Consumption

Implement sleep modes and optimize peripheral usage to extend battery life in portable devices.

Medium Priority

Why it matters

Power management is critical for battery-powered devices. Modern MCUs offer multiple sleep modes with different power consumption and wake-up latencies. Disable unused peripherals, use event-driven design, and measure actual power consumption during development.

Tips

  • Profile power consumption early in development
  • Use the deepest sleep mode possible for your latency requirements
  • Disable unused peripheral clocks
  • Consider using DMA to allow CPU sleep during data transfers

Nordic & Zephyr RTOS

Idiomatic Zephyr / nRF Connect SDK patterns — Device Tree, Kconfig, BLE, and work queues.

Use Zephyr Device Tree

Leverage Device Tree overlays to configure hardware without modifying source code.

Critical

Why it matters

Zephyr's Device Tree system provides a hardware-agnostic way to describe your board's configuration. Device Tree overlays allow you to customize pin assignments, peripheral settings, and sensor configurations without changing C code, making your firmware more portable across Nordic development kits.

Code example

/* nrf52840dk_nrf52840.overlay */
&i2c0 {
    status = "okay";
    compatible = "nordic,nrf-twim";
    
    bme280@76 {
        compatible = "bosch,bme280";
        reg = <0x76>;
        /* No label property: it is deprecated in current Zephyr;
           obtain the device with DEVICE_DT_GET() instead */
    };
};

&spi1 {
    status = "okay";
    cs-gpios = <&gpio0 17 GPIO_ACTIVE_LOW>;
};

Tips

  • Create board-specific overlays for custom hardware
  • Use Device Tree bindings documentation as reference
  • Test overlays with 'west build --pristine' for clean builds
  • Document overlay changes in your project README

Configure prj.conf Properly

Use Kconfig options in prj.conf to enable only required features and optimize resource usage.

Critical

Why it matters

The prj.conf file controls which Zephyr subsystems and drivers are included in your build. Enabling only what you need reduces flash and RAM usage. Understanding Kconfig dependencies helps avoid mysterious build errors.

Code example

# Core configuration
CONFIG_GPIO=y
CONFIG_I2C=y
CONFIG_SPI=y

# BLE configuration for nRF52
CONFIG_BT=y
CONFIG_BT_PERIPHERAL=y
CONFIG_BT_DEVICE_NAME="MyDevice"

# Power management
CONFIG_PM=y
CONFIG_PM_DEVICE=y

# Logging (disable in production)
CONFIG_LOG=y
CONFIG_LOG_DEFAULT_LEVEL=3

Tips

  • Use 'west build -t menuconfig' to explore Kconfig options
  • Create separate prj.conf files for debug and release builds
  • Document why each config option is enabled
  • Check CONFIG dependencies with 'west build -t guiconfig'

Leverage Nordic SDK Libraries

Use nRF Connect SDK libraries for BLE, Thread, Matter, and other protocol stacks.

High Priority

Why it matters

The nRF Connect SDK provides production-ready implementations of BLE services, mesh networking, and IoT protocols. Using these libraries saves development time and ensures compliance with protocol specifications. They're tested and optimized for Nordic hardware.

Code example

/* Using Nordic BLE libraries */
#include <zephyr/bluetooth/bluetooth.h>
#include <zephyr/bluetooth/conn.h>
#include <zephyr/bluetooth/gatt.h>

static struct bt_conn_cb conn_callbacks = {
    .connected = on_connected,
    .disconnected = on_disconnected,
};

int bluetooth_init(void) {
    int err = bt_enable(NULL);
    if (err) {
        return err;
    }
    
    bt_conn_cb_register(&conn_callbacks);
    return 0;
}

Tips

  • Check nRF Connect SDK samples for implementation patterns
  • Use Nordic DevZone forums for troubleshooting
  • Keep SDK version consistent across your team
  • Test with Nordic Power Profiler for power optimization

Use Zephyr Workqueues

Offload non-critical work from ISRs and high-priority threads using work queues.

High Priority

Why it matters

Zephyr work queues provide a mechanism to defer work to a thread context, typically at a lower priority. This keeps ISRs fast and keeps slow processing out of high-priority threads. The system work queue is suitable for most short, non-blocking handlers; create dedicated work queues for time-sensitive or blocking operations so that one slow item does not stall the rest.

Code example

static struct k_work sensor_work;

static void sensor_work_handler(struct k_work *work) {
    /* Heavy processing here, in thread context */
    sensor_data_t data;

    if (sensor_read(&data) == 0) {
        process_and_transmit(&data);
    }
}

void sensor_irq_handler(void) {
    /* Just submit work, don't process here */
    k_work_submit(&sensor_work);
}

int main(void) {
    k_work_init(&sensor_work, sensor_work_handler);
    /* ... */
}

Tips

  • Use k_work_delayable for periodic or debounced operations
  • Create dedicated work queues for blocking I/O
  • Monitor work queue depth to detect overload
  • Use several work queues at different priorities when items must not wait behind each other

Power Management

Stretch battery life with disciplined sleep modes, radio scheduling, and per-peripheral power control.

Profile Power Early

Measure actual power consumption during development, not just at the end.

Critical

Why it matters

Power consumption issues are much harder to fix late in development. Use tools like Nordic Power Profiler Kit early and often to understand your device's power profile. Correlate power spikes with code execution to identify optimization opportunities.

Tips

  • Establish a power budget before starting development
  • Measure each peripheral's contribution to total power
  • Test power in all operating modes (active, idle, sleep)
  • Document power measurements for each firmware version
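
A power budget can be checked with simple arithmetic once per-mode currents have been measured. The sketch below computes a time-weighted average current and an idealized battery life; the type and function names are illustrative, and the calculation ignores battery self-discharge and capacity derating.

```c
#include <stdint.h>

/* One operating mode in the power budget. */
typedef struct {
    uint32_t current_ua;   /* measured current in this mode, microamps */
    uint32_t duration_ms;  /* time spent in this mode per cycle, milliseconds */
} power_phase_t;

/* Time-weighted average current in microamps over one cycle.
   Returns 0 if the cycle has no duration. */
uint32_t average_current_ua(const power_phase_t *phases, uint32_t count) {
    uint64_t charge = 0U;    /* microamp-milliseconds */
    uint64_t total_ms = 0U;

    for (uint32_t i = 0U; i < count; i++) {
        charge += (uint64_t)phases[i].current_ua * phases[i].duration_ms;
        total_ms += phases[i].duration_ms;
    }
    return (total_ms == 0U) ? 0U : (uint32_t)(charge / total_ms);
}

/* Idealized battery life in hours for a given capacity and average current. */
uint32_t battery_life_hours(uint32_t capacity_mah, uint32_t avg_current_ua) {
    if (avg_current_ua == 0U) {
        return 0U;
    }
    return (uint32_t)(((uint64_t)capacity_mah * 1000U) / avg_current_ua);
}
```

Re-running this calculation with fresh measurements for each firmware version makes regressions in the power budget visible early.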

Implement Sleep Modes

Use the deepest sleep mode compatible with your wake-up latency requirements.

Critical

Why it matters

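Modern MCUs offer multiple sleep modes trading off power savings against wake-up time. On nRF52, System ON sleep draws roughly 1.5µA, while System OFF draws roughly 0.3µA but wakes through a reset, so the application restarts from boot. Exact figures depend on the chip and on how much RAM is retained. Choose based on your application's responsiveness requirements.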

Code example

/* Zephyr power management */
#include <zephyr/pm/pm.h>
#include <zephyr/pm/device.h>

void enter_low_power(void) {
    /* Disable unused peripherals */
    pm_device_action_run(uart_dev, PM_DEVICE_ACTION_SUSPEND);
    
    /* Enter low power mode - Zephyr handles this automatically
       when idle if CONFIG_PM=y */
}

/* Wake sources: GPIO, timer, or BLE events */

Tips

  • Configure proper wake sources before entering deep sleep
  • Retain RAM contents if faster wake-up is needed
  • Use RTC for periodic wake-ups instead of busy-waiting
  • Test wake-up latency meets your requirements

Optimize Radio Usage

Minimize radio-on time for BLE, WiFi, and LTE to dramatically reduce power consumption.

High Priority

Why it matters

Radio activity is typically the largest power consumer in wireless devices. Reduce it by lengthening advertising and connection intervals, requesting connection parameter updates, batching data transmissions, and using link-layer features such as BLE peripheral latency (or connection subrating, where both sides support Bluetooth 5.3).

Code example

/* Optimized BLE connection parameters */
static struct bt_le_conn_param conn_params = {
    .interval_min = 80,   /* 100ms - balance latency vs power */
    .interval_max = 160,  /* 200ms */
    .latency = 4,         /* Skip up to 4 intervals */
    .timeout = 400,       /* 4 seconds supervision timeout */
};

/* Request parameter update after connection */
bt_conn_le_param_update(conn, &conn_params);

Tips

  • Increase advertising interval when not actively seeking connections
  • Use longer connection intervals for low-bandwidth applications
  • Batch sensor data and transmit in bursts
  • Consider the BLE 2M PHY to shorten time on air when range allows; coded PHY extends range but keeps the radio on longer per byte
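
BLE connection parameters are expressed in protocol units rather than milliseconds, which makes mistakes easy. The helpers below are a sketch with illustrative names: they convert from milliseconds and check the ranges and the supervision-timeout rule from the Bluetooth Core Specification, which requires the timeout to exceed (1 + latency) × interval_max × 2.

```c
#include <stdbool.h>
#include <stdint.h>

/* Connection interval is expressed in units of 1.25 ms. */
uint16_t conn_interval_units_from_ms(uint32_t ms) {
    return (uint16_t)((ms * 4U) / 5U);
}

/* Supervision timeout is expressed in units of 10 ms. */
uint16_t supervision_timeout_units_from_ms(uint32_t ms) {
    return (uint16_t)(ms / 10U);
}

/* Checks parameters given in protocol units. */
bool conn_params_valid(uint16_t interval_min, uint16_t interval_max,
                       uint16_t latency, uint16_t timeout) {
    /* Interval range: 7.5 ms to 4 s */
    if ((interval_min < 6U) || (interval_max > 3200U) ||
        (interval_min > interval_max)) {
        return false;
    }
    /* Peripheral latency: at most 499 skipped events */
    if (latency > 499U) {
        return false;
    }
    /* Supervision timeout range: 100 ms to 32 s */
    if ((timeout < 10U) || (timeout > 3200U)) {
        return false;
    }
    /* timeout * 10 ms > (1 + latency) * interval_max * 1.25 ms * 2,
       which simplifies to timeout * 4 > (1 + latency) * interval_max */
    return ((uint32_t)timeout * 4U) >
           ((1U + (uint32_t)latency) * (uint32_t)interval_max);
}
```

For the parameters shown earlier (80, 160, 4, 400) the check passes: 400 × 4 = 1600 is greater than 5 × 160 = 800.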

Manage Peripheral Power

Disable unused peripherals and use low-power alternatives when possible.

High Priority

Why it matters

Even idle peripherals consume power. Disable peripheral clocks when not in use, use GPIO interrupts instead of polling, and choose low-power peripheral modes. On Nordic devices, use the PPI system to connect peripherals without CPU intervention.

Tips

  • Use Zephyr's PM_DEVICE API to suspend/resume peripherals
  • Configure unused pins as disconnected inputs
  • Use timer callbacks instead of busy-wait delays
  • Leverage hardware PWM instead of software bit-banging

OTA Updates

Ship updates safely — dual-bank A/B partitioning, signed images, resumable downloads, and small deltas.

Implement Dual-Bank Updates

Use A/B partitioning to ensure safe firmware updates with automatic rollback capability.

Critical

Why it matters

Dual-bank (A/B) updates write new firmware to an inactive partition while the device runs from the active one. After verification, the bootloader switches to the new image. If the update fails or the new firmware is faulty, automatic rollback to the previous version ensures the device remains operational.

Code example

/* MCUboot partition layout in DTS */
/ {
    chosen {
        zephyr,code-partition = &slot0_partition;
    };
};

&flash0 {
    partitions {
        boot_partition: partition@0 { /* MCUboot */ };
        slot0_partition: partition@10000 { /* Active */ };
        slot1_partition: partition@80000 { /* Staging */ };
        scratch_partition: partition@f0000 { /* Swap */ };
    };
};

Tips

  • Use MCUboot for production-ready secure boot and updates
  • Test rollback scenarios thoroughly
  • Include self-test code that confirms boot within timeout
  • Plan flash layout early - changing partitions later is difficult

Sign and Verify Images

Cryptographically sign firmware images to prevent unauthorized code execution.

Critical

Why it matters

Firmware signing ensures only authorized code runs on your device. MCUboot supports RSA, ECDSA, and ED25519 signatures. The bootloader verifies the signature before accepting an update, protecting against both malicious and corrupted firmware.

Code example

# Signing with MCUboot's imgtool
west sign -t imgtool -- \
    --key root-ec-p256.pem \
    --version 1.2.0 \
    --header-size 0x200 \
    --slot-size 0x70000

# Verification happens automatically at boot
# MCUboot checks signature before jumping to app

Tips

  • Store signing keys securely - never commit to source control
  • Use hardware security modules (HSM) for production signing
  • Implement key revocation strategy for compromised keys
  • Version your firmware and track which devices have which version

Handle Update Failures

Implement robust error handling for network failures, power loss, and corrupted downloads.

High Priority

Why it matters

OTA updates can fail at any point due to network issues, power loss, or flash errors. Implement resumable downloads, verify image integrity before applying, and ensure the bootloader can always recover to a known-good state.

Code example

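int ota_download_image(const char *url) {
    size_t offset = ota_get_download_progress();
    
    while (offset < image_size) {
        /* Returns the number of bytes received, or a negative error code */
        int chunk_len = http_download_chunk(url, offset, chunk_buf);
        if (chunk_len < 0) {
            /* Save progress and retry later */
            ota_save_progress(offset);
            return chunk_len;
        }
        
        /* flash_write() here is a project helper: 0 on success, negative on error */
        int ret = flash_write(slot1_addr + offset, chunk_buf, (size_t)chunk_len);
        if (ret < 0) {
            return ret;
        }
        
        /* Advance by the chunk length, not by the flash status code */
        offset += (size_t)chunk_len;
        ota_save_progress(offset);
    }
    
    return ota_verify_image();
}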

Tips

  • Implement chunk-based downloads with progress persistence
  • Verify complete image hash before confirming update
  • Use watchdog to detect stuck boot loops
  • Test update process with simulated failures
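
Integrity checking fits naturally into a chunked download if the checksum can be updated one chunk at a time. The sketch below uses a bitwise CRC-32 (the reflected 0xEDB88320 polynomial used by zlib and Ethernet) as a stand-in; crc32_update() is an illustrative name. A CRC only detects accidental corruption, so it complements the signature check and does not replace it.

```c
#include <stddef.h>
#include <stdint.h>

/* Incremental CRC-32. Pass 0 as crc for the first chunk, then feed the
   previous return value back in for each following chunk. */
uint32_t crc32_update(uint32_t crc, const uint8_t *data, size_t len) {
    crc = ~crc;
    for (size_t i = 0U; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift out one bit; apply the polynomial if it was set */
            crc = ((crc & 1U) != 0U) ? ((crc >> 1) ^ 0xEDB88320U)
                                     : (crc >> 1);
        }
    }
    return ~crc;
}
```

The running value can be persisted together with the download offset, so a resumed download continues the checksum instead of re-reading flash from the start.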

Minimize Update Size

Use delta updates or compression to reduce bandwidth and update time.

Medium Priority

Why it matters

Full image updates can be hundreds of kilobytes. Delta updates transmit only changed bytes, reducing download time and cellular/power costs. Recent nRF Connect SDK releases add MCUboot support for LZMA-compressed images (check availability for your SDK version), and tools like Memfault provide delta update infrastructure.

Tips

  • Enable image compression in MCUboot configuration
  • Consider delta update solutions for large codebases
  • Track binary size changes in your CI pipeline
  • Optimize code and remove debug symbols for production

Memory Optimization

Stay inside flash and RAM budgets with stack analysis, memory pools, packed structs, and link-time optimization.

Monitor Stack Usage

Track thread stack usage to prevent overflows and optimize memory allocation.

Critical

Why it matters

Stack overflows are a common cause of embedded system crashes and can be difficult to debug. Zephyr provides stack usage analysis tools. Size stacks appropriately - too small causes crashes, too large wastes precious RAM.

Code example

# Enable stack analysis in prj.conf
CONFIG_THREAD_ANALYZER=y
CONFIG_THREAD_ANALYZER_USE_PRINTK=y
CONFIG_THREAD_ANALYZER_AUTO=y
CONFIG_THREAD_ANALYZER_AUTO_INTERVAL=5

# Zephyr then reports per-thread stack usage every 5 seconds,
# along the lines of:
#   Thread: main
#   Stack size: 2048
#   Stack used: 1456
#   Stack unused: 592

Tips

  • Add safety margin (20-30%) to measured stack usage
  • Enable stack canaries during development
  • Avoid large local arrays - use static or pool allocation instead
  • Profile stack usage under worst-case conditions
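
On targets without a built-in analyzer, the same measurement can be approximated by "painting" the stack with a known pattern and later counting how much of it survived. The sketch below uses illustrative names and assumes a descending stack (as on Arm Cortex-M), so untouched bytes sit at the low end of the region.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_FILL_BYTE 0xAAU

/* Paint the whole stack region before the thread starts running. */
void stack_paint(uint8_t *stack, size_t size) {
    memset(stack, STACK_FILL_BYTE, size);
}

/* Count bytes still holding the pattern, scanning up from the lowest
   address. The result is the stack's unused headroom so far. */
size_t stack_unused_bytes(const uint8_t *stack, size_t size) {
    size_t unused = 0U;

    while ((unused < size) && (stack[unused] == STACK_FILL_BYTE)) {
        unused++;
    }
    return unused;
}

/* Peak usage plus a percentage margin, rounded up. */
size_t stack_recommended_size(size_t peak_used, uint32_t margin_percent) {
    return peak_used + (((peak_used * margin_percent) + 99U) / 100U);
}
```

Taking the 1456 bytes used in the report above and a 25% margin, stack_recommended_size(1456, 25) yields 1820 bytes.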

Use Memory Pools

Prefer fixed-size memory pools over dynamic allocation for predictable behavior.

Critical

Why it matters

Dynamic memory allocation (malloc/free) can lead to fragmentation and non-deterministic timing. Memory pools allocate fixed-size blocks, eliminating fragmentation and providing O(1) allocation time. Zephyr provides k_mem_slab for this purpose; the older k_mem_pool API has been removed from current releases, and k_heap covers variable-size allocations.

Code example

/* Define a memory slab for sensor readings */
K_MEM_SLAB_DEFINE(sensor_slab, 
    sizeof(struct sensor_reading), 
    16,  /* 16 blocks */
    4);  /* 4-byte alignment */

void *alloc_reading(void) {
    void *ptr;
    if (k_mem_slab_alloc(&sensor_slab, &ptr, K_NO_WAIT) == 0) {
        return ptr;
    }
    return NULL;
}

void free_reading(void *ptr) {
    k_mem_slab_free(&sensor_slab, ptr);
}

Tips

  • Size pools based on maximum concurrent allocations
  • Use different pools for different object types
  • Monitor pool utilization in debug builds
  • Consider static allocation if pool size is always known

Optimize Data Structures

Pack structures, use appropriate types, and minimize memory fragmentation.

High Priority

Why it matters

Compiler padding can waste significant memory in structures. Use __attribute__((packed)) carefully (it may impact performance), order struct members by size, and choose the smallest integer type that fits your data range.

Code example

/* Unoptimized: 12 bytes due to padding */
struct sensor_bad {
    uint8_t  type;      /* 1 byte + 3 padding */
    uint32_t value;     /* 4 bytes */
    uint8_t  status;    /* 1 byte + 3 padding */
};

/* Optimized: 8 bytes, no padding */
struct sensor_good {
    uint32_t value;     /* 4 bytes */
    uint8_t  type;      /* 1 byte */
    uint8_t  status;    /* 1 byte */
    uint8_t  reserved[2]; /* Explicit padding */
};

Tips

  • Order struct members from largest to smallest
  • Use sizeof() to verify expected structure sizes
  • Consider bit-fields for boolean flags
  • Store enum values in uint8_t fields; C before C23 cannot give an enum an explicit backing type
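
The sizeof() tip can be enforced at compile time. A minimal sketch using C11 static_assert on the optimized struct from the example above:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

struct sensor_good {
    uint32_t value;
    uint8_t  type;
    uint8_t  status;
    uint8_t  reserved[2];
};

/* The build fails if a later edit changes the layout unexpectedly. */
static_assert(sizeof(struct sensor_good) == 8U,
              "sensor_good must stay 8 bytes");
static_assert(offsetof(struct sensor_good, type) == 4U,
              "type must directly follow value");
```

This matters most for structs that are written to flash or sent over the wire, where a silent layout change breaks compatibility with older firmware.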

Reduce Flash Usage

Minimize code size through compiler optimization, dead code elimination, and link-time optimization.

High Priority

Why it matters

Flash memory is limited on MCUs. Use compiler flags for size optimization, remove unused code and libraries, and consider link-time optimization (LTO). Zephyr's Kconfig system helps by only including enabled features.

Code example

# CMakeLists.txt optimizations
target_compile_options(app PRIVATE
    -Os           # Optimize for size
    -ffunction-sections
    -fdata-sections
)

target_link_options(app PRIVATE
    -Wl,--gc-sections  # Remove unused sections
)

# prj.conf - disable unused features
CONFIG_PRINTK=n           # If not needed
CONFIG_LOG=n              # For production
CONFIG_ASSERT=n           # For production

Tips

  • Use 'west build -t rom_report' to analyze flash usage
  • Remove debug features in production builds
  • Consider storing large const data in external flash
  • Use LTO for additional code size reduction

Safety & Security

Watchdogs, input validation, secure communication, and graceful error handling for production-grade firmware.

Implement Watchdog Timers

Use watchdog timers to recover from system hangs and ensure continuous operation.

Critical

Why it matters

Watchdog timers reset the system if software fails to 'kick' them periodically. This provides recovery from infinite loops, deadlocks, and other fault conditions. Configure the timeout based on your longest expected operation, with margin for variability.

Code example

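#include <zephyr/device.h>
#include <zephyr/drivers/watchdog.h>

static const struct device *wdt = DEVICE_DT_GET(DT_ALIAS(watchdog0));
static int wdt_channel_id;

int watchdog_init(void) {
    struct wdt_timeout_cfg cfg = {
        .window.min = 0,
        .window.max = 5000,          /* 5 second timeout */
        .callback = NULL,            /* No pre-reset callback */
        .flags = WDT_FLAG_RESET_SOC, /* Reset the SoC on timeout */
    };
    
    if (!device_is_ready(wdt)) {
        return -ENODEV;
    }
    
    /* Returns the channel ID, or a negative error code */
    wdt_channel_id = wdt_install_timeout(wdt, &cfg);
    if (wdt_channel_id < 0) {
        return wdt_channel_id;
    }
    
    return wdt_setup(wdt, WDT_OPT_PAUSE_HALTED_BY_DBG);
}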

void main_loop(void) {
    while (1) {
        do_work();
        wdt_feed(wdt, wdt_channel_id);  /* Kick the dog */
    }
}

Tips

  • Feed watchdog in main loop, not ISRs
  • Set timeout longer than worst-case processing time
  • Use multiple watchdog channels for monitoring different tasks
  • Pause watchdog during debugging to prevent reset cycles
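
Where the hardware offers only one watchdog channel, several tasks can still be monitored by a software check-in scheme: the watchdog is fed only when every task has reported in since the last feed. The sketch below uses hypothetical task names. The read-modify-write on the shared mask is not atomic, so a preemptive system would wrap it in a critical section or use an atomic OR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tasks being monitored, one bit each. */
#define TASK_SENSOR  (1U << 0)
#define TASK_COMMS   (1U << 1)
#define TASK_STORAGE (1U << 2)
#define TASK_ALL     (TASK_SENSOR | TASK_COMMS | TASK_STORAGE)

static volatile uint32_t checkin_mask;

/* Each monitored task calls this once per healthy loop iteration.
   Not atomic: protect with a critical section under preemption. */
void watchdog_checkin(uint32_t task_bit) {
    checkin_mask |= task_bit;
}

/* Called periodically by a supervisor. Returns true when every task has
   checked in, meaning the hardware watchdog should be fed now. */
bool watchdog_should_feed(void) {
    if ((checkin_mask & TASK_ALL) != TASK_ALL) {
        return false;  /* at least one task is silent: let the timer run */
    }
    checkin_mask = 0U; /* start a new monitoring window */
    return true;
}
```

If any single task hangs, its bit is never set again, the supervisor stops feeding, and the hardware watchdog resets the system.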

Validate Input Data

Always validate data from external sources (sensors, communication interfaces) before processing.

Critical

Why it matters

External data can be corrupted, out of range, or maliciously crafted. Always validate before use. Check bounds, verify checksums, and sanitize inputs to prevent buffer overflows, crashes, and security vulnerabilities.

Code example

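#include <string.h>

/* Assumed wire format: cmd (1 byte) | length (2, little-endian) |
   data[length] | crc (2, little-endian, over all preceding bytes) */
#define PKT_HEADER_SIZE 3U
#define PKT_CRC_SIZE    2U

typedef struct {
    uint8_t cmd;
    uint16_t length;
    uint8_t data[MAX_PAYLOAD];
    uint16_t crc;
} packet_t;

int process_packet(const uint8_t *buf, size_t len) {
    packet_t pkt;
    
    /* Validate minimum size: header plus CRC with an empty payload */
    if ((buf == NULL) || (len < (PKT_HEADER_SIZE + PKT_CRC_SIZE))) {
        return -EINVAL;
    }
    
    /* Decode field by field: casting buf to packet_t * would depend on
       struct padding and could cause misaligned accesses */
    pkt.cmd = buf[0];
    pkt.length = (uint16_t)(buf[1] | ((uint16_t)buf[2] << 8));
    
    /* Validate length field against the limit and the bytes received */
    if ((pkt.length > MAX_PAYLOAD) ||
        (len != (PKT_HEADER_SIZE + pkt.length + PKT_CRC_SIZE))) {
        return -EINVAL;
    }
    
    /* Validate CRC, which follows the payload */
    pkt.crc = (uint16_t)(buf[len - 2U] | ((uint16_t)buf[len - 1U] << 8));
    if (calculate_crc(buf, len - PKT_CRC_SIZE) != pkt.crc) {
        return -EBADMSG;
    }
    
    memcpy(pkt.data, &buf[PKT_HEADER_SIZE], pkt.length);
    return handle_command(&pkt);
}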

Tips

  • Check all array indices before access
  • Validate numeric ranges for physical quantities
  • Use checksums or CRCs for transmitted data
  • Implement rate limiting for external inputs
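
Range validation for physical quantities can be sketched as below. The limits are illustrative values for a hypothetical temperature sensor rated from -40 to +85 °C and should be replaced with figures from the actual datasheet; all names are assumptions made for this example.

```c
#include <stdint.h>

/* Illustrative limits in hundredths of a degree Celsius. */
#define TEMP_MIN_CENTI_C      (-4000)
#define TEMP_MAX_CENTI_C      (8500)
#define TEMP_MAX_STEP_CENTI_C (500)   /* largest plausible change per sample */

typedef enum {
    READING_OK = 0,
    READING_OUT_OF_RANGE,
    READING_IMPLAUSIBLE_STEP,
} reading_status_t;

/* Checks a new reading against absolute limits and against the previous
   accepted reading. */
reading_status_t validate_temperature(int32_t previous, int32_t current) {
    if ((current < TEMP_MIN_CENTI_C) || (current > TEMP_MAX_CENTI_C)) {
        return READING_OUT_OF_RANGE;
    }

    /* A 64-bit difference cannot overflow for any pair of int32_t values */
    int64_t step = (int64_t)current - (int64_t)previous;
    if (step < 0) {
        step = -step;
    }
    if (step > TEMP_MAX_STEP_CENTI_C) {
        return READING_IMPLAUSIBLE_STEP;
    }
    return READING_OK;
}
```

Returning distinct status codes lets the caller decide whether to discard the sample, retry the read, or flag a sensor fault.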

Secure Communication

Use encryption and authentication for wireless communication and firmware updates.

High Priority

Why it matters

Wireless communication can be intercepted or spoofed. Use TLS for IP-based protocols, enable BLE encryption and bonding, and implement message authentication. Nordic devices support hardware crypto acceleration for efficient security.

Code example

/* BLE security configuration */
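static struct bt_conn_auth_cb auth_callbacks = {
    .passkey_display = on_passkey_display,
    .cancel = on_auth_cancel,
};

/* Recent Zephyr releases report pairing results through a separate struct */
static struct bt_conn_auth_info_cb auth_info_callbacks = {
    .pairing_complete = on_pairing_complete,
};

int security_init(void) {
    int err = bt_conn_auth_cb_register(&auth_callbacks);
    if (err) {
        return err;
    }
    return bt_conn_auth_info_cb_register(&auth_info_callbacks);
}

/* Call once per connection, for example from the connected callback */
int require_secure_link(struct bt_conn *conn) {
    /* L4: authenticated LE Secure Connections pairing, 128-bit key */
    return bt_conn_set_security(conn, BT_SECURITY_L4);
}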

Tips

  • Use hardware crypto when available for better performance
  • Store encryption keys in secure storage (for example a key management unit or secure element), not in plain application flash
  • Implement key rotation for long-lived devices
  • Enable BLE secure connections (LESC) for stronger pairing

Implement Error Handling

Add comprehensive error checking and recovery mechanisms for robust operation.

High Priority

Why it matters

Robust embedded systems gracefully handle errors rather than crashing. Check return values, implement retry logic for transient failures, and define clear error recovery procedures. Log errors for later diagnosis.

Code example

int sensor_read_with_retry(sensor_data_t *data) {
    int ret;
    int retries = 3;
    
    while (retries-- > 0) {
        ret = sensor_read(data);
        
        if (ret == 0) {
            return 0;  /* Success */
        }
        
        if (ret == -ENODEV) {
            /* Sensor disconnected - no point retrying */
            LOG_ERR("Sensor not found");
            return ret;
        }
        
        LOG_WRN("Sensor read failed, retrying...");
        k_msleep(100);
    }
    
    LOG_ERR("Sensor read failed after retries");
    return ret;
}

Tips

  • Distinguish between recoverable and fatal errors
  • Use exponential backoff for retry delays
  • Log enough context to diagnose issues remotely
  • Consider safe fallback modes for critical systems
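
The fixed 100 ms delay in the example above could be replaced with exponential backoff. A minimal sketch, with an illustrative function name, that doubles the delay per attempt and caps it:

```c
#include <stdint.h>

/* Delay before retry number `attempt` (0-based): base_ms doubled once per
   previous attempt, capped at max_ms. */
uint32_t backoff_delay_ms(uint32_t attempt, uint32_t base_ms, uint32_t max_ms) {
    uint32_t delay = base_ms;

    if (base_ms == 0U) {
        return 0U;
    }
    for (uint32_t i = 0U; i < attempt; i++) {
        if (delay > (max_ms / 2U)) {
            return max_ms;  /* doubling would exceed the cap (or overflow) */
        }
        delay *= 2U;
    }
    return (delay > max_ms) ? max_ms : delay;
}
```

With a 100 ms base and a 1000 ms cap, successive attempts wait 100, 200, 400, 800, and then 1000 ms. Adding a small random jitter on top helps when many devices retry against the same server.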

Testing & Debugging

Catch bugs early with unit tests, structured logging, edge-case coverage, and end-to-end integration testing.

Unit Test Your Code

Write unit tests for critical functions to catch bugs early in development.

High Priority

Why it matters

Unit tests verify individual functions work correctly in isolation. For embedded systems, use frameworks like Ztest (Zephyr's native framework) or Unity. Mock hardware dependencies to run tests on your development machine.

Code example

/* Zephyr Ztest example */
#include <zephyr/ztest.h>
#include "checksum.h"

ZTEST_SUITE(checksum_tests, NULL, NULL, NULL, NULL, NULL);

ZTEST(checksum_tests, test_checksum_empty) {
    /* C has no zero-length arrays; pass a length of 0 instead */
    uint8_t data[1] = {0};
    uint32_t result = calculate_checksum(data, 0);
    zassert_equal(result, 0, "Empty data should return 0");
}

ZTEST(checksum_tests, test_checksum_known_value) {
    uint8_t data[] = {0x01, 0x02, 0x03, 0x04};
    uint32_t result = calculate_checksum(data, sizeof(data));
    /* Assumes the additive checksum shown earlier: 1 + 2 + 3 + 4 */
    zassert_equal(result, 0x0A, "Checksum mismatch");
}

Tips

  • Test edge cases: empty input, maximum values, null pointers
  • Run tests on both host and target hardware
  • Integrate tests into your CI/CD pipeline
  • Aim for high coverage of critical code paths

Use Debug Interfaces

Leverage JTAG/SWD debugging tools and logging to troubleshoot issues efficiently.

High Priority

Why it matters

Hardware debuggers (J-Link, ST-Link) provide breakpoints, memory inspection, and peripheral register views. Combined with RTT logging, you can debug with minimal impact on timing-sensitive code. Use Zephyr's logging subsystem for structured output.

Code example

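#include <zephyr/drivers/i2c.h>
#include <zephyr/logging/log.h>
LOG_MODULE_REGISTER(sensor, LOG_LEVEL_DBG);

/* i2c_dev, SENSOR_I2C_ADDR, SENSOR_FRAME_LEN, and decode_frame() are
   placeholders for the board- and sensor-specific parts */
int sensor_read(sensor_data_t *data) {
    uint8_t buf[SENSOR_FRAME_LEN];
    
    LOG_DBG("Starting sensor read");
    
    int ret = i2c_read(i2c_dev, buf, sizeof(buf), SENSOR_I2C_ADDR);
    if (ret < 0) {
        LOG_ERR("I2C read failed: %d", ret);
        return ret;
    }
    
    LOG_HEXDUMP_DBG(buf, sizeof(buf), "Raw data:");
    
    decode_frame(buf, data);
    LOG_INF("Temperature: %d", (int)data->temperature);
    
    return 0;
}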

Tips

  • Use Segger RTT for low-impact logging
  • Configure log levels per module for focused debugging
  • Use hardware breakpoints to catch memory corruption
  • Profile code with cycle-accurate timing via ITM/ETM

Test Edge Cases

Test boundary conditions, error scenarios, and resource limitations thoroughly.

Medium Priority

Why it matters

Bugs often hide at boundaries - buffer limits, integer overflow points, and timing edges. Test what happens when resources are exhausted, when operations are interrupted, and when inputs are at their limits.

Tips

  • Test with minimum and maximum input values
  • Simulate memory exhaustion and resource starvation
  • Test with unreliable communication (dropped packets, timeouts)
  • Verify behavior at temperature and voltage extremes

Perform Integration Testing

Test complete system integration including hardware, firmware, and external interfaces.

High Priority

Why it matters

Integration tests verify that components work together correctly. This includes testing communication protocols, sensor fusion algorithms, and end-to-end functionality. Use real hardware and simulate external systems when needed.

Tips

  • Create automated test fixtures with real hardware
  • Test communication with actual mobile apps and gateways
  • Verify long-running stability (hours or days)
  • Test firmware update process end-to-end

Remember

These best practices are guidelines based on industry experience. Always adapt them to your specific project requirements, hardware constraints, and regulatory standards. Consistency and documentation are key to maintainable embedded systems.
