ENG | Zephyr RTOS: Because Arduino was too easy.
Reflection on Zephyr RTOS. Idea of device tree and hardware abstraction sounds great, but reality is cryptic compiler error messages inside macro expansion, many edge cases where abstraction fails, and breaking changes.
I was playing with Zephyr a few weeks in June and later October/November. Here are some of my reflections.
I’m not sure if it is good idea to use it. Yes, Arduino is stupid simple - take existing library, don’t ask questions, use it if it works.
I may question using library for something as simple as RTC, where integration could be roughly 20 lines of code on top of I2C layer, but many libraries for Arduino are so simple, they are barely justified. Pin layouts are not separated from code, but there might be config.h that allows changing a few things such as wifi password, pins and so on.
Zephyr tries to separate hardware from code by introducing device tree. Changing temperature sensor or RTC does not need touching the source code at all. Instead if needs enabling driver in prj.conf file, to specify sensor under i2c or spi bus with defined pins and bus speed, sensor may have defined mode of operation and binding to source is done via aliases or chosen section. You may swap microcontroller, write different device tree overlay with different pins or board definition file and source code should work. So far, so good.
But it comes at cost. Source code is quite straightforward and easy to debug. Device tree is not. Not adding required property to node, typos in device tree or C binding code (macros) results in pages of error messages which do not make any sense, other than there is something wrong with node __device_dts_ord_42. Or macro DT_... failed to compile and error is repeated literally 50 times because of recursion.
Well. Let’s call this skill issue. I’m self-learning Zephyr and Google and LLMs are the only help.
Best part of it is, that abstraction does not always work in real life, because devices have their own specifics, limitations, or quirks.
For example displays:
- PCD8544 (alias Nokia 5110 display, 84x48 pixels)
- no driver exists, technically it’s easy to write
- ST7567 (monochrome 128x64 display)
- does not support rotation, but likely can work upside down with
column_offset=<4>; segment-inv-dir; - updates must be aligned to lines - y coordinate divisible by 8 which breaks LVGL partial updates
- does not support rotation, but likely can work upside down with
- ILI9341 (TFT TN 240x320, Elecrow 2.8”)
- no issues
- ST7789V (TFT IPS 240x320, Pimoroni 2.8” Display Pack)
- no rotation supported
- almost no default initialization values so they need to be defined in device tree
- some required values for initialization are not even used in drivers for Pico SDK
- ST1680
- eInk driver, no experience yet
- likely does not support 4 color grayscale
All SPI displays actually suffer from one breaking change - they now belong under mipi_dpi node which defines dc-gpios, reset-gpios or similar. So no code examples found on github older than one year work today.
There are many cases, where Zephyr and Device tree create overhead without too much benefits. Initialization of ST7567 display is quite easy (as shown earlier in MicroPython code without driver). Realizing that LVGL may call partial updates with unasserted coordinates took some time - it was clear when LLM proposed doing exactly that in callback and it needed to read driver source code. So is there any benefit in using driver?
I have the very same question with ST7789 display, initialization is not so easy, but here it’s needed to find and write most of the initialization constants to device tree anyways.
At this point, I’m not sure what to do with eInk displays. They have their own specifics, such as look up tables for full, partial updates, grayscale, update window selection.
On the other hand, eInk vendor specific drivers (for PicoSDK, Arduino, …) are awful pieces of code using macros for SPI data transfer (sometimes using bit-banging of data and clock pins with suspicious timing), hardcoded values for display resolution or frame buffer size and they are basically impossible to test or maintain. So each driver is basically 500 to 800 lines of code, no code is shared with similar displays, vendor sells like 4 types of display (black&white, three color, four shades of gray, with touch) in like six sizes, now drivers exist for Arduino, ESP32, Raspberry Pico SDK and MicroPython. Good luck maintaining and testing it, because it means combinations of controllers, display and resolution. So no abstraction at all is not a good aproach either.
By the way, some questions about driver are valid even for simple devices - is it worth using RTC driver instead of 15 lines of code on top of I2C? Especially when RTC driver returns mystic error code when time is not set? Isn’t it easier to find validity bit in clock documentation rather then looking to driver source code to realize why it fails? Or simply drop dates before year 2025 as invalid?
Some sensors have features which are not exposed via API.
Where I had somewhat good experience with Zephyr was unfinished/not battle tested data logger, where it was very handy to have serial shell for testing, downloading MIME/base64 encoded data and using LFS filesystem to store each run in a separate file.
But most of the time, Zephyr RTOS feels like an overkill for small projects, such as weather station - here timing is not critical and program somewhat runs in a main loop, not using different threads.
Hopefully, for more complex systems with some data consumers and producers, it pays off. But I would still question stability of the API.
And I will likely continue trying it.
