ENG | Zephyr RTOS: Because Arduino was too easy.
Reflection on Zephyr RTOS. Idea of device tree and hardware abstraction sounds great, but reality are cryptic compiler error messages inside macro expansions, many edge cases where abstraction fails, and breaking changes.
I was playing with Zephyr a few weeks in June and later October/November - mostly just with I2C, SPI sensors, displays and I have not yet explored wifi, HTTP(S), JSON, multithreading etc.
Here are some of my reflections.
I’m not sure if it is good idea to use it. Yes, Arduino is stupid simple - take existing library, don’t ask questions, use it if it works.
I may question using library for something as simple as RTC, where integration could be roughly 20 lines of code on top of I2C layer, but many libraries for Arduino are so simple, they are barely justified. Pin layouts are not separated from code, but there might be config.h that allows changing a few things such as wifi password, pins and so on.
Zephyr tries to abstract hardware from code by introducing device tree. Changing temperature sensor or RTC does not need touching the source code at all. Instead if needs enabling driver in prj.conf file, to specify sensor under i2c or spi bus with defined pins and bus speed, sensor may have defined mode of operation and binding to source is done via aliases or chosen section. You may swap microcontroller, write different device tree overlay with different pins or board definition file and source code should work. So far, so good.
But it comes at cost. Source code is quite straightforward and easy to debug. Device tree is not. Not adding required property to node, typos in device tree or C binding code (macros) results in pages of error messages which do not make any sense, other than there is something wrong with node __device_dts_ord_42. Or macro DT_... failed to compile and error is repeated literally 50 times because of recursion. The same applies to something ommited in prj.conf file.
Well. Let’s call this skill issue. I’m self-learning Zephyr and Google, LLMs, and documentation are the only help.
Best part of it is, that abstraction does not always work in real life, because devices have their own specifics, limitations, or quirks.
For example displays:
- PCD8544 (alias Nokia 5110 display, 84x48 pixels)
- no driver exists, technically it’s easy to write
- ST7567 (monochrome 128x64 display)
- does not support rotation, but likely can work upside down with
column_offset=<4>; segment-inv-dir; - updates must be aligned to lines - y coordinate divisible by 8 which breaks LVGL partial updates
- does not support rotation, but likely can work upside down with
- ILI9341 (TFT TN 240x320, Elecrow 2.8”)
- no issues
- ST7789V (TFT IPS 240x320, Pimoroni 2.8” Display Pack)
- no rotation supported
- almost no default initialization values so they need to be defined in device tree
- some required values for initialization are not even used in drivers for Pico SDK
All SPI displays actually suffer from one breaking change - they now belong under mipi_dpi node which defines dc-gpios, reset-gpios or similar. So no code examples found on github older than one year work today.
There are many cases, where Zephyr and Device tree create overhead without too much benefits. Initialization of ST7567 display is quite easy (as shown earlier in MicroPython code without driver). Realizing that LVGL may call partial updates with unasserted coordinates took some time - it was clear when LLM proposed doing exactly that in callback and it was needed to read driver source code to find this limitation. So is there any benefit in using driver if I need to study source code to find workaround?
I have the very same question with ST7789 display, initialization is not so easy, and here it’s needed to find and write most of the initialization constants to device tree anyways. And likely more than is needed.
By the way, some questions about driver are valid even for simple devices - is it worth using RTC driver instead of 15 lines of code on top of I2C? Especially when RTC driver returns mystic error code when time is not set? Isn’t it easier to find validity bit in clock documentation rather then looking to driver source code to realize why it fails? Or simply drop dates before year 2025 as invalid? For a hobby project it’s unlikely to change sensor. For large project, it can be still viable to do it in source code and not in the device tree.
Then, some sensors have features which are not exposed via API.
Where I had somewhat good experience with Zephyr was unfinished/not battle tested data logger, where it was very handy to have serial shell for testing, downloading MIME/base64 encoded data and using LFS filesystem to store each run in a separate file. But I found shell extremely useful.
But most of the time, Zephyr RTOS feels like an overkill for small projects, such as weather station - here timing is not critical and program somewhat runs in a main loop, not using different threads. Maybe it starts to pay off when networking is added. I never used PicoSDK so I don’t know it’s capabilities and limitations.
Hopefully, for more complex systems with some data consumers and producers, it pays off. But I would still question stability of the API.
Conclusion
I’ll keep experimenting with Zephyr — not because it’s better than MicroPython for most of my projects, but because it’s a challenge and it may eventually pay off for more complex stuff.
MicroPython remains my go-to: rapid protyping (even tests using REPL), readable, often good enough.
Where MicroPython failed is nRF52840 Bluetooth LE support, which is major selling point of this microcontroller, on the other hand, Zephyr (and Nordic Semiconductor SDK) is it’s native platform and there are plenty of examples for BLE applications.
And it’s not that bad once you gain some expertise by making and solving beginner’s mistakes :-)
Evil note: and it is surprisingly good for generating traffic to this blog, so it brings some satisfaction.
