🔧 Debugging Mari's Performance - From 50% to 98% Packet Delivery
You can only optimize what you can measure, and measurement becomes crucial once a system has to scale to 100+ real-time nodes.
For a few weeks I was puzzled by the unstable success ratio of Mari (our new link layer for micro-robots), especially the performance degradation during over-the-air updates of large robot fleets.
This week, I finally took the time to dig deep. That meant:
- Instrumenting both the link layer and the host library to collect and show relevant metrics in real time, including packet delivery ratio (PDR) for both the radio and serial links.
- Observing execution patterns using a logic analyzer, toggling pins in carefully chosen code paths.
This led to a precise diagnosis: the radio was doing fine, but the serial link was suffering from concurrency issues and lost interrupts.
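To make the per-link measurement concrete, here is a minimal sketch of how a PDR figure could be derived from two counters kept for each link (radio and serial). The `link_stats_t` type and `link_pdr_percent` function are hypothetical names chosen for illustration, not Mari's actual API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-link counters (illustrative, not taken from Mari). */
typedef struct {
    uint32_t sent;      /* packets the sender reports having transmitted */
    uint32_t received;  /* packets the receiver actually saw */
} link_stats_t;

/* Packet delivery ratio in percent; returns 0 when nothing was sent yet. */
static double link_pdr_percent(const link_stats_t *s) {
    assert(s != NULL);
    if (s->sent == 0) {
        return 0.0;
    }
    return 100.0 * (double)s->received / (double)s->sent;
}
```

Keeping one such structure per link is what lets a loss be attributed to the radio or to the serial connection rather than to the system as a whole.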
The Solution
After a set of adjustments:
- Re-prioritizing interrupts
- Adding a queue
- Making a task non-blocking
The improvement was drastic: average PDR went from ~50% with 10 nodes to 98% on the radio link and 92% on the serial connection with 100 devices.
There’s still room for refinement, but the current release already provides serious value for users.
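The post does not detail how the queue was built, so the following is only a sketch of one common pattern for this kind of fix: a single-producer, single-consumer ring buffer that a serial receive interrupt fills and a non-blocking task drains. All names (`byte_queue_t`, `queue_push`, `queue_pop`, `QUEUE_SIZE`) are assumptions, and plain `volatile` indices are assumed to be sufficient only on a single-core microcontroller where the interrupt is the sole producer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_SIZE 64u  /* power of two, so index wrap is a cheap mask */
static_assert((QUEUE_SIZE & (QUEUE_SIZE - 1u)) == 0, "QUEUE_SIZE must be a power of two");

typedef struct {
    volatile uint32_t head;  /* written only by the producer (ISR) */
    volatile uint32_t tail;  /* written only by the consumer (task) */
    uint8_t data[QUEUE_SIZE];
} byte_queue_t;

/* Producer side, meant to be called from a (hypothetical) UART receive ISR.
 * Never blocks: returns false when the queue is full so the caller can
 * count the drop instead of stalling inside the interrupt. */
static bool queue_push(byte_queue_t *q, uint8_t byte) {
    if (q->head - q->tail == QUEUE_SIZE) {
        return false;
    }
    q->data[q->head & (QUEUE_SIZE - 1u)] = byte;
    q->head++;
    return true;
}

/* Consumer side, meant to be polled from the main loop.
 * Returns false immediately when nothing is pending. */
static bool queue_pop(byte_queue_t *q, uint8_t *out) {
    if (q->head == q->tail) {
        return false;
    }
    *out = q->data[q->tail & (QUEUE_SIZE - 1u)];
    q->tail++;
    return true;
}
```

With this shape the interrupt handler only stores a byte and returns, and the task that processes serial data never waits: it pops whatever is available and moves on to other work.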
Learn More
➡️ We are building open-source tools for robotic swarms:
- Check out the project page at https://openswarm.eu/
- Our code at https://github.com/dotbots
Before and after
PDR before optimization: ~50% success rate with only 10 nodes
PDR after optimization: 98% radio link and 92% serial connection with 100 devices