Post by papa on Feb 28, 2016 16:10:21 GMT
While responding to another thread, I was reminded of this topic: a poster reported persistent communication glitches in his system, and computourist responded as follows:
Hi sam4205, you seem to have some trouble when 'stress-testing' the gateway and end node. Your findings are consistent with the way things are designed and set up.
The main requirements when starting this project were simplicity and duplex communication. It was designed as a data-gathering/control network with limited real-time performance. When measuring temperature, one or two seconds don't count...
The design is based on:
- Software (both gateway and end node) is designed around a simple loop.
- Interrupts are blocked in the gateway in order to share the SPI bus between 2 users: RFM69 and ethernet.
- Interrupts are not used to flag external events; the normal program flow is not interrupted.
This means that this design is prone to 'collisions' and will only function if:
- Events do not occur at the same time or at very short intervals.
- The number of end nodes and messages that share a radio link (network ID) is limited.
- Data transfer allows for delays, so retransmission is a viable way to increase reliability.
The 'loop' design is a simple setup and easy to understand/adapt; it means all processes take place one after the other.
This also means that when the gateway is talking to Mosquitto, it cannot receive or send data over the radio link and vice versa.
It also means that if an end-node is debouncing a switch or writing to an LCD it cannot send/receive data.
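To make the 'one after the other' point concrete, here is a minimal sketch that models one pass through such a loop as a list of tasks that each run to completion. The `Task` struct and `radioBlindWindowMs()` are hypothetical names for illustration, not code from the gateway or node sketches, and the model assumes the radio is polled exactly once per pass.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model of one pass through a simple loop(): each task runs to
// completion before the next one starts. Nothing here comes from the real sketch.
struct Task {
  const char* name;
  uint32_t durationMs;
  bool isRadio;  // true for the task that polls the RFM69
};

// Assuming the radio is polled once per pass, the time between two radio polls
// is the sum of all other tasks. A packet arriving in that window must wait,
// or be lost and retransmitted.
uint32_t radioBlindWindowMs(const std::vector<Task>& tasks) {
  uint32_t blind = 0;
  for (const Task& t : tasks) {
    if (!t.isRadio) blind += t.durationMs;
  }
  return blind;
}
```

For example, with invented numbers: a pass that spends 300 ms handling MQTT and 50 ms writing an LCD leaves the radio unserviced for about 350 ms per pass, however strong the signal is.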
Not using interrupts in such a setup means you are bound to lose some data on the radio link. A retransmission scheme is used to deal with this. Since the gateway radio can be offline for quite some time (e.g. when handling/decoding MQTT messages), it seemed wise to choose a rather long retransmission interval (0.5 seconds), even though this leads to rather long response times after a data collision. But I prefer a long delay over losing data ;-)
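The trade-off between retry interval, number of tries and gateway busy time can be sketched with a simple model. This is a hypothetical illustration, not the retransmission code used in the project: it assumes the gateway is deaf from t = 0 until `busyUntilMs`, that the node sends at t = 0 and then retries at fixed intervals, and it ignores on-air time and ACK wait time.

```cpp
#include <cstdint>

// Hypothetical model: the gateway radio is unavailable from t = 0 until
// busyUntilMs (e.g. while handling an MQTT message). The node transmits at
// t = 0 and retries every retryIntervalMs, up to maxTries transmissions total.
// Returns the send time (ms) of the first transmission the gateway can hear,
// or -1 if every attempt falls inside the busy window (message lost).
int32_t firstHeardAttemptMs(uint32_t busyUntilMs, uint32_t retryIntervalMs, uint8_t maxTries) {
  for (uint8_t attempt = 0; attempt < maxTries; ++attempt) {
    uint32_t sendTime = attempt * retryIntervalMs;
    if (sendTime >= busyUntilMs) {
      return static_cast<int32_t>(sendTime);  // gateway is back in its loop and listening
    }
  }
  return -1;  // all tries fell in the busy window
}
```

With a 1200 ms busy window and 4 tries, a 0.5 s interval gets through at 1500 ms, while a 100 ms interval gives up after 300 ms. With `maxTries = 1` (no retransmission) any message sent while the gateway is busy is simply lost. Shortening the interval only pays off if the number of tries is raised to cover the same busy time.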
In the end it all comes down to a tradeoff between:
- Simplicity of program setup
- Number of nodes/messages that need to be served
- Response time required.
Of course, things could be improved a lot by choosing an interrupt-driven, store-and-forward, message-stack-based design. You state you are a software person, so you might be able to improve ...
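The core of such a store-and-forward design is a small message queue: a radio interrupt handler stores incoming messages, and the main loop forwards them to MQTT when it has time. The sketch below is a hypothetical fixed-size ring buffer, not part of the existing gateway code; the `Message` layout is invented for illustration, and on a real microcontroller the indices shared with an ISR would have to be `volatile` and accessed with interrupts disabled, which is left out here.

```cpp
#include <cstdint>

// Hypothetical message record: node ID, device ID and a short payload.
struct Message {
  uint8_t nodeId;
  uint8_t devId;
  char payload[16];
};

// Fixed-size FIFO (ring buffer). An ISR could push() while loop() is busy
// talking to Mosquitto; loop() later pop()s and forwards the stored messages.
// NOTE: ISR-safety (volatile indices, noInterrupts()) is omitted in this sketch.
template <uint8_t N>
class MessageQueue {
 public:
  bool push(const Message& m) {
    if (count_ == N) return false;  // queue full: drop the message or do not ACK it
    buf_[head_] = m;
    head_ = (head_ + 1) % N;
    ++count_;
    return true;
  }
  bool pop(Message& out) {
    if (count_ == 0) return false;  // nothing stored
    out = buf_[tail_];
    tail_ = (tail_ + 1) % N;
    --count_;
    return true;
  }
  uint8_t size() const { return count_; }

 private:
  Message buf_[N];
  uint8_t head_ = 0, tail_ = 0, count_ = 0;
};
```

Messages come out in arrival order; when the queue is full the newest message is refused, so the sender's retransmission still acts as the fallback.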
Given the current design you could try the following:
- Decrease the transmission retry interval. It is set at an arbitrary value of 0.5 seconds, and I have not experimented with that.
- Disable radio retransmission by setting the number of transmissions (reTx) to 1. Error handling would then need to be done in Openhab, but your response times will definitely improve.
- Examine the load on your MQTT broker. If MQTT response time is bad, your radio response times will suffer.
- Examine the software on your node. Keep the event-handling part as short as possible. If needed, offload event handling to external hardware or an extra Arduino.
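One way to keep event handling short is to debounce without `delay()`: record when the raw input last changed and accept the new state only after it has been stable long enough, returning immediately on every call so the loop can keep polling the radio. This is a minimal sketch, not code from the node sketch; the `Debouncer` class is hypothetical, and the current time is passed in as a parameter (on an Arduino that would be `millis()`).

```cpp
#include <cstdint>

// Non-blocking debouncer: never waits, just compares timestamps.
class Debouncer {
 public:
  explicit Debouncer(uint32_t stableMs) : stableMs_(stableMs) {}

  // Feed one raw sample; returns true exactly once per accepted state change.
  bool update(bool raw, uint32_t nowMs) {
    if (raw != lastRaw_) {  // contact bounced or changed: restart the timer
      lastRaw_ = raw;
      lastChangeMs_ = nowMs;
    }
    // Unsigned subtraction stays correct when a millis()-style counter wraps.
    if (raw != stable_ && nowMs - lastChangeMs_ >= stableMs_) {
      stable_ = raw;  // reading held long enough: accept it
      return true;
    }
    return false;
  }
  bool state() const { return stable_; }

 private:
  uint32_t stableMs_;
  uint32_t lastChangeMs_ = 0;
  bool lastRaw_ = false;
  bool stable_ = false;
};
```

Each call costs a few comparisons instead of blocking for the whole debounce period, so radio reception is only delayed by microseconds rather than tens of milliseconds.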
Some remarks on your [sam4205] post are inline below:
yesterday at 3:21am sam4205 said: Thanks Computourist, appreciate your inputs.
I agree with you about the expectations of this simple design and about the ideas for building out a more complex design. I am a software person, so I don't have a lot of hardware knowledge.
I really like the idea of letting the node do most of the work and push the info instead of being polled by the gateway, as suggested by papa, greg and yourself. I have tried those tweaks and, while troubleshooting, I found a basic issue in my setup related to the long time needed for communication.
One of the posts I stumbled across was on the LowPowerLab forum, where Felix mentioned that messages should typically take less than 100 ms at a reasonable signal strength (about -70 dBm or stronger).
--> I agree, but the response time in this system also depends on the performance of MQTT/Openhab: how soon the gateway is available again after handling MQTT messages. Many forum members run Mosquitto & Openhab on a Raspberry Pi with limited performance, so MQTT handling on the Pi is a factor to consider.
In my setup, I have push switches on the node that can be toggled either by pushing them or via Openhab. Leaving the two toggles out of the picture and trying just one switch at a time, I see an issue in my design: ACK times are highly variable and span a wide range.
--> As explained above. Also, debouncing pushbuttons in software is NOT good for your performance...
I have attached a screenshot with the MQTT log and the message times from Serial.
- After wakeup, the node sends the status of devices 0, 2, 17, 18, 19, 20 and 48 to the gateway.
- First line in the serial shows Dev19 took only 137 ms (highlighted in red)
- Dev20 took 941 ms
- Dev48 took 288 ms
- After a few seconds, I toggled Dev17 in Openhab and the node received it immediately (the LED on the node changed immediately), but it then took 5636 ms, with 4 retries, to send the ACK back to the gateway. During those 5636 ms, the gateway reported a lost connection to the node.
- After a few more seconds, I toggled Dev18 in Openhab; it was also received immediately by the node and my LED toggled immediately. This time the ACK back to the gateway took 328 ms with 1 retry.
- An RF strength of -46 dBm seems strong, but it still takes a long time to send back the ACK.
--> As explained above: your signal strength is NOT the limiting factor; handling concurrent events is. The "connection lost" report is a direct result of the gateway being busy with things other than radio reception.
I don't know if the issue is at the node or at the gateway. I have seen messages sent in less than 100 ms, but only sometimes. The node and gateway are only 20 ft apart.
--> Again: signal strength is not the issue...