I am currently running 8 RFM69 temperature sensors and have been doing so for over 8 months. They've been running well, though I get a recurring issue where a number of my sensors stop reading and I just get a flat line on Grafana. I've managed to get them working again by fiddling around, which involves power cycling the sensors. I suspected a clash between sensors, i.e. a number of them trying to communicate at the same time. So the last time this occurred, around 3 weeks ago, I turned off all the sensors and turned them back on one at a time so that they were evenly spaced across the 5-minute interval in which they send.
This fixed the issue and things worked perfectly (for 3 weeks). One thing I also found while fault finding last time was that although the wake and send time is set the same on all the RFM69 Arduinos (5 mins), the timing is very inaccurate, being anywhere up to 30 seconds out. This means that over time they will all start clashing again.
And sure enough, 3 weeks later they are clashing again.
Unfortunately when the clashing starts to occur the data no longer gets collected for a very long time (I've never waited long enough to see if it even recovers as it takes too long).
So my question is, is there a way to change the code at either the rfm end or hub end so it deals with clashing in a smarter way? Maybe in the rfm end, add a random time between retrying to connect?
Hmmm, interesting, kisa. Eight seems a small number to clash. I believe I've had up to 11 or 12 with no clashes.
Before we consider a possible fix, please provide more information: Do you have more nodes besides the 8 with temperature sensors? What is the version of the sketch that programmed your RFM69 gateway?
Which version of the choose_nodes sketch programmed the nodes? What sketch customizing did you do? Did you give every node their own unique NODEID? Are the nodes all programmed the same except for NODEID? Which temperature sensor are you using?
kisa: "the wake and send time are set the same on all the rfm69 arduino's (5 mins) they are very inaccurate, being anywhere up to 30 seconds out"
papa: Apparently you are using a sleep method in the sketch & maybe have the nodes on batteries, correct? For sleep method, did you activate #define SLEEPY? You say wake & send after 5 minutes. Does that mean you set the Loops variable to 37 (37 Loops of 8 seconds = 296 seconds, almost five minutes)?
For EACH send occasion, computourist designed the txRadio() function to re-transmit up to 7 times before giving up. Device 9 tracks the count of re-transmissions & the choose_nodes sketch default is to send device 9's results. Try connecting one of the nodes to your computer's USB, watch the Device 9 values on the Arduino IDE serial monitor, & let me know what values you get.
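To make the Device 9 number easier to read, here is a minimal sketch of a send-with-retries loop that counts re-transmissions. It is NOT computourist's actual txRadio() code: the names sendWithRetries, SendResult, ackOnRetrySix and neverAcked are made up for illustration, and the two stand-in functions simply simulate a radio that does or does not get an acknowledgement.

```cpp
// Result of one send occasion: was it acknowledged, and how many
// re-transmissions were needed (the kind of count Device 9 reports).
struct SendResult {
    bool delivered;
    int retries;
};

// Try once, then re-transmit up to maxRetries times before giving up.
// trySend is a hypothetical callback that returns true when an ACK arrives.
constexpr SendResult sendWithRetries(bool (*trySend)(int attempt), int maxRetries) {
    for (int attempt = 0; attempt <= maxRetries; ++attempt) {
        if (trySend(attempt)) {
            return {true, attempt};   // attempt 0 means no retry was needed
        }
    }
    return {false, maxRetries};       // gave up after the last retry
}

// Simulated radio: the ACK only arrives on the sixth re-transmission.
constexpr bool ackOnRetrySix(int attempt) { return attempt == 6; }

// Simulated radio: the gateway never answers.
constexpr bool neverAcked(int) { return false; }
```

With a limit of 7, a retry count of 0 or 1 suggests a healthy link, while a count close to 7 means the message only just got through.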
PS Do an internet search for "Arduino accuracy of watch dog timer." The inaccuracy of that timer is well known. Perhaps think of it this way: Sleep gives us a way to save power on batteries & the watchdog timer (though not precise) gives us a means to wake the sleeper & get results.
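The Loops arithmetic above, and the way small timing differences add up, can be sketched in a few lines. This is only an illustration: the function names are hypothetical, the 1-second difference used in the note below is an example value rather than a measurement, and the overlap estimate assumes the gap between two nodes is steadily closing.

```cpp
#include <cstdint>

// Nominal length of one watchdog sleep period, in seconds.
constexpr uint32_t kWdtPeriodSeconds = 8;

// Nominal time between reports for a given Loops setting.
constexpr uint32_t nominalIntervalSeconds(uint32_t loops) {
    return loops * kWdtPeriodSeconds;
}

// Rough number of report cycles before two nodes that started
// offsetSeconds apart end up transmitting at the same moment, when their
// real intervals differ by intervalDiffSeconds per cycle.
// Returns UINT32_MAX when the intervals are identical (no drift).
constexpr uint32_t cyclesUntilOverlap(uint32_t offsetSeconds,
                                      uint32_t intervalDiffSeconds) {
    return intervalDiffSeconds == 0 ? UINT32_MAX
                                    : offsetSeconds / intervalDiffSeconds;
}
```

For example, nominalIntervalSeconds(37) gives 296 seconds. Eight nodes spread evenly over that interval sit 37 seconds apart, so if two neighbours differed by a full second per cycle, cyclesUntilOverlap(37, 1) says they would meet after 37 cycles, roughly three hours. A smaller difference only postpones the meeting.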
Do you have more nodes besides the 8 with temperature sensors? No, that is all that is running. 7 are identical in design and one also measures the level in the rain water tank, as we discussed a while back (it's working well).
Which version of the choose_nodes sketch programmed the nodes? Not sure exactly what you're after, but I'm using GW 2.5.1 as the gateway and Computourist_v2.
What sketch customizing did you do? Very little, except activating the ds18 and ds18b and all standalone sensors, and activating what I believe was the sump option and sending raw values from that. It's been a while, so I don't remember exactly.
Did you give every node their own unique NODEID? Yes (of course).
Are the nodes all programmed the same except for NODEID? Yes, except for the one that measures tank level as well. That unavoidably gives it a different delay in transmitting its values. That said, they are all slightly different: I timed the transmissions and they varied even though all the delays were set the same.
Which temperature sensor are you using? DS18B20.
Apparently you are using a sleep method in the sketch & maybe have the nodes on batteries, correct? I am using the sleep mode by activating #define SLEEPY, but I am no longer using batteries as they only lasted a month, so all units are powered from mains. I kept using SLEEPY to save on power.
For sleep method, did you activate #define SLEEPY? Yes.
You say wake & send after 5 minutes. Does that mean you set the Loops variable to 37 (37 Loops of 8 seconds = 296 seconds, almost five minutes)? int Loops = 37;
I left node 55 off for about 15 minutes, then when I plugged it back in I got this while monitoring the gateway:
And this just keeps going, this time with both node 55 and node 56 repeating. A new line every second or so.
As I mentioned, due to the imperfection in the time delay they will always eventually clash, whether it's weeks or months away. So my thought is to add a random delay (say 5s - 45s) on each temperature sensor before it tries to transmit again.
I found that node 50 gave an error 3 when connecting (at no other time). I moved its location slightly and it connected fine, and then all the other nodes seemed to behave correctly, with all the repeating stopped. Seems odd that node 50 would cause issues in communication with other nodes???
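A minimal sketch of that random-delay idea, written as plain C++ so it can be tried off the board. Everything here is an assumption for illustration: retryDelayMs and nextRandom are made-up names, the tiny generator only stands in for whatever random source the node has, and seeding it from the NODEID is one possible way to make each node pick different delays. None of this is in the existing node sketch.

```cpp
#include <cstdint>

// Map a raw random number onto a retry delay in [minMs, maxMs] milliseconds.
constexpr uint32_t retryDelayMs(uint32_t rawRandom, uint32_t minMs, uint32_t maxMs) {
    return maxMs <= minMs
        ? minMs
        : minMs + static_cast<uint32_t>(
              rawRandom % (static_cast<uint64_t>(maxMs) - minMs + 1));
}

// Small linear congruential generator used as a stand-in random source.
// Feeding it a per-node seed (e.g. the NODEID) gives each node its own sequence.
constexpr uint32_t nextRandom(uint32_t state) {
    return state * 1664525u + 1013904223u;
}
```

A node would wait retryDelayMs(nextRandom(seed), 5000, 45000) milliseconds before its next attempt, so two nodes that collided once are unlikely to wake up together again. On the Arduino itself, the built-in random(min, max) could probably play the same role; note that its upper bound is exclusive.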
Actually, I did see this error a little while ago and didn't bother to chase it down as the temperatures were still reading correctly, so maybe it's a combination of things that's caused the issue?
kisa, thanks for answering my questions & providing the serial monitor (SM) output. That helps fill in the picture.
kisa: "this just keeps going: this time with both node 55 and node 56 repeating. A new line every second or so. ... As I mentioned due to the imperfection in time delay they will always eventually clash, whether it's weeks or months away. So my thought is to add a random delay time (say 5s - 45s) on each temperature sensor before it tries to transmit again."
papa: The SM output plus what I underlined in your quote above really tells a story: The issues look like more than the imprecise watchdog timer. Nodes communicating every second to the Gateway could flood the Gateway & cause clashes. Node 55 took several tries to wake up. Node 55's device 9 (how many times it retried to send a message) was 6, very high, perhaps because the Gateway is flooded. The nodes should not be reporting so often.
What to try now: I recommend that you reprogram the gateway with the latest sketch AND customizing starting here. Also reprogram the nodes from the latest sketch AND basic customizing from here. Then for each node, before you upload, add the customizing for its NODEID & its functions (perhaps using links from this index). Since you are using mains power instead of batteries, do NOT use SLEEPY mode for now. (BTW, the batteries may have run down quickly from the nodes' frequent reporting. Sending takes the most power.)
Again, try reprogramming the gateway & all the nodes. Then report back here with more gateway serial monitor results. From the latest node sketch, the nodes should report only every 60 seconds.
More questions if the above does not help greatly: Do you have any rules that relate to the nodes that send so often? What is your radio frequency? How long is your antenna? Is the antenna solid or stranded wire? What distance & obstacles are between Gateway & nodes? Can you elevate the gateway in relation to the nodes?
Thanks again. It will take me some time before I can reprogram the devices, as I want to go through the code thoroughly to ensure it's all correct and some of the nodes are hard to get access to. I'll let you know once it's done and how it goes. Regards.