|
Post by acekrystal on Sept 13, 2016 10:17:06 GMT
I have had piGateway running for several days now to test its stability.
So far I have never run it for longer than a week, because of all kinds of updates we're making to our (what we started calling) SensorHub.
This is now the second time in ~2.5 weeks that I have seen the piGateway crash with the message:
Pinging node 14 - ACK - nothing![14] to [1] Received Node ID = 14 Device ID = 6 Time = 65676711 RSSI = -59 var2 = 21.843674 var3 = 62.580299
[14] to [1] Received Node ID = 14 Device ID = 7 Time = 65677785 RSSI = -65 var2 = 1022.000000 var3 = 0.000000
[13] to [1] Received Node ID = 13 Device ID = 6 Time = 699452682 RSSI = -60 var2 = 21.982529 var3 = 66.277565
*** stack smashing detected ***: ./Gateway terminated
Aborted
I'm currently looking into how to get more logs that help me find the source of this problem. My guess is that some value is getting too big and no longer fits in the memory reserved for it, though nothing shows yet where this happens.
I'm digging into this, but if someone has advice for me, it is welcome =^.^= I will post results here, of course.
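To make the "value no longer fits in its reserved memory" guess concrete, here is a minimal sketch of how a formatted number can outgrow a fixed buffer and how that can be detected instead of overwriting the stack. The function name `format_checked` and the buffer sizes are hypothetical, and nothing here is taken from the piGateway source; it only assumes the values are printed with `%f`, as the log output suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Formats `value` into `out` and reports whether the whole text fit.
// snprintf never writes more than `out_size` bytes (including the NUL)
// and returns the length the complete text would have needed.
bool format_checked(char* out, std::size_t out_size, double value) {
    assert(out != nullptr && out_size > 0);
    const int needed = std::snprintf(out, out_size, "%f", value);
    return needed >= 0 && static_cast<std::size_t>(needed) < out_size;
}
```

With an 8-byte buffer, `format_checked` returns false for 1022.000000 (the text needs 11 characters plus the terminator), while an unchecked `sprintf` into the same buffer would write past its end.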
|
|
|
Post by acekrystal on Sept 13, 2016 15:35:45 GMT
I'm now planning to try the following to find the origin:
1. Enable core dumps:
$ ulimit -c 464
2. By default GCC's stack protector (the -fstack-protector options) is already enabled for all g++ builds on the latest Ubuntu, so we are just going to run the Gateway under the debugger:
gdb ./Gateway
3. Wait for it to crash again, and use the "bt" command to (hopefully) see which function is causing this problem:
> bt
4. I'm expecting a result like:
*** stack smashing detected ***: /home/odroid/Treco/gcc/test2 terminated
Program received signal SIGABRT, Aborted.
__libc_do_syscall () at ../ports/sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:44
44 ../ports/sysdeps/unix/sysv/linux/arm/libc-do-syscall.S: No such file or directory.
(gdb) bt
#0 __libc_do_syscall () at ../ports/sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:44
#1 0xb6ee7ebe in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#2 0xb6eea716 in __GI_abort () at abort.c:89
#3 0xb6f0e12c in __libc_message (do_abort=do_abort@entry=1, fmt=0xb6f92e74 "*** %s ***: %s terminated\n") at ../sysdeps/posix/libc_fatal.c:175
#4 0xb6f5f852 in __GI___fortify_fail (msg=0xb6f92e54 "stack smashing detected") at fortify_fail.c:38
#5 0xb6f5f812 in __stack_chk_fail () at stack_chk_fail.c:28
#6 0x0000864c in check_password(char*) ()
#7 0x32313230 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
I hope I will get a clue this way, although I might already have an idea of where the problem is coming from. It might come from the strangely long temperature and humidity values I'm getting on my Gateway side. I'm expecting float_var2 (temp) and float_var3 (hum) to hold values like "21.98" and "66.27" instead of what I now have: "21.982529" and "66.277565". I'm unsure whether I somehow go over a limit on the number of precision characters. Then again, since it has crashed 2 times now, each time only after multiple days (~2 - 3 days), I also think it could be a time variable, though I don't see how this could exceed the "unsigned long". The last value was: "699452682"
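Both suspicions can be checked with a little arithmetic. The sketch below assumes the readings are printed with a plain `%f` (which always prints 6 decimals, matching "21.982529") and that "unsigned long" is 32 bits wide, as it is on 32-bit ARM Linux. The helper names `format_reading` and `counts_until_wrap` are made up for illustration and do not come from the Gateway code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <limits>
#include <string>

// Formats a sensor reading with a fixed number of decimals.
// Returns an empty string if the text would not fit the local buffer.
std::string format_reading(double value, int decimals) {
    assert(decimals >= 0 && decimals <= 9);
    char buf[32];
    const int n = std::snprintf(buf, sizeof(buf), "%.*f", decimals, value);
    if (n < 0 || static_cast<std::size_t>(n) >= sizeof(buf)) return {};
    return std::string(buf, static_cast<std::size_t>(n));
}

// Remaining counts before a 32-bit unsigned counter wraps around to 0.
std::uint32_t counts_until_wrap(std::uint32_t counter) {
    return std::numeric_limits<std::uint32_t>::max() - counter;
}
```

`format_reading(21.982529, 2)` gives "21.98" and `format_reading(66.277565, 2)` gives "66.28" (rounded, not truncated). `counts_until_wrap(699452682)` is 3595514613, so the last Time value is nowhere near the 32-bit limit of 4294967295. If Time counts milliseconds, which is an assumption, a 32-bit counter only wraps after roughly 49.7 days.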
|
|
|
Post by acekrystal on Sept 14, 2016 13:07:58 GMT
Update:
It already crashed this morning with "stack smashing detected". This is the result from the debugger:
[13] to [1] Received Node ID = 13 Device ID = 6 Time = 811167971 RSSI = -87 var2 = 21.029221 var3 = 66.659035
[14] to [1] Received Node ID = 14 Device ID = 7 Time = 177269225 RSSI = -52 var2 = 795.000000 var3 = 0.000000
Pinging node 14 - ACK - nothing![14] to [1] Received Node ID = 14 Device ID = 4 Time = 177269344 RSSI = -68 var2 = 1.000000 var3 = 0.000000
[13] to [1] Received Node ID = 13 Device ID = 6 Time = 811169509 RSSI = -53 var2 = 20.973145 var3 = 66.677345
*** stack smashing detected ***: /home/odroid/work/2lib/HomeAutomation/piGateway/Gateway terminated
Program received signal SIGABRT, Aborted.
[Switching to Thread 0xb6bd7450 (LWP 2864)]
__libc_do_syscall () at ../ports/sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:44
44 ../ports/sysdeps/unix/sysv/linux/arm/libc-do-syscall.S: No such file or directory.
(gdb) bt
#0 __libc_do_syscall () at ../ports/sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:44
#1 0xb6e0aebe in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#2 0xb6e0d716 in __GI_abort () at abort.c:89
#3 0xb6e3112c in __libc_message (do_abort=do_abort@entry=1, fmt=0xb6eb5e74 "*** %s ***: %s terminated\n") at ../sysdeps/posix/libc_fatal.c:175
#4 0xb6e82852 in __GI___fortify_fail (msg=0xb6eb5e54 "stack smashing detected") at fortify_fail.c:38
#5 0xb6e82812 in __stack_chk_fail () at stack_chk_fail.c:28
#6 0x0000adbc in RFM69::interruptHandler() ()
#7 0x0c0c0c0c in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)
So it seems the problem actually lies within rfm69.cpp in the function void RFM69::interruptHandler() @ line 437 (for me):
// internal function - interrupt gets called when a packet is received
void RFM69::interruptHandler() {
#if defined(RASPBERRY) || defined(ODROIDC1)
  unsigned char thedata[67];
  char i;
  for(i = 0; i < 67; i++) thedata[i] = 0;
  // printf("interruptHandler %d\n", intCount);
#endif
  //pinMode(4, OUTPUT);
  //digitalWrite(4, 1);
  if (_mode == RF69_MODE_RX && (readReg(REG_IRQFLAGS2) & RF_IRQFLAGS2_PAYLOADREADY))
  {
    //RSSI = readRSSI();
    setMode(RF69_MODE_STANDBY);
#if defined(RASPBERRY) || defined(ODROIDC1)
    thedata[0] = REG_FIFO & 0x7F;
    thedata[1] = 0; // PAYLOADLEN
    thedata[2] = 0; // TargetID
    wiringPiSPIDataRW(SPI_DEVICE, thedata, 3);
    delayMicroseconds(MICROSLEEP_LENGTH);

    PAYLOADLEN = thedata[1];
    PAYLOADLEN = PAYLOADLEN > 66 ? 66 : PAYLOADLEN; // precaution
    TARGETID = thedata[2];
#else
    select();
    SPI.transfer(REG_FIFO & 0x7F);
    PAYLOADLEN = SPI.transfer(0);
    PAYLOADLEN = PAYLOADLEN > 66 ? 66 : PAYLOADLEN; // precaution
    TARGETID = SPI.transfer(0);
#endif
    if(!(_promiscuousMode || TARGETID == _address || TARGETID == RF69_BROADCAST_ADDR) // match this node's address, or broadcast address or anything in promiscuous mode
       || PAYLOADLEN < 3) // address situation could receive packets that are malformed and don't fit this libraries extra fields
    {
      PAYLOADLEN = 0;
      unselect();
      receiveBegin();
      //digitalWrite(4, 0);
      return;
    }
#if defined(RASPBERRY) || defined(ODROIDC1)
    DATALEN = PAYLOADLEN - 3;
    thedata[0] = REG_FIFO & 0x77;
    thedata[1] = 0; //SENDERID
    thedata[2] = 0; //CTLbyte;
    for(i = 0; i< DATALEN; i++) {
      thedata[i+3] = 0;
    }

    wiringPiSPIDataRW(SPI_DEVICE, thedata, DATALEN + 3);

    SENDERID = thedata[1];
    uint8_t CTLbyte = thedata[2];

    ACK_RECEIVED = CTLbyte & 0x80; //extract ACK-received flag
    ACK_REQUESTED = CTLbyte & 0x40; //extract ACK-requested flag
    for (i= 0; i < DATALEN; i++) {
      DATA[i] = thedata[i+3];
    }
#else
    DATALEN = PAYLOADLEN - 3;
    SENDERID = SPI.transfer(0);
    uint8_t CTLbyte = SPI.transfer(0);

    ACK_RECEIVED = CTLbyte & RFM69_CTL_SENDACK; // extract ACK-received flag
    ACK_REQUESTED = CTLbyte & RFM69_CTL_REQACK; // extract ACK-requested flag
    interruptHook(CTLbyte); // TWS: hook to derived class interrupt function
    for (uint8_t i = 0; i < DATALEN; i++) {
      DATA[i] = SPI.transfer(0);
    }
#endif
    if (DATALEN < RF69_MAX_DATA_LEN) DATA[DATALEN] = 0; // add null at end of string
    unselect();
    setMode(RF69_MODE_RX);
  }
  RSSI = readRSSI();
  //digitalWrite(4, 0);
}
I'm curious now. Is this a problem purely on the RBPi / Odroid, or is it also happening on the Arduino Gateway? It could be that the Arduino doesn't care about it because its toolchain doesn't use the "-fstack-protector" compiler options that Ubuntu enables by default. I could try the compiler option "-fno-stack-protector" to disable this check, but I guess that is an unwanted solution.
My guess is that there needs to be a check somewhere that the data about to be written into a variable actually fits, and that a mismatch should be reported in a different way instead of killing the whole process.
I'm going to dig deeper, but if someone has some advice... you're welcome!
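A size check of the kind described above could look roughly like the sketch below. It is not a confirmed diagnosis of the crash and not a patch for RFM69.cpp. It assumes a frame of 3 header bytes followed by the data, and a length that is stored as an 8-bit unsigned value, so that a length below 3 wraps around to a large number when 3 is subtracted. The names `kHeaderLen`, `unchecked_data_len` and `copy_data_checked` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical layout: 3 header bytes followed by the data bytes.
constexpr std::size_t kHeaderLen = 3;

// What an unchecked 8-bit subtraction produces: 0 - 3 wraps around to 253.
std::uint8_t unchecked_data_len(std::uint8_t payload_len) {
    return static_cast<std::uint8_t>(payload_len - kHeaderLen);
}

// Copies the data bytes of a frame into `out`.
// Returns true and sets `copied` on success; returns false (and copies
// nothing) if the claimed length does not fit the frame or the destination.
bool copy_data_checked(const std::uint8_t* frame, std::size_t frame_size,
                       std::uint8_t payload_len,
                       std::uint8_t* out, std::size_t out_size,
                       std::size_t& copied) {
    assert(frame != nullptr && out != nullptr);
    copied = 0;
    if (payload_len < kHeaderLen) return false;   // too short: would wrap around
    if (payload_len > frame_size) return false;   // claims more than the frame holds
    const std::size_t data_len = payload_len - kHeaderLen;
    if (data_len > out_size) return false;        // destination too small
    std::memcpy(out, frame + kHeaderLen, data_len);
    copied = data_len;
    return true;
}
```

The caller can log a rejected frame and carry on, so a bad length becomes a dropped packet rather than an aborted process. Taking the length as a by-value parameter also means the check and the copy use the same number, even if a shared variable changes in between.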
|
|
|
Post by greginkansas on Sept 14, 2016 23:35:17 GMT
I think most of us use an Arduino as the gateway; mine's been rock steady.
|
|
|
Post by papa on Sept 15, 2016 15:06:46 GMT
I also use the Arduino plus Ethernet shield as the gateway. It has run reliably for over a year. For demonstration purposes, I have successfully installed & run OpenHAB on a Raspberry Pi. However, whenever I consider making the Pi also serve Gateway functions, I see all the problems & frustrations of those who have tried that & I end up sticking with an Arduino-based gateway.
|
|
|
Post by acekrystal on Sept 20, 2016 17:04:33 GMT
papa, hahaha, I understand totally! But I need to make between 7 and 28 of these setups this year, so I hope I will have debugged enough by then to make a simple guide on how to set it up.
greginkansas, I know most use Arduino configurations, but I find this a totally wrong approach for many reasons, for example:
1. Limited functionality (memory, DeviceID limits)
2. Extra hardware (2x Arduino, 1x Ethernet shield, 2x CAT5e cables, 1x switch, and all of it needs power)
I need it to use as little hardware as possible, and to be as dynamic as possible. At the moment we still use the deviceID structure for the piGateway scripts(1462) and convert this to our own SensorID and ModuleID structure, which is stored in a database. But we are planning to update the piGateway script so it can work directly with the Sensor- and ModuleIDs from the database.
For now the problem seems to be solved, as it has been running stable for 4 days already. I think the Arduino indeed does not care about the stack smashing happening and just ignores it, whereas the latest Ubuntu compilers have cared about stack smashing by default since one of the more recent updates. So that leaves me with some options for now:
1. Just don't care about it and also let Ubuntu ignore the stack smashing ("-fno-stack-protector"), like the Arduino does;
2. Start debugging the RFM69.cpp from LowPowerLabs;
3. Add a request on GitHub from LowPowerLabs.
For now I will go with option 1 and see how it runs for the coming time. Later I might add a request or start debugging it myself, though I'm not really convinced that the latter would be a wise decision for me.
|
|
|
Post by papa on Sept 20, 2016 18:56:22 GMT
Acekrystal, truly best wishes on getting the Raspberry Pi to work reasonably well as Gateway & OpenHAB host, especially with the capabilities you describe.
If you're reasonably successful, I & I'm sure others would be glad to see you post here "a simple guide on how to set it up."
|
|