I sure am! :)
If this wasn't fun, I wouldn't be doing it.
This complete rewrite of the Sensor Module has given me the freedom not only to fix bugs but also to make some major enhancements.
Last night I decided to add CRC32 checksums to all data sent back and forth between the Mega and ESP on the serial line. After struggling to get the CRC code to produce the same value on the ESP and the Mega (yeah, it was my fault), I finally got it working, but by then I'd been at it for almost 24 hours...
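Getting both boards to agree mostly comes down to using exactly the same CRC-32 variant and fixed-width types everywhere; one common trap is that `int` is 16 bits on the Mega but 32 bits on the ESP8266. Below is a minimal bitwise sketch of the standard CRC-32 (the reflected 0xEDB88320 polynomial used by zlib and Ethernet), assuming that is the variant in use. The function names are hypothetical, and it is written as plain C++ so it can be checked on a PC; on the Mega the `<cstdint>`-style headers may need to be swapped for `<stdint.h>` and friends. `crc32Update` is the piece that allows a running calculation as characters arrive.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Standard CRC-32: reflected polynomial 0xEDB88320, initial value 0xFFFFFFFF,
// final XOR 0xFFFFFFFF. Fixed-width types keep the result identical on the
// Mega (16-bit int) and the ESP8266 (32-bit int).
constexpr uint32_t kCrc32Init = 0xFFFFFFFFu;

// Fold one byte into a running CRC; call this as each character arrives.
uint32_t crc32Update(uint32_t crc, uint8_t byte) {
    crc ^= byte;
    for (int bit = 0; bit < 8; ++bit) {
        // Shift right; if the bit shifted out was 1, XOR in the polynomial.
        crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
    }
    return crc;
}

// Turn the running value into the checksum that is sent or compared.
uint32_t crc32Final(uint32_t crc) { return crc ^ 0xFFFFFFFFu; }

// One-shot helper for a whole buffer (e.g. on the sending side).
uint32_t crc32(const char* data, size_t len) {
    uint32_t crc = kCrc32Init;
    for (size_t i = 0; i < len; ++i) {
        crc = crc32Update(crc, static_cast<uint8_t>(data[i]));
    }
    return crc32Final(crc);
}
```

0xCBF43926 is the published check value of this variant for the ASCII string "123456789", so printing `crc32("123456789", 9)` on both boards is a quick way to confirm they agree before debugging anything else.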
I got some sleep today, started again at about 9 pm, and now, at 11:30, I have one "transaction" complete. It's a simple one: the ESP sends the current time to the Mega. The ESP gets the time from the database with a query, as neither the ESP nor the Mega has a real-time clock. I originally had one on the Mega, but it was a cheap one that drifted constantly, and I ended up fetching the current time from the database to keep correcting it anyway. Why bother? This works just fine. Remember, the modules only use the timestamps for display on the Serial Monitor. All timestamps on data inserted or updated in the database are generated by the database itself, so the entire system operates on synchronised time.
The complete transaction includes:
- The ESP8266 sends the date and time to the Mega 2560, followed by the CRC32 of everything up to and including the closing ALLDONE tag.
- The Mega 2560 parses the incoming characters, building the XML strings and keeping a running CRC32 calculation. When it encounters the closing ALLDONE tag, it saves the current value of its running tally, because the next line is the CRC32 value from the ESP8266. The two values are compared, and either an ACK or a NACK is sent back to the ESP, which has been waiting for one or the other but will also time out if nothing is received.
- If the Mega replies with a NACK, the ESP8266 can resend the entire payload.
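The transaction above can be sketched as follows. This assumes the closing tag is spelled exactly `</ALLDONE>`, that the CRC line is 8 upper-case hex digits ended by a newline, and that the line break after the tag is not part of the checksum; `PayloadReceiver`, `buildFrame` and `feedAll` are hypothetical names. The receiver keeps the running CRC as characters arrive, freezes it the moment the tag completes, and compares it with the number on the next line.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// CRC-32 step (reflected polynomial 0xEDB88320), same algorithm on both boards.
uint32_t crc32Update(uint32_t crc, uint8_t byte) {
    crc ^= byte;
    for (int bit = 0; bit < 8; ++bit)
        crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
    return crc;
}

// Assumed exact spelling of the closing tag.
static const char kTerminator[] = "</ALLDONE>";

enum class Verdict { Pending, Ack, Nack };

// Mega side: feed characters as they arrive; returns Ack/Nack once the CRC line ends.
class PayloadReceiver {
public:
    Verdict feed(char c) {
        if (!afterTag_) {
            crc_ = crc32Update(crc_, static_cast<uint8_t>(c));  // running tally
            if (c == kTerminator[matched_]) ++matched_;
            else matched_ = (c == kTerminator[0]) ? 1 : 0;      // '<' only starts the tag
            if (kTerminator[matched_] == '\0') {
                expected_ = crc_ ^ 0xFFFFFFFFu;  // save the tally at the end of the tag
                afterTag_ = true;
            }
            return Verdict::Pending;
        }
        if (c == '\r') return Verdict::Pending;
        if (c != '\n') {                          // collect the hex digits of the CRC line
            if (len_ < 8) hex_[len_++] = c; else overflow_ = true;
            return Verdict::Pending;
        }
        if (len_ == 0) return Verdict::Pending;   // the newline straight after the tag
        hex_[len_] = '\0';
        char* end = nullptr;
        unsigned long sent = std::strtoul(hex_, &end, 16);
        bool ok = !overflow_ && len_ == 8 && *end == '\0' &&
                  static_cast<uint32_t>(sent) == expected_;
        reset();                                  // ready for the next payload
        return ok ? Verdict::Ack : Verdict::Nack;
    }
    void reset() { *this = PayloadReceiver(); }

private:
    uint32_t crc_ = 0xFFFFFFFFu;
    uint32_t expected_ = 0;
    size_t matched_ = 0;
    bool afterTag_ = false;
    char hex_[9] = {};
    size_t len_ = 0;
    bool overflow_ = false;
};

// ESP side: `xml` must already end with the closing tag; appends "\n", the CRC as
// 8 upper-case hex digits, and "\n". Returns false if `out` is too small.
bool buildFrame(const char* xml, char* out, size_t outSize) {
    uint32_t crc = 0xFFFFFFFFu;
    for (const char* p = xml; *p != '\0'; ++p)
        crc = crc32Update(crc, static_cast<uint8_t>(*p));
    crc ^= 0xFFFFFFFFu;
    int n = std::snprintf(out, outSize, "%s\n%08lX\n", xml, static_cast<unsigned long>(crc));
    return n > 0 && static_cast<size_t>(n) < outSize;
}

// Helper for testing: feed a whole frame and return the last non-pending verdict.
Verdict feedAll(PayloadReceiver& rx, const char* frame) {
    Verdict last = Verdict::Pending;
    for (const char* p = frame; *p != '\0'; ++p) {
        Verdict v = rx.feed(*p);
        if (v != Verdict::Pending) last = v;
    }
    return last;
}
```

The receiver only answers once it has seen a complete closing tag and CRC line, so a frame whose closing tag is itself damaged gets no reply at all; that case is left to the ESP's timeout.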
I envision that the ESP8266 could adjust the speed at which it sends the data if it isn't being received properly, but first I want to see just how often it fails...
The real purpose of the CRC32 check is to find out whether there are problems with data fidelity and, if so, how to correct them. The current versions of the modules that pair a Mega with an ESP all exhibit "timeouts" while waiting for data, and I'm not sure whether the problem lies in getting data from the database (not likely) or in sending it over the serial port (which should also be reliable: the two chips are on the same circuit board and use hardware serial ports on both sides). I have always suspected that one payload is being clobbered by the arrival of another before the first one has been processed.
Being able to verify that we have received a complete, unaltered payload means we can now acknowledge it back to the ESP with an ACK. If the CRC32 values do not match, we send a Negative ACKnowledgement (NACK) instead.
So the ESP sends the data and then waits for an acknowledgement, which prevents it from sending any more data until this payload has been dealt with. That solves the clobbering issue, if that's what was happening.
Now I can try a larger payload, like the WiFi info...
This is awesome. The Sensor Module is booting up very reliably. I can see that the initial sending of the date and time is occasionally corrupt; it looks like a buffer problem. However, the Mega now detects the corruption and sends a NACK (Negative Acknowledgement), so the ESP simply resends it.
I'm coding the send methods so that you pass in the number of retries. For example, sending the date and time is more important at boot, when the Mega doesn't know the time, and less important for the updates, which are sent every 10 minutes. So at boot I pass 5 for the number of retries, and 1 for the updates.
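A minimal stop-and-wait sketch of such a send method, assuming `retries` means extra attempts after the first one (so 5 allows up to six transmissions) and that a timeout is reported to the caller as a NACK. `sendWithRetries`, `transmit` and `awaitResponse` are hypothetical names; the serial I/O is passed in as functions so the logic can be tested off the board.

```cpp
#include <cassert>
#include <functional>
#include <string>

enum class Response { Ack, Nack };  // a timeout is reported as Nack

// Hypothetical ESP-side sender. `transmit` writes one framed payload to the
// serial port; `awaitResponse` blocks until an ACK, a NACK, or a timeout.
// Assumption: `retries` counts extra attempts after the first one.
bool sendWithRetries(const std::string& frame, int retries,
                     const std::function<void(const std::string&)>& transmit,
                     const std::function<Response()>& awaitResponse) {
    for (int attempt = 0; attempt <= retries; ++attempt) {
        transmit(frame);  // nothing else is sent until this one is answered
        if (awaitResponse() == Response::Ack) return true;
    }
    return false;         // caller decides what to do after giving up
}
```

At boot the date and time could go out with `sendWithRetries(frame, 5, ...)` and the 10-minute updates with `sendWithRetries(frame, 1, ...)`. Because the loop does not return until an ACK arrives or the attempts run out, no second payload can be sent on top of one that is still being processed.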
Once the code is finalized, I'll move all those variable settings into the database: the number of retries for each type of payload, whether it should wait for an acknowledgement at all, how long to wait, and so on.
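One way those settings might be grouped before they move into the database, as a hypothetical sketch: a small per-payload-type table with compiled-in defaults that a database query could later overwrite. The enum values, field names, and the timeout and WiFi numbers are placeholders; only the 5 and 1 retries come from the description above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical payload types; the post mentions the date/time and WiFi info payloads.
enum class PayloadType : uint8_t { DateTimeBoot, DateTimeUpdate, WifiInfo, Count };

// Per-type knobs that could later be loaded from a database row.
struct SendPolicy {
    uint8_t retries;        // extra attempts after the first
    bool waitForAck;        // false would mean fire-and-forget
    uint16_t ackTimeoutMs;  // how long to wait before treating silence as a NACK
};

// Compiled-in defaults used until the database values are fetched.
// Timeout and WiFi values are placeholders, not measured numbers.
SendPolicy gPolicies[static_cast<uint8_t>(PayloadType::Count)] = {
    {5, true, 1000},  // DateTimeBoot: important, the Mega has no clock yet
    {1, true, 1000},  // DateTimeUpdate: sent every 10 minutes
    {3, true, 2000},  // WifiInfo: larger payload, placeholder values
};

const SendPolicy& policyFor(PayloadType t) {
    return gPolicies[static_cast<uint8_t>(t)];
}

// Overwrite one entry, e.g. after reading it from the database.
void setPolicy(PayloadType t, const SendPolicy& p) {
    gPolicies[static_cast<uint8_t>(t)] = p;
}
```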
So, I'm excited, and still having fun!
When I started out on this rewrite, I acknowledged that the communication between the 2560 and the 8266 was probably my weak link, although I didn't really know why. Perhaps I still don't know exactly what causes it, but I think I've fixed it :)
All payloads will require acknowledgements except, obviously, the acknowledgements themselves. That could create quite a bouncy loop...
There are two acknowledgements: ACK and NACK.
- ACK means the CRC32 checksums matched
- NACK means the CRC32 checksums did not match
The only other outcome the ESP can encounter is no response at all. While waiting for an acknowledgement it applies a timeout, and if the timeout expires before a response arrives, it simply treats the silence as a NACK.
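The timeout rule, together with the "acknowledgements are never acknowledged" rule, can be sketched like this. It assumes the Mega replies with a line reading exactly `ACK` or `NACK`, and `nowMs` and `tryReadLine` stand in for `millis()` and a non-blocking serial line reader; all names are hypothetical. Lines that are neither reply are ignored here, which is a design choice rather than something the description specifies.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>

enum class Response { Ack, Nack };
// Hypothetical message kinds; ACK and NACK are the only ones never acknowledged.
enum class MessageKind { Payload, Ack, Nack };

bool requiresAck(MessageKind k) { return k == MessageKind::Payload; }

// Poll for a reply line until `timeoutMs` elapses. `nowMs` stands in for
// millis() and `tryReadLine` for a non-blocking serial line reader.
Response awaitResponse(uint32_t timeoutMs,
                       const std::function<uint32_t()>& nowMs,
                       const std::function<bool(std::string&)>& tryReadLine) {
    const uint32_t start = nowMs();
    std::string line;
    while (nowMs() - start < timeoutMs) {  // unsigned subtraction survives rollover
        if (tryReadLine(line)) {
            if (line == "ACK") return Response::Ack;
            if (line == "NACK") return Response::Nack;
            // Any other line is ignored and we keep waiting.
        }
    }
    return Response::Nack;  // silence is treated exactly like a NACK
}
```

Comparing `nowMs() - start` as unsigned 32-bit values keeps the timeout correct when `millis()` wraps around after roughly 49.7 days. On the ESP8266, a polling loop like this would normally also call `yield()` so the watchdog is not starved.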
I'm going to take a little break here, regroup, and figure out what's next. I think I might work on getting it to read a sensor, but I want to make sure I think it through and write code I can reuse, not just rush to make something work. I know it will work, so no proof of concept is needed. What I do need is a good design, so I won't need to rewrite this again in my lifetime :)