How to handle Node-red crashes?



Viewing 7 posts - 1 through 7 (of 7 total)
  • #27143
    Bob Doiron
    Participant

    Hi,
    We’re running a few AEP conduits (FW 1.6.2) and one of them occasionally has issues where our node-red app stops sending messages to our servers. It continues to check in to devicehq and we’re able to schedule a reset to get data flowing again. As far as I can tell from the logs, it seems like node-red either crashes or hangs.

    I’ve looked at setting up monit to monitor node-red, or some custom script/cron job, but it sounds like any linux mods outside of the config/app would get blown away by an AEP firmware upgrade.
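For reference, the cron-job idea could be as small as a periodic liveness check like the one below (which, as noted, would not survive an AEP firmware upgrade). This is only a sketch: the `node-red` process name and the init-script restart path are assumptions about the AEP image, not documented paths.

```python
import os
import subprocess

def process_running(name):
    """Scan /proc for a running process with this command name (Linux only)."""
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open("/proc/%s/comm" % pid) as f:
                if f.read().strip() == name:
                    return True
        except OSError:
            continue  # process exited while scanning
    return False

if __name__ == "__main__":
    # "node-red" and the init-script path are guesses; adjust for the AEP image.
    if not process_running("node-red"):
        subprocess.call(["/etc/init.d/node-red", "restart"])
```

Run from cron every minute or so; note this only catches a process that has died, not one that is hung.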

    As is, our system will go down for 4 to 8 hours minimum due to the latency of getting a reset request through devicehq.

    Any suggestions?

    #27150
    Jason Reiss
    Keymaster

    http://www.multitech.net/developer/software/aep/creating-a-custom-application/
    You could create a custom app that performs the monitoring. It would be reinstalled after the firmware upgrade.
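A sketch of what such a monitoring app's core loop might look like. Because a hung node-red process can still show up in `ps`, this probes the HTTP side instead; the default Node-RED port 1880 and the restart command path are both guesses for the AEP image, not confirmed details.

```python
import subprocess
import time
import urllib.request

NODE_RED_URL = "http://127.0.0.1:1880/"            # default Node-RED port (assumption)
RESTART_CMD = ["/etc/init.d/node-red", "restart"]  # path is a guess for AEP

def node_red_responding(url=NODE_RED_URL, timeout=5):
    """A hung process may still exist in ps, so probe the HTTP endpoint instead."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except Exception:
        return False

def should_restart(consecutive_failures, threshold=3):
    """Require several misses in a row so one slow reply doesn't trigger a restart."""
    return consecutive_failures >= threshold

def monitor(poll_seconds=60):
    failures = 0
    while True:
        failures = 0 if node_red_responding() else failures + 1
        if should_restart(failures):
            subprocess.call(RESTART_CMD)
            failures = 0
        time.sleep(poll_seconds)
```

Counting consecutive failures before acting avoids restart flapping when node-red is merely busy.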

    #27153
    Bob Doiron
    Participant

    Thanks! That looks promising.

    #27158
    Lawrence Griffiths
    Participant

Bob, do your servers send a response code to updates from Node-Red?
If so, you could count the number of missed responses and use an exec node to trigger a Linux reboot, as it might be more of a connection issue.

Also, on the latest version of AEP the node-red logs are not rotated; I would have a look at those.

    #27160
    Bob Doiron
    Participant

    Yes, our servers send a response code. I already have a watchdog of sorts built in to my node-red flow that will reboot the box if the node-red flow is unable to deliver messages for 4 hours. It also delivers lora stats every 5 minutes as a heartbeat.
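The in-flow watchdog described here boils down to tracking the last successful delivery and rebooting once a deadline passes. A minimal sketch of that check (the four-hour window is from the post; the rest is illustrative, and in the actual flow this logic would live in a JavaScript function node):

```python
import time

FOUR_HOURS = 4 * 60 * 60  # the delivery window described in the post

def watchdog_expired(last_success_ts, now=None, window=FOUR_HOURS):
    """True once no message has been delivered for the entire window."""
    if now is None:
        now = time.time()
    return (now - last_success_ts) >= window
```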

    Unfortunately, on several occasions we’ve found that the conduit continued to update devicehq, but was no longer delivering data to our api and my node-red watchdog didn’t activate. Nothing was written to the node-red logs either, so I concluded that node-red itself either hung up or crashed.

    We’re using 1.6.2 which does have log rotation for node-red. I haven’t tried 1.6.4 yet, but I hope they didn’t un-fix the node-red log rotation. We had previous issues with the node-red log getting too big.

    #27161
    Bob Doiron
    Participant

    Has anyone seen documentation for this process?

    admin@mtcdt:~# ps -eaf |grep watchdog
admin 3437 1 0 Feb01 ? 00:00:10 watchdog --device /dev/watchdog --ppp
    admin 18138 13069 0 12:08 pts/0 00:00:00 grep watchdog

    admin@mtcdt:~# which watchdog
    /sbin/watchdog

admin@mtcdt:~# watchdog --help
Usage: watchdog
--api (a) : watches and restarts api process
--ddns (i) : watches and restarts ddns process
--ppp (p) : watches and restarts ppp process
--lora (l) : watches and restarts lora process
--node-red (n) : watches and restarts node-red process
--device (d) : path to hardware watchdog device
--help (h) : prints this message

    #27163
    Jeff Hatch
    Keymaster

    Bob,

The watchdog process is not documented. If you add the --node-red argument, I don’t think it will provide what you need: if the node-red process is just hung and hasn’t actually disappeared, this watchdog will not restart it.

BTW, there is a simple process called angel (a link to it named node-angel is used for node-red) that restarts the node-red process when it terminates. As you have noted, I think something else is going on and the node-red process is getting into some kind of “hung” state.

    A couple of things to look at when node-red gets in this state:

    1) How much memory is it using: “ps auxww | grep node-angel” output should be able to tell you this.
    2) Run top and see if it is consuming lots of CPU.

Are you using SSL in node-red, and therefore in node? I have seen node use a lot of memory when doing SSL for some reason, i.e. ~150MB.

    Jeff
