How to handle Node-red crashes?



Viewing 7 posts - 1 through 7 (of 7 total)
  • #27143
    Bob Doiron
    Participant

    Hi,
    We’re running a few AEP conduits (FW 1.6.2) and one of them occasionally has issues where our node-red app stops sending messages to our servers. It continues to check in to devicehq and we’re able to schedule a reset to get data flowing again. As far as I can tell from the logs, it seems like node-red either crashes or hangs.

    I’ve looked at setting up monit to monitor node-red, or some custom script/cron job, but it sounds like any linux mods outside of the config/app would get blown away by an AEP firmware upgrade.
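For reference, the cron-job idea could be as small as a periodic liveness check like the one below (which, as noted, would not survive an AEP firmware upgrade). This is only a sketch: the `node-red` process name and the init-script restart path are assumptions about the AEP image, not documented paths.

```python
import os
import subprocess

def process_running(name):
    """Scan /proc for a running process with this command name (Linux only)."""
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open("/proc/%s/comm" % pid) as f:
                if f.read().strip() == name:
                    return True
        except OSError:
            continue  # process exited while scanning
    return False

if __name__ == "__main__":
    # "node-red" and the init-script path are guesses; adjust for the AEP image.
    if not process_running("node-red"):
        subprocess.call(["/etc/init.d/node-red", "restart"])
```

Run from cron every minute or so; note this only catches a process that has died, not one that is hung.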

    As is, our system will go down for 4 to 8 hours minimum due to the latency of getting a reset request through devicehq.

    Any suggestions?

    #27150
    Jason Reiss
    Keymaster

    http://www.multitech.net/developer/software/aep/creating-a-custom-application/
    You could create a custom app that performs the monitoring. It would be reinstalled after the firmware upgrade.
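A sketch of what such a monitoring app's core loop might look like. Because a hung node-red process can still show up in `ps`, this probes the HTTP side instead; the default Node-RED port 1880 and the restart command path are both guesses for the AEP image, not confirmed details.

```python
import subprocess
import time
import urllib.request

NODE_RED_URL = "http://127.0.0.1:1880/"            # default Node-RED port (assumption)
RESTART_CMD = ["/etc/init.d/node-red", "restart"]  # path is a guess for AEP

def node_red_responding(url=NODE_RED_URL, timeout=5):
    """A hung process may still exist in ps, so probe the HTTP endpoint instead."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except Exception:
        return False

def should_restart(consecutive_failures, threshold=3):
    """Require several misses in a row so one slow reply doesn't trigger a restart."""
    return consecutive_failures >= threshold

def monitor(poll_seconds=60):
    failures = 0
    while True:
        failures = 0 if node_red_responding() else failures + 1
        if should_restart(failures):
            subprocess.call(RESTART_CMD)
            failures = 0
        time.sleep(poll_seconds)
```

Counting consecutive failures before acting avoids restart flapping when node-red is merely busy.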

    #27153
    Bob Doiron
    Participant

    Thanks! That looks promising.

    #27158
    Lawrence Griffiths
    Participant

Bob, do your servers send a response code to updates from Node-Red?
If so, you could count the number of missed responses and use an exec node to trigger a Linux reboot, as it might be more of a connection issue.

Also, on the latest version of AEP the node-red logs are not rotated; I would have a look at those.

    #27160
    Bob Doiron
    Participant

    Yes, our servers send a response code. I already have a watchdog of sorts built in to my node-red flow that will reboot the box if the node-red flow is unable to deliver messages for 4 hours. It also delivers lora stats every 5 minutes as a heartbeat.
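The in-flow watchdog described here boils down to tracking the last successful delivery and rebooting once a deadline passes. A minimal sketch of that check (the four-hour window is from the post; the rest is illustrative, and in the actual flow this logic would live in a JavaScript function node):

```python
import time

FOUR_HOURS = 4 * 60 * 60  # the delivery window described in the post

def watchdog_expired(last_success_ts, now=None, window=FOUR_HOURS):
    """True once no message has been delivered for the entire window."""
    if now is None:
        now = time.time()
    return (now - last_success_ts) >= window
```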

    Unfortunately, on several occasions we’ve found that the conduit continued to update devicehq, but was no longer delivering data to our api and my node-red watchdog didn’t activate. Nothing was written to the node-red logs either, so I concluded that node-red itself either hung up or crashed.

    We’re using 1.6.2 which does have log rotation for node-red. I haven’t tried 1.6.4 yet, but I hope they didn’t un-fix the node-red log rotation. We had previous issues with the node-red log getting too big.

    #27161
    Bob Doiron
    Participant

    Has anyone seen documentation for this process?

    admin@mtcdt:~# ps -eaf |grep watchdog
admin 3437 1 0 Feb01 ? 00:00:10 watchdog --device /dev/watchdog --ppp
    admin 18138 13069 0 12:08 pts/0 00:00:00 grep watchdog

    admin@mtcdt:~# which watchdog
    /sbin/watchdog

admin@mtcdt:~# watchdog --help
Usage: watchdog
--api (a) : watches and restarts api process
--ddns (i) : watches and restarts ddns process
--ppp (p) : watches and restarts ppp process
--lora (l) : watches and restarts lora process
--node-red (n) : watches and restarts node-red process
--device (d) : path to hardware watchdog device
--help (h) : prints this message

    #27163
    Jeff Hatch
    Keymaster

    Bob,

The watchdog process is not documented. If you add the --node-red argument, I don’t think it will provide what you need: if the node-red process is just hung and hasn’t actually disappeared, this watchdog will not restart it.

BTW, there is a simple process called angel (a link to it named node-angel is used for node-red) that restarts the node-red process when it terminates. As you have noted, I think something else is going on and the node-red process is getting into some kind of “hung” state.

    A couple of things to look at when node-red gets in this state:

    1) How much memory is it using: “ps auxww | grep node-angel” output should be able to tell you this.
    2) Run top and see if it is consuming lots of CPU.

Are you using SSL in node-red, and therefore in node? I have seen node use a lot of memory when doing SSL for some reason, i.e. ~150MB.

    Jeff
