Custom App fails to launch if power cycle occurs quickly.

Viewing 23 posts - 1 through 23 (of 23 total)
  • Author
    Posts
  • #19245
    Ajay K
    Participant

    I have two conduits: one is a Verizon-specific conduit (or rather, it works with that carrier), and the other one seems to work with ATT.

    I have a custom app installed on the local flash of both conduits, re-prioritized to run before the lora-wan service. However, when we power cycle the Verizon conduit, i.e. remove the power cord and reconnect it in less than 30 seconds, the custom app fails to launch, even though the boot log clearly indicates that the custom app was started.

    However, in my attempts to reproduce it, the same scenario doesn’t occur on the conduit that supports the ATT carrier.

    I am not sure what to make of it, especially since the boot log indicates that the custom application was started successfully.

    What is the best way to troubleshoot a custom app that fails to launch? This app is critical: it has to be running before node-red is up, and it is also a nodejs-based application.

    Thanks,
    Ajay

    #19257
    Jeff Hatch
    Keymaster

    Ajay,

    Does your app require internet access right away when it starts? The reason I ask is that the Verizon radios, for whatever reason, take significantly longer to start up and get a connection. We have had to put in special code on the AEP and some of our other products to work around this for starting ppp or anything else that interoperates with the radio.

    Jeff

    #19265
    Ajay K
    Participant

    Hi Jeff,

    The application doesn’t access the internet. It connects to a local port by creating a socket connection and also listens for uplink packets by subscribing to the uplink message.

    So technically nodejs should handle any failed connections; what I mean is that I have error handling for those scenarios, and I am not sure why the app fails to come up when the power cycle is shorter than 30 seconds.

    Thanks,
    Ajay

    #19276
    Jeff Hatch
    Keymaster

    Ajay,

    If there is no debug logging that is helpful either on the console or in /var/log/messages and any other logs, I would insert debug logging in strategic places in an effort to determine where the app is failing.

    Jeff
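Checkpoint logging in the Start script itself could look like the sketch below. It is only an illustration: `log_step` is a hypothetical helper, the default `NAME` value is borrowed from later in the thread, and it assumes that a `logger` command is available on the device (the function still prints to stderr if it is not).

```shell
# Hypothetical helper for a custom app's Start script (not part of AEP).
NAME="${NAME:-TestLoraUplinkPktManager}"

log_step() {
    # Send the message to syslog when logger exists; ignore logger failures
    # so that a missing syslog socket never aborts the start script.
    if command -v logger >/dev/null 2>&1; then
        logger -t "$NAME" "$*" 2>/dev/null || true
    fi
    # Always echo to stderr as well, so the console shows the checkpoint.
    echo "$NAME: $*" >&2
}

# Example checkpoints (commented out; place them around the real steps):
# log_step "start script entered with args: $*"
# log_step "about to launch node"
```

Each checkpoint that shows up in /var/log/messages narrows down how far the script got before the app went missing.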

    #19281
    Ajay K
    Participant

    Hi Jeff,

    I use Winston to log to the console, and I have redirected the console logs to /var/log/app/<app-name>.log in the startup scripts of the custom app. When the conduit reboots normally, I see the log file within a minute or two. When the application fails to start, no log file is even created, which is one of the reasons I know the app failed to start when the init.d scripts ran for the custom app. What is weird is that the boot log reports that the application was started successfully when it was not.

    I am not sure why time plays any role here, i.e. why the app doesn’t start if the conduit is power cycled within 30 seconds, and why it doesn’t reproduce on the other conduit but only on the Verizon-specific one, where we can reproduce it consistently.

    Thanks,
    Ajay

    #19289
    Jeff Hatch
    Keymaster

    Ajay,

    One thing you could do is turn up debugging through the Web UI so that processes like app-manager will log more information. The app-manager logging is all identified by the string “APPMANAGER” and should at least tell you if it is getting any errors. The app-manager program is what the custmapp init.d script uses to start the application.

    Jeff

    #19304
    Ajay K
    Participant

    Thanks Jeff, I will do that and see if I have better luck troubleshooting this issue.

    Thanks,
    Ajay

    #19307
    Ajay K
    Participant

    I increased the logging level to Maximum as per your suggestion, and most of the detail seems to be in the messages log file, /var/log/messages. Comparing the failed and successful scenarios in the log file doesn’t reveal any new information as to why the custom app doesn’t come up on a quick power cycle. Moreover, app-manager indicates success in both scenarios according to the log file.

    Should I be looking at any other log files for better information?

    Just FYI, I do use the /sbin/angel application to load the nodejs custom app I have built; could this be causing any issues? I have renamed it/created a soft link appropriately so that it doesn’t clash with node-red’s use of the same application.

    Are there any hardware differences between the ATT and Verizon conduit AEP models?

    Thanks,
    Ajay

    #19314
    Jeff Hatch
    Keymaster

    Ajay,

    If your application doesn’t generate any status information as described at http://www.multitech.net/developer/software/aep/creating-a-custom-application/application-status/ app-manager will not know that anything is amiss due to the fact that the angel process is still running. If your app provides a pid and any status information through the status.json file you will receive more information from app-manager.

    Jeff
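As a rough sketch of what providing status could look like from a shell wrapper: the field names `pid` and `AppInfo` below are an assumption recalled from the page linked above and should be checked against it, and `write_status` and its arguments are hypothetical names used only for illustration.

```shell
# Hypothetical helper: write a status.json file for app-manager to read.
# ASSUMPTION: the "pid" and "AppInfo" field names; verify them against the
# application-status page before relying on this. The status text is
# inserted verbatim, so it must not contain double quotes or backslashes.
write_status() {
    # $1 = directory that holds status.json
    # $2 = pid of the running app
    # $3 = short free-form status text
    printf '{"pid": %d, "AppInfo": "%s"}\n' "$2" "$3" > "$1/status.json.tmp" &&
        mv "$1/status.json.tmp" "$1/status.json"   # rename so readers never see a partial file
}

# Example (commented out): write_status "$APP_DIR" "$$" "listening for uplinks"
```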

    #19315
    Ajay K
    Participant

    Hi Jeff,

    I do provide the PID and status information via status.json, but I don’t think the application even ran to begin with. At the very least the <app-name>.log file should have been created, as the output of the console is piped to a log file; moreover, I have Winston logging and handle the “ExitOnError” event as well. So as far as I can see the application never runs. Mostly I am worried that the exact same application behaves differently on two different conduits.

    Also, as an experiment, I stopped using /sbin/angel to launch the nodejs app and instead launched it directly with node. The app still fails to load when the power cycle is under 30 seconds.

    If you would prefer that I open a ticket, let me know. I can attach the log files for review, along with the application tar file, so the issue can be reproduced.

    Thanks,
    Ajay

    #19328
    Jeff Hatch
    Keymaster

    Ajay,

    I’m not sure what exactly is going on in this case. Is it possible that a different run level is getting executed when you power back up in less than 30 seconds? That might explain why none of the startup for the application is getting executed.

    Jeff

    #19331
    Ajay K
    Participant

    Thanks Jeff, I never thought of that possibility. Is there a way to figure out what the run level was when it booted up?

    Also, I have another update: I can now reproduce it consistently on the ATT based AEP conduit as well. I have also removed the /sbin/angel dependency from my startup scripts; I now just use the following.

    The environment variables in my start script:

    
    NAME="TestLoraUplinkPktManager"
    APP_LOGDIR="/var/log/app"
    
    # Use MultiTech Provided $APP_DIR environment variable
    DAEMON="/usr/bin/node"
    DAEMON_ARGS="$APP_DIR/app.js > $APP_LOGDIR/$NAME.log 2>&1"
    
    RUN_DIR=$APP_DIR
    
    START_STOP_DAEMON="/usr/sbin/start-stop-daemon"
    PID_FILE="/var/run/$NAME.pid"
    

    My start script for my app. Let me know if you see any concerns with the start script.

    
    $START_STOP_DAEMON --start --background --pidfile "$PID_FILE" --make-pidfile --chdir "$RUN_DIR" --startas /bin/bash -- -c "exec $DAEMON $DAEMON_ARGS"
    

    Even though the application hasn’t started, app-manager believes it has started successfully, although no process is running with the PID stored in the pid file.

    Thanks,
    Ajay
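One way to confirm the mismatch described above is to test whether the PID recorded in the pid file belongs to a live process. The sketch below is a generic check, not something app-manager is known to do; `pid_is_alive` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if the pid file names a live process.
pid_is_alive() (
    # $1 = path to the pid file
    [ -r "$1" ] || exit 1
    pid=$(cat "$1")
    # Reject empty or non-numeric content.
    case "$pid" in
        ''|*[!0-9]*) exit 1 ;;
    esac
    # Signal 0 delivers nothing; it only tests that the process exists.
    # It also fails if the process belongs to another user and we lack
    # permission to signal it.
    kill -0 "$pid" 2>/dev/null
)

# Example (commented out):
# pid_is_alive /var/run/TestLoraUplinkPktManager.pid || echo "stale pid file"
```

With `--background`, start-stop-daemon forks and returns before the command it launches has had a chance to fail, so a pid file by itself does not prove that the app is running.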

    #19350
    Ajay K
    Participant

    Hey Jeff,

    What happens differently when an additional --initd param is passed via the app-manager script?

    May 30 17:50:35 mtcdt user.info APPMANAGER: AppCommand::executeStartScript: executing: /var/config/app/TestLoraUplinkPktManager/Start start --initd

    as opposed to the call made when starting the app through app-manager via the URL
    “/api/customApps/LOCAL/start”

    May 30 17:55:33 mtcdt user.info APPMANAGER: AppCommand::executeStartScript: executing: /var/config/app/TestLoraUplinkPktManager/Start start

    Thanks,
    Ajay

    #19358
    Jeff Hatch
    Keymaster

    Ajay,

    Before figuring out the run level stuff, is the pid file of the app still lying around with a pid value in it? If that is the case, maybe start-stop-daemon is checking the pid file, seeing that there is already a process running with that pid, and not trying to start the app. It’s worth checking. If that is the case, I would have to think a little bit to figure out what might be a solution.

    However, I just realized by checking your script that the pid file is in /var/run, so that probably isn’t the problem.

    Jeff

    #19361
    Jeff Hatch
    Keymaster

    Ajay,

    As for the --initd parameter, it is just a way to detect whether the Start script is being run by the init script or not. There are customers that are doing their own startup init scripts and they don’t want our init script to start the app with the Start script, so they modified their Start script so it does nothing if it receives that parameter.

    Jeff
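A Start script that ignores the init-time invocation, as described above, might be sketched like this. The `handle_start` function and its messages are hypothetical, and the flag spelling `--initd` is taken from the APPMANAGER log lines quoted earlier in the thread.

```shell
# Hypothetical dispatcher for a custom app's Start script.
handle_start() {
    # $1 = action requested by the caller (start, stop, ...)
    # $2 = optional flag; the stock init script passes --initd
    if [ "$1" = "start" ] && [ "${2:-}" = "--initd" ]; then
        # Started from the stock init script: do nothing, because a
        # customer-provided init script is expected to start the app.
        echo "skipping: started from init script"
        return 0
    fi
    echo "running action: $1"
    # ... the real start/stop logic would go here ...
}

# Example (commented out): handle_start "$@"
```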

    #19364
    Ajay K
    Participant

    Hi Jeff,

    Where does the start-stop-daemon typically look for the pid file?

    Thanks,
    Ajay.

    #19365
    Jeff Hatch
    Keymaster

    Ajay,

    I’m pretty sure it’s where you have specified with the --pidfile argument. That is why I don’t think that is your problem because your --pidfile points to /var/run which is not persistent through reboot.

    To figure out what run level you are currently running in:

    admin@mtcdt:/var/run/config# runlevel
    N 5

    In this example the current run level is 5. For some reason who -aH doesn’t display the run level, and the busybox who doesn’t have the ‘-r’ option.

    Jeff

    #19368
    Ajay K
    Participant

    I actually saw the run level mentioned in the boot logs, and it was run level 5, so the application should technically run.

    I could also confirm that my startup scripts are being called, as I have added log statements using logger in my application’s Start script.

    Here is one thing I have noticed: when I have /sbin/angel loading my app, the angel application loads up fine, but the node application, or rather my custom app, is not run. Could this be a memory issue?

    In the node-red startup in app.py I see the following line. Is this setting some kind of memory cap on the node-red app?

    
    /usr/bin/node --max-old-space-size=40 /opt/node-red/red.js
    

    If so, should I be limiting the memory usage of my custom app as well, and would the same number applied to node-red suffice for my app?

    Thanks,
    Ajay

    #19369
    Jeff Hatch
    Keymaster

    Ajay,

    That is definitely a possibility. I didn’t think of that. Added that argument over two years ago. That argument helps make the node garbage collection a lot more aggressive if I remember correctly, and tries to keep the size down. Try it as an option to node when starting your app.

    Jeff

    #19371
    Ajay K
    Participant

    Adding that switch to manage the memory didn’t help either.

    One thing I have noticed is that whenever it fails, the /var/log/app folder takes a little while to be created. I am going to try writing my log file to a different location and see if that helps.

    Thanks,
    Ajay

    #19405
    Ajay K
    Participant

    Hi Jeff,

    Just wanted to give an update: I think we figured out the root cause. First, I capped the memory of the application, as it was prudent to do so. Second, and this was the main issue, the /var/log/app folder, which is where the custom app used to log, had not been created yet. The custom app’s priority was modified so that it runs before the lora-wan-server application, and I guess the node-red startup scripts used to be the ones that create this folder. Based on our tests it seems to be fixed and is no longer reproducible after I re-routed the custom app’s logs to /var/log instead of /var/log/app. Also, this anomaly only occurs when you power cycle in less than 30 seconds.

    Thanks,
    Ajay

    #19406
    Jeff Hatch
    Keymaster

    Ajay,

    Glad to hear you have probably figured it out. It appears the app was not checking for existence of the /var/log/app directory before trying to open the log file. It is puzzling as to why the power cycle affected things in this way. I am still stumped on that one.

    Jeff
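A plausible reading of the symptoms, given the start script posted earlier: the redirection to $APP_LOGDIR/$NAME.log is set up by the `bash -c` shell before node is exec'd, so when the directory is missing the redirection fails and node never runs, while start-stop-daemon with `--background` has already written the pid file and reported success. A start script can guard against that by creating the directory itself. In the sketch below, `ensure_log_dir` is a hypothetical helper and the two paths are the ones mentioned in the thread.

```shell
# Hypothetical helper: make sure a log directory exists before launching.
ensure_log_dir() {
    # $1 = preferred log directory
    # $2 = fallback directory that is expected to exist already
    # Prints the directory that should be used for the log file.
    if mkdir -p "$1" 2>/dev/null && [ -w "$1" ]; then
        echo "$1"
    else
        echo "$2"
    fi
}

# Example (commented out), placed before the start-stop-daemon call:
# APP_LOGDIR=$(ensure_log_dir "/var/log/app" "/var/log")
```

`mkdir -p` is a no-op when the directory already exists, so the guard is harmless on boots where another startup script created the folder first.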

    #19408
    Ajay K
    Participant

    Yeah, we could check for the existence of the app dir. However, I think we will make do with logging under /var/log. I am not sure why this occurs, but that was the first thing I noticed: the app folder under /var/log would take a little longer than usual to get created whenever it failed.

    Thanks,
    Ajay.
