Frequent Restarts

Chris Henn
New member
Username: Chrishenn

Post Number: 4
Registered: 09-2017
Posted on Tuesday, October 10, 2023 - 08:03 pm:   

Our system is having frequent restarts about every 10 minutes. Below are the logs showing what we see when it restarts.

Error: Shutdown triggered by SIGABRT

SMSDEBUG.BAK error prior to shutdowns: 13:49:13:882370 [7F06283C3700] m=1620,0 TestSocketOK: getsockopt reported error
Bryce Norwood - NowSMS Support
Board Administrator
Username: Bryce

Post Number: 8499
Registered: 10-2002
Posted on Tuesday, October 10, 2023 - 09:14 pm:   

Hi Chris,

Could you please send several of these SMSDEBUG.BAK logs to nowsmstech@nowsms.com so that we can better understand the overall context in which the error is occurring. Socket errors are a frequent occurrence, and not necessarily an "error" as such.

Also, what is the Linux platform being used?

Regards,

Bryce
NowSMS Support
Bryce Norwood - NowSMS Support
Board Administrator
Username: Bryce

Post Number: 8500
Registered: 10-2002
Posted on Wednesday, October 18, 2023 - 04:20 pm:   

Follow-up.

Upon reviewing the customer log files, we observed that the "Shutdown triggered by SIGABRT" restarts seemed to be related to a routing callback where the HTTP keep-alive socket is unexpectedly closed.

We advised them to edit MMSC.INI and under the [MMSC] header, add:

RoutingKeepAlive=No

Applying this setting did resolve the operational problem.

Upon further review, we observed that the problem was more likely to happen when the callbacks are running on an nginx web server prior to version 1.19.10.

The same problem can occur with accounting callbacks, which can be disabled by the following additional settings:

In the MMSC.INI and under the [MMSC] header, add:

AccountingKeepAlive=No

In the SMSGW.INI and under the [SMSGW] header, add:

AccountingKeepAlive=No
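
For anyone applying these settings across several servers, here is a minimal Python sketch of adding a key under a section header in INI text. It is only an illustration: the function name set_ini_value is hypothetical, it assumes section and key names are matched case-insensitively, and it works on a string, so reading and writing the actual MMSC.INI and SMSGW.INI files (wherever they live on your system) is left to you.

```python
def set_ini_value(text: str, section: str, key: str, value: str) -> str:
    """Return INI text with key=value set under [section].

    Hypothetical helper, not part of NowSMS. Assumes section and key
    names are case-insensitive. Adds the section if it is missing.
    """
    lines = text.splitlines()
    header = f"[{section}]".lower()
    in_section = False
    insert_at = None  # index just after the matching section header

    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            in_section = stripped.lower() == header
            if in_section:
                insert_at = i + 1
            continue
        if in_section and "=" in stripped:
            name = stripped.split("=", 1)[0].strip()
            if name.lower() == key.lower():
                # Key already present in the section: overwrite it.
                lines[i] = f"{key}={value}"
                return "\n".join(lines) + "\n"

    if insert_at is None:
        # Section not found: append it with the new key.
        lines.extend([f"[{section}]", f"{key}={value}"])
    else:
        lines.insert(insert_at, f"{key}={value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "[MMSC]\nDebug=Yes\n"  # made-up file contents for the demo
    updated = set_ini_value(sample, "MMSC", "RoutingKeepAlive", "No")
    updated = set_ini_value(updated, "MMSC", "AccountingKeepAlive", "No")
    print(updated)
```

The same call with section "SMSGW" would cover the SMSGW.INI entry. Keep a backup of the original file before overwriting it.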

More detail on the problem:

We identified a bug in code that is common to both Linux and Windows versions of NowSMS. When this bug is triggered, under Linux, the OS recognizes that the server is doing something that is invalid and forces the service to restart. Under Windows, the invalid action will most likely not cause a problem, but has the potential to cause the NowSMS server to experience data corruption or become unstable.

NowSMS uses HTTP keep-alive connections when processing routing callbacks to improve performance.

The bug is triggered when the web server closes an HTTP keep-alive connection for a reason other than an idle timeout, that is, while the connection is still actively in use. Nginx has several settings related to keep-alive connections, one of which is exposing the bug.

Nginx has a default idle timeout of 75 seconds (keepalive_timeout setting). This means that if NowSMS does not send a routing callback for longer than this timeout, the next time it needs to make a routing callback, it will need to create a new connection. This is normal and is handled correctly.

Nginx has a default overall limit of 1 hour on the lifetime of a keep-alive connection (keepalive_time setting, added in version 1.19.10). This means that even if NowSMS is continuously sending routing callbacks, the connection will be closed, and NowSMS will need to create a new connection. This appears to be handled correctly.

Versions of nginx prior to version 1.19.10 have a default limit of 100 requests over a keep-alive connection (keepalive_requests setting). In version 1.19.10, this default was increased to 1000 requests. When the limit is reached, nginx closes the connection even though it is still active, and this is what triggers the bug. NowSMS gets confused when a server supports keep-alive, but later indicates that it does not.
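
To illustrate the "supports keep-alive, but later indicates that it does not" case, here is a minimal Python sketch of how an HTTP client might decide, response by response, whether a connection can be reused. This is not NowSMS code; the function name connection_reusable is hypothetical, and the sample data assumes the server announces the request limit with a "Connection: close" header on the last response it serves on that connection.

```python
def connection_reusable(http_version: str, headers: dict) -> bool:
    """Decide whether the socket may be reused after this response.

    Hypothetical helper for illustration. HTTP/1.1 connections are
    persistent unless the response says "Connection: close";
    HTTP/1.0 connections are persistent only on "Connection: keep-alive".
    """
    tokens = set()
    for name, value in headers.items():
        if name.lower() == "connection":
            tokens.update(t.strip().lower() for t in value.split(","))
    if "close" in tokens:
        return False
    if http_version == "HTTP/1.1":
        return True
    return "keep-alive" in tokens


if __name__ == "__main__":
    # Simulated responses: 99 keep-alive replies, then the server
    # signals that the 100th request is the last on this connection.
    responses = [{"Connection": "keep-alive"}] * 99 + [{"Connection": "close"}]
    reuse = [connection_reusable("HTTP/1.1", h) for h in responses]
    print(reuse.count(True), reuse[-1])  # prints: 99 False
```

The point of the sketch is that the check has to be made on every response, not only on the first one, since a server that accepted keep-alive earlier can still end the connection later.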

A fix for this bug will be included in all versions of NowSMS 2023.10.10 and later, which are currently being tested.


--
Bryce Norwood
NowSMS Support
Rohan Power
New member
Username: Rohp

Post Number: 5
Registered: 01-2023
Posted on Wednesday, November 29, 2023 - 03:34 am:   

Hi Bryce,

We're seeing more frequent SIGSEGV restarts in our environment again (we're running v2023.11.15).

Note: I should point out that we're now seeing them more frequently because we have our email alerts configured now, whereas we didn't previously.

I've re-added the "AccountingKeepAlive=No" entries to the two .ini files and will monitor.

Rohan
Bryce Norwood - NowSMS Support
Board Administrator
Username: Bryce

Post Number: 8506
Registered: 10-2002
Posted on Wednesday, November 29, 2023 - 01:55 pm:   

Hi Rohan,

This sounds like a different problem. The SIGABRT issue discussed here was caused by an error in handling an unexpected connection close while processing accounting/routing callbacks. That has been fixed. SIGSEGV is a different signal, so I suspect a different cause.

Could you please collect and send any SMSDEBUG.BAK log files after an unexpected restart to nowsmstech@nowsms.com so that we can better understand the overall context in which the error is occurring.

Thanks,

Bryce Norwood
NowSMS Support
