NowSMS CPU thrashing

Chris
Frequent Contributor
Username: Chrisc

Post Number: 62
Registered: 12-2008
Posted on Wednesday, April 25, 2012 - 10:23 am:   

Hi guys

During the course of last week we noticed that the CPU on one of our SMS servers kept thrashing, apparently because of a relatively small number of messages sitting in the queue.

There were only about 15 messages in question, but they were premium messages going through a specific SMPP connection which is limited to 5 msgs/sec.

We imposed the limit ourselves to keep NowSMS from hitting the queue too often, which would result in throttling errors. Even so, the queue still sometimes overfills and we receive "Queue full" errors. Our async window is set to 5, which may be the cause of these errors.
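As a rough illustration of how a rate limit and an async window interact, here is a minimal simulation. It is only a sketch under stated assumptions and not how NowSMS is implemented: the function name `simulate_sender`, the `ack_delay` parameter, and the fixed response time are all hypothetical. It paces submissions at a maximum rate and allows up to `window` submissions to be outstanding before waiting for a response.

```python
from collections import deque


def simulate_sender(num_messages, rate_per_sec, window, ack_delay):
    """Model a sender that is capped at rate_per_sec submissions per second
    and may have at most `window` unacknowledged submissions in flight.

    ack_delay is an assumed, constant time (seconds) for the far end to
    respond to each submission. Returns (send_times, peak_in_flight).
    """
    interval = 1.0 / rate_per_sec
    pending = deque()  # times at which outstanding responses will arrive
    send_times = []
    peak = 0
    t = 0.0
    for _ in range(num_messages):
        # Discard responses that have already arrived by time t.
        while pending and pending[0] <= t:
            pending.popleft()
        # Window full: wait until the oldest outstanding response arrives.
        if len(pending) >= window:
            t = pending.popleft()
            while pending and pending[0] <= t:
                pending.popleft()
        send_times.append(t)
        pending.append(t + ack_delay)
        peak = max(peak, len(pending))
        t += interval  # the rate limit spaces out the next submission
    return send_times, peak


# 15 messages, 5 msgs/sec, async window of 5 (figures from the post above).
_, peak_fast = simulate_sender(15, 5, 5, ack_delay=0.1)
_, peak_slow = simulate_sender(15, 5, 5, ack_delay=2.0)
print("peak in flight with fast responses:", peak_fast)
print("peak in flight with slow responses:", peak_slow)
```

In this model the rate limit bounds how many messages are submitted per second, while the window bounds how many are unacknowledged at once. When responses are slow, the full window can be outstanding even though the rate cap is respected. Whether that is enough to trigger "Queue full" errors depends on the provider, which this sketch does not model.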

This caused the messages to enter a retry period, which we believe could have been the root cause of the CPU thrashing.

Strangely, after the messages ultimately failed or finally got submitted, the CPU carried on thrashing. Only after I stopped and started the NowSMS service did the thrashing stop.

Would you perhaps have any idea what could be causing this? Our version number is 2012.02.16.

Thanks
Chris
Des - NowSMS Support
Board Administrator
Username: Desosms

Post Number: 3930
Registered: 08-2008
Posted on Thursday, April 26, 2012 - 11:21 pm:   

Hi Chris,

Sorry for the delay in response. As I had no idea what to suggest, I wanted to hold this for discussion in a team meeting earlier today.

It did lead to a good discussion, but I should have asked you more questions first.

When you say thrashing, can you give me a better idea of what you mean? It's hard to imagine 15 messages waiting for a particular route causing much of a problem.

Were the messages explicitly routed to that particular route, or were there routing rules in NowSMS that were pushing them to that route?

(In either case, I can't see this causing a significant amount of CPU usage.)

Does the connection in question have more than one transmitter/transceiver session?

I'm suspecting that whatever was "thrashing" was not related to these messages. But it's difficult to guess what, since a restart stopped the thrashing. Normally a restart would just pick up where it left off.

Do you have SMPP clients connecting? I'm not aware of any problems, I'm just guessing that a restart would force a client reset if there was somehow a problem with a particular client connection.

Unfortunately, all I can suggest is more observation.

--
Des
NowSMS Support
Chris
Frequent Contributor
Username: Chrisc

Post Number: 63
Registered: 12-2008
Posted on Friday, April 27, 2012 - 09:06 am:   

Hi Des,

No problem on the delay.

What we mean by thrashing is that the CPU did not drop below roughly 30% for about 19 hours. When we had a look at the running processes we could see that NowSMS was using most of the processing power.

The only significant thing we could see was the build-up (albeit small) of the messages in the queue. These messages were explicitly routed to only a single connection, with a single transmitter and single receiver.

In the past, when this type of thing happened, we suspected that a large build-up of .ERR or .IN files was responsible and that the resulting I/O rate was causing the thrashing. This time, though, there were only about 1,300 files in the SMS-IN folder and the disk wasn't fragmented in any way.
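For anyone wanting to check the same thing, a minimal sketch for counting the files in a folder by extension follows. The function name `count_by_extension` is hypothetical, and the folder path is whatever your own installation uses; only the SMS-IN folder name and the .ERR and .IN extensions come from the discussion above.

```python
from collections import Counter
from pathlib import Path


def count_by_extension(folder):
    """Count regular files directly inside `folder`, keyed by upper-cased
    extension. Files without an extension are counted under "(none)".
    Subfolders are not descended into.
    """
    counts = Counter()
    for entry in Path(folder).iterdir():
        if entry.is_file():
            counts[entry.suffix.upper() or "(none)"] += 1
    return counts
```

Calling `count_by_extension(path_to_folder).most_common()` lists the extensions from most to least frequent, which makes a build-up of one file type easy to spot.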

This is what has led us to the queue and the fact that the messages were still queued.

We do have SMPP clients connecting as well; for the moment there are only 2 binds connected to us.

Hope this helps.

Regards
Chris
Chris
Frequent Contributor
Username: Chrisc

Post Number: 64
Registered: 12-2008
Posted on Tuesday, May 29, 2012 - 02:26 pm:   

Hi Des

It's been a while since we last spoke, but has there been any update on the issue we raised?

Hope to hear from you soon.

Regards
Chris
Des - NowSMS Support
Board Administrator
Username: Desosms

Post Number: 3979
Registered: 08-2008
Posted on Wednesday, May 30, 2012 - 05:16 am:   

Hi Chris,

We don't really have enough information to offer any real insight. Are you seeing reoccurrences?

That said, I just re-read your original post, and I may not have properly understood the scenario before. Can you explain all of the throttling, retry and message speed limiting settings that you are using? (Is ThrottleForQFull=Yes one of the settings?)

30% CPU is normally not something that we would see as a cause for concern. It does seem high when there is effectively only one connection transmitting.

Assuming that the problem is reoccurring, if you disable async mode, can you still get close to 5 messages per second with this connection?

We are going to investigate some scenarios with frequent throttling errors to see if we can recreate any issues. Toward that end, I'd like to be sure I understand the configuration settings you are using, as there may be some combination of settings triggering the issue.

--
Des
NowSMS Support