ACK too slow to respond

Author Message
Constantinos Psilogenis
New member
Username: Constgp

Post Number: 19
Registered: 03-2010
Posted on Wednesday, September 07, 2016 - 10:58 am:   

Hello,

We are facing the following problem:

Very often the responses to SMTP and HTTP clients take too long. While the delay is occurring, logging in from a browser is also slow, and the NowSMS console does not respond.

Any ideas?

Best Regards,
Des - NowSMS Support
Board Administrator
Username: Desosms

Post Number: 5716
Registered: 08-2008
Posted on Wednesday, September 07, 2016 - 11:49 am:   

Hi,

Most likely it is an issue with accounting callbacks taking too long to respond. If you are not using accounting callbacks, ignore this next bit.

Unfortunately, it can be very difficult to see how long they are taking, other than by enabling the SMSDEBUG.LOG and manually filtering. Here is an example from another thread:


16:42:17:102 [6] RetrieveURL: GET /..../..../ HTTP/1.1
User-Agent: Now SMS/MMS Gateway v2013.09.26
Accept: */*
Host: 127.0.0.1


16:42:17:556 [6] HttpResponseWait: Ok

To troubleshoot callback response times, look for "RetrieveURL: GET" ... then look for "HttpResponseWait:" preceded by the same number.

In this example, the accounting callback took 454 milliseconds, which will limit processing speed but would not be significant enough to cause the type of problems you are seeing unless you are sending to large distribution lists.
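The manual filtering described above can be roughly automated. The following is a minimal sketch, not a NowSMS tool: it assumes every relevant line starts with a timestamp in the HH:MM:SS:mmm form and a bracketed thread number, exactly as in the excerpt, and it pairs each "RetrieveURL: GET" with the next "HttpResponseWait:" carrying the same number. The function names and the 250 ms threshold are illustrative choices.

```python
import re
from datetime import datetime, timedelta

# Matches lines such as "16:42:17:102 [6] RetrieveURL: GET /path HTTP/1.1"
# (format assumed from the excerpt above).
LINE_RE = re.compile(
    r"^\s*(\d{2}:\d{2}:\d{2}:\d{3}) \[(\d+)\] (RetrieveURL: GET|HttpResponseWait:)"
)


def callback_durations(lines):
    """Yield (thread_id, request_timestamp, elapsed_ms) for each matched pair."""
    pending = {}  # thread id -> (timestamp text, parsed time) of the open request
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # header lines and unrelated log entries
        text, thread_id, kind = match.groups()
        stamp = datetime.strptime(text, "%H:%M:%S:%f")
        if kind.startswith("RetrieveURL"):
            pending[thread_id] = (text, stamp)
        elif thread_id in pending:
            start_text, start = pending.pop(thread_id)
            delta = stamp - start
            if delta < timedelta(0):
                # The log carries no date, so assume the pair crossed midnight.
                delta += timedelta(days=1)
            yield thread_id, start_text, delta // timedelta(milliseconds=1)


def slow_callbacks(path, threshold_ms=250):
    """Return the pairs in the log file at `path` that took threshold_ms or more."""
    with open(path, errors="replace") as log:
        return [item for item in callback_durations(log) if item[2] >= threshold_ms]
```

Calling `slow_callbacks` with the path to SMSDEBUG.LOG would list only the callbacks worth a closer look; the excerpt above would appear as thread 6 with 454 ms.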

The other possibility is a server overload, where there is more activity than the server can handle...either CPU overload or disk overload. Use Windows task manager to look at overall CPU load.

One issue that we have seen a number of times is active virus scanners not being able to keep up with high activity, as they weren't designed to handle server activity.

--
Des
NowSMS Support
Constantinos Psilogenis
New member
Username: Constgp

Post Number: 20
Registered: 03-2010
Posted on Thursday, September 08, 2016 - 11:15 am:   

Hello,

We don't have accounting callbacks.
Our systems are not overloaded in terms of CPU, memory, or disk.
We stopped the antivirus.

But we still have strange and unresponsive behaviour from the system.
Please note that we have a high availability configuration with two systems sharing the same configuration.

SMS1\sharedvolume.ini
[SharedVolume]
SharedVolume=\smsvip\smsconfig
MessageIDPrefix=SMS1

SMS2\sharedvolume.ini
[SharedVolume]
SharedVolume=\smsvip\smsconfig
MessageIDPrefix=SMS2

We observe the following:

1. When only one server (SMS2) is active in the HA environment, the system sends 60 msg/min, which is our license limit per server. BUT when it receives a huge number of message requests (35000), the ACK to the senders is dead slow (45 secs to receive an ACK), and the console freezes at the same time.

2. When we start the second server so that both (SMS1 + SMS2) are active, the system is completely unresponsive and shows the same behavior as described in point 1 for as long as there are messages in the queue. The performance of both servers also degrades dramatically, to less than 60 msg/min. That is not normal: we have a license of 60 msg/min for each server, so we expect to send 120 msg/min.

3. We note that a huge number of small files and directories is created in the shared folder, and the number is increasing dramatically. Is this normal?

Attached are the SMSout logs showing what we describe above (1,2).

application/x-zip-compressed
HAperformance.zip (16.6 k)
Des - NowSMS Support
Board Administrator
Username: Desosms

Post Number: 5718
Registered: 08-2008
Posted on Thursday, September 08, 2016 - 10:01 pm:   

Hi,

I should have asked: what version of NowSMS are you running?

Earlier this year, we did find a HA scenario where on some networks there was a mysterious 2 to 3 second sporadic delay in some responses. The 2016.03.28 version fixed this (http://www.nowsms.com/download-free-trial). It removed an unnecessary network query that seemed to cause problems in some environments.

But you're talking about 30-45 seconds, and unresponsiveness, which sounds like a more fundamental issue. I suppose if you are running a version prior to 2016.03.28, it is still worth seeing if this update makes a difference.

What is the network storage?

It is normal for many small files/directories to be created and deleted. These are normally very quick operations, but it sounds like they are not in your network environment.
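One way to check whether small-file operations are slow in a given environment is to time them directly. The sketch below is only a rough probe under stated assumptions: the file count, the 256-byte payload, and the "probe-" file names are arbitrary choices, not anything NowSMS itself uses. It times create/write/delete cycles in a directory and compares the result with the local temp directory as a baseline.

```python
import os
import tempfile
import time
import uuid


def time_small_file_ops(directory, count=200, payload=b"x" * 256):
    """Create, write, and delete `count` small files; return per-file times in ms."""
    timings = []
    for _ in range(count):
        path = os.path.join(directory, f"probe-{uuid.uuid4().hex}.tmp")
        start = time.perf_counter()
        with open(path, "wb") as handle:
            handle.write(payload)
            handle.flush()
            os.fsync(handle.fileno())  # push the write through to the storage
        os.remove(path)
        timings.append((time.perf_counter() - start) * 1000)
    return timings


def summarize(timings):
    """Reduce a list of per-file timings to a few headline numbers."""
    ordered = sorted(timings)
    return {
        "count": len(ordered),
        "median_ms": ordered[len(ordered) // 2],
        "max_ms": ordered[-1],
    }


def compare(shared_dir, count=200):
    """Time the same workload on local temp storage and on `shared_dir`."""
    return {
        "local": summarize(time_small_file_ops(tempfile.gettempdir(), count)),
        "shared": summarize(time_small_file_ops(shared_dir, count)),
    }
```

Running `compare` against the shared volume path from each server, ideally while messages are queued, would show whether the share is much slower than local disk. A median in the tens or hundreds of milliseconds on the share would be consistent with the delays described.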

--
Des
NowSMS Support
Constantinos Psilogenis
New member
Username: Constgp

Post Number: 22
Registered: 03-2010
Posted on Friday, September 09, 2016 - 06:00 am:   

Hi,

The version we have installed is 2016.03.28.
The network storage is a Windows shared directory on one of the two servers we have.
Our environment is like this.

We have two servers SMS1, SMS2.
These two servers are hosted in a VMware environment.
Load balancing software (Safekit) is running on top of these machines.

Please check this ASAP, as it is causing us very serious problems in our production environment.

BR

Constantinos
Des - NowSMS Support
Board Administrator
Username: Desosms

Post Number: 5719
Registered: 08-2008
Posted on Friday, September 09, 2016 - 03:09 pm:   

Hi,

I'm not familiar with Safekit, but are you using it as a load balancer only, or is it providing replication/clustering functionality?

What you describe sounds like replication getting overwhelmed and causing delays.

What if you remove Safekit from the equation? Do you still see unresponsiveness when you bring the second server up?

--
Des
NowSMS Support
Constantinos Psilogenis
New member
Username: Constgp

Post Number: 23
Registered: 03-2010
Posted on Friday, September 09, 2016 - 04:57 pm:   

Hi,

Safekit can do both replication and load balancing. We stopped the replication as a very first step. The findings we described in the previous posts were obtained without replication, so the problem is still present.



BR

Constantinos
Des - NowSMS Support
Board Administrator
Username: Desosms

Post Number: 5720
Registered: 08-2008
Posted on Friday, September 09, 2016 - 05:46 pm:   

Hi,

OK, as a troubleshooting step, I still think it is a good idea to remove Safekit from the configuration.

Start with 1 active server.

I think you have already done this, when you said:


quote:

1. When only one server (SMS2) is active in the HA environment, the system sends 60 msg/min, which is our license limit per server. BUT when it receives a huge number of message requests (35000), the ACK to the senders is dead slow (45 secs to receive an ACK), and the console freezes at the same time.




Can you clarify the mechanism (protocol) in use, and how these requests are submitted? SMPP? HTTP? Number of concurrent submissions? If HTTP, how many recipients per submit request?

Also please clarify what you mean by ACK. I'm assuming that you are referring to the HTTP or SMPP response from NowSMS back to the client, but I might not understand correctly.
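If the ACK is the HTTP response, its latency can be measured from the client side at a chosen level of concurrency. The following is a hedged sketch rather than a NowSMS utility: `http_submit` simply issues a GET to whatever URL is supplied, and the `measure` helper, its defaults, and the `SUBMIT_URL` name in the usage note are illustrative assumptions.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def http_submit(url, timeout=60):
    """Send one HTTP GET and return the status code once the response arrives."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()
        return response.status


def timed(send):
    """Run send() once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = send()
    return result, time.perf_counter() - start


def measure(send, total=20, concurrency=4):
    """Call send() `total` times from `concurrency` worker threads.

    Returns a list of (result, elapsed_seconds) tuples, one per call.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(lambda _: timed(send), range(total)))
```

For example, `measure(lambda: http_submit(SUBMIT_URL), total=50, concurrency=10)`, where `SUBMIT_URL` stands for the same submit URL the real clients use, returns one elapsed time per request. Repeating the run at concurrency 1, 5, and 20 would show whether the response time grows with the number of simultaneous submissions.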


quote:

The network storage is a Windows shared directory on one of the two servers we have.




What is this server? A Windows server or Samba?

I am concerned that some performance or sharing tuning is required at the server. A delay of 30-45 seconds with only a single server running suggests major performance issues when creating or accessing files in this shared volume.

A colleague says he remembers a customer who was having similar problems with unexplained delays when using a Samba-based share. They resolved the issue without any changes or fixes from us, but did not provide any details. I have minimal experience with Samba, but I see interesting info when I Google "samba small files performance"... oplocks and case sensitivity seem to be settings that have performance implications.

--
Des
NowSMS Support