SMPP gateway stops from time to time

Malathy
New member
Username: Malathy

Post Number: 3
Registered: 04-2007
Posted on Tuesday, May 28, 2013 - 08:05 am:   

Hi
We have configured an SMPP gateway; about 11 users connect to our server and push messages to it.
Version 2006 was very stable.
But since we downloaded the latest version (2012) from your website and started testing it, the SMPP connections drop every day.
Clients call us saying their connection is cut. When we check NowSMS, the SMPP client connection count shows 0. After restarting the software it works again and the count is 11.

Sometimes, if we don't touch it, it starts working again automatically after 2-3 hours.

We couldn't find any useful log except the file except.log,
which shows the times of the connection drops:


Exception occurred in thread #16
Thread Description: InternalProcessSMPPConnection
Sunday, May 26, 2013 @ 4:31 PM
-------
Exception occurred in thread #19
Thread Description: InternalProcessSMPPConnection
Sunday, May 26, 2013 @ 7:00 PM
-------
Exception occurred in thread #19
Thread Description: InternalProcessSMPPConnection
Sunday, May 26, 2013 @ 10:00 PM
-------
Exception occurred in thread #20
Thread Description: InternalProcessSMPPConnection
Monday, May 27, 2013 @ 1:00 AM
-------
Exception occurred in thread #19
Thread Description: InternalProcessSMPPConnection
Monday, May 27, 2013 @ 4:00 AM
-------
Exception occurred in thread #19
Thread Description: InternalProcessSMPPConnection
Monday, May 27, 2013 @ 12:28 PM
-------
Exception occurred in thread #18
Thread Description: InternalProcessSMPPConnection
Monday, May 27, 2013 @ 11:00 PM
-------
Des - NowSMS Support
Board Administrator
Username: Desosms

Post Number: 4498
Registered: 08-2008
Posted on Tuesday, May 28, 2013 - 02:37 pm:   

Hi Malathy,

We are investigating. It is particularly odd that many of these errors are occurring on the hour.
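
To check the on-the-hour pattern against the except.log entries posted above, a minimal Python sketch might look like the following. The function names are illustrative only and are not part of NowSMS; the timestamp format is taken from the posted log excerpt.

```python
from datetime import datetime

# Timestamps copied from the except.log excerpt posted earlier in this thread.
EXCEPTION_TIMES = [
    "Sunday, May 26, 2013 @ 4:31 PM",
    "Sunday, May 26, 2013 @ 7:00 PM",
    "Sunday, May 26, 2013 @ 10:00 PM",
    "Monday, May 27, 2013 @ 1:00 AM",
    "Monday, May 27, 2013 @ 4:00 AM",
    "Monday, May 27, 2013 @ 12:28 PM",
    "Monday, May 27, 2013 @ 11:00 PM",
]


def parse_timestamp(text):
    """Parse a timestamp such as 'Sunday, May 26, 2013 @ 4:31 PM'."""
    return datetime.strptime(text, "%A, %B %d, %Y @ %I:%M %p")


def on_the_hour(timestamps):
    """Return the parsed timestamps whose minute field is exactly zero."""
    return [t for t in map(parse_timestamp, timestamps) if t.minute == 0]


if __name__ == "__main__":
    hits = on_the_hour(EXCEPTION_TIMES)
    print(f"{len(hits)} of {len(EXCEPTION_TIMES)} exceptions fell exactly on the hour")
```

For the seven entries posted, five fall exactly on the hour, which is what points toward an hourly task rather than random client behaviour.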

Our best guess is that a receipt message tracking database has somehow become corrupt. I know that NowSMS 2006 used a different format for these temporary databases (by default kept for 15 days), so perhaps we have broken some logic interfacing with tracking files that were originally generated by the 2006 version.

Please ZIP or RAR all files from the SMPPDATA directory (and subfolders) and e-mail them to nowsms@nowsms.com. Put Attention: Des in the subject line and post a follow-up here to let me know to look for the e-mail if I have not received it automatically.

I would also recommend enabling the SMSDEBUG.LOG (checkbox setting on "Serial #" page). If an error is noticed again, collect the following files: SMSDEBUG.LOG, SMSDEBUG.BAK, SMPPDEBUG.LOG and SMPPDEBUG.BAK. E-mail those files to nowsms@nowsms.com, also making sure to put Attention: Des in the subject line of the e-mail.

It would also help if you could e-mail the SMSGW.INI file so that I can better understand your configuration.

--
Des
NowSMS Support
Malathy
New member
Username: Malathy

Post Number: 4
Registered: 04-2007
Posted on Thursday, May 30, 2013 - 07:19 am:   

Hi
From the date I posted this issue until last night, NowSMS was working fine and stable.
Last night, at exactly 5/29/2013 10:00:01 PM, the same issue happened, and it stayed stopped until 4:21 am, when we restarted it manually.
Before the restart I collected the log files and sent them to your email; please check.
Des - NowSMS Support
Board Administrator
Username: Desosms

Post Number: 4505
Registered: 08-2008
Posted on Thursday, May 30, 2013 - 12:00 pm:   

Hi,

When you restart manually, exactly what action are you taking?

What I see is that at 10:00pm, NowSMS detected a problem and restarted itself. The logs that you sent appear to cover the time period between 10:00pm and 4:21am. During this time, NowSMS is alive, and it is seeing connections from your SMPP clients, but it is not seeing any data from the SMPP clients.

Are you having to reboot/restart the computer? (My assumption is Yes.)

If so, then it would appear that something is going wrong at the operating system level. That is extremely odd ... the only time I can recall seeing anything remotely similar ... it was related to a problem with a virus scanner. Desktop virus scanners are not built to handle server types of activity and may leak resources.

The inability to receive data certainly suggests that something on the computer is leaking resources.

Check task manager for applications using large amounts of memory or a large number of handles. My suspicion is that something else on the computer is using excessive resources.

I will run a thorough check on the files that you sent to look for other potential causes, just in case this analysis is missing something.

--
Des
NowSMS Support
Malathy
New member
Username: Malathy

Post Number: 7
Registered: 04-2007
Posted on Thursday, May 30, 2013 - 12:30 pm:   

Hi
We are running NowSMS on Windows Server 2003 R2.
When I logged in via Remote Desktop, I saw that NowSMS was alive but the SMPP connection count was zero.
I clicked the Stop button; it took about 1 minute.
Then I clicked Start; after a while nothing had changed and the SMPP connection count was still zero.
Then I restarted Windows, and when Windows came up NowSMS worked fine again.

We have only McAfee on the server; is that the reason?
Why does this problem happen after 2-3 days? NowSMS works fine for a few days, then suddenly this problem happens.

Could the firewall also be an issue?
Des - NowSMS Support
Board Administrator
Username: Desosms

Post Number: 4506
Registered: 08-2008
Posted on Thursday, May 30, 2013 - 01:15 pm:   

Hi Malathy,

Yes, McAfee is most likely the problem. It is the virus scanner that we have seen problems with before. It simply cannot keep up with the load of a server and all of the TCP connections from the clients. It leaks memory and starts blocking network activity.

Virus scanners like McAfee were designed for typical desktop workstations, and were simply not tested for higher-volume server activity.

If you are using Firewall software other than the built-in Windows Firewall, it could also be causing problems.

For non-server Windows versions, Microsoft Security Essentials is a good virus scanner that doesn't get in the way, but it is not officially supported on Windows Server (I have installed and used it on our test servers).

You could go without active virus scanning, but firewall off all but the SMPP ports (and any other you might be using), and avoid doing any web browsing from the server (default in Windows Server is a locked down web browser that cannot access external sites).

--
Des
NowSMS Support
Malathy
New member
Username: Malathy

Post Number: 8
Registered: 04-2007
Posted on Friday, May 31, 2013 - 06:59 am:   

Thanks for the reply.
I have uninstalled McAfee. How can I make sure the same problem is not happening? Can you tell me which log file should be monitored?
Des - NowSMS Support
Board Administrator
Username: Desosms

Post Number: 4508
Registered: 08-2008
Posted on Friday, May 31, 2013 - 01:04 pm:   

Hi Malathy,

It's hard to say what should be monitored.

For the most part, NowSMS thinks that everything is ok. It is just not seeing any inbound data from the clients because McAfee has made the system unstable.

That said, there are exceptions being recorded, so to confirm the problem is gone, I would delete EXCEPT.LOG and monitor to see if it returns.

In your particular scenario, it might also be helpful to write a PHP script to monitor the active SMPP client connection count.

Here's a simple PHP script that reports back the SMPP client connection count:


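<?php
// Query the NowSMS admin status page, which returns XML.
$xmldata = file_get_contents("http://127.0.0.1:8800/admin/xmlstatus?user=adminuser&password=adminpass");

$xml = simplexml_load_string($xmldata);

// Report the number of active SMPP client connections.
echo $xml->SMPPClientList->ActiveConnectionCount;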

(adminuser and adminpass are credentials for any SMS user account with admin access enabled.)

With the previous problem, there is a good chance that NowSMS would not see the status query request at all, because McAfee appeared to be blocking all inbound TCP/IP traffic. So you would also want to add error handling so that a failed HTTP request also triggers a monitoring error.
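
As a rough illustration of that error handling, here is a minimal sketch in Python rather than PHP. It assumes the same status URL and XML element path as the PHP example; adminuser and adminpass are placeholders, and the function names, the `minimum` threshold, and the timeout are hypothetical choices, not NowSMS settings.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Same status URL as the PHP example; adminuser/adminpass are placeholders.
STATUS_URL = "http://127.0.0.1:8800/admin/xmlstatus?user=adminuser&password=adminpass"


def parse_connection_count(xmldata):
    """Extract SMPPClientList/ActiveConnectionCount from the status XML."""
    root = ET.fromstring(xmldata)
    node = root.find("SMPPClientList/ActiveConnectionCount")
    if node is None or node.text is None:
        raise ValueError("ActiveConnectionCount not found in status XML")
    return int(node.text.strip())


def check_gateway(url=STATUS_URL, minimum=1, timeout=10):
    """Return (ok, message). Any HTTP or parse failure counts as an error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            count = parse_connection_count(response.read())
    except (OSError, ValueError, ET.ParseError) as exc:
        # Covers refused connections, timeouts, bad URLs and malformed XML.
        return False, f"status query failed: {exc}"
    if count < minimum:
        return False, f"only {count} active SMPP client connection(s)"
    return True, f"{count} active SMPP client connection(s)"
```

Calling check_gateway() from a scheduled task and raising an alert whenever the first element of the result is False would cover both cases: a connection count of zero, and a status request that never gets an answer.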

--
Des
NowSMS Support
Malathy
New member
Username: Malathy

Post Number: 9
Registered: 04-2007
Posted on Friday, May 31, 2013 - 02:40 pm:   

Hi Des

McAfee is already uninstalled, but the same problem happened again after some hours.
I have sent you new files by email; please check.
This time I clicked Stop and then Start, and the connections came up without restarting Windows.
Des - NowSMS Support
Board Administrator
Username: Desosms

Post Number: 4509
Registered: 08-2008
Posted on Friday, May 31, 2013 - 03:35 pm:   

Hi Malathy,

I noticed something very strange, and it seems to correlate with problems happening at the start of a new hour.

At the top of each hour, NowSMS performs a cleanup of temporary files that are no longer necessary.

On your system, it appears that NowSMS is trying to delete the same files over and over again, at the top of each hour or after a restart.

This suggests to me that there may be hard disk corruption.

Try to delete *.LCK files in NowSMS\Q\###216E6

Are you able to? I am assuming you will not be able to.

If you look in the SMSDEBUG.LOG you will see a repeated pattern of file names prefaced with "DeleteFilesOlderThan". Every hour NowSMS is trying to delete these files, but they are apparently not being deleted.
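
A quick way to spot that repeated pattern is to count how often the same text follows "DeleteFilesOlderThan" in the log. The sketch below is only an illustration: it assumes that whatever follows the marker on each line identifies the file, which may not match the exact SMSDEBUG.LOG layout, and the function name is hypothetical.

```python
from collections import Counter

MARKER = "DeleteFilesOlderThan"


def repeated_delete_attempts(lines, marker=MARKER):
    """Count how often the text following the marker repeats across log lines."""
    counts = Counter()
    for line in lines:
        _, found, rest = line.partition(marker)
        if found:
            # Assumption: whatever follows the marker identifies the file.
            counts[rest.strip(" :\t\r\n")] += 1
    # Keep only entries that were logged more than once.
    return {name: n for name, n in counts.items() if n > 1}


if __name__ == "__main__":
    try:
        with open("SMSDEBUG.LOG", errors="replace") as log:
            repeats = repeated_delete_attempts(log)
    except FileNotFoundError:
        repeats = {}
    for name, n in sorted(repeats.items(), key=lambda item: -item[1]):
        print(f"{n:4d}  {name}")
```

Files that show up many times in the output are the ones NowSMS keeps trying, and apparently failing, to delete.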

I suspect that when NowSMS tries to access these problematic files, this is occasionally triggering other unexpected problems in the Windows OS, which is leading to the critical exceptions.

My suggestion is to stop the NowSMS services. Attempt to delete these files.

Assuming they cannot be deleted, open a command prompt window and change your current directory to C:\Program Files\NowSMS (or C:\Program Files (x86)\NowSMS).

Use the following command to move the corrupt directory structure:

move Q \BadQ

This will move the corrupt directory out of the NowSMS directory structure and to the location C:\BadQ

Restart the NowSMS services and NowSMS will create a new Q directory.

Any queued messages that were in the corrupt directory can be copied and re-queued to the new directory with the following command (assuming same current directory as above):

xcopy C:\BadQ\*.req Q /s

When you can schedule downtime, run CHKDSK c: /F to try to repair the disk and see if you can then delete the problem files.

If the problem files still cannot be deleted, I'd look at replacing the hard disk or moving to a new server.

If the problem files can be deleted after this, your hard disk might be ok ... the corruption may have been related to a power failure or virus scanner software crash. I'd still be cautious and look at installing a new hard disk or moving to a new server.

--
Des
NowSMS Support
Malathy
New member
Username: Malathy

Post Number: 10
Registered: 04-2007
Posted on Friday, May 31, 2013 - 07:16 pm:   

Hi
This server is very new; we purchased it just 2 weeks ago, so I don't think the hard disk has a problem.
I went into the Q folder and deleted all of its contents without any problem while NowSMS was running.
Before the delete, the outbound SMS queue was showing 300, and now it shows zero.

Even with NowSMS 2006 we were deleting the Q contents from time to time because some messages remained there and were not sent.
Why do some messages remain in Q without being sent?
Des - NowSMS Support
Board Administrator
Username: Desosms

Post Number: 4511
Registered: 08-2008
Posted on Friday, May 31, 2013 - 11:31 pm:   

It still feels like there is a system level problem occurring somewhere.

I would strongly suggest making use of Windows Task Manager, and monitoring for any applications that are showing excessive memory or handle usage.

If you have to restart the server (not just restart the NowSMS service) in order to restore full system functionality, then something else on the server is causing a problem.

That said, we are concerned about your comment regarding having to delete Q contents from time to time because of stuck messages.

So we took a closer look at your logs and configuration details. We see that you primarily have SMPP clients submitting messages which are routed to an outbound HTTP SMSC with "Send Long Messages without Segmentation" enabled in the settings.

If an SMPP client signals that it is submitting a multipart (segmented) message, but fails to submit all segments, there is an issue where this setting will cause the incomplete message to be stuck in the queue.

As these stuck messages are related to the temporary files that the server is unable to delete, they may be somehow triggering an unexpected problem.

We have created an emergency fix that automatically releases any incomplete segmented messages after 3 minutes. This might help address the problems that you are seeing.
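
To illustrate the general idea of releasing incomplete segmented messages after a time limit, here is a small Python sketch. It is not the NowSMS implementation: the class and field names are hypothetical, and releasing the held segments as separate parts is an assumption made for the example. Only the 3-minute limit comes from the description above.

```python
from dataclasses import dataclass, field

RELEASE_AFTER_SECONDS = 180  # the 3-minute limit described above


@dataclass
class PendingMessage:
    total: int         # number of segments the client announced
    first_seen: float  # arrival time of the first segment, in seconds
    parts: dict = field(default_factory=dict)  # segment number -> text


class SegmentBuffer:
    """Hold segments until a message is complete or has waited too long."""

    def __init__(self, release_after=RELEASE_AFTER_SECONDS):
        self.release_after = release_after
        self.pending = {}

    def add(self, key, total, seq, text, now):
        """Store one segment; return the joined text once all segments are in."""
        msg = self.pending.setdefault(key, PendingMessage(total, now))
        msg.parts[seq] = text
        if len(msg.parts) == msg.total:
            del self.pending[key]
            return "".join(msg.parts[i] for i in sorted(msg.parts))
        return None

    def release_stale(self, now):
        """Release incomplete messages older than the limit as separate parts."""
        released = []
        for key in list(self.pending):
            msg = self.pending[key]
            if now - msg.first_seen >= self.release_after:
                released.append([msg.parts[i] for i in sorted(msg.parts)])
                del self.pending[key]
        return released
```

Without something like release_stale(), a message whose final segment never arrives would sit in the buffer indefinitely, which matches the stuck-queue symptom described in this thread.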

A complete install that includes this fix is at http://www.nowsms.com/download/nowsms20130531.zip

--
Des
NowSMS Support
Malathy
New member
Username: Malathy

Post Number: 11
Registered: 04-2007
Posted on Sunday, June 02, 2013 - 08:07 am:   

Thanks
Should I uninstall the old version first, or can I install this directly over the old version?
Des - NowSMS Support
Board Administrator
Username: Desosms

Post Number: 4513
Registered: 08-2008
Posted on Sunday, June 02, 2013 - 04:02 pm:   

There is no need to uninstall. (Uninstall can cause configuration settings to be deleted.)

Running the installer for a newer (or older) version will update the program files while preserving configuration settings.

--
Des
NowSMS Support
Malathy
New member
Username: Malathy

Post Number: 12
Registered: 04-2007
Posted on Tuesday, June 04, 2013 - 07:33 am:   

Hi
I have installed the new version over the old one.
The first thing I noticed is that the Q is clearing very slowly.
When I opened SMSDEBUG.LOG I saw hundreds of these lines:
06:31:32:203 [20] ReportOverThreshold: Debug
06:31:32:203 [20] ReportOverThreshold: Debug
06:31:32:203 [22] ReportOverThreshold: Debug
06:31:32:203 [22] ReportOverThreshold: Debug
06:31:32:203 [19] ReportOverThreshold: Debug
06:31:32:203 [19] ReportOverThreshold: Debug
06:31:32:203 [23] ReportOverThreshold: Debug
06:31:32:203 [23] ReportOverThreshold: Debug
06:31:32:203 [21] ReportOverThreshold: Debug
06:31:32:203 [21] ReportOverThreshold: Debug

What does it mean?
Des - NowSMS Support
Board Administrator
Username: Desosms

Post Number: 4523
Registered: 08-2008
Posted on Tuesday, June 04, 2013 - 11:52 am:   

Hi Malathy,

That indicates that the number of messages sent in the current minute meets (or is very close to) the number of messages allowed per minute by the serial number.

The debug statement indicates that a statistic counter is being updated to track how often the system is running at license capacity. This statistic is reported on the "Serial #" page.

Count the messages sent that minute in the SMSOUT-yyyymmdd.LOG and compare that to the limit for your "Serial #".
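
As one way to do that count, the Python sketch below tallies log entries per minute. It assumes that each entry in the SMSOUT log begins with a timestamp of the form "YYYY-MM-DD HH:MM:SS"; check a few lines of your own log first and adjust the pattern if the layout differs. The function names are illustrative.

```python
import re
from collections import Counter

# Assumption: each log entry begins with "YYYY-MM-DD HH:MM:SS".
ENTRY_START = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}")


def messages_per_minute(lines):
    """Return a Counter mapping 'YYYY-MM-DD HH:MM' to the number of entries."""
    counts = Counter()
    for line in lines:
        match = ENTRY_START.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def minutes_at_limit(counts, limit):
    """List the minutes whose message count reached the licensed limit."""
    return sorted(minute for minute, n in counts.items() if n >= limit)
```

For example, passing the lines of SMSOUT-20130604.LOG to messages_per_minute() and then calling minutes_at_limit() with the per-minute limit for your serial number would list the minutes in which the license capacity was reached.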

You mentioned this is a new server. Did you recently upgrade your serial number with us? Or get your serial number reissued for a new server? If so, you should contact our sales department to get it reissued as it may be invalid.

Or if this is a trial version, trials are limited to around 30 messages per minute. (Trials with higher limits are available from our sales department.) I'm guessing it is a trial version, as trials typically send no more than one message per second and no more than 30 messages per minute, so the threshold is reached about 30 seconds into each minute and stays in effect for the rest of that minute.

--
Des
NowSMS Support