Files stuck in SMS-IN folder

Author Message
Nikos Mavrakis
New member
Username: Nmavra

Post Number: 21
Registered: 05-2007
Posted on Wednesday, July 29, 2009 - 06:14 pm:   

Hello, we seem to have a problem with .rec and .in files being stuck in the SMS-IN folder. We have the latest patched version, and we notice that once the files get stuck, the gateway does not seem to process them. However, it processes all new files just fine.

What we did after the messages got stuck was to remove them from the folder and gradually move them back in. None of these "old" messages got processed.

Any ideas why this is happening?
CPU usage on the server running NowSMS is fine, as is the CPU usage of the database server.
Des - NowSMS Support
Board Administrator
Username: Desosms

Post Number: 1091
Registered: 08-2008
Posted on Wednesday, July 29, 2009 - 09:48 pm:   

Hi Nikos,

This really puzzles me.

Are these files located in a particular subdirectory of SMS-IN, or just in the root?

Is there anything unusual about these messages if you look in the file? For example, the only situation I can see where NowSMS would skip processing a message is if the sender (Sender=) and recipient (PhoneNumber=) fields are both missing or blank.
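
A quick way to check for the condition described above is to scan the folder for message files in which both fields are missing or blank. The following is only a minimal sketch: it assumes the message files are plain text with one `Key=Value` pair per line (the thread names only the `Sender=` and `PhoneNumber=` fields), and the helper names are hypothetical.

```python
from pathlib import Path


def parse_fields(text):
    """Collect Key=Value pairs from the text of a message file.

    Assumption: one Key=Value pair per line; other lines are ignored.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def lacks_addresses(text):
    """True when both Sender= and PhoneNumber= are missing or blank."""
    fields = parse_fields(text)
    return not fields.get("Sender") and not fields.get("PhoneNumber")


def find_suspect_files(folder):
    """Return names of non-empty .in/.rec files with no sender and no recipient."""
    suspects = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in (".in", ".rec"):
            text = path.read_text(errors="replace")
            if text and lacks_addresses(text):
                suspects.append(path.name)
    return suspects
```

Calling `find_suspect_files` on the SMS-IN directory would list any files that match this skip condition; an empty list suggests the cause lies elsewhere.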

Aside from that, I can't see what would be wrong.

If there is no match for a 2-way command to execute, then the .REC or .IN file gets renamed with a .SMS extension.

Then again ... maybe I'm missing something obvious. Maybe NowSMS is processing the message, but an unexpected error is occurring in your script .... so NowSMS is trying over and over again.

I'd suggest enabling the SMSDEBUG.LOG.

Note the filenames of any of the stray .IN or .REC files.

E-mail the SMSDEBUG.LOG and the stray .IN or .REC files in a ZIP or RAR file to nowsms@nowsms.com with "Attention: Des" in the subject line of the message.

--
Des
NowSMS Support
Nikos Mavrakis
New member
Username: Nmavra

Post Number: 22
Registered: 05-2007
Posted on Thursday, July 30, 2009 - 08:55 am:   

Hello Des,
I've sent you an email with the debug log including some sample rec and in files.

We haven't seen the creation of any subfolder in the SMS-IN directory, all files are in the root directory!
Des - NowSMS Support
Board Administrator
Username: Desosms

Post Number: 1096
Registered: 08-2008
Posted on Thursday, July 30, 2009 - 05:49 pm:   

Hi Nikos,

This had me very confused for some time.

But I think there is a possibility that everything is ok.

All of the .IN and .REC files included in the RAR file that you sent me have been processed.

I think what is happening is that because you have allocated so many threads to the 2-way command processing, there are some contention problems deleting these files after they have been processed.

NowSMS knows that they have already been processed, but the scanning is so aggressive, it is hard to find an open window to allow the files to be deleted.

Over time, it is likely that they will get deleted.

I think if you lower that 2WaySMSThreadCount setting, this problem will go away.

I'd expect you to be OK with a setting of 2WaySMSThreadCount=5.

What does puzzle me a little bit is why you are still able to see content in these files.

When NowSMS finishes processing one of these message files, it truncates the file size to 0 before it deletes the file. The 0 byte file size is how NowSMS knows not to re-process the file.

Are you accessing these files on the server via a network share? I'm wondering if there is some difference in caching accessing the files over the network vs. accessing them directly.

In any event, these files that you are still seeing have been processed. That is why they are being skipped. That is good, because things are being handled correctly.

What I can't explain fully is why it is taking so long to delete them. However, I can see that a large 2WaySMSThreadCount value may cause contention that keeps them from being deleted.

Lower the 2WaySMSThreadCount value to 5, and monitor it from there.

--
Des
NowSMS Support
Nikos Mavrakis
New member
Username: Nmavra

Post Number: 23
Registered: 05-2007
Posted on Friday, July 31, 2009 - 08:58 am:   

Hello Des, thanks for your attention. I changed the value to 5 and will see what happens. I had it at 5 when we started seeing this same problem a few weeks ago, so we experimented with different values. Our 2WaySMSThreadCount=100 setting showed that those files were being processed fast, so we left it as is.

I'm accessing the files using Remote Desktop connection since the server is in a different building, I haven't tried opening them from the physical server.
Des - NowSMS Support
Board Administrator
Username: Desosms

Post Number: 1108
Registered: 08-2008
Posted on Friday, July 31, 2009 - 08:25 pm:   

I would expect "Remote Desktop" to show the same view as at the physical server ... but apparently it is not.

The good news is that the message files in question were processed ... and the fact that they are being skipped is because they are marked for deletion.

Keep an eye on it, particularly looking for messages that stay in the directory for an extended period of time.
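
One way to do that monitoring is a small script that lists message files older than a chosen age and reports whether each one is zero bytes (already processed and awaiting deletion, per the description earlier in this thread) or still has content. This is a sketch under assumptions: the age threshold, the function name and the use of the file modification time as the "arrival" time are all illustrative choices, not NowSMS behaviour.

```python
import time
from pathlib import Path


def lingering_files(folder, max_age_seconds, now=None):
    """List .in/.rec files (including subfolders) older than max_age_seconds.

    Returns (relative path, age in seconds, state) tuples. A zero-byte file
    is reported as already processed; anything else still has content.
    """
    now = time.time() if now is None else now
    report = []
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in (".in", ".rec"):
            continue
        stat = path.stat()
        age = now - stat.st_mtime  # assumption: mtime approximates arrival time
        if age > max_age_seconds:
            if stat.st_size == 0:
                state = "processed, awaiting delete"
            else:
                state = "has content"
            report.append((str(path.relative_to(folder)), round(age), state))
    return report
```

Running it every few minutes and watching for "has content" entries that keep reappearing would separate files that are merely waiting to be deleted from files that are genuinely not being processed.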

We've added some critical section logic for an update which will allow the files to be deleted immediately instead of deferring deletion. (This logic will only work for single server installations, which is why we didn't include it in the first place.)

However, I'm hesitant to send out this update, because I'm not convinced that you're having an operational problem. Let's keep an eye on it a little longer.

--
Des
NowSMS Support
Nikos Mavrakis
New member
Username: Nmavra

Post Number: 24
Registered: 05-2007
Posted on Monday, August 10, 2009 - 12:04 pm:   

Des,

one thing that I have just noticed is that all the incoming SMS files in the SMS-IN folder, have the extension .in, while the delivery receipts have .IN and .rec.

Since you have mentioned that the .IN files are incoming SMS that are subject to our license limit, is it possible that this behaviour affects our incoming traffic?

regards
Nikos
Des - NowSMS Support
Board Administrator
Username: Desosms

Post Number: 1139
Registered: 08-2008
Posted on Monday, August 10, 2009 - 01:50 pm:   

Yes. The .IN processing is throttled.

-- Des

Nikos Mavrakis
New member
Username: Nmavra

Post Number: 25
Registered: 05-2007
Posted on Monday, August 10, 2009 - 02:18 pm:   

Des, this seems to have been the cause of our problem for many days. A large number of delivery receipts appear as .IN files in the folder.

The thing is, why do delivery reports appear as .IN files? We have recently increased our SMS traffic (to the extent that we are still covered by our 120 SMS/min licence), but we did not know that those files are throttled as well; since they are receipts, they shouldn't count.

This is a very urgent issue for us and is linked to our problems in the last 3 weeks.
How can we overcome this?
Nikos Mavrakis
New member
Username: Nmavra

Post Number: 26
Registered: 05-2007
Posted on Thursday, August 13, 2009 - 03:28 pm:   

Hello again, this is still an issue for us and, as per the thread title, this behavior of stuck incoming messages (.IN, .rec and .in) is still occurring.

We are forced to manually stop NowSMS service many times in a 24h period and remove the files, and manually put .in files in the SMS-IN folder in order for our normal SMS traffic to be processed. All .in files are normal incoming SMS, and all .rec and .IN are delivery receipts.

Although you said that the files have in fact been processed, this is not always the case. In our example, when the SMS-IN folder is full, we send an SMS from our mobile and we never get a reply. This means that the SMS we sent is put in the SMS-IN folder and does not get processed. If we stop NowSMS service and manually remove all files from this folder and start the service again and then send a new SMS from our mobile, the reply is almost instant.

Also, we went back to the 2009.07.09 version of NowSMS because we have reason to believe that this behavior is happening even more often with this version.
Nikos Mavrakis
New member
Username: Nmavra

Post Number: 27
Registered: 05-2007
Posted on Thursday, August 13, 2009 - 05:02 pm:   

I'm sorry, in the previous message I meant to say that we have reason to believe that this strange behavior with files stuck in SMS-IN folder is happening more often with the recent patch, therefore we went back to the previous one.
Des - NowSMS Support
Board Administrator
Username: Desosms

Post Number: 1140
Registered: 08-2008
Posted on Thursday, August 13, 2009 - 09:10 pm:   

Nikos,

I've done some further checking and I was incorrect. For licenses of 60mpm (1mps) or higher, there currently is no throttling of the 2-way command processing.

So I don't think throttling is an issue in your configuration. I also checked the previous log file that you sent me, and there was no evidence of any throttling.

To ultimately resolve this problem, I think we need to see a debug log when a problem is actively occurring.

The time period that the previously sent log covered showed no evidence of a problem ... all of the supposedly not processed messages were processed.

Before we do that, however ...

First ... have I previously asked you whether or not a virus scanner is actively running on this system? If so, disable it to see if the problem goes away. (Desktop virus scanners are not always built to handle the rapid file creation/deletion, and could be causing the problem.)

Second ... I mentioned earlier in this thread how NowSMS truncates an inbound file to 0 bytes before deleting it (to mark it as processed), and delays the actual deleting of the file. While we're not convinced that it is causing an operational problem, this is causing considerable confusion in debugging the problem that you are experiencing.

We've updated NowSMS to quickly delete the messages as they are processed by the 2-way command processor. And we've added some additional information to the debug log to help troubleshoot should the problem remain.

I've posted the update to http://www.nowsms.com/download/nowsmsupdate.zip. It identifies itself as version 2009.08.13.

Let's see first if the quicker delete of the processed files resolves the problem. If it doesn't, then enable the SMSDEBUG.LOG, and we'll have to take a closer look at it.

--
Des
NowSMS Support
Nikos Mavrakis
New member
Username: Nmavra

Post Number: 28
Registered: 05-2007
Posted on Saturday, August 15, 2009 - 10:06 am:   

Des,
I sent you the debug logs as you requested, we still haven't been able to solve the issue.

For around 14 hours everything seemed to run fine, but then the issue appeared again.

As far as the antivirus is concerned, I disabled it a long time ago, and we had also excluded the NowSMS folder from scanning.

Hopefully you will be able to find something in the debug log that explains the issue.
Des - NowSMS Support
Board Administrator
Username: Desosms

Post Number: 1145
Registered: 08-2008
Posted on Saturday, August 15, 2009 - 10:50 pm:   

Hi Nikos,

I've been studying your debug log today, and I've been simultaneously running heavy stress tests against the 2-way command processor.

I was very confused by some of what I saw in SMSDEBUG2.LOG. In particular, toward the end of the log, the log entries that occurred while you were shutting down the system were very interesting.

I was confused at first, but then I realised what was happening.

Basically, it is taking a long time for your ASP script to return a response.

Sometimes it is very fast.

But most of the time it takes 8 to 15 seconds to complete.

If you allocate more threads to 2-way command processing, the response seems to get even slower.

This is what leads to the backlog. For each 2-way command processing thread in NowSMS, it waits for the response from the 2-way command before moving on to the next message.
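
The effect can be put into rough numbers. Because each thread handles one message at a time, the ceiling on throughput is roughly the thread count divided by the time per call. The sketch below is illustrative arithmetic only: it assumes the per-call time does not change with the thread count (this thread suggests it actually gets worse), and the function names are hypothetical.

```python
def max_throughput_per_min(threads, seconds_per_call):
    """Upper bound on messages per minute when each thread blocks on one call."""
    return threads * 60.0 / seconds_per_call


def backlog_growth_per_min(arrivals_per_min, threads, seconds_per_call):
    """Messages added to the queue each minute when arrivals exceed capacity."""
    capacity = max_throughput_per_min(threads, seconds_per_call)
    return max(0.0, arrivals_per_min - capacity)
```

With 10 threads and 8 to 15 seconds per call, the ceiling works out to between 75 and 40 messages per minute. If receipts arrived at the 120 per minute licence rate mentioned earlier in the thread (an assumed upper bound, not a measured figure), the queue would grow by 45 to 80 messages every minute.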

I don't see anything wrong on the NowSMS side. I set up a test system with a 2-way command ASP script that did nothing. With keep-alive sockets disabled, and without any optimisation of the configuration settings, NowSMS was processing 150 messages per second via the 2-way command facility.

I think the problem on your system is the speed at which the 2-way command is processing, and the only way to resolve the problem is to speed up the processing of the ASP script that is processing the receipts.

What I don't understand is why restarting NowSMS would make a difference ... unless the ASP script is also getting overwhelmed by NowSMS sending it so many requests simultaneously. You might get better results configuring 2WaySMSThreadCount=1. (If this parameter is not present, NowSMS defaults to allocating 10 threads for 2-way command processing.)

That said ... I did notice one other thing that may be helpful.

As I understand the problem that you are facing ... the major inconvenience is that the processing of normal messages (not receipts) is being severely delayed.

Earlier you noted this:


quote:

one thing that I have just noticed is that all the incoming SMS files in the SMS-IN folder, have the extension .in, while the delivery receipts have .IN and .rec.




Studying your log, I realised that the receipts that have the ".IN" extension are non-delivery receipts that NowSMS generates itself when an SMSC connection rejects a message.

We fixed that so that these receipts have a ".REC" extension instead.

This should have a positive impact on your system. ".REC" files will still be delayed because the ASP script is taking too long to process them.

However, the ".IN" files are processed as a separate queue, so the fact that there is a backlog of ".REC" files should not impact the processing of the ".IN" files.

I've updated http://www.nowsms.com/download/nowsmsupdate.zip with version 2009.08.15 which makes this change.

I still expect you to see a backlog of ".REC" message processing, but the ".IN" files should not be backlogged.

--
Des
NowSMS Support
Nikos Mavrakis
New member
Username: Nmavra

Post Number: 30
Registered: 05-2007
Posted on Monday, August 24, 2009 - 11:29 am:   

Hello Des,
it's been a while since I got back to you, because after applying the new patch I wanted to let it run for a week before drawing conclusions.

The situation now is as follows: the .in files are processed just fine, and the .rec files are stuck in the SMS-IN folder, within the subfolders that were introduced in the patch you sent us.

However, delivery reports play a major role in our business so we need them.
You said the script takes too long to process them, but our script seems to be doing a very simple job so I don't know how to make it run faster.

What comes to my mind is the fact that our MSSQL server is under very heavy load at times so perhaps this is one reason for the script to delay. Perhaps the script tries to update the database table containing the delivery reports, but because the db is too busy it fails. Do you think this is the case? I'm puzzled though because if this was the case, the script processing the .in messages would be stuck as well (as was the case prior to the patch).

Also, this happens even in times when the db server is not stressed at all (cpu utilization around 5-7%).

One other thing is that if I reset our IIS, the processing will start again for a very short period of time, perhaps 5 or 10 minutes.

The last 7-8 days we have not seen any .in files get stuck, but most of our delivery reports are not processed. The two scripts run on the same server, and connect to the same database which resides on a different (dedicated) server obviously.

We have also tried to host the 2-way scripts on a different server, but the result is the same.

Could you think of a way to speed up the processing script?
Nikos Mavrakis
New member
Username: Nmavra

Post Number: 31
Registered: 05-2007
Posted on Tuesday, August 25, 2009 - 11:03 am:   

One more thing I would like to ask you is if it is possible to ignore some .rec messages that contain a specific text.

For example we receive many .rec messages that contain the text "enroute", which are useless to us. These do not need to be processed, so we would benefit from the decreased processing load.
Des - NowSMS Support
Board Administrator
Username: Desosms

Post Number: 1161
Registered: 08-2008
Posted on Tuesday, August 25, 2009 - 02:04 pm:   

Hi Nikos,

It is likely database related.

However, it may not be the database server itself that is overloaded ... it may just be the number of transactions against this particular database that is too much.

If you're updating the database, the database server is going to lock the database while it performs the update. So the more threads that are sending requests with database updates, the more threads that are going to be waiting.

In your case, allocating more threads to 2-way processing in NowSMS won't make much difference, because they seem to be spending most of their time waiting for the script to finish. The scripts are likely waiting for other updates to finish and the database locks to be released.

The only reason that allocating more threads helped before is that the ".REC" files are what take a long time for the scripts to process. Some of the ".REC" files were appearing as ".IN" files, which caused the ".IN" processing to block.

I don't have any experience optimising MSSQL, so I don't know what to suggest.

I would suggest that you modify your script to not update the database for "ENROUTE" receipts. That should be a simple script change, and I think you'll see some performance benefit from that.
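
The shape of that change is an early return before any database work. The real script in this thread is ASP and is not shown, so the following Python sketch is only an illustration: it assumes the receipt text contains the word "enroute" for these statuses, and `update_status` stands in for whatever routine performs the database update.

```python
def should_update_database(receipt_text):
    """False for ENROUTE receipts, which carry no final delivery status."""
    return "enroute" not in receipt_text.lower()


def handle_receipt(receipt_text, update_status):
    """Skip ENROUTE receipts; pass everything else to the database update.

    update_status is a hypothetical callable that writes the receipt to
    the database. Returns True when the database was updated.
    """
    if not should_update_database(receipt_text):
        return False  # respond immediately, no database round trip
    update_status(receipt_text)
    return True
```

The point of returning before the update is that the 2-way command responds at once for these receipts, so the thread that sent the request is freed for the next message.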

Then you probably need to look at the database table that you are using. Visit some web sites that talk about MSSQL optimisations.

Some general tips that I've seen from working with other SQL-based databases: make sure you have indexes on the right fields, so that lookups are fast (but avoid using too many indexes, as this will slow down inserts and updates).

How big is your database currently?

Can you post the SQL part of your script that does the update?

This might be a good place to start for optimisation info:

http://msdn.microsoft.com/en-us/library/ms998577.aspx

--
Des
NowSMS Support
Nikos Mavrakis
New member
Username: Nmavra

Post Number: 32
Registered: 05-2007
Posted on Wednesday, August 26, 2009 - 09:47 am:   

Hello Des, thanks a lot for your help so far.
We are constantly looking at DB optimisation and have fine-tuned a lot of our queries. The difficult thing to troubleshoot is why the system sometimes gets stuck even when the DB server is at low load.

Anyhow, our script for the delivery receipts that contain the ENROUTE part is already ignoring those files (it does not pass any info to the database). What I meant was whether there is some way to ignore them completely, for example by having another script that only processes the ENROUTE delivery reports (and obviously just deletes them), so that the "useful" delivery reports would be processed by our main script with less chance of getting stuck. Or perhaps the gateway could have an option to delete the ENROUTE DRs in the same way that it deletes the ERR files automatically.

Finally, the change you made for the IN files in the previous patch helped us substantially, and we very rarely see any .in files stuck.
Des - NowSMS Support
Board Administrator
Username: Desosms

Post Number: 1169
Registered: 08-2008
Posted on Wednesday, August 26, 2009 - 02:45 pm:   

Nikos,

I went back and looked at the log files that you sent me previously.

The good news is that the ENROUTE messages are being handled efficiently. On average, it takes around 30 milliseconds for NowSMS to process the message, including the time to connect to your script. A single 2-way thread could process around 30 of these per second.

There may be a lot of them in the queue waiting to be processed because the others get backlogged. But the good news is that when they get their turn, they'll clear out of the queue pretty quickly.

I'm not convinced that an option to automatically delete the ENROUTE statuses would offer any significant performance benefit, as from what I can see they get processed very quickly when their turn comes around.

It's the other receipts that take 8 to 15 seconds to process (sometimes much longer) that are problematic. The only way your system can ever catch up is if there are prolonged periods where you don't receive any receipts.

There's got to be something fundamentally wrong with the database structure. When you update the message status in your database, it shouldn't be updating any indexed fields, so it should be relatively quick.

The only thing I can figure is that the update is updating fields it doesn't need to ... or it is getting locked out by inserts that are happening at the same time.

It seems to me that someone in another thread was recently discussing their strategy for keeping up with delivery receipts because they were having problems.

If I recall, their 2-way command was writing receipt information to a temporary database (or it may have been a simple log file). They had a separate program running that scanned this temporary database and did bulk updates to the database.

Typically you can make 10 or more updates as part of a bulk update (i.e., BEGIN TRANSACTION ... update, update, update ... COMMIT TRANSACTION) much faster than you can make 10 individual updates in separate transactions.

This technique allows them to keep up.
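
A minimal sketch of the bulk-update half of that approach follows. It uses Python's built-in `sqlite3` module purely as a stand-in for the real database server, and the `messages` table, its columns and the status values are hypothetical. The receipts passed in would come from whatever temporary store the 2-way command writes to.

```python
import sqlite3


def apply_receipts_in_bulk(conn, receipts):
    """Apply (message_id, status) pairs inside a single transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(
            "UPDATE messages SET status = ? WHERE message_id = ?",
            [(status, message_id) for message_id, status in receipts],
        )


def demo():
    """Build a throwaway in-memory table and apply two queued receipts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages (message_id TEXT PRIMARY KEY, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO messages VALUES (?, 'SENT')",
        [("m1",), ("m2",), ("m3",)],
    )
    conn.commit()
    apply_receipts_in_bulk(conn, [("m1", "DELIVRD"), ("m2", "UNDELIV")])
    return dict(conn.execute("SELECT message_id, status FROM messages"))
```

The 2-way command then only has to append each receipt to the temporary store, which is cheap, while a separate job drains that store in batches. Whether this helps on a given system depends on where the time is actually going, so it is worth measuring before and after.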

--
Des
NowSMS Support
Nikos Mavrakis
New member
Username: Nmavra

Post Number: 33
Registered: 05-2007
Posted on Monday, August 31, 2009 - 08:54 am:   

Hello Des,
We are experimenting with the option of handling delivery reports using a different server and the processing is happening very fast, thanks for the suggestion!

Our problem of course remains that the database server is under heavy load during peak times, so it still takes time to process all reports. Our systems are quite complex and one delivery report has to update many tables (all of which are needed), so it's not really possible to speed things up.

The important thing is that we finally found what causes the delay and have already ordered a new database server.

Thank you for your suggestions and the latest patch, I will be in touch if I have something more to add!