MMSC services keep restarting

Andy Rank
New member
Username: Arank

Post Number: 1
Registered: 05-2010
Posted on Tuesday, May 25, 2010 - 03:50 pm:   

Hello,

We had an issue yesterday with our MMSC services constantly restarting (several times a minute), which stopped our MMSC from processing MMS messages. We are running NowSMS/MMS V2010.02.09. We have moved to a backup MMSC and everything is operating normally now, but we would like to know what caused this issue. Please let me know what log files you need to investigate it.

Thank you,
Andy
Des - NowSMS Support
Board Administrator
Username: Desosms

Post Number: 2158
Registered: 08-2008
Posted on Tuesday, May 25, 2010 - 04:04 pm:   

Hi Andy,

Please explain the process that you followed in moving to a backup.

Did you move configuration and data files over to the backup? Or is the backup not running with current configuration/data files?

I'm just trying to determine what would be different about the backup machine that would allow it to run without problems.

Does the original server exhibit this same behaviour now that it is no longer actively receiving messaging traffic?

I haven't seen any problems like the one you describe in a long time. But in the past, I have seen a corrupt message queue cause problems, in particular the MM4 queue.

In particular, I'd want to see if there are any files stuck in the "MMSCIN", "MMSCOUT" and/or "VASPQ" directories.
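As a rough illustration of that check, here is a minimal Python sketch that lists files that have been sitting in those three directories longer than a given age. The base directory and the 10-minute threshold are assumptions, not NowSMS defaults, so adjust them to your installation.

```python
import time
from pathlib import Path

# Queue directory names taken from the post above.
QUEUE_DIRS = ("MMSCIN", "MMSCOUT", "VASPQ")


def find_stuck_files(base_dir, min_age_seconds=600, now=None):
    """Return (path, age_in_seconds) for files older than min_age_seconds.

    base_dir is the NowSMS data directory (an assumption; adjust to your install).
    Queue directories that do not exist are skipped silently.
    """
    now = time.time() if now is None else now
    stuck = []
    for name in QUEUE_DIRS:
        queue = Path(base_dir) / name
        if not queue.is_dir():
            continue
        for path in sorted(queue.rglob("*")):
            if path.is_file():
                age = now - path.stat().st_mtime
                if age >= min_age_seconds:
                    stuck.append((path, age))
    return stuck
```

Calling find_stuck_files() with the path to your NowSMS directory returns an empty list when the queues are draining normally. Anything it does return is a candidate to zip up and send in.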

Beyond that, I'd want to know more about your configuration, especially external links.

And we'd want to enable the MMSCDEBUG.LOG to give us some clues as to what is being processed when this restart is triggered.

If you have any files to send me (from the above mentioned directories, or an MMSCDEBUG.LOG), please send them to nowsms@nowsms.com with "Attention: Des" in the subject line.

--
Des
NowSMS Support
Andy Rank
New member
Username: Arank

Post Number: 2
Registered: 05-2010
Posted on Tuesday, May 25, 2010 - 04:12 pm:   

Hi Des,

I will see what files I can get off of the machine for you. To answer your backup question: our MMSC runs on a VMware virtual server, and on 5/21 I made a clone of the MMSC, which is what we moved over to, so it is an exact copy from that date. Our issue did not happen until 5/25, so the backup is running without any issues. I had also opened a ticket with our GRX provider, thinking that they were having a connection issue with our MMSC, but after investigation we found the issue to be the MMSC itself.

Our original MMSC that experienced the issues is currently powered off since we have the backup running. I hope this information helps.

Thanks,
Andy
Andy Rank
New member
Username: Arank

Post Number: 3
Registered: 05-2010
Posted on Tuesday, May 25, 2010 - 07:22 pm:   

Hi Des,

I checked the folders that you mentioned. The MMSCIN and MMSCOUT folders were empty; however, there was an MMS stuck in the VASPQ folder, which I have attached. I am assuming that this is the file that caused our problem? Let me know if you need anything else.

Thanks,
Andy

Attachment: VASPQ Files, VASPQ.rar (1.7 k)
Des - NowSMS Support
Board Administrator
Username: Desosms

Post Number: 2160
Registered: 08-2008
Posted on Tuesday, May 25, 2010 - 07:48 pm:   

Hi Andy,

I don't recall any situation where a file in the VASPQ directory triggered restarts, so I was skeptical that it was the cause of the problem.

However, I can confirm with certainty that it is the source of the problem.

We are investigating further.

--
Des
NowSMS Support
Des - NowSMS Support
Board Administrator
Username: Desosms

Post Number: 2161
Registered: 08-2008
Posted on Tuesday, May 25, 2010 - 09:11 pm:   

A follow-up Andy ...

We have successfully identified the problem and have a preliminary fix for it.

However, it's not clear how the message got as far as it did. In other words, there's a question of how this corrupt message got there in the first place.

I'm assuming that the message originated with one of your local users. If a local user had submitted the message in this state, it would have been rejected because it is corrupt.

This suggests that the message would have had to become corrupt during the routing process, but the pattern of corruption is just not consistent with that.

Is there any chance that your system ran out of disk space?

This particular message was created during the 18:00 hour (6pm). Any MMSC-20100524.LOG entries that you have referring to it may be helpful.

At this point, we're considering it a freak occurrence, and testing the fix that prevents a corrupt VASPQ message from causing a crash.

However, if there is a scenario that causes this corrupt message to be generated, we need to further evaluate it.

I will post a follow-up after more testing.

--
Des
NowSMS Support
Andy Rank
New member
Username: Arank

Post Number: 4
Registered: 05-2010
Posted on Tuesday, May 25, 2010 - 09:25 pm:   

Thanks for the update. I know that our system did not run out of disk space. Yes, that message originated from one of our users trying to send over our intercarrier connection to Syniverse to another user. I have attached the log file from yesterday, hopefully that will help further. Let me know if you need anything else.

Thanks,
Andy

Attachment: 052410 MMSC Log, MMSC-20100524.LOG (991.7 k)
Des - NowSMS Support
Board Administrator
Username: Desosms

Post Number: 2162
Registered: 08-2008
Posted on Tuesday, May 25, 2010 - 10:17 pm:   

Hi again Andy,

Thank you for the follow-up information.

We can now see that this message was already corrupt when it came in from the client. Specifically, there was garbage data in the recipient field: one valid recipient followed by garbage data.

When NowSMS reformatted the message header, the particular corruption in the recipient header caused the message to be further corrupted.

We've got a theory of what happened.

This person received a corrupt MMS message that originated with a subscriber on another network (probably Verizon). The message contained corrupt SMIL (markup) because of some suspect MM4 formatting at an interconnect point. (The problem is described here: http://support.nowsms.com/discus/messages/485/51075.html )

The recipient later tried to forward this message, but doing so caused unexpected behaviour in the MMS client, which caused the client to generate an MMS message with a corrupt header. (The MMS message in question does have a "FW:" tag in it, and there's no other logical reason why the message would be corrupt.)

When NowSMS parsed the header, this particular pattern of corruption caused further corruption as the message was routed.

Unfortunately, recreating this scenario is a pretty tall order.

We did previously implement a work-around to address the corrupt SMIL issue described in the other thread.

As a result of this current incident, we are now in the process of applying two additional fixes.

The first fix improves the algorithm that detects invalid MMS messages so that messages with corrupt headers are not accepted. (Actually, in this case, we can be a little lenient and ignore the invalid recipient data.)
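To illustrate the lenient approach, here is a minimal Python sketch that keeps the plausible recipients and drops the rest, rejecting the message only when nothing valid remains. This is not the NowSMS algorithm; the function name and the two address patterns (a phone number with an optional "/TYPE=PLMN" suffix, or a simple e-mail address) are assumptions for illustration.

```python
import re

# Assumed address shapes: a phone number with an optional "/TYPE=PLMN" suffix,
# or a simple e-mail address. Real MMS address rules are broader than this.
_PHONE = re.compile(r"\+?\d{3,20}(/TYPE=PLMN)?")
_EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def filter_recipients(raw_header):
    """Split a comma-separated recipient header and keep only plausible entries.

    Raises ValueError when no valid recipient remains, so the message can be
    rejected at submission instead of being routed.
    """
    valid = []
    for candidate in raw_header.split(","):
        candidate = candidate.strip()
        if _PHONE.fullmatch(candidate) or _EMAIL.fullmatch(candidate):
            valid.append(candidate)
    if not valid:
        raise ValueError("no valid recipient in header")
    return valid
```

With a header like the one in this incident (one valid recipient followed by garbage), the sketch returns just the valid recipient, so the message can still be delivered without the garbage being carried into the reformatted header.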

The second fix is to prevent the restart problem if a corrupt message sneaks into the outbound MM4 queue.
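The general idea behind this kind of fix can be sketched as a quarantine step: a file that cannot be processed is moved out of the queue so the service does not trip over it again on the next pass. The Python below is only an illustration under assumed names (drain_queue, the handler callback, and the quarantine directory are all hypothetical), not how NowSMS implements it.

```python
import shutil
from pathlib import Path


def drain_queue(queue_dir, quarantine_dir, handler):
    """Run handler on each queued file; quarantine files the handler rejects.

    handler is a hypothetical callable that takes the file's bytes and raises
    an exception when the message cannot be parsed or routed.
    Returns (processed_names, quarantined_names).
    """
    queue = Path(queue_dir)
    quarantine = Path(quarantine_dir)
    quarantine.mkdir(parents=True, exist_ok=True)
    processed, quarantined = [], []
    for path in sorted(p for p in queue.iterdir() if p.is_file()):
        try:
            handler(path.read_bytes())
        except Exception:
            # Move the bad file aside so the next pass does not hit it again.
            shutil.move(str(path), str(quarantine / path.name))
            quarantined.append(path.name)
        else:
            path.unlink()
            processed.append(path.name)
    return processed, quarantined
```

Note that this only covers failures that surface as exceptions. A hard crash in native code would still need the parser itself to be fixed, which is why both fixes are needed.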

We'll post an update in about 24 to 48 hours with these two fixes.

--
Des
NowSMS Support
Des - NowSMS Support
Board Administrator
Username: Desosms

Post Number: 2166
Registered: 08-2008
Posted on Wednesday, May 26, 2010 - 03:11 pm:   

Hi Andy,

I've posted an update that addresses these issues. It is available for download at http://www.nowsms.com/download/nowsms20100525.zip.

However, this version has not yet gone through all of our testing procedures.

At this point, I would recommend downloading it, but installing it only if problems re-occur. We should complete the testing process in another 24 to 36 hours.

--
Des
NowSMS Support
Andy Rank
New member
Username: Arank

Post Number: 5
Registered: 05-2010
Posted on Wednesday, May 26, 2010 - 03:49 pm:   

Hi Des,

Thanks for all your help with this. Just out of curiosity, if the MMS was already corrupt before our customer received it and tried to forward it, how did the other carrier's MMSC allow this corrupt message to go through without any issues? I will download the update, but I will wait to install it until I hear from you that testing is complete, unless we experience this issue again in the meantime.

Thanks again,
Andy
Des - NowSMS Support
Board Administrator
Username: Desosms

Post Number: 2168
Registered: 08-2008
Posted on Wednesday, May 26, 2010 - 05:15 pm:   

Hi Andy,

What you basically have is a trail of problems, where a very minor corruption problem has a snowball effect as the message gets converted between different formats.

At the other operator's MMSC, everything is fine. There is no problem with the message.

When the message gets converted to MM4, it gets reformatted. And it appears that some other transformations get applied as it goes through one or more MM4 interconnects.

When it gets to your system, there is a small problem. One of the transformations that happens to some messages results in a header that doesn't strictly follow the MIME rules.

This small problem results in NowSMS misinterpreting the message, at which point NowSMS corrupts the SMIL presentation in the MMS message body.

Now here's where it gets theoretical. The user in question submitted an MMS with garbage in the header. I'm guessing that the above corruption led the user's MMS client to generate an MMS message with garbage in the header.

When NowSMS receives the message, it does a message format check. The garbage in the header technically follows the formatting rules. However, when NowSMS tries to process the header to route the message, it does not expect the garbage, and it then generates an MMS message with a corrupt header. This leads to the crash.


Don't get me wrong. It's our problem. An MMSC needs to be able to handle the unexpected.

The small problem with the original message should not cause the SMIL to be corrupted.

The garbage in the MMS message header should not cause an MMS message to be generated with an invalid header format.

The MMS message with the invalid header format should not cause the MMSC to crash.

All 3 of those issues are bugs that we need to address. It's just an odd combination of factors that appear to have led to this problem.

--
Des
NowSMS Support
Andy Rank
New member
Username: Arank

Post Number: 6
Registered: 05-2010
Posted on Wednesday, May 26, 2010 - 08:19 pm:   

Hi Des,

Thank you, that was very helpful in understanding this issue. Let me know when you have finished your testing with the patch and I will install it on our MMSC.

Thanks,
Andy