Segments of the same message submitting to different SMSCs

ashot shahbazian
Frequent Contributor
Username: Animatele

Post Number: 59
Registered: 06-2004
Posted on Thursday, November 19, 2009 - 02:45 pm:   

Bryce, Des:

We've been noticing this occasionally:

Received from the customer on the 1st server and submitted to 2nd server upstream:
2009-11-12 13:53:42,SAR-+7918554xxxx-6a-2-1.req,127.0.0.1,+7918554xxxx,OK -- SMPP - 00russag:1111,SubmitUser=dixxxx;Sender=+BANK;SMSCMsgId=SAR-+7918554xxxx-6a-2-1;UDH=0500036A0201;Text="OPLATA NA SUMMU
2009-11-12 13:53:43,SAR-+7918554xxxx-6a-2-2.req,127.0.0.1,+7918554xxxx,OK -- SMPP - 00russag:1111,SubmitUser=dixxxx;Sender=+BANK;SMSCMsgId=SAR-+7918554xxxx-6a-2-2;UDH=0500036A0202;Text="ROSTOV-NA-DON

Second segment delivered, first not (per the receipts):
2009-11-12 13:53:54,7EE846B1.req,,+BANK,OK -- LocalUser:dixxxx,SubmitUser=SMPP - 00russag#2:1111;Sender=+7918554xxxx;Text="id:SAR-+7918554xxxx-6a-2-2 sub:001 dlvrd:001 submit date:0911121353 done date:0911121353 stat:DELIVRD err:000"
2009-11-12 14:03:54,7EE86DFC.req,,+BANK,OK -- LocalUser:dixxxx,SubmitUser=SMPP - 00russag#2:1111;Sender=+7918554xxxx;Text="id:SAR-+7918554xxxx-6a-2-1 sub:0 dlvrd:0 submit date:0911121353 done date:0911121403 stat:UNDELIV err:0 Text:"

Submitted upstream to the SMSCs from the second server. The first segment went to one SMSC (smsc1:2222):
2009-11-12 13:53:42,SAR-+7918554xxxx-6a-2-1.req,10.0.0.6,+7918554xxxx,OK -- SMPP - smsc1#2:2222,SubmitUser=00russag;Sender=BANK;SMSCMsgId=679299002;UDH=0500036A0201;Text="OPLATA NA SUMMU

The second segment was submitted to another SMSC (smsc2:3333):
2009-11-12 13:53:43,SAR-+7918554xxxx-6a-2-2.req,10.0.0.6,+7918554xxxx,OK -- SMPP - smsc2:3333,SubmitUser=00russag;Sender=BANK;SMSCMsgId=5403168604891832374;UDH=0500036A0202;Text="ROSTOV-NA-DON

Second segment delivered, first failed (per the receipts):
2009-11-12 13:53:52,54158F11.req,,BANK,OK -- LocalUser:00russag,SubmitUser=SMPP - smsc2:3333;Sender=+7918554xxxx;Text="id:SAR-+7918554xxxx-6a-2-2 sub:001 dlvrd:001 submit date:0911121353 done date:0911121353 stat:DELIVRD err:000"
2009-11-12 14:03:52,541596AE.req,,BANK,OK -- LocalUser:00russag,SubmitUser=SMPP - smsc1#4:2222;Sender=+7918554xxxx;Text="id:SAR-+7918554xxxx-6a-2-1 sub:0 dlvrd:0 submit date:0911121353 done date:0911121403 stat:UNDELIV err:0 Text:"
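
For anyone comparing receipts like the ones above, here is a minimal Python sketch that pulls the fields out of the receipt text. It assumes the receipt follows the common "id: sub: dlvrd: submit date: done date: stat: err:" layout seen in these log lines; the function name parse_receipt is just an illustration, not anything from NowSMS.

```python
import re

# Receipt layout as it appears in the log lines above (an assumption that
# other SMSCs format their receipts the same way).
RECEIPT = re.compile(
    r"id:(?P<id>\S+) sub:(?P<sub>\d+) dlvrd:(?P<dlvrd>\d+) "
    r"submit date:(?P<submit>\d+) done date:(?P<done>\d+) "
    r"stat:(?P<stat>\w+) err:(?P<err>\w+)"
)


def parse_receipt(text):
    """Return the receipt fields as a dict, or None if no receipt is found."""
    match = RECEIPT.search(text)
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = ('id:SAR-+7918554xxxx-6a-2-1 sub:0 dlvrd:0 '
              'submit date:0911121353 done date:0911121403 stat:UNDELIV err:0 Text:')
    receipt = parse_receipt(sample)
    print(receipt["id"], receipt["stat"])
```

Because the message id still carries the SAR-...-ref-total-seq name, the last number of the id tells which segment a receipt belongs to.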

We have no tools to track this, because we had assumed that all segments of a message always submit over the same link. So I can't say how often it is happening, but we receive about one such complaint every 4-5 weeks. I think most have involved messages with alphanumeric source addresses.
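
A rough sketch of how such tracking could be scripted in Python against the SMSOUT log. It assumes the line layout shown in the samples above, and it assumes that the "#n" after the host name is only a session index, so smsc1#2:2222 and smsc1#4:2222 count as the same SMSC. The names SAR_LINE and find_split_messages are made up for this example.

```python
import re
from collections import defaultdict

# Assumed line shape, taken from the samples in this thread:
# <timestamp>,SAR-<dest>-<ref>-<total>-<seq>.req,<ip>,<dest>,OK -- SMPP - <host>[#n]:<port>,...
SAR_LINE = re.compile(
    r"^[^,]+,"
    r"SAR-(?P<dest>[^-,]+)-(?P<ref>[^-,]+)-(?P<total>\d+)-(?P<seq>\d+)\.req,"
    r"[^,]*,[^,]*,OK -- SMPP - (?P<host>[^#:,]+)(?:#\d+)?:(?P<port>\d+),"
)


def find_split_messages(lines):
    """Return {(dest, ref, total): {seq: 'host:port'}} for messages whose
    segments were submitted over more than one SMSC."""
    routes = defaultdict(dict)
    for line in lines:
        match = SAR_LINE.match(line)
        if match is None:
            continue  # receipts and non-segmented messages are skipped
        key = (match["dest"], match["ref"].lower(), int(match["total"]))
        routes[key][int(match["seq"])] = f'{match["host"]}:{match["port"]}'
    return {k: v for k, v in routes.items() if len(set(v.values())) > 1}


if __name__ == "__main__":
    import sys
    # Usage (hypothetical file name): python find_split.py SMSOUT.LOG
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for key, segments in find_split_messages(log).items():
            print(key, segments)
```

Note that the 8-bit reference number repeats over time, so on a long log the same (destination, reference, total) key can belong to several messages; splitting the log by hour before scanning would reduce false positives.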

The BANK Source Address is not in the Sender Address field for either of the uplinks (if it were, this would have been happening to a lot more segmented messages, unless you've fixed it).

There is no AllowedUser restriction for these uplinks.

The +7 is not in the RoutePrefOnly lists.

RoutePrefOnly is not set for the uplinks. However, each uplink has a couple of hundred prefixes defined as RouteX=yyyy (the prefixes differ between uplinks) to force messages for particular destinations to route over that uplink only (there are indications that this isn't always performed accurately). Where no prefix is set, messages load-share across the different uplinks, as is the case with Russia (+7).

The closest matches in prefix lists to the number in question are:
On smsc2:
Route159=+7705*
Route160=+7777*

On smsc1:
Route114=+70*
Route115=+77*
Route116=+7701*
Route117=+7702*
Route118=+7705*
Route119=+7777*

Not sure why 7705 and 7777 are across both, but that’s Kazakhstan, not Russia.
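
As a quick check that none of the prefixes listed above can claim the number in question, here is a small Python sketch. It assumes the Route entries behave as simple trailing-wildcard patterns, which may not be exactly how NowSMS evaluates them, and it only includes the closest matches quoted above, not the full lists. ROUTES and matching_uplinks are names invented for the example.

```python
from fnmatch import fnmatchcase

# Only the "closest match" prefixes quoted above, not the full lists.
ROUTES = {
    "smsc1": ["+70*", "+77*", "+7701*", "+7702*", "+7705*", "+7777*"],
    "smsc2": ["+7705*", "+7777*"],
}


def matching_uplinks(number, routes=ROUTES):
    """Return {uplink: [patterns matching number]} for uplinks with a match."""
    hits = {}
    for uplink, patterns in routes.items():
        matched = [p for p in patterns if fnmatchcase(number, p)]
        if matched:
            hits[uplink] = matched
    return hits


if __name__ == "__main__":
    # The masked number from the logs: no listed prefix matches it.
    print(matching_uplinks("+7918554xxxx"))
    # A Kazakhstan-style prefix matches on both uplinks.
    print(matching_uplinks("+7705123"))
```

With these prefixes the +7918 number matches nothing, so it would fall through to load-sharing across both uplinks, which is consistent with what the logs show.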

The NowSMS version on the second (last in the chain before SMSC-s) server is 2009.07.09

Perhaps there was a patch for this already and we’ve missed it? I think someone else has been asking about a similar problem.

Kind regards,
Ashot
Bryce Norwood - NowSMS Support
Board Administrator
Username: Bryce

Post Number: 7869
Registered: 10-2002
Posted on Friday, November 20, 2009 - 08:32 pm:   

Hi Ashot,

I think this is a different issue. I do recall another issue, but it was resolved prior to 2009.07.09.

I'm suspicious regarding the timings. I think the problem is that we've already dispatched part 1 by the time we receive part 2. That presents a problem for the tracking.

I was concerned about this possibility when we encountered the earlier problem. But at the time, we didn't run any tests to try to confirm that this was a problem.

Clearly, this scenario is a problem.

We need to spend some time working on a solution.

-bn
ashot shahbazian
Frequent Contributor
Username: Animatele

Post Number: 61
Registered: 06-2004
Posted on Saturday, November 21, 2009 - 01:17 am:   

Hi Bryce,

Could this be because of the stray .lck file issue? Perhaps in some cases you are deleting .lck files earlier than needed? Or not creating them quickly enough?

This could be happening while we were having intermittent connection problems or delays (slow submit_sm_resp) to the 1st SMSC, smsc1#2:2222.

Here's what's in the SMSWEB log of the 2nd in chain server for this message:

2009-11-12 13:53:41,10.0.0.6,00russag,SAR-+7918554xxxx-6a-2-1.req,Binary
2009-11-12 13:53:42,10.0.0.6,00russag,SAR-+7918554xxxx-6a-2-2.req,Binary

But the SMSOUT log writes its record on receipt of the submit_sm_resp, and both records are only 1 sec. apart from the message receipt on the server, so the SMSC delay could not be the reason.

Obviously, the clocks on this server and the 1st one are not in sync.

I know this must be tough to troubleshoot, I’m not even sure how to recreate this scenario. Let me dig for more of these in the logs.

Amazing! This never happens with any segmented messages except those from one particular customer. All fit exactly the same pattern; here are 3 examples from the SMSOUT and respective SMSWEB logs:

2009-11-21 00:04:56,SAR-+7916133xxxx-99-2-1.req,10.0.0.6,+7916133xxxx,OK -- SMPP - smsc2:3333,SubmitUser=00russag;Sender=Bank;SMSCMsgId=5406294800549939766;DCS=8;UDH=050003990201;Text=""
2009-11-21 00:04:59,SAR-+7916133xxxx-99-2-2.req,10.0.0.6,+7916133xxxx,OK -- SMPP - smsc1#10:2222,SubmitUser=00russag;Sender=Bank;SMSCMsgId=687563595;DCS=8;UDH=050003990202;Text=""
2009-11-21 00:04:56,10.0.0.6,00russag,SAR-+7916133xxxx-99-2-1.req,Binary
2009-11-21 00:04:58,10.0.0.6,00russag,SAR-+7916133xxxx-99-2-2.req,Binary


2009-11-21 00:16:17,SAR-+7911829xxxx-9e-2-1.req,10.0.0.6,+7911829xxxx,OK -- SMPP - smsc2:3333,SubmitUser=00russag;Sender=Bank;SMSCMsgId=5406297725422758454;DCS=8;UDH=0500039E0201;Text=""
2009-11-21 00:16:18,SAR-+7911829xxxx-9e-2-2.req,10.0.0.6,+7911829xxxx,OK -- SMPP - smsc1#5:2222,SubmitUser=00russag;Sender=Bank;SMSCMsgId=687566174;DCS=8;UDH=0500039E0202;Text=""
2009-11-21 00:16:17,10.0.0.6,00russag,SAR-+7911829xxxx-9e-2-1.req,Binary
2009-11-21 00:16:18,10.0.0.6,00russag,SAR-+7911829xxxx-9e-2-2.req,Binary

2009-11-21 00:50:42,SAR-+7962101xxxx-a8-2-1.req,10.0.0.6,+7962101xxxx,OK -- SMPP - smsc2:3333,SubmitUser=00russag;Sender=Bank;SMSCMsgId=5406306590235377462;DCS=8;UDH=050003A80201;Text=""
2009-11-21 00:50:43,SAR-+7962101xxxx-a8-2-2.req,10.0.0.6,+7962101xxxx,OK -- SMPP - smsc1#2:2222,SubmitUser=00russag;Sender=Bank;SMSCMsgId=687571985;DCS=8;UDH=050003A80202;Text=""
2009-11-21 00:50:41,10.0.0.6,00russag,SAR-+7962101xxxx-a8-2-1.req,Binary
2009-11-21 00:50:42,10.0.0.6,00russag,SAR-+7962101xxxx-a8-2-2.req,Binary

1. All are segmented Unicode
2. All have a “Bank” source addr
3. Despite being in Unicode, messages contain Latin letters only
4. The customer submits (to the 1st server) the source address with a wrong TON=1, but this seems to be corrected by the 1st server sending to this one (source addr in the 2nd server’s log does not contain the “+”)
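
For reference, the UDH values in the log lines above can be decoded with a few lines of Python. This is a minimal sketch of the standard concatenation header (information element 00 with an 8-bit reference, or 08 with a 16-bit reference); the function name parse_concat_udh is made up for the example.

```python
def parse_concat_udh(udh_hex):
    """Decode the concatenation element of a hex UDH string.

    Returns {'ref': ..., 'total': ..., 'seq': ...}, or None if the UDH
    carries no concatenation information element.
    """
    data = bytes.fromhex(udh_hex)
    if not data:
        return None
    end = min(1 + data[0], len(data))  # data[0] is the UDH length octet
    pos = 1
    while pos + 2 <= end:
        iei, length = data[pos], data[pos + 1]
        body = data[pos + 2:pos + 2 + length]
        if iei == 0x00 and len(body) == 3:   # 8-bit reference number
            return {"ref": body[0], "total": body[1], "seq": body[2]}
        if iei == 0x08 and len(body) == 4:   # 16-bit reference number
            return {"ref": int.from_bytes(body[:2], "big"),
                    "total": body[2], "seq": body[3]}
        pos += 2 + length                    # skip other elements
    return None


if __name__ == "__main__":
    info = parse_concat_udh("050003990201")
    print(format(info["ref"], "x"), info["total"], info["seq"])
```

For UDH=050003990201 this gives reference 0x99, 2 parts in total, part 1, which lines up with the -99-2-1 suffix in the SAR file name of the same log line.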

I hope I've narrowed it down. This is definitely not happening with other concats, while this thread seems to be simply load-sharing and ignoring the segmentation.

If you need more info, such as the raw PDU-s from the SMSIN logs or debug logs, I can send it by secure means only (upload via HTTPS or FTPS), as these messages contain bank transaction info. Or I can send it over Skype.

Kind regards,
Ashot
Bryce Norwood - NowSMS Support
Board Administrator
Username: Bryce

Post Number: 7871
Registered: 10-2002
Posted on Saturday, November 21, 2009 - 02:09 pm:   

Hi Ashot,

I've had consistent success recreating the problem simply by throttling a submitter so that it only submits a message every few seconds.

We essentially forget about the routing of a multipart message if there are no parts remaining in our queue.
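
To illustrate the idea only (this is not the NowSMS code, and every name in it is hypothetical): if the routing decision for a multipart message is remembered for a period of time rather than only while parts remain in the queue, a late part still follows the first one. A minimal Python sketch:

```python
import itertools
import time


class ConcatRouteMemory:
    """Remember which uplink a multipart message used, for ttl_seconds
    after its most recent part, even if the queue is empty in between."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._routes = {}  # (dest, ref, total) -> (uplink, expiry time)

    def route(self, dest, ref, total, choose_uplink):
        now = self._clock()
        # Forget entries only when they expire, not when the queue drains.
        self._routes = {k: v for k, v in self._routes.items() if v[1] > now}
        key = (dest, ref, total)
        if key in self._routes:
            uplink = self._routes[key][0]   # follow the earlier part
        else:
            uplink = choose_uplink()        # normal load-sharing decision
        self._routes[key] = (uplink, now + self._ttl)
        return uplink


if __name__ == "__main__":
    round_robin = itertools.cycle(["smsc1", "smsc2"])
    memory = ConcatRouteMemory(ttl_seconds=60)
    pick = lambda: next(round_robin)
    print(memory.route("+7916133xxxx", 0x99, 2, pick))  # part 1
    print(memory.route("+7916133xxxx", 0x99, 2, pick))  # part 2, same uplink
```

The key here is (destination, reference, total parts); a real implementation would probably also need the sender in the key and a sensible expiry value, both of which are guesses in this sketch.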

We're experimenting with an adjustment. It looks good so far, but we need to put it under some more load.

-bn
ashot shahbazian
Frequent Contributor
Username: Animatele

Post Number: 62
Registered: 06-2004
Posted on Saturday, November 21, 2009 - 09:45 pm:   

Bryce,

I'm positive that this is only happening with that "Bank" thread. I've spent hours searching (manually) through the log, which had a hundred times more concat messages with regular numeric (MSISDN with Source Addr TON=1) source addresses, and couldn't find a single misrouted one.

I'll ask the engineers to grep the logs to confirm it's not happening with regular messages.

There was no throttling, neither artificially set on the uplinks nor by the SMSC-s upstream - which is evident from no latency between WEB and OUT records.

Both SMSC definitions are verbose via the Hosts file. Both host names are 5-letter and begin with an "a". The smsc1 definition is:
- 12 TRX sessions with a window 25 on each
- 2 RX sessions. For some reason they had both "Receive" and "Support any Outbound" checked, but that hasn't made them send anything, as "Send and Receive" was not checked. I’ve corrected it now.
- The Sender Address field of every transceiver session has a few comma-separated numeric (some beginning with +7) and alphanumeric addresses. "BANK" is not among them, but the verbose host name is. For example, the SMSC hostname is smsc1#8:2222, and "smsc1" is also in the list of sender addresses.
- Segmentation method is set to Default (7-bit not checked)
- Dest TON and NPI are set to 1 and 1

The smsc2 definition is very similar. The differences are:
- 1 TRX session only
- No Dest TON/NPI override
- Sender Address field also contains a few addresses, but none is matching that for the smsc1 definitions

As I mentioned, both SMSC-s have relatively long lists in “Preferred SMSC Connection for,” but “Support any outbound traffic” is also checked for both.

The “BANK” thread is specific in that:
- it is with an Alpha sender
- it is Unicode although all characters are Latin
- it comes with a wrong Source Address TON on the 1st server, which is being corrected, I think, by the second server.

Note that we don't use the Separate Outbound Message Queues setting. Also, if you recall, this version was made at our request so that it won't create sub-folders in the \q folders. The [SMSGW] section looks like:

[SMSGW]
TrackSMPPReceipts=Yes
RetryMaxAttempts=5
QDir=\xxx
LogDirectory=\yyy
WebAuth=Yes
WebMenu=Yes
WebPort=aaaaa
SMPPPort=bbbbb
SMPPPortSSL=ccccc
ReceiveSMS=No
ReceiveMMS=No
SeparateUserQueues=No

Two entries for [SMPP]

[SMPP]
DefaultDelReceipt=Yes
and
[SMPP]
[Inbound SMS Routing]
testsatuser=xxxxxxxx

As you are already working on this, can you please also check and confirm that concats won't misroute in the following cases involving different SMSC-s:

1. The same Sender Addresses are specified for them (this definitely was a problem in some past release)
2. The same RouteXX= prefixes are set for them
3. There's no AllowedUser for smsc1 and there is one for smsc2, but the Sender Address in the message matches the one specified for smsc1. The segments should all route through one uplink, according to whichever setting has the higher priority
4. Different segments of the same message come from different users. They should always route through the same uplink, regardless of the AllowedUser setting. Believe it or not, we commonly receive segments from different aggregators/hubs. When that happens, the last to arrive get stuck.
5. Routing by service_type is used. Routing for concats should still work properly.

In other words, proper routing of concatenated messages should be of higher priority than other routing rules.

Kind regards,
Ashot
ashot shahbazian
Frequent Contributor
Username: Animatele

Post Number: 63
Registered: 06-2004
Posted on Tuesday, December 15, 2009 - 11:46 am:   

Hi Bryce, Des

Is there an update about this issue?

Kind regards,
Ashot
Des - NowSMS Support
Board Administrator
Username: Desosms

Post Number: 1558
Registered: 08-2008
Posted on Tuesday, December 15, 2009 - 06:55 pm:   

Hi Ashot,

Sorry, I forgot to follow up on this.

The update at http://www.nowsms.com/download/nowsmsupdate.zip includes a fix for the problem that Bryce described.

--
Des
NowSMS Support
Des - NowSMS Support
Board Administrator
Username: Desosms

Post Number: 1559
Registered: 08-2008
Posted on Tuesday, December 15, 2009 - 07:05 pm:   

Actually I just tested that ZIP. It appears to be corrupt (incomplete upload).

I just updated it.
ashot shahbazian
Frequent Contributor
Username: Animatele

Post Number: 64
Registered: 06-2004
Posted on Tuesday, December 15, 2009 - 07:40 pm:   

Thanks Des!

Patched it, will check in a couple of days if the trouble's gone.

Kind regards,
Ashot

P.S. Funny you've mentioned the "incomplete file" issue. Often when I try downloading from the NowSMS site or mirrors I get an incomplete one, less than 1MB. The same just happened with this update when I clicked on the link, after you'd updated it. But when I tried "save target as" I got the complete file. Not sure if this has to do with my ancient browser (IE6) or if something's in fact wrong with the file.
Des - NowSMS Support
Board Administrator
Username: Desosms

Post Number: 1572
Registered: 08-2008
Posted on Thursday, December 17, 2009 - 09:43 pm:   

Hi Ashot,

Our hosting provider (Rackspace) does seem to have a strange timeout issue with downloads.

For a while, I did have a redirect that sent all downloads to www2.nowsms.com instead of www.nowsms.com to avoid problems with incomplete downloads. But it appears that this redirect was removed a while ago.

Anyway ... the hosting provider is a cluster/cloud system with pretty good performance. But what we found was that the www cluster would time out downloads quickly while the browser was asking the user what they wanted to do with the download.

I tried some tests just now with Chrome (what I use most of the time), and IE8. And I didn't see any problems. But I did notice that both of these browsers keep downloading when they are prompting the user what to do with the file.

So it may be the older browsers that we were originally having this problem with.

Something else for me to keep an eye on. Right-click and "save target as" is probably a good suggestion for now.

--
Des
NowSMS Support
ashot shahbazian
Frequent Contributor
Username: Animatele

Post Number: 70
Registered: 06-2004
Posted on Thursday, December 17, 2009 - 11:05 pm:   

You're right, LOL!

It'd time out every time I'd add a date to the name of the file before starting the download! If I don't it'd more often complete normally than not.

Kind regards,
Ashot
ashot shahbazian
Frequent Contributor
Username: Animatele

Post Number: 71
Registered: 06-2004
Posted on Thursday, December 17, 2009 - 11:27 pm:   

Checked for misrouting of segments in that "Bank" thread. Not a single case in the 24 hours after the update to v.2009.12.08, even though this was a hard day: one of the SMSC-s servicing this thread kept timing out messages.

Great job, thanks much!

Kind regards,
Ashot