Concatenated messages issue

Alex Kaiser
Frequent Contributor
Username: Alex_k

Post Number: 164
Registered: 07-2006
Posted on Thursday, November 25, 2010 - 11:16 pm:   

Hi Des,

We found an issue: NowSMS doesn't work well when submitting concatenated messages. It does not always pick the 1st part of a message first; the choice can be random, and it can then move on to another message or to one of its parts. For example:
3-part messages 3A, 3B, 3C
steps:
2B, 1C
2C, 3A
1B, 3B
2A, 2B
3C
This can delay messages from being displayed properly. The SMSC can have a queue, and these parts will then be delivered 2-5 minutes apart.
I think it's better to go with:
1A, 2A,
3A, 1B,
2B, 3B
1C, 2C
3C

What do you think, am I right?

Regards,
Alex K.
Des - NowSMS Support
Board Administrator
Username: Desosms

Post Number: 2675
Registered: 08-2008
Posted on Tuesday, November 30, 2010 - 05:52 pm:   

Hi Alex,

I don't understand this.

Normally NowSMS would go 1A, 1B, 1C, 2A, 2B, 2C, 3A, 3B, 3C. (Or 3A, 3B, 3C, 1A, 1B, 1C, 2A, 2B, 2C ... there may be some randomness to which message is selected first.)

This assumes that the messages are received in order from the submitting client. NowSMS does not wait for all parts of a message before releasing it for upstream submission (maybe it should).

So either there is some strangeness on the submitting side. Or there is something else that is affecting operation.

I've had a number of discussions with our engineers, and the one issue that keeps coming up is that NowSMS does rely on the operating system to return directory scan results in sorted order for the normal behaviour.
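A minimal sketch of the sort-order dependency mentioned above, written in Python rather than taken from NowSMS itself. Directory enumeration order depends on the filesystem, so a queue scanner that needs a stable order can sort the names explicitly. The ".REQ" extension comes from this thread; the function name and the flat directory layout are assumptions made for illustration.

```python
import os

def scan_queue(queue_dir, suffix=".REQ"):
    """Return queued file names in an explicit, sorted order.

    os.listdir() makes no promise about ordering, so the result is
    sorted here instead of trusting whatever the filesystem returns.
    """
    names = [n for n in os.listdir(queue_dir) if n.upper().endswith(suffix)]
    return sorted(names)
```

Sorting in the scanner costs a little per scan, but it removes the dependency on how a particular filesystem happens to enumerate files.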

Other questions ... is your routing controlled by accounting callbacks and UseRouteQueues=Yes?

Does this behaviour happen consistently, or is it only during the long startup times that you are complaining about in another thread? (We are investigating a possibility that a startup with a large number of pre-queued messages might result in some out-of-sequence message deliveries.)

--
Des
NowSMS Support
Alex Kaiser
Frequent Contributor
Username: Alex_k

Post Number: 165
Registered: 07-2006
Posted on Wednesday, December 01, 2010 - 05:35 pm:   

Hi Des,

If NowSMS does not wait for all message parts before sending, that creates a problem. Or maybe the SMSC returns 0x58 (throttling) and NowSMS then moves the messages to another connection, but when the error disappears NowSMS takes a random message from the queue. That is what I’m guessing. Our routing logic is fully controlled by "SMSCRoute=" and UseRouteQueues=Yes.

It seems the issue has always existed, especially when the queue is growing, and we just didn't notice it for many months. But we got a complaint from our customer that SMS segments are delayed by up to 30 minutes before they are joined into one message.

Regards,
Alex K.
Des - NowSMS Support
Board Administrator
Username: Desosms

Post Number: 2680
Registered: 08-2008
Posted on Wednesday, December 01, 2010 - 07:07 pm:   

Hi Alex,

I think you are correct about throttling being the possible issue.

We were coming around to the same conclusion ... that it could only be related to throttling errors, or other errors that result in a retry condition.

When a connection is not running in async mode, we recently made some changes to retry the same message several times.

However, this is not so easy to accomplish in async mode. If a throttling error occurs on the first part, subsequent parts have already been transmitted and are just waiting for acknowledgment from the other end.

So, if you're seeing throttling errors, I'd suggest seeing if a connection speed limit is the answer. If the cause of the throttling is more of an issue of upstream server load, then this will not be a solution.

I'm going to revisit the throttling retry issue with our team to see if we could resubmit the message after a throttling error in async mode. It's not possible to change the fact that subsequent parts have already been transmitted (without performance penalty at least), but it would improve keeping all segments within the same time window.

--
Des
NowSMS Support
Alex Kaiser
Frequent Contributor
Username: Alex_k

Post Number: 166
Registered: 07-2006
Posted on Wednesday, December 01, 2010 - 08:04 pm:   

Hi Des,

In our scenario, throttling errors depend on the SMSC's load, so the "SMSCSendLimit" parameter isn't suitable here. Maybe you ought to revise the ".LCK" file format and logic, so that those files are processed first in the queue, ahead of ".REQ" files. But I suppose these are consequences of an impractical data storage implementation (queues, user balances, statistics). NowSMS is a perfect SMPP client/server application, but it is very hard to integrate with and to do custom development for.

But we have talked about that so many times :-)

Regards,
Alex K.
Bryce Norwood - NowSMS Support
Board Administrator
Username: Bryce

Post Number: 7943
Registered: 10-2002
Posted on Friday, December 03, 2010 - 04:33 am:   

Hi Alex,

I don't think this issue has anything to do with difficulties in integration or custom development.

It's a complex issue of trying to maximise performance, and bad luck that one or more segments of a multipart message are held up by a throttling error.

These are the types of complex issues that cause us to lose sleep.

I think that our change earlier this year to retry the same message for submission after a throttling error was a very positive one. The problem is that the complexities, and mostly the performance challenges, of SMPP async mode meant that we did not implement it there.

We are reviewing approaches to allow us to implement this for SMPP async mode without having to take a performance hit.

This is basically going to mean that after a throttling error occurs, we are going to prioritise resubmitting the throttled message (and allowing several retries). We won't be able to promise it will be the next message submitted, but it should be within "2 times window size" submissions, which should be more than sufficient to prevent problems.

It will take us a week or two to get this implemented.

-bn
ashot shahbazian
Frequent Contributor
Username: Animatele

Post Number: 91
Registered: 06-2004
Posted on Wednesday, December 08, 2010 - 12:37 am:   

Hello everyone,

Interesting topic indeed.

The culprit is indeed how the OS scans the filenames. If there is an outbound queue, the files with SAR names get scanned last, so they tend to accumulate and submit (or resubmit) after short messages, whose filenames are HEX strings assigned sequentially in chronological order (though not always, as we've noticed).

Alex is right; depending on the trouble at the SMSC(s) upstream, this might cause nasty cyclical problems and non-delivery (some SMSC-s even delete incomplete segmented messages after a few minutes.)

I suppose the filesystem should be looking at file names as text strings. If that's the case, how about this:

- do away with SAR-DestNumber-aa-b-c format for segments
- name these files as HexNumber-aa-b-c (and message ID-s, respectively) where HexNumber is equal to that assigned to previously received short message (or the HexNumber of the 1st segment of a segmented message, if the previous one was also a segment,) plus 1.
- HexNumber-s for different parts of a long message are the same. This is to ensure that the logic of sending concatenated messages is not altered (they submit to the same uplink, for example.)

Thus, the operating system won't treat these files and respective messages differently from short ones.
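A small Python sketch of why the naming matters, assuming the scan order is a plain text sort of file names. The meaning of the aa-b-c fields (reference, total parts, part number) is an assumption here, as are the helper names and the 8-digit hex width; only the two name shapes come from the posts above.

```python
def sar_name(dest, ref, total, seq):
    # Current style described in the thread: SAR-DestNumber-aa-b-c.
    # Field meanings (reference, total parts, part number) are assumed.
    return "SAR-%s-%02X-%d-%d" % (dest, ref, total, seq)

def hex_name(counter, ref=None, total=None, seq=None):
    # Proposed style: HexNumber for short messages and
    # HexNumber-aa-b-c for segments, sharing one HexNumber per long message.
    base = "%08X" % counter
    if ref is None:
        return base
    return "%s-%02X-%d-%d" % (base, ref, total, seq)

# Old style: "S" sorts after every hex digit, so SAR files always land last.
old_order = sorted([
    hex_name(0x4CF00001),
    sar_name("447700900123", 0x2A, 3, 1),
    hex_name(0x4CF00002),
])

# Proposed style: segments sort between their chronological neighbours.
new_order = sorted([
    hex_name(0x4CF00003),
    hex_name(0x4CF00002, 0x2A, 3, 2),
    hex_name(0x4CF00001),
    hex_name(0x4CF00002, 0x2A, 3, 1),
])
```

One caveat of this sketch: with 10 or more parts, an unpadded part number would sort "10" before "2", so a real scheme would probably zero-pad that field.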

Kind regards,
Ashot
Des - NowSMS Support
Board Administrator
Username: Desosms

Post Number: 2704
Registered: 08-2008
Posted on Wednesday, December 08, 2010 - 04:30 pm:   

Hi Ashot,

There is an extra hex code for tracking the message segments. We've been doing this for HTTP submissions for a while, but it was only added for SMPP client submissions starting with the 2010.09.21 version. Previously we were using the reference ID provided by the submitting client, but as that is usually only an 8 bit reference number, it was frequently recycled.
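To illustrate the recycling problem, here is a toy Python count of how many long messages to one destination stay distinguishable when they are tracked by an 8 bit client reference alone, compared with adding a per-message tracking code. The tracking code format shown is made up for the example and is not the actual NowSMS format.

```python
def distinct_keys(n_messages, with_tracking_id):
    """Count how many of n_messages long messages get a unique tracking key."""
    keys = set()
    for i in range(n_messages):
        ref8 = i % 256                      # client reference wraps at 8 bits
        if with_tracking_id:
            keys.add(("%08X" % i, ref8))    # hypothetical extra hex code
        else:
            keys.add(ref8)
    return len(keys)

only_ref = distinct_keys(1000, with_tracking_id=False)
with_id = distinct_keys(1000, with_tracking_id=True)
```

With the reference alone, at most 256 messages can be told apart before keys repeat, which is why segments of different messages could end up sharing a key in a busy queue.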

Unfortunately, the handling of the throttling error is more complex, but I think we've got a solution (it just won't be posted for another week or so).

The problem is best illustrated with an example.

Let's say you've got a large queue of outbound pending messages.

And let's say one of these is a 3 part segmented message.

While submitting this message, the upstream SMSC returns a throttling error on part 1, but accepts part 2 or part 3. (Which part number is accepted is not necessarily important.)

NowSMS keeps processing other messages in the queue, and doesn't retry the throttled part until the next cycle.

We made some changes to the logic over the summer to automatically retry throttled messages to prevent segmented messages from getting too far separated from their related segments. However, for SMPP, these changes did not apply to async connections, because the logic is more complicated.

We are testing an update that adds similar logic for async connections. Because of the nature of async connections, the throttled part can't be the next message submitted because additional messages may already be in the transmit window. But we can add the part back into the transmit queue for a quicker retry.
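The example above can be played through with a toy Python model. This is only a sketch under simplifying assumptions, not the NowSMS implementation: responses arrive strictly in submission order, the window is refilled before each response is read, and a part is throttled only on its first attempt. All names are hypothetical.

```python
from collections import deque

def simulate(queue, throttled_once, window, requeue_front):
    """Return the order in which messages are finally accepted upstream."""
    pending = deque(queue)
    in_flight = deque()
    to_throttle = set(throttled_once)
    accepted = []
    while pending or in_flight:
        # Fill the transmit window without waiting for responses.
        while pending and len(in_flight) < window:
            in_flight.append(pending.popleft())
        # The oldest outstanding response arrives.
        msg = in_flight.popleft()
        if msg in to_throttle:
            to_throttle.discard(msg)         # throttled on first attempt only
            if requeue_front:
                pending.appendleft(msg)      # quick retry
            else:
                pending.append(msg)          # retry on the next cycle
        else:
            accepted.append(msg)
    return accepted

def spread(accepted, parts):
    """Distance, in accepted submissions, between the first and last part."""
    positions = [accepted.index(p) for p in parts]
    return max(positions) - min(positions)

parts = ["M1/1", "M1/2", "M1/3"]
queue = parts + ["S%d" % i for i in range(20)]
quick = spread(simulate(queue, {"M1/1"}, 4, requeue_front=True), parts)   # 3
slow = spread(simulate(queue, {"M1/1"}, 4, requeue_front=False), parts)   # 22
```

In this model the throttled part cannot be the very next submission, because parts 2 and 3 and one other message are already in the window, but putting it back at the front keeps it within a few submissions of its siblings instead of a whole queue cycle away.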

We're also adding some other logic to deal with the possibility that we can't rely on the sort order. We don't believe that to be the case here ... we believe it's more of an issue of how throttling errors are handled. But we are making some changes just in case.

--
Des
NowSMS Support
ashot shahbazian
Frequent Contributor
Username: Animatele

Post Number: 92
Registered: 06-2004
Posted on Thursday, December 09, 2010 - 10:55 am:   

Hi Des,

We've not been checking for new releases since May. I can see there’s a host of interesting features, just downloaded the 2010.11.04 - will test and let you know what we find out. The release we’ve been using is 2010.05.24.

There seems to be more than one aspect in handling segmented messages. While you have referred to perhaps the trickiest one in async mode, our feeling is that there are other reasons why processing concatenated messages by NowSMS is so much slower than that of short messages.

We’ve just had a case yesterday: one of the customers had submitted a batch of 4-part notifications, at 30-35 MPS. It nearly brought down two servers with NowSMS – one receiving the batch from the client and routing it to servers/SMSC-s upstream and the second NowSMS server that got most of that traffic before submitting to termination points.

While the servers have many clients and SMSC connections (the second also has a few thousand Route=+xxxx prefixes), they are quite powerful (8 CPU cores, 16-32GB RAM and SSD banks for queues/DB files) and would have handled a similar batch of short messages without any strain at all. In this case, however, both servers’ CPU load was 95-100%. The receiving one had 8K-10K messages in its queue (despite the fact that the 2nd NowSMS upstream, where most of the queued messages were destined, was in the same LAN), and the 2nd server became very slow: delaying resps for the 1st one and other ESME-s downstream, and barely able to send/receive messages and DLR at more than 10 MPS.

“SAR” files sinking to the bottom of the queue because of how the OS scans the filenames may not be the only problem. I could be mistaken, but for a long time we’ve been wondering if filenames longer than 8.3 put an additional strain on a Windows server – even though all our servers have 8.3 filename generation disabled.

Secondly, even if the issue with proper scheduling of “SAR” submissions was resolved in the newer releases, there is more to it:

- You should not be sending parts of the same message too fast, unless your SS7 TP upstream knows how to properly handle it (in which case you send the message reassembled and the box breaks it apart, makes one SRI query and submits each MT_FSM immediately on receipt of the MT-Deliver for the previous one.) Normal SMSC-s treat segmented messages more or less like separate MT-s. While for subscriber traffic it’s okay (segments come slowly as they are generated by slow handsets,) segments in application traffic are often milliseconds apart. When you send that to a conventional SMSC, the first segment gets delivered, while the rest typically fail on “MS Busy” error (then retransmit, also in a fast batch, the second segment delivering, etc.) Depending on the SMSC-s retry schedule, a message longer than a certain number of segments may never make it, especially if the recipient is on the move or in a dense urban area.

Hence, a per-uplink setting DelaySegmentsInConcatSMS=xxxx (milliseconds) can be quite useful. Sending the segments not faster than 2 seconds apart usually does the trick for the case above.
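A minimal Python sketch of what such a pacing rule could look like. DelaySegmentsInConcatSMS is only the setting proposed above, not an existing NowSMS option, and the class and method names are hypothetical.

```python
class SegmentPacer:
    """Hold back segments of the same message until delay_ms has passed."""

    def __init__(self, delay_ms):
        self.delay_ms = delay_ms
        self.last_sent = {}   # message key -> time (ms) its last segment was sent

    def ready_at(self, msg_key, now_ms):
        """Earliest time (ms) the next segment of msg_key may be sent."""
        last = self.last_sent.get(msg_key)
        if last is None:
            return now_ms                    # first segment goes out at once
        return max(now_ms, last + self.delay_ms)

    def mark_sent(self, msg_key, sent_ms):
        self.last_sent[msg_key] = sent_ms

pacer = SegmentPacer(delay_ms=2000)
first = pacer.ready_at("msg-1", now_ms=0)
pacer.mark_sent("msg-1", first)
second = pacer.ready_at("msg-1", now_ms=5)   # held until 2000 ms
```

Only segments of the same message wait on each other here; other traffic on the uplink is not slowed down.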

- It is relatively common that some of the segments never arrive from the customer, i.e., the UDH indicates a 3-part message, but only 2 parts are received. That can cause at least 2 problems: the 1st is that some SMSC-s won’t submit such a message before all segments are received; the 2nd is that, if the message is intended for a NowSMS uplink with WDP adaptation, it’d also get stuck and never submit, as NowSMS naturally expects to get them all before sending a reassembled message.

This could be remedied by a per-client and/or per-uplink setting WaitForAllSegmentsInConcatSMS=xxxx (seconds.) If not all of the segments were received within the specified period, you should reassemble the text in the received parts (can be tricky, as some smartphones send the payload in UDH,) break it up into proper segments and only then submit upstream.

Obviously, setting that threshold to more than 1 or 2 minutes won’t make sense, so another setting used in conjunction could be WaitForDelayedSegmentsInConcatSMS=xxxx : if a stray segment (segments) arrives later than specified in the 1st but before the 2nd timeout, you construct a new short (segmented) message – so that the recipient may get the missing part(s) as a separate message.
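A rough Python sketch of the first timeout, to make the idea concrete. The setting names above are proposals, and the class below is hypothetical. The header parser handles only the common concatenation element with an 8 bit reference (IEI 0x00, length 3: reference, total parts, part number) and assumes the caller already knows a UDH is present. Re-segmenting the released text and the second, "delayed segment" timeout are left out.

```python
def parse_concat_udh(user_data):
    """Return (ref, total, seq, payload) for an IEI 0x00 header, else None."""
    if len(user_data) < 1:
        return None
    udhl = user_data[0]
    header = user_data[1:1 + udhl]
    i = 0
    while i + 2 <= len(header):
        iei, iedl = header[i], header[i + 1]
        data = header[i + 2:i + 2 + iedl]
        if iei == 0x00 and iedl == 3 and len(data) == 3:
            return data[0], data[1], data[2], user_data[1 + udhl:]
        i += 2 + iedl
    return None

class SegmentBuffer:
    """Collect parts per message; give up waiting after wait_seconds."""

    def __init__(self, wait_seconds):
        self.wait_seconds = wait_seconds
        self.pending = {}   # key -> {"first_seen", "total", "parts": {seq: payload}}

    def add(self, key, total, seq, payload, now):
        """Store one part; return the ordered parts once all have arrived."""
        entry = self.pending.setdefault(
            key, {"first_seen": now, "total": total, "parts": {}})
        entry["parts"][seq] = payload
        if len(entry["parts"]) == entry["total"]:
            del self.pending[key]
            return [entry["parts"][s] for s in sorted(entry["parts"])]
        return None

    def expire(self, now):
        """Release incomplete messages that have waited long enough."""
        released = {}
        for key in list(self.pending):
            entry = self.pending[key]
            if now - entry["first_seen"] >= self.wait_seconds:
                released[key] = [entry["parts"][s] for s in sorted(entry["parts"])]
                del self.pending[key]
        return released
```

The key would normally combine sender, recipient and reference number, since the reference alone is recycled quickly.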

- Some SMPP/SS7 FDA gateways capable of very fast yet intelligent handling of segmented messages (those which expect them received reassembled) do the trick only for messages not longer than a certain number of bytes or segments. A Cisco, for example, would reject a reassembled message longer than 600 bytes.
This can be handled by the settings used in conjunction with WDPAdaptation=Yes: MaxReassembleMessageLengthBytes=xxxx and RoutesForVeryLongMessages=RouteName1,RouteName2,..RouteNameN. In other words, if the UDH or TLV in the segmented message indicates it’s longer than specified, the settings should prevent the message reassembly and specify alternative uplinks where to route the message as separate segments.
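The decision described here is small enough to sketch in Python. MaxReassembleMessageLengthBytes and RoutesForVeryLongMessages are the settings proposed above, not existing options, and the function, its parameters and the route names are hypothetical.

```python
def plan_submission(total_bytes, max_reassemble_bytes, long_routes, default_route):
    """Decide whether to reassemble a long message and which routes may carry it."""
    if total_bytes > max_reassemble_bytes:
        # Too long for the reassembling gateway: keep the parts separate
        # and restrict routing to the alternative uplinks.
        return {"reassemble": False, "routes": list(long_routes)}
    return {"reassemble": True, "routes": [default_route]}

short_plan = plan_submission(450, 600, ["RouteName1", "RouteName2"], "Gateway")
long_plan = plan_submission(750, 600, ["RouteName1", "RouteName2"], "Gateway")
```

How total_bytes is obtained (from the part count in the UDH or from a TLV) is left to the caller in this sketch.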

Regardless of features added for individual scenarios as above, queuing/scheduling segmented messages should in our opinion be handled separately from short messages.
For example, in presence of a large outbound queue (which often makes NowSMS slow down,) if segmented messages are handled and submitted upstream independently of short ones, it’d allow you to better control the sequence in which they submit, i.e., if one segment has already submitted then submitting of other segments of that message should take precedence over all other messages. If you use a separate (not below the main though) folder for queuing segmented messages and separate program threads to handle their submission then the impact of how the OS scans the filenames can also be minimized.
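A minimal Python sketch of that precedence rule, using two in-memory queues in place of the separate folders. Everything here is hypothetical and simplified: there are no routes, retries or threads, and the set of started messages is never cleaned up.

```python
import itertools
from collections import deque

class TwoQueueScheduler:
    """Keep short and segmented messages apart; finish started messages first."""

    def __init__(self):
        self._arrival = itertools.count()
        self.short = deque()      # (arrival, msg_id)
        self.segments = deque()   # (arrival, msg_id, seq)
        self.started = set()      # long messages with at least one part submitted

    def add_short(self, msg_id):
        self.short.append((next(self._arrival), msg_id))

    def add_segment(self, msg_id, seq):
        self.segments.append((next(self._arrival), msg_id, seq))

    def next_submission(self):
        # Parts of a message that is already partly submitted go first.
        for item in self.segments:
            if item[1] in self.started:
                self.segments.remove(item)
                return (item[1], item[2])
        # Otherwise take whichever queue head arrived earlier.
        if self.segments and (not self.short
                              or self.segments[0][0] < self.short[0][0]):
            _, msg_id, seq = self.segments.popleft()
            self.started.add(msg_id)
            return (msg_id, seq)
        if self.short:
            return (self.short.popleft()[1], None)
        return None

s = TwoQueueScheduler()
s.add_short("S1"); s.add_segment("M", 1); s.add_short("S2")
s.add_segment("M", 2); s.add_short("S3"); s.add_segment("M", 3)
order = [s.next_submission() for _ in range(6)]
# S1 goes first, then all three parts of M back to back, then S2 and S3.
```

Choosing by arrival order when nothing is in progress keeps long messages from starving behind a steady stream of short ones.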

Kind regards,
Ashot