Exceptions handling and start-up

Alex Kaiser
Frequent Contributor
Username: Alex_k

Post Number: 200
Registered: 07-2006
Posted on Saturday, May 14, 2011 - 11:01 am:   

Hi Des,

We have run into a problem: when NowSMS raises an exception, the whole service restarts. I don't think the current logic is good; an exception in one thread that stops the whole service can cause a lot of problems. Exceptions are a normal part of a program (unless they are fatal, of course). NowSMS's current approach is that if you get an exception, you have to enable the debug log, try to recreate the exception, and send the log to you. In real life, some exceptions can't be recreated, or they depend on system specifics (hardware, OS, connections, etc.), so fixing each problem is very time-consuming. In addition, many files in the Q or IN folders, or a big DR tracking database, lead to a slow NowSMS startup. It can take up to several minutes, which isn't acceptable.

My suggestions:
- No exception should lead to a restart. I understand that this isn't easy to implement, and that it is hard to tell whether an error is fatal or not. Maybe some self-diagnostic system could handle it and restart NowSMS only if it doesn't respond in a timely fashion. You should also add some debugging information to exception.log even when the debug log is not enabled; simple text like “exception in thread#XYZ” isn't very useful.
- You should revise the start-up logic and avoid any pauses in that process; it shouldn't take more than 2 minutes. Scanning folders of files could possibly be done in a delayed process.
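
[Editor's illustration] The first suggestion can be sketched in Python as follows. This is not NowSMS code and says nothing about how NowSMS is implemented; the names make_exception_logger, run_isolated and start_worker are hypothetical. It only shows the general idea: each worker thread catches its own exceptions and writes the full traceback to a log, so one failure neither stops other threads nor leaves a bare "exception in thread" line.

```python
import io
import logging
import threading
import traceback


def make_exception_logger(stream):
    """Build a logger writing to `stream` (a stand-in for an exception.log file)."""
    logger = logging.getLogger("exception_log_sketch")
    logger.setLevel(logging.ERROR)
    logger.handlers.clear()
    logger.propagate = False
    logger.addHandler(logging.StreamHandler(stream))
    return logger


def run_isolated(name, work, logger):
    """Run `work`; on failure, log the thread name plus traceback and return False."""
    try:
        work()
        return True
    except Exception:
        logger.error("exception in thread %s\n%s", name, traceback.format_exc())
        return False


def start_worker(name, work, logger):
    """Start `work` in its own thread; a failure there is logged, not propagated."""
    thread = threading.Thread(target=run_isolated, args=(name, work, logger), name=name)
    thread.start()
    return thread


if __name__ == "__main__":
    log_stream = io.StringIO()
    log = make_exception_logger(log_stream)
    start_worker("worker-1", lambda: 1 / 0, log).join()
    print(log_stream.getvalue())
```

Deciding whether an error is fatal is left out on purpose; a separate watchdog that restarts the service only when it stops responding would sit on top of this.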

Regards,
Alex K.
Des - NowSMS Support
Board Administrator
Username: Desosms

Post Number: 3171
Registered: 08-2008
Posted on Saturday, May 14, 2011 - 04:35 pm:   

Hi Alex,

There has been an increase in these exceptions due to a bug introduced just prior to the 2011.03.21 release. We have been working hard to implement a solution where configuration changes do not require a service restart. However, one of the changes in preparation for this support had some major problems.

If you have seen an increase in these problems, it is most likely because of this. We are currently planning for the 2011.05.09 release to supersede these problematic versions.

Unfortunately the outbound queue scan and DR database checks are necessary for proper startup. The SMS-IN scan could and should be delayed, as this has been a common culprit in startup problems. We have made some changes in 2011.05.09 to prevent having a single directory level with a large number of SMS-IN files. However, I would caution that if you do have a large SMS-IN queue, you should implement a 2-way command to process these messages so that they do not build up. If you have no 2-way commands, these are messages that NowSMS cannot figure out how to route.

The 2011.05.09 release is at http://www.nowsms.com/download/20110509.zip.

--
Des
NowSMS Support
Alex Kaiser
Frequent Contributor
Username: Alex_k

Post Number: 201
Registered: 07-2006
Posted on Monday, May 16, 2011 - 06:34 pm:   

Hi Des,

Thanks for your response!

We’re using the 2011.04.11 version. Does it have the “restart” bug?

Generally, exceptions aren’t a big evil, but long start-ups definitely are. We’ve already implemented “SMSIN” scripts and it works as you said, many thanks. But two more factors still exist:
- Large outbound message queue. We ran an experiment: if the outbound queue is quite big (thousands of messages), we change the “QDir” parameter to another, empty folder, and NowSMS then starts in seconds. After that we just move the “.req” files from the old queue to the new one. Everything works perfectly, with no exceptions.
- Large DR tracking databases. If you delete them, NowSMS starts in seconds.
Both of them drastically increase startup time.
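
[Editor's illustration] The queue workaround described above (start with an empty queue folder, then move the old “.req” files across) could be scripted roughly as below. This is a sketch under assumptions: the function name move_req_files and the batch_size parameter are hypothetical, the queue is treated as a single flat folder, and only files ending in “.req” are touched. It moves files in limited batches so the running service is not handed thousands of messages at once.

```python
import shutil
from pathlib import Path


def move_req_files(old_queue, new_queue, batch_size=500):
    """Move up to `batch_size` *.req files from old_queue to new_queue.

    Returns the number of files moved; call repeatedly until it returns 0.
    """
    old_dir, new_dir = Path(old_queue), Path(new_queue)
    new_dir.mkdir(parents=True, exist_ok=True)
    moved = 0
    for req_file in sorted(old_dir.glob("*.req")):
        if moved >= batch_size:
            break
        # Keep the original file name; other files in the folder are left alone.
        shutil.move(str(req_file), str(new_dir / req_file.name))
        moved += 1
    return moved
```

A caller would loop on move_req_files(old, new) with a short pause between batches until it returns 0.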

If you’re able to fix those issues, there is no real need to solve the bug problems: once a bug occurs, NowSMS would restart in seconds.

Regards,
Alex K.
Des - NowSMS Support
Board Administrator
Username: Desosms

Post Number: 3177
Registered: 08-2008
Posted on Tuesday, May 17, 2011 - 02:25 pm:   

Hi Alex,

Yes, that version is affected. In particular, any change made to SMSGW.INI during operation is almost guaranteed to trigger a problem.

I don't think there is much we can do about the queue, but possibly the DR integrity check could be sped up, or only a more limited check at startup. I will run that idea past my colleagues.

--
Des
NowSMS Support
Alex Kaiser
Frequent Contributor
Username: Alex_k

Post Number: 202
Registered: 07-2006
Posted on Tuesday, May 17, 2011 - 04:21 pm:   

Hi Des,

What can I say, long start-ups = serious vulnerability. Unpredictable start-ups make for an unpredictable level of service, and you can’t rely on it. I’m an advanced NowSMS user (I know some tricks) and I recommend you revise the start-up logic. For newbies it would be complete hell if the GUI doesn’t respond for minutes.

Regards,
Alex K.
Alex Kaiser
Frequent Contributor
Username: Alex_k

Post Number: 203
Registered: 07-2006
Posted on Wednesday, May 18, 2011 - 08:46 am:   

Hi Des,

The new version has a problem:
- Message sent
- Receipt requested
- Receipt came from SMSC
- No records in logs or any DR forwarded
The problem arises sporadically and for random logins; unfortunately, we are unable to reproduce it. We also have the “SMPPServerAsyncWindowSize” parameter enabled; maybe that information helps.

Regards,
Alex K.
Alex Kaiser
Frequent Contributor
Username: Alex_k

Post Number: 204
Registered: 07-2006
Posted on Wednesday, May 18, 2011 - 10:32 am:   

In addition, we are using the following test setup:
NowSMS(a)<->NowSMS(b)<->SMSC(n1)
A and B were updated to 20110509.
We submit to A; the message (receipt requested) goes to B and then to the SMSC.
All is fine at this stage.
But after that, no DR arrives. I can’t see it even in the SMSIN logs, and as far as I understand the SMSC replies with DELIVER_SM.
When we roll back B to 20110411, everything returns to normal, even with 20110509 on A.
Honestly, I don’t know where to dig...

Regards,
Alex K.
Des - NowSMS Support
Board Administrator
Username: Desosms

Post Number: 3180
Registered: 08-2008
Posted on Wednesday, May 18, 2011 - 01:03 pm:   

Hi Alex,

Confirmed. I should not have mentioned this version until it went through all of our tests. I see similar problems, which appear to be receipt tracking issues.

--
Des
NowSMS Support
tnet.com
New member
Username: Tnet

Post Number: 1
Registered: 05-2011
Posted on Wednesday, May 18, 2011 - 03:28 pm:   

OK DAAAAAAA
Alex Kaiser
Frequent Contributor
Username: Alex_k

Post Number: 205
Registered: 07-2006
Posted on Friday, May 20, 2011 - 01:36 pm:   

Hi Des,

Another issue: NowSMS won't start if, at that moment, some users are submitting messages at full speed.
Alex Kaiser
Frequent Contributor
Username: Alex_k

Post Number: 206
Registered: 07-2006
Posted on Friday, May 20, 2011 - 01:46 pm:   

Every day we have at least one exception; with long start-ups it’s extremely annoying.

Regards,
Alex K.
Des - NowSMS Support
Board Administrator
Username: Desosms

Post Number: 3188
Registered: 08-2008
Posted on Saturday, May 21, 2011 - 11:47 am:   

Hi Alex,

You may wish to consider going back to an earlier version.

We expect to have a fix for the one known issue finished testing by Tuesday. However, maybe you are having a different issue. I would suggest disabling the SMPP Server Async support. We have not found exception issues related to this new feature, but this feature is new, and if instability has been introduced, this is an area where I would be suspicious.

--
Des
NowSMS Support
Alex Kaiser
Frequent Contributor
Username: Alex_k

Post Number: 207
Registered: 07-2006
Posted on Monday, May 23, 2011 - 11:02 am:   

Hi Des,

We'll disable async DR support, run more tests and let you know the results. Do you plan to fix the long start-up issue? I think that's the main problem.

Regards,
Alex K.
Alex Kaiser
Frequent Contributor
Username: Alex_k

Post Number: 209
Registered: 07-2006
Posted on Monday, May 23, 2011 - 04:38 pm:   

Hi Des,

Another idea for speeding up NowSMS: add the ability to remove the DR database and make DR handling depend on the 2-way server (callback). NowSMS would expect a “RouteTo=” parameter in the SMSIN callback in order to route the DR to the correct user if SMPP access is enabled, or start a custom rule, or just discard it (as with the 2-way SMSIN cleanup script). If you’re using callbacks for collecting statistics, then NowSMS does double work by tracking DRs in its own database (which consumes resources and slows down the start-up process). If the 2-way server doesn’t respond with HTTP 200, then just put that “.rec” file into the SMSIN folder with a retry counter inside. What do you think?

Regards,
Alex K.
Des - NowSMS Support
Board Administrator
Username: Desosms

Post Number: 3193
Registered: 08-2008
Posted on Tuesday, May 24, 2011 - 10:02 pm:   

Hi Alex,

The delivery receipt tracking is required not only to determine routing, but also to translate from upstream message ID back to locally assigned message ID. It is not practical for us to rely on callbacks for this, as this is a basic requirement.
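
[Editor's illustration] The two jobs of receipt tracking named above (routing, and translating an upstream message ID back to the locally assigned one) can be pictured with a small in-memory lookup. This is only a sketch of the concept, not the NowSMS DR database: the class and field names are hypothetical, and persistence, expiry and concurrency are all left out.

```python
from dataclasses import dataclass


@dataclass
class TrackedMessage:
    local_id: str   # message ID handed back to the submitting client
    submitter: str  # account the delivery receipt should be routed to


class ReceiptTracker:
    """Maps the upstream (SMSC-assigned) message ID to the local submission."""

    def __init__(self):
        self._by_upstream_id = {}

    def record_submit(self, upstream_id, local_id, submitter):
        """Remember which local message and account an upstream ID belongs to."""
        self._by_upstream_id[upstream_id] = TrackedMessage(local_id, submitter)

    def resolve_receipt(self, upstream_id):
        """Return the tracked entry for a receipt, or None if the ID is unknown."""
        return self._by_upstream_id.get(upstream_id)
```

A receipt arriving with an upstream ID that was never recorded resolves to None, which corresponds to a receipt that cannot be routed to any submitter.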

This 2011.05.09 update that I mentioned in this thread was seriously flawed. I should never have mentioned it because it hadn't gone through full testing.

A 2011.05.23 update has been posted that has seen more thorough testing, and does not have the DR problem (among other issues).

This version does reduce some of the startup overhead. The Q and IN scans are still required. They have been sped up slightly. More significantly, the integrity checks for large DR databases are very time consuming. There are still some checks made against the DR databases, but they have been sped up considerably.

The download link is http://www.nowsms.com/download/nowsms20110523.zip

--
Des
NowSMS Support