Euro symbol (€), GSM default alphabet and iso-8859-1

Miguel Fernández Corral
New member
Username: Mfcorral

Post Number: 1
Registered: 08-2010
Posted on Wednesday, August 11, 2010 - 02:12 am:   

Hello support team, I will try to explain the issue as well as possible.

I am based in Spain and I am a customer of "Telefonica Movistar", submitting SMS via SMPP. Their SMSC implementation is very limited: it does not support the native GSM default alphabet (data coding scheme set to 0x00). It only supports the Latin-1 variant with the euro (€) sign, which is iso-8859-15, but it expects the data coding scheme (DCS) of the PDU to be set to 0x03, as if the text were plain Latin-1 (iso-8859-1). So when I send the character 0xA4 (the generic currency sign in Latin-1), the receiving devices display the actual euro (€) symbol.

I'm evaluating NowSMS so that certain clients can connect to my server and submit SMS via SMPP as well. All of these clients submit their messages using the GSM default alphabet (DCS set to 0x00), and NowSMS interprets them perfectly, including the euro symbol (0x1B65). Here is an example from SMSOUT.log:



2010-08-11 02:11:26,4C52C4F8.req,192.168.1.1,+34666666666,OK -- SMPP -10.10.10.1:2775,SubmitUser=test;Sender=test;SMSCMsgId=5E7CBAB1;Text="@£$¥èé ùìòÇØøÅå_üÜñÑ€.1.2.3.4.5.6.7.8.9.0.aA.bB.cC.dD.eE.fF.gG.hH.iI.jJ.k K.lL.mM.nN.ñÑ.oO.pP.qQ.rR.sS.tT.uU.vV.wW.xX.yY.zZ."


Up to this point NowSMS behaves perfectly for me, and the flexibility it offers is very powerful. The issue is the following:

As I wrote above, my SMSC provider, "Telefonica Movistar", only supports DCS 0x03, so NowSMS remaps the characters to the new character set, more explicitly, from DCS 0x00 to DCS 0x03. Obviously the message may contain characters that have no match in the other character set, but what I don't understand is why 0x1B65 (€) in DCS 0x00 is mapped to 0x80 in DCS 0x03 (Latin-1), which is a non-printable control character, instead of 0xA4, which is not the euro symbol in Latin-1 but is at least the generic currency sign. Is there any way to resolve this issue? The SMPPDEBUG.log follows:


02:11:26:187 (00000224) 192.168.1.1 <-: 174 byte packet
02:11:26:187 (00000224) 192.168.1.1 <-: 00 00 00 AE 00 00 00 04 00 00 00 00 00 00 00 02
02:11:26:187 (00000224) 192.168.1.1 <-: 00 05 09 35 34 38 30 00 01 01 2B 33 34 36 36 35
02:11:26:187 (00000224) 192.168.1.1 <-: 30 37 35 35 32 34 00 00 00 00 00 00 00 00 00 00
02:11:26:187 (00000224) 192.168.1.1 <-: 7D 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E
02:11:26:187 (00000224) 192.168.1.1 <-: 0F 11 7E 5E 7D 5D 1B 65 2E 31 2E 32 2E 33 2E 34
02:11:26:187 (00000224) 192.168.1.1 <-: 2E 35 2E 36 2E 37 2E 38 2E 39 2E 30 2E 61 41 2E
02:11:26:187 (00000224) 192.168.1.1 <-: 62 42 2E 63 43 2E 64 44 2E 65 45 2E 66 46 2E 67
02:11:26:187 (00000224) 192.168.1.1 <-: 47 2E 68 48 2E 69 49 2E 6A 4A 2E 6B 4B 2E 6C 4C
02:11:26:187 (00000224) 192.168.1.1 <-: 2E 6D 4D 2E 6E 4E 2E 7D 5D 2E 6F 4F 2E 70 50 2E
02:11:26:187 (00000224) 192.168.1.1 <-: 71 51 2E 72 52 2E 73 53 2E 74 54 2E 75 55 2E 76
02:11:26:187 (00000224) 192.168.1.1 <-: 56 2E 77 57 2E 78 58 2E 79 59 2E 7A 5A 2E
02:11:26:234 (00000224) 192.168.1.1 ->: 25 byte packet
02:11:26:234 (00000224) 192.168.1.1 ->: 00 00 00 19 80 00 00 04 00 00 00 00 00 00 00 02
02:11:26:234 (00000224) 192.168.1.1 ->: 34 43 35 32 43 34 46 38 00

02:11:26:843 (000001A4) 10.10.10.1 ->: 172 byte packet
02:11:26:843 (000001A4) 10.10.10.1 ->: 00 00 00 AC 00 00 00 04 00 00 00 00 00 00 00 02
02:11:26:843 (000001A4) 10.10.10.1 ->: 00 05 09 35 34 38 30 00 01 01 33 34 36 36 35 30
02:11:26:843 (000001A4) 10.10.10.1 ->: 37 35 35 32 34 00 00 00 00 00 00 00 00 03 00 7C
02:11:26:843 (000001A4) 10.10.10.1 ->: 40 A3 24 A5 E8 E9 F9 EC F2 C7 0A D8 F8 0D C5 E5
02:11:26:843 (000001A4) 10.10.10.1 ->: 5F FC DC F1 D1 80 2E 31 2E 32 2E 33 2E 34 2E 35
02:11:26:843 (000001A4) 10.10.10.1 ->: 2E 36 2E 37 2E 38 2E 39 2E 30 2E 61 41 2E 62 42
02:11:26:843 (000001A4) 10.10.10.1 ->: 2E 63 43 2E 64 44 2E 65 45 2E 66 46 2E 67 47 2E
02:11:26:843 (000001A4) 10.10.10.1 ->: 68 48 2E 69 49 2E 6A 4A 2E 6B 4B 2E 6C 4C 2E 6D
02:11:26:843 (000001A4) 10.10.10.1 ->: 4D 2E 6E 4E 2E F1 D1 2E 6F 4F 2E 70 50 2E 71 51
02:11:26:843 (000001A4) 10.10.10.1 ->: 2E 72 52 2E 73 53 2E 74 54 2E 75 55 2E 76 56 2E
02:11:26:843 (000001A4) 10.10.10.1 ->: 77 57 2E 78 58 2E 79 59 2E 7A 5A 2E
02:11:26:921 (000001A4) 10.10.10.1 <-: 25 byte packet
02:11:26:921 (000001A4) 10.10.10.1 <-: 00 00 00 19 80 00 00 04 00 00 00 00 00 00 00 02
02:11:26:921 (000001A4) 10.10.10.1 <-: 35 45 37 43 42 41 42 31 00


The DCS is highlighted in green and the € symbol and its mapping in red.
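
For readers following the hex dump, the outbound PDU can be picked apart with a short script. This is a minimal Python sketch, assuming the standard SMPP 3.4 submit_sm field order; the helper names (read_cstring, parse_submit_sm) are made up for illustration and are not part of NowSMS. The bytes are copied from the 172 byte packet sent to 10.10.10.1 above.

```python
# Bytes of the outbound submit_sm PDU, copied from the SMPPDEBUG.log excerpt.
PDU_HEX = """
00 00 00 AC 00 00 00 04 00 00 00 00 00 00 00 02
00 05 09 35 34 38 30 00 01 01 33 34 36 36 35 30
37 35 35 32 34 00 00 00 00 00 00 00 00 03 00 7C
40 A3 24 A5 E8 E9 F9 EC F2 C7 0A D8 F8 0D C5 E5
5F FC DC F1 D1 80 2E 31 2E 32 2E 33 2E 34 2E 35
2E 36 2E 37 2E 38 2E 39 2E 30 2E 61 41 2E 62 42
2E 63 43 2E 64 44 2E 65 45 2E 66 46 2E 67 47 2E
68 48 2E 69 49 2E 6A 4A 2E 6B 4B 2E 6C 4C 2E 6D
4D 2E 6E 4E 2E F1 D1 2E 6F 4F 2E 70 50 2E 71 51
2E 72 52 2E 73 53 2E 74 54 2E 75 55 2E 76 56 2E
77 57 2E 78 58 2E 79 59 2E 7A 5A 2E
"""


def read_cstring(buf, pos):
    """Read a NUL-terminated ASCII string at pos; return (text, next_pos)."""
    end = buf.index(0, pos)
    return buf[pos:end].decode("ascii"), end + 1


def parse_submit_sm(pdu):
    """Pull a few fields out of a submit_sm PDU (SMPP 3.4 layout assumed)."""
    command_length = int.from_bytes(pdu[0:4], "big")
    command_id = int.from_bytes(pdu[4:8], "big")
    pos = 16                                   # skip the 16-byte header
    _service_type, pos = read_cstring(pdu, pos)
    pos += 2                                   # source_addr_ton, source_addr_npi
    source_addr, pos = read_cstring(pdu, pos)
    pos += 2                                   # dest_addr_ton, dest_addr_npi
    _destination_addr, pos = read_cstring(pdu, pos)
    pos += 3                                   # esm_class, protocol_id, priority_flag
    _schedule, pos = read_cstring(pdu, pos)    # schedule_delivery_time
    _validity, pos = read_cstring(pdu, pos)    # validity_period
    pos += 2                                   # registered_delivery, replace_if_present_flag
    data_coding = pdu[pos]
    sm_length = pdu[pos + 2]                   # pdu[pos + 1] is sm_default_msg_id
    short_message = pdu[pos + 3:pos + 3 + sm_length]
    return {
        "command_length": command_length,
        "command_id": command_id,
        "source_addr": source_addr,
        "data_coding": data_coding,
        "sm_length": sm_length,
        "short_message": short_message,
    }


pdu = bytes.fromhex("".join(PDU_HEX.split()))
fields = parse_submit_sm(pdu)
euro_offset = fields["short_message"].index(0x80)

print("data_coding = 0x%02X" % fields["data_coding"])
print("sm_length   =", fields["sm_length"])
print("0x80 found at message offset", euro_offset)
```

Run against the dump, this locates the data_coding octet (0x03) just before the message length, and the 0x80 byte that the € was remapped to inside the message body.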

Best regards,
Miguel Fernández Corral.
Des - NowSMS Support
Board Administrator
Username: Desosms

Post Number: 2408
Registered: 08-2008
Posted on Wednesday, August 11, 2010 - 04:56 pm:   

Hi Miguel,

Thanks for your detailed explanation.

When the Euro symbol was introduced, Microsoft added it to Windows Code page 1252 (their version of iso-8859-1) with the code 0x80. We've had good success using that as a way to encode the Euro symbol.

However, it does make sense that a provider would go with the more officially standardised iso-8859-15.
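
The difference between the three character sets can be checked with the codecs that ship with Python. A small sketch follows; cp1252, iso-8859-1 and iso-8859-15 are standard Python codec names, and the variable names are only for illustration.

```python
EURO = "\u20ac"  # the euro sign

# Windows code page 1252 places the euro sign at 0x80.
cp1252_euro = EURO.encode("cp1252")

# iso-8859-15 (Latin-9) places it at 0xA4, where Latin-1 has the currency sign.
latin9_euro = EURO.encode("iso-8859-15")

# Strict iso-8859-1 has no euro sign at all, so encoding fails.
try:
    EURO.encode("iso-8859-1")
    latin1_has_euro = True
except UnicodeEncodeError:
    latin1_has_euro = False

# The same byte 0xA4 reads differently depending on the charset assumed.
a4_as_latin1 = b"\xa4".decode("iso-8859-1")    # generic currency sign (U+00A4)
a4_as_latin9 = b"\xa4".decode("iso-8859-15")   # euro sign (U+20AC)

# In strict iso-8859-1, byte 0x80 is a C1 control character (U+0080).
x80_as_latin1 = b"\x80".decode("iso-8859-1")

print(cp1252_euro.hex(), latin9_euro.hex(), latin1_has_euro)  # prints: 80 a4 False
```

This is why a receiver that treats DCS 3 as strict Latin-1 or as Latin-9 does not show a € for the byte 0x80, while one that treats it as code page 1252 does.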

I've been discussing this issue with our engineering team to see how they want to add support for this encoding. Give us a few days, and we'll figure out a way to support this.

--
Des
NowSMS Support
Miguel Fernández Corral
New member
Username: Mfcorral

Post Number: 2
Registered: 08-2010
Posted on Wednesday, August 11, 2010 - 05:32 pm:   

Hi again Des,

Thank you for such a quick response. I now understand the cause of the issue and your decision to map the € symbol to the Microsoft implementation of iso-8859-1. As you know, each SMSC ends up having its own peculiarities, and Movistar is famous for having a lot of them. This is just one example.

It would be a very interesting feature if NowSMS could "bypass" the remapping and pass a client's user data through without any character conversion between DCS values (send as received). For example, our SMSC has DCS 0 assigned to the iso-8859-15 specification. I know this SMSC (Movistar) implementation and its behaviour are really curious, but in Spain we need to deal with it if we want to support a very high SMS sending rate :-).

Thanks a lot for everything. I will keep an eye out over the next few days for the engineering team's response.

Best regards,
Miguel Fernández.
Des - NowSMS Support
Board Administrator
Username: Desosms

Post Number: 2409
Registered: 08-2008
Posted on Wednesday, August 11, 2010 - 10:16 pm:   

Hi Miguel,

The transparency idea is interesting, and I agree that it could be very useful for some scenarios. However, it is easier for us to deal with quirks of different implementations than to redesign for complete transparency.

Transparency is also very different from the scenario in your original post, where you have SMPP clients using the GSM character set, but interfacing to an upstream SMSC that uses iso-8859-15.

I tried to reconcile these two requirements, and the thought that I have is this ...

Just as different SMSC implementations have different character set assumptions, different SMPP client implementations have different character set assumptions, and it is not always possible or desirable to modify the SMPP client.

So you might have some SMPP clients using GSM character set encoding and some using iso-8859-15 for Movistar (and maybe even some others using the Windows extension to iso-8859-1).

What we're going to do is this ...

1.) Add iso-8859-15 support for outbound SMPP connections. This will resolve the issue in your original post. (And if you want it to use DCS 0 instead of DCS 3 with this encoding, that will also be possible.)

2.) Add support for allowing different SMPP clients to have different default character sets, so that you can mix clients using GSM encoding with those using iso-8859-15.

It'll be a couple of days before an update with these changes is ready for testing.
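
The remapping under discussion can be sketched in a few lines of Python. This is only an illustration, not NowSMS code: it decodes the GSM default alphabet (the GSM 03.38 basic table plus a few extension-table entries reached through the 0x1B escape) to Unicode, then re-encodes with whichever character set the upstream SMSC expects. The function names gsm_decode and remap are hypothetical, input is assumed to be one septet per octet as in the SMPP dumps above, and replacing unmappable characters with "?" is an assumption.

```python
# GSM 03.38 basic character table, indexed by septet value (0x00..0x7F).
GSM_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)

# A few extension-table entries, reached via the 0x1B escape.
GSM_EXT = {
    0x65: "€", 0x14: "^", 0x28: "{", 0x29: "}", 0x2F: "\\",
    0x3C: "[", 0x3D: "~", 0x3E: "]", 0x40: "|",
}


def gsm_decode(data):
    """Decode unpacked GSM default alphabet octets to a Unicode string."""
    out = []
    octets = iter(data)
    for b in octets:
        if b == 0x1B:
            # Escape: the next octet selects an extension-table character.
            # Unknown or missing escapes become a space (a simplification).
            out.append(GSM_EXT.get(next(octets, None), " "))
        else:
            out.append(GSM_BASIC[b & 0x7F])
    return "".join(out)


def remap(gsm_bytes, target_charset):
    """Re-encode GSM text for an SMSC that expects target_charset."""
    return gsm_decode(gsm_bytes).encode(target_charset, errors="replace")


# First 23 message octets of the inbound PDU in the original post,
# ending with the escaped euro sign 1B 65.
sample = bytes.fromhex("000102030405060708090A0B0C0D0E0F117E5E7D5D1B65")

as_cp1252 = remap(sample, "cp1252")        # euro becomes 0x80
as_latin9 = remap(sample, "iso-8859-15")   # euro becomes 0xA4

print(as_cp1252.hex(" "))
print(as_latin9.hex(" "))
```

With cp1252 as the target, the output matches the first 22 message octets of the outbound PDU in the original log; with iso-8859-15 only the final octet differs, 0xA4 instead of 0x80.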

--
Des
NowSMS Support
Miguel Fernández Corral
New member
Username: Mfcorral

Post Number: 3
Registered: 08-2010
Posted on Thursday, August 12, 2010 - 01:55 am:   

Hi Des,

That is very good news for us, because those features are exactly what we need. We will be happy to test them in our high-availability setup for the three major SMSCs in Spain. We will write a report with the test results and can send it to you if you are interested.

We have been evaluating NowSMS this month and the impression so far has been very good, above all the debug information, which was very useful in addressing all the issues.

The transparency feature I wrote about is not a request and is not needed by us at this moment; it was just an idea that came out of a few hard days trying to address the Movistar DCS issue before writing to this board. Those days, as you can imagine, were madness: debugging hundreds of PDUs, switching the DCS between clients, NowSMS and the SMSCs to see how the alphabets behaved on different mobile devices.

Thanks a lot for everything, and please ask us for any kind of information or tests that we can do in Spain's SMPP world. We will be glad to carry them out.

Best regards,
Miguel Fernández.
Des - NowSMS Support
Board Administrator
Username: Desosms

Post Number: 2431
Registered: 08-2008
Posted on Monday, August 16, 2010 - 09:27 pm:   

Hi Miguel,

Apologies for the delay in response. I intended to post this Friday, but we've had some technical problems with our web site.

An update is available at http://www.nowsms.com/download/nowsms20100813.zip.

The readme is a little confusing, so let me explain how it pertains to your configuration.

1a.) For the outbound SMPP connection, a new configuration option has been added to support the iso-8859-15 (Latin-9) character set. Enable this setting (press Ok, Ok, Apply to save) to use the 0xA4 character instead of 0x80 for the € character.

1b.) If you want to use DCS value 0 instead of DCS 3, manually edit SMSGW.INI, and under the [SMPP - server:port] header for this SMPP connection, add SMSCCharsetDefault=Yes.

1c.) If for some reason text messages containing a € character are interpreted incorrectly, add SMSCCharsetReceiveTextOverride=Yes to the same section as described in 1b. (This should not be necessary. It would only be necessary if the SMSC is using an unusual DCS value when delivering received messages.)

2a.) The default character set for interfacing with SMPP clients is defined on the "Web" page of the NowSMS configuration, under the "SMPP Options" button. This configuration now supports "iso-8859-15" as a default choice. (Or if you have more clients using the GSM character set, you can leave it with that setting.)

2b.) If there is a need to support SMPP clients that use different default character sets, it is possible to add additional user-specific section headers to SMSGW.INI. Under a new header of [SMPP - username], the following parameters can be applied to a specific SMPP client account (username):

SMSCCharset=IA5 (for GSM), iso-8859-1 or iso-8859-15
SMSCCharsetDefault=Yes (tells NowSMS to use a data_coding value of 0 with the configured character set)
SMSCCharsetReceiveTextOverride=Yes (tells NowSMS to always use the character set configured for the user for text messages, even if the data_coding value implies a different character set is being used).
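
To make the layout of those sections concrete, here is a small Python sketch that reads a hand-written SMSGW.INI fragment and lists the charset settings per section. It is not how NowSMS reads its configuration; it only uses Python's configparser to show where the parameters sit. The section names are assumptions built from the logs earlier in the thread (the upstream server 10.10.10.1:2775 and the SMPP user "test"), and the function name charset_settings is hypothetical.

```python
import configparser

# Hypothetical SMSGW.INI fragment using the parameter names from this post.
INI_FRAGMENT = """
[SMPP - 10.10.10.1:2775]
SMSCCharset=iso-8859-15
SMSCCharsetDefault=Yes

[SMPP - test]
SMSCCharset=IA5
"""


def charset_settings(ini_text):
    """Return the charset-related settings of every [SMPP - ...] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key names exactly as written
    parser.read_string(ini_text)
    prefix = "SMPP - "
    result = {}
    for section in parser.sections():
        if not section.startswith(prefix):
            continue
        result[section[len(prefix):]] = {
            "charset": parser.get(section, "SMSCCharset", fallback=None),
            "dcs_zero": parser.getboolean(
                section, "SMSCCharsetDefault", fallback=False),
            "receive_override": parser.getboolean(
                section, "SMSCCharsetReceiveTextOverride", fallback=False),
        }
    return result


settings = charset_settings(INI_FRAGMENT)
for name, values in settings.items():
    print(name, values)
```

In this fragment the upstream connection uses iso-8859-15 with a data_coding value of 0, while the client account keeps the GSM character set.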

--
Des
NowSMS Support
Miguel Fernández Corral
New member
Username: Mfcorral

Post Number: 4
Registered: 08-2010
Posted on Tuesday, August 17, 2010 - 11:47 am:   

Hi Des,
First of all, thank you very much for the NowSMS update supporting the requested features. We tested them throughout the morning and all the tests were successful.

Clients connect to NowSMS via SMPP and can send messages with any desired DCS (0x00, 0x03 or 0x08). The behaviour is perfect: everything keeps running as before, with the difference that the € character is now mapped to 0xA4.

Here is our configuration for the SMSC connection:

SMSCCharset=iso-8859-15
SMSCCharsetDefault=Yes

We left the NowSMS SMPP server charset (in the Web tab) set to the default to allow clients to use different DCS values:

- If a client submits a message with DCS 0 and uses the default encoding, the € character (0x1B65, euro symbol) is correctly mapped to 0xA4 in Latin-9 (iso-8859-15).

- If a client submits a message with DCS 3 and uses the Latin-1 (iso-8859-1) encoding, the ¤ character (0xA4, generic currency sign) is correctly mapped to 0xA4 in Latin-9 (iso-8859-15).

- If a client submits a message with DCS 3 and uses the Latin-9 (iso-8859-15) encoding, the € character (0xA4, euro symbol) is kept.

Then, all messages are sent to the SMSC with DCS 0x00 thanks to the SMSCCharsetDefault parameter.

Best regards,
Miguel Fernández.