Unicode encoding issues

Manveer Chawla
New member
Username: Manveer

Post Number: 1
Registered: 11-2009
Posted on Monday, November 02, 2009 - 07:48 am:   

We are trying to send a short Unicode message, but the SMPP server is not able to decode the message correctly. We sent the text "यूनिकोड, प्रत्येक अक्षर के लिए" (without quotes) using NowSMS, and the byte array received at the SMPP server was [9, 47, 9, 66, 9, 40, 9, 63, 9, 21, 9, 75, 9, 33, 0, 44, 0, 32, 9, 42, 9, 77, 9, 48, 9, 36, 9, 77, 9, 47, 9, 71, 9, 21, 0, 32, 9, 5, 9, 21, 9, 77, 9, 55, 9, 48, 0, 32, 9, 21, 9, 71, 0, 32, 9, 50, 9, 63].

We tried sending the same message using Kannel, and the byte array received at the server was [48, 57, 50, 70, 48, 57, 52, 50, 48, 57, 50, 56, 48, 57, 51, 70, 48, 57, 49, 53, 48, 57, 52, 66, 48, 57, 50, 49, 48, 48, 50, 67, 48, 48, 50, 48, 48, 57, 50, 65, 48, 57, 52, 68, 48, 57, 51, 48, 48, 57, 50, 52, 48, 57, 52, 68, 48, 57, 50, 70, 48, 57, 52, 55, 48, 57, 49, 53, 48, 48, 50, 48, 48, 57, 48, 53, 48, 57, 49, 53, 48, 57, 52, 68, 48, 57, 51, 55, 48, 57, 51, 48, 48, 48, 50, 48, 48, 57, 49, 53, 48, 57, 52, 55, 48, 48, 50, 48, 48, 57, 51, 50, 48, 57, 51, 70, 48, 57, 48, 70].

In both cases, the message was sent in the ShortMessage field with optional parameters set.

Can you tell us how to resolve this?
Des - NowSMS Support
Board Administrator
Username: Desosms

Post Number: 1404
Registered: 08-2008
Posted on Monday, November 02, 2009 - 08:37 pm:   

Hi,

I'm sorry, but I don't understand how you are submitting this message to NowSMS.

Are you submitting the message using HTTP?

Does it work if you enter this text into the web form?

If it works when you use the web form, but not when you try a URL submission, this is because NowSMS assumes that URL submissions are encoded with UTF-8 text. If you are using a different character set, you need to include the character set in the URL using the "&charset=xxxx" parameter, where "xxxx" is the character set you are using.
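To illustrate why the character set matters for URL submissions, here is a small Python sketch (not NowSMS code). It shows that the same character percent-encodes to different bytes under UTF-8 and under another character set. The sample character and the choice of ISO-8859-1 are assumptions made for illustration; the thread does not say which charset names NowSMS accepts.

```python
from urllib.parse import quote

sample = "\u00e9"  # é, an arbitrary non-ASCII character chosen for illustration

# Percent-encoding under UTF-8, which is what URL submissions are assumed to use.
as_utf8 = quote(sample, encoding="utf-8")

# Percent-encoding under ISO-8859-1, standing in for "a different character set".
as_latin1 = quote(sample, encoding="iso-8859-1")

print(as_utf8)    # %C3%A9
print(as_latin1)  # %E9
```

A receiver that assumes UTF-8 cannot decode the second form correctly unless it is told which character set was used.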

--
Des
NowSMS Support
Des - NowSMS Support
Board Administrator
Username: Desosms

Post Number: 1405
Registered: 08-2008
Posted on Monday, November 02, 2009 - 08:39 pm:   

Wait a minute ... I just looked up the characters that you are trying to send.

I have no knowledge of the language that you are using. However ...

यू is DEVANAGARI VOWEL SIGN E, correct? ... and the Unicode character encoding for it is 0x0947.

It looks like NowSMS is encoding the character correctly. It is the encoding of the other software that appears suspect to me.

--
Des
NowSMS Support
Manveer Chawla
New member
Username: Manveer

Post Number: 2
Registered: 11-2009
Posted on Tuesday, November 03, 2009 - 08:44 am:   

We are not submitting this message to NowSMS. We are submitting the message to an SMSC using the NowSMS desktop client.

The server logs given above are from that same SMSC. Kannel is the software the SMSC uses for most of its clients. Is there any way this issue can be resolved at the NowSMS client?

And the characters are from the Hindi alphabet.
Manveer Chawla
New member
Username: Manveer

Post Number: 3
Registered: 11-2009
Posted on Wednesday, November 04, 2009 - 11:09 am:   

The other thing I am wondering is: shouldn't 0x0947 be interpreted as [48, 57, 50, 70] (treating each character as a byte)? Currently it is encoded as [9, 47], treating two characters as one byte. Treating two characters as one byte is surely going to garble the message, in my opinion.
Des - NowSMS Support
Board Administrator
Username: Desosms

Post Number: 1406
Registered: 08-2008
Posted on Wednesday, November 04, 2009 - 05:11 pm:   

Hi,

I still don't see what the problem is.

If the Unicode character code is 0x0947, then it would be encoded as two bytes ... 0x09 and 0x47.

And that is how the character would be sent via SMS, with the DCS indicating that 16-bit Unicode is being used.

I do not understand the logic why you would think that this character should be represented as [48, 57, 50, 70]. Why do you think this? What logic are you using to perform that conversion?
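For reference, both byte sequences under discussion can be reproduced with a short Python sketch. It assumes the SMSC log prints byte values in decimal, which is what the arrays in the first post suggest; the variable names are illustrative.

```python
ch = "\u092f"  # DEVANAGARI LETTER YA, the first character of the message

# 16-bit Unicode (UTF-16 big-endian): two raw bytes, 0x09 and 0x2F.
raw = list(ch.encode("utf-16-be"))
print(raw)  # [9, 47]  (decimal 47 is hex 0x2F)

# The same two bytes written out as the four ASCII hex digits "092F".
hex_text = ch.encode("utf-16-be").hex().upper()
as_ascii = list(hex_text.encode("ascii"))
print(hex_text, as_ascii)  # 092F [48, 57, 50, 70]

# U+0947 (DEVANAGARI VOWEL SIGN E) would be [9, 71], since 0x47 is decimal 71.
print(list("\u0947".encode("utf-16-be")))  # [9, 71]
```

So [9, 47] and [48, 57, 50, 70] both describe U+092F: the first is the raw 16-bit value, the second is its hex digits sent as ASCII text.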

I just cut and pasted the यू character from above into NowSMS, and on my system it was interpreted as two characters: Unicode 0x092F and 0x0942. So it goes out as 0x09, 0x2F, 0x09, 0x42.

That's still different from what you're expecting.

So let me explain a little bit more about how NowSMS encoding works.

If you submit a message from the NowSMS web interface, the web browser encodes the text using UTF-8 (8-bit multibyte representation of Unicode characters).

NowSMS receives the UTF-8 encoded request and converts it to 16-bit Unicode. The actual SMS that goes out needs to use either the 7-bit GSM character set, or 16-bit Unicode. In this case, 16-bit Unicode needs to be used.

If you enable the SMSDEBUG.LOG, you will see the character encoding that NowSMS receives from the web browser.

On my system, I see this:

/Send%20Text%20Message.htm?PhoneNumber=999999999999&Text=test+%E0%A4%AF%E0%A5%82&InfoCharCounter=&PID=&DCS=&DestPort=&DelayUntil=&Submit=Submit

%E0%A4%AF%E0%A5%82 is how the browser encoded the यू character. This is UTF-8 encoding.

%E0%A4%AF is URL encoded UTF-8 encoding for Unicode character 0x092F.

%E0%A5%82 is URL encoded UTF-8 encoding for Unicode character 0x0942.

On your system, you probably have additional language support installed ... so instead of interpreting it as two Unicode characters, the web browser is interpreting it as the single Unicode character DEVANAGARI VOWEL SIGN E, and the SMSDEBUG.LOG would show that the browser is submitting %E0%A5%87, which would be URL encoded UTF-8 encoding for Unicode character 0x0947.
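The URL-encoded values quoted above can be checked with Python's standard library. This is only a sketch of the UTF-8 to 16-bit Unicode conversion described in this post, not NowSMS's actual implementation.

```python
from urllib.parse import quote, unquote

# Percent-encode each code point as UTF-8, as a browser does in a form submission.
print(quote("\u092f"))  # %E0%A4%AF
print(quote("\u0942"))  # %E0%A5%82
print(quote("\u0947"))  # %E0%A5%87

# Reverse direction: decode the UTF-8 from the URL, then re-encode as 16-bit Unicode.
text = unquote("%E0%A4%AF%E0%A5%82")
ucs2 = text.encode("utf-16-be")
print(ucs2.hex(" "))  # 09 2f 09 42
```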

I'm still not clear what encoding you are expecting to be used ... as I don't understand how 0x0947 would be interpreted as [48, 57, 50, 70].

--
Des
NowSMS Support
Des - NowSMS Support
Board Administrator
Username: Desosms

Post Number: 1407
Registered: 08-2008
Posted on Wednesday, November 04, 2009 - 05:24 pm:   

It appears that I was cutting and pasting the wrong character in my examples above. That's why there is some confusion.

But the Devanagari Unicode characters are all in the 0x09?? range, so the typical encoding in a Unicode SMS would be 09 ?? 09 ?? 09 ??, where the ?? is character dependent.

Does your SMSC not expect Unicode encoding? If not, what character set does it expect? (And what protocol is used to connect to the SMSC? If you are using HTTP, there is some flexibility on the character set being used.)

--
Des
NowSMS Support
Simranjit Singh
New member
Username: Manchanda_17

Post Number: 10
Registered: 06-2007
Posted on Saturday, December 26, 2009 - 06:21 pm:   

Hi Manveer!

The issue mentioned above is not with NowSMS. Whenever you submit or receive a message through NowSMS, it converts the message into hex values, which can be converted into the desired text with a converter. I was facing this problem too, but we have now sorted it out. For more details, get in touch with me.
Regards
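A converter of the kind described above can be sketched in a few lines of Python. It assumes the hex text is 16-bit Unicode (UTF-16 big-endian) written as ASCII hex digits, which matches the start of the Kannel byte array in the first post. `hex_to_text` is a hypothetical helper name, not part of NowSMS or Kannel.

```python
def hex_to_text(hex_digits: str) -> str:
    """Convert a hex string such as '092F0942' to the text it encodes.

    Assumes the hex digits represent UTF-16 big-endian code units.
    """
    return bytes.fromhex(hex_digits).decode("utf-16-be")

# First 16 bytes of the Kannel array from the first post (ASCII codes of hex digits).
kannel_prefix = [48, 57, 50, 70, 48, 57, 52, 50, 48, 57, 50, 56, 48, 57, 51, 70]
hex_digits = bytes(kannel_prefix).decode("ascii")
print(hex_digits)  # 092F09420928093F

# First 8 bytes of the NowSMS array from the first post (raw 16-bit Unicode).
nowsms_prefix = [9, 47, 9, 66, 9, 40, 9, 63]

from_kannel = hex_to_text(hex_digits)
from_nowsms = bytes(nowsms_prefix).decode("utf-16-be")
print(from_kannel == from_nowsms)  # True: both carry the same four characters
```

Under that assumption the two arrays carry the same text and differ only in representation: raw bytes from NowSMS, hex digits as ASCII from Kannel.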