Unicode Characters

Anonymous
Posted on Thursday, May 22, 2003 - 08:01 pm:   

Hi,

I'm using your gateway for a test system that I am presenting to my organisation.

The only problem I have is the unicode and I'm hoping you can help me with that.

What I want to know is whether I can translate the special characters used in Swedish.

Unicode: %E5 %E4 %F6 %C5 %C4 %D6
Typed from the keyboard: å ä ö Å Ä Ö

I don't know if you can see them, but at http://www.unicode.org/charts/PDF/U0080.pdf you can see which characters I want.

I have managed to translate the question mark and some other special characters, but I can't translate those.

Help!
Bryce Norwood - NowSMS Support (Bryce)
Posted on Thursday, May 22, 2003 - 11:03 pm:   

In UTF-8 format, the URL-escaped encodings for these characters are:

å = %C3%A5
ä = %C3%A4
ö = %C3%B6
Å = %C3%85
Ä = %C3%84
Ö = %C3%96
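
The values above can also be computed directly. Here is a minimal sketch in Python (Python is not part of the gateway and is used here only for illustration; the helper name utf8_percent is made up for this example). It percent-encodes each character using its UTF-8 bytes:

```python
from urllib.parse import quote

def utf8_percent(ch: str) -> str:
    """Return the percent-encoded UTF-8 bytes of a string (hypothetical helper)."""
    # safe="" so that no character is left unescaped by default
    return quote(ch, safe="", encoding="utf-8")

# Print one "character = encoding" line per Swedish character from the question
for ch in "åäöÅÄÖ":
    print(ch, "=", utf8_percent(ch))
```

The printed lines should match the list above, for example å = %C3%A5.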

To figure out these encodings, I cheated with a shortcut, so I'll let you know my trick in case you need to work out the encoding for additional characters.

I submitted a message containing the characters that I wanted the codes for via the web menu interface of the gateway. The web menu interface uses an HTML form that tells the web browser to encode the data in UTF-8 format when submitting to the server.

I had the gateway running in debug mode. You can enable this by manually editing SMSGW.INI and adding Debug=Yes under the [SMSGW] section header of that file. Restart the gateway after making any settings change in SMSGW.INI.

So after submitting this message through the web menu interface, I looked at the SMSDEBUG.LOG, and it showed me the URL request that was coming in to the gateway. In the URL request, the web browser had converted the characters to UTF-8. So that was my quick trick for determining the UTF-8 encoding of the characters that you asked about.

There is also another trick. If you want to submit a message to the gateway using a character set other than UTF-8, we also support iso-8859-1, iso-8859-2, iso-8859-4, iso-8859-5, iso-8859-6, iso-8859-7, iso-8859-8, iso-8859-9, big5 (big-5), gbk (gb2312, gb2312-80), and shift-jis (shift_jis). Just include a "charset=" parameter on the URL request to tell us which of these character sets you are using to encode the message that is being submitted.
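
As a rough illustration of what encoding in a different character set means on the client side, the following Python sketch percent-encodes text using a named charset and fails loudly when the text cannot be represented in it. The function name encode_for_charset is hypothetical, and the codec names are Python's, which may not line up one to one with the charset names the gateway accepts:

```python
from urllib.parse import quote

def encode_for_charset(text: str, charset: str) -> str:
    """Percent-encode text using the bytes of the given charset (hypothetical helper).

    Raises UnicodeEncodeError if a character does not exist in that charset.
    """
    return quote(text, safe="", encoding=charset, errors="strict")

# Swedish characters fit in iso-8859-1, one byte per character
print(encode_for_charset("åäö", "iso-8859-1"))

# A Cyrillic letter does not exist in iso-8859-1, so encoding is refused
try:
    encode_for_charset("ж", "iso-8859-1")
except UnicodeEncodeError:
    print("not representable in iso-8859-1")
```

Whatever charset is used to encode the text is the one that would be named in the "charset=" parameter.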

So if you wanted to send those six characters, you could use any of the following:

The UTF-8 approach:

/&PhoneNumber=xxxxxx&Text=%C3%A5%C3%A4%C3%B6%C3%85%C3%84%C3%96

iso-8859-1 override:

/&PhoneNumber=xxxxxx&Text=%E5%E4%F6%C5%C4%D6&charset=iso-8859-1

Or, iso-8859-1 override without escape:

/&PhoneNumber=xxxxxx&Text=åäöÅÄÖ&charset=iso-8859-1
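
If you are generating these requests from a script, the parameter part of the first two forms can be built along these lines. This is only a sketch in Python: the parameter names PhoneNumber, Text and charset are the ones used above, while build_query is a made-up helper and the host, port and leading part of the URL are deliberately left out.

```python
from typing import Optional
from urllib.parse import quote, unquote

def build_query(phone: str, text: str, charset: Optional[str] = None) -> str:
    """Build the PhoneNumber/Text parameters (hypothetical helper).

    With no charset the text is escaped as UTF-8; otherwise it is escaped
    in the given charset and a charset= parameter is appended.
    """
    encoding = charset or "utf-8"
    query = "PhoneNumber=" + quote(phone, safe="")
    query += "&Text=" + quote(text, safe="", encoding=encoding)
    if charset:
        query += "&charset=" + charset
    return query

utf8_form = build_query("xxxxxx", "åäöÅÄÖ")
latin1_form = build_query("xxxxxx", "åäöÅÄÖ", charset="iso-8859-1")
print(utf8_form)
print(latin1_form)

# Sanity check: both escaped forms decode back to the same six characters
assert unquote("%C3%A5%C3%A4%C3%B6%C3%85%C3%84%C3%96", encoding="utf-8") == "åäöÅÄÖ"
assert unquote("%E5%E4%F6%C5%C4%D6", encoding="iso-8859-1") == "åäöÅÄÖ"
```

The two printed strings should carry the same Text values as the UTF-8 and iso-8859-1 examples above.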

Hope that helps!

-bn