SMPP, SMS & National Language Shift Tables

Justin Kulesza
New member
Username: Kuleszaj

Post Number: 1
Registered: 07-2011
Posted on Wednesday, July 06, 2011 - 09:53 pm:   

Hello,

I'm working with a corporation which is attempting to send SMS messages to people in countries all over the world using various languages.

The corporation has a custom-written application which communicates over SMPP with the SMSCs of various telcos.

We have been told by different telcos which data_coding to use for submitting SMPP PDUs to the SMSC.

Currently we are using 7-bit GSM, Latin-1, and UCS-2 encodings. We are using the encoding that each telco has told us to use. The payload of the SMPP PDU is submitted encoded, and the data_coding parameter is set accordingly (0x00 for GSM, 0x03 for Latin-1, and 0x08 for UCS-2).

Question 1: Should it really matter which encoding we use for submitting SMPP PDUs to the SMSC? Shouldn't the SMSC be able to convert from the submitted SMPP encoding to the appropriate encoding based upon the contents of the data_coding parameter? Shouldn't we be able to submit all messages via SMPP as UCS-2, set the data_coding parameter to 0x08, and have the telco take care of the conversion to the SMS PDU for us?

Currently, we want to send Portuguese-language SMS messages. The telco has told us to use the "SMSC Default Alphabet" when submitting the messages over SMPP. Pressed further, they said this was the same as the GSM default alphabet. This is concerning, as the Portuguese alphabet isn't fully represented by the GSM Default Alphabet. It seems that the telco is simply transliterating the Portuguese letters to English equivalents. The telco informed us that "if you send a SMS with a special character that the SMSC does not recognize (á,ó,ã for instance) the SMSC will encode those characters to the closest character possible." I find this somewhat impossible since the GSM Default Alphabet doesn't support such characters in the first place.

Question 2: How can special characters be submitted, and then not be recognized if one uses the GSM Default Alphabet? Shouldn't all characters submitted as the GSM Default Alphabet conform to the 7-bit, 128 letter alphabet which is defined in the GSM 03.38 standard?

Question 3: Since the telco has requested that we use the "GSM Default Alphabet", we should submit our SMPP payload encoded as 7-bit packed octets, correct?

Our application stores text as UTF-8. Since the Portuguese telco is requesting that we submit SMPP with a payload containing the GSM Default Alphabet, I presume that we will need to convert from UTF-8 to the 7-bit GSM default alphabet. My current strategy involves mapping each UTF-8 character which has a GSM default equivalent (128 characters total) by value, and then transliterating other UTF-8 characters to the closest GSM default alphabet equivalent, and a question mark otherwise.

Question 4: Is this the appropriate way to handle conversion from UTF-8 to the GSM default alphabet? There don't seem to be many other approaches. The application in question uses Ruby in a Unix environment. No existing libraries supporting GSM seem to be available, so a custom library seems to be the only approach.

My research has uncovered details of the GSM locking shift tables to support other languages using only 7-bits. The locking shift tables are specified in the UDH portion of the SMS PDU.

Question 5: How would one send SMS messages using the locking shift tables via SMPP? Does the SMPP PDU payload need to be modified to contain a UDH which specifies the locking shift table? What should the data_coding parameter be set to?

I'd be thrilled if anyone could answer any of these questions authoritatively.
Des - NowSMS Support
Board Administrator
Username: Desosms

Post Number: 3322
Registered: 08-2008
Posted on Thursday, July 07, 2011 - 03:46 pm:   

Hi Justin,

You've got quite a few questions ... so let's try to work through them.

#1


quote:

Should it really matter which encoding we use for submitting SMPP PDUs to the SMSC?




Yes.


quote:

Shouldn't the SMSC be able to convert from the submitted SMPP encoding to the appropriate encoding based upon the contents of the data_coding parameter?




Your expectations are too high.


quote:

Shouldn't we be able to submit all messages via SMPP as UCS-2, set the data_coding parameter to 0x08, and have the telco take care of the conversion to the SMS PDU for us?




It is technically possible that an SMPP server could be implemented with such logic.

However, in the real world, you are unlikely to encounter such a server, especially at a telco.

Let me explain why ... SMPP was originally designed as a protocol where there was a direct mapping between an SMPP PDU and an over the air SMS message. In fact, early SMPP implementations required standard 160 character text messages to be 7-bit packed just like they are when sent over the air.

If you specify UCS-2 as the character set in a message, then the actual over the air SMS message is going to be encoded with UCS-2. If UCS-2 encoding is used in an over the air SMS message, you are limited to 70 characters in a single message (longer messages are segmented in 67 character blocks). If GSM encoding is used, you are limited to 160 characters in a single message (longer messages are segmented in 153 character blocks).

In other words, using UCS-2 when not strictly necessary can result in a significant increase in message traffic.
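To put numbers on that, here is a rough Ruby sketch that counts how many over the air messages a text of a given length would need, using only the limits quoted above. The names SMS_LIMITS and sms_segments are made up for this illustration, and it counts characters naively (it ignores, for example, GSM extension characters that take two septets).

```ruby
# Single-message and per-segment character limits quoted above.
SMS_LIMITS = {
  gsm:  { single: 160, segment: 153 },
  ucs2: { single: 70,  segment: 67 }
}.freeze

# Number of over the air messages needed for char_count characters.
def sms_segments(char_count, encoding)
  limits = SMS_LIMITS.fetch(encoding)
  return 1 if char_count <= limits[:single]

  # Integer ceiling division by the per-segment limit.
  (char_count + limits[:segment] - 1) / limits[:segment]
end

[:gsm, :ucs2].each do |encoding|
  puts "#{encoding}: 150 characters -> #{sms_segments(150, encoding)} message(s)"
end
```

With these limits, a 150 character text is one message in GSM encoding but three messages in UCS-2.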

(How does Latin-1 fit into this? Well, it is never used over the air. I don't authoritatively know why it exists as an option, but I suspect it was added so that application developers could work in a more conventional character set, and the SMSC would convert between Latin1 and GSM.)


quote:

... This is concerning as the Portuguese Alphabet isn't fully represented by the GSM Default Alphabet ...




Yeah, you have to realise that most people you talk to at a provider don't technically know what they are talking about.

If you are submitting messages using the GSM character set, it is your responsibility to do this character mapping.

If you are submitting messages using the Latin1/iso-8859-1 character set, and you use a Portuguese character that is not part of the GSM character set, then the provider would perform this mapping.
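As a rough illustration of the Latin1 and UCS-2 cases in Ruby (since that is what you are using), the sketch below turns a UTF-8 string into payload bytes plus the matching data_coding value from your post. The helper name encode_for_smpp is hypothetical, not part of any SMPP library, and the choice to replace characters that Latin1 cannot represent with "?" is an assumption. Ruby has no separate UCS-2 encoding, so UTF-16BE is used here, which is only equivalent for characters in the Basic Multilingual Plane.

```ruby
# data_coding values as quoted earlier in this thread.
DATA_CODING_LATIN1 = 0x03
DATA_CODING_UCS2   = 0x08

# Returns [data_coding, payload] for a UTF-8 string.
# The payload is a binary string ready to place in short_message.
def encode_for_smpp(text, charset)
  case charset
  when :latin1
    # Characters outside ISO-8859-1 become "?" (an assumption of this sketch).
    [DATA_CODING_LATIN1,
     text.encode('ISO-8859-1', undef: :replace, replace: '?').b]
  when :ucs2
    # UTF-16BE matches UCS-2 only for BMP characters.
    [DATA_CODING_UCS2, text.encode('UTF-16BE').b]
  else
    raise ArgumentError, "unsupported charset: #{charset}"
  end
end

[:latin1, :ucs2].each do |charset|
  coding, payload = encode_for_smpp('ação', charset)
  puts format('data_coding=0x%02X bytes=%d', coding, payload.bytesize)
end
```

Note that the UCS-2 payload is twice the size of the Latin1 one for the same text, which is where the 70 character limit comes from.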

#2


quote:

How can special characters be submitted, and then not be recognized if one uses the GSM Default Alphabet? Shouldn't all characters submitted as the GSM Default Alphabet conform to the 7-bit, 128 letter alphabet which is defined in the GSM 03.38 standard?




If you are using the GSM default alphabet, and the provider also supports that character set (some SMPP servers only support Latin1), then yes, they should recognise all characters supported by the GSM default alphabet.

#3


quote:

Since the telco has requested that we use the "GSM Default Alphabet", we should submit our SMPP payload encoded as 7-bit packed octets, correct?




Most often, no. You use 8-bit characters and the SMSC performs the packing.

There are a few operators that use old systems that require 7-bit packing, but they are relatively rare.

(Note that some do require 7-bit packing if you use any UDH.)
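For the cases where packing is required, here is a minimal Ruby sketch of GSM 7-bit packing. The function name pack_septets is made up for this example; it takes an array of septet values (0 to 127) and returns the packed octets. It does not deal with fill bits after a UDH or with the padding corner case when the text ends exactly on an octet boundary minus one septet.

```ruby
# Packs an array of 7-bit values into octets, GSM style:
# each septet's bits are laid out least significant bit first
# and the resulting bit stream is cut into octets.
def pack_septets(septets)
  bits = septets.map { |s| format('%07b', s & 0x7F).reverse }.join
  [bits].pack('b*') # 'b' packs bit strings LSB first, zero-padding the last octet
end

# ASCII letters have the same values in the GSM default alphabet.
unpacked = 'hellohello'.bytes
packed   = pack_septets(unpacked)

puts "unpacked: #{unpacked.length} octets"
puts "packed:   #{packed.bytesize} octets (#{packed.unpack('H*').first})"
```

The unpacked form is one octet per character (10 octets here); the packed form is 9 octets, e8329bfd4697d9ec37.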

#4


quote:

My current strategy involves mapping each UTF-8 character which has a GSM default equivalent (128 characters total) by value, and then transliterating other UTF-8 characters to the closest GSM default alphabet equivalent, and a question mark otherwise.




If those other characters are not significant to your application, that is reasonable.
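A minimal Ruby sketch of that strategy follows. It is only an illustration under some assumptions: the names GSM_BASIC_CHARS, GSM_SEPTET and utf8_to_gsm are invented here, "closest equivalent" is approximated by stripping accents via Unicode decomposition, and the GSM extension table (the euro sign, brackets and so on) is left out, so those characters also fall back to "?".

```ruby
# GSM 03.38 default alphabet: a character's index in this string is its
# septet value. "\e" marks 0x1B, the escape to the extension table.
GSM_BASIC_CHARS = (
  "@£$¥èéùìòÇ\nØø\rÅå" \
  "Δ_ΦΓΛΩΠΨΣΘΞ\eÆæßÉ" \
  ' !"#¤%&\'()*+,-./' \
  '0123456789:;<=>?' \
  '¡ABCDEFGHIJKLMNO' \
  'PQRSTUVWXYZÄÖÑÜ§' \
  '¿abcdefghijklmno' \
  'pqrstuvwxyzäöñüà'
).unicode_normalize(:nfc)

# Lookup from character to septet value, without the escape code.
GSM_SEPTET = GSM_BASIC_CHARS.each_char.with_index.to_h
                            .reject { |ch, _| ch == "\e" }.freeze

# Converts UTF-8 text to an array of GSM septet values.
def utf8_to_gsm(text, fallback: '?')
  text.unicode_normalize(:nfc).each_char.map do |ch|
    GSM_SEPTET[ch] ||                              # exact match
      GSM_SEPTET[ch.unicode_normalize(:nfd)[0]] || # accent stripped, e.g. ã -> a
      GSM_SEPTET.fetch(fallback)                   # nothing close enough
  end
end

septets = utf8_to_gsm('Olá, informação')
payload = septets.pack('C*') # unpacked form: one octet per character
puts payload.unpack('H*').first
```

The array of septet values can be sent as is (one octet per character), or packed for the providers that require 7-bit packing.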

#5


quote:

How would one send SMS messages using the locking shift tables via SMPP? Does the SMPP PDU payload need to be modified to contain a UDH which specifies the locking shift table? What should the data_coding parameter be set to?




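As you describe in the following sentence: yes, the payload needs to contain a UDH that specifies the locking shift table, and data_coding should be set to 0.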
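For illustration, here is a Ruby sketch of what such a UDH could look like. The information element identifier (0x25 for locking shift) and the language codes are given as I understand them from 3GPP TS 23.040 and 23.038, so please check them against the specs. The helper names are hypothetical, and the message body is assumed to already be encoded with the chosen national language table, which is not shown here.

```ruby
IEI_NATIONAL_LOCKING_SHIFT = 0x25 # 0x24 would be the single shift table
NATIONAL_LANGUAGE = { turkish: 0x01, spanish: 0x02, portuguese: 0x03 }.freeze
ESM_CLASS_UDHI = 0x40             # "UDH present" bit in the SMPP esm_class field

# Builds the UDH: length octet, then one information element
# (identifier, data length, language code).
def locking_shift_udh(language)
  element = [IEI_NATIONAL_LOCKING_SHIFT, 0x01, NATIONAL_LANGUAGE.fetch(language)]
  ([element.length] + element).pack('C*')
end

# Fields to set on submit_sm. encoded_body must already use the
# locking shift table for the given language.
def submit_fields_with_locking_shift(language, encoded_body, esm_class: 0x00)
  {
    esm_class: esm_class | ESM_CLASS_UDHI,
    data_coding: 0x00,
    short_message: locking_shift_udh(language) + encoded_body.b
  }
end

puts locking_shift_udh(:portuguese).unpack('H*').first
```

For Portuguese this gives the four octets 03 25 01 03 in front of the message text. Keep in mind that the UDH takes space away from the text, and that some providers expect the text after a UDH to be 7-bit packed.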

Locking shift and single shift tables are relatively new.

In our implementation in NowSMS, we currently only convert from UTF-8 (or other character sets) to SMS encoded with shift tables for HTTP submissions. (And from shift table encoding back to UTF-8 for HTTP based 2-way commands.) For more information on configuration, see http://support.nowsms.com/discus/messages/1/70000.html.

We are considering an SMPP implementation that would convert from UCS-2 SMPP PDUs to shift table encoded messages for upstream SMPP submission. However, it would not be practical to perform this conversion on messages that were presegmented by the submitting application ... it would only be practical for messages that were submitted unsegmented (longer messages using message_payload). Whether or not we add support at this level will depend on customer demand.

--
Des
NowSMS Support
JK
New member
Username: Kuleszaj

Post Number: 2
Registered: 07-2011
Posted on Thursday, July 07, 2011 - 06:03 pm:   

Hi Des,

Your answers have been incredibly helpful. Thank you for taking the time to answer them -- I greatly appreciate it.

I'm frustrated by the fact that this sort of information is so difficult to find. I came to this forum to ask the question in the hopes that someone might have an inkling as to what I was working with.

Q: Is information on these topics that scarce? Or is it just so telco specific that only telephone company programmers/engineers really work at this level? I've been searching for answers to these questions quite extensively, and found no real answers. Is there someplace I should have been looking?

Q: More simply: where did you learn all of this, and how/where can I learn it?

Based on your responses... it seems that SMSC's are not really standard... that is, each telco may have a different implementation which behaves differently -- you need to adapt to work with each company's implementation.

Again, your answers have been very helpful. However, if you'll indulge me, I have a few clarifications I'd like to make...

Q: For the SMPP PDU payload... there are two fields which can be used "short_message" and "message_payload". My understanding is that the "short_message" field is used for shorter, simpler messages... and that the "message_payload" field is used for longer, more complex messages where a custom SMS UDH can be included.

For example, if you wanted to use concatenated SMS... you could send multiple SMPP PDU's containing the individual segments of a concatenated SMS message in the "message_payload" which would be 'joined' or associated with each other by the data in the provided UDH.

Q: Regarding the packing of the octets in the SMPP Payload (short_message or message_payload fields)... if the telco has not specified... it seems the best thing to do would be to try 8-bit, and if SMPP PDU's are rejected, or mobile-terminated messages are garbled, that 7-bit packing should be tried.

Q: Are there any plans to make NowSMS available on different platforms (e.g. UNIX/Linux) or accessible via other languages (e.g. Ruby)?

Thank you!
Des - NowSMS Support
Board Administrator
Username: Desosms

Post Number: 3327
Registered: 08-2008
Posted on Thursday, July 07, 2011 - 10:51 pm:   

Hi JK,

Information on SMS Shift Tables is relatively scarce. They are used quite a bit for Turkish, and we are starting to see them more frequently for Portuguese.

There is also a shift table defined for Spanish, but thus far we've not noticed it being used much. And it is very early days for shift tables for Indian subcontinent languages.

The extent of documentation is 3GPP TS 23.038 and 23.040, Release 8. (Release 9 adds the Indian languages.) Links to find those specs are in this article: http://www.nowsms.com/shift-tables-national-language-sms-in-160-characters-without-unicode

There are links to version 3.3 and 3.4 of the SMPP specification (the only relevant versions IMHO) in the following article: http://www.nowsms.com/smpp-information

What you will notice about the SMPP specification is that it is not specific enough in many areas.

There was an attempt to form an industry consortium to develop follow-up versions of SMPP (known as 5.0), but it was not successful. In my opinion, the fatal flaw of this effort was that rather than clarifying areas that needed clarification, they overburdened the protocol with more confusing options.

As a result, you are left with quite a few implementation differences between different providers and operators.


quote:

Q: More simply: where did you learn all of this, and how/where can I learn it?




The first question is do you need to learn all of it? You're already asking some pretty intelligent questions, so you're ahead of most.

We learned it the hard way. In the case of SMPP, we tested against a handful of implementations and simulators. Then, over the years, customers have told us about things that didn't work as expected with a particular provider, or that needed to work better for specific scenarios.


quote:

Q: For the SMPP PDU payload... there are two fields which can be used "short_message" and "message_payload". My understanding is that the "short_message" field is used for shorter, simpler messages... and that the "message_payload" field is used for longer, more complex messages where a custom SMS UDH can be included.

For example, if you wanted to use concatenated SMS... you could send multiple SMPP PDU's containing the individual segments of a concatenated SMS message in the "message_payload" which would be 'joined' or associated with each other by the data in the provided UDH.




The key thing to keep in mind about "message_payload" and other TLV parameters is that they are "optional".

An implementation may or may not support them, or may only support certain optional parameters.

Segmentation is a particular issue.

If a message is too big for a single SMS message, some servers will allow you to use the "message_payload" parameter to specify a longer message. The server then takes responsibility for segmenting the message and creating UDH, if necessary.

If you need to send large messages, try this first. It is easiest to implement if the provider supports it. However, many don't because from an accounting perspective, they don't like a single message submission resulting in multiple messages being generated.

When you segment the message before submitting, some systems use the TLV parameters sar_msg_ref_num, sar_total_segments and sar_segment_seqnum. Others expect you to generate the UDH directly, and include it at the start of the message with the UDHI bit set in the header. And when you generate the UDH directly, some providers expect the message text to be 7-bit packed.
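To make the two presegmented variants concrete, here is a rough Ruby sketch of both. The TLV tag values are the ones I know from the SMPP 3.4 specification, and the UDH layout is the common 8-bit reference concatenation header; the helper names tlv, sar_tlvs and concat_udh are invented for this example. Which variant a given provider accepts is something you have to find out from them or by testing.

```ruby
# Optional parameter (TLV) tags from the SMPP 3.4 specification.
TLV_SAR_MSG_REF_NUM    = 0x020C
TLV_SAR_TOTAL_SEGMENTS = 0x020E
TLV_SAR_SEGMENT_SEQNUM = 0x020F

ESM_CLASS_UDHI = 0x40 # set in esm_class when the message starts with a UDH

# Encodes one TLV: 2 octet tag, 2 octet length, then the value.
def tlv(tag, value)
  [tag, value.bytesize].pack('nn') + value
end

# Variant 1: segment information carried in TLV parameters.
def sar_tlvs(ref, total, seq)
  tlv(TLV_SAR_MSG_REF_NUM, [ref].pack('n')) +
    tlv(TLV_SAR_TOTAL_SEGMENTS, [total].pack('C')) +
    tlv(TLV_SAR_SEGMENT_SEQNUM, [seq].pack('C'))
end

# Variant 2: concatenation UDH placed at the start of the message text.
# Layout: UDH length, IEI 0x00, element length 3, reference, total, sequence.
def concat_udh(ref, total, seq)
  [0x05, 0x00, 0x03, ref & 0xFF, total, seq].pack('C*')
end

ref = 0x2A
(1..3).each do |seq|
  puts "segment #{seq}: udh=#{concat_udh(ref, 3, seq).unpack('H*').first} " \
       "tlvs=#{sar_tlvs(ref, 3, seq).unpack('H*').first}"
end
```

With the UDH variant, each segment's esm_class needs the UDHI bit (0x40) set, and the six UDH octets are the reason a GSM segment holds 153 characters instead of 160.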

The key thing to remember is that the SMPP spec clearly refers to "message_payload" and these other parameters as "optional". You might interpret this as optional for me to use if I want to. But the programmers that implemented the server used that same logic, and may or may not have chosen to implement the parameters you want to use.


quote:

...the best thing to do would be to try 8-bit, and if SMPP PDU's are rejected, or mobile-terminated messages are garbled, that 7-bit packing should be tried.




Yes.

In most cases, 8-bit encoding is expected. (Although if a UDH is present, there is a better chance that the provider expects the content to be 7-bit packed.)

It may not be pretty, but it's trial and error to determine the attributes of a particular provider.

You'll see that in discussion threads here. And in particular the saga of problematic characters @ and €.


quote:

Are there any plans to make NowSMS available on different platforms (e.g. UNIX/Linux) or accessible via other languages (e.g. Ruby)?




I never rule anything out, but Unix is doubtful.

Regarding Ruby, the basic interface to NowSMS that most people use is HTTP based, so our PHP or Java or command line or other examples are wrappers for an HTTP transaction. I'm sure it would be simple to implement for Ruby on Rails, we just don't have any experience with that platform.

--
Des
NowSMS Support
JK
New member
Username: Kuleszaj

Post Number: 3
Registered: 07-2011
Posted on Monday, July 11, 2011 - 01:15 pm:   

Hi Des,

Thank you very much for your continued assistance. It has been unbelievably helpful to have had solid answers to these questions.

Are you aware of any other message boards, forums, or mailing lists which would be useful for future discussion of technical matters regarding SMS and SMPP?

Best,
- JK
