# HG changeset patch
# User Mychaela Falconia
# Date 1616649043 0
# Node ID a43c5dc251dc7c84579795e6c02dd4ed0c5c8ee9
# Parent  30fbaa652ea572baa298912c61ea40cd4fb6f653
doc/User-phone-tools: new sms-pdu-decode backslash escapes

diff -r 30fbaa652ea5 -r a43c5dc251dc doc/User-phone-tools
--- a/doc/User-phone-tools	Thu Mar 25 03:26:23 2021 +0000
+++ b/doc/User-phone-tools	Thu Mar 25 05:10:43 2021 +0000
@@ -200,8 +200,7 @@
 
 By default, sms-pdu-decode only emits 7-bit ASCII characters in its output; any
 GSM7 or UCS-2 characters which fall outside of this plain ASCII repertoire are
-displayed as the '?' error character and the presence of such decoding errors
-is indicated in the Length: header. This conservative default behaviour can be
+converted into backslash escapes. This conservative default behaviour can be
 modified as follows:
 
 -e option extends the potential output character repertoire from 7-bit ASCII to
@@ -209,8 +208,41 @@
 i.e., are NOT encoded in UTF-8 - this option is intended for non-UTF-8
 environments.
 
--u option extends the potential output character repertoire to the entire Basic
-Multilingual Plane of Unicode, and changes the output encoding to UTF-8.
+-u option extends the potential output character repertoire to all of Unicode,
+and changes the output encoding to UTF-8.
+
+Regardless of whether the source message character set is GSM7 or UCS-2 and
+irrespective of -e or -u options, any backslash characters are always escaped
+as \\, and any CR characters are represented as \r. Additional backslash
+escape encodings depend on the source message character set:
+
+* If the source message character set is GSM7, the following additional
+  backslash escapes can be emitted:
+
+  - In the absence of -u option, the Euro currency symbol is converted to \E;
+
+  - Any GSM7 escape characters (0x1B) that aren't part of a valid escape
+    sequence for [\]^ or {|}~ or \E are represented as \e;
+
+  - Any GSM7 characters that either can't be represented in the output character
+    set (ASCII or ISO 8859-1) or are outright invalid per GSM 03.38 are
+    represented as \xXX, where XX is the original GSM7 code point in 2-digit
+    hexadecimal form between 00 and 7F;
+
+  - Invalid GSM7 escape sequences are emitted as \e\xXX.
+
+* If the source message character set is UCS-2, the following additional
+  backslash escapes can be emitted:
+
+  - Invalid UCS-2 characters falling onto control character code points are
+    emitted as \u00XX;
+
+  - UCS-2 characters that can't be represented in ASCII or ISO 8859-1 (when
+    running without -u option) are emitted as \uXXXX;
+
+  - If UTF-16 surrogate pairs are detected in the input, the encoded high-plane
+    Unicode character is reconstructed and emitted as \UXXXXXX in the absence
+    of -u option, or as the appropriate UTF-8 byte sequence with -u.
 
 -h option causes the user data portion of every message to be displayed as a
 raw hex dump; in the case of GSM7-encoded messages, this hex dump shows the
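The escape rules introduced by this patch can be illustrated with a minimal Python sketch. It is not the sms-pdu-decode implementation. The names escape_gsm7(), escape_ucs2(), fits() and the limit parameter are hypothetical, and the input is assumed to be already-unpacked GSM7 septets or UCS-2 code units. A few points the text leaves open are filled in by assumption: a lone ESC at the very end of a message is taken to be the \e case, ESC followed by a septet with no supported extension mapping is taken to be the \e\xXX case, LF passes through unchanged, the "control character code points" are taken to be U+0000..U+001F (except LF and CR) plus U+007F..U+009F, and an unpaired UTF-16 surrogate is emitted as \uXXXX.

```python
# Hypothetical sketch of the escape rules above; NOT the sms-pdu-decode source.
# `limit` models the output repertoire: 0x80 = default (ASCII),
# 0x100 = -e (ISO 8859-1), None = -u (all of Unicode, caller encodes as UTF-8).

# GSM 03.38 default alphabet indexed by septet; index 0x1B (ESC) is a placeholder.
GSM7_DEFAULT = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Only the escape sequences the text calls valid: [\]^ {|}~ and the Euro sign.
GSM7_EXT = {0x14: "^", 0x28: "{", 0x29: "}", 0x2F: "\\", 0x3C: "[",
            0x3D: "~", 0x3E: "]", 0x40: "|", 0x65: "\u20ac"}


def fits(ch, limit):
    """True if ch can be emitted literally in the selected output repertoire."""
    return limit is None or ord(ch) < limit


def escape_gsm7(septets, limit=0x80):
    """Render unpacked GSM7 septets (0x00..0x7F) as escaped text."""
    out, i = [], 0
    while i < len(septets):
        c = septets[i]
        i += 1
        if c == 0x1B:
            if i == len(septets):          # assumed: dangling ESC at end
                out.append("\\e")
                continue
            nxt = septets[i]
            i += 1
            ch = GSM7_EXT.get(nxt)
            if ch is None:                 # assumed: unsupported 2nd septet
                out.append("\\e\\x%02X" % nxt)
            elif ch == "\\":
                out.append("\\\\")
            elif ch == "\u20ac" and limit is not None:
                out.append("\\E")          # Euro sign without -u
            else:
                out.append(ch)
            continue
        ch = GSM7_DEFAULT[c]
        if ch == "\r":
            out.append("\\r")
        elif fits(ch, limit):              # LF passes through unchanged
            out.append(ch)
        else:
            out.append("\\x%02X" % c)      # original GSM7 code point
    return "".join(out)


def escape_ucs2(units, limit=0x80):
    """Render UCS-2 (16-bit) code units as escaped text."""
    out, i = [], 0
    while i < len(units):
        u = units[i]
        i += 1
        if (0xD800 <= u <= 0xDBFF and i < len(units)
                and 0xDC00 <= units[i] <= 0xDFFF):
            # UTF-16 surrogate pair: reconstruct the high-plane code point
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i] - 0xDC00)
            i += 1
            out.append(chr(cp) if limit is None else "\\U%06X" % cp)
        elif u == 0x5C:
            out.append("\\\\")
        elif u == 0x0D:
            out.append("\\r")
        elif u != 0x0A and (u < 0x20 or 0x7F <= u <= 0x9F):
            out.append("\\u%04X" % u)      # assumed control set -> \u00XX
        elif 0xD800 <= u <= 0xDFFF or not fits(chr(u), limit):
            out.append("\\u%04X" % u)      # unpaired surrogate or out of range
        else:
            out.append(chr(u))
    return "".join(out)


print(escape_gsm7([0x48, 0x69, 0x1B, 0x65, 0x10]))            # Hi\E\x10
print(escape_ucs2([0x0048, 0x005C, 0x00E9, 0xD83D, 0xDE00]))  # H\\\u00E9\U01F600
```

The single limit comparison covers all three output modes, which is why the Euro sign needs its own branch: U+20AC lies outside both ASCII and ISO 8859-1, so in this sketch it becomes \E in every mode except -u.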