comparison doc/SMS-PDU-decoding @ 31:19476164c54d
doc/SMS-PDU-decoding: document imported utilities
| author | Mychaela Falconia <falcon@freecalypso.org> |
|---|---|
| date | Fri, 14 Jun 2024 18:48:58 +0000 |
| 30:d7571dc2fecc | 31:19476164c54d |
|---|---|
| 1 The decoding part of the present sms-coding-utils suite consists of two | |
| 2 programs: sms-pdu-decode and pcm-sms-decode. Their functions are as follows: | |
| 3 | |
| 4 * The input to sms-pdu-decode is an ASCII text stream (stdin or read from a | |
| 5 file) in which every SMS PDU to be decoded appears as a long hex string. | |
| 6 This input can originate from the GSM 07.05 interface on a FreeCalypso GSM MS | |
| 7 (fcup-smdump utility in FC host tools), in which case every GSM 03.40 TPDU | |
| 8 will be preceded by an SC address field. This is the original use case for | |
| 9 sms-pdu-decode; run it without special options. Alternatively, the input | |
| 10 to sms-pdu-decode can originate from test scenarios on the network side of | |
| 11 GSM (SMSC development and testing), in which case input SMS PDUs will be | |
| 12 pure GSM 03.40 TPDUs, without an SC address prefix; use the -n option of | |
| 13 sms-pdu-decode in this case. | |
| 14 | |
| 15 * The input to pcm-sms-decode is a binary file with 176 bytes per record, | |
| 16 corresponding to the format of EF_SMS elementary file on SIM cards. This | |
| 17 program can be used to decode readouts of this EF_SMS file made with | |
| 18 fc-simtool, or readouts of /pcm/SMS file in the flash file system of Pirelli | |
| 19 DP-L10 phone, which uses the same format - the latter use case arose first in | |
| 20 chronological order of FreeCalypso development, hence the name of the utility. | |
| 21 | |
| 22 Common options: character set and dump format control | |
| 23 ===================================================== | |
| 24 | |
| 25 By default, sms-pdu-decode and pcm-sms-decode only emit 7-bit ASCII characters | |
| 26 in their output; any GSM7 or UCS-2 characters which fall outside of this plain | |
| 27 ASCII repertoire are converted into backslash escapes. This conservative | |
| 28 default behaviour can be modified as follows: | |
| 29 | |
| 30 -e option extends the potential output character repertoire from 7-bit ASCII to | |
| 31 8-bit ISO 8859-1. Any 8859-1 high characters are emitted as single bytes, | |
| 32 i.e., are NOT encoded in UTF-8 - this option is intended for non-UTF-8 | |
| 33 environments. | |
| 34 | |
| 35 -u option extends the potential output character repertoire to all of Unicode, | |
| 36 and changes the output encoding to UTF-8. | |
| 37 | |
| 38 Regardless of whether the source message character set is GSM7 or UCS-2 and | |
| 39 irrespective of -e or -u options, any backslash characters are always escaped | |
| 40 as \\, and any CR characters are represented as \r. Additional backslash | |
| 41 escape encodings depend on the source message character set: | |
| 42 | |
| 43 * If the source message character set is GSM7, the following additional | |
| 44 backslash escapes can be emitted: | |
| 45 | |
| 46 - In the absence of -u option, the Euro currency symbol is converted to \E; | |
| 47 | |
| 48 - Any GSM7 escape characters (0x1B) that aren't part of a valid escape | |
| 49 sequence for [\]^ or {|}~ or \E are represented as \e; | |
| 50 | |
| 51 - Any GSM7 characters that either can't be represented in the output character | |
| 52 set (ASCII or ISO 8859-1) or are outright invalid per GSM 03.38 are | |
| 53 represented as \xX, where xX is the original GSM7 code point in 2-digit | |
| 54 hexadecimal form between 00 and 7F; | |
| 55 | |
| 56 - Invalid GSM7 escape sequences are emitted as \e\xX. | |
| 57 | |
| 58 * If the source message character set is UCS-2, the following additional | |
| 59 backslash escapes can be emitted: | |
| 60 | |
| 61 - Invalid UCS-2 characters falling onto control character code points are | |
| 62 emitted as \u00XX; | |
| 63 | |
| 64 - UCS-2 characters that can't be represented in ASCII or ISO 8859-1 (when | |
| 65 running without -u option) are emitted as \uXXXX; | |
| 66 | |
| 67 - If UTF-16 surrogate pairs are detected in the input, the encoded high-plane | |
| 68 Unicode character is reconstructed and emitted as \UXXXXXX in the absence | |
| 69 of -u option, or as the appropriate UTF-8 byte sequence with -u. | |
| 70 | |
| 71 -h option causes the user data portion of every message to be displayed as a | |
| 72 raw hex dump; in the case of GSM7-encoded messages, this hex dump shows the | |
| 73 unpacked septets. | |
| 74 | |
| 75 sms-pdu-decode specifics | |
| 76 ======================== | |
| 77 | |
| 78 The input to the program may contain additional text besides SMS PDUs in the | |
| 79 form of long hex strings; all lines that are not hex strings are passed through | |
| 80 to the output. Every input line that is purely a string of directly abutted hex | |
| 81 bytes is taken to be an SMS PDU in need of decoding, and the full decoding | |
| 82 operation is attempted. The following additional options are available besides | |
| 83 the common -e, -u and -h options documented above: | |
| 84 | |
| 85 -n By default, sms-pdu-decode expects every hex-encoded SMS PDU to begin | |
| 86 with an SC address field, followed by a GSM 03.40 TPDU - the format used | |
| 87 on GSM 07.05 interface in PDU mode and in SIM SMS storage. With -n | |
| 88 option, sms-pdu-decode expects pure GSM 03.40 TPDUs instead, without | |
| 89 SC address prefix. | |
| 90 | |
| 91 -p Keep all hex-encoded PDU lines in the output: for each encountered hex | |
| 92 PDU, first the original hex line is output, then the decoding result. | |
| 93 | |
| 94 pcm-sms-decode specifics | |
| 95 ======================== | |
| 96 | |
| 97 This program reads a binary file; the file to be read must be named on the | |
| 98 command line. The output is ASCII (or an extended character set with -e or -u | |
| 99 options as described in the common section above), naming each dumped record as | |
| 100 "Record #%u" and showing its content. For a binary file of N records, the | |
| 101 default record numbering is from 0 to N-1: this numbering order is natural to | |
| 102 this Mother's native world of CompSci, and I implemented it when I originally | |
| 103 wrote pcm-sms-decode for the purpose of decoding /pcm/SMS readouts from Pirelli | |
| 104 DP-L10 FFS. However, when I later wrote fc-simtool and pcm-sms-decode acquired | |
| 105 a second use case of decoding SIM EF_SMS readouts, a mismatch became apparent: | |
| 106 the record numbering used in READ RECORD and UPDATE RECORD commands on the | |
| 107 SIM-ME interface is 1..N instead of 0..N-1. pcm-sms-decode -s option switches | |
| 108 the record numbering scheme to 1..N to match the SIM application. |
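
The two record numbering schemes described above can be illustrated with a
short Python sketch. It is not taken from the pcm-sms-decode source: the
names RECORD_SIZE, split_records and record_labels are hypothetical, and the
choice to ignore a trailing partial record is an assumption made only for
this illustration. The only facts borrowed from the text are the 176-byte
record size, the "Record #%u" label, and the 0..N-1 versus 1..N numbering.

```python
RECORD_SIZE = 176  # bytes per EF_SMS-style record, as stated in the text


def split_records(data: bytes, sim_numbering: bool = False):
    """Yield (record_number, record_bytes) for each complete record.

    sim_numbering=False gives 0..N-1 (the default described above);
    sim_numbering=True gives 1..N, mirroring what the -s option is
    described as doing.  Trailing bytes that do not fill a whole record
    are ignored here; that is an assumption of this sketch.
    """
    first = 1 if sim_numbering else 0
    count = len(data) // RECORD_SIZE
    for i in range(count):
        start = i * RECORD_SIZE
        yield first + i, data[start:start + RECORD_SIZE]


def record_labels(data: bytes, sim_numbering: bool = False):
    """Return the "Record #N" labels for every complete record in data."""
    return [f"Record #{n}" for n, _ in split_records(data, sim_numbering)]


if __name__ == "__main__":
    image = bytes(RECORD_SIZE * 3)  # dummy 3-record image, all zero bytes
    print(record_labels(image))
    print(record_labels(image, sim_numbering=True))
```

With a 3-record image, the first call labels the records 0, 1, 2 and the
second labels them 1, 2, 3, so the second form lines up with the record
numbers used in READ RECORD and UPDATE RECORD commands.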
