comparison doc/HR-codec-utils @ 641:83de961cc54b
document GSM-HR codec utilities
| author | Mychaela Falconia <falcon@freecalypso.org> |
|---|---|
| date | Thu, 26 Mar 2026 22:48:20 +0000 |
comparison
| 640:e0e5905261e2 | 641:83de961cc54b |
|---|---|
| 1 Beginning with gsm-codec-lib-r5 release, Themyscira Wireless GSM codec libraries | |
| 2 and utilities package includes support for GSM-HR codec for the sake of | |
| 3 completeness, alongside the more useful FR, EFR and AMR codecs. The set of | |
| 4 command line utilities for GSM-HR codec includes speech encoding and decoding, | |
| 5 conversion of encoded speech between different formats, display of various | |
| 6 encoded formats and certain specialized utilities described later in this | |
| 7 article. | |
| 8 | |
| 9 File formats for GSM-HR encoded speech | |
| 10 ====================================== | |
| 11 | |
| 12 The present suite of tools supports ETSI *.cod and *.dec formats, TW-TS-005 | |
| 13 Annex B hexadecimal format, and a simple "raw packed" binary format. These | |
| 14 file formats are explained below. | |
| 15 | |
| 16 ETSI *.cod encoder output format | |
| 17 -------------------------------- | |
| 18 | |
| 19 ETSI reference implementation of GSM-HR speech encoder writes its output in | |
| 20 this format; for each encoded 20 ms frame the output consists of 18 speech | |
| 21 parameters followed by VAD and SP flags. Each parameter or flag is written | |
| 22 into the file as a 16-bit word, hence each encoded 20 ms frame turns into 40 | |
| 23 bytes in this format. | |
| 24 | |
| 25 ThemWi suite of GSM-HR codec utilities allows examining and hand-crafting *.cod | |
| 26 files; we have an ETSI-style speech encoder utility that emits this reference | |
| 27 format and a TFO transform utility that does the same, and we support conversion | |
| 28 from *.cod into our preferred TW-TS-005 Annex B hex format. | |
| 29 | |
| 30 ETSI *.dec decoder input format | |
| 31 ------------------------------- | |
| 32 | |
| 33 ETSI reference implementation of GSM-HR speech decoder takes this format as its | |
| 34 input. Each 20 ms Rx unit (traffic frame or garbage received in the place of | |
| 35 one) consists of 18 speech parameters followed by 4 words of flags (BFI, UFI, | |
| 36 SID and TAF), stored as 22 16-bit words. As explained in HR-codec-library | |
| 37 article, ThemWi implementation of GSM-HR includes an extension to this decoder | |
| 38 input format: BFI=1 means BFI with payload bits included in both ETSI and | |
| 39 ThemWi versions, but BFI=2 (ThemWi extension) means BFI without payload bits. | |
| 40 | |
| 41 ThemWi suite of GSM-HR codec utilities allows examining and hand-crafting *.dec | |
| 42 files; we have an ETSI-style speech decoder utility that reads this reference | |
| 43 format and a TFO transform utility that does the same, and we support | |
| 44 bidirectional conversion between this *.dec format and our preferred TW-TS-005 | |
| 45 Annex B hex format. | |
| 46 | |
| 47 TW-TS-005 Annex B hexadecimal format | |
| 48 ------------------------------------ | |
| 49 | |
| 50 Themyscira Wireless Technical Specification TW-TS-005 defines a hexadecimal file | |
| 51 format for sequences of RTP payloads for GSM speech codecs; TW-TS-005 Annex B | |
| 52 specifies application of this hex file format to GSM-HR codec. | |
| 53 | |
| 54 This TW-TS-005 Annex B hex is the preferred file format for GSM-HR encoded | |
| 55 speech recordings for most workflows. It can represent both Tx semantics (every | |
| 56 20 ms frame position is filled with either a good speech frame or a perfect SID; | |
| 57 during DTX pauses a new SID appears in every frame) and Rx semantics (BFI frame | |
| 58 gaps can occur anywhere and are expected during DTX pauses; SIDs can be valid | |
| 59 or invalid) in the same file format, hence it is the operator's responsibility | |
| 60 to know the semantics of each given recording file and to use it in the correct | |
| 61 context. | |
| 62 | |
| 63 As explained in TW-TS-005 Annex B itself (see TW-TS-005 article for the most | |
| 64 up-to-date link to the actual spec document), each frame can be represented | |
| 65 either in the basic RTP format of ETSI TS 101 318 section 5.2, or in the | |
| 66 extended RTP format of RFC 5993 and TW-TS-002. Utilities in the present suite | |
| 67 that write TW-TS-005 Annex B hex files can be told to emit either format, with | |
| 68 the exception of gsmhr-dec2hex utility which always emits the extended format; | |
| 69 utilities that read these hex files accept both formats. The two RTP formats | |
| 70 carry different information content: the basic format can only represent Tx | |
| 71 semantics, while the extended format can represent both semantics. | |
| 72 | |
| 73 Compared to *.cod format, TW-TS-005 Annex B format with Tx semantics lacks the | |
| 74 VAD flag, although the extended RTP format does represent an equivalent of SP | |
| 75 flag. However, the inclusion of VAD flag in *.cod format is only a debug | |
| 76 feature for speech encoder test sequences; it is not a part of the interface | |
| 77 from the Tx DTX handler to the Tx RSS as defined in GSM 06.41 Tx chapter. | |
| 78 | |
| 79 Compared to *.dec format, TW-TS-005 Annex B format with Rx semantics (which has | |
| 80 to use the extended RTP format) collapses 3 possible invalid SID conditions | |
| 81 (BFI=0 SID=1, BFI=1 SID=1, BFI=1 SID=2) into the same TW-TS-002 representation | |
| 82 of FT=1. However, per GSM 06.41 Table 1 the Rx DTX handler for GSM-HR is | |
| 83 required to apply exactly the same handling to all 3 possibilities, and the | |
| 84 same collapsing of invalid SID conditions also happens on TDM-based (8 kbit/s) | |
| 85 Abis and Ater interfaces and in TFO, as detailed in GSM 08.61 and 08.62 specs. | |
| 86 | |
| 87 Raw packed binary format | |
| 88 ------------------------ | |
| 89 | |
| 90 When working with FR and EFR codecs, a speech recording with Tx semantics can be | |
| 91 stored in a gsmx binary file (see Binary-file-format article) that consists of | |
| 92 directly abutted codec frames (good speech or SID) in RTP format, with exactly | |
| 93 33 (FR) or 31 (EFR) bytes per frame. We offer an equivalent ability for GSM-HR | |
| 94 with the so-called raw packed format. It is a binary format that consists of | |
| 95 directly abutted frames; each frame is 14 bytes long and stores a GSM-HR codec | |
| 96 frame in the basic RTP format of TS 101 318 section 5.2, which we also call the | |
| 97 raw packed format. | |
| 98 | |
| 99 This raw packed binary file format is not used directly by any of our speech | |
| 100 encoder or decoder utilities; instead it is supported via gsmhr-hex2rpf and | |
| 101 gsmhr-rpf2hex format conversion utilities. | |
| 102 | |
| 103 Common command line options | |
| 104 =========================== | |
| 105 | |
| 106 Certain flag options are common across different utilities in the present suite | |
| 107 of command line tools for GSM-HR codec; these common flags are as follows: | |
| 108 | |
| 109 -b and -l Utilities that read or write ETSI *.cod or *.dec format emit | |
| 110 and expect the local machine's native byte order by default. | |
| 111 -b option forces big-endian byte order; -l forces little-endian. | |
| 112 | |
| 113 -d Speech encoder utilities run with Tx DTX disabled by default; | |
| 114 -d option enables speech encoding with DTX. The same logic | |
| 115 applies to DTXd control in TFO transform utilities. | |
| 116 | |
| 117 -x Utilities that emit TW-TS-005 Annex B hex format with Tx | |
| 118 semantics emit the basic RTP format (TS 101 318) by default; | |
| 119 -x option switches to the extended RTP format. (The latter | |
| 120 format is TW-TS-002, but it also constitutes valid RFC 5993 in | |
| 121 the case of Tx semantics.) | |
| 122 | |
| 123 Inspecting encoded speech file formats | |
| 124 ====================================== | |
| 125 | |
| 126 In common with other GSM speech codecs supported by ThemWi GSM codec libraries | |
| 127 and utilities suite, utilities are provided that read GSM-HR encoded speech | |
| 128 recording files and display all codec frames contained therein, in terms of | |
| 129 compressed speech parameters and accompanying flags. These utilities are as | |
| 130 follows: | |
| 131 | |
| 132 Utility Reads file format | |
| 133 ----------------------------------------- | |
| 134 gsmhr-cod-parse ETSI *.cod | |
| 135 gsmhr-dec-parse ETSI *.dec | |
| 136 tw5b-dump TW-TS-005 Annex B | |
| 137 | |
| 138 gsmhr-cod-parse and gsmhr-dec-parse expect the local machine's native byte order | |
| 139 by default; -b and -l override options are supported. | |
| 140 | |
| 141 ThemWi utilities for FR, EFR and AMR codecs display compressed speech parameters | |
| 142 in decimal form separated by spaces, with each subframe on its own line after | |
| 143 per-frame LPC parameters. A different format has been adopted for GSM-HR: | |
| 144 | |
| 145 * Individual speech parameters are displayed in hex, with a fixed number of | |
| 146 digits corresponding to the size of each parameter in bits; | |
| 147 | |
| 148 * Only two lines are used to display the actual speech parameters for each | |
| 149 frame, with per-frame parameters on the first line and all subframe parameters | |
| 150 on the second line; | |
| 151 | |
| 152 * The set of LPC parameters and each of the 4 subframe parameter sets are | |
| 153 displayed as comma-separated triplets; R0, Int and Mode parameters are | |
| 154 displayed as singletons; | |
| 155 | |
| 156 * Each just-described triplet or singleton is displayed as Name=value for better | |
| 157 readability; | |
| 158 | |
| 159 * Ignoring Name= annotations and treating commas and spaces as equivalent, all | |
| 160 18 speech parameters are printed in their standard order as defined by ETSI. | |
| 161 | |
| 162 File format conversion utilities | |
| 163 ================================ | |
| 164 | |
| 165 The following format conversions are supported between different GSM-HR encoded | |
| 166 speech formats: | |
| 167 | |
| 168 Utility From format To format | |
| 169 --------------------------------------------------------- | |
| 170 gsmhr-cod2hex ETSI *.cod TW-TS-005 Annex B | |
| 171 gsmhr-dec2hex ETSI *.dec TW-TS-005 Annex B | |
| 172 gsmhr-hex2dec TW-TS-005 Annex B ETSI *.dec | |
| 173 gsmhr-hex2rpf TW-TS-005 Annex B Raw packed format | |
| 174 gsmhr-rpf2hex Raw packed format TW-TS-005 Annex B | |
| 175 | |
| 176 The hexadecimal format of TW-TS-005 Annex B is treated as central; all provided | |
| 177 file format conversion utilities convert either from or to this central format. | |
| 178 Additional notes follow regarding each supported conversion. | |
| 179 | |
| 180 Conversion from ETSI *.cod to TW-TS-005 Annex B | |
| 181 ----------------------------------------------- | |
| 182 | |
| 183 ETSI *.cod format naturally represents only Tx semantics, while TW-TS-005 | |
| 184 Annex B supports both semantics. Semantics don't change with file format | |
| 185 conversion, hence the output of gsmhr-cod2hex still has Tx semantics. -b and -l | |
| 186 options are supported for *.cod input; hex output is written in the basic RTP | |
| 187 format by default or in the extended RTP format with -x option. | |
| 188 | |
| 189 Conversion in the opposite direction is not supported, as there is no way to | |
| 190 resurrect VAD debug flag from a data source that lacks it. | |
| 191 | |
| 192 Conversion between ETSI *.dec and TW-TS-005 Annex B | |
| 193 --------------------------------------------------- | |
| 194 | |
| 195 Bidirectional conversion is supported between these two formats, carrying Rx | |
| 196 semantics. However, this conversion may be slightly lossy in each direction: | |
| 197 | |
| 198 * gsmhr-hex2dec is nothing more than a command line utility around libgsmhr1 | |
| 199 function gsmhr_rtp_in_direct() described in HR-codec-library article. The | |
| 200 exact same preprocessing step is done by every libgsmhr1-based program | |
| 201 whenever RTP input (be it real RTP or hex lines read from a TW-TS-005 Annex B | |
| 202 file) needs to be fed to GSM-HR speech decoder or TFO transform, hence the | |
| 203 output of gsmhr-hex2dec elucidates what always happens under the hood anyway. | |
| 204 | |
| 205 The extended RTP format with Rx semantics defined in TW-TS-002 allows RTP | |
| 206 payloads carrying GSM-HR invalid SID to either include or omit payload bits. | |
| 207 As explained in HR-codec-library article, gsmhr_decoder_twts002_in() function | |
| 208 and gsmhr_rtp_in_direct() wrapper around it ignore these optional payload | |
| 209 bits for invalid SID frames and always set all 18 speech parameters in the | |
| 210 dec-style frame to 0. This same behaviour becomes explicitly visible when | |
| 211 using gsmhr-hex2dec - but if the input contains invalid SID frames with | |
| 212 payload bits included, then the conversion is lossy in the strict sense. | |
| 213 | |
| 214 * gsmhr-dec2hex is an ad hoc program, not a wrapper around a library function, | |
| 215 as this operation is not needed in any standard workflow. The conversion may | |
| 216 be lossy in two cases: | |
| 217 | |
| 218 - All possible combinations that mean invalid SID (BFI=0 SID=1, BFI=1 SID=1, | |
| 219 BFI=1 SID=2, plus variants of the same with BFI=2) collapse into the same | |
| 220 representation in TW-TS-002, just like in 8 kbit/s TRAU frame format. | |
| 221 | |
| 222 - Whatever payload bits were given for these invalid SID frames in the 18 | |
| 223 speech parameter words are discarded, i.e., non-verbose invalid SID format | |
| 224 is written in TW-TS-002 output. | |
| 225 | |
| 226 Additional notes: | |
| 227 | |
| 228 * gsmhr-dec2hex expects the local machine's native byte order by default, but | |
| 229 supports -b and -l options. OTOH, gsmhr-hex2dec writes *.dec output in the | |
| 230 local machine's native byte order only. | |
| 231 | |
| 232 * By default gsmhr-hex2dec refuses to process files that contain BFI-no-data | |
| 233 frame gaps, as no such support exists in the standard GSM-HR speech decoder | |
| 234 from ETSI or its *.dec input format. (BFI=2 representation of such gaps is a | |
| 235 Themyscira extension.) -f option allows BFI=2 frames to be emitted. | |
| 236 | |
| 237 Conversion between TW-TS-005 Annex B and raw packed format | |
| 238 ---------------------------------------------------------- | |
| 239 | |
| 240 Bidirectional conversion is supported between these two formats, carrying Tx | |
| 241 semantics. In gsmhr-hex2rpf conversion direction, the input hex file may be in | |
| 242 either basic or extended RTP format; if the latter is used, the only allowed | |
| 243 frame types are good speech (FT=0) and good SID (FT=2). The conversion is | |
| 244 lossless as long as Tx semantics are maintained; more specifically, as long as | |
| 245 the extended RTP format hex input does not contain any frames that are marked | |
| 246 as FT=2, but are not perfect SID with all 79 bits of SID codeword set to 1. If | |
| 247 such imperfect valid SID frames are present, they are converted to perfect SID. | |
| 248 | |
| 249 In gsmhr-rpf2hex conversion direction, each raw packed (TS 101 318) frame is | |
| 250 written out in hex, either unchanged (basic RTP format) or with a prepended | |
| 251 RFC 5993 ToC octet (extended RTP format, enabled with -x option). If -x option | |
| 252 is given, the classification of good speech vs good SID for the purpose of | |
| 253 emitted ToC octet is a check for perfect SID with all 79 bits of SID codeword | |
| 254 set to 1. | |
| 255 | |
| 256 gsmhr-rpf2hex conversion is always lossless. | |
| 257 | |
| 258 Speech encoder and decoder utilities | |
| 259 ==================================== | |
| 260 | |
| 261 The present suite of tools provides 3 styles of speech encoder and decoder | |
| 262 utilities: | |
| 263 | |
| 264 gsmhr-encode Speech encoder, PCM speech input is in WAV format, compressed | |
| 265 speech output is in TW-TS-005 Annex B format. | |
| 266 | |
| 267 gsmhr-decode Speech decoder, compressed speech input is in TW-TS-005 Annex B | |
| 268 format, PCM speech output is in WAV format. | |
| 269 | |
| 270 gsmhr-encode-r Speech encoder, PCM speech input is in robe (raw big-endian) | |
| 271 format, compressed speech output is in TW-TS-005 Annex B format. | |
| 272 | |
| 273 gsmhr-decode-r Speech decoder, compressed speech input is in TW-TS-005 Annex B | |
| 274 format, PCM speech output is in robe format. | |
| 275 | |
| 276 gsmhr-etsi-enc Speech encoder, ETSI style, operating from *.inp to *.cod in | |
| 277 ETSI test sequence format. | |
| 278 | |
| 279 gsmhr-etsi-dec Speech decoder, ETSI style, operating from *.dec to *.out in | |
| 280 ETSI test sequence format. | |
| 281 | |
| 282 gsmhr-etsi-enc and gsmhr-etsi-dec utilities both read their inputs and write | |
| 283 their outputs in the local machine's native byte order by default. Both | |
| 284 utilities also accept -b and -l options that select the desired byte order | |
| 285 explicitly; these options affect both input and output for both encoder and | |
| 286 decoder utilities. The other two styles of speech encoder and decoder utilities | |
| 287 have no byte order concerns. | |
| 288 | |
| 289 TFO transform utilities | |
| 290 ======================= | |
| 291 | |
| 292 TFO-transform article explains the general concept of TFO transform; | |
| 293 HR-codec-Rx-logic article explains ThemWi implementation of this transform for | |
| 294 GSM-HR codec. HR-codec-library article describes libgsmhr1 API functions for | |
| 295 this GSM-HR TFO transform; here are 2 command line utilities that exercise it: | |
| 296 | |
| 297 gsmhr-tfo-xfrm This TFO transform exerciser reads a stream of radio | |
| 298 leg A Rx frames from a TW-TS-005 Annex B hex file and | |
| 299 writes the "pristine" stream intended for radio leg B | |
| 300 Tx into another TW-TS-005 Annex B hex file. -d option | |
| 301 enables DTXd (disabled by default); -x option switches | |
| 302 the output RTP format from basic to extended. | |
| 303 | |
| 304 gsmhr-tfo-xfrm-dc This TFO transform utility reads radio leg A Rx input | |
| 305 in ETSI *.dec format and emits radio leg B Tx output in | |
| 306 ETSI *.cod format, thus acting as an inverse of | |
| 307 GSM 06.06 REID utility that was originally used to | |
| 308 generate test sequence *.dec files. This variant | |
| 309 exercises libgsmhr1 TFO transform function in its most | |
| 310 native form. | |
| 311 | |
| 312 gsmhr-tfo-xfrm-dc reads its *.dec input and writes its *.cod output in the | |
| 313 local machine's native byte order by default. -b and -l options are supported, | |
| 314 selecting either big-endian or little-endian byte order explicitly; these | |
| 315 options affect both *.dec input and *.cod output. OTOH, the more FR-like | |
| 316 gsmhr-tfo-xfrm utility has no byte order concerns. | |
| 317 | |
| 318 Hand-crafting *.cod and *.dec files | |
| 319 =================================== | |
| 320 | |
| 321 The present suite of GSM-HR codec utilities includes specialized tools for | |
| 322 hand-crafting *.cod and *.dec files: gsmhr-cod-craft and gsmhr-dec-craft, | |
| 323 respectively. Each utility reads an ad hoc line-based ASCII source file and | |
| 324 emits its respective binary format. The ad hoc source language for these two | |
| 325 special-purpose tools is the same except for parts that set frame metadata | |
| 326 flags, which are different between *.cod and *.dec given their opposite | |
| 327 semantics. | |
| 328 | |
| 329 Because of their highly specialized nature, these two utilities are not | |
| 330 documented further - please read the source code for further understanding. | |
| 331 Work scope limits explained in HR-codec-limits article apply here. |
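
The ETSI *.cod and *.dec layouts described under "File formats for GSM-HR encoded speech" are simple enough to sketch in a few lines. The Python fragment below is only an illustration, not part of the ThemWi suite: the function name read_frames and its byteorder argument are hypothetical, and treating each 16-bit word as unsigned is an assumption. The byteorder argument plays the role of the -b and -l options, with native order as the default.

```python
import struct

PARAMS_PER_FRAME = 18                      # speech parameters per 20 ms frame
COD_FLAGS = ("VAD", "SP")                  # flag words that follow in *.cod
DEC_FLAGS = ("BFI", "UFI", "SID", "TAF")   # flag words that follow in *.dec


def read_frames(data, flag_names, byteorder="="):
    """Split a *.cod or *.dec file image into (params, flags) pairs.

    byteorder is a struct prefix: "=" native, ">" big-endian,
    "<" little-endian (hypothetical stand-in for -b and -l).
    Words are read as unsigned 16-bit values (an assumption).
    """
    words = PARAMS_PER_FRAME + len(flag_names)
    frame_len = 2 * words                  # 40 bytes for *.cod, 44 for *.dec
    if len(data) % frame_len:
        raise ValueError("file length is not a whole number of frames")
    fmt = "%s%dH" % (byteorder, words)
    frames = []
    for offset in range(0, len(data), frame_len):
        values = struct.unpack_from(fmt, data, offset)
        params = list(values[:PARAMS_PER_FRAME])
        flags = dict(zip(flag_names, values[PARAMS_PER_FRAME:]))
        frames.append((params, flags))
    return frames
```

Calling read_frames(data, COD_FLAGS, ">") on the contents of a big-endian *.cod file would yield one (params, flags) pair per 40-byte frame; passing DEC_FLAGS instead selects the 44-byte *.dec layout.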

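The collapsing of invalid SID conditions mentioned in the format comparison and in the gsmhr-dec2hex notes can be illustrated with a small classifier over the BFI and SID flag words of a *.dec frame. This is a sketch under stated assumptions, not the libgsmhr1 logic: the function name classify_rx_frame is hypothetical, the three invalid SID combinations and the treatment of BFI=2 come from the text above, and the remaining rows (good speech, valid SID, bad frame) reflect one reading of GSM 06.41 Table 1.

```python
def classify_rx_frame(bfi, sid):
    """Classify a *.dec frame from its BFI and SID flag words.

    BFI=2 is the ThemWi extension (BFI without payload bits); it is
    treated like BFI=1 for classification purposes.
    """
    if bfi not in (0, 1, 2) or sid not in (0, 1, 2):
        raise ValueError("flag value out of range")
    bad = bfi != 0
    if sid == 2 and not bad:
        return "valid SID"        # assumption: BFI=0 SID=2 is the only valid SID
    if sid == 0:
        # assumption: SID=0 means no SID codeword was detected
        return "bad frame" if bad else "good speech"
    # BFI=0 SID=1, BFI=1 SID=1, BFI=1 SID=2 and their BFI=2 variants
    return "invalid SID"
```

All five distinct flag combinations that land on "invalid SID" here correspond to the single FT=1 representation in TW-TS-002 output, which is why a round trip through gsmhr-dec2hex cannot restore the original BFI and SID words for those frames.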