comparison doc/HR-codec-utils @ 641:83de961cc54b

document GSM-HR codec utilities
author Mychaela Falconia <falcon@freecalypso.org>
date Thu, 26 Mar 2026 22:48:20 +0000
640:e0e5905261e2 641:83de961cc54b
1 Beginning with gsm-codec-lib-r5 release, Themyscira Wireless GSM codec libraries
2 and utilities package includes support for GSM-HR codec for the sake of
3 completeness, alongside the more useful FR, EFR and AMR codecs. The set of
4 command line utilities for GSM-HR codec includes speech encoding and decoding,
5 conversion of encoded speech between different formats, display of various
6 encoded formats and certain specialized utilities described later in this
7 article.
8
9 File formats for GSM-HR encoded speech
10 ======================================
11
12 The present suite of tools supports ETSI *.cod and *.dec formats, TW-TS-005
13 Annex B hexadecimal format, and a simple "raw packed" binary format. These
14 file formats are explained below.
15
16 ETSI *.cod encoder output format
17 --------------------------------
18
19 ETSI reference implementation of GSM-HR speech encoder writes its output in
20 this format; for each encoded 20 ms frame the output consists of 18 speech
21 parameters followed by VAD and SP flags. Each parameter or flag is written
22 into the file as a 16-bit word, hence each encoded 20 ms frame turns into 40
23 bytes in this format.
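
The frame layout just described can be sketched in Python as follows. This is only an illustration, not code from the package: the function name parse_cod_frame and the choice of unsigned 16-bit words are assumptions made here, and the byte order is left as a parameter because it depends on how the file was written.

```python
import struct

COD_WORDS = 20                    # 18 speech parameters + VAD + SP
COD_FRAME_BYTES = COD_WORDS * 2   # 40 bytes per 20 ms frame

def parse_cod_frame(raw, big_endian=False):
    """Split one 40-byte *.cod frame into (params, vad, sp).

    Hypothetical helper; words are read as unsigned 16-bit values,
    which is an assumption of this sketch.
    """
    if len(raw) != COD_FRAME_BYTES:
        raise ValueError("a *.cod frame must be exactly 40 bytes")
    fmt = (">" if big_endian else "<") + "%dH" % COD_WORDS
    words = struct.unpack(fmt, raw)
    return list(words[:18]), words[18], words[19]

# Made-up frame for illustration: parameters 0..17, VAD=1, SP=1
frame = struct.pack("<20H", *range(18), 1, 1)
params, vad, sp = parse_cod_frame(frame)
print(len(frame), params[:3], vad, sp)
```

The parameter values in the example frame are arbitrary and do not represent meaningful speech.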
24
25 ThemWi suite of GSM-HR codec utilities allows examining and hand-crafting *.cod
26 files; we have an ETSI-style speech encoder utility that emits this reference
27 format and a TFO transform utility that does the same, and we support conversion
28 from *.cod into our preferred TW-TS-005 Annex B hex format.
29
30 ETSI *.dec decoder input format
31 -------------------------------
32
33 ETSI reference implementation of GSM-HR speech decoder takes this format as its
34 input. Each 20 ms Rx unit (traffic frame or garbage received in the place of
35 one) consists of 18 speech parameters followed by 4 words of flags (BFI, UFI,
36 SID and TAF), stored as 22 16-bit words. As explained in HR-codec-library
37 article, ThemWi implementation of GSM-HR includes an extension to this decoder
38 input format: BFI=1 means BFI with payload bits included in both ETSI and
39 ThemWi versions, but BFI=2 (ThemWi extension) means BFI without payload bits.
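
A minimal Python sketch of reading this 22-word layout is shown below. The helper name read_dec_frames, the dictionary keys and the unsigned-word interpretation are assumptions of this sketch and are not part of libgsmhr1 or of the ETSI code. The has_payload field merely restates the BFI=2 extension described above.

```python
import struct

DEC_WORDS = 22                    # 18 speech parameters + BFI, UFI, SID, TAF
DEC_FRAME_BYTES = DEC_WORDS * 2   # 44 bytes per 20 ms Rx unit

def read_dec_frames(data, big_endian=False):
    """Yield one dict per Rx unit from the contents of a *.dec file."""
    if len(data) % DEC_FRAME_BYTES:
        raise ValueError("length is not a multiple of 44 bytes")
    fmt = (">" if big_endian else "<") + "%dH" % DEC_WORDS
    for words in struct.iter_unpack(fmt, data):
        bfi, ufi, sid, taf = words[18:]
        yield {
            "params": list(words[:18]),
            "bfi": bfi, "ufi": ufi, "sid": sid, "taf": taf,
            # BFI=2 is the ThemWi extension: bad frame without payload bits
            "has_payload": bfi != 2,
        }

# Two made-up Rx units: a frame with BFI=0 and a BFI=2 frame gap
good = struct.pack("<22H", *([0] * 18), 0, 0, 0, 0)
gap = struct.pack("<22H", *([0] * 18), 2, 0, 0, 0)
frames = list(read_dec_frames(good + gap))
print([(f["bfi"], f["has_payload"]) for f in frames])
```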
40
41 ThemWi suite of GSM-HR codec utilities allows examining and hand-crafting *.dec
42 files; we have an ETSI-style speech decoder utility that reads this reference
43 format and a TFO transform utility that does the same, and we support
44 bidirectional conversion between this *.dec format and our preferred TW-TS-005
45 Annex B hex format.
46
47 TW-TS-005 Annex B hexadecimal format
48 ------------------------------------
49
50 Themyscira Wireless Technical Specification TW-TS-005 defines a hexadecimal file
51 format for sequences of RTP payloads for GSM speech codecs; TW-TS-005 Annex B
52 specifies application of this hex file format to GSM-HR codec.
53
54 This TW-TS-005 Annex B hex is the preferred file format for GSM-HR encoded
55 speech recordings for most workflows. It can represent both Tx semantics (every
56 20 ms frame position is filled with either a good speech frame or a perfect SID;
57 during DTX pauses a new SID appears in every frame) and Rx semantics (BFI frame
58 gaps can occur anywhere and are expected during DTX pauses; SIDs can be valid
59 or invalid) in the same file format, hence it is the operator's responsibility
60 to know the semantics of each given recording file and to use it in the correct
61 context.
62
63 As explained in TW-TS-005 Annex B itself (see TW-TS-005 article for the most
64 up-to-date link to the actual spec document), each frame can be represented
65 either in the basic RTP format of ETSI TS 101 318 section 5.2, or in the
66 extended RTP format of RFC 5993 and TW-TS-002. Utilities in the present suite
67 that write TW-TS-005 Annex B hex files can be told to emit either format, with
68 the exception of gsmhr-dec2hex utility which always emits the extended format;
69 utilities that read these hex files accept both formats. The two RTP formats
70 carry different information content: the basic format can only represent Tx
71 semantics, while the extended format can represent both semantics.
72
73 Compared to *.cod format, TW-TS-005 Annex B format with Tx semantics lacks the
74 VAD flag, although the extended RTP format does represent an equivalent of SP
75 flag. However, the inclusion of VAD flag in *.cod format is only a debug
76 feature for speech encoder test sequences; it is not a part of the interface
77 from the Tx DTX handler to the Tx RSS as defined in GSM 06.41 Tx chapter.
78
79 Compared to *.dec format, TW-TS-005 Annex B format with Rx semantics (which has
80 to use the extended RTP format) collapses 3 possible invalid SID conditions
81 (BFI=0 SID=1, BFI=1 SID=1, BFI=1 SID=2) into the same TW-TS-002 representation
82 of FT=1. However, per GSM 06.41 Table 1 the Rx DTX handler for GSM-HR is
83 required to apply exactly the same handling to all 3 possibilities, and the
84 same collapsing of invalid SID conditions also happens on TDM-based (8 kbit/s)
85 Abis and Ater interfaces and in TFO, as detailed in GSM 08.61 and 08.62 specs.
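
The collapsing of invalid SID conditions can be illustrated with a short Python sketch. It encodes only the three flag combinations listed above. Treating the ThemWi BFI=2 extension the same as BFI=1 is an assumption of this sketch, and the function name is hypothetical.

```python
# (BFI, SID) pairs that the text above lists as invalid SID conditions
INVALID_SID = {(0, 1), (1, 1), (1, 2)}

def is_invalid_sid(bfi, sid):
    """True if this flag pair collapses into the TW-TS-002 FT=1 form.

    Assumption of this sketch: BFI=2 (ThemWi extension, no payload bits)
    is classified the same way as BFI=1.
    """
    return (min(bfi, 1), sid) in INVALID_SID

for bfi, sid in [(0, 1), (1, 1), (1, 2), (2, 2), (0, 2)]:
    label = "FT=1" if is_invalid_sid(bfi, sid) else "not invalid SID"
    print(bfi, sid, label)
```

All pairs for which the function returns True end up indistinguishable after conversion, which is the information loss discussed in this section.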
86
87 Raw packed binary format
88 ------------------------
89
90 When working with FR and EFR codecs, a speech recording with Tx semantics can be
91 stored in a gsmx binary file (see Binary-file-format article) that consists of
92 directly abutted codec frames (good speech or SID) in RTP format, with exactly
93 33 (FR) or 31 (EFR) bytes per frame. We offer an equivalent ability for GSM-HR
94 with the so-called raw packed format. It is a binary format that consists of
95 directly abutted frames; each frame is 14 bytes long and stores a GSM-HR codec
96 frame in the basic RTP format of TS 101 318 section 5.2, which we also call the
97 raw packed format.
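
Because every frame occupies exactly 14 bytes and every 20 ms frame position is filled under Tx semantics, a raw packed file can be split and its duration computed without parsing any frame content. The Python sketch below shows this; split_rpf and duration_ms are hypothetical names chosen for this illustration only.

```python
RPF_FRAME_BYTES = 14   # one GSM-HR frame in TS 101 318 section 5.2 packing
FRAME_MS = 20          # each frame covers 20 ms of speech

def split_rpf(data):
    """Split raw packed file contents into a list of 14-byte frames."""
    if len(data) % RPF_FRAME_BYTES:
        raise ValueError("length is not a multiple of 14 bytes")
    return [data[i:i + RPF_FRAME_BYTES]
            for i in range(0, len(data), RPF_FRAME_BYTES)]

def duration_ms(data):
    """Recording duration, valid only for Tx semantics (no frame gaps)."""
    return len(split_rpf(data)) * FRAME_MS

# Made-up content: three all-zero frames, not real speech
data = bytes(3 * RPF_FRAME_BYTES)
frames = split_rpf(data)
print(len(frames), duration_ms(data))
```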
98
99 This raw packed binary file format is not used directly by any of our speech
100 encoder or decoder utilities; instead it is supported via gsmhr-hex2rpf and
101 gsmhr-rpf2hex format conversion utilities.
102
103 Common command line options
104 ===========================
105
106 Certain flag options are common across different utilities in the present suite
107 of command line tools for GSM-HR codec; these common flags are as follows:
108
109 -b and -l   Utilities that read or write ETSI *.cod or *.dec format emit
110             and expect the local machine's native byte order by default.
111             -b option forces big-endian byte order; -l forces little-endian.
112
113 -d          Speech encoder utilities run with Tx DTX disabled by default;
114             -d option enables speech encoding with DTX. The same logic
115             applies to DTXd control in TFO transform utilities.
116
117 -x          Utilities that emit TW-TS-005 Annex B hex format with Tx
118             semantics emit the basic RTP format (TS 101 318) by default;
119             -x option switches to the extended RTP format. (The latter
120             format is TW-TS-002, but it also constitutes valid RFC 5993 in
121             the case of Tx semantics.)
122
123 Inspecting encoded speech file formats
124 ======================================
125
126 In common with other GSM speech codecs supported by ThemWi GSM codec libraries
127 and utilities suite, utilities are provided that read GSM-HR encoded speech
128 recording files and display all codec frames contained therein, in terms of
129 compressed speech parameters and accompanying flags. These utilities are as
130 follows:
131
132 Utility             Reads file format
133 -----------------------------------------
134 gsmhr-cod-parse     ETSI *.cod
135 gsmhr-dec-parse     ETSI *.dec
136 tw5b-dump           TW-TS-005 Annex B
137
138 gsmhr-cod-parse and gsmhr-dec-parse expect the local machine's native byte order
139 by default; -b and -l override options are supported.
140
141 ThemWi utilities for FR, EFR and AMR codecs display compressed speech parameters
142 in decimal form separated by spaces, with each subframe on its own line after
143 per-frame LPC parameters. A different format has been adopted for GSM-HR:
144
145 * Individual speech parameters are displayed in hex, with a fixed number of
146 digits corresponding to the size of each parameter in bits;
147
148 * Only two lines are used to display the actual speech parameters for each
149 frame, with per-frame parameters on the first line and all subframe parameters
150 on the second line;
151
152 * The set of LPC parameters and each of the 4 subframe parameter sets are
153 displayed as comma-separated triplets; R0, Int and Mode parameters are
154 displayed as singletons;
155
156 * Each just-described triplet or singleton is displayed as Name=value for better
157 readability;
158
159 * Ignoring Name= annotations and treating commas and spaces as equivalent, all
160 18 speech parameters are printed in their standard order as defined by ETSI.
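
The fixed-width hex convention from the list above can be sketched in Python as follows. This is not the actual output routine of gsmhr-cod-parse or its siblings, and the exact spacing and names in their output may differ. The bit widths used for R0, the three LPC parameters, Int and Mode (5, 11, 9, 8, 1 and 2 bits) are quoted from memory of GSM 06.20 and should be checked against the spec; the function names are hypothetical.

```python
def hex_field(value, bits):
    """Format value in hex with just enough digits to hold `bits` bits."""
    digits = (bits + 3) // 4
    return f"{value:0{digits}x}"

def per_frame_line(params):
    """Render the per-frame parameters as Name=value groups.

    Assumed bit widths (verify against GSM 06.20):
    R0=5, LPC1=11, LPC2=9, LPC3=8, Int=1, Mode=2.
    """
    r0, lpc1, lpc2, lpc3, int_lpc, mode = params[:6]
    lpc = ",".join([hex_field(lpc1, 11), hex_field(lpc2, 9),
                    hex_field(lpc3, 8)])
    return "R0=%s LPC=%s Int=%s Mode=%s" % (
        hex_field(r0, 5), lpc, hex_field(int_lpc, 1), hex_field(mode, 2))

# Made-up parameter values, each at the maximum for its assumed width
# except Mode
print(per_frame_line([0x1f, 0x7ff, 0x1ff, 0xff, 1, 2]))
```

With this convention a 5-bit parameter always takes two hex digits and an 11-bit parameter always takes three, so columns line up from frame to frame.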
161
162 File format conversion utilities
163 ================================
164
165 The following format conversions are supported between different GSM-HR encoded
166 speech formats:
167
168 Utility           From format          To format
169 ---------------------------------------------------------
170 gsmhr-cod2hex     ETSI *.cod           TW-TS-005 Annex B
171 gsmhr-dec2hex     ETSI *.dec           TW-TS-005 Annex B
172 gsmhr-hex2dec     TW-TS-005 Annex B    ETSI *.dec
173 gsmhr-hex2rpf     TW-TS-005 Annex B    Raw packed format
174 gsmhr-rpf2hex     Raw packed format    TW-TS-005 Annex B
175
176 The hexadecimal format of TW-TS-005 Annex B is treated as central; all provided
177 file format conversion utilities convert either from or to this central format.
178 Additional notes follow regarding each supported conversion.
179
180 Conversion from ETSI *.cod to TW-TS-005 Annex B
181 -----------------------------------------------
182
183 ETSI *.cod format naturally represents only Tx semantics, while TW-TS-005
184 Annex B supports both semantics. Semantics don't change with file format
185 conversion, hence the output of gsmhr-cod2hex still has Tx semantics. -b and -l
186 options are supported for *.cod input; hex output is written in the basic RTP
187 format by default or in the extended RTP format with -x option.
188
189 Conversion in the opposite direction is not supported, as there is no way to
190 resurrect VAD debug flag from a data source that lacks such.
191
192 Conversion between ETSI *.dec and TW-TS-005 Annex B
193 ---------------------------------------------------
194
195 Bidirectional conversion is supported between these two formats, carrying Rx
196 semantics. However, this conversion may be slightly lossy in each direction:
197
198 * gsmhr-hex2dec is nothing more than a command line utility around libgsmhr1
199 function gsmhr_rtp_in_direct() described in HR-codec-library article. The
200 exact same preprocessing step is done by every libgsmhr1-based program
201 whenever RTP input (be it real RTP or hex lines read from a TW-TS-005 Annex B
202 file) needs to be fed to GSM-HR speech decoder or TFO transform, hence the
203 output of gsmhr-hex2dec elucidates what always happens under the hood anyway.
204
205 The extended RTP format with Rx semantics defined in TW-TS-002 allows RTP
206 payloads carrying GSM-HR invalid SID to either include or omit payload bits.
207 As explained in HR-codec-library article, gsmhr_decoder_twts002_in() function
208 and gsmhr_rtp_in_direct() wrapper around it ignore these optional payload
209 bits for invalid SID frames and always set all 18 speech parameters in the
210 dec-style frame to 0. This same behaviour becomes explicitly visible when
211 using gsmhr-hex2dec - but if the input contains invalid SID frames with
212 payload bits included, then the conversion is lossy in the strict sense.
213
214 * gsmhr-dec2hex is an ad hoc program, not a wrapper around a library function,
215 as this operation is not needed in any standard workflow. The conversion may
216 be lossy in two cases:
217
218 - All possible combinations that mean invalid SID (BFI=0 SID=1, BFI=1 SID=1,
219 BFI=1 SID=2, plus variants of the same with BFI=2) collapse into the same
220 representation in TW-TS-002, just like in 8 kbit/s TRAU frame format.
221
222 - Whatever payload bits were given for these invalid SID frames in the 18
223 speech parameter words are discarded, i.e., non-verbose invalid SID format
224 is written in TW-TS-002 output.
225
226 Additional notes:
227
228 * gsmhr-dec2hex expects the local machine's native byte order by default, but
229 supports -b and -l options. OTOH, gsmhr-hex2dec writes *.dec output in the
230 local machine's native byte order only.
231
232 * By default gsmhr-hex2dec refuses to process files that contain BFI-no-data
233 frame gaps, as no such support exists in the standard GSM-HR speech decoder
234 from ETSI or its *.dec input format. (BFI=2 representation of such gaps is a
235 Themyscira extension.) -f option allows BFI=2 frames to be emitted.
236
237 Conversion between TW-TS-005 Annex B and raw packed format
238 ----------------------------------------------------------
239
240 Bidirectional conversion is supported between these two formats, carrying Tx
241 semantics. In gsmhr-hex2rpf conversion direction, the input hex file may be in
242 either basic or extended RTP format; if the latter is used, the only allowed
243 frame types are good speech (FT=0) and good SID (FT=2). The conversion is
244 lossless as long as Tx semantics are maintained, more specifically, as long as
245 the extended RTP format hex input does not contain any frames that are marked
246 as FT=2, but are not perfect SID with all 79 bits of SID codeword set to 1. If
247 such imperfect valid SID frames are present, they are converted to perfect SID.
248
249 In gsmhr-rpf2hex conversion direction, each raw packed (TS 101 318) frame is
250 written out in hex, either unchanged (basic RTP format) or with a prepended
251 RFC 5993 ToC octet (extended RTP format, enabled with -x option). If -x option
252 is given, the classification of good speech vs good SID for the purpose of
253 emitted ToC octet is a check for perfect SID with all 79 bits of SID codeword
254 set to 1.
255
256 gsmhr-rpf2hex conversion is always lossless.
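
The gsmhr-rpf2hex direction can be sketched in Python as shown below. This is an illustration, not the utility's source: the function names are hypothetical, and emitting one line of lowercase hex digits per frame is an assumption about the TW-TS-005 file layout. The ToC octet follows RFC 5993 (F bit, 3-bit FT field, 4 reserved bits). The good-speech vs good-SID decision is passed in by the caller, because the bit positions of the SID codeword within the 14 bytes are not given in this article.

```python
def toc_octet(ft):
    """RFC 5993 ToC octet: F=0 (last frame), 3-bit FT, 4 reserved zero bits."""
    if not 0 <= ft <= 7:
        raise ValueError("FT is a 3-bit field")
    return ft << 4

def rpf_frame_to_hex(frame, extended=False, is_sid=False):
    """Return the hex digits for one 14-byte raw packed frame.

    is_sid is supplied by the caller (assumption of this sketch);
    it selects FT=2 (good SID) instead of FT=0 (good speech).
    """
    if len(frame) != 14:
        raise ValueError("a raw packed frame must be exactly 14 bytes")
    if extended:
        frame = bytes([toc_octet(2 if is_sid else 0)]) + frame
    return frame.hex()

frame = bytes(range(14))   # made-up frame content for illustration
print(rpf_frame_to_hex(frame))
print(rpf_frame_to_hex(frame, extended=True, is_sid=True))
```

Dropping the first octet of the extended form gives back the original 14 bytes, which is why this direction loses nothing.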
257
258 Speech encoder and decoder utilities
259 ====================================
260
261 The present suite of tools provides 3 styles of speech encoder and decoder
262 utilities:
263
264 gsmhr-encode     Speech encoder, PCM speech input is in WAV format, compressed
265                  speech output is in TW-TS-005 Annex B format.
266
267 gsmhr-decode     Speech decoder, compressed speech input is in TW-TS-005 Annex B
268                  format, PCM speech output is in WAV format.
269
270 gsmhr-encode-r   Speech encoder, PCM speech input is in robe (raw big-endian)
271                  format, compressed speech output is in TW-TS-005 Annex B format.
272
273 gsmhr-decode-r   Speech decoder, compressed speech input is in TW-TS-005 Annex B
274                  format, PCM speech output is in robe format.
275
276 gsmhr-etsi-enc   Speech encoder, ETSI style, operating from *.inp to *.cod in
277                  ETSI test sequence format.
278
279 gsmhr-etsi-dec   Speech decoder, ETSI style, operating from *.dec to *.out in
280                  ETSI test sequence format.
281
282 gsmhr-etsi-enc and gsmhr-etsi-dec utilities both read their inputs and write
283 their outputs in the local machine's native byte order by default. Both
284 utilities also accept -b and -l options that select the desired byte order
285 explicitly; these options affect both input and output for both encoder and
286 decoder utilities. The other two styles of speech encoder and decoder utilities
287 have no byte order concerns.
288
289 TFO transform utilities
290 =======================
291
292 TFO-transform article explains the general concept of TFO transform;
293 HR-codec-Rx-logic article explains ThemWi implementation of this transform for
294 GSM-HR codec. HR-codec-library article describes libgsmhr1 API functions for
295 this GSM-HR TFO transform; here are 2 command line utilities that exercise it:
296
297 gsmhr-tfo-xfrm      This TFO transform exerciser reads a stream of radio
298                     leg A Rx frames from a TW-TS-005 Annex B hex file and
299                     writes the "pristine" stream intended for radio leg B
300                     Tx into another TW-TS-005 Annex B hex file. -d option
301                     enables DTXd (disabled by default); -x option switches
302                     the output RTP format from basic to extended.
303
304 gsmhr-tfo-xfrm-dc   This TFO transform utility reads radio leg A Rx input
305                     in ETSI *.dec format and emits radio leg B Tx output in
306                     ETSI *.cod format, thus acting as an inverse of
307                     GSM 06.06 REID utility that was originally used to
308                     generate test sequence *.dec files. This variant
309                     exercises libgsmhr1 TFO transform function in its most
310                     native form.
311
312 gsmhr-tfo-xfrm-dc reads its *.dec input and writes its *.cod output in the
313 local machine's native byte order by default. -b and -l options are supported,
314 selecting either big-endian or little-endian byte order explicitly; these
315 options affect both *.dec input and *.cod output. OTOH, the more FR-like
316 gsmhr-tfo-xfrm utility has no byte order concerns.
317
318 Hand-crafting *.cod and *.dec files
319 ===================================
320
321 The present suite of GSM-HR codec utilities includes specialized tools for
322 hand-crafting *.cod and *.dec files: gsmhr-cod-craft and gsmhr-dec-craft,
323 respectively. Each utility reads an ad hoc line-based ASCII source file and
324 emits its respective binary format. The ad hoc source language for these two
325 special-purpose tools is the same except for parts that set frame metadata
326 flags, which are different between *.cod and *.dec given their opposite
327 semantics.
328
329 Because of their highly specialized nature, these two utilities are not
330 documented further - please read the source code for further understanding.
331 Work scope limits explained in HR-codec-limits article apply here.