comparison doc/HR-codec-library @ 632:7fc57e2a6784

beginning of GSM-HR documentation
author Mychaela Falconia <falcon@freecalypso.org>
date Thu, 19 Mar 2026 04:13:45 +0000
parents
children 3ab76caba41c
631:6bad9af66f69 632:7fc57e2a6784
1 Themyscira libgsmhr1: library for GSM-HR codec
2 ==============================================
3
4 The present library provides the following functionalities related to GSM-HR
5 speech codec, also known as HRv1:
6
7 * Stateful speech encoder and decoder engines based on GSM 06.06 reference code
8 from ETSI;
9
10 * TS 28.062 section C.3.2.1.1 stateful TFO transform for this codec;
11
12 * A rich set of stateless utility functions for format conversion and other
13 common manipulations.
14
15 Compared to the alternative libgsmhr implemented as part of Osmocom gapk, our
16 implementation offers the following advantages:
17
18 * In our librification of ETSI GSM-HR code, speech encoder and decoder engines
19 have been outfitted with proper state structures, as opposed to the hack of
20 treating the entire bss segment with global variables as a poor man's state
21 structure.
22
23 * The Rx front end has been factored out of the speech decoder and can also be
24 used as a TFO transform. Because HRv1 codec falls chronologically between
25 FRv1 and EFR, our TFO transform for HRv1 serves as a stepping stone toward
26 future work on TFO transform for EFR.
27
28 * We made a slight extension to this GSM-HR Rx front end, applicable to both
29 full decoder and TFO transform configurations, to support BFI without payload
30 bits. This condition occurs in the case of FACCH stealing, packet loss in IP
31 transport, or the non-modifiable DSP PHY implementation in sysmoBTS that does
32 not provide erroneous payload bits along with BFI.
33
34 * We added many format conversion and other utility functions that are
35 Themyscira original work, not from ETSI GSM-HR code.
36
37 However, because of the very limited practical utility of the GSM-HR codec,
38 almost no work has been done to speed up the grossly inefficient code that
39 originates from ETSI; see the HR-codec-limits article.
40
41 GSM-HR codec frame formats
42 ==========================
43
44 Our speech encoder, speech decoder and TFO transform engines operate on the
45 same canonical formats as the original reference code from ETSI:
46
47 * The output from our speech encoder engine for each frame is an array of 20
48 16-bit words (18 codec parameters followed by VAD and SP flags) that matches
49 ETSI *.cod format.
50
51 * The input format to our speech decoder engine is an array of 22 16-bit words
52 (18 codec parameters followed by BFI, UFI, SID and TAF) as represented in
53 ETSI *.dec format.
54
55 * The input format to our TFO transform implementation is a decoder input frame
56 (*.dec), and the output mimics an encoder output (*.cod) frame complete with
57 VAD and SP flags. (The latter outputs are VAD=1 SP=1 in the case of DTXd=0,
58 or VAD=0 dummy and correct SP output in the case of DTXd=1.)
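
The word layouts in the list above can be written down as index constants.
The following is only a sketch: the macro and helper names are hypothetical
and are not taken from <tw_gsmhr.h>; only the word order (18 parameters
followed by the flags) comes from the description above.

```c
#include <stdint.h>

/* Hypothetical names for illustration; only the word order is from the text. */
#define HR_NUM_SPEECH_PARAMS	18	/* codec parameters in both formats */

#define HR_COD_VAD		18	/* *.cod: VAD flag */
#define HR_COD_SP		19	/* *.cod: SP flag */
#define HR_COD_WORDS		20	/* total words in encoder output */

#define HR_DEC_BFI		18	/* *.dec: BFI */
#define HR_DEC_UFI		19	/* *.dec: UFI */
#define HR_DEC_SID		20	/* *.dec: SID */
#define HR_DEC_TAF		21	/* *.dec: TAF */
#define HR_DEC_WORDS		22	/* total words in decoder input */

/* Nonzero if an encoder-format (*.cod) frame is marked as speech (SP=1). */
static int hr_cod_is_speech(const int16_t *cod)
{
	return cod[HR_COD_SP] != 0;
}

/* Nonzero if a decoder-format (*.dec) frame carries any BFI indication. */
static int hr_dec_has_bfi(const int16_t *dec)
{
	return dec[HR_DEC_BFI] != 0;
}
```

Real applications would normally use GSMHR_NUM_PARAMS_ENC and
GSMHR_NUM_PARAMS_DEC from <tw_gsmhr.h> for the two frame sizes.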
59
60 All other standard formats for GSM-HR codec frames, namely TS 101 318, RFC 5993
61 and TW-TS-002, are supported via stateless format conversion functions.
62
63 Representation and handling of BFI
64 ----------------------------------
65
66 If a decoder input frame in the canonical 22-word *.dec format has BFI=1 and
67 SID=0, it is a non-SID BFI frame, also called an unusable frame in the GSM
68 06.41 spec. If such a frame arrives in the comfort noise insertion state, all
69 parameters are ignored. If instead it arrives outside of the DTX state, where
70 ECU logic applies, our version retains the behavior of the ETSI reference
71 code: codevector parameters from the BFI frame are still used if its voiced
72 vs unvoiced mode matches that of the saved frame used by the ECU.
73 This logic resides in the Rx front end that is shared between full
74 decoder and TFO transform implementations.
75
76 There is, however, an extension to this logic original to Themyscira: if the
77 BFI word in the 22-word decoder input frame equals 2 instead of 1 (not allowed
78 in ETSI reference code where BFI is strictly a binary flag), the frame is
79 treated as BFI-no-data and no parameter bits are ever used in any state.
80 Higher-level RTP input functions, described later in this article, feed this
81 BFI=2 code to the speech decoder or TFO transform engine when RTP input is BFI
82 without payload bits.
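
An application that builds decoder input frames by hand, rather than through
the RTP input functions, could mark a lost frame along these lines. This is a
sketch under assumptions: the helper name is hypothetical, and zeroing the
parameters and setting UFI=0 and SID=0 is a choice made here, not something
the library is documented to require (the text above only says that no
parameter bits are used when BFI=2).

```c
#include <stdint.h>
#include <string.h>

#define HR_DEC_BFI	18	/* word offsets per the *.dec layout */
#define HR_DEC_TAF	21
#define HR_DEC_WORDS	22

/*
 * Fill a 22-word decoder input frame that represents BFI without payload.
 * Assumption: parameters, UFI and SID are simply zeroed.
 */
static void hr_make_bfi_nodata(int16_t *dec, int taf)
{
	memset(dec, 0, HR_DEC_WORDS * sizeof(int16_t));
	dec[HR_DEC_BFI] = 2;		/* Themyscira BFI-no-data extension */
	dec[HR_DEC_TAF] = taf ? 1 : 0;	/* TAF still has to come from the caller */
}
```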
83
84 Representation and handling of invalid SID
85 ------------------------------------------
86
87 If a decoder input frame in the canonical 22-word *.dec format has indicators
88 set to either SID=1 (irrespective of other flags) or SID=2 with either BFI or
89 UFI nonzero, that input frame is an invalid SID per GSM 06.41. All 18 speech
90 parameters in such frames are always ignored.
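
The rule above can be expressed as a small classifier. The enum and function
names below are hypothetical, not part of <tw_gsmhr.h>; the sketch also
assumes the frame has already been validated, so the SID word is 0, 1 or 2.

```c
#include <stdint.h>

#define HR_DEC_BFI	18	/* word offsets per the *.dec layout */
#define HR_DEC_UFI	19
#define HR_DEC_SID	20

enum hr_sid_class { HR_NOT_SID, HR_VALID_SID, HR_INVALID_SID };

/* Classify a validated 22-word decoder input frame by its SID status. */
static enum hr_sid_class hr_classify_sid(const int16_t *dec)
{
	int16_t bfi = dec[HR_DEC_BFI];
	int16_t ufi = dec[HR_DEC_UFI];
	int16_t sid = dec[HR_DEC_SID];

	if (sid == 0)
		return HR_NOT_SID;
	if (sid == 2 && bfi == 0 && ufi == 0)
		return HR_VALID_SID;
	/* SID=1 regardless of other flags, or SID=2 with BFI or UFI nonzero */
	return HR_INVALID_SID;
}
```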
91
92 In RTP transport these invalid SID frames can be represented only in TW-TS-002,
93 not in either of the two non-Themyscira standards. TW-TS-002 offers the option
94 of either including or omitting payload bits in invalid SID packets - however,
95 if invalid SID payload bits are included, they are ignored by our speech decoder
96 and TFO transform engines.
97
98 Libgsmhr1 general usage
99 =======================
100
101 The external public interface to Themyscira libgsmhr1 consists of a single
102 header file <tw_gsmhr.h>; it should be installed in some system include
103 directory.
104
105 The dialect of C used by all Themyscira GSM codec libraries is ANSI C (function
106 prototypes); the const qualifier is used where appropriate, and the interface is
107 defined in terms of <stdint.h> types; <tw_gsmhr.h> includes <stdint.h>.
108
109 State allocation and freeing
110 ============================
111
112 In order to use the speech encoder, you will need to allocate an encoder state
113 structure, and to use the speech decoder, you will need to allocate a decoder
114 state structure. The same goes for the stateful TFO transform. The necessary
115 state allocation functions are:
116
117 struct gsmhr_encoder_state *gsmhr_encoder_create(int dtx);
118 struct gsmhr_decoder_state *gsmhr_decoder_create(void);
119 struct gsmhr_rxfe_state *gsmhr_rxfe_create(void); /* TFO transform */
120
121 struct gsmhr_encoder_state, struct gsmhr_decoder_state and struct
122 gsmhr_rxfe_state are opaque structures to library users: you only get pointers
123 which you remember and pass around, but <tw_gsmhr.h> does not give you full
124 definitions of these structs. As a library user, you ordinarily don't even
125 need to know the size of these structs, hence the necessary malloc() operation
126 happens inside gsmhr_encoder_create(), gsmhr_decoder_create() and
127 gsmhr_rxfe_create() functions. (But see the following section regarding
128 alternative memory allocation schemes.) Each structure is malloc'ed
129 as a single chunk, so when you are done with it, simply call free() to
130 relinquish each encoder, decoder or TFO state instance.
131
132 gsmhr_encoder_create(), gsmhr_decoder_create() and gsmhr_rxfe_create() functions
133 can fail if the malloc() call inside fails, in which case these libgsmhr1
134 functions return NULL.
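
A possible way to pair creation with NULL checks and free() is sketched
below. The wrapper struct and the hr_codec_pair_*() functions are
hypothetical application code. HAVE_TW_GSMHR_H is a made-up build macro:
without it, trivial stand-in stubs replace the library so that the sketch
compiles on its own; the stubs do not reproduce libgsmhr1 behavior.

```c
#include <stdlib.h>

#ifdef HAVE_TW_GSMHR_H		/* hypothetical macro: build against the library */
#include <tw_gsmhr.h>
#else
/* Stand-in stubs so the sketch compiles on its own; NOT libgsmhr1 code. */
struct gsmhr_encoder_state { int dtx; };
struct gsmhr_decoder_state { int unused; };

static struct gsmhr_encoder_state *gsmhr_encoder_create(int dtx)
{
	struct gsmhr_encoder_state *st = malloc(sizeof *st);

	if (st)
		st->dtx = dtx;
	return st;
}

static struct gsmhr_decoder_state *gsmhr_decoder_create(void)
{
	return calloc(1, sizeof(struct gsmhr_decoder_state));
}
#endif

/* Hypothetical application-side holder for one encoder and one decoder. */
struct hr_codec_pair {
	struct gsmhr_encoder_state *enc;
	struct gsmhr_decoder_state *dec;
};

/* Returns 0 on success, -1 if either allocation failed. */
static int hr_codec_pair_open(struct hr_codec_pair *p, int dtx)
{
	p->enc = gsmhr_encoder_create(dtx);
	p->dec = gsmhr_decoder_create();
	if (!p->enc || !p->dec) {
		free(p->enc);	/* free(NULL) is a harmless no-op */
		free(p->dec);
		p->enc = NULL;
		p->dec = NULL;
		return -1;
	}
	return 0;
}

/* Each state is a single malloc'ed chunk, so plain free() releases it. */
static void hr_codec_pair_close(struct hr_codec_pair *p)
{
	free(p->enc);
	free(p->dec);
	p->enc = NULL;
	p->dec = NULL;
}
```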
135
136 The dtx argument to gsmhr_encoder_create() is a Boolean flag represented as an
137 int; it tells the GSM-HR speech encoder whether it should operate with DTX
138 enabled (run GSM 06.42 VAD and emit SID frames instead of speech frames per
139 GSM 06.41) or DTX disabled (skip VAD and always emit speech frames).
140
141 It should be noted that the original GSM-HR speech encoder from ETSI always runs
142 GSM 06.42 VAD algorithm, whether DTX is enabled or disabled; if DTX is disabled,
143 VAD flag is forced to 1, but all VAD and Tx DTX logic still executes and burns
144 up CPU cycles. Due to work scope limits described in the HR-codec-limits
145 article, this poor design from ETSI has been retained for now. However, the
146 API allows the DTX enable-or-disable flag to be changed only with a full reset
147 of the speech encoder, rather than per frame; this design thus prepares
148 for the possibility of cleaning up this implementation in the future and
149 executing VAD and Tx DTX code only when DTX is enabled in the speech encoding
150 direction.
151
152 State reset functions
153 =====================
154
155 void gsmhr_encoder_reset(struct gsmhr_encoder_state *st, int dtx);
156 void gsmhr_decoder_reset(struct gsmhr_decoder_state *st);
157 void gsmhr_rxfe_reset(struct gsmhr_rxfe_state *st);
158
159 Each of these functions resets the state of the corresponding element to its
160 initial or "home" state. Home states for the standard speech encoder and
161 decoder are given in GSM 06.20 sections 5.5 and 5.6, respectively; the home
162 state for the separated-out RxFE block (used for TFO transform) is a subset of
163 the full speech decoder home state as relevant to the reduced functionality.
164
165 The following const "variables" are exported by the library in order to
166 facilitate alternative memory allocation schemes:
167
168 extern const unsigned gsmhr_encoder_state_size;
169 extern const unsigned gsmhr_decoder_state_size;
170 extern const unsigned gsmhr_rxfe_state_size;
171
172 Using these const "variables", an application can allocate buffers of the
173 correct size for each state structure, and then initialize each newly allocated
174 state structure with gsmhr_*_reset(), as an alternative to gsmhr_*_create()
175 functions. Each of the standard gsmhr_*_create() functions allocates a buffer
176 of the correct size using standard malloc(), then initializes it with the
177 corresponding gsmhr_*_reset() function - hence the lower-level approach can be
178 used by applications that desire some other memory allocation scheme than
179 standard malloc().
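
The lower-level approach might look like the following sketch for the
decoder. The hr_alloc_fn type and hr_decoder_create_with() are hypothetical
names. HAVE_TW_GSMHR_H is a made-up build macro: without it, stand-in stubs
replace the library so that the sketch compiles on its own.

```c
#include <stddef.h>
#include <stdlib.h>

#ifdef HAVE_TW_GSMHR_H		/* hypothetical macro: build against the library */
#include <tw_gsmhr.h>
#else
/* Stand-in stubs so the sketch compiles on its own; NOT libgsmhr1 code. */
struct gsmhr_decoder_state { int homed; };
static const unsigned gsmhr_decoder_state_size =
				sizeof(struct gsmhr_decoder_state);

static void gsmhr_decoder_reset(struct gsmhr_decoder_state *st)
{
	st->homed = 1;
}
#endif

/*
 * Hypothetical allocator hook supplied by the application.  It must return
 * memory aligned at least as strictly as malloc() would, or NULL on failure.
 */
typedef void *(*hr_alloc_fn)(size_t size);

/* Allocate a decoder state with the given allocator and home it. */
static struct gsmhr_decoder_state *hr_decoder_create_with(hr_alloc_fn alloc)
{
	struct gsmhr_decoder_state *st = alloc(gsmhr_decoder_state_size);

	if (st)
		gsmhr_decoder_reset(st);	/* raw memory -> home state */
	return st;
}
```

Memory obtained this way must be released with whatever counterpart the
chosen allocator provides, not necessarily free().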
180
181 Using the speech encoder
182 ========================
183
184 To encode one 20 ms audio frame per GSM-HR, call gsmhr_encode_frame():
185
186 void gsmhr_encode_frame(struct gsmhr_encoder_state *st, const int16_t *pcm,
187 int16_t *param);
188
189 You need to provide an encoder state structure allocated earlier with
190 gsmhr_encoder_create(), a block of 160 linear PCM samples, and an output buffer
191 of 20 (GSMHR_NUM_PARAMS_ENC) 16-bit words into which the encoded GSM-HR frame
192 will be written. The encoded frame format emitted by this function is the same
193 as in the reference implementation from ETSI: 18 words of speech parameters
194 followed by VAD and SP flags. Stateless format conversion functions described
195 later in this document can be used to emit more commonly used RTP formats.
196
197 The mandatory encoder homing function is included: if the input frame matches
198 the encoder homing frame, the encoder state is reset to the home state at the
199 end of gsmhr_encode_frame() processing.
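
An application that wants to know when it is feeding the homing frame can
test for it before calling the encoder. The sketch below assumes the encoder
homing frame is 160 samples that each equal 0x0008, which is the pattern
used by the ETSI homing definition as far as this example is aware; check
GSM 06.20 before relying on it. The names are hypothetical, and the library
performs its own check internally regardless.

```c
#include <stdint.h>

#define HR_FRAME_SAMPLES	160	/* one 20 ms frame at 8 kHz */
#define HR_EHF_SAMPLE		0x0008	/* assumed homing sample value */

/* Nonzero if all 160 samples match the assumed encoder homing pattern. */
static int hr_is_encoder_homing_frame(const int16_t *pcm)
{
	int i;

	for (i = 0; i < HR_FRAME_SAMPLES; i++) {
		if (pcm[i] != HR_EHF_SAMPLE)
			return 0;
	}
	return 1;
}
```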
200
201 Using the speech decoder
202 ========================
203
204 Our speech decoder main function is:
205
206 void gsmhr_decode_frame(struct gsmhr_decoder_state *st, const int16_t *param,
207 int16_t *pcm);
208
209 The input frame format is the canonical one from ETSI: 22 (GSMHR_NUM_PARAMS_DEC)
210 16-bit words providing speech parameters followed by BFI, UFI, SID and TAF
211 metadata flags. In normal operation, this internal canonical form of speech
212 decoder input will be provided by one of the stateless format conversion
213 functions in the same libgsmhr1.
214
215 Important note: the parameter frame input to this function is expected to be
216 valid, i.e., it is NOT subjected to explicit validation checks! If your
217 application reads *.dec files or otherwise receives this format directly from
218 some external source (as opposed to output from one of our own format conversion
219 functions), you need to validate these bits with gsmhr_check_decoder_params()
220 before feeding them to the decoder engine!
221
222 This speech decoder function includes all mandatory logic for decoder homing:
223 special handling of the homed state, decoder homing frame checks of both full
224 and partial (to the first subframe) kinds, internal state reset and EHF output.
225
226 TFO transform function
227 ======================
228
229 To operate our TFO transform for GSM-HR codec, create a standalone RxFE state
230 structure (use gsmhr_rxfe_create() or your own allocation followed by
231 gsmhr_rxfe_reset()) and then call this function for every frame to be processed:
232
233 void gsmhr_tfo_xfrm(struct gsmhr_rxfe_state *st, int dtxd, const int16_t *ul,
234 int16_t *dl);
235
236 UL Rx input has the same form as input to gsmhr_decode_frame(), and the same
237 caveats apply in terms of validation checks. DL Tx output is emitted in the
238 same form as gsmhr_encode_frame() output, complete with VAD and SP flags. VAD
239 output should be considered a dummy, but SP output flag is valid: 1 in the case
240 of a speech frame or 0 in the case of a SID frame; the latter is possible only
241 when DTXd is enabled.
242
243 DTXd control: the dtxd argument to gsmhr_tfo_xfrm() tells the transform whether
244 it should emit both speech and SID frames or speech frames only, matching the
245 DTXd flag in TRAU-UL frames that affects both speech encoder and TFO transform
246 functions in traditional TRAUs. This flag can be changed mid-session: as
247 explained in HR-codec-Rx-logic article, our implementation of TFO transform
248 proceeds by applying classic Rx front end processing that only emits speech
249 frames, and then replacing output with SID frames under certain conditions if
250 DTXd is enabled.
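
A per-frame loop around gsmhr_tfo_xfrm() could be sketched as follows.
hr_tfo_run() and the HR_* macros are hypothetical. HAVE_TW_GSMHR_H is a
made-up build macro: without it, a stand-in stub that merely copies
parameters and marks every frame as speech replaces the real transform so
that the sketch compiles on its own. Input frames are assumed to have been
validated already.

```c
#include <stdint.h>
#include <string.h>

#ifdef HAVE_TW_GSMHR_H		/* hypothetical macro: build against the library */
#include <tw_gsmhr.h>
#else
/* Stand-in stub so the sketch compiles on its own; NOT the real transform. */
struct gsmhr_rxfe_state { int unused; };

static void gsmhr_tfo_xfrm(struct gsmhr_rxfe_state *st, int dtxd,
			   const int16_t *ul, int16_t *dl)
{
	(void) st;
	(void) dtxd;
	memcpy(dl, ul, 18 * sizeof(int16_t));
	dl[18] = 1;	/* VAD (dummy) */
	dl[19] = 1;	/* SP: always speech in this stub */
}
#endif

#define HR_COD_SP	19	/* SP flag offset in the output frame */
#define HR_COD_WORDS	20	/* words per output frame */
#define HR_DEC_WORDS	22	/* words per input frame */

/*
 * Run nframes consecutive UL frames through the transform, writing DL frames
 * back to back.  Returns the number of output frames that are SID (SP=0).
 */
static unsigned hr_tfo_run(struct gsmhr_rxfe_state *st, int dtxd,
			   const int16_t *ul, int16_t *dl, unsigned nframes)
{
	unsigned i, nsid = 0;

	for (i = 0; i < nframes; i++) {
		gsmhr_tfo_xfrm(st, dtxd, ul + i * HR_DEC_WORDS,
			       dl + i * HR_COD_WORDS);
		if (dl[i * HR_COD_WORDS + HR_COD_SP] == 0)
			nsid++;
	}
	return nsid;
}
```

Because dtxd is passed on every call, a caller that needs to change it
mid-session can simply pass the new value starting with the next frame.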