FreeCalypso > hg > gsm-codec-lib
changeset 632:7fc57e2a6784
beginning of GSM-HR documentation
| author | Mychaela Falconia <falcon@freecalypso.org> |
|---|---|
| date | Thu, 19 Mar 2026 04:13:45 +0000 |
| parents | 6bad9af66f69 |
| children | 3ab76caba41c |
| files | doc/HR-codec-Rx-logic doc/HR-codec-library doc/HR-codec-limits doc/TFO-transform |
| diffstat | 4 files changed, 748 insertions(+), 2 deletions(-) |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/doc/HR-codec-Rx-logic Thu Mar 19 04:13:45 2026 +0000 @@ -0,0 +1,447 @@ +Rx DTX handler logic for GSM-HR speech codec +============================================ + +With all 3 classic GSM speech codecs (FR, HR and EFR), as TCH UL Rx traffic on +the network side passes from the BTS to the TRAU, the first processing step +performed by the TRAU prior to actual speech decoding is an Rx DTX handler. +(For TCH DL Rx on the mobile side, exactly the same processing steps happen in +total, but because everything is integrated into a single device, interfaces +between steps may be implemented more loosely.) + +For GSM-HR codec the 3 controlling specs for different parts of Rx DTX handler +logic are GSM 06.21, GSM 06.22 and GSM 06.41 - however, for the full details +these specs defer to the reference C code in GSM 06.06. This article explains +this logic from all aspects which we find important: what the Rx DTX logic was +in the original reference code from ETSI and how we adapted it in libgsmhr1, +both for the full speech decoder and for our implementation of TFO transform. + +Normative vs freely changeable aspects +====================================== + +In the case of error-free transmission, such that the receiver never encounters +a frame with BFI or UFI set except during continuation of a DTX pause (after +receiving a valid SID that begins comfort noise insertion) and is never asked +to begin CN insertion with an invalid SID, the full behaviour of the speech +decoder to the final linear PCM output is required to be bit-exact and gets +exercised by test sequences. This bit-exact behaviour includes non-error- +handling aspects of the Rx DTX handler and comfort noise generation, complete +with interpolation for periodic CN updates via subsequent SID frames. 
+ +However, the reference C implementation becomes a non-normative example +(allowing changes in logic without violating spec requirements) in the +following aspects: + +* Handling of BFI and UFI outside of DTX pauses previously entered via a valid + SID, including most aspects of error concealment; + +* Exact manner of comfort noise muting when expected SID updates fail to arrive; + +* The exact logic to be applied when a CN insertion period begins with an + invalid SID frame. + +Almost-modular nature of GSM-HR Rx DTX handler +============================================== + +An Rx DTX handler can be considered fully modular if its output (which is then +passed as input to the main body of the speech decoder) is a potentially +modified set of speech parameters that can be packed into a new speech frame +and transmitted through a second radio leg with no change in the final output +of the speech decoder. The Rx DTX handler implemented in the reference code +from ETSI (both spec-normative and "example" aspects as broken down above) +_almost_ meets this modularity criterion, but not fully. The following aspects +are non-modular: + +* The interpolation of R0 and LPC parameters during comfort noise insertion + (bit-exact implementation considered normative) happens after expansion of + transmitted parameter bits into linear form. In the general case one cannot + produce a new set of encoded parameters (that can be transmitted through a + second radio leg) that will produce the same bit-exact result upon final + decoding. + +* Handling of speech frames (not SID, outside of DTX pause state) that are + marked with BFI=0 and UFI=1 (unreliable frames) has both a modular and a + non-modular aspect. If R0 increment is either small enough to not trigger + any mitigation or large enough to where UFI is converted into BFI, the applied + handling is fully modular. 
However, if R0 increment falls into the narrow + window between the two thresholds, the applied handling (output signal + concealment per GSM 06.21 section 5.1.2) is non-modular: it happens deep in + the guts of the speech decoder and cannot be represented via a modified set + of speech parameters. + +TFO transform derived from the reference Rx DTX handler +======================================================= + +If one extracts the reference Rx DTX handler from GSM 06.06 code and removes +the two non-modular aspects detailed above, leaving only fully modular logic, +the result can be used as a TFO transform that implements the functions of +TS 28.062 section C.3.2.1.1, specifically Case 1 in which UL may have DTX, but +DL is required to consist of speech frames only. + +How does one address the two non-modular aspects of the standard GSM-HR Rx DTX +handler that are not possible in TFO? The simplest implementation is to remove +them altogether: + +* Comfort noise parameters are not interpolated, instead an abrupt change in R0 + and LPC parameters occurs every 240 ms when a new SID frame arrives. + +* UFI is simply dropped in the case when the standard decoder would apply output + signal concealment, i.e., the latter feature is given up. + +Obviously this approach constitutes functional regression relative to the +standard speech decoder - thus we were initially hesitant to adopt it. However, +experiments with a real historical TRAU that supports TFO (Nokia TCSM2) reveal +that Nokia implemented exactly the same approach (minimal complexity at the +price of slight functional degradation) in their TRAU DSP firmware. Seeing +that a major classic vendor of GSM infrastructure implemented this simplistic +approach, we are now comfortable with doing the same - especially considering +the work scope limits explained in HR-codec-limits article. + +In Themyscira libgsmhr1 implementation, a component has been factored out which +we call the Rx front end (RxFE). 
This RxFE is our cleaned-up reimplementation +of those parts of the original Rx DTX handler that are fully modular (including +the speech ECU and all CN parameters that aren't interpolated), plus some +additional internal flag inputs and outputs. Out of the latter internal flags, +some are used only by the full speech decoder, while others are used only by +the TFO transform. RxFE state, which also serves as the API-visible TFO +transform state, is a subset of full speech decoder state. However, the core +RxFE function is not exported directly as API; instead the TFO transform API +function is a TFO-specific wrapper around the RxFE. + +Detailed RxFE logic and its evolution +===================================== + +Now that we have covered the background of the previous sections, we can +properly examine the actual logic of our RxFE, the follow-up logic for CN +interpolation that exists only in the full decoder, and their origins in the +reference GSM 06.06 code. + +Unless noted otherwise, all logic described in the following sections is the +same between ETSI original and the present Themyscira implementation. The +internal representation and code structure may be different, but the behavioral +logic remains the same unless explicitly called out otherwise. + +Input frame classification +-------------------------- + +As the very first processing step for every incoming frame, BFI, UFI and SID +flags are combined per GSM 06.41 Table 1 to classify the frame as good speech, +valid SID, invalid SID or unusable for DTX purposes. Note that UFI turns valid +SID into invalid just like BFI, and for DTX purposes all non-SID frames marked +with UFI are considered "unusable". But as we shall see shortly, this +"unusable" classification matters only for DTX and not for speech ECU logic, +which is separate. 
+ +Speech vs CNI state +------------------- + +RxFE state that carries from one frame to the next includes one very important +two-state flag: either speech or CNI (comfort noise insertion) mode. By +combining the 4 possible frame classifications from GSM 06.41 Table 1 (see +above) with these two possible carry-over states, we get 4 possible ways in +which the current frame may be handled: +
+Input frame class       Previously speech       Previously CNI
+--------------------------------------------------------------
+SID (valid or invalid)  CNIFIRSTSID             CNICONT
+Good speech             SPEECH                  SPEECH
+Unusable                SPEECH                  CNIBFI
+
+Here we can see that unless we enter DTX/CNI state, neither BFI nor UFI moves +RxFE logic out of SPEECH handling. This SPEECH handling mode includes the ECU +and handles both good and bad speech frames. However, once DTX/CNI state has +been entered, then only a (BFI==0 && UFI==0 && SID==0) good speech frame can +effect exit from this state! + +Speech ECU logic +================ + +The frame-to-frame persistent state for the ECU consists of the state counter +variable (range [0,7]) described in GSM 06.21 section 6.3 and a saved copy of +the last good speech frame. The just-referenced spec section describes the +logic quite well, but a few additional notes are in order: + +* The last good speech frame that gets regurgitated in substitution/muting + states of the ECU is not exactly the same as the actual last good speech frame + that went through: + + + GSP0 parameters for the first 3 subframes are replaced with GSP0 parameter + for the last subframe; + + + If the frame is voiced, LTP lag parameters are modified - read the code for + the details. + + In the original ETSI implementation, these modifications are applied at the + time of substitution/muting output; in our implementation, they are applied + at the time when a good speech frame is saved. 
Our implementation approach + makes it clearer what state is actually retained, but the functional behaviour + is exactly the same. + +* When that last good speech frame gets regurgitated during bad frame handling, + codevector parameters may be taken either from that saved last good speech + frame or from the current bad frame. Use of codevector parameters from the + current bad frame is possible only when the current bad frame and the saved + last good speech frame have the same voiced vs unvoiced mode. If this mode + matches for one frame and bad-frame codevector parameters get passed on, but + the next bad frame has incompatible mode, the saved last good speech frame + gets used in its entirety once again, subject only to the modifications + described above. + +* Our Themyscira version features an extension: if BFI equals 2 instead of 1, + indicating BFI without payload bits, then there are no bad-frame codevector + parameters and the saved last good speech frame is used in its entirety, + just as if BFI frames always have the wrong voiced vs unvoiced mode. + +BFI out of reset +================ + +What happens if the very first input frame in reset state (after external reset +or after a decoder homing frame) is a bad frame per BFI, or per UFI treated as +BFI - what is the default "last" good speech frame? In ETSI original code it +is a frame of all zero parameters, but this oddity is not readily visible - the +final output of linear PCM is also all zeros, and all is well. In Themyscira +implementation, the output of our RxFE may be visible externally if it is used +as a TFO transform - hence more attention was given to this issue. 
+ +If we feed all zeros as PCM input to a homed standard GSM-HR speech encoder, we +get this frame, repeating endlessly as long as all-zeros PCM input continues: + +R0=00 LPC=164,171,cb Int=0 Mode=0 +s1=00,00,00 s2=00,00,00 s3=00,00,00 s4=00,00,00 + +This frame differs from all-zero params only in the LPC set, and this sane-LPC +silence frame is the one we have adopted as our reset-default fallback frame. + +When libgsmhr1 full speech decoder engine is used, as opposed to TFO transform, +there is an additional check. If the current state is the special home state +(logic required for spec-mandated EHF output with repeated DHF input) and the +input frame has BFI flag set (no other flags are considered in this case), the +PCM output is set to all zero samples without leaving the home state. However, +the regular speech ECU and its last good frame default can still be reached if +BFI is clear, UFI is set and R0 is high. + +Comfort noise logic in RxFE +=========================== + +GSM 06.22 spec treats the required bit-exact CN generator as a single entity - +however, in our implementation it is split between the RxFE and the main body +of the full speech decoder. The bit-exact result in the case of full speech +decoding remains the same, but our arrangement allows non-interpolated CN +generation in the TFO transform as well. + +When our RxFE is used as a TFO transform with DTXd=0 (the mode that includes CN +generation), CN output from the transform matches GSM 06.22 Table 2, with the +exception of R0 and LPC parameters. These R0 and LPC parameters will be filled +as follows: + +* If CN insertion period begins with a valid SID, R0 and LPC are taken from + that SID. + +* If CN insertion period begins with an invalid SID, R0 and LPC are taken from + the last good speech frame, the one used by the speech ECU. 
Directly out of + reset (or after a DHF), these parameters are as shown above: + + R0=00 LPC=164,171,cb + +* Any time a new valid SID frame arrives during a CN insertion period, R0 and + LPC parameters change to this new SID. + +* Any time the input during CN insertion is either an unusable frame or an + invalid SID, R0 and LPC parameters remain unchanged from the most recently + received valid SID, or from the last good speech frame if only invalid SID + frames have been received in the entire CN insertion period so far. + +Comfort noise muting +==================== + +Per GSM 06.21 sections 5.2.3 and 5.2.4, when SID frames fail to arrive for 3 +consecutive TAF positions, generated comfort noise needs to be muted. We +implement this logic in our RxFE, and the actual logic is unchanged from ETSI +reference code - it is described in GSM 06.21 section 6.4. + +This SID aging and CN muting logic works by counting unusable frames received +in between SID updates. In the original GSM 06.06 code the criterion to start +CN muting is: + + TAF == 1 && CNIBFI_count >= 25 + +In our version we changed it to: + + CNIBFI_count >= (TAF ? 25 : 36) + +When TAF is indicated correctly, once every 12 frames and with the flag always +present at least in BFI frames (consider GSM 08.61 TRAU-8k format), our extended +criterion is equivalent to the original; however, our version will also produce +eventual CN muting if TAF is missing. + +For the purpose of this logic, invalid SID is as good as valid: while it is +treated just like unusable frames (CNIBFI) for the purpose of R0 and LPC +parameters and their interpolation (see next section), for the purpose of SID +aging and CN muting, invalid SID resets the count of unusable frames, and if +muting already started previously, it is halted at the current (partially muted) +R0 value. 
+ +Comfort noise interpolation +=========================== + +When our RxFE is invoked internally by our full speech decoder, the RxFE passes +some additional flags to the main body of the decoder. One of these flags +controls interpolation of R0 and LPC parameters for CNI, a function that is +required by the specs with bit-exact stipulation, but which cannot be +implemented at the level of speech parameters. + +The only case in which the behaviour of our libgsmhr1 full speech decoder +differs from ETSI original is when an invalid SID frame arrives immediately out +of reset, not preceded by any good speech, valid SID or even unusable frames. +In this case the original GSM 06.06 code uses initialized all-zero state of +pswOldFrmKsDec[] array, which cannot happen in any other case. In our +implementation we use LPC=164,171,cb instead, as already explained. + +Outside of this corner case, invalid SID frames are handled as follows +(unchanged between ETSI original and our version): + +* If CN insertion period begins with an invalid SID, R0 and LPC are taken from + the last good speech frame, the one used by the speech ECU. These R0 and LPC + params are then fed into the prescribed bit-exact interpolation mechanism as + if CN insertion started with a valid SID frame with these parameters. + +* Any invalid SID frames that occur in the middle of a CN insertion period are + treated just like unusable frames for the purpose of interpolation. 
+ +Return from CN insertion to speech state +======================================== + +Exit from DTX/CNI state happens upon receipt of a good speech frame, i.e., a +frame that meets this criterion: + + BFI == 0 && UFI == 0 && SID == 0 + +However, the original implementation in GSM 06.06 reference code exhibits this +flaw: if the speech ECU is in state 6 (see GSM 06.21 section 6.3) and then an +accepted SID frame (valid or invalid) puts us into DTX state, the first good +speech frame after this DTX pause will be dropped and replaced with fully muted +form of the last good speech frame from before the CN insertion period. This +effect happens no matter how long that DTX pause was - thus the last good speech +frame being regurgitated (with R0 reduced to 0) may be indefinitely old and out +of place. Furthermore, if the CNI-exiting good speech frame that is dropped +here is followed by BFI unusable frames, the ECU will return to state 6 and the +parameters (other than muted R0) of the last good speech frame from before the +DTX pause will continue being reused indefinitely. + +In our libgsmhr1 version, the state counter for the speech ECU is reset to 7 +(the initial home state) whenever our RxFE passes through DTX/CNI state. Since +only a good speech frame with BFI=0 and UFI=0 can make exit from CN insertion +state, this reset of ECU state ensures that this good speech frame will pass +through, and then the ECU will be in state 0 after this talkspurt-opening good +speech frame. + +Fully muted state after unusable frames in input +================================================ + +If the input to the speech decoder or TFO transform becomes nothing but BFI +unusable frames, what is the final fully muted or "decayed" output at the level +of modified speech parameters? 
In GSM-FR codec there is a special silence frame +defined in GSM 06.11 Table 1, and the final decayed state is a continuous output +of these fixed silence frames - irrespective of whether the Rx DTX handler got +to this fully decayed state from speech or CN muting. + +However, no equivalent fully decayed state with fixed output is defined for +GSM-HR. While this aspect is a non-normative "example" implementation detail, +in both GSM 06.06 reference code and Themyscira libgsmhr1 the fundamental state +of speech vs CNI persists indefinitely even when fully muted: + +* If an indefinitely long string of unusable frames occurs in speech state, + the speech ECU will be in state 6, and the output from the RxFE (externally + visible in the case of TFO) will endlessly repeat parameters of the last good + speech frame, except for R0 reduced to 0. + +* If an indefinitely long string of unusable frames occurs in DTX/CNI state, + the output form shown in GSM 06.22 Table 2, complete with bit-exact + pseudorandom sequence in unvoiced codevector parameters, will likewise + continue indefinitely. LPC parameters will remain from the most recently + received valid SID frame (or from the last good speech frame if CNI period + began with invalid SID and no valid SID was received afterward), but R0 will + be reduced to 0 by the CN muting logic. + +Because R0 is reduced to 0 in both cases, the above details are generally +invisible with full endpoint speech decoding. However, they become fully +visible in the case of TFO transform with DTXd=0. + +TFO transform with DTXd=1 +========================= + +The internal RxFE block that emits CN parameters during DTX/CNI state is correct +for the full endpoint speech decoder application and for TFO transform with +DTXd=0. 
The case of TFO transform with DTXd=1 is implemented by calling the +same RxFE block, then applying this simple modification to its output: if the +current frame was processed in DTX/CNI mode, the frame of CN parameters is +transformed into a downlink SID frame by replacing all speech parameters beyond +R0 and LPC with all-ones SID codeword. + +The internal RxFE block tells the TFO wrapper when this just-described +modification should be applied by way of an internal flag. This flag is set +in two cases: + +1) When the current frame was processed in DTX/CNI mode, or + +2) When the speech ECU applied substitution/muting handling to the current + frame, and the ECU state was 6 or 7 at the beginning of current frame + processing. + +The effects of this logic are as follows: + +1) DTX pauses in UL pass through into DTX pauses in DL, with unusable frames + and invalid SID replaced with the most recent valid SID, or with R0+LPC from + the last good speech frame in the case of initial invalid SID. The + spec-compliant Rx DTX handler in the destination MS can then produce the + most correct form of comfort noise, including interpolation of R0 and LPC + parameters. + +2) When the input to TFO transform is nothing but unusable frames, the downlink + radio leg should go into DTXd state in order to produce the desired reduction + in radio interference and BTS power consumption. This effect should happen + irrespective of whether the "fully decayed" state of RxFE is DTX/CNI muting + or speech ECU, as covered in the previous section. Our logic of turning + "fully decayed" ECU state into DTXd SID achieves the desired effect. + +Finally, there is one more modification applied only in the case of TFO +transform with DTXd=1 and not in other cases: muting of comfort noise. In the +case of full endpoint speech decoding or TFO transform with DTXd=0, when the +criterion for CN muting is first reached, the muting proceeds by decrementing +R0 by 2 on every frame, i.e., gradually. 
(See GSM 06.21 section 6.4.) However, +in the case of TFO transform with DTXd=1, CN muting is effected by reducing R0 +to 0 immediately as soon as CN muting criterion is reached. The rationale is +as follows: + +* A TRAU (or TRAU-emulating MGW that feeds Abis to a BTS) has no way of knowing + exactly which of its continuously emitted DL SID frames will actually get + transmitted on the air and seen by the MS. Therefore, a muting process that + gradually decrements R0 with every emitted SID frame would make no sense. + +* If the destination MS receives a SID update with R0=0 subsequent to whatever + previous SID it received with non-zero R0, the spec-required CN interpolation + logic in that MS will produce the desired effect of gradual muting over 240 ms + - not too far from the 320 ms muting time called for in GSM 06.21 section + 5.2.4. + +TFO transform homing +==================== + +ThemWi implementation of TFO transform includes the feature of in-band homing: +if the input to the transform is the spec-defined decoder homing frame (DHF), +this DHF is passed through to the output just like any other good speech frame, +but the internal state is reset to the initial "home" state. + +The check for DHF (all bits must match, plus (BFI == 0 && SID == 0) criterion) +and the resulting state reset happen at the end of frame processing, after the +output for the current frame has been generated. In the case of ThemWi TFO +transform for GSM-HR, there are two corner cases in which an incoming DHF may +be acted upon (produce state reset), but not appear in the output: + +1) The overall state of RxFE was speech (as opposed to DTX/CNI) and the speech + ECU state was 6 - the state in which the first received good speech frame + gets dropped. + +2) The overall state of RxFE was DTX/CNI and the incoming DHF is marked with + UFI=1. UFI is not a criterion for DHF detection, only BFI is, but UFI in + DTX/CNI state will cause current frame processing to treat the frame as + unusable.
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/doc/HR-codec-library Thu Mar 19 04:13:45 2026 +0000 @@ -0,0 +1,250 @@ +Themyscira libgsmhr1: library for GSM-HR codec +============================================== + +The present library provides the following functionalities related to GSM-HR +speech codec, also known as HRv1: + +* Stateful speech encoder and decoder engines based on GSM 06.06 reference code + from ETSI; + +* TS 28.062 section C.3.2.1.1 stateful TFO transform for this codec; + +* A rich set of stateless utility functions for format conversion and other + common manipulations. + +Compared to libgsmhr alternative implemented as part of Osmocom gapk, our +implementation provides the following advantages: + +* In our librification of ETSI GSM-HR code, speech encoder and decoder engines + have been outfitted with proper state structures, as opposed to the hack of + treating the entire bss segment with global variables as a poor man's state + structure. + +* The Rx front end has been factored out of the speech decoder and can also be + used as a TFO transform. Because HRv1 codec falls chronologically between + FRv1 and EFR, our TFO transform for HRv1 serves as a stepping stone toward + future work on TFO transform for EFR. + +* We made a slight extension to this GSM-HR Rx front end, applicable to both + full decoder and TFO transform configurations, to support BFI without payload + bits. This condition occurs in the case of FACCH stealing, packet loss in IP + transport, or the non-modifiable DSP PHY implementation in sysmoBTS that does + not provide erroneous payload bits along with BFI. + +* We added many format conversion and other utility functions that are + Themyscira original work, not from ETSI GSM-HR code. + +However, because of very limited practical utility of GSM-HR codec, almost no +work has been done to speed up any of the grossly inefficient code that +originates from ETSI - see HR-codec-limits article. 
+ +GSM-HR codec frame formats +========================== + +Our speech encoder, speech decoder and TFO transform engines operate on the +same canonical formats as the original reference code from ETSI: + +* The output from our speech encoder engine for each frame is an array of 20 + 16-bit words (18 codec parameters followed by VAD and SP flags) that matches + ETSI *.cod format. + +* The input format to our speech decoder engine is an array of 22 16-bit words + (18 codec parameters followed by BFI, UFI, SID and TAF) as represented in + ETSI *.dec format. + +* The input format to our TFO transform implementation is a decoder input frame + (*.dec), and the output mimics an encoder output (*.cod) frame complete with + VAD and SP flags. (The latter outputs are VAD=1 SP=1 in the case of DTXd=0, + or VAD=0 dummy and correct SP output in the case of DTXd=1.) + +All other standard formats for GSM-HR codec frames, namely TS 101 318, RFC 5993 +and TW-TS-002, are supported via stateless format conversion functions. + +Representation and handling of BFI +---------------------------------- + +If a decoder input frame in the canonical 22-word *.dec format has BFI=1 and +SID=0, it is a non-SID BFI frame, also called an unusable frame in GSM 06.41 +spec. If such frame arrives in comfort noise insertion state, all parameters +are ignored. On the other hand, if such frame arrives outside of DTX state, +when ECU logic is applied instead, our version retains the logic from ETSI +reference code in that codevector parameters from BFI frames are used if the +voiced vs unvoiced mode matches between the BFI frame and the saved frame used +by the ECU. This logic resides in the Rx front end that is shared between full +decoder and TFO transform implementations. 
+ +There is, however, an extension to this logic original to Themyscira: if the +BFI word in the 22-word decoder input frame equals 2 instead of 1 (not allowed +in ETSI reference code where BFI is strictly a binary flag), the frame is +treated as BFI-no-data and no parameter bits are ever used in any state. +Higher-level RTP input functions, described later in this article, feed this +BFI=2 code to the speech decoder or TFO transform engine when RTP input is BFI +without payload bits. + +Representation and handling of invalid SID +------------------------------------------ + +If a decoder input frame in the canonical 22-word *.dec format has indicators +set to either SID=1 (irrespective of other flags) or SID=2 with either BFI or +UFI nonzero, that input frame is invalid SID per GSM 06.41. All 18 speech +parameters are fully ignored in such frames, always. + +In RTP transport these invalid SID frames can be represented only in TW-TS-002, +not in either of the two non-Themyscira standards. TW-TS-002 offers the option +of either including or omitting payload bits in invalid SID packets - however, +if invalid SID payload bits are included, they are ignored by our speech decoder +and TFO transform engines. + +Libgsmhr1 general usage +======================= + +The external public interface to Themyscira libgsmhr1 consists of a single +header file <tw_gsmhr.h>; it should be installed in some system include +directory. + +The dialect of C used by all Themyscira GSM codec libraries is ANSI C (function +prototypes), const qualifier is used where appropriate, and the interface is +defined in terms of <stdint.h> types; <tw_gsmhr.h> includes <stdint.h>. + +State allocation and freeing +============================ + +In order to use the speech encoder, you will need to allocate an encoder state +structure, and to use the speech decoder, you will need to allocate a decoder +state structure. The same goes for the stateful TFO transform. 
The necessary +state allocation functions are: +
+struct gsmhr_encoder_state *gsmhr_encoder_create(int dtx);
+struct gsmhr_decoder_state *gsmhr_decoder_create(void);
+struct gsmhr_rxfe_state *gsmhr_rxfe_create(void); /* TFO transform */
+
+struct gsmhr_encoder_state, struct gsmhr_decoder_state and struct +gsmhr_rxfe_state are opaque structures to library users: you only get pointers +which you remember and pass around, but <tw_gsmhr.h> does not give you full +definitions of these structs. As a library user, you ordinarily don't even +need to know the size of these structs, hence the necessary malloc() operation +happens inside gsmhr_encoder_create(), gsmhr_decoder_create() and +gsmhr_rxfe_create() functions. (But see the following section regarding +alternative memory allocation schemes.) However, each structure is malloc'ed +as a single chunk, hence when you are done with it, simply call free() to +relinquish each encoder, decoder or TFO state instance. + +gsmhr_encoder_create(), gsmhr_decoder_create() and gsmhr_rxfe_create() functions +can fail if the malloc() call inside fails, in which case these libgsmhr1 +functions return NULL. + +The dtx argument to gsmhr_encoder_create() is a Boolean flag represented as an +int; it tells the GSM-HR speech encoder whether it should operate with DTX +enabled (run GSM 06.42 VAD and emit SID frames instead of speech frames per +GSM 06.41) or DTX disabled (skip VAD and always emit speech frames). + +It should be noted that the original GSM-HR speech encoder from ETSI always runs +GSM 06.42 VAD algorithm, whether DTX is enabled or disabled; if DTX is disabled, +VAD flag is forced to 1, but all VAD and Tx DTX logic still executes and burns +up CPU cycles. Due to work scope limits described in HR-codec-limits article, +this poor design from ETSI has been retained at present. 
However, the API +design allows DTX enable-or-disable flag to be changed only with a full reset +of the speech encoder, rather than per frame - this API design thus prepares +for the possibility of cleaning up this implementation in the future and +executing VAD and Tx DTX code only when DTX is enabled in the speech encoding +direction. + +State reset functions +===================== +
+void gsmhr_encoder_reset(struct gsmhr_encoder_state *st, int dtx);
+void gsmhr_decoder_reset(struct gsmhr_decoder_state *st);
+void gsmhr_rxfe_reset(struct gsmhr_rxfe_state *st);
+
+Each of these functions resets the state of the corresponding element to its +initial or "home" state. Home states for the standard speech encoder and +decoder are given in GSM 06.20 sections 5.5 and 5.6, respectively; the home +state for the separated-out RxFE block (used for TFO transform) is a subset of +the full speech decoder home state as relevant to the reduced functionality. + +The following const "variables" are exported by the library in order to +facilitate alternative memory allocation schemes: +
+extern const unsigned gsmhr_encoder_state_size;
+extern const unsigned gsmhr_decoder_state_size;
+extern const unsigned gsmhr_rxfe_state_size;
+
+Using these const "variables", an application can allocate buffers of the +correct size for each state structure, and then initialize each newly allocated +state structure with gsmhr_*_reset(), as an alternative to gsmhr_*_create() +functions. Each of the standard gsmhr_*_create() functions allocates a buffer +of the correct size using standard malloc(), then initializes it with the +corresponding gsmhr_*_reset() function - hence the lower-level approach can be +used by applications that desire some other memory allocation scheme than +standard malloc(). 
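The lower-level allocation approach described above might look like the following sketch, which carves a decoder state out of a static arena instead of calling malloc(). So that the example compiles without libgsmhr1 installed, it contains clearly marked stand-ins for the struct, the size constant and the reset function; their contents are placeholders and say nothing about the real library. The arena, its size and the helper names are likewise hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * STAND-INS ONLY: in a real program delete this section and use
 * #include <tw_gsmhr.h> instead.  The 64-byte size and the memset body
 * are placeholders, not the library's real size or reset logic.
 */
struct gsmhr_decoder_state { unsigned char placeholder[64]; };
const unsigned gsmhr_decoder_state_size = sizeof(struct gsmhr_decoder_state);
void gsmhr_decoder_reset(struct gsmhr_decoder_state *st)
{
	memset(st, 0, gsmhr_decoder_state_size);
}

/* Hypothetical application-side arena, made of suitably aligned units */
union arena_unit { long double ld; long l; void *ptr; };
#define ARENA_UNITS 256
static union arena_unit arena[ARENA_UNITS];
static size_t arena_units_used;

/* Bump allocator: hands out whole units, never frees */
static void *arena_alloc(size_t size)
{
	size_t units = (size + sizeof(union arena_unit) - 1)
			/ sizeof(union arena_unit);
	void *p;

	assert(size > 0);
	if (units > ARENA_UNITS - arena_units_used)
		return NULL;
	p = &arena[arena_units_used];
	arena_units_used += units;
	return p;
}

/* Allocate by exported size, then initialize with the reset function */
static struct gsmhr_decoder_state *decoder_create_in_arena(void)
{
	struct gsmhr_decoder_state *st = arena_alloc(gsmhr_decoder_state_size);

	if (st)
		gsmhr_decoder_reset(st);
	return st;
}
```

A state obtained this way must not be passed to free(); its lifetime is whatever the application's own allocator provides. The same pattern would apply to encoder and RxFE states with their respective size constants and reset functions.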
+
+Using the speech encoder
+========================
+
+To encode one 20 ms audio frame per GSM-HR, call gsmhr_encode_frame():
+
+void gsmhr_encode_frame(struct gsmhr_encoder_state *st, const int16_t *pcm,
+                        int16_t *param);
+
+You need to provide an encoder state structure allocated earlier with
+gsmhr_encoder_create(), a block of 160 linear PCM samples, and an output buffer
+of 20 (GSMHR_NUM_PARAMS_ENC) 16-bit words into which the encoded GSM-HR frame
+will be written. The encoded frame format emitted by this function is the same
+as in the reference implementation from ETSI: 18 words of speech parameters
+followed by VAD and SP flags. Stateless format conversion functions described
+later in this document can be used to emit more commonly used RTP formats.
+
+The mandatory encoder homing function is included: if the input frame matches
+the encoder homing frame, the encoder state is reset to the home state at the
+end of gsmhr_encode_frame() processing.
+
+Using the speech decoder
+========================
+
+Our speech decoder main function is:
+
+void gsmhr_decode_frame(struct gsmhr_decoder_state *st, const int16_t *param,
+                        int16_t *pcm);
+
+The input frame format is the canonical one from ETSI: 22 (GSMHR_NUM_PARAMS_DEC)
+16-bit words providing speech parameters followed by BFI, UFI, SID and TAF
+metadata flags. In normal operation, this internal canonical form of speech
+decoder input will be provided by one of the stateless format conversion
+functions in the same libgsmhr1.
+
+Important note: the parameter frame input to this function is expected to be
+valid, i.e., it is NOT subjected to explicit validation checks! If your
+application reads *.dec files or otherwise receives this format directly from
+some external source (as opposed to output from one of our own format conversion
+functions), you need to validate these bits with gsmhr_check_decoder_params()
+before feeding them to the decoder engine!
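Editorial sketch (not part of the changeset): to make the two canonical layouts concrete, the helper below turns one 20-word encoder output frame into a 22-word decoder input frame for a local loopback test. The function name and the index names are hypothetical. The SID flag values (0 for speech, 2 for valid SID) and the fixed TAF=0 are assumptions of this sketch, not something the text above states; in normal operation the library's own format conversion functions would produce the decoder input.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Word counts stated in the text; <tw_gsmhr.h> provides the real macros */
#define NUM_SPEECH_PARAMS	18
#define ENC_WORDS		20	/* GSMHR_NUM_PARAMS_ENC */
#define DEC_WORDS		22	/* GSMHR_NUM_PARAMS_DEC */

/* hypothetical index names for the trailing flag words */
enum { ENC_VAD = 18, ENC_SP = 19 };
enum { DEC_BFI = 18, DEC_UFI = 19, DEC_SID = 20, DEC_TAF = 21 };

/* enc points to ENC_WORDS words, dec to DEC_WORDS words */
void enc_to_dec_loopback(const int16_t *enc, int16_t *dec)
{
	assert(enc != NULL && dec != NULL);
	/* the 18 speech parameter words occupy the same positions */
	memcpy(dec, enc, NUM_SPEECH_PARAMS * sizeof(int16_t));
	dec[DEC_BFI] = 0;	/* loopback: no bad frames */
	dec[DEC_UFI] = 0;	/* loopback: no unreliable frames */
	/* assumption: SID flag is 0 for speech, 2 for a valid SID frame */
	dec[DEC_SID] = enc[ENC_SP] ? 0 : 2;
	dec[DEC_TAF] = 0;	/* simplification: TAF position not modeled */
}
```

The VAD word of the encoder output has no counterpart on the decoder side and is simply dropped.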
+
+This speech decoder function includes all mandatory logic for decoder homing:
+special handling of the homed state, decoder homing frame checks of both full
+and partial (to the first subframe) kinds, internal state reset and EHF output.
+
+TFO transform function
+======================
+
+To operate our TFO transform for GSM-HR codec, create a standalone RxFE state
+structure (use gsmhr_rxfe_create() or your own allocation followed by
+gsmhr_rxfe_reset()) and then call this function for every frame to be processed:
+
+void gsmhr_tfo_xfrm(struct gsmhr_rxfe_state *st, int dtxd, const int16_t *ul,
+                    int16_t *dl);
+
+UL Rx input has the same form as input to gsmhr_decode_frame(), and the same
+caveats apply in terms of validation checks. DL Tx output is emitted in the
+same form as gsmhr_encode_frame() output, complete with VAD and SP flags. VAD
+output should be considered a dummy, but SP output flag is valid: 1 in the case
+of a speech frame or 0 in the case of a SID frame; the latter is possible only
+when DTXd is enabled.
+
+DTXd control: dtxd argument to gsmhr_tfo_xfrm() tells the transform if it should
+emit both speech and SID frames or speech frames only, corresponding to DTXd
+flag in TRAU-UL frames that affects both speech encoder and TFO transform
+functions in traditional TRAUs. This flag can be changed mid-session: as
+explained in HR-codec-Rx-logic article, our implementation of TFO transform
+proceeds by applying classic Rx front end processing that only emits speech
+frames, and then replacing output with SID frames under certain conditions if
+DTXd is enabled.
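Editorial sketch (not part of the changeset): a per-frame calling pattern for the TFO transform described above, counting speech and SID output frames by the SP flag. tfo_run() and struct tfo_counts are hypothetical names. The block between the STAND-IN markers fakes gsmhr_rxfe_create() and gsmhr_tfo_xfrm() only so that the sketch compiles on its own: the fake transform performs no real Rx DTX processing and marks every output frame as speech. With the real library, delete the stand-ins and include <tw_gsmhr.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* ---- STAND-INS (assumptions, NOT libgsmhr1 code) ---- */
#define GSMHR_NUM_PARAMS_ENC	20
#define GSMHR_NUM_PARAMS_DEC	22
struct gsmhr_rxfe_state { int unused; };
struct gsmhr_rxfe_state *gsmhr_rxfe_create(void)
{
	return calloc(1, sizeof(struct gsmhr_rxfe_state));
}
void gsmhr_tfo_xfrm(struct gsmhr_rxfe_state *st, int dtxd, const int16_t *ul,
		    int16_t *dl)
{
	(void) st;
	(void) dtxd;
	memcpy(dl, ul, 18 * sizeof(int16_t));	/* fake: pass params through */
	dl[18] = 0;				/* VAD: dummy */
	dl[19] = 1;				/* SP: fake, always speech */
}
/* ---- END STAND-INS ---- */

struct tfo_counts {
	unsigned speech;	/* output frames with SP=1 */
	unsigned sid;		/* output frames with SP=0 */
};

/*
 * ul_frames: nframes * 22 words, already validated decoder-format input
 * dl_frames: nframes * 20 words of encoder-format output
 * Returns 0 on success, -1 if state allocation fails.
 */
int tfo_run(const int16_t *ul_frames, int16_t *dl_frames, size_t nframes,
	    int dtxd, struct tfo_counts *counts)
{
	struct gsmhr_rxfe_state *st;
	size_t i;

	assert(counts != NULL);
	st = gsmhr_rxfe_create();
	if (!st)
		return -1;
	counts->speech = counts->sid = 0;
	for (i = 0; i < nframes; i++) {
		const int16_t *ul = ul_frames + i * GSMHR_NUM_PARAMS_DEC;
		int16_t *dl = dl_frames + i * GSMHR_NUM_PARAMS_ENC;

		/* dtxd may legitimately differ from one frame to the next */
		gsmhr_tfo_xfrm(st, dtxd, ul, dl);
		if (dl[GSMHR_NUM_PARAMS_ENC - 1])	/* SP is the last word */
			counts->speech++;
		else
			counts->sid++;
	}
	free(st);	/* state is a single malloc'ed chunk */
	return 0;
}
```

A real caller would typically keep the RxFE state alive for the duration of the call leg rather than creating it per batch as done here for brevity.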
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/HR-codec-limits Thu Mar 19 04:13:45 2026 +0000
@@ -0,0 +1,27 @@
+GSM-HR speech codec is currently deemed by us to have no practical utility:
+
+* GSM networks that need to use half-rate channels for capacity reasons will
+  preferentially use AMR-HR (superior speech quality to HRv1), and may even
+  have to reject GSM MS that don't support it;
+
+* GSM networks that aren't so constrained in capacity will use one of AMR-FR,
+  EFR or FRv1, all of which provide better speech quality than HRv1.
+
+Because of this lack of practical utility, some important limits have been
+placed on the scope of work to support this codec in the present Themyscira
+Wireless GSM codec libraries and utilities suite:
+
+* Between speech encoder & decoder functions on the one hand and the TFO
+  transform function (TS 28.062 section C.3.2.1.1) on the other hand, the
+  latter received more development attention. For completeness, TFO transform
+  implementations need to be provided for all 3 codecs for which this transform
+  is defined, and the implementation of TFO transform for GSM-HR serves as a
+  stepping stone toward the much more important one for EFR.
+
+* Librified forms of speech encoder and decoder engines from GSM 06.06 GSM-HR
+  reference code have been included in libgsmhr1 for the sake of completeness,
+  as well as command line utilities built on top of these library engines.
+  However, unlike the situation with EFR and AMR codecs, no effort is being
+  expended to make these operations fast or computationally efficient - the
+  slow and grossly inefficient code from ETSI remains mostly unchanged except
+  for elimination of global variables and introduction of state structures.
--- a/doc/TFO-transform Mon Mar 16 20:36:42 2026 +0000
+++ b/doc/TFO-transform Thu Mar 19 04:13:45 2026 +0000
@@ -4,8 +4,8 @@
 "TFO transform" is the term adopted by Themyscira Wireless for the non-trivial
 transform on GSM codec frames called for by the TFO spec, 3GPP TS 28.062
 section C.3.2.1.1. We have a goal of implementing TFO transform for all 3
-classic GSM codecs (FR, HR and EFR) in our Themyscira codec libraries; in the
-present release, only GSM-FR version has been implemented.
+classic GSM codecs (FR, HR and EFR) in our Themyscira codec libraries; at the
+present time, only GSM-FR and GSM-HR versions have been implemented.
 
 The input to this transform is the stream of received uplink frames from call
 leg A, possibly containing BFI frame gaps and SID frames if call leg A uses
@@ -95,3 +95,25 @@
 and radio interference reduction. However, if the input to the transform is
 all good speech frames without DTX pauses, the transform does not attempt to
 apply VAD and make its own DTXd.
+
+TFO transform for HRv1
+======================
+
+This transform is implemented in libgsmhr1 in both DTXd=0 and DTXd=1
+configurations. This feat has been achieved by factoring the Rx front end out
+of ETSI reference speech decoder, producing a common RxFE block that is then
+shared by full speech decoder and TFO transform services provided by the
+library.
+
+The following articles provide further details:
+
+HR-codec-Rx-logic   Algorithmic logic of ThemWi Rx DTX handler for GSM-HR,
+                    shared between the regular speech decoder and our TFO
+                    transform.
+
+HR-codec-library    This article describes the API to Themyscira libgsmhr1,
+                    the library that implements the present TFO transform
+                    along with other GSM-HR codec functions.
+
+HR-codec-utils      gsmhr-tfo-xfrm and gsmhr-tfo-xfrm-dc utilities will be
+                    documented here.
