comparison Theory-and-mystery @ 7:1fd613cec7ab
Theory-and-mystery: document written
| author | Mychaela Falconia <falcon@freecalypso.org> |
|---|---|
| date | Wed, 17 Apr 2024 17:14:41 +0000 |
| 6:6119d2c1e7d9 | 7:1fd613cec7ab |
|---|---|
| 1 Relation between GSM-EFR and 12k2 mode of AMR | |
| 2 ============================================= | |
| 3 | |
| 4 What are the differences between GSM-EFR codec and the highest 12k2 mode of AMR, | |
| 5 or MR122 for short? The most obvious difference is in DTX: the format of SID | |
| 6 frames and even the very paradigm of how DTX works are completely different | |
| 7 between EFR and AMR. But what about non-DTX operation? If a codec session | |
| 8 consists solely of good speech frames, no SIDs and no BFI frame gaps, are EFR | |
| 9 and MR122 strictly identical? | |
| 10 | |
| 11 The correct answer is that in the absence of SIDs, EFR and MR122 are directly | |
| 12 interoperable in that the output of an EFR encoder can be fed to the input of | |
| 13 an AMR decoder, and vice-versa. However, the two codecs are NOT identical at | |
| 14 the bit-exact level! The differences are subtle, such that finding them | |
| 15 requires some intense study; here I cover those diffs which I was able to find. | |
| 16 | |
| 17 DHF difference and the reason why it occurs | |
| 18 =========================================== | |
| 19 | |
| 20 In their official form (non-telco-grade corner-cutting libraries don't count, | |
| 21 no matter how popular among FOSS), both EFR and AMR include codec homing as a | |
| 22 mandatory feature, and the mechanism works on the same principle across all | |
| 23 ETSI/3GPP codecs. The encoder homing frame (EHF) is the same for all codecs: | |
| 24 all 160 samples equal to 0x0008, but each codec has its own decoder homing frame | |
| 25 (DHF). Each codec's respective DHF is the natural output of its encoder when | |
| 26 the input is EHF and the initial state is the reset state - as simple as that. | |
| 27 Note the natural aspect: every spec-defined DHF came about naturally in that | |
| 28 codec, hence the exact set of codec parameters that constitutes a DHF is not a | |
| 29 detail which some standard-setting committee could define arbitrarily. | |
| 30 | |
| 31 AMR has 8 different DHFs for its 8 different modes, and the DHF for MR122 is | |
| 32 *not* the same as EFR DHF! Given that this DHF is nothing but the encoder's | |
| 33 natural response to encoding an EHF input, this difference in DHF between EFR | |
| 34 and MR122 indicates the existence of some difference between the two encoders. | |
| 35 A simple experiment, contained in this source tree, reveals what the key | |
| 36 difference is: see src/cod_12k2.c, #ifdef EFR2_VARIANT. When this source is | |
| 37 compiled with -DEFR2_VARIANT in the efr2 directory, the resulting encoder produces | |
| 38 a DHF (natural response to EHF received in the reset state) that is identical to | |
| 39 the one defined for MR122, proving that this specific change is the reason for | |
| 40 the diff in DHF parameters between EFR and MR122. | |
| 41 | |
| 42 The encoder diff that happens here (change from EFR to MR122) is an artificial | |
| 43 delay of 5 ms. In EFR, on each invocation of the encoder, a frame of 160 new | |
| 44 speech samples is fed in, and that same frame is subject to encoding. In AMR, | |
| 45 the input is still 160 samples each time, but the frame being encoded consists | |
| 46 of 40 samples from the tail of the previous input and 120 samples from the new | |
| 47 input. The newest 40 samples are used for auto-correlation computation in the | |
| 48 lower modes of AMR (see 3GPP TS 26.090 section 5.2), but in MR122 they do | |
| 49 absolutely nothing until the next invocation of the encoder, effecting an | |
| 50 artificial delay of 5 ms. In true multirate operation this delay is needed to | |
| 51 support seamless mode switching, but in an MR122-only environment it is just | |
| 52 waste. | |
| 53 | |
| 54 Other encoder differences | |
| 55 ========================= | |
| 56 | |
| 57 The 5 ms delay covered above is not the only diff between non-DTX EFR and MR122 | |
| 58 encoders. We know that other diffs must exist because the output of the test | |
| 59 encoder built in efr2 directory of this repository does not match that of the | |
| 60 official AMR encoder beyond the initial homing frames; however, those additional | |
| 61 differences have not been studied yet. | |
| 62 | |
| 63 Decoder diffs between EFR and MR122 | |
| 64 =================================== | |
| 65 | |
| 66 The two decoders are also different at the bit-exact level: if you take a "pure" | |
| 67 stream of 12k2 speech frames (no DHF, no SIDs and no BFI frame gaps or defects) | |
| 68 and feed it to EFR and AMR decoders, both starting from external reset state, | |
| 69 the resulting outputs will be different. | |
| 70 | |
| 71 Two specific differences in the decoder have been identified: | |
| 72 | |
| 73 * The AGC module is different: see agc.c vs agc_amr.c in src directory. The | |
| 74 diffs inside AGC have not been studied yet. | |
| 75 | |
| 76 * The post-processing step described in 3GPP TS 26.090 section 6.2.2 (high-pass | |
| 77 filtering) is new with AMR. | |
| 78 | |
| 79 The code version built in efr2 directory has these two changes applied; it | |
| 80 passes on all available test sequences (amr122_efr.zip described below), but | |
| 81 there may be other diffs that aren't caught by this test sequence set and which | |
| 82 we therefore have not identified yet. | |
| 83 | |
| 84 ETSI/3GPP laxness toward EFR implementors | |
| 85 ========================================= | |
| 86 | |
| 87 ETSI had a tradition of defining standard GSM codecs (FR, HR, EFR) in bit-exact | |
| 88 form, and every production implementation was required to match the output of | |
| 89 the official reference bit for bit. However, once AMR came out, the regulation | |
| 90 on EFR was loosened. The GSM 06.54 document from 2000-08 (ETSI TS 100 725 V5.2.0) | |
| 91 has an appendix-like chapter (chapter 10) whose first paragraph reads: | |
| 92 | |
| 93 The 12.2 kbit/s mode of the Adaptive Multi Rate speech coder described | |
| 94 in TS 26.071 is functionally equivalent to the GSM Enhanced Full Rate | |
| 95 speech coder. An alternative implementation of the Enhanced Full Rate | |
| 96 speech service based on the 12.2 kbit/s mode of the Adaptive Multi Rate | |
| 97 coder is allowed. Alternative implementations shall implement the | |
| 98 functionality specified in TS 26.071 for the 12.2 kbit/s mode, with the | |
| 99 exception that the DTX transmission format (GSM 06.81) and the comfort | |
| 100 noise generation (GSM 06.62) shall be used. | |
| 101 | |
| 102 It appears that DSP vendors (for GSM MS or for network transcoders, or perhaps | |
| 103 both) weren't too happy with the prospect of having to include two different | |
| 104 versions of _almost_ the same codec algorithm with a bunch of interspersed | |
| 105 subtle diffs, and so the rules were bent: EFR implementors were given permission | |
| 106 to deviate from the original bit-exact definition of EFR in order to have more | |
| 107 commonality with MR122. | |
| 108 | |
| 109 But the devil is in the details. If I am seeking to implement this "EFR | |
| 110 alternative 2", where is the new bit-exact reference to be followed for this | |
| 111 option? No such reference C code for this AMR-EFR hybrid appears to have been | |
| 112 published anywhere, but this code must have existed once in unpublished form, | |
| 113 as we do have surviving published _output_ from that mystery code. | |
| 114 | |
| 115 The digital companion to the just-quoted GSM 06.54 is a ZIP archive named | |
| 116 ts_100725v050200p0.zip; inside this ZIP archive there are 9 inner ZIPs: 8 ZIPs | |
| 117 for the 8 original EFR test sequence disks, plus a later addendum named | |
| 118 amr122_efr.zip. The latter ZIP contains *.cod and *.dec test sequence files in | |
| 119 EFR format (*not* AMR), as well as *.out files from the intended decoding of | |
| 120 *.dec. The transformation from *.cod to *.dec in this set is unchanged EFR | |
| 121 ed_iface, but the encoder run that produced *.cod and the decoder run that | |
| 122 produced *.out were quite special: | |
| 123 | |
| 124 * t??_efr.cod files contain the same codec parameters as their AMR counterparts in | |
| 125 the 06.74 test sequence set, except for the first two frames in each sequence, which are | |
| 126 proper EFR DHFs. It appears that they ran an essentially-unmodified AMR | |
| 127 encoder in MR122 with DTX disabled, then artificially patched the DHF after | |
| 128 MR122 encoder output, then packaged the output in EFR *.cod format - but it | |
| 129 must have been more complicated, as this simplistic approach would not support | |
| 130 DTX. | |
| 131 | |
| 132 * dtx?_efr.cod and dtx?_efr2.cod are more intriguing: they are said to | |
| 133 correspond to VAD1 and VAD2 in the AMR reference source, yet these sequences | |
| 134 have EFR SID frames in their silence parts, not AMR DTX. Thus someone must | |
| 135 have constructed an encoder that combines most of AMR code (including AMR VAD | |
| 136 and the AMR version of 12k2 speech encoding) with EFR Tx DTX logic and EFR SID | |
| 137 generation - quite a feat! | |
| 138 | |
| 139 * In the decoder direction, the hack presented in efr2 directory of this code | |
| 140 repository is sufficient to produce a matching *.out for every *.dec in the | |
| 141 amr122_efr.zip mystery collection, including dtx?_efr.dec and dtx?_efr2.dec. | |
| 142 However, we made our hack by starting with EFR reference source and making | |
| 143 small surgical changes to it; I wonder if whoever did the original feat at | |
| 144 ETSI/3GPP started with AMR source instead and outfitted it with the ability to | |
| 145 understand EFR SID frames and do comfort noise generation per GSM 06.62 - | |
| 146 that approach would be a big feat, just like with the encoder. | |
| 147 | |
| 148 The present author considers it a shame that whatever AMR-EFR hybrid programs | |
| 149 were used to generate the sequences in amr122_efr.zip were never published. In | |
| 150 the absence of such published code, the details of exactly what was done by | |
| 151 those commercial DSP/transcoder vendors who combined AMR with EFR will remain | |
| 152 elusive. |
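
Illustration of the 5 ms framing difference described in the section on DHF: the following is a minimal sketch in C, not taken from the EFR or AMR reference sources. It shows only which 160 samples get encoded on each invocation under the two schemes. All names (make_ehf, is_ehf, efr_select_frame, struct mr122_framer and its functions) are hypothetical, the pre-processing that the real encoders apply to input samples is ignored, and the 40-sample tail is assumed to be all zeros in the reset state.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FRAME_LEN   160     /* samples per encoder invocation (20 ms at 8 kHz) */
#define HOLD_LEN    40      /* samples held back until the next call (5 ms) */
#define EHF_SAMPLE  0x0008  /* every sample of the encoder homing frame */

/* Fill buf with the encoder homing frame: 160 samples of 0x0008. */
static void make_ehf(int16_t buf[FRAME_LEN])
{
    for (int i = 0; i < FRAME_LEN; i++)
        buf[i] = EHF_SAMPLE;
}

/* Return 1 if all 160 samples equal 0x0008, 0 otherwise. */
static int is_ehf(const int16_t buf[FRAME_LEN])
{
    for (int i = 0; i < FRAME_LEN; i++)
        if (buf[i] != EHF_SAMPLE)
            return 0;
    return 1;
}

/* EFR-style framing: the frame to encode is exactly the new input. */
static void efr_select_frame(const int16_t in[FRAME_LEN],
                             int16_t out[FRAME_LEN])
{
    memcpy(out, in, FRAME_LEN * sizeof(int16_t));
}

/* Hypothetical state for MR122-style framing: tail of the previous input. */
struct mr122_framer {
    int16_t tail[HOLD_LEN];
};

/* Assumption: the held-back tail is all zeros in the reset state. */
static void mr122_framer_reset(struct mr122_framer *st)
{
    memset(st->tail, 0, sizeof(st->tail));
}

/*
 * MR122-style framing: the frame to encode is the last 40 samples of the
 * previous input followed by the first 120 samples of the new input.
 * The newest 40 samples are only stored for the next invocation.
 */
static void mr122_select_frame(struct mr122_framer *st,
                               const int16_t in[FRAME_LEN],
                               int16_t out[FRAME_LEN])
{
    assert(st != NULL);
    memcpy(out, st->tail, HOLD_LEN * sizeof(int16_t));
    memcpy(out + HOLD_LEN, in, (FRAME_LEN - HOLD_LEN) * sizeof(int16_t));
    memcpy(st->tail, in + (FRAME_LEN - HOLD_LEN), HOLD_LEN * sizeof(int16_t));
}
```

Under these assumptions, feeding an EHF right after reset gives the EFR-style scheme a frame of 160 samples of 0x0008, while the MR122-style scheme sees 40 zeros followed by 120 samples of 0x0008. The two encoders therefore work on different sample frames for the same input, which is consistent with their natural responses to EHF (their DHFs) being different.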
