view doc/AMR-EFR-conversion @ 242:f081a6850fb5

libgsmfrp: new refined implementation

The previous implementation exhibited the following defects, which are now fixed:

1) The last received valid SID was cached forever for the purpose of handling future invalid SIDs - we could have received some valid SID ages ago, then lots of speech or NO_DATA, and if we then get an invalid SID, we would resurrect the last valid SID from ancient history - a bad design. In our new design, we handle invalid SID based on the current state, much like BFI.

2) GSM 06.11 spec says clearly that after the second lost SID (received BFI=1 && TAF=1 in CN state) we need to gradually decrease the output level, rather than jump directly to emitting silence frames - we previously failed to implement such logic.

3) Per GSM 06.12 section 5.2, Xmaxc should be the same in all 4 subframes in a SID frame. What should we do if we receive an otherwise valid SID frame with different Xmaxc? Our previous approach would replicate this Xmaxc oddity in every subsequent generated CN frame, which is rather bad. In our new design, the very first CN frame (which can be seen as a transformation of the SID frame itself) retains the original 4 distinct Xmaxc, but all subsequent CN frames are based on the Xmaxc from the last subframe of the most recent SID.
author Mychaela Falconia <falcon@freecalypso.org>
date Tue, 09 May 2023 05:16:31 +0000

We have two simple utilities that allow one to experiment with "dumb" bit-
shuffling conversion between AMR 12k2 and EFR codec formats, to explore
capabilities and limitations of this approach.

gsm-amr2efr reads an AMR speech recording in RFC 4867 storage format (the common
.amr format) and converts it to EFR in gsmx format.  The AMR input to this
utility must consist of MR122 frames only - no other AMR modes, no SID and no
NO_DATA gaps.  The intent is that one can take a starting speech sample in WAV
format, encode it into AMR with amrnb-enc from opencore-amrnb (by default that
utility produces MR122 encoding without DTX), and then convert the AMR output to
EFR with gsm-amr2efr.  One can then encode the same starting-point WAV speech
sample with gsmefr-encode (matching official EFR from ETSI) and compare the two
EFR outputs.  When you do this experiment, you will see that the two EFR outputs
will be different (you can then analyze encoded speech parameter diffs with
gsmrec-dump), but each version can be fed to an EFR decoder, resulting in
OK-sounding speech.
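
The MR122-only input requirement can be checked ahead of time.  The following
is a minimal Python sketch, not code from gsm-amr2efr itself: the function name
check_mr122_only is hypothetical, and the only facts it relies on are the
RFC 4867 storage format details (the "#!AMR\n" magic, the frame type field in
bits 6..3 of each frame header byte, and the 31-byte payload of an MR122 frame).

```python
AMR_MAGIC = b"#!AMR\n"      # single-channel AMR-NB storage format magic
FT_MR122 = 7                # frame type index of the 12.2 kbit/s mode
MR122_PAYLOAD_BYTES = 31    # 244 speech bits, padded to whole bytes


def check_mr122_only(data: bytes) -> int:
    """Return the frame count if data is an MR122-only AMR file.

    Raises ValueError on a bad magic, on any frame that is not MR122
    (other modes, SID, NO_DATA), or on a truncated final frame.
    Hypothetical helper for illustration only.
    """
    if not data.startswith(AMR_MAGIC):
        raise ValueError("missing AMR-NB storage format magic")
    pos = len(AMR_MAGIC)
    count = 0
    while pos < len(data):
        # Header byte layout: F (1 bit), FT (4 bits), Q (1 bit), padding (2 bits)
        ft = (data[pos] >> 3) & 0x0F
        if ft != FT_MR122:
            raise ValueError(f"frame {count}: frame type {ft} is not MR122")
        pos += 1 + MR122_PAYLOAD_BYTES
        if pos > len(data):
            raise ValueError(f"frame {count}: truncated payload")
        count += 1
    return count
```

Calling check_mr122_only(open("sample.amr", "rb").read()) on an amrnb-enc
output (the file name here is only an example) would then either return the
number of frames or report the first frame that gsm-amr2efr is not expected to
accept.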

gsm-efr2amr performs the opposite conversion: it reads an EFR session recording
in gsmx format and converts it to AMR storage format.  The input to gsm-efr2amr
is allowed to contain Themyscira BFI markers in addition to EFR frames; these
BFI markers will be turned into AMR NO_DATA frames.  The same input can also
contain EFR SID frames - however, gsm-efr2amr will not detect them and won't
give them any special handling; instead they will be bit-reshuffled into MR122
just like EFR speech frames.  The result of such "dumb" conversion is invalid
AMR, and when you decode it with amrnb-dec, you will hear some strange noises.
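
The output side of this conversion (BFI markers becoming NO_DATA frames) can be
illustrated with a small Python sketch.  This is not the gsm-efr2amr source:
the function name build_amr is hypothetical, the frames are assumed to be
already reshuffled into MR122 bit order (the reshuffling itself is not shown),
a BFI marker is represented as None, and setting Q=1 in the NO_DATA header is a
choice made for this sketch.  The header byte values follow from the RFC 4867
layout (F, 4-bit FT, Q, 2 padding bits).

```python
AMR_MAGIC = b"#!AMR\n"      # single-channel AMR-NB storage format magic
MR122_HEADER = 0x3C         # F=0, FT=7 (MR122), Q=1
NO_DATA_HEADER = 0x7C       # F=0, FT=15 (NO_DATA), Q=1
MR122_PAYLOAD_BYTES = 31    # 244 speech bits, padded to whole bytes


def build_amr(frames) -> bytes:
    """Assemble an AMR storage-format byte string.

    frames is an iterable whose items are either a 31-byte MR122 payload
    (already in AMR bit order) or None, standing in for a BFI marker.
    Hypothetical helper for illustration only.
    """
    out = bytearray(AMR_MAGIC)
    for index, payload in enumerate(frames):
        if payload is None:
            # A lost frame becomes a one-byte NO_DATA frame with no payload.
            out.append(NO_DATA_HEADER)
            continue
        if len(payload) != MR122_PAYLOAD_BYTES:
            raise ValueError(f"frame {index}: expected 31 payload bytes")
        out.append(MR122_HEADER)
        out += payload
    return bytes(out)
```

Note that nothing in this sketch distinguishes a reshuffled EFR SID frame from a
reshuffled speech frame: both would be written with the MR122 header, which is
exactly how the invalid AMR described above comes about.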