FR/HR/EFR voice codecs in Osmocom RAN Presented by: Mother Mychaela N. Falconia, operator of Themyscira Wireless GSM network based on Osmocom CNI components. High Priestess of Telecommunications, Women's Republic of Themyscira Summary of GSM speech codecs GSM speech codecs in the order of invention: FRv1: 260 bits per 20 ms frame, 13.0 kbit/s HRv1: 112 bits per 20 ms frame, 5.6 kbit/s EFR: 244 bits per 20 ms frame, 12.2 kbit/s AMR: not part of this presentation HR was invented in an effort to increase cell capacity, but EFR's sole improvement over FRv1 is voice quality, created for no other reason. Every GSM MS must support FRv1, all others are optional in any combination. Why care about non-AMR codecs? A network that aims to provide service to Vintage Mobile Phones must support FRv1 at the minimum - it is the only codec all phones are required to support. There are many vintage phones that support EFR but not AMR, in addition to required FRv1. Higher voice quality, their original selling point, is achieved only when the network supports EFR. If significant effort is to be expended to support FRv1 and EFR to the finest level of quality, AMR support can be deprioritized - EFR is same voice quality as the highest mode of AMR. Themyscira Wireless currently operates with EFR as the preferred codec and FRv1 as fallback; AMR is disabled in OsmoBSC codec-list config. (And no TCH/H timeslots, only TCH/F.) DTX with FR/HR/EFR codecs All 3 codecs support DTX, in both UL and DL, but: * DTXd is possible only on multicarrier cells (not on C0) * OTOH, DTXu is always allowed and almost always desired * Therefore, a high-quality GSM retronetworking operation needs to support the combination of DTXu without DTXd Enabling any form of DTX means allowing SID frames to occur: A SID frame is a specially modified codec frame, encodes CN (comfort noise) parameters. Classic libgsm for FRv1 does not support SID in input! SID classification complication: 3rd state of invalid SID Classifying each received frame as either (good) speech or (valid) SID is not enough, there is also the 3rd state of invalid SID. If a SID frame is corrupted, having it interpreted as speech (because the SID codeword isn't detected) would be bad - hence the need for invalid SID class. The threshold between invalid SID and speech is a compromise between corrupted SID escaping the check and speech frames getting misclassified as invalid SID. FR & EFR specs: if at least 80 bits out of the 95-bit SID field are set to indicate SID, but it isn't a valid SID (too many bit errors), then it is invalid SID. More on valid vs invalid SID SID classified as valid: CN (comfort noise) params taken from this frame; SID classified as invalid: tells Rx that CN should be started or continued, but no usable params in this frame; The threshold between valid and invalid SID is also a compromise: undesirable to reject minimally corrupted frames with still-usable CN params, but also undesirable to have badly corrupted CN params accepted as valid. FR & EFR specs: one bit error in SID field is still a valid SID, two or more bit errors make it invalid SID. SID detector problem for HR The specs prescribe exact bit counting rules for FR & EFR, but not for HR! In the original architecture the SID detector was closely coupled with the GSM 05.03 channel decoder; SID field bit counting needs to be done before the voiced/unvoiced bit reordering selection in the channel decoder - otherwise invalid SID will be missed and (badly) misinterpreted as unvoiced speech if the two SID codeword bits that fall onto mode bits are corrupted. If we (Osmocom) wish to support HRv1 codec decently and properly, we as the community will have to come up with our own thresholds for: 1) the boundary between invalid SID and non-SID speech; 2) the boundary between valid and invalid SID. There is a _non-normative example_ provided in the ETSI C code in GSM 06.06, but it is weird, refers to a mystery BCI flag that was dropped from the final published HR codec specs... Architecture of standard speech decoders Per the specs, the complete speech decoder block for each of the 3 codecs begins with a subblock called Rx DTX handler - and this block is mandatory whether you run with DTX enabled or not! Handling of bad or entirely missing speech frames is part of the duties of the Rx DTX handler block, and of course there will always be some BFIs and frame gaps in input. And when DTX is enabled, SIDs are naturally handled by the Rx DTX handler. - For FRv1 the Rx DTX handler is a modular piece, a front end to the basic GSM 06.10 speech decoder. The only FLOSS implementation I know of is the one I wrote, released as Themyscira libgsmfrp. - For HR and EFR the Rx DTX handler has been an integral part of the speech decoder (reference source published by ETSI) from day 1. Particularly with FRv1 codec, skipping the required Rx DTX handler and using only a raw GSM 06.10 library (libgsm) in tools like gapk is an ever-present source of misunderstanding and misdiagnosis of speech/voice/audio problems among developers who haven't spent months studying this material like I had to! Transport within RAN Classic E1 Abis: GSM 08.60 (FR & EFR) and 08.61 (HR) Out-of-band flags for UL frame stream: BFI, SID, TAF (all 3 codecs) UFI (HR only) BFI = Bad Frame Indicator UFI = Unreliable Frame Indicator TAF = Time Alignment Flag (SACCH alignment marker for Rx DTX handler) Note 1: frame data bits are still sent when BFI=1! Note 2: SID ternary classification is done by the BTS and indicated out-of-band, not reconstructed from frame payload by the recipient! Doesn't matter for FR & EFR, but matters a lot for HR! DL frame stream: good speech expected in every frame without DTXd, or a mixture of good speech and valid SID with DTXd. No BFIs or invalid SID! TFO and TrFO TFO = Tandem-Free Operation TrFO = Transcoder-Free Operation In both cases codec frames from leg A UL go to leg B DL without tandem transcoding. But what about BFIs in the stream from UL? And what if leg A UL did DTXu but there is no DTXd on leg B DL? A single-carrier cell means no DTXd! TS 28.062 section C.3.2.1.1 sets the rules for turning leg A UL into leg B DL; this spec was written for TRAUs with TFO, but I see no reason why the same logic shouldn't carry over to TrFO systems such as a self-contained Osmocom network. These C3211 rules essentially amount to a parameter-level (operating on codec frame bits) ECU plus CNG (comfort noise generator) implemented in the TFO or TrFO path. C3211 rules in practice Spec language: "subject to manufacturer dependent future improvements and is not part of this recommendation." They tell us what SHALL be done, but not how to actually do it algorithmically! I would LOVE to get my hands on a real TRAU (remotely over OCTOI would be just fine), experiment and see what they actually do! If I had to do it: easy for FRv1, a bit harder but hopefully still doable for HR, but very off-putting-ly difficult for EFR. The whole point of EFR is to give better voice quality than FRv1, it was created to be a "luxury" codec - so let's not ruin it with a quirk-hack, low-effort implementation of C3211 rules for TrFO! What I did in OsmoBTS TrFO path - Weed out invalid SID like the TFO spec says; - Reposition valid SID frames to where they need to be relative to the SACCH multiframe. Logical DTXd without physical DTXd: transmit inverted CRC3, so the Rx DTX handler in the MS ends up in the same state as if Alice and Bob had their call on an E1-based network with TRAUs, TFO and DTXd on each DL leg. Bad frame gaps outside of DTXu pauses: instead of applying ECU in the TrFO path like the TFO spec says, transmit inverted CRC3 and let the Rx DTX handler in the MS do the heavy lifting. Result: best possible outcome for EFR, better than what a parameter-level ECU+CNG combo would produce if we tried to implement TS 28.062 section C.3.2.1.1 rules to the letter. RTP encodings for IP-based RAN RTP payload format for FRv1 was invented by early VoIP people without any consideration given to IP-based GSM RAN implementors. RTP payload format for EFR was invented by the TIPHON group (not Groupe speciale mobile!) in ETSI by essentially copying what IETF people did for FRv1, thus the same problem of not having been designed for GSM RAN remains. All out-of-band metadata bits of GSM 08.60 & 08.61 are lost in RTP! BFI in UL is indicated by sending no RTP packet at all (apparent industry standard practice) or sending zero-length RTP payload (rtp continuous-streaming vty option in OsmoBTS). Setting all 260 bits of FR or all 244 bits of EFR to 0 is not a valid BFI marker, it is a bogon! Only recently fixed in osmo-bts-trx... Problem with RTP-based GSM RAN using IETF/TIPHON formats Compared to GSM 08.60, the industry standard RTP format for FR & EFR (specified in both RFC 3551 and TS 101 318) has two functional regressions: - No ability to indicate BFI along with data bits; - TAF bit is lost. These functional regressions from GSM 08.60 are bad when the RTP stream goes to a network edge transcoder implemented in the spirit of retronetworking, especially if that transcoder also implements in-band TFO in its G.711 output. TFO frames are slightly modified TRAU-UL frames, so imagine this chain: E1 Abis -> OsmoMGW -> RTP -> Network edge TC -> TFO inside G.711 RTP transport effectively mutilates TRAU-UL frames... Solution: TW-TS-001 enhanced RTP format Themyscira Wireless Technical Specification TW-TS-001 Enhanced RTP transport of FR and EFR codec frames in an IP-based GSM RAN An extended RTP payload format specifically for GSM RAN, modeled after GSM 08.60 TRAU-like Extension Header (TEH) octet prepended before basic RTP payload OsmoBTS implementation: falconia/rtp_traulike branch Two patches on that branch: * First patch makes OsmoBTS accept TW-TS-001 packets: simply strip off TEH octet like we do with RFC 5993 ToC; * Second patch adds vty option to emit this TW-TS-001 format. What about HR codec? The problems addressed by TW-TS-001 for FR & EFR are not already solved for HR by RFC 5993 format. The following defects still remain: * No way to indicate BFI along with data bits * No way to represent "invalid SID" output from SID-aware GSM 05.03 decoder * TAF bit lost just like in FR & EFR standard RTP formats * UFI bit (exists for HR only in GSM specs) is likewise dropped Themyscira Wireless does not use HR codec, but I may produce a TW-TS-002 spec for HR that will define an extension to RFC 5993 format: * Two more frame types (FT field in ToC octet) for BFI-with-data and Invalid SID; * Use reserved bits in the ToC octet to carry UFI, TAF and DTXd bits like TW-TS-001 does. This putative TW-TS-002 spec will also define a storage format, similar to how *.amr file format is based on RTP octet-aligned format for AMR, and this format will be used if and when Themyscira GSM codec libraries and utilities toolkit gets extended to support HR. More on HR in RTP: TS 101 318 vs RFC 5993 Unless we further extend that ToC octet with our own TS, RFC 5993 as it stands does not provide any new capabilities over more basic TS 101 318 format: * RFC 5993 can explicitly indicate a missing frame (FT=7), but we already have zero-length RTP payloads that achieve the same effect in a codec-independent way. * FT=0 vs FT=2 indicates SID status, but: - no representation is provided for Invalid SID; - my reading of the RFC tells me that even for valid SID (FT=2) the SID field must be all 1s - thus nothing is gained over the fallback method of having the receiver check for the SID codeword. Neither of the two standard formats is friendly to a BTS implementor who seeks to do what ETSI envisioned, with proper ternary SID classification tightly coupled to the GSM 05.03 channel decoder. Remaining bogon in osmo-bts-trx: ECU call in the UL path The call to ECU in the UL path of osmo-bts-trx is unnecessary, counter to the specs and should be considered a bug. In the traditional GSM architecture a BTS never applies an ECU to its UL output, instead it emits BFIs when it received a bad frame or nothing at all from the air. If one interprets TFO rules as also applying to TrFO, TS 28.062 section C.3.2.1.1 calls for an ECU in the path from leg A UL (RTP input in OsmoBTS) to leg B DL (internal DL path in OsmoBTS) - but this rule would be super-difficult to implement for EFR, and we already have a better-performing alternative implementation which is the same for all 3 non-AMR codecs. Therefore, there is no justification at all for an ECU call anywhere in OsmoBTS! Mistaken belief about uplink ECU in proprietary BTS PHYs There is a myth that sysmoBTS PHY applies some ECU of its own to its UL output. It does not - at least not for FRv1 or EFR, no easy test setup on my end for HRv1 or AMR. Actual experimental observations: when GSM MS exercises DTXu and sysmoBTS PHY is receiving radio noise, this PHY sends zero-length payloads (indicating BFI) to the ARM Linux part of sysmoBTS - not any kind of ECU output. Occasionally the CRC3 check happens to succeed despite the PHY receiving radio noise (1/8 probability), and during these times the PHY sends garbage payloads presented as valid speech - the link quality check in l1sap has to filter them out. This quirk serves as further proof that there is no secret ECU in the DSP. Early osmo-bts-trx developers came to this mistaken belief because they got noticeably worse speech quality with their early osmo-bts-trx work than what sysmoBTS produces. Real reason for bad audio on early osmo-bts-trx: the UL code indicated BFI conditions with a bogon in RTP output, and when the receiving end interpreted this bogon instead of invoking the spec-defined bad frame handler, unpleasant sounds were the result. Applying ECU in the BTS UL output path masked this bug. Internal UL ECU application in OsmoBTS is very inconsistent The misfeature exists only in osmo-bts-trx version and not in any others; Even within osmo-bts-trx it is inconsistent: because the ECU call happens before the link quality check in l1sap, sometimes the RTP output will be this ECU output, and othertimes it will be standard BFI produced by the l1sap layer. Proposed solution to OsmoBTS ECU problem My first-choice preference is to remove the ECU call from OsmoBTS altogether - see proposed patch on falconia/ecu-ectomy branch. But will such a patch be acceptable _in principle_, aside from code implementation details to be worked out in code review? If that first-choice option is not acceptable to the community, the alternative is to move the ECU from osmo-bts-trx model code to the common layer, making it available on all models, and make it a vty option. But do we really need this ECU, and why? Voice testing tools: gapk shortcomings It is my understanding that many Osmocom developers use gapk-based processing chains for voice testing, either as RTP source and sink on the network side or with trxcon as part of test MS in virtual Um test setups. Bad news: gapk is broken, and I am not smart enough to fix it. The scope of what gapk aims to do seems overly ambitious to me, and I am unsure if it is possible to make a fully correct implementation with such wide ambition. gapk is fundamentally designed as an anything-to-anything converter, but this design leaves no room for any use-case-specific out-of-band flags: BFI flag for all codecs, or HR-specific UFI flag, or HR-specific out-of-band SID classification, let alone fine details like TAF. In order for a GSM voice testing tool (developer-oriented) to be properly correct, it needs to be endowed with support for and understanding of these out-of-band flags, and of most practical relevance, these flags need to be passed to the proper Rx DTX handler, which is either built into the main body of the speech decoder (HR and EFR) or implemented as a front-end preprocessor (FRv1). But I don't know how to implement these ideas within gapk architecture. Proposal to Osmocom community: a narrower-scope tool maybe? A super-general, super-versatile tool like gapk is great, but if making it perform correctly in real-life use cases is too difficult of a problem... What about a more limited-scope, more specialized tool? Seeking input from the community: what are your real (used everyday, not hypothetical) use cases for voice testing with gapk? Given some specific use cases, perhaps I can produce an alternative tool, based on public domain Themyscira GSM codec libraries, that does whatever you currently do with gapk, but with a correct Rx DTX handler (of which the BFI handler and ECU are an integral part) at the speech decoder input. Summary of open questions - Should we (Osmocom community) invest any more effort in HRv1 codec? Does anyone actually need or use it? - Community thoughts on merging vty-optional support for TW-TS-001 for FR & EFR. - Removing the bogus UL ECU from osmo-bts-trx, or alternatively moving it to model-independent common layer and making it optional. - What to do about gapk architectural limitations? The community deserves a proper voice testing tool that heeds out-of-band metadata flags and invokes proper Rx DTX handlers for all 3 codecs covered here. ... and I still want to experiment with a real historical TRAU - how to get a hold of one, or get remote access to one? If I make an international call in G.711 codec to a country that has a legacy GSM network with TRAUs, can I get a transparent-enough G.711 channel all the way through, so I can talk in-band TFO to that TRAU?