Themyscira Wireless Technical Specification TW-TS-001
Version 1.0.1

Enhanced RTP transport of FR and EFR codec frames in an IP-based GSM RAN

0. Foreword

This Technical Specification (TS) has been produced under the authority of the Presiding Sisterhood (government) of the Women's Republic of Themyscira as part of Themyscira Wireless technical initiative.

Author: Mother Mychaela N. Falconia, High Priestess of Telecommunications

As an official publication of Themyscira government, this document is not subject to copyright.

This document makes numerous references to Cellular Network Infrastructure (CNI) software components produced by Osmocom community project, particularly OsmoBTS, OsmoBSC, OsmoMGW and libosmocore function library. The Presiding Sisterhood of Themyscira gratefully acknowledges the contribution of Osmocom to the noble cause of restoration and re-creation of classic GSM/2G technology in the face of adversity from the industry and culture of modernity.

1. Scope

The enhanced RTP payload format defined in this TS is applicable only to GSM-FR and GSM-EFR codecs, and is intended for use only within IP-based GSM RAN, not in general-purpose IP networks or VoIP applications. More specifically, the applicability scope of this TS is the network segment extending from an IP-based BTS, or a converter from T1/E1 Abis to IP-based RAN, to the network edge transcoder, or from one IP-based BTS to another in the case of TrFO.

2. References

The following documents contain provisions which, through reference in this text, constitute provisions of the present document.

[1] IETF RFC 3550, H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson: "RTP: A Transport Protocol for Real-Time Applications".
[2] IETF RFC 3551, H. Schulzrinne, S. Casner: "RTP Profile for Audio and Video Conferences with Minimal Control".
[3] ETSI TS 101 318: "Telecommunications and Internet Protocol Harmonization Over Networks (TIPHON); Using GSM speech codecs within ITU-T Recommendation H.323".
[4] 3GPP TS 46.010: "Full rate speech; Transcoding".
[5] 3GPP TS 46.031: "Full rate speech; Discontinuous Transmission (DTX) for full rate speech traffic channels".
[6] 3GPP TS 46.060: "Enhanced Full Rate (EFR) speech transcoding".
[7] 3GPP TS 46.081: "Discontinuous Transmission (DTX) for Enhanced Full Rate (EFR) speech traffic channels".
[8] 3GPP TS 48.060: "In-band control of remote transcoders and rate adaptors for full rate traffic channels".
[9] 3GPP TS 28.062: "Inband Tandem Free Operation (TFO) of speech codecs".
[10] 3GPP TS 46.011: "Full rate speech; Substitution and muting of lost frames for full rate speech channels".
[11] 3GPP TS 46.061: "Substitution and muting of lost frames for Enhanced Full Rate (EFR) speech traffic channels".
[12] 3GPP TS 46.062: "Comfort noise aspects for Enhanced Full Rate (EFR) speech traffic channels".

3. Definitions, conventions and abbreviations

3.1. Definitions

For the purposes of the present document, the following terms and definitions apply:

basic RTP format: the RTP payload format defined in TS 101 318 [3].

enhanced RTP format: the RTP payload format defined in chapter 5 of the present Technical Specification.

extended RTP format: a synonym for enhanced RTP format.

octet: a group of 8 bits that functions as an indivisible elementary data unit for the purposes of RTP payload formats.

S611 rules: SID classification rules prescribed in section 6.1.1 of GSM 06.31 for FR or GSM 06.81 for EFR.

TRAU-like Extension Header: the new octet defined in the present TS that is prepended to a basic RTP format frame to produce an enhanced RTP format frame.

3.2. Conventions

Because the present document does not specify or refer to any kind of bit-oriented serial interface, the smallest elementary data unit of concern to the present TS is an 8-bit byte, also known as an octet. In this context the notion of bit order within an octet is meaningless, thus no bit numbering convention is used.
However, a need exists to identify and refer to specific bits in an octet for the purpose of specifying encoding. The convention adopted for this TS is that individual bits within an octet shall be identified with hexadecimal masks, using the C programming language notation for hexadecimal numbers. From the most significant bit to the least significant bit in an octet, from left to right, these hexadecimal masks are:

+----+----+----+----+----+----+----+----+
|0x80|0x40|0x20|0x10|0x08|0x04|0x02|0x01|
+----+----+----+----+----+----+----+----+

A group of adjacent bits can be identified and referred to by combining the hexadecimal masks of individual bits with the bitwise OR function: for example, the 4 most significant bits of an octet may be collectively referred to by hexadecimal mask 0xF0.

3.3. Abbreviations

For the purposes of the present document, the following abbreviations apply:

BFI   Bad Frame Indicator
BTS   Base Transceiver Station
CN    Core Network
DL    Downlink
DTX   Discontinuous Transmission
EFR   Enhanced Full Rate speech codec
FR    Full Rate speech codec
GSM   Global System for Mobile telecommunications
RAN   Radio Access Network
RTP   Real-time Transport Protocol
SID   SIlence Descriptor
TAF   Time Alignment Flag
TC    Transcoder
TDM   Time Division Multiplexing
TEH   TRAU-like Extension Header
TFO   Tandem-Free Operation
TRAU  Transcoder and Rate Adaptor Unit
TrFO  Transcoder-Free Operation
UL    Uplink

4. Problem being solved

The industry transition from T1/E1-based GSM RAN with TRAUs to IP-based RAN with RTP transport of codec frames, using standardized RTP payload formats of TS 101 318 [3] that are also duplicated in RFC 3551 [2], has resulted in certain functional regressions: certain functions of the traditional TDM-based GSM RAN cannot be represented in the industry standard RTP formats of [2] and [3], forcing IP-based GSM RAN implementors to either compromise on replicating the classic GSM architecture in its full functionality or intentionally deviate from RTP formats that have been accepted as standard by major industry players. Themyscira Wireless chose to do the latter, and has produced the present TS as a solution to the shortcomings of standard TS 101 318 or RFC 3551 RTP payload formats for FR and EFR. This chapter details the two specific shortcomings of these industry standard RTP formats that serve as motivators for the present TS.

4.1. Indicating BFI along with data bits

The only way to indicate a BFI condition in standard RTP (for FR/EFR) is to either send no packet at all in the 20 ms window in question (industry standard behavior) or send an RTP packet with a zero-length payload ("rtp continuous-streaming" option in OsmoBTS). The latter option provides a timing tick for a CN-attached transcoder relying on the BTS-originating RTP stream as its timing source, but there is still no way to send a frame of marked-erroneous data bits. Contrast with TS 48.060 TRAU-UL format: in this format the Dn bits carrying FR or EFR frame bits and the C12 bit carrying BFI are orthogonal.

Why would one care about known-bad or deemed-to-be-bad frame data bits? They do matter at least in the case of EFR: the official reference C-source EFR decoder from ETSI makes use of the "fixed codebook excitation pulses" portion of its EFR frame bits input (140 bits out of 244) even when BFI=1.
This portion of reference C-source behavior is declared to be a non-normative example by the text of GSM 06.61 spec, thus there may be other compliant EFR decoder implementations that never look at marked-erroneous data bits - but given the ease of simply using the C code from ETSI as-is, or recoding it more efficiently but keeping unchanged all bit-exact algorithms, including non-normative ones, we should expect that the behavior of ETSI reference code is retained in many production implementations and deployments.

Consider the case where a traditional E1-based BTS with a classic Abis interface is attached to an IP-based GSM RAN by way of OsmoBSC+OsmoMGW, and the resulting RTP stream then goes to a "soft TRAU" transcoder (TC) in the CN. The TC will feed its RTP input to FR and EFR decoders, and at least the EFR decoder makes use of "fixed codebook excitation pulses" bits from erroneous frames. Furthermore, the TC may implement in-band TFO (3GPP TS 28.062) inside its G.711 RTP output, in which case it will need to insert a slightly modified TRAU-UL frame into that output. The bits that would ideally be fed to the ETSI EFR decoder and emitted to the outside world in TFO frames already exist at the output of the E1-based BTS, but they get lost in the RTP transport when the industry standard RTP payload format is used.

Consider another case where OsmoBTS does have an FR/EFR traffic frame that could potentially be sent out, but it is suppressed by the (tch_ind->lqual_cb >= bts->min_qual_norm) check in l1sap_tch_ind() in src/common/l1sap.c. In this case it would be ideal to send out that frame along with a BFI=1 indication, if the RTP transport format were to allow such representation.

4.2. Lack of TAF bit in standard RTP transport

The TRAU-UL frame format of TS 48.060 for FR and EFR includes a bit called TAF, for Time Alignment Flag.
Per the specs (TS 48.060 refers to TS 46.031 for definition and coding of frame indicators) this bit shall be set to 1 in one particular position in the 480 ms SACCH multiframe (the particular 20 ms frame position in which a valid frame is always transmitted, even during DTX pauses) and set to 0 in all other frames. This flag factors into the Rx DTX handler logic prescribed in GSM 06.31 and 06.81 specs for FR and EFR, respectively, and there exist production decoders for these codecs that implement their Rx DTX handler function exactly to the letter of the specs, including the use of TAF bit when deciding what to do with a BFI=1 frame received in the comfort noise generation state. (These spec-compliant decoders include the reference ETSI C-source decoder for EFR and Themyscira libgsmfrp for FR.)

This TAF bit does not exist in the standard RTP transport for FR & EFR. The lack of this TAF bit causes the following problems for the CN-attached "soft TRAU" transcoder:

1) The ability to implement spec-compliant handling of GSM 06.11 or 06.61 section 5.4 requirement (same section in both specs) is lost;

2) The TC won't know when to set the TAF bit in its outgoing TFO frames, if it implements in-band TFO per 3GPP TS 28.062.

The TFO problem is particularly concerning because these TFO frames are emitted to the outside world, outside of administrative and technical control of the party implementing the Osmocom-based GSM network and the TC at its edge. The resulting G.711 octet stream with TFO frames embedded inside can be carried half-way around the world by the international toll telephone network, and there is no telling what kind of implementation may be receiving and decoding these bits on the other end. For this reason, "poor man's" workarounds in the RTP-fed, TFO-generating TC are very unattractive:

* If the TC were to set TAF=0 in all TFO frames it generates, the receiver's expectation of seeing TAF=1 in every 24th frame will be violated.
* If the TC were to arbitrarily set TAF=1 in every 24th frame by its own free-running count, without knowledge of the actual SACCH alignment in the original GSM call leg, these TAF-marked frames won't coincide with those frame positions where the MS sends its SID frames, and the resulting TFO frame stream will be invalid to the receiving Rx DTX handler on the far end.

The knowledge of which frames need to be marked with TAF=1 exists inside the entity that generates the FR/EFR RTP stream: if this entity is a converter from E1-based Abis to RTP, the TRAU-UL frames from the BTS contain this TAF bit, and if the RTP-generating entity is a native IP BTS, it knows the frame number for which it generates each RTP packet. The only problem is that there is no place to insert this TAF bit in the standard RTP transport format of TS 101 318.

5. TRAU-UL-like RTP format for FR and EFR

As a solution to the problems described in chapter 4, we define a new RTP format for FR and EFR that explicitly mimics the functionality and semantics of TS 48.060 TRAU-UL frames for these two codecs, hereby called the enhanced or extended RTP format.

5.1. Enhanced RTP format definition

The enhanced RTP payload format shall consist of a single octet called TRAU-like Extension Header (TEH), followed (most of the time) by the standard (same as in TS 101 318) 33 octets for FR or 31 octets for EFR. The TEH octet has the following structure:

          +----+----+----+----+----+----+----+----+
Hex mask  |       0xF0        |0x08|0x04|0x02|0x01|
          +----+----+----+----+----+----+----+----+
Meaning   |     signature     |DTXd|NDF |BFI |TAF |
          +----+----+----+----+----+----+----+----+

The following bit fields are defined within the TEH octet:

signature: the upper nibble of the TEH octet shall be set to 0xE.
This signature allows RTP packet receivers to identify the payload format by the upper nibble of the first octet: if it equals 0xC, the format is EFR without TEH, if it equals 0xD, the format is FR without TEH, and if it equals 0xE, then the first octet is TEH.

DTXd: this bit is strictly identical with TRAU-UL frame bit C17.

No_Data flag (NDF): this bit shall be set to 1 if the enhanced-format payload consists solely of TEH, with the standard 33-octet FR frame or 31-octet EFR frame entirely omitted, and shall be 0 otherwise.

BFI: this bit is strictly identical with TRAU-UL frame bit C12.

TAF: this bit is strictly identical with TRAU-UL frame bit C15.

There are two possibilities for full composition of an enhanced-format RTP payload:

Possibility 1: TEH with NDF=0 is followed by a standard 33-octet FR frame or a standard 31-octet EFR frame. The signature in the upper nibble of the octet immediately following TEH shall be correct: 0xD for FR or 0xC for EFR.

Possibility 2: TEH with NDF=1 constitutes the entirety of the RTP payload for the 20 ms time window in question.

If the No_Data flag is set, BFI shall also be set: the combination of NDF=1 and BFI=0 is invalid.

Per this specification, the sender of a BFI packet has the choice of sending it in one of two forms: with or without presumed-erroneous frame bits. If the enhanced RTP packet is generated from bits received in an actual TRAU-UL frame (E1 Abis or TFO), erroneous frame bits shall be included, unchanged from the TRAU-UL source. However, if the entity generating the enhanced RTP packet is the ultimate point of origin (e.g., a native IP BTS), then it shall choose one form or the other based on the situation at hand:

a) if the sender does have an FR or EFR frame "on hand" but that frame is considered to be erroneous (for example, the link quality check in l1sap_tch_ind() in OsmoBTS), the long form of BFI shall be sent, with the presumed-erroneous frame bits included.
b) if the sender does not have any FR or EFR frame at all that could be sent (for example, if the reason for the BFI condition is because FACCH was successfully received and decoded instead of a traffic frame), then the No_Data form of BFI shall be sent.

The option of No_Data BFI is provided in this RTP transport format specification because if this option were disallowed, senders would be tasked with an additional burden of having to artificially generate dummy or "garbage" frame bits. This task is slightly complicated, as detailed in Annex B, and the present design moves that task from all senders to only those receivers that need it.

5.2. Lack of SID classification bits matching TRAU-UL C13 & C14

TRAU-UL frame format includes two bits C13 & C14 that carry the ternary SID flag (0, 1 or 2) as defined in GSM 06.31 and 06.81 section 6.1.1 (same section in both specs). No equivalent bits are included in the enhanced RTP format of the present TS - however, these bits are redundant. The rules of section 6.1.1 in GSM 06.31 and 06.81, hereafter called S611 rules, specify a strictly deterministic, unambiguous formula by which these C13 & C14 bits derive their values from the bit content of the FR/EFR frame payload - thus if a TRAU-UL frame is received in which these C13 & C14 bits fail to match the S611 value derived from the contained payload, then that TRAU-UL frame is defective. There is no need to include such redundant bits in our enhanced RTP format, only to create confusion for receivers as to which source of SID S611 classification they should use.

5.3. Continuous RTP output

The industry standard practice of producing an intentional gap in the RTP stream (sending no packets at all, but incrementing the RTP timestamp over the gap) is not compatible with the present TS.
A BTS serving a traffic channel in FR or EFR codec that is configured to emit its RTP output according to the present TS shall emit an RTP packet carrying an enhanced-format payload per this TS in every 20 ms frame window without any exceptions; if the BTS has nothing else to send in a given frame, it shall emit a No_Data BFI packet with RTP payload consisting of just the TEH octet.

6. Mixing basic-format and enhanced-format RTP payloads

An RTP stream receiver for FR/EFR codecs that supports the present extension to the RTP payload format shall behave gracefully when it receives a mixture of traditional TS 101 318 (or RFC 3551) payloads and enhanced-format payloads of the present TS in the same RTP stream:

a) A receiver that has no interest in the additional information carried in the TRAU-like Extension Header shall simply strip the TEH octet when one is received, reducing the received payload to standard TS 101 318; if a BFI or No_Data payload is received, treat it the same as if nothing at all was received.

b) A receiver that is interested in the TRAU-like Extension Header but receives an FR/EFR payload without one should behave as if it received a TEH with BFI=0, TAF=0, and a received zero-length RTP payload should be treated the same as receiving a No_Data enhanced payload with TAF=0.

There may even be cases when an RTP sender may alternate between sending basic-format and enhanced-format payloads in the same session: for example, a TFO-supporting CN transcoder may emit basic-format payloads when supplying the output of its free-running speech encoder, but switch to sending enhanced payloads when it switches to forwarding bits received in TFO frames from the far end.

Annex A (informative): Why TRAU-UL and not TRAU-DL

The present TS defines the enhanced RTP format for FR and EFR as explicitly mimicking the functionality and semantics of TS 48.060 TRAU-UL frames for these two codecs. At this point a reader may reasonably ask: why TRAU-UL and not TRAU-DL?
The answer is TFO: 3GPP TS 28.062 and its predecessor GSM 08.62 define the TFO frame format as being based on TRAU-UL frames with only a few bits changed, and no change in semantics of any of the frame indicator bits of TRAU-UL (C12 through C17). Whereas the Abis interface is inherently asymmetric (TRAU-UL frames in one direction, TRAU-DL frames in the other direction), end-to-end TFO is directionally symmetric. If we imagine a TFO call between Alice in America and Bob in Britain, there will be TRAU-UL frames flowing in both directions of the trans-oceanic G.711 toll connection, one set coming almost unchanged from Alice's BTS CCU and the other coming almost unchanged from Bob's BTS CCU.

Of course each party's GSM call DL will require TRAU-DL frames to be fed to it, not TRAU-UL, but the necessary UL-to-DL conversion is the responsibility of the TFO receiver on each end. The general rules for turning a TRAU-UL frame into one for TRAU-DL are specified in TS 28.062 section C.3.2.1.1; it should be noted that this section spells out the requirements of what the UL-to-DL converter needs to do, but does not specify exactly how to do it algorithmically - the wording it uses is "subject to manufacturer dependent future improvements and is not part of this recommendation."

At this point it is important to point out that native IP-based BTSes (if they aim to support FR, HR and EFR codecs properly) already have to implement functionality that is logically equivalent to the just-mentioned TS 28.062 section C.3.2.1.1. In a self-contained IP-based GSM network, a call from mobile A to mobile B is a TrFO call, and the BTS on each end will be receiving RTP packets from the uplink of call leg A while being responsible for constructing the downlink frame stream for call leg B. The needed UL-to-DL transformation in TrFO is a logical function fully equivalent to the one needed in TFO, spelled out in TS 28.062 section C.3.2.1.1.
Looking from this perspective, we can see that RTP transport within an IP-based GSM RAN already fits into a place in the overall network architecture that logically corresponds to TRAU-UL and not TRAU-DL. It should now therefore be clear why TRAU-UL frames were chosen as the reference whose functionality and semantics need to be replicated in our enhanced RTP format.

Annex B (normative): Feeding received BFI frames to an EFR decoder

If an EFR decoder implementation is based on the reference C source from ETSI, this decoder requires that _some_ frame bits input be fed to it at all times, even when BFI=1. But what if the BFI packet came in as No_Data? In that case the receiver needs to synthesize its own fake "bad data" bits to feed to the standard decoder. When synthesizing "bad data" bits in this manner, the following rules should be observed:

* The 140 bits corresponding to "fixed codebook excitation pulses" (35 bits in each of the 4 subframes) shall be filled using a PRNG. These bits are the ones used by the standard decoder when its internal state, based on previous good frames, puts it in GSM 06.61 substitution/muting mode as opposed to GSM 06.62 comfort noise generation mode.

* The remaining 104 bits of the EFR frame shall be set to 0. These bits are never used by the standard decoder under the condition of BFI=1, and setting them to 0 prevents the possibility of S611 rules classifying the frame as SID even if the PRNG output in the other 140 bits happens to be all 1s in those bits of "fixed codebook excitation pulses" (70 bits out of 140) that also fall within the SID field (70 bits out of 95).
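
Informative sketch: the two rules of Annex B could be implemented along the following lines. This is a minimal illustration under stated assumptions, not a reference implementation. It assumes the synthesized frame is held in the 31-octet basic RTP format (0xC signature nibble followed by 244 bits in parameter order, most significant bit first) and that the per-subframe parameter widths are those of the EFR bit allocation in TS 46.060; the resulting bit offsets 55, 105, 158 and 208 should be verified against that specification before use. The function names and the xorshift PRNG are hypothetical and chosen only for illustration; any PRNG may be substituted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EFR_RTP_FRAME_LEN  31   /* basic-format EFR payload, in octets */
#define EFR_PULSE_BITS     35   /* fixed codebook pulse bits per subframe */

/*
 * ASSUMPTION: bit offsets, counted from the MSB of octet 0 (signature
 * nibble included), of the first pulse bit in each of the 4 subframes:
 * 4 signature + 38 LPC bits, then subframes of 53, 50, 53 and 50 bits
 * in which the pulses follow the lag (9 or 6 bits) and gain (4 bits).
 */
static const unsigned efr_pulse_start[4] = { 55, 105, 158, 208 };

/* Hypothetical PRNG for illustration: 32-bit xorshift, state must be nonzero */
static unsigned prng_next_bit(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return (x >> 16) & 1;
}

/* Fill out[] with a synthetic "bad data" EFR frame per the Annex B rules */
void efr_synth_bad_frame(uint8_t out[EFR_RTP_FRAME_LEN], uint32_t *prng_state)
{
	unsigned sub, i, pos;

	assert(out != NULL && prng_state != NULL && *prng_state != 0);
	memset(out, 0, EFR_RTP_FRAME_LEN);	/* the 104 other bits stay 0 */
	out[0] = 0xC0;				/* EFR signature nibble */
	for (sub = 0; sub < 4; sub++) {
		for (i = 0; i < EFR_PULSE_BITS; i++) {
			pos = efr_pulse_start[sub] + i;
			if (prng_next_bit(prng_state))
				out[pos >> 3] |= 0x80 >> (pos & 7);
		}
	}
}
```

A receiver would call efr_synth_bad_frame() once per No_Data BFI packet, keeping the PRNG state across calls so that successive synthetic frames differ, and then pass the result to its decoder with BFI=1.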
Annex C (normative): Converting from TRAU-UL to enhanced RTP format

There will be a need to convert from standard TS 48.060 TRAU-UL frames to the enhanced RTP format of the present TS in the following two scenarios:

1) When interfacing an E1 BTS to Osmocom RAN, when and if such support is to be added to OsmoMGW;

2) In the CN transcoder operating in TFO mode, when forwarding received TFO frames to the local RAN.

In both cases the conversion is straightforward:

* Always generate full-length enhanced RTP payloads, never generate No_Data in the case of a properly received TRAU-UL speech (not idle) frame.

* Forward the payload bits directly from TRAU-UL to enhanced RTP, for both good and bad frames.

* Directly forward BFI, TAF and DTXd frame indicator bits from TRAU-UL C-bits to TEH octet bits.

* Ignore TRAU-UL C13 & C14 bits.

Annex D (normative): Converting from enhanced RTP format to TRAU-UL

This direction of conversion will need to be performed in the CN transcoder when emitting TFO frames toward the outside world. The following rules will need to be applied:

* If the incoming enhanced RTP payload is full-length, as opposed to No_Data, simply copy the payload bits into the constructed TRAU-UL frame, for both good (BFI=0) and bad (BFI=1) frames.

* If the incoming enhanced RTP payload is No_Data, put the following filler in the data bits portion of the TRAU-UL frame:

  - For FR codec, use the silence frame of 3GPP TS 46.011 Table 1 as the filler.

  - For EFR codec, perform the PRNG procedure of Annex B for the case of feeding a No_Data BFI packet to the standard ETSI decoder for EFR. Given that a TFO-frame-emitting transcoder still needs to run its regular speech decoder in order to fill the upper 6 bits of each outgoing G.711 sample octet, the same No_Data PRNG handler will typically be run just once for both internal decoding and TFO frame output.

* Algorithmically set C13 & C14 bits in the generated TRAU-UL frame per the rules of S611.
  For software implementations, the following methods of computing these bits are officially endorsed as known to be correct:

  - Programs that link with Themyscira libgsmfrp and libgsmefr, version 1.0.0 or later for each library, may use gsmfr_preproc_sid_classify() and EFR_sid_classify() functions provided by these libraries.

  - Programs that link with libosmocore (any version that includes git commit ec65085d5f13e57ed486a31efc242ca9cd1b44c0, as long as the two functions introduced in this commit remain behaviorally unchanged) may use osmo_fr_sid_classify() and osmo_efr_sid_classify() functions provided by libosmocodec component of libosmocore.

* Directly forward BFI, TAF and DTXd frame indicator bits from TEH octet bits to TRAU-UL C12, C15 and C17, respectively.

Annex E (informative): Specification change history

Version 1.0.1: initial publication for review by Osmocom community.
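
Informative sketch: as an illustration of the TEH octet layout of section 5.1 and the receiver fallback rules of chapter 6 item b), a receiver interested in the TEH might classify an incoming RTP payload roughly as follows. This is a minimal sketch, not a definitive implementation. All names (tw_parse_payload, struct tw_rx_frame, the TEH_* macros) are hypothetical and do not belong to any existing library. Two choices are assumptions of this sketch rather than requirements of the present TS: DTXd is reported as 0 for basic-format payloads, and a payload with NDF=1 that carries anything beyond the TEH octet is rejected as invalid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* TEH bit masks per section 5.1 (macro names are hypothetical) */
#define TEH_SIG_MASK   0xF0
#define TEH_SIGNATURE  0xE0
#define TEH_DTXD       0x08
#define TEH_NDF        0x04
#define TEH_BFI        0x02
#define TEH_TAF        0x01

enum tw_codec { TW_CODEC_FR, TW_CODEC_EFR };

struct tw_rx_frame {
	bool bfi, taf, dtxd;
	const uint8_t *frame;	/* 33 (FR) or 31 (EFR) octets, NULL if No_Data */
};

/* Returns 0 on success, -1 if the payload is invalid for the given codec */
int tw_parse_payload(enum tw_codec codec, const uint8_t *pl, size_t len,
		     struct tw_rx_frame *rx)
{
	size_t frame_len = (codec == TW_CODEC_FR) ? 33 : 31;
	uint8_t frame_sig = (codec == TW_CODEC_FR) ? 0xD0 : 0xC0;

	assert(rx != NULL);
	rx->bfi = rx->taf = rx->dtxd = false;
	rx->frame = NULL;

	/* Zero-length payload: treat as No_Data with TAF=0 (chapter 6 b) */
	if (len == 0) {
		rx->bfi = true;
		return 0;
	}
	if ((pl[0] & TEH_SIG_MASK) == TEH_SIGNATURE) {
		rx->dtxd = (pl[0] & TEH_DTXD) != 0;
		rx->bfi = (pl[0] & TEH_BFI) != 0;
		rx->taf = (pl[0] & TEH_TAF) != 0;
		if (pl[0] & TEH_NDF)	/* NDF=1 requires BFI=1, TEH only */
			return (len == 1 && rx->bfi) ? 0 : -1;
		if (len != 1 + frame_len ||
		    (pl[1] & TEH_SIG_MASK) != frame_sig)
			return -1;
		rx->frame = pl + 1;
		return 0;
	}
	/* Basic format: behave as if TEH had BFI=0, TAF=0 (chapter 6 b) */
	if (len != frame_len || (pl[0] & TEH_SIG_MASK) != frame_sig)
		return -1;
	rx->frame = pl;
	return 0;
}
```

A receiver with no interest in the TEH (chapter 6 item a) could use the same function and simply discard every result where bfi is set, passing rx->frame to its decoder otherwise.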