view doc/RTP-TRAUlike-format @ 207:185225722714

doc: new extended RTP format
author Mychaela Falconia <falcon@freecalypso.org>
date Thu, 06 Apr 2023 21:30:33 -0800

TRAU-UL-like RTP transport format for FR & EFR codecs
=====================================================

The generally accepted industry standard format for RTP transport of FR and EFR
codec frames in an IP-based GSM RAN is given in ETSI TS 101 318; the same format
is also codified in IETF RFC 3551.  However, when compared to the classic
TRAU-UL format of 3GPP TS 48.060, the standard RTP format of RFC 3551 exhibits
the following two shortcomings:

1) no way to indicate a BFI condition and still send frame data bits;
2) no way to transport the Time Alignment Flag (TAF).

Both shortcomings are explained in detail later in this document.  The primary
purpose of this document, however, is to propose a new, regrettably
non-standard, RTP transport format for FR & EFR codecs, for use only within a
GSM RAN and the immediately attached CN transcoder ("soft TRAU").  This format
provides the same functionality as the classic TRAU-UL format of TS 48.060, but
is carried over RTP in IP rather than in a 16 kbps TDM subchannel.

The non-standard RTP transport format presented in this document is implemented
in OsmoBTS on a private feature branch:

https://cgit.osmocom.org/osmo-bts/log/?h=falconia/rtp_traulike

OsmoBTS versions that include this code always accept TRAUlike FR/EFR packets
on their RTP input, following the principle of being liberal in what you accept
while being conservative in what you send, but emit such packets on their RTP
output only when this non-default vty config option is given:

rtp fr-efr-traulike

The recently added (mainline) "rtp continuous-streaming" vty config option also
needs to be enabled.

The present document serves as the formal specification for the TRAUlike RTP
transport format for FR and EFR.

Detailed description of shortcomings of standard RTP transport for FR & EFR
===========================================================================

These shortcomings are solved in the TRAUlike RTP transport format defined in
this document; understanding these shortcomings provides the essential rationale
for TRAUlike RTP.

Indicating BFI along with data bits
-----------------------------------

The only ways to indicate a BFI condition in standard RTP (for FR/EFR) are
either to send no packet at all in the 20 ms window in question (industry
standard behavior and OsmoBTS default) or to send an RTP packet with a
zero-length payload ("rtp continuous-streaming" option in OsmoBTS).  The latter
option provides a timing tick for a CN-attached transcoder relying on the
BTS-originating RTP stream as its timing source, but there is still no way to
send a frame of marked-erroneous data bits.  Contrast this with the TS 48.060
TRAU-UL format, in which the Dn bits carrying FR or EFR frame bits and the C12
bit carrying BFI are orthogonal.

Why would one care about known-bad or deemed-to-be-bad frame data bits?  They
do matter at least in the case of EFR: the official reference C-source EFR
decoder from ETSI makes use of the "fixed codebook excitation pulses" portion
of its EFR frame bits input (140 bits out of 244) even when BFI=1.  This
portion of reference C-source behavior is declared to be a non-normative example
by the text of GSM 06.61 spec, thus there may be other compliant EFR decoder
implementations that never look at marked-erroneous data bits - but given the
ease of simply using the C code from ETSI as-is, or recoding it more efficiently
but keeping unchanged all bit-exact algorithms, including non-normative ones,
we should expect that the behavior of ETSI reference code is retained in many
production implementations and deployments.

Consider the case where a traditional E1-based BTS with a classic TRAU interface
is attached to an IP-based Osmocom RAN by way of OsmoMGW, and the resulting RTP
stream then (after passing through another OsmoMGW instance at the MSC) goes to
a "soft TRAU" transcoder (TC) in the CN.  The TC will feed its RTP input to FR
and EFR decoders, and at least the EFR decoder makes use of "fixed codebook
excitation pulses" bits from erroneous frames.  Furthermore, the TC may
implement in-band TFO (3GPP TS 28.062) inside its G.711 RTP output, in which
case it will need to insert a slightly modified TRAU-UL frame into that output.
The bits that would ideally be fed to the ETSI EFR decoder and emitted to the
outside world in TFO frames already exist at the output of the E1-based BTS,
but they get lost in the RTP transport when the industry standard RTP payload
format is used.

Consider another case where OsmoBTS does have an FR/EFR traffic frame that
could potentially be sent out, but it is suppressed by the
(tch_ind->lqual_cb >= bts->min_qual_norm) check in l1sap_tch_ind() in
src/common/l1sap.c.  In this case it would be ideal to send out that frame
along with a BFI=1 indication, if the RTP transport format were to allow such
representation.

Lack of TAF bit in standard RTP transport
-----------------------------------------

The TRAU-UL frame format of TS 48.060 for FR and EFR includes a bit called TAF,
for Time Alignment Flag.  Per the specs (TS 48.060 refers to TS 46.031 for
definition and coding of frame indicators) this bit shall be set to 1 in one
particular position in the 480 ms SACCH multiframe (the particular 20 ms frame
position in which a valid frame is always transmitted, even during DTX pauses)
and set to 0 in all other frames.  This flag factors into the Rx DTX handler
logic prescribed in GSM 06.31 and 06.81 specs for FR and EFR, respectively, and
there exist production decoders for these codecs that implement their Rx DTX
handler function exactly to the letter of the specs, including the use of TAF
bit when deciding what to do with a BFI=1 frame received in the comfort noise
generation state.  (These spec-compliant decoders include the reference ETSI
C-source decoder for EFR and Themyscira libgsmfrp for FR.)

This TAF bit does not exist in the standard RTP transport for FR & EFR.  The
lack of this TAF bit causes the following problems for the CN-attached "soft
TRAU" transcoder:

1) The ability to implement spec-compliant handling of GSM 06.11 or 06.61
   section 5.4 requirement (same section in both specs) is lost;

2) The TC won't know when to set the TAF bit in its outgoing TFO frames, if it
   implements in-band TFO per 3GPP TS 28.062.

The TFO problem is particularly concerning because these TFO frames are emitted
to the outside world, outside of administrative and technical control of the
party implementing the Osmocom-based GSM network and the TC at its edge.  The
resulting G.711 octet stream with TFO frames embedded inside can be carried
half-way around the world by the international toll telephone network, and there
is no telling what kind of implementation may be receiving and decoding these
bits on the other end.  For this reason, "poor man's" workarounds in the
RTP-fed, TFO-generating TC are very unattractive:

* If the TC were to set TAF=0 in all TFO frames it generates, the receiver's
  expectation of seeing TAF=1 in every 24th frame would be violated.

* If the TC were to arbitrarily set TAF=1 in every 24th frame by its own
  free-running count, without knowledge of the actual SACCH alignment in the
  original GSM call leg, these TAF-marked frames would not coincide with those
  frame positions where the MS sends its SID frames, and the resulting TFO frame
  stream would be invalid to the receiving Rx DTX handler on the far end.

The knowledge of which frames need to be marked with TAF=1 exists inside the
entity that generates the FR/EFR RTP stream: if this entity is a converter from
E1-based Abis to RTP, the TRAU-UL frames from the BTS contain this TAF bit, and
if the RTP-generating entity is a native IP BTS, it knows the frame number for
which it generates each RTP packet.  The only problem is that there is no place
to insert this TAF bit in the standard RTP transport format of TS 101 318.

Why TRAU-UL and not TRAU-DL
===========================

The present document argues the case that the industry standard RTP transport
format for FR & EFR is functionally crippled compared to the TRAU-UL transport
format of 3GPP TS 48.060, and defines an alternative RTP transport format that
can be used by those who desire TRAU-UL-like functionality badly enough to
accept the price of going totally non-standard in their IP RAN transport.  The
new RTP transport format defined in this document explicitly mimics the
functionality and semantics of TS 48.060 TRAU-UL for FR and EFR.

At this point a reader may reasonably ask: why TRAU-UL and not TRAU-DL?  The
answer is TFO: 3GPP TS 28.062 and its predecessor GSM 08.62 define the TFO frame
format as being based on TRAU-UL frames with only a few bits changed, and no
change in semantics of any of the frame indicator bits of TRAU-UL (C12 through
C17).  Whereas the Abis interface is inherently asymmetric (TRAU-UL frames in
one direction, TRAU-DL frames in the other direction), end-to-end TFO is
directionally symmetric.  If we imagine a TFO call between Alice in America and
Bob in Britain, there will be TRAU-UL frames flowing in both directions of the
trans-oceanic G.711 toll connection, one set coming almost unchanged from
Alice's BTS CCU and the other coming almost unchanged from Bob's BTS CCU.  Of
course each party's GSM call DL will require TRAU-DL frames to be fed to it,
not TRAU-UL, but the necessary UL-to-DL conversion is the responsibility of the
TFO receiver on each end.

The general rules for turning a TRAU-UL frame into one for TRAU-DL are specified
in TS 28.062 section C.3.2.1.1; it should be noted that this section spells out
the requirements of what the UL-to-DL converter must do, but does not specify
exactly how to do it algorithmically - the wording it uses is "subject to
manufacturer dependent future improvements and is not part of this
recommendation."  Implementing all of these section C.3.2.1.1 rules (hereafter
called C3211 rules for short) exactly to the letter is quite easy for the FR
codec (Themyscira libgsmfrp does everything that is needed, and is a simple and
lightweight FLOSS function library), but much harder for EFR.  At the present
time it is unclear to the author of this document whether real historical T1/E1
TRAU implementations for which GSM 08.62 TFO was originally specified really did
implement C3211 rules to the letter, particularly for EFR, or if they cut some
corners.

Because the TRAUlike RTP transport format defined in this document is
semantically equivalent to TRAU-UL, any entity that receives such RTP packets
but internally needs to generate either TRAU-DL or some private functional
equivalent thereof will need to perform the same UL-to-DL conversion as called
for in TFO.  The lack of a readily available function library that implements
the onerous rules of C3211 for EFR is certainly an obstacle, but it is also
possible to "cut corners" by doing the following:

1) Ignore Table C.3.2.1-1 case 1 and treat it like case 2, at least for EFR:
   whenever SID frames are received on the incoming TRAU-UL or TRAUlike RTP
   interface, forward them to call leg B even when that destination call leg
   has no DTXd.  Given that DTX and SID support has been an integral part of
   the EFR codec from the beginning, as opposed to an after-addition in the
   case of FR, every GSM MS that supports EFR can be expected to understand
   SID frames on the downlink.

2) During speech pauses following transmission of a SID frame on call leg B DL,
   if real DTXd (turning off Tx) is not allowed, do "fake DTXd" by transmitting
   dummy FACCH with an L2 fill frame in the same 20 ms traffic frame windows in
   which real DTXd would have been exercised if it were allowed.

3) Whenever a BFI condition is encountered in the incoming TRAU-UL or TRAUlike
   RTP frame stream outside of SID, i.e., the case described in the first
   paragraph of section C.3.2.1.1, induce an intentional BFI condition in the
   receiving GSM MS by transmitting a dummy FACCH frame as above, instead of
   trying to devise a parameter-level ECU for EFR.

It should be noted that the just-outlined "cut corners" method is exactly what
OsmoBTS (and a "pure" Osmocom network in general) does currently, hence nothing
is lost and no regression is introduced by continuing to do the same.

Seen another way, by making our RTP transport semantically equivalent to
TRAU-UL, we achieve harmonization between TFO and TrFO.  TrFO (Transcoder-Free
Operation) is a scenario in which the RTP output from one IP BTS for call leg A
goes directly to the RTP input of another IP BTS for call leg B, possibly
passing through simple RTP forwarders like OsmoMGW, but never passing through
any transcoder.  TrFO is what happens in a self-contained Osmocom network
without any external MNCC connected to OsmoMSC.  The principal rules of what
transformations are inherently necessary in order to produce a fully proper DL
for call leg B from the UL of call leg A remain the same whether the transport
in between is old-fashioned TFO or modern TrFO, hence the same conversions that
are codified in TS 28.062 section C.3.2.1.1 are still needed - the only question
is where in the network they are to be performed.  The original TDM-based GSM
designers at ETSI gave us a superb architecture end to end; by employing an RTP
transport that is semantically equivalent to TRAU-UL, we can preserve that whole
architecture fully intact in an all-IP implementation.

Specification for TRAUlike RTP payload format for FR and EFR
============================================================

The modified RTP payload format shall consist of a single octet called TRAUlike
Extension Header (TEH), followed (most of the time) by the standard (same as in
RFC 3551) 33 octets for FR or 31 octets for EFR.  The TEH octet has the
following structure:

         +----+----+----+----+----+----+----+----+
Hex mask |       0xF0        |0x08|0x04|0x02|0x01|
         +----+----+----+----+----+----+----+----+
Meaning  |     signature     |DTXd|NDF |BFI |TAF |
         +----+----+----+----+----+----+----+----+

(Bit numbers are identified by hex masks in order to avoid getting into an
 argument over which bit numbering convention should be used.)

The following bit fields are defined within the TEH octet:

signature: the upper nibble of the TEH octet shall be set to 0xE.  This
signature allows RTP packet receivers to identify the payload format by the
upper nibble of the first octet: if it equals 0xC, the format is EFR without
TEH, if it equals 0xD, the format is FR without TEH, and if it equals 0xE, then
the first octet is TEH.

DTXd: this bit is strictly identical with TRAU-UL frame bit C17.

No_Data flag (NDF): this bit shall be set to 1 if the TRAUlike payload consists
solely of TEH, with the standard 33-octet FR frame or 31-octet EFR frame
entirely omitted, and shall be 0 otherwise.

BFI: this bit is strictly identical with TRAU-UL frame bit C12.

TAF: this bit is strictly identical with TRAU-UL frame bit C15.

There are two possibilities for full composition of a TRAUlike RTP payload:

Possibility 1: TEH with NDF=0 is followed by a standard 33-octet FR frame or a
standard 31-octet EFR frame.  The signature in the upper nibble of the octet
immediately following TEH shall be correct: 0xD for FR or 0xC for EFR.

Possibility 2: TEH with NDF=1 constitutes the entirety of the RTP payload for
the 20 ms time window in question.

If the No_Data flag is set, BFI must also be set: the combination of NDF=1 and
BFI=0 is invalid.
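
The composition rules above can be checked mechanically by a receiver.  The
following C fragment is only a minimal sketch of such a check: the function
name traulike_check(), the enum and the TEH_* macro names are hypothetical and
are not taken from OsmoBTS or any other existing code; only the bit masks,
signatures and lengths come from this specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* TEH bit masks as given in the table above; names are illustrative */
#define TEH_SIG_MASK	0xF0
#define TEH_SIG		0xE0
#define TEH_NDF		0x04
#define TEH_BFI		0x02

enum traulike_kind {
	TL_INVALID,	/* violates the composition rules */
	TL_NO_DATA,	/* TEH only, with NDF=1 and BFI=1 */
	TL_FR,		/* TEH followed by a 33-octet FR frame */
	TL_EFR,		/* TEH followed by a 31-octet EFR frame */
};

/* Classify an RTP payload that is expected to begin with a TEH octet. */
enum traulike_kind traulike_check(const uint8_t *payload, size_t len)
{
	uint8_t teh;

	assert(payload != NULL || len == 0);
	if (len < 1 || (payload[0] & TEH_SIG_MASK) != TEH_SIG)
		return TL_INVALID;
	teh = payload[0];
	if (teh & TEH_NDF) {
		/* No_Data: nothing may follow TEH, and BFI must be set */
		if (len != 1 || !(teh & TEH_BFI))
			return TL_INVALID;
		return TL_NO_DATA;
	}
	/* NDF=0: a complete frame with the correct signature must follow */
	if (len == 1 + 33 && (payload[1] & 0xF0) == 0xD0)
		return TL_FR;
	if (len == 1 + 31 && (payload[1] & 0xF0) == 0xC0)
		return TL_EFR;
	return TL_INVALID;
}
```

This sketch treats a lone TEH octet with NDF=0 as invalid, which is one
possible reading of the two possibilities listed above.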

Per this specification, the sender of a BFI packet has the choice of sending it
in one of two forms: with or without presumed-erroneous frame bits.  If the
TRAUlike RTP packet is generated from bits received in an actual TRAU-UL frame
(E1 Abis or TFO), erroneous frame bits shall be included, unchanged from the
TRAU-UL source.  However, if the entity generating the TRAUlike RTP packet is
the ultimate point of origin (e.g., a native IP BTS), then it shall choose one
form or the other based on the situation at hand:

a) if the sender does have an FR or EFR frame "on hand" but that frame is
   considered to be erroneous (for example, the link quality check in
   l1sap_tch_ind() in OsmoBTS), the long form of BFI shall be sent, with the
   presumed-erroneous frame bits included.

b) if the sender does not have any FR or EFR frame at all that could be sent
   (for example, if the BFI condition arose because FACCH was successfully
   received and decoded instead of a traffic frame), then the No_Data form of
   BFI shall be sent.
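
The choice between cases (a) and (b) for an originating sender can be sketched
in C as shown here.  This is an illustration under stated assumptions, not the
OsmoBTS implementation: the function traulike_build_ul() and its parameters are
hypothetical, the caller is assumed to have already packed the frame in RFC
3551 form and to have decided whether it passed its quality check, and the DTXd
bit is left at 0 for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TEH_SIG	0xE0
#define TEH_NDF	0x04
#define TEH_BFI	0x02
#define TEH_TAF	0x01

/*
 * Build a TRAUlike payload in 'out' (at least 34 octets).
 * 'frame' is a 33-octet FR or 31-octet EFR frame in RFC 3551 form,
 * or NULL if the sender has no frame at all for this 20 ms window.
 * Returns the number of payload octets written.
 */
size_t traulike_build_ul(uint8_t *out, const uint8_t *frame, size_t frame_len,
			 int frame_is_good, int taf)
{
	uint8_t teh = TEH_SIG | (taf ? TEH_TAF : 0);

	assert(out != NULL);
	if (frame == NULL) {
		/* case (b): nothing to send, emit No_Data BFI */
		out[0] = teh | TEH_BFI | TEH_NDF;
		return 1;
	}
	assert(frame_len == 33 || frame_len == 31);
	if (!frame_is_good)
		teh |= TEH_BFI;	/* case (a): long form, frame bits kept */
	out[0] = teh;
	memcpy(out + 1, frame, frame_len);
	return 1 + frame_len;
}
```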

The option of No_Data BFI is provided in this RTP transport format specification
because if this option were disallowed, senders would be tasked with an
additional burden of having to artificially generate dummy or "garbage" frame
bits.  This task is slightly complicated, as explained in the following section,
and the present design moves that task from all senders to only those receivers
that need it.

Lack of SID classification bits matching TRAU-UL C13 & C14
----------------------------------------------------------

The TRAU-UL frame format includes two bits, C13 & C14, that carry the ternary
SID flag (0, 1 or 2) as defined in GSM 06.31 and 06.81 section 6.1.1 (same
section in both specs).  No equivalent bits are included in the TRAUlike RTP
transport format as defined by this specification - however, these bits are
redundant.
The rules of section 6.1.1 in GSM 06.31 and 06.81, hereafter called S611 rules,
specify a strictly deterministic, unambiguous formula by which these C13 & C14
bits derive their values from the bit content of the FR/EFR frame payload -
thus if a TRAU-UL frame is received in which these C13 & C14 bits fail to match
the S611 value derived from the contained payload, then that TRAU-UL frame is
defective.  There is no need to include such redundant bits in our TRAUlike RTP
format, only to create confusion for receivers as to which source of SID S611
classification they should use.

Feeding received TRAUlike BFI frames to an EFR decoder
======================================================

If an EFR decoder implementation is based on the reference C source from ETSI,
this decoder requires that _some_ frame bits input be fed to it at all times,
even when BFI=1.  But what if the BFI packet came in as No_Data?  In that case
the receiver must synthesize its own fake "bad data" bits to feed to the
standard decoder.  When synthesizing "bad data" bits in this manner, the
following rules should be observed:

* The 140 bits corresponding to "fixed codebook excitation pulses" (35 bits in
  each of the 4 subframes) shall be filled using a PRNG.  These bits are the
  ones used by the standard decoder when its internal state, based on previous
  good frames, puts it in GSM 06.61 substitution/muting mode as opposed to
  GSM 06.62 comfort noise generation mode.

* The remaining 104 bits of the EFR frame shall be set to 0.  These bits are
  never used by the standard decoder under the condition of BFI=1, and setting
  them to 0 prevents the possibility of S611 rules classifying the frame as SID
  even if the PRNG output in the other 140 bits happens to be all 1s in those
  SID codeword bit positions (70 out of 140) that fall within the "fixed
  codebook excitation pulses" portion.
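
The two rules above can be sketched in C as follows.  Several details here are
assumptions rather than part of this specification: the bit offsets of the four
35-bit pulse fields (51, 101, 154 and 204, counting from 0 within the 244-bit
frame) are derived from the GSM 06.60 parameter widths (38 LPC bits, then per
subframe a 9- or 6-bit lag, 4-bit gain, 35 pulse bits and 5-bit gain), the
frame bits are assumed to be packed MSB first right after the 4-bit 0xC
signature, and the xorshift32 generator merely stands in for whatever PRNG an
implementation prefers.  All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EFR_RTP_LEN	31	/* 4-bit signature + 244 frame bits */
#define EFR_SIGNATURE	0xC0

/* Assumed 0-based start of the pulse field in each of the 4 subframes */
static const unsigned pulse_start[4] = { 51, 101, 154, 204 };

/* Small xorshift32 generator; the state must never be zero */
static uint32_t xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/* Set frame bit 'idx' (0..243); the frame starts 4 bits into the payload */
static void set_frame_bit(uint8_t *payload, unsigned idx)
{
	unsigned pos = idx + 4;

	payload[pos >> 3] |= 0x80 >> (pos & 7);
}

/* Fill a 31-octet buffer with a synthetic "bad data" EFR frame */
void efr_make_bad_frame(uint8_t *payload, uint32_t *prng_state)
{
	unsigned sf, i;

	assert(payload != NULL && prng_state != NULL && *prng_state != 0);
	/* all 104 non-pulse bits stay at 0 */
	memset(payload, 0, EFR_RTP_LEN);
	payload[0] = EFR_SIGNATURE;
	/* randomize the 4 x 35 fixed codebook pulse bits */
	for (sf = 0; sf < 4; sf++) {
		for (i = 0; i < 35; i++) {
			if (xorshift32(prng_state) & 1)
				set_frame_bit(payload, pulse_start[sf] + i);
		}
	}
}
```

A receiver would keep one PRNG state per call leg, seeded with any non-zero
value, and call this function once for each No_Data BFI packet.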

Converting from TRAU-UL to TRAUlike RTP
=======================================

There will be a need to convert from standard TS 48.060 TRAU-UL frames to our
TRAUlike RTP format in the following two scenarios:

1) When interfacing an E1 BTS to Osmocom RAN, when and if such support is to be
   added to OsmoMGW;

2) In the CN transcoder operating in TFO mode, when forwarding received TFO
   frames to the local RAN.

In both cases the conversion is straightforward:

* Always generate full-length TRAUlike RTP payloads; never generate No_Data in
  the case of a properly received TRAU-UL speech (not idle) frame.

* Forward the payload bits directly from TRAU-UL to TRAUlike RTP, for both good
  and bad frames.

* Directly forward BFI, TAF and DTXd frame indicator bits from TRAU-UL C-bits
  to TEH octet bits.

* Ignore TRAU-UL C13 & C14 bits.

Converting from TRAUlike RTP to TRAU-UL
=======================================

This direction of conversion will need to be performed in the CN transcoder when
emitting TFO frames toward the outside world.  The following rules will need to
be applied:

* If the incoming TRAUlike RTP payload is full-length, as opposed to No_Data,
  simply copy the payload bits into the constructed TRAU-UL frame, for both
  good (BFI=0) and bad (BFI=1) frames.

* If the incoming TRAUlike RTP payload is No_Data, put the following filler in
  the data bits portion of the TRAU-UL frame:

  - For FR codec, use the silence frame of 3GPP TS 46.011 Table 1 as the filler.

  - For EFR codec, perform the same PRNG procedure as detailed earlier in this
    document for the case of feeding a No_Data BFI packet to the standard ETSI
    decoder for EFR.  Given that a TFO-frame-emitting transcoder still needs to
    run its regular speech decoder in order to fill the upper 6 bits of each
    outgoing G.711 sample octet, the same No_Data PRNG handler will typically
    be run just once for both internal decoding and TFO frame output.

* Algorithmically set C13 & C14 bits in the generated TRAU-UL frame per the
  rules of S611.  This step can be done using osmo_{fr,efr}_sid_classify()
  functions proposed in this Gerrit patch submission:

  https://gerrit.osmocom.org/c/libosmocore/+/32183

  or using equivalent functions in Themyscira libgsmefr and libgsmfrp.

* Directly forward BFI, TAF and DTXd frame indicator bits from TEH octet bits
  to TRAU-UL C12, C15 and C17, respectively.
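
The control-bit part of this conversion is small enough to sketch in C.  The
struct and function names below are hypothetical; the sketch covers only the
mapping of TEH bits to C12, C15 and C17 and the detection of No_Data, leaving
the filler generation and the S611 computation of C13 & C14 to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TEH_DTXD	0x08
#define TEH_NDF		0x04
#define TEH_BFI		0x02
#define TEH_TAF		0x01

struct trau_ul_cbits {
	uint8_t c12;	/* BFI */
	uint8_t c15;	/* TAF */
	uint8_t c17;	/* DTXd */
	/* C13 and C14 are deliberately absent: they are to be computed
	 * from the frame bits per the S611 rules, not copied from TEH. */
};

/*
 * Fill 'cb' from a TEH octet.  Returns 1 if the payload was No_Data,
 * in which case the caller must generate filler for the data bits,
 * or 0 if frame bits follow TEH and are to be copied as-is.
 */
int teh_to_cbits(uint8_t teh, struct trau_ul_cbits *cb)
{
	assert(cb != NULL && (teh & 0xF0) == 0xE0);
	cb->c12 = !!(teh & TEH_BFI);
	cb->c15 = !!(teh & TEH_TAF);
	cb->c17 = !!(teh & TEH_DTXD);
	return !!(teh & TEH_NDF);
}
```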

Mixing standard RFC 3551 and TRAUlike RTP payloads
==================================================

An RTP stream receiver for FR/EFR codecs that supports the present non-standard
extension to the RTP payload format shall behave gracefully when it receives a
mixture of standard RFC 3551 payloads and TRAUlike payloads in the same RTP
stream.  A receiver that has no interest in the additional information carried
in the TRAUlike Extension Header shall simply strip the TEH octet when one is
received, reducing the received payload to standard RFC 3551; if a BFI or
No_Data payload is received, such a receiver shall treat it the same as if
nothing at all had been received.  A receiver that is interested in the TRAUlike
Extension Header but receives an FR/EFR payload without one should behave as if
it had received a TEH with BFI=0 and TAF=0; likewise, a received zero-length RTP
payload should be treated the same as a No_Data TRAUlike payload with TAF=0.
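
One way to implement this graceful handling is to normalize every received
payload into a common internal form before any further processing.  The C
fragment below is a sketch of that idea only; struct rx_frame and the function
rtp_fr_efr_normalize() are hypothetical names, and the choice to reject
malformed payloads with -1 is an assumption not mandated by this document.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TEH_SIG	0xE0
#define TEH_NDF	0x04
#define TEH_BFI	0x02
#define TEH_TAF	0x01

struct rx_frame {
	int bfi, taf;
	const uint8_t *frame;	/* RFC 3551 frame, or NULL if none */
	size_t frame_len;	/* 33 (FR), 31 (EFR) or 0 */
};

/* Accept plain RFC 3551, TRAUlike and zero-length payloads alike.
 * Returns 0 on success or -1 if the payload is malformed. */
int rtp_fr_efr_normalize(const uint8_t *p, size_t len, struct rx_frame *rx)
{
	assert(rx != NULL && (p != NULL || len == 0));
	rx->bfi = 0;
	rx->taf = 0;
	rx->frame = NULL;
	rx->frame_len = 0;
	if (len == 0) {
		/* zero-length payload: same as No_Data with TAF=0 */
		rx->bfi = 1;
		return 0;
	}
	if ((p[0] & 0xF0) == TEH_SIG) {
		uint8_t teh = p[0];
		int no_data = !!(teh & TEH_NDF);

		rx->bfi = !!(teh & TEH_BFI);
		rx->taf = !!(teh & TEH_TAF);
		p++;
		len--;
		if (no_data != (len == 0) || (no_data && !rx->bfi))
			return -1;
		if (no_data)
			return 0;
	}
	/* what remains must be a standard FR or EFR frame */
	if ((len == 33 && (p[0] & 0xF0) == 0xD0) ||
	    (len == 31 && (p[0] & 0xF0) == 0xC0)) {
		rx->frame = p;
		rx->frame_len = len;
		return 0;
	}
	return -1;
}
```

A receiver with no interest in TEH can use the same result: when rx.bfi is set
it acts as if nothing was received, and otherwise it passes rx.frame onward as
a standard RFC 3551 payload.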

There may even be cases where an RTP sender alternates between sending
standard RFC 3551 payloads and TRAUlike payloads in the same session: for
example, a TFO-supporting CN transcoder may emit "plain" RFC 3551 payloads when
supplying the output of its free-running speech encoder, but switch to sending
TRAUlike payloads when it switches to forwarding bits received in TFO frames
from the far end.