# HG changeset patch # User Mychaela Falconia # Date 1671415348 0 # Node ID 8a45cd92e3c36680c72076623058a472606bbfd3 # Parent 7aaed576fa26e0fd7e7d44dcc75365b3afd5d82d TCH-tap-modes: new article diff -r 7aaed576fa26 -r 8a45cd92e3c3 TCH-tap-modes --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/TCH-tap-modes Mon Dec 19 02:02:28 2022 +0000 @@ -0,0 +1,416 @@ +It has been discovered that the DSP ROM in the Calypso GSM baseband processor +makes it possible to "tap" into speech traffic on GSM traffic channels (TCH): + +1) In the downlink direction, the signal processing chain which every GSM MS + must implement includes a GSM 05.03 channel decoder, operating in one of + several variants as necessary for each supported TCH mode, followed by speech + decoders for each supported codec. TI's DSP naturally implements this + required signal processing chain, and this implementation includes one nifty + feature: the bits that make up the internal interface from GSM 05.03 channel + decoder output to the input of speech decoders are written into the NDB API + RAM page that is also accessible to the ARM core, and these bits can be + externally read out. The act of reading these bits is completely + non-invasive (we are only reading bits that are already there, not modifying + anything), thus we can sniff TCH downlink on any voice call in real time + without disrupting or impacting standard type-approved GSM MS operation in + any way. + +2) In the uplink direction, there is a reverse signal processing chain in which + the output of the internal speech encoder for the selected codec feeds into + the input of the corresponding GSM 05.03 channel encoder. 
In this direction + there are two tapping possibilities: + +2a) There is a buffer in the NDB API RAM page from which one can read the bits + that pass from the speech encoder output to the channel encoder input - + let's call this form of TCH tap "uplink sniffing"; + +2b) There is a special mode in which the output of the internal speech encoder + is effectively suppressed and the input to the channel encoder comes from + another NDB API RAM buffer that needs to be filled by ARM firmware - let's + call this form of TCH tap "uplink substitution". + +Sources of knowledge about these DSP functions +============================================== + +For the functions of TCH DL sniffing (tap 1 in the above summary) and TCH UL +substitution (tap 2b in the above summary), the primary source of knowledge is +the defunct '#if TRACE_TYPE==3' code in TSM30 and LoCosto L1 sources. I call +this code defunct because the TRACE_TYPE preprocessor symbol is set to 4 (not 3) +in both TCS211 and LoCosto versions, and appears to be set to 0 (all trace +disabled) in the ancient TSM30 build. This code appears to be some very old +test mode, apparently sending some test bit patterns into TCH UL and expecting +the same bit patterns back on TCH DL, presumably with a test instrument such as +CMU200 providing a loopback from UL to DL on this test TCH, and has only +survived in an incomplete form: + +* There are '#if TRACE_TYPE==3' stanzas in l1_cmplx.c, in both TSM30 and LoCosto + versions, that implement DSP buffer writing for TCH UL substitution (TCH/F + only) and timing control for TCH DL buffer reading (both TCH/F and TCH/H), + calling a function named play_trace() for the latter. + +* There is no play_trace() code in the LoCosto source, but there is an + hw_debug.c source module in the TSM30 code drop under MCU/Layer1/L1c/Src, + and it contains (presumed) TI-legacy play_trace() and play_diagnostics() + functions, once again under '#if (TRACE_TYPE==3)'. 
play_trace() reads the + DSP's TCH DL buffer and saves the bits in an ARM firmware RAM buffer, and + then play_diagnostics() analyzes the captured booty - and studying the second + function is how we learn the apparent original intent of doing test bit + patterns on TCH. + +* The code that feeds "UL play" test bit patterns to the earlier-mentioned + '#if TRACE_TYPE==3' TCH UL substitution code in l1_cmplx.c (apparently once + hacked into dll_read_dcch() and tx_tch_data()) has not been found anywhere. + +For TCH tap 2a in our summary at the beginning of this article (non-invasive +sniffing of TCH UL bits produced by the internal speech encoder) there does not +exist any authoritative source of knowledge. It naturally follows from +otherwise-known Calypso DSP architecture that these internally produced TCH UL +bits should reside in the "main" a_du_0 buffer (or in a_du_1 when TCH/H +subchannel 1 is active), and I (Mother Mychaela) have heard an anecdotal report +(from someone who once worked with Calypso in a non-community-based manner) that +these UL bits could indeed be read out of this buffer - but in the absence of +an authoritative source, we don't know when would be the correct time to read +this buffer. + +In our current state of knowledge, only TCH DL sniffing can be exercised safely: +for UL sniffing we don't know the correct time when the buffer would need to be +read, while active UL substitution is obviously an invasive hack involving a DSP +debug or test feature that is never used in standard GSM MS operation. + +Support for different speech codecs +=================================== + +When it comes to passively sniffing TCH DL and/or UL, we are merely reading bits +that are already there, and basic reasoning tells us that the DSP's DL and UL +buffers involved in this exercise exist in all speech TCH modes supported by +the DSP: FR1, HR1, EFR and AMR. 
However: + +* The ancient '#if TRACE_TYPE==3' reference code exists only for FR1, HR1 and + EFR - it clearly predates the addition of AMR in the later Calypso DSP + versions. + +* FR1, HR1 and EFR are the only codecs for which we (FreeCalypso community) know + the format in which TCH DL bits appear in the DSP's a_dd_0 and a_dd_1 buffers. + +* I (Mother Mychaela) have heard an anecdotal report (from the same + non-community-based party mentioned earlier) that TCH DL bits could be read + out of a_dd_0 buffer in TCH/AFS (AMR) mode - but I never got any details. + +In contrast with passive sniffing, active TCH UL substitution requires explicit +support from the DSP - and this explicit DSP support is known to exist for +certain only for TCH/FS and TCH/EFS channel modes, i.e., for FR1 and EFR codecs +only. In the case of TCH/HS channel mode (HR1 codec), it *appears* that the DSP +supports UL substitution in this mode too, but this combination has only been +exercised by OsmocomBB people (the original '#if TRACE_TYPE==3' code for UL play +only supports TCH/F), and FreeCalypso policy is to treat everything coming out +of OBB as highly suspect. + +What about AMR? The anecdotal report (from the same already-mentioned party) is +that TCH UL substitution that works for FR1 and EFR appears to NOT work for AMR +- that's all I know - but frankly speaking, given that it's a weird DSP debug +mode that is never needed in standard GSM MS operation, I find it more +surprising that it works for FR1 and EFR than the observation that it doesn't +work for AMR. + +FreeCalypso support for TCH tap functions +========================================= + +TCH DL sniffing and UL substitution provisions were initially implemented in +FreeCalypso back in 2016, but only in the Citrine version, which was deemed to +be a dead end later that same year. However, this functionality is now being +resurrected, and it has been incorporated into our production FC Tourmaline +firmware as of 2022-12-13. 
+ +In order to activate the function of TCH DL sniffing and save the recording of a +TCH DL session into a file, one needs to use the fc-shell utility from FC host +tools, specifically the tch record command in an interactive fc-shell session. +The format in which TCH DL tap traffic is passed over RVTMUX (an original +FreeCalypso invention) has changed in a slight but incompatible way between the +original hackish version from 2016 and the new production version as of 2022, +and capturing TCH DL with new firmware requires the updated version of fc-shell +that will be released as part of fc-host-tools-r18. The current (late 2022) +incarnation of FreeCalypso TCH DL sniffing feature supports FR1, HR1 and EFR +codecs, although only FR1 and EFR have been tested so far. + +The function of TCH UL substitution is currently implemented in FC Tourmaline +only for FR1 and EFR (no HR1, no AMR), and it likewise requires running an +interactive fc-shell session in which you would invoke the tool's tch play +command. In the case of TCH UL play feature there has been NO change in the +RVTMUX transport format between 2016 and 2022 versions. + +TCH DL DSP buffers and capture format +===================================== + +The DSP's NDB API page has two buffers in which TCH DL bits appear: a_dd_0 and +a_dd_1. All TCH/F modes use a_dd_0, but TCH/H uses one buffer or the other +depending on the subchannel: subchannel 0 uses a_dd_0, subchannel 1 uses a_dd_1. +(It is certainly a strange design - the DSP won't be able to receive and decode +the "wrong" subchannel because it doesn't know the ciphering key for the other +MS - but perhaps the designers of this DSP architecture aeons ago found this +design to somehow flow more naturally with their scheduling of DSP tasks.) Each +buffer consists of 22 16-bit words - they were originally 20 words, but then +extended to 22 words to support CSD 14.4 kbps mode. 
+ +Each TCH buffer in the DSP's NDB API page consists of 3 status or header words +followed by N words of payload, where N depends on TCH mode: 17 for TCH/FS and +TCH/EFS, 8 for TCH/HS, and not-yet-studied for AMR and CSD. Let's begin our +analysis with the 3 status words that make up the buffer header: + +Status word 0 (a_dd_0[0] or a_dd_1[0]) is a word of flag bits. We don't know +the meaning of every bit in this word, but at least for TCH/FS and TCH/EFS (we +haven't exercised TCH/HS at all) we know the following bits: + +* Bit 15 (B_BLUD) is a "buffer filled" or "data present" flag. This flag is + observed as 1 in *almost* every 20 ms window in which a traffic frame is + expected (fn_report_mod13_mod4 == 0 in l1s_read_dedic_dl(), case TCHTF), + except for certain instances early in the call setup process which remain to + be studied. + +* Bit 14 (B_AF) will be set if the block of 8 half-bursts (block diagonal + interleaving of GSM 05.03) corresponding to this buffer was channel-decoded + as speech rather than as FACCH - see further analysis below. + +* Bit 9 (B_ECRC) has only ever been observed as 1 when B_AF is set, i.e., when + the speech-not-FACCH channel decoder was invoked. In the case of TCH/EFS this + bit is set to 1 if the EFR-added CRC-8 was bad, and cleared if this CRC-8 was + good; in the case of TCH/FS this bit has always been observed as 1 and should + be ignored because there is no CRC-8 in TCH/FS. + +* Bit 7 has always been observed as 1 whenever B_BLUD is set but B_AF is + cleared, i.e., whenever the block was channel-decoded in FACCH rather than + speech mode. + +* Bits 6:5 indicate the result of FIRE decoding in the event that the FACCH + decoder was invoked. + +* Bits 4:3 carry the ternary SID flag encoded as in section 6.1.1 of GSM 06.31 + and 06.81, but only when the speech-not-FACCH channel decoder was invoked as + indicated by B_AF. + +* Bit 2 is BFI as defined in section 6.1.1 of GSM 06.31 and 06.81. 
Whenever + the block was decoded as FACCH (bit 14 clear, bit 7 set), bit 2 has always + been observed as set, agreeing with the stipulation in GSM 06.31 and 06.81 + that BFI=1 whenever a FACCH frame has been received. However, in the case of + TCH/EFS it appears that CRC-8 status (reported in bit 9) is NOT factored into + the logic that sets bit 2 - it appears that the subsequent speech decoding + logic is expected to OR bits 2 and 9 together to get the BFI flag for the Rx + DTX handler of GSM 06.81. + +In the case of 20 ms blocks (reassembled from 8 half-bursts) that were channel- +decoded as speech rather than FACCH, the observed behavior is that bits 15 and +14 are set, the payload portion of the buffer is filled with the output from the +channel decoder, and bits 4:3 are set from this payload by the bit-counting rule +of section 6.1.1 of GSM 06.31 and 06.81 irrespective of the good-or-bad status +in bits 2 and 9. However, when bit 14 is clear and bit 7 is set, indicating +that the block (from 8 half-bursts) was channel-decoded in FACCH mode, the +following additional behavior is observed: + +* The payload portion of the buffer remains unchanged from its previous content, + last written when a frame was channel-decoded in speech-not-FACCH mode; + +* Bit 2 is set, bit 9 is cleared; + +* Bits 4:3 are cleared even when they previously indicated SID based on the bit + pattern in the payload portion of the buffer, even when that SID-encoding + payload is still there. + +In the standard TCH DL signal processing chain, GSM 05.03 channel decoding is +followed by the Rx DTX handler of GSM 06.31 or 06.81 for TCH/FS or TCH/EFS, +respectively. It appears that the Rx DTX handler implemented in TI's DSP is +driven by this status word 0 at the head of the buffer, and we can only guess +as to its exact logic. 
At this point it bears reminding that the functions of +the Rx DTX handler are not rigidly prescribed in the specs: in the case of EFR +the bit-exact reference implementation is normative only in certain aspects +(e.g., comfort noise generation after receiving SID), but is considered a non- +normative example in some other key aspects (all GSM 06.61 functions, including +what happens when a FACCH block was received when speech frames were expected), +and in the case of FR1 there is no bit-exact reference implementation at all, +only general guidance. + +Having the curiosity of a cat, I (Mother Mychaela) naturally desire to know +exactly how the Rx DTX handler (the bridge between the channel decoder and the +speech decoder) works in TI's DSP. A full static reversing job on the DSP ROM +would provide complete answers, but is a very daunting proposition, thus I am +also looking at the idea of behavioral analysis: the output of the speech +decoder can be captured from MCSI on FCDEV3B hardware, or from the VSP tap on +FC Venus if we ever build that board, and if we combine that speech decoder +output capture with the currently-discussed capture of TCH DL buffers, we may +be able to glean some insight into the workings of the Rx DTX handler block: we +could implement a candidate Rx DTX handler clone in software and compare the +output (of this proposed handler followed by the spec-defined speech decoder) +against the actual speech output from the DSP. + +Back to our exposition of TCH DL buffer content: + +Status word 1 (a_dd_0[1] or a_dd_1[1]) is some kind of DSP measurement or count +which Calypso ARM fw does not need to look at, except when debugging - the only +code which I (Mother Mychaela) could find that does anything with this DSP +status word is the ancient play_diagnostics() code in the TSM30 version +(obviously never included in any production fw); this code looks at the unknown +word in question and calls it "D_MACC". 
This play_diagnostics() code compares +the D_MACC reading against a threshold, and if the per-block reading is below +the threshold, an error message is printed. That's all we know! + +Status word 2 (a_dd_0[2] or a_dd_1[2]) is a bit error count: the code in +l1s_read_dedic_dl() reads this error count and uses it for RXQUAL computation +for measurement reports. + +If one's area of interest is in replicating Rx DTX handling and speech decoding +that happens in the DSP, status words 1 and 2 can probably be ignored - instead +the important parts are status word 0 (extensively covered above) and the +payload portion of the buffer. + +The payload portion of the buffer consists of some number of 16-bit words: 17 +of them for TCH/FS and TCH/EFS, or 8 of them for TCH/HS. The DSP does not have +any notion of 8-bit bytes, instead it operates on 16-bit words as its elementary +data unit. The ordering of bits within these 16-bit words (in the payload +portion of TCH buffers) is from the most-significant bit toward the least- +significant bit, thus when these TCH buffers are transferred via octet-oriented +interfaces, the upper byte of each word should be transferred first, even though +this byte order is counter to the little-endian byte order of the Calypso ARM +core. + +In the case of TCH/FS and TCH/EFS, the fill order of bits in the payload words +is as follows, starting with the most-significant bit of buffer word 3 (first +word of the payload portion): + +* 182 bits of class 1; + +* 4 dummy bits (always observed as 0); + +* 78 bits of class 2; + +* the last 8 bits of a_dd_0[19] are unused. + +In the case of TCH/HS, the fill order is similar, but modified as appropriate +for TCH/HS: + +* 95 bits of class 1; + +* 4 dummy bits; + +* 17 bits of class 2; + +* the last 12 bits of a_dd_0[10] or a_dd_1[10] are unused. + +Aside from the insertion of 4 extra dummy bits at the boundary between class 1 +and class 2, the overall bit order is that of GSM 05.03 Figure 1 interface 1. 
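As an illustration of the buffer layout described above, here is a minimal
Python sketch (not FreeCalypso code) that decodes the known flag bits of status
word 0 and unpacks the payload words into class 1 and class 2 bits. The
function names, dictionary keys and table name are made up for this example;
only the bit positions and bit counts come from the description above. The
sketch takes the whole buffer as a list of 16-bit words, header included, and
the efr_bfi() helper merely encodes the guess stated earlier that bits 2 and 9
are ORed together for EFR.

```python
# Illustrative sketch only: names here are hypothetical, not taken from any
# TI or FreeCalypso source.  Bit positions and counts follow the article text.

# (class 1 bits, dummy bits, class 2 bits, payload words) per TCH mode
LAYOUT = {
    "FR":  (182, 4, 78, 17),
    "EFR": (182, 4, 78, 17),
    "HR":  (95, 4, 17, 8),
}

def decode_status0(word):
    """Pick apart the known flag bits of status word 0."""
    return {
        "B_BLUD": (word >> 15) & 1,   # buffer filled / data present
        "B_AF":   (word >> 14) & 1,   # decoded as speech, not FACCH
        "B_ECRC": (word >> 9) & 1,    # EFR CRC-8 bad (ignore for TCH/FS)
        "bit7":   (word >> 7) & 1,    # observed set when decoded as FACCH
        "fire":   (word >> 5) & 3,    # bits 6:5, FIRE decoding result
        "sid":    (word >> 3) & 3,    # bits 4:3, ternary SID flag
        "bfi":    (word >> 2) & 1,    # bit 2, BFI
    }

def unpack_payload(buf, mode):
    """Return (class1_bits, class2_bits) from a full TCH DL buffer.

    buf is a list of 16-bit words: 3 header words, then the payload.
    Bits are taken MSB first within each word; the 4 dummy bits between
    class 1 and class 2 and the unused trailing bits are dropped.
    """
    n_c1, n_dummy, n_c2, n_words = LAYOUT[mode]
    bits = []
    for word in buf[3:3 + n_words]:
        for pos in range(15, -1, -1):
            bits.append((word >> pos) & 1)
    class1 = bits[:n_c1]
    start2 = n_c1 + n_dummy
    class2 = bits[start2:start2 + n_c2]
    return class1, class2

def efr_bfi(status0):
    """BFI for EFR as guessed in the text: bit 2 ORed with bit 9."""
    flags = decode_status0(status0)
    return flags["bfi"] | flags["B_ECRC"]
```

Concatenating class1 and class2 from this sketch gives the 260-bit (TCH/FS and
TCH/EFS) or 112-bit (TCH/HS) frame in the interface 1 bit order, with the
dummy bits already removed.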
+ +In the case of TCH/EFS, the following additional considerations apply: + +* Bits [65:73] in all received DL frames, where CRC-8 would go in the 260-bit + frame of GSM 05.03 interface 1 for EFR, are always observed as 0, whether + this CRC-8 was good (a_dd_0[0] bit 9 clear) or bad (a_dd_0[0] bit 9 set). + +* The handling of repetition bits (4 bits of 244-bit EFR codec frame, each of + which is triplicated in the channel encoding for transmission) is unclear. + +Further detail regarding the repetition bits of TCH/EFS: distinct bit positions +exist in the 260-bit frame of GSM 05.03 interface 1 (which is the frame format +in the TCH buffers of TI's DSP) for each of the 3 copies of each of the 4 +triplicated bits. It is obvious that correct decoding of these triplicated bits +requires a majority-vote function just like the one implemented in TMR systems +in space gear - but it is not absolutely and unquestionably obvious where this +TMR voting function is implemented in the Rx processing chain of TI's DSP. It +*appears* that this majority-vote function has already been performed by the DSP +function that writes a_dd_0, and that the first bit position out of each group +of 3 holds the output of this voting function, so that the subsequent speech +decoder only needs to use those "cooked" bits - but there is this mystery: + +* At certain times, particularly during the main part of a test call, TCH DL + buffer readouts contain zeros in the "extra" repetition bit positions: for + each group of 3 bits, the first will contain 0 or 1, but the other two will + always be 0. + +* At other times, seemingly in the beginning and ending parts of test calls, + TCH DL buffer readouts contain matching bit values in all 3 positions: for + each group of 3 bits, if the first bit is 0, the other two will also be 0, or + if the first bit is 1, then the other two will also be 1. 
+ +One possibility is that the DSP applies the required majority-voting function, +writes its output into the first bit position of each group of 3, but then +sometimes (and not at other times) applies another function that writes the +voting function output into the remaining bit positions, perhaps for loopback +of TCH DL into TCH UL. More study is needed in this area. + +FreeCalypso file format for TCH DL captures +=========================================== + +The file format written by fc-shell tch record command is ASCII hex, line-based, +with one line for every captured 20 ms window. The new format as of 2022 is: + +* Each line begins with an FR, HR or EFR keyword indicating which variant of + TCH DL has been captured; + +* This keyword is followed by 3 space-separated DSP status words, each written + as 4 hex digits; + +* The main body of the frame is written as 33 (TCH/FS & TCH/EFS) or 15 (TCH/HS) + hex bytes, produced from the payload portion of the TCH DL buffer by turning + each 16-bit word into 2 bytes (MSB first) and discarding the last byte that + is unused (always 0); + +* Each line ends with a frame number in decimal, specifically the value of + fn_mod_104 variable in the l1s_read_dedic_dl() function when the DSP buffer + was read. + +The addition of the frame number field allows these TCH DL captures to be +reconciled against the SACCH multiframe structure, which matters for the rules +of DTX. + +TCH UL substitution: open questions +=================================== + +Moving from the mostly-understood realm of TCH DL capture into the much more +experimental realm of TCH UL substitution, we have some open questions: how does +this DSP special mode really work? 
Here is what we know: if we load externally +sourced speech frames into otherwise-unused a_du_1 DSP buffer at the time of +(fn_report_mod13_mod4 == 3), which is the same time when FACCH or CSD UL would +be expected, and set B_PLAY_UL bit in DSP NDB API word d_tch_mode, the speech +frame stream going to the other end of the call will be the one we feed into +a_du_1 instead of the one produced from the microphone input by the internal +speech encoder. But here are the parts we don't know: + +* If one were to set B_PLAY_UL in d_tch_mode but not feed external UL input + into a_du_1 buffer at the needed time, what will happen? + +* Vice-versa, if one were to load a_du_1 and set its B_BLUD bit without setting + B_PLAY_UL in d_tch_mode, what will happen? + +* Can the frame stream fed into a_du_1 be encoded in DTX-enabled mode, including + SID frames? If this possibility is allowed, what magic bits would need to be + set where in order to get the correct behavior from the DSP's subsequent + burst-by-burst DTX logic? + +TCH UL substitution: implemented PoC +==================================== + +Back in 2016 we implemented a proof-of-concept TCH UL play feature in +FreeCalypso (only for TCH/FS and TCH/EFS), and the same PoC was retained +when the overall TCH tap facility was mainlined in late 2022. Having this +highly experimental (not fit for production use) TCH UL play code present in our +current production fw is deemed acceptable because this code will never be +invoked unless the user sends TCH_ULBITS_REQ packets to the running fw via +RVTMUX - and if you do send such packets (via tch play command in an fc-shell +session or by any other means), you are leaving the realm of production-approved +functionality and entering the realm of wild experimentation. 
+ +The PoC TCH UL play mechanism consists of a small buffer (holding up to 4 FR1 or +EFR frames) implemented in the ARM firmware; this buffer is filled by arriving +TCH_ULBITS_REQ packets and drained by the tchf_substitute_uplink() function +called from l1s_ctrl_tchtf(). Specifically, a flag named tch_ul_play_mode is +set when TCH_ULBITS_REQ input is received, telling l1s_ctrl_tchtf() to start +calling tchf_substitute_uplink() when (fn_report_mod13_mod4 == 3); the called +function drains an uplink frame from the ring buffer, writes it into the DSP's +a_du_1 buffer, sets B_PLAY_UL in d_tch_mode and sends a TCH_ULBITS_CONF packet +back to the host. If the ring buffer is empty, the function clears both +B_PLAY_UL and the firmware's tch_ul_play_mode flag, ending the special TCH UL +play mode. + +This PoC mechanism is meant to be exercised with tch play command in an +interactive fc-shell session: this command reads an ASCII line-based uplink data +file and sends it to the firmware frame by frame, paced by TCH_ULBITS_CONF +packets from the target. The input to this command is a line-based ASCII hex +file similar to the format written by tch record, but simplified: each line is +just the 33-byte frame to be sent (in TI DSP buffer format, following GSM 05.03 +interface 1), without any flags or status words or frame numbers.
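To make the relationship between the two file formats concrete, the following
Python sketch parses one line of a tch record capture (new 2022 format) and
produces the corresponding tch play input line. It is an illustration under
stated assumptions, not the fc-shell implementation: the function names are
invented here, and the exact spacing and letter case of the hex bytes within a
line are not spelled out above, so the parser simply joins whatever hex fields
sit between the status words and the frame number, and the output uses
upper-case hex without separators as a guess.

```python
# Illustrative sketch only; function names are hypothetical and the hex
# spacing/case conventions are assumptions, not the documented fc-shell format.

FRAME_BYTES = {"FR": 33, "EFR": 33, "HR": 15}

def parse_record_line(line):
    """Parse one tch record line: keyword, 3 status words, frame, FN."""
    fields = line.split()
    keyword = fields[0]
    if keyword not in FRAME_BYTES:
        raise ValueError("unknown keyword: " + keyword)
    status = [int(f, 16) for f in fields[1:4]]
    # Assumption: the frame bytes may be written as one run of hex digits
    # or as separate fields; joining the middle fields handles both cases.
    frame = bytes.fromhex("".join(fields[4:-1]))
    if len(frame) != FRAME_BYTES[keyword]:
        raise ValueError("wrong frame length for " + keyword)
    fn_mod_104 = int(fields[-1])      # frame number, written in decimal
    return keyword, status, frame, fn_mod_104

def to_play_line(line):
    """Turn a record line into a tch play line (33-byte frame only)."""
    keyword, status, frame, fn = parse_record_line(line)
    if keyword == "HR":
        # The PoC UL play feature covers FR1 and EFR only.
        raise ValueError("UL play is implemented for FR1 and EFR only")
    return frame.hex().upper()
```

A converter like this drops the status words and the frame number, which is
all that separates the two formats; it does not check status word 0, so frames
captured while the DSP was decoding FACCH (stale payload) would pass through
unfiltered unless the caller screens them first.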