# HG changeset patch
# User Mychaela Falconia
# Date 1672172635 0
# Node ID 69061d044f05171ba9e04a373a6f1976809581e1
# Parent 8a45cd92e3c36680c72076623058a472606bbfd3
Voice-memo-feature: new article

diff -r 8a45cd92e3c3 -r 69061d044f05 Voice-memo-feature
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/Voice-memo-feature Tue Dec 27 20:23:55 2022 +0000
@@ -0,0 +1,217 @@
+The full Calypso hw+fw solution as delivered by TI (the relevant components here
+are the DSP, the official L1 code and RiViera Audio Service) implements an
+interesting feature called voice memos. It is actually two paired features:
+
+* Voice memo recording: in almost all states of the MS (no GSM network at all,
+  or idle mode, or in an active call) it is possible to activate an extra
+  instance of GSM 06.10 encoder that takes input from the microphone (and also
+  from the active call downlink if invoked during a speech call) and writes its
+  output into an otherwise-unused DSP buffer. The combination of L1 and RiViera
+  Audio Service then writes this speech recording into a file in FFS.
+
+* Voice memo playback: voice memo files recorded with the just-described VM
+  record feature can be played into the phone's speaker output. The operation
+  of playing a previously recorded voice memo is conceptually no different from
+  playing tones or melodies, and can likewise be done in any state: with no GSM
+  network at all, in idle mode, or in an active call.
+
+VM recording and VM playback cannot be active at the same time: they use the
+same DSP buffer, and likely other mutually exclusive DSP resources too.
+Furthermore, the same DSP buffer that is used for these VM features is also
+used for TCH UL substitution debug/test feature described in the TCH-tap-modes
+article - therefore, all 3 features (VM record, VM play and TCH UL play) need
+to be treated as mutually exclusive in time. However, aside from this mutual
+exclusion, it is very remarkable that VM recording or VM playback can be invoked
+during an active speech call (which can use any codec!), and the extra instance
+of FR1 encoder or decoder (always FR1) invoked by VM features is essentially
+independent from the main TCH encoder and the main TCH decoder, all of which
+run simultaneously. It is worth noting that all newer GSM speech codecs (HR1,
+EFR and AMR) are much more computationally intensive than FR1, thus given that
+the DSP has the necessary horsepower to run any one of those "heavy" codecs, it
+probably isn't too much extra work to also run a simultaneous instance of
+unidirectional (encoder only or decoder only) FR1.
+
+The entire voice memo facility was already fully implemented in the TCS211 code
+delivery from TI, but prior to FreeCalypso there was no way to exercise it. In
+order to exercise VM functionality in TCS211, one needs to invoke these RiViera
+Audio Service API functions:
+
+audio_vm_record_start()
+audio_vm_record_stop()
+audio_vm_play_start()
+audio_vm_play_stop()
+
+In FreeCalypso we've added some simple AT commands that call the just-listed API
+functions, and the facility that has been there all along is now accessible to
+play - it is the same situation as with Melody E1.
+
+FreeCalypso AT commands for voice memo testing
+==============================================
+
+AT@VMR="/pathname",dur,dtx
+
+This command initiates VM recording. The FFS pathname into which the recording
+should be written must be given as a quoted string (and as a reminder, all FFS
+pathnames must be absolute - there are no current directories in the firmware
+architecture), and there is a second required argument that sets the maximum
+size of the recording. The duration argument is a decimal integer, and it is
+reckoned in 1000-word units: if you specify duration as 1, the maximum recording
+size is 1000 words (2000 bytes), if you specify duration as 2, the maximum
+recording size is 2000 words (4000 bytes), and so forth. If you record with DTX
+disabled, each block of 1000 words corresponds to 1 second in time (every 20 ms
+frame turns into a block of 20 words), thus with DTX disabled the duration
+argument becomes the actual duration in seconds. However, if you record with
+DTX enabled, then periods of silence will be written in a compressed format
+described later in this article, and the time duration of the recording will
+depend on how much silence there is.
+
+The dtx argument is 1 to enable DTX or 0 to disable it; the default is DTX
+disabled. The employed FR1 DTX algorithm appears to be the same as would be
+used for TCH/FS uplink, except that an "artificial" (there is no SACCH with
+independent-of-GSM voice memos) TAF position is generated on every 16th audio
+frame, i.e., every 320 ms. (Note the shortening of this SID interval compared
+to official TCH, where it is 24 frames or 480 ms.)
+
+AT@VMRS
+
+This command stops any VM recording in progress, but it is rarely needed - the
+recording will stop automatically when the size limit is reached.
+
+AT@VMP="/pathname"
+
+This command initiates playback of the VM recording contained in the named file
+in FFS. The FFS pathname is the only argument.
+
+AT@VMPS
+
+This command stops any VM playback in progress, but it is rarely needed - the
+playback will stop automatically when the end-marker is read from the file.
+
+Voice memo file format
+======================
+
+Using fc-fsio, you can read out voice memo files written by the VM record
+facility, and you can likewise construct your own memo files externally, upload
+them into FC device FFS and then play them via the VM play facility. The format
+of these files is determined by TI's firmware stack (RV Audio Service on top of
+L1 on top of the DSP), but is fundamentally based on a DSP buffer that is just
+like those used for TCH. The companion TCH-tap-modes article describes the
+format of the DSP buffer from which TCH DL bits can be read out; in the present
+article we are going to cover the differences specific to the voice memo
+facility.
+
+When VM recording is done with DTX disabled, every 20 ms speech frame turns into
+a block of 40 bytes in the memo file. This block of 40 bytes is produced from
+20 16-bit words in the DSP buffer, each word turned into two bytes in LE order
+by the ARM part of Calypso. The DSP buffer used for the VM facility has the
+same overall format as the one used for TCH DL, described in the TCH-tap-modes
+article - 3 status or header words followed by 17 words of payload, with the
+latter words carrying a 260-bit FR1 codec frame in the bit order of GSM 05.03
+interface 1. As explained in the TCH-tap-modes article, speech codec payload
+words are filled in the msb-to-lsb direction by the DSP, thus the natural byte-
+oriented representation would be big-endian - but because the little-endian ARM
+core sits in between the DSP and the on-media file format, the byte order in
+voice memo files comes out "wrong". Oh well - it is what it is.
+
+Of the 3 header words that precede every 20 ms speech frame, words 1 and 2
+appear to be dummies - they have meaning related to the channel decoder block
+in the case of TCH DL, but in the case of isolated-from-GSM voice memos, there
+does not seem to be any meaning. However, header or status word 0, consisting
+of bit flags, is still important, but the bit flags for the VM facility are
+different from those of TCH DL.
+
+When VM recording is done with DTX disabled, status word 0 is observed to always
+equal 0xC400 on every frame. However, when DTX is enabled, the following bits
+are seen in status word 0:
+
+* Bit 15 will be set if this frame needs to be saved in its entirety, or cleared
+  if it is to be skipped. When VM recording code in L1S sees that the DSP has
+  delivered a frame with this status bit cleared, it will save only this status
+  word 0, i.e., 2 bytes will be written into the memo file instead of 40 bytes
+  for this 20 ms frame. On VM playback, the code likewise checks this bit to
+  see how many words need to be read from the file, so synchronization is
+  maintained.
+
+* Bit 14 appears to be the SP flag of GSM 06.31 section 5.1: set when a speech
+  frame has been generated, or cleared when a SID frame has been generated
+  instead.
+
+* Bit 11 is a TAF-like flag: when DTX is enabled, this bit is set in every 16th
+  frame generated by the DSP in the VM recording session, otherwise it is
+  cleared.
+
+* Bit 10 will always be set in every status word 0 that gets written to voice
+  memo files: this bit is set by the DSP when it has finished encoding a 20 ms
+  audio frame and is checked by L1S on every TDMA frame, serving as a
+  synchronization mechanism telling L1S when it needs to copy a speech frame
+  from the DSP to the memo file.
+
+When VM recording is done with DTX enabled, the recorded memo file will consist
+of speech frames (header word 0xC400 or 0xCC00), SID frames (header word 0x8400
+or 0x8C00) and skipped frames consisting of only the header word 0x0400, with
+the remaining words omitted. There will always be a present (not skipped) frame
+in every 16th position (0xCC00 or 0x8C00), thus no 0x0C00 frames are ever seen.
+
+Every voice memo binary file ends with a 0xFBFF end-marker word; this end-marker
+is needed because TCS211 fw architecture exhibits a separation between the
+actual data reading and writing processes in L1S and the FFS read/write
+interface provided by RiViera Audio Service, and because of this separation the
+operational code in L1S can't "see" an EOF condition at the file system level.
+
+FreeCalypso tools for decoding voice memo files
+===============================================
+
+If you have recorded a voice memo with AT@VMR and then read it out with fc-fsio,
+you can use additional FC tools to analyze it. The following tools are
+available, split between FC host tools and GSM codec libs & utilities packages:
+
+* fc-vm2hex converts a binary VM recording into ASCII hex format, similar to
+  the old (2016) TCH DL recording format before it was extended in late 2022.
+  Every fully-written frame is emitted in the hex output as 3 space-separated
+  hex status words followed by a block of 66 hex digits giving the FR1 codec
+  frame in the unchanged bit order of TI's DSP, and every skipped frame (one
+  for which only status word 0 was written into the memo file) is emitted in
+  the hex output as just that one word.
+
+* gsmfr-dlcap-parse utility, originally written for parsing TCH DL capture
+  files, accepts TCH DL recording files in both old and new formats, and it also
+  accepts the output from fc-vm2hex as its input. The combination of fc-vm2hex
+  and gsmfr-dlcap-parse allows a developer or tinkerer to do thorough human
+  analysis of TCS211 VM recordings in both DTX-disabled and DTX-enabled modes.
+
+* There will soon be a new fc-vm2gsmx utility that will read binary VM recording
+  files (as you would read out with fc-fsio) and convert them into extended-
+  libgsm (gsmx) format defined in our GSM codec libraries & utilities package.
+  This gsmx format is an extension of the classic libgsm (GSM 06.10) format,
+  adding the possibility of SID frames and BFI markers (frame gaps) in addition
+  to regular speech frames, thus it can represent the content of a voice memo
+  recording made in DTX mode. These gsmx files can then be decoded into
+  playable WAV with our gsmfr-decode utility.
+
+FreeCalypso tools for external generation of voice memo files
+=============================================================
+
+Using FreeCalypso tools, you can produce an external speech recording in GSM
+06.10 FR1 codec format, convert it into TCS211 VM format, upload it into FC
+device FFS with fc-fsio, and then play these externally-produced voice memos
+with AT@VMP. The steps are as follows:
+
+1) You can use gsmfr-encode to FR1-encode a speech sample from WAV into classic
+   .gsm format, or gsmfr-encode-r if the source is raw BE instead of WAV.
+   Alternatively, you can use any other off-the-shelf software that can encode
+   FR1 and write libgsm format; SoX shipped with Slackware includes the
+   necessary support.
+
+2) fc-gsm2vm converts a .gsm recording into non-DTX TCS211 VM format.
+
+At the present time we don't have any tools for producing external DTX-enabled
+VM recordings: the main limitation is that at least to this Mother's knowledge,
+the published source software community does not currently possess a GSM 06.10
+encoding library that has been extended with VAD and DTX functions. There is
+classic libgsm from 1990s, used by everyone in the FOSS community who needs a
+GSM 06.10 encoder or decoder, but it doesn't do DTX; we (FreeCalypso and
+Themyscira Wireless) have produced our own libgsmfrp front-end that implements
+Rx DTX handler functions (that's how we can properly decode FR1 streams that
+contain SIDs and/or missing frames), but it doesn't help with DTX encoding.
+Therefore, our ability to produce TCS211-compatible VM recordings externally is
+currently limited to non-DTX mode.
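
The on-media layout described in the "Voice memo file format" section of the
article can be walked with a short Python sketch. This is only an illustration
and is not one of the FreeCalypso tools: the names decode_status and parse_vm
are made up for this example, and the code assumes that the 0xFBFF end-marker
is stored as a little-endian 16-bit word like every other word in the file.
Payload words are returned as-is, with no attempt to reorder the FR1 bits.

```python
import struct

END_MARKER = 0xFBFF   # end-of-recording word named in the article
REST_WORDS = 19       # status words 1-2 plus 17 payload words


def decode_status(word0):
    """Split status word 0 into the four flags the article describes."""
    return {
        "saved": bool(word0 & 0x8000),  # bit 15: full 20-word frame follows
        "sp":    bool(word0 & 0x4000),  # bit 14: speech (set) or SID (clear)
        "taf":   bool(word0 & 0x0800),  # bit 11: TAF-like, every 16th frame
        "ready": bool(word0 & 0x0400),  # bit 10: set in every written frame
    }


def parse_vm(data):
    """Return a list of (kind, status_word0, payload_words) tuples.

    kind is "speech", "sid" or "skipped"; payload_words is a tuple of
    17 integers for saved frames and an empty tuple for skipped ones.
    """
    frames = []
    pos = 0
    while True:
        if pos + 2 > len(data):
            raise ValueError("ran out of data before the end-marker")
        (word0,) = struct.unpack_from("<H", data, pos)
        pos += 2
        if word0 == END_MARKER:
            return frames
        flags = decode_status(word0)
        if not flags["saved"]:
            # Only status word 0 was written for this 20 ms frame.
            frames.append(("skipped", word0, ()))
            continue
        if pos + 2 * REST_WORDS > len(data):
            raise ValueError("truncated frame")
        rest = struct.unpack_from("<%dH" % REST_WORDS, data, pos)
        pos += 2 * REST_WORDS
        kind = "speech" if flags["sp"] else "sid"
        frames.append((kind, word0, rest[2:]))  # drop dummy words 1 and 2


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as f:
        for kind, word0, payload in parse_vm(f.read()):
            print("%04X %s" % (word0, kind))
```

Run against a memo file read out with fc-fsio, the script prints one line per
20 ms frame position, which makes it easy to see where skipped frames occur in
a DTX-enabled recording.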