changeset 17:3d65bdaf00da

MEMIF-wait-states article written
author Mychaela Falconia <falcon@freecalypso.org>
date Sun, 16 Jun 2019 23:30:33 +0000
parents 396d44c543e3
children 7ba5c951803c
files MEMIF-wait-states
diffstat 1 files changed, 166 insertions(+), 0 deletions(-)
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/MEMIF-wait-states	Sun Jun 16 23:30:33 2019 +0000
@@ -0,0 +1,166 @@
+The Calypso chip's MEMIF (ARM memory interface) block has a few configuration
+registers; most settings in these registers are quite straightforward, but the
+WS setting (number of wait states to be inserted for external memory access)
+requires some non-trivial analysis.
+
+Calypso MEMIF timings are described on pages 7 through 11 of this TI document:
+
+ftp://ftp.freecalypso.org/pub/GSM/Calypso/cal000_a.pdf
+
+When running on a Calypso C035 target, our TCS211 reference fw as well as most
+vendor firmwares we've examined run the ARM7 core at its maximum clock frequency
+of 52 MHz.  These same firmwares typically configure WS=3 for both flash and
+XRAM.  Most Calypso-based phones and modems have flash and RAM chips with 70 ns
+access time, and for a long time it seemed that this combination of ARM7 at
+52 MHz and WS=3 was OK for 70 ns memories: one ARM7 clock cycle at 52 MHz is
+19.23 ns, WS=3 means 4 cycles total per access (it's an N+1 arrangement),
+19.23 ns * 4 = 76.92 ns, thus it should be OK for 70 ns memories, right?  Not
+so fast: as the formula on cal000_a.pdf page 11 shows, and as can be seen from
+the timing diagrams, two other timing parameters (tda and tsu) also need to be
+factored in.  The sum of tda+tsu for 2.8V MEMIF as given in the only document
+we have available is 10.5 ns, thus if we run the ARM7 core at 52 MHz and set
+WS=3, the available safe window for memory access time is only
+76.92 ns - 10.5 ns = 66.42 ns, which is about 4 ns short of the 70 ns flash
+and RAM access time specs.
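+
+The arithmetic above can be sketched in C as follows.  This is only an
+illustration: it assumes the window is simply (WS+1) ARM7 clock periods minus
+tda+tsu, as described above, and the function names are made up for this
+example rather than taken from any TI or FreeCalypso source.
+
```c
/* Illustrative helpers only; names are hypothetical. */

/* Time window available to the memory chip, in ns, modelled as
 * (WS + 1) ARM7 clock periods minus the tda+tsu adjustment. */
double memif_access_window_ns(double arm_mhz, unsigned ws, double tda_tsu_ns)
{
    double cycle_ns = 1000.0 / arm_mhz;    /* 52 MHz gives 19.23 ns */

    return (ws + 1) * cycle_ns - tda_tsu_ns;
}

/* Smallest WS whose window covers the given memory access time.
 * No check is made against the maximum WS the register can hold. */
unsigned memif_min_ws(double arm_mhz, double mem_access_ns, double tda_tsu_ns)
{
    unsigned ws = 0;

    while (memif_access_window_ns(arm_mhz, ws, tda_tsu_ns) < mem_access_ns)
        ws++;
    return ws;
}
```
+
+With these definitions, memif_access_window_ns(52.0, 3, 10.5) comes out a
+little above 66.4 ns, and memif_min_ws(52.0, 70.0, 10.5) returns 4.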
+
+TI's reference fw setting of WS=3 in conjunction with ARM7 running at 52 MHz has
+made its way into the official firmwares of Openmoko devices and several Compal
+phones, including Mot C11x/12x, Mot C139/140 and Sony Ericsson J100.  At least
+in the case of Openmoko we know that the hardware features a flash chip with
+70 ns access time (the combined flash+RAM chip is K5A3281CTM-D755, with the
+suffix meaning 70 ns access time for flash and 55 ns for RAM), and in the case
+of Compal phones it is highly unlikely that they used flash chips faster than
+70 ns, thus we have strong evidence that the access time spec is being violated
+by about 4 ns.  It works in practice because the official specs are guaranteed
+worst-case numbers, but it is still wrong in the strict sense.
+
+We have strong evidence that this WS=3 setting comes from TI's mainline
+reference fw, as opposed to being customized by or for Openmoko or Compal.
+The evidence is in the following instruction sequence which appears verbatim-
+identical across Openmoko's, Mot C11x/12x and C139/140 firmware versions:
+
+      ldr	r1, =0xFFFFFB00
+      mov	r0, #0xA3
+      strh	r0, [r1, #0]
+      strh	r0, [r1, #2]
+      mov	r2, #0xA5
+      strh	r2, [r1, #4]
+      strh	r0, [r1, #6]
+      mov	r0, #0x80
+      strh	r0, [r1, #0xA]
+      mov	r0, #0xC0
+      strh	r0, [r1, #0xC]
+      mov	r0, #0x40
+      strh	r0, [r1, #8]
+
+(The SE J100 version differs only in the nCS2 configuration; apparently this
+SE J100 phone has its ringtone melody generator chip hooked up to nCS2, whereas
+on both OM's modem and Mot C11x/12x/139/140 this chip select is unused and
+unconnected, meaning that its setting is a dummy just like nCS3 and nCS4.)
+
+The above instruction sequence has been reconstructed into the following
+sequence of C macro calls:
+
+      MEM_INIT_CS0(3, MEM_DVS_16, MEM_WRITE_EN, 0);
+      MEM_INIT_CS1(3, MEM_DVS_16, MEM_WRITE_EN, 0);
+      MEM_INIT_CS2(5, MEM_DVS_16, MEM_WRITE_EN, 0);
+      MEM_INIT_CS3(3, MEM_DVS_16, MEM_WRITE_EN, 0);
+      MEM_INIT_CS4(0, MEM_DVS_8,  MEM_WRITE_EN, 0);
+
+      MEM_INIT_CS6(0, MEM_DVS_32, MEM_WRITE_EN, 0);
+      MEM_INIT_CS7(0, MEM_DVS_32, MEM_WRITE_DIS, 0);
+
+(The last two lines setting nCS6 and nCS7 don't need to be considered, as those
+are internal to the Calypso chip itself.)
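+
+The correspondence between the hex constants in the assembly and the macro
+arguments can be checked with a small decoder.  The bit layout used here (WS in
+the low bits, masked as 5 bits; device size code in bits 6:5; write enable in
+bit 7) is an assumption inferred by lining up the values above with the
+reconstructed macro calls, not something taken from TI documentation, and all
+names are hypothetical.
+
```c
/* Assumed field layout, inferred from the values in this article. */
struct memif_cs_conf {
    unsigned ws;    /* wait states, assumed bits 4:0 */
    unsigned dvs;   /* device size code, assumed bits 6:5:
                       0 = 8 bits, 1 = 16 bits, 2 = 32 bits */
    unsigned we;    /* write enable, assumed bit 7 */
};

/* Split a chip select configuration value into its assumed fields. */
struct memif_cs_conf memif_decode_cs(unsigned short reg)
{
    struct memif_cs_conf c;

    c.ws  = reg & 0x1F;
    c.dvs = (reg >> 5) & 0x3;
    c.we  = (reg >> 7) & 0x1;
    return c;
}
```
+
+Under this assumed layout, 0xA3 decodes to WS=3, 16 bits, write enabled, and
+0xA5 to WS=5 with the same other fields; 0x80, 0xC0 and 0x40 all decode to
+WS=0 with the size and write-enable combinations seen in the last three macro
+calls.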
+
+Thus we see that what appears to be TI's mainline code sets WS=3 for both nCS0
+and nCS1 (flash and XRAM, respectively), and then sets what appears to be a
+dummy config for the unused nCS2, nCS3 and nCS4.  I say "appears to be" because
+we have no original source with comments, only a COFF binary object which our
+reconstructed recompilable C code has been made to match.
+
+We may never know the truth unless we miraculously find a surviving copy of the
+original (not reconstructed from disassembly) init.c source from TCS211, but my
+(Mother Mychaela's) current working hypothesis is that the above MEMIF settings
+were originally made for the D-Sample board and never changed for Leonardo.
+The D-Sample board has flash on nCS0, main XRAM bank on nCS1, an additional
+XRAM bank (typically unused) on nCS2 and peripherals (principally the LCD) on
+nCS3.  Furthermore, the original D-Sample boards had Calypso C05 chips populated
+on them, and that chip version has no nCS4, only CS4 which is muxed with ADD22
+and used for the latter on the D-Sample.
+
+I further hypothesize that the above MEMIF settings were likely cast into code
+in the days of Calypso C05, and that the WS=3 setting was computed when the
+ARM7 core ran at 39 MHz.  The combination of ARM7 at 39 MHz, WS=3 and the same
+tda+tsu = 10.5 ns adjustment from the available cal000_a.pdf document
+(officially corresponding to Calypso C035 F751774) gives an access time window
+of 25.64 ns * 4 - 10.5 ns = 92 ns, which is very sensible.  The hypothesis
+further goes that TI later moved to Calypso C035 silicon and started running
+the ARM7 core at 52 MHz, but the WS setting was overlooked and never changed,
+and the 92 ns access time window turned into a mere 66 ns.  The latter works
+with 70 ns memories in practice despite being strictly incorrect (negative
+margin), and so the error escaped notice.
+
+Solution adopted for FreeCalypso
+================================
+
+Pirelli's firmware on the DP-L10 sets WS=4 for both flash and XRAM, and we have
+always used the same setting in FreeCalypso when running on this target.  When
+we made our FCDEV3B hardware using the same Spansion flash+RAM chip copied from
+the Pirelli DP-L10, we adopted the same WS=4 setting for our own FreeCalypso
+hardware family on the reasoning that it is needed for this chip.  But now we
+have a better theoretical foundation: the flash+RAM chip in question has 70 ns
+access time for both flash and pSRAM parts, same as most other flash and RAM
+chips used in most Calypso devices, and the WS=4 setting should really be used
+for all Calypso C035 targets (ARM7 at 52 MHz) with 70 ns memories.  Thus the
+new FreeCalypso strategy is to treat WS=4 as the generic default for Calypso
+C035 platforms unless explicitly overridden for specific targets, and to stop
+treating TI's reconstructed setup with WS=3 as canonical.
+
+When running on Openmoko GTA01/02, Mot C11x/12x, Mot C139/140 and SE J100
+targets (this specific list), we are going to keep WS=3 for nCS0 and nCS1 and
+the dummies for nCS2, nCS3 and nCS4 unchanged for now, i.e., run with exactly
+the same MEMIF settings as each manufacturer's respective original official fw.
+The reason is political: we are not the product manufacturer of record, and the
+error of negative design margin in the memory access timings is the liability
+of FIC/Openmoko and Compal/Motorola/SE, not us.  If we change from WS=3 to WS=4
+on these targets, our firmware will necessarily run a little slower, and given
+that the original official fw "works just fine", we may be accused of needlessly
+or artificially slowing down our aftermarket fw.  But when we market our own
+handset or modem products under the FreeCalypso trademark, then the full
+responsibility for the entire product (hw+fw) falls on us, hence we use the
+correct WS=4 setting.
+
+Interim WS setting during boot
+==============================
+
+There is one more complication to this picture.  The MEMIF settings discussed
+above for the operational phase with Calypso DPLL producing fast clocks are
+made in the Init_Target() function, but there is another interim setting
+established early on in assembly code, used prior to DPLL enabling, when the
+ARM7 core runs at unmultiplied 13 MHz or 26 MHz as fed to the Calypso by the
+board.  This interim setting is first set in bootloader.s, then again in int.s
+(with the definition residing in the included init.asm file), and the registers
+are set to 0x2A1, meaning WS=1 and 1 dummy cycle.
+
+Unlike the situation with the censored init.c source file, we have the original
+source for the assembly modules in question, and the only preprocessor
+conditionals found therein are based on BOARD and CHIPSET symbols.  Remember
+that TI's Leonardo board never got its own BOARD number, instead it shares
+BOARD=41 with D-Sample, yet the two boards have different Calypso clock inputs:
+13 MHz on the DS, 26 MHz on the Leonardo.  The C code in init.c (this part
+survived in the LoCosto source) uses a preprocessor conditional on the RF_FAM
+symbol to differentiate between 13 MHz and 26 MHz input clock arrangements, but
+there is no conditional of any such sort in the assembly code.  Thus it is my
+(Mother Mychaela's) educated guess that the WS=1 setting was chosen assuming a
+13 MHz clock, and when Leonardo came along with its 26 MHz clock, the problem
+spot was once again overlooked.
+
+WS=1 at 13 MHz is equivalent to WS=7 at 52 MHz, thus there is plenty of margin.
+But WS=1 at 26 MHz is equivalent to WS=3 at 52 MHz, once again putting us in
+the troubled territory of negative margin with 70 ns flash and RAM chips.
+Except that this case is even more difficult for firmware engineers to spot:
+Pirelli's fw still has the same 0x2A1 setting in its early boot path, i.e.,
+their fw engineers have changed WS=3 to WS=4 for the main body of the fw, but
+missed the early boot code.
+
+The solution adopted for FreeCalypso is to change the early MEMIF setting from
+0x2A1 to 0x2A2, i.e., set WS=2 for the interim boot phase.
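+
+The WS equivalences used in this section follow from the N+1 rule: WS wait
+states at one clock frequency last as long as a proportionally larger number
+of cycles at a faster clock.  A minimal sketch, with a hypothetical helper
+name, intended for to_mhz >= from_mhz and for frequency pairs where the result
+is a whole number of cycles:
+
```c
/* WS value at to_mhz that gives the same total access duration as
 * ws wait states at from_mhz, using the (WS + 1) cycles-per-access rule.
 * Integer division truncates, so this is only exact when
 * (ws + 1) * to_mhz is a multiple of from_mhz. */
unsigned memif_equiv_ws(unsigned ws, unsigned from_mhz, unsigned to_mhz)
{
    return (ws + 1) * to_mhz / from_mhz - 1;
}
```
+
+Here memif_equiv_ws(1, 13, 52) gives 7 and memif_equiv_ws(1, 26, 52) gives 3,
+matching the figures above, while the new interim setting of WS=2 at 26 MHz
+corresponds to memif_equiv_ws(2, 26, 52), which gives 5.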