# HG changeset patch
# User Mychaela Falconia <falcon@freecalypso.org>
# Date 1582594403 0
# Node ID 39b74c39d914ee72fc155204c37866c6d2efdcf3
# Parent  02bdb2f366bc0026b8ead323b865f4dfa46d0c45
doc/Loadtools-performance: complete for now

diff -r 02bdb2f366bc -r 39b74c39d914 doc/Loadtools-performance
--- a/doc/Loadtools-performance	Mon Feb 24 23:07:03 2020 +0000
+++ b/doc/Loadtools-performance	Tue Feb 25 01:33:23 2020 +0000
@@ -18,23 +18,49 @@
 port hardware - this host system dependency exists because of the way these
 operations are implemented in our architecture.
 
-Here is one example of expected flash programming time: flashing a FreeCalypso
-Magnetite hybrid fw image (2378084 bytes) into an FCDEV3B board (S71PL129N
-flash chip) via an FT2232D adapter at 812500 baud takes 2m11s on the Mother's
-Slackware 14.2 system.  This time is just for the flash program-bin operation,
-not counting the flash erase which must be done first.  Flash erase times are
-determined entirely by physical processes inside the flash chip and are not
-affected by software design or the serial link: for each sector to be erased,
-fc-loadtool issues the sector erase command to the flash chip and then polls
-the chip for operation completion status; the polling is done over the serial
-link and thus may seem very slow, but the extra bit of latency added by the
-finite polling speed is still negligible compared to the time of the actual
-sector erase operation inside the flash chip.  In contrast, the execution time
-of a flash program-bin operation is a sum of 3 components:
+Here are some examples of expected flash programming times, all obtained on the
+Mother's Slackware 14.2 host system:
+
+Flashing an Openmoko GTA02 modem (K5A3281CTM flash chip) with a new firmware
+image (2376448 bytes), using a PL2303 USB-serial cable at 115200 baud: 7m35s
+
+Flashing the same OM GTA02 modem with the same fw image, using a CP2102
+USB-serial cable at 812500 baud: 1m52s
+
+Flashing a Magnetite hybrid fw image (2378084 bytes) into an FCDEV3B board
+(S71PL129N flash chip) via an FT2232D adapter at 812500 baud: 2m11s
+
+These times are just for the flash program-bin operation, not counting the
+flash erase which must be done first.  Flash erase times are determined
+entirely by physical processes inside the flash chip and are not affected by
+software design or the serial link: for each sector to be erased, fc-loadtool
+issues the sector erase command to the flash chip and then polls the chip for
+operation completion status; the polling is done over the serial link and thus
+may seem very slow, but the extra bit of latency added by the finite polling
+speed is still negligible compared to the time of the actual sector erase
+operation inside the flash chip.  In contrast, the execution time of a flash
+program-bin operation is the sum of three components:
 
 * The time it takes for the bits to be transferred over the serial link;
 * The time it takes for the flash programming operation to complete on the
   target (physics inside the flash chip);
 * The overhead of command-response exchanges between fc-loadtool and loadagent.
 
-[To be continued]
+XRAM loading via fc-xram is similar to flash programming in that fc-xram sends
+a separate ML command to loadagent for each S-record, so the total XRAM image
+loading time includes not only the serial bit transfer time, but also the
+overhead of command-response exchanges between fc-xram and loadagent.  The
+flash programming times listed above include flashing an FC Magnetite fw image
+into an FCDEV3B, which took 2m11s; an fc-xram load of the same FC Magnetite fw
+image (built as ramimage.srec) into the same FCDEV3B via the same FT2232D
+adapter at 812500 baud takes 2m54s.
+
+Why does XRAM loading take longer than flashing?  Shouldn't it be faster because
+the flash programming step on the target is replaced with a simple memcpy()?
+Answer: fc-xram is currently slower than flash program-bin because the latter
+sends 256 bytes at a time to loadagent, whereas fc-xram sends one S-record at a
+time.  The division of the image into S-records is determined by the tool that
+generates the SREC image, and TI's hex470 post-linker generates images with 30
+bytes of payload per S-record.  Proceeding in 30-byte rather than 256-byte
+chunks means roughly 8.5 times as many command-response exchanges for the same
+amount of data, and this added overhead increases the overall time.