# HG changeset patch
# User Mychaela Falconia <falcon@freecalypso.org>
# Date 1582656000 0
# Node ID 6824c4d5584890ab5495273754d7f8e26950b507
# Parent  97fe41e9242ad3f308b13b4d8aa5bd1cb1442ec2
doc/Loadtools-performance: program-m0 slowness documented

diff -r 97fe41e9242a -r 6824c4d55848 doc/Loadtools-performance
--- a/doc/Loadtools-performance	Tue Feb 25 07:01:28 2020 +0000
+++ b/doc/Loadtools-performance	Tue Feb 25 18:40:00 2020 +0000
@@ -19,7 +19,8 @@
 operations are implemented in our architecture.
 
 Here are some examples of expected flash programming times, all obtained on the
-Mother's Slackware 14.2 host system:
+Mother's Slackware 14.2 host system, using the flash program-bin command as
+opposed to program-m0 or program-srec:
 
 Flashing an Openmoko GTA02 modem (K5A3281CTM flash chip) with a new firmware
 image (2376448 bytes), using a PL2303 USB-serial cable at 115200 baud: 7m35s
@@ -46,14 +47,37 @@
   target (physics inside the flash chip);
 * The overhead of command-response exchanges between fc-loadtool and loadagent.
 
-XRAM loading via fc-xram is similar to flash programming in that fc-xram sends
-a separate ML command to loadagent for each S-record, thus the total XRAM image
-loading time is not only the serial bit transfer time, but also the overhead of
-command-response exchanges between fc-xram and loadagent.  The flash programming
-times listed above include flashing an FC Magnetite fw image into an FCDEV3B,
-which took 2m11s; doing an fc-xram load of the same FC Magnetite fw image (built
-as ramimage.srec) into the same FCDEV3B via the same FT2232D adapter at 812500
-baud takes 2m54s.
+If you are starting out with a firmware image in m0 format, converting it to
+binary with mokosrec2bin (as our FC Magnetite build system always does) and
+then flashing via program-bin is faster than flashing the original m0 image
+directly via program-m0.  Taking the last example above, flashing a Magnetite
+hybrid fw image into an FCDEV3B: the flashing operation via program-bin took
+2m11s, whereas flashing the same image via program-m0 took 3m54s.
+
+Flashing via program-bin is faster than program-m0 or program-srec because the
+program-bin operation uses a larger unit size internally.  fc-loadtool
+implements all flash programming operations by sending AMFW or INFW commands to
+loadagent; each AMFW or INFW command carries a string of 16-bit words to be
+programmed.  Our program-bin operation programs 256 bytes at a time, i.e.,
+sends one AMFW or INFW command per 256 bytes of image payload; our program-m0
+and program-srec operations program one S-record at a time, i.e., each S-record
+in the source image turns into its own AMFW or INFW command to loadagent.  In
+the case of m0 images produced by TI's hex470 post-linker, each S-record carries
+30 bytes of payload, thus flashing that m0 image directly with program-m0 will
+proceed in 30-byte units, whereas converting it to binary and then flashing with
+program-bin will proceed in 256-byte units.  The smaller unit size slows down
+the overall operation by increasing the overhead of command-response exchanges.
+
+XRAM loading via fc-xram is similar to flash program-m0 and program-srec in that
+fc-xram sends a separate ML command to loadagent for each S-record, thus the
+total XRAM image loading time is not only the serial bit transfer time, but also
+the overhead of command-response exchanges between fc-xram and loadagent.  Going
+back to the same FC Magnetite fw image that can be flashed into an FCDEV3B in
+2m11s via program-bin or in 3m54s via program-m0, doing an fc-xram load of that
+same fw image (built as ramimage.srec) into the same FCDEV3B via the same
+FT2232D adapter at 812500 baud takes 2m54s.  Thus we can see that fc-xram
+loading is faster than flash program-m0 or program-srec, but slower than flash
+program-bin.
 
 Why does XRAM loading take longer than flashing?  Shouldn't it be faster because
 the flash programming step on the target is replaced with a simple memcpy()?