# HG changeset patch
# User Mychaela Falconia
# Date 1582594403 0
# Node ID 39b74c39d914ee72fc155204c37866c6d2efdcf3
# Parent 02bdb2f366bc0026b8ead323b865f4dfa46d0c45
doc/Loadtools-performance: complete for now

diff -r 02bdb2f366bc -r 39b74c39d914 doc/Loadtools-performance
--- a/doc/Loadtools-performance Mon Feb 24 23:07:03 2020 +0000
+++ b/doc/Loadtools-performance Tue Feb 25 01:33:23 2020 +0000
@@ -18,23 +18,49 @@
 port hardware - this host system dependency exists because of the way these
 operations are implemented in our architecture.
 
-Here is one example of expected flash programming time: flashing a FreeCalypso
-Magnetite hybrid fw image (2378084 bytes) into an FCDEV3B board (S71PL129N
-flash chip) via an FT2232D adapter at 812500 baud takes 2m11s on the Mother's
-Slackware 14.2 system. This time is just for the flash program-bin operation,
-not counting the flash erase which must be done first. Flash erase times are
-determined entirely by physical processes inside the flash chip and are not
-affected by software design or the serial link: for each sector to be erased,
-fc-loadtool issues the sector erase command to the flash chip and then polls
-the chip for operation completion status; the polling is done over the serial
-link and thus may seem very slow, but the extra bit of latency added by the
-finite polling speed is still negligible compared to the time of the actual
-sector erase operation inside the flash chip. In contrast, the execution time
-of a flash program-bin operation is a sum of 3 components:
+Here are some examples of expected flash programming times, all obtained on the
+Mother's Slackware 14.2 host system:
+
+Flashing an Openmoko GTA02 modem (K5A3281CTM flash chip) with a new firmware
+image (2376448 bytes), using a PL2303 USB-serial cable at 115200 baud: 7m35s
+
+Flashing the same OM GTA02 modem with the same fw image, using a CP2102
+USB-serial cable at 812500 baud: 1m52s
+
+Flashing a Magnetite hybrid fw image (2378084 bytes) into an FCDEV3B board
+(S71PL129N flash chip) via an FT2232D adapter at 812500 baud: 2m11s
+
+These times are just for the flash program-bin operation, not counting the
+flash erase which must be done first. Flash erase times are determined
+entirely by physical processes inside the flash chip and are not affected by
+software design or the serial link: for each sector to be erased, fc-loadtool
+issues the sector erase command to the flash chip and then polls the chip for
+operation completion status; the polling is done over the serial link and thus
+may seem very slow, but the extra bit of latency added by the finite polling
+speed is still negligible compared to the time of the actual sector erase
+operation inside the flash chip. In contrast, the execution time of a flash
+program-bin operation is a sum of 3 components:
 
 * The time it takes for the bits to be transferred over the serial link;
 * The time it takes for the flash programming operation to complete on the
   target (physics inside the flash chip);
 * The overhead of command-response exchanges between fc-loadtool and loadagent.
 
-[To be continued]
+XRAM loading via fc-xram is similar to flash programming in that fc-xram sends
+a separate ML command to loadagent for each S-record, thus the total XRAM image
+loading time is not only the serial bit transfer time, but also the overhead of
+command-response exchanges between fc-xram and loadagent. The flash programming
+times listed above include flashing an FC Magnetite fw image into an FCDEV3B,
+which took 2m11s; doing an fc-xram load of the same FC Magnetite fw image (built
+as ramimage.srec) into the same FCDEV3B via the same FT2232D adapter at 812500
+baud takes 2m54s.
+
+Why does XRAM loading take longer than flashing? Shouldn't it be faster because
+the flash programming step on the target is replaced with a simple memcpy()?
+Answer: fc-xram is currently slower than flash program-bin because the latter
+sends 256 bytes at a time to loadagent, whereas fc-xram sends one S-record at a
+time; the division of the image into S-records is determined by the tool that
+generates the SREC image, but TI's hex470 post-linker generates images with 30
+bytes of payload per S-record. Having the operation proceed in smaller chunks
+increases the overhead of command-response exchanges and thus increases the
+overall time.
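
Reviewer's sketch (not part of the changeset): the timing argument in the added
text can be checked with some rough arithmetic. The Python below is only an
illustration under stated assumptions, none of which come from the patch: 8N1
framing (10 bits on the wire per byte), one wire byte per image byte, and an
XRAM payload equal in size to the flash image. The helper names wire_seconds,
exchanges and report are hypothetical. Under those assumptions it gives a lower
bound on the serial transfer component and counts the command-response
exchanges for 256-byte chunks versus 30-byte S-records.

```python
# Rough arithmetic on the figures quoted in the patch text.
# Assumptions (not from the source): 8N1 framing, one wire byte per image
# byte, and an XRAM payload the same size as the flash image.

BITS_PER_BYTE_8N1 = 10  # assumed: 1 start bit + 8 data bits + 1 stop bit


def wire_seconds(image_bytes, baud):
    """Lower bound on the time to clock image_bytes over the serial link."""
    return image_bytes * BITS_PER_BYTE_8N1 / baud


def exchanges(image_bytes, chunk_bytes):
    """Number of command-response exchanges when sending chunk_bytes at a time."""
    return -(-image_bytes // chunk_bytes)  # ceiling division


def to_seconds(minutes, seconds):
    return minutes * 60 + seconds


# (description, image bytes, baud, measured seconds), all taken from the text
MEASUREMENTS = [
    ("GTA02 via PL2303", 2376448, 115200, to_seconds(7, 35)),
    ("GTA02 via CP2102", 2376448, 812500, to_seconds(1, 52)),
    ("FCDEV3B via FT2232D", 2378084, 812500, to_seconds(2, 11)),
]


def report():
    """Return (name, wire lower bound, measured minus lower bound) per case."""
    rows = []
    for name, size, baud, measured in MEASUREMENTS:
        floor = wire_seconds(size, baud)
        rows.append((name, floor, measured - floor))
    return rows


if __name__ == "__main__":
    for name, floor, rest in report():
        print(f"{name}: wire floor {floor:.1f} s, remainder {rest:.1f} s")
    print("exchanges at 256 bytes:", exchanges(2378084, 256))
    print("exchanges at 30 bytes: ", exchanges(2378084, 30))
```

With the 2378084-byte image, 256-byte chunks need 9290 exchanges while 30-byte
S-records need 79270, which is consistent with the explanation that per-exchange
overhead, not the memcpy() on the target, dominates the fc-xram time. The
remainder column lumps together everything other than raw bit transfer and
should not be read as a measurement of any single component.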