comparison doc/Loadtools-performance @ 680:89ed8b374bc0
doc/Loadtools-performance: finished updates for fc-host-tools-r13
| author | Mychaela Falconia <falcon@freecalypso.org> |
|---|---|
| date | Mon, 09 Mar 2020 04:40:13 +0000 |
| parents | f2a023c20653 |
| children | 0815661d6e3e |
| 679:be641fa7b68d | 680:89ed8b374bc0 |
|---|---|
| 79 flashing. | 79 flashing. |
| 80 | 80 |
| 81 Notice the difference in flash programming times between GTA02 and FCDEV3B: the | 81 Notice the difference in flash programming times between GTA02 and FCDEV3B: the |
| 82 fw image size is almost exactly the same, any difference in latency between | 82 fw image size is almost exactly the same, any difference in latency between |
| 83 CP2102 and FT2232D is less likely to produce such significant time difference | 83 CP2102 and FT2232D is less likely to produce such significant time difference |
| 84 given our current 2048 byte transfer block size, thus the difference in physical | 84 given our current 2048 byte transfer block size (in fact fc-xram transfer times |
| 85 flash program operation times between K5A3281CTM and S71PL129N flash chips seems | 85 suggest that FT2232D is faster), thus the difference in physical flash program |
| 86 to be the most likely explanation. | 86 operation times between K5A3281CTM and S71PL129N flash chips seems to be the |
| | 87 most likely explanation. |
| | 88 |
| | 89 It also needs to be noted that in the current version of fc-loadtool there is |
| | 90 no difference in performance between flash program-bin, program-m0 and |
| | 91 program-srec operations: they all use the same binary protocol with 2048 byte |
| | 92 transfer block size. There is no coupling between source S-records and flash |
| | 93 programming operation blocks (2048-byte units) in the case of flash program-m0 |
| | 94 and program-srec: the new implementation of these commands prereads the entire |
| | 95 S-record image as a separate preparatory step on the host side, the bits to be |
| | 96 programmed are saved in a temporary binary file (automatically deleted |
| | 97 afterward), and the actual flash programming operation proceeds from this |
| | 98 internal binary source - but it knows about any discontiguous program regions |
| | 99 and skips the gaps properly. |
| 87 | 100 |
| 88 XRAM loading via fc-xram | 101 XRAM loading via fc-xram |
| 89 ======================== | 102 ======================== |
| 90 | 103 |
| 91 Our current fc-xram implementation is similar to the old 2013 implementation of | 104 The new version of fc-xram as of fc-host-tools-r13 is dramatically faster than |
| 92 flash program-m0 and program-srec commands in that fc-xram sends a separate ML | 105 the original implementation from 2013, using a new binary transfer protocol. |
| 93 command to loadagent for each S-record, thus the total XRAM image loading time | 106 The speed increase comes from not only switching from hex to binary, but even |
| 94 is not only the serial bit transfer time, but also the overhead of command- | 107 more so from eliminating the command-response turnaround time on every S3 |
| 95 response exchanges between fc-xram and loadagent. The flash programming times | 108 record. The new XRAM loading times obtained on the Mother's Slackware 14.2 |
| 96 listed above include flashing an FC Magnetite fw image into an FCDEV3B, which | 109 host system are: |
| 97 took 2m11s; doing an fc-xram load of the same FC Magnetite fw image (built as | |
| 98 ramimage.srec) into the same FCDEV3B via the same FT2232D adapter at 812500 | |
| 99 baud takes 2m54s. | |
| 100 | 110 |
| 101 Why does XRAM loading take longer than flashing? Shouldn't it be faster because | 111 Pirelli DP-L10 with built-in CP2102 USB-serial chip, 812500 baud, loading |
| 102 the flash programming step on the target is replaced with a simple memcpy()? | 112 hybrid-vpm fw build, 49969 S3 records: 0m27s |
| 103 Answer: fc-xram is currently slower than flash program operations because the | 113 |
| 104 latter send 256 bytes at a time to loadagent, whereas fc-xram sends one | 114 FCDEV3B interfaced via FT2232D adapter, 812500 baud, loading hybrid fw build, |
| 105 S-record at a time; the division of the image into S-records is determined by | 115 78875 S3 records: 0m35s |
| 106 the tool that generates the SREC image, but TI's hex470 post-linker generates | 116 |
| 107 images with 30 bytes of payload per S-record. Having the operation proceed in | 117 With the previous version of fc-xram these two loads took 1m40s and 2m54s, |
| 108 smaller chunks increases the overhead of command-response exchanges and thus | 118 respectively. With the current version of loadtools XRAM loading is faster |
| 109 increases the overall time. | 119 than flash programming for the same fw image as one would naturally expect (the |
| | 120 flash programming step on the target is replaced with a simple memcpy() |
| | 121 operation), but in the previous version XRAM loading was slower because of |
| | 122 massive command-response exchange overhead: there was a command-response |
| | 123 turnaround time incurred for every S3 record, typically carrying only 30 bytes |
| | 124 of payload. |
| 110 | 125 |
| 111 Additional complication with FTDI adapters and newer Linux kernel versions | 126 Additional complication with FTDI adapters and newer Linux kernel versions |
| 112 ========================================================================== | 127 ========================================================================== |
| 113 | 128 |
| 114 If you are using an FTDI adapter and a Linux kernel version newer than early | 129 If you are using an FTDI adapter and a Linux kernel version newer than early |
| 115 2017 (the change was introduced between 4.10 and 4.11), then you have one | 130 2017 (the change was introduced between 4.10 and 4.11), then you have one |
| 116 additional complication: a change was made to the ftdi_sio driver in the Linux | 131 additional complication: a change was made to the ftdi_sio driver in the Linux |
| 117 kernel that makes many loadtools operations (basically everything other than | 132 kernel that made many loadtools operations (basically everything other than |
| 118 flash dumps which are entirely target-driven) unbearably slow (much slower than | 133 flash dumps which are entirely target-driven) unbearably slow, at least with |
| 119 the Slackware 14.2 reference times given above) unless you execute a special | 134 previous versions of loadtools that made many more command-response exchanges |
| 120 setserial command first. After you plug in your FTDI-based USB-serial cable or | 135 with loadagent for smaller transfer units and thus were much more sensitive to |
| 121 connect the USB cable between your PC or laptop and your FTDI adapter board, | 136 host system latency on these exchanges. We do not yet know if this FTDI |
| 122 causing the corresponding ttyUSBx device to appear, execute the following | 137 latency timer issue still has a significant negative impact or not with current |
| 123 command: | 138 loadtools, but if it does, the solution is to run a special setserial command. |
| | 139 After you plug in your FTDI-based USB-serial cable or connect the USB cable |
| | 140 between your PC or laptop and your FTDI adapter board, causing the |
| | 141 corresponding ttyUSBx device to appear, execute the following command: |
| 124 | 142 |
| 125 setserial /dev/ttyUSBx low_latency | 143 setserial /dev/ttyUSBx low_latency |
| 126 | 144 |
| 127 (Obviously change ttyUSBx to your actual ttyUSB number.) Execute this | 145 (Obviously change ttyUSBx to your actual ttyUSB number.) Execute this |
| 128 setserial command before running fc-loadtool or fc-xram, and then hopefully you | 146 setserial command before running fc-loadtool or fc-xram, and then hopefully you |
| 129 should get performance that is comparable to what I get on classic Slackware. | 147 should get performance that is comparable to what I get on classic Slackware. |
| 130 I say "hopefully" because I am not able to test it myself - I refuse to run any | 148 I say "hopefully" because I am not able to test it myself - I refuse to run any |
| 131 OS that can be categorized as "modern" - but field reports of performance on | 149 OS that can be categorized as "modern" - but field reports of performance on |
| 132 non-Slackware systems running newer Linux kernels (4.11 or later) are welcome. | 150 non-Slackware systems running newer Linux kernels (4.11 or later) are welcome, |
| | 151 both with and without the low_latency setting. Please be sure to include your |
| | 152 Linux kernel version and your USB-serial adapter type in your report! |
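
The XRAM loading times quoted in the new text can be sanity-checked against the raw serial line rate. The following is only a rough sketch under two assumptions that are not stated in the document: 8N1 framing (10 bit times per byte on the wire), and every S3 record carrying exactly 30 payload bytes (the text only says "typically"). The function names are made up for this illustration, and the result is a payload-only lower bound that ignores addresses, framing and any other protocol overhead.

```python
BAUD = 812500            # serial rate quoted in the text
BITS_PER_BYTE = 10       # assumption: 8N1 framing (start + 8 data + stop)
PAYLOAD_PER_S3 = 30      # assumption: every S3 record carries 30 bytes


def wire_seconds(records, payload=PAYLOAD_PER_S3, baud=BAUD):
    """Payload-only transfer time: a lower bound, protocol overhead ignored."""
    return records * payload * BITS_PER_BYTE / baud


def per_record_ms(total_seconds, records):
    """Average wall-clock time spent per S3 record, in milliseconds."""
    return total_seconds / records * 1000


# (label, S3 record count, previous load time in s, new load time in s),
# all taken from the text: 1m40s -> 27 s and 2m54s -> 35 s.
loads = [
    ("Pirelli DP-L10 / CP2102", 49969, 100, 27),
    ("FCDEV3B / FT2232D", 78875, 174, 35),
]

for label, records, old_s, new_s in loads:
    floor = wire_seconds(records)
    print(f"{label}: payload-only floor {floor:.1f} s, "
          f"old {per_record_ms(old_s, records):.2f} ms/record, "
          f"new {per_record_ms(new_s, records):.2f} ms/record, "
          f"new/floor ratio {new_s / floor:.2f}")
```

Under these assumptions the previous fc-xram version averaged roughly 2 ms per S3 record on both targets, while the new times land within about 1.2x to 1.5x of the payload-only floor, which is consistent with the per-record turnaround having been the dominant cost.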

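For anyone sending a field report, the details requested above (kernel version, adapter type, whether low_latency was set) can be collected in one line. This is a hypothetical helper, not part of loadtools. It assumes the ftdi_sio driver exposes the current latency timer (in milliseconds) as a `latency_timer` attribute under `/sys/bus/usb-serial/devices/ttyUSBx/`; if that file is absent on your system, the value is simply reported as unknown. The adapter name is whatever you pass in, since the script makes no attempt to detect it.

```python
import platform
from pathlib import Path

# Assumption: ftdi_sio publishes the latency timer here on Linux.
DEFAULT_SYSFS_ROOT = "/sys/bus/usb-serial/devices"


def read_latency_timer(tty, sysfs_root=DEFAULT_SYSFS_ROOT):
    """Return the latency timer in ms for e.g. 'ttyUSB0', or None if unreadable."""
    path = Path(sysfs_root) / tty / "latency_timer"
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def report_line(tty, adapter, sysfs_root=DEFAULT_SYSFS_ROOT):
    """Build a one-line summary suitable for pasting into a field report."""
    timer = read_latency_timer(tty, sysfs_root)
    timer_text = f"{timer} ms" if timer is not None else "unknown"
    return (f"kernel={platform.release()} adapter={adapter} "
            f"tty={tty} latency_timer={timer_text}")


if __name__ == "__main__":
    # Hypothetical values: substitute your own tty name and adapter type.
    print(report_line("ttyUSB0", "FT2232D"))
```

Running it once before and once after the setserial command shows whether the low_latency setting actually changed the timer on your kernel, which is useful context for the timing numbers in a report.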