I am trying to make two NVIDIA Jetson AGX boards communicate over Ethernet with as low latency as possible using UDP. The default request-response latency measured by netperf is around 200 microseconds. I am looking for ways to reduce this, and all suggestions are welcome.
Looking at the network stack, I came across the fact that the Jetson uses little-endian byte order while the network byte order is big-endian. So, for a request-response scenario, byte-order conversion needs to be done four times:
Host sender: LE → BE → send to client
Client receiver: BE → LE
Client sender: LE → BE → send to host
Host receiver: BE → LE
Of course this is a very simplified picture, and I have omitted all parts of the stack unrelated to byte ordering. My question is: does this fourfold conversion impact the latency significantly? If one were to use systems with big-endian byte ordering, with everything else remaining the same, would that reduce network latency by any measurable amount?
In the most simplistic terms, the end-to-end latency seen by a netperf TCP_RR test is: TimeInNetperfSendRequest + TimeInStackToSend + TimeInNICToSend + TimeOnNetwork + TimeInNICToReceive + TimeInStackToReceive + TimeInNetserverRecvRequest, and then all of that in reverse to send the response.
The CPU time to send or receive a packet is largely (not entirely, but largely) in the latency path. So, if you include CPU utilization measurements in your netperf tests, you can see how many microseconds of CPU time are consumed per transaction. That covers everything listed above except the NIC and network components. You can then subtract it from the overall transaction latency to see how much of the latency comes from packet processing and how much from "the NICs and the network," and go from there to optimize.
Side note: while netperf performs endian normalization for what it passes across the control connection, it does not do so for the data on the data connection.