Wireshark · Wireshark-commits: [Wireshark-commits] master 8f0f691: RPC-over-RDMA: add reassembly for reply, read and write chunks

From: Wireshark code review <code-review-do-not-reply@xxxxxxxxxxxxx>
Date: Sat, 24 Mar 2018 07:10:11 +0000
URL: https://code.wireshark.org/review/gitweb?p=wireshark.git;a=commit;h=8f0f691312d93b12511c5cebf3e414b15e7661a4
Submitter: Anders Broman (a.broman58@xxxxxxxxx)
Changed: branch: master
Repository: wireshark

Commits:

8f0f691 by Jorge Mora (jmora1300@xxxxxxxxx):

    RPC-over-RDMA: add reassembly for reply, read and write chunks
    
    The RDMA reply chunk is used for a large RPC reply which does not fit
    into a single SEND operation and does not have a single large opaque,
    e.g., NFS READDIR. The RPC call packet is used only to set up the RDMA
    reply chunk, and the whole RPC reply is transferred via RDMA writes.
    Fragments are added on any RDMA write packet (RDMA_WRITE_ONLY,
    RDMA_WRITE_FIRST, etc.), and the reassembly is done on the reply
    message. The RPC reply packet itself carries no data (RDMA_NOMSG), but
    the fragments are reassembled there and the whole RPC reply is
    dissected.
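
    A minimal sketch of the reply chunk flow, in C like the dissector
    itself. It assumes write fragments arrive in order (the real code
    orders them by PSN); reply_chunk_t, REPLY_MAX and the helper names
    are hypothetical, and only the InfiniBand RC opcode values come from
    the IBA specification.

```c
#include <stddef.h>
#include <string.h>

/* InfiniBand RC opcodes for RDMA WRITE packets (values from the IBA spec). */
enum {
    RC_RDMA_WRITE_FIRST    = 0x06,
    RC_RDMA_WRITE_MIDDLE   = 0x07,
    RC_RDMA_WRITE_LAST     = 0x08,
    RC_RDMA_WRITE_LAST_IMM = 0x09,
    RC_RDMA_WRITE_ONLY     = 0x0A,
    RC_RDMA_WRITE_ONLY_IMM = 0x0B
};

#define REPLY_MAX 4096 /* hypothetical buffer size, sketch only */

/* Hypothetical per-message accumulator for a reply chunk. */
typedef struct {
    unsigned char data[REPLY_MAX];
    size_t len;
} reply_chunk_t;

static int is_rdma_write(unsigned opcode) {
    return opcode >= RC_RDMA_WRITE_FIRST && opcode <= RC_RDMA_WRITE_ONLY_IMM;
}

/* Append the payload of one RDMA write packet, assuming in-order arrival.
 * Returns 0 for non-write packets or if the buffer would overflow. */
static int reply_add_fragment(reply_chunk_t *rc, unsigned opcode,
                              const void *payload, size_t n) {
    if (!is_rdma_write(opcode) || n > REPLY_MAX - rc->len)
        return 0;
    memcpy(rc->data + rc->len, payload, n);
    rc->len += n;
    return 1;
}

/* Called on the RPC reply carrying RDMA_NOMSG: it adds no data of its own,
 * it only triggers dissection of what the writes delivered. */
static size_t reply_reassemble(const reply_chunk_t *rc,
                               const unsigned char **out) {
    *out = rc->data;
    return rc->len;
}
```

    A SEND packet is rejected by reply_add_fragment(), so only RDMA
    write payloads end up in the reassembled reply.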
    
    The RDMA read chunk list is used for a large RPC call which has
    at least one large opaque, e.g., NFS WRITE. The RPC call packet
    is used only to set up the RDMA read chunk list. It also carries
    the reduced message data, which includes the first fragment (XDR
    data up to and including the opaque length) and may also include
    fragments between read chunks and a last fragment after the last
    read chunk's data. The reduced message is broken down into
    fragments and inserted into the reassembly table. Since the RDMA
    read chunk list is set up in the RPC call, the upper layer is not
    dissected in this case; the rest of the packet is just labeled
    "Data", because the reassembly will be done on the last read
    response.
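
    A rough sketch of breaking a reduced call message into the inline
    pieces that surround the read chunks. It assumes each chunk's XDR
    position is an offset into the full reassembled stream and ignores
    XDR padding; read_chunk_t, inline_piece_t and split_reduced() are
    hypothetical names, not the dissector's API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for the sketch. */
typedef struct { uint32_t xdr_pos; uint32_t len; } read_chunk_t;
typedef struct { uint32_t src_off, dst_off, len; } inline_piece_t;

/* Split a reduced message of inline_len bytes into the inline pieces that
 * surround the read chunks. Chunks must be sorted by xdr_pos, taken to be
 * an offset in the full (reassembled) XDR stream; XDR padding is ignored.
 * Writes at most nchunks + 1 pieces, skipping empty ones, and returns
 * their count, or 0 if the chunk positions are inconsistent. */
static size_t split_reduced(uint32_t inline_len, const read_chunk_t *chunks,
                            size_t nchunks, inline_piece_t *out) {
    uint32_t src = 0; /* offset in the reduced message */
    uint32_t dst = 0; /* offset in the full XDR stream */
    size_t n = 0;
    for (size_t i = 0; i <= nchunks; i++) {
        uint32_t piece;
        if (i < nchunks) {
            if (chunks[i].xdr_pos < dst)
                return 0;
            piece = chunks[i].xdr_pos - dst; /* inline bytes before chunk i */
        } else {
            piece = inline_len - src;        /* trailing inline bytes */
        }
        if (piece > inline_len - src)
            return 0;
        if (piece > 0) {
            out[n].src_off = src;
            out[n].dst_off = dst;
            out[n].len = piece;
            n++;
        }
        src += piece;
        dst += piece;
        if (i < nchunks)
            dst += chunks[i].len; /* leave room for the chunk's data */
    }
    return n;
}
```

    For a 20-byte reduced message with one 4096-byte read chunk at
    position 16, this yields two pieces: 16 bytes placed at offset 0
    and the 4-byte tail placed at XDR offset 4112.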
    
    The protocol gives the XDR position where each chunk must be
    inserted into the XDR stream, so as long as the maximum I/O size
    is known, it is possible to know exactly where to insert these
    fragments. This maximum I/O size is set on the first
    READ_RESPONSE_FIRST or READ_RESPONSE_MIDDLE. If none of these
    packets has been seen, a value of 100 is used (the real value
    should be at least 1024). In this case the message numbers are
    not consecutive between chunks; however, since the total size of
    all chunks is verified to make sure there is a complete message
    to reassemble, all fragments should still be in the correct order.
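
    One hypothetical numbering scheme that shows why a too-small
    maximum I/O size leaves gaps but keeps the order: each chunk
    reserves ceil(length / iosize) message numbers plus one slot for
    the inline piece after it, and a fragment is numbered by its byte
    offset within the chunk divided by iosize. This illustrates the
    property described above; it is not the formula used in
    packet-rpcrdma.c, and chunk_msgno() and DEFAULT_MAX_IOSIZE are
    assumed names.

```c
#include <stddef.h>
#include <stdint.h>

#define DEFAULT_MAX_IOSIZE 100u /* fallback when no FIRST/MIDDLE was seen */

/* Hypothetical: message number of the fragment starting at byte `off`
 * within chunk `i`. Number 0 is the first inline piece; every chunk
 * reserves ceil(len / iosize) numbers, followed by one number for the
 * inline piece that may come after it. iosize == 0 means "unknown". */
static uint32_t chunk_msgno(const uint32_t *chunk_len, size_t i,
                            uint32_t off, uint32_t iosize) {
    uint32_t base = 1; /* 0 belongs to the first inline piece */
    if (iosize == 0)
        iosize = DEFAULT_MAX_IOSIZE;
    for (size_t k = 0; k < i; k++)
        base += (chunk_len[k] + iosize - 1) / iosize + 1; /* chunk + inline slot */
    return base + off / iosize;
}
```

    With two 4096-byte chunks sent in 1024-byte fragments, an iosize of
    1024 numbers them 1 to 4 and 6 to 9; the fallback of 100 gives 1,
    11, 21, 31 and 43, 53, 63, 73, which is not consecutive but still in
    order as long as real fragments are at least iosize bytes long.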
    
    Fragments are added on any RDMA read packet (RDMA_READ_RESPONSE_ONLY,
    RDMA_READ_RESPONSE_FIRST, etc.), and the reassembly is done on the
    last read response. Since there could be multiple chunks, and each
    chunk could have multiple segments, there will be multiple
    READ_RESPONSE_LAST packets; therefore the total size must be checked
    to decide when the reassembly is complete.
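
    A minimal sketch of that completion check, assuming the expected
    total is the sum of the segment lengths from the read chunk list;
    read_progress_t and its helpers are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical progress tracker for one reassembled call. */
typedef struct {
    uint64_t expected; /* sum of all segment lengths in all read chunks */
    uint64_t received; /* payload bytes seen so far in read responses */
} read_progress_t;

static void progress_init(read_progress_t *p, const uint32_t *seg_len,
                          size_t nseg) {
    p->expected = 0;
    p->received = 0;
    for (size_t i = 0; i < nseg; i++)
        p->expected += seg_len[i];
}

/* Account for one read response payload; true once every byte has
 * arrived. A READ_RESPONSE_LAST alone is not enough, since each segment
 * ends with one. */
static bool progress_add(read_progress_t *p, uint32_t payload_len) {
    p->received += payload_len;
    return p->received >= p->expected;
}
```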
    
    The RDMA write chunk list is used for a large RPC reply which has
    at least one large opaque, e.g., NFS READ. The RPC call packet is
    used only to set up the RDMA write chunk list. The opaque data is
    then transferred via RDMA writes, after which the server sends the
    RPC reply packet.
    
    The RPC reply packet carries the reduced message data, which includes
    the first fragment (XDR data up to and including the opaque length)
    and may also include fragments between write chunks and a last
    fragment after the last write chunk's data. The reduced message is
    broken down into fragments and inserted into the reassembly table.
    Since the RPC reply is sent after all the RDMA writes, the fragments
    from these writes must be inserted in the correct order: the first
    RDMA write fragment is inserted with message number 1, since the
    first fragment (message number 0) will come from the very last
    packet (the RPC reply with RDMA_MSG). Also, the last packet could
    have fragments which must be inserted between chunk data, so message
    numbers are not consecutive from one chunk to the next.
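
    A sketch of the write chunk numbering described above, assuming
    chunks are written in order so the fragment counts of earlier
    chunks are known when a later chunk starts; write_msgno() and
    inline_msgno_after() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: message number of write fragment j (0-based, arrival
 * order) of write chunk i, where nfrags[k] is the number of RDMA write
 * fragments in chunk k. Number 0, and one slot after each chunk, are left
 * free for the inline pieces of the RPC reply, which arrives last. */
static uint32_t write_msgno(const uint32_t *nfrags, size_t i, uint32_t j) {
    uint32_t n = 1; /* 0 is the reply's first piece */
    for (size_t k = 0; k < i; k++)
        n += nfrags[k] + 1; /* chunk k's fragments plus one gap */
    return n + j;
}

/* Message number of the reply's inline piece that follows chunk i. */
static uint32_t inline_msgno_after(const uint32_t *nfrags, size_t i) {
    return write_msgno(nfrags, i, nfrags[i]);
}
```

    With fragment counts {3, 2}, the writes get numbers 1 to 3 and 5 to
    6, while the reply's pieces fill 0, 4 and 7.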
    
    In contrast with the RDMA read chunk list, the protocol does not
    allow an XDR position in the RDMA write chunks, since the RPC
    client knows exactly where to place the chunk's data from the
    virtual address of the DDP (direct data placement) item. There is
    no way to map a virtual address to an XDR position, so a two-pass
    approach is used to reassemble the XDR stream. In the first pass
    (visited = 0), all RDMA writes are inserted as fragments, leaving
    a gap between chunks. Then the upper layer dissector is called
    with a flag telling it that it is dealing with a reduced message,
    so all DDP-enabled operations treat the opaque item as having only
    its length and no data, and report back the item's offset from the
    end of the message. Once the upper layer dissector returns, this
    layer has a list of the DDP-eligible items' offsets, which are
    translated into XDR offsets; the RPC reply packet is then broken
    into fragments and inserted in the right places, as in the case of
    the RDMA read chunk list. On the second pass (visited = 1), all
    fragments have already been inserted into the reassembly table, so
    it just needs to reassemble the whole message and then call the
    upper layer dissector.
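
    The offset translation at the end of the first pass might look
    roughly like this, assuming the upper layer reports, for each
    DDP-eligible item, the distance from that item to the end of the
    reduced message, and that items are reported in stream order;
    translate_offsets() and its parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: ddp_from_end[i] is what the upper layer reported for the
 * i-th DDP-eligible item (bytes from the item's position to the end of the
 * reduced message); chunk_len[i] is the data written for write chunk i.
 * Fills reduced_off[i] (split point in the reduced message) and xdr_off[i]
 * (where chunk i's data starts in the reassembled stream).
 * Returns 0 if a reported offset lies outside the message. */
static int translate_offsets(uint32_t reduced_len, const uint32_t *ddp_from_end,
                             const uint32_t *chunk_len, size_t n,
                             uint32_t *reduced_off, uint32_t *xdr_off) {
    uint32_t shift = 0; /* chunk bytes already inserted before this item */
    for (size_t i = 0; i < n; i++) {
        if (ddp_from_end[i] > reduced_len)
            return 0;
        reduced_off[i] = reduced_len - ddp_from_end[i];
        xdr_off[i] = reduced_off[i] + shift;
        shift += chunk_len[i];
    }
    return 1;
}
```

    The reduced offsets are where the reply is cut into fragments; the
    XDR offsets show that each later item moves by the size of every
    chunk inserted before it.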
    
    RFC 8267 specifies the upper layer bindings to RPC-over-RDMA
    version 1 for NFS. Since RPC-over-RDMA version 1 specifies the
    XDR position for the read chunks, only the write chunk DDP-eligible
    items are handled in the upper layer, in this case the NFS layer.
    These are the only procedures or operations eligible for write
    chunks:
    * The opaque data result in the NFS READ procedure or operation
    * The pathname or linkdata result in the NFS READLINK procedure
      or operation
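
    A hypothetical eligibility check for the two items above, limited
    to NFSv3 procedure numbers (RFC 1813) and NFSv4 operation numbers
    (RFC 7530); the helper name is an assumption and NFSv2 is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* NFSv3 procedure numbers (RFC 1813) and NFSv4 operation numbers (RFC 7530). */
enum { NFS3_PROC_READLINK = 5, NFS3_PROC_READ = 6 };
enum { NFS4_OP_READ = 25, NFS4_OP_READLINK = 27 };

/* Hypothetical helper: may the result of this procedure or operation be
 * returned through a write chunk? Covers NFSv3 and NFSv4 only. */
static bool write_chunk_eligible(uint32_t nfs_version, uint32_t proc_or_op) {
    switch (nfs_version) {
    case 3:
        return proc_or_op == NFS3_PROC_READ || proc_or_op == NFS3_PROC_READLINK;
    case 4:
        return proc_or_op == NFS4_OP_READ || proc_or_op == NFS4_OP_READLINK;
    default:
        return false; /* NFSv2 and other versions left out of this sketch */
    }
}
```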
    
    Two functions are defined for the upper layers: one signals that
    the message is reduced, the other reports back the offset of each
    DDP-eligible item. Function rpcrdma_is_reduced() tells the upper
    layer that it is dealing with a reduced message and should
    therefore skip processing of the DDP-eligible item's opaque data
    and just report back the offset where the opaque data should be.
    This reporting is done with the second function,
    rpcrdma_insert_offset().
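
    How an upper-layer routine might use the two functions, sketched
    with local stand-ins: the bodies and signatures below are
    assumptions made so the example compiles on its own, not the
    declarations in packet-rpcrdma.h, and dissect_read_data() is a
    hypothetical simplification of an NFS READ result.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the two functions named in the commit; the real
 * signatures may differ. These mocks only illustrate the contract. */
static bool g_reduced; /* set by the RPC-over-RDMA layer in the first pass */
static int g_offsets[8];
static size_t g_noffsets;

static bool rpcrdma_is_reduced(void) { return g_reduced; }
static void rpcrdma_insert_offset(int offset) {
    if (g_noffsets < 8)
        g_offsets[g_noffsets++] = offset;
}

/* Hypothetical NFS READ result handling: `off` points at the opaque
 * length inside buf[0..len). Returns the offset after the opaque item,
 * or -1 if the length field is truncated. */
static int dissect_read_data(const uint8_t *buf, int len, int off) {
    if (off + 4 > len)
        return -1;
    uint32_t n = (uint32_t)buf[off] << 24 | buf[off + 1] << 16 |
                 buf[off + 2] << 8 | buf[off + 3];
    off += 4;
    if (rpcrdma_is_reduced()) {
        /* The data travelled in a write chunk: report where it belongs,
         * measured from the end of the message, and skip nothing. */
        rpcrdma_insert_offset(len - off);
        return off;
    }
    return off + (int)((n + 3) & ~3u); /* opaque data plus XDR padding */
}
```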
    
    Reassembly is done for InfiniBand only. Fragments are reassembled
    using the packet sequence number (PSN) of each RDMA I/O fragment so
    that the message is reassembled correctly even when fragments are
    sent out of order. A unique message id is also used for each message
    so that fragments are reassembled correctly when fragments of
    different messages are sent in parallel.
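
    A sketch of PSN-based ordering, assuming each fragment records its
    message id and the PSN of its message's first packet; frag_t and
    cmp_frag() are hypothetical. The subtraction is masked to 24 bits,
    the width of the PSN in the InfiniBand base transport header, so a
    message whose PSNs wrap past 0xFFFFFF still sorts correctly.

```c
#include <stdint.h>
#include <stdlib.h>

#define PSN_MASK 0xFFFFFFu /* the BTH PSN field is 24 bits wide */

/* Hypothetical fragment record for the sketch. */
typedef struct {
    uint32_t msg_id;    /* unique id of the message being reassembled */
    uint32_t first_psn; /* PSN of the message's FIRST (or ONLY) packet */
    uint32_t psn;       /* PSN of this fragment */
} frag_t;

/* Index of a fragment within its message, robust to PSN wraparound. */
static uint32_t psn_index(uint32_t psn, uint32_t first_psn) {
    return (psn - first_psn) & PSN_MASK;
}

/* qsort comparator: group by message id, then order by PSN index. */
static int cmp_frag(const void *a, const void *b) {
    const frag_t *x = a, *y = b;
    if (x->msg_id != y->msg_id)
        return x->msg_id < y->msg_id ? -1 : 1;
    uint32_t ix = psn_index(x->psn, x->first_psn);
    uint32_t iy = psn_index(y->psn, y->first_psn);
    return (ix > iy) - (ix < iy);
}
```

    Sorting with qsort(frags, n, sizeof frags[0], cmp_frag) keeps the
    fragments of parallel messages apart and puts each message's
    fragments in transmission order.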
    
    The reassembled message could be composed of multiple chunks,
    each chunk could in turn be composed of multiple segments, each
    segment could be composed of multiple requests, and each request
    is composed of one or more fragments. Thus, in order to have all
    fragments for each segment belong to the same message, a list of
    segments is created and all segments belonging to the same message
    are initialized with the same message id. These segments are
    initialized and added to the list on the call side on RDMA_MSG by
    calling process_rdma_lists.
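
    A minimal sketch of such a segment list, assuming fragments are
    matched to segments by their RDMA handle; segment_t,
    add_message_segments() and find_segment() are hypothetical names
    and do not reflect process_rdma_lists itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical segment record; not the dissector's actual type. */
typedef struct segment {
    uint32_t handle;      /* RDMA handle (steering tag) of the segment */
    uint32_t length;      /* bytes described by the segment */
    uint32_t msg_id;      /* shared by every segment of one RPC message */
    struct segment *next;
} segment_t;

static uint32_t g_next_msg_id = 1;

/* Tag every segment of one RPC message (across all its chunks) with a
 * fresh message id and prepend them to the list. */
static segment_t *add_message_segments(segment_t *list, segment_t *segs,
                                       size_t n) {
    uint32_t id = g_next_msg_id++;
    for (size_t i = 0; i < n; i++) {
        segs[i].msg_id = id;
        segs[i].next = list;
        list = &segs[i];
    }
    return list;
}

/* Find the segment, and hence the message, an RDMA I/O fragment
 * belongs to, using its handle. */
static const segment_t *find_segment(const segment_t *list, uint32_t handle) {
    for (; list != NULL; list = list->next)
        if (list->handle == handle)
            return list;
    return NULL;
}
```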
    
    Bug: 13260
    Change-Id: Icf57d7c46c3ba1de5d019265eb151a81d6019dfd
    Reviewed-on: https://code.wireshark.org/review/24613
    Petri-Dish: Anders Broman <a.broman58@xxxxxxxxx>
    Tested-by: Petri Dish Buildbot
    Reviewed-by: Anders Broman <a.broman58@xxxxxxxxx>
    

Actions performed:

    from  a6fcceb   packet-mq: Fix for Encoding problem in some MQ Struct
    adds  8f0f691   RPC-over-RDMA: add reassembly for reply, read and write chunks


Summary of changes:
 epan/dissectors/CMakeLists.txt                     |    1 +
 epan/dissectors/Makefile.am                        |    1 +
 epan/dissectors/packet-nfs.c                       |   67 +-
 epan/dissectors/packet-rpcrdma.c                   | 1048 +++++++++++++++++++-
 .../dissectors/packet-rpcrdma.h                    |   15 +-
 5 files changed, 1087 insertions(+), 45 deletions(-)
 copy ui/gtk/lbm_stream_dlg.h => epan/dissectors/packet-rpcrdma.h (80%)