Channel: VMware Communities : Discussion List - vSphere™ Storage

Multipath probing in VMware ESXi 6.x

Basically, I am trying to understand how an ESXi host selects a path during a path state change event, i.e. how ESXi weights a path based on all the RTPG (REPORT TARGET PORT GROUPS) responses it receives from the available target ports.


I have an ESXi 6.x host connected to clustered storage (NetApp cDOT).

The scenario is a takeover event in a 4-node cluster where:

The cluster consists of two pairs: (Node1, Node2) and (Node3, Node4).

Node3 and Node4 are out of quorum (OOQ), meaning they cannot sync with the other nodes in the cluster.

Node1 takes over Node2, i.e. Node2 goes down after transferring LUN ownership to Node1.

Target port group (TPG) IDs (decimal/hex):

Node1 (1000/0x03E8)

Node2 (1001/0x03E9)

Node3 (1010/0x03F2)

Node4 (1011/0x03F3)


For a given LUN owned by Node2:


  The initial ALUA states in the RTPG data look as follows (AO = Active/Optimized, ANO = Active/Non-Optimized):

 

  RTPG to Node1

  RTPG Data:

                Node1 (1000/0x03E8) - ANO  

                Node2 (1001/0x03E9) - AO

                Node3 (1010/0x03F2) - ANO

                Node4 (1011/0x03F3) - ANO

  RTPG to Node2

  RTPG Data:

                Node1 (1000/0x03E8) - ANO  

                Node2 (1001/0x03E9) - AO

                Node3 (1010/0x03F2) - ANO

                Node4 (1011/0x03F3) - ANO

  RTPG to Node3

  RTPG Data:

                Node1 (1000/0x03E8) - ANO  

                Node2 (1001/0x03E9) - AO

                Node3 (1010/0x03F2) - ANO

                Node4 (1011/0x03F3) - ANO

  RTPG to Node4

  RTPG Data:

                Node1 (1000/0x03E8) - ANO  

                Node2 (1001/0x03E9) - AO

                Node3 (1010/0x03F2) - ANO

                Node4 (1011/0x03F3) - ANO

 

Questions:

----------------

1. During the transition stage, after a CHECK CONDITION on an I/O command followed by RTPG responses reporting the new AO and ANO paths as below, why does ESXi continue to route I/O through the same path, which is now marked ANO?

   Is it because the last reported RTPG response, from Node4, says the Node2 port (1001/0x03E9) is AO?

 

  The ALUA states in the RTPG data look as follows, listed in the order in which they were sent and received in the trace I analyzed:

 

  RTPG to Node1

  RTPG Data:

                Node1 (1000/0x03E8) - AO   (Changed from ANO due to takeover)

                Node2 (1001/0x03E9) - ANO  (Changed from AO due to takeover)

                Node3 (1010/0x03F2) - Unavailable

                Node4 (1011/0x03F3) - Unavailable

  RTPG to Node2

  RTPG Data:

                Node1 (1000/0x03E8) - AO   (Changed from ANO due to takeover)

                Node2 (1001/0x03E9) - ANO  (Changed from AO due to takeover)

                Node3 (1010/0x03F2) - Unavailable

                Node4 (1011/0x03F3) - Unavailable

  RTPG to Node3

  RTPG Data:

                Node1 (1000/0x03E8) - Unavailable

                Node2 (1001/0x03E9) - AO    (No Change in path states because Node3 is out of quorum)

                Node3 (1010/0x03F2) - Unavailable

                Node4 (1011/0x03F3) - Unavailable

  RTPG to Node4

  RTPG Data:

                Node1 (1000/0x03E8) - Unavailable

                Node2 (1001/0x03E9) - AO    (No Change in path states because Node4 is out of quorum)

                Node3 (1010/0x03F2) - Unavailable

                Node4 (1011/0x03F3) - Unavailable
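
To make the disagreement between the four responses easier to see, here is a minimal Python sketch that tabulates them and lists the target port groups for which the reporting nodes give different states. It only restates the trace data above; the `responses` layout and the `conflicting_groups` helper are my own illustrative names, and nothing here is meant to represent how ESXi actually evaluates RTPG data.

```python
# Illustrative only: the RTPG responses from question 1, keyed by the node
# each RTPG was sent to. Values map TPG ID -> ALUA state it reported.
responses = {
    "Node1": {1000: "AO", 1001: "ANO", 1010: "Unavailable", 1011: "Unavailable"},
    "Node2": {1000: "AO", 1001: "ANO", 1010: "Unavailable", 1011: "Unavailable"},
    "Node3": {1000: "Unavailable", 1001: "AO", 1010: "Unavailable", 1011: "Unavailable"},
    "Node4": {1000: "Unavailable", 1001: "AO", 1010: "Unavailable", 1011: "Unavailable"},
}


def conflicting_groups(responses):
    """Return {tpg_id: {state: [reporting nodes]}} for every TPG that is
    reported with more than one distinct state."""
    by_group = {}
    for reporter, states in responses.items():
        for tpg, state in states.items():
            by_group.setdefault(tpg, {}).setdefault(state, []).append(reporter)
    return {tpg: views for tpg, views in by_group.items() if len(views) > 1}


conflicts = conflicting_groups(responses)
for tpg, views in sorted(conflicts.items()):
    print(f"TPG {tpg} (0x{tpg:04X}): {views}")
```

Only TPGs 1000 and 1001 come out as conflicting: Node1 and Node2 report the post-takeover states, while the out-of-quorum Node3 and Node4 still report Node2 as AO.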

 

2. After the takeover of Node2 is complete (i.e. Node2 is completely down), RSCNs for Node2 were received from the switch, followed by RTPGs with the states below reported by the target. Why does ESXi go into an endless loop of path probing/RTPGs? Is it because the last reported RTPG response, from Node4, says the Node2 port (1001/0x03E9) is AO when it is really down and the host knows this from the RSCN it received?

 

  The ALUA states in the RTPG data look as follows, listed in the order in which they were sent and received in the trace I analyzed:

 

  No RTPG is sent to Node2, as it is down after the takeover


  RTPG to Node1

  RTPG Data:

                Node1 (1000/0x03E8) - AO

                Node2 (1001/0x03E9) - Unavailable

                Node3 (1010/0x03F2) - Unavailable

                Node4 (1011/0x03F3) - Unavailable

  RTPG to Node3

  RTPG Data:

                Node1 (1000/0x03E8) - Unavailable

                Node2 (1001/0x03E9) - AO   (No Change in path states because Node3 is out of quorum)

                Node3 (1010/0x03F2) - Unavailable

                Node4 (1011/0x03F3) - Unavailable

  RTPG to Node4

  RTPG Data:

                Node1 (1000/0x03E8) - Unavailable

                Node2 (1001/0x03E9) - AO     (No Change in path states because Node4 is out of quorum)

                Node3 (1010/0x03F2) - Unavailable

                Node4 (1011/0x03F3) - Unavailable
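
The hypothesis in this question can be illustrated with a small Python sketch. It assumes, purely for illustration, that the host merges RTPG responses with a "last response wins" rule in the order seen in the trace. That rule is my assumption, not confirmed ESXi behavior, and `last_response_wins` is a hypothetical helper.

```python
# Illustrative only: the RTPG responses from question 2, in trace order.
# Each entry is (node the RTPG was sent to, {TPG ID: reported ALUA state}).
responses_in_order = [
    ("Node1", {1000: "AO", 1001: "Unavailable", 1010: "Unavailable", 1011: "Unavailable"}),
    ("Node3", {1000: "Unavailable", 1001: "AO", 1010: "Unavailable", 1011: "Unavailable"}),
    ("Node4", {1000: "Unavailable", 1001: "AO", 1010: "Unavailable", 1011: "Unavailable"}),
]


def last_response_wins(responses):
    """ASSUMED merge rule: each later response overwrites the states
    recorded from earlier ones. Not taken from ESXi documentation."""
    merged = {}
    for _reporter, states in responses:
        merged.update(states)
    return merged


merged = last_response_wins(responses_in_order)
active_optimized = [tpg for tpg, state in merged.items() if state == "AO"]
print(merged)
print("AO groups under this assumption:", active_optimized)
```

Under that assumed rule the merged view ends with Node1 (1000) as Unavailable and Node2 (1001) as the only AO group, even though Node2 is down. If the host behaved this way, it would have no usable AO path and would have a reason to keep re-probing, which is the behavior I am asking about.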

 

3. And why does ESXi always probe, or send RTPGs to, the paths only in the order below?

 

Node1 (1000/0x03E8)

Node2 (1001/0x03E9)

Node3 (1010/0x03F2)

Node4 (1011/0x03F3)

