Wednesday, April 10, 2013

How does server-to-storage data path failover work with multipath?


■ Purpose : Explain how server-to-storage data path failover works with multipath.
■ Environment : RHEL 6 

1. The default multipath.conf configuration looks like this:

#multipath.conf
#NetApp recommended settings


defaults
{
        user_friendly_names yes
        max_fds max
        queue_without_daemon no
        bindings_file "/var/lib/multipath/bindings"
        uid 500
        gid 500
}
blacklist
{
        wwid DevId
        devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
        devnode "^hd[a-z]"
        devnode "^cciss!c[0-9]d[0-9]*[p[0-9]*]"
}
devices
{
        device
        {
                vendor "NETAPP"
                product "LUN"
                getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
                prio_callout "/sbin/mpath_prio_ontap /dev/%n"
                features "1 queue_if_no_path"
                hardware_handler "0"
                path_grouping_policy group_by_prio
                failback immediate
                rr_weight uniform
                rr_min_io 128
                path_checker directio
                flush_on_last_del yes
        }
}


2. View paths :

$ multipath -ll


mini_p (360a98000572d45394b34715579354446) dm-23 NETAPP,LUN
[size=1.0T][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=8][active]
 \_ 1:0:0:0  sda        8:0     [active][ready]
 \_ 2:0:1:0  sdca       68:224  [active][ready]
\_ round-robin 0 [prio=2][enabled]
 \_ 2:0:2:0  sdct       70:16   [active][ready]
 \_ 1:0:1:0  sdq        65:0    [active][ready]


3. The fields in this output are explained below:


mini_p (360a98000572d45394b34715579354446) dm-23 NETAPP,LUN
------  ---------------------------------  ---- --- ---------------
   |               |                         |    |          |-------> Product
   |               |                         |    |------------------> Vendor
   |               |                         |-----------------------> sysfs name
   |               |-------------------------------------------------> WWID of the device
   |-----------------------------------------------------------------> User defined Alias

[size=1.0T][features=1 queue_if_no_path][hwhandler=0][rw]
 ---------  ---------------------------  ----------------
     |                 |                        |--------------------> Hardware Handler
     |                 |---------------------------------------------> Features supported
     |---------------------------------------------------------------> Size of the DM


Path Group 1:


\_ round-robin 0 [prio=8][active]
-- -------------  ------  ------
 |    |              |      |----------------------------------------> Path group state
 |    |              |-----------------------------------------------> Path group priority
 |    |--------------------------------------------------------------> Path selector
 |-------------------------------------------------------------------> Path group level


First path on Path Group 1:


  \_ 1:0:0:0  sda        8:0     [active][ready]
     -------  ---        ---     ------- -------
        |      |          |         |       |------------------------> Physical path state
        |      |          |         |--------------------------------> DM path state
        |      |          |------------------------------------------> Major, minor numbers
        |      |-----------------------------------------------------> Linux device name
        |------------------------------------------------------------> host,chan,scsiid,lun


Second path on Path Group 1:


  \_ 2:0:1:0  sdca       68:224  [active][ready]

Path Group 2:


 \_ 2:0:2:0  sdct       70:16   [active][ready]
 \_ 1:0:1:0  sdq        65:0    [active][ready]
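
The field layout described above can be sketched as a small parser. This is only an illustration for the legacy output format shown in this post; the regular expression and the function name parse_path_line are my own assumptions, not part of the multipath tools.

```python
import re

# Pattern for one path line in the legacy "multipath -ll" format shown above, e.g.
#   \_ 1:0:0:0  sda        8:0     [active][ready]
PATH_LINE = re.compile(
    r"\\_\s+(?P<hctl>\d+:\d+:\d+:\d+)\s+(?P<dev>\S+)\s+"
    r"(?P<major>\d+):(?P<minor>\d+)\s+"
    r"\[(?P<dm_state>\w+)\]\[(?P<path_state>\w+)\]"
)


def parse_path_line(line):
    """Return the fields of a path line as a dict, or None for any other line."""
    match = PATH_LINE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["major"] = int(fields["major"])
    fields["minor"] = int(fields["minor"])
    return fields


sample = r" \_ 2:0:1:0  sdca       68:224  [active][ready]"
print(parse_path_line(sample))
```

Path group lines such as "\_ round-robin 0 [prio=8][active]" do not match the pattern, so the function returns None for them.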


4. The meaning of the various parameters is discussed below:

a. polling_interval :  Specifies the interval between two path checks in seconds.
b. udev_dir     : The directory where udev device nodes are created. The default value is /dev.
c. multipath_dir : The directory where the dynamic shared objects are stored. The default value is system dependent, commonly /lib/multipath.
d. path_selector     : Specifies the default algorithm to use in determining what path to use for the next I/O operation.

Possible values include:

    round-robin 0: Loop through every path in the path group, sending the same amount of I/O to each.
    queue-length 0: Send the next bunch of I/O down the path with the least number of outstanding I/O requests.
    service-time 0: Send the next bunch of I/O down the path with the shortest estimated service time, which is determined by dividing the total size of the outstanding I/O to each path by its relative throughput.

The default value is round-robin 0.
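
To make the difference between the three selectors concrete, here is a rough Python sketch. It is not how the kernel implements them; the Path class, its fields and the sample numbers are assumptions made up for the illustration.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Path:
    name: str
    queued_ios: int = 0      # number of outstanding I/O requests
    queued_bytes: int = 0    # total size of the outstanding I/O
    throughput: float = 1.0  # relative throughput (illustrative unit)


def round_robin(paths):
    """Return an endless iterator that visits every path in turn."""
    return cycle(paths)


def queue_length(paths):
    """Pick the path with the fewest outstanding I/O requests."""
    return min(paths, key=lambda p: p.queued_ios)


def service_time(paths):
    """Pick the path with the smallest outstanding size / relative throughput."""
    return min(paths, key=lambda p: p.queued_bytes / p.throughput)


# Hypothetical load figures, chosen only to show the selectors disagreeing.
paths = [
    Path("sda", queued_ios=4, queued_bytes=64 * 1024, throughput=1.0),
    Path("sdca", queued_ios=6, queued_bytes=96 * 1024, throughput=2.0),
]

rr = round_robin(paths)
print([next(rr).name for _ in range(4)])
print(queue_length(paths).name)
print(service_time(paths).name)
```

With these numbers queue-length prefers sda (4 requests against 6), while service-time prefers sdca because 96K at twice the throughput is estimated to finish sooner than 64K.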

e. path_grouping_policy     : Specifies the default path grouping policy to apply to unspecified multipaths. Possible values include:

    failover: 1 path per priority group.
    multibus: all valid paths in 1 priority group.
    group_by_serial: 1 priority group per detected serial number.
    group_by_prio: 1 priority group per path priority value. Priorities are determined by callout programs specified as global, per-controller, or per-multipath options.
    group_by_node_name: 1 priority group per target node name. Target node names are fetched in /sys/class/fc_transport/target*/node_name.

The default value is failover. 
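
The grouping policies can be pictured with a short sketch. The device names come from the sample output in section 2, but the per-path priority values are assumptions chosen for the example, and group_paths is a hypothetical helper that covers only three of the policies.

```python
from collections import defaultdict

# (device, priority) pairs; the priorities are illustrative assumptions.
paths = [("sda", 4), ("sdca", 4), ("sdct", 1), ("sdq", 1)]


def group_paths(paths, policy):
    """Return the path groups (lists of device names) for a grouping policy."""
    if policy == "failover":
        # One path per priority group.
        return [[dev] for dev, _ in paths]
    if policy == "multibus":
        # All valid paths in a single priority group.
        return [[dev for dev, _ in paths]]
    if policy == "group_by_prio":
        # One priority group per distinct path priority, highest first.
        by_prio = defaultdict(list)
        for dev, prio in paths:
            by_prio[prio].append(dev)
        return [by_prio[prio] for prio in sorted(by_prio, reverse=True)]
    raise ValueError("policy not covered by this sketch: " + policy)


for policy in ("failover", "multibus", "group_by_prio"):
    print(policy, group_paths(paths, policy))
```

With group_by_prio the result is two groups of two paths each, which matches the shape of the multipath -ll output shown earlier.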

f. getuid_callout     :

Specifies the default program and arguments to call out to obtain a unique path identifier. An absolute path is required.
The default value is /lib/udev/scsi_id --whitelisted --device=/dev/%n.

g. prio     : Specifies the default function to call to obtain a path priority value. For example, the ALUA bits in SPC-3 provide an exploitable prio value. Possible values include:
    const: Set a priority of 1 to all paths.
    emc: Generate the path priority for EMC arrays.
    alua: Generate the path priority based on the SCSI-3 ALUA settings.
    tpg_pref: Generate the path priority based on the SCSI-3 ALUA settings, using the preferred port bit.
    ontap: Generate the path priority for NetApp arrays.
    rdac: Generate the path priority for LSI/Engenio RDAC controller.
    hp_sw: Generate the path priority for Compaq/HP controller in active/standby mode.
    hds: Generate the path priority for Hitachi HDS Modular storage arrays.
The default value is const.

h. path_checker     :

Specifies the default method used to determine the state of the paths. Possible values include:
    readsector0: Read the first sector of the device.
    tur: Issue a TEST UNIT READY to the device.
    emc_clariion: Query the EMC Clariion specific EVPD page 0xC0 to determine the path state.
    hp_sw: Check the path state for HP storage arrays with Active/Standby firmware.
    rdac: Check the path state for LSI/Engenio RDAC storage controller.
    directio: Read the first sector with direct I/O.
The default value is directio.
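
As a rough picture of what a readsector0-style check does, the sketch below tries to read the first sector and reports a state. It uses ordinary buffered reads rather than direct I/O, assumes a 512-byte sector, and uses a temporary file as a stand-in for a block device; check_path is a hypothetical name.

```python
import os
import tempfile

SECTOR_SIZE = 512  # assumed logical sector size


def check_path(device):
    """Return "ready" if the first sector can be read, otherwise "faulty"."""
    try:
        fd = os.open(device, os.O_RDONLY)
    except OSError:
        return "faulty"
    try:
        data = os.read(fd, SECTOR_SIZE)
    except OSError:
        return "faulty"
    finally:
        os.close(fd)
    return "ready" if len(data) == SECTOR_SIZE else "faulty"


# Demo on a temporary file instead of a real /dev/sdX node.
with tempfile.NamedTemporaryFile() as stand_in:
    stand_in.write(b"\0" * SECTOR_SIZE)
    stand_in.flush()
    demo_state = check_path(stand_in.name)

print(demo_state)
```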

i. failback     : Manages path group failback.

    immediate : A value of immediate specifies immediate failback to the highest priority path group that contains active paths.
    manual : A value of manual specifies that there should not be immediate failback but that failback can happen only with operator intervention.
    followover : A value of followover specifies that automatic failback should be performed when the first path of a path group becomes active. This keeps a node from automatically failing back when another node requested the failover.
    A numeric value greater than zero specifies deferred failback, expressed in seconds.

The default value is manual.
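
The difference between immediate and manual failback can be sketched as a small decision function. This is a simplified model under my own assumptions: it ignores followover and deferred (numeric) failback, and the function and group names are hypothetical.

```python
def group_after_recovery(groups, current, failback):
    """Pick the path group that carries I/O after a path has recovered.

    groups   -- dict mapping group name to (priority, has_active_paths)
    current  -- name of the group currently carrying I/O
    failback -- "immediate" or "manual"
    """
    usable = {name: prio for name, (prio, up) in groups.items() if up}
    if not usable:
        return None  # no group has an active path
    best = max(usable, key=usable.get)
    if failback == "immediate":
        # Switch straight back to the highest priority usable group.
        return best
    if failback == "manual":
        # Stay on the current group while it still works; an operator
        # has to trigger the switch back.
        return current if current in usable else best
    raise ValueError("failback mode not covered by this sketch: " + failback)


# The high priority group pg1 has just come back while I/O runs on pg2.
groups = {"pg1": (8, True), "pg2": (2, True)}
print(group_after_recovery(groups, "pg2", "immediate"))
print(group_after_recovery(groups, "pg2", "manual"))
```

With immediate the I/O returns to pg1 as soon as it has active paths again; with manual it stays on pg2.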


j. rr_min_io : Specifies the number of I/O requests to route to a path before switching to the next path in the current path group. This setting is only for systems running kernels older than 2.6.31. Newer systems should use rr_min_io_rq. The default value is 1000.

k. rr_min_io_rq : Specifies the number of I/O requests to route to a path before switching to the next path in the current path group, using request-based device-mapper-multipath. This setting should be used on systems running current kernels. On systems running kernels older than 2.6.31, use rr_min_io. The default value is 1.


l. rr_weight : If set to priorities, then instead of sending rr_min_io requests to a path before calling path_selector to choose the next path, the number of requests to send is determined by rr_min_io times the path's priority, as determined by the prio function. If set to uniform, all path weights are equal. The default value is uniform.
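
The arithmetic behind rr_weight is simple enough to write down. The helper name below is hypothetical, and the priority value 4 is only an example.

```python
def ios_before_switch(rr_min_io, path_prio, rr_weight="uniform"):
    """Number of requests sent to a path before the selector is asked again."""
    if rr_weight == "priorities":
        return rr_min_io * path_prio
    if rr_weight == "uniform":
        return rr_min_io
    raise ValueError("unknown rr_weight: " + rr_weight)


# rr_min_io 128 as in the configuration from section 1.
print(ios_before_switch(128, 4, "uniform"))
print(ios_before_switch(128, 4, "priorities"))
```

So with rr_min_io 128, a path of priority 4 receives 128 requests per turn under uniform and 512 under priorities.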

m. no_path_retry     : A numeric value for this attribute specifies the number of times the system should attempt to use a failed path before disabling queueing.

    fail : A value of fail indicates immediate failure, without queueing.
    queue: A value of queue indicates that queueing should not stop until the path is fixed.
The default value is 0.
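
To get a feeling for how long I/O stays queued once all paths are gone, the sketch below converts a no_path_retry value into seconds. It assumes one retry per polling_interval and a polling interval of 5 seconds; both are assumptions for the illustration, so treat the result as an estimate and check the behaviour of your own multipathd version.

```python
import math


def queue_seconds(no_path_retry, polling_interval=5):
    """Estimate how long I/O is queued after the last path fails."""
    if no_path_retry == "fail":
        return 0          # fail immediately, no queueing
    if no_path_retry == "queue":
        return math.inf   # queue until a path comes back
    # Assumption: one retry is attempted per polling interval.
    return int(no_path_retry) * polling_interval


print(queue_seconds(12))
print(queue_seconds("fail"))
print(queue_seconds("queue"))
```

Under these assumptions no_path_retry 12 means roughly one minute of queueing before I/O errors are returned.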

n. user_friendly_names : If set to yes, specifies that the system should use the /etc/multipath/bindings file to assign a persistent and unique alias to the multipath, in the form of mpathn. If set to no, specifies that the system should use the WWID as the alias for the multipath. In either case, what is specified here will be overridden by any device-specific aliases you specify in the multipaths section of the configuration file. The default value is no.

o. queue_without_daemon : If set to no, the multipathd daemon will disable queueing for all devices when it is shut down. The default value is no.

p. flush_on_last_del : If set to yes, the multipathd daemon will disable queueing when the last path to a device has been deleted. The default value is no.

q. max_fds : Sets the maximum number of open file descriptors that can be opened by multipath and the multipathd daemon. This is equivalent to the ulimit -n command. As of the Red Hat Enterprise Linux 6.3 release, the default value is max, which sets this to the system limit from /proc/sys/fs/nr_open. For earlier releases, if this is not set the maximum number of open file descriptors is taken from the calling process; it is usually 1024. To be safe, this should be set to the maximum number of paths plus 32, if that number is greater than 1024.
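
The sizing rule for max_fds on older releases can be written as a one-line calculation. The helper name and the sample path counts are made up for the example.

```python
def suggested_max_fds(max_paths, usual_limit=1024):
    """Apply the rule of thumb: paths + 32, if that exceeds the usual 1024."""
    needed = max_paths + 32
    return needed if needed > usual_limit else usual_limit


print(suggested_max_fds(100))   # the usual limit is already enough
print(suggested_max_fds(2000))  # needs an explicit, larger value
```

A host with 2000 paths would therefore set max_fds to 2032, while a host with 100 paths can stay at 1024.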

r. checker_timeout : The timeout to use for path checkers that issue SCSI commands with an explicit timeout, in seconds. The default value is taken from /sys/block/sdx/device/timeout.

s. fast_io_fail_tmo : The number of seconds the SCSI layer will wait after a problem has been detected on an FC remote port before failing I/O to devices on that remote port. This value should be smaller than the value of dev_loss_tmo. Setting this to off will disable the timeout. The default value is determined by the OS.

t. dev_loss_tmo : The number of seconds the SCSI layer will wait after a problem has been detected on an FC remote port before removing it from the system. Setting this to infinity will set this to 2147483647 seconds, or 68 years. The default value is determined by the OS.
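
The relation between the two FC timeouts, and the "68 years" figure, can be checked with a few lines. The function name is hypothetical, and the check only encodes the rule stated above that fast_io_fail_tmo should be smaller than dev_loss_tmo.

```python
INFINITY_SECONDS = 2147483647  # value used for "infinity", i.e. 2**31 - 1


def timeouts_consistent(fast_io_fail_tmo, dev_loss_tmo):
    """Return True if fast_io_fail_tmo is off or smaller than dev_loss_tmo."""
    loss = INFINITY_SECONDS if dev_loss_tmo == "infinity" else int(dev_loss_tmo)
    if fast_io_fail_tmo == "off":
        return True
    return int(fast_io_fail_tmo) < loss


# Convert the "infinity" value to years (365.25 days per year).
years = INFINITY_SECONDS / (365.25 * 24 * 3600)

print(timeouts_consistent(5, 30))
print(timeouts_consistent(30, 5))
print(int(years))
```

The example values 5 and 30 seconds are illustrative only, not recommendations.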
