Channel: VMware Communities : Discussion List - vSphere™ Storage

One Netapp NFS datastore is showing read only


Hi All,

 

I'm facing an issue where one NFS v3 datastore shows up as read-only on an ESXi 5.0 host. I tried remounting it by IP and re-adding the host IPs to the access list of the NFS volume on the storage side, but no luck.

 

I tested by adding an ESXi 6 host's IP to the access list of the same NFS volume, and that worked fine. So it looks like the issue affects the ESXi 5.0 hosts only.

 

Can someone let me know what settings are required?
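
For reference, an NFS v3 remount from the ESXi 5.x command line looks roughly like this; the IP, export path and datastore name below are placeholders:

# Check how the datastore is currently mounted (the list output includes a Read-Only flag)
esxcli storage nfs list

# Remove the stale mount and add it back read-write
# (192.168.1.50, /vol/nfs_vol1 and NFS_DS01 are placeholders for your values)
esxcli storage nfs remove --volume-name=NFS_DS01
esxcli storage nfs add --host=192.168.1.50 --share=/vol/nfs_vol1 --volume-name=NFS_DS01

One common cause of an NFS datastore mounting read-only on a specific host is the export not granting root access (root squash) or read-write access to that host's VMkernel IP, so it may be worth comparing how the 5.0 hosts' VMkernel IPs appear in the NetApp export rules versus the ESXi 6 host that works.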

 

Thanks


ESXi 5.5 Another Datastore missing


I've been struggling to recover a missing local VMFS 5 datastore. The LUN is visible, but even after refreshing and/or rescanning, the datastore just does not appear.

I am unable to mount it from the CLI either.

 

The HP Smart Array P400i controller shows the disks as OK, although the boot information shows:

1720 - S.M.A.R.T. Hard Drive(s) Detect Imminent Failure Port 2I: Box 1: Bay 4

which suggests a SAS disk failure is imminent.

 

Is there anything I can do to recover a VM from this missing datastore?

 

partedUtil shows:

# partedUtil getptbl /vmfs/devices/disks/mpx.vmhba1:C0:T0:L0

gpt

71380 255 63 1146734896

1 2048 1146734591 AA31E02A400F11DB9590000C2911D1B8 vmfs 0

 

 

/vmfs/volumes # esxcli storage core device list |grep -A27 ^mpx.vmhba1:C0:T0:L0

mpx.vmhba1:C0:T0:L0

   Display Name: Local VMware Disk (mpx.vmhba1:C0:T0:L0)

   Has Settable Display Name: false

   Size: 559929

   Device Type: Direct-Access

   Multipath Plugin: NMP

   Devfs Path: /vmfs/devices/disks/mpx.vmhba1:C0:T0:L0

   Vendor: VMware

   Model: Block device

   Revision: 1.0

   SCSI Level: 2

   Is Pseudo: false

   Status: on

   Is RDM Capable: false

   Is Local: true

   Is Removable: false

   Is SSD: false

   Is Offline: false

   Is Perennially Reserved: false

   Queue Full Sample Size: 0

   Queue Full Threshold: 0

   Thin Provisioning Status: unknown

   Attached Filters:

   VAAI Status: unsupported

   Other UIDs: vml.0000000000766d686261313a303a30

   Is Local SAS Device: false

   Is Boot USB Device: false

   No of outstanding IOs with competing worlds: 32

 

 

offset="128 2048"
for dev in `esxcfg-scsidevs -l | grep "Console Device:" | awk '{print $3}'`; do
  disk=$dev
  echo $disk
  partedUtil getptbl $disk
  { for i in `echo $offset`; do
      echo "Checking offset found at $i:"
      hexdump -n4 -s $((0x100000+(512*$i))) $disk
      hexdump -n4 -s $((0x1300000+(512*$i))) $disk
      hexdump -C -n 128 -s $((0x130001d + (512*$i))) $disk
    done; } | grep -B 1 -A 5 d00d
  echo "---------------------"
done

 

Result -

---------------------

/vmfs/devices/disks/mpx.vmhba1:C0:T0:L0

gpt

71380 255 63 1146734896

1 2048 1146734591 AA31E02A400F11DB9590000C2911D1B8 vmfs 0

Checking offset found at 2048:

0200000 d00d c001

0200004

1400000 f15e 2fab

1400004

0140001d  4c 43 4c 5f 52 41 49 44  30 00 00 00 00 00 00 00  |LCL_RAID0.......|

0140002d  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

 

vmkernel.log output -

2017-06-07T17:40:21.582Z cpu2:32787)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba1:C0:T0:L0" state in doubt; requested fast path state update...

2017-06-07T17:40:21.582Z cpu2:32787)ScsiDeviceIO: 2337: Cmd(0x412e8087c140) 0x28, CmdSN 0x2c5 from world 33801 to dev "mpx.vmhba1:C0:T0:L0" failed H:0x3 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

2017-06-07T17:40:24.695Z cpu2:32825)<4>cciss: cmd 0x4109904559c0 has CHECK CONDITION  byte 2 = 0x3

2017-06-07T17:40:24.695Z cpu2:32787)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x28 (0x412e80859ac0, 33801) to dev "mpx.vmhba1:C0:T0:L0" on path "vmhba1:C0:T0:L0" Failed: H:0x3 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL

2017-06-07T17:40:24.695Z cpu2:32787)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba1:C0:T0:L0" state in doubt; requested fast path state update...

2017-06-07T17:40:24.695Z cpu2:32787)ScsiDeviceIO: 2337: Cmd(0x412e80859ac0) 0x28, CmdSN 0x2c7 from world 33801 to dev "mpx.vmhba1:C0:T0:L0" failed H:0x3 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

2017-06-07T17:40:27.807Z cpu2:32783)<4>cciss: cmd 0x4109904559c0 has CHECK CONDITION  byte 2 = 0x3

2017-06-07T17:40:27.807Z cpu2:32787)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba1:C0:T0:L0" state in doubt; requested fast path state update...

2017-06-07T17:40:27.807Z cpu2:32787)ScsiDeviceIO: 2337: Cmd(0x412e80859200) 0x28, CmdSN 0x2c9 from world 33801 to dev "mpx.vmhba1:C0:T0:L0" failed H:0x3 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

2017-06-07T17:40:30.920Z cpu2:32825)<4>cciss: cmd 0x4109904559c0 has CHECK CONDITION  byte 2 = 0x3

2017-06-07T17:40:30.920Z cpu2:32787)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba1:C0:T0:L0" state in doubt; requested fast path state update...

2017-06-07T17:40:30.920Z cpu2:32787)ScsiDeviceIO: 2337: Cmd(0x412e80858a80) 0x28, CmdSN 0x2cb from world 33801 to dev "mpx.vmhba1:C0:T0:L0" failed H:0x3 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

2017-06-07T17:40:34.032Z cpu2:32779)<4>cciss: cmd 0x4109904559c0 has CHECK CONDITION  byte 2 = 0x3

2017-06-07T17:40:34.032Z cpu2:32787)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba1:C0:T0:L0" state in doubt; requested fast path state update...

2017-06-07T17:40:34.032Z cpu2:32787)ScsiDeviceIO: 2337: Cmd(0x412e80858300) 0x28, CmdSN 0x2cd from world 33801 to dev "mpx.vmhba1:C0:T0:L0" failed H:0x3 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

2017-06-07T17:40:37.144Z cpu2:32793)<4>cciss: cmd 0x4109904559c0 has CHECK CONDITION  byte 2 = 0x3

2017-06-07T17:40:37.145Z cpu2:32787)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba1:C0:T0:L0" state in doubt; requested fast path state update...

2017-06-07T17:40:37.145Z cpu2:32787)ScsiDeviceIO: 2337: Cmd(0x412e808569c0) 0x28, CmdSN 0x2db from world 33801 to dev "mpx.vmhba1:C0:T0:L0" failed H:0x3 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

2017-06-07T17:40:40.255Z cpu2:32843)<4>cciss: cmd 0x4109904559c0 has CHECK CONDITION  byte 2 = 0x3

2017-06-07T17:40:40.255Z cpu2:32787)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba1:C0:T0:L0" state in doubt; requested fast path state update...

2017-06-07T17:40:40.255Z cpu2:32787)ScsiDeviceIO: 2337: Cmd(0x412e80856240) 0x28, CmdSN 0x2dd from world 33801 to dev "mpx.vmhba1:C0:T0:L0" failed H:0x3 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

2017-06-07T17:40:43.367Z cpu2:32779)<4>cciss: cmd 0x4109904559c0 has CHECK CONDITION  byte 2 = 0x3

2017-06-07T17:40:43.367Z cpu2:32787)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba1:C0:T0:L0" state in doubt; requested fast path state update...

2017-06-07T17:40:43.367Z cpu2:32787)ScsiDeviceIO: 2337: Cmd(0x412e80855ac0) 0x28, CmdSN 0x2df from world 33801 to dev "mpx.vmhba1:C0:T0:L0" failed H:0x3 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

2017-06-07T17:40:46.479Z cpu2:32779)<4>cciss: cmd 0x4109904559c0 has CHECK CONDITION  byte 2 = 0x3

2017-06-07T17:40:46.480Z cpu2:32787)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba1:C0:T0:L0" state in doubt; requested fast path state update...

2017-06-07T17:40:46.480Z cpu2:32787)ScsiDeviceIO: 2337: Cmd(0x412e80855340) 0x28, CmdSN 0x2e1 from world 33801 to dev "mpx.vmhba1:C0:T0:L0" failed H:0x3 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

2017-06-07T17:40:46.480Z cpu3:33801)Fil3: 15338: Max timeout retries exceeded for caller Fil3_FileIO (status 'Timeout')

2017-06-07T17:40:48.804Z cpu1:33801)Config: 346: "SIOControlFlag2" = 0, Old Value: 0, (Status: 0x0)

2017-06-07T17:40:52.761Z cpu1:34271)WARNING: UserEpoll: 542: UNSUPPORTED events 0x40

2017-06-07T17:40:53.563Z cpu2:34271)WARNING: LinuxSocket: 1854: UNKNOWN/UNSUPPORTED socketcall op (whichCall=0x12, args@0xffd12d8c)

2017-06-07T17:40:54.445Z cpu3:33801)Config: 346: "VMOverheadGrowthLimit" = -1, Old Value: -1, (Status: 0x0)

2017-06-07T17:40:57.728Z cpu2:33989)Hardware: 3124: Assuming TPM is not present because trusted boot is not supported.

2017-06-07T17:41:00.176Z cpu2:34050)<4>cciss: cmd 0x4109904559c0 has CHECK CONDITION  byte 2 = 0x3

2017-06-07T17:41:00.177Z cpu2:32787)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x28 (0x412e8086a4c0, 33986) to dev "mpx.vmhba1:C0:T0:L0" on path "vmhba1:C0:T0:L0" Failed: H:0x3 D:0x0 P:0x0 Possible sense data: 0x5 0x20 0x0. Act:EVAL

2017-06-07T17:41:00.177Z cpu2:32787)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba1:C0:T0:L0" state in doubt; requested fast path state update...

2017-06-07T17:41:00.177Z cpu2:32787)ScsiDeviceIO: 2337: Cmd(0x412e8086a4c0) 0x28, CmdSN 0x38e from world 33986 to dev "mpx.vmhba1:C0:T0:L0" failed H:0x3 D:0x0 P:0x0 Possible sense data: 0x5 0x20 0x0.

2017-06-07T17:41:00.247Z cpu2:34933)Boot Successful

2017-06-07T17:41:01.007Z cpu3:33804)Config: 346: "SIOControlFlag2" = 1, Old Value: 0, (Status: 0x0)

2017-06-07T17:41:01.736Z cpu1:34988)MemSched: vm 34988: 8263: extended swap to 8192 pgs
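
Since the partition table, the VMFS magic (d00d c001) and the old volume label (LCL_RAID0) are still visible, one possible next step is to check whether the host is holding the volume as an unresolved snapshot and whether the metadata passes a read-only check. A rough sketch; note that the repeated H:0x3 (host-status timeout) errors above mean these commands may also stall if the failing disk stops answering reads:

# Does ESXi see the volume as an unresolved snapshot/replica that needs to be mounted or resignatured?
esxcli storage vmfs snapshot list

# Read-only metadata check of the VMFS partition (partition 1 on the device shown above)
voma -m vmfs -f check -d /vmfs/devices/disks/mpx.vmhba1:C0:T0:L0:1

Given the 1720 S.M.A.R.T. warning, if the data matters it may be safer to image the disk (or at least copy off what is still readable) before running many more read attempts against it.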

iSCSI storage throughput (benchmark)


Hi All,

 

I am trying to measure and sanity-check the throughput of a freshly built iSCSI storage setup (no VMs yet) on vSphere 6.5 (Enterprise Plus license).

 

Here is a brief description of the hardware

  1. HP Proliant DL385p G8 server with 8 NICs (4 built-in and 4 external)
  2. QNAP NAS TS-EC879U-RP with 6 NICs (2 built-in 4 external)
  3. Cisco 2960s switch

 

NAS configuration:

  1. There are 4 1TB LUNs on a RAID 10 array. They are mapped into 2 iSCSI targets with 2 LUNs per target. Let me call them T0L0, T0L1, T1L0, and T1L1
  2. There is one more LUN mapped to a dedicated target
  3. There are 5 NICs dedicated for iSCSI traffic. All on the same subnet.

 

Server configuration:

  1. There are two 960 GB SSD datastores
  2. There are two iSCSI datastores: each maps LUNs from different targets. For instance DS_1 is comprised from T0L0 and T1L0, and DS_2 is comprised from T0L1 and T1L1
  3. There are 4 NICs dedicated to iSCSI traffic. All are on the same vDS but each is linked to a different uplink by means of the "Teaming and failover" policy
  4. Overall there are 100 paths on the iSCSI software adapter: (5 LUNs on NAS) x (5 NICs on NAS) x (4 VMkernel adapters on server). All are configured with the VMware Round Robin policy (see the sketch just below this list).
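
As referenced in item 4, the Round Robin path-change frequency can be inspected and tuned per device; a rough sketch, where the naa identifier is a placeholder for one of the iSCSI LUNs:

# Show the current Round Robin settings for a device
esxcli storage nmp psp roundrobin deviceconfig get --device=naa.6001405aaaaaaaaaaaaaaaaaaaaaaaaa

# Switch the path-change trigger from the default of 1000 I/Os per path to 1 I/O per path
esxcli storage nmp psp roundrobin deviceconfig set --device=naa.6001405aaaaaaaaaaaaaaaaaaaaaaaaa --type=iops --iops=1

With the default of 1000 I/Os per path, a single large sequential copy tends to stay on one path (and therefore one NIC) for long stretches, which is consistent with observation 2 further down.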

 

Switch configuration:

  1. A dedicated VLAN for iSCSI traffic that connects all corresponding ports of NAS and ESXi server

 

Here is the experiment

  1. I created a 10GB file by SSH to ESXi and running the dd command on one of the SSD datastores (dd if=/dev/urandom of=10GB.bin bs=64M count=160).
  2. I first used the datastore browser in the Web Client (the vCenter that manages this particular ESXi host) to copy the 10GB file from one SSD datastore to the other SSD datastore. It took 35 seconds, so I calculated a throughput of 10*1024*1024*1000 / 35 = 299,593,143 B/s
  3. I then copied the same file from the first SSD datastore to the first iSCSI datastore. It took 18:29 minutes. So the throughput was 9,455,148 B/s which is way lower than my expectations
  4. Then I copied that same file from the first iSCSI datastore to the second iSCSI datastore and it took 4:42 minutes with the throughput of 37,183,546 B/s (almost 4 times faster than with SSD -> iSCSI)
  5. Finally I copied it from iSCSI datastore to SSD datastore and it took 2:42 minutes with the throughput of 64,726,913 B/s.

 

Observations:

  1. Experiment 1 (item 2 from the above list) looks fair
  2. Experiment 2 (item 3) looks strange. I observed network traffic on QNAP NAS while copying the file and it appears that only one NIC (out of 5 available on NAS) was used to transfer the data. I would expect something around 70 MB/s instead of 9.4 MB/s that I got.
  3. Experiment 3 (item 4) contradicts my expectation of how iSCSI "Hardware Acceleration" is supposed to work. I could be mistaken, but I thought that hardware acceleration allows iSCSI-to-iSCSI data copying without sending the data through the ESXi host. Instead I observed that all 5 NICs on the NAS were reading and writing data at the same time at an approximate speed of 10 MB/s (screenshot attached for illustration).
  4. Experiment 4 (item 5) looks fair as it shows iSCSI to SSD transfer over a single NIC with the throughput of approx 64MB/s (reasonable for a single 1Gb NIC).

 

My questions:

  1. Does my storage topology (number of NICs, number of paths, path selection algorithm, etc) look reasonable? Can it be improved having the described resources?
  2. Does my experiment methodology appear reasonable?
  3. Do my metrics and the way I measured them make sense?
  4. Would a VM migration of equivalent size (instead of 10GB file transfer) show different numbers?
  5. Would a directory transfer containing 100 files of 100MB each make any difference?
  6. Why does Experiment 2 (SSD to iSCSI) show such a low throughput?
  7. Why does "Hardware Acceleration" work in a way where all NAS NICs are being used instead of network-less file copying internally on the NAS?
  8. Any recommendations for a benchmarking methodology for vSphere iSCSI traffic? (see the capture sketch just below this list)
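
Regarding question 8, one simple approach is to capture esxtop in batch mode during each copy run and review the per-adapter and per-device throughput and latency afterwards; a rough sketch, where the output path is a placeholder:

# 300 samples at 2-second intervals, written to a CSV for later analysis (e.g. in perfmon or a spreadsheet)
esxtop -b -d 2 -n 300 > /vmfs/volumes/SSD_DS1/esxtop_run1.csv

Running a proper I/O generator inside a test VM, rather than datastore-browser copies (which go through a host-side copy process), usually gives more repeatable numbers.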

 

Thanks in advance for all your comments.

 

--

Simon

NFC protocol


Are there any vSphere/vStorage APIs that use the NFC protocol for file copy/transfer?

local datastore inactive in vCenter but accessible in SSH, how to fix?


Hello,

 

One of our ESXi hosts is showing "not responding" in vCenter and all VMs on it appear "disconnected". Its two local datastores show as "inactive", but I can still SSH to the host, and both local datastores are mounted and browsable.

 

How do I fix this? Is a reboot required?
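
Before a full reboot, it is usually worth restarting just the management agents from the SSH session that still works; running VMs are not affected, although the host will briefly disconnect from vCenter. A rough sketch:

# Restart hostd and vpxa only
/etc/init.d/hostd restart
/etc/init.d/vpxa restart

# Or restart all management agents in one go
services.sh restart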

 

Thanks.

William

Lost access to volume, P2000G3, ESXi build 5050593


We have a P2000 G3 iSCSI array populated with 72 disks which we are using as a vSphere 6 replication target. The iSCSI traffic is switched through 4 dedicated 2920 switches.

The storage is carved up into a few different RAID 10 and RAID 5 LUNs. The physical drives are all 15k SAS 300GB. Round Robin is set for the P2000 LUNs with an IOPS limit of 1.

 

At the moment replication is turned on for around 20 smallish VM guests.

The source array is a NIMBLE which is not under any excessive load.

 

We are seeing intermittent errors for any given volumes on the P2000:

 

Lost access to volume 58cc4b53-78f6b5cb-812a-8cdcd412f1b8 (repltarvdi101) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.
info - 20/03/2017 19:52:20 - repltarvdi101

Successfully restored access to volume 58cc4b53-78f6b5cb-812a-8cdcd412f1b8 (repltarvdi101) following connectivity issues.
info - 20/03/2017 19:53:42 - esxvdi1.arb.co.uk

 

The switch ports all have flow control enabled on the 2920s.

 

I'm fairly confident (but not sure) that there is no bottleneck at the physical disks or network and my intuition is that we have a configuration issue.
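
One way to narrow this down is to correlate the lost-access events with the path and iSCSI session state on the host at the time they occur; a rough sketch, where the naa identifier is a placeholder for the device backing repltarvdi101:

# Path state for the affected device
esxcli storage core path list -d naa.600c0ff000000000000000000000000000

# iSCSI sessions from this host to the P2000
esxcli iscsi session list

# Look for heartbeat timeouts / path events around 19:52 for this volume
grep 58cc4b53-78f6b5cb-812a-8cdcd412f1b8 /var/log/vobd.log /var/log/vmkernel.log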

I'm aware of the possibility that the P2000G3 may be just entirely incompatible, but I'd rather keep looking for answers for a while yet before I have to face up to that possibility.

 

Any ideas much appreciated.

 

thanks

P2000G3 - Paid Support


We have a long-standing issue with a P2000G3 storage array and I am trying to bring the problem to a definitive resolution.

The fundamental issue is that we are experiencing intermittent LUN disconnects. This has been occurring since we shifted away from ESXi 4.x to ESXi 6.0.

The P2000G3 is not on the HCL for ESXi 6.

If I raise a Paid Support Incident, will VMware make a genuine effort to provide me with a diagnosis of the issue, or will they decline to look into it properly on the basis of non-HCL compliance?

What I am looking for from the Paid Support Incident is either reassurance that the issue is definitely caused by an actual incompatibility (in which case I will feel confident in investing in alternative storage to solve the problem), or a finding that it is some other configuration issue (in which case it is worth continuing to troubleshoot the problem with the existing storage).

I don't want to pay for a support incident if VMware will just say: "we won't investigate; it isn't on the HCL".

VMware ESXi multi-writer CentOS 7


Storage vMotion - Between ESXi 5.1 and ESXi 6.1 connected to different storage


Hello Experts,

  

We have two data centers: one has ESXi 6.1 hosts with V5000 storage, and the other has ESXi 5.1 hosts connected to EMC storage.

 

We need to move virtual machines from the EMC storage to the V5030, and from the 5.1 hosts to the 6.1 hosts.

 

The VMFS version on the 5.1 ESXi hosts is VMFS 3.

Is Storage vMotion possible in these scenarios?

At the moment the two storage systems are not connected to each other. What connectivity will we need between them?

What connectivity is required between the ESXi 5.1 cluster and the ESXi 6.1 cluster?

VMFS6 and Auto-reclaim with Dell Compellent


I am trying to see if there is anyone with Dell Compellent (SCOS v7x) that is successfully seeing Auto-Reclaim work against a VMFS6 volume.

 

The HCL says that Dell Compellent with v7.x SCOS code will support auto-reclamation of VMFS6 volumes.

 

I am not seeing that happen, though, even after about 5 days of monitoring and testing by creating or cloning VMs on VMFS6 and then deleting them. I see no VAAI commands being issued in esxtop after days. I cannot really find anything in any logs that states it's not working, either.

 

A theory right now is that the Dell Compellent's default 2MB data page size is also being used as the unmap granularity, in which case it will not work at all.

 

However, Dell Compellent with v7.x SCOS is supported on the VMware HCL, which is the strange part to me. Nowhere does it say anything about the page size.
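
A quick way to see what the host believes it should be doing is to check the reclaim settings on the volume and the VAAI Delete status on the backing device; a rough sketch, where the datastore label and naa identifier are placeholders:

# Automatic space reclamation settings for the VMFS6 datastore
esxcli storage vmfs reclaim config get --volume-label=VMFS6_DS01

# Does the host report Delete (UNMAP) as supported on the backing device?
esxcli storage core device vaai status get -d naa.6000d310000000000000000000000000

If the array advertises an unmap granularity larger than 1MB, ESXi 6.5's automatic unmap reportedly will not issue UNMAPs at all, which would fit the 2MB page size theory.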

Can't add local storage after changing RAID config :(


Hello, I have a server running ESXi 5.5 with two virtual disks as local storage (the first RAID 1, the second RAID 5).

I changed the second array to RAID 10 and restarted the server.

I then went to Add Storage, Storage Type: Disk/LUN, Select Disk/LUN: I see the Serial Attached SCSI disk (Path ID ..., LUN 2, Non-SSD, 2.73 TB).

Next, VMFS-5, Current disk layout: ... The hard disk is blank.

 

Next: "Call "HostDatastoreSystem.QueryVmfsDatastoreCreateOptions" for object "ha-datastoresystem" on ESXi "130.0.5.2" failed."

The previous volume was bigger: 4.5 TB (RAID 5). I also have a SAN with 3 LUNs connected to the ESXi host via iSCSI.

 

Does somebody know why I can't add my local storage after simply changing the RAID config? Thanks!
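
A likely suspect is a stale partition table left over from the old RAID 5 volume. A rough sketch for checking (and, if necessary, clearing) it; the device identifier is a placeholder, and the delete/mklabel steps destroy whatever is still on the disk:

# Inspect the partition table the host currently sees on the new RAID 10 volume
partedUtil getptbl /vmfs/devices/disks/naa.600508b1001030000000000000000000

# If an old or invalid partition/label is reported, clear it so the Add Storage wizard starts from a truly blank disk
partedUtil delete /vmfs/devices/disks/naa.600508b1001030000000000000000000 1
partedUtil mklabel /vmfs/devices/disks/naa.600508b1001030000000000000000000 gpt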

Best Practice for connecting NAS Storage


Hi There,

 

We have ESXi servers with four 1Gb NICs. vmnic0 and vmnic1 are configured for the management network (VMkernel port) and vmnic2 and vmnic3 are configured for VM traffic (virtual machine port group). I have to connect a Synology NAS and present its storage to ESXi.

 

Considering this scenario, and with no more NICs being added to the box, what is the best way to connect the NAS? I do not want to lose redundancy for either the management or the VM port group.

 

As of now I have connected it like this:

* Connected NAS to switch

* Assigned the IP address to NAS from same subnet of management network.

* Added the software iSCSI adapter, performed discovery, and presented the storage to the ESXi box (a port-binding sketch follows below)
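
If the iSCSI traffic has to share the existing vSwitches, a dedicated VMkernel port bound to the software iSCSI adapter at least keeps the storage traffic on its own port group and IP; a rough sketch, where the port group, vmk number, IP and adapter name are placeholders:

# Dedicated VMkernel port for iSCSI on an existing vSwitch port group
esxcli network ip interface add --interface-name=vmk2 --portgroup-name=iSCSI-1
esxcli network ip interface ipv4 set --interface-name=vmk2 --ipv4=192.168.20.11 --netmask=255.255.255.0 --type=static

# Bind it to the software iSCSI adapter (the adapter name varies, often vmhba33 or higher)
esxcli iscsi networkportal add --adapter=vmhba33 --nic=vmk2

Each bound VMkernel port should have exactly one active uplink (the other set to unused) so that failover is handled by iSCSI multipathing rather than NIC teaming.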

About VMWare ESXi 5.1 Storage vMotion


The paper http://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/vmware-vsphere51-vmotion-performance-white-paper.pdf mentions that:

“Writes to a region that has already been copied will be mirrored to the source and destination – as the DM has already passed this area and any new updates must be reflected by the IO Mirror. Writes to the region currently being copied (in between the two offsets) will be deferred and placed into a queue. Once the DM IO completes we enqueue those writes and unlock the region by updating the offsets. As part of the updating operation we wait for inflight writes to complete. The final region is not mirrored and all writes are issued to the source only”

 

I would like to know whether, for the non-mirrored case, the following optimization could be carried out:

 

If a write request targets a region that has not yet been migrated, and the size of the new write's data block is equal to the size of the migration data block, the new block could be written directly to both the source and the destination, and the region could then be marked as already migrated. This would save one metadata read, reduce the bandwidth consumed on the source storage, and reduce the impact on business I/O performance.

 

And if the size of the data block does not match the migration block size, all writes would be issued to the source only.

 

Looking forward to your reply, thanks!

Used space increasing in datastore but not in Windows


Hi all,

 

My VMs use thin-provisioned disks.

 

Today I checked the disk size in the datastore and see that the disk for the C: drive is increasing day by day: yesterday it was 50GB, but today it is 75GB.

 

But when I log in to Windows and check the used space, yesterday and today it is the same: only 35GB used.

 

The total space is the same; only the used space differs.

 

Can anyone tell me why?
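
This is normal behaviour for thin provisioning: the VMDK grows whenever the guest writes to previously unused blocks, and deleting files inside Windows never shrinks it again. Guest activity such as the pagefile, defragmentation, temporary files and shadow copies can touch new blocks even while the used space reported by Windows stays the same. If the space needs to be reclaimed, the usual approach is roughly as follows (datastore, folder and disk names are placeholders):

# 1) Inside the Windows guest, zero the free space first (e.g. Sysinternals "sdelete -z C:")
# 2) Power the VM off, then punch the zeroed blocks back out of the thin disk
vmkfstools --punchzero /vmfs/volumes/Datastore1/MyVM/MyVM.vmdk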

Correct steps to remove a datastore in vmware


Good afternoon mates,

 

To remove a datastore in VMware, I believe I must follow these steps:

 

1- Unmount the datastore that I will remove from all of my ESXi hosts, since all of my ESXi hosts can see it.

 

2- Remove the datastore to be deleted.

 

3-Delete datastore.

 

Are these three steps the correct ones to execute?
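
Broadly yes; the accepted sequence is to make sure nothing still uses the datastore (VMs, templates, ISOs, HA heartbeating, scratch/coredump locations), unmount it from every host that sees it, detach the backing device, and only then unpresent or delete the LUN on the array. A rough sketch of the per-host CLI steps, with the datastore label and device identifier as placeholders:

# Unmount the volume (repeat on every host that sees it)
esxcli storage filesystem unmount --volume-label=OldDatastore

# Detach the backing device before the LUN is unpresented on the array
esxcli storage core device set --state=off -d naa.60000000000000000000000000000000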

 

Waiting for your comments.

 

Regards...


Doubt about Write Latency observed in Datastore


Hi,

I'm having the following doubt. I have a VM with a virtual disk that resides on a local datastore backed by 7 SAS disks in RAID 5. The virtual disk was created as Thick Provisioned Lazy Zeroed. Inside the VM I am running storage software (DataCore) which zeroes each disk that is added to a new pool.

So, I added the VM's virtual disk (which appears as a local disk inside the VM) to a pool in the DataCore software, and it began writing zeros across the whole disk.

When I go to the vSphere Web Client and look at the VM's datastore write latency (that is to say, with the VM selected in the inventory, I open the advanced performance chart and select the Write Latency counter), I observe a constant 2ms (perhaps some spikes to 5ms, but the average is 2ms).

But if I look at the same counter with the ESXi host selected in the inventory, the write latency on the SAME datastore is 15ms.

It is the only VM with activity on the host, so the only VM issuing commands to that datastore is the one I am describing.

How can it be possible that I observe a smaller write latency at the VM level than at the host level? The latency is supposed to increase as the command goes down from the VM to the disk, not the opposite.

What those counters are telling me is that the VM is receiving the ACKs quicker than the host! It sounds impossible.

   

Could the cause of this behaviour be that the virtual disk in the VM is lazy-zeroed instead of eager-zeroed?

 

Thanks in advance,

Guido.

Remove dead path to iSCSI storage


Version     ESX 4.0.0, 236512

Storage     iSCSI

Initiator     Software

A LUN that was presented to my ESX host was deleted on the SAN without first removing it from the host, and now I'm getting failover errors in the vmkernel log.

 

Mar 30 06:49:40 bl480g1-03 vmkernel: 20:09:00:10.348 cpu1:4222)WARNING: NMP: nmp_DeviceAttemptFailover: Logical device "naa.6090a01820dc7c5f9b97a440089a3483": awaiting fast path state update...

Mar 30 06:49:41 bl480g1-03 vmkernel: 20:09:00:11.348 cpu1:4222)WARNING: NMP: nmp_DeviceAttemptFailover: Retry world failover device "naa.6090a01820dc7c5f9b97a440089a3483" - issuing command 0x4100030b07c0

Mar 30 06:49:41 bl480g1-03 vmkernel: 20:09:00:11.348 cpu1:4222)WARNING: NMP: nmp_DeviceAttemptFailover: Retry world failover device "naa.6090a01820dc7c5f9b97a440089a3483" - failed to issue command due to Not found (APD), try again...

Mar 30 06:49:41 bl480g1-03 vmkernel: 20:09:00:11.348 cpu1:4222)WARNING: NMP: nmp_DeviceAttemptFailover: Logical device "naa.6090a01820dc7c5f9b97a440089a3483": awaiting fast path state update...

Mar 30 06:49:42 bl480g1-03 vmkernel: 20:09:00:12.347 cpu6:4222)WARNING: NMP: nmp_DeviceAttemptFailover: Retry world failover device "naa.6090a01820dc7c5f9b97a440089a3483" - issuing command 0x4100030b07c0

Mar 30 06:49:42 bl480g1-03 vmkernel: 20:09:00:12.347 cpu6:4222)WARNING: NMP: nmp_DeviceAttemptFailover: Retry world failover device "naa.6090a01820dc7c5f9b97a440089a3483" - failed to issue command due to Not found (APD), try again...

 

With esxcfg-mpath both paths are showing dead.

How do I remove these dead paths?
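
On ESX 4.0 the dead paths usually disappear once the stale target is removed from the software initiator's Dynamic/Static Discovery lists (vSphere Client, iSCSI initiator properties) and the adapter is rescanned; a rough sketch of the CLI side, where vmhba33 is a placeholder for the software iSCSI adapter:

# Confirm which paths are dead
esxcfg-mpath -b | grep -i dead

# After removing the stale target entry from Dynamic/Static Discovery, rescan the initiator
esxcfg-rescan vmhba33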

I am out of ideas - High Latency on a LUN - on hosts with no VMs


This has been quite an eventful week with not much sleep.

 

At the moment we are in a situation where no one knows what else we can do. Let me first explain what happened.

 

 

We introduced an additional blade to our infrastructure. It was load-tested for 10 days, all stable and nice. Monday then that host disappears from the vCenter.

The host itself is still up, it just cannot connect to vCenter or the client. The VMs are up too, so that was a bonus. After hours with VMware support they basically gave up and we had no choice but to bounce the host. To add insult to injury, HA didn't work and did not fail the VMs over.

The problem in scenarios like that is that while the (disconnected) host is still in vCenter, the VMs are too: they show as disconnected but powered on, which they are not. So you cannot even migrate them (like you can with powered-off VMs).

 

 

Next "solution" was to remove the host from vCenter. At this stage we were finally able to add the VMs back to the inventory using other hosts.

Of course there were some corruptions / broken VMs / Fricked up VMDK descriptor files and the list (and hours) go on.

 

 

We initially thought that was it. Far from it... we continued to see latencies on all datastores/hosts of 250,000 to 700,000 ms... yes, 700,000 ms.

 

A power-on operation (or even adding VMs back into the inventory) took up to 30 minutes / VM.

 

Anyway ... we obviously opened tickets with the storage vendor as well and they of course blamed VMware .. I actually managed to get both in a phone conference, VMware and Storage vendor with VMware confirming yet again a storage issue. Three days later still no result.

 

 

At some point we had a hunch - all these VMs, which were affected, were also migrated using DRS (when you least need it) which bombed out when the host crashed the second time (before we finally pulled the blade).

 

 

Locks were our guess. So some VMs we suspected to be the culprits were rebooted, and voila... latency gone.

 

 

No one can explain what happens, why that "fixed" some issues, but heh - we were happy ...

 

Well now the weirdest thing ... and to actually finally get to the point, we have two hosts .. EMPTY hosts .. no VMs, showing the same sort of device latency on ONE particular datastore. As soon as you put the hosts back into maintenance mode, the latency goes down to nothing

 

 

 

Attached shows where the host was taken out of maintenance mode and put back in again.

 

 

Now, the vmkernel logs show some SCSI aborts, and yes, this is likely due to storage issues which we may still have. However, how can the only hosts now showing latency be empty ones with no VMs on them when they are out of maintenance mode, yet look fine while in maintenance mode, when all the other hosts actually running VMs are fine?

 

 

Now we are in a blame loop - storage vendor blames vmware, vmware blames storage vendor.

 

VMware Support also just shrugs when I try to get an explanation of how rebooting a VM can cause the latency to calm down, as it surely shouldn't make a difference if the storage back end is to blame...

 

So I hope someone here can give me some pointers, because right now we are out of ideas (and clearly so are the vendors)

Storage help ESXi 5.5


Hi maybe someone can help me with this.

I have a VM datastore that has low disk space.

If I browse the datastore, I can't account for 127GB of files (see screenshot below).

I went into the EqualLogic SAN software and it says the datastore has 47GB of free space, and I have the same amount of free space left on the Windows partition.

How can VMware see only 1.92GB of free space?
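
It may help to list what is actually sitting on the volume from the ESXi shell, since snapshot delta files and thin disks that have grown do not always stand out in the datastore browser; a rough sketch with the datastore name as a placeholder:

# Per-VM folder listing to spot unexpectedly large flat/thin/delta files
ls -lh /vmfs/volumes/Datastore1/*/

# Any snapshot delta disks still present and growing?
find /vmfs/volumes/Datastore1 -name '*-delta.vmdk'

Also note that free space inside the Windows partition and free space reported by the array do not translate directly into free space on the VMFS volume once snapshots, swap files and grown virtual disks are taken into account.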

 

 

Thanks for your help

Extending Physical Mode RDM.


I have a need to extend a disk that is currently presented as a physical mode RDM to a single VM on an ESXi host running 5.5 build 3116895.

 

I have followed the procedure per the article below:

 

Expanding the size of a Raw Device Mapping (RDM) (1007021) | VMware KB

 

To clarify, the process I followed was:

1.  Added the additional storage to the SAN (EMC VNX5600)

2.  Ran HBA rescan on the host.

3.  Ran Rescan in Disk Management on the Windows VM.

 

So here is the problem:

 

Neither my host nor my VM guest OS appears to recognize the additional storage. There are no snapshots on the VM (physical mode RDM does not support snapshots). I've followed this same process on another VM on another host in a different cluster (with the same build of ESXi) and I do not experience the same issue: the host recognizes the new storage immediately after the rescan, and it is immediately displayed as unallocated space in the VM guest after running a disk rescan from within the Windows Disk Management tool.

 

Unfortunately, this is a business critical application that cannot be restarted without upsetting a lot of people and going through a difficult approval process.   According to the article I linked above, and my testing, I should not have to restart this in order to recognize the additional storage on the LUN.

 

I was curious if anyone else had run into this in the past and if so, what action needed to be taken to resolve the problem?
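
When this happens, it can help to confirm from the CLI whether the host actually picked up the new LUN size after the rescan; a rough sketch with the naa identifier as a placeholder for the RDM's backing device:

# Rescan all storage adapters
esxcli storage core adapter rescan --all

# Check the size ESXi currently reports for the RDM's backing LUN
esxcli storage core device list -d naa.6006016000000000000000000000000000 | grep -i size

If the reported size is still the old one, the expansion never reached the host (for example the wrong LUN was grown, or masking/zoning got in the way), which would also explain why the guest cannot see it.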


