Incorrect NFS Datastore mount
I was contacted the other day by someone who was trying to create a cluster in a box and had run into a problem. When they tried to power on the VM with the shared, eager-zeroed-thick disk they had created, they got an error message saying that they did not have permission to access the disk.
I checked that the VM settings were correct and that the shared disk was attached to a separate SCSI controller with the correct disk sharing options. They were.
I asked how the eager-zeroed disk had been created, and that had also been done correctly. The admin had logged into the ESX host and created the disk on the datastore with the following command:
sudo vmkfstools -c 60G -d eagerzeroedthick /vmfs/volumes/IT_DBM_QT/shared/shared.vmdk
I logged into the host and noticed something weird. The disk had been created in a folder called shared, but the ownership of the folder and the files was not correct. They were set with an owner:group that was not root:root, the way it should have been.
From the ESX host
drwxrwxrwx 1 root root 4096 Nov 1 14:43 .
drwxr-xr-x 1 root root 512 Nov 1 15:23 ..
drwxr-xr-x 1 65534 65534 4096 Sep 20 14:31 md1
drwxr-xr-x 1 65534 65534 4096 Oct 27 11:23 md2
drwxr-xr-x 1 65534 65534 4096 Sep 19 16:21 mich_bi
drwxr-xr-x 1 65534 65534 4096 Oct 14 18:04 oem11
drwxr-xr-x 1 65534 65534 4096 Sep 16 10:20 orabi11
drwxr-xr-x 1 65534 65534 4096 Oct 3 12:09 rh5.5-m2.kickstart
drwxr-xr-x 1 65534 65534 4096 Sep 12 12:36 rh5-dg1
drwxr-xr-x 1 65534 65534 4096 Oct 14 16:53 rh5-dg1_1
drwxr-xr-x 1 65534 65534 4096 Sep 12 15:46 rh5-dg1_2
drwxr-xr-x 1 65534 65534 4096 Sep 13 11:05 rh5-dg2
drwxr-xr-x 1 65534 65534 4096 Sep 12 15:08 rh5-dg2_1
drwxr-xr-x 1 65534 65534 4096 Nov 1 15:07 rh5-rac1
drwxr-xr-x 1 65534 65534 4096 Nov 1 15:09 rh5-rac2
drwxrwxr-x 1 admin admin 4096 Nov 1 14:43 shared
drwxrwxrwx 1 root root 4096 Nov 1 12:00 .snapshot
The listing above shows that the owner of most folders was 65534:65534, which was not right.
So who is user 65534? I tried cat /etc/passwd | grep 65534 but did not find anything there.
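A plain grep for 65534 matches that number anywhere on a line, not only in the UID field. As a minimal sketch of a stricter lookup, assuming awk is available on the machine where it is run, the helper below prints only the entries whose third colon-separated field equals the UID. The function name lookup_uid is made up for this example.

```shell
# Print the passwd entries whose UID field (3rd colon-separated field)
# is exactly the given UID. Matches in other fields, such as the GID
# or the comment, are ignored.
# lookup_uid is a hypothetical helper name, not an ESX tool.
lookup_uid() {
    uid="$1"
    passwd_file="${2:-/etc/passwd}"   # defaults to the local passwd file
    awk -F: -v uid="$uid" '$3 == uid' "$passwd_file"
}

# Usage: lookup_uid 65534
```

Empty output means the UID has no local account on that machine, which matches what I saw on the ESX host.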
I have the NFS datastore mounted outside of the ESX host for administrative purposes, so I decided to check from there. And lo and behold, this is what I saw.
drwxrwxrwx 17 root root 4096 2010-11-01 14:43 .
drwxr-xr-x 20 root root 4096 2010-11-01 15:21 ..
drwxr-xr-x 2 nobody nogroup 4096 2010-09-20 14:31 md1
drwxr-xr-x 2 nobody nogroup 4096 2010-10-27 11:23 md2
drwxr-xr-x 2 nobody nogroup 4096 2010-09-19 16:21 mich_bi
drwxr-xr-x 2 nobody nogroup 4096 2010-10-14 18:04 oem11
drwxr-xr-x 2 nobody nogroup 4096 2010-09-16 10:20 orabi11
drwxr-xr-x 2 nobody nogroup 4096 2010-10-03 12:09 rh5.5-m2.kickstart
drwxr-xr-x 2 nobody nogroup 4096 2010-09-12 12:36 rh5-dg1
drwxr-xr-x 2 nobody nogroup 4096 2010-10-14 16:53 rh5-dg1_1
drwxr-xr-x 2 nobody nogroup 4096 2010-09-12 15:46 rh5-dg1_2
drwxr-xr-x 2 nobody nogroup 4096 2010-09-13 11:05 rh5-dg2
drwxr-xr-x 2 nobody nogroup 4096 2010-09-12 15:08 rh5-dg2_1
drwxr-xr-x 2 nobody nogroup 4096 2010-11-01 15:07 rh5-rac1
drwxr-xr-x 2 nobody nogroup 4096 2010-11-01 15:09 rh5-rac2
drwxrwxr-x 2 admin admin 4096 2010-11-01 14:43 shared
drwxrwxrwx 10 root root 4096 2010-11-01 12:00 .snapshot
The owner was nobody:nogroup.
I then asked the storage admin to check how the export was defined on the NetApp filer, and got this back:
/vol/DBM/DBM_QT -sec=sys,rw
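I remembered from my first VCP course a while back that ESX hosts have to have root access to the NFS export in order to work. Without a root= option on the export, the filer maps root on the client to the anonymous user, which would explain why everything the host created was owned by 65534 (nobody).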
Changed the export to:
/vol/DBM/DBM_QT -sec=sys,rw,root=1.1.2.0/24:1.1.3.0/24
I then tested the creation of a new VM from the vSphere client and verified that the files were now created as root:root.
Strangely enough, even with the incorrect export the ESX host had been able to mount the datastore, and the machines on it were working as well.
I did, however, still have to change the ownership of the folders and files that had already been created on this datastore while the export was not correct. I did this in order to prevent further problems and rectify the incorrect ownership:
chown -R root:root /vmfs/volumes/IT_DBM_QT/*
Of course, this was all done with the VMs powered off!
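Before and after a recursive chown like the one above, it can help to list exactly which entries have the wrong owner. The following is only a sketch, assuming the find on the host accepts a numeric ID for -user (GNU find does). The function name list_wrong_owner is invented for this example. It only reads and changes nothing.

```shell
# List every file and folder under a directory that is NOT owned by the
# expected UID (0, i.e. root, by default). Read-only: nothing is modified.
# list_wrong_owner is a hypothetical helper name.
list_wrong_owner() {
    dir="$1"
    expected_uid="${2:-0}"
    find "$dir" ! -user "$expected_uid" -print
}

# Usage: list_wrong_owner /vmfs/volumes/IT_DBM_QT
```

If the ownership has been corrected everywhere, the command prints nothing. Note that it also walks into folders such as .snapshot, so entries reported there may not be changeable.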