VDI by Day, Compute by Night - XenServer
Tony Foster posted this article earlier today about running compute workloads on unused GPUs at night and running VDI during the day on VMware. I have been using a rudimentary batch script to do the same for XenServer for almost a year now, so I thought I'd formalize it with his inspiration!
As always, I'm sure I stole some of this code from someone else, but I have no record of it at this point.
The first step to any problem is to define the problem and the steps required to solve it...
Problem: During the day our VDI instances use a P4-2Q profile and are spread across 3 hosts. The off-hours workload would ideally have a P4-8Q profile.
For this solution we will aim to evacuate 2 hosts of all VDI workloads after hours and restore them before workers return in the morning. There could be more logic here, such as checking for a spike in after-hours VDI usage and powering down compute workloads to accommodate it.
Basic steps to solve:
1.) In order to change the profile we must evacuate the GPU of any running VDI desktops.
We need to know what VMs are on each host, but only the ones on GPUs we want to use. We should change the allocation method to consolidate our VMs on as few GPUs as possible during migration. Then we migrate the VMs.
2.) Then we must change the vGPU profile for each GPU.
3.) We must start up our compute workloads...
So first, how do we interact with XenServer?
XenServer can be scripted through XenCenter's command line interface, xe.exe, which is installed with XenCenter.
See: https://www.citrix.com/blogs/2017/12/01/scripting-citrix-xenserver-with-powershell-and-command-line/
Let's define each step...
1a.) Find all hosts.
xe host-list params=name-label
1b.) Find all physical GPUs on each host.
xe pgpu-list host-name-label=host vendor-name="NVIDIA Corporation" params=uuid
1c.) Find vGPU type on each physical GPU (we use this to filter which VMs we move)
xe pgpu-param-get uuid=pgpu param-name=enabled-VGPU-types
1d.) Find resident vGPU instances on each pGPU
xe pgpu-param-get uuid=pgpu param-name=resident-VGPUs
This output is messy, so we parse it by grabbing the piece we want from each line:
FOR /F "tokens=*" %%L IN (output_from_above) DO (
FOR %%a in (%%L) DO echo %%a)
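To see what the nested loop above does without touching a live pool, here is a minimal sketch that runs it against a sample file. It assumes the resident-VGPUs value comes back as a semicolon-separated list on one line; the uuids and the sample file name are made up for illustration.

```batch
@echo off
setlocal
:: Sample data standing in for real xe output (these uuids are made up).
set "sample=%TEMP%\resident_vgpus_sample.txt"
> "%sample%" echo 11111111-aaaa; 22222222-bbbb; 33333333-cccc

:: The outer loop reads each line whole. The inner loop splits that line on
:: the default FOR delimiters (space, comma, semicolon, equals sign).
set count=0
for /F "usebackq tokens=*" %%L in ("%sample%") do (
    for %%a in (%%L) do (
        echo %%a
        set /a count+=1
    )
)
del "%sample%"
echo Found %count% vGPU uuids
```

Each uuid ends up on its own line, which is the shape the later vgpu-param-get step needs.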
1e.) Get the VM uuid from each vGPU instance.
xe vgpu-param-get uuid=vgpu_id param-name=vm-uuid
1f.) Migrate VMs off a host
xe vm-migrate uuid=vm_uuid host=migrate_to_host live=true
NOTE: to change the allocation algorithm before we migrate them we can use:
xe gpu-group-list
to find the uuid of the GPU group that contains the NVIDIA GPUs (in my case it was 12bf28ce-0f9c-b73f-0310-9e4218b00893). It seems to be static so I hard coded it.
xe gpu-group-param-set uuid=uuid_from_above allocation-algorithm=depth-first
2.) Change vGPU types on the evacuated GPUs
xe pgpu-param-set uuid=pgpu enabled-VGPU-types=vgpu_type
3.) Resume or Start VMs
xe vm-resume vm=vm_name
or
xe vm-start vm=vm_name
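The list commands in steps 1a and 1b print one "key ( access) : value" line per object, so a script has to pick the value out of each line. The sketch below shows one way to do that with FOR /F against sample text. The host names and the exact line layout are assumptions made for illustration, so check them against what your own pool prints.

```batch
@echo off
setlocal
:: Sample text standing in for "xe host-list params=name-label" output.
:: The host names are made up and the line layout is an assumption.
set "sample=%TEMP%\hosts_sample.txt"
> "%sample%" echo name-label ( RW)    : xenhost1
>>"%sample%" echo.
>>"%sample%" echo name-label ( RW)    : xenhost2

:: Splitting on colons and spaces gives: name-label / ( / RW) / xenhost1
:: so the value is the fourth token. FOR /F skips the blank line.
set count=0
for /F "usebackq tokens=4 delims=: " %%H in ("%sample%") do (
    echo Found host: %%H
    set /a count+=1
    set "last=%%H"
)
del "%sample%"
echo Parsed %count% hosts, last one was %last%
```

Because space is one of the delimiters, this only works for names without spaces; a host called "xen host 1" would be cut off at "xen".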
Final Code:
@echo off
set logfile=logfile.log
echo Batch command: %0 %* >%logfile%
echo Started at %date% %time% >>%logfile%
echo ------------------------ >>%logfile%

::Configure these variables for your environment
:: ---------------------------------------------------
set xpath=C:\Program Files (x86)\Citrix\XenCenter
set xuser=user
set xpwd=pwd
::Set the xmaster variable to the host you expect to be the pool master
set xmaster=host_master
::Compiles the full remote xe command to replace local xe for least confusion
set xe="%xpath%\xe.exe" -s %xmaster% -u %xuser% -pw %xpwd%

::Change vGPU Allocation Policy
:: NVIDIA P4s
%xe% gpu-group-param-set uuid=uuid_of_gpu_group allocation-algorithm=depth-first >>%logfile%

::MIGRATE ALL RUNNING VMs TO A SINGLE HOST
::Get List of Hosts
%xe% host-list params=name-label>hosts.txt
::For each host, get list of pGPU
FOR /F "tokens=4 delims=: " %%H IN (hosts.txt) DO %xe% pgpu-list host-name-label=%%H vendor-name="NVIDIA Corporation" params=uuid>>pgpus.txt
::For each pGPU get list of vGPU instances (only if they are of a specific type)
FOR /F "tokens=4 delims=: " %%P IN (pgpus.txt) DO (
%xe% pgpu-param-get uuid=%%P param-name=enabled-VGPU-types | find "vgpu_type" && %xe% pgpu-param-get uuid=%%P param-name=resident-VGPUs>>temp.txt
)
::Get the VM uuid from each vGPU instance
FOR /F "tokens=*" %%L IN (temp.txt) DO (
FOR %%a in (%%L) DO echo %%a>>vgpus.txt
)
del temp.txt
FOR /F "tokens=*" %%V in (vgpus.txt) DO %xe% vgpu-param-get uuid=%%V param-name=vm-uuid>>vms.txt
::Log the VM uuids we found (the original echoed an unset %vms% variable here)
echo VMs found: >>%logfile%
type vms.txt >>%logfile%
::Migrate VMs if they aren't on a specific host (move other VMs to that host)
FOR /F "tokens=*" %%M in (vms.txt) DO (
%xe% vm-param-get uuid=%%M param-name=resident-on | find "ignoredHost_uuid" || %xe% vm-migrate uuid=%%M host=host_migrate_to live=true >>%logfile%
)

::Change vGPU types on pGPUs (I was lazy here and hardcoded the uuids, but we could pull them from the pgpus.txt file above) also vgpu type comes from xe vgpu-type-list
%xe% pgpu-param-set uuid=pgpu_uuid enabled-VGPU-types=vgpu_type >>%logfile%

::Resume VMs
%xe% vm-resume vm=vm_name >>%logfile%

::Cleanup files used
del hosts.txt
del pgpus.txt
del vgpus.txt
del vms.txt
echo ---------------------- >>%logfile%
echo Ended at %date% %time% >>%logfile%
echo.>>%logfile%
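The script leans on piping xe output into find and chaining with && or || to decide whether to act. As a rough illustration of that idiom on its own, the sketch below uses plain variables in place of xe output. The host uuids are made up, and the >nul redirect is my addition to keep the matched text off the screen.

```batch
@echo off
setlocal
:: Stand-ins for real xe output; both uuids are made up.
set "keep=host-uuid-aaaa"

:: find sets ERRORLEVEL 0 when the text is present and 1 when it is absent.
:: "&&" runs the next command only on success, "||" only on failure.

:: Case 1: the VM already lives on the host we want to keep.
set "resident=host-uuid-aaaa"
set "action1=none"
echo %resident% | find "%keep%" >nul && set "action1=skip"
echo %resident% | find "%keep%" >nul || set "action1=migrate"
echo VM on %resident%: %action1%

:: Case 2: the VM lives on another host, so it would be migrated.
set "resident=host-uuid-bbbb"
set "action2=none"
echo %resident% | find "%keep%" >nul && set "action2=skip"
echo %resident% | find "%keep%" >nul || set "action2=migrate"
echo VM on %resident%: %action2%
```

In the real script the right-hand side of || is the vm-migrate call, so only VMs that are not already on the chosen host get moved.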
To undo this, we just go in reverse...
::Suspend VMs
%xe% vm-suspend vm=vm_name
::Change vGPU types on pGPUs
%xe% pgpu-param-set uuid=pgpu enabled-VGPU-types=vgpu_type
::Change vGPU Allocation Policy
:: NVIDIA P4s
%xe% gpu-group-param-set uuid=vgpu_group_uuid allocation-algorithm=breadth-first