VDI by Day, Compute by Night - XenServer

Tony Foster posted an article earlier today about running compute on unused GPUs at night and VDI during the day on VMware. I have been using a rudimentary batch script to do the same on XenServer for almost a year now, so, inspired by his post, I thought I'd formalize it!

As always, I'm sure I stole some of this code from someone else, but I have no record of it at this point.

The first step to any problem is to define the problem and the steps required to solve it...

Problem: During the day our VDI instances use a P4-2Q profile and are spread across 3 hosts. The off-hours workload would ideally have a P4-8Q profile.

For this solution we will aim to evacuate 2 hosts of all VDI workloads after hours and restore them before workers return in the morning. There could be more logic here, such as checking for a spike in after-hours VDI usage and powering down compute workloads to accommodate it.

Basic steps to solve:

1.) In order to change the profile we must evacuate the GPU of any running VDI desktops.

We need to know which VMs are on each host, but only those on the GPUs we want to use. We should change the allocation method to consolidate our VMs on as few GPUs as possible during migration. Then we migrate the VMs.

2.) Then we must change the vGPU profile for each GPU.

3.) We must start up our compute workloads...

So first, how do we interact with XenServer?

XenServer can be scripted through xe.exe, the command line interface that is installed with XenCenter.

See: https://www.citrix.com/blogs/2017/12/01/scripting-citrix-xenserver-with-powershell-and-command-line/

Let's define each step...

1a.) Find all hosts.
xe host-list params=name-label
1b.) Find all physical GPUs on each host.
xe pgpu-list host-name-label=host vendor-name="NVIDIA Corporation" params=uuid
1c.) Find vGPU type on each physical GPU (we use this to filter which VMs we move)
xe pgpu-param-get uuid=pgpu param-name=enabled-VGPU-types
1d.) Find resident vGPU instances on each pGPU
xe pgpu-param-get uuid=pgpu param-name=resident-VGPUs
This output is messy (the vGPU uuids come back on a single line, separated by semicolons), so we split each line and echo one uuid at a time:
FOR /F "tokens=*" %%L IN (output_from_above) DO (
 FOR %%a in (%%L) DO echo %%a)
1e.) Get the VM uuid from each vGPU instance.
xe vgpu-param-get uuid=vgpu_id param-name=vm-uuid
1f.) Migrate VMs off a host
xe vm-migrate uuid=vm_uuid host=migrate_to_host live=true
NOTE: to change the allocation algorithm before we migrate them, we can use:
xe gpu-group-list
to find the uuid of the GPU group that contains the NVIDIA GPUs (in my case it was 12bf28ce-0f9c-b73f-0310-9e4218b00893). It seems to be static, so I hard-coded it.
xe gpu-group-param-set uuid=uuid_from_above allocation-algorithm=depth-first
2.) Change vGPU types on the evacuated GPUs
xe pgpu-param-set uuid=pgpu enabled-VGPU-types=vgpu_type
3.) Resume or Start VMs
xe vm-resume vm=vm_name
or
xe vm-start vm=vm_name
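
The NOTE under step 1f hard-codes the GPU group uuid. If you would rather look it up on each run, something like the sketch below might work. It assumes the group's name-label contains the word "NVIDIA" and that only one group matches (if several match, the last one wins); the groups.txt file and the gpugroup variable are names made up for this example, and the connection details are placeholders.

```batch
@echo off
:: Sketch only: find the NVIDIA GPU group by name instead of hard-coding its uuid.
:: Placeholder connection details; replace with your own.
set xe="C:\Program Files (x86)\Citrix\XenCenter\xe.exe" -s host_master -u user -pw pwd

:: --minimal prints the group uuids as one comma-separated line
%xe% gpu-group-list --minimal >groups.txt

:: A plain FOR splits the line on commas; keep the group whose name mentions NVIDIA
set gpugroup=
FOR /F "tokens=*" %%L IN (groups.txt) DO (
 FOR %%G IN (%%L) DO (
  %xe% gpu-group-param-get uuid=%%G param-name=name-label | find "NVIDIA" >nul && set gpugroup=%%G
 )
)
del groups.txt

:: Stop rather than change the wrong group
if not defined gpugroup (
 echo No NVIDIA GPU group found
 exit /b 1
)

%xe% gpu-group-param-set uuid=%gpugroup% allocation-algorithm=depth-first
```

The output goes through a temporary file, the same way the rest of the script does, rather than through FOR /F with a quoted command, because the quoted xe.exe path with parentheses in it tends to trip up cmd's quote handling there.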

Final Code:
@echo off
set logfile=logfile.log
echo Batch command: %0 %* >%logfile%
echo Started at %date% %time% >>%logfile%
echo ------------------------ >>%logfile%

::Configure these variables for your environment
:: ---------------------------------------------------
set xpath=C:\Program Files (x86)\Citrix\XenCenter
set xuser=user
set xpwd=pwd
::Set the xmaster variable to the host you expect to be the pool master
set xmaster=host_master

::Builds the full remote xe command so the rest of the script reads like local xe
set xe="%xpath%\xe.exe" -s %xmaster% -u %xuser% -pw %xpwd%

::Change vGPU Allocation Policy
:: NVIDIA P4s
%xe% gpu-group-param-set uuid=uuid_of_gpu_group allocation-algorithm=depth-first >>%logfile%

::MIGRATE ALL RUNNING VMs TO A SINGLE HOST
::Get List of Hosts
%xe% host-list params=name-label>hosts.txt

::For each host, get list of pGPU
FOR /F "tokens=4 delims=: " %%H IN (hosts.txt) DO %xe% pgpu-list host-name-label=%%H vendor-name="NVIDIA Corporation" params=uuid>>pgpus.txt

::For each pGPU get list of vGPU instances (only if they are of a specific type)
FOR /F "tokens=4 delims=: " %%P IN (pgpus.txt) DO (
 %xe% pgpu-param-get uuid=%%P param-name=enabled-VGPU-types | find "vgpu_type" && %xe% pgpu-param-get uuid=%%P param-name=resident-VGPUs>>temp.txt
)

::Get the VM uuid from each vGPU instance
FOR /F "tokens=*" %%L IN (temp.txt) DO (
 FOR %%a in (%%L) DO echo %%a>>vgpus.txt
)

del temp.txt

FOR /F "tokens=*" %%V in (vgpus.txt) DO %xe% vgpu-param-get uuid=%%V param-name=vm-uuid>>vms.txt

::Log the VM uuids we are about to migrate
echo VMs to migrate: >>%logfile%
type vms.txt >>%logfile%

::Migrate VMs if they aren't on a specific host (move other VMs to that host)
FOR /F "tokens=*" %%M in (vms.txt) DO (
 %xe% vm-param-get uuid=%%M param-name=resident-on | find "ignoredHost_uuid" || %xe% vm-migrate uuid=%%M host=host_migrate_to live=true >>%logfile%
)

::Change vGPU types on pGPUs (I was lazy here and hard-coded the uuids, but we could pull them from the pgpus.txt file above)
::The vgpu_type uuid comes from xe vgpu-type-list
%xe% pgpu-param-set uuid=pgpu_uuid enabled-VGPU-types=vgpu_type >>%logfile%

::Resume VMs
%xe% vm-resume vm=vm_name >>%logfile%

::Cleanup files used
del hosts.txt
del pgpus.txt
del vgpus.txt
del vms.txt

echo ---------------------- >>%logfile%
echo Ended at %date% %time% >>%logfile%
echo                        >>%logfile%
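
The hard-coded pGPU uuid in the "Change vGPU types" step could be replaced with a loop. A minimal sketch, assuming one evacuated host per pass: evacuated_host and vgpu_type are placeholders, evac_pgpus.txt is a file name invented for this example, and the parsing is the same tokens=4 trick used above.

```batch
@echo off
:: Sketch only: set the vGPU type on every NVIDIA pGPU of one evacuated host.
:: Placeholder connection details; replace with your own.
set xe="C:\Program Files (x86)\Citrix\XenCenter\xe.exe" -s host_master -u user -pw pwd

:: List only the pGPUs that live on the host we just emptied
%xe% pgpu-list host-name-label=evacuated_host vendor-name="NVIDIA Corporation" params=uuid >evac_pgpus.txt

:: Each line looks like "uuid ( RO) : <uuid>", so the uuid is the 4th token
FOR /F "tokens=4 delims=: " %%P IN (evac_pgpus.txt) DO (
 %xe% pgpu-param-set uuid=%%P enabled-VGPU-types=vgpu_type
)

del evac_pgpus.txt
```

Run it once per evacuated host. Querying per host, rather than reusing pgpus.txt directly, keeps the pGPUs on the host that still carries the VDI desktops out of the loop.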

To undo this, we just go in reverse...
::Suspend VMs
%xe% vm-suspend vm=vm_name

::Change vGPU types on pGPUs
%xe% pgpu-param-set uuid=pgpu_uuid enabled-VGPU-types=vgpu_type

::Change vGPU Allocation Policy
:: NVIDIA P4s
%xe% gpu-group-param-set uuid=uuid_of_gpu_group allocation-algorithm=breadth-first
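
Neither script runs itself. One way to schedule them is Windows Task Scheduler, roughly as sketched here; the task names, script paths and times are all placeholders for illustration, not something from my environment.

```batch
:: Sketch only: task names, paths and times are placeholders
:: Evening: evacuate the hosts and start the compute workloads
schtasks /Create /TN "VDI-to-Compute" /TR "C:\Scripts\compute_by_night.bat" /SC DAILY /ST 19:00

:: Morning: suspend compute and hand the GPUs back to VDI
schtasks /Create /TN "Compute-to-VDI" /TR "C:\Scripts\vdi_by_day.bat" /SC DAILY /ST 06:00
```

Because the script writes logfile.log and its temporary .txt files to the current directory, it would probably need a cd /d to a writable folder near the top when run this way.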
