OVM CPU Pinning

If you clone or recover an Oracle VM guest and the source used CPU pinning (hard partitioning), the target may fail to start. The error is entirely non-intuitive, and I could not find it on the interwebs, so here is a sanitized version.

OVMAPI_5001E Job: 1416254413024/QueuedVmStartDbImpl_1416254413023/OVMJOB_1500J Start/resume vm: PRODVM, on server: PRODSERVER, failed. 
Job Failure Event: 1416254413902/Server Async Command Failed/OVMEVT_00C014D_001 Async command failed on server: PRODSERVER. 
Object: PRODVM, PID: 15431, Server error: 
Command: ['xm', 'create', '/OVS/Repositories/000dead000beef00cafe0421cab55bad/VirtualMachines/000dead000beef00cafef207cabdbbad/vm.cfg'] failed (1):
stderr: Error: (22, 'Invalid argument')
stdout: Using config file "/OVS/Repositories/000dead000beef00cafe0421cab55bad/VirtualMachines/000dead000beef00cafef207cabdbbad/vm.cfg". ,
on server: PRODSERVER, associated with object: 000dead000beef00cafef207cabdbbad [Thu Apr 15 00:12:19 EDT 2021]

 

To clear the pinning, remove the cpus = '#-#' line (for example, cpus = '0-7') from the guest's vm.cfg.
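A minimal sketch of that edit as a shell function, assuming GNU sed (for in-place `sed -i`) and that the guest is stopped. The name `unpin_vm_cfg` is hypothetical and not part of any Oracle tool; it keeps a timestamped backup before deleting the line.

```shell
# Hypothetical helper (not an Oracle utility): remove the hard-partitioning
# "cpus = ..." line from a vm.cfg, keeping a timestamped backup first.
# Assumes GNU sed for in-place editing.
unpin_vm_cfg() {
    local cfg="$1"
    [ -f "$cfg" ] || { echo "no such file: $cfg" >&2; return 1; }
    # Keep a copy with the original permissions and timestamps.
    cp -p "$cfg" "$cfg.bak.$(date +%Y%m%d%H%M%S)" || return 1
    # Pattern is anchored at line start, so "vcpus = N" is left untouched.
    sed -i '/^cpus[[:space:]]*=/d' "$cfg"
}

# Example, using the sanitized path from the error above:
# unpin_vm_cfg /OVS/Repositories/000dead000beef00cafe0421cab55bad/VirtualMachines/000dead000beef00cafef207cabdbbad/vm.cfg
```

Because the pattern requires `cpus` at the very start of the line, the separate `vcpus = N` setting survives, so the guest keeps its virtual CPU count and only loses the physical pinning.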

Commands useful for inspecting OVM hard partitioning include:

xm info

xm list

xenpm get-cpu-topology

xm vcpu-list

# cd /u01/app/oracle/ovm-manager-3/ovm_utils
# ./ovm_vmcontrol -u admin -p YourPassword -h ovm-manager -v my-first-vm -c vcpuset -s 0-7
Oracle VM VM Control utility 0.6.3.
Connected.
Command : vcpuset
Pinning virtual CPUs
Pinning of virtual CPUs to physical threads  '0-7' 'my-first-vm' completed.

After that, xm vcpu-list will show the VM's name in column 1 for each dedicated CPU.
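Because a clone starts from the source's vm.cfg, it can help to check which guests still carry a pinning line before cloning or recovering them. A sketch, assuming the /OVS/Repositories/<repo>/VirtualMachines/<id>/vm.cfg layout seen in the error message; `find_pinned_cfgs` is a hypothetical name.

```shell
# Hypothetical audit helper: print the vm.cfg path of every guest under the
# repository root that still has a "cpus = ..." pinning line.
# Default root and directory layout mirror the path in the error above.
find_pinned_cfgs() {
    local root="${1:-/OVS/Repositories}"
    # -l prints only matching file names; errors from unmatched globs are hidden.
    grep -l '^cpus[[:space:]]*=' "$root"/*/VirtualMachines/*/vm.cfg 2>/dev/null
}

# Usage: find_pinned_cfgs            (defaults to /OVS/Repositories)
#        find_pinned_cfgs /some/other/root
```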

Security Defect in Intel, ARM and AMD processors

THE RISK:
The defect allows a user process to read any system memory.
A VM can read memory from the host or another guest in some environments.

WHAT IS AFFECTED:
This does NOT affect POWER/PPC architecture.
Only some of this affects AMD, and only in some modes.
Almost every ARM and Intel processor since 1995 is affected.
That includes desktops, laptops, servers, cellphones, routers, automobiles with Sync/Onstar/autopilot, etc.

DISCOVERY:
This defect was reported in June 2017 but, because it is so pervasive, was kept under embargo.
It is only fully described now because patch notes leaked the problem.

THE FIX:
The actual fix would be replacement of the affected CPUs with new silicon, which does not exist yet.
There is a partial software workaround which decreases system performance.

TECHNICAL:
The issue arises because the processor does not check access permissions before loading data into the L1 cache.
Because of this design issue, data can be forced into the L1 cache and read before the TLB denies access.
It’s fairly slow, at around 2k/second, but a long-running process can harvest everything.
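For a sense of scale, taking the 2k/second figure above at face value as roughly 2,000 bytes per second, reading a single GiB would take:

```latex
t = \frac{1\ \text{GiB}}{2000\ \text{B/s}}
  = \frac{1\,073\,741\,824\ \text{B}}{2000\ \text{B/s}}
  \approx 536\,871\ \text{s}
  \approx 6.2\ \text{days}
```

Slow by interactive standards, but well within reach of a process that is left running.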

Hardware Statuses:
• ARM has provided workarounds to vendors, but it is up to the vendors to implement them.
• Intel’s CEO sold off as much of his stock as possible last year after glowing projections.
• Not a peep from AMD.
• POWER/PPC is not affected.

Software Statuses:
• Windows included a partial workaround in the November security rollup.
• macOS 10.13.2, released in December, included a partial workaround.
• Linux included a partial workaround in mainline kernel 4.15 and stable kernel 4.14.11.
• The workarounds decrease performance by between 1% and 45%, depending on the workload.
• Cloud providers are scheduling maintenance for January 2018.

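On Linux, kernels from 4.15 onward (and some stable or distribution backports) report their own assessment under /sys/devices/system/cpu/vulnerabilities. A small sketch that prints each entry; `show_cpu_vulns` is a hypothetical helper name, and the exact wording of each entry depends on the kernel version.

```shell
# Hypothetical helper: print the kernel's own verdict for each known CPU
# vulnerability. Older kernels without the sysfs directory report nothing.
show_cpu_vulns() {
    local dir="${1:-/sys/devices/system/cpu/vulnerabilities}"
    if [ ! -d "$dir" ]; then
        echo "not reported by this kernel"
        return 1
    fi
    for f in "$dir"/*; do
        # Skip the literal glob if the directory happens to be empty.
        [ -f "$f" ] || continue
        printf '%s: %s\n' "$(basename "$f")" "$(cat "$f")"
    done
}

# Usage: show_cpu_vulns
```

A line such as `meltdown: Mitigation: PTI` means the kernel page-table isolation workaround is active.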
More Reading:
• Community: https://spectreattack.com
• Google: https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html
• Workaround: https://en.wikipedia.org/wiki/Kernel_page-table_isolation
• AMD: https://www.amd.com/en/corporate/speculative-execution
• A better write-up: https://techcrunch.com/2018/01/03/kernel-panic-what-are-meltdown-and-spectre-the-bugs-affecting-nearly-every-computer-and-device/
• Outlet that broke the embargo: https://www.theregister.co.uk/2018/01/02/intel_cpu_design_flaw/