LIS/lis-next

Very slow framebuffer with hyperv_fb on recent Windows hosts, especially in Gen2 VM

clouds56 opened this issue · 70 comments

I'm using Manjaro GNOME 3, with Linux 4.19.
hyperv_fb is much slower than the default efifb (with hyperv_fb you can see the buffer being rendered line by line when scrolling, but with efifb you can hardly notice it).
The mode of hyperv_fb is U:1152x864p-0 (8192kB).
After blacklisting hyperv_fb it uses the hardware EFI VGA (according to /var/log/Xorg.0.log).
The mode of efifb is U:1024x768p-75 (3072kB).

I'm not sure what's wrong with hyperv_fb, and the situation gets even worse when I set the resolution with "video=hyperv_fb:1920x1080". I haven't found a way to set the efifb mode to 1920x1080, so I haven't tested that.

Forgot to mention: the Windows 10 version is 18850.1000 and the VM was created as Hyper-V Gen 2.

I checked the code of hyperv_fb.c and hv/vmbus_drv.c in the kernel and could not find any issues there on my own.
Could it be caused by a change in some part of Hyper-V (maybe related to RemoteFX being deprecated)?

dcui commented

Thanks for reporting the issue! Which Linux distribution are you using, and are you using the distribution's built-in kernel? Is there a .iso on the distribution vendor's website? We'd like to create a VM from the .iso and try to reproduce the issue.

Can you please test Gen-1 VM?

I'm using manjaro with gnome, and iso here.
When using a Gen-2 VM, please follow the instructions here before installing:

# press Ctrl+Alt+F3 to switch to a new tty and login
sudo pacman -Sy
sudo pacman -S xf86-video-fbdev # install the fbdev package
# there's no need to restart gdm (and you should not)
# just switch back to tty1 using Ctrl+Alt+F1 and continue

A Gen-1 VM seems not to suffer from the issue in Xorg; it uses VESA (vesafb) instead of FBDEV.
When I switch to a tty in the Gen-1 VM (which switches to hyperv_fb), scrolling in less is still slow.

dcui commented

I can reproduce the exact symptoms on the latest Hyper-V build. It looks like something recent on the host side causes this issue. BTW, the issue cannot be reproduced on an old Windows Server RS2 host.

For now, please blacklist the hyperv_fb driver to work around the issue.

I'm going to report this issue to the Hyper-V team, but I'm afraid it cannot be resolved soon.

Any update on this topic? I observe it on the latest Windows 10 Pro (host) with Ubuntu (guest, in fact every version), but only when I enable more than 1 vCPU core for the VM. Gen 1 and Gen 2 VMs.

RDC is not an option for me; using the standard framebuffers is also a little problematic, so it would be good to see some progress here.

dcui commented

@marcinwiacek: Can you please share your host version? On the host, please press Win+R and then run "winver.exe", and you should see something like "Version XXXX (OS Build XXXXX.XXX)" . The meaning of the version numbers is explained here: https://en.wikipedia.org/wiki/Windows_10_version_history .

The slowness is introduced by recent host versions (we know RS1 and RS2 are good, and RS5 and newer are bad). Hyper-V team has been working on this, but so far a thorough fix is still not available.

At the same time, we (the Linux team) are trying to mitigate the slowness by implementing on-demand framebuffer updates. We have some internal patches and are testing them. We have not finalized the patches yet, and the performance improvement may not be very big on recent hosts until the Hyper-V team fixes them.

We'll keep this thread updated as we make more progress.

10.0.18362.207

I was thinking about RDC (not a very good option, like I said) + using vesafb / uvesafb or any other FB (but no luck with this). If you know any workaround, I will be more than happy to test it. Like I said, the only options are a small resolution or having one CPU core.

dcui commented

10.0.18362 is 19H1, which has the slowness issue, as I mentioned.

It looks like you're saying the FB is not slow when the VM is configured with only 1 virtual CPU? We don't see this. In our tests, the FB in an SMP VM (i.e. more than 1 vCPU) is as slow as in a 1-vCPU VM, when the VM runs on "recent" host builds, including RS5 and 19H1.

If you do need a GUI environment in a Linux VM, I suggest you run vnc server in the VM (which is fast, as it's based on TCP, not Hyper-V VMBus), e.g. https://www.digitalocean.com/community/tutorials/how-to-install-and-configure-vnc-on-ubuntu-18-04. You need a vnc client (e.g. vnc viewer) to connect to the server.

I confirm: a machine with one vCPU has hyperv_fb working fast; two or more vCPUs make it slow. Can it be connected with some Spectre/prediction patches? And why do the other FBs always work fast?

I also suggest a small test: please create a VM with hyperv_fb in an Ubuntu guest under an older host (a Windows version not affected by the bug), then migrate the host to the latest version. I had a VM (unfortunately lost) which kept working fine for me in this scenario (I'm about 90% sure now).

VNC - if everything else fails, I will have to use it (thx)

dcui commented

I guess Spectre/prediction patches are not related here.

I'm not sure what you mean by "other FB work always fast". If you can give the detailed instructions to test different FBs and how you measure "fast", I'll try to reproduce it.

Unluckily I don't have a host with exactly the same host build version, and I don't have a host that can upgrade from RS1 (or RS2) to RS5 (or 19H1), but I think my build should produce the same result as yours when we test the access speed of the framebuffers. However, I cannot reproduce your symptoms and I cannot understand them (e.g. the FB is not slow with 1 CPU).

In my tests on recent buggy hosts, in a Gen1 VM, the legacy FB and Hyper-V synthetic FB are both slow; in a Gen2 VM, the legacy UEFI FB is fast, but the Hyper-V synthetic FB is slow.

Blacklist hyperv_fb and you fall back to efifb (if UEFI is enabled in the guest); then you might not suffer from the performance issue.

# /etc/modprobe.d/blacklist.conf
blacklist hyperv_fb
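On many distros the module may also be packed into the initramfs, so the blacklist entry alone is not always enough. A hedged sketch (the initramfs commands are conventional per-distro choices, not from this thread; the snippet writes to a temp file so it is safe to run as-is, whereas a real system would use /etc/modprobe.d/blacklist.conf as root):

```shell
#!/bin/sh
# Sketch: create the blacklist entry for hyperv_fb. CONF defaults to a temp
# file here for illustration; on a real system it would be
# /etc/modprobe.d/blacklist.conf (written as root), followed by an initramfs
# rebuild (e.g. "update-initramfs -u" on Debian/Ubuntu, "mkinitcpio -P" on
# Arch/Manjaro) so the module also stays out of early boot.
CONF="${CONF:-$(mktemp)}"
printf 'blacklist hyperv_fb\n' > "$CONF"
cat "$CONF"
```

After a reboot, "lsmod | grep hyperv_fb" should return nothing and the guest falls back to efifb (Gen-2) or vesafb (Gen-1).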

thx, I have checked vesafb, uvesafb, efifb and all of them are fast, but I don't have a custom resolution (or at least a big one). hyperv_fb works fine with 1 CPU only (with more it is slow). xrandr doesn't work.

Honestly speaking, I don't understand why it's so difficult: is it possible to create a custom BIOS/UEFI for guests which has a very big max resolution or many big custom VESA modes? I guess it would help.

for example 1900x900, 1920x1020, 1440x900, etc.

Just predefine them please, and it would be good to be able to set them up using "vga=mode" in the kernel options.

ubuntu 19.04 guest, many CPUs, hyperv_fb, no integration services in VM settings, VM gen 2, checked processor compatibility in VM settings, default NUMA, secure boot, no dynamic memory - works fine

critical - checking the option in "hardware\processor\compatibility"

mission complete.

dcui commented

As I mentioned, the FB's performance can be quite different, depending on the configuration:
Gen1 VM vs. Gen2 VM?
Legacy FBs (PCI FB or UEFI FB) or Hyper-V synthetic FB (hyperv_fb)?
Old good hosts vs. new buggy hosts?
How big is the resolution of the legacy FB ("dmesg | grep fb" should contain the info) or the Hyper-V synthetic FB (the default 1152x864 vs. a different one)?

I really don't think the FB performance should be affected by:

  1. the numbers of the VM CPUs.
  2. whether we enable or disable the Integration Services.
  3. whether we check the option in "hardware\processor\compatibility".

When we say the FB is slow or fast, we'd like to know how fast/slow it is by some tool.
I usually do a simple test: in the VM, press Ctrl+Alt+F3 to enter the text mode terminal, and run:

wget https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/CREDITS?h=v5.0 -O credits.txt
time cat credits.txt

In this way, we can know exactly how fast/slow the FB is in a given VM, when we try different scenarios: 1 CPU vs. more? legacy FB vs. Hyper-V synthetic FB, etc.
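For an offline variant of the same measurement (my own sketch, not from the thread), one can generate a large file locally instead of downloading CREDITS:

```shell
#!/bin/bash
# Sketch: measure console scroll speed without network access, in the same
# spirit as the "time cat credits.txt" test above. Run it on a text-mode tty:
# on a slow framebuffer the cat takes tens of seconds, on a fast one well
# under a second.
seq 1 100000 > /tmp/scrolltest.txt   # ~100k short lines of output
time cat /tmp/scrolltest.txt
```

The absolute number matters less than comparing it across configurations (vCPU count, driver, host build).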

I'm setting up a Gen1 Ubuntu 19.04 VM on a 18362.175 host, and will report some numbers later.

dcui commented

On a recent host (host OS build: 18362.175), I installed a Gen-2 Ubuntu 19.04 VM (Desktop version).

The CPU is 'Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz".
The VM has 4 virtual CPUs and by default it uses Hyper-V FB device ("dmesg" shows "hyperv_fb: Screen resolution: 1152x864, Color depth: 3").

The test "time cat credits.txt" in the text mode terminal takes 28 seconds (slow!).
When I configure 1 virtual CPU and/or enable the option "Hardware\Processor\Compatibility\Migration to a physical computer with a different processor version", the test still takes 28 seconds -- no difference at all.

Note: here, in Xorg GUI mode or text mode terminal, the same Hyper-V synthetic framebuffer device is used. That's why we can use the test "time cat credits.txt" to measure the FB device's performance.

Next, after I blacklist the hyperv_fb driver, Hyper-V synthetic framebuffer is not used ("dmesg | grep hyperv_fb" outputs nothing), and only the legacy UEFI FB device is used ("dmesg" contains "efifb: mode is 1024x768x32, linelength=4096, pages=1"). I did the test "time cat credits.txt" again and now it only takes 1.3 seconds (fast). If I change to 1 CPU and enable "Hardware\Processor\Compatibility\Migration to a physical computer with a different processor version", the result is still about 1.3 seconds.

These are what I meant by saying "in a Gen2 VM, the legacy UEFI FB is fast, but the Hyper-V synthetic FB is slow", and I don't think the number of virtual CPUs or enabling the "Hardware\Processor\Compatibility\Migration..." option should make a difference. If you're seeing something different, we'd like to have the details, just as I provided.

Note: here I don't test Gen-1 VM. In a Gen1 VM, the legacy PCI FB device and Hyper-V synthetic FB device are both slow.
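To quickly tell which framebuffer driver a given VM ended up with, the dmesg checks above can be condensed into a small sketch (my own addition; /proc/fb is the kernel's standard list of registered framebuffers):

```shell
#!/bin/sh
# Sketch: show the active framebuffer driver(s) in the guest.
# /proc/fb lists registered framebuffers, e.g. "0 hyperv_fb" or "0 EFI VGA".
if [ -r /proc/fb ]; then
    cat /proc/fb
else
    echo "/proc/fb not available"
fi
# The boot log carries the negotiated mode lines quoted above.
dmesg 2>/dev/null | grep -iE 'hyperv_fb|efifb|vesafb' || true
```

An empty hyperv_fb match after blacklisting confirms the workaround took effect.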

dcui commented

I also did the "time cat credits.txt" test in a Gen1 Ubuntu 19.04 VM on the same host (host OS build: 18362.175). By default, with the Hyper-V FB device, the test also takes 28 seconds; if I blacklist the hyperv_fb driver, the test takes 21 seconds.

Again, the number of virtual CPUs (1 vs. 4) or enabling the "Hardware\Processor\Compatibility\Migration..." option makes NO difference.

time cat my_big_file

6 cores, disabled compatibility option, enabled hyperv_fb

real 1m55.048s (almost 2 minutes!)
user 0m0.004s
sys 0m0.322s

1 core, disabled compatibility option, enabled hyperv_fb

real 0m4.522s
user 0m0.000s
sys 0m0.126s

6 cores, enabled compatibility option, enabled hyperv_fb

real 0m3.337s
user 0m0.000s
sys 0m0.313s

dcui commented

@marcinwiacek Thanks for sharing the perf numbers! I suppose your host version is 10.0.18362.207, and the VM here is a Gen2 VM? Can you share your VM's "cat /proc/cpuinfo"?

Your test #1 vs. #3: it looks like "disabled compatibility option" makes the FB extremely slow.
Your test #1 vs. #2: #2 is not slow despite "disabled compatibility option"; it looks like using 1 CPU makes the FB dramatically faster?
What about "1 core, enabled compatibility option, enabled hyperv_fb"?

1 core, enabled compatibility option, enabled hyperv_fb

real 0m5.940s
user 0m0.000s
sys 0m0.151s

The numbers make sense:

  • more cores = higher speed (of course the task is not CPU-intensive & the difference is not huge)
  • compatibility = lower speed

The compatibility option additionally resolves the hyperv_fb problem.

Intel i7-6820HQ, you're right about versions

dcui commented

It looks like your theory cannot explain why both "1 core, enabled compatibility option" and "1 core, disabled compatibility option" are fast.

Unluckily I can not repro the same symptom with my host (18362.175, which should be very similar to yours) and CPU (i7-7600U, which is a little newer). I'll keep an eye on this symptom and try to repro it if I can find a HW/SW setup that's more similar to yours.

It looks like your theory cannot explain why both "1 core, enabled compatibility option" and "1 core, disabled compatibility option" are fast.

Excluding the bug which we're tracking, everything looks very sensible.

It looks like the bug is in multiple-CPU support, and the compatibility option disables something that causes the problem.

I won't be surprised if the code for detecting CPU features is buggy somewhere.

CPU (i7-7600U, which is a little newer)

https://ark.intel.com/content/www/us/en/ark/products/88970/intel-core-i7-6820hq-processor-8m-cache-up-to-3-60-ghz.html

https://ark.intel.com/content/www/us/en/ark/products/97466/intel-core-i7-7600u-processor-4m-cache-up-to-3-90-ghz.html

6th gen vs 7th gen - they can be very different (different microcode, different graphics card and therefore different drivers, etc.)

I have the same issue, on AMD.

  • Windows 10 Education 1903:
  • "manjaro-xfce-18.0.4-stable-x86_64.iso" (same issue on "archlinux-2019.08.01-x86_64.iso")
  • lscpu states "AMD Ryzen 7 3700U with Radeon Vega Mobile Gfx" (not Intel CPU), with only 1 thread/core/socket allocated to the VM.

I am seeing this same issue on Server 2019 with a SLES 12 Gen 1 VM. It's very interesting that if I have a GNOME terminal open full screen and run top, Xorg runs at 99% and the framebuffer weirdness happens... but if I shrink that terminal down to less than 25% of the screen, Xorg drops to less than 10% CPU and the buffering stops.

This is on the latest 2019 build with SLES 12 patched up to current.

dcui commented

@jaredheath : this should be the known host issue I mentioned previously. Hyper-V team has not fixed the issue for Server 2019, but we (the Linux team) made 2 patches to the hyper-v framebuffer driver in Linux VM so the issue can be effectively mitigated:

[v4] video: hyperv: hyperv_fb: Obtain screen resolution from Hyper-V host: https://patchwork.kernel.org/patch/11132483/

[PATCH,v6] video: hyperv: hyperv_fb: Support deferred IO for Hyper-V frame buffer driver: https://patchwork.kernel.org/patch/11149671/

The patches will be in the mainline Linux kernel git repository soon, and will eventually propagate into new versions of various Linux distributions (which could take months or longer).

If you'd like to use the 2 patches now, you need to build & install a kernel with the 2 patches applied.

yeah, that won't be an option in our environment. I guess we have to defer deployment. Time to alert management.

Thanks for the reply.

Is blacklisting hyperv_fb an option? #655 (comment)

dcui commented

@jaredheath : As @jimbo1qaz reminded, blacklisting the hyperv_fb driver may be an option to you, especially when you use a Generation 2 VM.

I tried that. It doesn't do anything for the big-memory VM. Oddly, it helps A LOT on the lower-memory one (24 GB vs 100+ GB). Thoughts on why the amount of memory on the server would affect the video framebuffer?

These are Gen1

dcui commented

@jaredheath: maybe the real difference is the number of CPUs of the VM, not the amount of the memory of the VM?

ok, off-topic question - Ubuntu 19.10 takes about a minute to start on Hyper-V; do you have any idea what could be wrong and where to report it properly?

I've started with https://bugs.launchpad.net/ubuntu/+source/xorg-server/+bug/1848534, but Microsoft help probably/maybe required.

dcui commented

@marcinwiacek : let's discuss the issue in the launchpad link

Ran into the same/similar issue: on a Windows 10 Pro host installation, I booted up a Linux Mint 19.2 guest with 1 CPU, and performance is acceptable. However, using 2 or more CPUs makes the mouse very sluggish as I drag across the screen (1920x1080), and opening menus, windows, etc. is very slow as well. When I check the "Hardware\Processor\Compatibility\Migration..." option, the same VM is much more responsive with 2 or more CPUs.

Windows 10 version: Version 1903 (OS Build 18362.418)
hyperv_fb is enabled; doing "dmesg|grep fb" gives as part of the output:
hyperv_fb: Screen resolution: 1920x1080, Color depth: 32

3 GHz Windows Pro host, 2 virtual cpu cores, 4096 MB ram, Ubuntu 19.10 server install, no gui, # apt-get install linux-azure ,
GRUB_CMDLINE_LINUX_DEFAULT="elevator=noop video=hyperv_fb:1200x1600 fbcon=TerminusBold12x24 fbcon=scrollback:128k"

$ time cat credits.txt
[...]
real 2m0.452s
sys 2m0.195s

This is slower than a 9600 bps modem; nice 30-year technology anniversary. I repeated it several times to verify it's not an anomaly. The variance is always 2m0-2m2.

Edit: with 1 core it's 1m58.629s, so it's faster! :D
If I had to produce a result this bad, I'd be copying data to screen (0,0) for every row, instead of setting a new screen (0,0) pointer in a scrollback ring buffer.

I need to add that scrolling back with Shift-PgUp/PgDn is instantaneous; only scrolling new data printed on the console tty is slow. Tested on console tty2 as well, same results.

(Why the vertical resolution? It's wider than 1080 and gives more vertical lines than 1080. This is what people have to work with when you won't provide higher resolutions for hyperv_fb.)

dcui commented

Wei Hu has made 2 patches to improve the situation:

#1: video: hyperv: hyperv_fb: Obtain screen resolution from Hyper-V host (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.5-rc4&id=67e7cdb4829d3246c98f2ec9b771303ebe162eab)

#2: video: hyperv: hyperv_fb: Support deferred IO for Hyper-V frame buffer driver (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.5-rc4&id=d21987d709e807ba7bbf47044deb56a3c02e8be4)

The first patch is to support higher resolutions (see the comment in the patch; we need the PowerShell command "Set-VMVideo" to set a higher resolution), and the second is to improve the performance of the framebuffer. However, IIRC the underlying host issue limits how much the patch can help: the Hyper-V synthetic framebuffer is very slow when the guest runs on recent Hyper-V. I remember the workaround is: use a Generation-2 VM and blacklist the hyperv_fb driver, since the UEFI framebuffer emulated by Hyper-V is fast. If you don't want to blacklist the hyperv_fb driver, please try Wei Hu's patches by building a kernel from source, as the patches have not gone into any Linux distros so far.

I just pinged the Hyper-V team again, asking if there is an ETA for the host issue.
I'll let Wei Hu check if I missed something, or if he has more to add.

Right, you need both of these to make it perform better.
There is even a third patch which could improve framebuffer performance on Hyper-V Gen-1 VMs. This patch has not yet been committed, but it will be soon. If you can't wait, contact me and I can send you the third one privately.

Patching and compiling the kernel is beyond me nowadays; I hope you could put out a new Hyper-V-optimized Ubuntu 19.10 gallery image some time this quarter, or at least update the linux-azure package?
But very nice to hear something's actually been done, thanks guys.

dcui commented

I'm not sure if Wei's patches will be included in the Ubuntu 19.10 gallery image or the linux-azure package this quarter, because usually a patch is only included after it has been in the upstream mainline for a while (Wei's patches are only in v5.5-rc4 so far, and the official upstream v5.5 kernel will be released in about 4~6 weeks), and the linux-azure kernel is mainly for VMs running on Azure (so a slow framebuffer and low resolution are not really an issue there). That being said, I'll try to remind the releasing teams of the importance of the patches, and let's see what comes out.

I wish you could find a way to release official beta kernels with the patches in, not just for these patches but for anything that comes later.
I hope you appreciate that compiling a custom kernel with patches is beyond most users, and most importantly there's no common ground there; if any issues arise, one is left alone.

With an official beta kernel there'd be a common base in case issues arise. You don't want to hear from 100 users with their own separately compiled kernels where anything could be the issue, but you do want to hear from 100 users using the same beta version and experiencing the same issues.

It's 6,000-10,000+ lines of kernel options, after all.

Regarding linux-azure, people doing local development with it sure would appreciate it. It's just smart to use the same kernel locally as in Azure, to avoid issues.

dcui commented

Thanks for the suggestions! I understand and agree. BTW, actually the linux-azure kernel and the market place images are released and maintained by Canonical. It looks they do have something like "beta kernels" for linux-azure here: https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/ppa .

dcui commented

BTW, Wei's third patch is here for now: https://patchwork.kernel.org/patch/11278537/ . Hopefully it would be merged into the mainline soon.

pavel commented

The issue is still there even in mainline.

Host: 10.0.18362.1
Guest kernel: 5.6.0-rc5
Guest distribution: Arch Linux

For me the issue is present for both Gen1 and Gen2 VMs.
Also, compared to 5.5.8, mainline 5.6.0-rc5 is even worse, as I can now see artifacts (when not using X11) for processes that continuously render text (e.g. pacman). While using X11 the rendering is as slow as it was in 5.5.8:

  • noticeable mouse stuttering
  • noticeable UI rendering lag (e.g. noticeable animation stuttering when opening a new browser tab)

Blacklisting hyperv_fb does the trick but then you end up with efifb for which AFAIK the only way to set resolution is to use grub.

Rolling a guest VM back to 5.2.13 resolves the problem for me for Gen2 VMs.

dcui commented

IMO v5.5.8 and v5.6-rc5 should have the same framebuffer performance, because "git diff v5.5.8 v5.6-rc5 -- drivers/video/fbdev/hyperv_fb.c" returns nothing, except for a 2-line patch that is only used in the VM hibernation scenario.

For a Gen2 VM, yes, the Hyper-V synthetic framebuffer is still slow, even with Wei's 3 recent hyperv_fb patches. We'll ping the Hyper-V team for the (n+1)'th time... The workaround is to blacklist hyperv_fb and use the efifb framebuffer, which is typically fast enough.

For a Gen1 VM, the Hyper-V synthetic framebuffer is also still slow, but the slowness can be effectively mitigated by Wei's third patch ("video: hyperv: hyperv_fb: Use physical memory for fb on HyperV Gen 1 VMs"): please remember to add a proper kernel parameter for "cma=", e.g. cma=130m (see the changelog of the patch).
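Applying the cma= parameter is the usual grub edit; a sketch demonstrated on a temp copy (assuming a Debian/Ubuntu-style /etc/default/grub and GNU sed; the 130m value comes from the patch changelog, adjust per your resolution):

```shell
#!/bin/sh
# Sketch: prepend cma=130m to GRUB_CMDLINE_LINUX_DEFAULT. Demonstrated on a
# temp copy; on a real system edit /etc/default/grub as root, then run
# "update-grub" and reboot.
GRUB="${GRUB:-$(mktemp)}"
echo 'GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"' > "$GRUB"
sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT="/&cma=130m /' "$GRUB"
cat "$GRUB"    # GRUB_CMDLINE_LINUX_DEFAULT="cma=130m quiet splash"
```

After rebooting, "grep cma= /proc/cmdline" confirms the reservation is in effect.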

Rolling a guest VM back to 5.2.13 resolves the problem for me

I cannot understand this. v5.2.13 should also suffer from the slow Hyper-V synthetic framebuffer issue; v5.2.13 doesn't have the 3 patches from Wei, so the slowness cannot be mitigated by the cma=130m hack for Gen-1.

I would let @whu2014 provide his insights.

pavel commented

I can not understand. v5.2.13 should also suffer from the slow Hyper-V synthetic framebuffer issue. v5.2.13 doesn't have the 3 patches from Wei, so the slowness can not be mitigated by the cma=130m hack for Gen-1.

I'm sorry for the confusion. I did not specify that I tested this rollback only on Gen2 VMs. Corrected my previous comment.

I had the same issue on my hyper-v 2019 server and checking the "Processor compatibility" box helps

dcui commented

I still can't understand why checking the "Processor compatibility" box would make a difference and I can't reproduce the same symptom with my test VMs. :-(

@shubell: can you please share your host version (run "winver.exe" on the host), your Linux kernel version (run "uname -a") ? Is your Linux VM a Generation 1 VM or Generation 2?

Well, the issue started when I migrated from Hyper-V 2016 to 2019 (version 1809, OS build 17763.1282). I have 3 Ubuntu 16.04 LTS Gen1 VMs (4.15.0-112-generic #113~16.04.1-Ubuntu) and the same issue was on all of them. Compatibility fixes the FB issue on all. Could also be an issue with some CPUs; mine is 2x Xeon Gold 5217.

mikov commented

I'm very surprised. On Jul 6, 2019 and earlier I had given a concrete Windows version + a concrete CPU model. It looks like the MS team is checking it with other CPUs... and this most probably explains the lack of reproduction success ("processor compatibility" seems to do the trick, and I understand that the issue is connected with some concrete CPU features).

And now the very open questions: how many $ are required to find/loan/buy a CPU with the same features? And why does it require > a year?

And you know what? I even found the answer in 5 minutes - a used laptop with this CPU costs ca. 1300 USD (and this is really an edge case)

PS. I haven't been using this product for a long time - I just moved to some other solutions and saw a GH notification today.

dcui commented

Thanks shubell and mikov for sharing more info!

Hi marcinwiacek, I'm as frustrated as you on this bug... I know there must be a Hyper-V bug, and I have pinged Hyper-V team many times... Here I was just trying to collect more information, which can help to resolve the issue thoroughly. I'll ping Hyper-V team again.

To be honest - I don't know if this is your initiative (made because you're engineer from hearth and want to fix it) or some task from your manager.

I know, that lack of support makes, that people look for alternatives (and don't return when they're better). They need solutions, not explanations or excuses.

From my side - it's not important what you will do (and I'm far away from being frustrated). Google, Intel, Microsoft... People can use today many other things.

Good luck!

PS. Right now this story is more than one year old, and this bug is still young when you compare it to a bug from Google (where I was somehow involved and which was submitted on Mar 16, 2012).

dcui commented

Today I happened to find a server with the same host version "version 1809, OS build 17763.1282", so I did some quick tests with a newly-created Generation-1 Ubuntu 16.04 VM (4.15.0-112-generic); the CPU type is:

cpu family : 6
model : 79
model name : Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
stepping : 1

I think I can reproduce shubell's observation: the VM's Xorg desktop window is not very responsive, e.g. after I right-click the desktop, the context menu pops up after nearly 1 second. Later, after I check "Hardware\Processor\Compatibility\Migration to a physical computer with a different processor version", the Xorg desktop becomes much more responsive, but I can still perceive that it's not 100% as responsive as it's supposed to be. I just reported the finding to the Hyper-V team. BTW, I'm from the Linux team and we have no control over Hyper-V.

According to my tests, there is indeed a workaround if you can use Generation-2 VM: somehow the legacy EFI FB driver is not affected, so in a Generation-2 VM, we can blacklist the hyperv_fb driver and use the efifb driver, which is fast.

pavel commented

so in a Generation-2 VM, we can blacklist the hyperv_fb driver and use the efifb driver, which is fast

This will also require a change of boot manager (e.g. rEFInd), as setting a custom resolution for efifb is not as easy as it is for hyperv_fb.
The most concerning part for me is that I have several Gen 2 VMs running some older Linux kernels and working in 1080p using hyperv_fb without any issues. It looks like this is a regression introduced somewhere in the hyperv_fb code.

dcui commented

What is the most concerning part for me is that I have several Gen 2 VMs running some older linux kernels and working in 1080p using hyperv_fb without any issues. It looks like this is a regression introduced somewhere in the hyperv_fb code.

So you are running a slow VM and some fast VMs with older Linux kernels on the same host at the same time, and hence you think the slow VM's kernel (which is newer) introduces the slowness?

If so, can you please share the host version info (please run "winver.exe" on the host) and the kernel version info of the slow VM and the fast VMs (please run "uname -a")? Also please clarify whether it's a Gen1 or Gen2 VM, whether you check "Hardware\Processor\Compatibility\Migration to a physical computer with a different processor version", and when you feel the slowness, are you using a text mode tty terminal or an Xorg window.

I'm pretty sure this slowness is introduced by the host, not the guest, so I'm asking for the above info just in case the guest somehow makes the slowness worse in recent Linux kernels.

pavel commented

Here's the setup:

Host: Version 1909 (OS Build 18363.1082)
Gen2 VM 1: Linux vm1 5.3.8-arch1-1
Gen2 VM 2: Linux vm2 5.8.10-arch1-1

vm1 operates completely normal both in tty and Xorg. vm2 however in Xorg has sluggish mouse movement and animations (e.g. opening a new tab in the browser), and in tty has noticeably slower rendering than vm1.
This is how it looks for vm2 when I run find /:
[screenshot: vm2-slow-tty]
vm1 tty find / for comparison:
[screenshot: vm1-normal-tty]
Both VMs are on the same host. Both VMs have "Migrate to physical computer with a different processor version" unchecked.
Both VMs used the exact same OS installation script and have video=hyperv_fb:1920x1080 kernel parameter set at boot.
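One quick sanity check when comparing such VMs (my own addition, not from the thread): confirm the video= parameter actually reached each kernel's command line:

```shell
#!/bin/sh
# Sketch: print the video= parameter the kernel actually booted with,
# e.g. "video=hyperv_fb:1920x1080", or a note if none was set.
VIDEO=$(grep -o 'video=[^ ]*' /proc/cmdline 2>/dev/null || true)
echo "${VIDEO:-no video= parameter set}"
```

This rules out the case where the two VMs are silently running with different framebuffer modes.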

So you are running a slow VM and some fast VMs with older Linux kernels on the same host at the same time, and hence you think the slow VM's kernel (which is newer) introduces the slowness?

Yes.

dcui commented

@pavel : Thanks for the detailed report! We'll take a look at the difference between 5.3.8 and 5.8.10.

dcui commented

I did some tests today in a similar environment with the same host build version, the same VM kernel versions, the same resolution 1920x1080, and I don't check "Migrate to physical computer with a different processor version". The environmental differences are: I'm using a Gen2 Ubuntu 20.04 (rather than Arch Linux) VM and I built the 5.3.8 and 5.8.10 kernel from the upstream stable kernel git repo. IMO these differences should not matter.

In my test, with v5.8.10 the framebuffer is faster than with v5.3.8: 1) in the Xorg GUI environment the framebuffer looks only slightly faster in v5.8.10 than in v5.3.8; 2) in a text mode terminal (I used tty3 in my test), I run "wget 'https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/CREDITS?h=v5.9-rc7' -O test.txt; time cat test.txt", and the "cat" command takes 40+ seconds with v5.3.8, but only 3 seconds with v5.8.10. Note: with v5.8.10, the screen becomes very blurry while "cat" is printing the lines during those 3 seconds (I'm not sure if we're able to improve this); with v5.3.8, "cat" takes a much longer time but the screen is basically not blurry.

@pavel: Can you also do the "cat" test with the same test.txt file against v5.3.8 vs. v5.8.10?

With both the kernels, the mouse movement is basically normal to me, and I don't experience any noticeable sluggishness.

Before the Hyper-V team fixes the slow framebuffer, for a Gen2 VM the only easy workaround is to blacklist the hyperv_fb driver and use the legacy UEFI framebuffer (which happens to be fast) -- I understand the drawback is that it seems impossible (?) to use a larger resolution.

Another possible workaround is to use a VNC server (you need to run a VNC viewer to connect to the VM over the network) or xrdp (this is integrated with Hyper-V Manager: see https://docs.microsoft.com/en-us/virtualization/community/team-blog/2018/20180228-sneak-peek-taking-a-spin-with-enhanced-linux-vms and microsoft/linux-vm-tools#106 (comment)). I'm not sure how easy it is to make this work for Arch Linux; the links I shared are mainly for Ubuntu.

pavel commented

Here're the results:

VM                                    time cat test.txt
Gen2 VM 1: Linux vm1 5.3.8-arch1-1    37.17 secs
Gen2 VM 2: Linux vm2 5.8.10-arch1-1   2.7 secs

vm1: [screenshot: time_cat_vm1]
vm2: [screenshot: tme_cat_vm2]

dcui commented

Thanks, @pavel ! So your result is the same as mine.

Now I understood your earlier description:

Gen2 VM 1: Linux vm1 5.3.8-arch1-1
Gen2 VM 2: Linux vm2 5.8.10-arch1-1
vm1 operates completely normal both in tty and Xorg. vm2 however in Xorg has sluggish mouse movement and animations (e.g. opening a new tab in the browser), and in tty has noticeably slower rendering than vm1.

vm1 is actually not so "normal", as it takes too long (about 40 seconds) to print the text file to the tty (vm2 only needs about 3 seconds). vm2 actually renders faster than vm1, though the contents of vm2's screen become unrecognizable in the tty.

I'm not sure why in Xorg the mouse movement and animation are not sluggish for me -- it looks like there is a little slowness, but it's not noticeable to me.

PS, I hate to say this, but we still have no update from Hyper-V team about the slow framebuffer issue. :-(

pavel commented

Got a 3rd VM up.

Gen2 VM 3: Linux vm3 5.4.0-48-generic (Ubuntu 20.04.1 LTS)
video=hyperv_fb:1920x1080

No issues in either Xorg or tty. 😕

I am experiencing similar issues with Debian 10.2 - Looking forward to hearing from the hyper-v team.

dcui commented

It turns out to be a Linux bug that only happens when the VM runs on recent Hyper-V since sometime in 2018. I just posted a patch here: https://lkml.org/lkml/2020/11/17/2222 . Hopefully the fix will be in v5.10 and will be integrated into various Linux distros.

dcui commented

BTW, in a Gen-1 VM on recent Hyper-V since 2018, the legacy VRAM is also mapped uncacheable by default, so I can also perceive the slowness before the Hyper-V synthetic framebuffer driver "hyperv_fb" loads. To work around that slowness, we can use this kernel parameter "video=vesafb:mtrr:3", which tells the legacy framebuffer driver "vesafb" to map the legacy VRAM cacheable.

Hi dcui, if I need to set the screen resolution to 1920x1080, are there any other parameters needed in the above video statement?

dcui commented

@tjleary75
In a Gen-1 VM, the legacy VGA device emulated by Hyper-V does not support 1920x1080 -- the highest supported resolution is 1600x1200. You can verify this by the grub "vbeinfo" command: https://linuxconfig.org/how-to-increase-tty-console-resolution-on-ubuntu-18-04-server .

To set the resolution to 1600x1200 in my Ubuntu 20.04.1 VM, I have the below 2 lines in my /etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT="maybe-ubiquity video=vesafb:mtrr:3"
GRUB_GFXMODE=1600x1200
(Run "update-grub && reboot" to make it take effect)

pavel commented

No issues in my updated Gen2 VM 2: Linux vm2 5.10.4-arch2-1.

Before, on the Ubuntu Server console, "time cat 4mbfile.txt" took 4+ minutes. Now, after
apt install linux-image-5.10.0-1008-oem ; update-grub
it is 10 seconds. In an Ubuntu Desktop terminal it's only 0.4 seconds. Both using 1920x1080 resolution.
Thank you!

Somewhat related to this, I found out the hard way that using Set-VMVideo is a must to get higher resolutions; I wasted considerable time testing different video:vesa/uvesa options and wondering why nothing had any effect.
Set-VMVideo -VMName namehere -HorizontalResolution:1920 -VerticalResolution:1080 -ResolutionType Single
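From inside the guest, one can then check the resolution the framebuffer actually negotiated (a sketch I'm adding here; the sysfs path is the standard fbdev attribute, not specific to Hyper-V):

```shell
#!/bin/sh
# Sketch: print fb0's current virtual resolution, e.g. "1920,1080",
# to confirm the Set-VMVideo setting took effect in the guest.
if [ -r /sys/class/graphics/fb0/virtual_size ]; then
    cat /sys/class/graphics/fb0/virtual_size
else
    echo "fb0 not present"
fi
```

If the value still shows the default 1152,864, the host-side Set-VMVideo change likely did not apply (a VM restart may be needed).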

dcui commented

FYI: For Ubuntu 20.04, as I just checked, the latest linux-azure kernel Ubuntu-azure-5.4.0-1039.41 (Jan 18) still does not have the fix, but the generic 5.4 kernel Ubuntu-5.4.0-66.74 and the HWE kernel Ubuntu-hwe-5.8-5.8.0-44.50_20.04.1 already have the fix.