Virtualization

Article "Virtualisierung und Performance Management" (Virtualization and Performance Management)


Jens-Christoph Brendel

Hi Neil,

if you find the time to look at our forum, I have two
questions regarding your LTR article:

1) You didn't say so explicitly, but I believe scheduling is a
two-stage process when it comes to virtualization, isn't it? There is a
process scheduler inside the virtual machine in front of each virtual CPU
and another one in the host in front of the physical CPUs. Do you think this
should be noted, perhaps as a source of additional latency?

2) You mention the stretched service time, which leads to decreased
throughput when there are many threads. Your PDQ model doesn't reflect this,
and so it estimates a higher throughput than was measured. Is it impossible
or impractical to teach the model the correct behaviour, or could you adapt
it to calculate more realistic numbers?

Many thanks in advance,

Jens-Christoph

Neil Gunther

Hello Jens-Christoph,

Let me take your comment (1) first. This is a good point, of course. Yes, the apparent "double latency" could be emphasized in my text and I will review that point before submitting the final draft. It turns out, in practice, that latency for physical processor scheduling does seem to have been minimized (in VMware, at least). It's not zero overhead, of course, but it has been vastly improved. A similar improvement has not yet been achieved for disk or network I/O: the current frontier of VMM performance tuning.

My original point was to make the reader aware that such "double" latencies are present to varying degrees in all VMM architectures, by virtue of the inherent polling overhead, e.g., from the FS scheduler. And although these overheads can be minimized (like hiring a faster-sprinting "cashier" in the imaginary grocery store), they can never be reduced to zero. Furthermore, overheads that have been improved now may become worse again in future releases. After all, it's just software, and the history of software development is littered with examples of performance regressions across new releases; all with the best of intentions, of course.
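A back-of-the-envelope way to see why the "double" latency never vanishes is to treat the guest scheduler and the VMM's scheduler as two queues in series and add their residence times. The sketch below assumes each stage behaves like an M/M/1 queue, with residence time R = S / (1 - ρ); the function names and the numbers in the usage note are hypothetical illustrations, not measurements from the article.

```python
def mm1_residence(service_time: float, utilization: float) -> float:
    """Mean residence time (waiting + service) at an M/M/1 queue: R = S / (1 - rho).

    Assumption: M/M/1 is only a rough stand-in for a real CPU scheduler.
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1.0 - utilization)


def two_stage_latency(guest_s: float, guest_rho: float,
                      host_s: float, host_rho: float) -> float:
    """Total latency for work that first queues at the guest scheduler
    (for a virtual CPU) and then at the host/VMM scheduler (for a physical CPU)."""
    return mm1_residence(guest_s, guest_rho) + mm1_residence(host_s, host_rho)


# Hypothetical values in milliseconds: 10 ms of guest work at 50% utilization,
# plus a 0.5 ms VMM dispatch overhead (the "cashier") at 80% host utilization.
total_ms = two_stage_latency(10.0, 0.5, 0.5, 0.8)
# Halving the VMM overhead (a faster "cashier") shrinks, but does not remove, the second term.
faster_cashier_ms = two_stage_latency(10.0, 0.5, 0.25, 0.8)
```

With these made-up numbers the total is about 22.5 ms, of which about 2.5 ms comes from the host stage; halving the overhead leaves about 1.25 ms, which is smaller but still not zero.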

Therefore, one needs to keep an eye on the overall performance characteristics by measuring the system and comparing the results quantitatively from release to release. That is the other key point I raise in my article, viz., the importance of doing *controlled measurements*. This is something that can be done in-house, for example as part of QA. In the meantime, I view VMware/EMC as having done a very good job of providing this kind of controlled performance information in their product documents, although they could always do better.
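As a minimal sketch of the release-to-release comparison, assuming repeated throughput runs of the same benchmark on an identical, controlled setup for each release, one can compare mean throughput and flag a drop beyond a tolerance. The function names and the 5% threshold are hypothetical choices; a real QA process would likely use more runs and a proper statistical test rather than a fixed threshold on means.

```python
from statistics import mean


def relative_change(baseline: list[float], candidate: list[float]) -> float:
    """Relative change in mean throughput from the baseline release to the candidate release."""
    base = mean(baseline)
    return (mean(candidate) - base) / base


def is_regression(baseline: list[float], candidate: list[float],
                  tolerance: float = 0.05) -> bool:
    """Flag a regression if mean throughput drops by more than `tolerance`
    (hypothetical default: 5%). Assumes both samples come from controlled runs."""
    return relative_change(baseline, candidate) < -tolerance
```

For example, with hypothetical runs of `[1000, 1010, 990]` transactions/s on the old release and `[930, 940, 920]` on the new one, `relative_change` is -0.07 and `is_regression` reports a regression.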

Neil Gunther

On comment (2): this seemingly innocent question has a rather deep answer.

On the one hand, we aim to keep the PDQ performance model *simple*, not necessarily *realistic*. The reasoning behind this approach is to expose the performance you are NOT getting from the VMM, rather than what you already know you ARE getting from the performance measurements. Modeling it this way forces engineers to scratch their heads and see if they can reach the higher throughput bound predicted by PDQ, and if not, to then explain why not. A very important and not so well recognized purpose of performance modeling is to force *explanation*. Modeling is not just about predictions, it's also about explanations.
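To make "exposing the performance you are NOT getting" concrete, here is a plain-Python exact Mean Value Analysis (MVA) for a single-class closed workload. MVA is the kind of algorithm PDQ uses for closed circuits, but this is not PDQ's own API. The model assumes single-server FCFS queueing centers; the service demands, think time, and "measured" throughputs are hypothetical, not data from the article.

```python
def mva(demands: list[float], think_time: float,
        max_users: int) -> list[tuple[int, float, float]]:
    """Exact single-class MVA for a closed network of single-server FCFS centers.

    demands    -- service demand D_k (seconds per transaction) at each center
    think_time -- Z, seconds each user/thread spends outside the centers
    Returns (N, throughput X(N), response time R(N)) for N = 1..max_users.
    """
    queue = [0.0] * len(demands)  # Q_k(N-1); empty network at N = 0
    results = []
    for n in range(1, max_users + 1):
        # Arrival theorem: an arrival sees the queues left by N-1 customers.
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        r = sum(residence)
        x = n / (r + think_time)              # interactive response time law
        queue = [x * rk for rk in residence]  # Little's law at each center
        results.append((n, x, r))
    return results


def throughput_gap(predicted: list[tuple[int, float, float]],
                   measured: dict[int, float]) -> dict[int, float]:
    """Fraction of model-predicted throughput NOT achieved at each measured load N."""
    return {n: 1.0 - measured[n] / x for n, x, _ in predicted if n in measured}


# Hypothetical CPU and disk demands (seconds) with a 1-second think time.
model = mva(demands=[0.05, 0.02], think_time=1.0, max_users=40)
# Hypothetical measurements that flatten out earlier than the model predicts.
measured = {10: 8.5, 20: 12.0, 40: 12.5}
gaps = throughput_gap(model, measured)
```

The model's throughput can never exceed 1 / max(D_k), i.e., 20 transactions/s here. A persistent gap between `model` and `measured` is exactly what should prompt engineers to find an explanation, rather than something to be fitted away.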

In my article example, it was NOT possible to achieve a higher throughput either by tuning or by modifying the application software. The system was thread-limited by Intel's on-chip SMT (called HTT at that time). This came as a rather shocking explanation from PDQ. Until then, the application engineers had no idea that they could not improve their own system. Engineers have a sometimes misplaced tendency to assume that such problems can always be "solved." In this case, they could not solve it because it was not a problem of their own making.

Knowing that this SMT constraint could not be improved was a very important outcome of the PDQ modeling: it explained a lot to engineering, kept them from wasting any more time on scalability, and provided information that could be passed on to customers for capacity planning (CaP). The CaP conclusion was: if you want more throughput, you need to purchase more processors/cores, because no improvement can come from the software application. This also helped to put the pressure on Intel rather than on the application engineers.
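The "buy more cores" conclusion can be turned into a rough sizing rule with the utilization law, U = X · D. A minimal sketch, assuming throughput scales linearly with physical cores (no further contention), a per-transaction CPU demand D measured on the current system, and a hypothetical 70% per-core utilization ceiling for headroom:

```python
import math


def cores_needed(target_throughput: float, demand_per_tx: float,
                 max_utilization: float = 0.7) -> int:
    """Smallest core count m such that per-core utilization X * D / m stays
    at or below max_utilization.

    Assumptions (hypothetical): linear scaling across cores, and a CPU demand
    per transaction (seconds) that does not change as cores are added.
    """
    if not 0.0 < max_utilization <= 1.0:
        raise ValueError("max_utilization must be in (0, 1]")
    return math.ceil(target_throughput * demand_per_tx / max_utilization)
```

For example, a hypothetical target of 200 transactions/s at 10 ms of CPU per transaction needs 2 cores' worth of work, so `cores_needed(200, 0.01)` asks for 3 cores to stay under 70%. Because of floating-point rounding, a result that lands exactly on an integer boundary may occasionally round up by one, which errs on the safe side for capacity planning.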

Indeed, as you suggest, this SMT constraint could have been incorporated into my PDQ model, but I generally advise against it. The purpose of performance modeling is insight, not curve-fitting. In my view, it is better to see what potential performance is missing and to be reminded of it. Moreover, something could improve in the future (e.g., the VMM processor latency mentioned in point 1; see my previous response). BTW, Intel has since improved SMT performance as well, and an SMT-constrained PDQ model would then incorrectly underestimate throughput scalability. Leaving the PDQ model as it is allows one to quantify any future improvement, should it occur. And even if there is an improvement, PDQ reminds you that performance cannot be pushed beyond the theoretical maximum it predicts.

Neil Gunther

Two new performance white-papers from VMware:

* VMware vSphere 4: The CPU Scheduler in VMware ESX 4
http://www.vmware.com/resources/techresources/10059

* Understanding Memory Resource Management in VMware ESX Server
http://www.vmware.com/resources/techresources/10062

Both discuss controlled measurements ("Experimental Environment"), as described in my latest article.

Neil Gunther

I saw a comment (on the web) about time stretching on VMMs to the effect that, say, sleep(60) might actually take 70+ seconds of wall-clock time, and it cited the VMware white paper
http://www.vmware.com/pdf/vmware_timekeeping.pdf

That stretch looked very large to me, but I couldn't tell whether it was an impression, a guess, or a fact. So I pinged my contacts at VMware about it. Briefly summarized, they say it's bogus as a general statement. It would likely only happen as a transient effect (if at all) on certain guest kernels (the subject of the VMware WP, where timekeeping is based on interrupt ticks) and with the VMM under very heavy load. Even then, a clock stretch of 10+ seconds would be considered very unusual and would probably not show up in measurements that are sample averages.

That said, I'd be very interested to hear if anyone does see anything like this
kind of time stretching on any virtual server.
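For anyone who wants to look, here is a minimal sketch that measures how much longer a sleep takes than requested. It reports both the mean and the worst ratio, because a transient stretch can be hidden in averages. Caveat: it times the sleep with the guest's own monotonic clock. If the guest clock itself lags under heavy VMM load (the tick-based timekeeping issue in the VMware paper), the guest may under-report the stretch, so a comparison against an external reference clock (e.g., another machine) would be more trustworthy. The function names are hypothetical.

```python
import statistics
import time


def sleep_stretch(requested_s: float) -> float:
    """Ratio of elapsed time (measured with the guest's monotonic clock) to requested sleep time."""
    start = time.monotonic()
    time.sleep(requested_s)
    return (time.monotonic() - start) / requested_s


def sample_stretch(requested_s: float, samples: int) -> tuple[float, float]:
    """Return (mean ratio, worst ratio) over repeated sleeps; the worst case exposes transients."""
    ratios = [sleep_stretch(requested_s) for _ in range(samples)]
    return statistics.mean(ratios), max(ratios)
```

Reproducing the sleep(60) scenario would be something like `sample_stretch(60.0, 5)`, run while the VMM is under heavy load; a worst-case ratio near 1.17 (70 s for 60 s) would match the claim on the web.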
