Our New Approach to Performance Testing

My team has recently taken on the challenge of conducting performance testing for our product. Performance and web-app security testing have both been put under the same umbrella. That’s great–lots of efficiencies to be gained from our different skills, but at the same time, I’m no expert in either field. I have a perf test engineer who is good with Empirix e-Load, and he’s teaching me how we do things. We have a couple of books (including this one), and we’ve received some tool-specific training, but none of it has been enough as yet to convince me that we’re doing everything we can to test page load times accurately.

This is also one of the reasons we haven’t been posting much. With new responsibility comes new drains on time.

The issue we’re struggling with now—and think we’ve solved—is about how to satisfy management that our numbers are accurate, and how to provide numbers that not only talk about server response times under stress, but also reflect the real end-user experience times. For years, we’ve used e-Load to run a bunch of tests every week, then reported that “all pages return in under 2.7 seconds”. Every user in the field, every manual tester, every single person who ever used our software knew that was BS—because the pages themselves take up to 9 seconds to render, with no other load on the server.

Here’s the problem: AJAX and client-heavy JavaScript make the server response time numbers less useful than they used to be. Empirix has no ability (that we know of) to measure the amount of time between when the page has finished downloading to the client machine and when all the HTML, JavaScript, and images actually finish rendering. Therefore, my directive is to produce numbers the executives can a) believe, b) have confidence in, and c) present to customers as “writ in stone”, including this new kind of stat.

I think we’ve solved it, but now we need to put down our pencils and look around… we need now to ask the question: are we on the right track, or are we full of it? Does anyone out there have a solution for testing server capacity and response time under load and the ability to measure comprehensive client page load times simultaneously?

My background, as you would gather from this site, is all in functional testing. We’ve created a unique solution for automated functional testing, but it doesn’t help us one bit in determining how our AUT performs under stress. That’s where we had to get creative, and to get educated.

We needed a way to execute a few tests by hand while we simulate 1000 users hammering the site with requests. That brought us back to our core strengths: our functional tests are executed on a real web browser, not just through a proxy or through multi-threaded HTTP processes… so, it occurred to us that they could give us our client-side execution numbers, if only we had a way to guarantee that our numbers are accurate.

Once again, the flexibility and generosity of our engineering department has saved us. Our AUT developers injected several timing AJAX calls into every page load served to the user. When the page is finished downloading, an event fires that says “Page Download Complete”, complete with a count of the number of milliseconds it took. When all the objects on the page, HTML, JavaScript, images, are finished loading, the “please wait while page loads” DIV fires off another event, “Page Rendering Complete”, with the timing included. These numbers are stored in our AUT database, by page name, and exportable to csv format.
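To make the shape of that exported data concrete, here is a minimal sketch of reading the CSV and pairing up the two events per page. The column names (page, event, elapsed_ms) and the helper name load_page_timings are assumptions for illustration; the post doesn't describe the real export layout.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export layout: one row per timing event.
# The column names are assumptions, not the real AUT schema.
SAMPLE_CSV = """page,event,elapsed_ms
myDashboardPage,Page Download Complete,247
myDashboardPage,Page Rendering Complete,3893
"""


def load_page_timings(csv_text):
    """Group download and render timings (in ms) by page name."""
    key_for_event = {
        "Page Download Complete": "download_ms",
        "Page Rendering Complete": "render_ms",
    }
    timings = defaultdict(lambda: {"download_ms": [], "render_ms": []})
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = key_for_event.get(row["event"])
        if key is None:
            continue  # skip any event this sketch does not know about
        timings[row["page"]][key].append(int(row["elapsed_ms"]))
    return dict(timings)


timings = load_page_timings(SAMPLE_CSV)
for page, t in timings.items():
    print(f"{page}: {t['download_ms'][0]} ms to download, "
          f"{t['render_ms'][0]} ms to render")
```

Keeping a list of samples per page (rather than a single value) leaves room for the many runs a load test produces.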

What we are left with is a set of data that tells us the following:

myDashboardPage: 247 ms to download, 3893 ms to render

Huh. So before, when we were just using the proxy playback method of perf testing with Empirix, it told us that this page took only 247 ms to download. Meanwhile, our users were seeing about 3.9 seconds. The time users actually saw was roughly 1576% of the time we reported (3893 / 247 ≈ 15.8). Can you see why nobody believed us?
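For anyone who wants to check that comparison, here is a small sketch of the arithmetic. It treats 3893 ms as the time the user saw, as the paragraph above does; the function name compare_timings is made up for illustration.

```python
def compare_timings(reported_ms, observed_ms):
    """Express the observed time relative to the reported time."""
    ratio = observed_ms / reported_ms
    return {
        "ratio": ratio,                         # observed / reported
        "percent_of_reported": ratio * 100,     # observed as a % of reported
        "percent_increase": (ratio - 1) * 100,  # how much larger observed is
    }


result = compare_timings(reported_ms=247, observed_ms=3893)
print(f"observed is {result['ratio']:.2f}x the reported value")
print(f"that is {result['percent_of_reported']:.0f}% of it, "
      f"or a {result['percent_increase']:.0f}% increase")
```

Note the two percentages differ by exactly 100 points: "1576% of the reported time" and "a 1476% increase" describe the same gap.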

What we do now is, we fire up 1,000 VUs with Empirix, all running tests and hammering the server. At the same time, we use our queuing system to launch our ~500 automated functional tests (4-5 at a time), and have the real-time client-side rendering numbers reported in Excel on a page-by-page basis.
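The "4-5 at a time" part is just bounded concurrency. Our queuing system is our own, so the following is only a rough sketch of the idea using Python's standard thread pool; run_functional_test is a hypothetical stand-in for launching one browser-driven test, and the sleep is a placeholder for the real work.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_functional_test(test_name):
    """Hypothetical stand-in for one browser-driven functional test."""
    start = time.perf_counter()
    time.sleep(0.01)  # placeholder for the real test run
    elapsed_ms = (time.perf_counter() - start) * 1000
    return test_name, elapsed_ms


def run_in_batches(test_names, max_parallel=4):
    """Run tests with at most max_parallel in flight; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run_functional_test, test_names))


results = run_in_batches([f"test_{i:03d}" for i in range(20)], max_parallel=4)
print(f"ran {len(results)} tests")
```

Capping the number of parallel browser tests matters here because the client machines themselves should not become the bottleneck being measured.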

We believe this is the right approach to take, but we still have some concerns:

  • Not having employed or interacted with any experts in performance testing on any level, I don’t know if this is brilliant or just something that sounds-good-but-doesn’t-work-like-I-think-it-should
  • Crunching the numbers I get into a report palatable to the VP-level and above
  • Ensuring that the Empirix (soon to be JMeter) scripts we develop are doing what we want them to do
  • Ensuring that these scripts are the right scripts to run
  • Using the Empirix numbers in concert with the Client-side numbers to extrapolate information about server (and cluster) capacity, browser compatibility, implications of various configuration changes, bottlenecks, and about a hundred other things—I think this data is trying to tell us what we need to know, I’m just not a good enough statistician to know how to get it out
  • How do we know when we’re “done” with a given release? We’ve found the bottlenecks, we’ve told the VPs what they wanted to hear… my instinct is to keep pushing, even though we think we’ll meet all SLAs, but maybe the SLAs only meet the minimum and we want to do more. (This would be a good problem to have, of course)
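On the report-crunching concern above, one common way to boil many samples down for an executive audience is a median and a 95th percentile per page, checked against a threshold. This is a sketch under assumptions, not what we actually run: the sample values, the second page name, and the 4000 ms threshold are all made up, and the percentile uses the simple nearest-rank method.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(render_ms_by_page, sla_ms):
    """Build one summary row per page, flagging whether p95 is within sla_ms."""
    rows = []
    for page, samples in sorted(render_ms_by_page.items()):
        p95 = percentile(samples, 95)
        rows.append({
            "page": page,
            "p50_ms": percentile(samples, 50),
            "p95_ms": p95,
            "meets_sla": p95 <= sla_ms,
        })
    return rows


# Made-up render samples, for illustration only.
samples = {
    "myDashboardPage": [3600, 3700, 3893, 4100, 5200],
    "myOtherPage": [800, 900, 950, 1000, 1100],
}
summary = summarize(samples, sla_ms=4000)  # hypothetical threshold
for row in summary:
    print(row)
```

Reporting a high percentile alongside the median keeps a handful of slow page loads from being averaged away, which is the kind of gap that made the old numbers hard to believe.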

Anyway, these are my preliminary thoughts on the subject. We’re making the transition from Empirix to JMeter, which is pretty exciting. As Scott Barber says, “the craft isn’t about the tool”. We’re just starting out on this task, but I think we’re off to a good start.

Thoughts? Ideas?


5 Comments for “Our New Approach to Performance Testing”

  1. jack Says:

    while reading i was hoping you were going to go where you went… we have done this exact approach at two clients- each with huge global trading systems. we ramp up volume in LoadRunner or via java scripts pushing transactions into a FIX engine and then fire up functional transactions using QTP or even AUTOHOTKEY and mark time on everything.

    since trading platforms have many front-ends and APIs, our functional tests hit all of these during performance testing. we’ve even included wireless platforms!

    users agree with the functional times, techies agree with inter-process times & server loads and everyone is happy. End users know their mileage may vary, but we are always close enough.

    hardest part (like all perf testing) is setup before and gathering/processing metrics after the test. we get huge log files and have to ensure all time stamps line up in charts, reports, etc.


  2. Stefan Says:

    Collaboration with Dev/DBA/Network guys is very essential in my opinion in performance testing. The importance of a production-like test environment also should not be underestimated.

    Interesting to see so much time being spent on the client side because of JavaScript. There must be some heavy stuff going on; a large amount of data?

    I like JMeter in combination with BadBoy (used to record scripts to be exported to JMeter). However I have not used that much lately since we now have a true performance/database expert on board doing very impressive work finding bottlenecks in the database and suggesting changes on SQL and index-level.

    If you can afford it, I guess LoadRunner is a very good choice, since you get real-time data (also from monitored servers) in the Controller GUI and you can render pre-defined reports easily. But I guess that could be accomplished with other commercial tools as well.

    Keep up your good work!

    /Stefan


  3. Bill C. Says:

    Jack wrote, “we’ve even included wireless platforms!”

    Ohhh that is an AWESOME idea. We need to set up one of our test runner machines with a wireless card so we can get comparative page performance results vs. our “wired” machines.

    I’m going to suggest this to the team and see what they think.


  4. Inder P Singh Says:
    A couple of thoughts/ideas:

    1. How did you confirm that the page rendering time was computed on the client machine? Does the client make a call to send the page rendering times to the AUT database on the server?
    2. Are you using clients of multiple configurations (low-end client, maxed-out client, different browsers, different application configurations, etc.)? If so, the configurations of the clients could be listed as a static part of your report.
    3. Have you validated a sample of the test results by manually testing the business processes and timing your results with a stopwatch?

    Inder


    1. Marcus Merrell Says:

      Hello Inder,

      Thanks for your comment.

      1) Yes, our webapp sends a specific event when the page is completely rendered in the browser. This is one of the reasons why our collaboration with the developers of the software has been so important. The database stores these events and automatically calculates every phase of the page load, from the database layer to the server response to the client rendering.

      2) We haven’t run too many different kinds of clients, since we’ve wanted apples-to-apples comparisons and we don’t have access to all that much equipment. Our product only supports IE at this point, so we’ve tested against IE 6 and 7.

      3) Sure, we’ve validated visually. So far these data seem to be spot on when we do a random sample check.

      Good questions, and there’s always more to be done. Thanks again!
      MM

