Performance Analysis of REST/HTTP Services with JMeter and Yourkit

My last post described how to accomplish stress or load testing of asynchronous REST/HTTP services with JMeter. However, running such tests often reveals that the system under test does not deal well with increasing load. The question now is: how do we find the bottleneck?

Having an in-depth look at the code to detect suspicious parts could be one alternative. But considering the potentially huge codebase and therefore the multitude of places for the bottleneck to hide [1], this approach does not look too promising. Fortunately, there are tools available that provide efficient analysis capabilities on the basis of telemetry [2]. Recording and examining such measurements is commonly called profiling, and this post gives a brief introduction to how to do this using Yourkit [3].

First of all we launch our SUT (System Under Test) and use JMeter to build up system load. To do so, JMeter executes a test scenario that simulates multiple users sending a lot of requests to the SUT. The test scenario is defined in a test plan. The latter may contain listeners that capture the execution time of requests and provide statistics like maximum/minimum/average request duration, deviation, throughput, and so on. This is how we detect that our system does not scale well…
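For load tests it is usually best to execute the test plan in JMeter's non-GUI mode; a sketch of the invocation, where the file names are placeholders for your own test plan and result log:

```shell
# -n runs JMeter in non-GUI mode, -t names the test plan file,
# -l writes the sample results to a log file for later inspection
jmeter -n -t load-test.jmx -l results.jtl
```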

After these findings we enable Yourkit to retrieve telemetry. To do so, the VM of the SUT is started with a special profiler agent. The profiler tool provides several views that allow live inspection of CPU utilization, memory consumption, and so on. But for a thorough analysis of, for example, the performance of the SUT under load, Yourkit needs to capture the CPU information provided by the agent via so-called snapshots.
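Attaching the agent boils down to an extra JVM argument; a sketch assuming a Linux installation of Yourkit and an SUT packaged as an executable jar (both paths are installation-specific assumptions):

```shell
# -agentpath points the JVM at the Yourkit profiler agent library;
# adjust the path to your installation and replace sut-server.jar
# with your actual SUT
java -agentpath:/opt/yourkit/bin/linux-x86-64/libyjpagent.so -jar sut-server.jar
```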

It is advisable to run the SUT, JMeter and Yourkit on separate machines to avoid distorting the test results. Running, for example, the SUT and JMeter on the same machine could reduce throughput, since JMeter threads may consume a lot of the available computation time.

With this setup in mind, we run through a little example of a profiling session. The following code snippet is an excerpt of a JAX-RS based service [4] we use as SUT.

@Path( "/resources/{id}" )
public class ExampleResourceProvider {

  private List<ExampleResource> resources;
  
  [...]

  @Override
  @GET
  @Produces( MediaType.TEXT_PLAIN )
  public String getContent( @PathParam( "id" ) String id ) {
    ExampleResource found = NOT_FOUND;
    for( ExampleResource resource : resources ) {
      if( resource.getId().equals( id ) ) {
        found = resource;
      }
    }
    return found.getMessage();
  }
}

The service performs a lookup in a list of ExampleResource instances. An ExampleResource object simply maps an identifier to a message represented as a String. The message found for a given identifier is returned. As the service is called with GET requests, you can test the outcome with a browser:
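Or, instead of a browser, from the command line, where host, port and resource id are assumptions about the deployment:

```shell
# Fetch the message for resource id 42; adjust host and port to your setup
curl http://localhost:8080/resources/42
```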

For demonstration purposes, the glue code of the service initializes the list with 500,000 elements in an unordered way.
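The glue code itself is not shown in this post, but the initialization might be sketched like this (ExampleResource is reduced to the bare minimum here, and the shuffle merely simulates the unordered setup):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ResourceListDemo {

  // Minimal stand-in for the ExampleResource class used by the service
  static class ExampleResource {
    private final String id;
    private final String message;

    ExampleResource( String id, String message ) {
      this.id = id;
      this.message = message;
    }

    String getId() { return id; }
    String getMessage() { return message; }
  }

  static List<ExampleResource> createResources( int count ) {
    List<ExampleResource> result = new ArrayList<>( count );
    for( int i = 0; i < count; i++ ) {
      result.add( new ExampleResource( String.valueOf( i ), "message-" + i ) );
    }
    // Shuffle so that the list is unordered, as described in the post
    Collections.shuffle( result );
    return result;
  }

  public static void main( String[] args ) {
    List<ExampleResource> resources = createResources( 500_000 );
    System.out.println( resources.size() );
  }
}
```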

Once we have the SUT running, we can put it under load using JMeter. The test plan performs about 100 concurrent requests at a time. As shown in the picture below, an average request execution takes about 1 second.

The CPU telemetry recorded by Yourkit during the JMeter test plan execution reveals the reason for the long request execution times. Selecting the Hot spots tab of the profiled snapshot shows that about 72% of the CPU utilization was consumed by list iteration. Looking at the Back Traces view, which lists the caller tree of the selected hot spot method, we discover that our example service method causes the list iteration.

Because of this, in the next step we change the service implementation to use a binary search on a sorted list for the ExampleResource lookup.

  @Override
  @GET
  @Produces( MediaType.TEXT_PLAIN )
  public String getContent( @PathParam( "id" ) String id ) {
    ExampleResource key = new ExampleResource( id, null );
    int position = Collections.binarySearch( resources, key );
    return resources.get( position ).getMessage();
  }

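Note that for Collections.binarySearch to work this way, ExampleResource has to implement Comparable and the list must be sorted up front; a minimal, self-contained sketch of these assumptions (the ids and messages are made up), which also shows that a negative return value signals a missing id:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BinaryLookupDemo {

  // Stand-in for ExampleResource; ordering by id is an assumption
  static class ExampleResource implements Comparable<ExampleResource> {
    private final String id;
    private final String message;

    ExampleResource( String id, String message ) {
      this.id = id;
      this.message = message;
    }

    String getMessage() { return message; }

    @Override
    public int compareTo( ExampleResource other ) {
      return id.compareTo( other.id );
    }
  }

  public static void main( String[] args ) {
    List<ExampleResource> resources = new ArrayList<>();
    resources.add( new ExampleResource( "b", "beta" ) );
    resources.add( new ExampleResource( "a", "alpha" ) );
    resources.add( new ExampleResource( "c", "gamma" ) );
    // binarySearch requires the list to be sorted first
    Collections.sort( resources );
    ExampleResource key = new ExampleResource( "b", null );
    int position = Collections.binarySearch( resources, key );
    // A negative position means the id was not found
    System.out.println( position >= 0 ? resources.get( position ).getMessage()
                                      : "not found" );
  }
}
```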
After that we re-run the JMeter test plan:

The average request now takes about 3 ms, which is quite an improvement.
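The magnitude of the improvement is plausible: the linear scan touches all 500,000 elements on every request, whereas a binary search needs at most roughly ⌈log₂(n)⌉ comparisons:

```java
public class ComplexityDemo {
  public static void main( String[] args ) {
    int n = 500_000;
    // Maximum number of comparison steps a binary search needs
    // for n elements, i.e. ceil(log2(n))
    int steps = (int) Math.ceil( Math.log( n ) / Math.log( 2 ) );
    System.out.println( steps );
  }
}
```

So the lookup drops from half a million element visits to about 19 comparisons per request.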

And having a look at the Hot spots view of the corresponding CPU profiling session confirms that the bottleneck caused by our method has vanished.

Admittedly, the problem in the example above seems very obvious. But we found a very similar one in our production code, hidden in the depths of the system (shame on me…). It is important to note that the problem did not become apparent before we started our stress and load tests [5].

I guess we would have spent a lot of time examining the code base manually before – if ever – finding the cause. However, the profiling session pointed us directly to the root of all evil. And, as is most often the case, the actual problem was not difficult to solve. So profiling can help you handle some of your work more efficiently.

At least it does for me – and by the way – it is a lot of fun too ;-)


  1. Note that the code which causes the bottleneck could belong to a third-party library as well.
  2. As I am doing such an analysis right now in a customer project I came up with the idea to write this post :-)
  3. I am not doing any tool adverts or ratings here – I simply use tools that I am familiar with to give a reproducible example of a more abstract concept. There is a good chance that there are better tools on the market for your needs.
  4. Note that the sole purpose of the code snippets in this post is to serve as an example of how to find and resolve a performance bottleneck. The snippets are poorly written and should not be reused in any way!
  5. From my experience it is quite common that a newly created code base contains some of those nuggets. So having such tests is a must in order to find performance problems before the customer finds them in production…

Frank Appel

Frank is a stalwart of agile methods and test driven development in particular. He understands software development as a craftsmanship based on a well-balanced mix of knowledge and the experience of the daily work.

fappel@codeaffine.com

11 Responses to “Performance Analysis of REST/HTTP Services with JMeter and Yourkit”

  • Hi Frank,

    Great article, and I’m myself a Yourkit fan, but…
    “JMeter threads may consume a lot of the available computation time”.

    Couldn’t agree more; that’s why we developed Gatling, a fully asynchronous stress tool built on top of actors and non-blocking IO.
    It only uses tens of threads for handling tens of thousands of transactions per second where JMeter uses as many threads as virtual users.

    Cheers,

    Stéphane

    • Frank Appel says:

      Stéphane, thanks for pointing me at Gatling. I was not aware of this tool, and at first sight it looks fair enough. So next time around I will give it a try, in particular as JMeter sometimes behaves – carefully worded – a little cumbersome…

  • LakshmiNarayanan says:

    1. How do I profile the JMeter results and find the reason for the measured performance?
    2. How do I test a Flex application using JMeter, given that when recording the Flex application the parameters are saved in a binary file?

    • Frank Appel says:

      I am sorry, but I am not familiar with Flex applications, so I cannot provide a profound answer to that question.

  • Ian Major says:

    Great way to kickoff our project, thanks!

  • Jeremy says:

    This is a bit of a sidetrack, but I just wanted to point out that the first example does not have a break statement after the ExampleResource is found. Therefore it can continue iterating through the entire list after the result is already found. That may be the reason that the second example ran much faster.


  • Sebas says:

    Great article.

    I’m trying to measure the capacity of my RESTful services in tps. I am looking for a sort of graph that shows on the vertical axis the response time and on the horizontal axis the number of tps (increased over time), so in the end I should be able to see the limit at which:
    - the service jumped to a long response time
    - the service timed out/failed

    Would you have any advice? I haven’t been able to get this using JMeter.

    thanks!
    Sebas

