benchmarkingblog

Elisabeth Stahl on Benchmarking and IT Optimization

Oracle’s SPARC T5 and M5 Benchmarks: Lather, Rinse, Repeat

I think I’ve said this before, but one of my absolute favorite movies is Groundhog Day. (Attention: spoiler coming, but since the fricking movie is from 1993 and most of us were old even way back then, I don’t think I will be ruining it for anyone.) Groundhog Day is an American comedy film directed by Harold Ramis and starring Bill Murray and Andie MacDowell (whom, by the way, I’ve been told I sort of look like, which is really cool since she does L’Oréal ads). In the film, an arrogant and egocentric TV weatherman, covering the annual Groundhog Day event, finds himself repeating the same day again and again.

The phrase “Groundhog Day” has now entered the common lexicon as a reference to an unpleasant situation that continually repeats, or seems to.

And I would say that is exactly what we have with Oracle’s new SPARC T5 and M5 benchmarks.

Just as with every Oracle processor announcement, the benchmark results do the same thing. Many of the claims rest on Oracle’s own benchmarks, which are neither published nor audited. There are a small number of industry-standard benchmarks, and of course these are ones where it is extremely difficult, if not impossible, to compare to other relevant results. For price claims, Oracle, as they’ve done in the past, only factors in the price of the pizza box; make sure you add in the all-important software and storage.

Let’s take a look at the T5 and M5 benchmark results:

  • SAP: The IBM POWER7+ with DB2 10 SAP SD 2-tier result from back in September was 1.3x greater per core than the M5 and 1.9x greater than the T5 result. (1) The IBM average database request time was also much better, and the IBM system ran at higher CPU utilization.
  • TPC-C: An IBM POWER6 result from 2008, two generations ago, is 42% higher per core than the new T5 result on this OLTP benchmark. An IBM POWER7 result from 2010, one generation ago, delivers 2.2x better performance per core than the Oracle result. (2) The price for all Oracle database software support used in computing the price/performance for this benchmark is $2300/year – I can only guess what you get for that. Also note that this benchmark used Oracle Partitioning, which may not be realistic for your real-world workloads. The Oracle database software is not even available until September.
  • SPECjEnterprise2010: Oracle’s T5 result needed four times the number of database cores, four times the amount of memory and significantly more storage than the IBM POWER7 result. (3)
  • SPECjbb2013: For Java business, let’s run a benchmark that can only be compared with a couple of ProLiants, one of our old T4s, and a Supermicro. (4)
  • SPECcpu: IBM Power Systems is #1 – don’t forget to look at number of cores for integer and floating point claims.
  • TPC-H: Ha, got you. There is no TPC-H. Funny, was expecting one based on what we saw for the T4. I wonder why . . .
  • The other benchmark claims? These are once again ones that either are Oracle’s own benchmarks or ones nobody cares about because they don’t look like anything we actually run. Chance of departure from useful benchmark results: 100%.
  • Don’t let these claims distract from asking about the business value delivered by these systems.
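
The per-core ratios quoted for SAP and TPC-C can be rechecked from the figures in footnotes (1) and (2). Here is a minimal Python sketch. The names are illustrative only, and dividing throughput by core count is simply the normalization used in this post, not an official metric of either benchmark.

```python
# Figures copied from footnotes (1) and (2); all names are illustrative.
sap_sd = {  # system: (SAPS, cores)
    "IBM Power 780 (POWER7+)": (311_720, 96),
    "Oracle SPARC M5-32": (472_600, 192),
    "Oracle SPARC T5-8": (220_950, 128),
}
tpc_c = {  # system: (tpmC, cores)
    "IBM Power 780 (POWER7)": (1_200_011, 8),
    "IBM Power 595 (POWER6)": (6_085_166, 64),
    "Oracle SPARC T5-8": (8_552_523, 128),
}


def per_core_advantage(results, system, baseline):
    """Per-core throughput of `system` divided by that of `baseline`."""
    score_a, cores_a = results[system]
    score_b, cores_b = results[baseline]
    return (score_a / cores_a) / (score_b / cores_b)


sap_vs_m5 = per_core_advantage(sap_sd, "IBM Power 780 (POWER7+)", "Oracle SPARC M5-32")
sap_vs_t5 = per_core_advantage(sap_sd, "IBM Power 780 (POWER7+)", "Oracle SPARC T5-8")
p6_vs_t5 = per_core_advantage(tpc_c, "IBM Power 595 (POWER6)", "Oracle SPARC T5-8")
p7_vs_t5 = per_core_advantage(tpc_c, "IBM Power 780 (POWER7)", "Oracle SPARC T5-8")

for label, value in [
    ("SAP SD, POWER7+ vs M5", sap_vs_m5),
    ("SAP SD, POWER7+ vs T5", sap_vs_t5),
    ("TPC-C, POWER6 vs T5", p6_vs_t5),
    ("TPC-C, POWER7 vs T5", p7_vs_t5),
]:
    print(f"{label}: {value:.2f}x per core")
```

With the footnote figures this works out to roughly 1.32x and 1.88x per core for SAP SD and 1.42x and 2.24x for TPC-C, which round to the 1.3x, 1.9x, 42% and 2.2x quoted above.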

    I wake up every day, right here, right in Cleveland, and it’s always snowing, and there’s nothing I can do about it. “Winter, slumbering in the open air, wears on its smiling face a dream… of spring.”

    ************************************************
    (1) IBM Power 780 (3.72 GHz) two-tier SAP SD Standard Application Benchmark result (SAP enhancement package 5 for the SAP ERP 6.0 application): 12 processors / 96 cores / 384 threads, POWER7+, 1536 GB memory, 57,024 SD benchmark users, running AIX® 7.1 and DB2® 10, dialog resp.: 0.98s, line items/hour: 6,234,330, dialog steps/hour: 18,703,000, SAPS: 311,720, DB time (dialog/update): 0.009s / 0.014s, CPU utilization: 99%, Certification #2012033.

    Oracle SPARC Server M5-32 SAP SD 2-tier result of 85,050 users, Average dialog response time: 0.80 seconds, Fully processed order line items per hour: 9,452,000, Dialog steps per hour: 28,356,000, SAPS: 472,600, Average database request time (dialog/update): 0.018 sec / 0.044 sec, CPU utilization of central server: 82%, Operating system, central server: Solaris 11, RDBMS: Oracle 11g, SAP Business Suite software: SAP enhancement package 5 for SAP ERP 6.0, 32 processors / 192 cores / 1536 threads, SPARC M5, 3.60 GHz, 16 KB (D) and 16 KB (I) L1 cache and 128 KB L2 cache per core, 48 MB L3 cache per processor, 4096 GB main memory, Certification #2013009.

    Oracle SPARC Server T5-8 SAP SD 2-tier result of 40,000 users, Average dialog response time: 0.86 seconds, Fully processed order line items per hour: 4,419,000, Dialog steps per hour: 13,257,000, SAPS: 220,950, Average database request time (dialog/update): 0.049 sec / 0.131 sec, CPU utilization of central server: 88%, Operating system, central server: Solaris 11, RDBMS: Oracle 11g, SAP Business Suite software: SAP enhancement package 5 for SAP ERP 6.0, 8 processors / 128 cores / 1024 threads, SPARC T5, 3.60 GHz, 16 KB (D) and 16 KB (I) L1 cache and 128 KB L2 cache per core, 8 MB L3 cache per processor, 2048 GB main memory, Certification #2013008.

    (2) IBM Power 780 (2 chips, 8 cores, 32 threads) with IBM DB2 9.5 (1,200,011 tpmC, $.69/tpmC, configuration available 10/13/10); IBM Power 595 (5 GHz, 32 chips, 64 cores, 128 threads) with IBM DB2 9.5 (6,085,166 tpmC, $2.81/tpmC, configuration available 12/10/08); vs. Oracle SPARC T5-8 (8 chips, 128 cores, 1024 threads – 8,552,523 tpmC, $.55/tpmC, configuration available 9/25/13).

    (3) WebSphere Application Server V7 on IBM Power 780 and DB2 on IBM Power 750 Express (64-core app server, 32-core db server), 16,646.34 SPECjEnterprise2010 EjOPS vs. SPARC T5-8 server (SPARC T5-8 server base package, 8x SPARC T5 16-core processors, 128x16GB-1066 DIMMs, 2x600GB 10K RPM 2.5” SAS-2 HDD), 57,422.17 SPECjEnterprise2010 EjOPS.

    (4) http://www.oracle.com/us/solutions/performance-scalability/sparc-t5-2-specjbb2013-1925099.html

    Sources: http://www.spec.org, http://www.tpc.org, http://www.sap.com. Results current as of 3/26/13.

    TPC-C, TPC-H, and TPC-E are trademarks of the Transaction Processing Performance Council (TPC).

    SAP, mySAP and other SAP product and service names mentioned herein as well as their respective
    logos are trademarks or registered trademarks of SAP AG in Germany and in several other countries all
    over the world.

    SPEC, SPECint, SPECfp, SPECjbb, SPECweb, SPECjAppServer, SPECjEnterprise, SPECjvm, SPECvirt, SPECompM, SPECompL, SPECsfs, SPECpower, SPEC MPI and SPECpower_ssj are trademarks of the Standard Performance Evaluation Corporation (SPEC).

    The postings on this site solely reflect the personal views of the author and do not necessarily represent the views, positions, strategies or opinions of IBM or IBM management.

    Written by benchmarkingblog

    March 26, 2013 at 5:53 pm

    21 Responses

      • Benchmarks are great. We are the world’s first revenue-generating M5-32 customer. We have three M5-32 servers running SAP applications. Recently we also completed a 5 TB OLTP Unicode migration. Anyone interested in hearing about our real-world experience can contact us.

        fernando.sagay@almarai.com
        +966503259871

        Fernando Sagay

        December 10, 2013 at 9:00 am

    1. Benchmarks are great; when used in advertising they should be published. Especially when it comes to broad claims on multi-tier applications, there are lots of knobs that can be turned to alter the outcome both ways. What’s better than benchmarks? Real-world deployments… I have yet to meet an IT shop that buys based on a broad claim on the front page of the Wall Street Journal.

      Chris Ceska

      March 29, 2013 at 12:41 pm

    2. You keep going on about per core performance in those stats. Who cares about per-core performance? That is why most of these benchmarks focus on cost per work, not work per core.

      Brian Utterback

      March 30, 2013 at 10:25 am

      • @Brian : …because you license Oracle software ***by the core*** all that matters is per-core performance.

        kevinclosson

        March 30, 2013 at 5:48 pm

        • Good point, but don’t the cost per work benchmarks take license costs into account?

          Brian Utterback

          March 30, 2013 at 5:59 pm

        • No, all that matters is the performance per processor to license when you try to talk about cost/performance. And I know that you know about the T5 core factor because you retweeted a tweet with that information 😉

          c0t0d0s0

          March 31, 2013 at 8:42 am

        • And for licensing SAP software, it is based on number of users… so you can use as many cores as you like with no license limit, including the Oracle DB that you buy from SAP.

          Tan Ah Beng

          April 2, 2013 at 1:16 am

      • Totally agree with you. Dividing a benchmark result into performance/core is not applicable, since TPC asks for the detailed BOQ of the system that ran the test, including the software part. Such an approach also leads to endless conclusions, since every person will come up with his own metric, such as performance/thread, performance/system bus lane, performance/memory size … Just stick with the TPC result; thankfully TPC can sort the best results on their page. 😉

        Ibrahim

        March 31, 2013 at 5:30 am

        • Yes, but there are several ways of optimizing your price, both in what you include and how you present it. If you look at the Oracle T5-8 submission you’ll see that you do not buy the licenses for the Oracle database from Oracle; you lease the software, and that is what is priced. Furthermore, the web-only support offering that is included is not the ‘normal’ support that people are buying for their databases.
          Try looking up Oracle’s prices for buying the DB software and their normal SULS (Software Update License and Support). It’s (47,500 + 10,000) x 64 = 3,680,000 USD + 809,600 per year in SULS, minus discount. That is a very different figure than leasing for 3 years with web support only, and one that would seriously affect the $/tpmC number.

          Furthermore, also look at the maintenance cost of the T5-8 in the TPC-C benchmark. It’s 0 USD. Right up until you find out that at the bottom of the report there is an entry called Oracle Premier Hardware Support costing a million USD. That is kind of misleading IMHO.
          As a side note, if you factor in discounts you’ll find that the HW support cost of the midrange T5-8 with its commodity x86 flash storage is actually almost the same as that of the old high-end POWER 595 submission with its 64 storage boxes and thousands of disks.

          So no matter how often people say “just look at the numbers”, you really need to read at least the executive summary and understand what is included. This surely goes for all vendors.

          // Jesper

          Jesper Frimann

          April 2, 2013 at 6:26 am
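
The license arithmetic in the comment above can be rechecked, and set against the T5-8 result in footnote (2), with a short Python sketch. Every input is the commenter’s list-price figure or a number from the footnote. The variable names, and the choice to spread the cost over three years, are assumptions made for illustration. No discount is applied.

```python
# Inputs are the commenter's list-price figures, not audited numbers.
LICENSE_PER_UNIT = 47_500   # database license, USD per licensed processor
OPTION_PER_UNIT = 10_000    # second per-processor item in the comment's sum, USD
LICENSED_UNITS = 64         # licensed processor count used in the comment
SULS_PER_YEAR = 809_600     # annual Software Update License and Support, USD
YEARS = 3                   # pricing window mentioned in the comment
TPMC = 8_552_523            # SPARC T5-8 TPC-C result from footnote (2)

license_total = (LICENSE_PER_UNIT + OPTION_PER_UNIT) * LICENSED_UNITS
support_rate = SULS_PER_YEAR / license_total          # implied annual support rate
three_year_total = license_total + YEARS * SULS_PER_YEAR
software_usd_per_tpmc = three_year_total / TPMC       # software only, list price

print(f"license purchase:   {license_total:,} USD")
print(f"implied SULS rate:  {support_rate:.0%} per year")
print(f"3-year software:    {three_year_total:,} USD")
print(f"software per tpmC:  {software_usd_per_tpmc:.2f} USD")
```

On these assumptions the software alone comes to about $0.71 per tpmC at list price, before any discount, compared with the $.55/tpmC reported for the whole configuration.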

    3. The best place to get answers about how Oracle broke the benchmark records is the blog below; they speak technical language.

      https://blogs.oracle.com/BestPerf/

      Ibrahim

      March 31, 2013 at 5:18 am

    4. Important for me:
      Core performance (licensing, thread strength)
      System performance (scalability, move from 740 to 780 without downtime)

      Not so important:
      Arbitrary measures like socket performance, which server that comes in gray is the fastest.

      lssrad

      April 3, 2013 at 1:11 pm

    5. So apparently, IBM has faster cores, so IBM wins the performance / core throne.

      It seems that Oracle wins the performance / cpu throne, if you look at the official benchmarks.

      So, what is most important to the customer? If it is performance per core, then go with IBM. If it is performance per socket, then go with Oracle. But in half a year IBM will release POWER8, which will challenge SPARC T5! It will be very interesting.

      Michael

      April 5, 2013 at 10:07 am

      • 500BHP V12 engine A vs 300BHP V6 engine B.
        So which engine is more powerful?
        Engine A with 500BHP (42BHP per cylinder)
        or
        Engine B with 300BHP (50BHP per cylinder)?

        It seems that the moment P7+ does not have the highest
        SAPS per CPU (BHP per engine),
        they start to talk about SAPS per core (BHP per cylinder)… hah!

        Tan Ah Beng

        April 15, 2013 at 4:27 am

        • The difference being that all the cylinders work together on the same crank, while 16 weak cores cannot work together on the same single-threaded job. Your analogy is flawed.

          Here’s a better one: Two Skodas are faster than one Ferrari. Therefore you should buy your servers from Oracle.

          lssrad

          April 15, 2013 at 1:49 pm

        • My analogy was not flawed, but your understanding of the SAPS rating was. SAPS has been a THROUGHPUT benchmark for the last 20 years, not a SINGLE THREAD benchmark. Please stop twisting the SAPS benchmark results to confuse others. 🙂

          Tan Ah Beng

          April 15, 2013 at 9:44 pm

    6. IBM’s playing catch-up, if you ask me. Some of the pointers are: on-chip crypto accelerators, playing down clock speed (vs. the earlier 5.0 GHz from IBM compared to the present generation), large L3 caches (up to 10 MB a core!) and many more.

      Coming to comparing the P7+ and T5 systems: P7/P7+ systems are still on PCIe Gen 2 and use a “crossbar” switch to scale memory and CPU in their larger systems. So linear scaling sucks on the larger systems, i.e. 770/780 etc. Adding to that, if you look at some of the benchmarks, you will notice that IBM has used sparsely populated systems (16 cores in a P780) for the same reason: ensuring all the system bandwidth is allocated to the limited computing resources available in that system. (IBM should publish a STREAM benchmark, which would show you that IBM’s so-called “Powerful Cores” are bandwidth starved.) On the other hand, T5 systems are already using PCIe Gen 3 and have a glueless architecture up to their 8-socket (128-core) system.

      Since there are a lot of benchmark references on this page, maybe it’s time for some of you to rethink “benchmarks” and understand the workload of particular benchmarks. For example, a 10 MB cache is certainly going to help get awesome SAPS/core, etc. However, how relevant is SAP’s Sales and Distribution benchmark to your “real world” application performance? Maybe you are not even running SAP in the first place. The same applies to TPC-C, TPC-H, SPEC and all the other “benchmarketers”.

      Some questions I have for IBM are:

      1. Why is IBM testing with small configs & extrapolating the results ? Issue with scaling?
      2. Why do the rPerf values for Power7+ and Power7 almost match?
      3. Why no single-server 8-chip Power7+ or Power7 TPC-C results, and why no direct benchmark comparisons of Power7+ and Power7?
      4. Why no SPECjbb2013 results on Power7+ and Power7? IBM certainly has no problems with SPECjbb2005 (which retires this July).

      It’s all about a balanced system design, folks! If per-core performance is so important, why no 1-core servers from IBM?

      It’s time to wear your thinking cap…

      Cheers

      mohba01

      April 16, 2013 at 2:41 am

    7. @mohba01 – You are misrepresenting information here. Your statements may sound to the uninformed like the “gotcha” moment all competitors like to have. However, it tells me that you have only a cursory understanding of Power technology.

      1) Power6 was a primarily In Order architecture allowing for higher clock speeds – up to 5.0 GHz. Power4, 5 & 7 (including the + models) are Out of Order designs. These servers now go up to 4.42 GHz. Btw, when turning off the energy management features you can add another 7% or so to that. Check your SPARC servers to see where IO and OoO were used – T1 was IO – Wasn’t til T4 that OoO showed up. IO has less logic allowing for higher clock speeds – unless you are talking about the T1-T3 which couldn’t break the 2.0 GHz barrier even though it was IO.
      2) IBM is absolutely not playing catch-up in the way you are alluding to. Let’s review. Oracle uses valuable chip space to put Ethernet controllers on. IBM uses this space to put Memory Compression – the ability to present up to 2.25X the physical memory to an LPAR, i.e. 16 GB physical may look like 36 GB to the LPAR. Oracle has had crypto – ok, congrats. IBM now has it. IBM takes it further and has L3 cache on chip – not just the controller but with 10 MB per core on-chip. IBM includes a random number generator. Depending on the model and chip they offer up to 2 memory controllers, each with 4 memory channels per chip. Each MC provides roughly 65 GB/s of bandwidth. I could go on as there are more features.
      3) Power servers are about balanced computing. I’ll tell you a little secret – our DDR3 memory is also 1066 MHz. Yes, we do not have 1600 MHz like x86. Wow, shiny objects like these easily distract customers if the overall architecture isn’t balanced. I would rather be able to feed the cores through more memory channels and a big pipe.
      4) Crossbar – No. Actually look at the SPARC servers if you want to see a crossbar. If you look at the T5 architecture whitepaper dated March 2013 on page 8 you will see where the T5 servers use an 8×9 crossbar. Not sure what points you were trying to score against Power with that but I’ll kindly pick those up 🙂 It is even in the M5000 datasheet, which states the system bus is a crossbar. I could search for the M5 and I’m confident it will have this as well……same dog, no new tricks. They have kept this architecture from the days of UltraSPARC where the bus ran at 83 MHz I believe. Fujitsu always seems to run theirs faster. Power uses a “shared everything” architecture. This means that everything runs off the clock speed. So, a Power7+ 740 running at 4.2 GHz will benefit everything downstream as the core to core, core to memory, core to I/O all run at some ratio of the clock speed. Unlike the archaic and lesser engineered crossbars which by their nature “gate” everything that comes across them.
      5) Scaling is pretty doggone linear on Power servers. You should have very good results within a base server chassis. The trick is what the results are when you start to add system boards (IBM’s 770/780 use CECs as an equivalent here). Now you have physics to deal with to a greater degree….there might be .5 meters separating one processor from another. This reminds me of Sun’s E10k – what a great server (seriously, it was a game changer – they just never did anything revolutionary with it, let alone barely evolutionary). Put the first system board in slot 1 and the 2nd in slot 2 or in slot 16 but don’t put the 2nd in slot 7 or 8 which was about 1.0 meter to the right. This was to prevent putting data onto the centerplane. Power7+ servers have a feature called Dynamic Platform Optimization (DPO), which works with our Power Hypervisor to continually optimize workload placement on the compute resources. Power gives you the flexibility to optimize workload placement….does SPARC?
      6) I can’t answer all of your points here or Elisabeth won’t publish this. IBM can’t publish results for every benchmark iteration but they do publish a lot of them. Hats off to x86 for the same. Wouldn’t it be nice for Oracle to publish lots of benchmarks – they were awfully quiet with the T1-T3 servers, weren’t they? I’d have to go check but they seem to cherry-pick benchmarks that were just released, as with SPECjbb2013 in January, or find a gap where there isn’t a competitive platform benchmark and publish to that. I get it, why would you purposely step up just to look bad? And then you see the crazy misrepresentations of information: using licensing models or support options that no (ok, very few) customers actually use, all in an effort to bring down the Oracle cost to improve the price/performance that you say is “so critical”. The beauty of Power servers is that, due to their linear scaling and tremendous efficiency, I can take a 16 core result and confidently extrapolate to a 32 core result, etc.
      7) You ask why rPerf values for P7 & P7+ almost match? What are you looking at? Take a 3.0 GHz Power7 4 core 720 vs a 3.6 GHz Power7+ 4 core 720 and the rPerf is 45.1 and 53.9 respectively. You are going to think I’m picking on you or being a smart aleck by saying “those are not the same”. If I look at the 8 core 3.6 GHz P7+ 720 and compare that to the older 8 core 3.6 GHz P7 750 they are 102.4 vs 93.05.
      8) No idea about the 8 chip P7+ TPC benchmark. I wish they would do it along with the 256 core 795. I suspect it would crush Oracle’s 256 core offering. Alas, IBM is much more conservative though and they don’t pick the fight like I would like to see….which Oracle likes to do. It’s okay, I get a kick out of those WSJ ads. They are laughable but hey, Oracle is trying to run a business.
      9) SPECjbb2013 was just released in January. There are only about 10 or so entries. In time….go ahead and sit on your “accomplishment”. When and if IBM publishes I’m confident the SPARC entry will move to a distant 2nd place.
      10) 1 core servers hey – how about this….there are 1 core servers just like there are 256 core servers. I can take a 4, 6, 8, 12, 16, 24 and 32 core server and deconfigure all but 1 core for a customer to use (just stating the capability, not saying I would). Then the customer can grow into the additional capacity as needed. In the 760 and larger servers they can do this dynamically – no reboot and no impact to the LPARs…..just keep on running.

      No thinking cap required, just lots of webspace to refute all of these wild claims. Everybody does benchmarks and they serve a purpose but we can all agree (can’t we?) that each vendor puts their best foot forward to achieve the desired results. Where I challenge Oracle is to bring your baddest T5 box with its lackluster LDOM virtualization against any of my Power7+ servers that I have for a bake-off. Power7+ will run circles around anything Oracle has with regards to straight line performance, virtualization flexibility, software licensing, RAS features, TCO, efficiency, and the number of workloads it can run on far fewer resources than an Oracle server. As an IBM Business Partner I do this day in and day out competing against Oracle’s Exa* line, T series and M series.

      The footrace of your 3.6 GHz server vs the Power7+ lineup is interesting…..gets the fans waving. What gets the CFO’s attention is that Power7+ can do far more work with far fewer resources, lowering what they have to write in a check to Larry each year – there is the story! He may have to put off buying New Zealand until 2014….just saying.

      Brett Murphy

      April 30, 2013 at 12:38 pm
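
The rPerf pairs quoted in point 7 of the comment above can be put side by side with a small Python sketch. The values are exactly those in the comment. The names and the extra per-GHz normalization are illustrative assumptions, added only because the first pair differs in clock speed as well as processor generation.

```python
# (rPerf, clock in GHz) as quoted in the comment; names are illustrative.
four_core = {"old": (45.1, 3.0), "new": (53.9, 3.6)}     # P7 720 -> P7+ 720
eight_core = {"old": (93.05, 3.6), "new": (102.4, 3.6)}  # P7 750 -> P7+ 720


def gains(pair):
    """Return (raw rPerf ratio, rPerf-per-GHz ratio) of new over old."""
    (old_rperf, old_ghz), (new_rperf, new_ghz) = pair["old"], pair["new"]
    raw = new_rperf / old_rperf
    per_ghz = (new_rperf / new_ghz) / (old_rperf / old_ghz)
    return raw, per_ghz


for label, pair in [("4-core", four_core), ("8-core", eight_core)]:
    raw, per_ghz = gains(pair)
    print(f"{label}: {raw:.3f}x raw, {per_ghz:.3f}x per GHz")
```

The raw ratio shows how far apart the quoted values are. The per-GHz ratio is a rough way to separate the clock-speed change from the generation change, and it says nothing about other differences between the server models.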

    8. Within proven system scalability limits of both Sparc and Power servers, performance per core and licensing costs per core are extremely important, because they are huge components of TCO. However, if a customer system requires a level of scalability within proven range of only one of those two architectures, the superior performance/core of Power 7/7+ and the lower Oracle license cost/core of the T5/M5 fade into the background. If the system as required can only be planned on one of the two architectures, you grit your teeth and go with it.
      Fortunately, most system requirements will be within the range of both Power and Sparc scalability. Then sites can make their selection based on their TCO calculations – including all maintenance costs such as personnel and platform consolidation as well as price/performance.

      lonnyniederstadt

      June 6, 2013 at 10:35 am

    9. Since we spoke about the stream benchmarks:

      “IBM Power 720 and 740 Technical Overview and Introduction” http://www.redbooks.ibm.com/redpapers/pdfs/redp4637.pdf (page 64) IBM 720/740 Memory Bandwidth: 68.22 GB/s per socket
      Therefore a 2-socket server is TWICE THIS: IBM 720/740 Memory Bandwidth: 136.44 GB/s per system

      Published IBM P720/740 STREAM data: http://www.cs.virginia.edu/stream/by_date/Bandwidth.html

      Vs. a T5-x or an Mx-32 system that has a system bandwidth significantly higher than that (600 GB/s for an 8-socket system).

      Here’s my $0.02.

      I would like to rest my case now.

      mohba01

      September 25, 2013 at 3:14 am
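
The bandwidth figures in the comment above mix a per-socket number with whole-system numbers, so a short Python sketch can put them on the same footing. The inputs are the figures as quoted in the comment: the 600 GB/s value is given there without a source, and none of these are measured STREAM results. The variable names are illustrative.

```python
# Quoted bandwidth figures from the comment, not measured STREAM results.
POWER_GBPS_PER_SOCKET = 68.22  # IBM 720/740, per socket, as cited in the comment
POWER_SOCKETS = 2
SPARC_GBPS_SYSTEM = 600.0      # 8-socket system, as stated in the comment
SPARC_SOCKETS = 8

power_system = POWER_GBPS_PER_SOCKET * POWER_SOCKETS
sparc_per_socket = SPARC_GBPS_SYSTEM / SPARC_SOCKETS

print(f"Power 720/740: {POWER_GBPS_PER_SOCKET:.2f} GB/s per socket, "
      f"{power_system:.2f} GB/s per 2-socket system")
print(f"8-socket SPARC: {sparc_per_socket:.2f} GB/s per socket, "
      f"{SPARC_GBPS_SYSTEM:.2f} GB/s per system")
```

Comparing per socket as well as per system makes it clearer how much of the gap comes from socket count.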

    10. Interesting comparison done by Intel cited by semiaccurate[1]:
      “For 4S systems, Intel says a top end 4490 is 1.8x faster than a 4S Power 750 Express with a Power 7+ CPU while being 74% cheaper. Against a Sun Sparc T5-4 4S machine the same 4490 is said to be 1.28x faster and 61% cheaper.”
      Interesting, isn’t it?

      [1]: http://semiaccurate.com/2014/02/18/intel-releases-ivy-bridge-ex-now-known-xeon-e7-v2/

      kgardas

      February 18, 2014 at 4:38 pm

