Benchmarking Methodology and System Setup

Benchmarking Methodology

As other online reviews of this new dual core processor have noted, the introduction of dual core processors calls a lot of our current testing into question and presents some new problems.  In our past reviews, we have always looked at numerous benchmarks and applications to get an overall feel for how the new product (be it a processor, chipset or whatever) performed.  When Intel brought HyperThreading into the world, tech reviewers faced their first new benchmarking issue centered on multithreaded applications.

Reviewers responded by finding applications that did use multiple threads, such as our CineBench 2003 benchmark, and including them in the test suite.  However, I dare say that the majority of our readers do not do high end 3D rendering on their machines, and thus the benchmarks we found to show the benefits of HyperThreading may not have been very applicable to you, the reader.  There simply are not many applications for desktop users that can take advantage of multiple processing threads.  Using a single benchmark to illustrate the benefits of HyperThreading isn't ideal, and now with dual cores, using that same single application as a benchmark is even worse.  Instead, we turn to multitasking.

Multitasking is something that we all do on a daily basis on our computers.  We have Outlook open to check our mail, Norton Antivirus scanning our system, and IE or Firefox browsing the Internet while we listen to MP3s.  Maybe at work you have MS Office open with an Excel sheet running macros while writing in Word and browsing the Internet… all the while chatting on ICQ (shame!).  Each of those applications has its own set of variables, tasks, execution order and data that can be manipulated independently of any of the other programs running.  That is the ideal situation for multiple cores to run in – instead of having a single application be responsible for multiple threads of execution (e.g. CineBench or 3D Studio Max), we have the operating system handling them.

Windows XP Professional and most variants of Linux support multiple processors and thus multiple threads of execution.  In fact, Windows XP Pro is capable of working with a total of four execution threads, two each from two different processor cores, as is the case with the Intel Extreme Edition 840 processor with HyperThreading enabled.
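
If you want to verify what your own OS sees, a quick check of the logical processor count is enough; here is a minimal sketch in Python (purely illustrative, not part of our benchmark suite):

```python
# Minimal sketch: report how many logical processors the OS exposes.
# On an Extreme Edition 840 with HyperThreading enabled, Windows should
# show four (two cores x two threads per core).
import os

print("Logical processors visible to the OS:", os.cpu_count())
```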

So, now we know that the future of application benchmarking should reside in the ability to accurately test and measure the performance of systems in a multitasking environment.  Unfortunately, that is easier said than done.  Most of our current benchmarks use a single application, run a script of some kind, and report back a result based on the time it took to complete the task.  Running multiple applications and doing multiple things at the same time requires direct involvement from the person running the benchmarks, meaning more chance of human error and variability in general.  How can we address this?
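
One way to keep that human element in check is to script as much of the run as possible.  The sketch below (Python, with placeholder paths and file names rather than our actual test files) illustrates the idea: start the background applications in a fixed order, then time only the foreground task.  It is an illustration of the approach, not the exact procedure used for the numbers in this article.

```python
# Hypothetical sketch of automating a multitasking benchmark run.
# All executable paths and file names below are placeholders, not the
# actual files used in our testing.
import subprocess
import time

BACKGROUND_APPS = [
    [r"C:\Program Files\Trillian\trillian.exe"],
    [r"C:\Program Files\Mozilla Firefox\firefox.exe", "flash_page1.html"],
    [r"C:\Program Files\iTunes\iTunes.exe"],
]

def start_background_apps():
    """Launch each background application in a fixed, repeatable order."""
    procs = []
    for cmd in BACKGROUND_APPS:
        procs.append(subprocess.Popen(cmd))
        time.sleep(5)  # give each app a moment to settle before the next
    return procs

def timed_workload(cmd):
    """Run the foreground task and return the elapsed time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start

if __name__ == "__main__":
    start_background_apps()
    elapsed = timed_workload([r"C:\bench\encode_batch.bat"])  # placeholder foreground task
    print(f"Foreground task took {elapsed:.1f} seconds")
```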

PC Perspective Multitasking Scenarios

My multitasking testing setup, in its first iteration, uses the following applications in various ways:

  • Adobe Acrobat Reader
  • Microsoft Excel
  • Microsoft Outlook
  • Trillian
  • Norton Antivirus Trial
  • iTunes and Quicktime
  • Razor Lame MP3 Encoder
  • Firefox
  • Doom 3

This setup is a general subset of the applications I find myself running the majority of the time in my work environment.  Pinging some of our forum mods and some of my friends who use their PCs frequently, this is also pretty close to the number and types of applications that they run.  Certainly many of you have more applications running at any given time, just as I am sure many of our readers are very particular about keeping as few apps running as possible.

In order to give some quantifiable measure of performance on a multitasking machine, at least one specific application needed to be running a task that could be timed.  To me, it made the most sense for that program to be the Razor Lame MP3 Encoder, encoding some WAVs into MP3 files; this application keeps a log of how long a specific task takes, and we could vary the number of applications running in the background to get differing levels of multitasking load.  To that end, I set up three distinct multitasking performance scenarios.
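
Since Razor Lame is a front end for the command line LAME encoder, the same kind of timed, logged batch encode can be reproduced directly.  Here is a rough sketch of that measurement, assuming lame is on the PATH and the test WAVs sit in a placeholder folder; it mirrors the idea behind Razor Lame's log rather than reproducing it exactly.

```python
# Minimal sketch: time a batch MP3 encode the way Razor Lame's log does.
# Assumes the command line LAME encoder is on the PATH and that the WAV
# files live in the folder below; both are placeholders.
import subprocess
import time
from pathlib import Path

WAV_DIR = Path(r"C:\bench\wavs")   # hypothetical location of the test WAVs
BITRATE = "320"                    # 320 kbps, matching the scenarios below

def encode_batch():
    total_start = time.perf_counter()
    for wav in sorted(WAV_DIR.glob("*.wav")):
        mp3 = wav.with_suffix(".mp3")
        start = time.perf_counter()
        subprocess.run(["lame", "-b", BITRATE, str(wav), str(mp3)], check=True)
        print(f"{wav.name}: {time.perf_counter() - start:.1f} s")
    print(f"Total batch time: {time.perf_counter() - total_start:.1f} s")

if __name__ == "__main__":
    encode_batch()
```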

Scenario 1

I consider this the heavy multitasking scenario; it uses nearly all of the applications listed above in some fashion.  I had Norton AV running a virus scan on the hard drives, Trillian open in the background, Firefox open with three tabs on Flash-heavy sites, iTunes playing a playlist of MP3s, Acrobat open to a large, complicated PDF file with lots of layers, and Excel open with a 3 MB data sheet, and then timed Razor Lame encoding a dozen WAV files into MP3 format at 320 kbps.  Here is the complete step by step of my application start up process:

  1. Open Trillian
  2. Open Excel Sheet
  3. Open PDF file to page 15 (complicated data)
  4. Open Firefox, with three tabs, each of heavy Flash content (stored locally)
  5. Open iTunes and play 12 song playlist
  6. Start NAV virus scan on HDDs
  7. Open Razor Lame and add files to be encoded
  8. Encode MP3s and time

The window order, from bottom-most visible to top-most, was: Acrobat -> iTunes -> Norton AV -> Razor Lame.  Keeping Acrobat at least partly visible forced the system to continue processing the data in the file.

Scenario 1+

After completing the above tests, I decided to add one more application into the suite by having Outlook import an email file into its database.  Here is the modified step by step:

  1. Open Trillian
  2. Open Excel Sheet
  3. Open PDF file to page 15 (complicated data)
  4. Open Firefox, with three tabs, each of heavy Flash content (stored locally)
  5. Open iTunes and play 12 song playlist
  6. Start NAV virus scan on HDDs
  7. Open Outlook and start import on 335MB email file
  8. Open Razor Lame and add files to be encoded
  9. Encode MP3s and time

The window order, from bottom-most visible to top-most, was: Acrobat -> iTunes -> Outlook -> Norton AV -> Razor Lame.  In this test there were two applications running tasks with a definite end, and in our results you'll see that by the end of the MP3 encoding, the Intel system had already finished the Outlook import while the AMD system was still completing the task.

Scenario 2

I would consider this a light multitasking environment, with only the bare minimum of applications running in the background.  In this test NAV was not running a virus scan, but was enabled and doing its standard monitoring.  iTunes was playing MP3s and we timed the Razor Lame app encoding the same set of WAV files.

  1. Norton AV starts on bootup, in monitoring mode only
  2. Open Trillian
  3. Open iTunes and play 12 song playlist
  4. Open Razor Lame and add files to be encoded
  5. Encode MP3s and time

The window order, from bottom-most visible to top-most, was: iTunes -> Razor Lame.

Scenario 3

The third and last scenario in this test was based around someone gaming on their machine.  Most gamers would not have a whole lot of active applications running behind their games, so I didn't include anything like the PDF or Excel document in the background.  What I did include was Trillian and iTunes (as I know many gamers listen to MP3s while playing their games), and I also threw in a Norton AV virus scan.  While you may not have Norton run a scan on purpose while you are gaming, many times I have had NAV start one on its own schedule while I was doing other things, so I felt it appropriate for a worst case scenario.

  1. Open Trillian
  2. Open iTunes and play 12 song playlist
  3. Start NAV virus scan on HDDs
  4. Start Doom3 and run timedemos

The results this time were based on the average frame rate reported by the Doom 3 in-game timedemos.
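
If you want to pull those numbers out of a saved console dump rather than copying them by hand, a small parser will do.  The sketch below assumes the timedemo summary line ends with something like "= NN.N fps", so check the pattern against your own console output before relying on it.

```python
# Minimal sketch: pull the frame rate out of a dumped Doom 3 console log.
# The "... = NN.N fps" summary format is an assumption; adjust the pattern
# to match whatever your own console dump actually contains.
import re

FPS_PATTERN = re.compile(r"=\s*([\d.]+)\s*fps", re.IGNORECASE)

def timedemo_fps(console_log_path):
    with open(console_log_path, "r", errors="ignore") as log:
        matches = FPS_PATTERN.findall(log.read())
    if not matches:
        raise ValueError("no fps summary line found in " + console_log_path)
    return float(matches[-1])  # last match is the most recent timedemo run

print(timedemo_fps("condump.txt"))  # e.g. a console dump saved after the run
```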

Each scenario was tested three times, with the scores averaged.  I was surprised to find that the results were very consistent and reliable; the times were usually within +/- 3 seconds of each other.
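
For anyone repeating these runs, the averaging and the consistency check amount to a couple of lines; this sketch uses made-up numbers, not our actual results.

```python
# Average three timed runs and report how far the runs spread.
# The numbers below are placeholders, not results from our testing.
run_times = [214.0, 216.5, 212.8]  # seconds for three runs of one scenario

average = sum(run_times) / len(run_times)
spread = max(run_times) - min(run_times)

print(f"Average: {average:.1f} s, spread between runs: {spread:.1f} s")
# A spread of a few seconds or less suggests the scenario is repeatable.
```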

System Setup

Our testing method for this processor preview was the same as for any other platform launch; we compared the new Pentium XE 840 to the other top of the line processors from both Intel and AMD.

AMD Test System Setup

CPU

Athlon 64 FX-55
Athlon 64 4000+
Athlon 64 3800+
Athlon 64 3500+

Motherboards

NVIDIA nForce4 Ultra Motherboard

Power Supply 

Antec 480 watt

Memory 

2x512MB Corsair Micro DDR500 @ 2-3-3-6
2x512MB Kingston HyperX 4300 DDR

Hard Drive

250 GB Maxtor 7200 RPM SATA

Sound Card

Creative Labs Live!

Video Card

ATI X800 XT

Video Drivers

ATI Catalyst 4.11

DirectX Version

DX 9.0c

Operating System

Windows XP w/ Service Pack 1

Intel 925XE Test System Setup

CPU

Pentium 4 Extreme Edition 3.73 GHz
Pentium 4 Extreme Edition 3.46 GHz
Pentium 4 660
Pentium 4 560
Pentium 4 550
Pentium 4 Extreme Edition 3.4 GHz

Motherboards

Intel 925XE Reference

Power Supply 

Antec 480 watt

Memory 

2x512MB Corsair Micro DDR2-533 @ 4-4-4-14

Hard Drive

250 GB Maxtor 7200 RPM SATA

Sound Card

Creative Labs Live!

Video Card

ATI X800 XT

Video Drivers

ATI Catalyst 4.11

DirectX Version

DX 9.0c

Operating System

Windows XP w/ Service Pack 1

Intel 955X Test System Setup

CPU

Intel Extreme Edition 840 @ 3.2 GHz

Motherboards

Intel 955X Reference

Power Supply 

Antec 480 watt

Memory 

2x512MB Corsair Micro DDR2-667 @ 4-4-4-14

Hard Drive

250 GB Maxtor 7200 RPM SATA

Sound Card

Creative Labs Live!

Video Card

ATI X800 XT

Video Drivers

ATI Catalyst 4.11

DirectX Version

DX 9.0c

Operating System

Windows XP w/ Service Pack 1

The benchmarks used were:

  • SiSoft Sandra 2004 SP1
  • AIDA32
  • Cachemem
  • Quake III: Arena
  • Unreal Tournament 2003
  • X2: The Threat
  • 3D Mark 2001: SE v330
  • 3DMark03
  • Far Cry 1.1
  • Doom 3
  • PCMark04
  • Business Winstone 2004
  • Content Creation Winstone 2004
  • LAME MP3 Encoding
  • XMPEG / DivX Encoding
  • WinRAR Compression
  • CineBench 2003
  • ScienceMark 2.0 Beta