Testing Methodology and System Setup

Benchmarking Methodology

The introduction of dual core processors brings a lot of our current testing into question and presents some new problems.  In our past reviews, we have always looked at numerous benchmarks and applications to get an overall feel for how the new product (be it a processor, chipset or whatever) performed.  When Intel brought HyperThreading into the world, tech reviewers faced their first new benchmarking issue, one that centered on multithreaded applications.

Reviewers responded by finding some applications that did use multiple threads, such as our CineBench 2003 benchmark, and including them in the test suite.  However, I dare say that the majority of our readers do not do high end 3D rendering on their machines, and thus the benchmarks that we did find to show the benefits of HyperThreading may not have been very applicable to you, the reader.  There simply are not many applications for desktop users that can take advantage of multiple processing threads.  Using a single benchmark to illustrate the benefits of HyperThreading isn’t ideal, and now with dual cores, using the same single application as a benchmark is even worse.  Instead, we turn to multitasking.

Multitasking is something that we all do on a daily basis on our computers.  We have Outlook open to check our mail, Norton Antivirus scanning our system, and IE or Firefox open browsing the Internet while we listen to MP3s.  Maybe at work you have MS Office open with an Excel sheet running macros while writing in Word and browsing the Internet, all the while chatting on ICQ (shame!).  Each of those applications has its own set of variables, tasks, execution order and data that can be manipulated independently of any of the other programs running.  That is the ideal situation for multiple cores to run in: instead of having a single application be responsible for multiple threads of execution (e.g. CineBench or 3D Studio Max), we have the operating system handling them.

Windows XP Professional and most variants of Linux support multiple processors and thus multiple threads of execution.  In fact, Windows XP Pro is capable of working with a total of four execution threads, two each from two different processor cores, as is the case with the Intel Extreme Edition 840 processor with HyperThreading enabled.
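As a quick sanity check on that arithmetic, the number of logical processors the operating system schedules on is simply cores times threads per core.  The Python sketch below is only an illustration; the helper name `logical_processors` is my own, and `os.cpu_count()` reports whatever the machine running it exposes.

```python
import os


def logical_processors(cores: int, threads_per_core: int) -> int:
    """Logical processors the OS can schedule threads on."""
    return cores * threads_per_core


# Dual core with HyperThreading enabled: 2 cores x 2 threads each.
print(logical_processors(2, 2))  # 4
# Dual core without HyperThreading: 2 cores x 1 thread each.
print(logical_processors(2, 1))  # 2
# What the current machine reports (None if it cannot be determined).
print(os.cpu_count())
```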

So, now we know that the immediate future of application benchmarking should reside in the ability to accurately test and measure performance of systems in a multitasking environment.  Unfortunately that is easier said than done.  Most of our current benchmarks use a single application, run a script of some kind, and report back a result based on the time it took to complete the task.  Running multiple applications and doing multiple things at the same time requires direct involvement from the person running the benchmarks, meaning more of a chance of human error and variability in general.  How can we address this?

PC Perspective Multitasking Scenarios

My multitasking testing setup, in its first iteration, uses the following applications in various ways:

  • Adobe Acrobat Reader
  • Microsoft Excel
  • Microsoft Outlook
  • Trillian
  • Norton Antivirus Trial
  • iTunes and Quicktime
  • Razor Lame MP3 Encoder
  • Firefox
  • Doom 3

This setup is a general subset of the applications I find myself running the majority of the time in my work environment.  After pinging some of our forum mods and some of my friends who use their PCs frequently, I found that this is also pretty close to the number and types of applications that they run.  Certainly many of you have more applications running at any given time, just as I am sure many of our readers are very particular about having as few apps running as possible.

In order to give some quantifiable measure of performance on a multitasking machine, at least one specific application needed to be running a task that could be timed.  To me, it made the most sense for that program to be Razor Lame MP3 Encoder, encoding some WAVs into MP3 files; this application keeps a log of how long a specific task takes, and we could vary the number of applications running in the background to get differing levels of multitasking performance.  To that end, I set up three distinct multitasking performance scenarios.
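For readers who want to try something similar without relying on an application's own log, here is a rough Python sketch of the idea: start a set of background programs, then time one foreground task.  It is not the method used in this article (Razor Lame's log supplied the times here); the function name `time_under_load` and the stand-in commands are hypothetical placeholders you would replace with real programs.

```python
import subprocess
import sys
import time


def time_under_load(background_cmds, timed_cmd):
    """Start background commands, time one foreground command, return seconds."""
    background = []
    try:
        for cmd in background_cmds:
            background.append(subprocess.Popen(cmd))
        start = time.perf_counter()
        subprocess.run(timed_cmd, check=True)
        return time.perf_counter() - start
    finally:
        # Stop the background load whether or not the timed task succeeded.
        for proc in background:
            proc.terminate()
            proc.wait()


# Hypothetical stand-ins: an idle "application" and a short CPU-bound "encode".
idle_app = [sys.executable, "-c", "import time; time.sleep(30)"]
encode_job = [sys.executable, "-c", "sum(range(2_000_000))"]

light = time_under_load([idle_app], encode_job)
heavy = time_under_load([idle_app] * 3, encode_job)
print(f"light: {light:.2f} s, heavy: {heavy:.2f} s")
```

Varying the length of the background list is the scripted equivalent of varying how many applications are open while the encode runs.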

Scenario 1

I consider this to be the heavy multitasking scenario that uses nearly all of the applications listed above in some fashion.  I had Norton AV doing a virus scan on the hard drives, Trillian open and running in the background, Firefox open with three tabs on Flash-heavy sites, iTunes playing a playlist of MP3s, Acrobat open to a large, complicated PDF file with lots of layers, and Excel open with a 3 MB data sheet.  I then timed Razor Lame encoding a dozen WAV files into MP3 format at 320 kbps.  Here is a complete step by step of my application start up process:

  1. Open Trillian
  2. Open Excel Sheet
  3. Open PDF file to page 15 (complicated data)
  4. Open Firefox, with three tabs, each of heavy Flash content (stored locally)
  5. Open iTunes and play 12 song playlist
  6. Start NAV virus scan on HDDs
  7. Open Razor Lame and add files to be encoded
  8. Encode MP3s and time

The window order, from bottom most viewable to top most was: Acrobat -> iTunes -> Norton AV -> Razor Lame.  Keeping Acrobat at least partly visible forced the system to continue processing the data in the file. 

Scenario 1+

After completing the above tests, I decided to add one more application into the suite by including Outlook importing an email file into its database.  Here is the modified step by step:

  1. Open Trillian
  2. Open Excel Sheet
  3. Open PDF file to page 15 (complicated data)
  4. Open Firefox, with three tabs, each of heavy Flash content (stored locally)
  5. Open iTunes and play 12 song playlist
  6. Start NAV virus scan on HDDs
  7. Open Outlook and start import on 335MB email file
  8. Open Razor Lame and add files to be encoded
  9. Encode MP3s and time

The window order, from bottom most viewable to top most was: Acrobat -> iTunes -> Outlook -> Norton AV -> Razor Lame.  In this test there were two applications running tasks that had a definite end, and in our results you’ll see that at the end of the MP3 encoding, the Intel system had already finished the Outlook import while the AMD system was still completing the task.
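With two finite tasks running at once, it helps to record whether the second one was already done when the timed one ended.  Here is a minimal Python sketch of that check, using hypothetical stand-in commands rather than Outlook and Razor Lame; it is not the procedure used for the results in this article, and the names `timed_with_secondary` and `sleeper` are my own.

```python
import subprocess
import sys
import time


def timed_with_secondary(secondary_cmd, timed_cmd):
    """Run two finite tasks together.

    Returns (seconds taken by the timed task,
             True if the secondary task had already finished by then).
    """
    secondary = subprocess.Popen(secondary_cmd)
    try:
        start = time.perf_counter()
        subprocess.run(timed_cmd, check=True)
        elapsed = time.perf_counter() - start
        # poll() returns None while the process is still running.
        secondary_done = secondary.poll() is not None
    finally:
        if secondary.poll() is None:
            secondary.terminate()
        secondary.wait()
    return elapsed, secondary_done


def sleeper(seconds):
    """Hypothetical stand-in task that just sleeps for the given time."""
    return [sys.executable, "-c", f"import time; time.sleep({seconds})"]


# Secondary task is short, timed task is long: secondary should be done.
fast = timed_with_secondary(sleeper(0), sleeper(2))
# Secondary task is long, timed task is short: secondary still running.
slow = timed_with_secondary(sleeper(30), sleeper(0))
print(fast[1], slow[1])
```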

Scenario 2

I would consider this to be a light multitasking environment with only the bare minimum applications running in the background.  In this test NAV was not doing a virus scan, but was enabled and doing its standard monitoring.  iTunes was playing MP3s and we timed the Razor Lame app encoding the same WAV files.

  1. Norton AV starts on bootup, in monitoring mode only
  2. Open Trillian
  3. Open iTunes and play 12 song playlist
  4. Open Razor Lame and add files to be encoded
  5. Encode MP3s and time

The window order, from bottom most viewable to top most was: iTunes -> Razor Lame. 

Each scenario was tested three times, with the scores averaged.  I was surprised to find that the results seemed very consistent and reliable; the times were usually within about 3 seconds of each other.
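The averaging and consistency check are easy to reproduce.  The short Python sketch below uses made-up run times purely for illustration (they are not results from this article), and the helper name `summarize` is my own.

```python
from statistics import mean


def summarize(run_times):
    """Return (average, spread) for a list of run times in seconds."""
    return mean(run_times), max(run_times) - min(run_times)


# Made-up times for three runs of one scenario, NOT results from this article.
runs = [212.0, 214.0, 215.0]
average, spread = summarize(runs)
print(f"average: {average:.1f} s, spread: {spread:.1f} s")
# prints "average: 213.7 s, spread: 3.0 s"
```

A spread of a few seconds on a run of several minutes is what I mean by consistent; a much larger spread would be a sign that something in the background was not behaving the same way from run to run.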

System Setup  

AMD Test System Setup

CPU:

  • Athlon 64 X2 4800+ (2.4 GHz, 1MBx2)
  • Athlon 64 X2 4400+ (2.2 GHz, 1MBx2)
  • Athlon 64 X2 3800+ (2.0 GHz, 512KBx2)
  • Athlon 64 X2 3800+ OC (2.5 GHz, 512KBx2)
  • Athlon 64 4000+ (2.4 GHz, 1MB)

Motherboards: Asus A8N-SLI Premium (nForce4 SLI)

Power Supply: Antec 480 watt

Memory:

  • 2x1024MB Corsair Micro DDR500 @ 2-3-3-6
  • 2x1024MB Kingston HyperX 4300 DDR

Hard Drive: 250 GB Maxtor 7200 RPM SATA

Sound Card: Creative Labs Live!

Video Card: NVIDIA 7800 GTX (490/1.30)

Video Drivers: NVIDIA 81.85

DirectX Version: DX 9.0c

Operating System: Windows XP w/ Service Pack 1

The benchmarks used were:

  • SiSoft Sandra 2005 SP1
  • Everest
  • PCMark04
  • LAME MP3 Encoding
  • XMPEG / DivX Encoding
  • WinRAR Compression
  • CineBench 2003
  • ScienceMark 2.0