Frequently Asked Questions on the LINPACK Benchmark

The Linpack Benchmark is a measure of a computer’s floating-point rate of execution. It is determined by running a computer program that solves a dense system of linear equations. Over the years the characteristics of the benchmark have changed a bit. In fact, there are three benchmarks included in the Linpack Benchmark report.

The Linpack Benchmark grew out of the Linpack software project. It was originally intended to give users of the package a feeling for how long it would take to solve certain matrix problems. The benchmark started as an appendix to the Linpack Users' Guide and has grown since the Linpack Users' Guide was published in 1979.

What is the Linpack Benchmark report?

The Linpack Benchmark report is entitled “Performance of Various Computers Using Standard Linear Equations Software”. The report lists the performance in Mflop/s of a number of computer systems. A copy of the report is available at http://www.netlib.org/benchmark/performance.ps.

What is the reference for the Linpack Benchmark Report?

The Linpack Benchmark report should be referenced in the following way:

“Performance of Various Computers Using Standard Linear Equations Software”, Jack Dongarra, University of Tennessee, Knoxville TN, 37996, Computer Science Technical Report Number CS-89-85, today’s date, url: http://www.netlib.org/benchmark/performance.ps.

Is there a paper which describes the benchmark in some detail and gives a historical perspective?

The paper “The LINPACK Benchmark: Past, Present, and Future” by Jack Dongarra, Piotr Luszczek, and Antoine Petitet looks at the details of the benchmark and presents performance data in graphical form for a number of machines on basic operations. A copy of the paper is available at http://www.netlib.org/utk/people/JackDongarra/PAPERS/hpl.pdf.

What is a Mflop/s?

Mflop/s is a rate of execution, millions of floating point operations per second. Whenever this term is used it will refer to 64 bit floating point operations and the operations will be either addition or multiplication. Gflop/s refers to billions of floating point operations per second and Tflop/s refers to trillions of floating point operations per second.

What are the three benchmarks in the Linpack Benchmark report?

The three benchmarks in the Linpack Benchmark report are the Linpack Fortran n = 100 benchmark (see Table 1 of the report), the Linpack n = 1000 benchmark (see Table 1 of the report), and Linpack’s Highly Parallel Computing benchmark (see Table 3 of the report).

What is the Linpack Fortran n = 100 benchmark?

The first benchmark is for a matrix of order 100 using the Linpack software in Fortran. The results can be found in Table 1 of the benchmark report. To run this benchmark, download the file from http://www.netlib.org/benchmark/Linpackd; this is a Fortran program. To run the program you will need to supply a timing function called SECOND, which should report the CPU time that has elapsed. The ground rules for running this benchmark are that you can make no changes to the Fortran code, not even to the comments. Only compiler optimization can be used to enhance performance.

What exactly does the Linpack Fortran n=100 benchmark time?

The Linpack benchmark measures the performance of two routines from the Linpack collection of software. These routines are DGEFA and DGESL (these are double-precision versions; SGEFA and SGESL are their single-precision counterparts). DGEFA performs the LU decomposition with partial pivoting, and DGESL uses that decomposition to solve the given system of linear equations.

Most of the time is spent in DGEFA. Once the matrix has been decomposed, DGESL is used to find the solution; this process requires O(n²) floating-point operations, as opposed to the O(n³) floating-point operations of DGEFA. The results for this benchmark can be found in Table 1 second column under “LINPACK Benchmark n = 100” of the Linpack Benchmark Report.
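The factor-then-solve split described above can be illustrated with a small sketch. This is not the Linpack source: the function names lu_factor and lu_solve are hypothetical, the matrix is stored as a list of rows, and whole rows are interchanged during pivoting, which is a simplification chosen for readability. The right-hand side is taken as the row sums of the matrix, so the exact solution is all ones.

```python
def lu_factor(a):
    """Factor a square matrix in place using partial pivoting.

    `a` is a list of row lists. On return the upper triangle holds U and the
    strict lower triangle holds the multipliers of L (unit diagonal implied).
    Returns the list of pivot rows, one per elimination step.
    """
    n = len(a)
    pivots = []
    for k in range(n - 1):
        # Partial pivoting: pick the largest-magnitude entry in column k.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ZeroDivisionError("matrix is singular")
        pivots.append(p)
        a[k], a[p] = a[p], a[k]
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]      # store the multiplier
            m = a[i][k]
            for j in range(k + 1, n):
                a[i][j] -= m * a[k][j]
    return pivots


def lu_solve(a, pivots, b):
    """Solve A x = b using the factors produced by lu_factor."""
    n = len(a)
    x = list(b)
    # Apply the recorded row interchanges to the right-hand side.
    for k, p in enumerate(pivots):
        x[k], x[p] = x[p], x[k]
    # Forward substitution with the unit lower triangular factor.
    for k in range(n - 1):
        for i in range(k + 1, n):
            x[i] -= a[i][k] * x[k]
    # Back substitution with the upper triangular factor.
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n):
            x[i] -= a[i][j] * x[j]
        x[i] /= a[i][i]
    return x


A = [[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]]
b = [sum(row) for row in A]          # row sums, so the solution is all ones
piv = lu_factor(A)
x = lu_solve(A, piv, b)
```

The triple loop in lu_factor is where the O(n³) work occurs, while lu_solve only contains double loops, which is why most of the time is spent in the factorization.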

What is the Linpack n = 1000 benchmark (TPP, Best Effort)?

The second benchmark is for a matrix of size 1000 and can be found in Table 1 of the benchmark report. To run this benchmark, download the file from http://www.netlib.org/benchmark/1000d; this is a Fortran driver. The ground rules for running this benchmark are a bit more relaxed in that you can specify any linear equation solver you wish, implemented in any language. A requirement is that your method must compute a solution and the solution must be accurate to the prescribed accuracy. TPP stands for Toward Peak Performance; this is the title of the column in the benchmark report that lists the results.

What is Linpack’s “Highly Parallel Computing” benchmark?

The third benchmark is called the Highly Parallel Computing Benchmark and can be found in Table 3 of the Benchmark Report. (This is the benchmark used for the Top500 report.) This benchmark attempts to measure the best performance of a machine in solving a system of equations. The problem size and software can be chosen to produce the best performance. Software for running this benchmark can be downloaded from:

http://www.netlib.org/benchmark/hpl/

What are the ground rules for the first benchmark?

The “ground rules” for running the first benchmark in the report, the n=100 case, are that the program is run as is, with no changes to the source code; not even changes to the comments are allowed. Optimization may be performed only by the compiler, through compiler switches, at compile time. The user must supply a timing function called SECOND, which returns the running CPU time for the process. The matrix generated by the benchmark program must be used to run this case.

What are the ground rules for the second benchmark?

The “ground rules” for running the second benchmark in the report, the n=1000 case, allow for a complete user replacement of the LU factorization and solver steps. The calling sequence should be the same as the original routines. The problem size should be of order 1000. The accuracy of the solution must satisfy the following bound:

||Ax - b|| / (||A|| ||x|| n ε) ≤ O(1)

where ε is the machine precision (on IEEE machines this is 2^-53) and n is the size of the problem. The matrix used must be the same matrix used in the driver program available from netlib.

What are the ground rules for the third benchmark?

The “ground rules” for running the third benchmark in the report, the Highly Parallel case, allow for a complete user replacement of the LU factorization and solver steps. The accuracy of the solution must satisfy the following bound:

||Ax - b|| / (||A|| ||x|| n ε) ≤ O(1)

where ε is the machine precision (on IEEE machines this is 2^-53) and n is the size of the problem. The matrix used must be the same matrix used in the driver program available from netlib. There is no restriction on the problem size.

To what accuracy must the solution conform?

The solution to all three benchmarks must satisfy the following mathematical formula:

||Ax - b|| / (||A|| ||x|| n ε) ≤ O(1)

where ε is the machine precision (on IEEE machines this is 2^-53) and n is the size of the problem. This implies the computation must be done in 64 bit floating point arithmetic.
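As an illustration of such an accuracy check, the sketch below computes a normalized residual of the form ||Ax - b|| / (||A|| ||x|| n ε). The function name normalized_residual is hypothetical, and the use of the infinity norm (maximum absolute row sum for the matrix, maximum absolute entry for vectors) is an assumption made for this example; the benchmark programs may use a different norm.

```python
def normalized_residual(a, x, b, eps=2.0 ** -53):
    """Return ||A x - b|| / (||A|| * ||x|| * n * eps) using infinity norms.

    `a` is a list of row lists, `x` and `b` are lists of floats, and `eps`
    is the machine precision (2**-53 is the IEEE value quoted in the text).
    """
    n = len(a)
    # Residual r = A x - b, computed row by row.
    r = [sum(a[i][j] * x[j] for j in range(n)) - b[i] for i in range(n)]
    norm_r = max(abs(v) for v in r)
    norm_a = max(sum(abs(v) for v in row) for row in a)  # max absolute row sum
    norm_x = max(abs(v) for v in x)
    return norm_r / (norm_a * norm_x * n * eps)


A = [[4.0, 1.0], [2.0, 3.0]]
b = [5.0, 5.0]                                       # row sums: exact solution is all ones
good = normalized_residual(A, [1.0, 1.0], b)         # exact solution
bad = normalized_residual(A, [1.001, 1.0], b)        # deliberately perturbed solution
```

With the exact solution the value is zero, while a perturbation in the third decimal place gives a value many orders of magnitude above O(1), which is the kind of result that would be rejected.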

What numerical precision is required to run the benchmark and gain an entry in the Linpack Benchmark report?

In order to have an entry included in the Linpack Benchmark report the results must be computed using full precision. By full precision we generally mean 64 bit floating point arithmetic or higher. Note that this is not an issue of single or double precision as some systems have 64-bit floating point arithmetic as single precision. It is a function of the arithmetic used.

Can I get a more personalized list of machine and performance results?

You can get a more personalized listing of machines by using the interface at http://performance.netlib.org/performance/html/PDSbrowse.html

How can I get the Linpack Benchmark program?

You can download the programs used to generate the Linpack benchmark results from http://www.netlib.org/benchmark/linpackd. This is a Fortran program. There is a C version of the benchmark located at: http://www.netlib.org/benchmark/linpackc. There is a Java version of the benchmark that can be downloaded as an applet at:

There is a Java program at:

http://www.netlib.org/benchmark/linpackjava/

Is there a Java version of the Linpack Benchmark?

There is a Java version of the benchmark that can be downloaded as an applet at:

There is a Java program at: http://www.netlib.org/benchmark/linpackjava/

What do I do to run the Linpack Benchmark Program?

For the 100x100 Fortran version, you need to supply a timing function called SECOND. SECOND is a timer function that is called from Fortran and is expected to return the running CPU time in seconds. In the program two calls to SECOND are made and the difference is taken to obtain the time.

How does the Linpack Benchmark performance relate to my application?

The performance of the Linpack benchmark is typical for applications where the basic operation is based on vector primitives such as adding a scalar multiple of a vector to another vector. Many applications exhibit the same performance as the Linpack Benchmark. However, results should not be taken too seriously. In order to measure the performance of any computer it’s critical to probe for the performance of your applications. The Linpack Benchmark can only give one point of reference. In addition, in multiprogramming environments it is often difficult to reliably measure the execution time of a single program. We trust that anyone actually evaluating machines and operating systems will gather more reliable and more representative data.

Are there errors in the Linpack Benchmark report?

While we make every attempt to verify the results obtained from users and vendors, errors are bound to exist and should be brought to our attention. We encourage users to obtain the programs and run the routines on their machines, reporting any discrepancies with the numbers listed here.

What is Linpack?

The Linpack package is a collection of Fortran subroutines for solving various systems of linear equations (http://www.netlib.org/Linpack/). The software in Linpack is based on a decompositional approach to numerical linear algebra. The general idea is the following. Given a problem involving a matrix, one factors or decomposes the matrix into a product of simple, well-structured matrices which can be easily manipulated to solve the original problem. The package has the capability of handling many different matrix types and different data types, and provides a range of options. Linpack itself is built on another package called the BLAS. Linpack was designed in the late 1970s and has been superseded by a package called LAPACK.

How can I get the complete Linpack software collection?

The Linpack software library is available from netlib. See http://www.netlib.org/Linpack/

What are the BLAS?

The BLAS (Basic Linear Algebra Subprograms) are high quality "building block" routines for performing basic vector and matrix operations. Level 1 BLAS do vector-vector operations, Level 2 BLAS do matrix-vector operations, and Level 3 BLAS do matrix-matrix operations. Because the BLAS are efficient, portable, and widely available, they're commonly used in the development of high quality linear algebra software, LINPACK and LAPACK for example. For additional information see: http://www.netlib.org/blas/
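To make the three levels concrete, here is a minimal sketch of one operation of each kind. These are plain Python functions written for illustration, not the BLAS interface: the names axpy, gemv and gemm only echo the corresponding BLAS routine families, and the argument lists are simplified assumptions (the scaling factors that the real matrix routines accept are omitted).

```python
def axpy(alpha, x, y):
    """Level 1 style (vector-vector): return alpha * x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def gemv(a, x):
    """Level 2 style (matrix-vector): return A x, with A a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def gemm(a, b):
    """Level 3 style (matrix-matrix): return A B, both given as lists of rows."""
    cols = list(zip(*b))  # columns of B
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in a]


v = axpy(2.0, [1.0, 2.0], [3.0, 4.0])
w = gemv([[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0])
m = gemm([[1.0, 2.0], [3.0, 4.0]], [[0.0, 1.0], [1.0, 0.0]])
```

The work per data item grows with the level: the vector-vector operation does O(n) work on O(n) data, the matrix-vector operation O(n²) work on O(n²) data, and the matrix-matrix operation O(n³) work on O(n²) data.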

Where can I get an optimized version of the BLAS?

The ATLAS (Automatically Tuned Linear Algebra Software) project is an ongoing research effort focusing on applying empirical techniques in order to provide portable performance for the BLAS routines. At present, it provides C and Fortran77 interfaces to a portably efficient BLAS implementation, as well as a few routines from LAPACK. For additional information see: http://www.netlib.org/atlas/

Is Linpack the most efficient way to solve systems of equations?

Linpack is not the most efficient software for solving matrix problems. This is mainly due to the way the algorithm and resulting software access memory. The memory access patterns of the algorithm disregard the multi-layered memory hierarchies of RISC architectures and vector computers, thereby spending too much time moving data instead of doing useful floating-point operations. LAPACK addresses this problem by reorganizing the algorithms to use block matrix operations, such as matrix multiplication, in the innermost loops. For each computer architecture, block operations can be optimized to account for memory hierarchies, providing a transportable way to achieve high efficiency on diverse modern machines. We use the term “transportable” instead of “portable” because, for fastest possible performance, LAPACK requires that highly optimized block matrix operations be already implemented on each machine. These operations are performed by the Level 3 BLAS in most cases.
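The idea of a block matrix operation can be sketched with a blocked matrix multiplication. This is only an illustration of the loop structure, not LAPACK or BLAS code: the function name matmul_blocked and the block size parameter bs are hypothetical, and in pure Python the blocking brings no speed benefit. In a compiled, tuned implementation the block size would be chosen so that the blocks being combined fit in fast memory.

```python
def matmul_blocked(a, b, bs=2):
    """Multiply two n-by-n matrices (lists of rows) one bs-by-bs block at a time."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    # The three outer loops walk over blocks of C, A and B.
    for ii in range(0, n, bs):
        for jj in range(0, n, bs):
            for kk in range(0, n, bs):
                # The three inner loops multiply one block of A by one block of B
                # and accumulate into the matching block of C.
                for i in range(ii, min(ii + bs, n)):
                    for k in range(kk, min(kk + bs, n)):
                        aik = a[i][k]
                        for j in range(jj, min(jj + bs, n)):
                            c[i][j] += aik * b[k][j]
    return c


A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
C = matmul_blocked(A, I3)
```

The result is the same as for the ordinary triple loop; only the order in which the elements are visited changes, so that each block is reused several times while it is close to the processor.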

What is LAPACK?

LAPACK is a software collection for solving various matrix problems in linear algebra, in particular systems of linear equations, least squares problems, eigenvalue problems, and singular value decomposition. The software is based on the use of block partitioned matrix techniques that aid in achieving high performance on RISC based systems, vector computers, and shared memory parallel processors.

How can I get the whole LAPACK software collection?

LAPACK can be obtained from netlib. See http://www.netlib.org/lapack/

What is the history behind the Linpack Benchmark?

The Linpack Benchmark is, in some sense, an accident. It was originally designed to assist users of the Linpack package by providing information on execution times required to solve a system of linear equations. The first “Linpack Benchmark” report appeared as an appendix in the Linpack Users' Guide in 1979. The appendix comprised data for one commonly used path in Linpack for a matrix problem of size 100, on a collection of widely used computers (23 in all), so users could estimate the time required to solve their matrix problem.

Over the years other data was added, more as a hobby than anything else, and today the collection includes hundreds of different computer systems.

How can I add my computer's result to the table?

You can contact Jack Dongarra and send him the output from the benchmark program. When sending results please include the specific information on the computer on which the test was run, the compiler, the optimization that was used, and the site it was run on. You can contact Dongarra by sending email to dongarra@cs.utk.edu.

What is the SECOND function?

In order to run the benchmark program you will have to supply a function to gather the execution time on your computer. The execution time is requested by a call to the Fortran function SECOND. It is expected that the routine returns the accumulated execution time of your program. Two calls to SECOND are made and the difference is taken to compute the execution time.
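The two-call pattern can be sketched as follows. This is not the Fortran routine itself: the functions second and timed are hypothetical stand-ins, and the example assumes that Python's time.process_time, which returns the CPU time accumulated by the current process, plays the role of SECOND.

```python
import time


def second():
    """Stand-in for SECOND: accumulated CPU time of this process, in seconds."""
    return time.process_time()


def timed(work):
    """Run `work` once and return the CPU time it took."""
    t0 = second()      # first call, before the work
    work()
    t1 = second()      # second call, after the work
    return t1 - t0     # the difference is the execution time


elapsed = timed(lambda: sum(i * i for i in range(200000)))
```

If the work is very short compared with the resolution of the timer, the difference can come out as zero, which is the situation in which a finer-grained timing routine is needed.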

How can I measure the execution time more accurately and reliably?

The Performance API (PAPI) project specifies a standard application programming interface (API) for accessing hardware performance counters available on most modern microprocessors. These counters exist as a small set of registers that count Events, occurrences of specific signals related to the processor's function. Monitoring these events facilitates correlation between the structure of source/object code and the efficiency of the mapping of that code to the underlying architecture.

For additional information see: http://icl.cs.utk.edu/projects/papi/

Should I run the single- and double-precision versions of the benchmark?

The results reported in the benchmark report reflect performance for 64 bit floating point arithmetic. On some machines this may be DOUBLE PRECISION, such as computers that have IEEE floating point arithmetic, and on other computers this may be single precision (declared REAL in Fortran), such as Cray’s vector computers.

How can I interpret the results from the benchmark?

When and how often are the results updated in the benchmark report?

The benchmark report is updated continuously as new results arrive. They are posted to the web as they are updated.

What matrix is used to run the benchmark?

The matrices are generated using a pseudo-random number generator. The matrices are designed to force partial pivoting to be performed in Gaussian Elimination.

What is the Top500?

The Top500 lists the 500 fastest computer systems in use today. The collection was started in 1993 and has been updated twice a year since June 1993. The report lists the sites that have the 500 most powerful computer systems installed. The best Linpack benchmark performance achieved is used as the performance measure in ranking the computers.

Where can I get a copy of the Top500 report?

The Top500 reports are maintained at http://www.top500.org/.

Where can I get the software to generate performance results for the Top500?

Optimized software is available that many people use to generate the Top500 performance results. This benchmark attempts to measure the best performance of a machine in solving a system of equations. The problem size and software can be chosen to produce the best performance. A copy of that software can be downloaded from:

http://www.netlib.org/benchmark/hpl/

In order to run this you will need MPI and an optimized version of the BLAS. For MPI you can see: http://www-unix.mcs.anl.gov/mpi/mpich/download.html and for the BLAS see: http://www.netlib.org/atlas/ .

What about a list of clusters?

We are starting a new list on clusters. For more information see http://clusters.top500.org/.

How can I interpret the results from the Linpack 100x100 benchmark?

When the Linpack Fortran n = 100 benchmark is run it produces the following kind of results:

Please send the results of this run to:

Jack J. Dongarra
Computer Science Department
University of Tennessee
Knoxville, Tennessee 37996-1300

Fax: 865-974-8296

Internet: dongarra@cs.utk.edu

   norm. resid           resid          machep            x(1)            x(n)
1.67005097E+00  7.41628980E-14  2.22044605E-16  1.00000000E+00  1.00000000E+00

times are reported for matrices of order 100
    dgefa      dgesl      total     mflops       unit      ratio
times for array with leading dimension of 201
1.540E-03  6.888E-05  1.609E-03  4.268E+02  4.686E-03  2.873E-02
1.509E-03  7.084E-05  1.579E-03  4.348E+02  4.600E-03  2.820E-02
1.509E-03  7.003E-05  1.579E-03  4.348E+02  4.600E-03  2.820E-02
1.502E-03  6.593E-05  1.568E-03  4.380E+02  4.567E-03  2.800E-02
times for array with leading dimension of 200
1.431E-03  6.716E-05  1.498E-03  4.584E+02  4.363E-03  2.675E-02
1.424E-03  6.694E-05  1.491E-03  4.605E+02  4.343E-03  2.663E-02
1.431E-03  6.699E-05  1.498E-03  4.583E+02  4.364E-03  2.676E-02
1.432E-03  6.439E-05  1.497E-03  4.588E+02  4.360E-03  2.673E-02

The norm. resid is a measure of the accuracy of the computation. The value should be O(1). If the value is much greater than O(100), it suggests that the results are not correct.

The resid is the unnormalized quantity.

The term machep measures the precision used to carry out the computation. On an IEEE floating point computer the value should be 2.22044605e-16.

The values of x(1) and x(n) are the first and last components of the solution. The problem is constructed so that the values of the solution should all be ones.

There are two sets of timings, both performed on matrices of size 100. In the first set the 2-dimensional array that contains the matrix has a leading dimension of 201, and in the second set the leading dimension is 200. This is done to see what effect, if any, the placement of the arrays in memory has on the performance.
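What the leading dimension changes can be seen from how a Fortran 2-dimensional array is laid out: Fortran stores arrays column by column, so the leading dimension is the distance in memory between neighbouring elements of the same row. The small sketch below, with a hypothetical helper named offset, computes where element (i, j) of the matrix sits inside the array for the two leading dimensions used by the benchmark.

```python
def offset(i, j, lda):
    """Zero-based position of element (i, j), given with 1-based indices,
    in a column-major array whose leading dimension is lda."""
    return (i - 1) + (j - 1) * lda


n = 100
strides = {}
for lda in (201, 200):
    # Distance in array elements between a(1,1) and a(1,2).
    strides[lda] = offset(1, 2, lda) - offset(1, 1, lda)

last_201 = offset(n, n, 201)   # position of a(100,100) with leading dimension 201
last_200 = offset(n, n, 200)   # position of a(100,100) with leading dimension 200
```

The same 100 by 100 matrix is therefore spread over memory with a stride of 201 elements in one case and 200 in the other, which can interact differently with memory banks or caches on some machines.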

Times for dgefa and dgesl are reported. dgefa factors the matrix using Gaussian elimination with partial pivoting and dgesl solves a system based on the factorization. dgefa requires about 2/3 n³ operations and dgesl requires on the order of n² operations. The value of total is the sum of the two times and mflops is the execution rate, in millions of floating point operations per second. Here a floating point operation is taken to be a floating point addition or multiplication. Unit and ratio are obsolete and should be ignored.
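As a worked example of how the mflops column relates to the total time, the sketch below assumes an operation count of 2/3 n³ + 2 n² (the 2 n² term for the solve step is an assumption of this example, one n² each for the forward and back substitution). With that count the first sample row above is reproduced to the printed precision. The function names linpack_ops and mflops are hypothetical.

```python
def linpack_ops(n):
    """Assumed operation count for factor plus solve: 2/3 n^3 + 2 n^2."""
    return 2.0 * n ** 3 / 3.0 + 2.0 * n ** 2


def mflops(n, seconds):
    """Execution rate in millions of floating point operations per second."""
    return linpack_ops(n) / seconds / 1.0e6


# Total times taken from the first row of each group in the sample output.
rate_201 = mflops(100, 1.609e-3)   # sample output reports 4.268E+02
rate_200 = mflops(100, 1.498e-3)   # sample output reports 4.584E+02
```

For n = 100 the assumed count is about 686,667 operations, so a total time of 1.609E-03 seconds corresponds to roughly 427 Mflop/s.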

If the time reported is negative or zero then the clock resolution is not accurate enough for the granularity of the work. In this case a different timing routine should be used that has better resolution.

Do you have an archive of previous Linpack Benchmark reports or results?

No archive is maintained of previous results. However, here is some information to provide a historical perspective. The numbers in the following tables have been extracted from old Linpack Benchmark Reports. It took a bit of “file archaeology” to put the list together since I don't have the complete set of reports.

Top Computers Over Time for the Linpack n=100 Benchmark

(Entries for this table began in 1979.)

Year	Computer	Number of Processors	Cycle time in nsecs	Mflop/s
2001	Fujitsu VPP5000/1	1	3.33	1156
2000	Fujitsu VPP5000/1	1	3.33	1156
1999	CRAY T916	4	2.2	1129
1995	CRAY T916	1	2.2	522
1994	CRAY C90	16	4.2	479
1993	CRAY C90	16	4.2	479
1992	CRAY C90	16	4.2	479
1991	CRAY C90	16	4.2	403
1990	CRAY Y-MP	8	6.0	275
1989	CRAY Y-MP	8	6.0	275
1988	CRAY Y-MP	1	6.0	74
1987	ETA 10-E	1	10.5	52
1986	NEC SX-2	1	6.0	46
1985	NEC SX-2	1	6.0	46
1984	CRAY X-MP	1	9.5	21
1983	CRAY 1	1	12.5	12
...
1979	CRAY 1	1	12.5	3.4

These numbers come from the Linpack Benchmark Report Table 1.

=====================================================================

Top Computers Over Time for the Linpack n=1000 Benchmark

(Entries for this table began in 1986.)

Year	Computer	Number of Processors	Cycle time in nsec.	Measured Mflop/s	Peak Mflop/s
2000	NEC SX-5/16	16	4.0	45030	64000
1995	CRAY T916	16	2.2	1940	28800
1994	Hitachi S-3800/480	4	2	16170	32000
1993	NEC SX-3/44R	4	2.5	15120	25600
1992	NEC SX-3/44	4	2.9	13420	22000
1991	Fujitsu VP2600/10	1	3.2	4009	5000
1990	Fujitsu VP2600/10	1	3.2	2919	5000
1989	CRAY Y-MP/832	8	6	2144	2667
1988	CRAY Y-MP/832	8	6	2144	2667
1987	NEC SX-2	1	6	885	1300
1986	CRAY X-MP-4	4	9.5	713	840

These numbers come from the Linpack Benchmark Report Table 1.

(Full precision; matrix size 1000; best effort programming, maximum optimization permitted.)

Top Computers Over Time for the Highly-Parallel Linpack Benchmark

(Entries for this table began in 1991.)

Year	Computer	Number of Processors	Measured Gflop/s	Size of Problem	Size of 1/2 Perf	Theoretical Peak Gflop/s
2001	ASCI White-Pacific, IBM SP Power 3	7424	7226	518096	179000	11136
2000	ASCI White-Pacific, IBM SP Power 3	7424	4938	430000		11136
1999	ASCI Red Intel Pentium II Xeon core	9632	2379	362880	75400	3207
1998	ASCI Blue-Pacific SST, IBM SP 604E	5808	2144	431344		3868
1997	Intel ASCI Option Red (200 MHz Pentium Pro)	9152	1338	235000	63000	1830
1996	Hitachi CP-PACS	2048	368.2	103680	30720	614
1995	Intel Paragon XP/S MP	6768	281.1	128600	25700	338
1994	Intel Paragon XP/S MP	6768	281.1	128600	25700	338
1993	Fujitsu NWT	140	124.5	31920	11950	236
1992	NEC SX-3/44	4	20.0	6144	832	22
1991	Fujitsu VP2600/10	1	4.0	1000	200	5

These numbers come from the Linpack Benchmark Report Table 3.

(Full precision; the manufacturer is allowed to solve as large a problem as desired, maximum optimization permitted.)

Measured Gflop/s is the measured peak rate of execution for running the benchmark in billions of floating point operations per second.

Size of Problem is the matrix size at which the measured performance was observed.

Size of ½ Perf is the size of problem needed to achieve ½ the measured peak performance.

Theoretical Peak Gflop/s is the theoretical peak performance for the computer.