<HTML>
<HEAD>
<BASE HREF="http://www.mcs.anl.gov/petsc/benchmarks.html">
<TITLE>PETSc Benchmarks</TITLE>
</HEAD>
<BODY BGCOLOR="#ffffff" LINK="#0000ff" VLINK="#ff0000" ALINK="#ff0000" TEXT="#000000">

<H1 align=center>Sample PETSc Floating Point Performance</H1>
<P>
<H3>
<MENU>
<LI> <a href="petsc.html#singleprocessor">Single Processor Floating Point Performance</a>
<LI> <a href="petsc.html#multiprocessor">Parallel Performance for Euler Solver</a>
<LI> <a href="petsc.html#laplacian">Scalability for Laplacian</a>
</MENU>
</H3>
<P>
We provide these floating point performance numbers as a guide to the
floating point rates users should expect while using PETSc. We have done
our best to provide fair and accurate values but do not guarantee
any of the numbers presented here.
<P>
See the "Profiling" chapter of <a href="http://www.mcs.anl.gov/petsc/manual.html#Node100">
the PETSc users manual</a> for instructions on techniques for obtaining accurate performance
numbers with PETSc.

<P><HR><P>

<A NAME="singleprocessor"> <H1 align=center>Single Processor Performance</H1></A>

In many PDE application codes one must solve systems of linear equations
arising from the discretization of multicomponent PDEs; the resulting sparse matrices
naturally have a block structure.
<P>
PETSc provides special sparse matrix storage formats and routines that take advantage of
this block structure to deliver much higher (two to three times higher) floating
point computation rates. Below we give the
floating point rates for the matrix-vector product for a 1503 by 1503 sparse matrix with a block
size of three, arising from a simple oil reservoir simulation.

<p>
<A HREF="ftp://info.mcs.anl.gov/pub/petsc/matmultbench.ps">Embed here</A>
<p>

The next table depicts performance for the entire linear solve using GMRES(30) and
ILU(0) preconditioning (a rough sketch of such a solver setup is shown just below).
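<P>
The following sketch is for illustration only and is not the exact code used to produce the
benchmark numbers. It assumes the matrix and right-hand side have already been assembled or
loaded (for example, in the block BAIJ format used in the benchmark), and it uses the KSP
interface of recent PETSc releases rather than the SLES interface referenced elsewhere on
this page.
<P>
<PRE>
/* Minimal sketch: GMRES(30) with ILU(0) preconditioning on a previously
   assembled matrix, using the KSP interface of recent PETSc releases. */
#include &lt;petscksp.h&gt;

PetscErrorCode SolveBlockSystem(Mat A, Vec b, Vec x)
{
  KSP ksp;
  PC  pc;

  PetscFunctionBeginUser;
  PetscCall(KSPCreate(PETSC_COMM_WORLD, &amp;ksp));
  PetscCall(KSPSetOperators(ksp, A, A));   /* A also used to build the preconditioner */
  PetscCall(KSPSetType(ksp, KSPGMRES));
  PetscCall(KSPGMRESSetRestart(ksp, 30));  /* GMRES(30) */
  PetscCall(KSPGetPC(ksp, &amp;pc));
  PetscCall(PCSetType(pc, PCILU));         /* default fill level is 0, i.e. ILU(0) */
  PetscCall(KSPSetFromOptions(ksp));       /* allow -ksp_* / -pc_* command line overrides */
  PetscCall(KSPSolve(ksp, b, x));
  PetscCall(KSPDestroy(&amp;ksp));
  PetscFunctionReturn(PETSC_SUCCESS);
}
</PRE>
<P>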
<P>
<A HREF="ftp://info.mcs.anl.gov/pub/petsc/solvebench.ps">Embed here</A>
<P>

These tests were run using
the code src/sles/examples/tutorials/ex10.c with the options
<p>
<tt>
mpirun -np 1 ex10 -f0 arco1 -f1 arco1 -pc_type ilu -ksp_gmres_unmodifiedgramschmidt -optionsleft -mat_baij -matload_block_size 3 -log_summary
</tt>

<P><HR><P>

<A NAME="multiprocessor"> <H1 align=center>Parallel Performance for Euler Solver</H1></A>

<A NAME="laplacian"> <H1 align=center>Scalability for Laplacian</H1></A>
A typical "model" problem in the numerical analysis of PDEs is the
Laplacian. Discretization of the Laplacian in two dimensions with finite differences
is typically done using the "five point" stencil. This results in a very sparse
(at most five nonzeros per row), ill-conditioned matrix.

<P>
Because the matrix is so sparse and has no block structure, it is difficult to obtain
very good sequential or parallel floating point performance, especially for small
problems. Here we demonstrate the scalability of the parallel PETSc matrix-vector product
for the five-point stencil on two grids. These tests were run on three machines:
an IBM SP2 with the Power2Super chip and two memory cards at ANL, the Cray T3E at NERSC, and
the Origin2000 at NCSA.

<P>
Since PETSc is intended for much more general problems than the Laplacian, we do not consider
the Laplacian to be a particularly important benchmark; we include it because of interest
from the community.

<P><HR><P>

<H2 align=center>100 by 100 Grid: Absolute Time and Speed-Up</H2>

100 by 100 grid
<P>
Notes: This problem is simply too small to parallelize effectively on a distributed memory
computer.
<P>

<H2 align=center>1000 by 1000 Grid: Absolute Time and Speed-Up</H2>

1000 by 1000 grid
<P>

</BODY>
</HTML>