<HTML>
<HEAD>
<BASE HREF="http://www.mcs.anl.gov/petsc/benchmarks.html">
<TITLE>PETSc Benchmarks</TITLE>
</HEAD>
<BODY BGCOLOR="#ffffff" LINK="#0000ff" VLINK="#ff0000" ALINK="#ff0000" TEXT="#000000">

<H1 align=center>Sample PETSc Floating Point Performance</H1>
<P>
<H3>
<MENU>
<LI> <a href="petsc.html#singleprocessor">Single Processor Floating Point Performance</a>
<LI> <a href="petsc.html#multiprocessor">Parallel Performance for Euler Solver</a>
<LI> <a href="petsc.html#laplacian">Scalability for Laplacian</a>
</MENU>
</H3>
<P>
We provide these floating point performance numbers as a guide to the
floating point rates users should expect while using PETSc. We have done
our best to provide fair and accurate values but do not guarantee
any of the numbers presented here.
<P>
See the "Profiling" chapter of <a href="http://www.mcs.anl.gov/petsc/manual.html#Node100">
the PETSc users manual</a> for instructions on techniques for obtaining accurate performance
numbers with PETSc.

<P><HR><P>

<A NAME="singleprocessor"> <H1 align=center>Single Processor Performance</H1></A>

In many PDE application codes one must solve systems of linear equations
arising from the discretization of multicomponent PDEs; the resulting sparse matrices
naturally have a block structure.
<P>
PETSc provides special sparse matrix storage formats and routines that take advantage of
this block structure to deliver much higher (two to three times higher) floating
point computation rates. Below we give the
floating point rates for the matrix-vector product for a 1503 by 1503 sparse matrix with a block
size of three, arising from a simple oil reservoir simulation.

<p>
<A HREF="ftp://info.mcs.anl.gov/pub/petsc/matmultbench.ps">Embed here</A>
<p>

The next table depicts performance for the entire linear solve using GMRES(30) and
ILU(0) preconditioning (a rough sketch of such a solver setup is shown just below).
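<P>
The following sketch is for illustration only and is not the exact code used to produce the
benchmark numbers. It assumes the matrix and right-hand side have already been assembled or
loaded (for example, in the block BAIJ format used in the benchmark), and it uses the KSP
interface of recent PETSc releases rather than the SLES interface referenced elsewhere on
this page.
<P>
<PRE>
/* Minimal sketch: GMRES(30) with ILU(0) preconditioning on a previously
   assembled matrix, using the KSP interface of recent PETSc releases. */
#include &lt;petscksp.h&gt;

PetscErrorCode SolveBlockSystem(Mat A, Vec b, Vec x)
{
  KSP ksp;
  PC  pc;

  PetscFunctionBeginUser;
  PetscCall(KSPCreate(PETSC_COMM_WORLD, &amp;ksp));
  PetscCall(KSPSetOperators(ksp, A, A));   /* A also used to build the preconditioner */
  PetscCall(KSPSetType(ksp, KSPGMRES));
  PetscCall(KSPGMRESSetRestart(ksp, 30));  /* GMRES(30) */
  PetscCall(KSPGetPC(ksp, &amp;pc));
  PetscCall(PCSetType(pc, PCILU));         /* default fill level is 0, i.e. ILU(0) */
  PetscCall(KSPSetFromOptions(ksp));       /* allow -ksp_* / -pc_* command line overrides */
  PetscCall(KSPSolve(ksp, b, x));
  PetscCall(KSPDestroy(&amp;ksp));
  PetscFunctionReturn(PETSC_SUCCESS);
}
</PRE>
<P>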
<P>
<A HREF="ftp://info.mcs.anl.gov/pub/petsc/solvebench.ps">Embed here</A>
<P>

These tests were run using
the code src/sles/examples/tutorials/ex10.c with the options
<p>
<tt>
mpirun -np 1 ex10 -f0 arco1 -f1 arco1 -pc_type ilu -ksp_gmres_unmodifiedgramschmidt -optionsleft -mat_baij -matload_block_size 3 -log_summary
</tt>

<P><HR><P>

<A NAME="multiprocessor"> <H1 align=center>Parallel Performance for Euler Solver</H1></A>

<A NAME="laplacian"> <H1 align=center>Scalability for Laplacian</H1></A>
A typical "model" problem in the numerical analysis of PDEs is the
Laplacian. Discretization of the Laplacian in two dimensions with finite differences
is typically done using the "five point" stencil. This results in a very sparse
(at most five nonzeros per row), ill-conditioned matrix.

<P>
Because the matrix is so sparse and has no block structure, it is difficult to obtain
very good sequential or parallel floating point performance, especially for small
problems. Here we demonstrate the scalability of the parallel PETSc matrix-vector product
for the five-point stencil on two grids. These tests were run on three machines:
an IBM SP2 with the Power2Super chip and two memory cards at ANL, the Cray T3E at NERSC, and
the Origin2000 at NCSA.

<P>
Since PETSc is intended for much more general problems than the Laplacian, we do not consider
the Laplacian to be a particularly important benchmark; we include it because of interest
from the community.

<P><HR><P>

<H2 align=center>100 by 100 Grid: Absolute Time and Speed-Up</H2>

100 by 100 grid
<P>
Notes: This problem is simply too small to parallelize effectively on a distributed memory
computer.
<P>

<H2 align=center>1000 by 1000 Grid: Absolute Time and Speed-Up</H2>

1000 by 1000 grid
<P>

</BODY>
</HTML>