<HTML>
<HEAD>
<BASE HREF="http://www.mcs.anl.gov/petsc/benchmarks.html">
<TITLE>PETSc Benchmarks</TITLE>
</HEAD>
<BODY BGCOLOR="#ffffff" LINK="#0000ff" VLINK="#ff0000" ALINK="#ff0000" TEXT="#000000">

<H1 align=center>Sample PETSc Floating Point Performance</H1>
<P>
<H3>
<MENU>
<LI> <a href="#singleprocessor">Single Processor Floating Point Performance</a>
<LI> <a href="#multiprocessor">Parallel Performance for Euler Solver</a>
<LI> <a href="#laplacian">Scalability for Laplacian</a>
</MENU>
</H3>
<P>
We provide these floating point performance numbers as a guide to the
rates users should expect while using PETSc. We have done our best to
provide fair and accurate values but do not guarantee any of the numbers
presented here.
<P>
See the "Profiling" chapter of <a href="http://www.mcs.anl.gov/petsc/manual.html#Node100">
the PETSc users manual</a> for techniques to obtain accurate performance
numbers with PETSc.

<P><HR><P>

<A NAME="singleprocessor"> <H1 align=center>Single Processor Performance</H1></A>

In many PDE application codes one must solve systems of linear equations
arising from the discretization of multicomponent PDEs; the sparse matrices
that arise naturally have a block structure.
<P>
PETSc has special sparse matrix storage formats and routines that take
advantage of this block structure to deliver much higher (two to three times
as high) floating point computation rates. Below we give the floating point
rates for the matrix-vector product for a 1503 by 1503 sparse matrix with a
block size of three, arising from a simple oil reservoir simulation.

<p>
Matrix-vector product floating point rates: <A HREF="ftp://info.mcs.anl.gov/pub/petsc/matmultbench.ps">matmultbench.ps</A>
<p>

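The following is a minimal sketch of setting up such a block (BAIJ) matrix and
applying the matrix-vector product. It is written against the current PETSc API,
which differs from the interface used when these benchmarks were run, so the
calls below are illustrative rather than the exact benchmark code.
<p>
<PRE>
#include &lt;petscmat.h&gt;

int main(int argc, char **argv)
{
  Mat      A;
  Vec      x, y;
  PetscInt bs = 3, n = 1503;   /* block size and matrix dimension quoted above */

  PetscInitialize(&argc, &argv, NULL, NULL);

  /* Block AIJ (BAIJ) storage keeps one column index per 3x3 block rather than
     one per scalar entry; this is the source of the higher flop rates. */
  MatCreateSeqBAIJ(PETSC_COMM_SELF, bs, n, n, 5, NULL, &A);
  /* ... insert the 3x3 blocks of the application matrix with MatSetValuesBlocked() ... */
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

  MatCreateVecs(A, &x, &y);
  VecSet(x, 1.0);
  MatMult(A, x, y);            /* flop rate reported under the MatMult stage */

  VecDestroy(&x);
  VecDestroy(&y);
  MatDestroy(&A);
  PetscFinalize();
  return 0;
}
</PRE>
<p>
Running with -log_view (the successor of the -log_summary option used below)
reports the flop rate achieved by the MatMult() calls.
<p>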
The next table depicts performance for the entire linear solve using GMRES(30) and
ILU(0) preconditioning.

<P>
Linear solve floating point rates: <A HREF="ftp://info.mcs.anl.gov/pub/petsc/solvebench.ps">solvebench.ps</A>
<P>

These tests were run using the code src/sles/examples/tutorials/ex10.c with the options
<p>
<tt>
mpirun -np 1 ex10 -f0 arco1 -f1 arco1 -pc_type ilu -ksp_gmres_unmodifiedgramschmidt -optionsleft -mat_baij -matload_block_size 3 -log_summary
</tt>
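<p>
For reference, the same solver configuration expressed in code rather than
options, written against the current KSP interface (which replaced the SLES
interface that ex10 used at the time, so the names below are illustrative):
<p>
<PRE>
#include &lt;petscksp.h&gt;

/* Illustrative helper: solve A x = b with GMRES(30) and ILU(0) preconditioning.
   A and b are assumed to be already assembled, e.g. loaded from a file as in ex10. */
void SolveWithGMRES30ILU0(Mat A, Vec b, Vec x)
{
  KSP ksp;
  PC  pc;

  KSPCreate(PetscObjectComm((PetscObject)A), &ksp);
  KSPSetOperators(ksp, A, A);
  KSPSetType(ksp, KSPGMRES);
  KSPGMRESSetRestart(ksp, 30);    /* GMRES(30) */
  KSPGetPC(ksp, &pc);
  PCSetType(pc, PCILU);           /* zero levels of fill by default, i.e. ILU(0) */
  KSPSetFromOptions(ksp);         /* still honor -ksp_* and -pc_* command line options */
  KSPSolve(ksp, b, x);
  KSPDestroy(&ksp);
}
</PRE>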

<P><HR><P>

<A NAME="multiprocessor"> <H1 align=center>Parallel Performance for Euler Solver</H1></A>

<A NAME="laplacian"> <H1 align=center>Scalability for Laplacian</H1></A>
A typical "model" problem people work with in numerical analysis for PDEs is the
Laplacian. Discretization of the Laplacian in two dimensions with finite differences
is typically done using the "five point" stencil. This results in a very sparse
(at most five nonzeros per row), ill-conditioned matrix.

<P>
Because the matrix is so sparse and has no block structure it is difficult to get
very good sequential or parallel floating point performance, especially for small
problems. Here we demonstrate scalability of the parallel PETSc matrix-vector product
for the five point stencil on two grids. These were run on three machines:
an IBM SP2 with the Power2Super chip and two memory cards at ANL, the Cray T3E at NERSC and
the Origin2000 at NCSA.

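<P>
A minimal sketch of constructing the five point stencil operator on a distributed
grid and applying the parallel matrix-vector product, using the current DMDA
interface (which postdates these benchmarks, so the code is illustrative only):
<P>
<PRE>
#include &lt;petscdmda.h&gt;

int main(int argc, char **argv)
{
  DM  da;
  Mat A;
  Vec x, y;

  PetscInitialize(&argc, &argv, NULL, NULL);

  /* Distributed 100 x 100 grid with a star (five point) stencil; PETSc chooses
     the parallel decomposition across the MPI processes. */
  DMDACreate2d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
               DMDA_STENCIL_STAR, 100, 100, PETSC_DECIDE, PETSC_DECIDE,
               1, 1, NULL, NULL, &da);
  DMSetUp(da);

  DMCreateMatrix(da, &A);    /* AIJ matrix preallocated for the five point stencil */
  /* ... fill the stencil entries with MatSetValuesStencil(), then assemble ... */
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

  DMCreateGlobalVector(da, &x);
  VecDuplicate(x, &y);
  VecSet(x, 1.0);
  MatMult(A, x, y);          /* the operation whose scalability is shown below */

  VecDestroy(&x);
  VecDestroy(&y);
  MatDestroy(&A);
  DMDestroy(&da);
  PetscFinalize();
  return 0;
}
</PRE>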
<P>
Since PETSc is intended for much more general problems than the Laplacian we don't consider
the Laplacian to be a particularly important benchmark; we include it due to interest
from the community.

<P><HR><P>

<H2 align=center>100 by 100 Grid: Absolute Time and Speed-Up</H2>

100 by 100 grid
<P>
Notes: The problem here is simply too small to parallelize on a distributed memory
computer.
<P>

<H2 align=center>1000 by 1000 Grid: Absolute Time and Speed-Up</H2>

1000 by 1000 grid
<P>

</BODY>
</HTML>