# $Id$
Benchmarking routine for the CG2D solver in MITgcm (the barotropic solve)

To build:

a) Parameterization of SIZE.h:
   sNx = size of tile in x-direction (ideally fits in cache, 30-60)
   sNy = size of tile in y-direction (ideally fits in cache, 30-60)
   OLx = overlap size in x-direction (usually 1 or 3)
   OLy = overlap size in y-direction (usually 1 or 3)
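
   For example, the corresponding settings in SIZE.h for a 60x60 tile with a
   3-point overlap might look like the following (the values are illustrative
   only, and the benchmark's SIZE.h may define further parameters, e.g. the
   number of tiles and processes):

      sNx =  60,
      sNy =  60,
      OLx =   3,
      OLy =   3,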

b) Compilation (a complete example is sketched below, after the $LIBS notes):
   $CC $CFLAGS -c tim.c
   $FC $DEFINES $INCLUDES $FCFLAGS -o cg2d *.F tim.o $LIBS -lm

$DEFINES:
   1) For single precision add
      -DUSE_SINGLE_PRECISION
   2) For mixed precision (single for most operations, double for the
      reductions) add
      -DUSE_MIXED_PRECISION in addition to -DUSE_SINGLE_PRECISION
   3) For parallel (MPI) operation add
      -DALLOW_MPI -DUSE_MPI_INIT -DUSE_MPI_GSUM -DUSE_MPI_EXCH
   4) To use the MPI timing routines add
      -DUSE_MPI_TIME
   5) To use MPI_Sendrecv() instead of MPI_Isend()/MPI_Irecv()/MPI_Waitall() add
      -DUSE_SNDRCV
   6) To use JAM for the exchanges (not available without the hardware) add
      -DUSE_JAM_EXCH
   7) To use JAM for the global sum (not available without the hardware) add
      -DUSE_JAM_GSUM
   8) To avoid doing the global sum in MPI, do not define
      -DUSE_MPI_GSUM
      so that every processor sees only its own local residual (dangerous)
   9) To avoid doing the exchanges in MPI, do not define
      -DUSE_MPI_EXCH
      so that processors do not exchange their shadow (halo) regions (dangerous)
  10) For performance counters add
      -DUSE_PAPI_FLOPS to use PAPI to report Mflop/s
      or
      -DUSE_PAPI_FLIPS to use PAPI to report Mflip/s
      To produce this information for every iteration instead of each
      "timestep", also add -DPAPI_PER_ITERATION
  11) For extra (nearest-neighbor) exchange steps to stress communications add
      -DTEN_EXTRA_EXCHS
  12) For extra (global) sum steps to stress communications add
      -DHUNDRED_EXTRA_SUMS
  13) For a 2D (PxQ) instead of a 1D decomposition add
      -DDECOMP2D
  14) To output the residual every iteration add
      -DRESIDUAL_PER_ITERATION
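
   As an illustration, some of the above options might be combined as follows
   (these combinations are only examples, not requirements):

      # mixed-precision serial build
      DEFINES="-DUSE_SINGLE_PRECISION -DUSE_MIXED_PRECISION"
      # parallel build with a 2D decomposition and extra exchange stress
      DEFINES="-DALLOW_MPI -DUSE_MPI_INIT -DUSE_MPI_GSUM -DUSE_MPI_EXCH -DDECOMP2D -DTEN_EXTRA_EXCHS"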

$INCLUDES (if using PAPI):
   -I$PAPI_ROOT/include

$LIBS (if using PAPI - depending on the platform extra libs may be needed):
   -L$PAPI_ROOT/lib -lpapi
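
   As a concrete sketch, a parallel double-precision build with PAPI flop
   counting might look like the following (the compiler names mpicc/mpif77,
   the -O2 flag and the PAPI_ROOT path are assumptions about a typical MPI
   installation, not requirements of the benchmark):

      mpicc -O2 -c tim.c
      mpif77 -O2 -DALLOW_MPI -DUSE_MPI_INIT -DUSE_MPI_GSUM -DUSE_MPI_EXCH \
             -DUSE_MPI_TIME -DUSE_PAPI_FLOPS -I$PAPI_ROOT/include \
             -o cg2d *.F tim.o -L$PAPI_ROOT/lib -lpapi -lm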

c) Running

1) Allowing the system to choose the PxQ decomposition, if it was built
   for it:

   mpiexec -n $NPROCS ./cg2d

2) Alternatively, create a decomp.touse file with the P and Q dimensions
   given as two integers on the first two lines (P*Q should match the
   number of MPI processes), e.g.

cat > decomp.touse << EOF
10
20
EOF

mpiexec -n 200 ./cg2d

