C C Primary author: Hans Johnston (johnston@math.umass.edu) C Department of Mathematics C University of Massachusetts, Amherst C February 2011 C C This program is free software; you can redistribute it and/or modify C it under the terms of the GNU General Public License as published by C the Free Software Foundation; either version 2 of the License, or C (at your option) any later version. C C This program is distributed in the hope that it will be useful, C but WITHOUT ANY WARRANTY; without even the implied warranty of C MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the C GNU General Public License for more details. C C You should have received a copy of the GNU General Public License C along with this program; if not, write to the Free Software Foundation, C Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA C C%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% README file for TREECODE3D_TARG_OPENMP TREECODE3D_TARG_OPENMP is a Fortran90 subroutine for approximating the electrostatic energy potential and force of N mutually interacting charged particles with Cartesian coordinates X_i=(x_i,y_i,z_i) and partial charge q_i (i=1,...,N) at the target points Y_k=(x_k,y_k,z_k) (k=1,...,M). The code is a multi-threaded implementation of TREECODE3D_TARG. In this version each thread builds its own tree and then computes the potential and/or force on a subset of the target particles. For each k the potential at Y_k is given by the scalar N ----- \ q_i PENG(Y_k) = | ----------- (1), / |X_i - Y_k| ----- i=1 where |X_i - Y_k| denotes the Euclidean distance between X_i and Y_k. The electrostatic force on the kth particle is given by the vector F(Y_k)=(f_x,f_y,f_z), and defined by N ----- \ (X_i - Y_k) F(Y_k) = | q_i * ----------- (2), / |X_i - Y_k|^3 ----- i=1 Direct application of either formula (1) or (2) has a computational complexity proportional to O(N^2). TREECODE3D_TARG_OPENMP evaluates the potential energy (1) and electrostatic force (2) using a hierarchical oct-tree algorithm with body-cell forces approximated via Taylor expansions. The Taylor coefficients are computed by a recurrence relation. This approach reduces the computational complexity to O(NlogN). NOTE: Please consider including the following references in any work that utilizes this code: (1) Duan, Z.-H., Krasny, R.: An adaptive treecode for computing nonbonded potential energy in classical molecular systems. J. Comput. Chem. {\bf 22} (2001) 184-195 (2) Lindsay, K., Krasny, R.: A particle method and adaptive treecode for vortex sheet motion in 3-D flow. J. Comput. Phys. {\bf 172} (2001) 879-907 (3) Lindsay, K.: A three-dimensional Cartesian tree-code and applications to vortex sheet roll-up. Ph.D. Thesis, University of Michigan (1997) (4) Li, P., Johnston, H., Krasny, R.: A Cartesian Taylor series treecode for screened Coulomb interactions. J. Comput. Phys. {\bf 228} (2009) 3858-3868 Summary of files in ./example and ./aux : ----------------------------------------- ./example treedriver_targ_openmp.f : Driver program for testing the treecode subroutine treecode3d_targ.f. treecode3d_targ_openmp.f : Treecode subroutine. ifort_compile : sample ifort compile line xyx3d.dat : Unformatted big endian binary file containing the coordinates of 50,000 particles uniformly distributed in [-0.5,0.5]x[-0.5,0.5]x[-0.5,0.5] q3d.dat : Unformatted big endian binary file containing 500,000 partial charges uniformly distributed in [-0.5,0.5]. input_250kpf : Input data file provided for testing the treecode for computing the potential energy AND force. The files xyz3d.dat and q3d.dat provide the particle data. output_250kpf : Program output using input_250kpf. ./aux gendata3d.m : MATLAB m-file script provided to generate randomly distributed coordinate and charge data for an arbitrary number of particles. The user is prompted for N, the number of particles, and the code generates both an xyz3d.dat and q3d.dat file. The supplied files xyz3d.dat and q3d.dat (in ./example) are sufficient to test the code for any N <= 500,000. Compiling and Testing the code ------------------------------ Compiling : The source code is written in Fortran90 with OpenMP directives. The results below were obtained on an 2.8 GHz early 2008 MAC Pro with 16GB of RAM, running OS X 10.6.6. The compiler is Intel Fortran 11.1/089 and the code was compiled using ifort -O3 -xhost -m64 -openmp -assume byterecl treedriver_targ_OpenMP.f treecode3d_targ_OpenMP.f -liomp5 -lpthread NOTE: A compiler that supports OpenMP 3.0 or higher is required for the proper initialization and use of dynamically allocated private variables. Testing : After compiling, a number of environment variables generally need to be set to run an OpenMP code. Foe example, on a MAC ulimit -s hard export OMP_STACKSIZE=25M export OMP_NUM_THREADS=4 The first two commands above affect the program stack, as well as the size of each thread's private stack. The program reads the number of threads from the environment variable OMP_NUM_THREADS, which here is set to 4. To test the code type the following: ./a.out < input_500kpf This executes the code with N=500,000 sources and M=250,000 targets, with 5 other user specified parameters (see the description of inputs for below) The potential energy or force is approximated by the treecode3d_targ_openmp subroutine and then computed exactly by direct summation (formulas (1) and (2)). Both the absolute and relative error in the infinity norm of the potential energy and the three force components (viewed as N vectors) are output, as well as the total time (in seconds) for each of the calculations. To ensure that the code is functioning properly on your machine, compare your screen output with the output in the file output_500kpf. treedriver_OPENMP.f inputs and treecode3d_targ_OPENMP.f calling statement: --------------------------------------------------------------------- treedriver_targ_OPENMP : ----------------------- The program treedriver will prompt the user for the following: NUMPARS : (INTEGER) Number of particles THETA : (REAL) The BMAX MAC acceptance parameter. (For more details of the BMAX MAC see the description of THETA below in the section on treecode3d_targ.f.) ORDER : (INTEGER) Order the Taylor expansions used for the approximation if the MAC is accepted. MAXPARNODE : (INTEGER) When the tree is constructed if the number of particles in the current node exceeds MAXPARNODE then the box defined by the node particles is subdivided. A smaller MAXPARNODE generally results in a deeper tree structure. SHRINK : (INTEGER) Adaptive switch used in the oct-tree construction. If equal to 1 then upon subdivision of a node the node bounds are taken the smallest Cartesian box containing the particles in the node. Otherwise each box is divided into 8 sub-boxes of equal area as determined by the midpoints in each coordinate direction. IFLAG : (INTEGER) Flag determining what is to be computed: IFLAG=1 : potential energy IFLAG=2 : potential energy AND forces NUMTARS : (INTEGER) For the test program we choose a subset comprised of the first NUMTARS particles to approximate (1) and/or (2). DIST_TOL : (REAL >= 0.0) Distance tolerance used to exclude pairwise interactions unless the distance between them is greater than DIST_TOL. For example, DIST_TOL=0.0 for the test example provided so particle self-interaction is excluded since the targets particles are a subset of the source particles. NOTE: IF THE INTERESTED USER IS SURE THAT THE TARGET AND SOURCE PARTICLE SETS DON'T INTERSECT, THEN THE DISTANCE CHECKS SHOULD BE REMOVED FOR GREATER EFFICIENCY!! treecode3d_targ_OPENMP : ---------------------- The calling statement for the subroutine is: !!!!!!!! IMPORTANT: If you only need the potential energy set IFLAG=1. You MUST still pass the FORCE variable. However, it can be a 3-vector and you should set FARRDIM=1. SUBROUTINE TREECODE3D_TARG_OPENMP(xtar,ytar,ztar,numtars, & pnum_beg,pnum_end,tpengtar,tforce, & x,y,z,q,perm,numpars, & iflag,order,theta,shrink, & maxparnode,dist_tol, & timetree,prinfo,farrdim,arrdim,myid) Input: XTAR,YTAR,ZTAR : (REAL*8 one-dimensional arrays of length NUMTARS) Cartesian coordinates of the target particles NUMTARS : (INTEGER) number of target particles PNUM_BEG : (INTEGER) beginning target particle number to compute PNUM_END : (INTEGER) ending target particle number to compute X,Y,Z : (REAL*8 one-dimensional arrays of length ARRDIM) Cartesian coordinates of the aource particles. Q : (REAL*8 one-dimensional array of length ARRDIM) Corresponding charge of each particle. PERM : (INTEGER one-dimensional array of length ARRDIM) array to track rearrangement of particle order due to tree construction. NUMPARS : (INTEGER) number of source particles IFLAG : (INTEGER) flag indicating what is to be computed: 1 - potential energy 2 - potential energy and force ORDER : (INTEGER) order of Taylor expansion used in the body-cell approximation. THETA : (REAL*8) opening angle used in the BMAX MAC (multi-pole acceptance criteria). If P is the position of the target particle and R is the "radius" (distance from the geometric center C of the box to its furthest corner) of the current box under consideration then the Taylor expansion is used if R/|P-C| < THETA. SHRINK : (INTEGER) adaptive switch used in the oct-tree construction. If equal to 1 then upon subdivision of a box the box bounds are taken the smallest Cartesian box containing the particles in the node. Otherwise each box is divided into 8 sub-boxes of equal area. MAXPARNODE : (INTEGER) When the tree is constructed if the number of particles in the current node exceeds MAXPARNODE then the box defined by the node particles is subdivided. A smaller MAXPARNODE generally results in a deeper tree structure. DIST_TOL : (REAL*8) Potential energy/force contribution is excluded if any particle lies within DIST_TOL of a target particle. (NOTE: this parameter is only used when a direct computation is performed.) PRINFO : (INTEGER) switch for outputting information in this routine. If set to 1 then information is written to stdout. FARRDIM : (INTEGER) leading dimension of TFORCE array. (IF IFLAG=1 then FARRDIM is ignored, otherwise it should be set to NUMTARS) ARRDIM : (INTEGER) declared length of arrays X,Y,Z, and Q MYID : (INTEGER) OpenMP thread number Output: TPENGTAR : (REAL*8 one-dimensional arrays of length FARRDIM) k-th entry is potential energy at k-th target particle. TFORCE : (REAL*8 two-dimensional arrays of dimension (FARRDIM,3)) force on particle is given by TFORCE(k,1:3) for k=1,...,NUMTARS. TIMETREE : (REAL*8) time (in seconds) for the tree algorithm computation.