susilehtola/erkale

dense linear algebra suggestions

GoogleCodeExporter opened this issue · 4 comments

Hi Jussi,

A talented applied math colleague of mine read through your code after I pointed him to it out of general interest. He made a few suggestions about your use of dense linear algebra, which I have copied below.

I don't know if you are interested in large-scale parallelism, but if you are, you might consider http://code.google.com/p/elemental/, which is the library my colleague has developed. Because it is written in C++, you may find it much easier to integrate into your code than FORTRAN-based alternatives.

Of course, a parallel DFT code requires both parallel dense linear algebra and 
a parallel Fock build.  If you are interested in implementing the latter, I can 
suggest a few approaches.  I've worked with NWChem, MPQC and Dalton on Blue 
Gene/P, so I can compare different strategies for parallel Fock builds.

Best regards,

Jeff Hammond
jeff.science@gmail.com
https://wiki.alcf.anl.gov/index.php/User:Jhammond

===================================================
Someone please alert the author to the fact that Cholesky => invert => multiply should be replaced with Cholesky => triangle solve.
http://code.google.com/p/erkale/source/browse/trunk/src/linalg.cpp#64
http://code.google.com/p/erkale/source/browse/trunk/src/completeness/completeness_profile.cpp#63

Going one level higher, the Cholesky QR decomposition algorithm is a bad idea when you have poor conditioning. Why not just use the standard Householder approach with dgeqrf/zgeqrf?
http://code.google.com/p/erkale/source/browse/trunk/src/completeness/completeness_profile.cpp#59
===================================================
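
For concreteness, a minimal sketch of the first suggestion, assuming Armadillo (the C++ linear algebra library ERKALE uses); the function name spd_solve is illustrative, not ERKALE's actual code:

```cpp
#include <armadillo>
#include <stdexcept>

// Solve S X = B for symmetric positive-definite S without forming
// inv(S): factor S = U^T U once (Cholesky), then do two triangular
// solves in place of the explicit inverse and matrix multiply.
arma::mat spd_solve(const arma::mat & S, const arma::mat & B) {
  arma::mat U;
  if(!arma::chol(U, S)) // S = U.t()*U with U upper triangular
    throw std::runtime_error("Cholesky decomposition failed.");
  // trimatl/trimatu mark the argument as triangular, so Armadillo
  // dispatches to a triangular solver rather than a general one.
  arma::mat Y = arma::solve(arma::trimatl(U.t()), B); // U^T Y = B
  return arma::solve(arma::trimatu(U), Y);            // U X = Y
}
```

Forming inv(S) explicitly costs an extra O(N^3) pass and loses accuracy; the two triangular solves reuse the same Cholesky factor.

The second suggestion maps onto Armadillo's qr(), which wraps LAPACK's dgeqrf; again a sketch with an illustrative name:

```cpp
#include <armadillo>
#include <stdexcept>

// Householder QR factorization X = Q R via LAPACK dgeqrf, which
// remains well-behaved under poor conditioning where Cholesky QR
// breaks down.
void householder_qr(const arma::mat & X, arma::mat & Q, arma::mat & R) {
  if(!arma::qr(Q, R, X))
    throw std::runtime_error("QR decomposition failed.");
}
```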

Original issue reported on code.google.com by jeff.science@gmail.com on 12 Jul 2011 at 6:23

Sorry, this is not a defect.  I forgot to toggle the metadata accordingly. 

Original comment by jeff.science@gmail.com on 12 Jul 2011 at 6:23

Hi Jeff,


thanks for the input. I had not noticed that you had raised an issue, since for some reason I hadn't enabled email notifications. I'll have a look at the linalg and completeness_profile issues later.

I'm not targeting very big systems, at least not at the moment. I need orbitals, and the diagonalization of the Fock matrix is an O(N^3) operation anyway. In my experience the linear algebra steps aren't a bottleneck in any case; the performance in big systems is mostly limited by the evaluation of the Fock matrix. There are already other GPL codes for really big systems, such as ErgoSCF.

Going MPI parallel would mean a pretty much complete restructuring of the code, 
so until I run into big performance problems I'm not going to invest time in 
doing it...

Original comment by jussi.le...@gmail.com on 19 Aug 2011 at 1:32

  • Added labels: Type-Enhancement
  • Removed labels: Type-Defect
"Cholesky => invert => multiply should be replaced with Cholesky => triangle 
solve"

AFAIK the solve algorithms only work with linear systems, i.e. Mv = v', where v and v' are vectors; completeness_profile does a matrix multiplication.
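
For reference, triangular solves are not limited to a single vector right-hand side: LAPACK's dtrtrs and BLAS's dtrsm both accept a matrix of right-hand sides and solve it column by column. A minimal Armadillo sketch (the name tri_solve is illustrative, not ERKALE code):

```cpp
#include <armadillo>

// Solve U X = B where U is upper triangular and B is a matrix;
// each column of B is treated as an independent right-hand side.
arma::mat tri_solve(const arma::mat & U, const arma::mat & B) {
  return arma::solve(arma::trimatu(U), B);
}
```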

"Going one level higher, the Cholesky QR decomposition algorithm is a bad idea 
when you have poor conditioning. Why not just use the standard Householder 
approach with dgeqrf/zgeqrf ?"

True. The completeness_profile functions in ERKALE have been more at the proof-of-concept level, since they have mostly been Kruununhaka's playing ground ( http://www.chem.helsinki.fi/~manninen/kruununhaka/ ). Future development will probably happen in ERKALE only, though.

Since one only operates with a single-atom basis set when computing completeness profiles, one usually doesn't run into problems with near-singular basis sets. I have now changed the default to canonical orthonormalization. Cholesky is beneficial in completeness optimization, which is limited by compute_completeness.
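
A minimal sketch of canonical orthonormalization, assuming Armadillo; the cutoff value and the name canonical_orth are illustrative, not ERKALE's actual implementation:

```cpp
#include <armadillo>
#include <cmath>

// Canonical orthonormalization: diagonalize the overlap matrix
// S = V diag(l) V^T, drop eigenvectors whose eigenvalues fall
// below a cutoff, and scale the rest by 1/sqrt(l) so that
// X.t()*S*X = 1. Dropping the small eigenvalues is what keeps
// near-singular basis sets under control.
arma::mat canonical_orth(const arma::mat & S, double cutoff = 1e-7) {
  arma::vec l;
  arma::mat V;
  arma::eig_sym(l, V, S); // eigenvalues come out in ascending order

  // Skip the eigenvalues below the cutoff
  arma::uword ndrop = 0;
  while(ndrop < l.n_elem && l(ndrop) < cutoff)
    ndrop++;

  arma::mat X = V.cols(ndrop, V.n_cols - 1);
  for(arma::uword i = 0; i < X.n_cols; i++)
    X.col(i) /= std::sqrt(l(ndrop + i));
  return X;
}
```

With such an X, the generalized eigenproblem F C' = S C' E reduces to an ordinary symmetric one for X.t()*F*X.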

Original comment by jussi.le...@gmail.com on 19 Aug 2011 at 5:33

Also, in case you haven't noticed, ERKALE has a parallel Fock (and Kohn-Sham-Fock) build. I've done some computations on systems with around 1700 basis functions, where the performance is limited by Fock matrix construction, i.e. the calculation of the XC contribution. Where the formation of the Fock matrix might take six hours, the diagonalization only takes a minute or two. And if one uses a threaded LAPACK library, even that can be cut down.

Anyway, thanks for your input! If you're interested in developing ERKALE, 
please let me know.

Original comment by jussi.le...@gmail.com on 19 Aug 2011 at 5:41

  • Changed state: Fixed