dense linear algebra suggestions
GoogleCodeExporter opened this issue · 4 comments
GoogleCodeExporter commented
Hi Jussi,
A talented applied-math colleague of mine read through your code after I
pointed him to it out of general interest. He made a few suggestions
about your use of dense linear algebra, which I have copied below.
I don't know if you are interested in large-scale parallelism, but if you are,
you might consider http://code.google.com/p/elemental/, which is the library my
colleague has developed. Because it is written in C++, you may find it much
easier to integrate into your code than Fortran-based alternatives.
Of course, a parallel DFT code requires both parallel dense linear algebra and
a parallel Fock build. If you are interested in implementing the latter, I can
suggest a few approaches. I've worked with NWChem, MPQC and Dalton on Blue
Gene/P, so I can compare different strategies for parallel Fock builds.
Best regards,
Jeff Hammond
jeff.science@gmail.com
https://wiki.alcf.anl.gov/index.php/User:Jhammond
===================================================
Someone please alert the author to the fact that Cholesky => invert => multiply
should be replaced with Cholesky => triangle solve.
http://code.google.com/p/erkale/source/browse/trunk/src/linalg.cpp#64
http://code.google.com/p/erkale/source/browse/trunk/src/completeness/completeness_profile.cpp#63
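The suggested change can be sketched with SciPy's LAPACK wrappers; the matrices here are random stand-ins, not ERKALE's actual data:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
S = A @ A.T + 5.0 * np.eye(5)     # symmetric positive definite, stands in for an overlap matrix
B = rng.standard_normal((5, 3))

# Pattern flagged above: factorize, form the explicit inverse, then multiply
X_inv = np.linalg.inv(S) @ B

# Suggested pattern: factorize once, then triangular solves
# (dpotrf + dpotrs under the hood); cheaper and more accurate
c, low = cho_factor(S)
X_solve = cho_solve((c, low), B)

assert np.allclose(X_inv, X_solve)
```

The two routes give the same result up to rounding, but the solve path skips the O(N^3) explicit inversion and avoids the extra error amplification of multiplying by a computed inverse.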
Going one level higher, the Cholesky QR decomposition algorithm is a bad idea
when you have poor conditioning. Why not just use the standard Householder
approach with dgeqrf/zgeqrf ?
http://code.google.com/p/erkale/source/browse/trunk/src/completeness/completeness_profile.cpp#59
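The conditioning point can be illustrated numerically: Cholesky QR forms A^T A, which squares the condition number, while Householder QR (LAPACK dgeqrf) keeps Q orthogonal to machine precision. A sketch with a synthetic ill-conditioned matrix:

```python
import numpy as np
from scipy.linalg import qr, cholesky, solve_triangular

rng = np.random.default_rng(1)
# Build a 20x5 matrix with condition number ~1e6
U, _ = np.linalg.qr(rng.standard_normal((20, 5)))
V, _ = np.linalg.qr(rng.standard_normal((5, 5)))
A = U @ np.diag(np.logspace(0, -6, 5)) @ V.T

# Cholesky QR: forming A^T A squares the condition number (~1e12 here)
R_c = cholesky(A.T @ A)                        # upper triangular
Q_c = solve_triangular(R_c, A.T, trans='T').T  # Q = A R^{-1}

# Householder QR via LAPACK dgeqrf
Q_h, R_h = qr(A, mode='economic')

err_c = np.linalg.norm(Q_c.T @ Q_c - np.eye(5))
err_h = np.linalg.norm(Q_h.T @ Q_h - np.eye(5))
# Householder Q stays orthogonal to machine precision; Cholesky QR's does not
assert err_h < 1e-10
assert err_c > err_h
```

With even poorer conditioning (cond(A) above ~1e8 in double precision) the Cholesky factorization of A^T A can fail outright, whereas the Householder approach still works.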
===================================================
Original issue reported on code.google.com by jeff.science@gmail.com
on 12 Jul 2011 at 6:23
GoogleCodeExporter commented
Sorry, this is not a defect. I forgot to toggle the metadata accordingly.
Original comment by jeff.science@gmail.com
on 12 Jul 2011 at 6:23
GoogleCodeExporter commented
Hi Jeff,
thanks for the input. I had not noticed that you had raised an issue, since for
some reason I hadn't enabled email warnings. I'll have a look at the linalg and
completeness_profile issues later.
I'm not targeting very big systems, at least not at the moment. I need
orbitals, and the diagonalization of the Fock matrix is an N^3 operation anyway.
In my experience the linear algebra steps aren't a bottleneck in any case;
the performance in big systems is mostly limited by the evaluation of the Fock
matrix. There are already other GPL codes for really big systems, such as
ErgoSCF.
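The N^3 step mentioned here is the generalized symmetric eigenproblem F C = S C diag(eps); a small illustrative sketch with random stand-in matrices rather than actual SCF data:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(4)
n = 8
A = rng.standard_normal((n, n))
F = (A + A.T) / 2                 # symmetric stand-in for a Fock matrix
M = rng.standard_normal((n, n))
S = M @ M.T + n * np.eye(n)       # SPD stand-in for the overlap matrix

# Generalized symmetric eigenproblem F C = S C diag(eps): one O(N^3) LAPACK call
eps, C = eigh(F, S)

# The orbital coefficients come out S-orthonormal and diagonalize F
assert np.allclose(C.T @ S @ C, np.eye(n))
assert np.allclose(C.T @ F @ C, np.diag(eps))
```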
Going MPI parallel would mean a pretty much complete restructuring of the code,
so until I run into big performance problems I'm not going to invest time in
doing it...
Original comment by jussi.le...@gmail.com
on 19 Aug 2011 at 1:32
- Added labels: Type-Enhancement
- Removed labels: Type-Defect
GoogleCodeExporter commented
"Cholesky => invert => multiply should be replaced with Cholesky => triangle
solve"
AFAIK the solve algorithms only work with linear equations, i.e. Mv=v', where v
and v' are vectors. completeness_profile does matrix multiplication.
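For what it's worth, the BLAS/LAPACK triangular solves (dtrsm / dtrtrs) do accept a matrix right-hand side, so a product like L^{-1} B can be computed column-by-column in a single call without ever forming L^{-1}. A minimal sketch with illustrative matrices:

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

rng = np.random.default_rng(2)
M = rng.standard_normal((4, 4))
S = M @ M.T + 4.0 * np.eye(4)     # SPD stand-in for an overlap matrix
B = rng.standard_normal((4, 6))   # matrix right-hand side, not just a vector

L = cholesky(S, lower=True)
# One dtrsm call solves L X = B for all six columns at once: X = L^{-1} B
X = solve_triangular(L, B, lower=True)

assert np.allclose(L @ X, B)
```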
"Going one level higher, the Cholesky QR decomposition algorithm is a bad idea
when you have poor conditioning. Why not just use the standard Householder
approach with dgeqrf/zgeqrf ?"
True. The completeness_profile functions in ERKALE have been more at a
proof-of-concept level, since they have mainly been Kruununhaka's playground
( http://www.chem.helsinki.fi/~manninen/kruununhaka/ ). Future development
will probably happen in ERKALE only, though.
Since one only operates with a single-atom basis set when computing
completeness profiles, one usually doesn't run into problems with near-singular
basis sets. I have now changed the default to canonical orthonormalization.
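Canonical orthonormalization can be sketched as follows: eigendecompose the overlap matrix, drop the near-singular directions below a cutoff, and scale the remaining eigenvectors. The matrix and cutoff here are illustrative, not ERKALE's defaults:

```python
import numpy as np

rng = np.random.default_rng(3)
# Synthetic near-singular "overlap" matrix with two tiny eigenvalues
V, _ = np.linalg.qr(rng.standard_normal((6, 6)))
S = V @ np.diag([2.0, 1.0, 0.5, 0.1, 1e-7, 1e-9]) @ V.T

# Canonical orthonormalization: X = U_kept * s_kept^{-1/2},
# dropping eigenvectors whose eigenvalue falls below a cutoff
s, U = np.linalg.eigh(S)
keep = s > 1e-5 * s.max()
X = U[:, keep] / np.sqrt(s[keep])

# The transformed basis is exactly orthonormal: X^T S X = I
assert np.allclose(X.T @ S @ X, np.eye(keep.sum()))
```

Unlike symmetric (Löwdin) or Cholesky orthonormalization, this remains well-behaved for near-singular basis sets because the problematic directions are simply projected out.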
Cholesky is beneficial in completeness optimization, which is limited by
compute_completeness.
Original comment by jussi.le...@gmail.com
on 19 Aug 2011 at 5:33
GoogleCodeExporter commented
Also, if you haven't noticed, ERKALE has a parallel Fock (and Kohn-Sham-Fock)
build. I've done some computations on systems with some 1700 basis functions,
where the performance is limited by Fock matrix construction, i.e. the
calculation of the XC contribution. Where the formation of the Fock matrix
might take 6 hours, the diagonalization only takes a minute or two. And if one
uses a threaded LAPACK library, even that can be cut down.
Anyway, thanks for your input! If you're interested in developing ERKALE,
please let me know.
Original comment by jussi.le...@gmail.com
on 19 Aug 2011 at 5:41
- Changed state: Fixed