dmlc/mshadow

outer product ger and batched outer product batched_ger

Closed this issue · 0 comments

jli05 commented

Can we assume that 1-dimensional arrays are always contiguously allocated on both CPUs and GPUs? I plan to set both incx and incy to 1 in the calls to cblas_<x>ger() from within expr::BLASEngine::ger(), and then to use the natural column stride of the matrices X and Y in expr::BLASEngine::batched_ger(). I'm implementing the Khatri-Rao product for tensor computations.
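
To make the stride question concrete, here is a minimal CPU-only sketch of the idea, not mshadow code. It assumes row-major storage, so column k of an I x K matrix starts at offset k and has stride K. ger_ref and khatri_rao_ref are hypothetical names for this illustration; ger_ref is a hand-written stand-in that follows the argument order of cblas_<x>ger() but does not call BLAS. For simplicity the result is stored as K contiguous I x J blocks (the transposed Khatri-Rao product), which is also an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference rank-1 update A += alpha * x * y^T for a row-major m x n matrix A.
// Hypothetical stand-in for cblas_<x>ger(); it is not the BLAS routine and
// not mshadow's expr::BLASEngine::ger().
void ger_ref(std::size_t m, std::size_t n, float alpha,
             const float* x, std::size_t incx,
             const float* y, std::size_t incy,
             float* A, std::size_t lda) {
  assert(lda >= n);  // each row of A must hold at least n elements
  for (std::size_t i = 0; i < m; ++i) {
    const float xi = alpha * x[i * incx];
    for (std::size_t j = 0; j < n; ++j) {
      A[i * lda + j] += xi * y[j * incy];
    }
  }
}

// Hypothetical helper: column-wise Khatri-Rao product of X (I x K) and
// Y (J x K), both row-major. Block k of the result is the I x J outer product
// X[:, k] * Y[:, k]^T, so the result is the (I*J) x K product stored transposed.
std::vector<float> khatri_rao_ref(const std::vector<float>& X,
                                  const std::vector<float>& Y,
                                  std::size_t I, std::size_t J, std::size_t K) {
  assert(X.size() == I * K && Y.size() == J * K);
  std::vector<float> out(K * I * J, 0.0f);
  for (std::size_t k = 0; k < K; ++k) {
    // Column k of a row-major matrix: start at offset k, step by the row
    // length K. This plays the role of the "natural column stride".
    ger_ref(I, J, 1.0f,
            X.data() + k, K,
            Y.data() + k, K,
            out.data() + k * I * J, J);
  }
  return out;
}
```

With genuinely 1-dimensional, contiguous inputs the same ger_ref call would pass incx = incy = 1; the batched case differs only in the start offsets and increments. Whether mshadow guarantees that contiguity is the open question here, and the sketch does not answer it.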