MPI Based DistNMF and DistNTF
=============================

Install Instructions
--------------------

This program depends on:

- The Armadillo library, which can be downloaded from https://arma.sourceforge.net
- OpenBLAS, which must be downloaded and built from https://github.com/xianyi/OpenBLAS
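
For example, OpenBLAS can be built and installed into a user directory roughly as follows (the install prefix is illustrative; see the OpenBLAS README for details):

````
git clone https://github.com/xianyi/OpenBLAS
cd OpenBLAS
make -j
make PREFIX=$HOME/libraries/openblas install
````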

Once the above steps are completed, set the following environment variables.

````
export ARMADILLO_INCLUDE_DIR=/home/rnu/libraries/armadillo-6.600.5/include/
export LIB=$LIB:/home/rnu/libraries/openblas/lib:
export INCLUDE=$INCLUDE:/home/rnu/libraries/openblas/include:$ARMADILLO_INCLUDE_DIR:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/rnu/libraries/openblas/lib/:
export NMFLIB_DIR=/ccs/home/ramki/rhea/research/nmflib/
export INCLUDE=$INCLUDE:$ARMADILLO_INCLUDE_DIR:$NMFLIB_DIR
export CPATH=$CPATH:$INCLUDE:
export MKLROOT=/ccs/compilers/intel/rh6-x86_64/16.0.0/mkl/
````

If you have MKL, source MKLVARS.sh before running cmake/make.

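For example, with a typical standalone MKL installation this might look like the following (the install path is illustrative; adjust it to your system):

````
# illustrative path; adjust to the MKL install on your machine
source /opt/intel/mkl/bin/mklvars.sh intel64
````
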
Sparse NMF
----------
Run cmake with -DCMAKE_BUILD_SPARSE

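A minimal out-of-source sparse build might look like this (the source path is illustrative):

````
mkdir build_sparse && cd build_sparse
cmake ~/nmflibrary/distnmf/ -DCMAKE_BUILD_SPARSE=1
make
````
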
Sparse Debug build
------------------
Run cmake with -DCMAKE_BUILD_SPARSE -DCMAKE_BUILD_TYPE=Debug

Building on Cray-EOS/Titan
--------------------------
CC=CC CXX=CC cmake ~/nmflibrary/distnmf/ -DCMAKE_IGNORE_MKL=1

Building on Titan with NVBLAS
-----------------------------
We use NVBLAS to offload computations to the GPU.
By default, building with CUDA is enabled on Titan.
Sample configuration files for NVBLAS can be found at conf/nvblas_cuda75.conf
and conf/nvblas_cuda91.conf for CUDA Toolkit 7.5 and 9.1, respectively.

CC=CC CXX=CC cmake ~/nmflibrary/distnmf/ -DCMAKE_IGNORE_MKL=1 -DCMAKE_BUILD_CUDA=1

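At run time NVBLAS reads its settings from the file named by the NVBLAS_CONFIG_FILE environment variable; pointing it at one of the bundled samples might look like this (the repository path is illustrative):

````
# point NVBLAS at the sample configuration for CUDA 9.1; adjust the path to your checkout
export NVBLAS_CONFIG_FILE=$NMFLIB_DIR/distnmf/conf/nvblas_cuda91.conf
````
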
Other Macros
------------

* CMAKE macros (a combined invocation is sketched below)

  - For sparse NMF: cmake -DCMAKE_BUILD_SPARSE=1 (the default is a dense build)
  - For timing with a barrier after MPI calls: cmake -DCMAKE_WITH_BARRIER_TIMING (barrier timing is on by default)
  - To disable barrier timing for performance runs: cmake -DCMAKE_WITH_BARRIER_TIMING:BOOL=OFF
  - For building with CUDA: -DCMAKE_BUILD_CUDA=1 (the default is off)

* Code level macros - defined in distutils.h

  - MPI_VERBOSE - prints all intermediary matrices. Be doubly sure about what you are doing; try this only for a very small matrix, of size less than 10.
  - WRITE_RAND_INPUT - dumps the generated random input matrix.
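
A combined invocation that enables the sparse and CUDA builds and disables barrier timing might look like this (the source path is illustrative):

````
cmake ~/nmflibrary/distnmf/ -DCMAKE_BUILD_SPARSE=1 -DCMAKE_BUILD_CUDA=1 -DCMAKE_WITH_BARRIER_TIMING:BOOL=OFF
````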

Output interpretation
=====================
The W matrix is output in row-major order of the processor grid, i.e., W_0, W_1, ..., W_p.
The H matrix is output in column-major order, i.e., for 6 processes
with pr=3, pc=2, interpret the output as H_0, H_2, H_4, H_1, H_3, H_5.

Running
=======
mpirun -np 16 ./distnmf -a [0/1/2/3] -i rand_[lowrank/uniform] -d "rows cols" -p "pr pc" -r "W_l2 W_l1 H_l2 H_l1" -k 20 -t 20 -e 1
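
For example, an illustrative run on a synthetic low-rank 20000x10000 matrix over a 4x4 processor grid, with rank 20, 30 iterations, and error computation enabled (the algorithm choice and sizes are arbitrary):

````
mpirun -np 16 ./distnmf -a 2 -i rand_lowrank -d "20000 10000" -p "4 4" -k 20 -t 30 -e 1
````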

Citation
========

If you are using this MPI implementation, kindly cite the following paper.

Ramakrishnan Kannan, Grey Ballard, and Haesun Park. 2016. A high-performance parallel algorithm for nonnegative matrix factorization. In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '16). ACM, New York, NY, USA, Article 9, 11 pages. DOI: http://dx.doi.org/10.1145/2851141.2851152