PZGESVD(3)    ScaLAPACK routine of NEC Numeric Library Collection   PZGESVD(3)



NAME
       PZGESVD  - computes the singular value decomposition (SVD) of an M-by-N
       matrix A, optionally computing the left and/or right singular vectors

SYNOPSIS
       SUBROUTINE PZGESVD( JOBU, JOBVT, M, N, A, IA, JA, DESCA, S, U, IU,  JU,
                           DESCU,  VT,  IVT,  JVT, DESCVT, WORK, LWORK, RWORK,
                           INFO)

           CHARACTER       JOBU, JOBVT

           INTEGER         IA, INFO, IU, IVT, JA, JU, JVT, LWORK, M, N

           INTEGER         DESCA(*), DESCU(*), DESCVT(*)

           COMPLEX*16      A(*), U(*), VT(*), WORK(*)

           DOUBLE          PRECISION S(*)

           DOUBLE          PRECISION RWORK(*)

PURPOSE
       PZGESVD computes the singular value decomposition (SVD)  of  an  M-by-N
       matrix  A, optionally computing the left and/or right singular vectors.
       The SVD is written as
              A = U * SIGMA * transpose(V)
       where SIGMA is an M-by-N matrix which is zero except for  its  min(M,N)
       diagonal elements, U is an M-by-M orthogonal matrix, and V is an N-by-N
       orthogonal matrix. The diagonal elements of SIGMA are the singular val-
       ues  of  A  and  the columns of U and V are the corresponding right and
       left singular vectors, respectively. The singular values  are  returned
       in array S in decreasing order and only the first min(M,N) columns of U
       and rows of VT = V**T are computed.


       Notes
       =====

       Each global data object is described by an associated description  vec-
       tor.  This vector stores the information required to establish the map-
       ping between an object element and its corresponding process and memory
       location.

       Let A be a generic term for any 2D block cyclicly distributed array.
       Such a global array has an associated description vector DESCA.
       In  the  following  comments, the character _ should be read as "of the
       global array".

       NOTATION        STORED IN      EXPLANATION
       --------------- -------------- --------------------------------------

       DTYPE_A(global) DESCA( DTYPE_ )The descriptor type.  In this case,
                                      DTYPE_A = 1.
       CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating
                                      the BLACS process grid A is distribu-
                                      ted over. The context itself is glo-
                                      bal, but the handle (the integer
                                      value) may vary.
       M_A    (global) DESCA( M_ )    The number of rows in the global
                                      array A.
       N_A    (global) DESCA( N_ )    The number of columns in the global
                                      array A.
       MB_A   (global) DESCA( MB_ )   The blocking factor used to distribute
                                      the rows of the array.
       NB_A   (global) DESCA( NB_ )   The blocking factor used to distribute
                                      the columns of the array.
       RSRC_A (global) DESCA( RSRC_ ) The process row over which the first
                                      row of the array A is distributed.
       CSRC_A (global) DESCA( CSRC_ ) The process column over which the
                                      first column of the array A is
                                      distributed.
       LLD_A  (local)  DESCA( LLD_ )  The leading dimension of the local
                                      array.  LLD_A >= MAX(1,LOCr(M_A)).

       Let K be the number of rows or columns of  a  distributed  matrix,  and
       assume that its process grid has dimension r x c. LOCr( K ) denotes the
       number of elements of K that a process would receive  if  K  were  dis-
       tributed over the r processes of its process column. Similarly, LOCc( K
       ) denotes the number of elements of K that a process would receive if K
       were distributed over the c processes of its process row. The values of
       LOCr() and LOCc() may be determined via a call to  the  ScaLAPACK  tool
       function, NUMROC:
               LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ),
               LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ).

       An upper bound for these quantities may be computed by:
               LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A
               LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A


ARGUMENTS
       MP = number of local rows in A and U
       NQ = number of local columns in A and VT
       SIZE = min( M, N )
       SIZEQ = number of local columns in U
       SIZEP = number of local rows in VT

       JOBU    (global input) CHARACTER*1
               Specifies options for computing U:
               = 'V':  the first SIZE columns of U (the left singular
                         vectors) are returned in the array U;
               = 'N':  no columns of U (no left singular vectors) are
                         computed.

       JOBVT   (global input) CHARACTER*1
               Specifies options for computing V**T:
               = 'V':  the first SIZE rows of V**T (the right singular
                         vectors) are returned in the array VT;
               = 'N':  no rows of V**T (no right singular vectors) are
                         computed.

       M       (global input) INTEGER
               The number of rows of the input matrix A.  M >= 0.

       N       (global input) INTEGER
               The number of columns of the input matrix A.  N >= 0.

       A       (local input/workspace) block cyclic COMPLEX*16
               array,
               global dimension (M, N), local dimension (MP, NQ)
               On exit, the contents of A are destroyed.

       IA      (global input) INTEGER
               The row index in the global array A indicating the first row of
               sub( A ).

       JA      (global input) INTEGER
               The column index in the global array  A  indicating  the  first
               column of sub( A ).

       DESCA   (global input) INTEGER array of dimension DLEN_
               The array descriptor for the distributed matrix A.

       S       (global output) DOUBLE PRECISION   array, dimension SIZE
               The singular values of A, sorted so that S(i) >= S(i+1).

       U       (local output) COMPLEX*16         array, local dimension
               (MP, SIZEQ), global dimension (M, SIZE)
               if JOBU = 'V', U contains the first min(m,n) columns of U
               if JOBU = 'N', U is not referenced.

       IU      (global input) INTEGER
               The row index in the global array U indicating the first row of
               sub( U ).

       JU      (global input) INTEGER
               The column index in the global array  U  indicating  the  first
               column of sub( U ).

       DESCU   (global input) INTEGER array of dimension DLEN_
               The array descriptor for the distributed matrix U.

       VT      (local output) COMPLEX*16         array, local dimension
               (SIZEP, NQ), global dimension (SIZE, N).
               If  JOBVT  =  'V',  VT contains the first SIZE rows of V**T. If
               JOBVT = 'N', VT is not referenced.

       IVT     (global input) INTEGER
               The row index in the global array VT indicating the  first  row
               of sub( VT ).

       JVT     (global input) INTEGER
               The  column  index  in the global array VT indicating the first
               column of sub( VT ).

       DESCVT   (global input) INTEGER array of dimension DLEN_
               The array descriptor for the distributed matrix VT.

       WORK    (local workspace/output) COMPLEX*16         array, dimension
               (LWORK)
               On exit, if INFO = 0, WORK(1) returns the optimal LWORK.

       LWORK   (local input) INTEGER
               The dimension of the array WORK.

               LWORK >= 1 + 2*SIZEB + MAX(WATOBD, WBDTOSVD),

               where SIZEB = MAX(M,N), and WATOBD and WBDTOSVD refer,  respec-
               tively, to the workspace required to bidiagonalize the matrix A
               and to go from the bidiagonal  matrix  to  the  singular  value
               decomposition U*S*VT.

               For WATOBD, the following holds:

               WATOBD = MAX(MAX(WPZLANGE,WPZGEBRD),
                            MAX(WPZLARED2D,WP(pre)LARED1D)),

               where   WPZLANGE,  WPZLARED1D,  WPZLARED2D,  WPZGEBRD  are  the
               workspaces required respectively for the  subprograms  PZLANGE,
               PDLARED1D, PDLARED2D, PZGEBRD. Using the standard notation

               MP = NUMROC( M, MB, MYROW, DESCA( CTXT_ ), NPROW),
               NQ = NUMROC( N, NB, MYCOL, DESCA( LLD_ ), NPCOL),

               the workspaces required for the above subprograms are

               WPZLANGE = MP,
               WPDLARED1D = NQ0,
               WPDLARED2D = MP0,
               WPZGEBRD = NB*(MP + NQ + 1) + NQ,

               where  NQ0  and MP0 refer, respectively, to the values obtained
               at MYCOL = 0 and MYROW = 0. In general, the upper limit for the
               workspace is given by a workspace required on processor (0,0):

               WATOBD <= NB*(MP0 + NQ0 + 1) + NQ0.

               In  case  of a homogeneous process grid this upper limit can be
               used as an estimate of the minimum workspace for every  proces-
               sor.

               For WBDTOSVD, the following holds:

               WBDTOSVD = SIZE*(WANTU*NRU + WANTVT*NCVT) +
                          MAX(WZBDSQR,
                          MAX(WANTU*WPZORMBRQLN, WANTVT*WPZORMBRPRT)),

               where

                               1,  if  left(right) singular vectors are wanted
               WANTU(WANTVT) =
                               0, otherwise

               and WZBDSQR, WPZORMBRQLN and WPZORMBRPRT refer respectively  to
               the  workspace  required  for  the  subprograms  ZBDSQR,  PZUN-
               MBR(QLN), and PZUNMBR(PRT), where QLN and PRT are the values of
               the arguments VECT, SIDE, and TRANS in the call to PZUNMBR. NRU
               is equal to the local number of rows of the matrix U when  dis-
               tributed 1-dimensional "column" of processes. Analogously, NCVT
               is equal to the local number of columns of the matrix  VT  when
               distributed  across  1-dimensional  "row" of processes. Calling
               the LAPACK procedure ZBDSQR requires

               WZBDSQR = MAX(1, 4*SIZE )

               on every processor. Finally,

               WPZORMBRQLN = MAX( (NB*(NB-1))/2, (SIZEQ+MP)*NB)+NB*NB,
               WPZORMBRPRT = MAX( (MB*(MB-1))/2, (SIZEP+NQ)*MB )+MB*MB,

               If LWORK = -1, then LWORK is global input and a workspace query
               is  assumed;  the  routine only calculates the minimum size for
               the work array. The required workspace is returned as the first
               element of WORK and no error message is issued by PXERBLA.

       RWORK   (workspace) REAL             array, dimension (1+4*SIZEB)
               On  exit,  if INFO = 0, RWORK(1) returns the necessary size for
               RWORK.

       INFO    (output) INTEGER
               = 0:  successful exit
               < 0:  if INFO = -i, the i-th argument had an illegal value

               > 0:  if ZBDSQR did not converge
                     If INFO = MIN(M,N) + 1, then PZGESVD has detected
                     heterogeneity by finding that eigenvalues were not
                     identical across the process grid. In this case, the
                     accuracy of the results from PZGESVD cannot be
                     guaranteed.



       The results of PZGEBRD, and therefore PZGESVD, may vary  slightly  from
       run to run with the same input data. If repeatability is an issue, call
       BLACS_SET with the appropriate option after defining the process  grid.


ALIGNMENT REQUIREMENTS
       The  routine  PZGESVD  inherits  the same alignement requirement as the
       routine PZGEBRD, namely:

       The distributed submatrix sub( A ) must verify some  alignment  proper-
       ties, namely the following expressions should be true:
       ( MB_A.EQ.NB_A .AND. IROFFA.EQ.ICOFFA )
               where NB = MB_A = NB_A,
               IROFFA = MOD( IA-1, NB ), ICOFFA = MOD( JA-1, NB ),



ScaLAPACK routine               31 October 2017                     PZGESVD(3)