PSGESVD(3)    ScaLAPACK routine of NEC Numeric Library Collection   PSGESVD(3)



NAME
       PSGESVD  -  compute the singular value decomposition (SVD) of an M-by-N
       matrix A, optionally computing the left and/or right singular vectors

SYNOPSIS
       SUBROUTINE PSGESVD( JOBU, JOBVT, M, N, A, IA, JA, DESCA, S, U, IU,  JU,
                           DESCU, VT, IVT, JVT, DESCVT, WORK, LWORK, INFO )

           CHARACTER       JOBU, JOBVT

           INTEGER         IA, INFO, IU, IVT, JA, JU, JVT, LWORK, M, N

           INTEGER         DESCA( * ), DESCU( * ), DESCVT( * )

           REAL            A( * ), S( * ), U( * ), VT( * ), WORK( * )

PURPOSE
       PSGESVD  computes  the  singular value decomposition (SVD) of an M-by-N
       matrix A, optionally computing the left and/or right singular  vectors.
       The SVD is written as
            A = U * SIGMA * transpose(V)

       where  SIGMA  is an M-by-N matrix which is zero except for its min(M,N)
       diagonal elements, U is an M-by-M orthogonal matrix, and V is an N-by-N
       orthogonal matrix. The diagonal elements of SIGMA are the singular val-
       ues of A and the columns of U and V are  the  corresponding  right  and
       left  singular  vectors, respectively. The singular values are returned
       in array S in decreasing order and only the first min(M,N) columns of U
       and rows of VT = V**T are computed.


       Notes
       =====
       Each  global data object is described by an associated description vec-
       tor. This vector stores the information required to establish the  map-
       ping between an object element and its corresponding process and memory
       location.

       Let A be a generic term for any 2D block  cyclicly  distributed  array.
       Such a global array has an associated description vector DESCA.  In the
       following comments, the character _ should be read as  "of  the  global
       array".

       NOTATION        STORED IN      EXPLANATION
       --------------- -------------- --------------------------------------
       DTYPE_A(global) DESCA( DTYPE_ )The descriptor type.  In this case,
                                      DTYPE_A = 1.
       CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating
                                      the BLACS process grid A is distribu-
                                      ted over. The context itself is glo-
                                      bal, but the handle (the integer
                                      value) may vary.
       M_A    (global) DESCA( M_ )    The number of rows in the global
                                      array A.
       N_A    (global) DESCA( N_ )    The number of columns in the global
                                      array A.
       MB_A   (global) DESCA( MB_ )   The blocking factor used to distribute
                                      the rows of the array.
       NB_A   (global) DESCA( NB_ )   The blocking factor used to distribute
                                      the columns of the array.
       RSRC_A (global) DESCA( RSRC_ ) The process row over which the first
                                      row  of  the  array  A  is  distributed.
       CSRC_A (global) DESCA( CSRC_ ) The process column over which the
                                      first column of the array A is
                                      distributed.
       LLD_A  (local)  DESCA( LLD_ )  The leading dimension of the local
                                      array.  LLD_A >= MAX(1,LOCr(M_A)).

       Let K be the number of rows or columns of  a  distributed  matrix,  and
       assume that its process grid has dimension r x c. LOCr( K ) denotes the
       number of elements of K that a process would receive  if  K  were  dis-
       tributed over the r processes of its process column. Similarly, LOCc( K
       ) denotes the number of elements of K that a process would receive if K
       were distributed over the c processes of its process row. The values of
       LOCr() and LOCc() may be determined via a call to  the  ScaLAPACK  tool
       function, NUMROC:
               LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ),
               LOCc(  N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ).  An upper
       bound for these quantities may be computed by:
               LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A
               LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A


ARGUMENTS
       MP = number of local rows in A and U NQ = number of local columns in  A
       and  VT SIZE = min( M, N ) SIZEQ = number of local columns in U SIZEP =
       number of local rows in VT

       JOBU    (global input) CHARACTER*1
               Specifies options for computing U:
               = 'V':  the first SIZE columns of U (the left singular vectors)
               are  returned  in the array U; = 'N':  no columns of U (no left
               singular vectors) are computed.

       JOBVT   (global input) CHARACTER*1
               Specifies options for computing V**T:
               = 'V':  the first SIZE rows of V**T (the  right  singular  vec-
               tors) are returned in the array VT; = 'N':  no rows of V**T (no
               right singular vectors) are computed.

       M       (global input) INTEGER
               The number of rows of the input matrix A.  M >= 0.

       N       (global input) INTEGER
               The number of columns of the input matrix A.  N >= 0.

       A       (local input/workspace) block cyclic REAL array,
               global dimension (M, N), local dimension (MP, NQ) On exit,  the
               contents of A are destroyed.

       IA      (global input) INTEGER
               The row index in the global array A indicating the first row of
               sub( A ).

       JA      (global input) INTEGER
               The column index in the global array  A  indicating  the  first
               column of sub( A ).

       DESCA   (global input) INTEGER array of dimension DLEN_
               The array descriptor for the distributed matrix A.

       S       (global output) REAL array, dimension SIZE
               The singular values of A, sorted so that S(i) >= S(i+1).

       U       (local output) REAL array, local dimension
               (MP,  SIZEQ),  global dimension (M, SIZE) if JOBU = 'V', U con-
               tains the first min(m,n) columns of U if JOBU = 'N', U  is  not
               referenced.

       IU      (global input) INTEGER
               The row index in the global array U indicating the first row of
               sub( U ).

       JU      (global input) INTEGER
               The column index in the global array  U  indicating  the  first
               column of sub( U ).

       DESCU   (global input) INTEGER array of dimension DLEN_
               The array descriptor for the distributed matrix U.

       VT      (local output) REAL array, local dimension
               (SIZEP,  NQ),  global  dimension (SIZE, N).  If JOBVT = 'V', VT
               contains the first SIZE rows of V**T. If JOBVT = 'N', VT is not
               referenced.

       IVT     (global input) INTEGER
               The  row  index in the global array VT indicating the first row
               of sub( VT ).

       JVT     (global input) INTEGER
               The column index in the global array VT  indicating  the  first
               column of sub( VT ).

       DESCVT   (global input) INTEGER array of dimension DLEN_
                The array descriptor for the distributed matrix VT.

       WORK    (local workspace/output) REAL array, dimension
               (LWORK)  On  exit,  if  INFO  =  0, WORK(1) returns the optimal
               LWORK.

       LWORK   (local input) INTEGER
               The dimension of the array WORK.

               LWORK > 2 + 6*SIZEB + MAX(WATOBD, WBDTOSVD),

               where SIZEB = MAX(M,N), and WATOBD and WBDTOSVD refer,  respec-
               tively, to the workspace required to bidiagonalize the matrix A
               and to go from the bidiagonal  matrix  to  the  singular  value
               decomposition U*S*VT.

               For WATOBD, the following holds:

               WATOBD               =              MAX(MAX(WPSLANGE,WPSGEBRD),
               MAX(WPSLARED2D,WPSLARED1D)),

               where  WPSLANGE,  WPSLARED1D,  WPSLARED2D,  WPSGEBRD  are   the
               workspaces  required  respectively for the subprograms PSLANGE,
               PSLARED1D, PSLARED2D, PSGEBRD. Using the standard notation

               MP = NUMROC( M, MB, MYROW, DESCA( CTXT_ ), NPROW), NQ = NUMROC(
               N, NB, MYCOL, DESCA( LLD_ ), NPCOL),

               the workspaces required for the above subprograms are

               WPSLANGE  =  MP, WPSLARED1D = NQ0, WPSLARED2D = MP0, WPSGEBRD =
               NB*(MP + NQ + 1) + NQ,

               where NQ0 and MP0 refer, respectively, to the  values  obtained
               at MYCOL = 0 and MYROW = 0. In general, the upper limit for the
               workspace is given by a workspace required on processor (0,0):

               WATOBD <= NB*(MP0 + NQ0 + 1) + NQ0.

               In case of a homogeneous process grid this upper limit  can  be
               used  as an estimate of the minimum workspace for every proces-
               sor.

               For WBDTOSVD, the following holds:

               WBDTOSVD  =  SIZE*(WANTU*NRU  +  WANTVT*NCVT)  +   MAX(WSBDSQR,
               MAX(WANTU*WPSORMBRQLN, WANTVT*WPSORMBRPRT)),

               where

               1,  if  left(right) singular vectors are wanted WANTU(WANTVT) =
               0, otherwise

               and WSBDSQR, WPSORMBRQLN and WPSORMBRPRT refer respectively  to
               the    workspace   required   for   the   subprograms   SBDSQR,
               PSORMBR(QLN), and PSORMBR(PRT), where QLN and PRT are the  val-
               ues  of  the  arguments  VECT,  SIDE,  and TRANS in the call to
               PSORMBR. NRU is equal to the local number of rows of the matrix
               U  when distributed 1-dimensional "column" of processes. Analo-
               gously, NCVT is equal to the local number  of  columns  of  the
               matrix  VT  when distributed across 1-dimensional "row" of pro-
               cesses. Calling the LAPACK procedure SBDSQR requires

               WSBDSQR = MAX(1, 2*SIZE + (2*SIZE - 4)*MAX(WANTU, WANTVT))

               on every processor. Finally,

               WPSORMBRQLN = MAX( (NB*(NB-1))/2, (SIZEQ+MP)*NB)+NB*NB, WPSORM-
               BRPRT = MAX( (MB*(MB-1))/2, (SIZEP+NQ)*MB )+MB*MB,

               If LWORK = -1, then LWORK is global input and a workspace query
               is assumed; the routine only calculates the  minimum  size  for
               the work array. The required workspace is returned as the first
               element of WORK and no error message is issued by PXERBLA.

       INFO    (global output) INTEGER
               = 0:  successful exit.
               < 0:  if INFO = -i, the i-th argument had an illegal value.
               > 0:  if SBDSQR did not converge If INFO = MIN(M,N) +  1,  then
               PSGESVD  has detected heterogeneity by finding that eigenvalues
               were not identical across the process grid. In this  case,  the
               accuracy of the results from PSGESVD cannot be guaranteed.



ScaLAPACK routine               31 October 2017                     PSGESVD(3)