PDTRSEN(3)    ScaLAPACK routine of NEC Numeric Library Collection   PDTRSEN(3)



NAME
       PDTRSEN  -  reorders  the real Schur factorization of a real matrix A =
       Q*T*Q**T, so that a selected cluster  of  eigenvalues  appears  in  the
       leading diagonal blocks of the upper quasi-triangular matrix T, and the
       leading columns of Q form an orthonormal  basis  of  the  corresponding
       right invariant subspace

SYNOPSIS
       SUBROUTINE PDTRSEN( JOB,  COMPQ,  SELECT, PARA, N, T, IT, JT, DESCT, Q,
                           IQ, JQ, DESCQ, WR, WI,  M,  S,  SEP,  WORK,  LWORK,
                           IWORK, LIWORK, INFO )

           CHARACTER          COMPQ, JOB

           INTEGER            INFO, LIWORK, LWORK, M, N, IT, JT, IQ, JQ

           DOUBLE PRECISION   S, SEP

           LOGICAL            SELECT( N )

           INTEGER            PARA( 6 ), DESCT( * ), DESCQ( * ), IWORK( * )

           DOUBLE PRECISION   Q( * ), T( * ), WI( * ), WORK( * ), WR( * )

PURPOSE
       PDTRSEN reorders the real Schur factorization of  a  real  matrix  A  =
       Q*T*Q**T,  so  that  a  selected  cluster of eigenvalues appears in the
       leading diagonal blocks of the upper quasi-triangular matrix T, and the
       leading  columns  of  Q  form an orthonormal basis of the corresponding
       right invariant subspace. The reordering is performed by PDTRORD.

        Optionally, the routine computes the reciprocal condition numbers of
        the cluster of eigenvalues and/or the invariant subspace. The SCASY
        library is needed for the condition estimation.

       T must be in Schur form (as returned by PDLAHQR), that is, block  upper
       triangular with 1-by-1 and 2-by-2 diagonal blocks.


       Notes
       =====

       Each  global data object is described by an associated description vec-
       tor.  This vector stores the information required to establish the map-
       ping between an object element and its corresponding process and memory
       location.

        Let A be a generic term for any 2D block cyclically distributed array.
       Such a global array has an associated description vector DESCA.  In the
       following comments, the character _ should be read as  "of  the  global
       array".

       NOTATION        STORED IN      EXPLANATION
       --------------- -------------- --------------------------------------
       DTYPE_A(global) DESCA( DTYPE_ )The descriptor type.  In this case,
                                      DTYPE_A = 1.
       CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating
                                      the BLACS process grid A is distribu-
                                      ted over. The context itself is glo-
                                      bal, but the handle (the integer
                                      value) may vary.
       M_A    (global) DESCA( M_ )    The number of rows in the global
                                      array A.
       N_A    (global) DESCA( N_ )    The number of columns in the global
                                      array A.
       MB_A   (global) DESCA( MB_ )   The blocking factor used to distribute
                                      the rows of the array.
       NB_A   (global) DESCA( NB_ )   The blocking factor used to distribute
                                      the columns of the array.
       RSRC_A (global) DESCA( RSRC_ ) The process row over which the first
                                      row of the array A is distributed.
       CSRC_A (global) DESCA( CSRC_ ) The process column over which the
                                      first column of the array A is
                                      distributed.
       LLD_A  (local)  DESCA( LLD_ )  The leading dimension of the local
                                      array.  LLD_A >= MAX(1,LOCr(M_A)).

       Let  K  be  the  number of rows or columns of a distributed matrix, and
       assume that its process grid has dimension p x q.
       LOCr( K ) denotes the number of elements of  K  that  a  process  would
       receive  if K were distributed over the p processes of its process col-
       umn.
       Similarly, LOCc( K ) denotes the number of elements of K that a process
       would receive if K were distributed over the q processes of its process
       row.
       The values of LOCr() and LOCc() may be determined via  a  call  to  the
       ScaLAPACK tool function, NUMROC:
               LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ),
                LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ).
        An upper bound for these quantities may be computed by:
               LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A
               LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A
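
        For example, the local array sizes and the descriptor for T can be
        set up with the ScaLAPACK tool routines NUMROC and DESCINIT (a
        minimal sketch; ICTXT, NPROW, NPCOL, MYROW, MYCOL are assumed to
        come from an earlier BLACS_GRIDINFO call, and NB is the chosen data
        distribution block size):

                NP   = NUMROC( N, NB, MYROW, 0, NPROW )  ! local rows of T
                NQ   = NUMROC( N, NB, MYCOL, 0, NPCOL )  ! local columns of T
                LLDT = MAX( 1, NP )                      ! local leading dimension
                CALL DESCINIT( DESCT, N, N, NB, NB, 0, 0, ICTXT, LLDT, INFO )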


ARGUMENTS
       JOB     (global input) CHARACTER*1
               Specifies whether condition numbers are required for the  clus-
               ter of eigenvalues (S) or the invariant subspace (SEP):
               = 'N': none;
               = 'E': for eigenvalues only (S);
               = 'V': for invariant subspace only (SEP);
               = 'B': for both eigenvalues and invariant subspace (S and SEP).

       COMPQ   (global input) CHARACTER*1
               = 'V': update the matrix Q of Schur vectors;
               = 'N': do not update Q.

       SELECT  (global input) LOGICAL  array, dimension (N)
               SELECT specifies the eigenvalues in the  selected  cluster.  To
               select a real eigenvalue w(j), SELECT(j) must be set to .TRUE..
               To select a complex conjugate  pair  of  eigenvalues  w(j)  and
               w(j+1),  corresponding  to  a  2-by-2  diagonal  block,  either
                SELECT(j) or SELECT(j+1) or both must be set to .TRUE.; a
                complex conjugate pair of eigenvalues must be either both
                included in the cluster or both excluded.
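
                For example, to place all eigenvalues with negative real
                parts in the leading cluster, SELECT could be filled from
                the WR array returned by a preceding call to PDLAHQR (a
                minimal sketch, assuming WR holds the current eigenvalue
                ordering):

                    ! Both members of a complex conjugate pair share the
                    ! same WR value, so they are selected or rejected
                    ! together, as required.
                    DO J = 1, N
                       SELECT( J ) = WR( J ).LT.0.0D+0
                    END DO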

        PARA    (global input) INTEGER array, dimension (6)
               Block  parameters  (some should be replaced by calls to PILAENV
               and others by meaningful default values):
               PARA(1) = maximum number of concurrent computational windows
                         allowed in the algorithm;
                         0 < PARA(1) <= min(NPROW,NPCOL) must hold;
               PARA(2) = number of eigenvalues in each window;
                         0 < PARA(2) < PARA(3) must hold;
               PARA(3) = window size; PARA(2) < PARA(3) < DESCT(MB_)
                         must hold;
               PARA(4) = minimal percentage of flops required for
                         performing matrix-matrix multiplications instead
                         of pipelined orthogonal transformations;
                         0 <= PARA(4) <= 100 must hold;
               PARA(5) = width of block column slabs for row-wise
                         application of pipelined orthogonal
                         transformations in their factorized form;
                          0 < PARA(5) <= DESCT(MB_) must hold;
                PARA(6) = the maximum number of eigenvalues moved together
                          over a process border; in practice, this will be
                          approximately half of the cross border window size;
                          0 < PARA(6) <= PARA(2) must hold.
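
                For illustration, with a data distribution block size of
                DESCT(MB_) = 64, the following settings satisfy the
                constraints above (they are examples only, not tuned
                defaults):

                    ! illustrative values, not tuned defaults
                    PARA( 1 ) = MIN( NPROW, NPCOL )  ! concurrent windows
                    PARA( 2 ) = 16                   ! eigenvalues per window
                    PARA( 3 ) = 32                   ! window size
                    PARA( 4 ) = 50                   ! % threshold for GEMM updates
                    PARA( 5 ) = 32                   ! width of block column slabs
                    PARA( 6 ) = 8                    ! eigenvalues moved per border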

       N       (global input) INTEGER
               The order of the globally distributed matrix T. N >= 0.

       T       (local input/output) DOUBLE PRECISION array,
               dimension (LLD_T,LOCc(N)).
               On entry, the local pieces  of  the  global  distributed  upper
               quasi-triangular  matrix  T, in Schur form. On exit, T is over-
               written by the local pieces of the reordered matrix T, again in
               Schur form, with the selected eigenvalues in the globally lead-
               ing diagonal blocks.

       IT      (global input) INTEGER

       JT      (global input) INTEGER
                The row and column index in the global array T indicating the
                first row and column of sub( T ). IT = JT = 1 must hold.

       DESCT   (global and local input) INTEGER array of dimension DLEN_.
               The array descriptor for the global distributed matrix T.

       Q       (local input/output) DOUBLE PRECISION array,
               dimension (LLD_Q,LOCc(N)).
               On  entry,  if COMPQ = 'V', the local pieces of the global dis-
               tributed matrix Q of Schur vectors.
               On exit, if COMPQ = 'V',  Q  has  been  postmultiplied  by  the
               global  orthogonal  transformation matrix which reorders T; the
               leading M columns of Q form an orthonormal basis for the speci-
               fied invariant subspace.
               If COMPQ = 'N', Q is not referenced.

       IQ      (global input) INTEGER

       JQ      (global input) INTEGER
                The row and column index in the global array Q indicating the
                first row and column of sub( Q ). IQ = JQ = 1 must hold.

       DESCQ   (global and local input) INTEGER array of dimension DLEN_.
               The array descriptor for the global distributed matrix Q.

       WR      (global output) DOUBLE PRECISION array, dimension (N)

       WI      (global output) DOUBLE PRECISION array, dimension (N)
               The real and imaginary parts, respectively,  of  the  reordered
               eigenvalues  of  T.  The eigenvalues are in principle stored in
               the same order as on the diagonal of T,  with  WR(i)  =  T(i,i)
               and,  if  T(i:i+1,i:i+1)  is a 2-by-2 diagonal block, WI(i) > 0
               and WI(i+1) = -WI(i).
               Note also that if a complex eigenvalue is sufficiently ill-con-
               ditioned,  then  its  value  may  differ significantly from its
               value before reordering.

       M       (global output) INTEGER
               The dimension of the specified invariant subspace.  0 <=  M  <=
               N.

       S       (global output) DOUBLE PRECISION
               If JOB = 'E' or 'B', S is a lower bound on the reciprocal
               condition number for the selected cluster of eigenvalues.
               S  cannot underestimate the true reciprocal condition number by
               more than a factor of sqrt(N). If M = 0 or N, S = 1.
               If JOB = 'N' or 'V', S is not referenced.

       SEP     (global output) DOUBLE PRECISION
               If JOB = 'V' or 'B', SEP is the estimated reciprocal
               condition number of the specified invariant subspace. If M =  0
               or N, SEP = norm(T).
               If JOB = 'N' or 'E', SEP is not referenced.

        WORK    (local workspace/output) DOUBLE PRECISION array,
                dimension (LWORK)
               On exit, if INFO = 0, WORK(1) returns the optimal LWORK.

       LWORK   (local input) INTEGER
               The dimension of the array WORK.

               If  LWORK  = -1, then a workspace query is assumed; the routine
               only calculates the optimal size of  the  WORK  array,  returns
               this  value  as the first entry of the WORK array, and no error
               message related to LWORK is issued by PXERBLA.

       IWORK   (local workspace/output) INTEGER array, dimension (LIWORK)

       LIWORK  (local input) INTEGER
               The dimension of the array IWORK.

               If LIWORK = -1, then a workspace query is assumed; the  routine
               only  calculates  the  optimal size of the IWORK array, returns
               this value as the first entry of the IWORK array, and no  error
               message related to LIWORK is issued by PXERBLA.
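
                A typical calling sequence first issues a workspace query
                and then repeats the call with properly sized arrays (a
                minimal sketch; all other arguments must already be valid,
                and WORK/IWORK must have at least one element for the
                query):

                    LWORK  = -1
                    LIWORK = -1
                    CALL PDTRSEN( 'B', 'V', SELECT, PARA, N, T, 1, 1,     &
                                  DESCT, Q, 1, 1, DESCQ, WR, WI, M, S,    &
                                  SEP, WORK, LWORK, IWORK, LIWORK, INFO )
                    LWORK  = INT( WORK( 1 ) )
                    LIWORK = IWORK( 1 )
                    ! ... allocate WORK( LWORK ) and IWORK( LIWORK ), then
                    ! repeat the same call with the actual workspace sizes.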

       INFO    (global output) INTEGER
               = 0: successful exit
                < 0: if INFO = -i, the i-th argument had an illegal value.
                If the i-th argument is an array and its j-th entry had an
                illegal value, then INFO = -(i*1000+j); if the i-th argument
                is a scalar and had an illegal value, then INFO = -i.
                > 0: here we have several possibilities
                 *) Reordering of T failed because some eigenvalues are too
                    close to separate (the problem is very ill-conditioned);
                    T may have been partially reordered, and WR and WI
                    contain the eigenvalues in the same order as in T.
                    On exit, INFO = {the index of T where the swap failed}.
                 *) A 2-by-2 block to be reordered split into two 1-by-1
                    blocks and the second block failed to swap with an
                    adjacent block.
                    On exit, INFO = {the index of T where the swap failed}.
                 *) If INFO = N+1, there is no valid BLACS context (see the
                    BLACS documentation for details).
                 *) If INFO = N+2, the routines used in the calculation of
                    the condition numbers raised a positive warning flag
                    (see the documentation for PGESYCTD and PSYCTCON of the
                    SCASY library).
                 *) If INFO = N+3, PGESYCTD raised an input error flag;
                    please report this bug to the authors (see below).
                    If INFO = N+4, PSYCTCON raised an input error flag;
                    please report this bug to the authors (see below).
                In a future release this subroutine may distinguish between
                cases 1 and 2 above.


       Method
       ======

        This routine performs parallel eigenvalue reordering in real Schur
        form. The condition number estimation is performed using techniques
        and code from the SCASY library.

       Additional requirements
       =======================

       The following alignment requirements must hold:
       (a) DESCT( MB_ ) = DESCT( NB_ ) = DESCQ( MB_ ) = DESCQ( NB_ )
       (b) DESCT( RSRC_ ) = DESCQ( RSRC_ )
       (c) DESCT( CSRC_ ) = DESCQ( CSRC_ )

        All matrices must be blocked by a block factor larger than or equal
        to two (3). This is to simplify reordering across process borders in
        the presence of 2-by-2 blocks.

       Limitations
       ===========

       This algorithm cannot work on submatrices of T and Q, i.e.,
        IT = JT = IQ = JQ = 1 must hold. This is, however, not a limitation,
        since PDLAHQR does not compute Schur forms of submatrices anyway.

       Parallel execution recommendations
       ==================================

       Use  a  square  grid,  if  possible, for maximum performance. The block
       parameters in PARA should be kept  well  below  the  data  distribution
       block size.

       In  general,  the parallel algorithm strives to perform as much work as
       possible without crossing the block borders on the main block diagonal.
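
        For example, a near-square process grid can be set up with standard
        BLACS calls before distributing T and Q (a minimal sketch; NPROCS is
        assumed to be the total number of processes):

                ! choose the largest divisor of NPROCS not exceeding sqrt(NPROCS)
                NPROW = INT( SQRT( DBLE( NPROCS ) ) )
                DO WHILE ( MOD( NPROCS, NPROW ).NE.0 )
                   NPROW = NPROW - 1
                END DO
                NPCOL = NPROCS / NPROW
                CALL BLACS_GET( -1, 0, ICTXT )
                CALL BLACS_GRIDINIT( ICTXT, 'Row-major', NPROW, NPCOL )
                CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL )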



ScaLAPACK routine               31 October 2017                     PDTRSEN(3)