PZLAHQR(3)    ScaLAPACK routine of NEC Numeric Library Collection   PZLAHQR(3)



NAME
       PZLAHQR  -  i an auxiliary routine used to find the Schur decomposition
       and or eigenvalues of a matrix already in Hessenberg form from cols ILO
       to IHI

SYNOPSIS
       SUBROUTINE PZLAHQR( WANTT, WANTZ, N, ILO, IHI, A, DESCA, W, ILOZ, IHIZ,
                           Z, DESCZ, WORK, LWORK, IWORK, ILWORK, INFO )

           LOGICAL         WANTT, WANTZ

           INTEGER         IHI, IHIZ, ILO, ILOZ, ILWORK, INFO, LWORK, N

           INTEGER         DESCA( * ), DESCZ( * ), IWORK( * )

           COMPLEX*16      A( * ), W( * ), WORK( * ), Z( * )

PURPOSE
       PZLAHQR is an auxiliary routine used to find  the  Schur  decomposition
       and or eigenvalues of a matrix already in Hessenberg form from cols ILO
       to IHI. If Z = I, and WANTT=WANTZ=.TRUE., H gets replaced with Z'HZ,
         with Z'Z=I, and H in Schur form.


       Notes
       =====

       Each global data object is described by an associated description  vec-
       tor.  This vector stores the information required to establish the map-
       ping between an object element and its corresponding process and memory
       location.

       Let  A  be  a generic term for any 2D block cyclicly distributed array.
       Such a global array has an associated description vector DESCA.  In the
       following  comments,  the  character _ should be read as "of the global
       array".

       NOTATION        STORED IN      EXPLANATION
       --------------- -------------- --------------------------------------
       DTYPE_A(global) DESCA( DTYPE_ )The descriptor type.  In this case,
                                      DTYPE_A = 1.
       CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating
                                      the BLACS process grid A is distribu-
                                      ted over. The context itself is glo-
                                      bal, but the handle (the integer
                                      value) may vary.
       M_A    (global) DESCA( M_ )    The number of rows in the global
                                      array A.
       N_A    (global) DESCA( N_ )    The number of columns in the global
                                      array A.
       MB_A   (global) DESCA( MB_ )   The blocking factor used to distribute
                                      the rows of the array.
       NB_A   (global) DESCA( NB_ )   The blocking factor used to distribute
                                      the columns of the array.
       RSRC_A (global) DESCA( RSRC_ ) The process row over which the first
                                      row  of  the  array  A  is  distributed.
       CSRC_A (global) DESCA( CSRC_ ) The process column over which the
                                      first column of the array A is
                                      distributed.
       LLD_A  (local)  DESCA( LLD_ )  The leading dimension of the local
                                      array.  LLD_A >= MAX(1,LOCp(M_A)).

       Let  K  be  the  number of rows or columns of a distributed matrix, and
       assume that its process grid has dimension p x q.
       LOCp( K ) denotes the number of elements of  K  that  a  process  would
       receive  if K were distributed over the p processes of its process col-
       umn.
       Similarly, LOCq( K ) denotes the number of elements of K that a process
       would receive if K were distributed over the q processes of its process
       row.
       The values of LOCp() and LOCq() may be determined via  a  call  to  the
       ScaLAPACK tool function, NUMROC:
               LOCp( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ),
               LOCq(  N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ).  An upper
       bound for these quantities may be computed by:
               LOCp( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A
               LOCq( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A


ARGUMENTS
       WANTT   (global input) LOGICAL
               = .TRUE. : the full Schur form T is required;
               = .FALSE.: only eigenvalues are required.

       WANTZ   (global input) LOGICAL
               = .TRUE. : the matrix of Schur vectors Z is required;
               = .FALSE.: Schur vectors are not required.

       N       (global input) INTEGER
               The order of the Hessenberg matrix A (and Z if WANTZ).  N >= 0.

       ILO     (global input) INTEGER
               IHI      (global input) INTEGER It is assumed that A is already
               upper quasi-triangular in rows and columns  IHI+1:N,  and  that
               A(ILO,ILO-1) = 0 (unless ILO = 1). PZLAHQR works primarily with
               the Hessenberg submatrix in rows and columns ILO  to  IHI,  but
               applies  transformations  to all of H if WANTT is .TRUE..  1 <=
               ILO <= max(1,IHI); IHI <= N.

       A       (global input/output) COMPLEX*16 array, dimension
               (DESCA(LLD_),*) On entry, the upper Hessenberg  matrix  A.   On
               exit,  if  WANTT  is  .TRUE., A is upper triangular in rows and
               columns ILO:IHI.  If WANTT is .FALSE., the contents  of  A  are
               unspecified on exit.

       DESCA   (global and local input) INTEGER array of dimension DLEN_.
               The array descriptor for the distributed matrix A.

       W      (global replicated output) COMPLEX*16 array, dimension (N)
              The  computed  eigenvalues  ILO  to IHI are stored in the corre-
              sponding elements of W.  If WANTT is .TRUE., the eigenvalues are
              stored  in  the  same order as on the diagonal of the Schur form
              returned in A.  A may be returned with  larger  diagonal  blocks
              until the next release.

       ILOZ    (global input) INTEGER
               IHIZ     (global  input) INTEGER Specify the rows of Z to which
               transformations must be applied if WANTZ is .TRUE..  1 <=  ILOZ
               <= ILO; IHI <= IHIZ <= N.

       Z       (global input/output) COMPLEX*16 array.
               If  WANTZ is .TRUE., on entry Z must contain the current matrix
               Z of transformations accumulated by PZHSEQR, and on exit Z  has
               been updated; transformations are applied only to the submatrix
               Z(ILOZ:IHIZ,ILO:IHI).  If WANTZ is .FALSE.,  Z  is  not  refer-
               enced.

       DESCZ   (global and local input) INTEGER array of dimension DLEN_.
               The array descriptor for the distributed matrix Z.

       WORK    (local output) COMPLEX*16 array of size LWORK
               (Unless LWORK=-1, in which case WORK must be at least size 1)

       LWORK   (local input) INTEGER
               WORK(LWORK) is a local array and LWORK is assumed big enough so
               that LWORK  >=  3*N  +  MAX(  2*MAX(DESCZ(LLD_),DESCA(LLD_))  +
               2*LOCq(N),    7*Ceil(N/HBL)/LCM(NPROW,NPCOL))   +   MAX(   2*N,
               (8*LCM(NPROW,NPCOL)+2)**2 ) If LWORK=-1, then WORK(1) gets  set
               to the above number and the code returns immediately.

       IWORK   (global and local input) INTEGER array of size ILWORK
               This  will  hold some of the IBLK integer arrays.  This is held
               as a place holder for a  future  release.   Currently  unrefer-
               enced.

       ILWORK  (local input) INTEGER
               This  will hold the size of the IWORK array.  This is held as a
               place holder for a future release.  Currently unreferenced.

       INFO    (global output) INTEGER
               < 0: parameter number -INFO incorrect or inconsistent
               = 0: successful exit
               > 0: PZLAHQR failed to compute all the eigenvalues ILO  to  IHI
               in  a total of 30*(IHI-ILO+1) iterations; if INFO = i, elements
               i+1:ihi of W contains those eigenvalues which  have  been  suc-
               cessfully computed.

               Logic:
               ======

               This  algorithm  is  very  similar  to  DLAHQR.  Unlike DLAHQR,
               instead of sending one double shift through the  largest  unre-
               duced  submatrix,  this  algorithm sends multiple double shifts
               and spaces them apart so that there can be  parallelism  across
               several  processor row/columns.  Another critical difference is
               that this algorithm aggregrates multiple transforms together in
               order to apply them in a block fashion.

               Important  Local Variables: IBLK = The maximum number of bulges
               that can be computed.  Currently fixed.  Future  releases  this
               won't    be    fixed.    HBL    =   The   square   block   size
               (HBL=DESCA(MB_)=DESCA(NB_)) ROTN = The number of transforms  to
               block  together  NBULGE  =  The  number  of bulges that will be
               attempted on the current submatrix.  IBULGE = The current  num-
               ber  of  bulges started.  K1(*),K2(*) = The current bulge loops
               from K1(*) to K2(*).

               Subroutines: From LAPACK, this  routine  calls:  ZLAHQR      ->
               Serial  QR  used  to  determine  shifts  and eigenvalues ZLARFG
               -> Determine the Householder transforms

               This ScaLAPACK, this routine calls: PZLACONSB  -> To  determine
               where  to  start  each  iteration  ZLAMSH     -> Sends multiple
               shifts through a small submatrix to  see  how  the  consecutive
               subdiagonals  change (if PZLACONSB indicates we can start a run
               in the middle) PZLAWIL    -> Given the shift, get the transfor-
               mation  PZLACP3     -> Parallel array to local replicated array
               copy & back.  ZLAREF     -> Row/column reflector applier.  Core
               routine  here.  PZLASMSUB  -> Finds negligible subdiagonal ele-
               ments.

               Current Notes and/or Restrictions: 1.) This code  requires  the
               distributed  block  size  to  be  square  and at least six (6);
               unlike simpler codes like LU, this algorithm is extremely  sen-
               sitive to block size.  Unwise choices of too small a block size
               can lead to bad performance.  2.) This code requires A and Z to
               be  distributed  identically  and  have  identical  contxts.  A
               future version may allow Z to have a different contxt to 1D row
               map  it  to  all nodes (so no communication on Z is necessary.)
               3.) This code does not currently block the  initial  transforms
               so that none of the rows or columns for any bulge are completed
               until all are started.  To offset pipeline start-up it is  rec-
               ommended  that  at least 2*LCM(NPROW,NPCOL) bulges are used (if
               possible) 4.) The maximum number of bulges currently  supported
               is  fixed  at 32.  In future versions this will be limited only
               by the incoming WORK and IWORK array.  5.) The matrix A must be
               in  upper  Hessenberg  form.  If elements below the subdiagonal
               are nonzero, the resulting transforms may be nonsimilar.   This
               is  also  true  with  the  LAPACK routine ZLAHQR.  6.) For this
               release, this code has only been tested for RSRC_=CSRC_=0,  but
               it  has  been written for the general case.  7.) Currently, all
               the eigenvalues are  distributed  to  all  the  nodes.   Future
               releases will probably distribute the eigenvalues by the column
               partitioning.  8.) The internals of this routine are subject to
               change.  9.) To optimize this for your architecture, try tuning
               ZLAREF.  10.) This code has only been tested for WANTZ = .TRUE.
               and may behave unpredictably for WANTZ set to .FALSE.

FURTHER DETAILS
       Contributed by Mark Fahey, June, 2000.




ScaLAPACK routine               31 October 2017                     PZLAHQR(3)