PCLATTRS(3)   ScaLAPACK routine of NEC Numeric Library Collection  PCLATTRS(3)



NAME
       PCLATTRS - solve one of the triangular systems  A * x = s*b, A**T * x =
       s*b, or A**H * x = s*b,

SYNOPSIS
       SUBROUTINE PCLATTRS( UPLO, TRANS, DIAG, NORMIN, N, A, IA, JA, DESCA, X,
                            IX, JX, DESCX, SCALE, CNORM, INFO )

           CHARACTER        DIAG, NORMIN, TRANS, UPLO

           INTEGER          IA, INFO, IX, JA, JX, N

           REAL             SCALE

           INTEGER          DESCA( * ), DESCX( * )

           REAL             CNORM( * )

           COMPLEX          A( * ), X( * )

PURPOSE
       PCLATTRS  solves  one of the triangular systems A * x = s*b, A**T * x =
       s*b, or A**H * x = s*b, with scaling to prevent overflow.  Here A is an
       upper or lower triangular matrix, A**T denotes the transpose of A, A**H
       denotes the conjugate transpose of A, x and b  are  n-element  vectors,
       and  s  is a scaling factor, usually less than or equal to 1, chosen so
       that the components of x will be less than the overflow threshold.   If
       the unscaled problem will not cause overflow, the Level 2 PBLAS routine
       PCTRSV is called. If the matrix A is singular (A(j,j) = 0 for  some  j)
       then s is set to 0 and a non-trivial solution to A*x = 0 is returned.

       This  is  very  slow relative to PCTRSV.  This should only be used when
       scaling is necessary to control overflow, or when  it  is  modified  to
       scale better.

       Notes
       =====

       Each  global data object is described by an associated description vec-
       tor.  This vector stores the information required to establish the map-
       ping between an object element and its corresponding process and memory
       location.

       Let A be a generic term for any 2D block  cyclicly  distributed  array.
       Such a global array has an associated description vector DESCA.  In the
       following comments, the character _ should be read as  "of  the  global
       array".

       NOTATION        STORED IN      EXPLANATION
       --------------- -------------- --------------------------------------
       DTYPE_A(global) DESCA( DTYPE_ )The descriptor type.  In this case,
                                      DTYPE_A = 1.
       CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating
                                      the BLACS process grid A is distribu-
                                      ted over. The context itself is glo-
                                      bal, but the handle (the integer
                                      value) may vary.
       M_A    (global) DESCA( M_ )    The number of rows in the global
                                      array A.
       N_A    (global) DESCA( N_ )    The number of columns in the global
                                      array A.
       MB_A   (global) DESCA( MB_ )   The blocking factor used to distribute
                                      the rows of the array.
       NB_A   (global) DESCA( NB_ )   The blocking factor used to distribute
                                      the columns of the array.
       RSRC_A (global) DESCA( RSRC_ ) The process row over which the first
                                      row  of  the  array  A  is  distributed.
       CSRC_A (global) DESCA( CSRC_ ) The process column over which the
                                      first column of the array A is
                                      distributed.
       LLD_A  (local)  DESCA( LLD_ )  The leading dimension of the local
                                      array.  LLD_A >= MAX(1,LOCr(M_A)).

       Let K be the number of rows or columns of  a  distributed  matrix,  and
       assume that its process grid has dimension r x c.
       LOCr(  K  )  denotes  the  number of elements of K that a process would
       receive if K were distributed over the r processes of its process  col-
       umn.
       Similarly, LOCc( K ) denotes the number of elements of K that a process
       would receive if K were distributed over the c processes of its process
       row.
       The  values  of  LOCr()  and LOCc() may be determined via a call to the
       ScaLAPACK tool function, NUMROC:
               LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ),
               LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ).  An  upper
       bound for these quantities may be computed by:
               LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A
               LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A


ARGUMENTS
       UPLO    (global input) CHARACTER*1
               Specifies whether the matrix A is upper or lower triangular.  =
               'U':  Upper triangular
               = 'L':  Lower triangular

       TRANS   (global input) CHARACTER*1
               Specifies the operation applied to A.  = 'N':  Solve A  *  x  =
               s*b     (No transpose)
               = 'T':  Solve A**T * x = s*b  (Transpose)
               = 'C':  Solve A**H * x = s*b  (Conjugate transpose)

       DIAG    (global input) CHARACTER*1
               Specifies  whether  or  not the matrix A is unit triangular.  =
               'N':  Non-unit triangular
               = 'U':  Unit triangular

       NORMIN  (global input) CHARACTER*1
               Specifies whether CNORM has been set or  not.   =  'Y':   CNORM
               contains the column norms on entry
               =  'N':  CNORM is not set on entry.  On exit, the norms will be
               computed and stored in CNORM.

       N       (global input) INTEGER
               The order of the matrix A.  N >= 0.

       A       (local input) COMPLEX array, dimension (DESCA(LLD_),*)
               The triangular matrix A.  If UPLO = 'U', the  leading  n  by  n
               upper  triangular part of the array A contains the upper trian-
               gular matrix, and the strictly lower triangular part  of  A  is
               not referenced.  If UPLO = 'L', the leading n by n lower trian-
               gular part of the array A contains the lower triangular matrix,
               and  the strictly upper triangular part of A is not referenced.
               If DIAG = 'U', the diagonal elements of A are also  not  refer-
               enced and are assumed to be 1.

       IA      (global input) pointer to INTEGER
               The global row index of the submatrix of the distributed matrix
               A to operate on.

       JA      (global input) pointer to INTEGER
               The global column index of the  submatrix  of  the  distributed
               matrix A to operate on.

       DESCA   (global and local input) INTEGER array of dimension DLEN_.
               The array descriptor for the distributed matrix A.

       X       (local input/output) COMPLEX array,
               dimension  (DESCX(LLD_),*)  On  entry, the right hand side b of
               the triangular system.  On exit, X is overwritten by the  solu-
               tion vector x.

       IX      (global input) pointer to INTEGER
               The global row index of the submatrix of the distributed matrix
               X to operate on.

       JX      (global input) pointer to INTEGER
               The global column index of the  submatrix  of  the  distributed
               matrix X to operate on.

       DESCX   (global and local input) INTEGER array of dimension DLEN_.
               The array descriptor for the distributed matrix X.

       SCALE   (global output) REAL
               The  scaling  factor  s  for the triangular system A * x = s*b,
               A**T * x = s*b,  or  A**H * x = s*b.  If SCALE = 0, the  matrix
               A  is singular or badly scaled, and the vector x is an exact or
               approximate solution to A*x = 0.

       CNORM   (global input or global output) REAL array,
               dimension (N) If NORMIN = 'Y', CNORM is an input  argument  and
               CNORM(j) contains the norm of the off-diagonal part of the j-th
               column of A.  If TRANS = 'N', CNORM(j) must be greater than  or
               equal to the infinity-norm, and if TRANS = 'T' or 'C', CNORM(j)
               must be greater than or equal to the 1-norm.

               If NORMIN = 'N', CNORM  is  an  output  argument  and  CNORM(j)
               returns  the  1-norm of the offdiagonal part of the j-th column
               of A.

       INFO    (global output) INTEGER
               = 0:  successful exit
               < 0:  if INFO = -k, the k-th argument had an illegal value

FURTHER DETAILS
       A rough bound on x is computed; if that is less than  overflow,  PCTRSV
       is  called,  otherwise, specific code is used which checks for possible
       overflow or divide-by-zero at every operation.

       A columnwise scheme is used for solving A*x = b.  The  basic  algorithm
       if A is lower triangular is

            x[1:n] := b[1:n]
            for j = 1, ..., n
                 x(j) := x(j) / A(j,j)
                 x[j+1:n] := x[j+1:n] - x(j) * A[j+1:n,j]
            end

       Define bounds on the components of x after j iterations of the loop:
          M(j) = bound on x[1:j]
          G(j) = bound on x[j+1:n]
       Initially, let M(0) = 0 and G(0) = max{x(i), i=1,...,n}.

       Then for iteration j+1 we have
          M(j+1) <= G(j) / | A(j+1,j+1) |
          G(j+1) <= G(j) + M(j+1) * | A[j+2:n,j+1] |
                 <= G(j) ( 1 + CNORM(j+1) / | A(j+1,j+1) | )

       where  CNORM(j+1) is greater than or equal to the infinity-norm of col-
       umn j+1 of A, not counting the diagonal.  Hence

          G(j) <= G(0) product ( 1 + CNORM(i) / | A(i,i) | )
                       1<=i<=j
       and

          |x(j)| <= ( G(0) / |A(j,j)| ) product ( 1 + CNORM(i) / |A(i,i)| )
                                        1<=i< j

       Since |x(j)| <= M(j), we use the Level 2 PBLAS routine  PCTRSV  if  the
       reciprocal of the largest M(j), j=1,..,n, is larger than
       max(underflow, 1/overflow).

       The  bound on x(j) is also used to determine when a step in the column-
       wise method can be performed without fear of overflow.  If the computed
       bound  is  greater  than a large constant, x is scaled to prevent over-
       flow, but if the bound overflows, x is set to 0, x(j) to 1,  and  scale
       to 0, and a non-trivial solution to A*x = 0 is found.

       Similarly, a row-wise scheme is used to solve A**T *x = b  or A**H *x =
       b.  The basic algorithm for A upper triangular is

            for j = 1, ..., n
                 x(j) := ( b(j) - A[1:j-1,j]' * x[1:j-1] ) / A(j,j)
            end

       We simultaneously compute two bounds
            G(j) = bound on ( b(i) - A[1:i-1,i]' * x[1:i-1] ), 1<=i<=j
            M(j) = bound on x(i), 1<=i<=j

       The initial values are G(0) = 0, M(0) = max{b(i), i=1,..,n}, and we add
       the  constraint G(j) >= G(j-1) and M(j) >= M(j-1) for j >= 1.  Then the
       bound on x(j) is

            M(j) <= M(j-1) * ( 1 + CNORM(j) ) / | A(j,j) |

                 <= M(0) * product ( ( 1 + CNORM(i) ) / |A(i,i)| )
                           1<=i<=j

       and we can safely call PCTRSV if 1/M(n) and  1/G(n)  are  both  greater
       than max(underflow, 1/overflow).

       Last modified by: Mark R. Fahey, August 2000




ScaLAPACK routine               31 October 2017                    PCLATTRS(3)