Alternative VE Offloading
2.7.3
VEO requires 32 huge pages per VE context. Make sure the system is configured with huge pages by checking /proc/meminfo.
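The check above can be sketched in C as follows. This is a minimal illustration, not part of VEO: the helper names `read_meminfo`, `meminfo_field`, and `enough_huge_pages` are hypothetical, and comparing against `HugePages_Free` (rather than `HugePages_Total`) is an assumption about what "configured" should mean in practice.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read /proc/meminfo into buf (NUL-terminated).
 * Returns 0 on success, -1 if the file cannot be opened. */
int read_meminfo(char *buf, size_t size)
{
    FILE *fp = fopen("/proc/meminfo", "r");
    if (fp == NULL)
        return -1;
    size_t n = fread(buf, 1, size - 1, fp);
    buf[n] = '\0';
    fclose(fp);
    return 0;
}

/* Hypothetical helper: extract a numeric field such as "HugePages_Free"
 * from text in /proc/meminfo format. Returns -1 if the field is absent. */
long meminfo_field(const char *text, const char *key)
{
    assert(text != NULL && key != NULL);
    size_t keylen = strlen(key);
    const char *line = text;
    while (line != NULL && *line != '\0') {
        /* A matching line starts with "<key>:" */
        if (strncmp(line, key, keylen) == 0 && line[keylen] == ':') {
            long value;
            if (sscanf(line + keylen + 1, "%ld", &value) == 1)
                return value;
            return -1;
        }
        /* Advance to the start of the next line, if any. */
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return -1;
}

/* Returns 1 if at least 32 free huge pages per context are reported
 * (32 is the figure stated in the text above), otherwise 0. */
int enough_huge_pages(const char *text, long contexts)
{
    long free_pages = meminfo_field(text, "HugePages_Free");
    return free_pages >= 0 && free_pages >= 32 * contexts;
}
```

A caller would typically fill a buffer of a few kilobytes with `read_meminfo()` and pass it to `enough_huge_pages()` together with the number of VE contexts it plans to open.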
VEO does not support quadruple precision real numbers or variable-length character strings as arguments of Fortran subroutines and functions.
VEO does not support quadruple precision real numbers or variable-length character strings as return values of Fortran functions.
When veo_proc_create() is invoked, multiple threads for an OpenMP program are created on the VE side in the default context. If you do not use OpenMP, set the environment variable VE_OMP_NUM_THREADS=1.
If you use more than one VE context inside one proc, restrict each context to a single OpenMP thread. Multiple contexts with multiple OpenMP threads do not work.
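The variable is normally exported in the shell before the program starts. As a sketch, it could also be set from the host program itself. The helper name `request_single_ve_omp_thread` is hypothetical, and the sketch assumes that the environment of the calling process is what the VE side sees when veo_proc_create() is invoked, so the helper would have to run before that call.

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv() */
#include <stdlib.h>

/* Hypothetical helper: request a single OpenMP thread on the VE side.
 * Assumption: this must be called before veo_proc_create() so that the
 * setting is already in the environment when the VE process is created.
 * Returns 0 on success, -1 on failure (as setenv() does). */
int request_single_ve_omp_thread(void)
{
    /* Third argument 1: overwrite any existing value. */
    return setenv("VE_OMP_NUM_THREADS", "1", 1);
}
```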
Synchronous APIs wait for the completion of previous requests submitted by asynchronous APIs. Synchronous APIs are below:
The total size of the arguments passed to a function is limited to 63 MB, because the initial stack is 64 MB. When you have huge argument arrays to pass, allocate memory buffers on the heap and use those instead.
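A host program can guard against this limit before building the argument list. The following is a minimal sketch; the macro `STACK_ARG_LIMIT_BYTES` and the function `args_fit_on_stack` are hypothetical names, and treating 1 MB as 1024 * 1024 bytes is an assumption.

```c
#include <stddef.h>

/* 63 MB limit taken from the text; 1 MB = 1024 * 1024 bytes is assumed. */
#define STACK_ARG_LIMIT_BYTES ((size_t)63 * 1024 * 1024)

/* Hypothetical helper: returns 1 if arguments of the given total size
 * stay within the limit, 0 if they should be placed in a heap buffer
 * instead. */
int args_fit_on_stack(size_t total_arg_bytes)
{
    return total_arg_bytes <= STACK_ARG_LIMIT_BYTES;
}
```

When the check fails, one option is to allocate a buffer on the VE with veo_alloc_mem(), copy the data with veo_write_mem(), and pass the resulting VE address as the argument.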
If a user specifies an incorrect address to veo_write_mem() or veo_async_write_mem(), veo_write_mem() is expected to return -1 and veo_call_wait_result() is expected to return VEO_COMMAND_ERROR, but VEO_COMMAND_OK may be returned instead. The user can still detect the error, because subsequent calls to the VEO API will return an error.
The transfer speed of veo_write_mem() or veo_async_write_mem() can become slow depending on the memory location (NUMA node) of the write destination. Running the program via numactl may make the transfer speed consistently high. Note that the optimal command options depend on the operating conditions of the machine and software. Run one of the following commands and check whether the transfer speed becomes consistently high.
numactl --localalloc <filename>
numactl --cpunodebind=<NUMA node> --localalloc <filename>