1 | 1 | OpenBLAS ChangeLog |
| 2 | +==================================================================== |
| 3 | +Version 0.3.8 |
| 4 | + 9-Feb-2020 |
| 5 | + |
| 6 | +common: |
| 7 | + * LAPACK has been updated to 3.9.0 (plus patches up to |
| 8 | + January 2nd, 2020) |
| 9 | + * CMAKE support has been improved in several areas including |
| 10 | + cross-compilation |
| 11 | + * a thread race condition in the GEMM3M kernels was resolved |
| 12 | + * the "generic" (plain C) gemm beta kernel used by many targets |
| 13 | + has been sped up |
| 14 | + * an optimized version of the LAPACK trtrs functions has been added |
| 15 | + * an incompatibility between the LAPACK tests and the OpenBLAS |
| 16 | + implementation of XERBLA was resolved, removing the numerous |
| 17 | + warnings about wrong error exits in the former |
| 18 | + * support for NetBSD has been added |
| 19 | + * support for compilation with g95 and non-GNU versions of ld |
| 20 | + has been improved |
| 21 | + * support for compilation with (upcoming) gcc 10 has been added |
| 22 | + |
| 23 | +POWER: |
| 24 | + * worked around miscompilation of several POWER8 and POWER9 |
| 25 | + kernels by older versions of gcc |
| 26 | + * added support for big-endian POWER8 and for compilation on AIX |
| 27 | + * corrected bugs in the big-endian support for PPC440 and PPC970 |
| 28 | + * DYNAMIC_ARCH support is now available in CMAKE builds as well |
| 29 | + |
| 30 | +ARMV8: |
| 31 | + * performance of DGEMM_BETA and SGEMM_NCOPY has been improved |
| 32 | + * compilation for 32bit works again |
| 33 | + * performance of the RPCC function has been improved |
| 34 | + * improved performance on small systems |
| 35 | + * DYNAMIC_ARCH support is now available in CMAKE builds as well |
| 36 | + * cross-compilation from OSX to IOS was simplified |
| 37 | + |
| 38 | +x86_64: |
| 39 | + * a new AVX512 DGEMM kernel was added and the AVX512 SGEMM kernel |
| 40 | + was significantly improved |
| 41 | + * optimized AVX512 kernels for CGEMM and ZGEMM have been added |
| 42 | + * AVX2 kernels for STRMM, SGEMM, and CGEMM have been significantly |
| 43 | + sped up and optimized CGEMM3M and ZGEMM3M kernels have been added |
| 44 | + * added support for QEMU virtual cpus |
| 45 | + * a compilation problem with PGI and SUN compilers was fixed |
| 46 | + * Intel "Goldmont plus" is now autodetected |
| 47 | + * a potential crash on program exit on MS Windows has been fixed |
| 48 | + |
| 49 | +x86: |
| 50 | + * an unwanted case sensitivity in the implementation of LSAME |
| 51 | + on older 32bit AMD cpus was fixed |
| 52 | + |
| 53 | +zarch: |
| 54 | + * Z15 is now supported as Z14 |
| 55 | + * DYNAMIC_ARCH is now available on ZARCH as well |
| 56 | + |
2 | 57 | ==================================================================== |
3 | 58 | Version 0.3.7 |
4 | 59 | 11-Aug 2019 |
5 | 60 |
6 | 61 | common: |
7 | | - * having the gmake special variables TARGET_ARCH or TARGET_MACH |
8 | | - defined no longer causes build failures in ctest or utest |
9 | | - * defining NO_AFFINITY or USE_TLS to 0 in gmake builds no longer |
10 | | - has the same effect as setting them to 1 |
11 | | - * a new test program was added to allow checking the library for |
12 | | - thread safety |
13 | | - * a new option USE_LOCKING was added to ensure thread safety when |
14 | | - OpenBLAS itself is built without multithreading but will be |
15 | | - called from multiple threads. |
16 | | - * a build failure on Linux with glibc versions earlier than 2.5 |
17 | | - was fixed |
18 | | - * a runtime error with CPU enumeration (and NO_AFFINITY not set) |
19 | | - on glibc 2.6 was fixed |
20 | | - * NO_AFFINITY was added to the CMAKE options (and defaults to being |
21 | | - active on Linux, as in the gmake builds) |
| 62 | + * having the gmake special variables TARGET_ARCH or TARGET_MACH |
| 63 | + defined no longer causes build failures in ctest or utest |
| 64 | + * defining NO_AFFINITY or USE_TLS to 0 in gmake builds no longer |
| 65 | + has the same effect as setting them to 1 |
| 66 | + * a new test program was added to allow checking the library for |
| 67 | + thread safety |
| 68 | + * a new option USE_LOCKING was added to ensure thread safety when |
| 69 | + OpenBLAS itself is built without multithreading but will be |
| 70 | + called from multiple threads. |
| 71 | + * a build failure on Linux with glibc versions earlier than 2.5 |
| 72 | + was fixed |
| 73 | + * a runtime error with CPU enumeration (and NO_AFFINITY not set) |
| 74 | + on glibc 2.6 was fixed |
| 75 | + * NO_AFFINITY was added to the CMAKE options (and defaults to being |
| 76 | + active on Linux, as in the gmake builds) |
22 | 77 |
23 | 78 | x86_64: |
24 | | - * the build-time logic for detection of AVX512 availability in |
25 | | - the processor and compiler was fixed |
26 | | - * gmake builds on OSX now set the internal name of the library to |
27 | | - libopenblas.0.dylib (consistent with CMAKE) |
28 | | - * the Haswell DGEMM kernel received a significant speedup through |
29 | | - improved prefetch and load instructions |
30 | | - * performance of DGEMM, DTRMM, DTRSM and ZDOT on Zen/Zen2 was markedly |
31 | | - increased by avoiding vpermpd instructions |
32 | | - * the SKYLAKEX (AVX512) DGEMM helper functions have now been disabled |
33 | | - to fix remaining errors in DGEMM, DSYMM and DTRMM |
34 | | - |
35 | | -## POWER: |
36 | | - * added support for building on FreeBSD/powerpc64 and FreeBSD/ppc970 |
37 | | - * added optimized kernels for POWER9 single and double precision complex BLAS3 |
38 | | - * added optimized kernels for POWER9 SGEMM and STRMM |
39 | | - |
40 | | -## ARMV7: |
41 | | - * fixed the softfp implementations of xAMAX and IxAMAX |
42 | | - * removed the predefined -march= flags on both ARMV5 and ARMV6 as |
43 | | - they were appropriate for only a subset of platforms |
| 79 | + * the build-time logic for detection of AVX512 availability in |
| 80 | + the processor and compiler was fixed |
| 81 | + * gmake builds on OSX now set the internal name of the library to |
| 82 | + libopenblas.0.dylib (consistent with CMAKE) |
| 83 | + * the Haswell DGEMM kernel received a significant speedup through |
| 84 | + improved prefetch and load instructions |
| 85 | + * performance of DGEMM, DTRMM, DTRSM and ZDOT on Zen/Zen2 was markedly |
| 86 | + increased by avoiding vpermpd instructions |
| 87 | + * the SKYLAKEX (AVX512) DGEMM helper functions have now been disabled |
| 88 | + to fix remaining errors in DGEMM, DSYMM and DTRMM |
| 89 | + |
| 90 | +POWER: |
| 91 | + * added support for building on FreeBSD/powerpc64 and FreeBSD/ppc970 |
| 92 | + * added optimized kernels for POWER9 SGEMM and STRMM |
| 93 | + |
| 94 | +ARMV7: |
| 95 | + * fixed the softfp implementations of xAMAX and IxAMAX |
| 96 | + * removed the predefined -march= flags on both ARMV5 and ARMV6 as |
| 97 | + they were appropriate for only a subset of platforms |
44 | 98 |
45 | 99 | ==================================================================== |
46 | 100 | Version 0.3.6 |