我在MATLAB和Python中设置了两个关于广播矩阵乘法的相同测试。对于Python,我使用NumPy,对于MATLAB,我使用了使用BLAS的mtimesx库。在
MATLABclose all; clear;
N = 1000 + 100; % a few initial runs to be trimmed off at the end
a = 100;
b = 30;
c = 40;
d = 50;
A = rand(b, c, a);
B = rand(c, d, a);
C = zeros(b, d, a);
times = zeros(1, N);
for ii = 1:N
tic
C = mtimesx(A,B);
times(ii) = toc;
end
times = times(101:end) * 1e3;
plot(times);
grid on;
title(median(times));
Python
^{pr2}$
mypython是从使用OpenBLAS的pip安装的默认Python。在
我用的是英特尔NUC i3。在
MATLAB代码在1ms内运行,而Python在5.8ms内运行,我不明白为什么,因为它们似乎都在使用BLAS。在
编辑
来自水蟒:In [7]: np.__config__.show()
mkl_info:
libraries = ['mkl_rt']
library_dirs = [...]
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = [...]
blas_mkl_info:
libraries = ['mkl_rt']
library_dirs = [...]
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = [...]
blas_opt_info:
libraries = ['mkl_rt']
library_dirs = [...]
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = [...]
lapack_mkl_info:
libraries = ['mkl_rt']
library_dirs = [...]
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = [...]
lapack_opt_info:
libraries = ['mkl_rt']
library_dirs = [...]
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = [...]
从numpy使用pipIn [2]: np.__config__.show()
blas_mkl_info:
NOT AVAILABLE
blis_info:
NOT AVAILABLE
openblas_info:
library_dirs = [...]
libraries = ['openblas']
language = f77
include_dirs = [...]
从numpy使用pipIn [2]: np.__config__.show()
blas_mkl_info:
NOT AVAILABLE
blis_info:
NOT AVAILABLE
openblas_info:
library_dirs = [...]
libraries = ['openblas']
language = f77
define_macros = [('HAVE_CBLAS', None)]
blas_opt_info:
library_dirs = [...]
libraries = ['openblas']
language = f77
define_macros = [('HAVE_CBLAS', None)]
lapack_mkl_info:
NOT AVAILABLE
openblas_lapack_info:
library_dirs = [...]
libraries = ['openblas']
language = f77
define_macros = [('HAVE_CBLAS', None)]
lapack_opt_info:
library_dirs = [...]
libraries = ['openblas']
language = f77
define_macros = [('HAVE_CBLAS', None)]
编辑2
我试着用np.matmul(A, B, out=C)代替C = A @ B,结果得到了2倍的时间,比如11毫秒左右,真的很奇怪。在