计算二维离散随机变量的联合概率分布函数,设二维随机变量的联合概率分布如下所示

一. 定义

Joint probability distribution:

给定至少两个随机变量X,Y,…, 它们的联合概率分布(Joint probability distribution)指的是每一个随机变量的值落入特定范围或者离散点集合内的概率. 对于只有两个随机变量的情况, 称为二元分布(bivariate distribution).

联合概率分布可以使用联合累计分布函数(joint cumulative distribution function), 连续随机变量的联合概率密度函数(joint probability density function)或者离散变量的联合概率质量函数(joint probability mass function)来描述. 由此又衍生出两个概念: 边缘分布(marginal distribution)和条件概率分布(conditional probability distribution).

二. 离散变量的联合概率质量函数公式

公式:

是给定 X=x 的 Y=y 的条件概率.
而且有:

如果 X 和Y相互独立:

如果 X 和Y条件不独立(conditionally dependent):
P(X=x and Y=y)=P(X=x)⋅P(Y=y|X=x)
也可以使用联合累计分布函数的差分来计算:
联合累计分布函数定义是:

所以 F(x,y) 的导数(差分)就是 P(X=x and Y=y)

三. 使用Matlab计算离散2D联合分布

参考: Calculating a 2D joint probability distribution
离散2D联合分布可用于计算两张图片的互信息MI.

0. 定义两个离散的随机变量.

有N个点分布在边长为1的正方形区域内. 把正方形分为K1*K2的小矩形. 统计每个小矩形内的点的个数.

% DataN = 1e5; % number of pointsxy = rand(N, 2); % coordinates of pointsxy(dcdjz(2*N, 100, 1)) = 0; % add some points on one sidexy(dcdjz(2*N, 100, 1)) = 1; % add some points on the other sidexy(dcdjz(N, 100, 1), :) = 0; % add some points on one cornerxy(dcdjz(N, 100, 1), :) = 1; % add some points on one cornerinds= unique(dcdjz(N, 100, 1)); xy(inds, :) = repmat([0 1], numel(inds), 1); % add some points on one cornerinds= unique(dcdjz(N, 100, 1)); xy(inds, :) = repmat([1 0], numel(inds), 1); % add some points on one corner% Intervals for rectanglesK1 = ceil(sqrt(N/5)); % number of intervals along xK2 = K1; % number of intervals along yint_x = [0:(1 / K1):1]; % intervals along xint_y = [0:(1 / K2):1]; % intervals along y 1. 从定义出发, 使用for循环: ticcount_cells = zeros(K1, K2);for k1 = 1:K1 inds1 = (xy(:, 1) >= int_x(k1)) & (xy(:, 1) < int_x(k1 + 1)); for k2 = 1:K2 inds2 = (xy(:, 2) >= int_y(k2)) & (xy(:, 2) < int_y(k2 + 1)); count_cells(k1, k2) = sum(inds1 .* inds2);% 布尔相乘得到交集点的个数 endendtoc% Elapsed time is 39.357691 seconds.

可见使用两重循环的计算时间非常长.

2. 使用hist3函数

N=hist3(X,'Edges',edges)是matlab中专门计算二元分布的函数.
edges是包含两个递增array的cell. 第一维分组edge1是edges{1}, 第二维分组edge2是edges{2}.
也就是:
edges1(i)<=X(k,1)<edges1(i+1)
edges2(j)<=X(k,2)<edges2(j+1)
正好落在 edges1(i+1) 或者 edges2(j+1) 上的点的个数放在N的最后一行或者最后一列.
hist3不统计edges范围外的部分.
N是一个二维矩阵, 统计的落到每个单元格内的点的个数.

ticcount_cells_hist = hist3(xy, 'Edges', {int_x int_y});% 注意hist3得到的矩阵是K1+1*K2+1的, 所以把最后一行和一列去掉.% 最后一行或一列表示的是 X(k,1)= edges{1}(end)或者X(k,2) = edges{2}(end)的点数count_cells_hist(end, :) = []; count_cells_hist(:, end) = [];tocall(count_cells(:) == count_cells_hist(:))% Elapsed time is 0.017995 seconds.

显然比用两重for循环快多了.

3. 使用矩阵二元操作bsxfun

C = bsxfun(fun,A,B)对A和B做逐个元素的二元操作, 操作由函数 fun指定.
返回的C中, 1表示满足条件, 0 表示不满足条件. 可用的fun有:

funoperation@plusPlus@minusMinus@timesArraymultiply@rdivideRightarray divide@ldivideLeftarray divide@powerArray power@maxBinary maximum@minBinary minimum@remRemainder after division@modModulus after division@atan2Four-quadrant inverse tangent; result in radians@atan2dFour-quadrant inverse tangent; result in degrees@hypotSquare root of sum of squares@eqEqual@复杂的冰淇淋 equal@ltLess than@leLess than or equal to@gtGreater than@geGreater than or equal to@andElement-wiselogical AND@orElement-wiselogical OR@xorLogicalexclusive OR

使用bsxfun的matlab代码:

%% bsxfunticxcomps = single(bsxfun(@ge,xy(:,1),int_x));% 10000*143矩阵ycomps = single(bsxfun(@ge,xy(:,2),int_y));% 10000*143矩阵% 相当于求CDFcount_again = xcomps.' * ycomps; %' 143x143 = 143x1e5 * 1e5x143% 差分后是142*142count_again_fix = diff(diff(count_again')');toc% Elapsed time is 0.178316 seconds.all(count_cells_hist(:) == count_again_fix(:))

bsxfun稍逊于hist3, 可以针对没有statistics toolbox的情况下使用.

4. 使用accumarray

A= accumarray(subs,val)使用subs的元素值作为索引. subs和val是一一对应的. 将subs中相同值对应的val值累加. 也就是说, subs中元素的位置决定了val哪些元素相加, subs中元素的值决定了累加值在输出中的位置. 看matlab help中示例:

Example 1
Create a 5-by-1 vector and sum values for repeated 1-D subscripts:
val = 101:105;
subs = [1; 2; 4; 2; 4];
A = accumarray(subs, val)

A =
101 % A(1) = val(1) = 101
206 % A(2) = val(2)+val(4) = 102+104 = 206
0 % A(3) = 0
208 % A(4) = val(3)+val(5) = 103+105 = 208

subs中元素值必须是正整数值. 所以在表示分组时, 可以把[0,1]区间变为[1,K1]区间. matlab代码:

%%%%% 第五种方法Using accumarray% Another approach is to use accumarray to make the joint histogram after we 文艺的花卷 the data.% Starting with int_x, int_y, K1, xy, etc.:tic% take (0,1) data onto [1 K1], following A.Dondas approach for easy comparisonii = floor(xy(:,1)*(K1-eps))+1; ii(ii<1) = 1; ii(ii>K1) = K1;zrdttt = floor(xy(:,2)*(K1-eps))+1; zrdttt(zrdttt<1) = 1; zrdttt(zrdttt>K1) = K1;% create the histogram and normalizeH = accumarray([ii zrdttt],ones(1,size(ii,1)));PDF = H / size(xy,1); % for probabilities summing to 1toc% Elapsed time is 0.006356 seconds.all(count_cells_hist(:) == count_again_fix(:))

ms级别! 真是快!

5. 使用mex编译

mex混合编程参考: 在Matlab中使用mex函数进行C/C++混合编程

#include "mex.h"// http://stackoverflow.com/questions/19745917/calculating-a-2d-joint-probability-distributionvoid mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[]){ unsigned long int hh, ctrl; /* counters */ unsigned long int N, m, n; /* size of matrices */ unsigned long int *xy; /* data */ unsigned long int *count_cells; /* joint frequencies */ /* matrices needed */ mxArray *count_cellsArray;/* Now we need to get the data */ if (nrhs == 3) { xy = (unsigned long int*) mxGetData(prhs[0]); N = (unsigned long int) mxGetM(prhs[0]);//取矩阵的行数 m = (unsigned long int) mxGetScalar(prhs[1]); n = (unsigned long int) mxGetScalar(prhs[2]); }/* Then build the matrices for the output */ count_cellsArray = mxCreateNumericMatrix(m + 1, n + 1, mxUINT32_CLASS, mxREAL); count_cells = mxGetData(count_cellsArray); plhs[0] = count_cellsArray; hh = 0; /* counter for elements of xy */ /* for all points from 1 to N */ for(hh=0; hh<N; hh++) { ctrl = (m + 1) * xy[N + hh] + xy[hh]; count_cells[ctrl] = count_cells[ctrl] + 1; }}

将代码保存为: joint_dist_points_2D.c. 在matlab cmd中运行:

mex joint_dist_points_2D.c

生成joint_dist_points_2D.mexw32文件.
matlab调用代码:

% Use mex functionticxy2 = uint32(floor(xy ./ repmat([1 / K1, 1 / K2], N, 1)));count_cells = joint_dist_points_2D(xy2, uint32(K1), uint32(K2));toc% Elapsed time is 0.011696 seconds.

也是非常快的.