1、Statistics Pooling http://danielpovey.com/files/2017_interspeech_embeddings.pdf
The statistics pooling layer calculates the mean vector μ as well as the second-order statistics, in the form of the standard deviation vector σ, over the frame-level features h_t (t = 1, ..., T).
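The formulas themselves did not survive in this text. A reconstruction in the form usually given for statistics pooling is shown below, assuming h_t is the t-th frame-level feature, T is the number of frames, and the square root is taken element-wise:

```latex
\mu = \frac{1}{T} \sum_{t=1}^{T} h_t, \qquad
\sigma = \sqrt{\frac{1}{T} \sum_{t=1}^{T} h_t \odot h_t - \mu \odot \mu}
```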
where ⊙ represents the Hadamard product.
2、Attentive Statistics Pooling https://arxiv.org/pdf/1803.10963.pdf
An attention model calculates a scalar score e_t for each frame-level feature.
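The score is usually written in the following form. The symbols v, W, b and k are assumed here to be the learnable parameters of the attention model (they are not named elsewhere in this text):

```latex
e_t = v^{\top} f(W h_t + b) + k
```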
where f(·) is a non-linear activation function, such as a tanh or ReLU function.
The scores are normalized over all frames by a softmax function so that they sum to one.
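Written out, the softmax normalization over frames takes the standard form below, where α_t is assumed to denote the normalized score of frame t:

```latex
\alpha_t = \frac{\exp(e_t)}{\sum_{\tau=1}^{T} \exp(e_\tau)}
```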
The normalized score α_t is then used as the weight in the pooling layer to calculate the weighted mean vector.
The weighted standard deviation is computed with the same weights α_t.
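The steps of attentive statistics pooling can be sketched in plain Python as follows. This is a minimal illustration, not the paper's implementation: the function name `attentive_stats_pooling` is hypothetical, the scores e_t are passed in precomputed, and the clamp at zero before the square root is an added numerical safeguard.

```python
import math

def attentive_stats_pooling(frames, scores):
    """frames: list of T feature vectors; scores: list of T scalar scores e_t."""
    # Softmax over frames (max subtracted for numerical stability).
    m = max(scores)
    exps = [math.exp(e - m) for e in scores]
    total = sum(exps)
    alphas = [x / total for x in exps]

    dim = len(frames[0])
    # Weighted mean: sum_t alpha_t * h_t
    mean = [sum(a * h[d] for a, h in zip(alphas, frames)) for d in range(dim)]
    # Weighted std: sqrt(sum_t alpha_t * h_t*h_t - mean*mean), element-wise.
    # The max(..., 0.0) clamp is an assumption added to guard against rounding.
    std = []
    for d in range(dim):
        second_moment = sum(a * h[d] * h[d] for a, h in zip(alphas, frames))
        std.append(math.sqrt(max(second_moment - mean[d] ** 2, 0.0)))
    return alphas, mean, std

# Equal scores reduce to plain (unweighted) statistics pooling.
frames = [[1.0, 2.0], [3.0, 6.0]]
alphas, mean, std = attentive_stats_pooling(frames, [0.0, 0.0])
```

With equal scores every frame gets the same weight, so the result matches ordinary statistics pooling; unequal scores shift both statistics toward the higher-scored frames.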
3、Self-Attentive Pooling https://danielpovey.com/files/2018_interspeech_xvector_attention.pdf
H = {h_1, h_2, ..., h_T}, where h_t is the hidden representation of input frame x_t captured by the hidden layer below the self-attention layer.
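The annotation matrix A is usually computed as below. This reconstruction assumes H is arranged as a d_h × T matrix whose columns are the h_t, so that A has size T × d_r:

```latex
A = \operatorname{softmax}\!\left( g\!\left( H^{\top} W_1 \right) W_2 \right)
```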
where W_1 is a matrix of size d_h × d_a; W_2 is a matrix of size d_a × d_r, and d_r is a hyperparameter that represents the number of attention heads; g(·) is some activation function, and ReLU is chosen here. The softmax(·) is performed column-wise.
Each column vector of A is an annotation vector that represents the weights for the different h_t. Finally, the weighted means E are obtained from H and A.
By increasing d_r, we can easily have multiple attention heads to learn different aspects from a speaker's speech. To encourage diversity in the annotation vectors, so that each attention head can extract dissimilar information from the same speech segment, a penalty term P is introduced when d_r > 1.
where I is the identity matrix and ‖·‖_F represents the Frobenius norm of a matrix. P is similar to L2 regularization and is minimized together with the original training cost.
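A small sketch of the weighted means and the diversity penalty follows. It assumes the commonly used forms E = HA and P = ‖AᵀA − I‖_F² (the formulas are missing from this text), with H stored as a d_h × T list of rows and A as a T × d_r list of rows. The helper names are hypothetical.

```python
def transpose(X):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def diversity_penalty(A):
    """Squared Frobenius norm of (A^T A - I) for a T x d_r annotation matrix A."""
    G = matmul(transpose(A), A)  # d_r x d_r Gram matrix of the heads
    n = len(G)
    return sum((G[i][j] - (1.0 if i == j else 0.0)) ** 2
               for i in range(n) for j in range(n))

# Two heads attending to disjoint frames versus two identical heads.
A_diverse = [[1.0, 0.0], [0.0, 1.0]]
A_same = [[0.5, 0.5], [0.5, 0.5]]

# H has columns h_1 = [1, 2] and h_2 = [3, 6]; E = H A has one column per head.
H = [[1.0, 3.0], [2.0, 6.0]]
E = matmul(H, A_same)

p_diverse = diversity_penalty(A_diverse)
p_same = diversity_penalty(A_same)
```

The penalty is zero when the heads put their weight on disjoint frames and grows when the heads overlap, which is what pushes the annotation vectors apart during training.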
4、https://ieeexplore.ieee.org/document/9053217
where the temperature is a hyperparameter.
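To illustrate what a temperature does to attention weights, here is a minimal sketch. It assumes the common convention of dividing the scores by the temperature before the softmax; the formula from the cited paper is not reproduced in this text and may place the temperature differently. The function name is hypothetical.

```python
import math

def softmax_with_temperature(scores, temperature=1.0):
    """Softmax over scores after dividing by a temperature (assumed convention)."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [x / total for x in exps]

# A low temperature sharpens the weights; a high temperature flattens them.
sharp = softmax_with_temperature([1.0, 2.0], temperature=0.5)
flat = softmax_with_temperature([1.0, 2.0], temperature=5.0)
```

Under this convention, heads with different temperatures look at the same scores at different resolutions, from nearly uniform averaging to picking out a few frames.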
5、NetVLAD https://arxiv.org/pdf/1902.10107.pdf
https://arxiv.org/pdf/1511.07247.pdf
For a more detailed explanation, see: https://zhuanlan.zhihu.com/p/96718053
6、Learnable Dictionary Encoding (LDE) https://arxiv.org/pdf/1804.05160.pdf
Here, we introduce two groups of learnable parameters. One is the dictionary component centers, denoted μ = {μ_1, μ_2, ...}.
where the smoothing factor for each dictionary center is learnable.
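A rough sketch of how the dictionary centers and smoothing factors can be combined is given below. It assumes each frame is softly assigned to every center with weight proportional to exp(−s_c‖h_t − μ_c‖²), and that the weighted residuals are averaged over the number of frames; the exact normalization in the cited paper may differ. The name `lde_pool` and the symbol s_c for the smoothing factors are assumptions.

```python
import math

def lde_pool(frames, centers, smoothing):
    """frames: T vectors; centers: C vectors mu_c; smoothing: C scalars s_c.
    Returns C aggregated residual vectors, one per dictionary center."""
    T, dim, C = len(frames), len(frames[0]), len(centers)
    encoded = [[0.0] * dim for _ in range(C)]
    for h in frames:
        # Residual of this frame to every center.
        residuals = [[h[d] - mu[d] for d in range(dim)] for mu in centers]
        # Soft-assignment logits: -s_c * ||h - mu_c||^2
        logits = [-s * sum(r * r for r in res) for s, res in zip(smoothing, residuals)]
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        z = sum(exps)
        for c in range(C):
            w = exps[c] / z  # assignment weight of this frame to center c
            for d in range(dim):
                # Averaging over T is an assumed normalization.
                encoded[c][d] += w * residuals[c][d] / T
    return encoded

frames = [[0.0, 0.0], [2.0, 0.0]]
centers = [[0.0, 0.0], [2.0, 0.0]]
encoded = lde_pool(frames, centers, smoothing=[1.0, 1.0])
```

A larger smoothing factor makes the assignment to that center harder, so a frame contributes almost only to its nearest center; the concatenation of the per-center vectors would form the utterance-level representation.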
7、https://www.isca-speech.org/archive/Interspeech_2020/pdfs/1922.pdf
Specifically, let H ∈ R^{L×D} be the frame-level feature map captured by the hidden layer below the self-attention layer, where L and D are the number of frames and the feature dimension respectively. Then the attention map A ∈ R^{L×K} can be obtained by feeding H into a 1×1 convolutional layer with linear activation, where K is the number of attention heads. The 1st-order and 2nd-order attentive statistics of H are then computed from H and A.
where T1(x) is the operation of reshaping x into a vector, and T2(x) includes a signed square-root step and an L2-normalization step.
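The two operations T1 and T2 can be sketched as below. This is a minimal illustration under the usual reading of "signed square root" as sign(x)·sqrt(|x|); the function names and the small `eps` added to the norm are assumptions.

```python
import math

def t1_flatten(matrix):
    """T1: reshape a matrix (list of rows) into a single vector."""
    return [v for row in matrix for v in row]

def t2_signed_sqrt_l2(x, eps=1e-12):
    """T2: element-wise signed square root, then L2 normalization."""
    rooted = [math.copysign(math.sqrt(abs(v)), v) for v in x]
    norm = math.sqrt(sum(v * v for v in rooted))
    # eps is an added safeguard against division by zero.
    return [v / (norm + eps) for v in rooted]

flat = t1_flatten([[9.0], [-16.0]])
y = t2_signed_sqrt_l2(flat)
```

The signed square root compresses large values while keeping their sign, and the L2 step puts the pooled vector on the unit sphere.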
8、Short-Time Spectral Pooling (STSP) https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9414094