0:00:39 | i will present |

0:00:43 | some techniques for extracting i-vectors |

0:00:46 | efficiently |

0:00:49 | we went looking for some way to address |

0:00:52 | the memory occupation of the i-vector extractor |

0:00:58 | so state-of-the-art technology nowadays is based on i-vectors which give |

0:01:04 | very good accuracy |

0:01:08 | but the computation of i-vectors can be quite demanding in terms of memory and |

0:01:13 | time |

0:01:15 | so while some solutions |

0:01:19 | have been proposed for i-vector extraction with low memory requirements namely |

0:01:24 | the diagonalized approximation to i-vector extraction |

0:01:31 | these were also shown to |

0:01:38 | suffer some degradation of accuracy so we |

0:01:42 | were looking for a solution which does not incur such degradation |

0:01:46 | but still allows us |

0:01:49 | to greatly reduce the amount of memory required to store the extractor |

0:01:54 | so here is an outline of the presentation |

0:01:59 | first we recall the original bayesian derivation of i-vector extraction which |

0:02:04 | you may have seen in the previous talks |

0:02:07 | then we present our variational bayes and conjugate gradient approaches for i-vector extraction and finally present some experimental |

0:02:14 | results for these techniques |

0:02:17 | so |

0:02:20 | i guess everybody here knows what i-vectors are but here is a brief introduction |

0:02:25 | they are low dimensional informative representations of each utterance which are |

0:02:30 | derived from a generative model |

0:02:34 | so the most widely used |

0:02:36 | i-vector framework |

0:02:39 | assumes that |

0:02:41 | most of the speaker and channel variability lies in a small subspace of the supervector space |

0:02:47 | then we assume a gaussian prior for the latent variable representing this variability |

0:02:54 | and |

0:02:55 | approximating the data likelihood by means of the zero and first order statistics we can compute the posterior |

0:03:00 | of this latent variable |

0:03:02 | and then we compute the i-vector as the maximum a posteriori estimate of the latent variable |

0:03:11 | we can show that the posterior |

0:03:13 | is gaussian and these expressions correspond to the posterior covariance |

0:03:17 | and to the i-vector |
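The exact extraction just described can be sketched as follows. This is a minimal NumPy illustration with tiny placeholder dimensions; the names `T`, `Sigma_inv`, `N`, `F` are assumptions for the eigenvoice matrix, inverse UBM covariances, and zero/first-order statistics, not notation taken from the slides:

```python
import numpy as np

def extract_ivector(T, Sigma_inv, N, F):
    """Exact MAP i-vector for one utterance.
    T: (C, F_dim, M) eigenvoice matrix, one block per Gaussian
    Sigma_inv: (C, F_dim, F_dim) inverse UBM covariances
    N: (C,) zero-order statistics
    F: (C, F_dim) centered first-order statistics"""
    C, F_dim, M = T.shape
    L = np.eye(M)                    # posterior precision, starting from the prior
    b = np.zeros(M)
    for c in range(C):
        TtS = T[c].T @ Sigma_inv[c]  # M x F_dim
        L += N[c] * (TtS @ T[c])     # accumulate N_c * T_c' Sigma_c^-1 T_c
        b += TtS @ F[c]              # project the statistics onto the subspace
    w = np.linalg.solve(L, b)        # posterior mean = MAP i-vector
    return w, L
```

Note that the per-utterance precision `L` depends on the zero-order statistics, so it must be rebuilt (or solved implicitly) for every utterance; that is the cost the talk goes on to analyze.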

0:03:19 | so as you can see here |

0:03:22 | computing the i-vector requires computing this precision matrix |

0:03:26 | which entails a multiplication of the inverse of this matrix times the projected statistics |

0:03:32 | that is |

0:03:33 | the inversion of |

0:03:38 | a matrix |

0:03:41 | with a dimensionality which is the i-vector dimensionality |

0:03:47 | so |

0:03:48 | we can see |

0:03:50 | how the different extraction techniques can be compared |

0:03:55 | here C represents the number of gaussians and F the feature dimensionality |

0:04:02 | and M is the i-vector dimensionality |

0:04:05 | so if we don't precompute anything we have a |

0:04:09 | complexity which is |

0:04:11 | quadratic in the i-vector dimensionality |

0:04:15 | and linear in the number of gaussians and in the dimensionality of the features |

0:04:21 | we can reduce the time complexity by precomputing and storing these matrices |

0:04:28 | but in this case we have a severe memory constraint which is again quadratic in |

0:04:33 | the i-vector dimensionality and proportional to the number of gaussians |

0:04:38 | with |

0:04:39 | typical values like two thousand forty eight gaussians in the ubm as used in this work this is |

0:04:46 | easily the most expensive |

0:04:47 | part in terms of memory of an i-vector extractor |
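As a rough sanity check on that claim, the memory for the precomputed per-Gaussian matrices grows as C times M squared. The snippet below assumes the 2048 Gaussians stated in the talk, an illustrative i-vector dimension of 400 (not a value confirmed by the transcript), and 4-byte floats:

```python
# Rough memory footprint of precomputing T_c' Sigma_c^-1 T_c for every Gaussian.
C = 2048                 # UBM components, as stated in the talk
M = 400                  # i-vector dimension: an illustrative assumption
bytes_per_float = 4      # single precision

full = C * M * M * bytes_per_float   # one dense M x M matrix per Gaussian
print(f"{full / 2**30:.2f} GiB")     # about 1.22 GiB; symmetric storage halves it
```

Even with symmetric storage this dwarfs the rest of the extractor, which is the motivation for the approximations that follow.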

0:04:52 | so in one of the latest works an optimization based on an eigen |

0:04:58 | decomposition of the per gaussian matrices was proposed |

0:05:02 | essentially i forgot to mention that we can obtain the same i-vectors |

0:05:10 | from this form just by performing a normalization of the first order statistics and |

0:05:15 | in this case of the eigenvoice matrix |

0:05:18 | then we can assume that these matrices are simultaneously diagonalizable by some matrix |

0:05:23 | Q and then we can compute an approximation of the posterior covariance which is |

0:05:30 | diagonal so that |

0:05:32 | i-vector extraction |

0:05:34 | can be performed in a much faster way with very limited additional memory requirements |

0:05:40 | however this |

0:05:43 | approximation can cause a degradation of recognition accuracy |

0:05:47 | so we wanted to do better in terms of accuracy |

0:05:52 | so |

0:05:53 | as we said the problem is the computation of the covariance matrix |

0:05:59 | the problem is that the covariance matrix is not diagonal |

0:06:03 | if it were |

0:06:05 | this would mean that the i-vector components would be uncorrelated |

0:06:10 | and the posterior would factorize |

0:06:14 | so even though the exact posterior cannot be factorized over the different components we |

0:06:19 | look for an approximation of the posterior which factorizes over subsets of the i-vector |

0:06:25 | components |

0:06:27 | so we partition the i-vector components into disjoint sets |

0:06:32 | and we assume that the |

0:06:33 | posterior can be approximated by |

0:06:36 | a distribution which factorizes over these sets |

0:06:41 | the variational bayes framework provides a |

0:06:44 | way to estimate this approximate posterior |

0:06:48 | by minimizing the kl divergence between the original posterior and this approximation |

0:06:55 | so |

0:06:58 | here i need to introduce some notation |

0:07:00 | namely we |

0:07:03 | denote the |

0:07:05 | subset of the eigenvoices associated to each block |

0:07:09 | of the i-vector components |

0:07:12 | each V i is associated with a block w i of i-vector components |

0:07:18 | and these are the complements of those |

0:07:20 | subsets so that we can express |

0:07:24 | the factorization in this way |

0:07:26 | so if we apply the standard update for each |

0:07:31 | factor of the approximate posterior |

0:07:35 | its distribution is again gaussian with an expression which is very similar to the |

0:07:41 | original i-vector expression |

0:07:43 | the difference is that the precision matrix here is computed using the eigenvoices relative |

0:07:49 | to this subset |

0:07:51 | and for the mean of the posterior we are essentially centering the statistics around a |

0:07:58 | slightly different ubm |

0:08:00 | essentially we |

0:08:02 | can say that |

0:08:04 | if we assume that the other components of the i-vector are fixed then we |

0:08:10 | can absorb their contribution into the statistics of this new ubm |

0:08:18 | this also allows us to see what the complexity of this approach would be |

0:08:24 | but we do have to take |

0:08:27 | care in implementing this technique because |

0:08:32 | if we just recompute these centered statistics every time with a block |

0:08:37 | of size one |

0:08:39 | the complexity is again quadratic in the i-vector dimensionality because every time we would be re |

0:08:44 | centering the full set of statistics |

0:08:47 | so what we need is |

0:08:50 | to keep a supervector of first order statistics which is always kept centered around |

0:08:57 | the current i-vector estimate |

0:09:00 | when we update a block the new mean is computed by removing the contribution |

0:09:07 | of only those components that we are estimating and then after we update the |

0:09:13 | mean we update the vector of first order statistics so that it stays centered |

0:09:19 | around the current i-vector estimate |

0:09:22 | so this way if we set aside the contribution of computing the precision matrices the |

0:09:28 | complexity of this approach is proportional to the dimensionality of the i-vector and to the number of |

0:09:35 | iterations that we need to perform |

0:09:37 | to compute the i-vector |
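At the linear-algebra level, the block-wise variational update just described amounts to a block Gauss-Seidel sweep on the system L w = b, where L is the full posterior precision and b the projected statistics. The sketch below uses an explicit L for clarity rather than the statistics re-centering trick the speaker describes, so it illustrates the iteration, not the memory savings:

```python
import numpy as np

def vb_ivector(L, b, block, n_iter=50):
    """Block Gauss-Seidel sweep on L w = b, mirroring the block-factorized
    variational update (L: (M, M) posterior precision, b: (M,) projected
    statistics, block: size of the disjoint component subsets)."""
    M = len(b)
    w = np.zeros(M)
    for _ in range(n_iter):
        for start in range(0, M, block):
            i = slice(start, min(start + block, M))
            # right-hand side re-centered around the current estimate of the
            # other blocks: remove every contribution except this block's own
            r = b[i] - L[i] @ w + L[i, i] @ w[i]
            w[i] = np.linalg.solve(L[i, i], r)  # solve the small block system
    return w
```

With `block=1` this reduces to scalar Gauss-Seidel; larger blocks solve bigger sub-systems per step, which matches the speed/memory trade-off discussed next.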

0:09:41 | you can see here the similarity of this form with the original i-vector |

0:09:46 | formulation the covariance matrices are essentially the diagonal blocks of the original precision matrix |

0:09:52 | and we can adopt |

0:09:54 | again |

0:09:55 | two different techniques to compute them |

0:09:59 | we can perform the computation of these covariance matrices every time |

0:10:04 | or we can store the diagonal blocks of the precision matrix and in this |

0:10:09 | case we get |

0:10:11 | faster extraction time but slightly higher memory and the memory requirements depend on the |

0:10:16 | size we choose for the blocks |

0:10:19 | so essentially we can show that this variational bayes approach |

0:10:26 | implements a gauss seidel approach to the solution of this linear system |

0:10:32 | and we also investigated different |

0:10:35 | techniques for |

0:10:37 | solving linear systems namely the jacobi method and the conjugate gradient method |

0:10:43 | what we found out is that the jacobi method is very similar to this approach |

0:10:47 | but instead of updating the |

0:10:50 | i-vector after each block the i-vector is updated only after all components have |

0:10:55 | been estimated |

0:10:56 | and in our experience this causes slightly slower |

0:11:02 | convergence rates |

0:11:06 | then we analyzed the conjugate gradient method |

0:11:09 | what is nice about conjugate gradient is that we don't need to store |

0:11:16 | the precision matrix here |

0:11:19 | in fact we don't even need to compute it explicitly because we |

0:11:23 | just need the product of this matrix times a generic vector which is |

0:11:27 | what the conjugate gradient algorithm requires |

0:11:31 | so if we write the computation of this |

0:11:34 | product in this way we can see that it |

0:11:38 | can be evaluated with a cost which is linear |

0:11:41 | in the number of components of the |

0:11:46 | ubm |

0:11:47 | the number of features and the dimensionality of the i-vector |

0:11:50 | so we have a complexity which is the same as the variational bayes approach |
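The matrix-free product the speaker refers to can be sketched like this: conjugate gradient only ever needs products L v, and each product can be accumulated Gaussian by Gaussian without forming the M by M precision matrix. Array names are illustrative placeholders, not notation from the slides:

```python
import numpy as np

def precision_matvec(v, T, Sigma_inv, N):
    """L v = v + sum_c N_c T_c' Sigma_c^-1 (T_c v), without building L."""
    out = v.copy()                       # identity term from the prior
    for c in range(len(N)):
        out += N[c] * (T[c].T @ (Sigma_inv[c] @ (T[c] @ v)))
    return out

def cg_ivector(T, Sigma_inv, N, b, n_iter=50, tol=1e-10):
    """Conjugate gradient solve of L w = b using the implicit product."""
    w = np.zeros_like(b)
    r = b - precision_matvec(w, T, Sigma_inv, N)
    p = r.copy()
    rs = r @ r
    for _ in range(n_iter):
        Lp = precision_matvec(p, T, Sigma_inv, N)
        alpha = rs / (p @ Lp)
        w += alpha * p
        r -= alpha * Lp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:        # stop on the residual norm, as in the talk
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return w
```

Each matrix-vector product costs O(C F M), so beyond the statistics themselves the method needs essentially no extra memory, matching the point made next.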

0:11:57 | so |

0:11:59 | what is also nice about this technique is that it does not require any kind |

0:12:03 | of additional memory |

0:12:05 | and as for the variational bayes approach we can use this technique with a |

0:12:11 | full covariance ubm if we pre whiten the statistics and the eigenvoice matrix with the |

0:12:18 | ubm covariances once |

0:12:21 | so |

0:12:22 | now i will show you some results on the female dataset extended |

0:12:28 | telephone condition |

0:12:33 | our setup uses sixty dimensional features and a ubm with |

0:12:37 | two thousand forty eight components |

0:12:44 | we use |

0:12:44 | a plda classifier on length normalized i-vectors |

0:12:51 | [partially inaudible] so let me show you |

0:12:55 | the results |

0:13:00 | so |

0:13:01 | before seeing the results let me just point out that |

0:13:05 | with enough iterations |

0:13:08 | these approaches converge to |

0:13:13 | the exact i-vector solution |

0:13:17 | so if we iterate long enough we can recover exactly the same |

0:13:21 | accuracy as the original classifier |

0:13:26 | so what is interesting is to |

0:13:28 | see if we can |

0:13:31 | stop earlier and still |

0:13:33 | achieve good results with a |

0:13:35 | faster extraction process of course |

0:13:42 | so here i am showing the results of the baseline system with exact i-vectors |

0:13:49 | the approximated i-vectors |

0:13:52 | and variational bayes with block |

0:13:54 | sizes one ten and twenty |

0:14:09 | convergence was evaluated using the norm of the difference between |

0:14:16 | two successive variational bayes i-vector estimates |

0:14:19 | so essentially this experiment is doing between two and three iterations per estimate and this one |

0:14:26 | between three and four |

0:14:29 | for conjugate gradient the stopping criterion is the two norm of the residual |

0:14:37 | so essentially what we see here is that |

0:14:40 | most of the baseline system performance is recovered |

0:14:45 | [inaudible] |

0:14:55 | so what is also interesting |

0:15:00 | is the extraction time required by these systems |

0:15:07 | and |

0:15:08 | the fastest system is the one which employs the diagonal approximation |

0:15:12 | and its time is comparable to the variational bayes approach with the largest block size |

0:15:22 | [inaudible] |

0:15:36 | however |

0:15:37 | note that |

0:15:38 | the diagonal approximation as we can see |

0:15:43 | loses some accuracy with respect to the quite high baseline |

0:15:45 | while with the variational bayes approach we can obtain accurate results in just |

0:15:52 | a few percent more time |

0:15:55 | compared to |

0:15:57 | the time required to compute the zero and first order statistics which is |

0:16:03 | the dominant cost |

0:16:07 | so in addition |

0:16:11 | here you can also see how the extraction time and memory depend on |

0:16:14 | the size of the blocks |

0:16:17 | and |

0:16:19 | we can see that using |

0:16:19 | bigger blocks of course increases the memory requirements |

0:16:26 | but in this case the extraction time decreases |

0:16:29 | significantly |

0:16:34 | and becomes comparable to that of the conjugate gradient method |

0:16:40 | while using smaller block sizes allows us to |

0:16:45 | keep the memory requirements low |

0:16:51 | so |

0:16:53 | to conclude |

0:16:56 | we have presented some new efficient and accurate i-vector extraction |

0:17:00 | techniques |

0:17:01 | which are based on variational bayes estimation |

0:17:05 | and on the conjugate gradient method |

0:17:11 | they approximate the accuracy of the baseline very closely |

0:17:16 | and they allow us to trade the accuracy of the i-vectors we obtain |

0:17:23 | against the time required to extract the i-vectors themselves |

0:17:31 | [remainder inaudible] |

0:17:56 | so let's thank the speaker |

0:17:59 | we have |

0:18:00 | a few minutes for questions |

0:18:10 | [inaudible question from the audience] |

0:18:17 | [inaudible answer] |

0:19:01 | [inaudible exchange] |

0:20:05 | yes as well |

0:20:08 | with the plda classifier |

0:20:10 | i would say that |

0:20:13 | the classifier |

0:20:16 | is very fast |

0:20:21 | [remainder inaudible] |

0:20:44 | any more questions |

0:20:47 | let me ask |

0:20:49 | have you seen any difference if before applying what you did you |

0:20:52 | try to |

0:20:54 | rotate |

0:20:55 | the space of eigenvoices so that |

0:20:57 | it would be orthogonal or do you start from the same matrix |

0:21:01 | [answer partially inaudible] |

0:21:30 | but have you in fact compared with what we did basically we tried to diagonalize the |

0:21:34 | eigenvoice matrix first and then exploit the diagonal structure |

0:21:43 | [answer inaudible] |

0:22:07 | so let's thank the speaker again |