Struggling with implementation of the Extended Kalman Filters of Bleser et al.

Bleser et al. describe a set of Extended Kalman Filters which fuse the data of the camera and IMU. One part of their first model I don’t completely understand. It is near the end of section 4.2:

The image analysis initially provides a set of 2D/3D corespondences \((\mathbf{m}_{p,t}, \mathbf{m}_{w,t})\) with measurement noises \(\mathbf{e}^c_{p,t} \sim \cal{N}(\mathbf{0}, R_{pp,t})\) and \(\mathbf{e}^c_{w,t} \sim \cal{N}(\mathbf{0}, R_{ww,t})\), where \(R_{pp,t}\) and \(R_{ww,t}\) are diagonal matrices. The measurement \(\mathbf{m}_{p,t}\) and the covariance \(R_{pp,t}\) are first transformed to the normalised image coordinate system using (6), giving \(\mathbf{c}_t = (\mathbf{m}_{n,t}, \mathbf{m}_{w,t})\) with covariances \(R_{nn,t}\) and \(R_{ww,t}\). The implicit correspondence measurement model is then given by \(\begin{align*} \mathbf{0} &= h(\mathbf{x}_t, \mathbf{m}_{n,t}, \mathbf{m}_{w,t}, \mathbf{e}^c_{n,t}, \mathbf{e}^c_{w,t}) \\ &= [I_2 - (\mathbf{m}_{n,t} + \mathbf{e}^c_{n,t})] Q_{cs} (Q_{sw,t} (\mathbf{m}_{w,t} + \mathbf{e}^c_{w,t} - \mathbf{s}_{w,t}) - \mathbf{c}_{s}). \label{x} \end{align*} (12a)\) This can be reformulated with additive measurement noise \(\mathbf{e}^c_t\) using (5a) and (5g): \(\begin{align*} \mathbf{0} &= h(\mathbf{x}_t, \mathbf{m}_{n,t}, \mathbf{m}_{w,t}) + \mathbf{e}^c_t \\ &= [I_2 - \mathbf{m}_{n,t}] Q_{cs} (Q_{sw,t} (\mathbf{m}_{w,t} - \mathbf{s}_{w,t}) - \mathbf{c}_{s}) + \mathbf{e}^c_t, \end{align*} (12b)\) where \(\mathbf{e}^c_t \sim \mathcal{N}(\mathbf{0}, R_t)\) and with \(\mathbf{h}_t = h(\mathbf{x}_t, \mathbf{m}_{n,t}, \mathbf{m}_{w,t}, \mathbf{e}^c_{n,t}, \mathbf{e}^c_{w,t})\) \(R_t \approx \begin{bmatrix} \frac{\partial \mathbf{h}_t}{\partial \mathbf{m}_{n,t}} & \frac{\partial \mathbf{h}_t}{\partial \mathbf{m}_{w,t}} \end{bmatrix} \begin{bmatrix} R_{nn,t} & 0_{2 \times 3} \\ 0_{3 \times 2} & R_{ww,t} \end{bmatrix} \begin{bmatrix} \frac{\partial \mathbf{h}_t}{\partial \mathbf{m}_{n,t}} \\ \frac{\partial \mathbf{h}_t}{\partial \mathbf{m}_{w,t}} \end{bmatrix} . (12c)\)

\(\mathbf{q}_{cs}\) and \(\mathbf{c}_s\) are as in (8). The correspondences are all processed sequentially.

It is unknown what is \(I_2\) (identity matrix of \(2 \times 2\) or as defined on p 61:

Let \(\mathbf{m}_p = \begin{bmatrix}x, y\end{bmatrix}^T\) be a feature position in an image. Let \(I(\mathbf{m}_p)\) be the intensity value of this position in the rendering and \(J(\mathbf{m}_p)\) the intensity value in the live camera image.

I also do not know how to express that constraint within an EKF implementation.

References

  • Gabriele Bleser and Didier Stricker. Advanced tracking through efficient image processing and visual-inertial sensor fusion. In Virtual Reality Conference, pages 137–144. IEEE, 2008. [ bib | DOI ]
    
    @inproceedings{bleser2008advanced,
      author = {Bleser, Gabriele and Stricker, Didier},
      booktitle = {Virtual Reality Conference},
      title = {Advanced tracking through efficient image processing and visual-inertial sensor fusion},
      year = {2008},
      pages = {137--144},
      organization = {IEEE},
      doi = {10.1109/VR.2008.4480765}
    }