Metric or Non-Metric?

Recognition is the fundamental task of visual cognition, yet how to formalize the general recognition problem for computer vision remains an open issue. The problem is sometimes reduced to the simplest case of recognizing matching pairs, often structured to allow for metric constraints. However, visual recognition is broader than just pair matching, especially when we consider multi-class training data and large sets of features in a learning context. What we learn and how we learn it has important implications for effective algorithms. In this work, we reconsider the assumption of recognition as a pair matching test, and introduce a new formal definition that captures the broader context of the problem. Through a meta-analysis and an experimental assessment of the top algorithms on popular data sets, we gain a sense of how often metric properties are violated by good recognition algorithms. Studying these violations brings useful insights to light: we make the case that locally metric algorithms should leverage outside information to solve the general recognition problem.
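For reference, these are the standard textbook conditions a dissimilarity function $d$ must satisfy to be a metric; a comparison function that breaks any of them for some inputs is nonmetric in this sense. The last line shows pair matching as a thresholded dissimilarity, where the threshold $\tau$ and the name $\operatorname{match}$ are illustrative assumptions, not the paper's formal definition of recognition.

```latex
\begin{aligned}
&d(x, y) \ge 0 && \text{(non-negativity)}\\
&d(x, y) = 0 \iff x = y && \text{(identity of indiscernibles)}\\
&d(x, y) = d(y, x) && \text{(symmetry)}\\
&d(x, z) \le d(x, y) + d(y, z) && \text{(triangle inequality)}\\[4pt]
&\operatorname{match}(x, y) = \mathbb{1}\left[\, d(x, y) \le \tau \,\right] && \text{(pair matching with threshold } \tau\text{)}
\end{aligned}
```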

This website summarizes all available published results for the Labeled Faces in the Wild (from 2008 onwards) and Caltech-101 (from 2006 onwards) data sets. Each algorithm is labeled Metric, Nonmetric, or Nonmetric w/ Side-information, allowing a broad comparison among these modes of recognition. Do you have a new algorithm to add to this summary? Submit it below.

Read a pre-print of the paper: Good Recognition is Non-Metric

View examples of violations of the triangle inequality:
PLDA Violations (LFW Unrestricted Training protocol leader)
Tom-vs-Pete Violations (LFW Image-Restricted Training protocol leader)
Multi-Attribute Spaces Violations
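The linked pages show violating triples for specific methods. The sketch below is a hypothetical helper, not the code used to produce those pages: it assumes the comparisons have already been collected into a square dissimilarity matrix `D` over a small set of samples (similarity scores would first need converting to dissimilarities) and enumerates every triple that breaks the triangle inequality.

```python
from itertools import permutations

import numpy as np


def triangle_violations(D, tol=1e-12):
    """List ordered triples (i, j, k) of distinct indices where
    D[i, k] > D[i, j] + D[j, k] + tol, i.e. the direct dissimilarity
    exceeds the detour through j. Returns (i, j, k, excess) tuples.

    Brute force, O(n^3): intended for small sample sets only.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    found = []
    for i, j, k in permutations(range(n), 3):
        excess = D[i, k] - (D[i, j] + D[j, k])
        if excess > tol:
            found.append((i, j, k, float(excess)))
    return found


# Toy usage: three points on a line, compared with two dissimilarities.
points = np.array([0.0, 1.0, 2.0])
euclidean = np.abs(points[:, None] - points[None, :])
squared = euclidean ** 2
print(triangle_violations(euclidean))  # [] (Euclidean distance is a metric)
print(triangle_violations(squared))    # [(0, 1, 2, 2.0), (2, 1, 0, 2.0)]
```

Squared Euclidean distance, a common comparison in its own right, already fails the triangle inequality on three collinear points, so it is not a metric even though plain Euclidean distance is.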

Recognition Accuracy of Algorithms on Labeled Faces in the Wild

[Figure: LFW meta-analysis plot]

Recognition accuracy of algorithms on LFW. The horizontal axis is year of publication; some cluttered years are slightly separated along the horizontal axis for clarity. “Side-info” refers to algorithms that use outside data in the recognition system beyond feature extraction/alignment. Even for “pair matching,” pure metric algorithms are not very competitive. Numbers inside each point correspond to the method entries below.
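The LFW accuracies above come from pair matching: each test pair is declared "same" or "different" by thresholding a comparison score, and LFW results are conventionally averaged over ten cross-validation folds. The sketch below assumes higher scores mean "more likely the same person"; the function names and toy data are hypothetical and not taken from any listed method.

```python
import numpy as np


def best_threshold(scores, same):
    """Threshold t maximizing training accuracy of the rule 'same iff score >= t'.

    Candidates are the observed scores plus +inf (which predicts 'different'
    for every pair); ties go to the smallest such threshold.
    """
    scores = np.asarray(scores, dtype=float)
    same = np.asarray(same, dtype=bool)
    candidates = np.append(np.unique(scores), np.inf)
    accuracies = [np.mean((scores >= t) == same) for t in candidates]
    return float(candidates[int(np.argmax(accuracies))])


def cross_validated_accuracy(scores, same, fold_ids):
    """Mean and standard deviation of pair-matching accuracy over folds.

    For each fold, the threshold is fit on all other folds and applied to
    the held-out fold, so no test pair influences its own threshold.
    """
    scores = np.asarray(scores, dtype=float)
    same = np.asarray(same, dtype=bool)
    fold_ids = np.asarray(fold_ids)
    fold_accuracies = []
    for f in np.unique(fold_ids):
        test = fold_ids == f
        t = best_threshold(scores[~test], same[~test])
        fold_accuracies.append(np.mean((scores[test] >= t) == same[test]))
    return float(np.mean(fold_accuracies)), float(np.std(fold_accuracies))


# Toy usage: six hypothetical pairs split into two folds.
scores = [0.9, 0.8, 0.3, 0.2, 0.7, 0.1]
same = [1, 1, 0, 0, 1, 0]
folds = [0, 1, 0, 1, 0, 1]
mean_acc, std_acc = cross_validated_accuracy(scores, same, folds)
print(f"{mean_acc:.3f} +/- {std_acc:.3f}")  # 0.833 +/- 0.167
```

Choosing the threshold on the held-out fold itself would inflate the reported accuracy, which is why it is fit on the remaining folds.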

Recognition Accuracy of Algorithms on Caltech-101

[Figure, top: Caltech-101 meta-analysis, 15 training images]

[Figure, bottom: Caltech-101 meta-analysis, 30 training images]

Recognition accuracy of algorithms on Caltech-101, with 15 training images per category in the top plot and 30 in the bottom plot. The horizontal axis is year of publication; some cluttered years are slightly separated along the horizontal axis for clarity. Note that the metric algorithms are generally less accurate, but are more competitive when fewer images can be used for training. Numbers inside each point correspond to the method entries below. Because not all algorithms reported error bars, we do not show any error bars in these plots.
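The Caltech-101 plots compare methods trained on 15 or 30 images per category. A common evaluation, assumed here rather than stated on this page, draws that many training images at random from each category, tests on the remaining images (often capped at a fixed number per category), and reports the mean of per-category accuracies, averaged over several random splits. The helpers and their `n_test_max` and `seed` parameters are hypothetical.

```python
import numpy as np


def split_per_class(labels, n_train, n_test_max=None, seed=0):
    """Randomly pick n_train training images per class; the rest (optionally
    capped at n_test_max per class) become test images. Returns index arrays."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        if len(idx) <= n_train:
            raise ValueError(f"class {c} has only {len(idx)} images")
        train_idx.extend(idx[:n_train].tolist())
        rest = idx[n_train:]
        if n_test_max is not None:
            rest = rest[:n_test_max]
        test_idx.extend(rest.tolist())
    return np.array(train_idx), np.array(test_idx)


def mean_per_class_accuracy(y_true, y_pred):
    """Average of per-class recognition rates, so every class counts equally."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    per_class = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(per_class))


# Toy usage with made-up labels: three classes of 40, 40 and 100 images.
labels = np.repeat([0, 1, 2], [40, 40, 100])
train, test = split_per_class(labels, n_train=15, n_test_max=50)
print(len(train), len(test))  # 45 100
print(mean_per_class_accuracy([0, 0, 0, 0, 1], [0, 0, 0, 0, 0]))  # 0.5
```

In the last line, plain accuracy would be 0.8, but averaging per class exposes that the smaller class is never recognized.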

Add a New Method to the Summary

 

Methods

[1] K. Balasubramanian, K. Yu, G. Lebanon, Smooth sparse coding via marginal regression for learning sparse representations, arXiv:1210.1121.

[2] O. Barkan, J. Weill, L. Wolf, H. Aronowitz, Fast high dimensional vector multiplication face recognition, in: Proc. of the IEEE International Conference on Computer Vision, 2013, pp. 1960–1967.

[3] T. Berg, P. Belhumeur, Tom-vs-pete classifiers and identity-preserving alignment for face verification, in: Proc. of the British Machine Vision Conference, 2012, pp. 129.1–129.11.

[4] H. Bilen, V. Namboodiri, L. Van Gool, Object classification with latent window parameters, International Journal of Computer Vision (to appear) (2013) 1–16.

[5] L. Bo, X. Ren, D. Fox, Kernel descriptors for visual recognition, in: Advances in Neural Information Processing Systems 23, 2010, pp. 244–252.

[6] L. Bo, K. Lai, X. Ren, D. Fox, Object recognition with hierarchical kernel descriptors, in: Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2011, pp. 1729–1736.

[7] L. Bo, X. Ren, D. Fox, Multipath sparse coding using hierarchical matching pursuit, in: Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 660–667.

[8] O. Boiman, E. Shechtman, M. Irani, In defense of nearest-neighbor based image classification, in: Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2008, pp. 1–8.

[9] A. Bosch, A. Zisserman, X. Muñoz, Image classification using random forests and ferns, in: Proc. of the IEEE International Conference on Computer Vision, 2007, pp. 1–8.

[10] A. Bosch, A. Zisserman, X. Muñoz, Representing shape with a spatial pyramid kernel, in: ACM International Conference on Image and Video Retrieval, 2007, pp. 401–408.

[11] Y. Boureau, F. Bach, Y. LeCun, J. Ponce, Learning mid-level features for recognition, in: Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2010, pp. 2559–2566.

[12] Y. Boureau, N. Le Roux, F. Bach, J. Ponce, Y. LeCun, Ask the locals: multi-way local pooling for image recognition, in: Proc. of the IEEE International Conference on Computer Vision, 2011, pp. 2651–2658.

[13] Z. Cao, Q. Yin, X. Tang, J. Sun, Face recognition with learning-based descriptor, in: Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2010, pp. 2707–2714.

[14] Q. Cao, Y. Ying, P. Li, Similarity metric learning for face recognition, in: Proc. of the IEEE International Conference on Computer Vision, 2013, pp. 660–667.

[15] X. Cao, D. Wipf, F. Wen, G. Duan, A practical transfer learning algorithm for face verification, in: Proc. of the IEEE International Conference on Computer Vision, 2013, pp. 3208–3215.

[16] K. Chatfield, V. Lempitsky, A. Vedaldi, A. Zisserman, The devil is in the details: an evaluation of recent feature encoding methods, in: Proc. of the British Machine Vision Conference, 2011, pp. 76.1–76.12.

[17] D. Chen, X. Cao, L. Wang, F. Wen, J. Sun, Bayesian face revisited: A joint formulation, in: Proc. of the European Conference on Computer Vision, 2012, pp. 566–579.

[18] D. Chen, X. Cao, F. Wen, J. Sun, Blessing of dimensionality: High-dimensional feature and its efficient compression for face verification, in: Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 3025–3032.

[19] A. Coates, A. Ng, The importance of encoding versus training with sparse coding and vector quantization, in: Proc. of the International Conference on Machine Learning, 2011, pp. 921–928.

[20] Z. Cui, W. Li, D. Xu, S. Shan, X. Chen, Fusing robust face region descriptors via multiple metric learning for face recognition in the wild, in: Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 3554–3561.

[21] A. Frome, Y. Singer, F. Sha, J. Malik, Learning globally-consistent local distance functions for shape-based image retrieval and classification, in: Proc. of the IEEE International Conference on Computer Vision, 2007, pp. 1–8.

[22] P. Gehler, S. Nowozin, On feature combination for multiclass object classification, in: Proc. of the IEEE International Conference on Computer Vision, 2009, pp. 221–228.

[23] H. Goh, N. Thome, M. Cord, J.-H. Lim, Unsupervised and supervised visual codes with restricted Boltzmann machines, in: Proc. of the European Conference on Computer Vision, 2012, pp. 298–311.

[24] C. Gu, J. Lim, P. Arbeláez, J. Malik, Recognition using regions, in: Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 1030–1037.

[25] M. Guillaumin, J. Verbeek, C. Schmid, Is that you? Metric learning approaches for face identification, in: Proc. of the IEEE International Conference on Computer Vision, 2009, pp. 498–505.

[26] G. Huang, M. Jones, E. Learned-Miller, LFW results using a combined Nowak plus MERL recognizer, in: Proc. of the Faces in Real-Life Images Workshop, 2008, pp. 1–2.

[27] P. Jain, B. Kulis, K. Grauman, Fast image search for learned metrics, in: Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2008, pp. 1–8.

[28] P. Jain, B. Kulis, J. Davis, I. Dhillon, Metric and kernel learning using a linear transformation, Journal of Machine Learning Research 13 (2012) 519–547.

[29] Y. Jia, C. Huang, T. Darrell, Beyond spatial pyramids: Receptive field learning for pooled image features, in: Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 3370–3377.

[30] C. Kanan, G. Cottrell, Robust classification of objects, faces, and flowers using natural image statistics, in: Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2010, pp. 2472–2479.

[31] A. Kapoor, K. Grauman, R. Urtasun, T. Darrell, Gaussian processes for object categorization, International Journal of Computer Vision 88 (2) (2010) 169–188.

[32] A. Kumar, A. Niculescu-Mizil, K. Kavukcuoglu, H. Daume III, A binary classification framework for two-stage multiple kernel learning, arXiv:1206.6428.

[33] N. Kumar, A. C. Berg, P. N. Belhumeur, S. K. Nayar, Attribute and simile classifiers for face verification, in: Proc. of the IEEE International Conference on Computer Vision, 2009, pp. 365–372.

[34] F. Li, J. Carreira, C. Sminchisescu, Object recognition as ranking holistic figure-ground hypotheses, in: Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2010, pp. 1712–1719.

[35] Q. Li, H. Zhang, J. Guo, B. Bhanu, L. An, Reference-based scheme combined with K-SVD for scene image categorization, IEEE Signal Processing Letters 20 (1) (2013) 67–70.

[36] H. Li, G. Hua, Z. Lin, J. Brandt, J. Yang, Probabilistic elastic matching for pose variant face verification, in: Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2013, to appear.

[37] L. Liu, L. Wang, X. Liu, In defense of soft-assignment coding, in: Proc. of the IEEE International Conference on Computer Vision, 2011, pp. 2486–2493.

[38] F. Lu, X. Yang, W. Lin, R. Zhang, S. Yu, Image classification with multiple feature channels, Optical Engineering 50 (5) (2011) 057210.1–057210.9.

[39] X. Ma, W. Grimson, Learning coupled conditional random field for image decomposition with application on object categorization, in: Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2008, pp. 1–8.

[40] S. McCann, D. Lowe, Local naive Bayes nearest neighbor for image classification, in: Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 3650–3656.

[41] S. McCann, D. Lowe, Spatially local coding for object recognition, in: Proc. of the Asian Conference on Computer Vision, 2012, pp. 204–217.

[42] J. Mutch, D. Lowe, Multiclass object recognition with sparse, localized features, in: Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2006, pp. 11–18.

[43] V. Nair, G. E. Hinton, Rectified linear units improve restricted Boltzmann machines, in: Proc. of the International Conference on Machine Learning, 2010, pp. 807–814.

[44] H. V. Nguyen, L. Bai, Cosine similarity metric learning for face verification, in: Proc. of the Asian Conference on Computer Vision, 2010, pp. 709–720.

[45] G. Oliveira, E. Nascimento, A. Vieira, M. Campos, Sparse spatial coding: A novel approach for efficient and accurate object recognition, in: Proc. of the IEEE International Conference on Robotics and Automation, 2012, pp. 2592–2598.

[46] N. Pinto, D. Cox, Beyond simple features: A large-scale feature search approach to unconstrained face recognition, in: Proc. of the IEEE International Conference on Automatic Face & Gesture Recognition, 2011, pp. 8–15.

[47] M. Qiao, J. Li, Distance-based mixture modeling for classification via hypothetical local mapping, Tech. rep., Penn State University (2012).

[48] H. J. Seo, P. Milanfar, Face verification using the LARK representation, IEEE Transactions on Information Forensics and Security 6 (4) (2011) 1275–1286.

[49] K. Simonyan, O. M. Parkhi, A. Vedaldi, A. Zisserman, Fisher vector faces in the wild, in: Proc. of the British Machine Vision Conference, 2013, pp. 1–12.

[50] K. Sohn, D. Y. Jung, H. Lee, A. O. Hero, Efficient learning of sparse, distributed, convolutional feature representations for object recognition, in: Proc. of the IEEE International Conference on Computer Vision, 2011, pp. 2643–2650.

[51] A. Szlam, K. Gregor, Y. LeCun, Fast approximations to structured sparse coding and applications to object classification, arXiv:1202.6384.

[52] Y. Taigman, L. Wolf, T. Hassner, Multiple one-shots for utilizing class label information, in: Proc. of the British Machine Vision Conference, 2009, pp. 77.1–77.12.

[53] S. Todorovic, N. Ahuja, Learning subcategory relevances for category recognition, in: Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2008, pp. 1–8.

[54] A. Vedaldi, V. Gulshan, M. Varma, A. Zisserman, Multiple kernels for object detection, in: Proc. of the IEEE International Conference on Computer Vision, 2009, pp. 606–613.

[55] J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, Y. Gong, Locality-constrained linear coding for image classification, in: Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2010, pp. 3360–3367.

[56] L. Wolf, T. Hassner, Y. Taigman, Similarity scores based on background samples, in: Proc. of the Asian Conference on Computer Vision, 2009, pp. 88–97.

[57] J. Yang, Y. Li, Y. Tian, L. Duan, W. Gao, Group-sensitive multiple kernel learning for object categorization, in: Proc. of the IEEE International Conference on Computer Vision, 2009, pp. 436–443.

[58] J. Yang, K. Yu, Y. Gong, T. Huang, Linear spatial pyramid matching using sparse coding for image classification, in: Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 1794–1801.

[59] J. Yang, Y. Li, Y. Tian, L. Duan, W. Gao, Group-sensitive multiple kernel learning for object categorization, IEEE Transactions on Image Processing 21 (5) (2012) 2838–2852.

[60] D. Yi, Z. Lei, S. Z. Li, Towards pose robust face recognition, in: Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 3539–3545.

[61] Q. Yin, X. Tang, J. Sun, An associate-predict model for face recognition, in: Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2011, pp. 497–504.

[62] Y. Ying, P. Li, Distance metric learning with eigenvalue optimization, Journal of Machine Learning Research 13 (2012) 1–26.

[63] K. Yu, T. Zhang, High dimensional nonlinear learning using local coordinate coding, arXiv:0906.5190.

[64] H. Zhang, A. Berg, M. Maire, J. Malik, SVM-KNN: Discriminative nearest neighbor classification for visual category recognition, in: Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2006, pp. 2126–2136.