CAREER: Dimensionality Reduction for Multi-label Classification (NSF IIS-1538638)

Ye, Jieping <jpye@umich.edu>

 

Abstract:

Recent advances in high-throughput technologies have unleashed a torrent of data with a large number of dimensions. Examples include gene expression pattern images, microarray gene expression data, protein/gene sequences, and neuroimages. Dimensionality reduction, which extracts a small number of features by removing irrelevant, redundant, and noisy information, is crucial for the analysis of these data. The goal of this project is to develop efficient and effective dimensionality reduction algorithms for multi-label classification. Multi-label dimensionality reduction poses a number of exciting research questions that will be studied in this project: How to fully exploit the class label correlation for effective dimensionality reduction? How to scale dimensionality reduction algorithms to large-scale multi-label problems? How to effectively combine dimensionality reduction with classification? How to derive sparse dimensionality reduction algorithms to enhance model interpretability? How to derive multi-label dimensionality reduction algorithms for multiple data sources?

 

To address these questions, a hypergraph spectral learning formulation will be developed for multi-label dimensionality reduction, in which a hypergraph is used to capture the class label correlation. A joint learning formulation will be developed, in which dimensionality reduction and multi-label classification are performed simultaneously. In addition, a multi-source dimensionality reduction framework will be developed for learning from multiple heterogeneous data sources.
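
As a rough illustration of how a hypergraph can encode label correlation, the sketch below treats each label as one hyperedge containing every sample tagged with it, builds a normalized hypergraph Laplacian, and computes a spectral projection. It is a minimal sketch under stated assumptions, not the project's implementation: the unit hyperedge weights, the particular normalization, the regularizer gamma, and the function names hypergraph_laplacian and spectral_projection are all chosen here for illustration.

```python
import numpy as np


def hypergraph_laplacian(Y):
    """Normalized hypergraph Laplacian for a binary label matrix Y (n_samples x n_labels).

    Assumptions (illustrative): one hyperedge per label, unit hyperedge weights,
    and the normalization L = I - Dv^{-1/2} H De^{-1} H^T Dv^{-1/2}.
    """
    H = np.asarray(Y, dtype=float)   # incidence matrix: samples x hyperedges
    d_v = H.sum(axis=1)              # vertex degree: number of labels per sample
    d_e = H.sum(axis=0)              # hyperedge degree: number of samples per label
    # Guard against unlabeled samples and empty labels (their scaling is set to 0).
    inv_sqrt_dv = np.where(d_v > 0, 1.0 / np.sqrt(np.maximum(d_v, 1e-12)), 0.0)
    inv_de = np.where(d_e > 0, 1.0 / np.maximum(d_e, 1e-12), 0.0)
    S = (inv_sqrt_dv[:, None] * H * inv_de[None, :]) @ (H.T * inv_sqrt_dv[None, :])
    return np.eye(H.shape[0]) - S


def spectral_projection(X, Y, k, gamma=1e-3):
    """Return W (n_features x k) from the top-k generalized eigenvectors of
    X^T S X w = lambda (X^T X + gamma I) w, where S = I - L.

    X is n_samples x n_features; gamma is a hypothetical ridge-style regularizer.
    """
    n, d = X.shape
    S = np.eye(n) - hypergraph_laplacian(Y)
    A = X.T @ S @ X
    B = X.T @ X + gamma * np.eye(d)
    # Reduce the generalized problem to a standard symmetric one via Cholesky: B = C C^T.
    C_inv = np.linalg.inv(np.linalg.cholesky(B))
    M = C_inv @ A @ C_inv.T
    M = (M + M.T) / 2.0              # symmetrize against round-off
    vals, vecs = np.linalg.eigh(M)
    top = np.argsort(vals)[::-1][:k]
    return C_inv.T @ vecs[:, top]
```

Projected features would then be obtained as X @ W and passed to any multi-label classifier.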

 

The success of this project will substantially advance the state of the art in dimensionality reduction for multi-label classification, and broaden this research area by opening up and addressing many new research themes. The educational component of this project includes developing a new curriculum that incorporates research into the classroom and provides students from under-represented groups with opportunities to participate in research. Project results, including open source software and data sets, will be disseminated.

 

Students and postdocs

·      Liang Sun

·      Jianhui Chen

·      Rita Chattopadhyay

·      Tao Yang

·      Shuang Qiu

·      Jie Wang

 

Project Goal:

To address the fundamental challenges in multi-label dimensionality reduction, the PI proposes an integrated research and education plan based on the following three components: (1) a hypergraph spectral learning formulation for multi-label dimensionality reduction, in which a hypergraph is used to capture the class label correlation; (2) a joint learning formulation, in which dimensionality reduction and multi-label classification are performed simultaneously; and (3) a multi-source dimensionality reduction framework, in which dimensionality reduction is performed from multiple heterogeneous data sources.
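
To make the idea behind component (2) concrete, the sketch below uses classical reduced-rank least squares regression as a stand-in: a single rank-constrained fit yields both a projection A and a predictor B, so the low-dimensional subspace and the label predictor are learned together. This is not the joint formulation proposed in the project; the function name reduced_rank_regression and the choice of an unregularized least squares loss are assumptions made for illustration.

```python
import numpy as np


def reduced_rank_regression(X, Y, k):
    """Rank-k least squares fit Y ~ (X @ A) @ B.

    X: n_samples x n_features, Y: n_samples x n_labels (0/1 label indicators).
    Returns A (n_features x k, the projection) and B (k x n_labels, the predictor).
    Stand-in technique: classical reduced-rank regression, not the project's method.
    """
    Y = np.asarray(Y, dtype=float)
    W_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)    # unconstrained least squares
    fitted = X @ W_ols
    # The best rank-k fit keeps the top-k right singular directions of the fitted values.
    _, _, Vt = np.linalg.svd(fitted, full_matrices=False)
    V_k = Vt[:k].T                                   # n_labels x k
    return W_ols @ V_k, V_k.T
```

Label scores for new data would be (X_new @ A) @ B, to be thresholded by whatever rule suits the application.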

 

Research Challenges:

Multi-label dimensionality reduction poses a number of exciting research questions: How to fully exploit the class label correlation for effective dimensionality reduction? How to scale dimensionality reduction algorithms to large-scale multi-label problems?  How to effectively combine dimensionality reduction with classification? How to derive sparse dimensionality reduction algorithms to enhance model interpretability?  How to derive multi-label dimensionality reduction algorithms for multiple data sources?

 

Current Results:

·      Annotated biological images using multi-label/task learning algorithms [3,7,15,17,28,32,35]

·      Developed efficient algorithms for various sparsity/low-rank/multi-label learning problems [27,29,31,33,34]

·      Analyzed generalization performance of multi-task/label feature learning algorithms [22,30]

·      Developed an efficient algorithm for rank minimization for multiple output (e.g., multi-label) linear regression problems [20]

·      Developed efficient and effective multi-task learning algorithms [4,11,14,16,18,19,21,23,24,25]

·      Developed a two-stage algorithm for large-scale multi-label dimensionality reduction [5]

·      Developed a least squares formulation for Canonical Correlation Analysis for multi-label dimensionality reduction [7]

·      Developed a shared-subspace learning framework for multi-label classification [2]

·      Developed hypergraph spectral learning for multi-label dimensionality reduction [1]

 

Publications

 [35] Wenlu Zhang, Rongjian Li, Tao Zeng, Qian Sun, Sudhir Kumar, Jieping Ye, and Shuiwang Ji. Deep Model Based Transfer and Multi-Task Learning for Biological Image Analysis. IEEE Transactions on Big Data, 2016. PDF

 [34] Jie Wang and Jieping Ye. Multi-layer feature reduction for tree structured group lasso via hierarchical projection. Advances in Neural Information Processing Systems (NIPS 2015). PDF

 [33] Jie Wang and Jieping Ye. Safe screening for multi-task feature learning with multiple data matrices. The 32nd International Conference on Machine Learning (ICML 2015). PDF

 [32] Tao Yang, Xinlin Zhao, Binbin Lin, Tao Zeng, Shuiwang Ji, and Jieping Ye. Automated Gene Expression Pattern Annotation in the Mouse Brain. Pacific Symposium on Biocomputing (PSB 2015). PDF

[31] Zheng Wang, Ming-Jun Lai, Zhaosong Lu, Wei Fan, Hasan Davulcu, and Jieping Ye. Orthogonal Rank-One Matrix Pursuit for Low Rank Matrix Completion. SIAM Journal on Scientific Computing. PDF

[30] Qi Yan, Jieping Ye, and Xiaotong Shen. Simultaneous pursuit of sparseness and rank structures for matrix decomposition. Journal of Machine Learning Research. PDF

[29] Jie Wang, Peter Wonka, and Jieping Ye. Lasso Screening Rules via Dual Polytope Projection. Journal of Machine Learning Research. PDF

[28] Lei Yuan, Cheng Pan, Shuiwang Ji, Michael McCutchan, Zhi-Hua Zhou, Stuart J. Newfeld, Sudhir Kumar, and Jieping Ye. Automated Annotation of Developmental Stages of Drosophila Embryos in Images Containing Spatial Patterns of Expression. Bioinformatics, 30(2):266-273, 2014. PDF

[27] Pinghua Gong, Jiayu Zhou, Wei Fan, and Jieping Ye. Efficient Multi-Task Feature Learning with Calibration. The Twentieth ACM SIGKDD International Conference On Knowledge Discovery and Data Mining (SIGKDD 2014). PDF

[26] Liang Sun, Shuiwang Ji, and Jieping Ye. Multi-Label Dimensionality Reduction. Chapman & Hall/CRC. 2013. Link to the book.

[25] Pinghua Gong, Jieping Ye, and Changshui Zhang. Multi-Stage Multi-Task Feature Learning. Journal of Machine Learning Research, 14(Oct):2979-3010, 2013. PDF

[24] Jianhui Chen, Lei Tang, Jun Liu, and Jieping Ye. A Convex Formulation for Learning Shared Structures from Multiple Tasks. IEEE Transactions on Pattern Analysis and Machine Intelligence. Vol. 35, No. 5, pp. 1025-1038, 2013. PDF

[23] Shayok Chakraborty, Jiayu Zhou, Vineeth Balasubramanian, Sethuraman Panchanathan, Ian Davidson, and Jieping Ye. Active Matrix Completion. The Thirteenth IEEE International Conference on Data Mining (ICDM 2013). PDF

[22] Pinghua Gong, Jieping Ye, and Changshui Zhang. Multi-Stage Multi-Task Feature Learning. The Twenty-Sixth Annual Conference on Neural Information Processing Systems (NIPS 2012). PDF

[21] Binbin Lin, Sen Yang, Chiyuan Zhang, Jieping Ye, and Xiaofei He. Multi-task Vector Field Learning. The Twenty-Sixth Annual Conference on Neural Information Processing Systems (NIPS 2012). PDF

[20] Shuo Xiang, Yunzhang Zhu, Xiaotong Shen, and Jieping Ye. Optimal Exact Least Squares Rank Minimization. The Eighteenth ACM SIGKDD International Conference On Knowledge Discovery and Data Mining (SIGKDD 2012), pp. 480-488. PDF

[19] Pinghua Gong, Jieping Ye, and Changshui Zhang. Robust Multi-Task Feature Learning. The Eighteenth ACM SIGKDD International Conference On Knowledge Discovery and Data Mining (SIGKDD 2012), pp. 895-903. PDF

[18] Lei Yuan, Yalin Wang, Paul Thompson, Vaibhav Narayan, and Jieping Ye. Multi-Source Feature Learning for Joint Analysis of Incomplete Multiple Heterogeneous Neuroimaging Data. NeuroImage (5-year impact factor=6.608), 61(3):622–632, 2012. PDF

[17] Lei Yuan, Alexander Woodard, Shuiwang Ji, Yuan Jiang, Zhi-Hua Zhou, Sudhir Kumar and Jieping Ye. Learning Sparse Representations for Fruit-Fly Gene Expression Pattern Image Annotation and Retrieval. BMC Bioinformatics, 13:107, 2012. PDF

[16] Jianhui Chen, Ji Liu, and Jieping Ye. Learning Incoherent Sparse and Low-Rank Patterns from Multiple Tasks. ACM Transactions on Knowledge Discovery from Data, Vol. 5, No. 4, pp. 22:1-22:31, 2012. PDF

[15] Ying-Xin Li, Shuiwang Ji, Sudhir Kumar, Jieping Ye, and Zhi-Hua Zhou. Drosophila gene expression pattern annotation through multi-instance multi-label learning. IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 9, No. 1, pp. 98-112, 2012. PDF

[14] Jiayu Zhou, Jianhui Chen, and Jieping Ye. Clustered Multi-Task Learning Via Alternating Structure Optimization. The Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS 2011). PDF

[13] Qian Sun, Rita Chattopadhyay, Sethuraman Panchanathan, and Jieping Ye. A Two-Stage Weighting Framework for Multi-Source Domain Adaptation. The Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS 2011). PDF

[12] Lei Yuan, Jun Liu, and Jieping Ye. Efficient Methods for Overlapping Group Lasso. The Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS 2011). PDF

[11] Jianhui Chen, Jiayu Zhou, and Jieping Ye. Integrating Low-Rank and Group-Sparse Structures for Robust Multi-Task Learning. The Seventeenth ACM SIGKDD International Conference On Knowledge Discovery and Data Mining (SIGKDD 2011). PDF

[10] Rita Chattopadhyay, Jieping Ye, Sethuraman Panchanathan, Ian Davidson, and Wei Fan. Multi-Source Domain Adaptation and Its Application to Early Detection of Fatigue. The Seventeenth ACM SIGKDD International Conference On Knowledge Discovery and Data Mining (SIGKDD 2011). PDF KDD Best Research Paper Nomination

[9] Zheng Zhao, Lei Wang, Huan Liu, and Jieping Ye. On Similarity Preserving Feature Selection. IEEE Transactions on Knowledge and Data Engineering, Vol. 24, No. 3, pp. 619-632, 2013. PDF

[8] Shipeng Yu, Jinbo Bi, and Jieping Ye. Matrix-Variate and Higher-Order Probabilistic Projections. Data Mining and Knowledge Discovery, Vol. 22, No. 3, pp. 372-392, 2011. PDF

[7] Liang Sun, Shuiwang Ji, and Jieping Ye. Canonical Correlation Analysis for Multi-Label Classification: A Least Squares Formulation, Extensions and Analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 33, No. 1, pp. 194-200, 2011.  PDF  CODE

[6] Jun Liu, Lei Yuan, and Jieping Ye. An Efficient Algorithm for a Class of Fused Lasso Problems. The Sixteenth ACM SIGKDD International Conference On Knowledge Discovery and Data Mining (SIGKDD 2010). Full Presentation. PDF  CODE

 

[5] Liang Sun, Betul Ceran, and Jieping Ye. A Scalable Two-Stage Approach for a Class of Dimensionality Reduction Techniques. The Sixteenth ACM SIGKDD International Conference On Knowledge Discovery and Data Mining (SIGKDD 2010). Full Presentation. PDF  CODE  KDD Best Research Paper Award Honorable Mention

 

[4] Jianhui Chen, Ji Liu, and Jieping Ye. Learning Incoherent Sparse and Low-Rank Patterns from Multiple Tasks. The Sixteenth ACM SIGKDD International Conference On Knowledge Discovery and Data Mining (SIGKDD 2010). Full Presentation. PDF  KDD Best Research Paper Award Honorable Mention

[3] Ting Kei Pong, Paul Tseng, Shuiwang Ji, and Jieping Ye. Trace Norm Regularization: Reformulations, Algorithms, and Multi-task Learning. SIAM Journal on Optimization, Vol. 20, No. 6, pp. 3465-3489, 2010.  PDF  CODE

[2] Shuiwang Ji, Lei Tang, Shipeng Yu, and Jieping Ye. A Shared-subspace Learning Framework for Multi-label Classification. ACM Transactions on Knowledge Discovery from Data, Vol. 2, No. 1, pp. 8:1-8:29, 2010. PDF  CODE

 

[1] Liang Sun, Shuiwang Ji, and Jieping Ye. Hypergraph Spectral Learning for Multi-label Classification. The Fourteenth ACM SIGKDD International Conference On Knowledge Discovery and Data Mining (SIGKDD 2008), pp. 668-676. PDF

 

Software

·      MLDR

·      LSCCA

·      LSML: Multi-label Dimensionality Reduction

·      DPC

·      SLEP

 

Broader Impacts

We have applied the algorithms developed in this project to annotate biological images and to analyze data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI).

 

Point of Contact: Jieping Ye (jpye@umich.edu)

 

Last update: 7/30/2016