Professor Jinchi Lv, University of Southern California: "RANK: Large-Scale Inference with Graphical Nonlinear Knockoffs"

([SWUFE News] Published: 2019-07-03)

Guanghua Forum: Forum of Public Figures and Entrepreneurs, Session 5482

 

Topic: "RANK: Large-Scale Inference with Graphical Nonlinear Knockoffs"

Speaker: Professor Jinchi Lv, University of Southern California

Host: Associate Professor Bin Guo

Time: Friday, July 5, 2019, 9:00-10:00 a.m.

Venue: Conference Room 408, Hongyuan Building, Liulin Campus

Organizers: Center of Statistical Research, School of Statistics, Office of Scientific Research

 

About the speaker:

Jinchi Lv (吕金翅) is the Kenneth King Stonier Chair in Business Administration and a Professor in the Data Sciences and Operations Department of the Marshall School of Business at the University of Southern California, a Professor in the Department of Mathematics at USC, and an Associate Fellow of the USC Dornsife Institute for New Economic Thinking (INET). He received his Ph.D. in Mathematics from Princeton University in 2007. He was McAlister Associate Professor in Business Administration at USC from 2016 to 2019. His research interests include statistics, machine learning, data science, and business applications.

His papers have been published in journals in statistics, economics, computer science, information theory, and biology, and one of them was published as a Discussion Paper in the Journal of the Royal Statistical Society Series B (2008). He is a Fellow of the Institute of Mathematical Statistics (2019) and a recipient of the USC Marshall Dean's Award for Research Impact (2017), the Adobe Data Science Research Award (2017), the Royal Statistical Society Guy Medal in Bronze (2015), the NSF Faculty Early Career Development (CAREER) Award (2010), the USC Marshall Dean's Award for Research Excellence (2009), and the Zumberge Individual Award from USC's James H. Zumberge Faculty Research and Innovation Fund (2008). He has served as an associate editor of the Annals of Statistics (2013-2018), the Journal of Business & Economic Statistics (2018-present), and Statistica Sinica (2008-2016).


Abstract:

Power and reproducibility are key to enabling refined scientific discoveries in contemporary big data applications with general high-dimensional nonlinear models. In this paper, we provide theoretical foundations on the power and robustness of the model-X knockoffs procedure, introduced recently in Candès, Fan, Janson and Lv (2018), in the high-dimensional setting where the covariate distribution is characterized by a Gaussian graphical model. We establish that, under mild regularity conditions, the power of the oracle knockoffs procedure with known covariate distribution in high-dimensional linear models is asymptotically one as the sample size goes to infinity. Moving away from this ideal case, we suggest a modified model-X knockoffs method, called graphical nonlinear knockoffs (RANK), to accommodate an unknown covariate distribution. We provide theoretical justification for the robustness of the modified procedure by showing that, with the estimated covariate distribution, the false discovery rate (FDR) is asymptotically controlled at the target level and the power is asymptotically one. To the best of our knowledge, this is the first formal theoretical result on the power of the knockoffs procedure. Simulation results demonstrate that, compared to existing approaches, our method performs competitively in both FDR control and power. A real data set is analyzed to further assess the performance of the suggested knockoffs procedure. This is joint work with Emre Demirkaya, Yingying Fan and Gaorong Li.
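To make the FDR target in the abstract concrete, the sketch below shows the selection step commonly used with knockoff statistics, the "knockoff+" threshold. It is only an illustration under stated assumptions: it does not construct knockoff variables or estimate the Gaussian graphical model as RANK does, the function names are hypothetical, and the statistics `W` are made-up numbers in which a large positive value suggests that a covariate is more important than its knockoff copy.

```python
def knockoff_plus_threshold(W, q):
    """Smallest t among the nonzero |W_j| such that
    (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) <= q.
    Returns infinity when no candidate threshold qualifies."""
    candidates = sorted({abs(w) for w in W if w != 0})
    for t in candidates:
        # Negative statistics act as an estimate of the number of false discoveries.
        est_false = 1 + sum(1 for w in W if w <= -t)
        discoveries = max(1, sum(1 for w in W if w >= t))
        if est_false / discoveries <= q:
            return t
    return float("inf")


def knockoff_select(W, q):
    """Indices of covariates whose statistic reaches the knockoff+ threshold."""
    t = knockoff_plus_threshold(W, q)
    return [j for j, w in enumerate(W) if w >= t]


# Hypothetical knockoff statistics for ten covariates (illustrative values only).
W = [5.2, 4.1, -0.3, 3.7, 0.2, 6.0, -0.1, 2.9, 0.0, 4.8]
selected = knockoff_select(W, q=0.25)
print(selected)
```

With a stricter target such as `q=0.1`, no threshold qualifies for these ten values and the selection is empty: the "+1" in the numerator makes the rule conservative when few covariates are available.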

