Varying Naïve Bayes Models With Applications to Classification of Chinese Text Documents |
| |
Authors: | Guoyu Guan, Jianhua Guo, Hansheng Wang |
| |
Institution: | 1. Key Laboratory for Applied Statistics of the Ministry of Education, and School of Mathematics and Statistics, Northeast Normal University, Changchun 130024, P. R. China (guangy599@nenu.edu.cn; jhguo@nenu.edu.cn); 2. Department of Business Statistics and Econometrics, Guanghua School of Management, Peking University, Beijing 100871, P. R. China (hansheng@gsm.pku.edu.cn) |
| |
Abstract: | Document classification is an important area for which many classification methods have been developed. However, most of these methods cannot generate time-dependent classification rules, so they are not the best choices for problems with time-varying structures. To address this problem, we propose a varying naïve Bayes model, a natural extension of the naïve Bayes model that allows for a time-dependent classification rule. A kernel smoothing method is developed for parameter estimation, and a BIC-type criterion is proposed for feature selection. Asymptotic theory is developed, and numerical studies are conducted. Finally, the proposed method is demonstrated on a real dataset generated by the Mayor Public Hotline of Changchun, the capital city of Jilin Province in Northeast China. |
| |
Keywords: | BIC; Chinese document classification; Screening consistency; Time-dependent classification rule |
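The abstract names kernel smoothing as the estimation device for a time-dependent naïve Bayes rule but gives no formulas. The sketch below is one plausible reading, not the authors' estimator: class priors and word probabilities at a target time are re-estimated from all training documents, each weighted by how close its time stamp is to that target, so the resulting rule changes with time. It assumes binary word-presence features, a Gaussian kernel with a hand-picked bandwidth, and Laplace-style pseudo-count smoothing; the helper name `fit_predict_varying_nb` and the toy data are hypothetical, and the BIC-type feature selection step is not covered.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard Gaussian kernel K(u) = exp(-u^2 / 2) / sqrt(2 * pi)."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)


def fit_predict_varying_nb(X, y, times, x_new, t_new, bandwidth, n_classes, eps=1.0):
    """Hypothetical helper: classify one document x_new observed at time t_new.

    X         : (n, p) 0/1 matrix, X[i, j] = 1 if word j occurs in document i (assumed)
    y         : (n,) integer class labels in {0, ..., n_classes - 1}
    times     : (n,) time stamps of the training documents
    bandwidth : kernel bandwidth h, fixed by hand here (assumption)
    eps       : pseudo-count for Laplace-style smoothing (assumption)
    Returns the predicted label and the unnormalized log posteriors.
    """
    # Kernel weights: documents observed near t_new get larger weight.
    w = gaussian_kernel((times - t_new) / bandwidth)
    log_post = np.empty(n_classes)
    for k in range(n_classes):
        wk = w * (y == k)  # weights restricted to class k
        # Kernel-weighted class prior pi_k(t_new); these sum to 1 over k.
        prior = (wk.sum() + eps) / (w.sum() + eps * n_classes)
        # Kernel-weighted word probabilities theta_kj(t_new), strictly in (0, 1).
        theta = (wk @ X + eps) / (wk.sum() + 2.0 * eps)
        # Naive Bayes: words are conditionally independent given the class.
        log_lik = np.sum(x_new * np.log(theta) + (1 - x_new) * np.log1p(-theta))
        log_post[k] = np.log(prior) + log_lik
    return int(np.argmax(log_post)), log_post


# Synthetic toy data (illustration only): word 0 signals class 1 before
# t = 0.5 and class 0 afterwards, so the best rule changes over time.
n = 200
times = np.linspace(0.0, 1.0, n)
y = np.arange(n) % 2
word0 = np.where(times < 0.5, y == 1, y == 0).astype(int)
X = np.column_stack([word0, 1 - word0])

for t in (0.1, 0.9):
    label, _ = fit_predict_varying_nb(X, y, times, np.array([1, 0]), t,
                                      bandwidth=0.05, n_classes=2)
    print(f"t={t}: predicted class {label}")
# prints:
# t=0.1: predicted class 1
# t=0.9: predicted class 0
```

The same document receives different labels at different times, which a single static rule cannot do. With a very large bandwidth the weights become nearly uniform and the sketch collapses to an ordinary, time-invariant Bernoulli naïve Bayes classifier; on this toy data that static rule cannot separate the classes, because word 0 is equally common in both classes over the whole period.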