期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Predicting weed invasion in a sugarcane cultivar using multispectral image

Ana J. Righetto Luiz R. Nakamura Pedro L. D. B. Castanho Christel Faes Taciana V. Savian 《Journal of applied statistics》2019,46(1):1-12

The cultivation of sugar cane has been gaining great focus in several countries due to its diversity of use. The modernization of agriculture has allowed high productivity, which is affected by the invasion of weeds. With sustainable agriculture, the use of herbicides has been increasingly avoided in society, requiring more effective weed control methods. In this paper, we propose a statistical model capable of identifying the invasion of weeds in the field, using four color spectra as regressor variables obtained by a multispectral camera mounted on an unmanned aerial vehicle. With the exact identification of the weed infestation, it is possible to carry out the management in the field with herbicide applications in the exact places, thus avoiding the increase of the cost of production or even dispensing with the use of herbicides, effecting the mechanical removal of them. Results show that in the experimental field, it was possible to reduce herbicide spraying by 57%. 相似文献

2.

Temporal Disaggregation of Economic Time Series using Artificial Neural Networks

L. Hedhili Zaier M. Abed 《统计学通讯:理论与方法》2014,43(8):1824-1833

Several methods based on smoothing or statistical criteria have been used for deriving disaggregated values compatible with observed annual totals. The present method is based on the artificial neural networks. This article evaluates the use of artificial neural networks (ANNs) for the disaggregation of annual US GDP data to quarterly time increments. A feed-forward neural network with back-propagation algorithm for learning was used. An ANN model is introduced and evaluated in this paper. The proposed method is considered as a temporal disaggregation method without related series. A comparison with previous temporal disaggregation methods without related series has been done. The disaggregated quarterly GDP data compared well with observed quarterly data. In addition, they preserved all the basic statistics such as summing to the annual data value, cross correlation structure among quarterly flows, etc. 相似文献

3.

A comparison of artificial neural network and multinomial logit models in predicting mergers

Nilgun Fescioglu-Unver 《Journal of applied statistics》2013,40(4):712-720

A merger proposal discloses a bidder firm's desire to purchase the control rights in a target firm. Predicting who will propose (bidder candidacy) and who will receive (target candidacy) merger bids is important to investigate why firms merge and to measure the price impact of mergers. This study investigates the performance of artificial neural networks and multinomial logit models in predicting bidder and target candidacy. We use a comprehensive data set that covers the years 1979–2004 and includes all deals with publicly listed bidders and targets. We find that both models perform similarly while predicting target and non-merger firms. The multinomial logit model performs slightly better in predicting bidder firms. 相似文献

4.

Estimation in linear models using gradient descent with early stopping

K. Skouras C. Goutis M. J. Bramson 《Statistics and Computing》1994,4(4):271-278

A new shrinkage estimator of the coefficients of a linear model is derived. The estimator is motivated by the gradient-descent algorithm used to minimize the sum of squared errors and results from early stopping of the algorithm. The statistical properties of the estimator are examined and compared with other well-established methods such as least squares and ridge regression, both analytically and through a simulation study. An important result is that the new estimator is shown to be comparable to other shrinkage estimators in terms of mean squared error of parameters and of predictions, and superior under certain circumstances.Supported by the Greek State Scholarships Foundation 相似文献

5.

Non-parametric smoothing of the location model in mixed variable discrimination

Asparoukhov O. Krzanowski W. J. 《Statistics and Computing》2000,10(4):289-297

The location model is a familiar basis for discriminant analysis of mixtures of categorical and continuous variables. Its usual implementation involves second-order smoothing, using multivariate regression for the continuous variables and log-linear models for the categorical variables. In spite of the smoothing, these procedures still require many parameters to be estimated and this in turn restricts the categorical variables to a small number if implementation is to be feasible. In this paper we propose non-parametric smoothing procedures for both parts of the model. The number of parameters to be estimated is dramatically reduced and the range of applicability thereby greatly increased. The methods are illustrated on several data sets, and the performances are compared with a range of other popular discrimination techniques. The proposed method compares very favourably with all its competitors. 相似文献

6.

Dimension Reduction in the Linear Model for Right-Censored Data: Predicting the Change of HIV-I RNA Levels using Clinical and Protease Gene Mutation Data

Huang J Harrington D 《Lifetime data analysis》2004,10(4):425-443

With rapid development in the technology of measuring disease characteristics at molecular or genetic level, it is possible to collect a large amount of data on various potential predictors of the clinical outcome of interest in medical research. It is often of interest to effectively use the information on a large number of predictors to make prediction of the interested outcome. Various statistical tools were developed to overcome the difficulties caused by the high-dimensionality of the covariate space in the setting of a linear regression model. This paper focuses on the situation, where the interested outcomes are subjected to right censoring. We implemented the extended partial least squares method along with other commonly used approaches for analyzing the high-dimensional covariates to the ACTG333 data set. Especially, we compared the prediction performance of different approaches with extensive cross-validation studies. The results show that the Buckley–James based partial least squares, stepwise subset model selection and principal components regression have similar promising predictive power and the partial least square method has several advantages in terms of interpretability and numerical computation. 相似文献

7.

Evidence of bias in the Eurovision song contest: modelling the votes using Bayesian hierarchical models

Marta Blangiardo 《Journal of applied statistics》2014,41(10):2312-2322

The Eurovision Song Contest is an annual musical competition held among active members of the European Broadcasting Union since 1956. The event is televised live across Europe. Each participating country presents a song and receive a vote based on a combination of tele-voting and jury. Over the years, this has led to speculations of tactical voting, discriminating against some participants and thus inducing bias in the final results. In this paper we investigate the presence of positive or negative bias (which may roughly indicate favouritisms or discrimination) in the votes based on geographical proximity, migration and cultural characteristics of the participating countries through a Bayesian hierarchical model. Our analysis found no evidence of negative bias, although mild positive bias does seem to emerge systematically, linking voters to performers. 相似文献

8.

Patterns and trends in occupational attainment of first jobs in the Netherlands, 1930–1995: ordinary least squares regression versus conditional multinomial logistic regression

Jos A. G. Dessens Wim Jansen Harry B. G. Ganzeboom Peter G. M. van der Heijden 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2003,166(1):63-84

相似文献

9.

Some issues in logistic regression

Thomas P. Ryan 《统计学通讯:理论与方法》2013,42(9-10):2019-2032

Much research has been performed in the area of multiple linear regression, with the resuit that the field is well-developed. This is not true of logistic regression, however. The latter presents special problems because the response is not continuous. Some of these problems are: the difficulty of developing a suitable R² statistic, possibly poor results produced by the method of maximum likelihood, and the challenge to develop suitable graphical techniques. We describe recent work in some of these directions, and discuss the need for additional research. 相似文献

10.

Inclusion of binary proxy variables in logistic regression improves treatment effect estimation in observational studies in the presence of binary unmeasured confounding variables

Cornelius Rosenbaum Qingzhao Yu Sarah Buzhardt Elizabeth Sutton Andrew G. Chapple 《Pharmaceutical statistics》2023,22(6):995-1015

We present a simulation study and application that shows inclusion of binary proxy variables related to binary unmeasured confounders improves the estimate of a related treatment effect in binary logistic regression. The simulation study included 60,000 randomly generated parameter scenarios of sample size 10,000 across six different simulation structures. We assessed bias by comparing the probability of finding the expected treatment effect relative to the modeled treatment effect with and without the proxy variable. Inclusion of a proxy variable in the logistic regression model significantly reduced the bias of the treatment or exposure effect when compared to logistic regression without the proxy variable. Including proxy variables in the logistic regression model improves the estimation of the treatment effect at weak, moderate, and strong association with unmeasured confounders and the outcome, treatment, or proxy variables. Comparative advantages held for weakly and strongly collapsible situations, as the number of unmeasured confounders increased, and as the number of proxy variables adjusted for increased. 相似文献

11.

A comparison of ordinal logistic regression models using Classical and Bayesian approaches in an analysis of factors associated with diabetic retinopathy

K. Vaitheeswaran M. Subbiah R. Ramakrishnan T. Kannan 《Journal of applied statistics》2016,43(12):2254-2260

Estimating the risk factors of a disease such as diabetic retinopathy (DR) is one of the important research problems among bio-medical and statistical practitioners as well as epidemiologists. Incidentally many studies have focused in building models with binary outcomes, that may not exploit the available information. This article has investigated the importance of retaining the ordinal nature of the response variable (e.g. severity level of a disease) while determining the risk factors associated with DR. A generalized linear model approach with appropriate link functions has been studied using both Classical and Bayesian frameworks. From the result of this study, it can be observed that the ordinal logistic regression with probit link function could be more appropriate approach in determining the risk factors of DR. The study has emphasized the ways to handle the ordinal nature of the response variable with better model fit compared to other link functions. 相似文献

12.

On the use of cross-validation to assess performance in multivariate prediction

Jonathan P. Krzanowski W. J. McCarthy W. V. 《Statistics and Computing》2000,10(3):209-229

We describe a Monte Carlo investigation of a number of variants of cross-validation for the assessment of performance of predictive models, including different values of k in leave-k-out cross-validation, and implementation either in a one-deep or a two-deep fashion. We assume an underlying linear model that is being fitted using either ridge regression or partial least squares, and vary a number of design factors such as sample size n relative to number of variables p, and error variance. The investigation encompasses both the non-singular (i.e. n > p) and the singular (i.e. n p) cases. The latter is now common in areas such as chemometrics but has as yet received little rigorous investigation. Results of the experiments enable us to reach some definite conclusions and to make some practical recommendations. 相似文献

13.

Functional-coefficient cointegration models in the presence of deterministic trends

Masayuki Hirukawa Mari Sakudo 《Econometric Reviews》2018,37(5):507-533

In this article, we extend the functional-coefficient cointegration model (FCCM) to the cases in which nonstationary regressors contain both stochastic and deterministic trends. A nondegenerate distributional theory on the local linear (LL) regression smoother of the FCCM is explored. It is demonstrated that even when integrated regressors are endogenous, the limiting distribution is the same as if they were exogenous. Finite-sample performance of the LL estimator is investigated via Monte Carlo simulations in comparison with an alternative estimation method. As an application of the FCCM, electricity demand analysis in Illinois is considered. 相似文献

14.

Response-based multiple imputation method for minimizing the impact of covariate detection limit in logistic regression

Shahadut Hossain Zahirul Hoque Jacek Wesolowski 《统计学通讯:理论与方法》2021,50(2):371-386

Abstract

Presence of detection limit (DL) in covariates causes inflated bias and inaccurate mean squared error to the estimators of the regression parameters. This paper suggests a response-driven multiple imputation method to correct the deleterious impact introduced by the covariate DL in the estimators of the parameters of simple logistic regression model. The performance of the method has been thoroughly investigated, and found to outperform the existing competing methods. The proposed method is computationally simple and easily implementable by using three existing R libraries. The method is robust to the violation of distributional assumption for the covariate of interest. 相似文献

15.

Systematic and nonrandom sampling in the presence of polynomial trends

Robert L. Fountain Pramod K. Pathak 《统计学通讯:理论与方法》2013,42(3):1061-1071

In the case of finite populations with low-order polynomial trends present, the use of the least squares regression estimator of the mean is discussed. A sampling scheme, which optimizes the efficiency of the regression estimator over a particular class of schemes, is presented. 相似文献

16.

Classification rules for identifying individuals at high risk of developing myocardial infarction based on ApoB,ApoA1 and the ratio were determined using a Bayesian approach

S. Islam S. Anand M. McQueen J. Hamid L. Thabane S. Yusuf 《Journal of applied statistics》2018,45(2):210-224

We have developed a new approach to determine the threshold of a biomarker that maximizes the classification accuracy of a disease. We consider a Bayesian estimation procedure for this purpose and illustrate the method using a real data set. In particular, we determine the threshold for Apolipoprotein B (ApoB), Apolipoprotein A1 (ApoA1) and the ratio for the classification of myocardial infarction (MI). We first conduct a literature review and construct prior distributions. We then develop classification rules based on the posterior distribution of the location and scale parameters for these biomarkers. We identify the threshold for ApoB and ApoA1, and the ratio as 0.908 (gram/liter), 1.138 (gram/liter) and 0.808, respectively. We also observe that the threshold for disease classification varies substantially across different age and ethnic groups. Next, we identify the most informative predictor for MI among the three biomarkers. Based on this analysis, ApoA1 appeared to be a stronger predictor than ApoB for MI classification. Given that we have used this data set for illustration only, the results will require further investigation for use in clinical applications. However, the approach developed in this article can be used to determine the threshold of any continuous biomarker for a binary disease classification. 相似文献

17.

Machine learning and design of experiments with an application to product innovation in the chemical industry

Rosa Arboretti Riccardo Ceccato Luca Pegoraro Luigi Salmaso Chris Housmekerides Luca Spadoni Elisabetta Pierangelo Sara Quaggia Catherine Tveit Sebastiano Vianello 《Journal of applied statistics》2022,49(10):2674

Industrial statistics plays a major role in the areas of both quality management and innovation. However, existing methodologies must be integrated with the latest tools from the field of Artificial Intelligence. To this end, a background on the joint application of Design of Experiments (DOE) and Machine Learning (ML) methodologies in industrial settings is presented here, along with a case study from the chemical industry. A DOE study is used to collect data, and two ML models are applied to predict responses which performance show an advantage over the traditional modeling approach. Emphasis is placed on causal investigation and quantification of prediction uncertainty, as these are crucial for an assessment of the goodness and robustness of the models developed. Within the scope of the case study, the models learned can be implemented in a semi-automatic system that can assist practitioners who are inexperienced in data analysis in the process of new product development. 相似文献

18.

Likelihood-based cross-validation score for selecting the smoothing parameter in maximum penalized likelihood estimation

Wataru Sakamoto Shingo Shirahata 《统计学通讯:理论与方法》2013,42(7):1671-1698

Maximum penalized likelihood estimation is applied in non(semi)-para-metric regression problems, and enables us exploratory identification and diagnostics of nonlinear regression relationships. The smoothing parameter A controls trade-off between the smoothness and the goodness-of-fit of a function. The method of cross-validation is used for selecting A, but the generalized cross-validation, which is based on the squared error criterion, shows bad be¬havior in non-normal distribution and can not often select reasonable A. The purpose of this study is to propose a method which gives more suitable A and to evaluate the performance of it.

A method of simple calculation for the delete-one estimates in the likeli¬hood-based cross-validation (LCV) score is described. A score of similar form to the Akaike information criterion (AIC) is also derived. The proposed scores are compared with the ones of standard procedures by using data sets in liter¬atures. Simulations are performed to compare the patterns of selecting A and overall goodness-of-fit and to evaluate the effects of some factors. The LCV-scores by the simple calculation provide good approximation to the exact one if λ is not extremeiy smaii Furthermore the LCV scores by the simple size it possible to select X adaptively They have the effect, of reducing the bias of estimates and provide better performance in the sense of overall goodness-of fit. These scores are useful especially in the case of small sample size and in the case of binary logistic regression. 相似文献

19.

A comparison of classification models to identify the Fragile X Syndrome

Rafael Pino-Mejías Mercedes Carrasco-Mairena Antonio Pascual-Acosta María-Dolores Cubiles-De-La-Vega Joaquín Muñoz-García 《Journal of applied statistics》2008,35(3):233-244

The main models of machine learning are briefly reviewed and considered for building a classifier to identify the Fragile X Syndrome (FXS). We have analyzed 172 patients potentially affected by FXS in Andalusia (Spain) and, by means of a DNA test, each member of the data set is known to belong to one of two classes: affected, not affected. The whole predictor set, formed by 40 variables, and a reduced set with only nine predictors significantly associated with the response are considered. Four alternative base classification models have been investigated: logistic regression, classification trees, multilayer perceptron and support vector machines. For both predictor sets, the best accuracy, considering both the mean and the standard deviation of the test error rate, is achieved by the support vector machines, confirming the increasing importance of this learning algorithm. Three ensemble methods - bagging, random forests and boosting - were also considered, amongst which the bagged versions of support vector machines stand out, especially when they are constructed with the reduced set of predictor variables. The analysis of the sensitivity, the specificity and the area under the ROC curve agrees with the main conclusions extracted from the accuracy results. All of these models can be fitted by free R programs. 相似文献

20.

Estimation of the linear-plateau segmented regression model in the presence of measurement error

Scott D. Grimshaw 《统计学通讯:理论与方法》2013,42(8):2399-2413

It is well known that when the true values of the independent variable are unobservable due to measurement error, the least squares estimator for a regression model is biased and inconsistent. When repeated observations on each x_i are taken, consistent estimators for the linear-plateau model can be formed. The repeated observations are required to classify each observation to the appropriate line segment. Two cases of repeated observations are treated in detail. First, when a single value of y_i is observed with the repeated observations of x_i the least squares estimator using the mean of the repeated x_i observations is consistent and asymptotically normal. Second, when repeated observations on the pair (x_i, y_i ) are taken the least squares estimator is inconsistent, but two consistent estimators are proposed: one that consistently estimates the bias of the least squares estimator and adjusts accordingly; the second is the least squares estimator using the mean of the repeated observations on each pair. 相似文献