Search results
 Title
 PERCENTILE RESIDUAL LIFE FUNCTIONS: PROPERTIES, TESTING AND ESTIMATION.
 Creator

JOE, HARRY SUE WAH., Florida State University
 Abstract/Description

Let $F$ be a life distribution with survival function $\bar F \equiv 1 - F$. Conditional on survival to time $t$, the remaining life has survival function $\bar F_t(x) = \bar F(t + x)/\bar F(t)$, $x \ge 0$, $0 \le t < F^{-1}(1)$. The mean residual life function of $F$ is $m_F(t) = \int_0^\infty \bar F_t(x)\,dx$, $0 \le t < F^{-1}(1)$, if $F$ has a finite mean. The $\alpha$-percentile or quantile ($0 < \alpha < 1$) residual life function of $F$ is $q_{\alpha,F}(t) = F_t^{-1}(\alpha) = F^{-1}(1 - \bar\alpha \bar F(t)) - t$, $0 \le t < F^{-1}(1)$, where $\bar\alpha = 1 - \alpha$.
Statisticians find it useful to categorize life distributions according to different aging properties. Categories that involve $m_F(t)$ are the decreasing mean residual life (DMRL) class and the new better than used in expectation (NBUE) class. The DMRL class consists of distributions $F$ such that $m_F(t)$ is monotone decreasing on $(0, F^{-1}(1))$, and the NBUE class consists of distributions $F$ such that $m_F(0) \ge m_F(t)$ for all $0 < t < F^{-1}(1)$. Analogous categories that involve $q_{\alpha,F}(t)$ are the decreasing $\alpha$-percentile residual life (DPRL-$\alpha$) class and the new better than used with respect to the $\alpha$-percentile (NBUP-$\alpha$) class.
The mean residual life function is of interest in biometry, actuarial studies and reliability, and the DMRL and NBUE classes of life distributions are useful for modelling situations where items deteriorate with age. In the statistical literature, several papers consider properties or estimation of the mean residual life function, or testing situations involving the DMRL and NBUE classes. Only one previous paper discusses the $\alpha$-percentile residual life function.
This dissertation is concerned with properties and estimation of the $\alpha$-percentile residual life function, and with testing problems involving it. Properties of $q_{\alpha,F}(t)$ and of the DPRL-$\alpha$ and NBUP-$\alpha$ classes and their dual classes are studied in Chapter II. In Chapter III, tests are developed for testing exponentiality against DPRL-$\alpha$ and NBUP-$\alpha$ alternatives. In Chapter IV, these tests are extended to accommodate randomly censored data. In Chapter V, a distribution-free two-sample test is developed for testing the hypothesis that two life distributions $F$ and $G$ are equal against the alternative that $q_{\alpha,F}(t) \ge q_{\alpha,G}(t)$ for all $t$. In Chapter VI, strong consistency, asymptotic normality, bias and mean squared error of the estimator $F_n^{-1}(1 - \bar\alpha \bar F_n(t)) - t$ of $q_{\alpha,F}(t)$ are studied, where $F_n$ is the empirical distribution function and $\bar F_n \equiv 1 - F_n$.
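As an illustration of the definitions above (not code from the dissertation), the following Python sketch evaluates $q_{\alpha,F}(t) = F^{-1}(1 - \bar\alpha \bar F(t)) - t$ from a known quantile function, and computes the plug-in estimator $F_n^{-1}(1 - \bar\alpha \bar F_n(t)) - t$, taking $F_n^{-1}(p) = \inf\{x : F_n(x) \ge p\}$. The function names are hypothetical. For an exponential distribution the result does not depend on $t$, which is consistent with exponentiality serving as the null hypothesis for the DPRL-$\alpha$ and NBUP-$\alpha$ tests.

```python
import math
from typing import Callable, Sequence


def percentile_residual_life(quantile: Callable[[float], float],
                             survival: Callable[[float], float],
                             alpha: float, t: float) -> float:
    """q_{alpha,F}(t) = F^{-1}(1 - alpha_bar * Fbar(t)) - t, alpha_bar = 1 - alpha."""
    alpha_bar = 1.0 - alpha
    return quantile(1.0 - alpha_bar * survival(t)) - t


def empirical_percentile_residual_life(sample: Sequence[float],
                                       alpha: float, t: float) -> float:
    """Plug-in estimator F_n^{-1}(1 - alpha_bar * Fbar_n(t)) - t."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    xs = sorted(sample)
    n = len(xs)
    if t < 0 or t >= xs[-1]:
        raise ValueError("need 0 <= t < largest observation")
    fbar_n = sum(1 for x in xs if x > t) / n       # empirical survival at t
    p = 1.0 - (1.0 - alpha) * fbar_n               # p lies in [alpha, 1)
    # Empirical quantile F_n^{-1}(p) = inf{x : F_n(x) >= p} = x_(ceil(n p)).
    k = math.ceil(n * p - 1e-12)                   # small tolerance guards rounding
    return xs[k - 1] - t
```

For an exponential distribution with rate $\lambda$, `percentile_residual_life` returns $-\log(\bar\alpha)/\lambda$ at every $t$; for the sample `[1, 2, 3, 4]` with $\alpha = 0.5$, the estimator gives 2 at $t = 0$ and 1.5 at $t = 1.5$.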
 Date Issued
 1982
 Identifier
 AAI8214932, 3085276, FSDT3085276, fsu:74771
 Format
 Document (PDF)
 Title
 A NEW METHOD FOR ESTIMATING LIFE DISTRIBUTIONS FROM INCOMPLETE DATA.
 Creator

KITCHIN, JOHN FRANCIS., Florida State University
 Abstract/Description

We construct a new estimator for a continuous life distribution from incomplete data, the Piecewise Exponential Estimator (PEXE).
To date, the principal method of nonparametric estimation from incomplete data is the Product-Limit Estimator (PLE) introduced by Kaplan and Meier [J. Amer. Statist. Assoc. (1958) 53]. Our formulation of the estimation problem posed by incomplete data is essentially that of Kaplan and Meier, but we approach its solution from the viewpoint of reliability and life testing.
In this work we rigorously establish the asymptotic (large-sample) properties of the PEXE. Our results include the strong consistency of the PEXE under various sets of assumptions, plus the weak convergence of the suitably normalized PEXE to a Gaussian process. From an intermediate result in our weak convergence proof we derive asymptotic confidence bands and a goodness-of-fit test based on the PEXE.
Though our main objective is to introduce a new estimator for incomplete data and to study its asymptotic properties, our second contribution to this area of research is an extension of the asymptotic results for the widely used PLE. In particular, our results extend the work of Peterson [J. Amer. Statist. Assoc. (1977) 72] and of Langberg, Proschan, and Quinzi [Ann. Statist. (1980) 8] on strong consistency, and that of Breslow and Crowley [Ann. Statist. (1974) 2] on weak convergence.
Finally, we show that the PEXE, as an alternative to the traditional PLE, has several advantages for estimating a continuous life distribution from incomplete data, along with some drawbacks. Since the two estimators are so alike asymptotically, we concentrate on the differences between the PEXE and the PLE for estimation from small samples.
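The Python sketch below contrasts the Kaplan-Meier product-limit estimator with a piecewise exponential survival estimate. The piecewise exponential version assumes a constant hazard between successive observed failure times, estimated by the number of failures at the right endpoint divided by the total time on test accumulated in that interval. It illustrates that general idea under this assumption and is not necessarily Kitchin's exact definition of the PEXE. Observation times are assumed positive, `events[i]` is 1 for a failure and 0 for a censored time, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def product_limit(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate; returns (failure time, S(t)) at each distinct failure time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:   # group tied times
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            steps.append((t, surv))
        at_risk -= removed                          # failures and censorings leave the risk set
    return steps


def piecewise_exponential(times: Sequence[float], events: Sequence[int], t: float) -> float:
    """Survival at t with a constant hazard on each interval (a, b] between failures.

    Hazard on (a, b] = failures at b / total time on test accumulated in (a, b].
    """
    if t < 0:
        raise ValueError("t must be nonnegative")
    failures = sorted({z for z, d in zip(times, events) if d})
    cum_hazard, left = 0.0, 0.0
    for right in failures:
        time_on_test = sum(max(0.0, min(z, right) - left) for z in times)
        d = sum(1 for z, e in zip(times, events) if e and z == right)
        rate = d / time_on_test
        if t <= right:
            return math.exp(-(cum_hazard + rate * (t - left)))
        cum_hazard += rate * (right - left)
        left = right
    raise ValueError("t lies beyond the last observed failure")
```

Unlike the PLE, which is a step function, the piecewise exponential estimate is continuous in $t$, which suits the estimation of a continuous life distribution.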
 Date Issued
 1980
 Identifier
 AAI8104261, 3084762, FSDT3084762, fsu:74263
 Format
 Document (PDF)
 Title
 A MATHEMATICAL STUDY OF THE DIRICHLET PROCESS.
 Creator

TIWARI, RAM CHANDRA., Florida State University
 Abstract/Description

This dissertation is a contribution to the theory of Bayesian nonparametrics. A construction of the Dirichlet process (Ferguson, 1973) on a finite set $\chi$ is introduced in such a way that it leads to Blackwell's (1973) constructive definition of a Dirichlet process on a Borel space $(\chi, \mathcal{A})$. If $(\chi, \mathcal{A})$ is a Borel space and $P$ is a random probability measure on $(\chi, \mathcal{A})$ with a Dirichlet process prior $\mathcal{D}_\alpha$, then, under the condition that the $\alpha$-measure of every open subset of $\chi$ is positive, for almost every realization of $P$ the set of discrete mass points of that realization is dense in $\chi$.
A more general constructive definition introduced by Sethuraman (1978) is used to derive several new properties of the Dirichlet process and to present in a unified way some of its known properties. An alternative construction of Dalal's (1975) $G$-invariant Dirichlet process ($G$ being a finite group of transformations) is presented.
The Bayes estimates of an estimable parameter of degree $k$ ($k \ge 1$), namely $\psi_k(P) = \int \cdots \int h(x_1, \dots, x_k)\, dP(x_1) \cdots dP(x_k)$, where $h$ is a symmetric kernel, are derived under squared error loss and a Dirichlet process prior, both for the no-sample case (sample size zero) and for a sample of size $n$ from $P$. Using the no-sample Bayes estimate of $\psi_k(P)$, the (marginal) distribution of a sample from $P$ is obtained when the prior for $P$ is the Dirichlet process. The extension to a $G$-invariant Dirichlet process prior is also obtained.
Let $(\chi, \mathcal{A})$ be the one-dimensional Euclidean space $(\mathbb{R}^1, \mathcal{B}^1)$. Consider a sequence $\{\mathcal{D}_{\alpha_N + \gamma}\}$ of Dirichlet processes such that $\alpha_N(\chi) \to 0$ as $N \to \infty$, where $\gamma$ and the $\alpha_N$ are finite measures on $\mathcal{A}$. It is shown that $\mathcal{D}_{\alpha_N + \gamma}$ converges weakly to $\mathcal{D}_\gamma$ in the topology of weak convergence on $\mathcal{P}$, the class of all probability measures on $(\chi, \mathcal{A})$.
As a corollary, $\mathcal{D}_{\alpha_N + nF_n}$ converges weakly to $\mathcal{D}_{nF_n}$, where $F_n$ is the empirical distribution of the sample. Suppose $\alpha_N(\chi) \to 0$ and $\alpha_N/\alpha_N(\chi)$ converges uniformly to $\alpha/\alpha(\chi)$ as $N \to \infty$. If $\{\mathcal{D}_{\alpha_N}\}$ is a sequence of Dirichlet process priors for a random probability measure $P$ on $(\chi, \mathcal{A})$, then $P$, in the limit, is a random probability measure concentrated on the set of degenerate probability measures on $(\chi, \mathcal{A})$, and the point of degeneracy is distributed as $\alpha/\alpha(\chi)$ on $(\chi, \mathcal{A})$. To the sequence of priors $\{\mathcal{D}_{\alpha_N}\}$ for $P$ there corresponds a sequence of Bayes estimates of $\psi_k(P)$. The limit of this sequence as $\alpha_N(\chi) \to 0$, called the limiting Bayes estimate of $\psi_k(P)$, is obtained.
When $P$ is a random probability measure on $[0, 1]$, Sethuraman (1978) proposed a more general class of conjugate priors for $P$ that contains both the family of Dirichlet processes and the family of priors introduced by Dubins and Freedman (1966). As an illustration, a numerical example is considered, and the Bayes estimates of the mean and the variance of $P$ are computed under three distinct priors chosen from Sethuraman's class. The computer algorithm for this calculation is presented.
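The following Python sketch draws a truncated realization from a Dirichlet process $\mathcal{D}_\alpha$ with $\alpha = M G_0$, using the stick-breaking representation (the kind of constructive definition associated with Sethuraman), and computes the Bayes estimate of the mean, $\psi_1(P) = \int x\,dP(x)$, under squared error loss. Because the posterior after observing $x_1, \dots, x_n$ is $\mathcal{D}_{\alpha + \sum_i \delta_{x_i}}$, that estimate is $(\alpha(\chi)\mu_0 + \sum_i x_i)/(\alpha(\chi) + n)$, where $\mu_0$ is the mean of $G_0$. The truncation level, the base sampler and the function names are illustrative assumptions.

```python
import random
from typing import Callable, List, Sequence, Tuple


def stick_breaking_draw(total_mass: float,
                        base_sampler: Callable[[random.Random], float],
                        n_atoms: int,
                        rng: random.Random) -> Tuple[List[float], List[float]]:
    """Truncated stick-breaking draw from D_alpha, alpha = total_mass * G0."""
    atoms, weights = [], []
    remaining = 1.0
    for _ in range(n_atoms):
        v = rng.betavariate(1.0, total_mass)   # V_i ~ Beta(1, alpha(chi))
        weights.append(remaining * v)           # p_i = V_i * prod_{j<i} (1 - V_j)
        atoms.append(base_sampler(rng))         # atoms drawn i.i.d. from G0
        remaining *= 1.0 - v
    # Truncation: put the leftover stick on one more atom so the weights sum to 1.
    weights.append(remaining)
    atoms.append(base_sampler(rng))
    return atoms, weights


def bayes_estimate_of_mean(total_mass: float, base_mean: float,
                           sample: Sequence[float]) -> float:
    """Posterior mean of the mean of P under D_alpha: (alpha(chi)*mu0 + sum x) / (alpha(chi) + n)."""
    return (total_mass * base_mean + sum(sample)) / (total_mass + len(sample))
```

As $\alpha(\chi) \to 0$ the estimate tends to the sample mean, in line with $\mathcal{D}_{\alpha_N + nF_n}$ converging to $\mathcal{D}_{nF_n}$. With a very small total mass the first weight $V_1$ is close to 1, so a prior draw is nearly a point mass located according to $G_0 = \alpha/\alpha(\chi)$, echoing the degenerate limit described above.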
 Date Issued
 1981
 Identifier
 AAI8108190, 3084828, FSDT3084828, fsu:74329
 Format
 Document (PDF)
 Title
 AN INVESTIGATION OF THE EFFECT OF THE SWAMPING PHENOMENON ON SEVERAL BLOCK PROCEDURES FOR MULTIPLE OUTLIERS IN UNIVARIATE SAMPLES.
 Creator

WOOLLEY, THOMAS WILLIAM, JR., Florida State University
 Abstract/Description

Statistical outliers have been an issue of concern to researchers for over two centuries, and they are the focus of this study. Sources of outliers and various means for dealing with them are discussed. Also presented are general descriptions of univariate outlier tests, as well as the two approaches to handling multiple-outlier situations: consecutive and block testing. The major problems inherent in these methods, masking and swamping respectively, are recounted.
Specifically, the primary aim of this study is to assess the susceptibility to swamping of four block procedures for multiple outliers in univariate samples.
Pseudo-random samples are generated from a unit normal distribution, and varying numbers of upper outliers are placed in them according to specified criteria. A swamping index is created that reflects the relative vulnerability of each test to declaring the block of outliers, together with the most extreme upper non-outlier, discordant as a unit.
The results of this investigation reveal that the four block tests differ in their susceptibility to swamping, depending on sample size and on the prespecified number of outliers assumed to be present. Rank orderings of the four tests by vulnerability to swamping under varying circumstances are presented. In addition, alternative approaches to calculating the swamping index when four or more outliers exist are described.
Recommendations concerning the appropriate application of the four block procedures in differing situations, and proposals for further research, are advanced.
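The four block procedures and the swamping index are not specified here, so the Python sketch below only illustrates the phenomenon. It uses the Tietjen-Moore $L_k$ statistic for $k$ upper outliers as one example of a block test, plants $k$ shifted outliers in an $N(0, 1)$ sample, and estimates by Monte Carlo how often the block test for $k + 1$ outliers rejects, that is, declares the $k + 1$ largest values (the planted outliers plus one good observation, when the outliers are the largest values) discordant as a unit. This rejection rate is a hypothetical stand-in, not the swamping index constructed in the study, and the function names and settings are assumptions.

```python
import random
import statistics
from typing import Sequence


def tietjen_moore_upper(sample: Sequence[float], k: int) -> float:
    """L_k: sum of squares after deleting the k largest values / total sum of squares.

    Small values suggest that the k largest observations form a block of outliers.
    """
    xs = sorted(sample)
    kept = xs[:-k]
    mean_kept = statistics.fmean(kept)
    mean_all = statistics.fmean(xs)
    ss_kept = sum((x - mean_kept) ** 2 for x in kept)
    ss_all = sum((x - mean_all) ** 2 for x in xs)
    return ss_kept / ss_all


def swamping_rate(n: int, k_true: int, shift: float,
                  level: float = 0.05, reps: int = 2000, seed: int = 1) -> float:
    """Monte Carlo rate at which the block test for k_true + 1 upper outliers rejects
    when only k_true outliers (mean shifted by `shift`) are planted in an N(0, 1) sample."""
    rng = random.Random(seed)
    k_test = k_true + 1
    # Lower-tail critical value of L_{k_test} simulated under the null of no outliers.
    null_stats = sorted(
        tietjen_moore_upper([rng.gauss(0.0, 1.0) for _ in range(n)], k_test)
        for _ in range(reps))
    critical = null_stats[int(level * reps)]
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n - k_true)]
        sample += [rng.gauss(shift, 1.0) for _ in range(k_true)]
        if tietjen_moore_upper(sample, k_test) < critical:
            hits += 1   # the k_true + 1 largest values were declared discordant as a block
    return hits / reps
```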
 Date Issued
 1981
 Identifier
 AAI8113272, 3084903, FSDT3084903, fsu:74401
 Format
 Document (PDF)
 Title
 ESTIMATION AND PREDICTION FOR EXPONENTIAL TIME SERIES MODELS.
 Creator

MOHAMED, FOUAD YEHIA., Florida State University
 Abstract/Description

This work is concerned with the study of stationary time series models in which the marginal distribution of the observations follows an exponential distribution. This is in contrast to the standard models in the literature where the error sequence and hence the marginal distributions of the o
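For illustration, one well-known stationary time series with exponential marginal distributions is the EAR(1) process of Gaver and Lewis: $X_t = \rho X_{t-1} + \varepsilon_t$, where $\varepsilon_t = 0$ with probability $\rho$ and $\varepsilon_t \sim \mathrm{Exp}(\lambda)$ otherwise. Its lag-1 autocorrelation is $\rho$, and its one-step conditional mean is $\rho X_t + (1 - \rho)/\lambda$. The Python sketch below simulates it and computes both quantities; it is not necessarily one of the models studied in the dissertation, and the function names are hypothetical.

```python
import random
from typing import List


def simulate_ear1(n: int, rho: float, rate: float, seed: int = 0) -> List[float]:
    """Gaver-Lewis EAR(1): X_t = rho*X_{t-1} + eps_t, eps_t = 0 w.p. rho, else Exp(rate).

    Started from the stationary Exp(rate) law, every X_t is Exp(rate) distributed.
    """
    rng = random.Random(seed)
    x = rng.expovariate(rate)
    path = [x]
    for _ in range(n - 1):
        eps = 0.0 if rng.random() < rho else rng.expovariate(rate)
        x = rho * x + eps
        path.append(x)
    return path


def lag1_autocorrelation(xs: List[float]) -> float:
    """Sample lag-1 autocorrelation, a simple moment estimator of rho."""
    m = sum(xs) / len(xs)
    num = sum((a - m) * (b - m) for a, b in zip(xs, xs[1:]))
    den = sum((a - m) ** 2 for a in xs)
    return num / den


def predict_next(x_last: float, rho: float, rate: float) -> float:
    """One-step minimum mean squared error predictor E[X_{t+1} | X_t]."""
    return rho * x_last + (1.0 - rho) / rate
```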
 Date Issued
 1981
 Identifier
 AAI8205698, 3085176, FSDT3085176, fsu:74671
 Format
 Document (PDF)
 Title
 Time-Varying Coefficient Models with ARMA-GARCH Structures for Longitudinal Data Analysis.
 Creator

Zhao, Haiyan, Niu, Xufeng, Huffer, Fred, Nolder, Craig, McGee, Dan, Department of Statistics, Florida State University
 Abstract/Description

The motivation of my research comes from the analysis of the Framingham Heart Study (FHS) data. The FHS is a long-term prospective study of cardiovascular disease in the community of Framingham, Massachusetts. The study began in 1948, and 5,209 subjects were initially enrolled. Examinations were given biennially to the study participants, and their status with respect to the occurrence of disease was recorded. In this dissertation, the event of interest is the incidence of coronary heart disease (CHD). Covariates considered include sex, age, cigarettes per day (CSM), serum cholesterol (SCL), systolic blood pressure (SBP) and body mass index (BMI, weight in kilograms divided by the square of height in meters). A review of the statistical literature indicates that, in the Framingham study, the effects of the covariates on cardiovascular disease, or on death from all causes, change over time. For example, the effect of SCL on cardiovascular disease decreases linearly over time. In this study, I examine the time-varying effects of the risk factors on CHD incidence. Time-varying coefficient models with ARMA-GARCH structures are developed in this research. The maximum likelihood and marginal likelihood methods are used to estimate the parameters in the proposed models. Since high-dimensional integrals are involved in computing the marginal likelihood, the Laplace approximation is employed. Simulation studies are conducted to evaluate the performance of these two estimation methods under the proposed models. The Kullback-Leibler (KL) divergence and the root mean square error are used in the simulation studies to compare the results obtained from the different methods. Simulation results show that the marginal likelihood approach gives more accurate parameter estimates but is more computationally intensive.
Following the simulation study, the proposed models are applied to the Framingham Heart Study to investigate the time-varying effects of the covariates on CHD incidence. To specify the time-series structures of the risk-factor effects, the Bayesian Information Criterion (BIC) is used for model selection. Our study shows that the relationship between CHD and the risk factors changes over time. For males, there is a clear decreasing linear trend in the age effect, which implies that the effect of age on CHD is weaker for older patients than for younger patients. The effect of CSM stays almost constant during the first 30 years and decreases thereafter. There are slightly decreasing linear trends in the effects of both SBP and BMI. Furthermore, the coefficients of SBP are mostly positive over time, i.e., patients with higher SBP are more likely to develop CHD, as expected. For females, there is also a clear decreasing linear trend in the age effect, while the effects of SBP and BMI on CHD are mostly positive and change little over time.
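As a hedged illustration of what an ARMA-GARCH structure on a time-varying coefficient can look like, the Python sketch below simulates a coefficient path from the simplest member of that family, an AR(1) mean with GARCH(1,1) innovations, and includes the BIC formula used for model selection. The AR(1)-GARCH(1,1) choice and the parameter names are assumptions; the dissertation's actual specification, and its maximum likelihood and Laplace-approximated marginal likelihood estimation, are not reproduced here.

```python
import math
import random
from typing import List, Tuple


def simulate_ar_garch_coefficient(T: int, mu: float, phi: float,
                                  omega: float, a: float, b: float,
                                  seed: int = 0) -> Tuple[List[float], List[float]]:
    """beta_t = mu + phi*(beta_{t-1} - mu) + e_t with e_t = sigma_t * z_t and
    sigma_t^2 = omega + a*e_{t-1}^2 + b*sigma_{t-1}^2 (AR(1)-GARCH(1,1))."""
    if not (abs(phi) < 1 and omega > 0 and a >= 0 and b >= 0 and a + b < 1):
        raise ValueError("parameters outside the usual stationarity region")
    rng = random.Random(seed)
    sigma2 = omega / (1.0 - a - b)   # start at the unconditional variance
    e_prev, beta = 0.0, mu
    betas, variances = [], []
    for _ in range(T):
        sigma2 = omega + a * e_prev ** 2 + b * sigma2   # conditional variance
        e_prev = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)
        beta = mu + phi * (beta - mu) + e_prev
        betas.append(beta)
        variances.append(sigma2)
    return betas, variances


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion: -2 log L + p log n (smaller is better)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)
```

With `a = b = 0` the innovations are homoscedastic and the path reduces to a plain AR(1) coefficient with variance parameter `omega`.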
 Date Issued
 2010
 Identifier
 FSU_migr_etd0527
 Format
 Thesis
 Title
 TESTS OF DISPLACEMENT AND ORDERED MEAN HYPOTHESES.
 Creator

SINCLAIR, DENNIS FRANKLIN., Florida State University
 Abstract/Description

Character displacement is an ecological process by which, theoretically, coexisting species diverge in size to reduce competition. A closely allied concept is deletion, in which species are excluded from a habitat because they do not differ sufficiently from other species living there. Character displacement has been a controversial topic in recent years, largely because of a lack of statistical procedures for testing its existence. We propose herein a variety of approaches for testing displacement and deletion hypotheses. The applicability of the methods extends beyond the motivating ecological problem to other fields.
Consider the model $X_{ij} = \mu_i + \epsilon_{ij}$, $i = 1, \dots, k$; $j = 1, \dots, n_i$, where $X_{ij}$ is the $j$th observation on species $i$ with population mean $\mu_i$. The $\epsilon_{ij}$ are independent, normally distributed error terms with mean zero and common variance.
Traditionally, ecologists have regarded species sizes as randomly distributed. We develop tests for displacement and deletion by considering uniform, lognormal and log-uniform distributions for species sizes. (A random variable $Y$ has a log-uniform distribution if $\log Y$ has a uniform distribution.)
Most claimed manifestations of character displacement concern the ratios of each species size to the next smallest one (contiguous ratios). All but one of the test statistics are functions of spacings (logarithms of contiguous ratios). We prove a useful characterization of distributions in terms of spacings, and show that the log-uniform distribution produces constant expected contiguous ratios, an important property in character displacement studies. The random effects approaches generally lack power to detect the suspected patterns.
We develop further tests for the model in which the $\mu_i$ are regarded as fixed. This fixed effects approach, which may be more realistic ecologically, produces considerably more powerful tests.
Displacement hypotheses in the fixed effects framework are expressed naturally in terms of the ordered means $\mu_{(1)} < \mu_{(2)} < \dots < \mu_{(k)}$. We develop a general theory by which a particular class of linear hypotheses about any number of sets of ordered means may be tested.
Finally, a functional relation is used to model the movement of species means from one environment to another. Existing asymptotic tests are shown to perform remarkably well for small samples.
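The Python sketch below computes contiguous ratios and spacings, and checks by Monte Carlo the log-uniform property stated above: if $\log Y$ is uniform on an interval of length $L$, each of the $k - 1$ spacings of a sample of $k$ sizes has expected value $L/(k + 1)$, so the expected log contiguous ratio is the same at every position (and, because uniform spacings are exchangeable, so is the expected ratio itself). The function names and simulation settings are illustrative assumptions, not the dissertation's test statistics.

```python
import math
import random
from typing import List, Sequence


def contiguous_ratios(sizes: Sequence[float]) -> List[float]:
    """Ratio of each species size to the next smallest one."""
    s = sorted(sizes)
    return [b / a for a, b in zip(s, s[1:])]


def log_spacings(sizes: Sequence[float]) -> List[float]:
    """Spacings: logarithms of the contiguous ratios."""
    return [math.log(r) for r in contiguous_ratios(sizes)]


def mean_log_spacings_loguniform(k: int, lo: float, hi: float,
                                 reps: int, seed: int = 0) -> List[float]:
    """Monte Carlo mean of each of the k - 1 spacings when log(size) ~ Uniform(log lo, log hi)."""
    rng = random.Random(seed)
    totals = [0.0] * (k - 1)
    for _ in range(reps):
        sizes = [math.exp(rng.uniform(math.log(lo), math.log(hi))) for _ in range(k)]
        for i, sp in enumerate(log_spacings(sizes)):
            totals[i] += sp
    return [t / reps for t in totals]
```

With $k = 5$ sizes whose logarithms are uniform on $[0, 6]$, each of the four mean spacings should be close to $6/6 = 1$.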
 Date Issued
 1982
 Identifier
 AAI8223194, 3085332, FSDT3085332, fsu:74827
 Format
 Document (PDF)
 Title
 SOME RESULTS ON THE DISTRIBUTION OF GRUBBS ESTIMATORS.
 Creator

BRINDLEY, DENNIS ALFRED., Florida State University
 Abstract/Description

This dissertation is concerned with the estimation of error variances in a nonreplicated two-way classification and with inferences based on the estimators so derived. The model postulated throughout is $y_{ij} = \mu_i + \beta_j + \epsilon_{ij}$, where $y_{ij}$ is the observation in the $i$th row and $j$th column, $\mu_i$ is the parameter representing the mean of the $i$th row, $\beta_j$ is the parameter representing the additional effect of the $j$th column, [formula omitted; see DAI], and the $\epsilon_{ij}$ are independent, zero-mean, normal variates with [formula omitted; see DAI]. A set of unbiased estimates, [formula omitted; see DAI], developed in earlier work by Grubbs (J. Amer. Statist. Assoc. 43 (1948), 243-264), Ehrenberg (Biometrika 37 (1950), 347-357) and Russell and Bradley (Biometrika 45 (1958), 111-129), is considered.
The exact joint density of $Q_1, \dots, Q_r$ is obtained for $r = 3$, and two exact results are derived for testing the null hypothesis [formula omitted; see DAI], unknown, versus the two specific alternatives [formula omitted; see DAI] for at least one $j$, $j = 1, 2, 3$, and [formula omitted; see DAI].
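For $r = 3$ rows, a classical Grubbs-type estimator can be obtained by differencing. Under the model above, $y_{ij} - y_{kj} = \mu_i - \mu_k + \epsilon_{ij} - \epsilon_{kj}$, so, assuming the row error variances are $\sigma_i^2$ (the displayed variance assumption is not reproduced in the abstract), the sample variance of the differences estimates $\sigma_i^2 + \sigma_k^2$ without bias, and the three pairwise equations can be solved for each $\sigma_i^2$. The Python sketch below implements this; the function name is hypothetical, and the check uses made-up parameter values.

```python
import random
import statistics
from typing import Sequence, Tuple


def grubbs_three(y1: Sequence[float], y2: Sequence[float],
                 y3: Sequence[float]) -> Tuple[float, float, float]:
    """Difference-based estimates of the three row error variances.

    Differencing two rows removes beta_j (and turns mu_i - mu_k into a constant),
    so Var(y_i - y_k) = sigma_i^2 + sigma_k^2; solving the three equations
    gives each sigma_i^2.
    """
    def var_diff(u: Sequence[float], v: Sequence[float]) -> float:
        return statistics.variance([a - b for a, b in zip(u, v)])   # divisor n - 1

    v12, v13, v23 = var_diff(y1, y2), var_diff(y1, y3), var_diff(y2, y3)
    return ((v12 + v13 - v23) / 2, (v12 + v23 - v13) / 2, (v13 + v23 - v12) / 2)
```

In small samples these estimates can be negative, a known drawback of this kind of estimator.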
 Date Issued
 1982
 Identifier
 AAI8229146, 3085401, FSDT3085401, fsu:74896
 Format
 Document (PDF)
 Title
 LARGE DEVIATION LOCAL LIMIT THEOREMS, WITH APPLICATIONS.
 Creator

CHAGANTY, NARASINGA RAO., Florida State University
 Abstract/Description

Let $\{X_n, n \ge 1\}$ be a sequence of i.i.d. random variables with $E(X_1) = 0$ and $\mathrm{Var}(X_1) = 1$. Let $\psi(s)$ be the cumulant generating function (c.g.f.) of $X_1$, and let $\gamma(x) = \sup_s \{sx - \psi(s)\}$ be the large deviation rate of $X_1$. Let $S_n = X_1 + \dots + X_n$. Under some mild conditions on $\psi$, Richter (Theory Prob. Appl. (1957) 2, 206-219) showed that the probability density function $f_n$ of $S_n/\sqrt{n}$ has the asymptotic expression [formula omitted; see DAI] whenever $x_n = o(\sqrt{n})$ and $\sqrt{n}\,x_n > 1$. In this dissertation we obtain similar large deviation local limit theorems for arbitrary sequences of random variables, not necessarily sums of i.i.d. random variables, thereby increasing the applicability of Richter's theorem. Let $\{T_n, n \ge 1\}$ be an arbitrary sequence of nonlattice random variables with characteristic function (c.f.) $\phi_n$. Let $\psi_n$ and $\gamma_n$ be the c.g.f. and the large deviation rate of $T_n/n$. The main theorem in Chapter II shows that, under some standard conditions on $\psi_n$ which imply that $T_n/n$ converges to a constant in probability, the density function $K_n$ of $T_n/n$ has the asymptotic expression [formula omitted; see DAI], where $m_n$ is any sequence of real numbers and $\tau_n$ is defined by $\psi_n'(\tau_n) = m_n$. When $T_n$ is the sum of $n$ i.i.d. random variables, our result reduces to Richter's theorem. Similar theorems for lattice-valued random variables are also presented; these are useful in obtaining asymptotic probabilities for the Wilcoxon signed-rank statistic and Kendall's tau.
In Chapter III we use the results of Chapter II to obtain central limit theorems for sums of a triangular array of dependent random variables $X_j^{(n)}$, $j = 1, \dots, n$, with joint distribution given by $z_n^{-1} \exp\{H_n(x_1, \dots, x_n)\} \prod_{j=1}^{n} dP(x_j)$, where $x_i \in \mathbb{R}$ for all $i \ge 1$. The function $H_n(x_1, \dots, x_n)$ is known as the Hamiltonian, and $P$ is a probability measure on $\mathbb{R}$. When $H_n(x_1, \dots, x_n) = \log \phi_n(s_n/n)$, where $s_n = x_1 + \dots + x_n$, and the probability measure $P$ satisfies appropriate conditions, we show that there exist an integer $r \ge 1$ and a sequence $\tau_n$ such that $(S_n - n\tau_n)/n^{1 - 1/(2r)}$ has a limiting distribution that is non-Gaussian if $r \ge 2$. This result generalizes theorems of Jong-Woo Jeon (Ph.D. Thesis, Dept. of Stat., F.S.U. (1979)) and of Ellis and Newman (Z. Wahrscheinlichkeitstheorie und Verw. Gebiete (1978) 44, 117-139). Chapters IV and V extend the above results to the multivariate case.
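The rate $\gamma(x) = \sup_s \{sx - \psi(s)\}$ is attained at the tilting parameter $\tau$ solving $\psi'(\tau) = x$, the same kind of equation that defines $\tau_n$ above. The Python sketch below solves that equation by bisection, which is valid because a c.g.f. is convex and so $\psi'$ is nondecreasing, and returns $\tau x - \psi(\tau)$. It is a numerical illustration under those assumptions, checked against two mean-zero, variance-one examples: the standard normal ($\psi(s) = s^2/2$, $\gamma(x) = x^2/2$) and the symmetric $\pm 1$ variable ($\psi(s) = \log\cosh s$). The function name and bracket defaults are hypothetical.

```python
import math
from typing import Callable


def rate_function(x: float, psi: Callable[[float], float],
                  dpsi: Callable[[float], float],
                  lo: float = -50.0, hi: float = 50.0, iters: int = 200) -> float:
    """gamma(x) = sup_s {s x - psi(s)}, found via the tilting equation psi'(tau) = x.

    psi must be a convex c.g.f. with derivative dpsi satisfying dpsi(lo) < x < dpsi(hi).
    """
    if not dpsi(lo) < x < dpsi(hi):
        raise ValueError("x outside the range of psi' on [lo, hi]")
    for _ in range(iters):               # bisection on the nondecreasing psi'
        mid = 0.5 * (lo + hi)
        if dpsi(mid) < x:
            lo = mid
        else:
            hi = mid
    tau = 0.5 * (lo + hi)
    return tau * x - psi(tau)
```

For the $\pm 1$ variable the result agrees with the closed form $\gamma(x) = \tfrac{1+x}{2}\log(1+x) + \tfrac{1-x}{2}\log(1-x)$.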
 Date Issued
 1982
 Identifier
 AAI8225279, 3085419, FSDT3085419, fsu:74914
 Format
 Document (PDF)
 Title
 A Comparison of Estimators in Hierarchical Linear Modeling: Restricted Maximum Likelihood versus Bootstrap via Minimum Norm Quadratic Unbiased Estimators.
 Creator

Delpish, Ayesha Nneka, Niu, XuFeng, Tate, Richard L., Huffer, Fred W., Zahn, Douglas, Department of Statistics, Florida State University
 Abstract/Description

The purpose of the study was to investigate the relative performance of two estimation procedures, restricted maximum likelihood (REML) and the bootstrap via MINQUE, for a two-level hierarchical linear model under a variety of conditions. Specific focus lay on whether the bootstrap via MINQUE procedure offered improved accuracy in estimating the model parameters and their standard errors in situations where normality may not be guaranteed. Through Monte Carlo simulations, the importance of this assumption for the accuracy of multilevel parameter estimates and their standard errors was assessed using relative bias as the accuracy index and by observing the coverage percentages of 95% confidence intervals constructed under both estimation procedures. The study systematically varied the number of groups at level 2 (30 versus 100), the size of the intraclass correlation (0.01 versus 0.20) and the distribution of the observations (normal versus chi-squared with 1 degree of freedom). The number-of-groups and intraclass-correlation factors produced effects consistent with those previously reported: as the number of groups increased, the bias in the parameter estimates decreased, with a larger effect for the estimates obtained via REML. High levels of the intraclass correlation also decreased the efficiency of parameter estimation under both methods. The results show that while both the REML and the bootstrap via MINQUE estimates of the fixed effects were accurate, their efficiency was affected by the error distribution, with the bootstrap via MINQUE procedure outperforming REML. Both procedures produced less efficient estimators under the chi-squared distribution, particularly for the variance-covariance component estimates.
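The Python sketch below sets up one replication of the kind of simulation described: a balanced random-intercept (two-level) model with intraclass correlation $\tau^2/(\tau^2 + \sigma^2)$ and either normal or standardized chi-squared(1) random terms, $(z^2 - 1)/\sqrt{2}$. For simplicity it estimates the variance components with the balanced-data ANOVA (method-of-moments) estimator rather than REML or the bootstrap via MINQUE, and it computes relative bias as the mean estimate minus the true value, divided by the true value, a common definition that may differ from the study's. All names and settings are assumptions.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple


def simulate_two_level(n_groups: int, group_size: int, tau2: float, sigma2: float,
                       skewed: bool, seed: int = 0) -> List[List[float]]:
    """Random-intercept data y_ij = u_j + e_ij with Var(u) = tau2 and Var(e) = sigma2.

    If skewed, both u and e use standardized chi-squared(1) draws (z^2 - 1)/sqrt(2).
    """
    rng = random.Random(seed)

    def draw() -> float:
        z = rng.gauss(0.0, 1.0)
        return (z * z - 1.0) / math.sqrt(2.0) if skewed else z

    groups = []
    for _ in range(n_groups):
        u = math.sqrt(tau2) * draw()
        groups.append([u + math.sqrt(sigma2) * draw() for _ in range(group_size)])
    return groups


def anova_components(groups: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Method-of-moments (ANOVA) estimates (tau2, sigma2) for balanced groups."""
    J, m = len(groups), len(groups[0])
    means = [statistics.fmean(g) for g in groups]
    grand = statistics.fmean(means)
    msb = m * sum((gm - grand) ** 2 for gm in means) / (J - 1)
    msw = sum((y - gm) ** 2 for g, gm in zip(groups, means) for y in g) / (J * (m - 1))
    return (msb - msw) / m, msw


def relative_bias(estimates: Sequence[float], true_value: float) -> float:
    """(mean of estimates - true value) / true value."""
    return (statistics.fmean(estimates) - true_value) / true_value
```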
 Date Issued
 2006
 Identifier
 FSU_migr_etd0771
 Format
 Thesis
 Title
 Estimation from Data Representing a Sample of Curves.
 Creator

Auguste, Anna L., Bunea, Florentina, Mason, Patrick, Hollander, Myles, Huffer, Fred, Department of Statistics, Florida State University
 Abstract/Description

This dissertation introduces and assesses an algorithm for generating confidence bands for a regression function or a main effect when multiple data sets are available. In particular, it proposes constructing confidence bands for the different trajectories and then aggregating these to produce an overall confidence band for a mean function. An estimator of the regression function or main effect is also examined. First, nonparametric estimators and confidence bands are formed on each data set separately. Then each data set is in turn treated as a testing set for aggregating the preliminary results from the remaining data sets. The criterion used for this aggregation is either the least squares (LS) criterion or a BIC-type penalized LS criterion. The proposed estimator is the average over data sets of these aggregates; it is thus a weighted sum of the preliminary estimators. When there is only a main effect, the proposed confidence band is the minimum-L1 band among the M aggregate bands, where M is the number of data sets. When there is also a random effect, we suggest an adjustment to the confidence band, and the proposed band is then the minimum-L1 band among the M adjusted aggregate bands. Desirable asymptotic properties are shown to hold. A simulation study examines the performance of each technique relative to several alternative methods and theoretical benchmarks. An application to seismic data is conducted.
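A simplified Python sketch of the aggregation step follows, assuming every data set is observed on the same grid and that the preliminary estimates are already available on that grid. Each data set in turn serves as the test set, unconstrained least-squares weights are fitted to the other data sets' preliminary estimates, and the final estimate averages the $M$ aggregates; a helper picks the minimum-L1 band among candidate bands. The BIC-type penalty, the random-effect adjustment and the construction of the preliminary bands are omitted, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gauss-Jordan elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ls_weights(preds: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Weights w minimizing sum_g (y_g - sum_k w_k * preds[k][g])^2 (normal equations)."""
    k = len(preds)
    gram = [[sum(p * q for p, q in zip(preds[i], preds[j])) for j in range(k)] for i in range(k)]
    rhs = [sum(p * t for p, t in zip(preds[i], y)) for i in range(k)]
    return _solve(gram, rhs)


def cross_aggregate(prelim: Sequence[Sequence[float]],
                    responses: Sequence[Sequence[float]]) -> List[float]:
    """Data set m in turn is the test set for aggregating the other preliminary
    estimates; the final estimate is the average of the M aggregates."""
    M, G = len(prelim), len(prelim[0])
    total = [0.0] * G
    for m in range(M):
        others = [prelim[i] for i in range(M) if i != m]
        w = ls_weights(others, responses[m])
        for g in range(G):
            total[g] += sum(wk * f[g] for wk, f in zip(w, others))
    return [t / M for t in total]


def min_l1_band(bands: Sequence[Sequence[Tuple[float, float]]], step: float):
    """Band (list of (lower, upper) pairs on an equispaced grid) with the smallest L1 width."""
    return min(bands, key=lambda band: step * sum(u - l for l, u in band))
```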
 Date Issued
 2006
 Identifier
 FSU_migr_etd0286
 Format
 Thesis
 Title
 Modelling experimental data analysis.
 Creator

Ford, Charles Wesley, Jr., Florida State University
 Abstract/Description

An important goal of research in scientific databases is to provide a capability for managing the acquisition and analysis of data and to assist in the development and evaluation of data analysis functions. This dissertation is an important step toward this goal and represents an extension and application of object-oriented database technology. It uses the object-oriented approach to provide a model that describes a very general strategy for data analysis. This model is used to define data acquisition and data analysis precisely, in a style well suited to management by a database system. The model has been implemented and evaluated in the context of a very complex experimental physics project. The implementation is described in detail, with examples of the database schema and operations using the C++ binding of ODMG-93. As a result of the careful application of object-oriented methods, the management of data acquisition and analysis has become feasible for domain scientists. The application of the methods described herein to specific problems in experimental physics has resulted in a database that will be used to manage all of the data acquisition and data analysis for the CEBAF Large Acceptance Spectrometer (CLAS) at the Continuous Electron Beam Accelerator Facility (CEBAF), a U.S. Department of Energy project.
 Date Issued
 1995
 Identifier
 AAI9529599, 3088659, FSDT3088659, fsu:77461
 Format
 Document (PDF)
 Title
 On a general repair model for repairable systems.
 Creator

Dorado, Crisanto Ayap., Florida State University
 Abstract/Description

The minimal repair process assumes that upon repair a system is restored to its functioning condition just before failure. For systems with few vulnerable components, it is more reasonable to assume that repair actually brings the state of the system to a level that is between "completely new" and "prior to failure". Kijima (1989) introduced models for such a repair process based on the notion of age reduction. Under age reduction, the system, upon repair, is functionally the same as an identical system of lesser age. An alternative to age reduction is the notion of extra life. Under this notion, the system, upon repair, enjoys a longer expected remaining life than it would have had under a minimal repair.

In this dissertation, we introduce a repair model that generalizes Kijima's models so as to include both the notions of age reduction and extra life. We then look at the problem of estimating system reliability based on observations of the repair process from several systems working independently. We make use of counting processes and martingales to derive large sample properties of the estimator.
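Age reduction is usually expressed as a virtual-age recursion. The sketch below assumes the two forms commonly attributed to Kijima (1989), Type I $V_n = V_{n-1} + a X_n$ and Type II $V_n = a(V_{n-1} + X_n)$, with a constant repair factor $0 \le a \le 1$ and a Weibull baseline chosen only for illustration; it does not implement the dissertation's generalized model or its estimator. Given virtual age $v$, the next working time $X$ has survival $\bar F(v+x)/\bar F(v)$, which the code samples by inverting that conditional survival function.

```python
import math
import random

def weibull_next_uptime(v, shape, scale, u):
    """Invert P(X > x | V = v) = Sbar(v + x) / Sbar(v) for an assumed Weibull
    baseline Sbar(t) = exp(-(t/scale)**shape), at a uniform draw u in (0, 1]."""
    cum_hazard_v = (v / scale) ** shape
    return scale * (cum_hazard_v - math.log(u)) ** (1.0 / shape) - v

def simulate_virtual_ages(n_repairs, a, kind, shape=2.0, scale=1.0, seed=0):
    """Return (uptimes, virtual_ages) under an assumed Kijima Type I or II recursion.

    a = 1 gives minimal repair (Type I virtual age = total operating time);
    a = 0 gives perfect repair (virtual age resets to 0 after each repair).
    """
    if kind not in ("I", "II"):
        raise ValueError("kind must be 'I' or 'II'")
    rng = random.Random(seed)
    v = 0.0
    uptimes, ages = [], []
    for _ in range(n_repairs):
        u = 1.0 - rng.random()  # in (0, 1], so log(u) is finite
        x = weibull_next_uptime(v, shape, scale, u)
        v = v + a * x if kind == "I" else a * (v + x)
        uptimes.append(x)
        ages.append(v)
    return uptimes, ages
```

Intermediate values of `a` move between these two extremes; the "extra life" notion in the abstract has no counterpart in this sketch.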
 Date Issued
 1995
 Identifier
 AAI9540050, 3088702, FSDT3088702, fsu:77504
 Format
 Document (PDF)
 Title
 PART 1: THE LIMITING DISTRIBUTION OF THE LIKELIHOOD RATIO STATISTIC $-2 \log \lambda_n$ UNDER A CLASS OF LOCAL ALTERNATIVES. PART 2: MINIMUM AVERAGE RISK DECISION PROCEDURES FOR THE NONCENTRAL CHI-SQUARE DISTRIBUTION.
 Creator

LEVER, WILLIAM EDWIN., The Florida State University
 Date Issued
 1968
 Identifier
 AAI6811680, 2985783, FSDT2985783, fsu:70292
 Format
 Document (PDF)
 Title
 TESTING WHETHER NEW IS BETTER THAN USED OF A SPECIFIED AGE.
 Creator

PARK, DONG HO., Florida State University
 Abstract/Description

This research contributes to the theory and methods of testing hypotheses for classes of life distributions. Two classes of life distributions considered in this dissertation are: (1) the New Better Than Used (NBU) class: the life distribution $F$ is NBU if $\bar F(x+y) \le \bar F(x)\bar F(y)$ for all $x, y \ge 0$, where $\bar F \equiv 1 - F$; (2) the New Better Than Used at $t_0$ (NBU-$t_0$) class: the life distribution $F$ is NBU-$t_0$ if $\bar F(x+t_0) \le \bar F(x)\bar F(t_0)$ for all $x \ge 0$.

The NBU and NBU-$t_0$ classes have dual classes (New Worse Than Used and New Worse Than Used at $t_0$, respectively) defined by reversing the inequality.

The NBU-$t_0$ class is a new class of life distributions and contains the NBU class. We study the basic properties of the NBU-$t_0$ class and propose a test of $H_0$: $\bar F(x+t_0) = \bar F(x)\bar F(t_0)$ for all $x \ge 0$, versus $H_A$: $\bar F(x+t_0) \le \bar F(x)\bar F(t_0)$ for all $x \ge 0$, with strict inequality for some $x \ge 0$, based on a complete random sample $X_1, \dots, X_n$ from $F$. Our test can also be used to test $H_0$ against the NWU-$t_0$ alternatives. Asymptotic relative efficiencies of our test with respect to the Hollander and Proschan (1972, Ann. Math. Statist. 43, 1136-1146) NBU test are calculated for several distributions.

We extend our test of $H_0$ versus $H_A$ to accommodate randomly censored data. For the censored data situation our test is based on the statistic (DIAGRAM, TABLE OR GRAPHIC OMITTED...PLEASE SEE DAI), where $\hat{\bar F}$ is the Kaplan-Meier (1958, J. Amer. Statist. Assoc. 53, 457-481) estimator of $\bar F$. Under mild regularity conditions on the amount of censoring, a consistent test of $H_0$ versus $H_A$ for the randomly censored model is obtained.

In Chapter III we develop a two-sample NBU test of the null hypothesis that two distributions $F$ and $G$ are equal, versus the alternative that $F$ is "more NBU" than is $G$. Our test is based on the statistic (DIAGRAM, TABLE OR GRAPHIC OMITTED...PLEASE SEE DAI), where $m$ and $n$ are the sample sizes from $F$ and $G$, and $F_m$ and $G_n$ are the empirical distributions of $F$ and $G$. Asymptotic normality of $T_{m,n}$, suitably normalized, is a direct consequence of Hoeffding's (1948, Ann. Math. Statist. 19, 293-325) U-statistic theorem. Then, using a consistent estimator of the null asymptotic variance of $N^{1/2}T_{m,n}$, where $N = m + n$, we obtain an asymptotically distribution-free test. We extend the two-sample NBU test to the k-sample case.

Our test of $H_0$ versus $H_A$ utilizes the Kaplan-Meier estimator. However, there are other possible estimators of the survival function for the randomly censored model. . . . (Author's abstract exceeds stipulated maximum length. Discontinued here with permission of author.) UMI
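The test statistics themselves are omitted in the source. As a hedged illustration of the idea only, one natural complete-sample measure of departure from $H_0$ toward $H_A$ averages $\bar F_n(X_i)\bar F_n(t_0) - \bar F_n(X_i + t_0)$ over the sample, where $\bar F_n$ is the empirical survival function; its population counterpart is 0 under $H_0$ and nonnegative under NBU-$t_0$ alternatives. The function names are hypothetical, and this is not claimed to be the dissertation's statistic or its standardization.

```python
import math
from bisect import bisect_right

def empirical_survival(sorted_x, y):
    """Fraction of observations strictly greater than y (sorted_x must be sorted)."""
    return (len(sorted_x) - bisect_right(sorted_x, y)) / len(sorted_x)

def nbu_t0_discrepancy(sample, t0):
    """Average of Sbar_n(X_i) * Sbar_n(t0) - Sbar_n(X_i + t0) over the sample.

    Positive values point toward NBU-t0, negative values toward NWU-t0.
    """
    xs = sorted(sample)
    s_t0 = empirical_survival(xs, t0)
    total = sum(
        empirical_survival(xs, x) * s_t0 - empirical_survival(xs, x + t0) for x in xs
    )
    return total / len(xs)

if __name__ == "__main__":
    n = 1000
    # Exponential quantile grid: an illustrative "sample" with no aging.
    expo = [-math.log(1 - (i + 0.5) / n) for i in range(n)]
    print(nbu_t0_discrepancy(expo, 0.7))
```

A uniform(0, 1) sample, which is IFR and hence NBU, gives a clearly positive value for $t_0 = 0.5$; calibrating the value into a test would still require the asymptotic variance.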
 Date Issued
 1982
 Identifier
 AAI8301540, 3085466, FSDT3085466, fsu:74958
 Format
 Document (PDF)
 Title
 AN INCREASING FAILURE RATE APPROACH TO CONSERVATIVE LOW DOSE EXTRAPOLATION (SAFE DOSE).
 Creator

SCHELL, MICHAEL J., Florida State University
 Abstract/Description

This dissertation provides a new method of treating the conservative low dose extrapolation problem. One wishes to determine the largest dose $d$, called the "safe" dose, for which $P(F(d) \le r) \ge 1 - \eta$, where $F(d)$ is the proportion of failures, say cancers induced, at dose $d$ by time $T$. Here $F$ is a life distribution function, presumed to come from some class of functions $\mathcal{F}$, $T$ is prespecified, and $r \in (0,1)$. In the bivariate setting, $F(x,y)$ denotes the proportion of failures at doses $(x,y)$ by fixed time $T$. Four extensions of the univariate class of IFR functions are introduced, differing in the way that convexity of the hazard function $H(x,y) = -\ln(1 - F(x,y))$ is posited. The notion of dependent action is considered and a hypothesis test for its existence is given.

Conservative low dose extrapolation techniques for the two most prominent classes are given. An upper bound for the hazard function is established for low doses, with proofs that the bounds are sharp.
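The univariate intuition behind IFR-based conservative extrapolation can be sketched with a chord bound: if $H(d) = -\ln(1 - F(d))$ is convex in dose with $H(0) = 0$, then $H(d)/d$ is nondecreasing, so $H(d) \le (d/d_0)H(d_0)$ for $0 \le d \le d_0$. Setting $1 - \exp(-(d/d_0)H(d_0)) = r$ gives $d = d_0 \ln(1-r)/\ln(1-p_0)$, where $p_0 = F(d_0)$ is the failure proportion at a tested dose $d_0$. This sketch assumes zero background response and treats $p_0$ as known, so it ignores the confidence requirement $1 - \eta$ and the bivariate classes of the dissertation; the function names are illustrative.

```python
import math

def failure_upper_bound(d, d0, p0):
    """Chord bound on F(d), 0 <= d <= d0, when H = -ln(1 - F) is convex with H(0) = 0."""
    if not 0.0 <= d <= d0:
        raise ValueError("bound only applies for 0 <= d <= d0")
    h0 = -math.log1p(-p0)               # H(d0) = -ln(1 - p0)
    return -math.expm1(-(d / d0) * h0)  # 1 - exp(-(d/d0) * H(d0))

def chord_safe_dose(d0, p0, r):
    """Largest dose whose chord bound on F(d) does not exceed r (requires r <= p0)."""
    if not 0.0 < r <= p0 < 1.0:
        raise ValueError("need 0 < r <= p0 < 1")
    return d0 * math.log1p(-r) / math.log1p(-p0)
```

Any convex $H$ through the origin and $(d_0, H(d_0))$, such as a Weibull cumulative hazard with shape at least 1, stays below the chord, which is what makes the resulting dose conservative.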
 Date Issued
 1984
 Identifier
 AAI8427325, 3085936, FSDT3085936, fsu:75422
 Format
 Document (PDF)
 Title
 TESTING WHETHER MEAN RESIDUAL LIFE CHANGES TREND.
 Creator

GUESS, FRANK MITCHELL., Florida State University
 Abstract/Description

Given that an item is of age $t$, the expected value of the random remaining life is called the mean residual life (MRL) at age $t$. We propose two new nonparametric classes of life distributions for modeling aging based on MRL. The first class of life distributions consists of those with "increasing initially, then decreasing mean residual life" (IDMRL). The IDMRL class models aging that is initially beneficial, then adverse. The second class, "decreasing, then increasing mean residual life" (DIMRL), models aging that is initially adverse, then beneficial. We present situations where IDMRL (DIMRL) distributions are useful models. We propose two testing procedures for $H_0$: constant MRL (i.e., exponentiality) versus $H_1$: IDMRL, but not constant MRL (or $H_1'$: DIMRL, but not constant MRL). The first testing procedure assumes the turning point, $\tau$, from IMRL to DMRL is specified by the user or is known. Our IDMRL($\tau$) test statistic, $T_n$, is a differentiable statistical function of order 1; thus $T_n$, suitably standardized, is asymptotically normal. The second procedure assumes knowledge of the proportion, $\rho$, of the population that "dies" at or before the turning point (knowledge of $\tau$ itself is not assumed). We use L-statistic theory to show that our IDMRL($\rho$) test statistic, $V_n^*$, appropriately standardized, is asymptotically normal. The exact null distribution of $V_n^*$ is established. For each of these procedures an application is given. After this we modify the complete data tests to yield analogous censored data procedures. The standard Kaplan-Meier estimator is a key tool that we exploit for our censored data tests. A limited Monte Carlo study investigates the censored data procedures.
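Because the censored-data tests rest on the Kaplan-Meier estimator, a minimal sketch of that estimator and of the plug-in mean residual life $\hat m(t) = \int_t^{\infty} \hat S(u)\,du / \hat S(t)$ may help. Assumptions: items censored at a failure time are counted as still at risk at that time, and the integral is truncated at the largest observed time (a common convention when the largest observation is censored). These helpers are illustrative and are not the dissertation's $T_n$ or $V_n^*$ statistics.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(t, S(t))] at each distinct failure time.

    events[i] is 1 for an observed failure and 0 for a right-censored time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv, steps, i = 1.0, [], 0
    while i < len(data):
        t, failures, leaving = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:
            failures += data[i][1]
            leaving += 1
            i += 1
        if failures:
            surv *= 1.0 - failures / at_risk
            steps.append((t, surv))
        at_risk -= leaving  # failures and censorings at t leave the risk set
    return steps

def survival_at(steps, t):
    """Right-continuous step function value S(t)."""
    s = 1.0
    for u, value in steps:
        if u > t:
            break
        s = value
    return s

def km_mean_residual_life(times, events, t):
    """Plug-in MRL at age t, integrating S(u) over [t, max(times)]."""
    t_max = max(times)
    if t >= t_max:
        raise ValueError("t must be below the largest observed time")
    steps = kaplan_meier(times, events)
    s_t = survival_at(steps, t)
    if s_t == 0.0:
        raise ValueError("estimated survival at t is zero")
    knots = [t] + [u for u, _ in steps if t < u < t_max] + [t_max]
    area = sum(survival_at(steps, a) * (b - a) for a, b in zip(knots, knots[1:]))
    return area / s_t
```

With no censoring, $\hat m(0)$ is the sample mean and $\hat m(t)$ is the average of $X_i - t$ over observations exceeding $t$; for data 1, 2, 3, 4 this gives 2.5 at $t = 0$ and 1.5 at $t = 2$.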
 Date Issued
 1984
 Identifier
 AAI8428699, 3085942, FSDT3085942, fsu:75428
 Format
 Document (PDF)
 Title
 ON SEQUENTIAL UNBIASED AND BAYES-TYPE ESTIMATES OF PARAMETERS IN A CONTINGENCY TABLE.
 Creator

CHEN, CHENGCHUNG., Florida State University
 Abstract/Description

Estimation of the probability parameters in a contingency table with linear and/or log-linear constraints on the parameters is the principal concern of this thesis. Sequential unbiased estimates of the cell probabilities as well as some Bayes posterior mean type estimates are considered.

Chapter I is a review of some earlier work on the sequential unbiased estimation of the probability parameter in a Bernoulli process. The review begins with the classical work of Girshick, Mosteller and Savage (1946) and some follow-up studies like Wolfowitz (1946), Savage (1947), Blackwell (1947), Lehmann and Stein (1950), DeGroot (1959) and Kagan, Linnik and Rao (1973). In several cases the original proofs have been simplified and the arguments streamlined.

Chapter II deals with the problem of sequential unbiased estimation of the parameters in a contingency table with linear and/or log-linear constraints. Multinomial Girshick, Mosteller and Savage (GMS) type stopping rules are discussed and the corresponding unbiased estimates based on the minimal sufficient statistic are described. Consistency, in the sense of Wolfowitz (1947), of such estimates is demonstrated. Unbiased estimates of parametric functions like log-contrasts are derived. Sufficient conditions for the completeness of the GMS-type stopping rules are given.

In Chapter III, the problem of sequential unbiased estimation of the probability parameters in the Bradley-Terry (1952) model of paired comparisons is studied. The Bradley-Terry model can be summarized as follows. Suppose that there are $t$ treatments $T_1, \dots, T_t$ that can be pairwise compared. The Bradley-Terry model postulates that associated with treatment $T_i$ is a "strength" parameter $\pi_i > 0$, $i = 1, \dots, t$, such that if treatments $T_i$ and $T_j$ are compared, the probability that $T_i$ is preferred to $T_j$ is $\theta_{ij} = \pi_i/(\pi_i + \pi_j)$. The model imposes log-linear constraints on the $\theta_{ij}$'s so that techniques similar to those in Chapter II may be used to obtain unbiased estimates based on a sufficient statistic.

In Chapter IV, two Bayes-type procedures for estimating the multinomial cell probability vector $p$ in the presence of linear constraints on the parameters are proposed and illustrated with examples. A general prior is used with the restriction that the moment generating function of the prior exists in closed form. The estimators are shown to be strongly consistent. Estimation under log-linear constraints is also considered. Finally, Bayes-type estimators for the covariance matrix of the cell frequencies are presented for some special cases of linearly and log-linearly constrained problems.

Chapter V is concerned with a Bayesian approach to the estimation of parameters in the Bradley-Terry model of paired comparisons. It is assumed that the treatment parameters $\pi_i$ sum to 1, and a Dirichlet prior for $\pi = (\pi_1, \dots, \pi_t)$ is used. Using the induced prior of $\theta_{ij}$ and $Z_{ij} = \pi_i + \pi_j$, an estimate $\pi_{ij}$ of $\pi_i$, based on the data arising from the comparisons of treatments $T_i$ and $T_j$, is obtained. An estimate of $\pi_i$ based on all the data is a weighted combination of the $\pi_{ij}$'s that minimizes a risk function. Similarly, estimates for log-contrasts of the $\pi_i$'s are obtained. This technique of estimation is extended to the Luce model of multiple comparisons.
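The Girshick, Mosteller and Savage construction reviewed in Chapter I can be illustrated for a single Bernoulli parameter $p$. For a closed sampling plan on the lattice of (successes, failures), let $k(\alpha)$ count the sample paths from the origin to a stopping point $\alpha$, and $k^*(\alpha)$ the paths whose first trial is a success, i.e. paths from $(1, 0)$ to $\alpha$; then $\hat p(\alpha) = k^*(\alpha)/k(\alpha)$ is unbiased. The sketch assumes a truncated inverse binomial rule (stop at $c$ successes or after $N$ trials) chosen only for illustration; the multinomial GMS-type rules of Chapter II are not implemented, and the function names are hypothetical.

```python
from collections import defaultdict

def boundary_path_counts(stop, start):
    """Count lattice paths from `start` to each stopping point.

    Points are (successes, failures); a path ends the first time
    stop(successes, failures) is True. The plan is assumed closed and
    bounded, so every path eventually stops.
    """
    frontier = {start: 1}
    boundary = defaultdict(int)
    while frontier:
        nxt = defaultdict(int)
        for (s, f), k in frontier.items():
            if stop(s, f):
                boundary[(s, f)] += k
            else:
                nxt[(s + 1, f)] += k  # next trial is a success
                nxt[(s, f + 1)] += k  # next trial is a failure
        frontier = nxt
    return dict(boundary)

def gms_estimates(stop):
    """Return ({alpha: k*(alpha) / k(alpha)}, {alpha: k(alpha)}); assumes stop(0, 0) is False."""
    k = boundary_path_counts(stop, (0, 0))
    k_star = boundary_path_counts(stop, (1, 0))  # paths whose first trial is a success
    estimates = {alpha: k_star.get(alpha, 0) / count for alpha, count in k.items()}
    return estimates, k

def truncated_inverse_binomial(c, n_max):
    """Hypothetical example rule: stop at c successes or after n_max trials."""
    return lambda s, f: s == c or s + f == n_max
```

For this rule, stopping at the $c$-th success on trial $n$ gives $\hat p = (c-1)/(n-1)$, and weighting each estimate by its path probability $k(\alpha)p^{s}(1-p)^{f}$ recovers $p$ exactly.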
 Date Issued
 1981
 Identifier
 AAI8125818, 3085061, FSDT3085061, fsu:74559
 Format
 Document (PDF)
 Title
 PARTIAL SEQUENTIAL TESTS FOR THE MEAN OF A NORMAL DISTRIBUTION.
 Creator

ARGHAMI, NASSER REZA., Florida State University
 Abstract/Description

Recently, Billard (1977) introduced a truncated partial sequential procedure for testing a null hypothesis about a normal mean with known variance against a two-sided alternative hypothesis. That procedure had the disadvantage that a large number of observations is necessary if the null hypothesis is to be accepted. A new procedure is introduced which reduces the expected sample size for all mean values, with considerable reductions for values near the null mean value. Theoretical operating characteristic and average sample number functions are derived, and the empirical distribution of the sample size in some special cases is obtained.

For the case of unknown variance and a one-sided alternative hypothesis, there are a number of tests, the best known of which are those of Wald (1947) and Barnard (1952). These tests are formulated for hypotheses expressed in units of $\mu/\sigma$. In this work, a partial sequential test procedure is introduced for hypotheses concerned only with $\mu$. An advantage of this new procedure is its relative simplicity and ease of execution when compared to the above tests. This is essentially due to the fact that in the present procedure the transformed observations follow a central t-distribution as distinct from the noncentral t-distribution. The difficulties caused by the noncentral distribution explain the relative lack of progress in obtaining results about the properties, such as the operating characteristic and average sample number functions, of the tests of Barnard and Wald. The key element in the present procedure is that a number of observations is taken initially before any decision is made; subsequent observations are then taken in batches, the sizes of which depend on the estimate of the variance obtained from the initial set of observations. Some properties of the procedure are studied. In particular, an approximation to the theoretical operating characteristic function is derived and the sensitivity of the average sample number function to changes in some of the test parameters is investigated.

The ideas developed for the partial sequential t-test are extended to develop tests of hypotheses concerning the parameters of a simple linear regression equation, general linear hypotheses and hypotheses about the mean of special cases of the multivariate normal.
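Why batch observations lead to a central t-distribution can be seen in a two-stage sketch in the spirit of Stein's procedure: with $n_0$ initial observations giving $s^2$, a later batch of size $m$ chosen from $s^2$ alone satisfies $\sqrt{m}(\bar x_{\text{batch}} - \mu_0)/s \sim t_{n_0-1}$ under $H_0\colon \mu = \mu_0$, because for normal data the batch mean is independent of $s^2$. The batch-size rule $m = \lceil s^2/\delta^2 \rceil$ with a design constant $\delta$ is a hypothetical choice, not the dissertation's rule, and no decision boundaries are implemented.

```python
import math
from statistics import mean, variance

def batch_size(s2, delta):
    """Hypothetical rule: batch size grows with the first-stage variance estimate."""
    return max(1, math.ceil(s2 / delta ** 2))

def batch_t_statistic(initial, batch, mu0):
    """sqrt(m) * (batch mean - mu0) / s, with s taken from the initial sample only.

    Under H0 and normal data this has a central t distribution with
    len(initial) - 1 degrees of freedom.
    """
    s = math.sqrt(variance(initial))  # sample variance with divisor n0 - 1
    return math.sqrt(len(batch)) * (mean(batch) - mu0) / s
```

Because $s$ is frozen after the first stage, critical values come from a single central t-distribution rather than the noncentral one that complicates the Wald and Barnard tests.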
 Date Issued
 1981
 Identifier
 AAI8125865, 3085070, FSDT3085070, fsu:74568
 Format
 Document (PDF)
 Title
 ON DETERMINING THE NUMBER OF PREDICTORS IN A REGRESSION EQUATION USED FOR PREDICTION.
 Creator

CARR, MEG BRADY., Florida State University
 Abstract/Description

It is generally recognized that not all of the available variables should necessarily be used as predictors in a linear regression equation. The problems which may arise from using too many predictors become especially acute in a regression equation used for prediction with independent data. In this case, the skill of prediction may actually deteriorate with increasing numbers of predictors. However, there is no definitive explanation as to why this should be so. There is also no universally accepted procedure for determining the number of predictors to use. The various regression methods which do exist are logically contrived but are also largely based on subjective considerations.

The goal of this research is to develop and test a criterion that will indicate a priori the "optimum" number of predictors to use in a prediction equation. The mean square error statistic is used to evaluate the performance of a regression equation in both the dependent and independent samples. Selecting the "best" prediction equation consists of determining the equation with the minimum estimated independent sample mean square error. Several approximations and estimators of the independent sample mean square error which have appeared in the literature are discussed, and two new estimators are derived.

These approximations and estimators are tested in Monte Carlo simulations to determine their skill in indicating the number of predictors which will yield the best prediction equation. The sample size, number of available predictors, correlations among the variables, distribution of the variables, and selection method are manipulated to explore how these various factors influence the performances of the mean square error estimators. It is found that the better estimators are capable of indicating a number of predictors to include in the regression equation for which the corresponding independent sample mean square error is near the minimum value.

As a practical test, the various estimators of the independent sample mean square error are applied to the data used in deriving the Model Output Statistics (MOS) maximum and minimum temperature forecast equations used by the National Weather Service. These prediction equations are linear regression equations derived using a forward selection method. The sequence of prediction equations corresponding to the forward trace of all the available predictors is derived for each of 192 cases and then applied to independent data. The forecasts made by the operational 10-predictor (p = 10) MOS equations are compared with those made by the equations determined by the estimators of the independent sample mean square error. The operational equations have the best overall verification statistics. The estimators persistently underestimate the values of the independent sample mean square error, but one of the new estimators is able to determine MOS forecast equations that perform as well as the operational equations. Furthermore, it is able to accomplish this without the use of an independent sample to help determine the optimum number of predictors.
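As an example of the kind of estimator being compared, a classical fixed-design approximation (assumed here, and not necessarily one of the estimators studied) inflates the dependent-sample mean square error $\text{MSE}_{\text{dep}} = \text{RSS}/n$ of a least-squares fit with an intercept and $p$ predictors to $\text{MSE}_{\text{dep}}(n+p+1)/(n-p-1)$. This follows because $\text{RSS}/(n-p-1)$ is unbiased for $\sigma^2$ and the average prediction variance at the design points is $\sigma^2(1 + (p+1)/n)$. The function names are hypothetical.

```python
def independent_mse_estimate(mse_dep, n, p):
    """Fixed-design estimate of independent-sample MSE for OLS with intercept + p predictors."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1")
    return mse_dep * (n + p + 1) / (n - p - 1)

def choose_number_of_predictors(mse_dep_by_p, n):
    """Return (best p, estimates), minimising the estimated independent-sample MSE.

    mse_dep_by_p[p] is the dependent-sample MSE of the p-predictor equation,
    e.g. taken from a forward selection trace.
    """
    estimates = [independent_mse_estimate(m, n, p) for p, m in enumerate(mse_dep_by_p)]
    return min(range(len(estimates)), key=estimates.__getitem__), estimates
```

With invented dependent-sample MSEs of 4.0, 2.0, 1.5, 1.45 and 1.44 for p = 0 to 4 and n = 30, the estimate is smallest at p = 2 even though the dependent-sample MSE keeps falling.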
 Date Issued
 1980
 Identifier
 AAI8026121, 3084691, FSDT3084691, fsu:74192
 Format
 Document (PDF)
 Title
 TWO-WAY CLUSTER ANALYSIS WITH NOMINAL DATA.
 Creator

COOPER, PAUL GAYLORD., Florida State University
 Abstract/Description

Consider an $M$ by $N$ data matrix $X$ whose elements may assume values $0, 1, 2, \dots, H$. Denote the rows of $X$ by $\alpha_1, \alpha_2, \dots, \alpha_M$. A tree on the rows of $X$ is a sequence of distinct partitions $\{P_i\}_{i=1}^{k}$ such that: (a) $P_1 = \{(\alpha_1), \dots, (\alpha_M)\}$, (b) $P_i$ is a refinement of $P_{i+1}$ for $i = 1, \dots, k-1$, and (c) $P_k = \{(\alpha_1, \dots, \alpha_M)\}$. The two-way clustering problem consists of simultaneously constructing trees on the rows, columns, and elements of $X$. A generalization of a two-way joining algorithm (TWJA) introduced by J. A. Hartigan (1975) is used to construct the three trees.

The TWJA requires the definition of measures of dissimilarity between row clusters and between column clusters. Two approaches are used in the construction of these dissimilarity coefficients: one based on intuition and one based on a formal prediction model. For matrices with binary elements (0 or 1), measures of dissimilarity between row or column clusters are based on the number of mismatching pairs. Consider two distinct row clusters $R_p$ and $R_q$ containing $m_p$ and $m_q$ rows, respectively. One measure of dissimilarity, $d_0(R_p, R_q)$, between $R_p$ and $R_q$ is (DIAGRAM, TABLE OR GRAPHIC OMITTED...PLEASE SEE DAI), where $b_{p\beta}$ and $b_{q\beta}$ are the numbers of ones in column $\beta$ of clusters $R_p$ and $R_q$, respectively. Two additional intuitive dissimilarity coefficients are also defined and studied.

For matrices containing nominal level data, dissimilarity coefficients are based on a formal prediction model. Analogous to the procedure of Cleveland and Relles (1974), for a given data matrix, the model consists of a scheme for random selection of two rows (or columns) from the matrix and an identification rule for distinguishing between the two rows (or columns). A loss structure is defined for both rows and columns, and the expected loss due to incorrect row or column identification is computed. The dissimilarity between two (say) row clusters is then defined to be the increase in expected loss due to joining those two row clusters into a single cluster.

Stopping criteria are suggested for both the intuitive and prediction model approaches. For the intuitive approach, it is suggested that joining be stopped when the dissimilarity between the (say) row clusters to be joined next exceeds that expected by chance under the assumption that the (say) column totals of the matrix are fixed. For the prediction model approach, the stopping criterion is based on a cluster prediction model in which the objective is to distinguish between row or column clusters. A cluster identification rule is defined based on the information in the partitioned data matrix, and the expected loss due to incorrect cluster identification is computed. The expected cluster loss is also computed when cluster identification is based on strict randomization. The relative decrease in expected cluster loss due to identification based on the partitioned matrix versus that based on randomization is suggested as a stopping criterion.

Both contrived and real data examples are used to illustrate and compare the two clustering procedures. Computational aspects of the procedure are discussed, and it is concluded that the intuitive approach is less costly in terms of computation time. Further, five admissibility properties are defined and, for certain intuitive dissimilarity coefficients, the trees produced by the TWJA are shown to possess three of the five properties.
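For binary data, the number of mismatching pairs between row clusters $R_p$ and $R_q$ in column $\beta$ is $b_{p\beta}(m_q - b_{q\beta}) + b_{q\beta}(m_p - b_{p\beta})$. The sketch below sums this over columns and, as an assumption since $d_0$ itself is omitted in the source, divides by $m_p m_q$; it then builds a tree on the rows by repeatedly joining the closest pair of clusters, which yields partitions satisfying (a) to (c). This is a one-way simplification, not Hartigan's two-way joining algorithm or the dissertation's generalization.

```python
from itertools import combinations

def mismatch_dissimilarity(X, rp, rq):
    """Mismatching (row in rp, row in rq) pairs summed over columns, per cross pair.

    X is a list of 0/1 rows; rp and rq are tuples of row indices.
    The division by len(rp) * len(rq) is an assumed normalisation.
    """
    total = 0
    for beta in range(len(X[0])):
        bp = sum(X[i][beta] for i in rp)  # ones in column beta of cluster rp
        bq = sum(X[i][beta] for i in rq)
        total += bp * (len(rq) - bq) + bq * (len(rp) - bp)
    return total / (len(rp) * len(rq))

def row_tree(X):
    """Sequence of partitions P_1 (singletons), ..., P_k (all rows in one cluster)."""
    partition = [(i,) for i in range(len(X))]
    tree = [list(partition)]
    while len(partition) > 1:
        a, b = min(
            combinations(range(len(partition)), 2),
            key=lambda ab: mismatch_dissimilarity(X, partition[ab[0]], partition[ab[1]]),
        )
        merged = partition[a] + partition[b]
        partition = [c for j, c in enumerate(partition) if j not in (a, b)] + [merged]
        tree.append(list(partition))
    return tree
```

Each pass joins exactly two clusters, so every partition refines the next and the last one contains all $M$ rows.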
 Date Issued
 1980
 Identifier
 AAI8026123, 3084693, FSDT3084693, fsu:74194
 Format
 Document (PDF)
 Title
 ON NONPARAMETRIC ESTIMATION OF DENSITY AND REGRESSION FUNCTIONS.
 Creator

CHENG, PHILIP E., The Florida State University
 Abstract/Description

In the field of statistical estimation, nonparametric procedures have received increased attention for the past decade. In particular, various nonparametric estimates of probability density functions and regression curves have been extensively studied, with special attention to large sample pr
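The abstract is truncated, so the specific estimators studied are not stated. For orientation only, the two most common kernel forms are the density estimate $\hat f(x) = (nh)^{-1}\sum_i K((x - X_i)/h)$ and the Nadaraya-Watson regression estimate $\hat m(x) = \sum_i K((x - X_i)/h)\,Y_i / \sum_i K((x - X_i)/h)$; the sketch assumes a Gaussian kernel $K$ and a user-supplied bandwidth $h > 0$.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used here as the (assumed) kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kernel_density(sample, x, h):
    """Kernel density estimate at x with bandwidth h > 0."""
    return sum(gaussian_kernel((x - xi) / h) for xi in sample) / (len(sample) * h)

def nadaraya_watson(xs, ys, x, h):
    """Kernel-weighted average of the responses ys near x."""
    weights = [gaussian_kernel((x - xi) / h) for xi in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)
```

The bandwidth controls the usual bias-variance trade-off; large-sample properties of such estimates depend on how $h$ shrinks as $n$ grows.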
 Date Issued
 1980
 Identifier
 AAI8020329, 2989654, FSDT2989654, fsu:74161
 Format
 Document (PDF)
 Title
 STOCHASTIC VERSIONS OF REARRANGEMENT INEQUALITIES WITH APPLICATIONS TO STATISTICS.
 Creator

D'ABADIE, CATHERINE ANNE., Florida State University
 Abstract/Description

In this dissertation we develop a theory which offers a unified approach to the problem of obtaining stochastic versions of deterministic rearrangement inequalities.

To develop the theory we first define two new classes of functions and establish preservation properties of these functions under various statistical and mathematical operations.

Next we introduce the notion of stochastically similarly arranged (SSA) pairs of random vectors. We prove that if the random vectors $(X, Y)$ are SSA and the function $f$ from $\mathbb{R}^n \times \mathbb{R}^n$ into $\mathbb{R}^n$ is monotone with respect to a certain partial ordering on $\mathbb{R}^n \times \mathbb{R}^n$, then for every permutation $\pi$ the stochastic inequalities (DIAGRAM, TABLE OR GRAPHIC OMITTED...PLEASE SEE DAI) hold. This result yields a unified way of obtaining stochastic versions of rearrangement inequalities.

We then show that many multivariate densities of interest in statistical practice govern pairs of random vectors which are SSA.

Next we show that under certain statistical operations on pairs of SSA random vectors the property of being SSA is preserved. For example, we show that the rank order of SSA random variables is SSA. We also show that the SSA property is preserved under certain contamination models.

Finally, we show how the results we obtain can be applied to problems in hypothesis testing.
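The basic deterministic inequality behind this work is the classical rearrangement inequality: for real vectors $x$ and $y$ of length $n$, $\sum_i x_{(i)} y_{(n+1-i)} \le \sum_i x_i y_{\pi(i)} \le \sum_i x_{(i)} y_{(i)}$ for every permutation $\pi$, where $x_{(i)}$ denotes the $i$-th smallest entry. A brute-force check over all permutations, intended only for small $n$, is sketched below; it does not implement the SSA definition or the stochastic versions.

```python
from itertools import permutations

def rearrangement_bounds(x, y):
    """Return (min, max) of sum(x[i] * y[pi(i)]) over all permutations pi of y."""
    totals = [sum(a * b for a, b in zip(x, perm)) for perm in permutations(y)]
    return min(totals), max(totals)

def oppositely_and_similarly_arranged(x, y):
    """Sums with x and y sorted in opposite orders, and in the same order."""
    xs, ys = sorted(x), sorted(y)
    opposite = sum(a * b for a, b in zip(xs, reversed(ys)))
    same = sum(a * b for a, b in zip(xs, ys))
    return opposite, same
```

For small vectors the two functions agree, confirming that the extremes occur at the oppositely and similarly arranged pairings.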
 Date Issued
 1981
 Identifier
 AAI8205717, 3085181, FSDT3085181, fsu:74676
 Format
 Document (PDF)