Search results
- Title
- PERCENTILE RESIDUAL LIFE FUNCTIONS -- PROPERTIES, TESTING AND ESTIMATION.
- Creator
-
JOE, HARRY SUE WAH., Florida State University
- Abstract/Description
-
Let F be a life distribution with survival function $\bar F \equiv 1 - F$. Conditional on survival to time t, the remaining life has survival function $\bar F_t(x) = \bar F(t+x)/\bar F(t)$, $x \ge 0$, $0 \le t < F^{-1}(1)$. The mean residual life function of F is $m_F(t)$ (formula omitted in DAI), provided F has a finite mean. The $\alpha$-percentile or quantile ($0 < \alpha < 1$) residual life function of F is $q_{\alpha,F}(t) = F_t^{-1}(\alpha) = F^{-1}(1 - \bar\alpha\,\bar F(t)) - t$, $0 \le t < F^{-1}(1)$, where $\bar\alpha = 1 - \alpha$. Statisticians find it useful to categorize life distributions according to different aging properties. Categories which involve $m_F(t)$ are the decreasing mean residual life (DMRL) class and the new better than used in expectation (NBUE) class. The DMRL class consists of distributions F such that $m_F(t)$ is monotone decreasing on $(0, F^{-1}(1))$, and the NBUE class consists of distributions F such that $m_F(0) \ge m_F(t)$ for all $0 < t < F^{-1}(1)$. Analogous categories which involve $q_{\alpha,F}(t)$ are the decreasing $\alpha$-percentile residual life (DPRL-$\alpha$) class and the new better than used with respect to the $\alpha$-percentile (NBUP-$\alpha$) class. The mean residual life function is of interest in biometry, actuarial studies and reliability, and the DMRL and NBUE classes of life distributions are useful for modelling situations where items deteriorate with age. In the statistical literature, there are several papers which consider properties or estimation of the mean residual life function or consider testing situations involving the DMRL and NBUE classes; only one previous paper discusses the $\alpha$-percentile residual life function. This dissertation is concerned with properties and estimation of the $\alpha$-percentile residual life function, and with testing problems involving the $\alpha$-percentile residual life function. Properties of $q_{\alpha,F}(t)$ and of the DPRL-$\alpha$, NBUP-$\alpha$ and their dual classes are studied in Chapter II. In Chapter III, tests are developed for testing exponentiality against alternatives of DPRL-$\alpha$ and NBUP-$\alpha$. In Chapter IV, these tests are extended to accommodate randomly censored data. In Chapter V, a distribution-free two-sample test is developed for testing the hypothesis that two life distributions F and G are equal against the alternative that $q_{\alpha,F}(t) \ge q_{\alpha,G}(t)$ for all t. In Chapter VI, strong consistency, asymptotic normality, bias and mean squared error of the estimator $F_n^{-1}(1 - \bar\alpha\,\bar F_n(t)) - t$ of $q_{\alpha,F}(t)$ are studied, where $F_n$ is the empirical distribution function and $\bar F_n \equiv 1 - F_n$.
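As a concrete illustration of the Chapter VI estimator, the sketch below computes the empirical $\alpha$-percentile residual life $F_n^{-1}(1 - \bar\alpha\,\bar F_n(t)) - t$ for complete (uncensored) data. It is a minimal sketch, not code from the dissertation; the function name and the exponential sanity check are illustrative assumptions.

```python
import numpy as np

def percentile_residual_life(sample, t, alpha=0.5):
    """Empirical alpha-percentile residual life q_{alpha,F}(t), computed as
    F_n^{-1}(1 - (1 - alpha) * S_n(t)) - t, where F_n is the empirical CDF
    and S_n = 1 - F_n."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    surv_t = np.mean(x > t)               # S_n(t) = 1 - F_n(t)
    p = 1.0 - (1.0 - alpha) * surv_t      # level at which to invert F_n
    k = min(max(int(np.ceil(p * n)), 1), n)
    return x[k - 1] - t                   # empirical quantile minus age t

# Sanity check: by the memoryless property, an exponential(1) sample has
# median residual life close to log(2) ~ 0.693 at every age t.
rng = np.random.default_rng(0)
sample = rng.exponential(1.0, size=20000)
print(percentile_residual_life(sample, t=1.0, alpha=0.5))
```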
- Date Issued
- 1982
- Identifier
- AAI8214932, 3085276, FSDT3085276, fsu:74771
- Format
- Document (PDF)
- Title
- A NEW METHOD FOR ESTIMATING LIFE DISTRIBUTIONS FROM INCOMPLETE DATA.
- Creator
-
KITCHIN, JOHN FRANCIS., Florida State University
- Abstract/Description
-
We construct a new estimator for a continuous life distribution from incomplete data, the Piecewise Exponential Estimator (PEXE). To date the principal method of nonparametric estimation from incomplete data is the Product-Limit Estimator (PLE) introduced by Kaplan and Meier (J. Amer. Statist. Assoc. (1958) 53). Our formulation of the estimation problem posed by incomplete data is essentially that of Kaplan and Meier, but we approach its solution from the viewpoint of reliability and life testing. In this work we establish rigorously the asymptotic (large sample) properties of the PEXE. Our results include the strong consistency of the PEXE under various sets of assumptions plus the weak convergence of the PEXE, suitably normalized, to a Gaussian process. From an intermediate result in our weak convergence proof we derive asymptotic confidence bands and a goodness-of-fit test based on the PEXE. Though our main objective is the introduction of a new estimator for incomplete data and the study of its asymptotic properties, our second contribution to this area of research is the extension of the asymptotic results of the extensively used PLE. In particular, our results extend the work of Peterson (J. Amer. Statist. Assoc. (1977) 72) and Langberg, Proschan, and Quinzi (Ann. Statist. (1980) 8) in strong consistency and that of Breslow and Crowley (Ann. Statist. (1974) 2) in weak convergence. Finally, we show that the new PEXE, as an alternative to the traditional PLE, has several advantages for estimating a continuous life distribution from incomplete data, along with some drawbacks. Since the two estimators are so alike asymptotically, we concentrate on differences between the PEXE and the PLE for estimation from small samples.
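For readers unfamiliar with the Product-Limit Estimator that the PEXE is compared against, here is a minimal, self-contained sketch of the Kaplan-Meier computation from right-censored data. It is a generic textbook implementation rather than code from the dissertation, and the tie-breaking convention (deaths processed before censorings at tied times) is an assumption.

```python
import numpy as np

def kaplan_meier(times, observed):
    """Product-Limit (Kaplan-Meier) estimate of the survival function from
    right-censored data; observed[i] is 1/True for a death, 0/False for a
    censored withdrawal."""
    times = np.asarray(times, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    order = np.lexsort((~observed, times))    # sort by time, deaths first at ties
    times, observed = times[order], observed[order]
    at_risk = len(times)
    grid, surv, s = [0.0], [1.0], 1.0
    for t, death in zip(times, observed):
        if death:                             # each death lowers the curve
            s *= (at_risk - 1) / at_risk
            grid.append(t)
            surv.append(s)
        at_risk -= 1                          # deaths and censorings both leave the risk set
    return np.array(grid), np.array(surv)

t, s = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
print(dict(zip(t.tolist(), s.round(3).tolist())))
```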
- Date Issued
- 1980
- Identifier
- AAI8104261, 3084762, FSDT3084762, fsu:74263
- Format
- Document (PDF)
- Title
- A MATHEMATICAL STUDY OF THE DIRICHLET PROCESS.
- Creator
-
TIWARI, RAM CHANDRA., Florida State University
- Abstract/Description
-
This dissertation is a contribution to the theory of Bayesian nonparametrics. A construction of the Dirichlet process (Ferguson (1973)) on a finite set $\mathcal X$ is introduced in such a way that it leads to Blackwell's (1973) constructive definition of a Dirichlet process on a Borel space $(\mathcal X, \mathcal A)$. If $(\mathcal X, \mathcal A)$ is a Borel space and P is a random probability measure on $(\mathcal X, \mathcal A)$ with a Dirichlet process prior $D^{\alpha}$, then under the condition that the $\alpha$-measure of every open subset of $\mathcal X$ is positive, for almost every realization P of the process the set of discrete mass points of P is dense in $\mathcal X$. A more general constructive definition introduced by Sethuraman (1978) is used to derive several new properties of the Dirichlet process and to present in a unified way some of the known properties of the process. An alternative construction of Dalal's (1975) G-invariant Dirichlet process (G being a finite group of transformations) is presented. The Bayes estimates of an estimable parameter of degree k ($k \ge 1$), namely $\psi_k(P)$ (formula omitted in DAI), where h is a symmetric kernel, are derived for the no-sample-size case and for a sample of size n from P under the squared error loss function and a Dirichlet process prior. Using the result on the Bayes estimate of $\psi_k(P)$ for the no-sample-size case, the (marginal) distribution of a sample from P (when the prior for P is the Dirichlet process) is obtained. The extension to the case when the prior for P is a G-invariant Dirichlet process is also obtained. Let $(\mathcal X, \mathcal A)$ be the one-dimensional Euclidean space $(R_1, B_1)$. Consider a sequence $\{D^{\alpha_N + \gamma}\}$ of Dirichlet processes such that $\alpha_N(\mathcal X)$ converges to zero as N tends to infinity, where $\gamma$ and the $\alpha_N$'s are finite measures on $\mathcal A$. It is shown that $D^{\alpha_N + \gamma}$ converges weakly to $D^{\gamma}$ in the topology of weak convergence on $\mathcal P$, the class of all probability measures on $(\mathcal X, \mathcal A)$. As a corollary, it follows that $D^{\alpha_N + nF_n}$ converges weakly to $D^{nF_n}$, where $F_n$ is the empirical distribution of the sample. Suppose $\alpha_N(\mathcal X)$ converges to zero and $\alpha_N/\alpha_N(\mathcal X)$ converges uniformly to $\alpha/\alpha(\mathcal X)$ as N tends to infinity. If $\{D^{\alpha_N}\}$ is a sequence of Dirichlet process priors for a random probability measure P on $(\mathcal X, \mathcal A)$, then P, in the limit, is a random probability measure concentrated on the set of degenerate probability measures on $(\mathcal X, \mathcal A)$, and the point of degeneracy is distributed as $\alpha/\alpha(\mathcal X)$ on $(\mathcal X, \mathcal A)$. To the sequence of priors $\{D^{\alpha_N}\}$ for P there corresponds a sequence of Bayes estimates of $\psi_k(P)$. The limit of this sequence of Bayes estimates as $\alpha_N(\mathcal X)$ converges to zero, called the limiting Bayes estimate of $\psi_k(P)$, is obtained. When P is a random probability measure on {0, 1}, Sethuraman (1978) proposed a more general class of conjugate priors for P which contains both the family of Dirichlet processes and the family of priors introduced by Dubins and Freedman (1966). As an illustration, a numerical example is considered and the Bayes estimates of the mean and the variance of P are computed under three distinct priors chosen from Sethuraman's class of priors. The computer algorithm for this calculation is presented.
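Sethuraman's constructive (stick-breaking) definition mentioned above can be illustrated with a truncated simulation. This is a sketch under assumptions of my own choosing (truncation level, base measure, function names), not an implementation from the dissertation.

```python
import numpy as np

def dirichlet_process_sample(alpha_total, base_sampler, n_atoms=1000, rng=None):
    """One (truncated) stick-breaking realization P = sum_k w_k * delta_{theta_k}
    of a Dirichlet process: V_k ~ Beta(1, alpha_total),
    w_k = V_k * prod_{j<k} (1 - V_j), and theta_k drawn from the base measure."""
    rng = np.random.default_rng() if rng is None else rng
    v = rng.beta(1.0, alpha_total, size=n_atoms)
    weights = v * np.cumprod(np.concatenate(([1.0], 1.0 - v[:-1])))
    atoms = base_sampler(n_atoms, rng)
    return atoms, weights

rng = np.random.default_rng(1)
atoms, w = dirichlet_process_sample(
    alpha_total=5.0,
    base_sampler=lambda n, g: g.normal(0.0, 1.0, size=n),  # standard normal base measure
    rng=rng)
# Each realization is (almost surely) a discrete probability measure; the
# retained weights sum to nearly one and the random mean is centred near 0.
print(w.sum(), np.sum(w * atoms))
```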
- Date Issued
- 1981
- Identifier
- AAI8108190, 3084828, FSDT3084828, fsu:74329
- Format
- Document (PDF)
- Title
- AN INVESTIGATION OF THE EFFECT OF THE SWAMPING PHENOMENON ON SEVERAL BLOCK PROCEDURES FOR MULTIPLE OUTLIERS IN UNIVARIATE SAMPLES.
- Creator
-
WOOLLEY, THOMAS WILLIAM, JR., Florida State University
- Abstract/Description
-
Statistical outliers have been an issue of concern to researchers for over two centuries, and are the focus of this study. Sources of outliers, and various means for dealing with them, are discussed. Also presented are general descriptions of univariate outlier tests as well as the two approaches to handling multiple outlier situations, consecutive and block testing. The major problems inherent in these latter methods, masking and swamping, respectively, are recounted. Specifically, the primary aim of this study is to assess the susceptibility to swamping of four block procedures for multiple outliers in univariate samples. Pseudo-random samples are generated from a unit normal distribution, and varying numbers of upper outliers are placed in them according to specified criteria. A swamping index is created which reflects the relative vulnerability of each test to declare a block of outliers and the most extreme upper non-outlier discordant, as a unit. The results of this investigation reveal that the four block tests disagree in their respective susceptibilities to swamping depending upon sample size and the prespecified number of outliers assumed to be present. Rank orderings of these four tests based upon their vulnerability to swamping under varying circumstances are presented. In addition, alternate approaches to calculating the swamping index when four or more outliers exist are described. Recommendations concerning the appropriate application of the four block procedures under differing situations, and proposals for further research, are advanced.
- Date Issued
- 1981
- Identifier
- AAI8113272, 3084903, FSDT3084903, fsu:74401
- Format
- Document (PDF)
- Title
- ESTIMATION AND PREDICTION FOR EXPONENTIAL TIME SERIES MODELS.
- Creator
-
MOHAMED, FOUAD YEHIA., Florida State University
- Abstract/Description
-
This work is concerned with the study of stationary time series models in which the marginal distribution of the observations follows an exponential distribution. This is in contrast to the standard models in the literature where the error sequence and hence the marginal distributions of the o
- Date Issued
- 1981
- Identifier
- AAI8205698, 3085176, FSDT3085176, fsu:74671
- Format
- Document (PDF)
- Title
- Time-Varying Coefficient Models with ARMA-GARCH Structures for Longitudinal Data Analysis.
- Creator
-
Zhao, Haiyan, Niu, Xufeng, Huffer, Fred, Nolder, Craig, McGee, Dan, Department of Statistics, Florida State University
- Abstract/Description
-
The motivation of my research comes from the analysis of the Framingham Heart Study (FHS) data. The FHS is a long-term prospective study of cardiovascular disease in the community of Framingham, Massachusetts. The study began in 1948 and 5,209 subjects were initially enrolled. Examinations were given biennially to the study participants and their status associated with the occurrence of disease was recorded. In this dissertation, the event we are interested in is the incidence of coronary heart disease (CHD). Covariates considered include sex, age, cigarettes per day (CSM), serum cholesterol (SCL), systolic blood pressure (SBP) and body mass index (BMI, weight in kilograms divided by height in meters squared). A review of the statistical literature indicates that the effects of the covariates on cardiovascular disease, or on death from all causes, in the Framingham study change over time; for example, the effect of SCL on cardiovascular disease decreases linearly over time. In this study, I examine the time-varying effects of the risk factors on CHD incidence. Time-varying coefficient models with ARMA-GARCH structure are developed in this research. The maximum likelihood and the marginal likelihood methods are used to estimate the parameters in the proposed models. Since high-dimensional integrals are involved in the calculation of the marginal likelihood, the Laplace approximation is employed in this study. Simulation studies are conducted to evaluate the performance of these two estimation methods based on our proposed models. The Kullback-Leibler (KL) divergence and the root mean square error are employed in the simulation studies to compare the results obtained from different methods. Simulation results show that the marginal likelihood approach gives more accurate parameter estimates, but is more computationally intensive. Following the simulation study, our proposed models are applied to the Framingham Heart Study to investigate the time-varying effects of covariates with respect to CHD incidence. To specify the time-series structures of the effects of risk factors, the Bayesian Information Criterion (BIC) is used for model selection. Our study shows that the relationship between CHD and risk factors changes over time. For males, there is a clearly decreasing linear trend in the age effect, which implies that the age effect on CHD is less pronounced for older patients than for younger patients. The effect of CSM stays almost the same in the first 30 years and decreases thereafter. There are slightly decreasing linear trends for both the SBP and BMI effects. Furthermore, the coefficients of SBP are mostly positive over time, i.e., patients with higher SBP are more likely to develop CHD, as expected. For females, there is also a clearly decreasing linear trend in the age effect, while the effects of SBP and BMI on CHD are mostly positive and do not change much over time.
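A toy simulation of the kind of data-generating mechanism described above (binary incidence whose covariate effect drifts over exams) is sketched below. A simple AR(1) coefficient path stands in for the richer ARMA-GARCH structure, a single standardized risk factor replaces the full covariate set, and no estimation (maximum or marginal likelihood, Laplace approximation) is attempted; all values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n_subjects, n_exams = 500, 15

# Time-varying slope for one standardized risk factor; an AR(1) path is used
# here only as a stand-in for the ARMA-GARCH coefficient structure.
beta = np.empty(n_exams)
beta[0] = 0.8
for t in range(1, n_exams):
    beta[t] = 0.1 + 0.9 * beta[t - 1] + rng.normal(0.0, 0.05)

x = rng.normal(size=(n_subjects, n_exams))      # risk factor at each exam
p = 1.0 / (1.0 + np.exp(-(-2.0 + beta * x)))    # logistic incidence probability
y = rng.binomial(1, p)                          # CHD-style incidence indicator

print(np.round(beta, 2))
print(np.round(y.mean(axis=0), 3))              # observed incidence by exam
```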
- Date Issued
- 2010
- Identifier
- FSU_migr_etd-0527
- Format
- Thesis
- Title
- TESTS OF DISPLACEMENT AND ORDERED MEAN HYPOTHESES.
- Creator
-
SINCLAIR, DENNIS FRANKLIN., Florida State University
- Abstract/Description
-
Character displacement is an ecological process by which, theoretically, co-existing species diverge in size to reduce competition. A closely allied concept is deletion, in which species are excluded from a habitat because they do not differ sufficiently from other species living there. Character displacement has been a controversial topic in recent years, largely due to a lack of statistical procedures for testing its existence. We propose herein a variety of approaches for testing displacement and deletion hypotheses. The applicability of the methods extends beyond the motivating ecological problem to other fields. Consider the model $X_{ij} = \mu_i + \epsilon_{ij}$, $i = 1, \ldots, k$; $j = 1, \ldots, n_i$, where $X_{ij}$ is the j-th observation on species i with population mean $\mu_i$. The $\epsilon_{ij}$'s are independent normally distributed error terms with mean zero and common variance. Traditionally ecologists have regarded species sizes as randomly distributed. We develop tests for displacement and deletion by considering uniform, lognormal and loguniform distributions for species sizes. (A random variable Y has a loguniform distribution if log Y has a uniform distribution.) Most claimed manifestations of character displacement concern the ratios of each species size to the next smallest one (contiguous ratios). All but one of the test statistics are functions of spacings (logarithms of contiguous ratios). We prove a useful characterization of distributions in terms of spacings, and show that the loguniform distribution produces constant expected contiguous ratios--an important property in character displacement studies. The random effects approaches generally lack power in detecting the suspected patterns. We develop further tests for the model in which the $\mu_i$'s are regarded as fixed. This fixed effects approach, which may be more realistic ecologically, produces considerably more powerful tests. Displacement hypotheses in the fixed effects framework are expressed naturally in terms of the ordered means $\mu_{(1)} < \mu_{(2)} < \cdots < \mu_{(k)}$. We develop a general theory by which a particular class of linear hypotheses about any number of sets of ordered means may be tested. Finally a functional relation is used to model the movement of species means from one environment to another. Existing asymptotic tests are shown to perform remarkably well for small samples.
- Date Issued
- 1982
- Identifier
- AAI8223194, 3085332, FSDT3085332, fsu:74827
- Format
- Document (PDF)
- Title
- SOME RESULTS ON THE DISTRIBUTION OF GRUBBS ESTIMATORS.
- Creator
-
BRINDLEY, DENNIS ALFRED., Florida State University
- Abstract/Description
-
This dissertation is concerned with the estimation of error variances in a non-replicated two-way classification and with inferences based on the estimators so derived. The postulated model used throughout the present work is $y_{ij} = \mu_i + \beta_j + \epsilon_{ij}$, where $y_{ij}$ is the observation in the i-th row and j-th column, $\mu_i$ is the parameter representing the mean of the i-th row, $\beta_j$ is the parameter representing the additional effect of the j-th column (side conditions displayed in DAI, omitted here), and the $\epsilon_{ij}$ are independent, zero-mean, normal variates with variances given by an expression omitted in DAI. A set of unbiased estimates (formulas omitted in DAI), developed in earlier work by Grubbs (J. Amer. Statist. Assoc. 43 (1948), 243-264), Ehrenberg (Biometrika 37 (1950), 347-357) and Russell and Bradley (Biometrika 45 (1958), 111-129), is considered. The exact joint density of $Q_1, \ldots, Q_r$ is obtained for r = 3, and two exact results are derived for testing the null hypothesis (display omitted in DAI), with unknown parameters, versus two specific alternatives: the first (display omitted in DAI), which holds for at least some j, j = 1, 2, 3, and the second (display omitted in DAI).
- Date Issued
- 1982
- Identifier
- AAI8229146, 3085401, FSDT3085401, fsu:74896
- Format
- Document (PDF)
- Title
- LARGE DEVIATION LOCAL LIMIT THEOREMS, WITH APPLICATIONS.
- Creator
-
CHAGANTY, NARASINGA RAO., Florida State University
- Abstract/Description
-
Let $\{X_n, n \ge 1\}$ be a sequence of i.i.d. random variables with $E(X_1) = 0$ and $\mathrm{Var}(X_1) = 1$. Let $\psi(s)$ be the cumulant generating function (c.g.f.) and let the large deviation rate of $X_1$ be as displayed in DAI (formula omitted). Let $S_n = X_1 + \cdots + X_n$. Under some mild conditions on $\psi$, Richter (Theory Prob. Appl. (1957) 2, 206-219) showed that the probability density function $f_n$ of $S_n/\sqrt{n}$ has an asymptotic expression (omitted in DAI) whenever $x_n = o(\sqrt{n})$ and $\sqrt{n}\,x_n > 1$. In this dissertation we obtain similar large deviation local limit theorems for arbitrary sequences of random variables, not necessarily sums of i.i.d. random variables, thereby increasing the applicability of Richter's theorem. Let $\{T_n, n \ge 1\}$ be an arbitrary sequence of non-lattice random variables with characteristic function (c.f.) $\phi_n$. Let $\psi_n$ and $\gamma_n$ be the c.g.f. and the large deviation rate of $T_n/n$. The main theorem in Chapter II shows that under some standard conditions on $\psi_n$, which imply that $T_n/n$ converges to a constant in probability, the density function $K_n$ of $T_n/n$ has an asymptotic expression (omitted in DAI), where $m_n$ is any sequence of real numbers and $\tau_n$ is defined by $\psi_n'(\tau_n) = m_n$. When $T_n$ is the sum of n i.i.d. random variables our result reduces to Richter's theorem. Similar theorems for lattice-valued random variables are also presented; these are useful in obtaining asymptotic probabilities for the Wilcoxon signed-rank test statistic and Kendall's tau. In Chapter III we use the results of Chapter II to obtain a central limit theorem for sums of a triangular array of dependent random variables $X_j^{(n)}$, $j = 1, \ldots, n$, with joint distribution given by $z_n^{-1}\exp\{-H_n(x_1, \ldots, x_n)\}\prod dP(x_j)$, where $x_i \in R$ for all $i \ge 1$. The function $H_n(x_1, \ldots, x_n)$ is known as the Hamiltonian. Here P is a probability measure on R. When $H_n(x_1, \ldots, x_n) = -\log \phi_n(s_n/n)$, where $s_n = x_1 + \cdots + x_n$, and the probability measure P satisfies appropriate conditions, we show that there exist an integer $r \ge 1$ and a sequence $\tau_n$ such that $(S_n - n\tau_n)/n^{1 - 1/(2r)}$ has a limiting distribution which is non-Gaussian if $r \ge 2$. This result generalizes the theorems of Jong-Woo Jeon (Ph.D. Thesis, Dept. of Stat., F.S.U. (1979)) and Ellis and Newman (Z. Wahrscheinlichkeitstheorie und Verw. Gebiete (1978) 44, 117-139). Chapters IV and V extend the above to the multivariate case.
- Date Issued
- 1982
- Identifier
- AAI8225279, 3085419, FSDT3085419, fsu:74914
- Format
- Document (PDF)
- Title
- A Comparison of Estimators in Hierarchical Linear Modeling: Restricted Maximum Likelihood versus Bootstrap via Minimum Norm Quadratic Unbiased Estimators.
- Creator
-
Delpish, Ayesha Nneka, Niu, Xu-Feng, Tate, Richard L., Huffer, Fred W., Zahn, Douglas, Department of Statistics, Florida State University
- Abstract/Description
-
The purpose of the study was to investigate the relative performance of two estimation procedures, the restricted maximum likelihood (REML) and the bootstrap via MINQUE, for a two-level hierarchical linear model under a variety of conditions. Specific focus lay on observing whether the bootstrap via MINQUE procedure offered improved accuracy in the estimation of the model parameters and their standard errors in situations where normality may not be guaranteed. Through Monte Carlo simulations, the importance of this assumption for the accuracy of multilevel parameter estimates and their standard errors was assessed using the accuracy index of relative bias and by observing the coverage percentages of 95% confidence intervals constructed for both estimation procedures. The study systematically varied the number of groups at level 2 (30 versus 100), the size of the intraclass correlation (0.01 versus 0.20) and the distribution of the observations (normal versus chi-squared with 1 degree of freedom). The number of groups and intraclass correlation factors produced effects consistent with those previously reported: as the number of groups increased, the bias in the parameter estimates decreased, with a more pronounced effect observed for the estimates obtained via REML. High levels of the intraclass correlation also led to a decrease in the efficiency of parameter estimation under both methods. Study results show that while both the restricted maximum likelihood and the bootstrap via MINQUE estimates of the fixed effects were accurate, the efficiency of the estimates was affected by the distribution of errors, with the bootstrap via MINQUE procedure outperforming REML. Both procedures produced less efficient estimators under the chi-squared distribution, particularly for the variance-covariance component estimates.
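The skeleton of a Monte Carlo design of this kind (a two-level model with a chosen intraclass correlation, optionally skewed level-1 errors, and relative bias as the accuracy index) can be sketched briefly. The estimator below is plain pooled least squares rather than REML or bootstrap-via-MINQUE, and every numeric setting is an illustrative assumption, not a value from the study.

```python
import numpy as np

def simulate_slope_estimate(n_groups=30, n_per_group=20, gamma=1.0, icc=0.20,
                            chi_sq_errors=False, rng=None):
    """One replication of a two-level model y_ij = gamma * x_ij + u_j + e_ij,
    with the variance split so Var(u) / (Var(u) + Var(e)) = icc, returning a
    pooled least-squares estimate of the fixed slope gamma."""
    rng = np.random.default_rng() if rng is None else rng
    var_u, var_e = icc, 1.0 - icc
    u = rng.normal(0.0, np.sqrt(var_u), size=n_groups)
    if chi_sq_errors:   # centred, rescaled chi-square(1) level-1 errors
        e = (rng.chisquare(1, size=(n_groups, n_per_group)) - 1.0) / np.sqrt(2.0)
        e = e * np.sqrt(var_e)
    else:
        e = rng.normal(0.0, np.sqrt(var_e), size=(n_groups, n_per_group))
    x = rng.normal(size=(n_groups, n_per_group))
    y = gamma * x + u[:, None] + e
    xc = x - x.mean()
    return np.sum(xc * y) / np.sum(xc * xc)

rng = np.random.default_rng(3)
est = [simulate_slope_estimate(chi_sq_errors=True, rng=rng) for _ in range(500)]
print("relative bias of the slope:", (np.mean(est) - 1.0) / 1.0)
```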
- Date Issued
- 2006
- Identifier
- FSU_migr_etd-0771
- Format
- Thesis
- Title
- Estimation from Data Representing a Sample of Curves.
- Creator
-
Auguste, Anna L., Bunea, Florentina, Mason, Patrick, Hollander, Myles, Huffer, Fred, Department of Statistics, Florida State University
- Abstract/Description
-
This dissertation introduces and assesses an algorithm to generate confidence bands for a regression function or a main effect when multiple data sets are available. In particular, it proposes to construct confidence bands for different trajectories and then aggregate these to produce an overall confidence band for a mean function. An estimator of the regression function or main effect is also examined. First, nonparametric estimators and confidence bands are formed on each data set separately. Then each data set is in turn treated as a testing set for aggregating the preliminary results from the remaining data sets. The criterion used for this aggregation is either the least squares (LS) criterion or a BIC-type penalized LS criterion. The proposed estimator is the average over data sets of these aggregates. It is thus a weighted sum of the preliminary estimators. The proposed confidence band is the minimum L1 band of all the M aggregate bands when we only have a main effect. In the case where there is some random effect we suggest an adjustment to the confidence band; the proposed confidence band is then the minimum L1 band of all the M adjusted aggregate bands. Desirable asymptotic properties are shown to hold. A simulation study examines the performance of each technique relative to several alternate methods and theoretical benchmarks. An application to seismic data is conducted.
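The first step of the scheme (estimate the same main effect nonparametrically on each data set, then combine across data sets) can be sketched as below. The sketch uses a Nadaraya-Watson smoother and a plain unweighted average; the LS/BIC aggregation weights and the minimum-L1 band construction described in the abstract are not reproduced, and all settings and names are illustrative assumptions.

```python
import numpy as np

def nw_smoother(x, y, grid, bandwidth=0.2):
    """Nadaraya-Watson kernel estimate of a regression function on `grid`."""
    w = np.exp(-0.5 * ((grid[:, None] - x[None, :]) / bandwidth) ** 2)
    return (w @ y) / w.sum(axis=1)

rng = np.random.default_rng(4)
grid = np.linspace(0.0, 1.0, 101)
truth = np.sin(2 * np.pi * grid)

# M independent data sets ("curves"), each a noisy sample of the same main effect.
estimates = []
for _ in range(5):
    x = rng.uniform(0.0, 1.0, 200)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, 200)
    estimates.append(nw_smoother(x, y, grid))

combined = np.mean(estimates, axis=0)   # simple average of per-data-set estimators
print(np.max(np.abs(combined - truth)).round(3))
```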
- Date Issued
- 2006
- Identifier
- FSU_migr_etd-0286
- Format
- Thesis
- Title
- Modelling experimental data analysis.
- Creator
-
Ford, Charles Wesley, Jr., Florida State University
- Abstract/Description
-
An important goal of research in scientific databases is to provide a capability for managing the acquisition and analysis of data and to assist in the development and evaluation of data analysis functions. This dissertation is an important step toward this goal and represents an extension and application of object-oriented database technology. It uses the object-oriented approach to provide a model which describes a very general strategy for data analysis. This model is used to precisely define data acquisition and data analysis in a style which is well suited to management by a database system. The model has been implemented and evaluated in the context of a very complex experimental physics project. The implementation is described in detail with examples of the database schema and operations using the C++ binding of ODMG-93. As a result of the careful application of object-oriented methods, the management of data acquisition and analysis has become feasible for domain scientists. The application of the methods described herein to specific problems in experimental physics has resulted in a database which will be used to manage all of the data acquisition and data analysis for the Large Acceptance Spectrometer (CLAS) at the Continuous Electron Beam Accelerator Facility (CEBAF), a U.S. Department of Energy project.
- Date Issued
- 1995
- Identifier
- AAI9529599, 3088659, FSDT3088659, fsu:77461
- Format
- Document (PDF)
- Title
- On a general repair model for repairable systems.
- Creator
-
Dorado, Crisanto Ayap., Florida State University
- Abstract/Description
-
The minimal repair process assumes that upon repair a system is restored to its functioning condition just before failure. For systems with few vulnerable components it is more reasonable to assume that repair actually brings the state of the system to a level that is between "completely new" and "prior to failure". Kijima (1989) introduced models for such a repair process based on the notion of age reduction. Under age reduction, the system, upon repair, is functionally the same as an identical system of lesser age. An alternative to age reduction is the notion of extra life. Under this notion, the system, upon repair, enjoys a longer expected remaining life than it would have had under a minimal repair. In this dissertation, we introduce a repair model that generalizes Kijima's models so as to include both the notions of age reduction and extra life. We then look at the problem of estimating system reliability based on observations of the repair process from several systems working independently. We make use of counting processes and martingales to derive large sample properties of the estimator.
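The age-reduction idea in Kijima's models can be made concrete with a small simulation of a Type I virtual-age repair process. The Weibull baseline, the repair factor and the function name are illustrative assumptions; the dissertation's more general model, which also allows extra life, is not reproduced here.

```python
import numpy as np

def simulate_kijima1(n_failures, shape=2.0, scale=1.0, a=0.5, rng=None):
    """Failure times under a Kijima Type I age-reduction model: after the
    i-th repair the virtual age is v_i = v_{i-1} + a * x_i, where x_i is the
    i-th inter-failure time and 0 <= a <= 1 is the degree of repair
    (a = 1 ~ minimal repair, a = 0 ~ perfect repair).  Each inter-failure
    time is drawn from the Weibull baseline conditioned on survival past the
    current virtual age."""
    rng = np.random.default_rng() if rng is None else rng
    cum_hazard = lambda t: (t / scale) ** shape        # Weibull cumulative hazard
    inv_cum_hazard = lambda h: scale * h ** (1.0 / shape)
    v, clock, times = 0.0, 0.0, []
    for _ in range(n_failures):
        u = rng.uniform()
        # conditional sampling: H(v + x) - H(v) = -log(u)
        x = inv_cum_hazard(cum_hazard(v) - np.log(u)) - v
        clock += x
        times.append(clock)
        v += a * x                                     # age reduction on repair
    return np.array(times)

print(simulate_kijima1(5, rng=np.random.default_rng(5)).round(3))
```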
- Date Issued
- 1995
- Identifier
- AAI9540050, 3088702, FSDT3088702, fsu:77504
- Format
- Document (PDF)
- Title
- PART 1 - THE LIMITING DISTRIBUTION OF THE LIKELIHOOD RATIO STATISTIC 2 LOG(LAMBDA(N)) UNDER A CLASS OF LOCAL ALTERNATIVES. PART 2 - MINIMUM AVERAGE RISK DECISION PROCEDURES FOR THE NONCENTRAL CHI-SQUARE DISTRIBUTION.
- Creator
-
LEVER, WILLIAM EDWIN., The Florida State University
- Date Issued
- 1968
- Identifier
- AAI6811680, 2985783, FSDT2985783, fsu:70292
- Format
- Document (PDF)
- Title
- TESTING WHETHER NEW IS BETTER THAN USED OF A SPECIFIED AGE.
- Creator
-
PARK, DONG HO., Florida State University
- Abstract/Description
-
This research contributes to the theory and methods of testing hypotheses for classes of life distributions. Two classes of life distributions are considered in this dissertation: (1) the new better than used (NBU) class: the life distribution F is NBU if $\bar F(x+y) \le \bar F(x)\bar F(y)$ for all $x, y \ge 0$, where $\bar F \equiv 1 - F$; and (2) the new better than used at $t_0$ (NBU-$t_0$) class: the life distribution F is NBU-$t_0$ if $\bar F(x+t_0) \le \bar F(x)\bar F(t_0)$ for all $x \ge 0$. The NBU and NBU-$t_0$ classes have dual classes (new worse than used and new worse than used at $t_0$, respectively) defined by reversing the inequality. The NBU-$t_0$ class is a new class of life distributions and contains the NBU class. We study the basic properties of the NBU-$t_0$ class and propose a test of $H_0$: $\bar F(x+t_0) = \bar F(x)\bar F(t_0)$ for all $x \ge 0$, versus $H_A$: $\bar F(x+t_0) \le \bar F(x)\bar F(t_0)$ for all $x \ge 0$, with strict inequality for some $x \ge 0$, based on a complete random sample $X_1, \ldots, X_n$ from F. Our test can also be used to test $H_0$ against the NWU-$t_0$ alternatives. Asymptotic relative efficiencies of our test with respect to the Hollander and Proschan (1972, Ann. Math. Statist. 43, 1136-1146) NBU test are calculated for several distributions. We extend our test of $H_0$ versus $H_A$ to accommodate randomly censored data. For the censored data situation our test is based on a statistic (formula omitted in DAI) in which the survival function $\bar F$ is replaced by its Kaplan-Meier (1958, J. Amer. Statist. Assoc. 53, 457-481) estimator. Under mild regularity conditions on the amount of censoring, a consistent test of $H_0$ versus $H_A$ for the randomly censored model is obtained. In Chapter III we develop a two-sample NBU test of the null hypothesis that two distributions F and G are equal, versus the alternative that F is "more NBU" than is G. Our test is based on a statistic $T_{m,n}$ (formula omitted in DAI), where m and n are the sample sizes from F and G, and $F_m$ and $G_n$ are the empirical distributions of F and G. Asymptotic normality of $T_{m,n}$, suitably normalized, is a direct consequence of Hoeffding's (1948, Ann. Math. Statist. 19, 293-325) U-statistic theorem. Then, using a consistent estimator of the null asymptotic variance of $N^{1/2}T_{m,n}$, where N = m + n, we obtain an asymptotically distribution-free test. We extend the two-sample NBU test to the k-sample case. Our test of $H_0$ versus $H_A$ utilizes the Kaplan-Meier estimator; however, there are other possible estimators of the survival function for the randomly censored model. . . . (Author's abstract exceeds stipulated maximum length. Discontinued here with permission of author.) UMI
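The defining NBU-$t_0$ inequality can be examined empirically on a complete sample, as sketched below. This is only a descriptive check of $\bar F(x+t_0) \le \bar F(x)\bar F(t_0)$ using empirical survival functions, not the dissertation's test statistic; the Weibull example and the grid are illustrative assumptions.

```python
import numpy as np

def empirical_survival(sample, t):
    """Empirical survival probabilities at each value of the array t."""
    return np.mean(sample[:, None] > t, axis=0)

def nbu_t0_discrepancy(sample, t0, grid):
    """Empirical contrast S(x + t0) - S(x) * S(t0) over a grid of x values;
    values <= 0 everywhere are consistent with the NBU-t0 property."""
    sample = np.asarray(sample, dtype=float)
    s_shift = empirical_survival(sample, grid + t0)
    s_x = empirical_survival(sample, grid)
    s_t0 = np.mean(sample > t0)
    return s_shift - s_x * s_t0

rng = np.random.default_rng(6)
weibull_ifr = rng.weibull(2.0, size=5000)      # IFR, hence NBU and NBU-t0
grid = np.linspace(0.0, 2.0, 9)
print(nbu_t0_discrepancy(weibull_ifr, t0=0.5, grid=grid).round(3))
```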
- Date Issued
- 1982
- Identifier
- AAI8301540, 3085466, FSDT3085466, fsu:74958
- Format
- Document (PDF)
- Title
- AN INCREASING FAILURE RATE APPROACH TO CONSERVATIVE LOW DOSE EXTRAPOLATION (SAFE DOSE).
- Creator
-
SCHELL, MICHAEL J., Florida State University
- Abstract/Description
-
This dissertation provides a new method of treating the conservative low dose extrapolation problem. One wishes to determine the largest dose d, called the "safe" dose, for which $P(F(d) \le r) \ge 1 - \eta$, where F(d) is the proportion of failures, say cancers induced, at dose d by time T. Here F is a life distribution function presumed to come from some class of functions $\mathcal F$, T is prespecified, and $r \in (0,1)$. F(x,y) denotes the proportion of failures at doses (x,y) by fixed time T. Four extensions of the univariate class of IFR functions are introduced, differing in the way that convexity of the hazard function $H(x,y) = -\ln(1 - F(x,y))$ is posited. The notion of dependent action is considered and a hypothesis test for its existence given. Conservative low dose extrapolation techniques for the two most prominent classes are given. An upper bound for the hazard function is established for low doses, with proofs that the bounds are sharp.
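To convey the flavor of conservative low dose extrapolation under a convexity assumption, the sketch below uses the generic single-dose bound that a convex cumulative hazard H with H(0) = 0 lies below its chord, so $H(d) \le (d/d^*)H(d^*)$ for $d \le d^*$. This is a textbook-style illustration under that stated assumption, not the dissertation's bivariate procedure; the function name and numbers are hypothetical.

```python
import numpy as np

def conservative_safe_dose(d_star, risk_at_d_star, target_risk):
    """Largest low dose d guaranteed to keep risk below `target_risk` when the
    cumulative hazard H is convex with H(0) = 0: convexity gives
    H(d) <= (d / d_star) * H(d_star) for d <= d_star, so
    risk(d) <= 1 - exp(-(d / d_star) * H(d_star))."""
    h_star = -np.log(1.0 - risk_at_d_star)        # hazard at the tested dose
    return d_star * (-np.log(1.0 - target_risk)) / h_star

# If a tested dose of 10 units produced a 20% response, the dose at which the
# convex-hazard bound keeps the response below 1e-4 is roughly:
print(conservative_safe_dose(10.0, 0.20, 1e-4))
```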
- Date Issued
- 1984
- Identifier
- AAI8427325, 3085936, FSDT3085936, fsu:75422
- Format
- Document (PDF)
- Title
- TESTING WHETHER MEAN RESIDUAL LIFE CHANGES TREND.
- Creator
-
GUESS, FRANK MITCHELL., Florida State University
- Abstract/Description
-
Given that an item is of age t, the expected value of the random remaining life is called the mean residual life (MRL) at age t. We propose two new nonparametric classes of life distributions for modeling aging based on MRL. The first class of life distributions consists of those with "increasing initially, then decreasing mean residual life" (IDMRL). The IDMRL class models aging that is initially beneficial, then adverse. The second class, "decreasing, then increasing mean residual life" (DIMRL), models aging that is initially adverse, then beneficial. We present situations where IDMRL (DIMRL) distributions are useful models. We propose two testing procedures for $H_0$: constant MRL (i.e., exponentiality) versus $H_1$: IDMRL, but not constant MRL (or $H_1'$: DIMRL, but not constant MRL). The first testing procedure assumes the turning point $\tau$ from IMRL to DMRL is specified by the user or is known. Our IDMRL($\tau$) test statistic $T_n$ is a differentiable statistical function of order 1; thus $T_n$, suitably standardized, is asymptotically normal. The second procedure assumes knowledge of the proportion $\rho$ of the population that "dies" at or before the turning point (knowledge of $\tau$ itself is not assumed). We use L-statistic theory to show that our IDMRL($\rho$) test statistic $V_n^*$, appropriately standardized, is asymptotically normal. The exact null distribution of $V_n^*$ is established. For each of these procedures an application is given. After this we modify the complete data tests to yield analogous censored data procedures. The standard Kaplan-Meier estimator is a key tool that we exploit for our censored data tests. A limited Monte Carlo study investigates the censored data procedures.
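The quantity being tested, the mean residual life function m(t) = E[X - t | X > t], is easy to inspect empirically on complete data, as in the sketch below. The lognormal example is an illustrative assumption: its empirical MRL decreases at first and eventually increases, the kind of trend change (here of DIMRL type) these procedures are designed to detect. This is not the dissertation's test statistic.

```python
import numpy as np

def empirical_mrl(sample, t_grid):
    """Empirical mean residual life m(t) = E[X - t | X > t] from a complete
    sample; NaN is returned beyond the largest observation."""
    x = np.asarray(sample, dtype=float)
    out = np.full(len(t_grid), np.nan)
    for i, t in enumerate(t_grid):
        tail = x[x > t]
        if tail.size:
            out[i] = tail.mean() - t
    return out

rng = np.random.default_rng(7)
sample = rng.lognormal(0.0, 1.0, size=20000)
print(empirical_mrl(sample, np.linspace(0.0, 5.0, 6)).round(2))
```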
- Date Issued
- 1984
- Identifier
- AAI8428699, 3085942, FSDT3085942, fsu:75428
- Format
- Document (PDF)
- Title
- ON SEQUENTIAL UNBIASED AND BAYES-TYPE ESTIMATES OF PARAMETERS IN A CONTINGENCY TABLE.
- Creator
-
CHEN, CHENG-CHUNG., Florida State University
- Abstract/Description
-
Estimation of the probability parameters in a contingency table with linear and/or log-linear constraints on the parameters is the principal concern of this thesis. Sequential unbiased estimates of the cell probabilities as well as some Bayes posterior-mean-type estimates are considered. Chapter I is a review of some earlier work on the sequential unbiased estimation of the probability parameter in a Bernoulli process. The review begins with the classical work of Girshick, Mosteller and Savage (1946) and some follow-up studies such as Wolfowitz (1946), Savage (1947), Blackwell (1947), Lehmann and Stein (1950), Degroot (1959) and Kagan, Linnik and Rao (1973). In several cases the original proofs have been simplified and the arguments streamlined. Chapter II deals with the problem of sequential unbiased estimation of the parameters in a contingency table with linear and/or log-linear constraints. Multinomial Girshick, Mosteller and Savage (GMS) type stopping rules are discussed and the corresponding unbiased estimates based on the minimal sufficient statistic described. Consistency, in the sense of Wolfowitz (1947), of such estimates is demonstrated. Unbiased estimates of parametric functions such as log-contrasts are derived. Sufficient conditions for the completeness of the GMS-type stopping rules are given. In Chapter III, the problem of sequential unbiased estimation of the probability parameters in the Bradley-Terry (1952) model of paired comparisons is studied. The Bradley-Terry model can be summarized as follows. Suppose that there are t treatments $T_1, \ldots, T_t$ that can be pairwise compared. The Bradley-Terry model postulates that associated with treatment $T_i$ is a "strength" parameter $\pi_i > 0$, $i = 1, \ldots, t$, such that if treatments $T_i$ and $T_j$ are compared, the probability that $T_i$ is preferred to $T_j$ is $\theta_{ij} = \pi_i/(\pi_i + \pi_j)$. The model imposes log-linear constraints on the $\theta_{ij}$'s, so that techniques similar to those in Chapter II may be used to obtain unbiased estimates based on a sufficient statistic. In Chapter IV, two Bayes-type procedures for estimating the multinomial cell probability vector p, in the presence of linear constraints on the parameters, are proposed and illustrated with examples. A general prior is used, with the restriction that the moment generating function of the prior exists in closed form. The estimators are shown to be strongly consistent. Estimation under log-linear constraints is also considered. Finally, Bayes-type estimators for the covariance matrix of the cell frequencies are presented for some special cases of linearly and log-linearly constrained problems. Chapter V is concerned with a Bayesian approach to the estimation of parameters in the Bradley-Terry model of paired comparisons. It is assumed that the sum of the treatment parameters $\pi_i$ is 1, and a Dirichlet prior for $\pi = (\pi_1, \ldots, \pi_t)$ is used. Using the induced prior of $\theta_{ij}$ and $Z_{ij} = \pi_i + \pi_j$, an estimate $\pi_{ij}$ of $\pi_i$, based on the data arising from the comparisons of treatments $T_i$ and $T_j$, is obtained. An estimate of $\pi_i$ based on all the data is a weighted combination of the $\pi_{ij}$'s that minimizes a risk function. Similarly, estimates for log-contrasts of the $\pi_i$'s are obtained. This technique of estimation is extended to the Luce model of multiple comparisons.
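The single-parameter case reviewed in Chapter I has a classical concrete instance: under inverse binomial sampling (stop at the r-th success), the estimator (r - 1)/(N - 1), with N the total number of trials at stopping, is exactly unbiased for the success probability. The simulation below illustrates that fact; it is an illustrative sketch, not code or notation from the thesis, and does not cover the multinomial contingency-table extension.

```python
import numpy as np

def inverse_binomial_estimate(p, r, rng):
    """Sample Bernoulli(p) trials until r successes are observed and return
    the classical unbiased estimate (r - 1) / (N - 1), where N is the total
    number of trials at stopping."""
    successes, n = 0, 0
    while successes < r:
        n += 1
        successes += rng.uniform() < p
    return (r - 1) / (n - 1)

rng = np.random.default_rng(8)
estimates = [inverse_binomial_estimate(p=0.3, r=5, rng=rng) for _ in range(20000)]
print(np.mean(estimates))   # close to 0.3, illustrating exact unbiasedness
```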
- Date Issued
- 1981
- Identifier
- AAI8125818, 3085061, FSDT3085061, fsu:74559
- Format
- Document (PDF)
- Title
- PARTIAL SEQUENTIAL TESTS FOR THE MEAN OF A NORMAL DISTRIBUTION.
- Creator
-
ARGHAMI, NASSER REZA., Florida State University
- Abstract/Description
-
Recently, Billard (1977) introduced a truncated partial sequential procedure for testing a null hypothesis about a normal mean with known variance against a two-sided alternative hypothesis. That procedure had the disadvantage that a large number of observations is necessary if the null hypothesis is to be accepted. A new procedure is introduced which reduces the expected sample size for all mean values, with considerable reductions for values near the null mean value. Theoretical operating characteristic and average sample number functions are derived, and the empirical distribution of the sample size in some special cases is obtained. For the case of unknown variance and a one-sided alternative hypothesis, there are a number of tests, the best known of which are those of Wald (1947) and Barnard (1952). These tests concern hypotheses expressed in units of $\mu/\sigma$. In this work, a partial sequential test procedure is introduced for hypotheses concerned only with $\mu$. An advantage of this new procedure is its relative simplicity and ease of execution when compared to the above tests. This is essentially due to the fact that in the present procedure the transformed observations follow a central t-distribution, as distinct from the noncentral t-distribution. The difficulties caused by the noncentral distribution explain the relative lack of progress in obtaining results about the properties, such as the operating characteristic and average sample number functions, of the tests of Barnard and Wald. The key element in the present procedure is that a number of observations is taken initially before any decision is made; subsequent observations are then taken in batches, the sizes of which depend on the estimate for the variance obtained from the initial set of observations. Some properties of the procedure are studied. In particular, an approximation to the theoretical operating characteristic function is derived and the sensitivity of the average sample number function to changes in some of the test parameters is investigated. The ideas developed for the partial sequential t-test are extended to develop tests of hypotheses concerning the parameters of a simple linear regression equation, general linear hypotheses and hypotheses about the mean of special cases of the multivariate normal.
- Date Issued
- 1981
- Identifier
- AAI8125865, 3085070, FSDT3085070, fsu:74568
- Format
- Document (PDF)
- Title
- ON DETERMINING THE NUMBER OF PREDICTORS IN A REGRESSION EQUATION USED FOR PREDICTION.
- Creator
-
CARR, MEG BRADY., Florida State University
- Abstract/Description
-
It is generally recognized that all the available variables should not necessarily be used as predictors in a linear regression equation. The problems which may arise from using too many predictors become especially acute in a regression equation used for prediction with independent data. In this case, the skill of prediction may actually deteriorate with increasing numbers of predictors. However, there is no definitive explanation as to why this should be so. There is also no universally accepted procedure for determining the number of predictors to use. The various regression methods which do exist are logically contrived but are also largely based on subjective considerations. The goal of this research is to develop and test a criterion that will indicate a priori the "optimum" number of predictors to use in a prediction equation. The mean square error statistic is used to evaluate the performance of a regression equation in both the dependent and independent samples. Selecting the "best" prediction equation consists of determining the equation with the minimum estimated independent-sample mean square error. Several approximations and estimators of the independent-sample mean square error which have appeared in the literature are discussed, and two new estimators are derived. These approximations and estimators are tested in Monte Carlo simulations to determine their skill in indicating the number of predictors which will yield the best prediction equation. The sample size, number of available predictors, correlations among the variables, distribution of the variables, and selection method are manipulated to explore how these various factors influence the performances of the mean square error estimators. It is found that the better estimators are capable of indicating a number of predictors to include in the regression equation for which the corresponding independent-sample mean square error is near the minimum value. As a practical test, the various estimators of the independent-sample mean square error are applied to the data used in deriving the Model Output Statistics (MOS) maximum and minimum temperature forecast equations used by the National Weather Service. These prediction equations are linear regression equations derived using a forward selection method. The sequence of prediction equations corresponding to the forward trace of all the available predictors is derived for each of 192 cases and then applied to independent data. The forecasts made by the operational p = 10 predictor MOS equations are compared with those made by the equations determined by the estimators of the independent-sample mean square error. The operational equations have the best overall verification statistics. The estimators persistently underestimate the values of the independent-sample mean square error, but one of the new estimators is able to determine MOS forecast equations that perform as well as the operational equations. Furthermore, it is able to accomplish this without the use of an independent sample to help determine the optimum number of predictors.
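The core phenomenon described above, that dependent-sample fit keeps improving while independent-sample prediction deteriorates as predictors are added, is easy to reproduce with a short simulation. The sample sizes, the number of informative predictors and all other settings below are illustrative assumptions, not values from the study.

```python
import numpy as np

rng = np.random.default_rng(9)
n_train, n_test, p_max, n_informative = 60, 1000, 20, 3

beta = np.zeros(p_max)
beta[:n_informative] = 1.0                      # only a few predictors matter

def make_data(n):
    x = rng.normal(size=(n, p_max))
    y = x @ beta + rng.normal(size=n)
    return x, y

x_tr, y_tr = make_data(n_train)
x_te, y_te = make_data(n_test)

for p in (1, 3, 5, 10, 20):
    xp = np.column_stack([np.ones(n_train), x_tr[:, :p]])
    coef, *_ = np.linalg.lstsq(xp, y_tr, rcond=None)
    fit_mse = np.mean((y_tr - xp @ coef) ** 2)              # dependent sample
    xp_te = np.column_stack([np.ones(n_test), x_te[:, :p]])
    pred_mse = np.mean((y_te - xp_te @ coef) ** 2)          # independent sample
    print(f"p={p:2d}  dependent MSE={fit_mse:.2f}  independent MSE={pred_mse:.2f}")
```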
- Date Issued
- 1980
- Identifier
- AAI8026121, 3084691, FSDT3084691, fsu:74192
- Format
- Document (PDF)
- Title
- TWO-WAY CLUSTER ANALYSIS WITH NOMINAL DATA.
- Creator
-
COOPER, PAUL GAYLORD., Florida State University
- Abstract/Description
-
Consider an M by N data matrix X whose elements may assume values 0, 1, 2, ..., H. Denote the rows of X by $\alpha_1, \alpha_2, \ldots, \alpha_M$. A tree on the rows of X is a sequence of distinct partitions $\{P_i\}_{i=1}^{k}$ such that: (a) $P_1 = \{(\alpha_1), \ldots, (\alpha_M)\}$, (b) $P_i$ is a refinement of $P_{i+1}$ for $i = 1, \ldots, k-1$, and (c) $P_k = \{(\alpha_1, \ldots, \alpha_M)\}$. The two-way clustering problem consists of simultaneously constructing trees on the rows, columns, and elements of X. A generalization of a two-way joining algorithm (TWJA) introduced by J. A. Hartigan (1975) is used to construct the three trees. The TWJA requires the definition of measures of dissimilarity between row clusters and column clusters respectively. Two approaches are used in the construction of these dissimilarity coefficients--one based on intuition and one based on a formal prediction model. For matrices with binary elements (0 or 1), measures of dissimilarity between row or column clusters are based on the number of mismatching pairs. Consider two distinct row clusters $R_p$ and $R_q$ containing $m_p$ and $m_q$ rows respectively. One measure of dissimilarity between $R_p$ and $R_q$, $d_0(R_p, R_q)$, is given by a formula omitted in DAI, in which $b_{p\beta}$ and $b_{q\beta}$ are the numbers of ones in column $\beta$ of clusters $R_p$ and $R_q$ respectively. Two additional intuitive dissimilarity coefficients are also defined and studied. For matrices containing nominal-level data, dissimilarity coefficients are based on a formal prediction model. Analogous to the procedure of Cleveland and Relles (1974), for a given data matrix the model consists of a scheme for random selection of two rows (or columns) from the matrix and an identification rule for distinguishing between the two rows (or columns). A loss structure is defined for both rows and columns and the expected loss due to incorrect row or column identification is computed. The dissimilarity between two (say) row clusters is then defined to be the increase in expected loss due to joining those two row clusters into a single cluster. Stopping criteria are suggested for both the intuitive and prediction model approaches. For the intuitive approach, it is suggested that joining be stopped when the dissimilarity between the (say) row clusters to be joined next exceeds that expected by chance under the assumption that the (say) column totals of the matrix are fixed. For the prediction model approach the stopping criterion is based on a cluster prediction model in which the objective is to distinguish between row or column clusters. A cluster identification rule is defined based on the information in the partitioned data matrix and the expected loss due to incorrect cluster identification is computed. The expected cluster loss is also computed when cluster identification is based on strict randomization. The relative decrease in expected cluster loss due to identification based on the partitioned matrix versus that based on randomization is suggested as a stopping criterion. Both contrived and real data examples are used to illustrate and compare the two clustering procedures. Computational aspects of the procedure are discussed and it is concluded that the intuitive approach is less costly in terms of computation time. Further, five admissibility properties are defined and, for certain intuitive dissimilarity coefficients, the trees produced by the TWJA are shown to possess three of the five properties.
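Since the exact $d_0$ formula is omitted in the DAI text, the sketch below gives one plausible mismatch-pair coefficient of the kind described: for each column it counts the (row of $R_p$, row of $R_q$) pairs that disagree, using the column one-counts $b_{p\beta}$ and $b_{q\beta}$, and averages over pairs. The normalization and function name are my own illustrative choices, not the dissertation's definition.

```python
import numpy as np

def d0_dissimilarity(rows_p, rows_q):
    """Mismatch-based dissimilarity between two row clusters of a binary
    matrix: for each column, count cross-cluster row pairs that disagree,
    then average over all m_p * m_q pairs."""
    rows_p = np.asarray(rows_p); rows_q = np.asarray(rows_q)
    m_p, m_q = len(rows_p), len(rows_q)
    b_p = rows_p.sum(axis=0)         # ones per column in cluster R_p
    b_q = rows_q.sum(axis=0)         # ones per column in cluster R_q
    mismatches = b_p * (m_q - b_q) + b_q * (m_p - b_p)
    return mismatches.sum() / (m_p * m_q)

x = np.array([[1, 0, 1, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0],
              [0, 1, 0, 0]])
print(d0_dissimilarity(x[:2], x[2:]))   # dissimilar row blocks
print(d0_dissimilarity(x[:1], x[1:2]))  # similar rows
```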
- Date Issued
- 1980
- Identifier
- AAI8026123, 3084693, FSDT3084693, fsu:74194
- Format
- Document (PDF)
- Title
- ON NONPARAMETRIC ESTIMATION OF DENSITY AND REGRESSION FUNCTIONS.
- Creator
-
CHENG, PHILIP E., The Florida State University
- Abstract/Description
-
In the field of statistical estimation, nonparametric procedures have received increased attention for the past decade. In particular, various nonparametric estimates of probability density functions and regression curves have been extensively studied, with special attention to large sample pr
- Date Issued
- 1980
- Identifier
- AAI8020329, 2989654, FSDT2989654, fsu:74161
- Format
- Document (PDF)
- Title
- STOCHASTIC VERSIONS OF REARRANGEMENT INEQUALITIES WITH APPLICATIONS TO STATISTICS.
- Creator
-
D'ABADIE, CATHERINE ANNE., Florida State University
- Abstract/Description
-
In this dissertation we develop a theory which offers a unified approach to the problem of obtaining stochastic versions of deterministic rearrangement inequalities. To develop the theory we first define two new classes of functions and establish preservation properties of these functions under various statistical and mathematical operations. Next we introduce the notion of stochastically similarly arranged (SSA) pairs of random vectors. We prove that if the random vectors (X,Y) are SSA and the function f from $R^n \times R^n$ into $R^n$ is monotone with respect to a certain partial ordering on $R^n \times R^n$, then for every permutation $\pi$ the stochastic inequalities displayed in DAI (omitted here) hold. This result yields a unified way of obtaining stochastic versions of rearrangement inequalities. We then show that many multivariate densities of interest in statistical practice govern pairs of random vectors which are SSA. Next we show that under certain statistical operations on pairs of SSA random vectors the property of being SSA is preserved. For example, we show that the rank order of SSA random variables is SSA. We also show that the SSA property is preserved under certain contamination models. Finally, we show how the results we obtain can be applied to problems in hypothesis testing.
- Date Issued
- 1981
- Identifier
- AAI8205717, 3085181, FSDT3085181, fsu:74676
- Format
- Document (PDF)