Search results
 Title
 Time-Varying Coefficient Models with ARMA-GARCH Structures for Longitudinal Data Analysis.
 Creator

Zhao, Haiyan, Niu, Xufeng, Huffer, Fred, Nolder, Craig, McGee, Dan, Department of Statistics, Florida State University
 Abstract/Description

The motivation of my research comes from the analysis of the Framingham Heart Study (FHS) data. The FHS is a long-term prospective study of cardiovascular disease in the community of Framingham, Massachusetts. The study began in 1948 and 5,209 subjects were initially enrolled. Examinations were given biennially to the study participants and their status associated with the occurrence of disease was recorded. In this dissertation, the event we are interested in is the incidence of coronary heart disease (CHD). Covariates considered include sex, age, cigarettes per day (CSM), serum cholesterol (SCL), systolic blood pressure (SBP) and body mass index (BMI, weight in kilograms/height in meters squared). A review of the statistical literature indicates that the effects of the covariates on cardiovascular disease, or on death from all causes, in the Framingham study change over time. For example, the effect of SCL on cardiovascular disease decreases linearly over time. In this study, I examine the time-varying effects of the risk factors on CHD incidence. Time-varying coefficient models with ARMA-GARCH structure are developed in this research. The maximum likelihood and the marginal likelihood methods are used to estimate the parameters in the proposed models. Since high-dimensional integrals are involved in the calculation of the marginal likelihood, the Laplace approximation is employed in this study. Simulation studies are conducted to evaluate the performance of these two estimation methods based on our proposed models. The Kullback-Leibler (KL) divergence and the root mean square error are employed in the simulation studies to compare the results obtained from different methods. Simulation results show that the marginal likelihood approach gives more accurate parameter estimates, but is more computationally intensive.
Following the simulation study, our proposed models are applied to the Framingham Heart Study to investigate the time-varying effects of covariates with respect to CHD incidence. To specify the time-series structures of the effects of risk factors, the Bayesian Information Criterion (BIC) is used for model selection. Our study shows that the relationship between CHD and risk factors changes over time. For males, there is a clearly decreasing linear trend in the age effect, which implies that the age effect on CHD is less pronounced for older patients than for younger patients. The effect of CSM stays almost the same over the first 30 years and decreases thereafter. There are slightly decreasing linear trends in the effects of both SBP and BMI. Furthermore, the coefficients of SBP are mostly positive over time, i.e., patients with higher SBP are more likely to develop CHD, as expected. For females, there is also a clearly decreasing linear trend in the age effect, while the effects of SBP and BMI on CHD are mostly positive and do not change much over time.
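The two comparison metrics named in this abstract are standard and easy to make concrete. As an illustrative sketch (our own, not code from the dissertation), the Kullback-Leibler divergence between two univariate normal distributions has a closed form, and the root mean square error summarizes a set of simulated estimates around the true parameter value:

```python
import numpy as np

def kl_normal(mu0, sd0, mu1, sd1):
    """KL( N(mu0, sd0^2) || N(mu1, sd1^2) ), closed form."""
    return np.log(sd1 / sd0) + (sd0**2 + (mu0 - mu1)**2) / (2 * sd1**2) - 0.5

def rmse(estimates, truth):
    """Root mean square error of a vector of estimates around the truth."""
    estimates = np.asarray(estimates, dtype=float)
    return np.sqrt(np.mean((estimates - truth) ** 2))

# Identical distributions have zero divergence
print(kl_normal(0.0, 1.0, 0.0, 1.0))   # 0.0
# Four simulated estimates of a parameter whose true value is 1.0
print(rmse([0.9, 1.1, 1.0, 1.2], 1.0))
```

In a simulation study like the one described, each estimation method would produce such a vector of estimates per parameter, and the method with the smaller RMSE and KL divergence from the truth is preferred.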
 Date Issued
 2010
 Identifier
 FSU_migr_etd0527
 Format
 Thesis
 Title
 A Comparison of Estimators in Hierarchical Linear Modeling: Restricted Maximum Likelihood versus Bootstrap via Minimum Norm Quadratic Unbiased Estimators.
 Creator

Delpish, Ayesha Nneka, Niu, XuFeng, Tate, Richard L., Huffer, Fred W., Zahn, Douglas, Department of Statistics, Florida State University
 Abstract/Description

The purpose of this study was to investigate the relative performance of two estimation procedures, restricted maximum likelihood (REML) and the bootstrap via MINQUE, for a two-level hierarchical linear model under a variety of conditions. The specific focus was on whether the bootstrap via MINQUE procedure offered improved accuracy in the estimation of the model parameters and their standard errors in situations where normality may not be guaranteed. Through Monte Carlo simulations, the importance of this assumption for the accuracy of multilevel parameter estimates and their standard errors was assessed using the accuracy index of relative bias and by observing the coverage percentages of 95% confidence intervals constructed under both estimation procedures. The study systematically varied the number of groups at level 2 (30 versus 100), the size of the intraclass correlation (0.01 versus 0.20) and the distribution of the observations (normal versus chi-squared with 1 degree of freedom). The number-of-groups and intraclass-correlation factors produced effects consistent with those previously reported: as the number of groups increased, the bias in the parameter estimates decreased, with a stronger effect observed for the estimates obtained via REML. High levels of the intraclass correlation also led to a decrease in the efficiency of parameter estimation under both methods. Study results show that while both the restricted maximum likelihood and the bootstrap via MINQUE estimates of the fixed effects were accurate, the efficiency of the estimates was affected by the distribution of errors, with the bootstrap via MINQUE procedure outperforming REML. Both procedures produced less efficient estimators under the chi-squared distribution, particularly for the variance-covariance component estimates.
 Date Issued
 2006
 Identifier
 FSU_migr_etd0771
 Format
 Thesis
 Title
 Estimation from Data Representing a Sample of Curves.
 Creator

Auguste, Anna L., Bunea, Florentina, Mason, Patrick, Hollander, Myles, Huffer, Fred, Department of Statistics, Florida State University
 Abstract/Description

This dissertation introduces and assesses an algorithm for generating confidence bands for a regression function or a main effect when multiple data sets are available. In particular, it proposes to construct confidence bands for the different trajectories and then aggregate these to produce an overall confidence band for a mean function. An estimator of the regression function or main effect is also examined. First, nonparametric estimators and confidence bands are formed on each data set separately. Then each data set is in turn treated as a testing set for aggregating the preliminary results from the remaining data sets. The criterion used for this aggregation is either the least squares (LS) criterion or a BIC-type penalized LS criterion. The proposed estimator is the average over data sets of these aggregates; it is thus a weighted sum of the preliminary estimators. The proposed confidence band is the minimum L1 band among all M aggregate bands when there is only a main effect. In the case where there is a random effect, we suggest an adjustment to the confidence band; the proposed band is then the minimum L1 band among all M adjusted aggregate bands. Desirable asymptotic properties are shown to hold. A simulation study examines the performance of each technique relative to several alternative methods and theoretical benchmarks. An application to seismic data is conducted.
 Date Issued
 2006
 Identifier
 FSU_migr_etd0286
 Format
 Thesis
 Title
 Statistical Shape Analysis on Manifolds with Applications to Planar Contours and Structural Proteomics.
 Creator

Ellingson, Leif A., Patrangenaru, Vic, Mio, Washington, Zhang, Jinfeng, Niu, Xufeng, Department of Statistics, Florida State University
 Abstract/Description

Technological advances in recent years have produced a wealth of intricate digital imaging data that is analyzed effectively using the principles of shape analysis. Such data often lie on either high-dimensional or infinite-dimensional manifolds. With computing power now strong enough to handle this data, it is necessary to develop theoretically sound methodology to perform the analysis in a computationally efficient manner. In this dissertation, we propose approaches for doing so for planar contours and the three-dimensional atomic structures of protein binding sites. First, we adapt Kendall's definition of direct similarity shapes of finite planar configurations to shapes of planar contours under certain regularity conditions and utilize Ziezold's nonparametric view of Fréchet mean shapes. The space of direct similarity shapes of regular planar contours is embedded in a space of Hilbert-Schmidt operators in order to obtain the Veronese-Whitney extrinsic mean shape. For computations, it is necessary to use discrete approximations of both the contours and the embedding. For cases when landmarks are not provided, we propose an automated, randomized landmark selection procedure that is useful for contour matching within a population and is consistent with the underlying asymptotic theory. For inference on the extrinsic mean direct similarity shape, we consider a one-sample neighborhood hypothesis test and the use of the nonparametric bootstrap to approximate confidence regions. Bandulasiri et al. (2008) suggested using extrinsic reflection size-and-shape analysis to study the relationship between the structure and function of protein binding sites. In order to obtain meaningful results for this approach, it is necessary to identify the atoms common to a group of binding sites with similar functions and obtain proper correspondences for these atoms.
We explore this problem in depth and propose an algorithm for simultaneously finding the common atoms and their respective correspondences based upon the Iterative Closest Point algorithm. For a benchmark data set, our classification results compare favorably with those of leading established methods. Finally, we discuss current directions in the field of statistics on manifolds, including a computational comparison of intrinsic and extrinsic analysis for various applications and a brief introduction of sample spaces with manifold stratification.
 Date Issued
 2011
 Identifier
 FSU_migr_etd0053
 Format
 Thesis
 Title
 Examining the Effect of Treatment on the Distribution of Blood Pressure in the Population Using Observational Data.
 Creator

Kucukemiroglu, Saryet Alexa, McGee, Daniel, Slate, Elizabeth H., Hurt, Myra M., Huffer, Fred W. (Fred William), Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Since the introduction of antihypertensive medications in the mid-1950s, there has been increasing use of blood pressure medications in the US. The growing use of antihypertensive treatment has affected the distribution of blood pressure in the population over time, so observational data no longer reflect natural blood pressure levels. Our goal is to examine the effect of antihypertensive drugs on distributions of blood pressure using several well-known observational studies. The statistical concept of censoring is used to estimate the distribution of blood pressure in these populations if no treatment were available. The treated and estimated untreated distributions are then compared to determine the overall effect of these medications in the population. Our analyses show that these drugs have an increasing impact on controlling blood pressure distributions in populations that are heavily treated.
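One way to read "the statistical concept of censoring" here (our illustration only; the dissertation's exact estimator may differ) is that a treated person's measured blood pressure is a lower bound on what it would have been without medication, so treated observations can be handled as right-censored and the untreated distribution recovered with a Kaplan-Meier-type product-limit estimator:

```python
import numpy as np

def product_limit_survival(values, uncensored):
    """Kaplan-Meier estimate of P(X > t) with right-censored data.
    values: measured SBP; uncensored[i] is False when subject i is treated,
    so the untreated SBP is only known to exceed values[i].
    Ties are handled sequentially for simplicity in this sketch."""
    values = np.asarray(values, dtype=float)
    uncensored = np.asarray(uncensored, dtype=bool)
    order = np.argsort(values)
    v, d = values[order], uncensored[order]
    n = len(v)
    surv, curve = 1.0, []
    for i in range(n):
        at_risk = n - i
        if d[i]:                      # an exact (untreated) observation
            surv *= (at_risk - 1) / at_risk
        curve.append((v[i], surv))    # censored points change the risk set only
    return curve

# Untreated subjects contribute exact values; treated ones, lower bounds
sbp      = [118, 126, 134, 150, 142]
measured = [True, True, False, True, False]  # False = on medication
for t, s in product_limit_survival(sbp, measured):
    print(t, round(s, 3))
```

The estimated untreated distribution (1 minus this survival curve) can then be compared with the observed, treated distribution, as the abstract describes.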
 Date Issued
 2017
 Identifier
 FSU_FALL2017_Kucukemiroglu_fsu_0071E_14275
 Format
 Thesis
 Title
 Semi-Parametric Generalized Estimating Equations with Kernel Smoother: A Longitudinal Study in Financial Data Analysis.
 Creator

Yang, Liu, Niu, Xufeng, Cheng, Yingmei, Huffer, Fred W. (Fred William), Tao, Minjing, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Longitudinal studies are widely used in various fields, such as public health, clinical trials and financial data analysis. A major challenge in longitudinal studies is repeated measurements on each subject, which cause time-dependent correlation within subjects. Generalized Estimating Equations can handle correlated outcomes in longitudinal data through marginal effects. My model is based on Generalized Estimating Equations with a semi-parametric approach, providing a flexible structure for regression models: coefficients for parametric covariates are estimated, and nuisance covariates are fitted with kernel smoothers in the nonparametric part. The profile kernel estimator and the seemingly unrelated kernel estimator (SUR) are used to deliver consistent and efficient semi-parametric estimators compared to parametric models. We provide simulation results for estimating semi-parametric models with one or multiple nonparametric terms. In the application part, we focus on the financial market: a credit card loan data set with payment information for each customer across 6 months is used to investigate whether gender, income, age or other factors significantly influence payment status. Furthermore, we propose model comparisons to evaluate whether our model should be fitted at different levels of factors, such as male versus female, or with different types of estimation methods, such as parametric or semi-parametric estimation.
 Date Issued
 2017
 Identifier
 FSU_FALL2017_YANG_fsu_0071E_14219
 Format
 Thesis
 Title
 Bayesian Modeling and Variable Selection for Complex Data.
 Creator

Li, Hanning, Pati, Debdeep, Huffer, Fred W. (Fred William), Kercheval, Alec N., Sinha, Debajyoti, Bradley, Jonathan R., Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

As we routinely encounter high-throughput datasets in complex biological and environmental research, developing novel models and methods for variable selection has received widespread attention. In this dissertation, we address a few key challenges in Bayesian modeling and variable selection for high-dimensional data with complex spatial structures. a) Most Bayesian variable selection methods are restricted to mixture priors having separate components for characterizing the signal and the noise. However, such priors encounter computational issues in high dimensions. This has motivated continuous shrinkage priors, which resemble the two-component priors while facilitating computation and interpretability. While such priors are widely used for estimating high-dimensional sparse vectors, selecting a subset of variables remains a daunting task. b) Spatial/spatial-temporal data sets with complex structures are nowadays commonly encountered in scientific research fields ranging from atmospheric science, forestry and environmental science to biological and social science. Selecting important spatial variables that have significant influences on occurrences of events is necessary and essential for providing insights to researchers. Self-excitation, the feature that the occurrence of an event increases the likelihood of more occurrences of the same type of event nearby in time and space, can be found in many natural and social events. Research on modeling data with the self-excitation feature has drawn increasing interest recently. However, the existing literature on self-exciting models with the inclusion of high-dimensional spatial covariates is still underdeveloped. c) The Gaussian process is among the most powerful model frameworks for spatial data. Its major bottleneck is the computational complexity stemming from the inversion of the dense matrices associated with a Gaussian process covariance.
Hierarchical divide-and-conquer Gaussian process models have been investigated for ultra-large data sets. However, the computation associated with scaling the distributed computing algorithm to handle a large number of subgroups poses a serious bottleneck. In Chapter 2 of this dissertation, we propose a general approach for variable selection with shrinkage priors. The presence of very few tuning parameters makes our method attractive in comparison to ad hoc thresholding approaches. The applicability of the approach is not limited to continuous shrinkage priors; it can be used along with any shrinkage prior. Theoretical properties for near-collinear design matrices are investigated, and the method is shown to have good performance in a wide range of synthetic data examples and in a real data example on selecting genes affecting survival due to lymphoma. In Chapter 3, we propose a new self-exciting model that allows the inclusion of spatial covariates. We develop algorithms that are effective in obtaining accurate estimation and variable selection results in a variety of synthetic data examples. Our proposed model is applied to Chicago crime data, where the influence of various spatial features is investigated. In Chapter 4, we focus on a hierarchical Gaussian process regression model for ultra-high-dimensional spatial datasets. By evaluating the latent Gaussian process on a regular grid, we propose an efficient computational algorithm based on circulant embedding. The latent Gaussian process borrows information across multiple subgroups, thereby obtaining more accurate predictions. The hierarchical model and our proposed algorithm are studied through simulation examples.
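The circulant-embedding idea mentioned for Chapter 4 can be sketched on a 1-D regular grid (a simplified illustration of the general technique, not the dissertation's hierarchical model): embed the stationary covariance in a circulant matrix, diagonalize it with the FFT, and draw exact Gaussian process samples in O(m log m) instead of inverting or factoring a dense matrix.

```python
import numpy as np

def sample_gp_circulant(cov_row, rng):
    """Draw one exact sample of a stationary zero-mean GP on a regular
    1-D grid whose covariance matrix has first row cov_row."""
    n = len(cov_row)
    # First row of the circulant embedding of size m = 2(n - 1)
    c = np.concatenate([cov_row, cov_row[-2:0:-1]])
    m = len(c)
    lam = np.fft.fft(c).real          # eigenvalues of the circulant matrix
    lam = np.clip(lam, 0.0, None)     # guard against tiny negative values
    z = rng.standard_normal(m) + 1j * rng.standard_normal(m)
    x = np.fft.fft(np.sqrt(lam) * z) / np.sqrt(m)
    return x.real[:n]                 # x.imag[:n] is an independent sample

rng = np.random.default_rng(1)
grid = np.arange(64)
cov_row = np.exp(-grid / 8.0)         # exponential covariance, variance 1
path = sample_gp_circulant(cov_row, rng)
print(path.shape)
```

For the exponential covariance this embedding is nonnegative definite, so the clipping step is only a numerical safeguard; the same trick extends to regular grids in higher dimensions.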
 Date Issued
 2017
 Identifier
 FSU_FALL2017_Li_fsu_0071E_14159
 Format
 Thesis
 Title
 Spatial Statistics and Its Applications in Biostatistics and Environmental Statistics.
 Creator

Hu, Guanyu, Huffer, Fred W. (Fred William), Paek, Insu, Sinha, Debajyoti, Slate, Elizabeth H., Bradley, Jonathan R., Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

This dissertation presents some topics in spatial statistics and their applications in biostatistics and environmental statistics. The field of spatial statistics is an energetic area of statistics. In Chapters 2 and 3, the goal is to build subregion models under the assumption that the responses or the parameters are spatially correlated. For regression models, considering spatially varying coefficients is a reasonable way to build subregion models. There are two different techniques for exploring spatially varying coefficients. One is geographically weighted regression (Brunsdon et al. 1998). The other is a spatially varying coefficients model, which assumes a stationary Gaussian process for the regression coefficients (Gelfand et al. 2003). Based on the ideas of these two techniques, we introduce techniques for exploring subregion models in survival analysis, an important area of biostatistics. In Chapter 2, we introduce modified versions of the Kaplan-Meier and Nelson-Aalen estimators which incorporate geographical weighting. We use ideas from counting process theory to obtain these modified estimators, to derive variance estimates, and to develop associated hypothesis tests. In Chapter 3, we introduce a Bayesian parametric accelerated failure time model with spatially varying coefficients. These two techniques can explore subregion models in survival analysis using both nonparametric and parametric approaches. In Chapter 4, we introduce Bayesian parametric covariance regression analysis for a response vector. The proposed method defines a regression model between the covariance matrix of a p-dimensional response vector and auxiliary variables. We propose a constrained Metropolis-Hastings algorithm to obtain the estimates. Simulation results are presented to show the performance of both the regression and covariance matrix estimates. Furthermore, we present a more realistic simulation experiment in which our Bayesian approach performs better than the MLE.
Finally, we illustrate the usefulness of our model by applying it to the Google Flu data. In Chapter 5, we give a brief summary of future work.
 Date Issued
 2017
 Identifier
 FSU_FALL2017_Hu_fsu_0071E_14205
 Format
 Thesis
 Title
 The One- and Two-Sample Problem for Data on Hilbert Manifolds with Applications to Shape Analysis.
 Creator

Qiu, Mingfei, Patrangenaru, Victor, Liu, Xiuwen, Slate, Elizabeth H., Barbu, Adrian G. (Adrian Gheorghe), Clickner, Robert Paul, Paige, Robert, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

This dissertation is concerned with high-level imaging analysis. In particular, our focus is on extracting projective shape or similarity shape information from digital camera images or Magnetic Resonance Imaging (MRI). The approach is statistical, without making any assumptions about the distributions of the random objects under investigation. The data are organized as points on a Hilbert manifold. In the case of projective shapes of finite-dimensional configurations of points, we consider testing a one-sample null hypothesis, while in the infinite-dimensional case, we consider neighborhood hypothesis testing methods. For 3D scenes, we retrieve the 3D projective shape and use the Lie group structure of the projective shape space. We test the equality of two extrinsic means by introducing the mean projective shape change. For 2D MRI of midsections of Corpus Callosum contours, we use an automatic matching technique that is necessary in pursuing a one-sample neighborhood hypothesis test for the similarity shapes. We conclude that the mean similarity shape of the Corpus Callosum of average individuals is very far from the shape of Albert Einstein's, which may help explain his genius. Another application of our Hilbert manifold methodology is a two-sample testing problem for Veronese-Whitney means of projective shapes of 3D contours; here our data consist of comparisons of the 3D projective shapes of contours of leaves from the same tree species.
 Date Issued
 2015
 Identifier
 FSU_2015fall_Qiu_fsu_0071E_12922
 Format
 Thesis
 Title
 Examining the Relationship of Dietary Component Intakes to Each Other and to Mortality.
 Creator

Alrajhi, Sharifah, McGee, Daniel, Levenson, Cathy W., Niu, Xufeng, Sinha, Debajyoti, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

In this essay we present analyses examining the basic dietary structure and its relationship to mortality in the first National Health and Nutrition Examination Survey (NHANES I), conducted between 1971 and 1975. We used results from 24-hour recalls on 10,483 individuals in this study. All of the individuals in the analytic sample were followed through 1992 for vital status. The mean follow-up period for the participants was 16 years. During follow-up, 2,042 (48%) males and 1,754 (27%) females died. We first attempted to capture the inherent structure of the dietary data using principal components analysis (PCA). We performed this estimation separately for each race (white and black) and gender (male and female) and compared the estimated principal components among these four strata. We found that the principal components were similar (but not identical) in the four strata. We also related our estimated principal components to mortality using Cox Proportional Hazards (CPH) models, and related dietary components to mortality using forward variable selection.
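The PCA step described in this abstract can be sketched as follows (an illustration on synthetic data, not the NHANES variables; working from the correlation matrix keeps nutrients measured on different scales comparable):

```python
import numpy as np

def principal_components(X):
    """PCA via eigendecomposition of the correlation matrix of X (n x p)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize columns
    R = np.corrcoef(X, rowvar=False)                   # p x p correlation matrix
    eigval, eigvec = np.linalg.eigh(R)
    order = np.argsort(eigval)[::-1]                   # largest variance first
    eigval, eigvec = eigval[order], eigvec[:, order]
    scores = Z @ eigvec                                # component scores per subject
    return eigval, eigvec, scores

# Synthetic "diet" data driven by one latent pattern plus noise
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 1))
X = latent @ rng.normal(size=(1, 6)) + 0.3 * rng.normal(size=(500, 6))
eigval, eigvec, scores = principal_components(X)
print(eigval[0] / eigval.sum())   # share of variance in the first component
```

In the analysis described above, the per-stratum loadings (`eigvec`) would be compared across the four race/gender strata, and the subject-level `scores` would enter the Cox Proportional Hazards models as covariates.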
 Date Issued
 2015
 Identifier
 FSU_2015fall_Alrajhi_fsu_0071E_12802
 Format
 Thesis
 Title
 Median Regression for Complex Survey Data.
 Creator

Fraser, Raphael André, Sinha, Debajyoti, Lipsitz, Stuart, Carlson, Elwood, Slate, Elizabeth H., Huffer, Fred W. (Fred William), Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

The ready availability of public-use data from various large national complex surveys has immense potential for the assessment of population characteristics: means, proportions, totals, et cetera. Using a model-based approach, complex surveys can be used to evaluate the effectiveness of treatments and to identify risk factors for important diseases such as cancer. Existing statistical methods based on estimating equations and/or resampling methods are often not valid with survey data due to design features such as stratification, multistage sampling and unequal selection probabilities. In this paper, we accommodate these design features in the analysis of highly skewed response variables arising from large complex surveys. Specifically, we propose a double-transform-both-sides based estimating equations approach to estimate the median regression parameters of the highly skewed response; the double-transform-both-sides method applies the same transformation twice to both the response and the regression function. The usual sandwich variance estimate can be used in our approach, whereas a resampling approach would be needed for a pseudo-likelihood based on minimizing absolute deviations. Furthermore, the double-transform-both-sides estimator is relatively robust to the true underlying distribution, and has a much smaller mean square error than the least absolute deviations estimator. The method is motivated by an analysis of laboratory data on urinary iodine concentration from the National Health and Nutrition Examination Survey.
 Date Issued
 2015
 Identifier
 FSU_2015fall_Fraser_fsu_0071E_12825
 Format
 Thesis
 Title
 Matched Sample Based Approach for Cross-Platform Normalization on Gene Expression Data.
 Creator

Shao, Jiang, Zhang, Jinfeng, Sang, QingXiang Amy, Wu, Wei, Niu, Xufeng, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Gene-expression data profiles are widely used in many kinds of biomedical studies, especially in cancer research. This dissertation focuses on the problem of how to combine datasets arising from different studies; of particular interest is how to remove the platform effect alone. The matched-sample-based cross-platform normalization method we developed is designed to tackle the data-merging problem in two scenarios. The first is Affy-Agilent cross-platform normalization, where both platforms belong to classic microarray gene expression profiling. The second is the integration of microarray data with Next Generation Sequencing genome data. We use several general validation measures to assess our method and to compare it with the popular distance-weighted discrimination (DWD) method. Using the public web-based tool NCI60 CellMiner and The Cancer Genome Atlas data portal, our proposed method outperformed DWD in both cross-platform scenarios. The method can be further assessed by its ability to preserve biological features in studies of cancer-type discrimination. We applied our method to two classification problems: one is breast cancer tumor/normal status classification on microarray and next generation sequencing datasets; the other is breast cancer chemotherapy response classification on GPL96 and GPL570 microarray datasets. Both problems show that classification power is increased after our matched-sample-based cross-platform normalization.
 Date Issued
 2015
 Identifier
 FSU_2015fall_Shao_fsu_0071E_12833
 Format
 Thesis
 Title
 Individual Patient-Level Data Meta-Analysis: A Comparison of Methods for the Diverse Populations Collaboration Data Set.
 Creator

Dutton, Matthew Thomas, McGee, Daniel, Becker, Betsy, Niu, Xufeng, Zhang, Jinfeng, Department of Statistics, Florida State University
 Abstract/Description

DerSimonian and Laird define meta-analysis as "the statistical analysis of a collection of analytic results for the purpose of integrating their findings." One alternative to classical meta-analytic approaches is known as Individual Patient-Level Data, or IPD, meta-analysis. Rather than depending on summary statistics calculated for individual studies, IPD meta-analysis analyzes the complete data from all included studies. Two potential approaches to incorporating IPD data into the meta-analytic framework are investigated. A two-stage analysis is first conducted, in which individual models are fit for each study and summarized using classical meta-analysis procedures. Secondly, a one-stage approach that singularly models the data and summarizes the information across studies is investigated. Data from the Diverse Populations Collaboration (DPC) data set are used to investigate the differences between these two methods in a specific example. The bootstrap procedure is used to determine whether the two methods produce statistically different results in the DPC example. Finally, a simulation study is conducted to investigate the accuracy of each method in given scenarios.
 Date Issued
 2011
 Identifier
 FSU_migr_etd0620
 Format
 Thesis
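The second stage of the two-stage approach described above (pooling per-study estimates) is classically done with the DerSimonian-Laird random-effects estimator. A minimal sketch, using hypothetical study estimates and variances:

```python
import math

def dersimonian_laird(estimates, variances):
    """Random-effects pooling for a two-stage analysis: estimate the
    between-study variance tau^2 with the DerSimonian-Laird moment
    estimator, then pool with inverse-variance weights."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    k = len(estimates)
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)  # moment estimate, truncated at zero
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, se, tau2

# Hypothetical per-study effect estimates (e.g. log odds ratios) and variances.
est = [0.35, 0.10, 0.62, 0.28]
var = [0.04, 0.09, 0.05, 0.02]
pooled, se, tau2 = dersimonian_laird(est, var)
```

The one-stage alternative the abstract contrasts with this would instead fit a single (e.g. mixed-effects) model to the combined individual-level data.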
 Title
 Analysis of the Wealth Distribution at Equilibrium in a Heterogeneous Agent Economy.
 Creator

Badshah, Muffasir H., Srivastava, Anuj, Beaumont, Paul, Wu, Wei, Kercheval, Alec, Department of Statistics, Florida State University
 Abstract/Description

This paper aims at analyzing a macro economy with a continuum of infinitely-lived households that make rational decisions about consumption and wealth savings in the face of employment and aggregate productivity shocks. The heterogeneous population structure arises when households differ in wealth and employment status, against which they cannot insure. In this framework, the household wealth evolution is modeled as a mixture Markov process. The stationary wealth distributions are obtained using the eigenstructures of transition matrices under the Perron-Frobenius theorem. This step is utilized repeatedly to find the equilibrium state of the system, and it leads to an efficient framework for studying the dynamic general equilibrium. A systematic evaluation of the equilibrium state under different initial conditions is further presented and analyzed.
 Date Issued
 2010
 Identifier
 FSU_migr_etd0844
 Format
 Thesis
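The stationary-distribution step described in the abstract above can be illustrated in a few lines: for an irreducible, aperiodic transition matrix, the Perron-Frobenius theorem guarantees a unique stationary distribution, recoverable by power iteration. The two-state matrix below is a made-up example, not from the dissertation:

```python
def stationary_distribution(P, tol=1e-12, max_iter=10000):
    """Stationary distribution of a row-stochastic matrix P by power
    iteration; unique by Perron-Frobenius for an irreducible,
    aperiodic chain."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi

# Made-up two-state (employed/unemployed) transition matrix.
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = stationary_distribution(P)  # -> [5/6, 1/6]
```

In the dissertation's setting this computation would be repeated for each candidate aggregate state until the implied prices and the wealth distribution are mutually consistent, i.e. until equilibrium.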
 Title
 Minimax Tests for Nonparametric Alternatives with Applications to High Frequency Data.
 Creator

Yu, Han, Song, Kai-Sheng, Professor, Jack Quine, Professor, Fred Huffer, Professor, Dan McGee, Department of Statistics, Florida State University
 Abstract/Description

We present a general methodology for developing asymptotically distribution-free, asymptotically minimax tests. The tests are constructed via a nonparametric density-quantile function and the limiting distribution is derived by a martingale approach. The procedure can be viewed as a novel nonparametric extension of the classical parametric likelihood ratio test. The proposed tests are shown to be omnibus within an extremely large class of nonparametric global alternatives characterized by simple conditions. Furthermore, we establish that the proposed tests provide better minimax distinguishability. The tests have much greater power for detecting high-frequency nonparametric alternatives than existing classical tests such as the Kolmogorov-Smirnov and Cramér-von Mises tests. The good performance of the proposed tests is demonstrated by Monte Carlo simulations and applications in high energy physics.
 Date Issued
 2006
 Identifier
 FSU_migr_etd0796
 Format
 Thesis
 Title
 Testing for the Equality of Two Distributions on High Dimensional Object Spaces and Nonparametric Inference for Location Parameters.
 Creator

Guo, Ruite, Patrangenaru, Victor, Mio, Washington, Barbu, Adrian G. (Adrian Gheorghe), Bradley, Jonathan R., Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Our view is that while some of the basic principles of data analysis are going to remain unchanged, others are to be gradually replaced with geometry and topology methods. Linear methods still make sense for functional data analysis, or in the context of tangent bundles of object spaces. Complex nonstandard data is represented on object spaces. An object space admitting a manifold stratification may be embedded in a Euclidean space. One defines the extrinsic energy distance associated with two probability measures on an arbitrary object space embedded in a numerical space, and one introduces an extrinsic energy statistic to test for homogeneity of distributions of two random objects (r.o.'s) on such an object space. This test is validated via a simulation example on the Kendall space of planar k-ads with a Veronese-Whitney (VW) embedding. One considers an application to medical imaging, to test for the homogeneity of the distributions of Kendall shapes of the midsections of the corpus callosum in a clinically normal population vs. a population of ADHD-diagnosed individuals. Surprisingly, due to the high dimensionality, these distributions are not significantly different, although they are known to have highly significant VW means. New spread and location parameters are to be added to reflect the nontrivial topology of certain object spaces. TDA is going to be adapted to object spaces, and hypothesis testing for distributions is going to be based on extrinsic energy methods. For a random point on an object space embedded in a Euclidean space, the mean vector cannot be represented as a point on that space, except for the case when the embedded space is convex. To address this shortcoming, since the mean vector is the minimizer of the expected square distance, following Fréchet (1948), on an embedded compact object space one may consider both minimizers and maximizers of the expected square distance to a given point on the embedded object space as the mean, respectively the anti-mean, of the random point. Of all distances on an object space, one considers here the chord distance associated with the embedding of the object space, since for such distances one can give a necessary and sufficient condition for the existence of a unique Fréchet mean (respectively Fréchet anti-mean). For such distributions these location parameters are called the extrinsic mean (respectively the extrinsic anti-mean), and the corresponding sample statistics are consistent estimators of their population counterparts. Moreover, around an extrinsic mean (anti-mean) located at a smooth point, one derives the limit distribution of such estimators.
 Date Issued
 2017
 Identifier
 FSU_SUMMER2017_Guo_fsu_0071E_13977
 Format
 Thesis
 Title
 A Bayesian Wavelet Based Analysis of Longitudinally Observed Skewed Heteroscedastic Responses.
 Creator

Baker, Danisha S. (Danisha Sharice), Chicken, Eric, Sinha, Debajyoti, Harper, Kristine, Pati, Debdeep, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Unlike many of the current statistical models focusing on highly skewed longitudinal data, we present a novel model accommodating a skewed error distribution, a partial linear median regression function, a nonparametric wavelet expansion, and serial observations on the same unit. Parameters are estimated via a semiparametric Bayesian procedure using an appropriate Dirichlet process mixture prior for the skewed error distribution. We use a hierarchical mixture model as the prior for the wavelet coefficients. For the "vanishing" coefficients, the model includes a level-dependent prior probability mass at zero; this practice implements wavelet coefficient thresholding as a Bayesian rule. Practical advantages of our method are illustrated through a simulation study and via analysis of a cardiotoxicity study of children of HIV-infected mothers.
 Date Issued
 2017
 Identifier
 FSU_SUMMER2017_Baker_fsu_0071E_14036
 Format
 Thesis
 Title
 Regression Methods for Skewed and Heteroscedastic Response with High-Dimensional Covariates.
 Creator

Wang, Libo, Sinha, Debajyoti, Taylor, Miles G., Pati, Debdeep, She, Yiyuan, Yang, Yun (Professor of Statistics), Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

The rise of studies with high-dimensional potential covariates has invited a renewed interest in dimension reduction methods that promote more parsimonious models, ease of interpretation and computational tractability. However, current variable selection methods restricted to continuous response often assume a Gaussian response for methodological as well as theoretical developments. In this thesis, we consider regression models that induce sparsity, gain prediction power, and accommodate response distributions beyond Gaussian with common variance. The first part of this thesis is a transform-both-sides Bayesian variable selection model (TBS) which allows skewness, heteroscedasticity and extremely heavy-tailed responses. Our method develops a framework which facilitates computationally feasible inference in spite of inducing nonlocal priors on the original regression coefficients. Even though the transformed conditional mean is no longer linear with respect to covariates, we still prove the consistency of our Bayesian TBS estimators. Simulation studies and real data analysis demonstrate the advantages of our methods. Another main part of this thesis addresses the above challenges from a frequentist standpoint. This model incorporates a penalized likelihood to accommodate skewed responses arising from an epsilon-skew-normal (ESN) distribution. With suitable optimization techniques to handle this two-piece penalized likelihood, our method demonstrates substantial gains in sensitivity and specificity even under high-dimensional settings. We conclude this thesis with a novel Bayesian semiparametric modal regression method along with its implementation and simulation studies.
 Date Issued
 2017
 Identifier
 FSU_SUMMER2017_Wang_fsu_0071E_13950
 Format
 Thesis
 Title
 Nonparametric Change Point Detection Methods for Profile Variability.
 Creator

Geneus, Vladimir J. (Vladimir Jacques), Chicken, Eric, Liu, Guosheng (Professor of Earth, Ocean and Atmospheric Science), Sinha, Debajyoti, Zhang, Xin (Professor of Engineering), Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Because of the importance of detecting profile changes in devices such as medical apparatus, measuring the change point in the variability of a sequence of functions is important. In a sequence of functional observations (each of the same length), we wish to determine as quickly as possible when a change in the observations has occurred. Wavelet-based change point methods are proposed that determine when the variability of the noise in a sequence of functional profiles (i.e., the precision profile of medical devices) goes out of control from a known, fixed value or from an estimated in-control value. Various methods have been proposed that focus on changes in the form of the function. One method, the NEWMA, based on the EWMA, focuses on changes in both; however, its drawback is that the form of the in-control function must be known. Other methods, including the χ² charts for Phase I and Phase II, make some assumption about the function. Our interest, however, is in detecting changes in the variance from one function to the next. In particular, we are interested not in differences from one profile to another (variance between), but rather in differences in variance (variance within). The functional portion of the profiles is allowed to come from a large class of functions and may vary from profile to profile. The estimator is evaluated under a variety of conditions, including allowing the wavelet noise subspace to be substantially contaminated by the profile's functional structure, and is compared to two competing noise monitoring methods. Nikoo and Noorossana (2013) propose a nonparametric wavelet regression method that uses two change point techniques to monitor the variance: a nonparametric control chart, via the mean of m median control charts, and a parametric control chart, via the χ² distribution. We propose improvements to their method by incorporating prior data and making use of likelihood ratios. Our methods make use of the orthogonal properties of wavelet projections to accurately and efficiently monitor the level of noise from one profile to the next and to detect changes in noise in a Phase II setting. We show through simulation results that our proposed methods have better power and are more robust against the confounding effect between variance estimation and function estimation. The proposed methods are shown to be very efficient at detecting when the variability has changed through an extensive simulation study. Extensions are considered that explore the usage of windowing and of estimated in-control values for the MAD method, and the effect of the exact distribution under normality rather than the asymptotic distribution. These developments are implemented in the parametric, nonparametric scale, and completely nonparametric settings. The proposed methodologies are tested through simulation, are applicable to various biometric and health related topics, and have the potential to improve computational efficiency and reduce the number of assumptions required.
 Date Issued
 2017
 Identifier
 FSU_SUMMER2017_Geneus_fsu_0071E_13862
 Format
 Thesis
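One way to monitor within-profile noise in the spirit of the abstract above is to estimate each profile's noise level from its finest-level wavelet detail coefficients via the median absolute deviation (MAD). This is only a rough sketch with made-up profiles: a single-level Haar transform stands in for a full wavelet decomposition, and the control-charting logic is omitted.

```python
import math
import random

def haar_details(x):
    """Finest-level Haar detail coefficients of an even-length profile;
    for a smooth profile these are dominated by the noise."""
    return [(x[i] - x[i + 1]) / math.sqrt(2) for i in range(0, len(x) - 1, 2)]

def mad_sigma(x):
    """Robust noise-level estimate: MAD of the finest-level details,
    scaled by 0.6745 for Gaussian noise."""
    d = sorted(abs(c) for c in haar_details(x))
    n = len(d)
    med = 0.5 * (d[(n - 1) // 2] + d[n // 2])
    return med / 0.6745

random.seed(1)
smooth = [math.sin(i / 16.0) for i in range(128)]
in_control = [m + random.gauss(0, 0.1) for m in smooth]     # noise sd 0.1
out_of_control = [m + random.gauss(0, 0.5) for m in smooth]  # noise sd 0.5
```

A Phase II chart would compare mad_sigma of each incoming profile against a control limit derived from in-control data, flagging profiles whose within-profile variance has shifted.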
 Title
 Scalable and Structured High Dimensional Covariance Matrix Estimation.
 Creator

Sabnis, Gautam, Pati, Debdeep, Kercheval, Alec N., Sinha, Debajyoti, Chicken, Eric, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

With rapid advances in data acquisition and storage techniques, modern scientific investigations in epidemiology, genomics, imaging and networks are increasingly producing challenging data structures in the form of high-dimensional vectors, matrices and multiway arrays (tensors), rendering traditional statistical and computational tools inappropriate. One hope for meaningful inferences in such situations is to discover an inherent lower-dimensional structure that explains the physical or biological process generating the data. The structural assumptions impose constraints that force the objects of interest to lie in lower-dimensional spaces, thereby facilitating their estimation and interpretation and, at the same time, reducing computational burden. The assumption of an inherent structure, motivated by various scientific applications, is often adopted as the guiding light in the analysis and is fast becoming a standard tool for parsimonious modeling of such high-dimensional data structures. The content of this thesis is specifically directed towards the methodological development of statistical tools, with attractive computational properties, for drawing meaningful inferences through such structures. The third chapter of this thesis proposes a distributed computing framework, based on a divide and conquer strategy and hierarchical modeling, to accelerate posterior inference for high-dimensional Bayesian factor models. Our approach distributes the task of high-dimensional covariance matrix estimation to multiple cores, solves each subproblem separately via a latent factor model, and then combines these estimates to produce a global estimate of the covariance matrix. Existing divide and conquer methods focus exclusively on dividing the total number of observations n into subsamples while keeping the dimension p fixed. Our approach is novel in this regard: it includes all of the n samples in each subproblem and, instead, splits the dimension p into smaller subsets for each subproblem. The subproblems themselves can be challenging to solve when p is large due to the dependencies across dimensions. To circumvent this issue, a novel hierarchical structure is specified on the latent factors that allows for flexible dependencies across dimensions, while still maintaining computational efficiency. Our approach is readily parallelizable and is shown to yield computational gains of several orders of magnitude in comparison to fitting a full factor model. The fourth chapter of this thesis proposes a novel way of estimating a covariance matrix that can be represented as a sum of a low-rank matrix and a diagonal matrix. The proposed method compresses high-dimensional data, computes the sample covariance in the compressed space, and lifts it back to the ambient space via a decompression operation. A salient feature of our approach relative to the existing literature on combining sparsity and low-rank structures in covariance matrix estimation is that we do not require the low-rank component to be sparse. A principled framework for estimating the compressed dimension using Stein's Unbiased Risk Estimation theory is demonstrated. In the final chapter of this thesis, we tackle the problem of variable selection in high dimensions. Consistent model selection in high dimensions has received substantial interest in recent years and is an extremely challenging problem for Bayesians. The literature on model selection with continuous shrinkage priors is even less developed due to the unavailability of exact zeros in the posterior samples of the parameters of interest. Heuristic methods based on thresholding the posterior mean, which lack theoretical justification, are often used in practice, and inference is highly sensitive to the choice of the threshold. We aim to address the problem of selecting variables through a novel method of post-processing the posterior samples.
 Date Issued
 2017
 Identifier
 FSU_SUMMER2017_Sabnis_fsu_0071E_14043
 Format
 Thesis
 Title
 An Examination of the Relationship between Alcohol and Dementia in a Longitudinal Study.
 Creator

Hu, Tingting, McGee, Daniel, Slate, Elizabeth H., Hurt, Myra M., Niu, Xufeng, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

The high mortality rate and huge expenditure caused by dementia make it a pressing concern for public health researchers. Among the potential risk factors in diet and nutrition, the relation between alcohol usage and dementia has been investigated in many studies, but no clear picture has emerged: this association has been reported as protective, neurotoxic, U-shaped, and insignificant in different sources. An individual's alcohol usage is dynamic and can change over time; however, to our knowledge, only one study took this time-varying nature into account when assessing the association between alcohol intake and cognition. Using Framingham Heart Study (FHS) data, our work fills an important gap in that both alcohol use and dementia status were included in the analysis longitudinally. Furthermore, we incorporated a gender-specific categorization of alcohol consumption. In this study, we examined three aspects of the association: (1) concurrent alcohol usage and dementia, longitudinally; (2) past alcohol usage and later dementia; (3) cumulative alcohol usage and dementia. The data consisted of 2,192 FHS participants who took Exams 17-23 during 1981-1996, which included dementia assessment, and who had complete data on alcohol use (mean follow-up = 40 years) and key covariates. Cognitive status was determined using information from the Mini-Mental State Examinations (MMSE) and the examiner's assessment. Alcohol consumption was measured in oz/week and also categorized as none, moderate and heavy. We investigated both total alcohol consumption and consumption by type of alcoholic beverage. Results showed that the association between alcohol and dementia may differ by gender and by beverage type.
 Date Issued
 2018
 Identifier
 2018_Su_Hu_fsu_0071E_14330
 Format
 Thesis
 Title
 A Study of Some Issues of Goodness-of-Fit Tests for Logistic Regression.
 Creator

Ma, Wei, McGee, Daniel, Mai, Qing, Levenson, Cathy W., Niu, Xufeng, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Goodness-of-fit tests are important to assess how well a model fits a set of observations. The Hosmer-Lemeshow (HL) test is a popular and commonly used method to assess the goodness-of-fit of logistic regression. However, there are two issues with using the HL test. One is that we have to specify the number of partition groups, and different choices of groups often suggest different decisions. So in this study, we propose several grouping tests that combine multiple HL tests with varying numbers of groups to make the decision, instead of just using one arbitrary grouping or searching for an optimum grouping; this is because the best selection of groups is data-dependent and is not easy to find. The other drawback of the HL test is that it is not powerful for detecting missing interactions between continuous and dichotomous covariates. Therefore, we propose global and interaction tests in order to capture such violations. Simulation studies are carried out to assess the Type I errors and powers of all the proposed tests. The tests are illustrated using the bone mineral density data from NHANES III.
 Date Issued
 2018
 Identifier
 2018_Su_Ma_fsu_0071E_14681
 Format
 Thesis
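The Hosmer-Lemeshow statistic discussed above is easy to sketch: sort observations by fitted probability, partition them into groups, and compare observed versus expected event counts. The simulation below uses made-up, perfectly calibrated probabilities, so the statistic should behave like a chi-square draw with groups − 2 degrees of freedom.

```python
import random

def hosmer_lemeshow(y, p, groups=10):
    """Hosmer-Lemeshow statistic: sort by fitted probability, split into
    near-equal groups, and compare observed vs expected event counts.
    Under the null it is approximately chi-square with groups - 2 df."""
    pairs = sorted(zip(p, y))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        n_g = len(chunk)
        if n_g == 0:
            continue
        obs = sum(yi for _, yi in chunk)       # observed events in group
        exp = sum(pi for pi, _ in chunk)       # expected events in group
        pbar = exp / n_g
        denom = n_g * pbar * (1.0 - pbar)
        if denom > 0:
            stat += (obs - exp) ** 2 / denom
    return stat, groups - 2

random.seed(0)
p = [random.uniform(0.05, 0.95) for _ in range(2000)]
y = [1 if random.random() < pi else 0 for pi in p]
stat, df = hosmer_lemeshow(y, p)
```

Re-running this with several values of `groups` illustrates the instability the abstract targets: different partitions of the same data can yield different accept/reject decisions near the threshold.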
 Title
 AP Student Visual Preferences for Problem Solving.
 Creator

Swoyer, Liesl, Department of Statistics
 Abstract/Description

The purpose of this study is to explore the mathematical preferences of high school AP Calculus students by examining their tendencies toward differing methods of thought. A student's preferred mode of thinking was measured on a scale ranging from a preference for analytical thought to a preference for visual thought as they completed derivative and antiderivative tasks presented both algebraically and graphically. This relates to previous studies by continuing to analyze the factors that have been found to mediate students' performance and preference with regard to a variety of calculus tasks. Data were collected by Dr. Erhan Haciomeroglu at the University of Central Florida. Students' preferences were not affected by gender. Students were found to approach graphical and algebraic tasks similarly, without any significant change with regard to the derivative or antiderivative nature of the tasks. Highly analytic and highly visual students revealed the same proportion of change in visuality as harmonic students when more difficult calculus tasks were encountered. Thus, a strong preference for visual thinking when completing algebraic tasks was not the determining factor in students' preferred method of thinking when approaching graphical tasks.
 Date Issued
 2012
 Identifier
 FSU_migr_uhm0052
 Format
 Thesis
 Title
 Elastic Functional Principal Component Analysis for Modeling and Testing of Functional Data.
 Creator

Duncan, Megan, Srivastava, Anuj, Klassen, E., Huffer, Fred W., Wu, Wei, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Statistical analysis of functional data requires tools for comparing, summarizing and modeling observed functions as elements of a function space. A key issue in Functional Data Analysis (FDA) is the presence of the phase variability in the observed data. A successful statistical model of functional data has to account for the presence of phase variability. Otherwise the ensuing inferences can be inferior. Recent methods for FDA include steps for phase separation or functional alignment. For...
Statistical analysis of functional data requires tools for comparing, summarizing and modeling observed functions as elements of a function space. A key issue in Functional Data Analysis (FDA) is the presence of phase variability in the observed data. A successful statistical model of functional data has to account for phase variability; otherwise the ensuing inferences can be inferior. Recent methods for FDA include steps for phase separation or functional alignment. For example, Elastic Functional Principal Component Analysis (Elastic FPCA) uses the strengths of Functional Principal Component Analysis (FPCA), along with tools from Elastic FDA, to perform joint phase-amplitude separation and modeling. A related problem in FDA is to quantify and test for the amount of phase in given data. We develop two types of hypothesis tests for the significance of phase variability: a metric-based approach and a model-based approach. The metric-based approach treats phase and amplitude as independent components and uses their respective metrics to apply the Friedman-Rafsky test, Schilling's nearest-neighbors test, and the energy test to the differences between functions and their amplitudes. In the model-based test, we use Concordance Correlation Coefficients as a tool to quantify the agreement between functions and their reconstructions under FPCA and Elastic FPCA. We demonstrate this framework on a number of simulated and real data sets, including weather, Tecator, and growth data.
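The model-based test above relies on the Concordance Correlation Coefficient. As a rough illustration of the agreement measure involved, here is Lin's standard CCC for paired scalar samples; applying it to functions and their FPCA reconstructions, as the abstract describes, would use the sampled function values. This is a generic sketch, not the dissertation's implementation:

```python
def ccc(x, y):
    """Lin's concordance correlation coefficient between two paired samples.

    Penalizes both lack of correlation and location/scale shifts, so the
    result is 1 only when y reproduces x exactly.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = sum((a - mx) ** 2 for a in x) / n           # biased variances, as in Lin (1989)
    sy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * sxy / (sx + sy + (mx - my) ** 2)
```

A perfect reconstruction gives a CCC of 1; a constant shift between a function and its reconstruction lowers the CCC even when the Pearson correlation stays at 1.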
 Date Issued
 2018
 Identifier
 2018_Sp_Duncan_fsu_0071E_14470
 Format
 Thesis
 Title
 Elastic Functional Regression Model.
 Creator

Ahn, Kyungmin, Srivastava, Anuj, Klassen, E., Wu, Wei, Huffer, Fred W., Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Functional variables serve important roles as predictors in a variety of pattern recognition and vision applications. Focusing on a specific subproblem, termed scalar-on-function regression, most current approaches adopt the standard L2 inner product to form a link between functional predictors and scalar responses. These methods may perform poorly when predictor functions contain nuisance phase variability, i.e., when predictors are temporally misaligned due to noise. While a simple solution could be to pre-align predictors as a preprocessing step before applying a regression model, this alignment is seldom optimal from the perspective of regression. In this dissertation, we propose a new approach, termed elastic functional regression, where alignment is included in the regression model itself and is performed in conjunction with the estimation of the other model parameters. This model is based on a norm-preserving warping of predictors, not the standard time warping of functions, and provides better prediction in situations where the shape or the amplitude of the predictor is more useful than its phase. We demonstrate the effectiveness of this framework using simulated and real data.
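The standard L2 link mentioned above predicts a scalar from a functional covariate through an inner product with a coefficient function. A minimal sketch of that non-elastic baseline on a uniform time grid; the function values and grid step `dt` are hypothetical inputs, and the elastic warping itself is not reproduced here:

```python
def l2_inner(f, g, dt):
    # Riemann-sum approximation of the L2 inner product <f, g> on a uniform grid
    return sum(a * b for a, b in zip(f, g)) * dt

def predict(alpha, beta, x, dt):
    # standard scalar-on-function link: y_hat = alpha + <beta, x>_L2;
    # temporal misalignment of x shifts this value, which is what motivates
    # building alignment into the regression model itself
    return alpha + l2_inner(beta, x, dt)
```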
 Date Issued
 2018
 Identifier
 2018_Sp_Ahn_fsu_0071E_14452
 Format
 Thesis
 Title
 Building a Model Performance Measure for Examining Clinical Relevance Using Net Benefit Curves.
 Creator

Mukherjee, Anwesha, McGee, Daniel, Hurt, Myra M., Slate, Elizabeth H., Sinha, Debajyoti, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

ROC curves are often used to evaluate the predictive accuracy of statistical prediction models. This thesis studies other measures, which incorporate not only the statistical but also the clinical consequences of using a particular prediction model. Depending on the disease and population under study, the misclassification costs of false positives and false negatives vary. Decision Curve Analysis (DCA) takes this cost into account by using the threshold probability (the probability above which a patient opts for treatment). Using the DCA technique, a net benefit curve is built by plotting "Net Benefit", a function of the expected benefit and expected harm of using a model, against the threshold probability. Only the threshold probability range that is relevant to the disease and the population under study is used to plot the net benefit curve. This thesis concentrates on constructing a summary measure to determine which predictive model yields the highest net benefit. The most intuitive approach is to calculate the area under the net benefit curve. We examined whether using weights, such as the estimated empirical distribution of the threshold probability, to compute a weighted area under the curve creates a better summary measure. Real data from multiple cardiovascular research studies, the Diverse Population Collaboration (DPC) datasets, are used to compute the summary measures: area under the ROC curve (AUROC), area under the net benefit curve (ANBC) and weighted area under the net benefit curve (WANBC). The results of the analysis are used to compare these measures, to examine whether they are in agreement with each other, and to determine which would be best in specified clinical scenarios. For different models the summary measures and their standard errors (SE) were calculated to study the variability in the measures. Meta-analysis is used to summarize these estimated summary measures and to reveal whether there is significant variability among the studies.
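For concreteness, the net benefit at a threshold probability pt is conventionally TP/N − FP/N · pt/(1 − pt), and an ANBC-style summary integrates it over the clinically relevant threshold range. A hedged sketch of both; the empirical weighting used for WANBC in the thesis is not reproduced here:

```python
def net_benefit(probs, labels, pt):
    # Decision-curve net benefit at threshold probability pt:
    # NB(pt) = TP/N - FP/N * pt / (1 - pt)
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= pt and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= pt and y == 0)
    return tp / n - fp / n * pt / (1 - pt)

def area_under_nb(probs, labels, pt_grid):
    # trapezoidal area under the net benefit curve over a clinically
    # relevant grid of threshold probabilities (an ANBC-style summary)
    nb = [net_benefit(probs, labels, pt) for pt in pt_grid]
    return sum((nb[i] + nb[i + 1]) / 2 * (pt_grid[i + 1] - pt_grid[i])
               for i in range(len(pt_grid) - 1))
```

Treating no one always yields a net benefit of zero, so a useful model must stay above zero over the relevant threshold range.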
 Date Issued
 2018
 Identifier
 2018_Sp_Mukherjee_fsu_0071E_14350
 Format
 Thesis
 Title
 Non-Parametric and Semi-Parametric Estimation and Inference with Applications to Finance and Bioinformatics.
 Creator

Tran, Hoang Trong, She, Yiyuan, Ökten, Giray, Chicken, Eric, Niu, Xufeng, Tao, Minjing, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

In this dissertation, we develop tools from nonparametric and semiparametric statistics to perform estimation and inference. In the first chapter, we propose a new method called Non-Parametric Outlier Identification and Smoothing (NOIS), which robustly smooths stock prices, automatically detects outliers and constructs pointwise confidence bands around the resulting curves. In real-world examples of high-frequency data, NOIS successfully detects erroneous prices as outliers and uncovers borderline cases for further study. NOIS can also highlight notable features and reveal new insights in inter-day chart patterns. In the second chapter, we focus on a method for nonparametric inference called empirical likelihood (EL). Computation of EL in the case of a fixed parameter vector is a convex optimization problem easily solved by Lagrange multipliers. In the case of a composite empirical likelihood (CEL) test, where certain components of the parameter vector are free to vary, the optimization problem becomes non-convex and much more difficult. We propose a new algorithm for the CEL problem named the BILinear Algorithm for Composite EmPirical Likelihood (BICEP). We extend the BICEP framework by introducing a new method called Robust Empirical Likelihood (REL) that detects outliers and greatly improves the inference in comparison to the non-robust EL. The REL method is combined with CEL in the TRILinear Algorithm for Composite EmPirical Likelihood (TRICEP). We demonstrate the efficacy of the proposed methods on simulated and real-world datasets. In the final chapter, we present a novel semiparametric method for variable selection with interesting biological applications. In bioinformatics datasets the experimental units often have structured relationships that are nonlinear and hierarchical. For example, in microbiome data the individual taxonomic units are connected to each other through a phylogenetic tree. Conventional techniques for selecting relevant taxa either do not account for the pairwise dependencies between taxa, or assume linear relationships. In this work we propose a new framework for variable selection called Semi-Parametric Affinity Based Selection (SPAS), which has the flexibility to utilize structured and nonparametric relationships between variables. In synthetic data experiments SPAS outperforms existing methods, and on real-world microbiome datasets it selects taxa according to their phylogenetic similarities.
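For the fixed-parameter EL case described above, the inner optimization is convex and, for a scalar mean, the Lagrange condition reduces to one-dimensional root finding. A minimal sketch under that scalar-mean assumption; the dissertation's BICEP/TRICEP algorithms for the composite case are not reproduced:

```python
import math

def el_log_ratio(x, mu, tol=1e-12):
    # -2 * log of the empirical likelihood ratio R(mu) for a scalar mean.
    # The profile weights are w_i = 1 / (n * (1 + lam * (x_i - mu))), with the
    # Lagrange multiplier lam solving sum_i (x_i - mu) / (1 + lam*(x_i - mu)) = 0.
    # Requires mu strictly inside the range of the data.
    z = [xi - mu for xi in x]
    lo = -1.0 / max(z) + tol        # keep every 1 + lam*z_i positive
    hi = -1.0 / min(z) - tol
    def g(lam):
        return sum(zi / (1.0 + lam * zi) for zi in z)
    for _ in range(200):            # bisection works: g is strictly decreasing
        mid = (lo + hi) / 2.0
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2.0
    return 2.0 * sum(math.log(1.0 + lam * zi) for zi in z)
```

At the sample mean the statistic is zero; it grows as mu moves toward the edge of the data, mirroring a parametric log-likelihood ratio.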
 Date Issued
 2018
 Identifier
 2018_Sp_Tran_fsu_0071E_14477
 Format
 Thesis
 Title
 Bayesian Analysis of Survival Data with Missing Censoring Indicators and Simulation of Interval Censored Data.
 Creator

Bunn, Veronica, Sinha, Debajyoti, Brownstein, Naomi Chana, Slate, Elizabeth H., Linero, Antonio Ricardo, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

In some large clinical studies, it may be impractical to give physical examinations to every subject at his/her last monitoring time in order to diagnose the occurrence of an event of interest. This challenge creates survival data with missing censoring indicators, where the probability of missing may depend on the time of last monitoring. We present a fully Bayesian semiparametric method for such survival data to estimate regression parameters of Cox's proportional hazards model [Cox, 1972]. Simulation studies show that our method performs better than competing methods. We apply the proposed method to data from the Orofacial Pain: Prospective Evaluation and Risk Assessment (OPPERA) study. Clinical studies often include interval-censored data. We present a method for simulating interval-censored data based on Poisson processes. We show that our method gives simulated data that satisfy the assumption of independent interval censoring, and that it is more computationally efficient than other methods used for simulating interval-censored data.
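The Poisson-process idea can be illustrated directly: give each subject monitoring times from a homogeneous Poisson process and record only the visit interval that brackets the latent event time. A sketch under assumed exponential event times; the dissertation's construction may differ in details such as the monitoring intensity:

```python
import random

def simulate_interval_censored(n, event_rate, monitor_rate, horizon, seed=0):
    # Each subject gets a latent exponential event time and Poisson-process
    # visit times on [0, horizon]; only the bracketing interval (L, R] is kept.
    # R = inf encodes right censoring at the last visit before the horizon.
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        t = rng.expovariate(event_rate)            # latent event time
        visits, s = [], 0.0
        while True:                                # homogeneous Poisson process
            s += rng.expovariate(monitor_rate)
            if s > horizon:
                break
            visits.append(s)
        left, right = 0.0, float("inf")
        for v in visits:
            if v < t:
                left = v                           # last visit before the event
            else:
                right = v                          # first visit after the event
                break
        data.append((left, right))
    return data
```

Because the visit process is generated independently of the event time, the resulting censoring intervals satisfy independent interval censoring by construction.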
 Date Issued
 2018
 Identifier
 2018_Su_Bunn_fsu_0071E_14742
 Format
 Thesis
 Title
 Generalized Mahalanobis Depth in Point Process and Its Application in Neural Coding and Semi-Supervised Learning in Bioinformatics.
 Creator

Liu, Shuyi, Wu, Wei, Wang, Xiaoqiang, Zhang, Jinfeng, Mai, Qing, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

In the first project, we propose to generalize the notion of depth to temporal point process observations. The new depth is defined as a weighted product of two probability terms: 1) the number of events in each process, and 2) the center-outward ranking on the event times conditioned on the number of events. In this study, we adopt the Poisson distribution for the first term and the Mahalanobis depth for the second term. We propose an efficient bootstrapping approach to estimate the parameters in the defined depth. In the case of a Poisson process, the observed events are order statistics, and the parameters can be estimated robustly with respect to sample size. We demonstrate the use of the new depth by ranking realizations from a Poisson process. We also test the new method in classification problems using simulations as well as real neural spike train data. It is found that the new framework provides more accurate and robust classifications than commonly used likelihood methods. In the second project, we demonstrate the value of semi-supervised dimension reduction in a clinical setting. The advantage is easy to understand: semi-supervised dimension reduction exploits unlabeled data when performing dimension reduction, and it can help build a more precise prediction model than common supervised dimension reduction techniques. After a thorough comparison with embedding methods that use labeled data only, we show the improvement from adding unlabeled data in a breast cancer chemotherapy application. In our semi-supervised approach, we explore not only adding unlabeled data to linear dimension reduction such as PCA, but also semi-supervised nonlinear dimension reduction, such as semi-supervised LLE and semi-supervised Isomap.
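The two-term construction described above can be sketched as a Poisson probability for the event count multiplied by a Mahalanobis-type center-outward depth for the event times. The conditional means, variances, and geometric weighting below are illustrative assumptions, not the dissertation's exact estimator:

```python
import math

def poisson_pmf(k, lam):
    # Poisson probability of observing k events with mean lam
    return math.exp(-lam) * lam ** k / math.factorial(k)

def generalized_depth(times, lam, cond_means, cond_vars, w=0.5):
    # Weighted product of (1) a Poisson term for the event count and
    # (2) a Mahalanobis-type depth for the event times given that count
    # (diagonal covariance for simplicity; cond_means/cond_vars map a
    # count n to per-event means and variances).
    n = len(times)
    count_term = poisson_pmf(n, lam)
    d2 = sum((t - m) ** 2 / v
             for t, m, v in zip(times, cond_means[n], cond_vars[n]))
    time_term = 1.0 / (1.0 + d2)      # center-outward: largest at the mean
    return count_term ** w * time_term ** (1 - w)
```

Realizations near the conditional mean event times rank deepest; unusual counts or unusually placed events shrink the depth, which is what drives the ranking and classification uses described above.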
 Date Issued
 2018
 Identifier
 2018_Sp_Liu_fsu_0071E_14367
 Format
 Thesis
 Title
 Volatility Matrix Estimation for High-Frequency Financial Data.
 Creator

Xue, Yang, Tao, Minjing, Cheng, Yingmei, Fendler, Rachel Loveitt, Huffer, Fred W., Niu, Xufeng, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Volatility is usually employed to measure the dispersion of asset returns, and it is widely used in risk analysis and asset management. The first chapter studies a kernel-based spot volatility matrix estimator with a pre-averaging approach for high-frequency data contaminated by market microstructure noise. When the sample size goes to infinity and the bandwidth vanishes, we show that our estimator is consistent, and its asymptotic normality is established with an optimal convergence rate. We also construct a consistent pairwise spot co-volatility estimator with the Hayashi-Yoshida method for non-synchronous high-frequency data with noise contamination. Simulation studies demonstrate that the proposed estimators work well under different noise levels, and that their estimation performance improves with increasing sampling frequency. In empirical applications, we implement the estimators on the intraday prices of four component stocks of the Dow Jones Industrial Average. The second chapter presents a factor-based vast volatility matrix estimation method for high-frequency financial data with market microstructure noise, finite-activity large jumps and infinite-activity small jumps. We construct the sample volatility matrix estimator based on the approximate factor model, and use the pre-averaging and thresholding estimation method (PATH) to handle the noise and jumps. After using principal component analysis (PCA) to decompose the sample volatility matrix estimator, our proposed volatility matrix estimator is obtained by imposing block-diagonal regularization on the residual covariance matrix, sorting the assets by their Global Industry Classification Standard (GICS) codes. Monte Carlo simulations show that our proposed volatility matrix estimator removes most of the effects of noise and jumps, and that its estimation performance improves quickly as the sampling frequency increases. Finally, the PCA-based estimators are employed to perform volatility matrix estimation and asset allocation for S&P 500 stocks. For comparison with the PCA-based estimators, we also include exchange-traded fund (ETF) data to construct observable factors, such as the Fama-French factors, for volatility estimation.
 Date Issued
 2018
 Identifier
 2018_Sp_Xue_fsu_0071E_14471
 Format
 Thesis
 Title
 Wavelet-Based Bayesian Approaches to Sequential Profile Monitoring.
 Creator

Varbanov, Roumen, Chicken, Eric, Linero, Antonio Ricardo, Huffenberger, Kevin M., Yang, Yanyun, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

We consider change-point detection and estimation in sequences of functional observations. This setting often arises when the quality of a process is characterized by such observations, termed profiles, and monitoring profiles for changes in structure can be used to ensure the stability of the process over time. While interest in profile monitoring has grown, few methods approach the problem from a Bayesian perspective. In this dissertation, we propose three wavelet-based Bayesian approaches to profile monitoring, the last of which can be extended to a general process monitoring setting. First, we develop a general framework for the problem of interest in which we base inference on the posterior distribution of the change point without placing restrictive assumptions on the form of the profiles. The proposed method uses an analytic form of the posterior distribution in order to run online without relying on Markov chain Monte Carlo (MCMC) simulation. Wavelets, an effective tool for estimating nonlinear signals from noise-contaminated observations, enable the method to flexibly distinguish between sustained changes in profiles and the inherent variability of the process. Second, we modify the initial framework in a posterior approximation algorithm designed to utilize past information in a computationally efficient manner. We show that the approximation can detect changes of smaller magnitude better than traditional alternatives for curbing computational cost. Third, we introduce a monitoring scheme that allows an unchanged process to run infinitely long without a false alarm, while maintaining the ability to detect a change with probability one. We include theoretical results regarding these properties and illustrate the implementation of the scheme in the previously established framework. We demonstrate the efficacy of the proposed methods on simulated data, where they significantly outperform a relevant frequentist competitor.
 Date Issued
 2018
 Identifier
 2018_Sp_Varbanov_fsu_0071E_14513
 Format
 Thesis
 Title
 Tests and Classifications in Adaptive Designs with Applications.
 Creator

Chen, Qiusheng, Niu, Xufeng, McGee, Daniel, Slate, Elizabeth H., Zhang, Jinfeng, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Statistical tests for biomarker identification and classification methods for patient grouping are two important topics in adaptive designs of clinical trials. In this work, we evaluate four test methods for biomarker identification: a model-based identification method, the popular t-test, the nonparametric Wilcoxon rank-sum test, and the Least Absolute Shrinkage and Selection Operator (Lasso) method. For selecting the best classification methods in Stage 2 of an adaptive design, we examine classification methods including recently developed machine learning approaches such as Random Forest, Lasso and Elastic-Net Regularized Generalized Linear Models (Glmnet), Support Vector Machine (SVM), Gradient Boosting Machine (GBM), and Extreme Gradient Boosting (XGBoost). Statistical simulations are carried out to assess the performance of the biomarker identification methods and the classification methods. The best identification method and classification technique are selected based on the True Positive Rate (TPR, also called sensitivity) and the True Negative Rate (TNR, also called specificity). The optimal test method for gene identification and classification method for patient grouping are then applied to the Adaptive Signature Design (ASD) to evaluate the performance of ASD in different situations, including simulated data and a real data set for breast cancer patients.
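The two selection criteria named above are the standard confusion-matrix rates. As a quick reference sketch:

```python
def sensitivity_specificity(pred, truth):
    # TPR (sensitivity) = TP / (TP + FN); TNR (specificity) = TN / (TN + FP),
    # from binary predictions and ground-truth labels (1 = positive class).
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    return tp / (tp + fn), tn / (tn + fp)
```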
 Date Issued
 2018
 Identifier
 2018_Sp_Chen_fsu_0071E_14309
 Format
 Thesis
 Title
 Statistical Shape Analysis of Neuronal Tree Structures.
 Creator

Duncan, Adam, Srivastava, Anuj, Klassen, E., Wu, Wei, Huffer, Fred W., Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Neuron morphology plays a central role in characterizing cognitive health and the functionality of brain structures. The problem of quantifying neuron shapes, and of capturing statistical variability of shapes, is difficult because axons and dendrites have tree structures that differ in both geometry and topology. In this work, we restrict attention to trees that consist of: (1) a main branch viewed as a parameterized curve in ℝ³, and (2) some number of secondary branches, also parameterized curves in ℝ³, which emanate from the main branch at arbitrary points. We present two shape-analytic frameworks which each give a metric structure to the set of such tree shapes. Both frameworks are based on an elastic metric on the space of curves, with certain shape-preserving nuisance variables modded out. In the first framework, the side branches are treated as a continuum of curve-valued annotations to the main branch. In the second framework, the side branches are treated as discrete entities and are matched to each other by permutation. We show geodesic deformations between tree shapes in both frameworks, and we show Fréchet means and modes of variability, as well as cross-validated classification between different experimental groups using the second framework. We conclude with a smaller project which extends some of these ideas to more general weighted attributed graphs.
 Date Issued
 2018
 Identifier
 2018_Sp_Duncan_fsu_0071E_14500
 Format
 Thesis
 Title
 Two Studies on the Application of Machine Learning for Biomedical Big Data.
 Creator

Lung, PeiYau, Zhang, Jinfeng, Liu, Xiuwen, Barbu, Adrian G., Wu, Wei, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Large volumes of genomic data and new scientific discoveries in biomedical research are being produced every day by laboratories in both academia and industry. However, two issues severely affect the usability of so-called biomedical big data: 1) the majority of public genomic data do not contain enough clinical information, and 2) scientific discoveries are stored in text as unstructured data. This dissertation presents two studies, each addressing one of these issues using machine learning methods, in order to maximize the usability of biomedical big data. In the first study, we infer missing clinical information using multiple gene expression data sets and a wide variety of machine learning methods. We propose a new performance measure, the Proportion of Positives which can be predicted with High accuracy (PPH), to evaluate models in terms of their effectiveness in recovering data with missing clinical information. PPH estimates the percentage of data that can be recovered given a desired level of accuracy. The experimental results demonstrate the effectiveness of the predicted clinical information in downstream inference tasks. In the second study, we propose a three-stage computational method to automatically extract chemical-protein interactions (CPIs) from a given text. Our method extracts CPI-pairs and CPI-triplets from sentences, where a CPI-pair consists of a chemical compound and a protein name, and a CPI-triplet consists of a CPI-pair along with an interaction word describing their relationship. We extract a diverse set of features from sentences, which are used to build multiple machine learning models. Our models contain both simple features, which can be computed directly from sentences, and more sophisticated features derived using sentence-structure analysis techniques. Our method performed the best among systems that use non-deep-learning methods, and it outperformed several deep-learning-based systems in track 5 of the BioCreative VI challenge. The features we designed in this study are informative and can be applied to other machine learning methods, including deep learning.
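One plausible reading of the PPH idea, sketched below: rank samples by model confidence and report the largest fraction that can be covered while the accuracy on the covered samples stays at or above the desired level. The confidence score and tie handling here are illustrative assumptions, not the dissertation's exact definition:

```python
def pph(probs, labels, target_accuracy):
    # Proportion of Positives predictable with High accuracy: the largest
    # coverage fraction achievable by a confidence cutoff without the
    # accuracy on the covered samples dropping below the target.
    scored = sorted(((max(p, 1.0 - p), 1 if p >= 0.5 else 0, y)
                     for p, y in zip(probs, labels)), reverse=True)
    best, correct = 0.0, 0
    for i, (_, pred, y) in enumerate(scored, start=1):   # most confident first
        correct += (pred == y)
        if correct / i >= target_accuracy:
            best = i / len(scored)
    return best
```

Read this way, a PPH of 0.5 at 90% target accuracy means half the unlabeled records could be recovered while staying at or above 90% accuracy on those records.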
 Date Issued
 2019
 Identifier
 2019_Summer_Lung_fsu_0071E_15134
 Format
 Thesis
 Title
 Survival Analysis Using Bayesian Joint Models.
 Creator

Xu, Zhixing, Sinha, Debajyoti, Schatschneider, Christopher, Bradley, Jonathan R., Chicken, Eric, Lin, Lifeng, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

In many clinical studies, each patient is at risk of recurrent events as well as a terminating event. In Chapter 2, we present a novel latent-class based semiparametric joint model that offers a clinically meaningful and estimable association between the recurrence profile and the risk of termination. Unlike previous shared-frailty based joint models, this model has a coherent interpretation of the covariate effects on all relevant functions and model quantities, whether conditional or unconditional on the events history. We offer a fully Bayesian method for estimation and prediction using a complete specification of the prior process of the baseline functions. When there is a lack of prior information about the baseline functions, we derive a practical and theoretically justifiable partial-likelihood based semiparametric Bayesian approach. Our Markov chain Monte Carlo tools for both Bayesian methods are implementable via publicly available software. Practical advantages of our methods are illustrated via a simulation study and the analysis of a transplant study with recurrent Non-Fatal Graft Rejections (NFGR) and the termination event of death due to total graft rejection. In Chapter 3, we are motivated by the important problem of estimating daily fine particulate matter (PM2.5) over the US. Tracking and estimating PM2.5 is important because PM2.5 has been shown to be directly related to mortality from lung and cardiovascular disease and stroke. That is, high values of PM2.5 constitute a public health problem in the US, and it is important that we estimate PM2.5 precisely to aid public policy decisions. Thus, we propose a Bayesian hierarchical model for high-dimensional "multi-type" responses, that is, collections of correlated responses with different distributional assumptions (e.g., continuous skewed observations and count-valued observations). The Centers for Disease Control and Prevention (CDC) database provides counts of mortalities related to PM2.5 and daily averaged PM2.5, which are treated as responses in our analysis. Our model capitalizes on the shared conjugate structure between the Weibull (to model PM2.5), Poisson (to model disease mortalities), and multivariate log-gamma distributions, and uses dimension reduction to aid computation. Our model can also be used to improve the precision of estimates and to produce estimates for undisclosed/missing counties. We provide a simulation study to illustrate the performance of the model and give an in-depth analysis of the CDC dataset.
 Date Issued
 2019
 Identifier
 2019_Spring_Xu_fsu_0071E_15078
 Format
 Thesis
 Title
 Fused Lasso and Tensor Covariance Learning with Robust Estimation.
 Creator

Kunz, Matthew Ross, She, Yiyuan, Stiegman, Albert E., Mai, Qing, Chicken, Eric, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

With the increase in computation and data storage, vast amounts of information have been gathered with scientific measurement devices. However, with this increase in data and the variety of domain applications, statistical methodology must be tailored to specific problems. This dissertation focuses on analyzing chemical information with an underlying structure. Robust fused lasso leverages information about the neighboring regression coefficient structure to create blocks of coefficients. Robust modifications are made to the mean to account for gross outliers in the data. This method is applied to near-infrared spectral measurements for prediction of an aqueous analyte concentration and is shown to improve prediction accuracy. The robust estimation and structure analysis are extended by examining graph structures within a clustered tensor. The tensor is subjected to wavelet smoothing and robust sparse precision-matrix estimation for a detailed look into the covariance structure. This methodology is applied to catalytic kinetics data, where the graph structure estimates the elementary steps within the reaction mechanism.
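As a rough illustration of the kind of criterion the abstract describes (not the dissertation's implementation), a robust fused-lasso objective combines an outlier-resistant loss on the residuals with a sparsity penalty and a fusion penalty on neighboring coefficients; the Huber loss and all numerical values below are illustrative assumptions:

```python
def huber(r, delta=1.345):
    """Huber loss: quadratic near zero, linear in the tails (robust to gross outliers)."""
    a = abs(r)
    return 0.5 * r * r if a <= delta else delta * (a - 0.5 * delta)

def robust_fused_lasso_objective(beta, X, y, lam1, lam2, delta=1.345):
    """Illustrative objective: Huber data fit + lasso + fusion penalties.

    The fusion term lam2 * sum |beta_j - beta_{j-1}| encourages neighboring
    coefficients (e.g., adjacent spectral wavelengths) to form blocks.
    """
    fit = sum(huber(y[i] - sum(X[i][j] * beta[j] for j in range(len(beta))), delta)
              for i in range(len(y)))
    lasso = lam1 * sum(abs(b) for b in beta)
    fusion = lam2 * sum(abs(beta[j] - beta[j - 1]) for j in range(1, len(beta)))
    return fit + lasso + fusion

# A blocky coefficient vector incurs a smaller total penalty than a ragged one.
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
y = [1.0, 1.0, 1.0]
smooth = robust_fused_lasso_objective([1.0, 1.0, 1.0], X, y, lam1=0.1, lam2=0.1)
ragged = robust_fused_lasso_objective([2.0, 0.0, 1.0], X, y, lam1=0.1, lam2=0.1)
```

Minimizing such an objective yields piecewise-constant coefficient estimates, which is what "blocks of coefficients" refers to.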
 Date Issued
 2018
 Identifier
 2018_Fall_Kunz_fsu_0071E_14844
 Format
 Thesis
 Title
 Marked Determinantal Point Processes.
 Creator

Feng, Yiming, Nolder, Craig, Niu, Xufeng, Bradley, Jonathan R., Huffer, Fred W. (Fred William), Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Determinantal point processes (DPPs), which can be defined by their correlation kernels with known moments, are useful models for point patterns in which nearby points exhibit repulsion. They have many nice properties, such as closed-form densities, tractable estimation of parameterized families, and no edge effects. Univariate DPPs have been well studied, in both discrete and continuous settings, although their statistical applications are fairly recent and still rather limited; multivariate DPPs, the so-called multi-type marked DPPs, have been little explored. In this thesis, we propose a class of multivariate DPPs based on a block kernel construction. For the marked DPP, we show that the conditions for the existence of a DPP can easily be satisfied. The block construction allows us to model the individually marked DPPs as well as to control the scale of repulsion between points having different marks. Unlike other researchers who model the kernel function of a DPP, we model its spectral representation, which not only guarantees the existence of the multivariate DPP but also makes simulation-based estimation methods readily available. In our research, we adopt a bivariate complex Fourier basis, which exhibits nice properties such as constant intensity and approximate isotropy within a short distance between nearby points. The parameterized block kernels can approximate commonly used covariance functions via Fourier expansion. The parameters can be estimated using maximum likelihood estimation, a Bayesian approach, or minimum contrast estimation.
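The repulsion property of a DPP can be seen in miniature: the probability weight of a point configuration is proportional to the determinant of the kernel restricted to those points, so highly correlated (nearby) points shrink the determinant. The Gaussian-shaped kernel below is an illustrative choice, not the Fourier-basis kernel of the thesis:

```python
import math

def gaussian_kernel(x, y, rho=1.0, sigma=1.0):
    """A simple stationary correlation kernel (illustrative assumption)."""
    return rho * math.exp(-((x - y) / sigma) ** 2)

def pair_weight(x, y):
    """Unnormalized DPP weight of the pair {x, y}: the determinant of the
    kernel matrix restricted to the pair. Small when x and y are close."""
    K = [[gaussian_kernel(x, x), gaussian_kernel(x, y)],
         [gaussian_kernel(y, x), gaussian_kernel(y, y)]]
    return K[0][0] * K[1][1] - K[0][1] * K[1][0]

close = pair_weight(0.0, 0.1)  # nearly coincident points: weight near 0
far = pair_weight(0.0, 3.0)    # well-separated points: weight near 1
```

A block kernel for a marked DPP would place one such kernel per mark on the diagonal blocks, with cross-mark blocks controlling between-mark repulsion.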
 Date Issued
 2019
 Identifier
 2019_Spring_Feng_fsu_0071E_15011
 Format
 Thesis
 Title
 Bayesian Tractography Using Geometric Shape Priors.
 Creator

Dong, Xiaoming, Srivastava, Anuj, Klassen, E. (Eric), Wu, Wei, Huffer, Fred W. (Fred William), Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Diffusion-weighted imaging (DWI) and tractography have been developed over decades and are key elements in recent, large-scale efforts to map the human brain. Together, the two techniques provide a unique opportunity to assess the macroscopic structure and connectivity of the human brain non-invasively and in vivo. The information obtained not only helps visualize brain connectivity and segment the brain into different functional areas, but also provides tools for understanding major cognitive diseases such as multiple sclerosis, schizophrenia, and epilepsy. Much effort has been devoted to this area. On the one hand, a vast spectrum of tractography algorithms has been developed in recent years, ranging from deterministic approaches through probabilistic methods to global tractography; on the other hand, various mathematical models, such as the diffusion tensor, multi-tensor models, spherical deconvolution, and Q-ball modeling, have been developed to better exploit the acquisition-dependent DWI signal. Despite considerable progress, current methods still face many challenges, such as sensitivity to noise, many false positive/negative fibers, inability to handle complex fiber geometry, and expensive computational cost. More importantly, recent research has shown that, even with high-quality data, results from current tractography methods may not improve, suggesting that an anatomically accurate map of the human brain is unlikely to be obtained solely from the diffusion profile. Motivated by these issues, this dissertation develops a global approach that incorporates anatomically validated geometric shape priors when reconstructing neuronal fibers. The fiber tracts between regions of interest are initialized and updated via deformations based on gradients of the posterior energy defined in this dissertation. This energy has contributions from the diffusion data, the shape prior, and a roughness penalty. The dissertation first describes and demonstrates the proposed method on a 2D dataset and then extends it to 3D phantom data and real brain data. The results show that the proposed method is relatively immune to issues such as noise, complicated fiber structures like fiber crossings and kissings, and false positive fibers, and achieves more interpretable tractography results.
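A posterior energy of the kind described can be sketched in miniature as a weighted sum of a data term, a shape-prior term, and a roughness penalty on a discretized curve; the discrete second-difference roughness and the weights below are illustrative assumptions, not the dissertation's actual formulation:

```python
def roughness(curve):
    """Roughness penalty: sum of squared second differences along a 1D polyline."""
    return sum((curve[i - 1] - 2 * curve[i] + curve[i + 1]) ** 2
               for i in range(1, len(curve) - 1))

def posterior_energy(curve, data_term, shape_term, lam_shape=1.0, lam_rough=1.0):
    """Posterior energy with data, shape-prior, and roughness contributions.
    data_term and shape_term are callables supplied by the model."""
    return data_term(curve) + lam_shape * shape_term(curve) + lam_rough * roughness(curve)

# A straight polyline has zero roughness; a kinked one is penalized.
straight = [0.0, 1.0, 2.0, 3.0]
kinked = [0.0, 1.0, 0.0, 1.0]
e_kinked = posterior_energy(kinked, lambda c: 0.0, lambda c: 0.0, lam_rough=0.5)
```

Gradient-based deformation of the curve then trades off fidelity to the diffusion data against agreement with the shape prior and smoothness.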
 Date Issued
 2019
 Identifier
 2019_Spring_DONG_fsu_0071E_15144
 Format
 Thesis
 Title
 Envelopes, Subspace Learning and Applications.
 Creator

Wang, Wenjing, Zhang, Xin, Tao, Minjing, Li, Wen, Huffer, Fred W. (Fred William), Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

The envelope model is a nascent dimension-reduction technique. We focus on extending the envelope methodology to broader applications. In the first part of this thesis, we propose a common reducing subspace model that can simultaneously estimate covariance matrices, precision matrices, and their differences across multiple populations. This model leads to substantial dimension reduction and efficient parameter estimation. We explicitly quantify the efficiency gain through an asymptotic analysis. In the second part, we propose a set of new mixture models called CLEMM (Clustering with Envelope Mixture Models) that is based on the widely used Gaussian mixture model assumptions. The proposed CLEMM framework and the associated envelope-EM algorithms provide the foundations for envelope methodology in unsupervised and semi-supervised learning problems. We also illustrate the performance of these models with simulation studies and empirical applications. In the third part of this thesis, we extend envelope discriminant analysis from vector data to tensor data. A further study on copula-based models for forecasting realized volatility matrices is included, which is an important financial application of covariance matrix estimation. We consider multivariate t and Clayton copulas, as well as bivariate t, Gumbel, and Clayton copulas, to model and forecast one-day-ahead realized volatility matrices. Empirical results show that copula-based models can achieve significant gains in terms of both statistical precision and economic efficiency.
 Date Issued
 2019
 Identifier
 2019_Spring_Wang_fsu_0071E_15085
 Format
 Thesis
 Title
 A Bayesian Semiparametric Joint Model for Longitudinal and Survival Data.
 Creator

Wang, Pengpeng, Slate, Elizabeth H., Bradley, Jonathan R., Wetherby, Amy M., Lin, Lifeng, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Many biomedical studies monitor both a longitudinal marker and a survival time on each subject under study. Modeling these two endpoints as joint responses has the potential to improve the inference for both. We consider the approach of Brown and Ibrahim (2003), which proposes a Bayesian hierarchical semiparametric joint model. The model links the longitudinal and survival outcomes by incorporating the mean longitudinal trajectory as a predictor for the survival time. The usual parametric mixed-effects model for the longitudinal trajectory is relaxed by placing a Dirichlet process prior on the coefficients. A Cox proportional hazards model is then used for the survival time. The complicated joint likelihood increases the computational complexity. We develop a computationally efficient method by using a multivariate log-gamma distribution instead of a Gaussian distribution to model the data. We use Gibbs sampling combined with Neal's (2000) algorithm and the Metropolis-Hastings method for inference. Simulation studies illustrate the procedure and compare this log-gamma joint model with the Gaussian joint models. We apply this joint modeling method to human immunodeficiency virus (HIV) data and prostate-specific antigen (PSA) data.
 Date Issued
 2019
 Identifier
 2019_Spring_Wang_fsu_0071E_15120
 Format
 Thesis
 Title
 HighDimensional Statistical Methods for Tensor Data and Efficient Algorithms.
 Creator

Pan, Yuqing, Mai, Qing, Zhang, Xin, Yu, Weikuan, Slate, Elizabeth H., Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

In contemporary sciences, it is of great interest to study supervised and unsupervised learning problems for high-dimensional tensor data. In this dissertation, we develop new methods for tensor classification and clustering problems and discuss algorithms to enhance their performance. For supervised learning, we propose the CATCH model, short for Covariate-Adjusted Tensor Classification in High dimensions, which efficiently integrates low-dimensional covariates and the tensor to perform classification and variable selection. The CATCH model preserves and utilizes the structure of the data for maximum interpretability and optimal prediction. We propose a penalized approach to select a subset of tensor predictor entries that have direct discriminative effects after adjusting for covariates. Theoretical results confirm that our approach achieves variable selection consistency and optimal classification accuracy. For unsupervised learning, we consider the clustering problem for high-dimensional tensor data. We propose an efficient procedure based on the EM algorithm. It directly estimates the sparse discriminant vector from a penalized objective function and provides computationally efficient rules to update all other parameters. Meanwhile, the algorithm takes advantage of the tensor structure to reduce the number of parameters, which leads to lower storage costs. The performance of our method over existing methods is demonstrated in simulated and real data examples. Moreover, based on tensor computation, we propose a novel algorithm, referred to as the SMORE algorithm, for differential network analysis. The SMORE algorithm has low storage cost and high computation speed, especially in the presence of strong sparsity. It also provides a unified framework for binary and multiple network problems. In addition, we note that the SMORE algorithm can be applied to high-dimensional quadratic discriminant analysis problems, providing a new approach for multiclass high-dimensional quadratic discriminant analysis. Finally, we discuss directions for future work, including new approaches, applications, and relaxed assumptions.
 Date Issued
 2019
 Identifier
 2019_Spring_Pan_fsu_0071E_15135
 Format
 Thesis
 Title
 Univariate and Multivariate Volatility Models for Portfolio Value at Risk.
 Creator

Xiao, Jingyi, Niu, Xufeng, Ökten, Giray, Wu, Wei, Huffer, Fred W. (Fred William), Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

In modern-day financial risk management, modeling and forecasting stock return movements via their conditional volatilities, particularly predicting the Value at Risk (VaR), has become increasingly important for a healthy economic environment. In this dissertation, we evaluate and compare two main families of models for conditional volatilities, GARCH and Stochastic Volatility (SV), in terms of their VaR prediction performance for 5 major US stock indices. We estimate GARCH-type model parameters via Quasi Maximum Likelihood Estimation (QMLE), while for those of SV we employ MCMC with the Ancillarity-Sufficiency Interweaving Strategy. We use the forecast volatilities corresponding to each model to predict the VaR of the 5 indices. We test the predictive performance of the estimated models by a two-stage backtesting procedure and then compare them via the Lopez loss function. The results of this dissertation indicate that even though SV is more computationally demanding than GARCH-type models, it dominates them in forecasting VaR. Since financial volatilities move together across assets and markets, modeling them in a multivariate framework is more appropriate. However, existing studies in the literature do not present compelling evidence for a strong preference between univariate and multivariate models. In this dissertation we also address the problem of forecasting portfolio VaR via multivariate GARCH models versus univariate GARCH models. We construct 3 portfolios with stock returns of 3 major US stock indices, 6 major banks, and 6 major technology companies, respectively. For each portfolio, we model the portfolio conditional covariances with GARCH, EGARCH, MGARCH-BEKK, MGARCH-DCC, and GO-GARCH models. For each estimated model, the forecast portfolio volatilities are further used to calculate the portfolio VaR. The ability to capture the portfolio volatilities is evaluated by MAE and RMSE; the VaR prediction performance is tested through a two-stage backtesting procedure and compared in terms of the loss function. The results of our study indicate that even though MGARCH models are better at predicting the volatilities of some portfolios, GARCH models can perform as well as their multivariate (and computationally more demanding) counterparts.
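The GARCH-to-VaR pipeline can be sketched minimally: a GARCH(1,1) recursion produces a one-step-ahead variance forecast, and a normal quantile converts it into VaR. The returns, parameter values, and normality assumption below are all illustrative (the dissertation estimates parameters by QMLE rather than fixing them):

```python
import math

def garch11_forecast(returns, omega, alpha, beta):
    """One-step-ahead GARCH(1,1) variance forecast:
    sigma2_{t+1} = omega + alpha * r_t**2 + beta * sigma2_t,
    with the recursion initialized at the sample variance of the returns."""
    m = sum(returns) / len(returns)
    sigma2 = sum((r - m) ** 2 for r in returns) / len(returns)
    for r in returns:
        sigma2 = omega + alpha * r ** 2 + beta * sigma2
    return sigma2

def value_at_risk(sigma2, z=1.645):
    """VaR at roughly the 95% level under a normal return assumption:
    VaR = z * sigma, reported as a positive loss quantile."""
    return z * math.sqrt(sigma2)

# Hypothetical daily returns (percent) and hypothetical parameter values.
rets = [0.5, -1.2, 0.3, -0.7, 1.1, -0.4]
sigma2_next = garch11_forecast(rets, omega=0.05, alpha=0.10, beta=0.85)
var95 = value_at_risk(sigma2_next)
```

Backtesting then checks whether realized losses exceed the predicted VaR at roughly the nominal rate.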
 Date Issued
 2019
 Identifier
 2019_Spring_Xiao_fsu_0071E_15172
 Format
 Thesis
 Title
 Online Feature Selection with Annealing and Its Applications.
 Creator

Sun, Lizhe, Barbu, Adrian G., Kumar, Piyush, She, Yiyuan, Linero, Antonio Ricardo, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Feature selection is an important technique for high-dimensional statistics and machine learning, with many applications in computer vision, natural language processing, bioinformatics, etc. However, most feature selection methods in the literature are proposed for offline learning, while existing online feature selection methods have limitations in true feature recovery. In this dissertation, we propose novel online feature selection methods and a framework: one is called Stochastic Feature Selection with Annealing, and the other is the framework of running averages. Based on the methods and the framework we develop, we can recover the support of the true features with higher accuracy. We provide a theoretical analysis, and through simulations and experiments on real sparse datasets, we show that our proposed methods compare favorably with state-of-the-art online methods in the literature.
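The annealing idea behind this line of work can be sketched as follows: alternate gradient steps with truncation of the smallest-magnitude coefficients, keeping fewer features as iterations progress. Everything below (squared-error loss, the linear keep-schedule, the orthogonal toy design) is a simplifying assumption, not the dissertation's algorithm:

```python
def fsa_linear(X, y, k_final, epochs=200, lr=0.01):
    """Feature selection with annealing, sketched for squared-error loss:
    gradient steps on the kept features, then truncation of the smallest
    |w_j|, with the number of kept features shrinking toward k_final."""
    n, p = len(X), len(X[0])
    keep = list(range(p))
    w = [0.0] * p
    for t in range(epochs):
        for j in keep:  # gradient step on currently kept features only
            g = sum((sum(X[i][l] * w[l] for l in keep) - y[i]) * X[i][j]
                    for i in range(n)) / n
            w[j] -= lr * g
        # Annealing schedule: kept-feature count decays linearly to k_final.
        m = k_final + int((p - k_final) * max(0.0, 1.0 - t / (epochs / 2)))
        keep = sorted(range(p), key=lambda j: -abs(w[j]))[:m]
        for j in range(p):
            if j not in keep:
                w[j] = 0.0
    return w, sorted(keep)

# Hypothetical orthogonal design where features 0 and 2 have the largest
# true coefficients; annealing should retain exactly those two.
X = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
y = [3.0, 0.5, 2.0, 0.2]
w, selected = fsa_linear(X, y, k_final=2)
```

The "stochastic" variant in the dissertation would replace the full gradient with a stochastic one so the procedure runs online.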
 Date Issued
 2019
 Identifier
 2019_Summer_Sun_fsu_0071E_15253
 Format
 Thesis
 Title
 Shape Based Function Estimation.
 Creator

Dasgupta, Sutanoy, Srivastava, Anuj, Pati, Debdeep, Klassen, E. (Eric), Huffer, Fred W. (Fred William), Wu, Wei, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Estimation of functions is an extremely rich and well-researched topic with broad applications spanning several scientific fields. We develop a shape-based framework for probability density and general function modelling. The framework encompasses both shape-constrained and unconstrained estimation, and can accommodate a much broader notion of shape constraints than has been considered in the literature. The estimation approach is a two-step process in which the first step creates a template, or initial guess, and the second, important step ``improves" the estimate according to an appropriate objective function. We derive asymptotic properties of the estimators in different scenarios and illustrate the performance of the estimates through several simulated as well as real data examples.
 Date Issued
 2019
 Identifier
 2019_Summer_Dasgupta_fsu_0071E_15347
 Format
 Thesis
 Title
 The Relationship Between Body Mass and Blood Pressure in Diverse Populations.
 Creator

Abayomi, Emilola J., McGee, Daniel, Lackland, Daniel, Hurt, Myra, Chicken, Eric, Niu, Xufeng, Department of Statistics, Florida State University
 Abstract/Description

High blood pressure is a major determinant of risk for Coronary Heart Disease (CHD) and stroke, leading causes of death in the industrialized world. A myriad of pharmacological treatments for elevated blood pressure, defined as a blood pressure greater than 140/90 mmHg, are available and have at least partially resulted in large reductions in the incidence of CHD and stroke in the U.S. over the last 50 years. The factors that may increase blood pressure levels are not well understood, but body mass is thought to be a major determinant of blood pressure level. Obesity is measured through various methods (skinfolds, waist-to-hip ratio, bioelectrical impedance analysis (BIA), etc.), but the most commonly used measure is body mass index, BMI = weight (kg) / height (m)².
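The BMI formula quoted above is a one-liner; the values below are purely illustrative:

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2

# e.g., a hypothetical 75 kg subject who is 1.80 m tall
example = bmi(75.0, 1.80)
```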
 Date Issued
 2012
 Identifier
 FSU_migr_etd5308
 Format
 Thesis
 Title
 Statistical Models on Human Shapes with Application to Bayesian Image Segmentation and Gait Recognition.
 Creator

Kaziska, David M., Srivastava, Anuj, Mio, Washington, Chicken, Eric, Wegkamp, Marten, Department of Statistics, Florida State University
 Abstract/Description

In this dissertation we develop probability models for human shapes and apply those models to the problems of image segmentation and human identification by gait recognition. To build probability models on human shapes, we consider human shapes to be realizations of random variables on a space of simple closed curves and a space of elastic curves. Both of these spaces are quotient spaces of infinite-dimensional manifolds. Our probability models arise through Tangent Principal Component Analysis, a method of studying probability models on manifolds by projecting them onto a tangent plane to the manifold. Since we put the tangent plane at the Karcher mean of sample shapes, we begin our study by examining statistical properties of Karcher means on manifolds. We derive theoretical results for the location of Karcher means on certain manifolds and perform a simulation study of properties of Karcher means on our shape space. Turning to the specific problem of distributions on human shapes, we examine alternatives for probability models and find that kernel density estimators perform well. We use this model to sample shapes and to perform shape testing. The first application we consider is human detection in infrared images. We pursue this application using Bayesian image segmentation, in which our proposed human in an image is a maximum likelihood estimate, obtained using a prior distribution on human shapes and a likelihood arising from a divergence measure on the pixels in the image. We then consider human identification by gait recognition. We examine human gait as a cyclostationary process on the space of elastic curves and develop a metric on processes based on the geodesic distance between sequences on that space. We develop and demonstrate a framework for gait recognition based on this metric, which includes the following elements: automatic detection of gait cycles, interpolation to register gait cycles, computation of a mean gait cycle, and identification by matching a test cycle to the nearest member of a training set. We perform the matching both by an exhaustive search of the training set and through an expedited method using cluster-based trees and boosting.
 Date Issued
 2005
 Identifier
 FSU_migr_etd3275
 Format
 Thesis
 Title
 Nonparametric Data Analysis on Manifolds with Applications in Medical Imaging.
 Creator

Osborne, Daniel Eugene, Patrangenaru, Victor, Liu, Xiuwen, Barbu, Adrian, Chicken, Eric, Department of Statistics, Florida State University
 Abstract/Description

Over the past twenty years, there has been rapid development in nonparametric statistical analysis on manifolds applied to medical imaging problems. In this body of work, we focus on two different medical imaging problems. The first concerns CT scan data: we perform nonparametric analysis of 3D data retrieved from CT scans of healthy young adults, on the Size-and-Reflection Shape Space of k-ads in general position in 3D. This work is part of a larger project on planning reconstructive surgery for severe skull injuries, which includes pre-processing and post-processing steps for CT images. The second concerns MR diffusion tensor imaging data. Here, we develop a two-sample procedure for testing the equality of the generalized Frobenius means of two independent populations on the space of symmetric positive definite matrices. These new methods naturally lead to an analysis based on Cholesky decompositions of covariance matrices, which helps decrease computational time and does not increase dimensionality. The resulting nonparametric matrix-valued statistics are used for testing whether there is a difference on average between corresponding signals in Diffusion Tensor Images (DTI) of young children with dyslexia compared to their clinically normal peers. The results presented here correspond to data previously analyzed in the literature with parametric methods, which also showed a significant difference.
 Date Issued
 2012
 Identifier
 FSU_migr_etd5085
 Format
 Thesis
 Title
 MixedEffects Models for Count Data with Applications to Educational Research.
 Creator

Shin, Jihyung, Niu, Xufeng, Hu, Shouping, Al Otaiba, Stephanie Dent, McGee, Daniel, Wu, Wei, Department of Statistics, Florida State University
 Abstract/Description

This research is motivated by an analysis of reading research data. We are interested in modeling test outcomes measuring the ability of kindergarten children aged between 5 and 7 to fluently recode letters into sounds. The data showed excessive zero scores (more than 30% of children) on the test. In this dissertation, we carefully examine models for excessive zeros, which are based on a mixture of distributions: a distribution degenerate at zero and a standard probability distribution with non-negative values. In such cases, a log-normal variable (for semicontinuous data) or a Poisson random variable (for count data) is observed with some probability. We review previously proposed models: the mixed-effects and mixed-distribution models (MEMD) of Tooze et al. (2002) for semicontinuous data and the zero-inflated Poisson (ZIP) regression models of Lambert (1992) for count data. We extend zero-inflated Poisson models to repeated-measures data by introducing a pair of possibly correlated random effects into the zero-inflated Poisson model to accommodate within-subject correlation and between-subject heterogeneity. The model describes the effect of predictor variables on the probability of non-zero responses (occurrence) and the mean of non-zero responses (intensity) separately. The likelihood function, approximated by adaptive Gaussian quadrature, is maximized using dual quasi-Newton optimization, and the maximum likelihood estimates are obtained through a standard statistical software package. A simulation study is conducted varying the model parameters, the number of subjects, and the number of measurements per subject, and the results are presented. The dissertation ends with an application of the model to reading research data and a discussion of future research. We examine the number of correct letter sounds counted for children over the 2008-2009 academic year. We find that age, gender, and socioeconomic status are significantly related to the letter-sound fluency of children in both parts of the model. The model provides a better explanation of the data structure and easier interpretation of parameter values, as they are the same as in standard logistic and Poisson regression models. The model can be extended to accommodate serial correlation, which can be observed in longitudinal data. One may also consider a multilevel zero-inflated Poisson model; although such a model has been proposed previously, parameter estimation by penalized quasi-likelihood methods is questionable, and further examination is needed.
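The zero-inflated Poisson mixture described above has a simple probability mass function: a point mass at zero with probability pi mixed with a Poisson(lam); a minimal sketch (the parameter values are hypothetical):

```python
import math

def zip_pmf(k, pi, lam):
    """Zero-inflated Poisson pmf:
    P(Y=0) = pi + (1 - pi) * exp(-lam)          (structural + sampling zeros)
    P(Y=k) = (1 - pi) * lam**k * exp(-lam) / k!  for k >= 1
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    return pi + (1 - pi) * poisson if k == 0 else (1 - pi) * poisson

# With 30% structural zeros and a Poisson mean of 4, zeros are far more
# common than a plain Poisson(4) would predict.
p0 = zip_pmf(0, pi=0.3, lam=4.0)
p0_poisson = math.exp(-4.0)
```

The regression version models pi (occurrence) and lam (intensity) through separate linear predictors, which is the "two parts of the model" referred to in the abstract.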
 Date Issued
 2012
 Identifier
 FSU_migr_etd5181
 Format
 Thesis
 Title
 Estimation and Sequential Monitoring of Nonlinear Functional Responses Using Wavelet Shrinkage.
 Creator

Cuevas, Jordan, Chicken, Eric, Sobanjo, John, Niu, Xufeng, Wu, Wei, Department of Statistics, Florida State University
 Abstract/Description

Statistical process control (SPC) is widely used in industrial settings to monitor processes for shifts in their distributions. SPC is generally thought of in two distinct phases: Phase I, in which historical data are analyzed in order to establish an in-control process, and Phase II, in which new data are monitored for deviations from the in-control form. Traditionally, SPC has been used to monitor univariate (multivariate) processes for changes in a particular parameter (parameter vector). Recently, however, technological advances have resulted in processes in which each observation is actually an n-dimensional functional response (referred to as a profile), where n can be quite large. Additionally, these profiles often cannot be adequately represented parametrically, making traditional SPC techniques inapplicable. This dissertation starts by addressing the problem of nonparametric function estimation, which would be used to analyze process data in a Phase I setting. The translation-invariant wavelet estimator (TI) is often used to estimate irregular functions, despite the drawback that it tends to oversmooth jumps. A trimmed translation-invariant estimator (TTI) is proposed, of which the TI estimator is a special case. By reducing the point-by-point variability of the TI estimator, TTI is shown to retain the desirable qualities of TI while improving reconstructions of functions with jumps. Attention is then turned to the Phase II problem of monitoring sequences of profiles for deviations from in-control. Two profile monitoring schemes are proposed: the first monitors for changes in the noise variance using a likelihood ratio test based on the highest detail level of wavelet coefficients of the observed profile. The second offers a semiparametric test to monitor for changes in both the functional form and the noise variance. Both methods make use of wavelet shrinkage in order to distinguish relevant functional information from noise contamination.
Different forms of each of these test statistics are proposed and results are compared via Monte Carlo simulation.
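The idea behind monitoring noise variance through the highest detail level of wavelet coefficients can be sketched with the Haar wavelet. For a smooth in-control profile, the finest-level Haar details carry the noise but almost none of the signal. This is a minimal illustration, not the dissertation's likelihood ratio chart; the function names and the simple chi-square comparison are my own.

```python
import numpy as np

def finest_detail(y):
    """Finest-level Haar detail coefficients (y[2k] - y[2k+1]) / sqrt(2).
    For a smooth profile these are approximately iid N(0, sigma^2)."""
    y = np.asarray(y, dtype=float)
    return (y[0::2] - y[1::2]) / np.sqrt(2.0)

def variance_stat(y, sigma0):
    """Monitoring statistic for H0: noise sd = sigma0.
    Under H0 it is approximately chi-square with len(y)//2 degrees of
    freedom; values far above that signal an increase in noise variance."""
    d = finest_detail(y)
    return float(np.sum(d ** 2) / sigma0 ** 2)

# in-control profile: smooth signal plus N(0, 1) noise
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 1024)
in_control = np.sin(2.0 * np.pi * x) + rng.normal(0.0, 1.0, x.size)
stat = variance_stat(in_control, sigma0=1.0)  # compare to a chi2(512) quantile
```

Because the smooth part of the profile nearly cancels in adjacent differences, the statistic responds to the noise variance regardless of the functional form, which is the property that lets the Phase II chart ignore the (nonparametric) signal.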
 Date Issued
 2012
 Identifier
 FSU_migr_etd4788
 Format
 Thesis
 Title
 Investigating the Categories for Cholesterol and Blood Pressure for Risk Assessment of Death Due to Coronary Heart Disease.
 Creator

Franks, Billy J., McGee, Daniel, Hurt, Myra, Huffer, Fred, Niu, Xufeng, Department of Statistics, Florida State University
 Abstract/Description

Many characteristics for predicting death due to coronary heart disease are measured on a continuous scale. These characteristics, however, are often categorized for clinical use and to aid in treatment decisions. We would like to derive a systematic approach to determine the best categorizations of systolic blood pressure and cholesterol level for use in identifying individuals who are at high risk of death due to coronary heart disease, and to compare these data-derived categories to those in common usage. Whatever categories are chosen, they should allow physicians to accurately estimate the probability of survival from coronary heart disease until some time t. The best categories will be those that provide the most accurate prediction of an individual's risk of dying by time t. The approach used to determine these categories is a version of Classification And Regression Trees (CART) that can be applied to censored survival data. The major goals of this dissertation are to obtain data-derived categories for risk assessment, to compare these categories to the ones already recommended in the medical community, and to assess the performance of these categories in predicting survival probabilities.
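The core step of a survival tree, choosing a cutpoint on a continuous risk factor from censored data, can be sketched by scanning candidate splits with a two-sample log-rank statistic. This is a generic illustration of that step, not the specific CART variant of the dissertation; the function names and simulated data are my own.

```python
import numpy as np

def logrank_stat(time, event, group):
    """Two-sample log-rank chi-square statistic for a candidate split.
    time: observed follow-up times; event: 1 if death observed, 0 if
    censored; group: boolean array marking one side of the split."""
    O = E = V = 0.0
    for t in np.unique(time[event == 1]):          # distinct event times
        at_risk = time >= t
        n = at_risk.sum()
        n1 = (at_risk & group).sum()
        d = ((time == t) & (event == 1)).sum()     # deaths at time t
        d1 = ((time == t) & (event == 1) & group).sum()
        O += d1                                    # observed deaths in group
        E += d * n1 / n                            # expected under H0
        if n > 1:
            V += d * (n1 / n) * (1.0 - n1 / n) * (n - d) / (n - 1)
    return (O - E) ** 2 / V if V > 0 else 0.0

def best_cutpoint(x, time, event, grid):
    """Pick the cut on covariate x that best separates survival curves."""
    stats = [logrank_stat(time, event, x > c) for c in grid]
    return grid[int(np.argmax(stats))], max(stats)

# simulated censored survival data: the hazard quadruples when x > 1.0
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 2.0, 600)
t_event = rng.exponential(1.0 / np.where(x > 1.0, 4.0, 1.0))
t_cens = rng.uniform(0.0, 2.0, 600)
time = np.minimum(t_event, t_cens)
event = (t_event <= t_cens).astype(int)
cut, stat = best_cutpoint(x, time, event, np.linspace(0.2, 1.8, 17))
```

Recursively applying such splits, with a pruning rule, yields the data-derived categories; the point of the sketch is that the split criterion uses only the censored survival information, not a dichotomized outcome.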
 Date Issued
 2005
 Identifier
 FSU_migr_etd4402
 Format
 Thesis