Current Search: Research Repository » Thesis » Department of Statistics
Search results
- Title
- Time-Varying Coefficient Models with ARMA-GARCH Structures for Longitudinal Data Analysis.
- Creator
-
Zhao, Haiyan, Niu, Xufeng, Huffer, Fred, Nolder, Craig, McGee, Dan, Department of Statistics, Florida State University
- Abstract/Description
-
The motivation of my research comes from the analysis of the Framingham Heart Study (FHS) data. The FHS is a long-term prospective study of cardiovascular disease in the community of Framingham, Massachusetts. The study began in 1948 and 5,209 subjects were initially enrolled. Examinations were given biennially to the study participants and their status associated with the occurrence of disease was recorded. In this dissertation, the event we are interested in is the incidence of coronary heart disease (CHD). Covariates considered include sex, age, cigarettes per day (CSM), serum cholesterol (SCL), systolic blood pressure (SBP) and body mass index (BMI, weight in kilograms/height in meters squared). A review of the statistical literature indicates that the effects of the covariates on cardiovascular disease, or on death from all causes, in the Framingham study change over time. For example, the effect of SCL on cardiovascular disease decreases linearly over time. In this study, I examine the time-varying effects of the risk factors on CHD incidence. Time-varying coefficient models with ARMA-GARCH structure are developed in this research. The maximum likelihood and the marginal likelihood methods are used to estimate the parameters in the proposed models. Since high-dimensional integrals are involved in the calculation of the marginal likelihood, the Laplace approximation is employed in this study. Simulation studies are conducted to evaluate the performance of these two estimation methods based on our proposed models. The Kullback-Leibler (KL) divergence and the root mean square error are employed in the simulation studies to compare the results obtained from different methods. Simulation results show that the marginal likelihood approach gives more accurate parameter estimates, but is more computationally intensive. Following the simulation study, our proposed models are applied to the Framingham Heart Study to investigate the time-varying effects of covariates with respect to CHD incidence. To specify the time-series structures of the effects of risk factors, the Bayesian Information Criterion (BIC) is used for model selection. Our study shows that the relationship between CHD and risk factors changes over time. For males, there is a clearly decreasing linear trend in the age effect, which implies that the age effect on CHD is less pronounced for older patients than for younger patients. The effect of CSM stays almost the same in the first 30 years and decreases thereafter. There are slightly decreasing linear trends in the effects of both SBP and BMI. Furthermore, the coefficients of SBP are mostly positive over time, i.e., patients with higher SBP are more likely to develop CHD, as expected. For females, there is also a clearly decreasing linear trend in the age effect, while the effects of SBP and BMI on CHD are mostly positive and do not change much over time.
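The abstract compares maximum likelihood and marginal likelihood fits through the Kullback-Leibler (KL) divergence and the root mean square error. The sketch below shows, under assumed Gaussian summaries and synthetic numbers (none of which come from the dissertation), how those two comparison criteria could be computed from simulated coefficient estimates.

```python
import numpy as np

def gaussian_kl(mu0, sd0, mu1, sd1):
    """Closed-form KL divergence KL( N(mu0, sd0^2) || N(mu1, sd1^2) )."""
    return np.log(sd1 / sd0) + (sd0**2 + (mu0 - mu1) ** 2) / (2.0 * sd1**2) - 0.5

def rmse(estimates, truth):
    """Root mean square error of parameter estimates over simulation replicates."""
    e = np.asarray(estimates, dtype=float)
    return float(np.sqrt(np.mean((e - truth) ** 2)))

# Hypothetical simulation output: one coefficient estimated by two methods.
rng = np.random.default_rng(0)
truth = 0.8
ml_est = truth + rng.normal(0.05, 0.15, size=500)      # stand-in for maximum likelihood
marg_est = truth + rng.normal(0.01, 0.10, size=500)    # stand-in for marginal likelihood

print("RMSE  ML:", rmse(ml_est, truth), " marginal:", rmse(marg_est, truth))
# Reference density N(truth, 0.1^2) is an arbitrary illustrative choice.
print("KL    ML:", gaussian_kl(truth, 0.10, ml_est.mean(), ml_est.std(ddof=1)),
      " marginal:", gaussian_kl(truth, 0.10, marg_est.mean(), marg_est.std(ddof=1)))
```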
- Date Issued
- 2010
- Identifier
- FSU_migr_etd-0527
- Format
- Thesis
- Title
- A Comparison of Estimators in Hierarchical Linear Modeling: Restricted Maximum Likelihood versus Bootstrap via Minimum Norm Quadratic Unbiased Estimators.
- Creator
-
Delpish, Ayesha Nneka, Niu, Xu-Feng, Tate, Richard L., Huffer, Fred W., Zahn, Douglas, Department of Statistics, Florida State University
- Abstract/Description
-
The purpose of the study was to investigate the relative performance of two estimation procedures, restricted maximum likelihood (REML) and the bootstrap via MINQUE, for a two-level hierarchical linear model under a variety of conditions. The specific focus was on whether the bootstrap via MINQUE procedure offered improved accuracy in the estimation of the model parameters and their standard errors in situations where normality may not be guaranteed. Through Monte Carlo simulations, the importance of this assumption for the accuracy of multilevel parameter estimates and their standard errors was assessed using the accuracy index of relative bias and by observing the coverage percentages of 95% confidence intervals constructed for both estimation procedures. The study systematically varied the number of groups at level-2 (30 versus 100), the size of the intraclass correlation (0.01 versus 0.20) and the distribution of the observations (normal versus chi-squared with 1 degree of freedom). The number of groups and intraclass correlation factors produced effects consistent with those previously reported: as the number of groups increased, the bias in the parameter estimates decreased, with a more pronounced effect observed for those estimates obtained via REML. High levels of the intraclass correlation also led to a decrease in the efficiency of parameter estimation under both methods. Study results show that while both the restricted maximum likelihood and the bootstrap via MINQUE estimates of the fixed effects were accurate, the efficiency of the estimates was affected by the distribution of errors, with the bootstrap via MINQUE procedure outperforming REML. Both procedures produced less efficient estimators under the chi-squared distribution, particularly for the variance-covariance component estimates.
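A minimal sketch of the two accuracy measures named above, relative bias and coverage of nominal 95% intervals, computed from hypothetical Monte Carlo replicates; the function names and the synthetic estimates are illustrative assumptions, not output from the study.

```python
import numpy as np

def relative_bias(estimates, truth):
    """Average (estimate - truth) / truth over Monte Carlo replicates."""
    return float(np.mean((np.asarray(estimates) - truth) / truth))

def coverage(estimates, std_errors, truth, z=1.96):
    """Share of nominal 95% intervals (est +/- z*se) that contain the truth."""
    est, se = np.asarray(estimates), np.asarray(std_errors)
    return float(np.mean((est - z * se <= truth) & (truth <= est + z * se)))

# Hypothetical replicates for one fixed effect under one design cell.
rng = np.random.default_rng(1)
truth = 2.0
est = truth + rng.normal(0, 0.25, size=1000)
se = np.full(1000, 0.25)

print("relative bias:", relative_bias(est, truth))
print("95% CI coverage:", coverage(est, se, truth))
```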
- Date Issued
- 2006
- Identifier
- FSU_migr_etd-0771
- Format
- Thesis
- Title
- Estimation from Data Representing a Sample of Curves.
- Creator
-
Auguste, Anna L., Bunea, Florentina, Mason, Patrick, Hollander, Myles, Huffer, Fred, Department of Statistics, Florida State University
- Abstract/Description
-
This dissertation introduces and assesses an algorithm to generate confidence bands for a regression function or a main effect when multiple data sets are available. In particular, it proposes to construct confidence bands for different trajectories and then aggregate these to produce an overall confidence band for a mean function. An estimator of the regression function or main effect is also examined. First, nonparametric estimators and confidence bands are formed on each data set separately. Then each data set is in turn treated as a testing set for aggregating the preliminary results from the remaining data sets. The criterion used for this aggregation is either the least squares (LS) criterion or a BIC-type penalized LS criterion. The proposed estimator is the average over data sets of these aggregates. It is thus a weighted sum of the preliminary estimators. The proposed confidence band is the minimum L1 band of all the M aggregate bands when we only have a main effect. In the case where there is some random effect we suggest an adjustment to the confidence band. In this case, the proposed confidence band is the minimum L1 band of all the M adjusted aggregate bands. Desirable asymptotic properties are shown to hold. A simulation study examines the performance of each technique relative to several alternate methods and theoretical benchmarks. An application to seismic data is conducted.
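The band reported above is the minimum-L1 band among the M aggregate bands. Assuming each candidate band is stored as lower/upper curves on a common grid (an illustrative representation, not the dissertation's code), a sketch of that selection step:

```python
import numpy as np

def band_l1_width(lower, upper, grid):
    """L1 size of a confidence band: integral of (upper - lower) over the grid."""
    return np.trapz(upper - lower, grid)

def minimum_l1_band(bands, grid):
    """Pick, among M candidate bands, the one with the smallest L1 width.

    `bands` is a list of (lower, upper) arrays evaluated on `grid`.
    """
    widths = [band_l1_width(lo, up, grid) for lo, up in bands]
    return bands[int(np.argmin(widths))], widths

# Hypothetical aggregate bands from M = 3 held-out data sets.
grid = np.linspace(0, 1, 101)
f = np.sin(2 * np.pi * grid)
bands = [(f - w, f + w) for w in (0.30, 0.22, 0.27)]
(best_lo, best_up), widths = minimum_l1_band(bands, grid)
print("L1 widths:", np.round(widths, 3), "-> selected band index:", int(np.argmin(widths)))
```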
- Date Issued
- 2006
- Identifier
- FSU_migr_etd-0286
- Format
- Thesis
- Title
- Statistical Shape Analysis on Manifolds with Applications to Planar Contours and Structural Proteomics.
- Creator
-
Ellingson, Leif A., Patrangenaru, Vic, Mio, Washington, Zhang, Jinfeng, Niu, Xufeng, Department of Statistics, Florida State University
- Abstract/Description
-
The technological advances in recent years have produced a wealth of intricate digital imaging data that can be analyzed effectively using the principles of shape analysis. Such data often lie on either high-dimensional or infinite-dimensional manifolds. With computing power now strong enough to handle these data, it is necessary to develop theoretically sound methodology to perform the analysis in a computationally efficient manner. In this dissertation, we propose approaches for doing so for planar contours and the three-dimensional atomic structures of protein binding sites. First, we adapt Kendall's definition of direct similarity shapes of finite planar configurations to shapes of planar contours under certain regularity conditions and utilize Ziezold's nonparametric view of Frechet mean shapes. The space of direct similarity shapes of regular planar contours is embedded in a space of Hilbert-Schmidt operators in order to obtain the Veronese-Whitney extrinsic mean shape. For computations, it is necessary to use discrete approximations of both the contours and the embedding. For cases when landmarks are not provided, we propose an automated, randomized landmark selection procedure that is useful for contour matching within a population and is consistent with the underlying asymptotic theory. For inference on the extrinsic mean direct similarity shape, we consider a one-sample neighborhood hypothesis test and the use of the nonparametric bootstrap to approximate confidence regions. Bandulasiri et al. (2008) suggested using extrinsic reflection size-and-shape analysis to study the relationship between the structure and function of protein binding sites. In order to obtain meaningful results for this approach, it is necessary to identify the atoms common to a group of binding sites with similar functions and obtain proper correspondences for these atoms. We explore this problem in depth and propose an algorithm for simultaneously finding the common atoms and their respective correspondences based upon the Iterative Closest Point algorithm. For a benchmark data set, our classification results compare favorably with those of leading established methods. Finally, we discuss current directions in the field of statistics on manifolds, including a computational comparison of intrinsic and extrinsic analysis for various applications and a brief introduction to sample spaces with manifold stratification.
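For the Veronese-Whitney extrinsic mean mentioned above, a common computational recipe represents each planar configuration as a centered, unit-norm complex vector z and takes the top eigenvector of the averaged projections z z*. The sketch below follows that recipe on synthetic landmark data; the contour discretization and Hilbert-Schmidt embedding developed in the dissertation are not reproduced here.

```python
import numpy as np

def vw_extrinsic_mean_shape(configs):
    """Extrinsic (Veronese-Whitney) mean shape of planar k-point configurations.

    Each configuration is a (k, 2) array; it is centered, scaled to unit norm,
    and treated as a complex k-vector z.  The VW mean is the top eigenvector of
    the average of the projections z z*.
    """
    k = configs[0].shape[0]
    S = np.zeros((k, k), dtype=complex)
    for X in configs:
        z = X[:, 0] + 1j * X[:, 1]
        z = z - z.mean()              # remove translation
        z = z / np.linalg.norm(z)     # remove scale (pre-shape)
        S += np.outer(z, z.conj())
    S /= len(configs)
    vals, vecs = np.linalg.eigh(S)    # Hermitian eigen-decomposition
    return vecs[:, -1]                # representative of the mean shape

# Hypothetical sample: noisy rotations of a square configuration of k = 4 landmarks.
rng = np.random.default_rng(2)
base = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
sample = []
for _ in range(50):
    theta = rng.uniform(0, 2 * np.pi)
    R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
    sample.append(base @ R.T + rng.normal(0, 0.03, size=base.shape))
print(np.round(vw_extrinsic_mean_shape(sample), 3))
```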
- Date Issued
- 2011
- Identifier
- FSU_migr_etd-0053
- Format
- Thesis
- Title
- Examining the Effect of Treatment on the Distribution of Blood Pressure in the Population Using Observational Data.
- Creator
-
Kucukemiroglu, Saryet Alexa, McGee, Daniel, Slate, Elizabeth H., Hurt, Myra M., Huffer, Fred W. (Fred William), Florida State University, College of Arts and Sciences, Department of Statistics
- Abstract/Description
-
Since the introduction of anti-hypertensive medications in the mid-1950s, there has been an increased use of blood pressure medications in the US. The growing use of anti-hypertensive treatment has affected the distribution of blood pressure in the population over time. Now observational data no longer reflect natural blood pressure levels. Our goal is to examine the effect of anti-hypertensive drugs on distributions of blood pressure using several well-known observational studies. The statistical concept of censoring is used to estimate the distribution of blood pressure in populations if no treatment were available. The treated and estimated untreated distributions are then compared to determine the general effect of these medications in the population. Our analyses show that these drugs have an increasing impact on controlling blood pressure distributions in populations that are heavily treated.
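One common way to formalize the censoring idea above (possibly not the exact construction used in this dissertation) is to treat a treated person's observed blood pressure as a right-censored observation of their untreated level and apply a Kaplan-Meier estimator. A self-contained sketch on synthetic numbers:

```python
import numpy as np

def kaplan_meier(values, observed):
    """Kaplan-Meier survivor function for right-censored data.

    values   : observed measurements (here, systolic blood pressure)
    observed : 1 if the value is the true (untreated) level,
               0 if the person is treated, so the untreated level is only
               known to exceed the observed value (right-censored).
    Returns the event values and S(t) just after each event.
    """
    values = np.asarray(values, float)
    observed = np.asarray(observed, int)
    order = np.argsort(values)
    values, observed = values[order], observed[order]
    n = len(values)
    t_out, surv, s = [], [], 1.0
    for i, (t, d) in enumerate(zip(values, observed)):
        if d == 1:
            s *= 1.0 - 1.0 / (n - i)   # n - i subjects still at risk
            t_out.append(t)
            surv.append(s)
    return np.array(t_out), np.array(surv)

# Hypothetical data: untreated SBP observed directly, treated SBP censored.
rng = np.random.default_rng(3)
sbp = rng.normal(130, 15, size=200)
treated = rng.random(200) < 0.3
obs_sbp = np.where(treated, sbp - rng.uniform(5, 20, 200), sbp)   # treatment lowers SBP
t, s = kaplan_meier(obs_sbp, observed=(~treated).astype(int))
print("estimated P(untreated SBP > 140):", float(np.interp(140, t, s)))
```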
- Date Issued
- 2017
- Identifier
- FSU_FALL2017_Kucukemiroglu_fsu_0071E_14275
- Format
- Thesis
- Title
- Semi-Parametric Generalized Estimating Equations with Kernel Smoother: A Longitudinal Study in Financial Data Analysis.
- Creator
-
Yang, Liu, Niu, Xufeng, Cheng, Yingmei, Huffer, Fred W. (Fred William), Tao, Minjing, Florida State University, College of Arts and Sciences, Department of Statistics
- Abstract/Description
-
Longitudinal studies are widely used in various fields, such as public health, clinical trials and financial data analysis. A major challenge for longitudinal studies is repeated measurements from each subject, which cause time-dependent correlation within subjects. Generalized Estimating Equations can deal with correlated outcomes for longitudinal data through marginal effects. My model is based on Generalized Estimating Equations with a semi-parametric approach, providing a flexible structure for regression models: coefficients for parametric covariates will be estimated, and nuisance covariates will be fitted with kernel smoothers in the non-parametric part. The profile kernel estimator and the seemingly unrelated kernel estimator (SUR) will be used to deliver consistent and efficient semi-parametric estimators compared to parametric models. We provide simulation results for estimating semi-parametric models with one or multiple non-parametric terms. In the application part, we focus on the financial market: a credit card loan data set with payment information for each customer across 6 months is used to investigate whether gender, income, age or other factors significantly influence payment status. Furthermore, we propose model comparisons to evaluate whether our model should be fitted based on different levels of factors, such as male and female, or based on different types of estimating methods, such as parametric estimation or semi-parametric estimation.
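A hedged sketch of the semi-parametric idea described above: a partially linear fit in which the nuisance term is handled by a Nadaraya-Watson kernel smoother and the parametric part by least squares (a GEE with independence working correlation). The backfitting scheme and all names and numbers are illustrative simplifications, not the profile-kernel or SUR estimators themselves.

```python
import numpy as np

def nw_smooth(t, y, bandwidth, grid=None):
    """Nadaraya-Watson kernel smoother with a Gaussian kernel."""
    grid = t if grid is None else grid
    w = np.exp(-0.5 * ((grid[:, None] - t[None, :]) / bandwidth) ** 2)
    return (w @ y) / w.sum(axis=1)

def partially_linear_fit(X, t, y, bandwidth=0.1, n_iter=20):
    """Backfitting for y = X beta + g(t) + error: kernel smoother for g,
    least squares for beta (independence working correlation)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        g = nw_smooth(t, y - X @ beta, bandwidth)
        beta = np.linalg.lstsq(X, y - g, rcond=None)[0]
    return beta, g

# Hypothetical data: two parametric covariates plus one smooth nuisance term.
rng = np.random.default_rng(13)
n = 600
X = rng.normal(size=(n, 2))
t = rng.uniform(0, 1, n)
y = X @ np.array([1.5, -0.8]) + np.sin(2 * np.pi * t) + rng.normal(0, 0.3, n)
beta, _ = partially_linear_fit(X, t, y)
print("estimated parametric coefficients:", np.round(beta, 2))
```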
- Date Issued
- 2017
- Identifier
- FSU_FALL2017_YANG_fsu_0071E_14219
- Format
- Thesis
- Title
- Bayesian Modeling and Variable Selection for Complex Data.
- Creator
-
Li, Hanning, Pati, Debdeep, Huffer, Fred W. (Fred William), Kercheval, Alec N., Sinha, Debajyoti, Bradley, Jonathan R., Florida State University, College of Arts and Sciences, Department of Statistics
- Abstract/Description
-
As we routinely encounter high-throughput datasets in complex biological and environmental research, developing novel models and methods for variable selection has received widespread attention. In this dissertation, we address a few key challenges in Bayesian modeling and variable selection for high-dimensional data with complex spatial structures. a) Most Bayesian variable selection methods are restricted to mixture priors having separate components for characterizing the signal and the noise. However, such priors encounter computational issues in high dimensions. This has motivated continuous shrinkage priors, which resemble the two-component priors while facilitating computation and interpretability. While such priors are widely used for estimating high-dimensional sparse vectors, selecting a subset of variables remains a daunting task. b) Spatial/spatial-temporal data sets with complex structures are nowadays commonly encountered in scientific research fields ranging from atmospheric science, forestry and environmental science to biological and social science. Selecting important spatial variables that have significant influences on occurrences of events is undoubtedly necessary and essential for providing insights to researchers. Self-excitation, the feature that occurrence of an event increases the likelihood of more occurrences of the same type of event nearby in time and space, can be found in many natural and social events. Research on modeling data with a self-excitation feature has drawn increasing interest recently. However, the existing literature on self-exciting models with inclusion of high-dimensional spatial covariates is still underdeveloped. c) The Gaussian process is among the most powerful modeling frameworks for spatial data. Its major bottleneck is the computational complexity, which stems from inversion of the dense matrices associated with a Gaussian process covariance. Hierarchical divide-and-conquer Gaussian process models have been investigated for ultra-large data sets. However, the computation associated with scaling the distributed computing algorithm to handle a large number of sub-groups poses a serious bottleneck. In Chapter 2 of this dissertation, we propose a general approach for variable selection with shrinkage priors. The presence of very few tuning parameters makes our method attractive in comparison to ad hoc thresholding approaches. The applicability of the approach is not limited to continuous shrinkage priors, but can be used along with any shrinkage prior. Theoretical properties for near-collinear design matrices are investigated and the method is shown to have good performance in a wide range of synthetic data examples and in a real data example on selecting genes affecting survival due to lymphoma. In Chapter 3 of this dissertation, we propose a new self-exciting model that allows the inclusion of spatial covariates. We develop algorithms which are effective in obtaining accurate estimation and variable selection results in a variety of synthetic data examples. Our proposed model is applied to Chicago crime data, where the influence of various spatial features is investigated. In Chapter 4, we focus on a hierarchical Gaussian process regression model for ultra-high-dimensional spatial datasets. By evaluating the latent Gaussian process on a regular grid, we propose an efficient computational algorithm through circulant embedding. The latent Gaussian process borrows information across multiple sub-groups, thereby obtaining a more accurate prediction. The hierarchical model and our proposed algorithm are studied through simulation examples.
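For the circulant-embedding step mentioned in the Chapter 4 summary, a standard one-dimensional illustration evaluates a stationary covariance on a regular grid with wrap-around distance, diagonalizes it with the FFT, and draws samples in O(n log n). This is a generic textbook recipe sketched under assumed kernel and grid choices, not the dissertation's algorithm.

```python
import numpy as np

def sample_gp_circulant(n, length_scale, variance=1.0, rng=None):
    """Draw one stationary Gaussian-process sample on a regular 1-D grid of size n
    by FFT diagonalization of a circulant covariance (wrap-around distance)."""
    rng = rng or np.random.default_rng()
    idx = np.arange(n)
    dist = np.minimum(idx, n - idx)                              # circular distance to point 0
    c = variance * np.exp(-0.5 * (dist / length_scale) ** 2)     # first row of the covariance
    lam = np.fft.fft(c).real                                     # eigenvalues of the circulant matrix
    lam = np.clip(lam, 0.0, None)                                # guard against tiny negative values
    eps = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    y = np.sqrt(n) * np.fft.ifft(np.sqrt(lam) * eps)
    return y.real                                                # y.imag is a second, independent draw

field = sample_gp_circulant(1024, length_scale=30.0, rng=np.random.default_rng(4))
print(field[:5])
```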
- Date Issued
- 2017
- Identifier
- FSU_FALL2017_Li_fsu_0071E_14159
- Format
- Thesis
- Title
- Spatial Statistics and Its Applications in Biostatistics and Environmental Statistics.
- Creator
-
Hu, Guanyu, Huffer, Fred W. (Fred William), Paek, Insu, Sinha, Debajyoti, Slate, Elizabeth H., Bradley, Jonathan R., Florida State University, College of Arts and Sciences, Department of Statistics
- Abstract/Description
-
This dissertation presents some topics in spatial statistics and their applications in biostatistics and environmental statistics. The field of spatial statistics is an energetic area in statistics. In Chapter 2 and Chapter 3, the goal is to build subregion models under the assumption that the responses or the parameters are spatially correlated. For regression models, considering spatially varying coefficients is a reasonable way to build subregion models. There are two different techniques for exploring spatially varying coefficients. One is geographically weighted regression (Brunsdon et al. 1998). The other is a spatially varying coefficients model which assumes a stationary Gaussian process for the regression coefficients (Gelfand et al. 2003). Based on the ideas of these two techniques, we introduce techniques for exploring subregion models in survival analysis, which is an important area of biostatistics. In Chapter 2, we introduce modified versions of the Kaplan-Meier and Nelson-Aalen estimators which incorporate geographical weighting. We use ideas from counting process theory to obtain these modified estimators, to derive variance estimates, and to develop associated hypothesis tests. In Chapter 3, we introduce a Bayesian parametric accelerated failure time model with spatially varying coefficients. These two techniques can explore subregion models in survival analysis using both nonparametric and parametric approaches. In Chapter 4, we introduce Bayesian parametric covariance regression analysis for a response vector. The proposed method defines a regression model between the covariance matrix of a p-dimensional response vector and auxiliary variables. We propose a constrained Metropolis-Hastings algorithm to obtain the estimates. Simulation results are presented to show the performance of both the regression and covariance matrix estimates. Furthermore, we present a more realistic simulation experiment in which our Bayesian approach has better performance than the MLE. Finally, we illustrate the usefulness of our model by applying it to the Google Flu data. In Chapter 5, we give a brief summary of future work.
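A small sketch of a geographically weighted Nelson-Aalen estimate in the spirit of the Chapter 2 description: kernel weights based on distance to a target location enter both the event count and the at-risk total at each event time. The kernel, bandwidth, and synthetic data are assumptions for illustration only.

```python
import numpy as np

def gaussian_spatial_weights(coords, s0, bandwidth):
    """Kernel weights that down-weight subjects far from the target location s0."""
    d = np.linalg.norm(np.asarray(coords, float) - np.asarray(s0, float), axis=1)
    return np.exp(-0.5 * (d / bandwidth) ** 2)

def weighted_nelson_aalen(time, event, weights):
    """Geographically weighted Nelson-Aalen cumulative hazard estimate.

    At each event time the increment is (weighted events) / (weighted at-risk set).
    """
    time, event, weights = map(np.asarray, (time, event, weights))
    H, out = 0.0, []
    for t in np.unique(time[event == 1]):
        at_risk = weights[time >= t].sum()
        d = weights[(time == t) & (event == 1)].sum()
        H += d / at_risk
        out.append((t, H))
    return np.array(out)

# Hypothetical survival data with 2-D locations; the event indicator is assigned
# at random purely to exercise the code.
rng = np.random.default_rng(5)
n = 300
coords = rng.uniform(0, 10, size=(n, 2))
time = rng.exponential(5 + 0.5 * coords[:, 0])      # hazard varies over space
event = (rng.random(n) < 0.8).astype(int)
w = gaussian_spatial_weights(coords, s0=(2.0, 2.0), bandwidth=2.0)
print(weighted_nelson_aalen(time, event, w)[:5])
```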
- Date Issued
- 2017
- Identifier
- FSU_FALL2017_Hu_fsu_0071E_14205
- Format
- Thesis
- Title
- The One-and Two-Sample Problem for Data on Hilbert Manifolds with Applications to Shape Analysis.
- Creator
-
Qiu, Mingfei, Patrangenaru, Victor, Liu, Xiuwen, Slate, Elizabeth H., Barbu, Adrian G. (Adrian Gheorghe), Clickner, Robert Paul, Paige, Robert, Florida State University, College of Arts and Sciences, Department of Statistics
- Abstract/Description
-
This dissertation is concerned with high-level imaging analysis. In particular, our focus is on extracting the projective shape information or the similarity shape from digital camera images or Magnetic Resonance Imaging (MRI). The approach is statistical, without making any assumptions about the distributions of the random object under investigation. The data are organized as points on a Hilbert manifold. In the case of projective shapes of finite-dimensional configurations of points, we consider testing a one-sample null hypothesis, while in the infinite-dimensional case we consider a neighborhood hypothesis testing method. For 3D scenes, we retrieve the 3D projective shape and use the Lie group structure of the projective shape space. We test the equality of two extrinsic means by introducing the mean projective shape change. For 2D MRI of midsections of Corpus Callosum contours, we use an automatic matching technique that is necessary in pursuing a one-sample neighborhood hypothesis test for the similarity shapes. We conclude that the mean similarity shape of the Corpus Callosum of average individuals is very far from the shape of Albert Einstein's, which may explain his genius. Another application of our Hilbert manifold methodology is the two-sample testing problem for Veronese-Whitney means of projective shapes of 3D contours. In particular, our data consist of comparisons of 3D projective shapes of contours of leaves from the same tree species.
- Date Issued
- 2015
- Identifier
- FSU_2015fall_Qiu_fsu_0071E_12922
- Format
- Thesis
- Title
- Examining the Relationship of Dietary Component Intakes to Each Other and to Mortality.
- Creator
-
Alrajhi, Sharifah, McGee, Daniel, Levenson, Cathy W., Niu, Xufeng, Sinha, Debajyoti, Florida State University, College of Arts and Sciences, Department of Statistics
- Abstract/Description
-
In this essay we present analyses examining the basic dietary structure and its relationship to mortality in the first National Health and Nutrition Examination Survey (NHANES I), conducted between 1971 and 1975. We used results from 24-hour recalls on 10,483 individuals in this study. All of the individuals in the analytic sample were followed through 1992 for vital status. The mean follow-up period for the participants was 16 years. During follow-up 2,042 (48%) males and 1,754 (27%) females died. We first attempted to capture the inherent structure of the dietary data using principal components analysis (PCA). We performed this estimation separately for each race (white and black) and gender (male and female) and compared the estimated principal components among these four strata. We found that the principal components were similar (but not identical) in the four strata. We also related our estimated principal components to mortality using Cox Proportional Hazards (CPH) models and related dietary components to mortality using forward variable selection.
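A minimal sketch of the stratified principal components comparison described above: PCA is run separately within each stratum and the leading loading vectors are compared across strata by their absolute cosine similarity. Data, stratum labels, and dimensions are synthetic placeholders, not NHANES I values.

```python
import numpy as np

def principal_components(X, n_comp=3):
    """Loadings of the first n_comp principal components (columns standardized)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are unit loading vectors ordered by explained variance.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Vt[:n_comp]

# Hypothetical dietary intake matrix (subjects x nutrients) with a stratum label.
rng = np.random.default_rng(6)
nutrients = 8
X = rng.normal(size=(500, nutrients)) @ rng.normal(size=(nutrients, nutrients))
stratum = rng.integers(0, 4, size=500)          # e.g., race-by-gender groups

loadings = {g: principal_components(X[stratum == g]) for g in range(4)}
# Compare strata via absolute cosine similarity of matching loading vectors.
for g in range(1, 4):
    sims = [abs(np.dot(loadings[0][k], loadings[g][k])) for k in range(3)]
    print("stratum", g, "vs 0, |cosine| of first 3 loadings:", np.round(sims, 2))
```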
- Date Issued
- 2015
- Identifier
- FSU_2015fall_Alrajhi_fsu_0071E_12802
- Format
- Thesis
- Title
- Median Regression for Complex Survey Data.
- Creator
-
Fraser, Raphael André, Sinha, Debajyoti, Lipsitz, Stuart, Carlson, Elwood, Slate, Elizabeth H., Huffer, Fred W. (Fred William), Florida State University, College of Arts and Sciences, Department of Statistics
- Abstract/Description
-
The ready availability of public-use data from various large national complex surveys has immense potential for the assessment of population characteristics such as means, proportions, and totals. Using a model-based approach, complex surveys can be used to evaluate the effectiveness of treatments and to identify risk factors for important diseases such as cancer. Existing statistical methods based on estimating equations and/or utilizing resampling methods are often not valid with survey data due to design features such as stratification, multistage sampling and unequal selection probabilities. In this paper, we accommodate these design features in the analysis of highly skewed response variables arising from large complex surveys. Specifically, we propose a double-transform-both-sides based estimating equations approach to estimate the median regression parameters of the highly skewed response; the double-transform-both-sides method applies the same transformation twice to both the response and the regression function. The usual sandwich variance estimate can be used in our approach, whereas a resampling approach would be needed for a pseudo-likelihood based on minimizing absolute deviations. Furthermore, the double-transform-both-sides estimator is relatively robust to the true underlying distribution, and has much smaller mean square error than the least absolute deviations estimator. The method is motivated by an analysis of laboratory data on urinary iodine concentration from the National Health and Nutrition Examination Survey.
- Date Issued
- 2015
- Identifier
- FSU_2015fall_Fraser_fsu_0071E_12825
- Format
- Thesis
- Title
- Matched Sample Based Approach for Cross-Platform Normalization on Gene Expression Data.
- Creator
-
Shao, Jiang, Zhang, Jinfeng, Sang, Qing-Xiang Amy, Wu, Wei, Niu, Xufeng, Florida State University, College of Arts and Sciences, Department of Statistics
- Abstract/Description
-
Gene-expression data profiles are widely used in all kinds of biomedical studies, especially in cancer research. This dissertation focuses on the problem of how to combine datasets arising from different studies; of particular interest is how to remove the platform effect alone. The matched-sample-based cross-platform normalization method we developed is designed to tackle the data-merging problem in two scenarios. The first is Affymetrix-Agilent cross-platform normalization, where both platforms are classic microarray gene-expression profiles. The second is the integration of microarray data with next-generation sequencing genome data. We use several general validation measures to assess our method and compare it with the popular distance-weighted discrimination (DWD) method. With support from the public web-based tool NCI-60 CellMiner and The Cancer Genome Atlas data portal, our proposed method outperformed DWD in both cross-platform scenarios. It can be further assessed by its ability to recover biological features in studies of cancer-type discrimination. We applied our method to two classification problems: one is breast cancer tumor/normal status classification on microarray and next-generation sequencing datasets; the other is breast cancer patients' chemotherapy response classification on GPL96 and GPL570 microarray datasets. Both problems show that the classification power is increased after our matched-sample-based cross-platform normalization.
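As a rough illustration of matched-sample-based cross-platform adjustment (not the dissertation's actual normalization algorithm), the sketch below estimates a per-gene location/scale map from samples profiled on both platforms and applies it to every sample on the target platform. All matrices and matched indices are hypothetical.

```python
import numpy as np

def matched_sample_normalize(target, reference, matched_idx):
    """Align platform `target` to platform `reference` gene by gene, using
    samples profiled on both platforms (column indices in `matched_idx`).

    Both inputs are genes x samples matrices; a per-gene linear (location/scale)
    map estimated on the matched samples is applied to all target samples.
    """
    t_m = target[:, matched_idx[0]]
    r_m = reference[:, matched_idx[1]]
    scale = r_m.std(axis=1, ddof=1) / np.maximum(t_m.std(axis=1, ddof=1), 1e-12)
    shift = r_m.mean(axis=1) - scale * t_m.mean(axis=1)
    return target * scale[:, None] + shift[:, None]

# Hypothetical expression matrices: 1000 genes, 40 and 50 samples, 20 matched pairs.
rng = np.random.default_rng(12)
genes = 1000
ref = rng.normal(7, 2, size=(genes, 40))
bias, gain = rng.normal(1, 0.3, genes), rng.uniform(0.5, 1.5, genes)
tgt = gain[:, None] * rng.normal(7, 2, size=(genes, 50)) + bias[:, None]
matched = (np.arange(20), np.arange(20))        # columns matched across platforms
adjusted = matched_sample_normalize(tgt, ref, matched)
print("mean per-gene gap before:", float(np.abs(tgt.mean(1) - ref.mean(1)).mean()),
      " after:", float(np.abs(adjusted.mean(1) - ref.mean(1)).mean()))
```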
- Date Issued
- 2015
- Identifier
- FSU_2015fall_Shao_fsu_0071E_12833
- Format
- Thesis
- Title
- Individual Patient-Level Data Meta-Analysis: A Comparison of Methods for the Diverse Populations Collaboration Data Set.
- Creator
-
Dutton, Matthew Thomas, McGee, Daniel, Becker, Betsy, Niu, Xufeng, Zhang, Jinfeng, Department of Statistics, Florida State University
- Abstract/Description
-
DerSimonian and Laird define meta-analysis as "the statistical analysis of a collection of analytic results for the purpose of integrating their findings." One alternative to classical meta-analytic approaches is known as Individual Patient-Level Data, or IPD, meta-analysis. Rather than depending on summary statistics calculated for individual studies, IPD meta-analysis analyzes the complete data from all included studies. Two potential approaches to incorporating IPD data into the meta-analytic framework are investigated. A two-stage analysis is first conducted, in which individual models are fit for each study and summarized using classical meta-analysis procedures. Secondly, a one-stage approach that singularly models the data and summarizes the information across studies is investigated. Data from the Diverse Populations Collaboration data set are used to investigate the differences between these two methods in a specific example. The bootstrap procedure is used to determine whether the two methods produce statistically different results in the DPC example. Finally, a simulation study is conducted to investigate the accuracy of each method in given scenarios.
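A compact sketch of the two-stage route described above: stage one produces a per-study estimate and standard error, and stage two combines them with a DerSimonian-Laird random-effects model. The per-study numbers shown are hypothetical, not values from the Diverse Populations Collaboration.

```python
import numpy as np

def dersimonian_laird(est, se):
    """Random-effects (DerSimonian-Laird) combination of per-study estimates."""
    est, se = np.asarray(est, float), np.asarray(se, float)
    w = 1.0 / se**2
    theta_fixed = np.sum(w * est) / np.sum(w)
    Q = np.sum(w * (est - theta_fixed) ** 2)           # heterogeneity statistic
    k = len(est)
    tau2 = max(0.0, (Q - (k - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))
    w_star = 1.0 / (se**2 + tau2)
    theta = np.sum(w_star * est) / np.sum(w_star)
    return theta, np.sqrt(1.0 / np.sum(w_star)), tau2

# Hypothetical per-study log odds ratios and standard errors from stage one.
est = [0.42, 0.55, 0.30, 0.61, 0.25]
se = [0.12, 0.20, 0.15, 0.25, 0.10]
theta, se_pooled, tau2 = dersimonian_laird(est, se)
print(f"pooled effect {theta:.3f} (SE {se_pooled:.3f}), tau^2 {tau2:.3f}")
```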
- Date Issued
- 2011
- Identifier
- FSU_migr_etd-0620
- Format
- Thesis
- Title
- Analysis of the Wealth Distribution at Equilibrium in a Heterogeneous Agent Economy.
- Creator
-
Badshah, Muffasir H., Srivastava, Anuj, Beaumont, Paul, Wu, Wei, Kercheval, Alec, Department of Statistics, Florida State University
- Abstract/Description
-
This paper aims at analyzing a macro economy with a continuum of infinitely-lived households that make rational decisions about consumption and wealth savings in the face of employment and aggregate productivity shocks. The heterogeneous population structure arises when households differ in wealth and employment status against which they cannot insure. In this framework, the household wealth evolution is modeled as a mixture Markov process. The stationary wealth distributions are obtained using eigen structures of transition matrices under the Perron-Frobenius theorem. This step is utilized repeatedly to find the equilibrium state of the system, and it leads to an efficient framework for studying the dynamic general equilibrium. A systematic evaluation of the equilibrium state under different initial conditions is further presented and analyzed.
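The stationary-distribution step described above can be sketched directly: by the Perron-Frobenius theorem, the stationary wealth distribution is the left eigenvector of the transition matrix associated with eigenvalue one, normalized to sum to one. The 4-state matrix below is a toy example, not the paper's calibration.

```python
import numpy as np

def stationary_distribution(P):
    """Stationary distribution of a row-stochastic transition matrix P.

    For an irreducible aperiodic chain the eigenvalue 1 is dominant
    (Perron-Frobenius); the matching left eigenvector, normalized to sum
    to one, is the stationary (here: wealth) distribution.
    """
    vals, vecs = np.linalg.eig(P.T)            # left eigenvectors of P
    k = np.argmin(np.abs(vals - 1.0))
    pi = np.real(vecs[:, k])
    return pi / pi.sum()

# Hypothetical 4-state wealth transition matrix (rows sum to one).
P = np.array([[0.80, 0.15, 0.04, 0.01],
              [0.10, 0.70, 0.15, 0.05],
              [0.03, 0.12, 0.70, 0.15],
              [0.01, 0.04, 0.15, 0.80]])
pi = stationary_distribution(P)
print("stationary wealth shares:", np.round(pi, 3), " check:", np.round(pi @ P, 3))
```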
- Date Issued
- 2010
- Identifier
- FSU_migr_etd-0844
- Format
- Thesis
- Title
- Minimax Tests for Nonparametric Alternatives with Applications to High Frequency Data.
- Creator
-
Yu, Han, Song, Kai-Sheng, Quine, Jack, Huffer, Fred, McGee, Dan, Department of Statistics, Florida State University
- Abstract/Description
-
We present a general methodology for developing asymptotically distribution-free, asymptotically minimax tests. The tests are constructed via a nonparametric density-quantile function and the limiting distribution is derived by a martingale approach. The procedure can be viewed as a novel parametric extension of the classical parametric likelihood ratio test. The proposed tests are shown to be omnibus within an extremely large class of nonparametric global alternatives characterized by simple conditions. Furthermore, we establish that the proposed tests provide better minimax distinguishability. The tests have much greater power for detecting high-frequency nonparametric alternatives than existing classical tests such as the Kolmogorov-Smirnov and Cramer-von Mises tests. The good performance of the proposed tests is demonstrated by Monte Carlo simulations and applications in High Energy Physics.
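To illustrate why high-frequency alternatives are hard for classical tests, the sketch below estimates by Monte Carlo the power of the Kolmogorov-Smirnov test against a rapidly oscillating density on [0, 1]; the power stays near the nominal level. The alternative family and all settings are assumptions for illustration, not the paper's simulation design.

```python
import numpy as np
from scipy import stats

def ks_power(freq=8, amp=0.3, n=200, reps=500, alpha=0.05, seed=7):
    """Monte Carlo power of the Kolmogorov-Smirnov test against the
    high-frequency density f(x) = 1 + amp*sin(2*pi*freq*x) on [0, 1]."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        # Draw n points from the alternative by rejection sampling against U(0, 1).
        x = []
        while len(x) < n:
            u = rng.uniform(0, 1, size=4 * n)
            accept = rng.uniform(0, 1 + amp, size=4 * n) < 1 + amp * np.sin(2 * np.pi * freq * u)
            x.extend(u[accept][: n - len(x)])
        rejections += stats.kstest(np.asarray(x), "uniform").pvalue < alpha
    return rejections / reps

print("KS power against a frequency-8 alternative:", ks_power())
```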
- Date Issued
- 2006
- Identifier
- FSU_migr_etd-0796
- Format
- Thesis
- Title
- Testing for the Equality of Two Distributions on High Dimensional Object Spaces and Nonparametric Inference for Location Parameters.
- Creator
-
Guo, Ruite, Patrangenaru, Victor, Mio, Washington, Barbu, Adrian G. (Adrian Gheorghe), Bradley, Jonathan R., Florida State University, College of Arts and Sciences, Department of Statistics
- Abstract/Description
-
Our view is that while some of the basic principles of data analysis are going to remain unchanged, others are to be gradually replaced with geometry and topology methods. Linear methods still make sense for functional data analysis, or in the context of tangent bundles of object spaces. Complex nonstandard data are represented on object spaces. An object space admitting a manifold stratification may be embedded in a Euclidean space. One defines the extrinsic energy distance associated with two probability measures on an arbitrary object space embedded in a numerical space, and one introduces an extrinsic energy statistic to test for homogeneity of the distributions of two random objects (r.o.'s) on such an object space. This test is validated via a simulation example on the Kendall space of planar k-ads with a Veronese-Whitney (VW) embedding. One considers an application to medical imaging, to test for the homogeneity of the distributions of Kendall shapes of the midsections of the Corpus Callosum in a clinically normal population versus a population of ADHD-diagnosed individuals. Surprisingly, due to the high dimensionality, these distributions are not significantly different, although their VW means are known to differ highly significantly. New spread and location parameters are to be added to reflect the nontrivial topology of certain object spaces. TDA is going to be adapted to object spaces, and hypothesis testing for distributions is going to be based on extrinsic energy methods. For a random point on an object space embedded in a Euclidean space, the mean vector cannot be represented as a point on that space, except for the case when the embedded space is convex. To address this shortcoming, since the mean vector is the minimizer of the expected square distance, following Frechet (1948), on an embedded compact object space one may consider both minimizers and maximizers of the expected square distance to a given point on the embedded object space as the mean and anti-mean, respectively, of the random point. Of all distances on an object space, one considers here the chord distance associated with the embedding of the object space, since for such distances one can give a necessary and sufficient condition for the existence of a unique Frechet mean (respectively, Frechet anti-mean). For such distributions these location parameters are called the extrinsic mean (respectively, extrinsic anti-mean), and the corresponding sample statistics are consistent estimators of their population counterparts. Moreover, around an extrinsic mean (anti-mean) located at a smooth point, one derives the limiting distribution of such estimators.
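For the chord-distance location parameters defined above, the unit sphere gives the simplest concrete case: the Frechet function is E||X - p||^2 = 2 - 2 E[X].p for unit vectors, so the extrinsic mean is the normalized mean vector and the extrinsic anti-mean is its antipode (when E[X] is nonzero). A sketch on synthetic spherical data follows; it is only an illustration of the definitions, not the dissertation's methodology.

```python
import numpy as np

def extrinsic_mean_and_antimean(points):
    """Extrinsic mean and anti-mean of points on the unit sphere.

    Under the chord (embedding) distance, E||X - p||^2 = 2 - 2 E[X].p on the
    sphere, so the minimizer (extrinsic mean) is E[X]/||E[X]|| and the
    maximizer (extrinsic anti-mean) is its antipode, provided E[X] != 0.
    """
    m = np.mean(points, axis=0)
    norm = np.linalg.norm(m)
    if norm == 0:
        raise ValueError("extrinsic mean is not unique (mean vector is zero)")
    return m / norm, -m / norm

# Hypothetical sample clustered around the north pole of S^2.
rng = np.random.default_rng(8)
x = rng.normal([0, 0, 1], 0.2, size=(200, 3))
x /= np.linalg.norm(x, axis=1, keepdims=True)
mean, antimean = extrinsic_mean_and_antimean(x)
print("extrinsic mean:", np.round(mean, 3), " anti-mean:", np.round(antimean, 3))
```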
- Date Issued
- 2017
- Identifier
- FSU_SUMMER2017_Guo_fsu_0071E_13977
- Format
- Thesis
- Title
- A Bayesian Wavelet Based Analysis of Longitudinally Observed Skewed Heteroscedastic Responses.
- Creator
-
Baker, Danisha S. (Danisha Sharice), Chicken, Eric, Sinha, Debajyoti, Harper, Kristine, Pati, Debdeep, Florida State University, College of Arts and Sciences, Department of Statistics
- Abstract/Description
-
Unlike many of the current statistical models focusing on highly skewed longitudinal data, we present a novel model accommodating a skewed error distribution, a partial linear median regression function, a nonparametric wavelet expansion, and serial observations on the same unit. Parameters are estimated via a semiparametric Bayesian procedure using an appropriate Dirichlet process mixture prior for the skewed error distribution. We use a hierarchical mixture model as the prior for the wavelet coefficients. For the "vanishing" coefficients, the model includes a level-dependent prior probability mass at zero. This practice implements wavelet coefficient thresholding as a Bayesian rule. Practical advantages of our method are illustrated through a simulation study and via analysis of a cardiotoxicity study of children of HIV-infected mothers.
- Date Issued
- 2017
- Identifier
- FSU_SUMMER2017_Baker_fsu_0071E_14036
- Format
- Thesis
- Title
- Regression Methods for Skewed and Heteroscedastic Response with High-Dimensional Covariates.
- Creator
-
Wang, Libo, Sinha, Debajyoti, Taylor, Miles G., Pati, Debdeep, She, Yiyuan, Yang, Yun (Professor of Statistics), Florida State University, College of Arts and Sciences, Department of Statistics
- Abstract/Description
-
The rise of studies with high-dimensional potential covariates has invited a renewed interest in dimension reduction that promotes more parsimonious models, ease of interpretation and computational tractability. However, current variable selection methods restricted to continuous responses often assume a Gaussian response for methodological as well as theoretical developments. In this thesis, we consider regression models that induce sparsity, gain prediction power, and accommodate response distributions beyond the Gaussian with common variance. The first part of this thesis is a transform-both-sides Bayesian variable selection model (TBS) which allows skewness, heteroscedasticity and extremely heavy-tailed responses. Our method develops a framework which facilitates computationally feasible inference in spite of inducing non-local priors on the original regression coefficients. Even though the transformed conditional mean is no longer linear with respect to the covariates, we still prove the consistency of our Bayesian TBS estimators. Simulation studies and real data analysis demonstrate the advantages of our methods. Another main part of this thesis addresses the above challenges from a frequentist standpoint. This model incorporates a penalized likelihood to accommodate skewed responses arising from an epsilon-skew-normal (ESN) distribution. With suitable optimization techniques to handle this two-piece penalized likelihood, our method demonstrates substantial gains in sensitivity and specificity even under high-dimensional settings. We conclude this thesis with a novel Bayesian semi-parametric modal regression method along with its implementation and simulation studies.
- Date Issued
- 2017
- Identifier
- FSU_SUMMER2017_Wang_fsu_0071E_13950
- Format
- Thesis
- Title
- Nonparametric Change Point Detection Methods for Profile Variability.
- Creator
-
Geneus, Vladimir J. (Vladimir Jacques), Chicken, Eric, Liu, Guosheng (Professor of Earth, Ocean and Atmospheric Science), Sinha, Debajyoti, Zhang, Xin (Professor of Engineering), Florida State University, College of Arts and Sciences, Department of Statistics
- Abstract/Description
-
Because of the importance of detecting profile changes in devices such as medical apparatus, measuring change points in the variability of functional observations is important. In a sequence of functional observations (each of the same length), we wish to determine as quickly as possible when a change in the observations has occurred. Wavelet-based change point methods are proposed that determine when the variability of the noise in a sequence of functional profiles (e.g., the precision profile of a medical device) goes out of control, either from a known, fixed value or from an estimated in-control value. Various methods have been proposed which focus on changes in the form of the function. One method, the NEWMA, based on the EWMA, focuses on changes in both; however, its drawback is that the form of the in-control function must be known. Other methods, including the χ² charts for Phase I and Phase II, make assumptions about the function. Our interest, however, is in detecting changes in the variance from one function to the next. In particular, we are interested not in differences from one profile to another (variance between), but rather in differences in variance within profiles (variance within). The functional portion of the profiles is allowed to come from a large class of functions and may vary from profile to profile. The estimator is evaluated under a variety of conditions, including allowing the wavelet noise subspace to be substantially contaminated by the profile's functional structure, and is compared to two competing noise monitoring methods. Nikoo and Noorossana (2013) propose a nonparametric wavelet regression method that uses two change point techniques to monitor the variance: a nonparametric control chart, via the mean of m median control charts, and a parametric control chart, via the χ² distribution. We propose improvements to their method by incorporating prior data and making use of likelihood ratios. Our methods make use of the orthogonal properties of wavelet projections to accurately and efficiently monitor the level of noise from one profile to the next and to detect changes in noise in the Phase II setting. We show through simulation results that our proposed methods have better power and are more robust against the confounding effect between variance estimation and function estimation. The proposed methods are shown, through an extensive simulation study, to be very efficient at detecting when the variability has changed. Extensions are considered that explore the use of windowing and estimated in-control values for the MAD method, and the effect of the exact distribution under normality rather than the asymptotic distribution. These developments are implemented in the parametric, nonparametric scale, and completely nonparametric settings. The proposed methodologies are tested through simulation, are applicable to various biometric and health-related topics, and have the potential to improve computational efficiency and reduce the number of assumptions required.
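A minimal sketch of the wavelet idea behind monitoring within-profile variability: the MAD of the finest-level Haar detail coefficients tracks the noise standard deviation of each profile while being largely insensitive to the smooth functional part. The Haar transform, the shift point, and the profiles below are illustrative assumptions, not the proposed control charts.

```python
import numpy as np

def haar_detail(x):
    """Finest-level Haar wavelet detail coefficients of a profile (even length)."""
    x = np.asarray(x, float)
    return (x[0::2] - x[1::2]) / np.sqrt(2.0)

def noise_sigma(x):
    """Robust noise-level estimate: MAD of the finest detail coefficients.

    A smooth functional part contributes little at the finest scale, so the
    MAD of these coefficients tracks the noise standard deviation.
    """
    d = haar_detail(x)
    return np.median(np.abs(d - np.median(d))) / 0.6745

# Hypothetical sequence of profiles whose noise level shifts after profile 30.
rng = np.random.default_rng(9)
t = np.linspace(0, 1, 256)
sigmas = []
for j in range(60):
    sd = 0.10 if j < 30 else 0.18
    profile = np.sin(4 * np.pi * t) + 0.5 * t**2 + rng.normal(0, sd, size=t.size)
    sigmas.append(noise_sigma(profile))
print("mean estimate before shift:", round(float(np.mean(sigmas[:30])), 3),
      " after shift:", round(float(np.mean(sigmas[30:])), 3))
```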
- Date Issued
- 2017
- Identifier
- FSU_SUMMER2017_Geneus_fsu_0071E_13862
- Format
- Thesis
- Title
- Scalable and Structured High Dimensional Covariance Matrix Estimation.
- Creator
-
Sabnis, Gautam, Pati, Debdeep, Kercheval, Alec N., Sinha, Debajyoti, Chicken, Eric, Florida State University, College of Arts and Sciences, Department of Statistics
- Abstract/Description
-
With rapid advances in data acquisition and storage techniques, modern scientific investigations in epidemiology, genomics, imaging and networks are increasingly producing challenging data structures in the form of high-dimensional vectors, matrices and multiway arrays (tensors), rendering traditional statistical and computational tools inappropriate. One hope for meaningful inferences in such situations is to discover an inherent lower-dimensional structure that explains the physical or biological process generating the data. The structural assumptions impose constraints that force the objects of interest to lie in lower-dimensional spaces, thereby facilitating their estimation and interpretation and, at the same time, reducing computational burden. The assumption of an inherent structure, motivated by various scientific applications, is often adopted as the guiding light in the analysis and is fast becoming a standard tool for parsimonious modeling of such high-dimensional data structures. The content of this thesis is specifically directed towards methodological development of statistical tools, with attractive computational properties, for drawing meaningful inferences through such structures. The third chapter of this thesis proposes a distributed computing framework, based on a divide-and-conquer strategy and hierarchical modeling, to accelerate posterior inference for high-dimensional Bayesian factor models. Our approach distributes the task of high-dimensional covariance matrix estimation to multiple cores, solves each subproblem separately via a latent factor model, and then combines these estimates to produce a global estimate of the covariance matrix. Existing divide-and-conquer methods focus exclusively on dividing the total number of observations n into subsamples while keeping the dimension p fixed. The approach is novel in this regard: it includes all of the n samples in each subproblem and, instead, splits the dimension p into smaller subsets for each subproblem. The subproblems themselves can be challenging to solve when p is large due to the dependencies across dimensions. To circumvent this issue, a novel hierarchical structure is specified on the latent factors that allows for flexible dependencies across dimensions, while still maintaining computational efficiency. Our approach is readily parallelizable and is shown to have computational efficiency several orders of magnitude better than fitting a full factor model. The fourth chapter of this thesis proposes a novel way of estimating a covariance matrix that can be represented as a sum of a low-rank matrix and a diagonal matrix. The proposed method compresses high-dimensional data, computes the sample covariance in the compressed space, and lifts it back to the ambient space via a decompression operation. A salient feature of our approach relative to the existing literature on combining sparsity and low-rank structures in covariance matrix estimation is that we do not require the low-rank component to be sparse. A principled framework for estimating the compressed dimension using Stein's Unbiased Risk Estimation theory is demonstrated. In the final chapter of this thesis, we tackle the problem of variable selection in high dimensions. Consistent model selection in high dimensions has received substantial interest in recent years and is an extremely challenging problem for Bayesians. The literature on model selection with continuous shrinkage priors is even less developed due to the unavailability of exact zeros in the posterior samples of the parameters of interest. Heuristic methods based on thresholding the posterior mean are often used in practice; these lack theoretical justification, and inference is highly sensitive to the choice of the threshold. We aim to address the problem of selecting variables through a novel method of post-processing the posterior samples.
- Date Issued
- 2017
- Identifier
- FSU_SUMMER2017_Sabnis_fsu_0071E_14043
- Format
- Thesis
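As an illustration of the low-rank-plus-diagonal covariance structure mentioned in the entry above, the following is a minimal Python sketch based on a plain eigendecomposition; it is not the compression/decompression estimator or the distributed factor-model framework from the thesis, and the rank `k` and function names are illustrative assumptions.

```python
import numpy as np

def lowrank_plus_diag(sample_cov, k):
    """Generic illustration: approximate a covariance matrix as
    (low-rank part from the top-k eigenpairs) + (diagonal residual).
    Not the compression-based estimator described in the thesis."""
    vals, vecs = np.linalg.eigh(sample_cov)          # eigenvalues in ascending order
    top = slice(len(vals) - k, len(vals))            # indices of the top-k pairs
    low_rank = (vecs[:, top] * vals[top]) @ vecs[:, top].T
    diag = np.diag(np.clip(np.diag(sample_cov - low_rank), 0.0, None))
    return low_rank + diag, low_rank, diag

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 30)) @ rng.standard_normal((30, 30))
S = np.cov(X, rowvar=False)
est, L, D = lowrank_plus_diag(S, k=5)
print(np.linalg.norm(S - est))                       # approximation error
```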
- Title
- An Examination of the Relationship between Alcohol and Dementia in a Longitudinal Study.
- Creator
-
Hu, Tingting, McGee, Daniel, Slate, Elizabeth H., Hurt, Myra M., Niu, Xufeng, Florida State University, College of Arts and Sciences, Department of Statistics
- Abstract/Description
-
The high mortality rate and huge expenditure caused by dementia make it a pressing concern for public health researchers. Among the potential risk factors in diet and nutrition, the relation between alcohol usage and dementia has been investigated in many studies, but no clear picture has emerged. This association has been reported as protective, neurotoxic, U-shaped, and insignificant in different sources. An individual's alcohol usage is dynamic and can change over time; however, to our knowledge, only one study took this time-varying nature into account when assessing the association between alcohol intake and cognition. Using Framingham Heart Study (FHS) data, our work fills an important gap in that both alcohol use and dementia status were included in the analysis longitudinally. Furthermore, we incorporated a gender-specific categorization of alcohol consumption. In this study, we examined three aspects of the association: (1) concurrent alcohol usage and dementia, longitudinally, (2) past alcohol usage and later dementia, and (3) cumulative alcohol usage and dementia. The data consisted of 2,192 FHS participants who took Exams 17-23 during 1981-1996, which included dementia assessment, and had complete data on alcohol use (mean follow-up = 40 years) and key covariates. Cognitive status was determined using information from the Mini-Mental State Examinations (MMSE) and the examiner's assessment. Alcohol consumption was determined in oz/week and also categorized as none, moderate and heavy. We investigated both total alcohol consumption and consumption by type of alcoholic beverage. Results showed that the association between alcohol and dementia may differ by gender and by type of alcoholic beverage.
- Date Issued
- 2018
- Identifier
- 2018_Su_Hu_fsu_0071E_14330
- Format
- Thesis
- Title
- A Study of Some Issues of Goodness-of-Fit Tests for Logistic Regression.
- Creator
-
Ma, Wei, McGee, Daniel, Mai, Qing, Levenson, Cathy W., Niu, Xufeng, Florida State University, College of Arts and Sciences, Department of Statistics
- Abstract/Description
-
Goodness-of-fit tests are important to assess how well a model fits a set of observations. The Hosmer-Lemeshow (HL) test is a popular and commonly used method to assess the goodness-of-fit of logistic regression. However, there are two issues with using the HL test. One is that we have to specify the number of partition groups, and different groupings often suggest different decisions. In this study, we therefore propose several grouping tests that combine multiple HL tests with varying numbers of groups to make the decision, instead of using one arbitrary grouping or searching for an optimal one; the best choice of grouping is data-dependent and not easy to find. The other drawback of the HL test is that it has low power to detect missing interactions between continuous and dichotomous covariates. Therefore, we propose global and interaction tests in order to capture such violations. Simulation studies are carried out to assess the Type I errors and powers of all the proposed tests. These tests are illustrated with the bone mineral density data from NHANES III.
- Date Issued
- 2018
- Identifier
- 2018_Su_Ma_fsu_0071E_14681
- Format
- Thesis
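The grouping issue discussed in the entry above starts from the classical Hosmer-Lemeshow statistic; a minimal sketch of that standard statistic follows, assuming g roughly equal-size groups formed by sorting the fitted probabilities. The grouping-combination, global, and interaction tests proposed in the thesis are not reproduced.

```python
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(y, p_hat, g=10):
    """Classic HL goodness-of-fit statistic for a fitted logistic model.
    y: 0/1 outcomes, p_hat: fitted probabilities, g: number of groups."""
    order = np.argsort(p_hat)
    groups = np.array_split(order, g)                # roughly equal-size groups
    stat = 0.0
    for idx in groups:
        n_k = len(idx)
        obs = y[idx].sum()                           # observed events in group
        exp = p_hat[idx].sum()                       # expected events in group
        stat += (obs - exp) ** 2 / (exp * (1 - exp / n_k))
    return stat, chi2.sf(stat, df=g - 2)             # statistic and p-value

rng = np.random.default_rng(1)
x = rng.normal(size=500)
p_true = 1 / (1 + np.exp(-(0.5 + x)))
y = rng.binomial(1, p_true)
print(hosmer_lemeshow(y, p_true, g=10))
```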
- Title
- AP Student Visual Preferences for Problem Solving.
- Creator
-
Swoyer, Liesl, Department of Statistics
- Abstract/Description
-
The purpose of this study is to explore the mathematical preferences of high school AP Calculus students by examining their tendencies for using differing methods of thought. A student's preferred mode of thinking was measured on a scale ranging from a preference for analytical thought to a preference for visual thought as they completed derivative and antiderivative tasks presented both algebraically and graphically. This relates to previous studies by continuing to analyze the factors that have been found to mediate students' performance and preference in regard to a variety of calculus tasks. Data were collected by Dr. Erhan Haciomeroglu at the University of Central Florida. Students' preferences were not affected by gender. Students were found to approach graphical and algebraic tasks similarly, without any significant change with regard to the derivative or antiderivative nature of the tasks. Highly analytic and highly visual students revealed the same proportion of change in visuality as harmonic students when more difficult calculus tasks were encountered. Thus, a strong preference for visual thinking when completing algebraic tasks was not the determining factor of their preferred method of thinking when approaching graphical tasks.
- Date Issued
- 2012
- Identifier
- FSU_migr_uhm-0052
- Format
- Thesis
- Title
- Elastic Functional Principal Component Analysis for Modeling and Testing of Functional Data.
- Creator
-
Duncan, Megan, Srivastava, Anuj, Klassen, E., Huffer, Fred W., Wu, Wei, Florida State University, College of Arts and Sciences, Department of Statistics
- Abstract/Description
-
Statistical analysis of functional data requires tools for comparing, summarizing and modeling observed functions as elements of a function space. A key issue in Functional Data Analysis (FDA) is the presence of phase variability in the observed data. A successful statistical model of functional data has to account for the presence of phase variability; otherwise the ensuing inferences can be inferior. Recent methods for FDA include steps for phase separation or functional alignment. For example, Elastic Functional Principal Component Analysis (Elastic FPCA) uses the strengths of Functional Principal Component Analysis (FPCA), along with tools from Elastic FDA, to perform joint phase-amplitude separation and modeling. A related problem in FDA is to quantify and test for the amount of phase in given data. We develop two types of hypothesis tests for testing the significance of phase variability: a metric-based approach and a model-based approach. The metric-based approach treats phase and amplitude as independent components and uses their respective metrics to apply the Friedman-Rafsky test, Schilling's nearest-neighbors test, and the energy test to the differences between functions and their amplitudes. In the model-based test, we use concordance correlation coefficients to quantify the agreement between functions and their reconstructions using FPCA and Elastic FPCA. We demonstrate this framework on a number of simulated and real datasets, including weather, Tecator, and growth data.
- Date Issued
- 2018
- Identifier
- 2018_Sp_Duncan_fsu_0071E_14470
- Format
- Thesis
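For orientation on the entry above, here is a plain (non-elastic) FPCA sketch for functions discretized on a common grid; the elastic phase-amplitude separation and the hypothesis tests described in the abstract are beyond this snippet, and all names are illustrative.

```python
import numpy as np

def fpca(curves, n_components=3):
    """Plain FPCA: eigen-decompose the sample covariance of curves observed
    on a shared grid. curves has shape (n_curves, n_grid_points)."""
    mean_curve = curves.mean(axis=0)
    centered = curves - mean_curve
    cov = centered.T @ centered / (len(curves) - 1)
    vals, vecs = np.linalg.eigh(cov)                   # ascending order
    vals, vecs = vals[::-1], vecs[:, ::-1]             # re-sort descending
    components = vecs[:, :n_components]
    scores = centered @ components                     # principal-component scores
    return mean_curve, components, scores, vals[:n_components]

t = np.linspace(0, 1, 101)
rng = np.random.default_rng(2)
curves = np.array([np.sin(2 * np.pi * (t + rng.normal(0, 0.05)))
                   + rng.normal(0, 0.1, t.size) for _ in range(40)])
mean_curve, comps, scores, ev = fpca(curves)
print(scores.shape, ev)
```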
- Title
- Elastic Functional Regression Model.
- Creator
-
Ahn, Kyungmin, Srivastava, Anuj, Klassen, E., Wu, Wei, Huffer, Fred W., Florida State University, College of Arts and Sciences, Department of Statistics
- Abstract/Description
-
Functional variables serve important roles as predictors in a variety of pattern recognition and vision applications. Focusing on a specific subproblem, termed scalar-on-function regression, most current approaches adopt the standard L2 inner product to form a link between functional predictors and scalar responses. These methods may perform poorly when predictor functions contain nuisance phase variability, i.e., when predictors are temporally misaligned due to noise. While a simple solution could be to pre-align predictors as a pre-processing step before applying a regression model, this alignment is seldom optimal from the perspective of regression. In this dissertation, we propose a new approach, termed elastic functional regression, where alignment is included in the regression model itself and is performed in conjunction with the estimation of other model parameters. This model is based on a norm-preserving warping of predictors, not the standard time warping of functions, and provides better prediction in situations where the shape or the amplitude of the predictor is more useful than its phase. We demonstrate the effectiveness of this framework using simulated and real data.
- Date Issued
- 2018
- Identifier
- 2018_Sp_Ahn_fsu_0071E_14452
- Format
- Thesis
- Title
- Building a Model Performance Measure for Examining Clinical Relevance Using Net Benefit Curves.
- Creator
-
Mukherjee, Anwesha, McGee, Daniel, Hurt, Myra M., Slate, Elizabeth H., Sinha, Debajyoti, Florida State University, College of Arts and Sciences, Department of Statistics
- Abstract/Description
-
ROC curves are often used to evaluate the predictive accuracy of statistical prediction models. This thesis studies other measures which incorporate not only the statistical but also the clinical consequences of using a particular prediction model. Depending on the disease and population under study, the misclassification costs of false positives and false negatives vary. Decision Curve Analysis (DCA) takes this cost into account by using the threshold probability (the probability above which a patient opts for treatment). Using the DCA technique, a net benefit curve is built by plotting "Net Benefit", a function of the expected benefit and expected harm of using a model, against the threshold probability. Only the threshold probability range that is relevant to the disease and the population under study is used to plot the net benefit curve, so that optimal results are obtained for a particular statistical model. This thesis concentrates on constructing a summary measure to determine which predictive model yields the highest net benefit. The most intuitive approach is to calculate the area under the net benefit curve. We examined whether using weights, such as the estimated empirical distribution of the threshold probability, to compute a weighted area under the curve creates a better summary measure. Real data from multiple cardiovascular research studies, the Diverse Population Collaboration (DPC) datasets, are used to compute the summary measures: the area under the ROC curve (AUROC), the area under the net benefit curve (ANBC) and the weighted area under the net benefit curve (WANBC). The results of the analysis are used to examine whether these measures agree with each other and which would be best to use in specified clinical scenarios. For different models the summary measures and their standard errors (SE) were calculated to study the variability in the measures. Meta-analysis is used to summarize these estimated summary measures and to reveal whether there is significant variability among the studies.
- Date Issued
- 2018
- Identifier
- 2018_Sp_Mukherjee_fsu_0071E_14350
- Format
- Thesis
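The net benefit quantity in the entry above has a standard decision-curve form, NB(pt) = TP/n - FP/n * pt/(1 - pt); a minimal sketch of that curve and a trapezoidal area under it follows. The threshold range and names are illustrative, and the weighted-area variant studied in the thesis is not shown.

```python
import numpy as np

def net_benefit_curve(y, p_hat, thresholds):
    """Net benefit of treating patients whose predicted risk exceeds each threshold."""
    n = len(y)
    nb = []
    for pt in thresholds:
        treat = p_hat >= pt
        tp = np.sum(treat & (y == 1))
        fp = np.sum(treat & (y == 0))
        nb.append(tp / n - fp / n * pt / (1 - pt))
    return np.array(nb)

def area_under_net_benefit(y, p_hat, lo=0.05, hi=0.35, num=61):
    """Unweighted area over a clinically relevant threshold range (ANBC-style)."""
    thresholds = np.linspace(lo, hi, num)
    nb = net_benefit_curve(y, p_hat, thresholds)
    return np.trapz(nb, thresholds)

rng = np.random.default_rng(3)
p_hat = rng.uniform(0, 1, 1000)
y = rng.binomial(1, p_hat)                     # well-calibrated toy predictions
print(area_under_net_benefit(y, p_hat))
```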
- Title
- Non-Parametric and Semi-Parametric Estimation and Inference with Applications to Finance and Bioinformatics.
- Creator
-
Tran, Hoang Trong, She, Yiyuan, Ökten, Giray, Chicken, Eric, Niu, Xufeng, Tao, Minjing, Florida State University, College of Arts and Sciences, Department of Statistics
- Abstract/Description
-
In this dissertation, we develop tools from non-parametric and semi-parametric statistics to perform estimation and inference. In the first chapter, we propose a new method called Non-Parametric Outlier Identification and Smoothing (NOIS), which robustly smooths stock prices, automatically detects outliers and constructs pointwise confidence bands around the resulting curves. In real-world examples of high-frequency data, NOIS successfully detects erroneous prices as outliers and uncovers borderline cases for further study. NOIS can also highlight notable features and reveal new insights in inter-day chart patterns. In the second chapter, we focus on a method for non-parametric inference called empirical likelihood (EL). Computation of EL in the case of a fixed parameter vector is a convex optimization problem easily solved by Lagrange multipliers. In the case of a composite empirical likelihood (CEL) test, where certain components of the parameter vector are free to vary, the optimization problem becomes non-convex and much more difficult. We propose a new algorithm for the CEL problem named the BI-Linear Algorithm for Composite EmPirical Likelihood (BICEP). We extend the BICEP framework by introducing a new method called Robust Empirical Likelihood (REL) that detects outliers and greatly improves the inference in comparison to the non-robust EL. The REL method is combined with CEL in the TRI-Linear Algorithm for Composite EmPirical Likelihood (TRICEP). We demonstrate the efficacy of the proposed methods on simulated and real-world datasets. We present a novel semi-parametric method for variable selection with interesting biological applications in the final chapter. In bioinformatics datasets the experimental units often have structured relationships that are non-linear and hierarchical. For example, in microbiome data the individual taxonomic units are connected to each other through a phylogenetic tree. Conventional techniques for selecting relevant taxa either do not account for the pairwise dependencies between taxa, or assume linear relationships. In this work we propose a new framework for variable selection called Semi-Parametric Affinity Based Selection (SPAS), which has the flexibility to utilize structured and non-parametric relationships between variables. In synthetic data experiments SPAS outperforms existing methods, and on real-world microbiome datasets it selects taxa according to their phylogenetic similarities.
- Date Issued
- 2018
- Identifier
- 2018_Sp_Tran_fsu_0071E_14477
- Format
- Thesis
- Title
- Bayesian Analysis of Survival Data with Missing Censoring Indicators and Simulation of Interval Censored Data.
- Creator
-
Bunn, Veronica, Sinha, Debajyoti, Brownstein, Naomi Chana, Slate, Elizabeth H., Linero, Antonio Ricardo, Florida State University, College of Arts and Sciences, Department of Statistics
- Abstract/Description
-
In some large clinical studies, it may be impractical to give physical examinations to every subject at his/her last monitoring time in order to diagnose the occurrence of an event of interest. This challenge creates survival data with missing censoring indicators, where the probability of missing may depend on the time of last monitoring. We present a fully Bayesian semi-parametric method for such survival data to estimate regression parameters of Cox's proportional hazards model [Cox, 1972]. Simulation studies show that our method performs better than competing methods. We apply the proposed method to data from the Orofacial Pain: Prospective Evaluation and Risk Assessment (OPPERA) study. Clinical studies often include interval censored data. We present a method for the simulation of interval censored data based on Poisson processes. We show that our method gives simulated data that fulfill the assumption of independent interval censoring, and is more computationally efficient than other methods used for simulating interval censored data.
- Date Issued
- 2018
- Identifier
- 2018_Su_Bunn_fsu_0071E_14742
- Format
- Thesis
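A hedged sketch of one way to simulate interval-censored data with Poisson-process inspection times, in the spirit of the entry above; the thesis's exact construction and its independent-censoring argument may differ, and the rate and horizon parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

def simulate_interval_censored(n, event_rate=0.3, visit_rate=1.0, horizon=10.0):
    """For each subject: draw a true event time, draw inspection times from a
    homogeneous Poisson process on [0, horizon], and report the interval
    (L, R] of consecutive inspections that brackets the event time."""
    records = []
    for _ in range(n):
        t_event = rng.exponential(1 / event_rate)
        n_visits = rng.poisson(visit_rate * horizon)
        visits = np.sort(rng.uniform(0, horizon, n_visits))
        grid = np.concatenate(([0.0], visits, [np.inf]))   # inf = right-censored
        j = np.searchsorted(grid, t_event, side="left")
        records.append((grid[j - 1], grid[j]))              # interval containing the event
    return records

print(simulate_interval_censored(5))
```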
- Title
- Generalized Mahalanobis Depth in Point Process and Its Application in Neural Coding and Semi-Supervised Learning in Bioinformatics.
- Creator
-
Liu, Shuyi, Wu, Wei, Wang, Xiaoqiang, Zhang, Jinfeng, Mai, Qing, Florida State University, College of Arts and Sciences, Department of Statistics
- Abstract/Description
-
In the first project, we propose to generalize the notion of depth to temporal point process observations. The new depth is defined as a weighted product of two probability terms: 1) the number of events in each process, and 2) the center-outward ranking of the event times conditioned on the number of events. In this study, we adopt the Poisson distribution for the first term and the Mahalanobis depth for the second term. We propose an efficient bootstrapping approach to estimate the parameters in the defined depth. In the case of a Poisson process, the observed events are order statistics, and the parameters can be estimated robustly with respect to sample size. We demonstrate the use of the new depth by ranking realizations from a Poisson process. We also test the new method in classification problems using simulations as well as real neural spike train data. It is found that the new framework provides more accurate and robust classifications than commonly used likelihood methods. In the second project, we demonstrate the value of semi-supervised dimension reduction in a clinical setting. Semi-supervised dimension reduction uses unlabeled data in addition to labeled data when reducing dimension, and it can help build a more precise prediction model than common supervised dimension reduction techniques. After a thorough comparison with dimension embedding methods that use labeled data only, we show the improvement gained from unlabeled data by semi-supervised dimension reduction in a breast cancer chemotherapy application. In our semi-supervised dimension reduction method, we not only explore adding unlabeled data to linear dimension reduction such as PCA, but also explore semi-supervised non-linear dimension reduction, such as semi-supervised LLE and semi-supervised Isomap.
- Date Issued
- 2018
- Identifier
- 2018_Sp_Liu_fsu_0071E_14367
- Format
- Thesis
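A toy version of the two-part depth described in the entry above: a Poisson term for the event count combined with a Mahalanobis-type term for the event times given that count. The weighting, estimation, and bootstrapping details of the thesis are not reproduced; all parameter values and names are illustrative.

```python
import numpy as np
from scipy.stats import poisson

def mahalanobis_depth(x, mean, cov):
    """Classical Mahalanobis depth: 1 / (1 + squared Mahalanobis distance)."""
    d = x - mean
    return 1.0 / (1.0 + d @ np.linalg.solve(cov, d))

def point_process_depth(event_times, lam, mean_by_count, cov_by_count, w=0.5):
    """Toy depth = (Poisson pmf of the count)**w * (conditional Mahalanobis depth)**(1-w).
    mean_by_count / cov_by_count hold conditional moments for each event count."""
    n = len(event_times)
    if n not in mean_by_count:
        return 0.0
    count_term = poisson.pmf(n, lam)
    time_term = mahalanobis_depth(np.asarray(event_times),
                                  mean_by_count[n], cov_by_count[n])
    return count_term ** w * time_term ** (1 - w)

# Illustrative conditional moments for realizations with exactly 2 events on [0, 1]
mean_by_count = {2: np.array([1 / 3, 2 / 3])}
cov_by_count = {2: np.array([[0.02, 0.01], [0.01, 0.02]])}
print(point_process_depth([0.30, 0.70], lam=2.0,
                          mean_by_count=mean_by_count, cov_by_count=cov_by_count))
```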
- Title
- Volatility Matrix Estimation for High-Frequency Financial Data.
- Creator
-
Xue, Yang, Tao, Minjing, Cheng, Yingmei, Fendler, Rachel Loveitt, Huffer, Fred W., Niu, Xufeng, Florida State University, College of Arts and Sciences, Department of Statistics
- Abstract/Description
-
Volatility is usually employed to measure the dispersion of asset returns, and it is widely used in risk analysis and asset management. The first chapter studies a kernel-based spot volatility matrix estimator with a pre-averaging approach for high-frequency data contaminated by market microstructure noise. When the sample size goes to infinity and the bandwidth vanishes, we show that our estimator is consistent, and its asymptotic normality is established with an optimal convergence rate. We also construct a consistent pairwise spot co-volatility estimator with the Hayashi-Yoshida method for non-synchronous high-frequency data with noise contamination. The simulation studies demonstrate that the proposed estimators work well under different noise levels, and their estimation performance improves with increasing sampling frequency. In empirical applications, we implement the estimators on the intraday prices of four component stocks of the Dow Jones Industrial Average. The second chapter presents a factor-based vast volatility matrix estimation method for high-frequency financial data with market microstructure noise, finite large jumps and infinite-activity small jumps. We construct the sample volatility matrix estimator based on the approximate factor model, and use the pre-averaging and thresholding estimation method (PATH) to digest the noise and jumps. After using principal component analysis (PCA) to decompose the sample volatility matrix estimator, our proposed volatility matrix estimator is obtained by imposing block-diagonal regularization on the residual covariance matrix, sorting the assets by their Global Industry Classification Standard (GICS) codes. The Monte Carlo simulation shows that our proposed volatility matrix estimator can remove the majority of the effects of noise and jumps, and its estimation performance improves quickly as the sampling frequency increases. Finally, the PCA-based estimators are employed to perform volatility matrix estimation and asset allocation for S&P 500 stocks. To compare with the PCA-based estimators, we also include exchange-traded fund (ETF) data to construct observable factors, such as the Fama-French factors, for volatility estimation.
- Date Issued
- 2018
- Identifier
- 2018_Sp_Xue_fsu_0071E_14471
- Format
- Thesis
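To fix notation for the entry above, here is a bare-bones realized covariance computed from synchronized intraday log returns; the kernel pre-averaging, Hayashi-Yoshida synchronization, and jump thresholding developed in the thesis are not shown, and the simulated inputs are illustrative.

```python
import numpy as np

def realized_covariance(prices):
    """prices: array of shape (n_times, n_assets) of synchronized intraday prices.
    Returns the realized covariance matrix: the sum of outer products of
    log returns over the day (no noise or jump correction)."""
    returns = np.diff(np.log(prices), axis=0)
    return returns.T @ returns

rng = np.random.default_rng(5)
n_times, n_assets = 390, 4                    # e.g. one-minute bars for 4 stocks
chol = np.linalg.cholesky(np.array([[1.0, 0.3, 0.2, 0.1],
                                    [0.3, 1.0, 0.3, 0.2],
                                    [0.2, 0.3, 1.0, 0.3],
                                    [0.1, 0.2, 0.3, 1.0]]) * 1e-6)
log_prices = np.cumsum(rng.standard_normal((n_times, n_assets)) @ chol.T, axis=0)
print(realized_covariance(np.exp(log_prices)))
```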
- Title
- Wavelet-Based Bayesian Approaches to Sequential Profile Monitoring.
- Creator
-
Varbanov, Roumen, Chicken, Eric, Linero, Antonio Ricardo, Huffenberger, Kevin M., Yang, Yanyun, Florida State University, College of Arts and Sciences, Department of Statistics
- Abstract/Description
-
We consider change-point detection and estimation in sequences of functional observations. This setting often arises when the quality of a process is characterized by such observations, termed profiles, and monitoring profiles for changes in structure can be used to ensure the stability of the process over time. While interest in profile monitoring has grown, few methods approach the problem from a Bayesian perspective. In this dissertation, we propose three wavelet-based Bayesian approaches to profile monitoring -- the last of which can be extended to a general process monitoring setting. First, we develop a general framework for the problem of interest in which we base inference on the posterior distribution of the change point without placing restrictive assumptions on the form of the profiles. The proposed method uses an analytic form of the posterior distribution in order to run online without relying on Markov chain Monte Carlo (MCMC) simulation. Wavelets, an effective tool for estimating nonlinear signals from noise-contaminated observations, enable the method to flexibly distinguish between sustained changes in profiles and the inherent variability of the process. Second, we modify the initial framework in a posterior approximation algorithm designed to utilize past information in a computationally efficient manner. We show that the approximation can detect changes of smaller magnitude better than traditional alternatives for curbing computational cost. Third, we introduce a monitoring scheme that allows an unchanged process to run infinitely long without a false alarm, while maintaining the ability to detect a change with probability one. We include theoretical results regarding these properties and illustrate the implementation of the scheme in the previously established framework. We demonstrate the efficacy of the proposed methods on simulated data, where they significantly outperform a relevant frequentist competitor.
- Date Issued
- 2018
- Identifier
- 2018_Sp_Varbanov_fsu_0071E_14513
- Format
- Thesis
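A toy discrete change-point posterior for a sequence of scalar summaries, loosely related to the entry above: it assumes known pre- and post-change means, a known noise level, and a uniform prior, none of which the thesis requires, and it omits the wavelet and online components entirely.

```python
import numpy as np

def changepoint_posterior(y, mu0, mu1, sigma, prior=None):
    """Posterior over the change-point location k (change after index k),
    with k = len(y) meaning 'no change yet'. Gaussian likelihood throughout."""
    n = len(y)
    loglik0 = -0.5 * ((y - mu0) / sigma) ** 2       # per-point log-lik (up to constants), pre-change
    loglik1 = -0.5 * ((y - mu1) / sigma) ** 2       # per-point log-lik (up to constants), post-change
    cum0 = np.concatenate(([0.0], np.cumsum(loglik0)))
    cum1 = np.concatenate(([0.0], np.cumsum(loglik1)))
    # log-likelihood if the change happens after position k (k = 0..n)
    loglik = np.array([cum0[k] + (cum1[n] - cum1[k]) for k in range(n + 1)])
    if prior is None:
        prior = np.full(n + 1, 1.0 / (n + 1))
    logpost = loglik + np.log(prior)
    logpost -= logpost.max()
    post = np.exp(logpost)
    return post / post.sum()

rng = np.random.default_rng(6)
y = np.concatenate([rng.normal(0, 1, 30), rng.normal(1.5, 1, 20)])
post = changepoint_posterior(y, mu0=0.0, mu1=1.5, sigma=1.0)
print(post.argmax())                                 # most probable change location
```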
- Title
- Tests and Classifications in Adaptive Designs with Applications.
- Creator
-
Chen, Qiusheng, Niu, Xufeng, McGee, Daniel, Slate, Elizabeth H., Zhang, Jinfeng, Florida State University, College of Arts and Sciences, Department of Statistics
- Abstract/Description
-
Statistical tests for biomarker identification and classification methods for patient grouping are two important topics in adaptive designs of clinical trials. In this article, we evaluate four test methods for biomarker identification: a model-based identification method, the popular t-test, the nonparametric Wilcoxon rank sum test, and the Least Absolute Shrinkage and Selection Operator (Lasso) method. For selecting the best classification methods in Stage 2 of an adaptive design, we examine classification methods including recently developed machine learning approaches such as Random Forest, Lasso and Elastic-Net Regularized Generalized Linear Models (Glmnet), Support Vector Machine (SVM), Gradient Boosting Machine (GBM), and Extreme Gradient Boosting (XGBoost). Statistical simulations are carried out to assess the performance of the biomarker identification methods and the classification methods. The best identification method and classification technique are selected based on the True Positive Rate (TPR, also called sensitivity) and the True Negative Rate (TNR, also called specificity). The optimal test method for gene identification and classification method for patient grouping are then applied to the Adaptive Signature Design (ASD) for the purpose of evaluating the performance of ASD in different situations, including simulated data and a real data set of breast cancer patients.
- Date Issued
- 2018
- Identifier
- 2018_Sp_Chen_fsu_0071E_14309
- Format
- Thesis
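The selection criteria named in the entry above, TPR (sensitivity) and TNR (specificity), are simple confusion-matrix ratios; a minimal sketch with illustrative labels follows.

```python
import numpy as np

def tpr_tnr(y_true, y_pred):
    """True positive rate (sensitivity) and true negative rate (specificity)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    return tp / (tp + fn), tn / (tn + fp)

print(tpr_tnr([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))   # roughly (0.667, 0.667)
```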
- Title
- Statistical Shape Analysis of Neuronal Tree Structures.
- Creator
-
Duncan, Adam, Srivastava, Anuj, Klassen, E., Wu, Wei, Huffer, Fred W., Florida State University, College of Arts and Sciences, Department of Statistics
- Abstract/Description
-
Neuron morphology plays a central role in characterizing the cognitive health and functionality of brain structures. The problem of quantifying neuron shapes, and capturing the statistical variability of these shapes, is difficult because axons and dendrites have tree structures that differ in both geometry and topology. In this work, we restrict attention to trees that consist of: (1) a main branch viewed as a parameterized curve in ℝ³, and (2) some number of secondary branches -- also parameterized curves in ℝ³ -- which emanate from the main branch at arbitrary points. We present two shape-analytic frameworks, each of which gives a metric structure to the set of such tree shapes. Both frameworks are based on an elastic metric on the space of curves with certain shape-preserving nuisance variables modded out. In the first framework, the side branches are treated as a continuum of curve-valued annotations to the main branch. In the second framework, the side branches are treated as discrete entities and are matched to each other by permutation. We show geodesic deformations between tree shapes in both frameworks, and we show Fréchet means and modes of variability, as well as cross-validated classification between different experimental groups using the second framework. We conclude with a smaller project which extends some of these ideas to more general weighted attributed graphs.
- Date Issued
- 2018
- Identifier
- 2018_Sp_Duncan_fsu_0071E_14500
- Format
- Thesis
- Title
- Two Studies on the Application of Machine Learning for Biomedical Big Data.
- Creator
-
Lung, Pei-Yau, Zhang, Jinfeng, Liu, Xiuwen, Barbu, Adrian G., Wu, Wei, Florida State University, College of Arts and Sciences, Department of Statistics
- Abstract/Description
-
Large volumes of genomic data and new scientific discoveries in biomedical research are being produced every day by laboratories in both academia and industry. However, two issues severely affect the usability of so-called biomedical big data: 1) the majority of public genomic data do not contain enough clinical information, and 2) scientific discoveries are stored in text as unstructured data. This dissertation presents two studies, which address each issue using machine learning methods, in order to maximize the usability of biomedical big data. In the first study, we infer missing clinical information using multiple gene expression data sets and a wide variety of machine learning methods. We proposed a new performance measure, the Proportion of Positives which can be predicted with High accuracy (PPH), to evaluate models in terms of their effectiveness in recovering data with missing clinical information. PPH estimates the percentage of data that can be recovered given a desired level of accuracy. The experimental results demonstrate the effectiveness of the predicted clinical information in downstream inference tasks. In the second study, we propose a three-stage computational method to automatically extract chemical-protein interactions (CPIs) from a given text. Our method extracts CPI-pairs and CPI-triplets from sentences, where a CPI-pair consists of a chemical compound and a protein name, and a CPI-triplet consists of a CPI-pair along with an interaction word describing their relationship. We extract a diverse set of features from sentences, which are used to build multiple machine learning models. Our models contain both simple features, which can be directly computed from sentences, and more sophisticated features derived using sentence structure analysis techniques. Our method performed best among systems that use non-deep-learning methods, and outperformed several deep-learning-based systems in track 5 of the BioCreative VI challenge. The features we designed in this study are informative and can be applied to other machine learning methods, including deep learning.
- Date Issued
- 2019
- Identifier
- 2019_Summer_Lung_fsu_0071E_15134
- Format
- Thesis
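One plausible reading of the PPH measure described in the entry above, offered purely as an illustration: rank samples by model confidence and report the largest fraction that can be labeled at or above a desired accuracy. The thesis's exact definition may differ, and all names here are assumptions.

```python
import numpy as np

def pph(y_true, p_hat, desired_accuracy=0.9):
    """Illustrative PPH-style measure: fraction of samples that can be labeled
    at or above a desired accuracy when taken in order of model confidence."""
    p_hat = np.asarray(p_hat, dtype=float)
    y_true = np.asarray(y_true)
    confidence = np.abs(p_hat - 0.5)                  # distance from the decision boundary
    order = np.argsort(-confidence)                   # most confident first
    correct = ((p_hat[order] >= 0.5).astype(int) == y_true[order])
    running_acc = np.cumsum(correct) / np.arange(1, len(y_true) + 1)
    ok = np.nonzero(running_acc >= desired_accuracy)[0]
    return 0.0 if len(ok) == 0 else (ok[-1] + 1) / len(y_true)

rng = np.random.default_rng(7)
p_hat = rng.uniform(0, 1, 2000)
y = rng.binomial(1, p_hat)
print(pph(y, p_hat, desired_accuracy=0.9))
```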
- Title
- Survival Analysis Using Bayesian Joint Models.
- Creator
-
Xu, Zhixing, Sinha, Debajyoti, Schatschneider, Christopher, Bradley, Jonathan R., Chicken, Eric, Lin, Lifeng, Florida State University, College of Arts and Sciences, Department of Statistics
- Abstract/Description
-
In many clinical studies, each patient is at risk of recurrent events as well as a terminating event. In Chapter 2, we present a novel latent-class based semiparametric joint model that offers a clinically meaningful and estimable association between the recurrence profile and the risk of termination. Unlike previous shared-frailty based joint models, this model has a coherent interpretation of the covariate effects on all relevant functions and model quantities, whether conditional or unconditional on the events history. We offer a fully Bayesian method for estimation and prediction using a complete specification of the prior process of the baseline functions. When there is a lack of prior information about the baseline functions, we derive a practical and theoretically justifiable partial-likelihood based semiparametric Bayesian approach. Our Markov chain Monte Carlo tools for both Bayesian methods are implementable via publicly available software. Practical advantages of our methods are illustrated via a simulation study and the analysis of a transplant study with recurrent Non-Fatal Graft Rejections (NFGR) and the terminating event of death due to total graft rejection. In Chapter 3, we are motivated by the important problem of estimating daily fine particulate matter (PM2.5) over the US. Tracking and estimating PM2.5 is important because PM2.5 has been shown to be directly related to mortality from lung and cardiovascular disease and stroke. That is, high values of PM2.5 constitute a public health problem in the US, and it is important that we estimate PM2.5 precisely to aid public policy decisions. Thus, we propose a Bayesian hierarchical model for high-dimensional "multi-type" responses, that is, a collection of correlated responses with different distributional assumptions (e.g., continuous skewed observations and count-valued observations). The Centers for Disease Control and Prevention (CDC) database provides counts of mortalities related to PM2.5 and daily averaged PM2.5, which are treated as responses in our analysis. Our model capitalizes on the shared conjugate structure between the Weibull (to model PM2.5), Poisson (to model disease mortalities), and multivariate log-gamma distributions, and uses dimension reduction to aid computation. Our model can also be used to improve the precision of estimates and to provide estimates at undisclosed/missing counties. We provide a simulation study to illustrate the performance of the model and give an in-depth analysis of the CDC dataset.
- Date Issued
- 2019
- Identifier
- 2019_Spring_Xu_fsu_0071E_15078
- Format
- Thesis
- Title
- Fused Lasso and Tensor Covariance Learning with Robust Estimation.
- Creator
-
Kunz, Matthew Ross, She, Yiyuan, Stiegman, Albert E., Mai, Qing, Chicken, Eric, Florida State University, College of Arts and Sciences, Department of Statistics
- Abstract/Description
-
With the increase in computation and data storage, a vast amount of information has been collected with scientific measurement devices. However, with this increase in data and the variety of domain applications, statistical methodology must be tailored to specific problems. This dissertation is focused on analyzing chemical information with an underlying structure. Robust fused lasso leverages information about the neighboring regression coefficient structure to create blocks of coefficients. Robust modifications are made to the mean to account for gross outliers in the data. This method is applied to near-infrared spectral measurements for the prediction of an aqueous analyte concentration and is shown to improve prediction accuracy. The robust estimation and structure analysis are extended by examining graph structures within a clustered tensor. The tensor is subjected to wavelet smoothing and robust sparse precision matrix estimation for a detailed look into the covariance structure. This methodology is applied to catalytic kinetics data, where the graph structure estimates the elementary steps within the reaction mechanism.
- Date Issued
- 2018
- Identifier
- 2018_Fall_Kunz_fsu_0071E_14844
- Format
- Thesis
- Title
- Marked Determinantal Point Processes.
- Creator
-
Feng, Yiming, Nolder, Craig, Niu, Xufeng, Bradley, Jonathan R., Huffer, Fred W. (Fred William), Florida State University, College of Arts and Sciences, Department of Statistics
- Abstract/Description
-
Determinantal point processes (DPPs), which can be defined by their correlation kernels with known moments, are useful models for point patterns where nearby points exhibit repulsion. They have many nice properties, such as closed-form densities, tractable estimation of parameterized families, and no edge effects. Univariate DPPs have been well studied, both in discrete and continuous settings, although their statistical applications are fairly recent and still rather limited, whereas multivariate DPPs, or the so-called multi-type marked DPPs, have been little explored. In this thesis, we propose a class of multivariate DPPs based on a block kernel construction. For the marked DPP, we show that the conditions for the existence of a DPP can easily be satisfied. The block construction allows us to model the individually marked DPPs as well as to control the scale of repulsion between points having different marks. Unlike other researchers who model the kernel function of a DPP, we model its spectral representation, which not only guarantees the existence of the multivariate DPP, but also makes simulation-based estimation methods readily available. In our research, we adopted a bivariate complex Fourier basis, which demonstrates nice properties such as constant intensity and approximate isotropy over short distances between nearby points. The parameterized block kernels can approximate commonly used covariance functions via Fourier expansion. The parameters can be estimated using maximum likelihood estimation, a Bayesian approach, or minimum contrast estimation.
- Date Issued
- 2019
- Identifier
- 2019_Spring_Feng_fsu_0071E_15011
- Format
- Thesis
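For orientation on the entry above, a discrete L-ensemble analogue of a DPP, where the probability of drawing a subset A is det(L_A)/det(L + I); the thesis works with continuous marked DPPs specified through spectral representations, which this sketch does not attempt, and the kernel below is illustrative.

```python
import numpy as np

def l_ensemble_probability(L, subset):
    """Probability that a discrete L-ensemble DPP draws exactly `subset`:
    det(L[subset, subset]) / det(L + I)."""
    idx = np.asarray(subset)
    L_sub = L[np.ix_(idx, idx)]
    num = np.linalg.det(L_sub) if len(idx) else 1.0
    return num / np.linalg.det(L + np.eye(len(L)))

# Repulsive kernel on 5 points of a line: similar (nearby) points rarely co-occur.
x = np.linspace(0, 1, 5)
L = np.exp(-((x[:, None] - x[None, :]) ** 2) / 0.1)
print(l_ensemble_probability(L, [0, 4]))   # well-separated pair: relatively likely
print(l_ensemble_probability(L, [0, 1]))   # nearby pair: less likely
```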
- Title
- Bayesian Tractography Using Geometric Shape Priors.
- Creator
-
Dong, Xiaoming, Srivastava, Anuj, Klassen, E. (Eric), Wu, Wei, Huffer, Fred W. (Fred William), Florida State University, College of Arts and Sciences, Department of Statistics
- Abstract/Description
-
Diffusion-weighted imaging (DWI) and tractography have been developed over decades and are key elements in recent, large-scale efforts for mapping the human brain. Together, the two techniques provide a unique way to access the macroscopic structure and connectivity of the human brain non-invasively and in vivo. The information obtained not only helps visualize brain connectivity and segment the brain into different functional areas, but also provides tools for understanding major cognitive diseases such as multiple sclerosis, schizophrenia, and epilepsy. Much effort has been put into this area. On the one hand, a vast spectrum of tractography algorithms has been developed in recent years, ranging from deterministic approaches through probabilistic methods to global tractography; on the other hand, various mathematical models, such as the diffusion tensor, multi-tensor models, spherical deconvolution, and Q-ball modeling, have been developed to better exploit the acquisition-dependent DWI signal. Despite considerable progress in this area, current methods still face many challenges, such as sensitivity to noise, many false positive/negative fibers, inability to handle complex fiber geometry, and high computational cost. More importantly, recent research has shown that, even with high-quality data, the results of current tractography methods may not improve, suggesting that it is unlikely that an anatomically accurate map of the human brain can be obtained solely from the diffusion profile. Motivated by these issues, this dissertation develops a global approach that incorporates anatomically validated geometric shape priors when reconstructing neuron fibers. The fiber tracts between regions of interest are initialized and updated via deformations based on gradients of the posterior energy defined in this work. This energy has contributions from the diffusion data, the shape prior information, and a roughness penalty. The dissertation first describes and demonstrates the proposed method on a 2D dataset and then extends it to 3D phantom data and real brain data. The results show that the proposed method is relatively immune to issues such as noise, complicated fiber structures like crossings and kissings, and false positive fibers, and achieves more interpretable tractography results.
- Date Issued
- 2019
- Identifier
- 2019_Spring_DONG_fsu_0071E_15144
- Format
- Thesis
- Title
- Envelopes, Subspace Learning and Applications.
- Creator
-
Wang, Wenjing, Zhang, Xin, Tao, Minjing, Li, Wen, Huffer, Fred W. (Fred William), Florida State University, College of Arts and Sciences, Department of Statistics
- Abstract/Description
-
The envelope model is a nascent dimension reduction technique. We focus on extending the envelope methodology to broader applications. In the first part of this thesis, we propose a common reducing subspace model that can simultaneously estimate covariance matrices, precision matrices, and their differences across multiple populations. This model leads to substantial dimension reduction and efficient parameter estimation. We explicitly quantify the efficiency gain through an asymptotic analysis. In the second part, we propose a set of new mixture models called CLEMM (Clustering with Envelope Mixture Models) that is based on the widely used Gaussian mixture model assumptions. The proposed CLEMM framework and the associated envelope-EM algorithms provide the foundations for envelope methodology in unsupervised and semi-supervised learning problems. We also illustrate the performance of these models with simulation studies and empirical applications. In the third part of this thesis, we extend envelope discriminant analysis from vector data to tensor data. Another study, on copula-based models for forecasting realized volatility matrices, is also included; this is an important financial application of estimating covariance matrices. We consider multivariate-t, Clayton, and bivariate t, Gumbel, and Clayton copulas to model and forecast one-day-ahead realized volatility matrices. Empirical results show that copula-based models can achieve significant performance both in terms of statistical precision and economic efficiency.
- Date Issued
- 2019
- Identifier
- 2019_Spring_Wang_fsu_0071E_15085
- Format
- Thesis
- Title
- A Bayesian Semiparametric Joint Model for Longitudinal and Survival Data.
- Creator
-
Wang, Pengpeng, Slate, Elizabeth H., Bradley, Jonathan R., Wetherby, Amy M., Lin, Lifeng, Florida State University, College of Arts and Sciences, Department of Statistics
- Abstract/Description
-
Many biomedical studies monitor both a longitudinal marker and a survival time on each subject under study. Modeling these two endpoints as joint responses has the potential to improve inference for both. We consider the approach of Brown and Ibrahim (2003), which proposes a Bayesian hierarchical semiparametric joint model. The model links the longitudinal and survival outcomes by incorporating the mean longitudinal trajectory as a predictor for the survival time. The usual parametric mixed effects model for the longitudinal trajectory is relaxed by using a Dirichlet process prior on the coefficients. A Cox proportional hazards model is then used for the survival time. The complicated joint likelihood increases the computational complexity. We develop a computationally efficient method by using a multivariate log-gamma distribution instead of a Gaussian distribution to model the data. We use Gibbs sampling combined with Neal's algorithm (2000) and the Metropolis-Hastings method for inference. Simulation studies illustrate the procedure and compare this log-gamma joint model with the Gaussian joint models. We apply this joint modeling method to a human immunodeficiency virus (HIV) dataset and a prostate-specific antigen (PSA) dataset.
- Date Issued
- 2019
- Identifier
- 2019_Spring_Wang_fsu_0071E_15120
- Format
- Thesis
- Title
- High-Dimensional Statistical Methods for Tensor Data and Efficient Algorithms.
- Creator
-
Pan, Yuqing, Mai, Qing, Zhang, Xin, Yu, Weikuan, Slate, Elizabeth H., Florida State University, College of Arts and Sciences, Department of Statistics
- Abstract/Description
-
In contemporary sciences, it is of great interest to study supervised and unsupervised learning problems for high-dimensional tensor data. In this dissertation, we develop new methods for tensor classification and clustering problems, and discuss algorithms that enhance their performance. For supervised learning, we propose the CATCH model, short for Covariate-Adjusted Tensor Classification in High-dimensions, which efficiently integrates low-dimensional covariates and the tensor to perform classification and variable selection. The CATCH model preserves and utilizes the structure of the data for maximum interpretability and optimal prediction. We propose a penalized approach to select a subset of tensor predictor entries that have direct discriminative effects after adjusting for covariates. Theoretical results confirm that our approach achieves variable selection consistency and optimal classification accuracy. For unsupervised learning, we consider the clustering problem for high-dimensional tensor data and propose an efficient procedure based on the EM algorithm. It directly estimates the sparse discriminant vector from a penalized objective function and provides computationally efficient rules to update all other parameters. Meanwhile, the algorithm takes advantage of the tensor structure to reduce the number of parameters, which leads to lower storage costs. The performance of our method over existing methods is demonstrated in simulated and real data examples. Moreover, based on tensor computation, we propose a novel algorithm, referred to as the SMORE algorithm, for differential network analysis. The SMORE algorithm has low storage cost and high computation speed, especially in the presence of strong sparsity. It also provides a unified framework for binary and multiple network problems. In addition, we note that the SMORE algorithm can be applied to high-dimensional quadratic discriminant analysis problems, providing a new approach for multiclass high-dimensional quadratic discriminant analysis. Finally, we discuss directions for future work, including new approaches, applications, and relaxed assumptions.
- Date Issued
- 2019
- Identifier
- 2019_Spring_Pan_fsu_0071E_15135
- Format
- Thesis
- Title
- Univariate and Multivariate Volatility Models for Portfolio Value at Risk.
- Creator
-
Xiao, Jingyi, Niu, Xufeng, Ökten, Giray, Wu, Wei, Huffer, Fred W. (Fred William), Florida State University, College of Arts and Sciences, Department of Statistics
- Abstract/Description
-
In modern-day financial risk management, modeling and forecasting stock return movements via their conditional volatilities, particularly predicting the Value at Risk (VaR), has become increasingly important for a healthy economic environment. In this dissertation, we evaluate and compare two main families of models for conditional volatilities - GARCH and Stochastic Volatility (SV) - in terms of their VaR prediction performance for 5 major US stock indices. We calculate GARCH-type model parameters via Quasi Maximum Likelihood Estimation (QMLE), while for those of SV we employ MCMC with the Ancillary Sufficient Interweaving Strategy. We use the forecast volatilities corresponding to each model to predict the VaR of the 5 indices. We test the predictive performance of the estimated models by a two-stage backtesting procedure and then compare them via the Lopez loss function. Results of this dissertation indicate that even though it is more computationally demanding than GARCH-type models, SV dominates them in forecasting VaR. Since financial volatilities move together across assets and markets, modeling the volatilities in a multivariate framework is more appropriate. However, existing studies in the literature do not present compelling evidence for a strong preference between univariate and multivariate models. In this dissertation we also address the problem of forecasting portfolio VaR via multivariate GARCH models versus univariate GARCH models. We construct 3 portfolios with stock returns of 3 major US stock indices, 6 major banks and 6 major technology companies, respectively. For each portfolio, we model the portfolio conditional covariances with GARCH, EGARCH and MGARCH-BEKK, MGARCH-DCC, and GO-GARCH models. For each estimated model, the forecast portfolio volatilities are further used to calculate the (portfolio) VaR. The ability to capture the portfolio volatilities is evaluated by MAE and RMSE; the VaR prediction performance is tested through a two-stage backtesting procedure and compared in terms of the loss function. The results of our study indicate that even though MGARCH models are better at predicting the volatilities of some portfolios, GARCH models can perform as well as their multivariate (and computationally more demanding) counterparts.
- Date Issued
- 2019
- Identifier
- 2019_Spring_Xiao_fsu_0071E_15172
- Format
- Thesis
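The abstract above compares VaR forecasts built from GARCH and SV volatility forecasts and scores them with the Lopez loss function. The sketch below is a minimal illustration, not the dissertation's code: it assumes Gaussian returns, a hypothetical forecast volatility, and the quadratic form of the Lopez loss.

```python
# One-step-ahead VaR from a forecast conditional volatility (any GARCH or SV
# model could supply sigma_hat), plus the Lopez loss used to compare forecasts.
# The return convention, confidence level, and toy numbers are assumptions.
import numpy as np
from scipy.stats import norm

def var_forecast(mu_hat, sigma_hat, alpha=0.01):
    """alpha-level VaR of the return, reported as a positive loss quantile."""
    return -(mu_hat + sigma_hat * norm.ppf(alpha))

def lopez_loss(realized_return, var):
    """Quadratic Lopez loss: zero if no violation, 1 + (excess loss)^2 otherwise."""
    loss = -realized_return
    return 1.0 + (loss - var) ** 2 if loss > var else 0.0

# Toy example: forecast volatility of 1.8% per day, zero mean, 1% VaR.
var_1pct = var_forecast(mu_hat=0.0, sigma_hat=0.018, alpha=0.01)
print(f"1% one-day VaR: {var_1pct:.4f}")                  # about 0.0419
print(f"Lopez loss on a -5% day: {lopez_loss(-0.05, var_1pct):.5f}")
print(f"Lopez loss on a -1% day: {lopez_loss(-0.01, var_1pct):.1f}")
```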
- Title
- Online Feature Selection with Annealing and Its Applications.
- Creator
-
Sun, Lizhe, Barbu, Adrian G., Kumar, Piyush, She, Yiyuan, Linero, Antonio Ricardo, Florida State University, College of Arts and Sciences, Department of Statistics
- Abstract/Description
-
Feature selection is an important technique in high-dimensional statistics and machine learning, with many applications in computer vision, natural language processing, bioinformatics, and other fields. However, most feature selection methods in the literature are designed for offline learning, while existing online feature selection methods have limitations in true feature recovery. In this dissertation, we propose novel online feature selection methods and a framework: one method is called Stochastic Feature Selection with Annealing, and the framework is based on running averages. With the methods and framework we developed, the support of the true features can be recovered with higher accuracy. We provide a theoretical analysis and, through simulations and experiments on real sparse datasets, show that our proposed methods compare favorably with state-of-the-art online methods in the literature.
- Date Issued
- 2019
- Identifier
- 2019_Summer_Sun_fsu_0071E_15253
- Format
- Thesis
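The abstract above proposes feature selection with annealing in an online setting. The sketch below shows only the basic offline idea under assumed choices of annealing schedule, step size, and simulated data: take gradient steps on the loss and, after each epoch, keep only the largest-magnitude coefficients, with the kept count shrinking toward the target number of features. It is not the stochastic or running-averages method of the dissertation.

```python
# Feature selection with an annealing schedule on squared loss (offline sketch).
import numpy as np

def fsa_linear(X, y, k, epochs=100, lr=0.01, mu=10.0):
    n, p = X.shape
    beta = np.zeros(p)
    keep = np.arange(p)                                   # active feature set
    for e in range(1, epochs + 1):
        grad = X[:, keep].T @ (X[:, keep] @ beta[keep] - y) / n
        beta[keep] -= lr * grad
        # assumed annealing schedule: how many features to keep at epoch e
        m_e = k + int((p - k) * max(0.0, (epochs - 2 * e) / (2 * e * mu + epochs)))
        top = keep[np.argsort(-np.abs(beta[keep]))[:m_e]]  # prune to the top m_e
        beta[np.setdiff1d(keep, top)] = 0.0
        keep = np.sort(top)
    return beta, keep

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 100))
true = np.zeros(100); true[:5] = [3, -2, 1.5, 2.5, -1]
y = X @ true + 0.1 * rng.standard_normal(500)
beta, keep = fsa_linear(X, y, k=5)
print("selected features:", keep)                         # ideally 0..4
```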
- Title
- Shape Based Function Estimation.
- Creator
-
Dasgupta, Sutanoy, Srivastava, Anuj, Pati, Debdeep, Klassen, E. (Eric), Huffer, Fred W. (Fred William), Wu, Wei, Florida State University, College of Arts and Sciences, Department of Statistics
- Abstract/Description
-
Function estimation is an extremely rich and well-researched topic with broad applications spanning several scientific fields. We develop a shape-based framework for probability density and general function modelling. The framework encompasses both shape-constrained and unconstrained estimation, and it can accommodate a much broader notion of shape constraints than has been considered in the literature. The estimation approach is a two-step process: the first step creates a template, or initial guess, and the second step "improves" the estimate according to an appropriate objective function. We derive asymptotic properties of the estimators in different scenarios and illustrate the performance of the estimates through several simulation and real-data examples.
- Date Issued
- 2019
- Identifier
- 2019_Summer_Dasgupta_fsu_0071E_15347
- Format
- Thesis
- Title
- The Relationship Between Body Mass and Blood Pressure in Diverse Populations.
- Creator
-
Abayomi, Emilola J., McGee, Daniel, Lackland, Daniel, Hurt, Myra, Chicken, Eric, Niu, Xufeng, Department of Statistics, Florida State University
- Abstract/Description
-
High blood pressure is a major determinant of risk for Coronary Heart Disease (CHD) and stroke, leading causes of death in the industrialized world. A myriad of pharmacological treatments for elevated blood pressure, defined as a blood pressure greater than 140/90 mmHg, are available and have at least partially resulted in large reductions in the incidence of CHD and stroke in the U.S. over the last 50 years. The factors that may increase blood pressure levels are not well understood, but body mass is thought to be a major determinant of blood pressure level. Obesity is measured through various methods (skinfolds, waist-to-hip ratio, bioelectrical impedance analysis (BIA), etc.), but the most commonly used measure is body mass index, BMI = weight (kg) / height (m)².
- Date Issued
- 2012
- Identifier
- FSU_migr_etd-5308
- Format
- Thesis
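The abstract above defines the body mass index used throughout; a minimal sketch of that formula, with made-up height and weight, is given below.

```python
# Body mass index as quoted in the abstract; the example values are made up.
def bmi(weight_kg: float, height_m: float) -> float:
    """BMI = weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2

print(round(bmi(80.0, 1.75), 1))   # 26.1
```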
- Title
- Statistical Models on Human Shapes with Application to Bayesian Image Segmentation and Gait Recognition.
- Creator
-
Kaziska, David M., Srivastava, Anuj, Mio, Washington, Chicken, Eric, Wegkamp, Marten, Department of Statistics, Florida State University
- Abstract/Description
-
In this dissertation we develop probability models for human shapes and apply those models to the problems of image segmentation and human identification by gait recognition. To build probability models on human shapes, we consider human shapes to be realizations of random variables on a space of simple closed curves and a space of elastic curves. Both of these spaces are quotient spaces of infinite-dimensional manifolds. Our probability models arise through Tangent Principal Component Analysis, a method of studying probability models on manifolds by projecting them onto a tangent plane to the manifold. Since we place the tangent plane at the Karcher mean of the sample shapes, we begin our study by examining statistical properties of Karcher means on manifolds. We derive theoretical results for the location of Karcher means on certain manifolds and perform a simulation study of the properties of Karcher means on our shape space. Turning to the specific problem of distributions on human shapes, we examine alternatives for probability models and find that kernel density estimators perform well. We use this model to sample shapes and to perform shape testing. The first application we consider is human detection in infrared images. We pursue this application using Bayesian image segmentation, in which the proposed human shape in an image is a maximum likelihood estimate, obtained using a prior distribution on human shapes and a likelihood arising from a divergence measure on the pixels in the image. We then consider human identification by gait recognition. We examine human gait as a cyclo-stationary process on the space of elastic curves and develop a metric on processes based on the geodesic distance between sequences in that space. We develop and demonstrate a framework for gait recognition based on this metric, which includes the following elements: automatic detection of gait cycles, interpolation to register gait cycles, computation of a mean gait cycle, and identification by matching a test cycle to the nearest member of a training set. We perform the matching both by an exhaustive search of the training set and through an expedited method using cluster-based trees and boosting.
- Date Issued
- 2005
- Identifier
- FSU_migr_etd-3275
- Format
- Thesis
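The abstract above builds probability models around the Karcher mean of sample shapes. The sketch below illustrates the standard Karcher-mean iteration (average the log maps, step along the exp map) on the unit sphere, a simple stand-in manifold; it is not the dissertation's shape-space computation, and the simulated points are assumptions.

```python
# Karcher (Frechet) mean by iterating log/exp maps, shown on the sphere S^2.
import numpy as np

def log_map(p, q):
    """Tangent vector at p pointing toward q along the geodesic on the sphere."""
    v = q - np.dot(p, q) * p
    nv = np.linalg.norm(v)
    theta = np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))
    return np.zeros_like(p) if nv < 1e-12 else theta * v / nv

def exp_map(p, v):
    """Point reached from p by following tangent vector v along a geodesic."""
    nv = np.linalg.norm(v)
    return p if nv < 1e-12 else np.cos(nv) * p + np.sin(nv) * v / nv

def karcher_mean(points, iters=50, step=1.0):
    mu = points[0] / np.linalg.norm(points[0])             # initial guess
    for _ in range(iters):
        grad = np.mean([log_map(mu, q) for q in points], axis=0)
        if np.linalg.norm(grad) < 1e-10:
            break
        mu = exp_map(mu, step * grad)                      # move toward the mean
    return mu

rng = np.random.default_rng(1)
pts = rng.standard_normal((20, 3)) + np.array([0.0, 0.0, 5.0])
pts /= np.linalg.norm(pts, axis=1, keepdims=True)          # samples near the north pole
print(karcher_mean(pts))                                   # close to [0, 0, 1]
```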
- Title
- Nonparametric Data Analysis on Manifolds with Applications in Medical Imaging.
- Creator
-
Osborne, Daniel Eugene, Patrangenaru, Victor, Liu, Xiuwen, Barbu, Adrian, Chicken, Eric, Department of Statistics, Florida State University
- Abstract/Description
-
Over the past twenty years, there has been rapid development in nonparametric statistical analysis on manifolds applied to medical imaging problems. In this body of work, we focus on two different medical imaging problems. The first problem concerns the analysis of CT scan data. In this context, we perform nonparametric analysis on 3D data retrieved from CT scans of healthy young adults, on the Size-and-Reflection Shape Space of k-ads in general position in 3D. This work is part of a larger project on planning reconstructive surgery for severe skull injuries, which includes pre-processing and post-processing steps for CT images. The second problem concerns the analysis of MR diffusion tensor imaging data. Here, we develop a two-sample procedure for testing the equality of the generalized Frobenius means of two independent populations on the space of symmetric positive definite matrices. These new methods naturally lead to an analysis based on Cholesky decompositions of covariance matrices, which helps to decrease computational time and does not increase dimensionality. The resulting nonparametric matrix-valued statistics are used for testing whether there is a difference on average between corresponding signals in Diffusion Tensor Images (DTI) of young children with dyslexia and those of their clinically normal peers. The results presented here correspond to data previously analyzed in the literature using parametric methods, which also showed a significant difference.
- Date Issued
- 2012
- Identifier
- FSU_migr_etd-5085
- Format
- Thesis
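The abstract above mentions an analysis based on Cholesky decompositions of covariance matrices. The sketch below illustrates one simple Cholesky-based comparison of two symmetric positive definite matrices; the Frobenius distance on Cholesky factors and the toy diffusion tensors are assumptions, not the dissertation's two-sample test statistic.

```python
# Comparing symmetric positive definite matrices through their Cholesky factors.
import numpy as np

def cholesky_distance(a, b):
    """Frobenius distance between lower-triangular Cholesky factors."""
    return np.linalg.norm(np.linalg.cholesky(a) - np.linalg.cholesky(b))

d1 = np.array([[2.0, 0.3, 0.1],
               [0.3, 1.5, 0.2],
               [0.1, 0.2, 1.0]])     # toy diffusion tensor, group 1
d2 = np.array([[2.2, 0.2, 0.0],
               [0.2, 1.4, 0.3],
               [0.0, 0.3, 1.1]])     # toy diffusion tensor, group 2

print(f"Cholesky distance: {cholesky_distance(d1, d2):.4f}")
# The factors are triangular, so working with them keeps the dimension at
# p(p+1)/2 entries and avoids any positivity constraint during averaging.
```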
- Title
- Mixed-Effects Models for Count Data with Applications to Educational Research.
- Creator
-
Shin, Jihyung, Niu, Xufeng, Hu, Shouping, Al Otaiba, Stephanie Dent, McGee, Daniel, Wu, Wei, Department of Statistics, Florida State University
- Abstract/Description
-
This research is motivated by an analysis of reading research data. We are interested in modeling a test outcome measuring the ability of kindergarten children aged 5 to 7 to fluently recode letters into sounds. The data showed excessive zero scores (more than 30% of children) on the test. In this dissertation, we carefully examine models dealing with excessive zeros, which are based on a mixture of distributions: a distribution degenerate at zero and a standard probability distribution on nonnegative values. In such cases, a lognormal random variable (for semicontinuous data) or a Poisson random variable (for count data) is observed with some probability. Previously proposed models, the mixed-effects mixed-distribution (MEMD) models of Tooze et al. (2002) for semicontinuous data and the zero-inflated Poisson (ZIP) regression models of Lambert (1992) for count data, are reviewed. We extend zero-inflated Poisson models to repeated-measures, zero-inflated data by introducing a pair of possibly correlated random effects into the zero-inflated Poisson model to accommodate within-subject correlation and between-subject heterogeneity. The model describes the effects of predictor variables on the probability of nonzero responses (occurrence) and on the mean of nonzero responses (intensity) separately. The likelihood function, approximated by adaptive Gaussian quadrature, is maximized using dual quasi-Newton optimization. The maximum likelihood estimates are obtained through a standard statistical software package. A simulation study is conducted using different model parameters, numbers of subjects, and numbers of measurements per subject, and the results are presented. The dissertation ends with an application of the model to the reading research data and a discussion of future research. We examine the number of correct letter sounds counted for children over the 2008-2009 academic year. We find that age, gender, and socioeconomic status are significantly related to the letter-sound fluency of children in both parts of the model. The model provides a better explanation of the data structure and easier interpretation of parameter values, as they are the same as in standard logistic and Poisson regression models. The model can be extended to accommodate serial correlation, which can be observed in longitudinal data. One may also consider a multilevel zero-inflated Poisson model. Although the multilevel model has been proposed previously, parameter estimation by penalized quasi-likelihood methods is questionable, and further examination is needed.
- Date Issued
- 2012
- Identifier
- FSU_migr_etd-5181
- Format
- Thesis
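The abstract above models excess zeros with a zero-inflated Poisson mixture. The sketch below writes out the basic ZIP log-likelihood for independent counts; the random effects, covariates, and adaptive quadrature used in the dissertation are omitted, and the toy counts are made up.

```python
# Zero-inflated Poisson log-likelihood: zeros arise either from the structural
# zero component (with probability pi) or from the Poisson component.
import numpy as np
from scipy.special import gammaln

def zip_loglik(y, pi, lam):
    """Log-likelihood of counts y under a ZIP(pi, lam) model."""
    y = np.asarray(y)
    ll_zero = np.log(pi + (1.0 - pi) * np.exp(-lam))                   # P(Y = 0)
    ll_pos = np.log(1.0 - pi) - lam + y * np.log(lam) - gammaln(y + 1)  # P(Y = y), y > 0
    return np.sum(np.where(y == 0, ll_zero, ll_pos))

counts = np.array([0, 0, 0, 2, 5, 0, 1, 0, 3, 0])   # over half zeros, as in the reading data
print(zip_loglik(counts, pi=0.4, lam=2.5))
```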
- Title
- Estimation and Sequential Monitoring of Nonlinear Functional Responses Using Wavelet Shrinkage.
- Creator
-
Cuevas, Jordan, Chicken, Eric, Sobanjo, John, Niu, Xufeng, Wu, Wei, Department of Statistics, Florida State University
- Abstract/Description
-
Statistical process control (SPC) is widely used in industrial settings to monitor processes for shifts in their distributions. SPC is generally thought of in two distinct phases: Phase I, in which historical data are analyzed in order to establish an in-control process, and Phase II, in which new data are monitored for deviations from the in-control form. Traditionally, SPC has been used to monitor univariate (multivariate) processes for changes in a particular parameter (parameter vector). Recently, however, technological advances have resulted in processes in which each observation is actually an n-dimensional functional response (referred to as a profile), where n can be quite large. Additionally, these profiles often cannot be adequately represented parametrically, making traditional SPC techniques inapplicable. This dissertation starts by addressing the problem of nonparametric function estimation, which would be used to analyze process data in a Phase I setting. The translation-invariant wavelet estimator (TI) is often used to estimate irregular functions, despite the drawback that it tends to oversmooth jumps. A trimmed translation-invariant estimator (TTI) is proposed, of which the TI estimator is a special case. By reducing the point-by-point variability of the TI estimator, TTI is shown to retain the desirable qualities of TI while improving reconstructions of functions with jumps. Attention is then turned to the Phase II problem of monitoring sequences of profiles for deviations from the in-control form. Two profile monitoring schemes are proposed: the first monitors for changes in the noise variance using a likelihood ratio test based on the highest detail level of wavelet coefficients of the observed profile; the second offers a semiparametric test to monitor for changes in both the functional form and the noise variance. Both methods make use of wavelet shrinkage in order to distinguish relevant functional information from noise contamination. Different forms of each of these test statistics are proposed, and results are compared via Monte Carlo simulation.
- Date Issued
- 2012
- Identifier
- FSU_migr_etd-4788
- Format
- Thesis
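The abstract above starts from the translation-invariant (TI) wavelet estimator before proposing its trimmed variant. The sketch below is a minimal cycle-spinning TI denoiser built with the PyWavelets package; the test signal, wavelet, decomposition level, and universal threshold are assumptions, and the trimming step of the proposed TTI estimator is not shown.

```python
# Translation-invariant wavelet denoising: denoise every circular shift,
# unshift, and average the reconstructions.
import numpy as np
import pywt

def ti_denoise(y, wavelet="db4", level=4, shifts=16):
    n = len(y)
    sigma = np.median(np.abs(pywt.wavedec(y, wavelet, level=level)[-1])) / 0.6745
    thr = sigma * np.sqrt(2.0 * np.log(n))                 # universal threshold
    out = np.zeros(n)
    for s in range(shifts):
        coeffs = pywt.wavedec(np.roll(y, s), wavelet, level=level)
        coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
        out += np.roll(pywt.waverec(coeffs, wavelet)[:n], -s)
    return out / shifts

rng = np.random.default_rng(2)
t = np.linspace(0, 1, 256)
f = np.where(t < 0.5, np.sin(4 * np.pi * t), np.sin(4 * np.pi * t) + 2.0)  # jump at 0.5
y = f + 0.3 * rng.standard_normal(256)
fhat = ti_denoise(y)
print(f"RMSE noisy: {np.sqrt(np.mean((y - f) ** 2)):.3f}, "
      f"RMSE TI: {np.sqrt(np.mean((fhat - f) ** 2)):.3f}")
```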
- Title
- Investigating the Categories for Cholesterol and Blood Pressure for Risk Assessment of Death Due to Coronary Heart Disease.
- Creator
-
Franks, Billy J., McGee, Daniel, Hurt, Myra, Huffer, Fred, Niu, Xufeng, Department of Statistics, Florida State University
- Abstract/Description
-
Many characteristics for predicting death due to coronary heart disease are measured on a continuous scale. These characteristics, however, are often categorized for clinical use and to aid in treatment decisions. We would like to derive a systematic approach to determine the best categorizations of systolic blood pressure and cholesterol level for use in identifying individuals who are at high risk of death due to coronary heart disease, and to compare these data-derived categories to those in common usage. Whatever categories are chosen, they should allow physicians to accurately estimate the probability of survival from coronary heart disease until some time t. The best categories will be those that provide the most accurate prediction of an individual's risk of dying by time t. The approach used to determine these categories is a version of Classification and Regression Trees (CART) that can be applied to censored survival data. The major goals of this dissertation are to obtain data-derived categories for risk assessment, to compare these categories to the ones already recommended in the medical community, and to assess the performance of these categories in predicting survival probabilities.
- Date Issued
- 2005
- Identifier
- FSU_migr_etd-4402
- Format
- Thesis
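The abstract above derives blood pressure and cholesterol categories with a CART variant for censored survival data. The sketch below illustrates a single split search of that kind under assumed simulated data: scan candidate systolic blood pressure cutpoints and keep the one maximizing the log-rank statistic, here computed with the lifelines package.

```python
# Single cutpoint search for censored survival data via the log-rank statistic.
import numpy as np
from lifelines.statistics import logrank_test

rng = np.random.default_rng(3)
n = 1000
sbp = rng.normal(135, 20, n)                                  # systolic blood pressure
hazard = 0.01 * np.exp(0.02 * (sbp - 135))                    # higher SBP -> higher risk
time = rng.exponential(1.0 / hazard)
censor = rng.uniform(0, 60, n)
duration = np.minimum(time, censor)
event = (time <= censor).astype(int)                          # 1 = died, 0 = censored

best_cut, best_stat = None, -np.inf
for cut in range(110, 171, 5):                                # candidate cutpoints
    high = sbp >= cut
    if high.sum() < 50 or (~high).sum() < 50:                 # skip tiny groups
        continue
    res = logrank_test(duration[high], duration[~high],
                       event_observed_A=event[high], event_observed_B=event[~high])
    if res.test_statistic > best_stat:
        best_cut, best_stat = cut, res.test_statistic

print(f"best SBP cutpoint: {best_cut} (log-rank statistic {best_stat:.1f})")
```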