Current search: Research Repository » Thesis » Department of Computer Science
Search results
- Title
- Machine Learning Algorithms and Applications for Lidar, Images, and Unstructured Data.
- Creator
- Parajuli, Biswas, Kumar, Piyush, She, Yiyuan, Liu, Xiuwen, Zhao, Peixiang, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
- Aerial imagery of geographic regions, in the form of Lidar and RGB images, aids tasks such as surveying, urban planning, mapping, surveillance, navigation, and localization. Most of these applications require accurate segmentation and identification of a variety of objects. The labeling is mostly done manually, which is slow and expensive. This dissertation focuses on roads as the object of interest and aims to develop methods to automatically extract road networks from both aerial Lidar and images. This work investigates deep convolutional architectures that can fuse the two types of data for road segmentation. It presents a design that performs better than state-of-the-art RGB-only methods. It also describes a simple, disk-packing-based algorithm that translates the road segmentation into an OpenStreetMap-like road network graph while improving accuracy in terms of connectivity, topology, and outlier reduction. This dissertation also presents a truth-finding algorithm based on iterative outlier removal, which can be used to reach a consensus when information sources or ensembles of trained machine learning models are in conflict. In addition, it introduces a complete, published book on Python programming based on the experiences this research provided. The hope is to contribute towards teaching and learning Python. (A brief sketch of iterative outlier removal follows this entry.)
- Date Issued
- 2019
- Identifier
- 2019_Spring_Parajuli_fsu_0071E_14920
- Format
- Thesis
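The truth-finding step mentioned in the abstract above reaches a consensus by iteratively discarding outlying estimates. The sketch below is a generic illustration of that idea on scalar estimates (for example, conflicting outputs of an ensemble), using a mean-and-standard-deviation rule; the rule, the threshold, and the stopping criterion are illustrative assumptions, not the dissertation's algorithm.

import statistics

def consensus_by_outlier_removal(estimates, k=2.0, max_rounds=10):
    """Iteratively drop estimates more than k standard deviations from the
    current mean, then return the mean of the survivors as the consensus."""
    values = list(estimates)
    for _ in range(max_rounds):
        if len(values) < 3:
            break  # too few values left to judge outliers
        mean = statistics.mean(values)
        stdev = statistics.pstdev(values)
        kept = [v for v in values if stdev == 0 or abs(v - mean) <= k * stdev]
        if len(kept) == len(values):
            break  # converged: nothing left to remove
        values = kept
    return statistics.mean(values), values

# Conflicting "sources" reporting the same quantity; 40.0 is the outlier.
print(consensus_by_outlier_removal([9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 40.0]))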
- Title
- Towards Automating the Establishment and Evolution of Software Traceability.
- Creator
- Mills, Chris (Christopher), Haiduc, Sonia, Blessing, Susan K., Chakraborty, Shayok, Zhao, Peixiang, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
- Software systems contain an immense amount of information captured in a variety of documents such as source code files, user documentation, use and test cases, bug reports, and system requirements, among others. Relationships between these pieces of information -- called traceability links -- provide stakeholders with broader knowledge about a system's constituent pieces and support many aspects of the software's development, maintenance, and evolution. Ideally, traceability links would be documented as software artifacts are produced. For instance, as they work, developers would document which test cases exercise which code segments or which code classes implement which use cases. However, this is typically not the case. Due to organizational issues such as tight timelines for product delivery and lack of buy-in by project managers, software traceability is often a secondary concern. To address this situation and improve traceability for a system post hoc, stakeholders can perform Traceability Link Recovery (TLR). TLR is a software engineering task that fills in missing traceability information by establishing (i.e., recovering) links between related artifacts. Through this process, software traceability can be promoted to naturally support various tasks such as program comprehension, concept localization, verifying test coverage, and ensuring that system and legal requirements are met. Unfortunately, performing TLR manually is an extremely time- and resource-intensive task. Therefore, even though prior work suggests it directly improves software maintenance and evolution, few systems have sufficient traceability to realize these benefits. The few that do are mainly safety-critical and have tight regulatory requirements where traceability is legally required for quality assurance to mitigate risk. First, we seek to reduce the cost of establishing traceability links through TLR by improving automatic approaches to it based on artifact similarity. Second, we seek to reduce the cost of maintaining existing traceability information by applying supervised machine learning. This technique mines statistical patterns from historical traceability information to build a predictive model that infers artifact relationships without the need for a human operator. As a result, software teams are able to realize the hitherto cost-prohibitive benefits of traceability even for projects where there is no legal requirement for traceability to exist.
- Date Issued
- 2019
- Identifier
- 2019_Spring_Mills_fsu_0071E_15138
- Format
- Thesis
- Title
- Towards Ubiquitous Sensing Using Commodity WiFi.
- Creator
- Tan, Sheng, Yang, Jie, Shanbhag, Sachin, Wang, An-I Andy, Duan, Zhenhai, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
- Recently, the prevalence of WiFi devices and the ubiquitous coverage of WiFi networks provide the opportunity to extend WiFi capabilities beyond communication, particularly into sensing the physical environment. Most existing systems that enable human sensing with commodity WiFi devices rely on profile-training-based techniques. Such techniques suffer from performance degradation when the configuration changes after training. Furthermore, those systems cannot work under multi-user scenarios. To overcome the limitations of existing solutions, this dissertation introduces the design and implementation of three systems. First, we propose MultiTrack, a multi-user indoor tracking and activity recognition system. It leverages multiple transmission links and all the available bandwidth at 5GHz of commodity WiFi to track multiple users simultaneously. Second, we present WiFinger, a fine-grained finger gesture recognition system, which utilizes a single RF device and does not require per-user or per-location training. Lastly, we present FruitSense, an RF-based fruit ripeness level detection system that achieves environment-independent sensing. This system demonstrates that wireless sensing can be extended beyond human sensing into the biosensing field.
- Date Issued
- 2019
- Identifier
- 2019_Spring_Tan_fsu_0071E_14891
- Format
- Thesis
- Title
- Machine Learning Approach for Generalizing Traffic Pattern-Based Adaptive Routing in Dragonfly Networks.
- Creator
- Ryasnianskiy, Yevgeniy, Yuan, Xin, Liu, Xiuwen, Kumar, Piyush, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
- Universal Global Adaptive routing (UGAL) is a common routing scheme used in systems based on the Dragonfly interconnect topology. UGAL uses information about local link loads to make adaptive routing decisions. Traffic Pattern-based Adaptive Routing (TPR) enhances UGAL by incorporating additional network statistics into the routing process. Contemporary switches are designed to accommodate an expansive set of network performance metrics. Distinguishing between significant, predictive metrics and insignificant metrics is critical to the process of designing an adaptive routing algorithm. We propose the use of recurrent neural networks to assess the relative predictive power of various network statistics. Using this method, we rank the predictive power of network statistics using data collected from a network simulator. Both UGAL and TPR require tuning of hyper-parameters to achieve optimal performance, with TPR having more than 20 parameters for the Cray Cascade architecture. We demonstrate that the optimal values of these parameters can vary significantly based on the size of the architecture, the arrangement of global links chosen for the Dragonfly topology, and the traffic that the system will likely encounter. We propose and evaluate using a neural network to simplify the tuning of hyper-parameters used in TPR. We find that this approach is able to match or exceed the performance of TPR across several synthetic traffic patterns using a network simulator. (A minimal sketch of the basic UGAL routing decision follows this entry.)
- Date Issued
- 2019
- Identifier
- 2019_Spring_Ryasnianskiy_fsu_0071N_15232
- Format
- Thesis
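As background for the routing schemes discussed in the abstract above, the following is a minimal, hypothetical sketch of the core UGAL decision rule: the router compares the estimated delay of the minimal path against a randomly chosen non-minimal candidate, using local queue occupancies as the load estimate. The function names, data layout, and bias constant are illustrative assumptions, not the dissertation's implementation of UGAL or TPR.

import random

def ugal_choose_path(minimal_path, nonminimal_paths, queue_len, bias=1):
    """Pick a path UGAL-style: compare (queue length x hop count) of the
    minimal path against one randomly sampled non-minimal candidate."""
    candidate = random.choice(nonminimal_paths)
    # Local estimate of delay: occupancy of the first-hop queue times path length.
    min_cost = queue_len[minimal_path[0]] * len(minimal_path)
    non_cost = queue_len[candidate[0]] * len(candidate)
    # Prefer the minimal path unless the non-minimal one looks clearly cheaper.
    return minimal_path if min_cost <= non_cost + bias else candidate

# Toy example: paths are lists of link ids; queue_len maps link id -> occupancy.
queues = {"g0": 12, "l3": 1, "l7": 2}
print(ugal_choose_path(["g0"], [["l3", "l7"]], queues))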
- Title
- A Multi-Criteria Decision Support System for Ph.D. Supervisor Selection: A Hybrid Approach.
- Creator
- Hasan, Mir Anamul, Schwartz, Daniel G., Meyer-Bäse, Anke, Haiduc, Sonia, Wang, An-I Andy, Whalley, David B., Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
- Selection of a suitable Ph.D. supervisor is a very important step in a student's career. This dissertation presents a multi-criteria decision support system to assist students in making this choice. The system employs a hybrid method that first utilizes a fuzzy analytic hierarchy process to extract the relative importance of the identified criteria and sub-criteria to consider when selecting a supervisor. Then, it applies an information retrieval-based similarity algorithm (TF/IDF or Okapi BM25) to retrieve relevant candidate supervisor profiles based on the student's research interest. The selected profiles are then re-ranked based on other relevant factors chosen by the user, such as publication record, research grant record, and collaboration record. The ranking method evaluates the potential supervisors objectively based on various metrics that are defined in terms of detailed domain-specific knowledge, automating part of the decision-making process. In contrast with other existing works, this system does not require the professor's involvement, and no subjective measures are employed. (An illustrative BM25 scoring sketch follows this entry.)
- Date Issued
- 2019
- Identifier
- 2019_Summer_Hasan_fsu_0071E_15378
- Format
- Thesis
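The retrieval step described in the abstract above ranks candidate supervisor profiles against the student's research interest with TF/IDF or Okapi BM25. The sketch below shows a standard BM25 scorer over pre-tokenized profile documents; the tokenization, field weighting, and parameter values the dissertation actually uses are not specified here, so treat the constants as generic defaults.

import math
from collections import Counter

def bm25_scores(query_terms, documents, k1=1.2, b=0.75):
    """Score each document against the query with Okapi BM25."""
    N = len(documents)
    avgdl = sum(len(d) for d in documents) / N
    # Document frequency of each query term.
    df = {t: sum(1 for d in documents if t in d) for t in query_terms}
    scores = []
    for doc in documents:
        tf = Counter(doc)
        s = 0.0
        for t in query_terms:
            if df[t] == 0:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(s)
    return scores

# Toy profiles: tokenized research-interest descriptions of candidate supervisors.
profiles = [["machine", "learning", "vision"],
            ["databases", "graph", "mining"],
            ["learning", "theory", "optimization"]]
print(bm25_scores(["machine", "learning"], profiles))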
- Title
- A Computational Investigation of the Optimal Halton Sequence in QMC Applications.
- Creator
- Bayousef, Manal Sarhan, Mascagni, Michael, Duke, D. W. (Dennis W.), Liu, Xiuwen, Yuan, Xin, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
- We propose the use of randomized (scrambled) quasirandom sequences for the purpose of providing practical error estimates for quasi-Monte Carlo (QMC) applications. One popular quasirandom sequence among practitioners is the Halton sequence. However, Halton subsequences have correlation problems in their highest dimensions, and so using this sequence for high-dimensional integrals dramatically affects the accuracy of QMC. Consequently, QMC studies have previously proposed several scrambling methods; however, to varying degrees, scrambled versions of Halton sequences still suffer from the correlation problem as manifested in two-dimensional projections. This dissertation proposes a modified Halton sequence (MHalton), created using a linear digital scrambling method, which finds the optimal multiplier for the Halton sequence in the linear scrambling space. In order to generate sequences with better uniformity, we have chosen strong MHalton multipliers for up to 360 dimensions. The proposed multipliers have been tested and proved to be stronger than several sets of multipliers used in other known scrambling methods. To compare the quality of our proposed scrambled MHalton sequences with others, we have performed several extensive computational tests that use L₂-discrepancy and high-dimensional integration tests. Moreover, we have tested MHalton sequences on mortgage-backed securities (MBS), one of the most widely used applications in finance. We have tested our proposed MHalton sequence numerically and empirically, and it shows optimal results in QMC applications. These results confirm the efficiency and safety of our proposed MHalton over scrambled sequences previously used in QMC applications. (A one-dimensional sketch of linear digit scrambling follows this entry.)
- Date Issued
- 2019
- Identifier
- 2019_Summer_Bayousef_fsu_0071E_15377
- Format
- Thesis
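The scrambled Halton construction discussed above combines the radical-inverse function with a per-dimension linear digit permutation. A minimal one-dimensional sketch is given below; the multiplier shown is an arbitrary example, whereas selecting strong multipliers per dimension (up to 360 dimensions) is the contribution of the dissertation and is not reproduced here.

def scrambled_halton(n, base, multiplier=1):
    """First n points of a 1-D Halton sequence in the given prime base,
    with a simple linear (multiplicative) digit scrambling."""
    points = []
    for i in range(1, n + 1):
        x, f, k = 0.0, 1.0 / base, i
        while k > 0:
            k, digit = divmod(k, base)
            # Linear scrambling: permute each digit by a fixed multiplier mod base.
            x += ((digit * multiplier) % base) * f
            f /= base
        points.append(x)
    return points

# Unscrambled (multiplier=1) vs. scrambled points in base 17.
print(scrambled_halton(5, 17))
print(scrambled_halton(5, 17, multiplier=7))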
- Title
- Time Series Analysis and Forecasting for Business Intelligence Applications.
- Creator
- Abrishami, Soheila, Kumar, Piyush, Mio, Washington, Liu, Xiuwen, Zhao, Peixiang, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
- In this dissertation, I explore different types of applications in the area of applied machine learning, time series analysis, and prediction. Time series forecasting is a fundamental task in machine learning and data mining. It is an active area of research, especially in applications that have a direct impact on the real world. Foot traffic forecasting is one such application, which has a direct impact on businesses and non-profits alike. An accurate foot traffic prediction system can help retail businesses, physical stores, and restaurants optimize their labor schedules and costs, and reduce food wastage. In this work, we design a large-scale data collection and prediction system for store foot traffic. We propose and compare different prediction models for foot traffic forecasting. Our foot traffic data has been collected from wireless access points deployed at over 65 businesses across the United States, for more than one year. We validate our work by comparing to state-of-the-art time series forecasting approaches. Results show the competitiveness of our proposed method in comparison to our previous work and state-of-the-art procedures for time series forecasting. Another challenging task in the area of time series forecasting is financial time series forecasting. As another part of my dissertation, I present a deep learning system for stock price prediction, which uses a variety of data for a subset of the stocks on the NASDAQ exchange to forecast the stock price. The prediction model is trained on the minutely data for a specific stock ticker and predicts the closing price of that stock ticker multiple steps ahead. Our deep learning framework consists of a Variational Autoencoder for removing noise and uses time-series data engineering to combine the higher-level features with the original features. This new set of features is fed to a Stacked LSTM Autoencoder for multi-step-ahead prediction of the stock closing price. In addition, this prediction is used by a profit-maximization strategy to provide advice on the appropriate time for buying and selling a specific stock. Results show that the proposed framework outperforms the state-of-the-art time series forecasting approaches with respect to predictive accuracy and profitability. In the second part of my work, we present a web-based tool for automatic recoloring of web pages. Automatic application of different color palettes to web pages is essential for both professional and amateur web designers. However, existing recoloring tools for images and web pages do not provide full recoloring. We replace colors in .css, .html, and .svg files, and recolor images such as logos, banners, and background tiles to recolor web pages entirely. The new color theme is based on a color guide image provided by the user. The evaluation shows a high level of satisfaction with the quality of palettes and the results of recoloring. (A sketch of the windowing step used for multi-step forecasting follows this entry.)
- Date Issued
- 2019
- Identifier
- 2019_Summer_Abrishami_fsu_0071E_15325
- Format
- Thesis
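Before any model is applied, multi-step-ahead forecasting of the kind described in the abstract above requires turning a series into supervised (input window, future horizon) pairs. The sketch below shows only that generic windowing step; the dissertation's actual pipeline (Variational Autoencoder denoising plus a stacked LSTM autoencoder) is not reproduced here, and the window sizes are arbitrary.

import numpy as np

def make_windows(series, n_lags, horizon):
    """Turn a univariate series into (X, Y) pairs for multi-step-ahead
    forecasting: each row of X holds n_lags past values, each row of Y
    holds the next `horizon` values."""
    X, Y = [], []
    for t in range(n_lags, len(series) - horizon + 1):
        X.append(series[t - n_lags:t])
        Y.append(series[t:t + horizon])
    return np.array(X), np.array(Y)

# Toy minutely series; predict 3 steps ahead from the last 5 observations.
series = np.sin(np.linspace(0, 10, 200)) + np.random.normal(0, 0.05, 200)
X, Y = make_windows(series, n_lags=5, horizon=3)
print(X.shape, Y.shape)  # (193, 5) (193, 3)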
- Title
- The Future of Android with Liquid Development.
- Creator
- Yannes, Zachary, Tyson, Gary Scott, DeBrunner, Linda S., Whalley, David B., Yuan, Xin, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
- The Android Runtime (ART) executes apps in the Dalvik VM. The Dalvik VM creates a Zygote instance when the device first boots, which is responsible for sharing Android runtime libraries with new applications. New apps rely heavily on external libraries in addition to the runtime libraries for everything from graphical user interfaces to remote databases. I propose an extension to the Zygote, aptly named Amniote, which exposes the Zygote to user space. Amniote allows developers to sideload common third-party libraries to reduce application boot time and memory. Just like the Android runtime libraries, apps would share the address of the library and generate a local copy only when one app writes to a page. This dissertation addresses three main points. First, there is an increase in third-party library usage and an increase in the number of libraries used per app. Second, the execution of benchmark apps shows that most page accesses occur before copy-on-write (COW) operations, which indicates that pages from preloaded classes will infrequently be duplicated. Third, a novel framework, the Amniote framework, moves control of the Zygote process to user space, allowing greater opportunities for preloading and adoption of third-party libraries.
- Date Issued
- 2019
- Identifier
- 2019_Fall_Yannes_fsu_0071E_15338
- Format
- Thesis
- Title
- Building Tools for Forensic Analysis of Mobile and IoT Applications Using Selective Data Extraction.
- Creator
- Dorai, Gokila, Aggarwal, Sudhir, Mio, Washington, Kumar, Piyush, Mukherjee, Tathagata, Wong, Sandy, Liu, Xiuwen, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
- The amount of data stored on smart phones and other mobile devices has increased phenomenally over the last decade. As a result, there has been a spike in the use of these devices for documenting different scenarios that users encounter as they go about their daily lives. Smart phone data has also become critical evidence in court in several criminal cases. Forensic software tool developers are continually developing new techniques for the extraction of data from a variety of smart phones. The two most common techniques are physical and logical extraction. Logical extraction is a technique for extracting the files and folders, without any of the deleted data, from a mobile device. For logical extraction, a software tool is used to make a copy of the files. Experienced examiners have to know how each of these devices and operating systems functions, the many locations where data can reside on each different device/OS, and how to access and work with all of that information in a forensically sound manner. This dissertation discusses the multi-fold contributions we have made to improve forensic analysis and extraction of data from smart phones. In this dissertation, I discuss various tools and systems that have been built for forensic analysis of mobile and IoT applications using selective data extraction, machine learning, classification techniques, and inference engines.
- Date Issued
- 2019
- Identifier
- 2019_Fall_Dorai_fsu_0071E_15523
- Format
- Thesis
- Title
- Convolutional Neural Networks for Hurricane Road Closure Probability and Tree Debris Estimation.
- Creator
- Kakareko, Grzegorz, Liu, Xiuwen, Jung, Sungmoon, Zhao, Peixiang, Chakraborty, Shayok, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
- Hurricanes cause significant property loss every year. A substantial part of that loss is due to trees destroyed by the wind, which in turn block roads and produce a large amount of debris. The debris not only can cause damage to nearby properties, but also needs to be cleaned up after the hurricane. Neural networks have grown significantly as a field in recent years, finding many applications in disciplines such as computer science, medicine, banking, physics, and engineering. In this thesis, a new method is proposed to estimate the tree debris due to high winds using Convolutional Neural Networks (CNNs). For the purposes of this thesis, a tree satellite-image dataset was created, which was then used to train two networks, CNN-I and CNN-II, for tree recognition and tree species recognition, respectively. Satellite images were used as the input for the CNNs to recognize the locations and types of the trees that can produce debris. The tree images selected by the CNNs were used to approximate the tree parameters, which were later used to calculate, for each recognized tree, the tree failure density function, often called the fragility function (the probability of at least one failure in the time period). The tree failure density functions were used to compute the probability of road closure due to hurricane winds and the overall amount of tree debris. The proposed approach utilizes current trends in neural networks and is easily applicable, so that it can help cities and state authorities better plan for the adverse consequences of tree failures due to hurricane winds. (A sketch of composing per-tree fragility into a road-closure probability follows this entry.)
- Date Issued
- 2019
- Identifier
- 2019_Fall_Kakareko_fsu_0071N_15486
- Format
- Thesis
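The abstract above combines per-tree failure probabilities into a road-closure probability. The sketch below shows one plausible way to do that composition, assuming a lognormal wind-speed fragility curve per tree and independence between trees; the functional form, parameter names, and numbers are illustrative assumptions, not the thesis's fitted model.

import math

def tree_failure_prob(wind_speed, median_capacity, beta):
    """Lognormal fragility: probability a tree fails at the given wind speed."""
    z = (math.log(wind_speed) - math.log(median_capacity)) / beta
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF

def road_closure_prob(wind_speed, trees):
    """Probability that at least one tree along the road segment fails,
    assuming independent failures."""
    p_no_failure = 1.0
    for median_capacity, beta in trees:
        p_no_failure *= 1 - tree_failure_prob(wind_speed, median_capacity, beta)
    return 1 - p_no_failure

# Two recognized trees with different (median wind capacity in m/s, dispersion).
trees = [(45.0, 0.3), (55.0, 0.25)]
print(road_closure_prob(50.0, trees))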
- Title
- Topology Aware Routing Techniques for Next Generation Interconnect Networks.
- Creator
- Rahman, Md Shafayat, Yuan, Xin, Liu, Guosheng, Tyson, Gary Scott, Zhang, Zhenghao, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
- As the world is moving towards exascale computing, interconnect networks are becoming more and more important because of their omnipresent use in high performance computing systems and in large-scale data centers. The performance of an interconnect network depends on its topology, routing, job distribution, and other technological factors, and can become a major performance bottleneck for the entire system. My research as a PhD candidate in Computer Science at Florida State University is focused on interconnect network architecture. To be precise, I work to design topology-aware adaptive routing schemes for existing and proposed interconnect topologies that will improve the performance of the respective systems. First, I perform a comprehensive analysis of the Slim Fly network topology and demonstrate that the topology in its original form is not load-balanced. Because of the way the topology is formed, certain links are more likely to be utilized than the rest, which leads the system to perform at a less than optimal level. I propose two novel schemes to address the issue. The first scheme involves modifying the topology by allocating more bandwidth to the heavily used links. The second approach modifies the adaptive routing used over the topology to redistribute the traffic flows and achieve better load balance. Second, I notice that the fraction of shorter minimal and non-minimal paths can vary across the design space of the Dragonfly topology, but traditional adaptive routing does not take advantage of this. I propose Topology-Custom UGAL routing (T-UGAL) for Dragonfly, which customizes the set of non-minimal paths used in UGAL in accordance with the underlying topology, leading to shorter average path lengths and better system performance in terms of packet latency and system throughput. I design a multi-step algorithm to find the optimal set of non-minimal paths for the particular topology. For both of my projects, I follow the common theme of discovering inherent properties of the topology and modifying the routing scheme to leverage those characteristics. Considering the fact that large HPC systems often need significant investments to construct, and tend to retain their structure over years, this is a robust and practical approach to ensure the fullest utilization of the systems.
- Date Issued
- 2019
- Identifier
- 2019_Fall_Rahman_fsu_0071E_15545
- Format
- Thesis
- Title
- Matching Physical File Representation to Logical Access Patterns for Better Performance.
- Creator
- Zhang, Shuanglong, Wang, An-I Andy, Zhang, Jinfeng, Whalley, David B., Zhao, Peixiang, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
- Over the years, the storage substrate of operating systems has evolved with changing storage devices and workloads [2, 6, 7, 8, 12, 15, 18, 26, 29, 33, 34, 35, 39, 41, 42, 44, 47, 48, 54]. Both academia and industry have devoted significant research effort to the file system component, a critical part of the storage system. A file system directs the underlying device-specific software to perform data reads and writes, as well as providing the notion of files to interact with users and applications. To achieve this, a file system represents logical files internally or physically with data (the file content) and metadata (information required to locate, index, and operate on data). Most file system optimizations assume this one-to-one coupling of logical and physical representations [2, 7, 8, 18, 25, 26, 29, 33, 34, 35, 48]. This dissertation presents the design, implementation, and evaluation of two new systems, which decouple these representations and offer a new class of optimization opportunities not previously possible. First, the Composite-File File System (CFFS) exploits the observation that many files are frequently accessed together. By consolidating related file metadata, performance can be improved by up to 27%. Second, the Fine-grained Journal Store (FJS) exploits the observation that typically only subregions of a metadata entry are updated, but the heavyweight reliability and storage mechanisms then affect the entire metadata entry. This results in many unnecessary metadata writes that harm both the performance and the lifespan of certain storage devices. By focusing on only the updated metadata regions and consolidating storage and reliability mechanisms, the Fine-grained Journal Store can both improve performance by up to 15x and reduce unnecessary writes by up to 5.8x. Overall, the decoupling of logical and physical representations allows more flexible matching of the physical representations to the workload patterns, and the results show that this approach is promising.
- Date Issued
- 2018
- Identifier
- 2018_Su_Zhang_fsu_0071E_14368
- Format
- Thesis
- Title
- Multi-Temporal-Spectral Land Cover Classification for Remote Sensing Imagery Using Deep Learning.
- Creator
- [No family name], Atharva, Liu, Xiuwen, Yang, Xiaojun, Tyson, Gary Scott, Zhao, Peixiang, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
- Sustainability research of the environment depends on accurate land cover information over large areas. Even with the increased number of satellite systems and sensors acquiring data with improved spectral, spatial, radiometric, and temporal characteristics, and the new data distribution policy, most existing global land cover datasets were derived from a single-date multi-spectral remotely sensed image using pixel-based classifiers with low accuracy. To improve the accuracy, the bottleneck is developing accurate and effective image classification techniques. By incorporating and utilizing the spatial and multi-temporal information together with the multi-spectral information of remote sensing images for land cover classification, and considering their spatial and temporal interdependence, I propose three deep network systems tailored for medium-resolution remote sensing data. With a test site from the Florida Everglades area (with a size of 771 square kilometers), the proposed deep systems have achieved significant improvements in classification accuracy over most existing pixel-based classifiers. A proposed patch-based recurrent neural network (PB-RNN) system, a proposed pixel-based recurrent neural network system, and a proposed patch-based convolutional neural network system achieve 97.21%, 87.65%, and 89.26% classification accuracy, respectively, while a pixel-based single-image neural network (NN) system achieves only 64.74% classification accuracy. By integrating the proposed deep networks and the huge collection of medium-resolution remote sensing data, I believe that much more accurate land cover datasets can be produced over large areas.
- Date Issued
- 2018
- Identifier
- 2018_Su_Atharva_fsu_0071E_14727
- Format
- Thesis
- Title
- Sensor Systems and Signal Processing Algorithms for Wireless Applications.
- Creator
- Mukherjee, Avishek, Zhang, Zhenghao, Yu, Ming, Kumar, Piyush, Liu, Xiuwen, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
- The demand for high-performance wireless networks and systems has become increasingly high over the last decade. This dissertation addresses three systems that were designed to improve the efficiency, reliability, and security of wireless systems. To improve the efficiency and reliability of wireless systems, we propose two algorithms, namely CSIFit and CSIApx, to compress the Channel State Information (CSI) of Wi-Fi networks with Orthogonal Frequency Division Multiplexing (OFDM) and Multiple Input Multiple Output (MIMO). We evaluated these systems with both experimental and synthesized CSI data. Our work on CSIApx confirmed that we can achieve very good compression ratios with very little loss of accuracy, at a fraction of the complexity needed in current state-of-the-art compression methods. The second system is a sensor-based application to reliably detect falls inside homes. An automatic fall detection system has tremendous value for the well-being of seniors living alone. We design and implement MultiSense, a novel fall detection system, which has the following desirable features. First, it does not require the human to wear any device and is therefore convenient for seniors. Second, it has been tested in typical settings including living rooms and bathrooms, and has shown very good accuracy. Third, it is built with inexpensive components, with an expected hardware cost of around $150 to cover a typical room. MultiSense does not require any training data and is less invasive than similar systems. Our evaluation showed that MultiSense produced no false negatives, i.e., it was able to detect falls accurately each time, while producing no false positives in a daily-use test. Therefore, we believe MultiSense can be used to accurately detect human falls and can be extremely helpful to seniors living alone. Lastly, TBAS is a spoof detection method designed to improve the security of wireless networks. TBAS is based on two facts: 1) different transmitting locations likely result in different wireless channels, and 2) the drift in channel state information within a short time interval should be bounded. We proposed and implemented TBAS on Microsoft's SORA platform as well as on commodity wireless cards and tested its performance in typical Wi-Fi environments with different levels of channel mobility. Our results show that TBAS can be very accurate when running on 3-by-2 systems and above: TBAS on MIMO has a very low false-positive error ratio (a false positive occurs when two packets from the same user are misclassified as coming from different users), while also maintaining a very low false-negative ratio of 0.1% (a false negative occurs when two packets from different users are misclassified as coming from the same user). We believe our experimental findings can be used as a guideline for future systems that will deploy TBAS. (An illustrative CSI-drift check follows this entry.)
- Date Issued
- 2018
- Identifier
- 2018_Su_Mukherjee_fsu_0071E_14750
- Format
- Thesis
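TBAS, described in the abstract above, relies on the bounded drift of channel state information between packets from the same transmitter within a short interval. The sketch below is a hypothetical illustration of that kind of check on CSI vectors; the distance metric, normalization, and threshold are assumptions made for illustration, not the parameters used in the dissertation.

import numpy as np

def same_transmitter(csi_prev, csi_curr, threshold=0.2):
    """Return True if two CSI snapshots are close enough to plausibly come
    from the same (stationary or slowly moving) transmitter."""
    a = csi_prev / np.linalg.norm(csi_prev)   # normalize out transmit-power differences
    b = csi_curr / np.linalg.norm(csi_curr)
    drift = np.linalg.norm(np.abs(a) - np.abs(b))  # compare magnitude profiles only
    return drift < threshold

# Toy CSI: complex channel gains over 30 OFDM subcarriers.
rng = np.random.default_rng(0)
csi1 = rng.normal(size=30) + 1j * rng.normal(size=30)
csi2 = csi1 + 0.01 * (rng.normal(size=30) + 1j * rng.normal(size=30))  # small drift
print(same_transmitter(csi1, csi2))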
- Title
- Design and Evaluation of Networking Techniques for the Next Generation of Interconnection Networks.
- Creator
- Faizian, Peyman, Yuan, Xin, Ke, Fengfeng, Srinivasan, Ashok, Tyson, Gary Scott, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
- High performance computing (HPC) and data center systems have undergone rapid growth in recent years. To meet the current and future demand of compute- and data-intensive applications, these systems require the integration of a large number of processors, storage, and I/O devices through high-speed interconnection networks. In massively scaled HPC systems and data centers, the performance of the interconnect is a major defining factor for the performance of the entire system. Interconnect performance depends on a variety of factors including, but not limited to, topological characteristics, routing schemes, resource management techniques, and technological constraints. In this dissertation, I explore several approaches to improve the performance of large-scale networks. First, I investigate the topological properties of a network and their effect on the performance of the system under different workloads. Based on a detailed analysis of graph structures, I find a well-known graph to be a potential topology of choice for the next generation of large-scale networks. Second, I study the behavior of adaptive routing on the current generation of supercomputers based on the Dragonfly topology and highlight the fact that the performance of adaptive routing on such networks can be enhanced by using detailed information about the communication pattern. I develop a novel approach for identifying the traffic pattern and then use this information to improve the performance of adaptive routing on Dragonfly networks. Finally, I investigate the possible advantages of utilizing emerging software-defined networking technology in the high performance computing domain. My findings show that by leveraging SDN, we can achieve near-optimal rate allocation for communication patterns in an HPC cluster, which can remove the necessity for expensive adaptive routing schemes and simplify the control plane on the next generation of supercomputers.
- Date Issued
- 2018
- Identifier
- 2018_Sp_Faizian_fsu_0071E_14185
- Format
- Thesis
- Title
- DAGDA: Decoupling Address Generation from Loads and Stores.
- Creator
- Stokes, Michael, Whalley, David B., Liu, Xiuwen, Tyson, Gary Scott, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
- DAGDA exposes to the compiler some of the hidden operations that the hardware uses when performing loads and stores, in order to save energy and increase performance. We decouple the micro-operations for loads and stores into two operations: the first, the "prepare to access memory" instruction, or "pam", checks whether a line is resident in the L1 DC and determines its way in the L1 DC data array, if it exists. The second operation performs the actual data access. This allows us to both save energy using compiler optimization techniques and improve performance, because "pam" operations are a natural way of prefetching data into the L1 DC.
- Date Issued
- 2018
- Identifier
- 2018_Su_Stokes_fsu_0071N_14269
- Format
- Thesis
- Title
- A Comprehensive Study of Portability Bug Characteristics in Desktop and Android Applications.
- Creator
- Clow, Jonathan Alexander, Nistor, Adrian, Haiduc, Sonia, Whalley, David B., Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
- Since 2008, the Android ecosystem has been tremendously popular with consumers, developers, and manufacturers due to the open nature of the operating system and its compatibility with and availability on a range of devices. This, however, comes at a cost. The variety of available devices and the speed of evolution of the Android system itself add layers of fragmentation to the ecosystem around which developers must navigate. Yet this phenomenon is not unique to the Android ecosystem, impacting desktop applications like Apache Tomcat and Google Chrome as well. As fragmentation of a system grows, so does the burden on developers to produce software that can execute on a wide variety of potential device, environment, and system combinations, while the reality prevents developers from anticipating every possible scenario. This study provides the first empirical study characterizing portability bugs in both desktop and Android applications. Specifically, we examined 228 randomly selected bugs from 18 desktop and Android applications for the common root causes, manifestation patterns, and fix strategies used to combat portability bugs. Our study reveals several commonalities among the bugs and platforms, including: (1) 92.14% of all bugs examined are caused by an interaction with a single dependency, (2) 53.13% of all bugs examined are caused by an interaction with the system, and (3) 33.19% of all bugs examined are fixed by adding a direct or indirect check against the dependency causing the bug. These results provide guidance for techniques and strategies to help developers and researchers identify and fix portability bugs. (A generic example of the dependency-check fix strategy follows this entry.)
- Date Issued
- 2018
- Identifier
- 2018_Su_Clow_fsu_0071N_14798
- Format
- Thesis
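One of the fix strategies reported above is adding a direct or indirect check against the dependency that causes a portability bug. The Python sketch below is a generic, hypothetical example of that pattern, guarding platform-specific calls behind explicit platform checks; it is not taken from any of the studied applications.

import os
import subprocess
import sys

def open_in_default_app(path):
    """Open a file with the OS default application, guarding each
    platform-specific dependency behind an explicit check."""
    if sys.platform.startswith("win"):
        os.startfile(path)                    # Windows-only API
    elif sys.platform == "darwin":
        subprocess.run(["open", path], check=True)
    else:
        subprocess.run(["xdg-open", path], check=True)  # most Linux desktops

if __name__ == "__main__":
    open_in_default_app("report.pdf")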
- Title
- Deep: Dependency Elimination Using Early Predictions.
- Creator
- Penagos, Luis G., Whalley, David B., Yuan, Xin, Yu, Weikuan, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
- Conditional branches have traditionally been a performance bottleneck for most processors. The high frequency of branches in code, coupled with expensive pipeline flushes on mispredictions, makes branches expensive instructions worth optimizing. Conditional branches have historically inhibited compilers from applying optimizations across basic block boundaries due to the forks in control flow that they introduce. This thesis describes a systematic way of generating paths (traces) of branch-free code at compile time by decomposing branching and verification operations to eliminate the dependence of a branch on its preceding compare instruction. This explicit decomposition allows us to move comparison instructions past branches and to merge pre- and post-branch code. These paths generated at compile time can potentially provide additional opportunities for conventional optimizations such as common subexpression elimination, dead assignment elimination, and instruction selection. Moreover, this thesis describes a way of coalescing multiple branch instructions within innermost loops to produce longer basic blocks that provide additional optimization opportunities.
- Date Issued
- 2018
- Identifier
- 2018_Su_Penagos_fsu_0071N_14784
- Format
- Thesis
- Title
- staDFA: An Efficient Subexpression Matching Method.
- Creator
- Chowdhury, Mohammad Imran, van Engelen, Robert A., Whalley, David B., Wang, An-I Andy, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
- The main task of a lexical analyzer such as Lex [20], Flex [26], and RE/Flex [34] is to perform tokenization of a given input file within reasonable time and with limited storage requirements. Hence, most lexical analyzers use Deterministic Finite Automata (DFA) to tokenize input to ensure that the running time of the lexical analyzer is linear (or close to linear) in the size of the input. However, DFA constructed from Regular Expressions (RE) are inadequate to indicate the positions and/or extents in a matching string of a given subexpression of the regular expression. This means that all implementations of trailing contexts in DFA-based lexical analyzers, including Lex, Flex, and RE/Flex, produce incorrect results. For any matching string in the input (also called the lexeme) that matches a token's regular expression pattern, it is not always possible to tell the position of the part of the lexeme that matches a subexpression of the regular expression. For example, the string abba matches the pattern a b*/b a, but the position of the trailing context b a of the pattern in the string abba cannot be determined by a DFA-based matcher in the aforementioned lexical analyzers. There are algorithms based on Nondeterministic Finite Automata (NFA) that match subexpressions accurately. However, these algorithms are costly to execute and use backtracking or breadth-first search algorithms that run in non-linear time, with polynomial or even exponential worst-case time complexity. A tagged DFA-based approach (TDFA) was pioneered by Ville Laurikari [15] to efficiently match subexpressions. However, TDFA are not perfectly suitable for lexical analyzers, since tagged DFA edges require sets of memory updates, which hampers the performance of DFA edge traversals when matching input. I introduce a new DFA-based algorithm for efficient subexpression matching that performs memory updates in DFA states: the Store-Transfer-Accept Deterministic Finite Automata (staDFA). In my proposed algorithm, the subexpression matching positions and/or extents are stored in a Marker Position Store (MPS). The MPS is updated while the input is tokenized to provide the positions/extents of the sub-match. Compression techniques for DFA, such as Hopcroft's method [14], default transitions [18, 19], and other methods, can be applied to staDFA. For instance, this thesis provides a modified Hopcroft's method for the minimization of staDFA. (A short illustration of the subexpression-position problem follows this entry.)
- Date Issued
- 2018
- Identifier
- 2018_Su_Chowdhury_fsu_0071N_14793
- Format
- Thesis
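The problem described above is locating where a subexpression, such as a trailing context, matches within a lexeme. As a point of contrast only, the snippet below uses Python's re module, whose backtracking engine reports group positions directly for the abba example, whereas a plain DFA matcher reports only the overall match extent. This illustrates the problem the staDFA addresses, not the staDFA algorithm itself.

import re

# The lexeme "abba" matches a b* followed by the trailing context b a.
# A backtracking engine can report where the trailing context starts;
# a plain DFA matcher reports only the overall match extent.
m = re.match(r"(ab*)(ba)", "abba")
print(m.group(1), m.start(1), m.end(1))  # ab 0 2
print(m.group(2), m.start(2), m.end(2))  # ba 2 4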
- Title
- Modeling and Comparison of Large-Scale Interconnect Designs.
- Creator
- Mollah, Md Atiqul Islam, Yuan, Xin, Ke, Fengfeng, Aggarwal, Sudhir, van Engelen, Robert A., Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
- Modern-day high performance computing (HPC) clusters and data centers require a large number of computing and storage elements to be interconnected. Interconnect performance is considered a major bottleneck to the overall performance of such systems. Due to the massive scale of the network, interconnect designs are often evaluated and compared through models. My research is focused on developing scalable yet accurate methods to model large-scale interconnects and their architectural components. Such models are applied to investigate the performance characteristics of different components of interconnect systems, including the topology, the routing scheme, and the network control/management scheme. Then, through multiple experimental studies, I apply the newly developed modeling techniques to evaluate the performance of novel interconnect technologies and thus validate the case for their adoption in the current and future generations of interconnected systems.
- Date Issued
- 2018
- Identifier
- 2018_Sp_Mollah_fsu_0071E_14461
- Format
- Thesis
- Title
- Securing Systems by Vulnerability Mitigation and Adaptive Live Patching.
- Creator
- Chen, Yue, Wang, Zuoxin, Yu, Ming, Liu, Xiuwen, Wang, An-I Andy, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
- The number and type of digital devices are increasing tremendously in today's world. However, as code size soars, hidden vulnerabilities become a major threat to user security and privacy. Vulnerability mitigation, detection, and patch generation are key protection mechanisms against attacks and exploits. In this dissertation, we first explore the limitations of existing solutions. For vulnerability mitigation in particular, currently deployed address space layout randomization (ASLR) has the drawbacks that the process is randomized only once and the segment is moved as a whole. This design makes the program particularly vulnerable to information leaks. For vulnerability detection, many existing solutions can only detect the symptoms of attacks, instead of locating the underlying exploited vulnerabilities, since the manifestation of an attack does not always coincide with the exploited vulnerabilities. For patch generation towards a large number of different devices, current schemes fail to meet the requirements of timeliness and adaptiveness. To tackle the limitations of existing solutions, this dissertation introduces the design and implementation of three countermeasures. First, we present Remix, an effective and efficient on-demand live randomization system, which randomizes basic blocks of each function during runtime to provide higher entropy and stronger protection against code reuse attacks. Second, we propose Ravel, an architectural approach to pinpointing vulnerabilities from attacks. It leverages a record & replay mechanism to reproduce attacks in the lab environment, and uses the program's memory access patterns to locate targeted vulnerabilities, which can be of a variety of types. Lastly, we present KARMA, a multi-level live patching framework for Android kernels with minor performance overhead. The patches are written in a high-level memory-safe language, with the capability to be adapted to thousands of different Android kernels.
- Date Issued
- 2018
- Identifier
- 2018_Sp_Chen_fsu_0071E_14297
- Format
- Thesis
- Title
- Enabling Efficient Big Data Services on HPC Systems with SHMEM-Based Programming Stack.
- Creator
- Fu, Huansong, Yu, Weikuan, Ye, Ming, Duan, Zhenhai, Venkata, Manjunath Gorentla, Mascagni, Michael, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
With the continuous expansion of the Big Data universe, researchers have been relentlessly searching for ways to improve the efficiency of big data services, including data analytics and data infrastructures. In the meantime, there has also been an increasing interest in leveraging High-Performance Computing (HPC) capabilities for big data analytics. Symmetric Hierarchical Memory (SHMEM) is a popular parallel programming model that has thrived in the HPC realm. For many Partitioned Global Address Space (PGAS) systems and applications, SHMEM libraries are popularly used as a high-performance communication layer between the applications and the underlying high-speed interconnects. SHMEM features a one-sided communication interface. It allows remote data to be accessed in a shared-memory manner, in contrast to conventional two-sided communication, where remote data must be accessed through an explicit handshake protocol. We reveal that SHMEM offers a number of great benefits for developing parallel and distributed applications and frameworks on tightly-coupled, high-end HPC systems, such as its shared-memory style addressing model and the flexibility of its communication model. This dissertation focuses on improving the performance of big data services by leveraging a lightweight, flexible and balanced SHMEM-based programming stack. In order to realize this goal, we have studied representative data infrastructures and data analytic frameworks. Specifically, key-value stores are a very popular form of data infrastructure deployed for many large-scale web services. Unfortunately, a key-value store usually adopts an inefficient communication design in a traditional server-client architecture, where the server can easily become a bottleneck in processing a huge number of requests. Because of this, both latency and throughput can be seriously affected. Moreover, graph processing is an emerging type of data analytics that deals with large-scale graph data. Unsuitable for traditional MapReduce, graph analytic algorithms are often written and run with programming models that are specifically designed for graph processing. However, there is an imbalance issue in state-of-the-art graph processing programming models that has drastically affected the performance of graph processing. There is a critical need to revisit the conventional design of graph processing while the volume of real-world useful graph data keeps increasing every day. Furthermore, although we reveal that a SHMEM-based programming stack helps solve the aforementioned issues, there is still a lack of understanding about how portable this stack can be if it is to fit in with the specific data infrastructure and framework being optimized, and also with other distributed systems in general. This includes understanding the potential performance gain or loss, limitations of usage, and portability across different platforms. This dissertation has centered around addressing these research challenges and carried out three studies, each tackling a unique challenge but all focusing on facilitating a SHMEM-based programming stack to enable and accelerate big data services. Firstly, we use a popular SHMEM standard called OpenSHMEM to build a high-performance key-value store called SHMEMCache, which overcomes several issues in enabling direct access to key-value pairs, including race conditions, remote point chasing and unawareness of remote access. We have then thoroughly evaluated SHMEMCache and shown that it has accomplished significant performance improvements over other contemporary key-value stores, and also achieved good scalability over a thousand nodes on a leadership-class supercomputer. Secondly, to understand the implications of using various SHMEM models and one-sided communication libraries for big data services, we revisit the design of SHMEMCache, extend it with a portable communication interface, and develop Portable-SHMEMCache. Portable-SHMEMCache is able to support a variety of one-sided communication libraries. Based on this new framework, we have supported both OpenSHMEM and MPI-RMA for SHMEMCache as a proof of concept. We have conducted an extensive experimental analysis to evaluate the performance of Portable-SHMEMCache on two different platforms. Thirdly, we have thoroughly studied the issues that exist in state-of-the-art graph processing frameworks. We have proposed salient design features to tackle their serious inefficiency and imbalance issues. The design features have been incorporated into a new graph processing framework called SHMEMGraph. Our comprehensive experiments for SHMEMGraph have demonstrated its significant performance advantages compared to state-of-the-art graph processing frameworks. This dissertation has pushed forward the big data evolution by enabling efficient representative data infrastructures and analytic frameworks on HPC systems with SHMEM-based programming models. The performance improvements compared to state-of-the-art frameworks have demonstrated the efficacy of our solution designs and the potential of leveraging HPC capabilities for big data. We believe that our work has better prepared contemporary data infrastructures and analytic frameworks for addressing the big data challenge. (A hedged sketch of one-sided remote access follows this entry.)
- Date Issued
- 2018
- Identifier
- 2019_Spring_Fu_fsu_0071E_14906
- Format
- Thesis
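The entry above contrasts one-sided and two-sided communication. The sketch below is not the SHMEMCache implementation; it is a minimal analogue using MPI-RMA through mpi4py (one of the one-sided libraries the entry mentions supporting in Portable-SHMEMCache), in which one rank reads another rank's exposed buffer without any matching receive on the target.

```python
# Hedged sketch (not SHMEMCache): one-sided remote access in the style the abstract describes,
# using MPI-RMA via mpi4py. Run with: mpiexec -n 2 python rma_get.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each rank exposes a small buffer through an RMA window; rank 0 fills it with values.
local = np.zeros(8, dtype='i')
if rank == 0:
    local[:] = np.arange(8)
win = MPI.Win.Create(local, comm=comm)

# Rank 1 reads rank 0's buffer directly, with no matching send posted by rank 0.
if rank == 1:
    remote = np.empty(8, dtype='i')
    win.Lock(0, MPI.LOCK_SHARED)
    win.Get(remote, 0)          # one-sided get from target rank 0
    win.Unlock(0)
    print("fetched one-sided:", remote)

win.Free()
```

The same access pattern is what allows a key-value client to fetch a pair directly from the server's memory instead of queueing a request at a busy server process.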
- Title
- Improving the Effectiveness of Performance Analysis for HPC by Using Appropriate Modeling and Simulation Schemes.
- Creator
-
Tong, Zhou, Yuan, Xin, Ke, Fengfeng, Zhang, Zhenghao, Haiduc, Sonia, Pakin, Scott D., Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
Performance modeling and simulation of parallel applications are critical performance analysis techniques in High Performance Computing (HPC). Efficient and accurate performance modeling and simulation can aid the tuning and optimization of current systems as well as the design of future HPC systems. As HPC applications and systems increase in size, efficient and accurate performance modeling and simulation of parallel applications is becoming increasingly challenging. In general, simulation yields higher accuracy at the cost of high simulation time in comparison to modeling. This dissertation aims at developing effective performance analysis techniques for next-generation HPC systems. Since modeling is often orders of magnitude faster than simulation, the idea is to separate HPC applications into two types: 1) those for which modeling produces performance results similar to simulation, and 2) those for which simulation yields more meaningful information about application performance than modeling. By using modeling for the first type of applications and simulation for the rest, the efficiency of performance analysis can be significantly improved. The contribution of this thesis is three-fold. First, a comprehensive study of the performance and accuracy trade-offs between modeling and simulation on a wide range of HPC applications is performed. The results indicate that for the majority of HPC applications, modeling and simulation yield similar performance results. This lays the foundation for improving performance analysis on HPC systems by selecting between modeling and simulation for each application. Second, a scalable and fast classification technique (MFACT) is developed based on Lamport's logical clock; it provides fast diagnosis of MPI application performance bottlenecks and assists in the process of application tuning and optimization on current and future HPC systems. MFACT also classifies HPC applications into bandwidth-bound, latency-bound, communication-bound, and computation-bound categories. Third, building upon MFACT, statistical methods are introduced to classify HPC applications, for a given system configuration, into the two types: the ones that need simulation and the ones for which modeling is sufficient. The classification techniques and tools enable effective performance analysis for future HPC systems and applications without losing accuracy. (A hedged toy classification sketch follows this entry.)
- Date Issued
- 2017
- Identifier
- FSU_FALL2017_Tong_fsu_0071E_14074
- Format
- Thesis
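As a rough illustration of the kind of classification the entry above describes (this is not MFACT, just a hedged toy model), the sketch below advances a logical clock over an invented trace of compute and message events, charging each message latency plus size/bandwidth, and labels the run by which parameter its completion time is more sensitive to.

```python
# Hedged, simplified sketch (not MFACT): advance a Lamport-style logical clock over a toy trace
# of events, model each message cost as latency + size/bandwidth, and classify the run by which
# network parameter its completion time is most sensitive to. All numbers are invented.

def modeled_time(trace, latency, bandwidth):
    clock = 0.0
    for kind, value in trace:                 # ('compute', seconds) or ('message', bytes)
        if kind == 'compute':
            clock += value
        else:
            clock += latency + value / bandwidth
    return clock

trace = [('compute', 0.010), ('message', 8_000_000), ('compute', 0.002), ('message', 64)]
base = modeled_time(trace, latency=2e-6, bandwidth=10e9)
lat_sens = modeled_time(trace, latency=4e-6, bandwidth=10e9) / base    # double the latency
bw_sens = modeled_time(trace, latency=2e-6, bandwidth=5e9) / base      # halve the bandwidth
label = 'latency-bound' if lat_sens > bw_sens else 'bandwidth-bound'
print(base, lat_sens, bw_sens, label)
```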
- Title
- A Study on Semantic Relation Representations in Neural Word Embeddings.
- Creator
-
Chen, Zhiwei, Liu, Xiuwen, He, Zhe (Professor of Information Studies), Zhao, Peixiang, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
Neural network based word embeddings have demonstrated outstanding results in a variety of tasks, and have become a standard input for Natural Language Processing (NLP) related deep learning methods. Although these representations are able to capture semantic regularities in languages, some general questions, e.g., "what kinds of semantic relations do the embeddings represent?" and "how could the semantic relations be retrieved from an embedding?" remain unclear, and very little relevant work has been done. In this study, we propose a new approach to exploring the semantic relations represented in neural embeddings based on WordNet and the Unified Medical Language System (UMLS). Our study demonstrates that neural embeddings do prefer some semantic relations and that the neural embeddings also represent diverse semantic relations. Our study also finds that Named Entity Recognition (NER)-based phrase composition outperforms Word2phrase and that word variants do not affect the performance on analogy and semantic relation tasks. (A hedged embedding-probing sketch follows this entry.)
- Date Issued
- 2017
- Identifier
- FSU_SUMMER2017_Chen_fsu_0071N_14103
- Format
- Thesis
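A hedged sketch of how semantic relations can be probed in a pre-trained embedding with vector offsets, using gensim; the model file name is a placeholder assumption, and this is not the study's WordNet/UMLS pipeline.

```python
# Hedged sketch (not the study's pipeline): probing what relations a word embedding encodes via
# vector offsets, using gensim. The model path below is a placeholder assumption.
from gensim.models import KeyedVectors

kv = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)

# Analogy-style retrieval: which words complete the relation king - man + woman ~= ?
print(kv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))

# A crude relation probe: does the same offset transfer when applied to another word?
offset = kv["woman"] - kv["man"]
print(kv.similar_by_vector(kv["king"] + offset, topn=3))   # hope: 'queen' ranks high
```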
- Title
- Comparing Samos Document Search Performance between Apache Solr and Neo4j.
- Creator
-
Stallard, Adam Preston, Zhao, Peixiang, Smith, Shawn R., Haiduc, Sonia, Nistor, Adrian, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
The Distributed Oceanographic Match-Up Service (DOMS) currently under development is a centralized service that allows researchers to easily match in situ and satellite oceanographic data from distributed sources to facilitate satellite calibration, validation, and retrieval algorithm development. The Shipboard Automated Meteorological and Oceanographic System (SAMOS) initiative provides routine access to high-quality marine meteorological and near-surface oceanographic observations from research vessels. SAMOS is one of several endpoints connected into the DOMS network, providing in-situ data for the match-up service. DOMS in-situ endpoints currently use Apache Solr as a backend search engine on each node in the distributed network. While Solr is a high-performance solution that facilitates creation and maintenance of indexed data, it is limited in the sense that its schema is fixed. The property graph model escapes this limitation by removing any prohibiting requirements on the data model, and permitting relationships between data objects. This paper documents the development of the SAMOS Neo4j property graph database, including new search possibilities that take advantage of the property graph model, performance comparisons with Apache Solr, and a vision for graph databases as a storage tool for oceanographic data. The integration of the SAMOS Neo4j graph into DOMS is also described. Various data models are explored, including spatial-temporal records from SAMOS added to a time tree using Graph Aware technology. This extension provides callable Java procedures within the CYPHER query language that generate in-graph structures used in data retrieval. Neo4j excels at performing relationship and path-based queries, which challenge relational-SQL databases because they require memory-intensive joins due to the limitations of their design. Consider a user who wants to find records over several years, but only for specific months. If a traditional database only stores timestamps, this type of query could be complex and likely prohibitively slow. Using the time tree model in a graph, one can specify a path from the root to the data which restricts resolutions to certain time frames (e.g., months). This query can be executed without joins, unions, or other compute-intensive operations, putting Neo4j at a computational advantage over the SQL database alternative. That said, while this advantage may be useful, it should not be interpreted as an advantage over Solr in the context of DOMS. Solr makes use of Apache Lucene indexing at its core, while Neo4j provides its own native schema indexes. Ultimately, they each provide unique solutions for data retrieval that are geared for specific tasks. In the DOMS setting it would appear that Solr is the most suitable option, as there seem to be very limited use cases where Neo4j does outperform Solr. This is primarily because the use case as a subsetting tool does not require the flexibility and path-based queries that graph database tools offer. Rather, DOMS nodes are using high-performance indexing structures to quickly filter large amounts of raw data that are not deeply connected, a feature of large data sets where graph queries would indeed become useful. (A hedged time-tree query sketch follows this entry.)
- Date Issued
- 2017
- Identifier
- FSU_SUMMER2017_Stallard_fsu_0071N_13933
- Format
- Thesis
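The sketch below illustrates the months-across-years query discussed in the entry, expressed as a time-tree path in Cypher and issued through the Neo4j Python driver. The node labels, relationship types, connection URI, and credentials are assumptions for illustration only, not the actual SAMOS graph model.

```python
# Hedged sketch (schema is assumed, not the real SAMOS model): fetch records for specific months
# across all years by walking a time-tree path, avoiding joins over raw timestamps.
from neo4j import GraphDatabase

query = """
MATCH (y:Year)-[:HAS_MONTH]->(m:Month)-[:HAS_DAY]->(d:Day)<-[:OBSERVED_ON]-(r:Record)
WHERE m.value IN [6, 7, 8]            // only June-August, in every year
RETURN y.value AS year, m.value AS month, count(r) AS observations
ORDER BY year, month
"""

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    for row in session.run(query):
        print(row["year"], row["month"], row["observations"])
driver.close()
```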
- Title
- Feistel-Inspired Scrambling Improves the Quality of Linear Congruential Generators.
- Creator
-
Aljahdali, Asia Othman, Mascagni, Michael, Duke, D. W. (Dennis W.), Srinivasan, Ashok (Professor of Computer Science), van Engelen, Robert, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
Pseudorandom number generators (PRNGs) are an essential tool in many areas, including simulation studies of stochastic processes, modeling, randomized algorithms, and games. The performance of any PRNG depends on the quality of the generated random sequences; they must be generated quickly and have good statistical properties. Several statistical test suites have been developed to evaluate a single stream of random numbers, such as TestU01, DIEHARD, the tests from the SPRNG package, and a set of tests designed to evaluate bit sequences developed at NIST. TestU01 provides batteries of tests that are sets of the mentioned suites. The predefined batteries are SmallCrush (10 tests, 16 p-values), which runs quickly, and the Crush (96 tests, 187 p-values) and BigCrush (106 tests, 2254 p-values) batteries, which take longer to run. Most pseudorandom generators use recursion to produce sequences of numbers that appear to be random. The linear congruential generator is one of the well-known pseudorandom generators; the next number in the sequence is determined by the previous one. The recurrences start with a value called the seed. Each time a recurrence starts with the same seed, the same sequence is produced. This thesis develops a new pseudorandom number generation scheme that produces random sequences with good statistical properties via scrambling linear congruential generators. The scrambling technique is based on a simplified version of a Feistel network, which is a symmetric structure used in the construction of cryptographic block ciphers. The proposed research seeks to improve the quality of the linear congruential generators' output streams and to break up the regularities existing in the generators. (A hedged toy sketch of this scrambling idea follows this entry.)
- Date Issued
- 2017
- Identifier
- FSU_SUMMER2017_Aljahdali_fsu_0071E_13941
- Format
- Thesis
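A hedged toy sketch of the idea described above: generate numbers with a linear congruential generator and post-process each output with a few rounds of a simplified Feistel network. The constants and round function here are illustrative only and are not the thesis's construction.

```python
# Hedged sketch (not the dissertation's exact construction): a classic LCG whose output is
# scrambled by a toy Feistel network. All constants are illustrative.

M, A, C = 2**31 - 1, 1103515245, 12345        # hypothetical LCG parameters

def lcg(seed):
    x = seed
    while True:
        x = (A * x + C) % M
        yield x

def feistel_scramble(word, rounds=4, keys=(0x9E37, 0x79B9, 0x7F4A, 0x7C15)):
    """Split a 32-bit word into two 16-bit halves and mix them with a simple round function."""
    left, right = (word >> 16) & 0xFFFF, word & 0xFFFF
    for k in keys[:rounds]:
        left, right = right, left ^ ((right * k + 0x5BD1) & 0xFFFF)
    return (left << 16) | right

gen = lcg(seed=42)
raw = [next(gen) for _ in range(5)]
scrambled = [feistel_scramble(x & 0xFFFFFFFF) for x in raw]
print(raw)
print(scrambled)
```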
- Title
- Dependency Collapsing in Instruction-Level Parallel Architectures.
- Creator
-
Brunell, Victor J., Whalley, David B., Tyson, Gary Scott, Yuan, Xin, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
Processors that employ instruction fusion can improve performance and energy usage beyond that of traditional processors by collapsing and simultaneously executing dependent instruction chains on the critical path. This paper describes compiler mechanisms that can facilitate and guide instruction fusion in processors built to execute fused instructions. The compiler support discussed in this paper includes compiler annotations to guide fusion, the exploration of multiple new fusion configurations, and scheduling algorithms that effectively select and order fusible instructions. The benefits of providing compiler support for dependent instruction fusion include statically detecting fusible instruction chains without the need for hardware dynamic detection support and improved performance by increasing available parallelism. (A toy sketch of static detection of fusible pairs follows this entry.)
- Date Issued
- 2017
- Identifier
- FSU_SUMMER2017_Brunell_fsu_0071N_14109
- Format
- Thesis
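A toy sketch (not the paper's algorithm) of statically detecting fusible producer/consumer pairs: in a made-up basic block, any instruction whose result has exactly one consumer is a candidate to be collapsed with that consumer.

```python
# Hedged toy sketch (not the paper's algorithm): spot candidate pairs for fusion, i.e. an
# instruction whose result is consumed by exactly one later instruction in the block.

instrs = [                        # (name, destination, sources) for a made-up basic block
    ("i0", "r1", ["r0"]),
    ("i1", "r2", ["r1", "r0"]),   # sole consumer of r1 -> (i0, i1) is a fusion candidate
    ("i2", "r3", ["r2"]),         # sole consumer of r2 -> (i1, i2) is a fusion candidate
    ("i3", "r4", ["r3", "r0"]),   # sole consumer of r3 -> (i2, i3) is a fusion candidate
]

def fusible_pairs(block):
    uses = {}
    for name, _, srcs in block:
        for s in srcs:
            uses.setdefault(s, []).append(name)
    pairs = []
    for name, dest, _ in block:
        consumers = uses.get(dest, [])
        if len(consumers) == 1:   # single consumer: producer and consumer can be collapsed
            pairs.append((name, consumers[0]))
    return pairs

print(fusible_pairs(instrs))      # [('i0', 'i1'), ('i1', 'i2'), ('i2', 'i3')]
```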
- Title
- I/O Latency in the Linux Storage Stack.
- Creator
-
Stephens, Brandon, Wang, An-I Andy, Wang, Zhi, Wang, Zuoxin, Whalley, David B., Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
As storage device performance increases, the lifespan of an I/O request becomes throttled more so by data path traversal than by physical disk access. Even though many computer performance analysis tools exist, a surprisingly small amount of research has been published documenting bottlenecks throughout the Linux storage stack. What research has been published focuses on results found through tracing, glossing over how the traces were performed. This work details my process of developing a refined tracing method, what that method is, and how the research can be applied to measure I/O latency at any layer of the Linux storage stack. Sample results are given after examining the filesystem layer, the block layer, and the memory management system. Among these three components of the storage stack, the filesystem layer is responsible for the longest duration of an I/O request's lifespan. (A simple user-level timing sketch follows this entry.)
- Date Issued
- 2017
- Identifier
- FSU_FALL2017_Stephens_fsu_0071N_14270
- Format
- Thesis
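A much simpler, user-level analogue of the measurements described above (the thesis instruments the kernel layers themselves): timing how long a small write takes to return from the page cache versus how long fsync takes to push it through the filesystem and block layers to the device.

```python
# Hedged, user-level sketch (far coarser than in-kernel tracing): compare the latency of a
# buffered write against the latency of forcing it down the storage stack with fsync.
import os
import time

path = "latency_probe.tmp"
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
payload = b"x" * 4096

t0 = time.perf_counter()
os.write(fd, payload)              # typically returns once the page cache holds the data
t1 = time.perf_counter()
os.fsync(fd)                       # forces traversal of the filesystem and block layers
t2 = time.perf_counter()

print(f"write() latency: {(t1 - t0) * 1e6:.1f} us, fsync() latency: {(t2 - t1) * 1e6:.1f} us")
os.close(fd)
os.remove(path)
```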
- Title
- Community Search and Detection on Large Graphs.
- Creator
-
Akbas, Esra, Zhao, Peixiang, Mio, Washington, Kumar, Piyush, Liu, Xiuwen, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
Modern science and technology have witnessed in the past decade a proliferation of complex data that can be naturally modeled and interpreted as graphs. In real-world networked applications, the underlying graphs oftentimes exhibit fundamental community structures supporting widely varying interconnected processes. Identifying communities may offer insight on how the network is organized. In this thesis, we worked on community detection and search problems on graph data. Community detection (graph clustering) has become one of the most well-studied problems in graph management and analytics, the goal of which is to group vertices of a graph into densely knitted clusters with each cluster being well separated from all the others. Classic graph clustering methods primarily take advantage of topological information of graphs to model and quantify the proximity between vertices. With the proliferation of rich, heterogeneous graph contents widely available in real-world graphs, such as user profiles in social networks, it becomes essential to consider both structures and attributive contents of graphs for better quality graph clustering. On the other hand, existing community detection methods focus primarily on discovering communities in an a priori, top-down manner with the only reference to the input graph. As a result, all communities have to be exhaustively identified, thus incurring expensive time/space cost and a huge amount of fruitless computation, if only a fraction of them are of special interest to end-users. In many real-world scenarios, however, people are more interested in the communities pertaining to a given vertex. In our first project, we work on the attributed graph clustering problem. We propose a graph embedding approach to cluster content-enriched, attributed graphs. The key idea is to design a unified latent representation for each vertex of a graph such that both the graph connectivity and vertex attribute proximity within the localized region of the vertex can be jointly embedded into a unified, continuous vector space. As a result, the challenging attributed graph clustering problem is cast as the traditional data clustering problem. In our second and third projects, we work on a query-dependent variant of community detection, referred to as the community search problem. The objective of community search is to identify dense subgraphs containing the query vertices. We study the community search problem in the truss-based model, aimed at discovering all dense and cohesive k-truss communities to which the query set Q belongs. We introduce a novel equivalence relation, k-truss equivalence, to model the intrinsic density and cohesiveness of edges in k-truss communities, and based on this equivalence we create two different space-efficient, truss-preserving index structures, EquiTruss and TEQ. Community search for one query or multiple queries can thus be addressed upon EquiTruss and TEQ without repeated, time-demanding accesses to the original graph, G, which proves to be theoretically optimal. While the query set includes a single query vertex in the first of these community search projects, it includes multiple query vertices in the second. In summary, to obtain better quality in attributed graph clustering, the attribute-aware cluster information is well preserved during graph embedding. While we use the Skip-Gram method for embedding, other embedding methods exist and could be used to study their effect on attributed graphs. In addition, our index structure supports community search on large graphs without considering attribute information. Using attribute information in addition to the structure may yield better communities for the given query nodes, so our index structure could be extended to support community search on attributed graphs. (A small truss-based community search sketch follows this entry.)
- Date Issued
- 2017
- Identifier
- FSU_FALL2017_Akbas_fsu_0071E_14173
- Format
- Thesis
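A small sketch of truss-based community search answered directly on the graph with networkx, as a baseline for what EquiTruss and TEQ index; the example graph and parameters are arbitrary stand-ins.

```python
# Hedged sketch (not EquiTruss/TEQ): answer a truss-based community search query directly on the
# graph, by taking the k-truss and the connected component containing the query vertex.
import networkx as nx

G = nx.karate_club_graph()                 # stand-in graph; any undirected graph works
k, query_vertex = 4, 0

truss = nx.k_truss(G, k)                   # subgraph where every edge lies in >= k-2 triangles
community = next(
    (c for c in nx.connected_components(truss) if query_vertex in c), set()
)
print(f"{k}-truss community of vertex {query_vertex}: {sorted(community)}")
```

The point of the thesis's index structures is precisely to avoid recomputing this truss decomposition over the original graph for every query.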
- Title
- Exploring Novel Burst Buffer Management on Extreme-Scale HPC Systems.
- Creator
-
Wang, Teng, Yu, Weikuan, Erlebacher, Gordon, Whalley, David B., Wang, An-I Andy, Oral, Sarp, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
The computing power on the leadership-class supercomputers has been growing exponentially over the past few decades, and is projected to reach exascale in the near future. This trend, however, will continue to push forward the peak I/O requirement for checkpoint/restart, data analysis and visualization. As a result, the conventional Parallel File System (PFS) is no longer a qualified candidate for handling the exascale I/O workloads. On one hand, the basic storage unit of the conventional PFS is still the hard drive, which is expensive in terms of I/O bandwidth/operations per dollar. Providing sufficient hard drives to meet the I/O requirement at exascale is prohibitively costly. On the other hand, the effective I/O bandwidth of PFS is limited by I/O contention, which occurs when multiple computing processes concurrently write to the same shared disks. Recently, researchers and system architects have been exploring a new storage architecture with tiers of burst buffers (e.g. DRAM, NVRAM and SSD) deployed between the compute nodes and the backend PFS. This additional burst buffer layer offers much higher aggregate I/O bandwidth than the PFS and is designed to absorb the massive I/O workloads on the slower PFS. Burst buffers have been deployed on numerous contemporary supercomputers, and they have also become an indispensable hardware component on the next-generation supercomputers. There are two representative burst buffer architectures being explored: node-local burst buffers (burst buffers on compute nodes) and remote shared burst buffers (burst buffers on I/O nodes). Both types of burst buffers rely on a software management system to provide fast and scalable data service. However, there is still a lack of in-depth study on the software solutions and their impacts. On one hand, a number of studies on burst buffers are based on modeling and simulation, which cannot exactly capture the performance impact of various design choices. On the other hand, existing software development efforts are generally carried out by industrial companies, whose proprietary products are commercialized without releasing sufficient details on the internal design. This dissertation explores alternative burst buffer management strategies based on research designs and prototype implementations, with a focus on how to accelerate common scientific I/O workloads, including the bursty writes from checkpointing and bursty reads from restart/analysis/visualization. Our design philosophy is to leverage burst buffers as a fast and intermediate storage layer to orchestrate the data movement between the applications and burst buffers, as well as the data movement between burst buffers and the backend PFS. On one hand, the performance benefit of burst buffers can significantly speed up the data movement between the applications and burst buffers. On the other hand, this additional burst buffer layer offers extra capacity to buffer and reshape the write requests, and drain them to the backend PFS in a manner catering to the most effective utilization of PFS capabilities. Rooted in this design philosophy, this dissertation investigates three data management strategies. The first two strategies answer how to efficiently move data between the scientific applications and the burst buffers. These two strategies are respectively designed for the remote shared burst buffers and the node-local burst buffers. The remaining strategy aims to speed up the data movement between the burst buffers and the PFS; it is applicable to both types of burst buffers. In the first strategy, a novel burst buffer system named BurstMem is designed and prototyped to manage the remote shared burst buffers. BurstMem expedites scientific checkpointing by quickly buffering the checkpoints in the burst buffers after each round of computation and asynchronously flushing the datasets to the PFS during the next round of computation. It outperforms the state-of-the-art data management systems with efficient data transfer, buffering and flushing. In the second strategy, we have designed and prototyped an ephemeral burst buffer file system named BurstFS to manage the node-local burst buffers. BurstFS delivers scalable write bandwidth by having each process write to its node-local burst buffer. It also provides a fast and temporary data sharing service for multiple coupled applications in the same job. In the third strategy, a burst buffer orchestration framework named TRIO is devised to address I/O contention on the PFS. TRIO buffers scientific applications' bursty write requests, and dynamically adjusts the flush order of all the write requests to avoid multiple burst buffers competing to flush to the same disk. Our experiments demonstrate that by addressing I/O contention, TRIO not only improves the storage bandwidth utilization but also minimizes the average I/O service time for each job. Through systematic experiments and comprehensive evaluation and analysis, we have validated that our design and management solutions for burst buffers can significantly accelerate scientific I/O for the next-generation supercomputers. (A toy asynchronous-flush sketch follows this entry.)
- Date Issued
- 2017
- Identifier
- FSU_2017SP_Wang_fsu_0071E_13703
- Format
- Thesis
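A toy sketch (not BurstMem, BurstFS, or TRIO) of the overlap the entry describes: checkpoints are absorbed quickly into a local buffer and drained to the slower backend in the background while the next round of computation proceeds.

```python
# Hedged toy sketch (not BurstMem): absorb checkpoints into a fast local buffer, then drain them
# to the slower parallel file system in the background while computation continues.
import queue
import threading
import time

burst_buffer = queue.Queue()

def drain_to_pfs():
    while True:
        name, data = burst_buffer.get()
        time.sleep(0.5)                      # stand-in for a slow write to the backend PFS
        print(f"[flusher] {name} ({len(data)} bytes) persisted to PFS")
        burst_buffer.task_done()

threading.Thread(target=drain_to_pfs, daemon=True).start()

for step in range(3):
    checkpoint = bytes(1_000_000)            # stand-in for one round of computation's state
    burst_buffer.put((f"ckpt_{step}", checkpoint))   # fast, returns immediately
    time.sleep(0.2)                          # next round of computation overlaps the flush

burst_buffer.join()                          # wait for the last flush before exiting
```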
- Title
- MAR: Mobile Augmented Reality in Indoor Environment.
- Creator
-
Alahmadi, Mohammad Neal, Yang, Jie, Mascagni, Michael, Haiduc, Sonia, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
For decades, augmented reality has been used to allow a person to visualize an overlay of annotations, videos, and images on physical objects using a camera. Due to the high computational processing cost that is required to match an image from among an enormous number of images, it has been daunting to use the concept of augmented reality on a smartphone without significant processing delays. Although the Global Positioning System (GPS) can be very useful for the outdoor localization of an object, GPS is not suitable for indoor localization. To address the problem of indoor localization, we propose using mobile augmented reality in an indoor environment. Since most smartphones have many useful sensors such as accelerometers, magnetometers and Wi-Fi sensors, we can leverage these sensors to determine the phone's location, field of view, and angle of view. Using Mobile Augmented Reality (MAR) based on processing data from several smartphone sensors, we can achieve indoor localization with reduced processing time. We tested MAR in simulated environments, and deployed the system in the Love building (LOV) at Florida State University. We used 200 images in the simulated environment, compared the matching processing time between multiple object recognition algorithms, and reduced the matching time from 2.8 seconds to only 0.17 seconds using the BRISK algorithm. (A short feature-matching sketch follows this entry.)
- Date Issued
- 2017
- Identifier
- FSU_FALL2017_Alahmadi_fsu_0071N_13939
- Format
- Thesis
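A short sketch of the kind of image matching timed in the entry above, using OpenCV's BRISK detector and a Hamming-distance brute-force matcher; the image file names are placeholder assumptions.

```python
# Hedged sketch (not the MAR system): match a query photo against a stored reference image with
# BRISK keypoints, as in the matching-time comparison the abstract mentions.
import cv2

query = cv2.imread("hallway_query.jpg", cv2.IMREAD_GRAYSCALE)        # placeholder file names
reference = cv2.imread("hallway_reference.jpg", cv2.IMREAD_GRAYSCALE)

brisk = cv2.BRISK_create()
kp1, des1 = brisk.detectAndCompute(query, None)
kp2, des2 = brisk.detectAndCompute(reference, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)   # Hamming distance for binary descriptors
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
print(f"{len(matches)} matches; best distance {matches[0].distance if matches else 'n/a'}")
```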
- Title
- Segmentation and Structure Determination in Electron Microscopy.
- Creator
-
Banerjee Mukherjee, Chaity, Liu, Xiuwen, Taylor, Kenneth A., Barbu, Adrian G., Kumar, Piyush, Tyson, Gary Scott, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
One of the goals of biology is to understand the structure and interaction of macromolecules, in order to better understand life at a macromolecular level. One of the most important inventions that revolutionized the study of macromolecular structures is that of the electron microscope. Electron microscopes are used for studying three-dimensional structures of macromolecular assemblies using 2D and 3D geometry. The underlying principle of 3D reconstruction from 2D projections is well understood and forms the basis of electron microscopy. Depending on the type of structure under investigation, either electron tomography is used, where the structures are heterogeneous, or, in case they are homogeneous, single-particle electron microscopy is used. Whatever the underlying source of the data, tomography or single particle, both involve a significant number of computational problems. Many of these problems have been studied in other branches of computer science, such as computer vision and machine learning. However, until very recently, there has not been a significant exchange of ideas between these two disparate communities. This work is a step in that direction. We study two problems: the first is related to the well-studied problem of segmentation, but in the context of electron tomography. The second relates to studying the macromolecular structure of actin-myosin interaction using 3D reconstruction of single-particle electron microscopic data. We hope that this would be the beginning of a formal interaction between the two fields, which have the potential to enrich each other tremendously.
- Date Issued
- 2017
- Identifier
- FSU_FALL2017_BanerjeeMukherjee_fsu_0071E_14065
- Format
- Thesis
- Title
- Early Detection of Alzheimer's Disease Based on Volume and Intensity Changes in MRI Images.
- Creator
-
Kanel, Prabesh, Liu, Xiuwen, Grant, Samuel C., Tyson, Gary Scott, Kumar, Piyush, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
Alzheimer's disease (AD) is one of the top 10 leading causes of death in the US; it debilitates memory and impairs cognition. The current core clinical criteria for diagnosis of AD are based on functional deficits and cognitive impairments that do not include advanced imaging techniques or cerebrospinal fluid analysis; the final confirmation of the disease is only possible at the time of autopsy, when neurofibrillary tangles and beta-amyloids are present in a brain tissue examination. The distributions of these particular pathogens (neurofibrillary tangles and beta-amyloids) follow a pattern that is useful for identifying different stages of AD at the time of the autopsy by looking at the presence of pathogens in the areas of the brain. The pathogens are first seen in the entorhinal/perirhinal cortex, and then spread to the hippocampus cornu ammonis subfields, followed by the association cortex and finally the rest of the brain. This disease progression is standard and described in the NIA-RI guidelines. In the last few decades, with the introduction of advanced imaging techniques in research settings, many in vivo research methods have been focusing on the volumetric measurements of the hippocampus and its subfields in MRI images and using them as additional information for early diagnosis of AD. While the hippocampal volume provides an excellent diagnostic aid, it does not address either the pathogens associated with AD or the progression of the pathogens within the different subregions of the hippocampus. The hippocampal formation is a complex circuit that spans the temporal lobes and is found to have distinctive subregions. These subregions are influenced differently by AD at different stages. Since the disease progression as seen in pathogen distributions follows a pattern, studying the pattern of the regional changes will allow us to predict which stage the disease is at. These pathological shifts in regions of the brain are studied extensively in ex vivo MRI as well as during autopsy, but not in in vivo MRI. Considering that the brain areas with neurofibrillary tangles and beta-amyloids show hypointensity (PD-weighted) and hyperintensity (T2-weighted) voxels in the MRI images, we suggest an in vivo study using normalized MRI images taken from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. We analyze the pattern of changes in the hippocampal region using a volume-based method in normalized T1-weighted MRI and an intensity-based analysis in normalized PD-weighted MRI. We use the volume-based method to calculate the changes in the volume of each of the hippocampus subfields. For the intensity-based method, we count the number of hypointensity (intensity value < 125) voxel combinations from the Gray-Level Co-occurrence Matrix (GLCM); normalized MRI images are used in order to minimize intensity variation. We then use the data (volume and intensity changes) to construct decision trees which classify the MRI images into three categories: normal control (NC), mild cognitive impairment (MCI) and AD. We have found that the volume-based decision trees detect AD MRI images with an accuracy of 75% but fail to detect NC and MCI MRI images with the same level of accuracy. In contrast, with the intensity-based decision trees, we were able to classify MRI images into the NC, MCI and AD categories, each with an equally high level of accuracy (above 86%). To find out how reliable the intensity-based method is in classifying MRI images, we introduced noise into our images. The addition of noise forced some adjustments in our decision trees. The accuracy of decision tree classification decreased in the presence of noise. However, even in the presence of the additional noise, we noticed that the intensity-based method outperforms the volume-based method. The classification of MRI images improves when both measures (intensity-based and volume-based) are used in constructing our decision trees. This study has demonstrated that the inclusion of the intensity measurements of PD-weighted MRI images in AD studies may provide a more accurate way to model the natural progression of AD in vivo and contribute to the early diagnosis of AD. (A small hypointensity co-occurrence sketch follows this entry.)
- Date Issued
- 2017
- Identifier
- FSU_2017SP_Kanel_fsu_0071E_13744
- Format
- Thesis
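A hedged sketch of the intensity-based measurement described above: counting co-occurrences of hypointense voxels (both below 125) for horizontally adjacent pixels of a slice, which is the kind of count a GLCM at distance 1 and angle 0 summarizes. The data here are random stand-ins, not ADNI images.

```python
# Hedged sketch (not the dissertation's pipeline): count hypointensity co-occurrences between
# horizontally adjacent voxels of a 2D slice; random data stands in for a PD-weighted slice.
import numpy as np

rng = np.random.default_rng(0)
slice_2d = rng.integers(0, 256, size=(64, 64))       # stand-in for a normalized PD-weighted slice

left, right = slice_2d[:, :-1], slice_2d[:, 1:]      # voxel pairs at distance 1, angle 0
hypo_pairs = np.count_nonzero((left < 125) & (right < 125))
total_pairs = left.size
print(f"hypointensity co-occurrence fraction: {hypo_pairs / total_pairs:.3f}")
```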
- Title
- An Effective and Efficient Approach for Clusterability Evaluation.
- Creator
-
Adolfsson, Andreas, Ackerman, Margareta, Brownstein, Naomi Chana, Haiduc, Sonia, Tyson, Gary Scott, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
Clustering is an essential data mining tool that aims to discover inherent cluster structure in data. As such, the study of clusterability, which evaluates whether data possesses such structure, is an integral part of cluster analysis. Yet, despite their central role in the theory and application of clustering, current notions of clusterability fall short in two crucial aspects that render them impractical: most are computationally infeasible, and others fail to classify the structure of real datasets. In this thesis, we propose a novel approach to clusterability evaluation that is both computationally efficient and successfully captures the structure in real data. Our method applies multimodality tests to the (one-dimensional) set of pairwise distances based on the original, potentially high-dimensional data. We present extensive analyses of our approach for both the Dip and Silverman multimodality tests on real data as well as 17,000 simulations, demonstrating the success of our approach as the first practical notion of clusterability. (A hedged sketch of this pipeline follows this entry.)
- Date Issued
- 2016
- Identifier
- FSU_SUMMER2017_Adolfsson_fsu_0071N_13478
- Format
- Thesis
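A hedged sketch of the pipeline described above, on synthetic data: reduce the (potentially high-dimensional) points to their one-dimensional pairwise distances and apply a multimodality test. It assumes the third-party diptest package for Hartigans' dip test; the Silverman test would slot into the same place.

```python
# Hedged sketch of the pairwise-distance clusterability pipeline, assuming the third-party
# 'diptest' package (pip install diptest); the data are synthetic stand-ins, not the thesis's.
import numpy as np
from scipy.spatial.distance import pdist
import diptest

rng = np.random.default_rng(1)
two_clusters = np.vstack([rng.normal(0, 1, (100, 5)), rng.normal(8, 1, (100, 5))])

distances = pdist(two_clusters)              # reduce high-dimensional data to 1D pairwise distances
dip, pval = diptest.diptest(distances)       # low p-value -> multimodal distances -> clusterable
print(f"dip statistic {dip:.4f}, p-value {pval:.4g}")
```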
- Title
- Enhancing Infiniband with Openflow-Style SDN Capability.
- Creator
-
Lee, Jason, Yuan, Xin, Zhang, Zhenghao, Yu, Weikuan, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
InfiniBand is the de facto networking technology for commodity HPC clusters and has been widely deployed. However, most production large-scale InfiniBand clusters use simple routing schemes such as destination-mod-k routing to route traffic, which may result in degraded communication performance. In this work, I investigate using the OpenFlow-style Software-Defined Networking (SDN) technology to overcome the routing deficiency in InfiniBand. I design an enhanced InfiniBand with OpenFlow-style SDN capability and demonstrate a use case that illustrates how the SDN capability can be exploited in HPC clusters to improve the system and application performance. Finally, I quantify the potential benefits of InfiniBand with OpenFlow-style SDN capability in balancing the network load by simulating job traces from production HPC clusters. The results indicate that InfiniBand with SDN capability can achieve much better network load balancing than traditional InfiniBand for HPC clusters. (A toy routing-imbalance sketch follows this entry.)
- Date Issued
- 2016
- Identifier
- FSU_FA2016_Lee_fsu_0071N_13520
- Format
- Thesis
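A toy sketch (not the thesis's simulator) of why destination-based static routing can imbalance load: with skewed destinations, destination-mod-k maps many flows onto one port, whereas an SDN-style per-flow table can spread them.

```python
# Hedged toy sketch (not the thesis's simulation): load imbalance under destination-mod-k routing
# versus an SDN-style per-flow table. The flow pattern and port count are invented.
from itertools import count

k = 4                                                                # uplink ports per switch
flows = [(src, dst) for src in range(8) for dst in (0, 4, 8, 12)]    # skewed destination set

def port_loads(assign):
    loads = [0] * k
    for flow in flows:
        loads[assign(flow)] += 1
    return loads

print("dest-mod-k routing:", port_loads(lambda flow: flow[1] % k))   # every dst % 4 == 0 -> one hot port

flow_table, next_port = {}, count()
def sdn_assign(flow):
    if flow not in flow_table:                   # install a per-flow rule, round-robin style
        flow_table[flow] = next(next_port) % k
    return flow_table[flow]

print("SDN per-flow table:", port_loads(sdn_assign))                 # load spread across all 4 ports
```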
- Title
- There & Never Back Again: A Walk-on-Subdomains Tale.
- Creator
-
Hamlin, Preston William, Mascagni, Michael, Haiduc, Sonia, van Engelen, Robert, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
Simulation software is used in a multitude of industry and academic fields, in an assortment of scopes. One might be interested in the gravitation of celestial bodies, or the field structures of molecular interactions. The scale to which these simulations can grow demands an equally scalable computational mechanism. Traditional and even accelerated solvers suffer from a lack of general scalability, depending on multiple input aspects and recursive refinement. Monte Carlo solvers offer a highly scalable computational mechanism, as they are not prone to the curse of dimensionality and the error can be driven down simply by taking more samples. In the course of refactoring such a Monte Carlo simulation software artefact, several anomalies were noted in its implementation structure. Through attempts at remediating the undesirable behaviour, a general problem of susceptibility was discovered for the Walk-on-Subdomains family of algorithms. (A small related-algorithm sketch follows this entry.)
- Date Issued
- 2016
- Identifier
- FSU_FA2016_Hamlin_fsu_0071N_13643
- Format
- Thesis
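For context on the family of algorithms named above, here is a sketch of walk-on-spheres, a closely related Monte Carlo solver, estimating the solution of Laplace's equation inside the unit disk with boundary data g(x, y) = x; it is purely illustrative and is not the refactored artefact discussed in the thesis.

```python
# Hedged sketch: walk-on-spheres (a close relative of the walk-on-subdomains family) estimating
# the harmonic function with boundary values g(x, y) = x at an interior point of the unit disk.
import math
import random

def walk_on_spheres(x, y, eps=1e-3, walks=20000):
    total = 0.0
    for _ in range(walks):
        px, py = x, y
        while True:
            r = 1.0 - math.hypot(px, py)           # largest circle around (px, py) inside the disk
            if r < eps:                             # close enough to the boundary: stop this walk
                break
            theta = random.uniform(0.0, 2.0 * math.pi)
            px, py = px + r * math.cos(theta), py + r * math.sin(theta)
        norm = math.hypot(px, py)
        total += px / norm                          # boundary value g = x on the unit circle
    return total / walks

print(walk_on_spheres(0.3, 0.2))    # the harmonic extension of g is u(x, y) = x, so expect ~0.3
```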
- Title
- Soft Error Event Monte Carlo Modeling and Simulation: Impacts of Soft Error Events on Computer Memory.
- Creator
-
Ogden, Christopher, Mascagni, Michael V., Duke, D. W. (Dennis W.), Kumar, Piyush, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
This dissertation presents a unique, adaptable, and light-weight core methodology to address the problem of Soft Error Modeling and Simulation. This core methodology was successfully tailored, validated, and expanded to work with a diverse cross-section of realistic memory devices, reliability techniques, and soft error event behaviors. These devices were shielded by a mutually supporting trio of reliability techniques while under the threat of soft error events. The techniques included in this dissertation are: (1) error correction codes, (2) interleaving distance, and (3) scrubbing. The strike-times, soft error event types, and bit error severities of the soft error events were stochastically estimated using publicly available research findings published from a variety of proprietary reliability data sources. This proprietary data was gathered from certain secret vendor-specific computer memory devices. Both logically-oriented and physically-oriented memory cell organizational perspectives were incorporated into the core methodology that was tailored to create the simulators implemented within this dissertation. The failure probabilities of memory devices were calculated by the simulators that were designed and implemented within this dissertation. The results of these simulations were validated for specific test cases against the published literature models. This core methodology was applied to create scalable simulators that were implemented utilizing a variety of soft error event behavioral characteristics, memory device design constraints, and reliability technique parameters. This core methodology and the simulators created from its application may be utilized by researchers to address a variety of open research questions in the field. An open research question was answered within this dissertation as proof of the effectiveness of the core methodology. This particular research question concerned establishing the significance of Soft Error Event (SEE) topography by studying the impact of topographically reflective SEEs on the overall failure probability and corresponding reliability of the simulated memory device over time. To address this open research question, the Topographic 2-Parameter Weibull Soft Error (T2P-WSE) Simulator stochastically estimates the topographic strike-patterns of SEE severities based on the most commonly encountered Multiple Cell Upset shapes gathered by a commercial grade 3D-TCAD-based Neutron Particle Strike Simulation in a generic 45 nm SRAM (Static Random Access Memory) memory device. Both the failure probability and reliability results generated by the T2P-WSE Simulator were shown to be significantly different from those of the Row-Depth-Only 2-Parameter Weibull Soft Error Simulator (S2P-WSE) when given equivalent inputs. As documented within this dissertation, this conclusion was verified and confirmed from both a visual and statistical standpoint. Topography was observed to play a significant role in the overall failure probability of the device. It was concluded that the failure probability of the T2P-WSE Simulator was significantly reduced in comparison to the failure probability of the S2P-WSE Simulator. As defined for a variety of input parameters, the S2P-WSE Simulator consistently over-estimated the failure probability of the device. The reason for this outcome is directly related to the row-depth-only bit error severity assumption of the S2P-WSE Simulator. The row-depth-only assumption forces every MCU SEE that impacts the device to spread its bit errors in a fixed row-depth-only pattern, as opposed to a more realistic topographic pattern such as the patterns encoded into the T2P-WSE Simulator for the 45 nm memory chip geometry. This conclusion only served to reinforce the initial observation that, by taking into account the topographic shape of the bit-error spread, one would significantly reduce the overall failure probability for a memory storage device implemented with an interleaving distance architecture. The core methodology calls for the stochastic estimation of the strike-time, type, and bit-error severity that represent all simulated soft error events destined to impact the simulated device at some simulation time unit over the total simulation run-time. These soft error events will strike the device at the appointed strike-time and be mitigated by the chosen set of mutually supporting reliability techniques. These reliability techniques include the following: (1) error correcting codes, (2) interleaving distance, and (3) scrubbing. This core methodology was fitted to the Compound Poisson distribution and a logical memory cell organization for the Compound Poisson Soft Error (CPSE) Simulator. This core methodology was also successfully applied to the 2-Parameter Weibull failure distribution and a physical memory cell organization. Both the CPSE and S2P-WSE Simulators proved equally capable of calculating the failure probability of any variety of simulated memory storage devices shielded by the three integrated reliability techniques under the impact of these stochastically determined soft error events. This failure probability over simulated time was utilized to evaluate all of the secondary results of the core methodology, including such results as the Mean-Time-To-Failure and the Failures-In-Time number at the conclusion of each simulation run. All of the simulators presented within this dissertation were implemented within a MATLAB programming environment. (A toy Weibull soft-error sampling sketch follows this entry.)
- Date Issued
- 2016
- Identifier
- FSU_2016SU_Ogden_fsu_0071E_13415
- Format
- Thesis
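A toy Python sketch of the core loop the entry describes (the dissertation's simulators were written in MATLAB, and all parameters below are invented): draw soft-error strike times from a 2-parameter Weibull distribution, assign each event a multi-bit severity, and count events that exceed what a single-error-correcting code plus interleaving can absorb.

```python
# Hedged toy sketch (not the dissertation's simulators, which were in MATLAB): Weibull-distributed
# soft-error strike times, invented multi-cell-upset severities, SEC code plus interleaving.
import numpy as np

rng = np.random.default_rng(7)
shape, scale_hours = 0.8, 5000.0                   # invented 2-parameter Weibull values
strike_times = np.sort(scale_hours * rng.weibull(shape, size=50))

correctable = 1                                    # SEC code: 1 correctable bit per ECC word
failures = 0
for t in strike_times:
    upset_bits = int(rng.integers(1, 9))           # invented MCU severity: 1-8 upset bits
    # interleaving distance 4: physically adjacent upset bits land in different ECC words
    bits_per_word = int(np.ceil(upset_bits / 4))
    if bits_per_word > correctable:
        failures += 1
        print(f"uncorrectable event at t = {t:.1f} h ({upset_bits} bits)")
print(f"{failures} uncorrectable events out of {len(strike_times)}")
```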
- Title
- Optimizing Transfers of Control in the Static Pipeline Architecture.
- Creator
-
Baird, Ryan R., Whalley, David B., Tyson, Gary Scott, Yuan, Xin, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
Statically pipelined processors offer a new way to improve performance beyond that of a traditional in-order pipeline while simultaneously reducing energy usage by enabling the compiler to control more fine-grained details of the program execution. This paper describes how a compiler can exploit the features of the static pipeline architecture to apply optimizations on transfers of control that are not possible on a conventional architecture. The optimizations presented in this paper include hoisting the target address calculations for branches, jumps, and calls out of loops, performing branch chaining between calls and jumps, hoisting the setting of return addresses out of loops, and exploiting conditional calls and returns. The benefits of performing these transfer-of-control optimizations include a 6.8% reduction in execution time and a 3.6% decrease in estimated energy usage.
- Date Issued
- 2016
- Identifier
- FSU_2016SU_Baird_fsu_0071N_13241
- Format
- Thesis
- Title
- Evaluation of a Benchmark Suite Exposing Android System Complexities Using Region-Based Caching.
- Creator
-
Brown, Martin Kenneth, Tyson, Gary Scott, DeBrunner, Linda S., Whalley, David B., Yuan, Xin, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
The computer architecture community relies on standard benchmark suites like MiBench, NAS, PARSEC, SPEC CPU2006 (SPEC)®, and SPLASH to study different hardware designs, but such suites are insufficient for evaluating mobile platforms like Android. Even suites that were developed for embedded systems cannot be used to gain an understanding of Android device/system interaction because they do not exercise key components of the software stack. Although based on a conventional Linux® kernel, Android includes native libraries, a virtual machine runtime, and an application framework with multiple components for managing resources. All these interact in complex ways to support Android applications. C programs running on Linux have a relatively simple virtual memory organization, and most memory references come from the application code. In contrast, Android has a much more complex virtual memory organization (due to its multiple APIs and numerous shared libraries), and most memory references come from the Android software stack. The complexity of Android's execution environment provides opportunities for computer architects to better support the execution characteristics, structures, and resource requirements of the Android software stack and opportunities for software developers to optimize their applications for this rich environment. To help the community to exploit these opportunities, we introduce Agave, an open-source benchmark suite designed to expose the complex interactions between components of the Android software stack.
Show less - Date Issued
- 2016
- Identifier
- FSU_FA2016_Brown_fsu_0071E_13594
- Format
- Thesis
- Title
- Preventing Cyber-Induced Irreversible Physical Damage to Cyber-Physical Systems.
- Creator
-
Yang, Jaewon, Liu, Xiuwen, Kim, Daekwan, Burmester, Mike, Duan, Zhenhai, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
With the advancement of information and communication technologies, networked computing devices have been adopted to address real-world challenges due to their efficiency and programmability while maintaining scalability, sustainability, and resilience. As a result, computing and communication technologies have been integrated into critical infrastructures and other physical processes. Cyber physical systems (CPS) integrate computation and physical processes of critical infrastructure systems. Historically, these systems mostly relied on proprietary technologies and were built as stand-alone systems in physically secure locations. However, the situation has changed considerably in recent years. Commodity hardware, software, and standardized communication technologies are used in CPS to enhance their connectivity, provide better accessibility to customers and maintenance personnel, and improve the overall efficiency and robustness of their operations. Unfortunately, increased connectivity, efficiency, and openness have also significantly increased the vulnerability of CPS to cyber attacks. These vulnerabilities could allow attackers to alter the systems' behavior and cause irreversible physical damage, or, even worse, cyber-induced disasters. However, existing security measures cannot be applied effectively to CPS because they are mostly designed for cyber-only systems. Thus, new approaches to preventing cyber physical system disasters are essential. We recognize very different characteristics of cyber and physical components in CPS: cyber components are flexible with large attack surfaces, while physical components are inflexible and relatively simple with very small attack surfaces. This research focuses on the interfaces where cyber and physical components interact. Securing cyber-physical interfaces will complete a layer-based defense strategy in the "Defense in Depth Framework". In this research we propose Trusted Security Modules (TSM) as a systematic solution to provide a guarantee to prevent cyber-induced physical damage even when operating systems and controllers are compromised. TSMs will be placed at the interface between cyber and physical components and adapt existing integrity-enforcing mechanisms such as the Trusted Platform Module (static integrity) and Control-Flow Integrity (dynamic integrity) to enhance their own security and integrity. Through this dissertation we introduce the general design and a number of ways to implement the TSM. We also show the behavior of the TSM with a working prototype and simulation.
- Date Issued
- 2016
- Identifier
- FSU_2016SP_Yang_fsu_0071E_13064
- Format
- Thesis
- Title
- Contributions to Problems in Topology Control of Unmanned Aerial Systems.
- Creator
-
Mukherjee, Tathagata, Kumar, Piyush, Sinha, Debajyoti, Liu, Xiuwen, Zhao, Peixiang, Jones, Faye, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
Networks of unmanned vehicles are poised to be the next giant leap in technology. Such systems are already being used by defense and law enforcement agencies. The US DoD and some very large private corporations are spending large sums of money developing intellectual property that will assist in implementing such systems. Amazon is testing its own drones for delivery, while Google, Tesla, Ford and Mercedes Benz are investing in self-driving cars that can operate in real traffic conditions. Their goal is to make the vehicles operate either in a standalone mode or in collaboration with other similar vehicles. This push towards the use of unmanned systems comes with its own set of new problems, most of which require solutions in real time. This has created a lot of scope for innovative research using these networks, and the research problems fall broadly into two categories: the first relates to the development of hardware and robotics, which is concerned with building better hardware to implement such systems. The second category is concerned with the design and implementation of efficient algorithms that work with the implemented hardware. These algorithms in turn belong to three broad categories, namely: navigation and localization algorithms; algorithms for dynamic topology determination for efficient communication and information exchange; and consensus algorithms for multi-sensor data fusion that in turn help in navigation and localization. In this thesis we consider four problems of interest, each of which belongs to one of these categories: first we consider the problem of GPS-denied navigation using signals of opportunity, which belongs to the class of algorithms for navigation and localization. Next we consider two problems from the second category. More precisely, we consider the problem of spanners, which are sparse subgraphs of an input graph having nice structural properties. We study two variations of the spanner problem: the first arises in communication networks and aims to guarantee the communication efficiency of a sparse subgraph of the input, and the second studies the construction of spanners with robustness guarantees. Finally, we consider algorithms for truth determination from multiple sources, which is a type of consensus algorithm. (A minimal illustrative sketch of a greedy spanner construction follows this record.)
- Date Issued
- 2016
- Identifier
- FSU_FA2016_Mukherjee_fsu_0071E_13450
- Format
- Thesis
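The record above mentions spanners: sparse subgraphs whose path lengths approximate those of the original graph. As a hedged illustration of the concept only (not the specific constructions studied in the dissertation), the sketch below builds a greedy t-spanner: edges are examined in order of increasing weight, and an edge is kept only if the spanner built so far does not already connect its endpoints within t times the edge weight. The toy graph is made up.

```python
import heapq
from collections import defaultdict

def spanner_distance(adj, src, dst, limit):
    """Dijkstra over the current spanner; exploration stops past `limit`."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist.get(u, float("inf")) or d > limit:
            continue
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist.get(dst, float("inf"))

def greedy_t_spanner(edges, t=2.0):
    """edges: iterable of (u, v, weight). Returns the list of kept edges."""
    adj = defaultdict(list)
    kept = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        # Keep the edge only if it improves connectivity beyond the t-stretch bound.
        if spanner_distance(adj, u, v, t * w) > t * w:
            adj[u].append((v, w)); adj[v].append((u, w))
            kept.append((u, v, w))
    return kept

if __name__ == "__main__":
    toy = [("a", "b", 1), ("b", "c", 1), ("a", "c", 1.5), ("c", "d", 2), ("a", "d", 4)]
    print(greedy_t_spanner(toy, t=2.0))   # redundant edges (a,c) and (a,d) are dropped
```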
- Title
- Game Based Visual-to-Auditory Sensory Substitution Training.
- Creator
-
Marshall, Justin B., Tyson, Gary Scott, Erlebacher, Gordon, Liu, Xiuwen, Ackerman, Margareta, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
There has been a great deal of research devoted to computer vision related assistive technologies. Unfortunately, this area of research has not produced many usable solutions. The long cane and the guide dog are still far more useful than most of these devices. Through the push for advanced mobile and gaming systems, new low-cost solutions have become available for building innovative and creative assistive technologies. These technologies have been used for sensory substitution projects that attempt to convert vision into either auditory or tactile stimuli. These projects have reported some degree of measurable success. Most of these projects focused on converting either image brightness or depth into auditory signals. This research was devoted to the design and creation of a video game simulator capable of supporting research and training for sensory substitution concepts that convert vision into auditory stimuli. The simulator was used to perform direct comparisons between some of the popular sensory substitution techniques as well as to explore new concepts for conversion. A study of 42 participants tested different techniques for image simplification and found that depth-to-tone sensory substitution may be more usable than brightness-to-tone substitution. The study also showed that 3D game simulators can be used in lieu of building costly prototypes for testing new sensory substitution concepts. (A minimal illustrative sketch of a column-to-tone mapping follows this record.)
- Date Issued
- 2015
- Identifier
- FSU_2015fall_Marshall_fsu_0071E_12749
- Format
- Thesis
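The record above compares brightness-to-tone and depth-to-tone sensory substitution, i.e., turning image columns into audio frequencies. The sketch below is a hypothetical, heavily simplified version of such a conversion: each column of a small grayscale (or depth) array is reduced to a mean intensity and mapped to a sine-tone frequency, scanning left to right. The frequency range, scan rate, and the tiny array are illustrative choices, not parameters of the simulator described in the thesis.

```python
import math

def column_to_tone(values, f_low=200.0, f_high=2000.0):
    """Map the mean of a column of intensities (0-255) to a frequency in Hz."""
    mean = sum(values) / len(values)
    return f_low + (mean / 255.0) * (f_high - f_low)

def image_to_tone_sequence(image, seconds_per_column=0.1, sample_rate=8000):
    """image: list of rows of 0-255 values (brightness, or inverted depth).
    Returns raw audio samples, scanning columns left to right."""
    samples = []
    n_cols = len(image[0])
    for c in range(n_cols):
        freq = column_to_tone([row[c] for row in image])
        n = int(seconds_per_column * sample_rate)
        # One short sine burst per column; closer/brighter columns sound higher.
        samples.extend(math.sin(2 * math.pi * freq * i / sample_rate) for i in range(n))
    return samples

if __name__ == "__main__":
    tiny = [[0, 128, 255],
            [0, 128, 255]]
    audio = image_to_tone_sequence(tiny)
    print(len(audio), "samples generated for 3 columns")
```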
- Title
- Pyquery: A Search Engine for Python Packages and Modules.
- Creator
-
Imminni, Shiva Krishna, Kumar, Piyush, Haiduc, Sonia, Ackerman, Margareta, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
Python Package Index (PyPI) is a repository that hosts all the packages ever developed for the Python community. It hosts thousands of packages from different developers and, for the Python community, is the primary source for downloading and installing packages. It also provides a simple web interface to search for these packages. A direct search on PyPI returns hundreds of packages that are not intuitively ordered, thus making it harder to find the right package. Developers consequently resort to mature search engines like Google, Bing, or Yahoo, which redirect them to the appropriate package homepage at PyPI. Hence, the first task of this thesis is to improve search results for Python packages. Secondly, this thesis also attempts to develop a new search engine that allows Python developers to perform a code search targeting Python modules. Currently, existing search engines classify programming languages such that a developer must select a programming language from a list. As a result, every time a developer performs a search operation, he or she has to choose Python out of a plethora of programming languages. This thesis seeks to offer a more reliable and dedicated search engine that caters specifically to the Python community and ensures a more efficient way to search for Python packages and modules. (A minimal illustrative sketch of description-based package ranking follows this record.)
- Date Issued
- 2015
- Identifier
- FSU_2015fall_Imminni_fsu_0071N_12969
- Format
- Thesis
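The record above argues that a direct PyPI search returns packages in an unhelpful order. As a generic, hypothetical illustration of re-ranking (not Pyquery's actual algorithm), the sketch below scores candidate packages against a query with a simple TF-IDF weighting over their descriptions. The package names and descriptions are made up.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def rank_packages(query, packages):
    """packages: dict of name -> description. Returns names sorted by TF-IDF score."""
    docs = {name: Counter(tokenize(desc)) for name, desc in packages.items()}
    n = len(docs)
    df = Counter()
    for counts in docs.values():
        df.update(set(counts))           # document frequency of each term

    def score(counts):
        total = sum(counts.values()) or 1
        return sum((counts[t] / total) * math.log((1 + n) / (1 + df[t]))
                   for t in tokenize(query))

    return sorted(docs, key=lambda name: score(docs[name]), reverse=True)

if __name__ == "__main__":
    fake_index = {
        "fastjson": "fast json parsing and serialization",
        "webkit-tools": "tools for browser automation",
        "jsonschema-lite": "validate json documents against a schema",
    }
    print(rank_packages("json schema validation", fake_index))
```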
- Title
- Utilizing Cutting-Edge Computational Biology Methods in the Genomic Analysis of Florida Endangered Species.
- Creator
-
Stribling, Daniel B., Department of Computer Science
- Abstract/Description
-
Over the past decade, the technologies used to obtain sequencing data from biological tissues have significantly improved. This has resulted in a marked increase in the ability of biological researchers to collect unprecedented quantities of large-scale DNA sequence data in a short timeframe. Recent developments in genome sequencing algorithms have allowed bioinformatics utilities to begin to take full advantage of this data, paving the way for significant increases in our understanding of genomics. New methods of genomics research have now created many new opportunities for discoveries in fields such as conservation ecology, personalized medicine, and the study of genetic disease. This research project consists of two major components: the utilization of recently-developed computational biology methods to perform sequence assembly on native Florida species, and the creation of new bioinformatics utilities to facilitate genomics research. This project includes the completion of the first stage of the Florida Endangered Species Sequencing Project, the assembly and annotation of the transcriptome of the Florida wolf spider Schizocosa ocreata, and a preliminary analysis of differential gene expression in S. ocreata organisms. Initial work is also included on Florida Endangered Species Sequencing Project Stage Two: sequence assembly projects for the Florida Manatee and the Gopher Tortoise. Discussion is included of two new computational biology utilities: TFLOW, a transcriptome assembly pipeline designed to facilitate de novo transcriptome assembly projects, and ongoing development of the GATTICA web-based bioinformatics toolkit. The TFLOW package has been released for download through the FSU Center for Genomics and Personalized Medicine.
- Date Issued
- 2015
- Identifier
- FSU_migr_uhm-0503
- Format
- Thesis
- Title
- SIDR: Scalable Inter-Domain Routing Protocol for Information Centric Networking.
- Creator
-
Kim, Sangmun, Duan, Zhenhai, Liu, Xiuwen, Wang, Zhi, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
The current Internet architecture reveals its chronic limitations against the new paradigms of networking. The advancement of technology has changed network use from communication to content distribution. Unfortunately, the IP-based network architecture is inadequate to support such a trend because it was developed from a traditional communication model, like a one-to-one phone call, in which users must know where the others are. In order to solve this problem, recent research proposes placing content itself at the center of communication, i.e., Information Centric Networking (ICN) or Content Centric Networking (CCN) [16], so that all the entities on the network can communicate with each other using content names, regardless of the location of the content. As a result of these efforts, NDN [29] was proposed as an instance of ICN and has been actively researched. Additionally, Wang et al. [26] and Hoque et al. [15] suggested OSPFN and NLSR, respectively, as intra-domain routing protocols to support this new generation of networking, but to the best of our knowledge, no inter-domain routing protocol has been proposed yet. In this thesis, we develop an inter-domain routing protocol for ICN, called SIDR (Scalable Inter-Domain Routing protocol), which supports scalability using the domain names of networks. Since our protocol is built on BGP, it follows BGP's message formats and attributes. It consists of E-SIDR and I-SIDR, where E-SIDR distributes reachability information between ASes, while I-SIDR announces the information inside an AS. The security features of ICN enable SIDR to be as secure as S-BGP with simpler mechanisms. Additionally, I-SIDR does not require complex routing techniques, such as route reflection or BGP confederations, because of the loop-prevention property of the new networking architecture. In our simulations, E-SIDR works exactly the same as EBGP during announcements and withdrawals, and I-SIDR shows competitive performance compared to IBGP: even though I-SIDR generates the same number of messages or more, it stabilizes the network in as short a time as IBGP. Throughout this paper, we show how our protocol supports scalability and how it uses the desirable features of the new network model. (A minimal illustrative sketch of longest-prefix matching on hierarchical names follows this record.)
- Date Issued
- 2015
- Identifier
- FSU_migr_etd-9630
- Format
- Thesis
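SIDR, as summarized above, routes by hierarchical content/domain names rather than IP prefixes. Independent of the protocol's actual message formats and attributes, the sketch below illustrates only the underlying lookup idea: a forwarding table keyed by name prefixes (split on '/') with a longest-prefix match over name components. The names and next hops are invented for illustration.

```python
def components(name):
    return [c for c in name.strip("/").split("/") if c]

class NameFib:
    """Toy forwarding table keyed by hierarchical name prefixes."""
    def __init__(self):
        self.table = {}                       # tuple of components -> next hop

    def add_route(self, prefix, next_hop):
        self.table[tuple(components(prefix))] = next_hop

    def lookup(self, name):
        """Longest-prefix match over name components."""
        parts = tuple(components(name))
        for length in range(len(parts), 0, -1):
            hop = self.table.get(parts[:length])
            if hop is not None:
                return hop
        return None

if __name__ == "__main__":
    fib = NameFib()
    fib.add_route("/edu/fsu", "AS-100")
    fib.add_route("/edu/fsu/cs/videos", "AS-200")
    print(fib.lookup("/edu/fsu/cs/videos/lecture1"))   # AS-200 (more specific prefix)
    print(fib.lookup("/edu/fsu/admissions"))            # AS-100 (falls back to shorter prefix)
```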
- Title
- Plugging I/O Resource Leaks in General Purpose Real-Time Systems.
- Creator
-
Stanovich, Mark J., Baker, Theodore P., Wang, An-I Andy, Collins, E. (Emmanuel), Kumar, Piyush, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
I/O services provided by general-purpose operating systems and commodity hardware are designed for achieving high average-case performance without worrying about timing constraints. A common trend is to use such components to build systems that are expected to meet timing constraints in order to function properly. The area of real-time (RT) theoretical analysis provides techniques for designing such a system to meet timing constraints; however, the optimizations used for average-case performance do not lend themselves to RT analysis techniques. The most common approaches for dealing with this problem on general-purpose systems are either to disable optimizations in order to apply RT analysis techniques or to forego applying any RT analysis. Neither of these approaches tends to work well. Disabling optimizations often results in poor performance that can support only a fraction of the average-case workload. On the other hand, foregoing RT analysis does not take advantage of very powerful techniques to design a system to meet timing constraints. The thesis of this dissertation is that optimizations for providing I/O service on general-purpose systems can be leveraged while still taking advantage of theoretical techniques that have been developed over the past several decades. By capitalizing on average-case optimizations, a general-purpose system is able to support a wider spectrum of I/O timing constraints and increase system-wide performance, which includes applications that do not specify timing constraints. This dissertation describes a journey to develop a new approach that indirectly overrides scheduling optimizations when timing constraints might be in jeopardy. The distinguishing characteristic of this approach is that fewer scheduling decisions are required in comparison to the rigid micro-management style typically encountered in RT systems. This flexibility in scheduling provides a practical means to achieve high average-case performance while guaranteeing time-constrained I/O service. Therefore, a larger number and broader range of time-constrained applications can be supported as compared to existing approaches. Contributions of this dissertation include the following: (1) measurement techniques to obtain workload models amenable to existing RT theoretical analysis, (2) a throttling approach to indirectly control built-in hardware scheduling optimizations for meeting time-constrained I/O service, (3) a modified aperiodic scheduling algorithm to reduce the effect of CPU context switching overhead incurred when providing I/O service, and (4) a novel scheduling algorithm and subsequent analysis to balance throughput and response time depending on requested RT I/O guarantees. These contributions are demonstrated on an actual system using a general-purpose operating system and commodity hardware. (A minimal illustrative sketch of budget-based I/O throttling follows this record.)
- Date Issued
- 2015
- Identifier
- FSU_migr_etd-9460
- Format
- Thesis
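The record above describes indirectly reining in scheduling optimizations by throttling I/O when timing guarantees would otherwise be at risk. As a rough, hypothetical illustration of that idea (not the dissertation's algorithm), the sketch below uses a periodically replenished budget: best-effort requests are admitted only while enough of the current period's budget remains to cover a reserved real-time share. The period, reservation, and service-time estimate are assumed values.

```python
import time

class IoBudgetThrottle:
    """Admit best-effort I/O only while the reserved real-time budget stays safe."""
    def __init__(self, period_s=0.1, rt_reserved_s=0.03):
        self.period_s = period_s
        self.rt_reserved_s = rt_reserved_s
        self.window_start = time.monotonic()
        self.used_s = 0.0

    def _refresh(self):
        # Start a fresh budget window once the current period has elapsed.
        now = time.monotonic()
        if now - self.window_start >= self.period_s:
            self.window_start = now
            self.used_s = 0.0

    def admit_best_effort(self, est_service_s):
        """True if the request fits without eating into the real-time reservation."""
        self._refresh()
        if self.used_s + est_service_s <= self.period_s - self.rt_reserved_s:
            self.used_s += est_service_s
            return True
        return False

if __name__ == "__main__":
    throttle = IoBudgetThrottle()
    admitted = sum(throttle.admit_best_effort(0.01) for _ in range(12))
    print(f"admitted {admitted} of 12 best-effort requests this period")
```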
- Title
- Cashtags: Protecting the Input and Display of Sensitive Data.
- Creator
-
Mitchell, Michael J., Wang, An-I Andy, DeBrunner, Linda S. (Linda Sumners), Tyson, Gary Scott, Wang, Zhi, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
Mobile computing is the new norm. As people feel increasingly comfortable computing in public places such as coffee shops and transportation hubs, the threat of exposing sensitive information increases. While solutions exist to guard the communication channels used by mobile devices, the visual channel remains, to a significant degree, open. Shoulder surfing is becoming a viable threat in a world where users are frequently surrounded by high-power cameras, and where sensitive information from recorded images can be extracted with modest computing power. In response, this dissertation presents Cashtags: a system to defend against attacks on mobile devices based on visual observations. The system allows users to access sensitive information in public without the fear of visual leaks. This is accomplished by intercepting sensitive data elements before they are displayed on screen, then replacing them with non-sensitive information. In addition, the system provides a means of computing with sensitive data in a non-observable way. All of this is accomplished while maintaining full functionality and legacy compatibility across applications. (A minimal illustrative sketch of alias-based masking follows this record.)
- Date Issued
- 2015
- Identifier
- FSU_migr_etd-9412
- Format
- Thesis
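The record above describes intercepting sensitive data elements before they reach the screen and substituting non-sensitive aliases. The sketch below is a toy, hypothetical rendering filter in that spirit: a user-defined map from sensitive strings to alias tags is applied to text just before display, and reversed for text the user submits. It is not the Android-level interception mechanism described in the dissertation, and the sample values are invented.

```python
class AliasFilter:
    """Replace sensitive strings with alias tags on output, and undo on input."""
    def __init__(self, aliases):
        # aliases: dict mapping sensitive text -> display tag, e.g. "$visa"
        self.to_tag = dict(aliases)
        self.to_secret = {tag: secret for secret, tag in aliases.items()}

    def filter_for_display(self, text):
        for secret, tag in self.to_tag.items():
            text = text.replace(secret, tag)
        return text

    def restore_for_submission(self, text):
        for tag, secret in self.to_secret.items():
            text = text.replace(tag, secret)
        return text

if __name__ == "__main__":
    f = AliasFilter({"4111 1111 1111 1111": "$visa", "jane.doe@example.com": "$email"})
    shown = f.filter_for_display("Card 4111 1111 1111 1111 billed to jane.doe@example.com")
    print(shown)                              # sensitive values hidden on screen
    print(f.restore_for_submission(shown))    # original values restored on submission
```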
- Title
- Building Trusted Computer Systems via Unified Software Integrity Protection.
- Creator
-
Jenkins, Jonathan, Burmester, Mike, Mio, Washington, Haiduc, Sonia, Liu, Xiuwen, Srinivasan, Ashok, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
The task of protecting software integrity can be approached with a two-part strategy that addresses threats to the static integrity of memory contents and the dynamic integrity of software memory interactions. Although their resultant effects are realized in fundamentally different ways, attacks on either memory contents manipulated by instructions or the operation of software are both damaging and facilitate further attack. The ability to alter the static memory state (programs, configuration data, etc.) of software opens an attack vector as broad as the capabilities of the attacker. Altering the operation of running programs (e.g. control flow) allows an attacker to divert the results (memory effects) of the target program from those which were intended by the program author. Neither static nor dynamic analyses of integrity are alone sufficient to completely describe the integrity of trusted system software. Further, there is a characteristic facilitation of vulnerabilities between the two classes of software integrity, in that common security violations can be decomposed into sequences of static and dynamic integrity violations that have enabling relationships. In order to capture and provide a unified software integrity, this work analyzes software integrity and details techniques that enable protections to be applied by a coherent, systematic framework directly to the memory and memory interactions which exhibit software integrity, rather than tailored to each member of an evolving set of particular threats. (A minimal illustrative sketch of static integrity measurement follows this record.)
- Date Issued
- 2015
- Identifier
- FSU_migr_etd-9623
- Format
- Thesis
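The record above distinguishes static integrity (memory and file contents) from dynamic integrity (the control flow of running code). The snippet below is a small, generic illustration of only the static half: a baseline of cryptographic hashes is recorded for a set of files and later re-verified. It is standard measurement-and-compare checking, not the unified framework proposed in the dissertation; the file name used in the example is made up.

```python
import hashlib
from pathlib import Path

def measure(paths):
    """Record a SHA-256 baseline for each file (static integrity measurement)."""
    return {str(p): hashlib.sha256(Path(p).read_bytes()).hexdigest() for p in paths}

def verify(baseline):
    """Return the files whose current contents no longer match the baseline."""
    changed = []
    for path, expected in baseline.items():
        p = Path(path)
        actual = hashlib.sha256(p.read_bytes()).hexdigest() if p.exists() else None
        if actual != expected:
            changed.append(path)
    return changed

if __name__ == "__main__":
    # Illustrative usage: measure a config file, then re-check it after a change.
    target = Path("example.conf")
    target.write_text("threshold = 10\n")
    baseline = measure([target])
    target.write_text("threshold = 9999\n")   # simulated tampering
    print("modified files:", verify(baseline))
```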
- Title
- Building an Intelligent Assistant for Digital Forensics.
- Creator
-
Karabiyik, Umit, Aggarwal, Sudhir, Foo, Simon Y., Duan, Zhenhai, Liu, Xiuwen, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
Software tools designed for disk analysis play a critical role today in digital forensics investigations. However, these digital forensics tools are often difficult to use, usually task specific, and generally require professionally trained users with IT backgrounds. The relevant tools are also often open source requiring additional technical knowledge and proper configuration. This makes it difficult for investigators without some computer science background to easily conduct the needed disk analysis. In this dissertation, we present AUDIT, a novel automated disk investigation toolkit that supports investigations conducted by non-expert (in IT and disk technology) and expert investigators. Our system design and implementation of AUDIT intelligently integrates open source tools and guides non-IT professionals while requiring minimal technical knowledge about the disk structures and file systems of the target disk image. We also present a new hierarchical disk investigation model which leads AUDIT to systematically examine the disk in its totality based on its physical and logical structures. AUDIT's capabilities as an intelligent digital assistant are evaluated through a series of experiments comparing it with a human investigator as well as against standard benchmark disk images.
- Date Issued
- 2015
- Identifier
- FSU_migr_etd-9626
- Format
- Thesis
- Title
- Probabilistic Context-Free Grammar Based Password Cracking: Attack, Defense and Applications.
- Creator
-
Yazdi, Shiva Houshmand, Aggarwal, Sudhir, Mio, Washington, Kumar, Piyush, Yuan, Xin, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
Passwords are critical for security in many different domains such as social networks, emails, encryption of sensitive data and online banking. Human memorable passwords are thus a key element in the security of such systems. It is important for system administrators to have access to the most powerful and efficient attacks to assess the security of their systems more accurately. The probabilistic context-free grammar technique has been shown to be very effective in password cracking. In this approach, the system is trained on a set of revealed passwords and a probabilistic context-free grammar is constructed. The grammar is then used to generate guesses in highest probability order, which is the optimal off-line attack. The initial approach, although performing much better than other rule-based password crackers, only considered the simple structures of the passwords. This dissertation explores how classes of new patterns (such as keyboard and multi-word) can be learned in the training phase and can be used to substantially improve the effectiveness of the probabilistic password cracking system. Smoothing functions are used to generate new patterns that were not found in the training set, and new measures are developed to compare and improve both training and attack dictionaries. The results on cracking multiple datasets show that we can achieve up to 55% improvement over the previous system. A new technique is also introduced which creates a grammar that can incorporate any available information about a specific target by giving higher probability values to components that carry this information. This grammar can then help in guessing the user's new password in a timelier manner. Examples of such information include old passwords, names of family members, or important dates. A new algorithm is described that, given two old passwords, determines the transformations between them and uses this information to predict the user's new password. A password checker is also introduced that analyzes the strength of user-chosen passwords by estimating the probability of the passwords being cracked, and helps users in selecting stronger passwords. The system modifies the weak password slightly and suggests a new stronger password to the user. By dynamically updating the grammar we make sure that the guessing entropy increases and the suggested passwords thus remain resistant to various attacks. New results are presented that show how accurate the system is in determining weak and strong passwords. Another application of the probabilistic context-free grammar technique is also introduced that identifies stored passwords on disks and media. The disk is examined for potential password strings, and a set of filtering algorithms is developed that winnows down the space of tokens to a more manageable set. The probabilistic context-free grammar is then used to assign probabilities to the remaining tokens to distinguish strings that are more likely to be passwords. In one of the tests, a set of 2,000 potential passwords winnowed down from 49 million tokens is returned, which identifies 60% of the actual passwords. (A minimal illustrative sketch of grammar-based guess generation follows this record.)
- Date Issued
- 2015
- Identifier
- FSU_migr_etd-9615
- Format
- Thesis
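The record above describes training a probabilistic context-free grammar on revealed passwords and generating guesses in decreasing probability order. The sketch below is a heavily simplified, hypothetical illustration of that pipeline: passwords are split into alpha/digit/symbol segments, structure and segment-value probabilities are estimated from a toy training set, and guesses are emitted highest-probability first. The real system's grammar, smoothing, and extended pattern classes (keyboard, multi-word) are far richer; the training list here is invented.

```python
import itertools
import re
from collections import Counter

SEGMENT = re.compile(r"[A-Za-z]+|[0-9]+|[^A-Za-z0-9]+")

def segment_class(seg):
    # L = letters, D = digits, S = symbols, followed by the segment length.
    return ("L" if seg[0].isalpha() else "D" if seg[0].isdigit() else "S") + str(len(seg))

def train(passwords):
    structures, values = Counter(), Counter()
    for pw in passwords:
        segs = SEGMENT.findall(pw)
        structures[tuple(segment_class(s) for s in segs)] += 1
        for s in segs:
            values[(segment_class(s), s)] += 1
    return structures, values

def guesses(structures, values, limit=10):
    total_s = sum(structures.values())
    by_class = Counter()
    for (cls, _), c in values.items():
        by_class[cls] += c
    candidates = []
    for struct, s_count in structures.items():
        pools = [[(v, c / by_class[cls]) for (c_cls, v), c in values.items() if c_cls == cls]
                 for cls in struct]
        for combo in itertools.product(*pools):
            p = s_count / total_s
            for _, vp in combo:
                p *= vp
            candidates.append(("".join(v for v, _ in combo), p))
    return sorted(candidates, key=lambda x: -x[1])[:limit]

if __name__ == "__main__":
    training = ["alice123", "bob123", "alice!", "summer2015", "bob!"]
    s, v = train(training)
    for guess, prob in guesses(s, v):
        print(f"{prob:.4f}  {guess}")
```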