Search results
- Title
- The Representation of Association Semantics with Annotations in a Biodiversity Informatics System.
- Creator
- Gaitros, David A., Riccardi, Greg, Ronquist, Fredrik, Engelen, Robert van, Srinivasan, Ashok, Department of Computer Science, Florida State University
- Abstract/Description
- A specialized variation of associations for biodiversity data is defined and developed that makes the capture and discovery of information about biological images easier and more efficient. Biodiversity is the study of the diversity of plants and animals within a given region. Storing, understanding, and retrieving biodiversity data is a complex problem. Biodiversity experts disagree on the structure and the basic ontologies. Much of the knowledge on this subject is contained in private collections, paper notebooks, and the minds of biologists. Collaboration among scientists is still problematic because of the logistics involved in sharing collections. This research adds value to image repositories by collecting and publishing semantically rich, user-specified associations among images and other objects. Current database and annotation techniques rely on structured data sets and ontologies to make storing, associating, and retrieving data efficient and reliable. A problem with biodiversity data is that the information is usually stored as ad-hoc text associated with non-standardized schemas and ontologies. This research developed a method that allows the storage of ad-hoc semantic associations through a complex relationship of working sets, phylogenetic character states, and image annotations (a sketch of this kind of data model follows this record). MorphBank is a collaborative research project supported by an NSF BDI grant (0446224 - $2,249,530.00) titled "Web Image Database Technology for Comparative Morphology and Biodiversity Research". MorphBank is an on-line museum-quality collection of biological images that facilitates the collaboration of biologists from around the world. This research demonstrates the viability of using association semantics through annotations in biodiversity informatics for the storage and discovery of new information.
- Date Issued
- 2007
- Identifier
- FSU_migr_etd-0437
- Format
- Thesis
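The ad-hoc associations described above can be pictured with a small data model. This is an illustrative sketch only; every class and field name here is hypothetical and does not reflect MorphBank's actual schema:

```python
from dataclasses import dataclass, field

# Hypothetical data model illustrating user-specified semantic associations
# among images and other objects; not MorphBank's real schema.

@dataclass
class Annotation:
    subject_id: str   # e.g. an image identifier
    object_id: str    # e.g. another image, a specimen, or a character state
    predicate: str    # free-text or user-defined semantic label
    author: str

@dataclass
class WorkingSet:
    name: str
    image_ids: list = field(default_factory=list)
    annotations: list = field(default_factory=list)

    def associate(self, subject_id, object_id, predicate, author):
        """Record an ad-hoc semantic association between two objects."""
        self.annotations.append(Annotation(subject_id, object_id, predicate, author))

ws = WorkingSet("wasp antennae study")
ws.associate("img:1042", "charstate:flagellomere-count=11",
             "exhibits", author="f.ronquist")
```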
- Title
- Evaluating Urban Deployment Scenarios for Vehicular Wireless Networks.
- Creator
- Potnis, Niranjan, Gopalan, Kartik, Wang, An-I Andy, Duan, Zhenhai, Department of Computer Science, Florida State University
- Abstract/Description
- Vehicular wireless networks are gaining commercial interest. Mobile connectivity, road safety, and traffic congestion management are some applications that have arisen with this networking paradigm. Existing research primarily focuses on developing mobility models and evaluating routing protocols in ideal open-field environments, providing limited information about whether vehicular networks can be deployed in an urban setting. This thesis evaluates the practicality of deployment scenarios for a vehicular ad hoc network with wireless mesh infrastructure support. The deployment scenarios include: (1) a mesh-enhanced peer-to-peer ad hoc routing deployment model where both the mobile nodes and static wireless infrastructure nodes participate in routing; (2) a mesh-enhanced infrastructural routing deployment model where only the static wireless infrastructure nodes participate in routing; and (3) a scenario where the static wireless infrastructure nodes in deployments (1) and (2) can communicate over multiple wireless channels. These deployment scenarios are evaluated with a mobility model that restricts the movement of vehicles to street boundaries based on real-world maps and imposes simple traffic rules. This study also proposes a method of capturing the effect of obstacles on wireless communication based on empirical experiments in urban environments (a sketch follows this record). The results indicate that (1) the mesh-enhanced infrastructural routing deployment yields significantly better performance than the mesh-enhanced peer-to-peer ad hoc routing deployment; (2) in the mesh-enhanced infrastructural routing deployment scenario, increasing the density of infrastructure nodes is beneficial, while increasing the density of mobile nodes has no significant effect; (3) in the mesh-enhanced peer-to-peer ad hoc routing deployment scenario, a higher density of infrastructure nodes as well as mobile nodes can lead to decreased performance; (4) using multiple channels of communication on infrastructure nodes yields greatly increased performance; and (5) the effect of obstacles can be represented in simulations through parameters, which can be set based on empirical experiments.
- Date Issued
- 2006
- Identifier
- FSU_migr_etd-0465
- Format
- Thesis
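The obstacle-parameter idea above can be sketched as a fixed per-obstacle penalty added to a log-distance path-loss model. All parameter values below are placeholders, not the thesis's empirically measured ones:

```python
import math

def received_power_dbm(tx_dbm, dist_m, n_obstacles,
                       path_loss_exp=3.0, loss_per_obstacle_db=6.0,
                       ref_loss_db=40.0):
    """Log-distance path loss plus a fixed penalty per obstructing building.

    The per-obstacle penalty is the kind of simulation parameter that would
    be calibrated from empirical urban measurements."""
    path_loss = ref_loss_db + 10 * path_loss_exp * math.log10(max(dist_m, 1.0))
    return tx_dbm - path_loss - n_obstacles * loss_per_obstacle_db

# A link is usable if the received power exceeds the radio's sensitivity.
print(received_power_dbm(20.0, dist_m=150.0, n_obstacles=2) > -90.0)
```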
- Title
- Bcq a Bin-Based Core Stateless Packet Scheduler for Scalable and Flexible Support of Guaranteed Services.
- Creator
- Purnachandra, Karthik P., Duan, Zhenhai, Yuan, Xin, Gopalan, Kartik, Department of Computer Science, Florida State University
- Abstract/Description
- IP networks have become an integral part of our daily lives. As we become more dependent on this technology, we realize the importance and use of networks that can be configured to cater to various classes of services and users. Given their potential scalability in providing Quality of Service (QoS), core-stateless packet scheduling algorithms have attracted a lot of attention in recent years. Unlike traditional stateful packet schedulers, which require routers to maintain per-flow state and perform per-flow operations, core-stateless packet schedulers service packets based on state carried in packet headers (such as the reservation rate of a flow). As a consequence, no per-flow state needs to be maintained and no per-flow operations need to be performed at core routers, which significantly reduces the complexity and improves the scalability of the packet scheduling algorithms. On the other hand, although core-stateless packet schedulers remove the requirement of per-flow state and operations, they aim to emulate the scheduling operations of the corresponding stateful packet schedulers. An important implication of this emulation is that they need to sort packets according to the control state carried in the packet headers and service packets in that order. This sorting operation can be quite expensive when the packet queue is long, which may not be acceptable in high-speed backbone networks. In this thesis, we present a bin-based core-stateless packet scheduling algorithm, BCQ, to overcome this problem. Like other core-stateless packet scheduling algorithms, BCQ does not require core routers to maintain per-flow state or perform per-flow operations. It schedules packets based on the notion of virtual time stamps, which are computed using only control state that can be carried in packet headers (and a few constant parameters of the scheduler). However, unlike current core-stateless packet scheduling algorithms, a BCQ scheduler maintains a number of packet bins, each representing a range of virtual times. Packets arriving at a BCQ scheduler are classified into the bins based on their virtual time stamps. Bins are serviced according to the range of virtual times they represent: packets in bins with earlier virtual times are serviced first, and packets within each bin are serviced in FIFO order (a sketch follows this record). We formally present the BCQ scheduler in this thesis and conduct simulations to study its performance. Our simulation results show that BCQ is a scalable and flexible packet scheduling algorithm. By controlling the size of the bins (and therefore the cost of BCQ), BCQ can achieve different desirable performance trade-offs. For example, when the bin size is sufficiently large, all arriving packets fall into one bin and no packet sorting is conducted (BCQ becomes a FIFO scheduler). On the other hand, as we gradually decrease the bin size, BCQ can provide different QoS performance (at greater cost). When the bin size is sufficiently small, BCQ can provide the same end-to-end delay performance as other core-stateless schedulers.
- Date Issued
- 2005
- Identifier
- FSU_migr_etd-0486
- Format
- Thesis
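A minimal sketch of the bin-based scheduling idea, with illustrative names rather than the thesis's exact algorithm:

```python
import collections

class BCQSketch:
    """Bin-based core-stateless queue: packets are indexed into bins by the
    virtual time stamp carried in their headers; bins are drained in
    virtual-time order, FIFO within a bin. Illustrative only."""

    def __init__(self, bin_size):
        self.bin_size = bin_size            # width of each virtual-time range
        self.bins = collections.defaultdict(collections.deque)

    def enqueue(self, packet, virtual_time):
        # Classify by virtual time from the packet header;
        # no per-flow state is consulted.
        self.bins[int(virtual_time // self.bin_size)].append(packet)

    def dequeue(self):
        if not self.bins:
            return None
        earliest = min(self.bins)           # bin covering the earliest range
        pkt = self.bins[earliest].popleft() # FIFO within the bin
        if not self.bins[earliest]:
            del self.bins[earliest]
        return pkt
```

Setting `bin_size` very large collapses all packets into one bin (FIFO behavior); shrinking it approaches fully sorted core-stateless scheduling, matching the cost/QoS trade-off the abstract describes.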
- Title
- Deconstruction and Analysis of Email Messages.
- Creator
- Zhu, Zhenghui, Aggarwal, Sudhir, Duan, Zhenhai, Medeiros, Breno de, Department of Computer Science, Florida State University
- Abstract/Description
- Phishing scams have grown in frequency and developed in sophistication, and in recent years emails have been misused by scammers to launch criminal attacks. By using phishing emails, scammers can make money in a very short time and generally avoid prosecution. Although it is typically easy for them to implement fraudulent plans at little cost, it is normally hard for law enforcement to catch them. Victims, on the other hand, can face severe property loss or loss due to identity theft. Research on detecting and preventing phishing attacks has thus become a hot topic in computer and network security, and a variety of tools have been developed to address aspects of this problem. However, there is currently little software that can detect and analyze phishing crimes efficiently. When investigating incidents of phishing and the related problem of identity theft, law enforcement investigators must spend a lot of time and effort, yet they often get only a few clues or results. We have developed the Undercover Multipurpose Anti-Spoofing Kit (UnMASK) to help solve this problem. This thesis presents the idea and the design of the deconstruction and analysis of email messages, which is used in UnMASK to help law enforcement investigate and prosecute email-based crimes. It addresses the following questions: how can we parse a raw email message and find the information needed for an investigation? What kind of information can we gather from the Internet? And which UNIX tools can be used for our investigation? (A generic parsing sketch follows this record.) In contrast to other work in this area, this research comprehensively considers exploits in phishing emails and defines a full-featured raw email parser for law enforcement investigations. We also design and implement a new protocol used in the UNIX tool system. The system not only tries to identify suspicious emails, but also emphasizes the gathering of evidence of crime. To the best of our knowledge, UnMASK is the first system that can automatically deconstruct email messages and present related forensic information in a convenient format to law enforcement. Test results show that the parser and the UNIX tool system of UnMASK are stable and useful. The system correctly extracts the information that law enforcement officers want to check in raw emails, and it correctly gathers information from the Internet. It generally takes a couple of minutes for our system to complete the report for one raw email message. Compared to the hours investigators spend to do the same work, our system greatly improves their efficiency.
- Date Issued
- 2007
- Identifier
- FSU_migr_etd-0515
- Format
- Thesis
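The deconstruction step, parsing a raw message and pulling out the fields an investigator cares about, can be illustrated with Python's standard email package. This is a generic sketch, not UnMASK's parser:

```python
import re
from email import policy
from email.parser import BytesParser

def deconstruct(raw_bytes):
    """Parse a raw email and extract fields useful to an investigation:
    originator headers, the Received chain, and any URLs in the body."""
    msg = BytesParser(policy=policy.default).parsebytes(raw_bytes)
    info = {
        "from": msg["From"],
        "reply_to": msg["Reply-To"],
        "received_chain": msg.get_all("Received") or [],
    }
    body = msg.get_body(preferencelist=("plain", "html"))
    text = body.get_content() if body else ""
    info["urls"] = re.findall(r"https?://[^\s\"'>]+", text)
    return info
```

Each hop in the Received chain could then be handed to UNIX tools such as whois or dig, in the spirit of the tool system described above.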
- Title
- Mobile Agent Protection with Data Encapsulation and Execution Tracing.
- Creator
- Suen, Anna, Yasinsac, Alec, Burmester, Mike, Hawkes, Lois, Department of Computer Science, Florida State University
- Abstract/Description
- Mobile agent systems provide a new method for computer communication. A mobile agent can migrate from platform to platform, performing a task or computation for its originator. Mobile agents are a promising new technology; however, many security issues still need to be addressed. These issues consist of protecting the agent platform and protecting the mobile agent. The toughest task is protecting the mobile agent, which is subject to attacks from the platform it is operating on. This thesis is concerned with protecting a mobile agent that collects data on behalf of its originator. A new mobile agent protection protocol, the data encapsulation protocol, is presented in this thesis.
- Date Issued
- 2003
- Identifier
- FSU_migr_etd-0399
- Format
- Thesis
- Title
- Reducing the WCET of Applications on Low End Embedded Systems.
- Creator
- Zhao, Wankang, Whalley, David, Srivastava, Anuj, Baker, Theodore P., Engelen, Robert A. van, Gallivan, Kyle, Department of Computer Science, Florida State University
- Abstract/Description
- Applications in embedded systems often need to meet specified timing constraints. It is advantageous not only to calculate the Worst-Case Execution Time (WCET) of an application, but also to perform transformations that attempt to reduce the WCET, since an application with a lower WCET is less likely to violate its timing constraints. A compiler has been integrated with a timing analyzer to obtain the WCET of a program on demand during compilation. This environment is used to investigate three different types of compiler optimization techniques to reduce WCET. First, an interactive compilation system has been developed that allows a user to interact with a compiler and get feedback regarding the WCET. In addition, a genetic algorithm is used to automatically search for an effective optimization phase sequence to reduce the WCET (a sketch of such a search follows this record). Second, a WCET code positioning optimization has been investigated that uses worst-case path information to reorder basic blocks so that branch penalties can be reduced along the worst-case path. Third, WCET path optimizations, similar to frequent path optimizations, are used to reduce the WCET. There are several contributions in this work. To the best of our knowledge, this is the first compiler that interacts with a timing analyzer to use WCET predictions during the compilation of applications. The dissertation demonstrates that a genetic algorithm search can find an optimization sequence that simultaneously improves both WCET and code size. New compiler optimizations have been developed that use worst-case path information from a timing analyzer. The results show that the WCET code positioning algorithms typically find the optimal layout of the basic blocks with the minimal WCET. It is also shown that frequent path optimizations can be applied to worst-case paths, using worst-case path information from a timing analyzer, to reduce WCET. These new compiler optimizations not only significantly reduce WCET, but are also completely automatic.
- Date Issued
- 2005
- Identifier
- FSU_migr_etd-0528
- Format
- Thesis
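The genetic-algorithm phase-sequence search can be sketched generically. The phase names and the WCET oracle below are placeholders; the real system would compile the program under each sequence and query the timing analyzer:

```python
import random

PHASES = ["licm", "cse", "strength_red", "copy_prop", "dead_code"]  # placeholders

def wcet(seq):
    """Stand-in for compiling with this phase order and asking the timing
    analyzer for the resulting WCET; lower is better. Deterministic fake."""
    return 1000 + (hash(tuple(seq)) % 997) / 10.0

def ga_search(length=8, pop_size=20, generations=50):
    pop = [[random.choice(PHASES) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=wcet)
        survivors = pop[: pop_size // 2]          # keep the fittest half
        children = []
        while len(children) < pop_size - len(survivors):
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, length)
            child = a[:cut] + b[cut:]             # one-point crossover
            if random.random() < 0.1:             # occasional mutation
                child[random.randrange(length)] = random.choice(PHASES)
            children.append(child)
        pop = survivors + children
    return min(pop, key=wcet)

print(ga_search())
```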
- Title
- Application Configurable Processors.
- Creator
- Zimmer, Christopher J., Whalley, David, Tyson, Gary, Engelen, Robert van, Department of Computer Science, Florida State University
- Abstract/Description
- As the complexity requirements for embedded applications increase, the performance demands on embedded compilers also increase. Compiler optimizations, such as software pipelining and recurrence elimination, can significantly reduce execution time for applications, but these transformations require the use of additional registers to hold data values across one or more loop iterations. Compilers for embedded systems have difficulty exploiting these optimizations since they typically do not have enough registers on an embedded processor to be able to apply the transformations. In this paper, we evaluate a new application configurable processor utilizing several different register structures which can enable these optimizations without increasing the architecturally addressable register storage requirements. Using this approach can lead to improved execution time through enabled optimizations and reduced register pressure for embedded architectures.
- Date Issued
- 2006
- Identifier
- FSU_migr_etd-0494
- Format
- Thesis
- Title
- Optimal Linear Features for Content Based Image Retrieval and Applications.
- Creator
- Zhu, Yuhua, Liu, Xiuwen, Patrangenaru, Victor, Li, Feifei, Mascagni, Michael, Kumar, Piyush, Mio, Washington, Department of Computer Science, Florida State University
- Abstract/Description
- Since the number of digital images is growing explosively, content-based image retrieval has become an active research area for automatically indexing and retrieving images based on their semantic features and visual appearance. Content-based image retrieval (CBIR) research largely concentrates on two topics of fundamental importance: (1) similarity of images, which depends on the feature representation and the feature similarity function; and (2) machine learning algorithms that enhance retrieval results by adaptively improving classification results and similarity metrics. The color histogram is one of the most commonly used features because it provides important color distribution information about images and is easy to calculate. However, the color histogram ignores spatial information, which is also important for discriminating spatial patterns. We propose a new type of feature, called spectral histogram (SH) features, that includes spatial information by combining local patterns through filters and global features through histograms. Spectral histogram features are obtained by concatenating histograms of image spectral components associated with a bank of filters; it has been shown that they provide a unified representation for modeling textures, faces, and other images (a sketch follows this record). Through experiments, we demonstrate their effectiveness for CBIR using a benchmark dataset. To alleviate sensitivity to scaling, we propose to use the "characteristic scale" to obtain intrinsic SH features that are invariant to changes in scale. To deal with domain-specific images such as "images containing cats", we propose a new shape feature called the gradient curve. The gradient curve feature, combined with histograms of oriented gradients (HOG) along edge fragment patches, is shown to be effective in cat head detection. We develop a new machine learning algorithm called Optimal Factor Analysis (OFA), which is designed to learn low-dimensional representations that optimize discrimination based on the nearest neighbor classifier using Euclidean distances. The method is applied to content-based image categorization and retrieval using SH features, and we achieve significantly better retrieval results on a benchmark dataset than some existing methods. We also explore the possibility of improving classification and retrieval results by applying OFA with respect to metrics derived from the cross-correlation of spectral histograms. Considering the large amount of unlabeled data in real-world applications, we propose a new semi-supervised learning algorithm named Transductive Optimal Component Analysis (Transductive OCA); it utilizes unlabeled data to learn optimal linear representations by incorporating an additional term that prefers representations with large "margins" when classifying unlabeled data in the nearest neighbor classifier sense. We achieve improvements on face recognition applications using Transductive OCA.
- Date Issued
- 2010
- Identifier
- FSU_migr_etd-0512
- Format
- Thesis
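A minimal sketch of the spectral histogram construction, assuming a toy filter bank rather than the one used in the thesis:

```python
import numpy as np
from scipy.signal import convolve2d

# Toy filter bank: intensity, two gradient filters, and a Laplacian.
FILTERS = [
    np.array([[1.0]]),
    np.array([[-1.0, 1.0]]),
    np.array([[-1.0], [1.0]]),
    np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float),
]

def spectral_histogram(image, bins=16):
    """Concatenate histograms of each filter's response into one vector:
    local structure via the filters, global statistics via the histograms."""
    feats = []
    for f in FILTERS:
        resp = convolve2d(image, f, mode="same")
        hist, _ = np.histogram(resp, bins=bins)
        feats.append(hist / hist.sum())   # marginal distribution of responses
    return np.concatenate(feats)

img = np.random.rand(64, 64)
print(spectral_histogram(img).shape)      # (4 filters * 16 bins,) = (64,)
```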
- Title
- PPDA: Privacy Preserving Data Aggregation in Wireless Sensor Networks.
- Creator
- Polite, Khandys A., Yasinsac, Alec, Burmester, Mike, Riccardi, Greg, Department of Computer Science, Florida State University
- Abstract/Description
- Wireless sensor networks are undoubtedly one of the largest growing types of networks today. Much research has been done to make these networks operate more efficiently, including the application of data aggregation. Recently, more research has been done on the security of wireless sensor networks that use data aggregation. In this thesis, we discuss a method by which data aggregation can be performed securely, allowing a sensor network to aggregate encrypted data without first decrypting it (a sketch of this idea follows this record).
- Date Issued
- 2004
- Identifier
- FSU_migr_etd-0633
- Format
- Thesis
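Aggregating ciphertexts without decryption is usually built on an additively homomorphic scheme. Below is a minimal sketch in the style of additive stream-cipher aggregation; it illustrates the concept only and is not the thesis's exact protocol:

```python
import hashlib

M = 2**32                                 # message-space modulus

def keystream(key: bytes, epoch: int) -> int:
    digest = hashlib.sha256(key + epoch.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[:4], "big")

def encrypt(reading, key, epoch):
    return (reading + keystream(key, epoch)) % M

def aggregate(ciphertexts):
    # Intermediate nodes add ciphertexts with no key material at all.
    return sum(ciphertexts) % M

def decrypt_sum(agg, keys, epoch):
    # The sink subtracts every sensor's keystream to recover the plain sum.
    return (agg - sum(keystream(k, epoch) for k in keys)) % M

keys = [b"sensor-1", b"sensor-2", b"sensor-3"]
readings = [20, 22, 21]
cts = [encrypt(r, k, epoch=7) for r, k in zip(readings, keys)]
assert decrypt_sum(aggregate(cts), keys, epoch=7) == sum(readings)
```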
- Title
- The Design, Implementation, and Evaluation of a Reliable Multicast Protocol for Ethernet Switched Networks.
- Creator
- Ding, Shiling, Yuan, Xin, Liu, Xiuwen, Engelen, Robert van, Department of Computer Science, Florida State University
- Abstract/Description
- Recent advances in multicasting present new opportunities for improving communication performance for clusters of workstations. Standard IP multicast, however, only supports unreliable multicast, which is difficult to use for building high-level message passing routines. Thus, reliable multicast primitives must be implemented over standard IP multicast to facilitate the use of multicast for high-performance communication on clusters of workstations. In this paper, we present the design, implementation, and evaluation of a reliable multicast protocol, called the M-ary Tree-based Reliable Multicast Protocol (MTRMP), that we developed for efficient reliable multicast on Ethernet switched clusters. MTRMP eliminates the ACK-implosion problem and achieves scalability by organizing receivers in a logical tree structure (a sketch follows this record). To achieve high throughput, MTRMP distributes the error recovery task to receivers and allows the sender to move ahead without ensuring that all receivers have received a packet. The results of our evaluation show that MTRMP performs better than other existing reliable multicast protocols on Ethernet switched networks.
- Date Issued
- 2003
- Identifier
- FSU_migr_etd-0732
- Format
- Thesis
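The logical tree that avoids ACK implosion can be sketched as follows; the helper is hypothetical, not MTRMP's actual code:

```python
def build_mary_tree(sender, receivers, m=4):
    """Assign each node up to m children, breadth-first: node i's children
    are nodes m*i+1 .. m*i+m in the flattened ordering."""
    nodes = [sender] + list(receivers)
    children = {}
    for i, node in enumerate(nodes):
        children[node] = nodes[m * i + 1 : m * i + m + 1]
    return children

tree = build_mary_tree("sender", [f"r{i}" for i in range(10)], m=3)
# Control messages flow only one hop up the tree, so no node handles
# more than m ACKs/NACKs per packet, regardless of group size.
print(tree["sender"])   # ['r0', 'r1', 'r2']
```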
- Title
- Effective Exploitation of a Large Data Register File.
- Creator
- Searles, Mark C., Whalley, David, Tyson, Gary, Yuan, Xin, Department of Computer Science, Florida State University
- Abstract/Description
- As the gap between CPU speed and memory speed widens, it is appropriate to investigate alternative storage systems. One approach is to use a large data register file. Registers, in general, offer several advantages when accessing data, including faster access time, access to multiple values in a single cycle, reduced power consumption, and small indices. Traditionally, registers have only been used to hold the values of scalar variables and temporaries; this necessarily excludes global structures and, in particular, arrays, which tend to exhibit high spatial locality. Although large register files have been explored, prior studies did not resolve complexities that limited their usefulness. In this thesis, we present a large data register file, which employs block movement of registers for efficient access and is able to support composite data structures, such as arrays and structs. The performance benefits realized from this approach include access to data values earlier in the pipeline, removal of many loads and stores, decreased contention within the data cache, and decreased energy consumption.
- Date Issued
- 2006
- Identifier
- FSU_migr_etd-0288
- Format
- Thesis
- Title
- Developing a Bioinformatics Utility Belt to Eliminate Search Redundancy from the Ever-Growing Databases.
- Creator
- Taylor, Misha, Engelen, Robert van, Swofford, David, Baker, Theodore, Thompson, Steven M., Department of Computer Science, Florida State University
- Abstract/Description
- Biological databases are growing at an exponential rate. Designing algorithms that deal with the inherent redundancy in these databases and cope with the overwhelming amount of data returned from similarity searches is an active area of research. This paper presents an overview of a real-world problem related to biological database searching, outlining how a human expert solves this problem. Then, several bioinformatics approaches from the literature are presented, forming a "utility belt" that might be used to solve the problem computationally.
- Date Issued
- 2003
- Identifier
- FSU_migr_etd-0345
- Format
- Thesis
- Title
- Secure Real-Time Conversations.
- Creator
- Azoum, Shadi S., Burmester, Mike, Yasinsac, Alec, Aggarwal, Sudhir, Department of Computer Science, Florida State University
- Abstract/Description
- Instant messaging has been and still is a revolutionary technology, bringing people into communication with each other faster and more easily. Its ability to send messages in real time makes it even more appealing than e-mail itself. It is no surprise, then, that it is a popular application found not only on personal computers but also on portable and mobile devices. While it contains rich and exciting features, such as the ability to monitor the status of friends or to send other forms of data besides text, it lacks one important security feature: confidentiality. Messages that are transferred are not protected in any way. With the availability of network sniffers and related programs that can capture this data, anyone with access to the network can read these messages. In addition, the fact that companies have adopted this technology and employees share confidential information over it makes this an even greater threat. We propose a framework and method that secures and fortifies the instant messaging design, combining an elliptic curve integrated encryption scheme with an identity-based, centralized public key infrastructure to ensure that privacy is preserved (a sketch of the encryption layer follows this record). This thesis provides a thorough overview of the cryptographic concepts necessary to discuss these two powerful components, while a sample implementation verifies its feasibility. In the end, several goals are fulfilled. The first is that the current structure of instant messaging has not changed. Second, the new system handles a range of hardware capabilities, from desktop PCs to PDAs. Finally, when put into action, the system works transparently, making its confidentiality feature a default setting for a popular form of communication.
- Date Issued
- 2008
- Identifier
- FSU_migr_etd-0272
- Format
- Thesis
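The confidentiality layer described, an elliptic curve integrated encryption scheme, composes an ephemeral ECDH exchange, a key-derivation function, and an authenticated cipher. A sketch of that composition using the pyca/cryptography package; this is a generic illustration, not the thesis's implementation (which also involves an identity-based PKI not shown here):

```python
import os
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.kdf.hkdf import HKDF
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def ecies_encrypt(recipient_pub, plaintext: bytes):
    eph = ec.generate_private_key(ec.SECP256R1())        # ephemeral key pair
    shared = eph.exchange(ec.ECDH(), recipient_pub)      # ECDH shared secret
    key = HKDF(hashes.SHA256(), 32, salt=None, info=b"im-msg").derive(shared)
    nonce = os.urandom(12)
    ct = AESGCM(key).encrypt(nonce, plaintext, None)     # authenticated cipher
    eph_pub = eph.public_key().public_bytes(
        serialization.Encoding.X962,
        serialization.PublicFormat.UncompressedPoint)
    return eph_pub, nonce, ct

def ecies_decrypt(priv, eph_pub, nonce, ct):
    peer = ec.EllipticCurvePublicKey.from_encoded_point(ec.SECP256R1(), eph_pub)
    shared = priv.exchange(ec.ECDH(), peer)
    key = HKDF(hashes.SHA256(), 32, salt=None, info=b"im-msg").derive(shared)
    return AESGCM(key).decrypt(nonce, ct, None)

alice = ec.generate_private_key(ec.SECP256R1())
msg = ecies_encrypt(alice.public_key(), b"meet at noon")
assert ecies_decrypt(alice, *msg) == b"meet at noon"
```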
- Title
- Expert System Ruleset Portability Using the Language Abstraction for Rule-Based Knowledge Systems (LARK) Engine.
- Creator
- Ayers, Kenneth Lloyd, Lacher, R. C., Schwartz, Daniel G., Stoecklin, Sara F., Department of Computer Science, Florida State University
- Abstract/Description
- This thesis describes the Language Abstraction for Rule-based Knowledge-systems (LARK) Engine. The goal of this engine is to process various expert system rulesets and generate the required semantics for multiple production systems, thus creating true portability for expert systems such as M.1 and CLIPS. Specifically, LARK provides ruleset translation from the Lark Markup Language (LarkML, an XML language defined herein) to CLIPS and M.1 expert system rules, as well as an implementation of rules written in natural language. LARK also demonstrates the ability to parse and convert basic CLIPS and M.1 rules to LarkML. In addition to describing the LARK Engine, this thesis also provides an overview of significant expert system, UML, and business ruleset portability efforts. Ruleset portability is quickly evolving as the combined efforts of many organizations push the technology forward. Significant ruleset portability efforts include the Production Rule Representation (PRR) defined by the Object Management Group (OMG), the Rule Interchange Format (RIF) specified by the W3C, the Rule Markup Language (RuleML) Initiative composed of a large group of industry and academia participants, and the Natural Rule Language (NRL), an effort sponsored by SourceForge.
- Date Issued
- 2008
- Identifier
- FSU_migr_etd-0267
- Format
- Thesis
- Title
- Appearance-Based Classification and Recognition Using Spectral Histogram Representations and Hierarchical Learning for OCA.
- Creator
- Zhang, Qiang, Liu, Xiuwen, Whalley, David, Gallivan, Kyle, Department of Computer Science, Florida State University
- Abstract/Description
- This thesis is composed of two parts. Part one is on appearance-based classification and recognition using spectral histogram representations. We present a unified method for appearance-based applications, including texture classification, 2D object recognition, and 3D object recognition, using spectral histogram representations. Based on a generative process, the representation is derived by partitioning the frequency domain into small disjoint regions and assuming independence among the regions. This gives rise to a set of filters and a representation consisting of the marginal distributions of those filters' responses. We provide generic evidence for its effectiveness in characterizing object appearance through statistical sampling and in classification by visualizing images in the spectral histogram space. We use a multilayer perceptron as the classifier and propose a selection algorithm that maximizes the performance over training samples. A distinct advantage of the representation is that it can be effectively used for different classification and recognition tasks. The claim is supported by experiments and comparisons in texture classification, face recognition, and appearance-based 3D object recognition. The marked improvement over existing methods justifies the effectiveness of the generative process and the derived spectral histogram representation. Part two is on hierarchical learning for Optimal Component Analysis. Optimization problems on manifolds such as Grassmann and Stiefel have been a subject of active research recently; however, the learning process can be slow when the dimension of the data is large. As a learning example on the Grassmann manifold, optimal component analysis (OCA) provides a general subspace formulation, and a stochastic optimization algorithm is used to learn optimal bases. In this paper, we propose a technique called hierarchical learning that can reduce the learning time of OCA dramatically. Hierarchical learning decomposes the original optimization problem into several levels according to a specially designed hierarchical organization, and the dimension of the data is reduced at each level using a shrinkage matrix. The learning process starts from the lowest level with an arbitrary initial point. The following approach is then applied recursively: (i) optimize the recognition performance in the reduced space, using the expanded optimal basis learned from the next lower level as an initial condition, and (ii) expand the optimal subspace to the bigger space in a pre-specified way. By applying this decomposition procedure recursively, a hierarchy of layers is formed. We show that the optimal performance obtained in the reduced space is maintained after the expansion, so the learning process at each level starts with a good initial point obtained from the next lower level. This speeds up the original algorithm significantly, since the learning is performed mainly in reduced spaces and the computational complexity is reduced greatly at each iteration. The effectiveness of hierarchical learning is illustrated on two popular datasets, where the computation time is reduced by a factor of about 30 compared to the original algorithm.
- Date Issued
- 2005
- Identifier
- FSU_migr_etd-0544
- Format
- Thesis
- Title
- Methods of Detecting Intrusions in Security Protocols.
- Creator
- Sherwood, Robert William, Burmester, Mike, Yasinsac, Alec, Hawkes, Lois, Department of Computer Science, Florida State University
- Abstract/Description
- Since the explosion of computer systems and computer networks within the past decade, e-commerce, online banking, and other "internet" oriented applications have risen exponentially. According to Forrester Research Group, online shopping in the US grew 580% from 1998 to 2000, accounting for more than $45 billion in sales [10]. Online Banking Report states there are over 100 million people participating in online banking worldwide, an increase of 80% since 1984. This number is expected to rise to 300 million households by 2012 [3]. These applications rely on secure communications for passing information such as credit card numbers and bank account information. The secure communication is realized through the use of cryptography and security protocols for key exchange, authentication, et cetera. These protocols can be attacked, possibly resulting in vital information being compromised. This paper discusses classic methodologies concerning intrusion detection and how they are being applied to security protocols. Three methods are presented for detecting and/or preventing intrusions in security protocols. The first is a simple method aimed at detecting intrusions from attackers with rudimentary skills. The second, a modified version of the original model, provides a more formidable defense against the sophisticated attacker. Lastly, this paper discusses the third method, IPSec, and how it provides the best security for detecting intrusions in security protocols. Each method is tested with known attacks and the results are discussed.
- Date Issued
- 2004
- Identifier
- FSU_migr_etd-0315
- Format
- Thesis
- Title
- Improving Monte Carlo Linear Solvers Through Better Iterative Processes.
- Creator
- Aggarwal, Vikram, Srinivasan, Ashok, Mascagni, Michael, Engelen, Robert van, Department of Computer Science, Florida State University
- Abstract/Description
- Monte Carlo (MC) linear solvers are fundamentally based on the ability to estimate a matrix-vector product using a random sampling process. They use the fact that deterministic stationary iterative processes for solving linear systems can be written as sums of a series of matrix-vector products; replacing the deterministic matrix-vector products with MC estimates yields an MC linear solver (a sketch follows this record). While MC linear solvers have a long history, they did not gain widespread acceptance in the numerical linear algebra community, for the following reasons: (i) their slow convergence, and (ii) the limited class of problems for which they converge. Slow convergence is caused both by the MC process for estimating the matrix-vector product and by the stationary process underlying the MC technique, while the limited applicability is caused primarily by the stationary iterative process. The MC linear algebra community made significant advances in reducing the errors from slow convergence through better techniques for estimating the matrix-vector product and through a variety of variance reduction techniques. However, the use of MC linear algebra is still limited, since the techniques use only stationary iterative processes resulting from a diagonal splitting (for example, Jacobi), which have poor convergence properties. Such splittings are used because it is believed that efficient MC implementations of more sophisticated splittings are not feasible. Consequently, little effort has been placed by the MC community on addressing this important issue. In this thesis, we address the issue of improving the iterative process underlying MC linear solvers. In particular, we demonstrate that the reasons for considering only diagonal splittings are not valid, and show a specific non-diagonal splitting for which an efficient MC implementation is feasible, even though it superficially suffers from the drawbacks for which non-diagonal splittings were not considered by the MC linear algebra community. We also show that conventional techniques to improve deterministic iterative processes, such as the Chebyshev method, show promise in improving MC techniques too. Despite such improvements, we do not expect MC techniques to be competitive with modern deterministic techniques for accurately solving linear systems. However, MC techniques have the advantage that they can obtain approximate solutions fast. For example, an estimate of the solution can be obtained in constant time, independent of the size of the matrix, if we permit a small amount of preprocessing. There are other advantages too, such as the ability to estimate specific components of a solution, and latency and fault tolerance in parallel and distributed environments. There are a variety of applications where fast, approximate solutions are useful, such as preconditioning, graph partitioning, and information retrieval. Thus MC linear algebra techniques are relevant to important classes of applications. We demonstrate this by showing their benefits in an application to dynamic load balancing of parallel computations.
- Date Issued
- 2004
- Identifier
- FSU_migr_etd-0017
- Format
- Thesis
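The core idea, replacing the deterministic matrix-vector product inside a Jacobi-style iteration with a sampled estimate, can be sketched as follows. This is a toy uniform-sampling estimator; the thesis's refined estimators, variance reduction, and non-diagonal splitting are not reproduced:

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_matvec(H, x, samples=200):
    """Unbiased Monte Carlo estimate of H @ x: for each row, average
    n * H[i, j] * x[j] over uniformly sampled column indices j."""
    n = H.shape[1]
    j = rng.integers(0, n, size=samples)
    return (n / samples) * (H[:, j] @ x[j])

def mc_jacobi(A, b, sweeps=500, samples=200):
    """Jacobi iteration x <- D^{-1}(b - R x) with the deterministic
    matvec R x replaced by a Monte Carlo estimate."""
    D = np.diag(A)
    R = A - np.diag(D)
    x = np.zeros_like(b)
    for _ in range(sweeps):
        x = (b - mc_matvec(R, x, samples)) / D
    return x

# Diagonally dominant test system, so the underlying iteration converges.
A = np.array([[4.0, 1.0, 0.5], [1.0, 5.0, 1.0], [0.5, 1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
print(mc_jacobi(A, b), np.linalg.solve(A, b))
```

Because each sweep uses a noisy matvec, the iterates fluctuate around the true solution rather than converging exactly, which is the slow-convergence behavior the abstract discusses.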
- Title
- Enhancing Pattern Classification with Relational Fuzzy Neural Networks and Square BK-Products.
- Creator
- Davis, Warren L., Kohout, Ladislav J., Meyer-Bäse, Anke, Engelen, Robert van, Lacher, R. Christopher, McDuffie, Ernest L., Department of Computer Science, Florida State University
- Abstract/Description
- This research presents several important developments in pattern classification using fuzzy neural networks and BK-square products, and presents extensions to max-min fuzzy neural network research. In this research, the max and min operations used in the fuzzy operations are replaced by more general t-norms and co-norms, respectively (a sketch follows this record). In addition, instead of the Łukasiewicz equivalence connective used in the network of Reyes-Garcia and Bandler, this research introduces a variety of equivalence connectives. A new software tool was developed specifically for this research, allowing for greater experimental flexibility, as well as some interesting options that allow greater exploitation of the merits of the relational BK-square network. The effectiveness of this classifier is explored in the domains of phoneme recognition, taxonomic classification, and diabetes diagnosis. This research finds that varying the fuzzy operations in the equivalence and implication formulae, in complete divergence from classical composition, produces drastically different performance within this classifier. Techniques are presented that select effective fuzzy operation combinations. In addition, this classifier is shown to be effective at feature selection by using a technique which would usually be impractical with standard neural networks, but is made practical through the unique nature of this classifier.
- Date Issued
- 2006
- Identifier
- FSU_migr_etd-0057
- Format
- Thesis
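The t-norm/co-norm generalization can be illustrated on relational composition of fuzzy matrices. This is generic fuzzy-composition code, not the BK-square implementation:

```python
import numpy as np

# t-norms (generalized AND) and co-norms (generalized OR)
T_NORMS = {"min": np.minimum, "product": np.multiply,
           "lukasiewicz": lambda a, b: np.maximum(0.0, a + b - 1.0)}
CO_NORMS = {"max": np.maximum, "prob_sum": lambda a, b: a + b - a * b}

def compose(R, S, t="min", s="max"):
    """Sup-T composition of fuzzy relations: (R o S)[i, k] combines
    T(R[i, j], S[j, k]) over j with the chosen co-norm. With t='min'
    and s='max' this is the classical max-min composition."""
    t_op, s_op = T_NORMS[t], CO_NORMS[s]
    out = np.zeros((R.shape[0], S.shape[1]))
    for i in range(R.shape[0]):
        for k in range(S.shape[1]):
            vals = t_op(R[i, :], S[:, k])
            acc = vals[0]
            for v in vals[1:]:
                acc = s_op(acc, v)
            out[i, k] = acc
    return out

R = np.array([[0.2, 0.9], [0.7, 0.4]])
S = np.array([[0.5, 1.0], [0.3, 0.6]])
print(compose(R, S))                              # classical max-min
print(compose(R, S, t="product", s="prob_sum"))   # a different operator pair
```

As the abstract notes, swapping the operator pair can change the classifier's behavior drastically even though the relational structure is unchanged.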
- Title
- Instruction Caching in Multithreading Processors Using Guarantees.
- Creator
- Gavin, Peter Brendan, Tyson, Gary, Whalley, David, Yuan, Xin, Department of Computer Science, Florida State University
- Abstract/Description
- The OpenSPARC T1 is a multithreading processor developed and open-sourced by Sun Microsystems (now Oracle). This paper presents an implementation of the low-power Tagless-Hit Instruction Cache (TH-IC) for the T1, after adapting it to the multithreading architecture found in that processor. The TH-IC eliminates the need for many instruction cache and ITLB accesses by guaranteeing that accesses within a much smaller L0-style cache will hit. The OpenSPARC T1 uses a 16KB, 4-way set-associative instruction cache and a 64-entry fully associative ITLB. The addition of the TH-IC eliminates approximately 75% of accesses to these structures, instead processing the fetch directly from a much smaller 128-byte data array. Adding the TH-IC to the T1 also demonstrates that even already power-efficient processors can be made more efficient using this technique.
- Date Issued
- 2010
- Identifier
- FSU_migr_etd-0133
- Format
- Thesis
- Title
- Formal Security Evaluation of Ad Hoc Routing Protocols.
- Creator
- Andel, Todd R., Yasinsac, Alec, Kazmer, Michelle, Aggarwal, Sudhir, Medeiros, Breno de, Tyson, Gary, Department of Computer Science, Florida State University
- Abstract/Description
- Research into routing protocol development for mobile ad hoc networks has been a significant undertaking since the late 1990s. Secure routing protocols for mobile ad hoc networks provide the necessary functionality for proper network operation. If the underlying routing protocol cannot be trusted to follow the protocol operations, additional trust layers cannot be obtained; for instance, authentication between nodes is meaningless without a trusted underlying route. Security analysis procedures to formally evaluate these developing protocols have lagged significantly, resulting in unstructured security analysis approaches and numerous secure ad hoc routing protocols that can easily be broken. Evaluation techniques to analyze security properties in ad hoc routing protocols generally rely on manual, non-exhaustive approaches. Non-exhaustive analysis techniques may conclude a protocol is secure while in reality the protocol contains unapparent or subtle flaws. Using formalized exhaustive evaluation techniques to analyze security properties increases protocol confidence. Intertwined with the security evaluation process is the threat model chosen for the analysis. Threat models drive analysis capabilities, affecting how we evaluate trust. Current attacker threat models limit the results obtained during protocol security analysis of ad hoc routing protocols. Developing a proper threat model to evaluate security properties in mobile ad hoc routing protocols presents a significant challenge: if the attacker strength is too weak, we miss vital security flaws; if the attacker strength is too strong, we cannot identify the minimum attacker capabilities required to break the routing protocol. To solve these problems, we contribute to the field in the following ways. Adaptive threat modeling: we develop an adaptive threat model to evaluate route discovery attacks against ad hoc routing protocols. Adaptive threat modeling enables us to evaluate trust in the ad hoc routing process and to identify the minimum requirements an attacker needs to break a given routing protocol. Automated security evaluation: we develop an automated evaluation process to analyze security properties in the route discovery phase for on-demand source routing protocols. Using the automated security evaluation process, we are able to produce and analyze all topologies for a given network size. The individual network topologies are fed into the SPIN model checker to exhaustively evaluate protocol models against an attacker attempting to corrupt the route discovery process. Our contributions provide the first automated exhaustive analysis approach for evaluating ad hoc on-demand source routing protocols.
- Date Issued
- 2007
- Identifier
- FSU_migr_etd-0192
- Format
- Thesis
- Title
- Topology Aggregation for Networks with Two Additive Metrics.
- Creator
- Ansari, Almas, Yuan, Xin, Hawkes, Lois, Aggarwal, Sudhir, Department of Computer Science, Florida State University
- Abstract/Description
- Topology aggregation is concerned with summarizing a network domain in a concise manner. This thesis deals with topology aggregation for networks with two additive metrics. Summarizing such a network domain is difficult for a number of reasons. First, computing paths between two nodes with two additive metrics is NP-hard. Second, it is unclear how the quality of two paths with two additive metrics can be compared, which makes it difficult to determine the quality of topology aggregation schemes. In this thesis, we develop a method to evaluate the quality of aggregation schemes for networks with two additive metrics, propose to compute the full mesh representation of a domain using the limited path heuristic (a sketch follows this record), and demonstrate that the information carried in the full mesh representation is very close to that in the original network representation. We also develop and study a number of schemes to reduce the full mesh representation to a spanning-tree-based representation. The performance of the proposed schemes is studied through simulation. The results show that minimum-spanning-tree-based schemes yield reasonable performance.
- Date Issued
- 2004
- Identifier
- FSU_migr_etd-0030
- Format
- Thesis
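With two additive metrics, paths are naturally compared by Pareto dominance, and a limited path heuristic keeps at most k non-dominated (delay, cost) paths per destination. A sketch of that pruning step, with illustrative names rather than the thesis's code:

```python
def dominates(p, q):
    """Path p dominates q if it is no worse in both additive metrics
    and is not identical to q."""
    return p[0] <= q[0] and p[1] <= q[1] and p != q

def limited_paths(candidates, k=3):
    """Keep at most k non-dominated (delay, cost) pairs; among the
    survivors, prefer lexicographically smaller ones."""
    nondom = [p for p in candidates
              if not any(dominates(q, p) for q in candidates)]
    return sorted(set(nondom))[:k]

print(limited_paths([(3, 9), (5, 5), (4, 7), (6, 4), (7, 8)]))
# -> [(3, 9), (4, 7), (5, 5)]; (7, 8) is dominated by (4, 7) and (5, 5),
# and (6, 4) is cut by the k=3 limit.
```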
- Title
- Improving the Effectiveness of Performance Analysis for HPC by Using Appropriate Modeling and Simulation Schemes.
- Creator
-
Tong, Zhou, Yuan, Xin, Ke, Fengfeng, Zhang, Zhenghao, Haiduc, Sonia, Pakin, Scott D., Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
Performance modeling and simulation of parallel applications are critical performance analysis techniques in High Performance Computing (HPC). Efficient and accurate performance modeling and simulation can aid the tuning and optimization of current systems as well as the design of future HPC systems. As the HPC applications and systems increase in size, efficient and accurate performance modeling and simulation of parallel applications is becoming increasingly challenging. In general,...
Show morePerformance modeling and simulation of parallel applications are critical performance analysis techniques in High Performance Computing (HPC). Efficient and accurate performance modeling and simulation can aid the tuning and optimization of current systems as well as the design of future HPC systems. As the HPC applications and systems increase in size, efficient and accurate performance modeling and simulation of parallel applications is becoming increasingly challenging. In general, simulation yields higher accuracy at the cost of high simulation time in comparison to modeling. This dissertation aims at developing effective performance analysis techniques for the next generation HPC systems. Since modeling is often orders of magnitude faster than simulation, the idea is to separate HPC applications into two types: 1) the ones that modeling can produce similar performance results as simulation and 2) the ones that simulation can result in more meaningful information about the application performance than modeling. By using modeling for the first type of applications and simulation for the rest of applications, the efficiency of performance analysis can be significantly improved. The contribution of this thesis is three-fold. First, a comprehensive study of the performance and accuracy trade-offs between modeling and simulation on a wide range of HPC applications is performed. The results indicate that for the majority of HPC applications, modeling and simulation yield similar performance results. This lays the foundation for improving performance analysis on HPC systems by selecting between modeling and simulation on each application. Second, a scalable and fast classification techniques (MFACT) are developed based on the Lamport's logical clock that can provide fast diagnosis of MPI application performance bottleneck and assist in the processing of application tuning and optimization on current and future HPC systems. MFACT also classifies HPC applications into bandwidth-bound, latency-bound, communication-bound, and computation-bound. Third, built-upon MFACT, for a given system configuration, statistical methods are introduced to classify HPC applications into the two types: the ones that needs simulation and the ones that modeling is sufficient. The classification techniques and tools enable effective performance analysis for future HPC systems and applications without losing accuracy.
- Date Issued
- 2017
- Identifier
- FSU_FALL2017_Tong_fsu_0071E_14074
- Format
- Thesis
- Title
- Game Based Visual-to-Auditory Sensory Substitution Training.
- Creator
-
Marshall, Justin B., Tyson, Gary Scott, Erlebacher, Gordon, Liu, Xiuwen, Ackerman, Margareta, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
There has been a great deal of research devoted to computer-vision-related assistive technologies. Unfortunately, this area of research has not produced many usable solutions; the long cane and the guide dog are still far more useful than most of these devices. Through the push for advanced mobile and gaming systems, new low-cost solutions have become available for building innovative and creative assistive technologies. These technologies have been used for sensory substitution projects that attempt to convert vision into either auditory or tactile stimuli, and these projects have reported some degree of measurable success. Most of them focused on converting either image brightness or depth into auditory signals. This research was devoted to the design and creation of a video game simulator capable of supporting research and training for sensory substitution concepts that convert vision into auditory stimuli. The simulator was used to perform direct comparisons between some of the popular sensory substitution techniques as well as to explore new concepts for conversion. A study of 42 participants tested different techniques for image simplification and found that depth-to-tone sensory substitution may be more usable than brightness-to-tone substitution. The study has shown that 3D game simulators can be used in lieu of building costly prototypes for testing new sensory substitution concepts.
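To make the depth-to-tone idea concrete, here is a hedged Python sketch of one such mapping, where nearer surfaces produce higher-pitched tones. The frequency range, depth ceiling, and linear mapping are illustrative choices, not the parameters used in the thesis.

```python
# Sketch: map a depth reading (meters) to a short sine tone.
import numpy as np

def depth_to_tone(depth_m, max_depth=5.0, f_lo=200.0, f_hi=2000.0,
                  duration=0.2, rate=44100):
    nearness = 1.0 - min(depth_m, max_depth) / max_depth   # 1 = close, 0 = far
    freq = f_lo + nearness * (f_hi - f_lo)                 # close -> high pitch
    t = np.arange(int(duration * rate)) / rate
    return np.sin(2 * np.pi * freq * t)                    # raw audio samples

samples = depth_to_tone(1.5)   # a 1.5 m obstacle yields a fairly high tone
print(samples.shape)
```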
- Date Issued
- 2015
- Identifier
- FSU_2015fall_Marshall_fsu_0071E_12749
- Format
- Thesis
- Title
- Pyquery: A Search Engine for Python Packages and Modules.
- Creator
-
Imminni, Shiva Krishna, Kumar, Piyush, Haiduc, Sonia, Ackerman, Margareta, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
The Python Package Index (PyPI) is a repository that hosts all the packages ever developed for the Python community. It hosts thousands of packages from different developers and is the Python community's primary source for downloading and installing packages. It also provides a simple web interface to search for these packages. A direct search on PyPI returns hundreds of packages that are not intuitively ordered, making it harder to find the right package. Developers consequently resort to mature search engines like Google, Bing, or Yahoo, which redirect them to the appropriate package homepage at PyPI. Hence, the first task of this thesis is to improve search results for Python packages. Second, this thesis develops a new search engine that allows Python developers to perform code search targeting Python modules. Existing code search engines classify programming languages such that a developer must select a language from a list; as a result, every time a developer performs a search, he or she must choose Python out of a plethora of programming languages. This thesis seeks to offer a more reliable and dedicated search engine that caters specifically to the Python community and ensures a more efficient way to search for Python packages and modules.
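As background for the kind of metadata a package search engine can index, here is a minimal sketch that pulls package information from PyPI's public JSON API and ranks it with plain keyword counting. The ranking is an illustrative stand-in, not PyQuery's actual method, and the chosen packages and query are arbitrary.

```python
# Sketch: fetch PyPI metadata and rank packages against a query.
import requests

def fetch_metadata(package):
    resp = requests.get(f"https://pypi.org/pypi/{package}/json", timeout=10)
    resp.raise_for_status()
    info = resp.json()["info"]
    return {"name": info["name"], "summary": info["summary"] or ""}

def score(meta, query):
    # Naive relevance: count query words in the name and summary.
    text = (meta["name"] + " " + meta["summary"]).lower()
    return sum(text.count(w) for w in query.lower().split())

packages = [fetch_metadata(p) for p in ("requests", "flask", "numpy")]
query = "http library"
for meta in sorted(packages, key=lambda m: score(m, query), reverse=True):
    print(score(meta, query), meta["name"], "-", meta["summary"])
```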
- Date Issued
- 2015
- Identifier
- FSU_2015fall_Imminni_fsu_0071N_12969
- Format
- Thesis
- Title
- A Mechanism for Tracking the Effects of Requirement Changes in Enterprise Software Systems.
- Creator
-
Datta, Subhajit, Engelen, Robert van, Hawkes, Lois, Yasinsac, Alec, Department of Computer Science, Florida State University
- Abstract/Description
-
Managing the effects of changing requirements remains one of the greatest challenges of enterprise software development. The iterative and incremental model provides an expedient framework for addressing such concerns. This thesis proposes a set of metrics – Mutation Index, Component Set, Dependency Index – and a methodology to measure the effects of requirement changes from one iteration to another. To evaluate the effectiveness of the proposed metrics, sample calculations and results from a real-life case study are included. Future directions of our work based on this mechanism are also discussed.
- Date Issued
- 2006
- Identifier
- FSU_migr_etd-0828
- Format
- Thesis
- Title
- An Interface for Collaborative Digital Forensics.
- Creator
-
Das, Rajarshi, Aggarwal, Sudhir, Medeiros, Breno de, Duan, Zhenhai, Department of Computer Science, Florida State University
- Abstract/Description
-
This thesis presents a novel interface for collaborative Digital Forensics. It describes improvements in process management and remote access for current Digital Forensics tools. The architecture presented uses current technology and implements standard security procedures. In addition, the development of software modules, elaborated on later in this thesis, makes this architecture secure, portable, robust, reliable, scalable, and convenient as a solution. Such a solution is not specific to any Digital Forensics tool or operating platform, making it a portable architecture. A primary goal of this thesis has been the development of a solution that could support law-enforcement agency needs for remote digital decryption, and the interface presented here aims to achieve this goal. The integration of two popular Digital Forensics tools with this interface has led to a fully operational portal with 24x7 digital decryption processing capabilities for agents to use. A secondary goal was to investigate ideas and techniques that could be helpful in the field of passphrase generation and recovery; the implementation of certain computational models to support this research is under way. The interface has been designed with features that will form part of the foundational work for developing new passphrase-breaking software components. Establishing a dedicated setup for the Digital Forensics tools and creating a secure, reliable, and user-friendly interface for it has been a major component of the overall development in creating the portal.
- Date Issued
- 2007
- Identifier
- FSU_migr_etd-0837
- Format
- Thesis
- Title
- Metrics and Techniques to Guide Software Development.
- Creator
-
Datta, Subhajit, Engelen, Robert van, Douglas, Ian, Hawkes, Lois, Baker, Theodore, Schwartz, Daniel, Mascagni, Michael, Department of Computer Science, Florida State University
- Abstract/Description
-
The objective of my doctoral dissertation research is to formulate, implement, and validate metrics and techniques towards perceiving some of the influences on software development, predicting the impact of user-initiated changes on a software system, and prescribing guidelines to aid decisions affecting software development. Some of the topics addressed in my dissertation are: analyzing the extent to which changing requirements affect a system's design, how the delegation of responsibilities to software components can be guided, how Aspect Oriented Programming (AOP) may be combined with Object Oriented Programming (OOP) to best deliver a system's functionality, and whether and how characteristics of a system's design are influenced by outsourced and offshore development. The metrics and techniques developed in my dissertation serve as heuristics across the software development life cycle, helping practitioners evaluate options and make decisions. By way of validation, the metrics and techniques have been applied to more than 10 real-life software systems. To facilitate their application, I have led the development of automated tools that can process software development artifacts such as code and Unified Modeling Language (UML) diagrams. The design and implementation of such tools are also discussed in the dissertation.
- Date Issued
- 2009
- Identifier
- FSU_migr_etd-0824
- Format
- Thesis
- Title
- Detection Framework for Phishing Websites.
- Creator
-
Wolff, Marcus, Aggarwal, Sudhir, Duan, Zhenghai, Zhang, Zhenghao, Department of Computer Science, Florida State University
- Abstract/Description
-
This paper discusses a combined and platform-independent solution to detect websites that fake their identity. The approach combines white-listing, black-listing, and heuristic strategies to provide an optimal detection ratio against these so-called phishing websites while making sure that the number of wrongly classified legitimate websites remains as low as possible. For the implementation, a prototype solution was written in platform-independent Java. Practical challenges during the implementation as well as initial practical results are discussed.
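A hedged Python sketch of the layered strategy the paper describes follows: consult a whitelist first, then a blacklist, then fall back to heuristics. The specific heuristics, lists, and score threshold below are common URL features chosen for illustration, not necessarily those used in the paper.

```python
# Sketch: combined whitelist / blacklist / heuristic phishing classifier.
from urllib.parse import urlparse
import re

WHITELIST = {"example.com"}          # known-good domains (illustrative)
BLACKLIST = {"evil.example.net"}     # known-phishing domains (illustrative)

def classify(url):
    host = urlparse(url).hostname or ""
    if host in WHITELIST:
        return "legitimate"
    if host in BLACKLIST:
        return "phishing"
    # Heuristic fallback: each suspicious URL feature adds to a score.
    score = 0
    score += bool(re.fullmatch(r"[\d.]+", host))   # raw IP as hostname
    score += "@" in url                            # userinfo obfuscation trick
    score += host.count("-") > 2                   # unusually many hyphens
    score += len(url) > 75                         # unusually long URL
    return "phishing" if score >= 2 else "legitimate"

print(classify("http://192.168.0.1/login@bank"))   # two heuristics fire
```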
- Date Issued
- 2009
- Identifier
- FSU_migr_etd-0879
- Format
- Thesis
- Title
- A Study on Semantic Relation Representations in Neural Word Embeddings.
- Creator
-
Chen, Zhiwei, Liu, Xiuwen, He, Zhe (Professor of Information Studies), Zhao, Peixiang, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
Neural-network-based word embeddings have demonstrated outstanding results in a variety of tasks and have become a standard input for Natural Language Processing (NLP) related deep learning methods. Although these representations are able to capture semantic regularities in languages, some general questions, e.g., "what kinds of semantic relations do the embeddings represent?" and "how can the semantic relations be retrieved from an embedding?", remain unclear, and very little relevant work has been done. In this study, we propose a new approach to exploring the semantic relations represented in neural embeddings based on WordNet and the Unified Medical Language System (UMLS). Our study demonstrates that neural embeddings do prefer some semantic relations and that they also represent diverse semantic relations. Our study also finds that Named Entity Recognition (NER)-based phrase composition outperforms Word2phrase and that word variants do not affect performance on analogy and semantic relation tasks.
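The analogy tasks mentioned here are commonly probed with the vector-offset test (a - b + c ≈ d). A minimal sketch follows; the toy 3-d vectors are fabricated for illustration, whereas real studies use trained embeddings such as word2vec.

```python
# Sketch: vector-offset analogy test over a toy embedding table.
import numpy as np

emb = {
    "king":  np.array([0.8, 0.9, 0.1]),
    "man":   np.array([0.7, 0.1, 0.1]),
    "woman": np.array([0.7, 0.1, 0.9]),
    "queen": np.array([0.8, 0.9, 0.9]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# king - man + woman should land nearest to queen.
target = emb["king"] - emb["man"] + emb["woman"]
best = max((w for w in emb if w not in {"king", "man", "woman"}),
           key=lambda w: cosine(emb[w], target))
print(best)   # "queen" for these toy vectors
```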
- Date Issued
- 2017
- Identifier
- FSU_SUMMER2017_Chen_fsu_0071N_14103
- Format
- Thesis
- Title
- Comparing Samos Document Search Performance between Apache Solr and Neo4j.
- Creator
-
Stallard, Adam Preston, Zhao, Peixiang, Smith, Shawn R., Haiduc, Sonia, Nistor, Adrian, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
The Distributed Oceanographic Match-Up Service (DOMS) currently under development is a centralized service that allows researchers to easily match in situ and satellite oceanographic data from distributed sources to facilitate satellite calibration, validation, and retrieval algorithm development. The Shipboard Automated Meteorological and Oceanographic System (SAMOS) initiative provides routine access to high-quality marine meteorological and near-surface oceanographic observations from research vessels. SAMOS is one of several endpoints connected into the DOMS network, providing in situ data for the match-up service. DOMS in situ endpoints currently use Apache Solr as a backend search engine on each node in the distributed network. While Solr is a high-performance solution that facilitates creation and maintenance of indexed data, it is limited in the sense that its schema is fixed. The property graph model escapes this limitation by removing any prohibiting requirements on the data model and permitting relationships between data objects. This paper documents the development of the SAMOS Neo4j property graph database, including new search possibilities that take advantage of the property graph model, performance comparisons with Apache Solr, and a vision for graph databases as a storage tool for oceanographic data. The integration of the SAMOS Neo4j graph into DOMS is also described. Various data models are explored, including spatio-temporal records from SAMOS added to a time tree using the GraphAware extension. This extension provides callable Java procedures within the Cypher query language that generate in-graph structures used in data retrieval. Neo4j excels at performing relationship and path-based queries, which challenge relational SQL databases because they require memory-intensive joins. Consider a user who wants to find records over several years, but only for specific months. If a traditional database only stores timestamps, this type of query could be complex and likely prohibitively slow. Using the time tree model in a graph, one can specify a path from the root to the data that restricts resolution to certain time frames (e.g., months). This query can be executed without joins, unions, or other compute-intensive operations, putting Neo4j at a computational advantage over the SQL database alternative. That said, while this advantage may be useful, it should not be interpreted as an advantage over Solr in the context of DOMS. Solr makes use of Apache Lucene indexing at its core, while Neo4j provides its own native schema indexes. Ultimately, each provides unique solutions for data retrieval that are geared towards specific tasks. In the DOMS setting, Solr appears to be the most suitable option, as there seem to be very few use cases where Neo4j outperforms Solr. This is primarily because the use case as a subsetting tool does not require the flexibility and path-based queries that graph database tools offer. Rather, DOMS nodes use high-performance indexing structures to quickly filter large amounts of raw data that are not deeply connected; deep connectivity is the feature of large data sets for which graph queries become genuinely useful.
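To illustrate the month-restricted time tree query described above, here is a hedged Python sketch using the official neo4j driver. The node labels, relationship types, property names, and credentials are assumptions about a generic time tree layout, not the actual SAMOS graph schema.

```python
# Sketch: restrict a traversal to particular months across years, no joins needed.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))  # placeholder creds

CYPHER = """
MATCH (y:Year)-[:CHILD]->(m:Month {value: $month})-[:CHILD]->(d:Day)
      <-[:RECORDED_ON]-(r:Record)
WHERE y.value IN $years
RETURN r
"""

with driver.session() as session:
    result = session.run(CYPHER, month=6, years=[2015, 2016, 2017])
    records = [row["r"] for row in result]   # all June records in listed years
driver.close()
print(len(records))
```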
- Date Issued
- 2017
- Identifier
- FSU_SUMMER2017_Stallard_fsu_0071N_13933
- Format
- Thesis
- Title
- Feistel-Inspired Scrambling Improves the Quality of Linear Congruential Generators.
- Creator
-
Aljahdali, Asia Othman, Mascagni, Michael, Duke, D. W. (Dennis W.), Srinivasan, Ashok (Professor of Computer Science), van Engelen, Robert, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
Pseudorandom number generators (PRNGs) are an essential tool in many areas, including simulation studies of stochastic processes, modeling, randomized algorithms, and games. The performance of any PRNG depends on the quality of the generated random sequences; they must be generated quickly and have good statistical properties. Several statistical test suites have been developed to evaluate a single stream of random numbers, such as TestU01, DIEHARD, the tests from the SPRNG package, and a set of tests designed at NIST to evaluate bit sequences. TestU01 provides batteries of tests drawn from the aforementioned suites. The predefined batteries are SmallCrush (10 tests, 16 p-values), which runs quickly, and Crush (96 tests, 187 p-values) and BigCrush (106 tests, 2254 p-values), which take longer to run. Most pseudorandom generators use recursion to produce sequences of numbers that appear to be random. The linear congruential generator is one of the well-known pseudorandom generators; the next number in the sequence is determined by the previous one. The recurrence starts with a value called the seed, and each time a recurrence starts with the same seed, the same sequence is produced. This thesis develops a new pseudorandom number generation scheme that produces random sequences with good statistical properties by scrambling the output of linear congruential generators. The scrambling technique is based on a simplified version of a Feistel network, a symmetric structure used in the construction of cryptographic block ciphers. The proposed research seeks to improve the quality of the linear congruential generators' output streams and to break up the regularities present in those generators.
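A hedged Python sketch of the general idea follows: draw numbers from a linear congruential generator, then scramble each output with a few simplified Feistel rounds. The LCG constants, round function, and round keys below are illustrative choices, not the parameters studied in the thesis.

```python
# Sketch: LCG output stream scrambled by a simplified Feistel network.

def lcg(seed, a=1103515245, c=12345, m=2**31):
    """Classic linear congruential recurrence x_{n+1} = (a*x_n + c) mod m."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

def feistel_scramble(x, keys=(0x9E37, 0x79B9, 0x7F4A), bits=32):
    """Split x into halves and mix them through a few Feistel rounds."""
    half = bits // 2
    mask = (1 << half) - 1
    left, right = (x >> half) & mask, x & mask
    for k in keys:
        f = ((right * 2654435761) ^ k) & mask   # toy round function
        left, right = right, left ^ f           # classic Feistel swap
    return (left << half) | right

gen = lcg(seed=42)
stream = [feistel_scramble(next(gen)) for _ in range(5)]
print(stream)
```

Because a Feistel structure is invertible for a fixed key schedule, the scrambling permutes rather than collapses the generator's output space, which is what lets it break up the LCG's lattice regularities without losing its period.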
- Date Issued
- 2017
- Identifier
- FSU_SUMMER2017_Aljahdali_fsu_0071E_13941
- Format
- Thesis
- Title
- Dependency Collapsing in Instruction-Level Parallel Architectures.
- Creator
-
Brunell, Victor J., Whalley, David B., Tyson, Gary Scott, Yuan, Xin, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
Processors that employ instruction fusion can improve performance and energy usage beyond traditional processors by collapsing and simultaneously executing dependent instruction chains on the critical path. This paper describes compiler mechanisms that can facilitate and guide instruction fusion in processors built to execute fused instructions. The compiler support discussed includes compiler annotations to guide fusion, the exploration of multiple new fusion configurations, and scheduling algorithms that effectively select and order fusible instructions. The benefits of providing compiler support for dependent instruction fusion include statically detecting fusible instruction chains without the need for dynamic hardware detection support, and improved performance through increased available parallelism.
- Date Issued
- 2017
- Identifier
- FSU_SUMMER2017_Brunell_fsu_0071N_14109
- Format
- Thesis
- Title
- Matching Physical File Representation to Logical Access Patterns for Better Performance.
- Creator
-
Zhang, Shuanglong, Wang, An-I Andy, Zhang, Jinfeng, Whalley, David B., Zhao, Peixiang, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
Over the years, the storage substrate of operating systems has evolved with changing storage devices and workloads [2, 6, 7, 8, 12, 15, 18, 26, 29, 33, 34, 35, 39, 41, 42, 44, 47, 48, 54]. Both academia and industry have devoted significant research effort to the file system component, a critical part of the storage system. A file system directs the underlying device-specific software to perform data reads and writes, as well as providing the notion of files for users and applications to interact with. To achieve this, a file system represents logical files internally, or physically, with data (the file content) and metadata (information required to locate, index, and operate on data). Most file system optimizations assume this one-to-one coupling of logical and physical representations [2, 7, 8, 18, 25, 26, 29, 33, 34, 35, 48]. This dissertation presents the design, implementation, and evaluation of two new systems, which decouple these representations and offer a new class of optimization opportunities not previously possible. First, the Composite-File File System (CFFS) exploits the observation that many files are frequently accessed together. By consolidating related file metadata, performance can be improved by up to 27%. Second, the Fine-grained Journal Store (FJS) exploits the observation that typically only subregions of a metadata entry are updated, while the heavyweight reliability and storage mechanisms affect the entire metadata entry. This results in many unnecessary metadata writes that harm both the performance and the lifespan of certain storage devices. By focusing on only the updated metadata regions and consolidating storage and reliability mechanisms, the Fine-grained Journal Store can improve performance by up to 15x and reduce unnecessary writes by up to 5.8x. Overall, the decoupling of logical and physical representations allows more flexible matching of the physical representations to the workload patterns, and the results show that this approach is promising.
- Date Issued
- 2018
- Identifier
- 2018_Su_Zhang_fsu_0071E_14368
- Format
- Thesis
- Title
- Multi-Temporal-Spectral Land Cover Classification for Remote Sensing Imagery Using Deep Learning.
- Creator
-
[No family name], Atharva, Liu, Xiuwen, Yang, Xiaojun, Tyson, Gary Scott, Zhao, Peixiang, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
Environmental sustainability research depends on accurate land cover information over large areas. Even with the increased number of satellite systems and sensors acquiring data with improved spectral, spatial, radiometric, and temporal characteristics, and the new data distribution policy, most existing global land cover datasets were derived from a single-date multi-spectral remotely sensed image using pixel-based classifiers with low accuracy. The bottleneck in improving accuracy is the development of accurate and effective image classification techniques. By incorporating the spatial and multi-temporal information along with the multi-spectral information of remote sensing images for land cover classification, and considering their spatial and temporal interdependence, I propose three deep network systems tailored for medium-resolution remote sensing data. With a test site from the Florida Everglades area (with a size of 771 square kilometers), the proposed deep systems have achieved significant improvements in classification accuracy over most existing pixel-based classifiers. A proposed patch-based recurrent neural network (PB-RNN) system, a proposed pixel-based recurrent neural network system, and a proposed patch-based convolutional neural network system achieve 97.21%, 87.65%, and 89.26% classification accuracy, respectively, while a pixel-based single-image neural network (NN) system achieves only 64.74%. By integrating the proposed deep networks with the huge collection of medium-resolution remote sensing data, I believe that much more accurate land cover datasets can be produced over large areas.
- Date Issued
- 2018
- Identifier
- 2018_Su_Atharva_fsu_0071E_14727
- Format
- Thesis
- Title
- Sensor Systems and Signal Processing Algorithms for Wireless Applications.
- Creator
-
Mukherjee, Avishek, Zhang, Zhenghao, Yu, Ming, Kumar, Piyush, Liu, Xiuwen, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
The demand for high-performance wireless networks and systems has grown rapidly over the last decade. This dissertation addresses three systems that were designed to improve the efficiency, reliability, and security of wireless systems. To improve efficiency and reliability, we propose two algorithms, CSIFit and CSIApx, to compress the Channel State Information (CSI) of Wi-Fi networks with Orthogonal Frequency Division Multiplexing (OFDM) and Multiple Input Multiple Output (MIMO). We evaluated these algorithms with both experimental and synthesized CSI data. Our work on CSIApx confirmed that we can achieve very good compression ratios with very little loss of accuracy, at a fraction of the complexity needed by current state-of-the-art compression methods. The second system is a sensor-based application to reliably detect falls inside homes. An automatic fall detection system has tremendous value for the well-being of seniors living alone. We design and implement MultiSense, a novel fall detection system with the following desirable features. First, it does not require the human to wear any device and is therefore convenient for seniors. Second, it has been tested in typical settings, including living rooms and bathrooms, and has shown very good accuracy. Third, it is built with inexpensive components, with an expected hardware cost of around $150 to cover a typical room. MultiSense does not require any training data and is less invasive than similar systems. Our evaluation showed that MultiSense achieved no false negatives, i.e., it was able to detect falls accurately each time, while producing no false positives in a daily-use test. Therefore, we believe MultiSense can be used to accurately detect human falls and can be extremely helpful to seniors living alone. Lastly, TBAS is a spoof detection method designed to improve the security of wireless networks. TBAS is based on two facts: 1) different transmitting locations likely result in different wireless channels, and 2) the drift in channel state information within a short time interval should be bounded. We implemented TBAS on Microsoft's SORA platform as well as on commodity wireless cards and tested its performance in typical Wi-Fi environments with different levels of channel mobility. Our results show that TBAS can be very accurate when running on 3-by-2 systems and above: TBAS on MIMO has a very low false positive error ratio, where a false positive occurs when two packets from the same user are misclassified as being from different users, while also maintaining a very low false negative ratio of 0.1%, where a false negative occurs when two packets from different users are misclassified as being from the same user. We believe our experimental findings can serve as a guideline for future systems that deploy TBAS.
- Date Issued
- 2018
- Identifier
- 2018_Su_Mukherjee_fsu_0071E_14750
- Format
- Thesis
- Title
- Internet-Based Interface For Database Management System.
- Creator
-
Lu, Haonan, Department of Computer Science
- Abstract/Description
-
This thesis project creates an Internet-based software application for database management that is modeled after Microsoft Access. It runs as a Java applet inside any standard web browser on any operating system and serves as an interface to any standard SQL-compliant database. A user can create, access, and manage a database located on any server from any web browser. The highlight is that no software installation is involved in its use, which makes it highly portable. Since no physical installation is involved, users can access their data anytime, anywhere, as long as they have access to the Internet.
- Date Issued
- 2012
- Identifier
- FSU_migr_uhm-0145
- Format
- Thesis
- Title
- Utilizing Cutting-Edge Computational Biology Methods in the Genomic Analysis of Florida Endangered Species.
- Creator
-
Stribling, Daniel B., Department of Computer Science
- Abstract/Description
-
Over the past decade, the technologies used to obtain sequencing data from biological tissues have significantly improved. This has resulted in a marked increase in the ability of biological researchers to collect unprecedented quantities of large-scale DNA sequence data in a short timeframe. Recent developments in genome sequencing algorithms have allowed bioinformatics utilities to begin to take full advantage of this data, paving the way for significant increases in our understanding of genomics. New methods of genomics research have created many new opportunities for discoveries in fields such as conservation ecology, personalized medicine, and the study of genetic disease. This research project consists of two major components: the utilization of recently developed computational biology methods to perform sequence assembly on native Florida species, and the creation of new bioinformatics utilities to facilitate genomics research. The project includes the completion of the first stage of the Florida Endangered Species Sequencing Project: the assembly and annotation of the transcriptome of the Florida wolf spider, Schizocosa ocreata, and a preliminary analysis of differential gene expression in S. ocreata organisms. Initial work is also included on Stage Two of the project: sequence assembly for the Florida Manatee and the Gopher Tortoise. Two new computational biology utilities are discussed: TFLOW, a transcriptome assembly pipeline designed to facilitate de novo transcriptome assembly projects, and the GATTICA web-based bioinformatics toolkit, which is under ongoing development. The TFLOW package has been released for download through the FSU Center for Genomics and Personalized Medicine.
- Date Issued
- 2015
- Identifier
- FSU_migr_uhm-0503
- Format
- Thesis
- Title
- Design and Evaluation of Networking Techniques for the Next Generation of Interconnection Networks.
- Creator
-
Faizian, Peyman, Yuan, Xin, Ke, Fengfeng, Srinivasan, Ashok, Tyson, Gary Scott, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
High performance computing (HPC) and data center systems have undergone rapid growth in recent years. To meet the current and future demand of compute- and data-intensive applications, these systems require the integration of a large number of processors, storage, and I/O devices through high-speed interconnection networks. In massively scaled HPC systems and data centers, the performance of the interconnect is a major defining factor for the performance of the entire system. Interconnect performance depends on a variety of factors, including but not limited to topological characteristics, routing schemes, resource management techniques, and technological constraints. In this dissertation, I explore several approaches to improve the performance of large-scale networks. First, I investigate the topological properties of a network and their effect on the performance of the system under different workloads. Based on a detailed analysis of graph structures, I identify a well-known graph as a potential topology of choice for the next generation of large-scale networks. Second, I study the behavior of adaptive routing on the current generation of supercomputers based on the Dragonfly topology and highlight the fact that the performance of adaptive routing on such networks can be enhanced by using detailed information about the communication pattern. I develop a novel approach for identifying the traffic pattern and then use this information to improve the performance of adaptive routing on Dragonfly networks. Finally, I investigate the possible advantages of utilizing emerging software defined networking (SDN) technology in the high performance computing domain. My findings show that by leveraging SDN, we can achieve near-optimal rate allocation for communication patterns in an HPC cluster, which can remove the need for expensive adaptive routing schemes and simplify the control plane on the next generation of supercomputers.
- Date Issued
- 2018
- Identifier
- 2018_Sp_Faizian_fsu_0071E_14185
- Format
- Thesis
- Title
- DAGDA Decoupling Address Generation from Loads and Stores.
- Creator
-
Stokes, Michael, Whalley, David B., Liu, Xiuwen, Tyson, Gary Scott, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
DAGDA exposes to the compiler some of the hidden operations that the hardware uses when performing loads and stores, in order to save energy and increase performance. We decouple the micro-operations for loads and stores into two operations. The first, the "prepare to access memory" instruction, or "pam", checks whether a line is resident in the L1 data cache (L1 DC) and, if so, determines its way in the L1 DC data array. The second operation performs the actual data access. This allows us both to save energy using compiler optimization techniques and to improve performance, because "pam" operations are a natural way of prefetching data into the L1 DC.
- Date Issued
- 2018
- Identifier
- 2018_Su_Stokes_fsu_0071N_14269
- Format
- Thesis
- Title
- A Comprehensive Study of Portability Bug Characteristics in Desktop and Android Applications.
- Creator
-
Clow, Jonathan Alexander, Nistor, Adrian, Haiduc, Sonia, Whalley, David B., Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
Since 2008, the Android ecosystem has been tremendously popular with consumers, developers, and manufacturers due to the open nature of the operating system and its compatibility and availability on a range of devices. This, however, comes at a cost. The variety of available devices and the speed of evolution of the Android system itself add layers of fragmentation to the ecosystem that developers must navigate. Yet this phenomenon is not unique to the Android ecosystem, impacting desktop applications like Apache Tomcat and Google Chrome as well. As fragmentation of a system grows, so does the burden on developers to produce software that can execute on a wide variety of potential device, environment, and system combinations, even though in practice developers cannot anticipate every possible scenario. This work provides the first empirical study characterizing portability bugs in both desktop and Android applications. Specifically, we examined 228 randomly selected bugs from 18 desktop and Android applications for the common root causes, manifestation patterns, and fix strategies used to combat portability bugs. Our study reveals several commonalities among the bugs and platforms: (1) 92.14% of all bugs examined are caused by an interaction with a single dependency, (2) 53.13% are caused by an interaction with the system, and (3) 33.19% are fixed by adding a direct or indirect check against the dependency causing the bug. These results provide guidance for techniques and strategies to help developers and researchers identify and fix portability bugs.
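The most common fix strategy reported, checking against the dependency before using it, looks roughly like the following hedged Python sketch; the specific API guarded here was chosen for illustration and is not from the study.

```python
# Sketch: guard platform-dependent code with an explicit dependency check.
import os
import tempfile

def flush_to_disk(fd):
    # Direct check against the dependency: only call the API where it exists.
    if hasattr(os, "fdatasync"):
        os.fdatasync(fd)    # metadata-light sync where the platform provides it
    else:
        os.fsync(fd)        # portable fallback (e.g., Windows, macOS)

with tempfile.NamedTemporaryFile() as tmp:
    tmp.write(b"data")
    tmp.flush()
    flush_to_disk(tmp.fileno())
```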
- Date Issued
- 2018
- Identifier
- 2018_Su_Clow_fsu_0071N_14798
- Format
- Thesis
- Title
- Deep: Dependency Elimination Using Early Predictions.
- Creator
-
Penagos, Luis G., Whalley, David B., Yuan, Xin, Yu, Weikuan, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
Conditional branches have traditionally been a performance bottleneck for most processors. The high frequency of branches in code, coupled with expensive pipeline flushes on mispredictions, makes branches expensive instructions worth optimizing. Conditional branches have historically inhibited compilers from applying optimizations across basic block boundaries due to the forks in control flow that they introduce. This thesis describes a systematic way of generating paths (traces) of branch-free code at compile time by decomposing branching and verification operations to eliminate the dependence of a branch on its preceding compare instruction. This explicit decomposition allows us to move comparison instructions past branches and to merge pre- and post-branch code. The paths generated at compile time can provide additional opportunities for conventional optimizations such as common subexpression elimination, dead assignment elimination, and instruction selection. Moreover, this thesis describes a way of coalescing multiple branch instructions within innermost loops to produce longer basic blocks that provide additional optimization opportunities.
- Date Issued
- 2018
- Identifier
- 2018_Su_Penagos_fsu_0071N_14784
- Format
- Thesis
- Title
- staDFA: An Efficient Subexpression Matching Method.
- Creator
-
Chowdhury, Mohammad Imran, van Engelen, Robert A., Whalley, David B., Wang, An-I Andy, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
The main task of a lexical analyzer such as Lex [20], Flex [26], or RE/Flex [34] is to tokenize a given input file within reasonable time and with limited storage requirements. Hence, most lexical analyzers use Deterministic Finite Automata (DFA) to tokenize input, ensuring that the running time of the lexical analyzer is linear (or close to linear) in the size of the input. However, DFA constructed from Regular Expressions (RE) are inadequate for indicating the positions and/or extents in a matching string of a given subexpression of the regular expression. This means that all implementations of trailing contexts in DFA-based lexical analyzers, including Lex, Flex, and RE/Flex, produce incorrect results. For a matching string in the input (also called the lexeme) that matches a token's regular expression pattern, it is not always possible to tell the position of the part of the lexeme that matches a given subexpression of the regular expression. For example, the string abba matches the pattern a b*/b a, but the position of the trailing context b a of the pattern in the string abba cannot be determined by a DFA-based matcher in the aforementioned lexical analyzers. There are algorithms based on Nondeterministic Finite Automata (NFA) that match subexpressions accurately. However, these algorithms are costly to execute, relying on backtracking or breadth-first search and running in non-linear time, with polynomial or even exponential worst-case complexity. A tagged DFA-based approach (TDFA) was pioneered by Ville Laurikari [15] to match subexpressions efficiently. However, TDFA are not perfectly suitable for lexical analyzers, since tagged DFA edges require sets of memory updates, which hampers the performance of DFA edge traversals when matching input. I introduce a new DFA-based algorithm for efficient subexpression matching that performs memory updates in DFA states: the Store-Transfer-Accept Deterministic Finite Automaton (staDFA). In my proposed algorithm, the subexpression matching positions and/or extents are stored in a Marker Position Store (MPS). The MPS is updated while the input is tokenized to provide the positions/extents of the sub-match. Compression techniques for DFA, such as Hopcroft's method [14], default transitions [18, 19], and others, can be applied to staDFA. For instance, this thesis provides a modified Hopcroft's method for the minimization of staDFA.
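The abba example can be reproduced with Python's re engine, which (unlike a plain DFA) tracks subexpression positions via backtracking, exactly the capability whose cost staDFA aims to avoid. The grouping below encodes the abstract's pattern a b*/b a, with the trailing context as a second group.

```python
# Sketch: subexpression positions for "abba" against a b*/b a.
import re

m = re.match(r"(ab*)(ba)", "abba")
print(m.group(1), m.span(1))   # leading part 'ab' spans (0, 2)
print(m.group(2), m.span(2))   # trailing context 'ba' starts at position 2
```

A DFA-based matcher can report that abba matches, but not that the trailing context begins at position 2; recovering that span is what requires either backtracking, TDFA tags, or the staDFA marker positions.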
- Date Issued
- 2018
- Identifier
- 2018_Su_Chowdhury_fsu_0071N_14793
- Format
- Thesis
- Title
- Modeling and Comparison of Large-Scale Interconnect Designs.
- Creator
-
Mollah, Md Atiqul Islam, Yuan, Xin, Ke, Fengfeng, Aggarwal, Sudhir, van Engelen, Robert A., Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
Modern-day high performance computing (HPC) clusters and data centers require a large number of computing and storage elements to be interconnected. Interconnect performance is considered a major bottleneck for the overall performance of such systems. Due to the massive scale of the network, interconnect designs are often evaluated and compared through models. My research is focused on developing scalable yet accurate methods to model large-scale interconnects and their architectural components. Such models are applied to investigate the performance characteristics of different components of interconnect systems, including the topology, the routing scheme, and the network control/management scheme. Then, through multiple experimental studies, I apply the newly developed modeling techniques to evaluate the performance of novel interconnect technologies and thus validate the case for their adoption in the current and future generations of interconnected systems.
- Date Issued
- 2018
- Identifier
- 2018_Sp_Mollah_fsu_0071E_14461
- Format
- Thesis
- Title
- Securing Systems by Vulnerability Mitigation and Adaptive Live Patching.
- Creator
-
Chen, Yue, Wang, Zuoxin, Yu, Ming, Liu, Xiuwen, Wang, An-I Andy, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
The number and type of digital devices are increasing tremendously in today's world. However, as code size soars, hidden vulnerabilities become a major threat to user security and privacy. Vulnerability mitigation, detection, and patch generation are key protection mechanisms against attacks and exploits. In this dissertation, we first explore the limitations of existing solutions. For vulnerability mitigation in particular, currently deployed address space layout randomization (ASLR) has the drawbacks that a process is randomized only once and that each segment is moved as a whole. This design makes programs particularly vulnerable to information leaks. For vulnerability detection, many existing solutions can only detect the symptoms of attacks, instead of locating the underlying exploited vulnerabilities, since the manifestation of an attack does not always coincide with the exploited vulnerability. For patch generation targeting a large number of different devices, current schemes fail to meet the requirements of timeliness and adaptiveness. To tackle these limitations, this dissertation introduces the design and implementation of three countermeasures. First, we present Remix, an effective and efficient on-demand live randomization system, which randomizes the basic blocks of each function at runtime to provide higher entropy and stronger protection against code reuse attacks. Second, we propose Ravel, an architectural approach to pinpointing vulnerabilities from attacks. It leverages a record & replay mechanism to reproduce attacks in the lab environment and uses the program's memory access patterns to locate the targeted vulnerabilities, which can be of a variety of types. Lastly, we present KARMA, a multi-level live patching framework for Android kernels with minor performance overhead. The patches are written in a high-level memory-safe language and can be adapted to thousands of different Android kernels.
- Date Issued
- 2018
- Identifier
- 2018_Sp_Chen_fsu_0071E_14297
- Format
- Thesis
- Title
- An Effective and Efficient Approach for Clusterability Evaluation.
- Creator
-
Adolfsson, Andreas, Ackerman, Margareta, Brownstein, Naomi Chana, Haiduc, Sonia, Tyson, Gary Scott, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
Clustering is an essential data mining tool that aims to discover inherent cluster structure in data. As such, the study of clusterability, which evaluates whether data possesses such structure, is an integral part of cluster analysis. Yet, despite their central role in the theory and application of clustering, current notions of clusterability fall short in two crucial aspects that render them impractical: most are computationally infeasible, and others fail to classify the structure of real datasets. In this thesis, we propose a novel approach to clusterability evaluation that is both computationally efficient and successfully captures the structure of real data. Our method applies multimodality tests to the (one-dimensional) set of pairwise distances computed from the original, potentially high-dimensional data. We present extensive analyses of our approach for both the Dip and Silverman multimodality tests on real data as well as 17,000 simulations, demonstrating the success of our approach as the first practical notion of clusterability.
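A minimal Python sketch of this pipeline follows: reduce the data to its pairwise distances and run a dip test on that one-dimensional set. It assumes the third-party diptest package for Hartigans' dip test (a tooling assumption; any dip-test implementation would do), and the two-blob dataset is fabricated for illustration.

```python
# Sketch: clusterability via a multimodality test on pairwise distances.
import numpy as np
from scipy.spatial.distance import pdist
import diptest   # third-party package providing Hartigans' dip test

rng = np.random.default_rng(0)
# Two well-separated Gaussian blobs: the distance set should be multimodal.
data = np.vstack([rng.normal(0, 1, (100, 5)), rng.normal(8, 1, (100, 5))])

distances = pdist(data)                 # one-dimensional set of pairwise distances
dip, pval = diptest.diptest(distances)  # low p-value suggests multimodality
print(f"dip={dip:.4f}, p-value={pval:.4g}")   # i.e., cluster structure present
```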
- Date Issued
- 2016
- Identifier
- FSU_SUMMER2017_Adolfsson_fsu_0071N_13478
- Format
- Thesis
- Title
- I/O Latency in the Linux Storage Stack.
- Creator
-
Stephens, Brandon, Wang, An-I Andy, Wang, Zhi, Wang, Zuoxin, Whalley, David B., Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
As storage device performance increases, the lifespan of an I/O request becomes throttled more by data path traversal than by physical disk access. Even though many computer performance analysis tools exist, surprisingly little research has been published documenting bottlenecks throughout the Linux storage stack, and what has been published focuses on results found through tracing, glossing over how the traces were performed. This work details my process of developing a refined tracing method, what that method is, and how it can be applied to measure I/O latency at any layer of the Linux storage stack. Sample results are given from examining the filesystem layer, the block layer, and the memory management system. Among these three components of the storage stack, the filesystem layer is responsible for the longest duration of an I/O request's lifespan.
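For orientation, the following hedged Python sketch measures the end-to-end latency of a synchronous write from user space, the coarsest view of the request lifespan discussed here. Attributing time to individual stack layers requires kernel tracing, which this sketch does not attempt, and the file path and write size are arbitrary.

```python
# Sketch: end-to-end latency of a 4 KiB write + fsync, measured in user space.
import os
import time

def timed_write(path, data):
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
    try:
        start = time.perf_counter()
        os.write(fd, data)              # enters the filesystem layer
        os.fsync(fd)                    # forces the block layer and the device
        return time.perf_counter() - start
    finally:
        os.close(fd)

latency = timed_write("/tmp/iotest.bin", b"x" * 4096)   # illustrative path
print(f"4 KiB write+fsync latency: {latency * 1e6:.1f} us")
```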
- Date Issued
- 2017
- Identifier
- FSU_FALL2017_Stephens_fsu_0071N_14270
- Format
- Thesis
- Title
- Machine Learning Algorithms and Applications for Lidar, Images, and Unstructured Data.
- Creator
-
Parajuli, Biswas, Kumar, Piyush, She, Yiyuan, Liu, Xiuwen, Zhao, Peixiang, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
Aerial imagery of geographic regions, in the form of Lidar and RGB images, aids tasks such as surveying, urban planning, mapping, surveillance, navigation, and localization. Most of these applications require accurate segmentation and identification of a variety of objects, and the labeling is mostly done manually, which is slow and expensive. This dissertation focuses on roads as the object of interest and aims to develop methods to automatically extract road networks from both aerial Lidar and images. The work investigates deep convolutional architectures that can fuse the two types of data for road segmentation, presenting a design that outperforms state-of-the-art RGB-only methods. It also describes a simple, disk-packing-based algorithm that translates the road segmentation into an OpenStreetMap-like road-network graph while improving accuracy in terms of connectivity and topology and reducing outliers. This dissertation also presents a truth-finding algorithm, based on iterative outlier removal, that can be used to reach a consensus when information sources or ensembles of trained machine learning models are in conflict. In addition, it introduces a full, published book on Python programming based on the experiences this research provided, in the hope of contributing to the teaching and learning of Python.
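The disk-packing idea can be pictured with a short sketch. The following is an illustrative approximation under assumed details (greedy pixel order, square disk coverage, a 3r linking radius), not the dissertation's exact algorithm.

```python
import numpy as np

def mask_to_road_graph(mask, r=5):
    """Greedily pack disks of radius r onto road pixels; link nearby centers."""
    uncovered = mask.astype(bool).copy()
    centers = []
    for y, x in zip(*np.nonzero(mask)):       # scan road pixels row-major
        if uncovered[y, x]:
            centers.append((y, x))
            # Blank a square neighborhood as a cheap stand-in for the disk.
            uncovered[max(0, y - r):y + r + 1, max(0, x - r):x + r + 1] = False
    c = np.array(centers, dtype=float)
    # Heuristic: centers within 3r are treated as connected road segments.
    d2 = ((c[:, None, :] - c[None, :, :]) ** 2).sum(-1)
    ii, jj = np.nonzero(np.triu(d2 <= (3 * r) ** 2, k=1))
    return centers, list(zip(ii.tolist(), jj.tolist()))
```

The centers become graph vertices and the links become edges, which is how a pixel mask turns into an OpenStreetMap-style network.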
- Date Issued
- 2019
- Identifier
- 2019_Spring_Parajuli_fsu_0071E_14920
- Format
- Thesis
- Title
- Towards Automating the Establishment and Evolution of Software Traceability.
- Creator
-
Mills, Chris (Christopher), Haiduc, Sonia, Blessing, Susan K., Chakraborty, Shayok, Zhao, Peixiang, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
Software systems contain an immense amount of information captured in a variety of documents, such as source code files, user documentation, use and test cases, bug reports, and system requirements, among others. Relationships between these pieces of information -- called traceability links -- provide stakeholders with broader knowledge about a system's constituent pieces and support many aspects of the software's development, maintenance, and evolution. Ideally, traceability links would be documented as software artifacts are produced: as they work, developers would document which test cases exercise which code segments or which code classes implement which use cases. However, this is typically not the case. Due to organizational issues such as tight timelines for product delivery and lack of buy-in by project managers, software traceability is often a secondary concern. To address this situation and improve traceability for a system post hoc, stakeholders can perform Traceability Link Recovery (TLR), a software engineering task that fills in missing traceability information by establishing (i.e., recovering) links between related artifacts. Through this process, software traceability can naturally support tasks such as program comprehension, concept localization, verifying test coverage, and ensuring that system and legal requirements are met. Unfortunately, performing TLR manually is an extremely time- and resource-intensive task. Therefore, even though prior work suggests it directly improves software maintenance and evolution, few systems have sufficient traceability to realize these benefits, and the few that do are mainly safety-critical systems with tight regulatory requirements, where traceability is legally required for quality assurance to mitigate risk. First, we seek to reduce the cost of establishing traceability links through TLR by improving automatic approaches based on artifact similarity. Second, we seek to reduce the cost of maintaining existing traceability information by applying supervised machine learning, which mines statistical patterns from historical traceability information to build a predictive model that infers artifact relationships without the need for a human operator. As a result, software teams are able to realize the hitherto cost-prohibitive benefits of traceability even for projects where there is no legal requirement for traceability to exist.
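As a rough illustration of similarity-based TLR (all artifact text and names below are hypothetical; the thesis evaluates far richer retrieval and learning configurations), candidate links can be ranked by TF-IDF cosine similarity:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical artifacts; real corpora would first be preprocessed
# (identifier splitting, stop-word removal, stemming).
requirements = [
    "The system shall store user passwords in hashed form.",
    "The system shall lock an account after five failed logins.",
]
code_artifacts = [
    "class PasswordHasher: def hash_password(self, password): return digest",
    "class LoginThrottler: def record_failed_login(self, account): pass",
]

vec = TfidfVectorizer()
tfidf = vec.fit_transform(requirements + code_artifacts)
sims = cosine_similarity(tfidf[:len(requirements)], tfidf[len(requirements):])
# sims[i][j] scores a candidate trace link between requirement i and file j;
# links above a tuned threshold are proposed to a developer for vetting.
print(sims.round(2))
```

The supervised variant described above would replace the fixed threshold with a model trained on previously vetted links.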
- Date Issued
- 2019
- Identifier
- 2019_Spring_Mills_fsu_0071E_15138
- Format
- Thesis
- Title
- Community Search and Detection on Large Graphs.
- Creator
-
Akbas, Esra, Zhao, Peixiang, Mio, Washington, Kumar, Piyush, Liu, Xiuwen, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
Modern science and technology have witnessed in the past decade a proliferation of complex data that can be naturally modeled and interpreted as graphs. In real-world networked applications, the underlying graphs oftentimes exhibit fundamental community structures supporting widely varying interconnected processes, and identifying communities may offer insight into how the network is organized. In this thesis, we work on community detection and community search problems on graph data. Community detection (graph clustering) has become one of the most well-studied problems in graph management and analytics; its goal is to group the vertices of a graph into densely knitted clusters, with each cluster well separated from all the others. Classic graph clustering methods primarily take advantage of the topological information of graphs to model and quantify the proximity between vertices. With the proliferation of rich, heterogeneous graph content widely available in real-world graphs, such as user profiles in social networks, it becomes essential to consider both the structure and the attributive content of graphs for better-quality graph clustering. On the other hand, existing community detection methods focus primarily on discovering communities in an a priori, top-down manner with reference only to the input graph. As a result, all communities have to be exhaustively identified, incurring expensive time/space costs and a great deal of fruitless computation when only a fraction of them are of special interest to end users. On many real-world occasions, however, people are more interested in the communities pertaining to a given vertex. In our first project, we work on the attributed graph clustering problem. We propose a graph embedding approach to cluster content-enriched, attributed graphs. The key idea is to design a unified latent representation for each vertex of a graph such that both the graph connectivity and the vertex attribute proximity within the localized region of the vertex can be jointly embedded into a unified, continuous vector space. As a result, the challenging attributed graph clustering problem is cast as the traditional data clustering problem. In our second and third projects, we work on a query-dependent variant of community detection, referred to as the community search problem. The objective of community search is to identify dense subgraphs containing the query vertices. We study the community search problem in the truss-based model, aiming to discover all dense and cohesive k-truss communities to which the query set Q belongs. We introduce a novel equivalence relation, k-truss equivalence, to model the intrinsic density and cohesiveness of edges in k-truss communities, and based on this equivalence we create two space-efficient, truss-preserving index structures, EquiTruss and TEQ. Community search for one query or multiple queries can thus be answered upon EquiTruss and TEQ without repeated, time-demanding accesses to the original graph, G, which proves to be theoretically optimal. The query set contains a single query vertex in the first of these projects and multiple query vertices in the second. In summary, to achieve better quality in attributed graph clustering, attribute-aware cluster information is preserved during graph embedding; while we use the Skip-Gram method for embedding, other embedding methods could be substituted to study their effect on attributed graphs.
In addition, our index structures support community search on large graphs without considering attribute information; using attribute information alongside structure may yield better communities for given query vertices, so the index structures could be extended to support community search on attributed graphs. A small background sketch of the truss-based model follows.
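The sketch below is a plain illustration of the k-truss community model, not the EquiTruss/TEQ machinery: it computes a k-truss with networkx and reads off the community containing a query vertex.

```python
import networkx as nx

G = nx.karate_club_graph()
# In a k-truss, every edge is supported by at least k-2 triangles, so the
# surviving subgraph is dense and cohesive; EquiTruss/TEQ index such trusses
# so that queries avoid repeated traversals of the original graph.
T = nx.k_truss(G, k=4)

query = 0
community = nx.node_connected_component(T, query) if query in T else set()
print(sorted(community))   # the 4-truss community containing the query vertex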
- Date Issued
- 2017
- Identifier
- FSU_FALL2017_Akbas_fsu_0071E_14173
- Format
- Thesis
- Title
- Enabling Efficient Big Data Services on HPC Systems with SHMEM-Based Programming Stack.
- Creator
-
Fu, Huansong, Yu, Weikuan, Ye, Ming, Duan, Zhenhai, Venkata, Manjunath Gorentla, Mascagni, Michael, Florida State University, College of Arts and Sciences, Department of Computer Science
- Abstract/Description
-
With the continuous expansion of the Big Data universe, researchers have been relentlessly searching for ways to improve the efficiency of big data services, including data analytics and data infrastructures. In the meantime, there has also been increasing interest in leveraging High-Performance Computing (HPC) capabilities for big data analytics. Symmetric Hierarchical Memory (SHMEM) is a popular parallel programming model that has thrived in the HPC realm. For many Partitioned Global Address Space (PGAS) systems and applications, SHMEM libraries are widely used as a high-performance communication layer between the applications and the underlying high-speed interconnects. SHMEM features a one-sided communication interface: it allows remote data to be accessed in a shared-memory manner, in contrast to conventional two-sided communication, where remote data must be accessed through an explicit handshake protocol. We reveal that SHMEM offers a number of great benefits for developing parallel and distributed applications and frameworks on tightly coupled, high-end HPC systems, such as its shared-memory-style addressing model and the flexibility of its communication model. This dissertation focuses on improving the performance of big data services by leveraging a lightweight, flexible, and balanced SHMEM-based programming stack. To realize this goal, we have studied representative data infrastructures and data analytic frameworks. Specifically, key-value stores are a very popular form of data infrastructure deployed for many large-scale web services. Unfortunately, a key-value store usually adopts an inefficient communication design in a traditional server-client architecture, where the server can easily become a bottleneck in processing a huge number of requests; as a result, both latency and throughput can be seriously affected. Moreover, graph processing is an emerging type of data analytics that deals with large-scale graph data. Unsuitable for traditional MapReduce, graph analytic algorithms are often written and run with programming models that are specifically designed for graph processing. However, there is an imbalance issue in state-of-the-art graph processing programming models that drastically affects the performance of graph processing, and there is a critical need to revisit the conventional design of graph processing while the volume of useful real-world graph data keeps increasing every day. Furthermore, although we reveal that a SHMEM-based programming stack helps solve the aforementioned issues, there is still a lack of understanding of how portable this stack can be, both for the specific data infrastructures and frameworks being optimized and for other distributed systems in general; this includes understanding the potential performance gain or loss, limitations of usage, and portability across different platforms. This dissertation has centered on addressing these research challenges and has carried out three studies, each tackling a unique challenge but all focusing on facilitating a SHMEM-based programming stack to enable and accelerate big data services. First, we use a popular SHMEM standard called OpenSHMEM to build a high-performance key-value store called SHMEMCache, which overcomes several issues in enabling direct access to key-value pairs, including race conditions, remote pointer chasing, and unawareness of remote access.
We have thoroughly evaluated SHMEMCache and shown that it accomplishes significant performance improvements over other contemporary key-value stores and achieves good scalability over a thousand nodes on a leadership-class supercomputer. Second, to understand the implications of using various SHMEM models and one-sided communication libraries for big data services, we revisit the design of SHMEMCache and extend it with a portable communication interface to develop Portable-SHMEMCache, which can support a variety of one-sided communication libraries. Based on this new framework, we have supported both OpenSHMEM and MPI-RMA for SHMEMCache as a proof of concept and have conducted an extensive experimental analysis of Portable-SHMEMCache's performance on two different platforms. Third, we have thoroughly studied the issues in state-of-the-art graph processing frameworks and proposed salient design features to tackle their serious inefficiency and imbalance issues. These design features have been incorporated in a new graph processing framework called SHMEMGraph, and our comprehensive experiments demonstrate its significant performance advantages over state-of-the-art graph processing frameworks. This dissertation has pushed forward the big data evolution by enabling efficient, representative data infrastructures and analytic frameworks on HPC systems with SHMEM-based programming models. The performance improvements over state-of-the-art frameworks demonstrate the efficacy of our solution designs and the potential of leveraging HPC capabilities for big data. We believe that our work has better prepared contemporary data infrastructures and analytic frameworks for addressing the big data challenge.
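To make the one-sided versus two-sided contrast concrete, here is a minimal MPI-RMA sketch via mpi4py (an illustration only; SHMEMCache itself builds on OpenSHMEM): rank 0 writes directly into rank 1's exposed buffer, and rank 1 never posts a receive.

```python
# Run with: mpiexec -n 2 python rma_demo.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

buf = np.zeros(1, dtype='i')
win = MPI.Win.Create(buf, comm=comm)   # expose buf to one-sided access

win.Fence()                            # open an access epoch
if rank == 0:
    val = np.full(1, 42, dtype='i')
    win.Put(val, 1)                    # write into rank 1's buf directly
win.Fence()                            # close the epoch

if rank == 1:
    print("rank 1 sees", buf[0])       # 42 -- no matching recv was posted
win.Free()
```

The absence of the handshake is the point: the target contributes only synchronization (the fences), never a per-message receive, which is what lets a SHMEM-style key-value store bypass the server bottleneck.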
- Date Issued
- 2018
- Identifier
- 2019_Spring_Fu_fsu_0071E_14906
- Format
- Thesis