Doktora Tezleri (Doctoral Theses)
Permanent URI for this collection: http://standard-demo.gcris.com/handle/123456789/3642
Browsing Doktora Tezleri by Department "Computer Engineering"
Now showing 1 - 11 of 11
Doctoral Thesis
Density grid based stream clustering algorithm (Izmir Institute of Technology, 2019-11) Ahmed, Rowanda Daoud; Ayav, Tolga; Dalkılıç, Gökhan
Recently, as applications produce overwhelming data streams, the need for strategies to analyze and cluster streaming data has become an urgent and crucial research area for knowledge discovery. The main objective of data stream clustering is to gain insight into incoming data. Recognizing all probable patterns in this boundless data, which arrives at varying speeds and structures and evolves over time, is very important in this analysis process. Existing data stream clustering strategies all suffer from different limitations, such as the inability to find arbitrarily shaped clusters or to handle outliers, in addition to requiring parameter information for data processing. To handle all these challenges quickly, accurately, and efficiently, we propose DGStream, a new online-offline grid- and density-based stream clustering algorithm. We conducted many experiments and evaluated the performance of DGStream over different simulated databases and different parameter settings, considering a wide variety of concept drifts, novelty, evolving data, numbers and sizes of clusters, and outlier detection. Our algorithm is suitable for applications where the interest lies in the most recent information, such as the stock market, for applications where analysis of existing information is required, and for cases where both old and recent information are equally important.
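As a rough illustration of the grid- and density-based idea (not DGStream itself; the cell size, density threshold, and toy points below are invented), an online step can map incoming points to grid cells and maintain per-cell densities, while an offline step groups dense neighboring cells into clusters:

```python
from collections import defaultdict

class GridStream:
    """Toy online-offline grid/density stream clustering sketch."""
    def __init__(self, cell=1.0, dense_min=3):
        self.cell = cell            # grid cell edge length (illustrative)
        self.dense_min = dense_min  # density threshold (illustrative)
        self.density = defaultdict(int)  # grid cell -> point count

    def insert(self, x, y):
        # Online phase: map the point to its grid cell and update density.
        self.density[(int(x // self.cell), int(y // self.cell))] += 1

    def clusters(self):
        # Offline phase: connect dense cells that touch (8-neighborhood).
        dense = {c for c, n in self.density.items() if n >= self.dense_min}
        seen, out = set(), []
        for start in dense:
            if start in seen:
                continue
            comp, stack = [], [start]
            seen.add(start)
            while stack:
                cx, cy = stack.pop()
                comp.append((cx, cy))
                for dx in (-1, 0, 1):
                    for dy in (-1, 0, 1):
                        nb = (cx + dx, cy + dy)
                        if nb in dense and nb not in seen:
                            seen.add(nb)
                            stack.append(nb)
            out.append(comp)
        return out

stream = GridStream()
for p in [(0.1, 0.2), (0.4, 0.6), (0.9, 0.1), (5.0, 5.1), (5.2, 5.3), (5.4, 5.9)]:
    stream.insert(*p)
print(len(stream.clusters()))  # two dense regions -> 2
```

The online phase does constant work per point; only the offline grouping touches the whole grid, which is what makes grid summaries attractive for streams.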
The experiments, over synthetic and real datasets, show that our proposed algorithm outperforms the other algorithms in efficiency.

Doctoral Thesis
Discovering specific semantic relations among words using neural network methods (Izmir Institute of Technology, 2021-10) Sezerer, Erhan; Tekir, Selma; Izmir Institute of Technology
Human-level language understanding is one of the oldest challenges in computer science. Much scientific work has been dedicated to finding good representations for semantic units (words, morphemes, characters) in languages. Recently, contextual language models such as BERT and its variants have shown great success in downstream natural language processing tasks through masked language modeling and transformer structures. Although these methods solve many problems in this domain and have proved useful, they still lack one crucial aspect of language acquisition in humans: experiential (visual) information. Over the last few years, there has been an increase in studies that consider experiential information by building multi-modal language models and representations. Several studies have shown that language acquisition in humans starts with learning concrete concepts through images and then continues with learning abstract ideas through text. In this work, the curriculum learning method is used to teach the model concrete/abstract concepts through images and corresponding captions, to accomplish the task of multi-modal language modeling/representation. BERT and ResNet-152 models are used on each modality, with an attentive pooling mechanism, on a newly constructed dataset collected from Wikimedia Commons. To show the performance of the proposed model, downstream tasks and ablation studies are performed. The contribution of this work is two-fold: a new dataset is constructed from Wikimedia Commons, and a new multi-modal pre-training approach based on curriculum learning is proposed.
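Attentive pooling over the two modality representations can be sketched loosely as follows; the vectors, dimensions, and query below are invented stand-ins, not the thesis's BERT/ResNet-152 code:

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attentive_pool(vectors, query):
    """Weight modality vectors by dot-product relevance to a query,
    then return the weighted sum as the fused representation."""
    scores = [sum(a * b for a, b in zip(v, query)) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]

text_vec = [0.9, 0.1, 0.0]   # stand-in for a BERT sentence embedding
image_vec = [0.2, 0.8, 0.4]  # stand-in for a ResNet-152 image embedding
fused = attentive_pool([text_vec, image_vec], query=[1.0, 0.0, 0.0])
print([round(x, 3) for x in fused])
```

The attention weights let the model lean on whichever modality is more informative for a given input, rather than concatenating the two embeddings blindly.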
Results show that the proposed multi-modal pre-training approach increases the success of the model.

Doctoral Thesis
Dynamic itemset hiding under multiple support thresholds (Izmir Institute of Technology, 2018-07) Öztürk, Ahmet Cumhur; Ergenç Bostanoğlu, Belgin
Data sharing is commonly performed between organizations for mutual benefit. However, if confidential knowledge is not hidden before the data is published, it may pose a threat to security and privacy. Privacy-preserving frequent itemset mining is the process of hiding sensitive itemsets from being discovered by any frequent itemset mining algorithm. The privacy constraint of sensitive itemset hiding is the sensitive threshold: if the support of a given sensitive itemset is under the sensitive threshold, the itemset is considered non-interesting and hidden. One possible way of decreasing the support of sensitive itemsets below a predefined sensitive threshold is deleting items from a set of transactions. This type of frequent itemset sanitization is called distortion-based frequent itemset hiding. The main focus of this thesis is to preserve sensitive itemsets while considering multiple sensitive thresholds in both static and dynamic environments. Three distortion-based frequent itemset hiding algorithms are proposed: Pseudo Graph Based Sanitization (PGBS), Itemset Oriented Pseudo Graph Based Sanitization (IPGBS), and DynamicPGBS. The PGBS and IPGBS algorithms are designed for the static environment, and the DynamicPGBS algorithm is designed for the dynamic environment.
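The distortion-based idea can be sketched with a naive greedy sanitizer; this is not PGBS/IPGBS, and the victim-item choice and example transactions are illustrative only:

```python
def support(db, itemset):
    """Number of transactions containing every item of the itemset."""
    return sum(1 for t in db if itemset <= t)

def sanitize(db, sensitive, threshold):
    """Delete one item of the sensitive itemset from supporting
    transactions until its support drops below the threshold."""
    db = [set(t) for t in db]        # work on a copy
    victim = next(iter(sensitive))   # naive victim-item choice
    for t in db:
        if support(db, sensitive) < threshold:
            break                    # hidden: stop distorting
        if sensitive <= t:
            t.discard(victim)        # distortion: remove the item
    return db

transactions = [{"a", "b", "c"}, {"a", "b"}, {"b", "c"}, {"a", "b", "d"}]
clean = sanitize(transactions, sensitive={"a", "b"}, threshold=2)
print(support(clean, {"a", "b"}))  # forced below the threshold of 2
```

A real sanitizer chooses victims to minimize side effects on non-sensitive itemsets, which is exactly the trade-off the thesis's graph-based algorithms target.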
The main objective of these three algorithms is to hide all sensitive itemsets while causing minimum distortion to non-sensitive knowledge and data in the resulting sanitized database.

Doctoral Thesis
Fourier analysis based testing of finite state machines (Izmir Institute of Technology, 2019-03) Takan, Savaş; Ayav, Tolga
The finite state machine (FSM) is a widely used modeling technique for circuit and software testing. FSM testing is a well-studied topic in the literature, and there are several test case generation methods such as W, Wp, UIO, UIOv, DS, HSI, and H. Despite the existing methods, there is still a need for alternative techniques with better performance in terms of test suite size, fault detection ratio, and test generation time. In this thesis, two new test case generation methods, F and Fw, are proposed. The proposed test generation methods are based on Fourier analysis of Boolean functions. Fourier transformations have been studied extensively in mathematics, computer science, and engineering. The proposed F method tests only outputs, whereas the Fw method also tests the next state along with the outputs. In this context, the proposed methods are compared with the UIO and W methods in terms of characteristics, cost, fault detection ratio, and effectiveness. The evaluation data are analyzed using the t-test and Hedges' g. Results show that the F and Fw methods outperform the existing methods in terms of fault detection ratio per test.

Doctoral Thesis
Frequent subgraph mining over dynamic graphs (Izmir Institute of Technology, 2022-07) Abuzayed, Nourhan N. I.; Ergenç Bostanoğlu, Belgin
Frequent subgraph mining (FSM) is an essential and challenging graph mining task used in several applications. Modern applications employ evolving graphs, so FSM is even more challenging on evolving graphs due to the streaming nature of the input and the exponential time complexity of the algorithms. Sampling schemes are used when approximate results serve the purpose.
This thesis introduces three approximate frequent subgraph mining algorithms for evolving graphs. These algorithms use a novel controlled reservoir sampling scheme. A sample reservoir of the evolving graph and an auxiliary heap reservoir data structure are kept together in a fixed-size reservoir. When the whole reservoir is full and space is required, the edges of lower-degree nodes are deleted. This selection is done by utilizing the heap data structure as a heap reservoir, which keeps the node degrees. By keeping the edges of higher-degree nodes in the sample reservoir, accuracy is maximized without sacrificing time and space; in contrast, keeping the edges of lower-degree nodes in the sample reservoir minimizes accuracy at a higher time and space cost. The first algorithm is Controlled Reservoir Sampling with Unlimited heap size (UCRS), where the heap reservoir size is unlimited. The second algorithm is Controlled Reservoir Sampling with Limited heap size (LCRS). It is a modified version of UCRS in which the heap reservoir size is limited; as a result, the sample reservoir size within the whole reservoir increases, since the total number of nodes dedicated to the whole reservoir includes the nodes of the heap reservoir. The third algorithm is Maximum Controlled Reservoir Sampling (MCRS). It is a modified version of UCRS in which the candidate edge for deletion is an edge with maximum node degrees. Experimental evaluations measuring the scalability and recall of the three algorithms against state-of-the-art algorithms are performed on dense and sparse evolving graphs. Findings show that the UCRS and LCRS algorithms are scalable and achieve better recall than edge-based reservoir algorithms. LCRS achieves the best recall in comparison to edge- or subgraph-based reservoir algorithms.
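A much-simplified sketch of degree-controlled eviction (not UCRS/LCRS/MCRS themselves: the heap reservoir is replaced by a linear scan, and the capacity and edge stream are invented) keeps a fixed-size edge reservoir and prefers to retain edges of high-degree nodes:

```python
def add_edge(reservoir, degrees, edge, capacity):
    """Fixed-size edge reservoir: when full, evict the edge whose
    endpoints have the smallest combined degree, so edges of
    high-degree nodes are preferentially retained."""
    u, v = edge
    degrees[u] = degrees.get(u, 0) + 1
    degrees[v] = degrees.get(v, 0) + 1
    if len(reservoir) >= capacity:
        worst = min(reservoir, key=lambda e: degrees[e[0]] + degrees[e[1]])
        if degrees[worst[0]] + degrees[worst[1]] >= degrees[u] + degrees[v]:
            return  # new edge is no better than the worst kept edge
        reservoir.remove(worst)
    reservoir.append(edge)

reservoir, degrees = [], {}
stream = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("e", "f")]
for e in stream:
    add_edge(reservoir, degrees, e, capacity=3)
print(reservoir)
```

In the real algorithms the heap reservoir makes the "find the worst edge" step cheap instead of the linear scan used here, and it competes with the sample reservoir for the fixed space.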
MCRS has the worst speed-up and recall among the proposed and competitor algorithms.

Doctoral Thesis
Improved image based localization using semantic descriptors (Izmir Institute of Technology, 2021-01) Çınaroğlu, İbrahim; Baştanlar, Yalın; Izmir Institute of Technology
Place recognition and Visual Localization (VL) for autonomous driving are topics that retain their popularity in the field of Computer Vision. In this study, semantically improved Hybrid-VL approaches that use localization-aware semantic information in street-level driving images are proposed. Initially, a Semantic Descriptor (SD) is extracted from semantically segmented images with a Convolutional Neural Network (CNN) trained for the localization task. Then, the image retrieval based VL task is performed using approximate nearest neighbor search (ANNS) in a 2D-2D matching context. This proposed method is named SD-VL, and its success is compared with that of the state-of-the-art Local Descriptor (LD) based VL method (LD-VL), which is frequently used in the literature. Furthermore, with the aim of alleviating the shortcomings of both methods, a novel decision-level Hybrid-VL (Hybrid-VL_DL) method is proposed by combining SD-VL and LD-VL in a post-processing stage. A feature-level Hybrid-VL (Hybrid-VL_FL) method is also proposed in order to produce an automatically tuned hybrid result. These proposed VL methods are examined on two challenging benchmarks: the RobotCar Seasons and Malaga Downtown data sets. Moreover, a new VL data set, Malaga Streetview Challenge, is generated by collecting Google Streetview images along the same path as Malaga Downtown in order to observe the impact of environmental and wide-baseline changes. This newly generated test set will be useful for researchers working in this field.
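Retrieval-based localization of this kind reduces to nearest-neighbor search over image descriptors; a brute-force stand-in for ANNS, with made-up descriptors and poses, looks like:

```python
import math

def cosine(u, v):
    """Cosine similarity between two descriptor vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def localize(query_desc, database):
    """Return the pose of the database image whose descriptor is
    most similar to the query (brute-force nearest neighbor)."""
    best = max(database, key=lambda entry: cosine(query_desc, entry["desc"]))
    return best["pose"]

# Hypothetical database: image descriptor -> known camera pose (x, y).
db = [
    {"desc": [1.0, 0.0, 0.0], "pose": (10.0, 2.0)},
    {"desc": [0.0, 1.0, 0.0], "pose": (55.0, 7.5)},
    {"desc": [0.7, 0.7, 0.1], "pose": (31.0, 4.0)},
]
print(localize([0.9, 0.1, 0.0], db))  # nearest descriptor -> (10.0, 2.0)
```

Real systems replace the linear scan with an approximate index; the hybrid methods above additionally combine semantic and local descriptors before or after this search.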
In the end, the proposed semantically boosted Hybrid-VL_DL method is able to increase localization performance on the RobotCar Seasons and Malaga Streetview Challenge data sets by 11.6% and 4.5% in Top-1 recall@5, and by 4% and 5.4% in recall@1, respectively. Additionally, the reliability of our hyper-parameter (W) based Hybrid-VL_DL approach is supported by the very close performance of the Hybrid-VL_FL method.

Doctoral Thesis
Location privacy in cellular networks (Izmir Institute of Technology, 2022-12) Yaman, Okan; Ayav, Tolga; Erten, Yusuf Murat
Many third-party utilities and applications that run on devices used in cellular networks keep track of our location data and share it. This vulnerability affects even subscribers who use dumbphones. This thesis defines three location tracing attacks based on utilizing background data and compares them with the most relevant known attacks. We demonstrate that any attacker who knows two associated cells of a subscriber, given adequate background data, can deduce the intermediate cell IDs. Also, utilizing a Hidden Markov Model (HMM) increases the accuracy of an attack. In this dissertation, we introduce novel accuracy metrics for all the anticipated attacks and exploit them for a detailed analysis of the threats in a real-life case, a 5G network. This work demonstrates improvements in current privacy-preserving methods, including adaptation to 5G, and provides insights into preventing this location privacy breach. Various methods have been proposed to overcome these threats and preserve privacy against attacks based on this information. A friendly jamming (FJ) based solution, which offers efficient usage of resources including computing power and energy, was introduced as a solution to these problems. However, one of the tradeoffs of FJ is its viability. Although some studies try to cope with this challenge, they are complicated and focus on old technologies.
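The HMM view of the attack described above (hidden states as cell IDs, with some observable background signal) can be illustrated with a standard Viterbi decoder; every cell, probability, and observation below is invented for illustration:

```python
def viterbi(states, start_p, trans_p, emit_p, observations):
    """Most likely hidden state sequence (e.g. visited cell IDs)
    given an observation sequence."""
    # prob[s] = best probability of any path ending in state s
    prob = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    path = {s: [s] for s in states}
    for obs in observations[1:]:
        new_prob, new_path = {}, {}
        for s in states:
            prev = max(states, key=lambda p: prob[p] * trans_p[p][s])
            new_prob[s] = prob[prev] * trans_p[prev][s] * emit_p[s][obs]
            new_path[s] = path[prev] + [s]
        prob, path = new_prob, new_path
    best = max(states, key=lambda s: prob[s])
    return path[best]

cells = ["cell1", "cell2", "cell3"]
start = {"cell1": 0.8, "cell2": 0.1, "cell3": 0.1}
trans = {
    "cell1": {"cell1": 0.2, "cell2": 0.7, "cell3": 0.1},
    "cell2": {"cell1": 0.1, "cell2": 0.2, "cell3": 0.7},
    "cell3": {"cell1": 0.1, "cell2": 0.1, "cell3": 0.8},
}
emit = {
    "cell1": {"weak": 0.7, "strong": 0.3},
    "cell2": {"weak": 0.4, "strong": 0.6},
    "cell3": {"weak": 0.1, "strong": 0.9},
}
print(viterbi(cells, start, trans, emit, ["weak", "strong", "strong"]))
```

Decoding the intermediate states from two known endpoint cells plus background data is what makes the attack stronger than naive cell lookup.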
We propose a lightweight and flexible FJ scheme to address these challenges. We also demonstrate that our model achieves the same performance as one of the studies mentioned above in a more straightforward way.

Doctoral Thesis
Planar geometry estimation with deep learning (Izmir Institute of Technology, 2022-06) Uzyıldırım, Furkan Eren; Özuysal, Mustafa
Understanding the geometric structure of a scene is one of the oldest problems in Computer Vision. Most scenes include planar regions that provide information about the geometric structure, and their automatic detection and segmentation play an important role in many computer vision applications. In recent years, convolutional neural network architectures have been introduced for piece-wise planar segmentation. They outperform traditional approaches that generate plane candidates with 3D segmentation methods from an explicitly reconstructed 3D point cloud. However, most convolutional neural network architectures are not designed and trained for outdoor scenes, because they require manual annotation, a time-consuming task that results in a lack of training data. In this thesis, we propose and develop a deep learning based framework for piece-wise plane detection and segmentation of outdoor scenes without requiring manually annotated training data. We exploit a network trained on imagery with annotated targets, together with a point cloud automatically reconstructed from either a Structure from Motion-Multi View Stereo pipeline or a monocular depth estimation network, to estimate the training ground truth on the outdoor images in an iterative energy minimization framework. We show that the resulting ground truth estimate on various sets of images in the outdoor domain is good enough to improve the network weights of different architectures trained on ground-truth-annotated images.
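Generating plane candidates from a reconstructed point cloud typically relies on robust fitting; a minimal RANSAC-style plane fit (the tolerance, iteration count, and toy cloud are illustrative, not the thesis pipeline) looks like:

```python
import random

def plane_from_points(p1, p2, p3):
    """Plane (normal n, offset d) through three points: n . x = d."""
    u = [p2[i] - p1[i] for i in range(3)]
    v = [p3[i] - p1[i] for i in range(3)]
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    d = sum(n[i] * p1[i] for i in range(3))
    return n, d

def ransac_plane(points, iters=200, tol=0.05, seed=0):
    """Return the inlier set of the best plane found by random sampling."""
    rng = random.Random(seed)
    best = []
    for _ in range(iters):
        n, d = plane_from_points(*rng.sample(points, 3))
        norm = sum(c * c for c in n) ** 0.5
        if norm < 1e-9:
            continue  # degenerate (collinear) sample
        inliers = [p for p in points
                   if abs(sum(n[i] * p[i] for i in range(3)) - d) / norm <= tol]
        if len(inliers) > len(best):
            best = inliers
    return best

# Toy cloud: a z = 0 ground plane plus two off-plane points.
cloud = [(x * 0.5, y * 0.5, 0.0) for x in range(4) for y in range(4)]
cloud += [(1.0, 1.0, 2.0), (0.5, 2.0, 1.5)]
print(len(ransac_plane(cloud)))  # most points lie on the ground plane
```

The thesis folds such geometric plane evidence into an iterative energy minimization that produces training labels, rather than using raw RANSAC output directly.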
Moreover, we demonstrate that this transfer learning scheme can be repeated iteratively to further improve the accuracy of plane detection and segmentation on monocular images of outdoor scenes.

Doctoral Thesis
Semantic segmentation of panoramic images and panoramic image based outdoor visual localization (Izmir Institute of Technology, 2022-10) Orhan, Semih; Baştanlar, Yalın
360-degree views are captured by full omnidirectional cameras and are generally represented with panoramic images. Unfortunately, these images heavily suffer from spherical distortion at the poles of the sphere. In previous studies of Convolutional Neural Networks (CNNs), several methods (e.g. equirectangular convolution) have been proposed to alleviate spherical distortion. Inspired by these previous efforts, we developed an equirectangular version of the UNet model. We evaluated the semantic segmentation performance of the UNet model and its equirectangular version on an outdoor panoramic dataset. Experimental results showed that the equirectangular version of UNet performed better than UNet. In addition, we released the pixel-level annotated dataset, which is one of the first semantic segmentation datasets of outdoor panoramic images. In visual localization, localizing perspective query images in a panoramic image dataset can alleviate the non-overlapping view problem between cameras. Generally, perspective query images are localized in a panoramic image database by generating 4 or 8 virtual gnomonic views, which deform the sphere into cube faces. Doing so simplifies the search problem to perspective-to-perspective search, but there may still be a non-overlapping view problem between query and gnomonic database images. Therefore, we propose directly localizing perspective query images in panoramic images by applying sliding windows on the last convolution layer of CNNs. Features are extracted with R-MAC, GeM, and SFRS.
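The sliding-window search over the last convolutional feature map can be sketched in one dimension along the panorama's width axis, which wraps around at 360 degrees; the feature columns and query below are invented:

```python
def window_score(pano_cols, query_cols, start):
    """Sum of column-wise dot products for one window position,
    wrapping around the horizontal (360-degree) axis."""
    width = len(pano_cols)
    score = 0.0
    for offset, q in enumerate(query_cols):
        p = pano_cols[(start + offset) % width]
        score += sum(a * b for a, b in zip(p, q))
    return score

def best_window(pano_cols, query_cols):
    """Slide the query over every horizontal position of the panorama."""
    return max(range(len(pano_cols)),
               key=lambda s: window_score(pano_cols, query_cols, s))

# Toy feature maps: 8 panorama columns, 2 query columns (3-dim features).
pano = ([[0.0, 0.1, 0.0]] * 5
        + [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
        + [[0.0, 0.0, 0.1]])
query = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(best_window(pano, query))  # query matches panorama columns 5-6 -> 5
```

Searching directly on the panorama avoids committing to a fixed set of gnomonic cut-outs, which is the non-overlap problem the thesis targets.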
Experimental results showed that the sliding window approach outperformed 4 gnomonic views, and we obtained competitive results compared with 8 and 12 gnomonic views. Any city-scale visual localization system has to be robust against long-term changes. Semantic information is more robust to such changes (e.g. the surface of a building), and depth maps provide geometric clues. In our work, we utilized semantic and depth information during pose verification, that is, checking semantic and depth similarity to verify the poses (retrievals) obtained with the approach that uses only RGB image features. Semantic and depth information are represented with a self-supervised contrastive learning approach (SimCLR). Experimental results showed that pose verification with semantic and depth features improved the visual localization performance of the RGB-only model.

Doctoral Thesis
Test case prioritization for regression testing using change impact analysis (Izmir Institute of Technology, 2019-06) Ufuktepe, Ekincan; Tuğlular, Tuğkan
Test case prioritization aims to order test cases to increase the rate of fault detection and to reduce the time for detecting faults. In this study, a static source code analysis based approach that uses change impact analysis is proposed. The proposed change impact analysis approach uses the program slicing technique, method change information, and a Bayesian Network. Based on the change impact analysis results, two test case prioritization approaches called LoM and LoM-Addtl are proposed, inspired by the "Law of Minimum" from biology and agronomy. The change impact analysis and test case prioritization approaches are performed on three well-known projects. The proposed change impact analysis results are evaluated with precision and recall metrics.
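Prioritized test orderings are commonly scored with the APFD (Average Percentage of Faults Detected) measure; a small sketch with a made-up fault matrix:

```python
def apfd(test_order, faults):
    """APFD = 1 - (sum of first-detection positions) / (n * m) + 1 / (2n),
    where n = number of tests and m = number of faults."""
    n, m = len(test_order), len(faults)
    position = {t: i + 1 for i, t in enumerate(test_order)}  # 1-based
    first_detect = sum(min(position[t] for t in detecting)
                       for detecting in faults.values())
    return 1 - first_detect / (n * m) + 1 / (2 * n)

# Hypothetical fault matrix: fault -> set of tests that detect it.
fault_matrix = {"f1": {"t3"}, "f2": {"t1", "t4"}, "f3": {"t2"}}
print(round(apfd(["t3", "t1", "t2", "t4"], fault_matrix), 3))  # 0.625
print(round(apfd(["t4", "t2", "t1", "t3"], fault_matrix), 3))  # 0.542
```

A higher APFD means faults are detected earlier in the ordering, which is the quantity prioritization techniques compete on.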
The proposed test case prioritization methods LoM and LoM-Addtl are compared with five other test case prioritization techniques and evaluated with the APFD measure. The results of the change impact analysis showed that when a software project has completed 75% of its development, 97%-100% of the affected and changed methods are predicted. The LoM and LoM-Addtl test case prioritization approaches showed results consistent with the traditional test case prioritization techniques. However, it has been observed that LoM and LoM-Addtl performed better than the traditional methods when version jumps are smaller. Furthermore, following an Additional strategy in LoM (LoM-Addtl) has shown better results compared to LoM.

Doctoral Thesis
Tracking and prediction of evolution of communities in dynamic networks (Izmir Institute of Technology, 2021-07) Karataş, Arzum; Şahin, Serap; Izmir Institute of Technology
Communities are among the most meaningful structures in dynamic networks. Tracking their evolution provides insights into the patterns of community evolution in networks over time, and valuable information for decision support systems in many research areas such as marketing, recommender systems, and criminology. Previous work has focused on either high accuracy or time efficiency, but not on low memory consumption. This motivates us to develop a method that combines highly accurate tracking results with low computational resource usage. This dissertation first provides a brief overview of research in dynamic network analysis. Then, a novel space-efficient method for tracking the evolution of communities in dynamic networks, called TREC, is presented, in which community matching using LSH with the minhashing technique is proposed to track similar communities over time efficiently in terms of memory consumption. The accuracy of TREC is evaluated on benchmark datasets, and its execution time performance is measured on real dynamic datasets.
In addition, a comparative algorithmic complexity analysis of TREC in terms of space and time is performed. Both theoretical and experimental results show that TREC outperforms competitor methods on both datasets in terms of the combination of space, accuracy, and execution time. Next, it is investigated whether the TREC method is suitable for predicting the evolution of communities. In this evaluation, a prediction study is conducted, following a common methodology whose main steps are feature extraction, feature selection, classifier training, and cross-validation. Experimental results show that the TREC method is suitable for predicting the evolution of communities.
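Community matching via minhashing of the kind TREC builds on can be illustrated with a bare-bones minhash signature, which approximates the Jaccard similarity of member sets in constant space per community; the hash count and communities below are illustrative, not TREC's implementation:

```python
import hashlib

def minhash_signature(members, num_hashes=64):
    """Signature = per-hash minimum over salted hashes of member IDs."""
    sig = []
    for salt in range(num_hashes):
        sig.append(min(
            int(hashlib.sha1(f"{salt}:{m}".encode()).hexdigest(), 16)
            for m in members))
    return sig

def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature slots estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

old_community = {"u1", "u2", "u3", "u4", "u5"}
new_community = {"u1", "u2", "u3", "u4", "u6"}  # one member changed
other_community = {"v1", "v2", "v3"}

sig_old = minhash_signature(old_community)
print(estimated_jaccard(sig_old, minhash_signature(new_community)))   # high
print(estimated_jaccard(sig_old, minhash_signature(other_community)))  # 0.0
```

Because signatures are small and fixed-size, communities in consecutive snapshots can be matched without storing full member sets, which is the memory saving the method targets.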