Browsing by Author "Ergenç, Belgin"

Comparison of dynamic rule mining algorithms
(Izmir Institute of Technology, 2012) Shariff, Karunda; Ergenç, Belgin
In real life, new data is constantly added to databases while existing data is modified or deleted. The new challenge of association rule mining is the need to maintain meaningful association rules whenever the databases are updated. Many dynamic algorithms that use different techniques have been proposed to deal with this challenge. However, less work has been done on comparing their performance. This study compares two dynamic rule mining algorithms that have not been compared before: Dynamic Matrix Apriori (DMA) and Fast Update 2 (FUP2). The algorithms are tested on three different datasets to determine their execution time under additions, deletions, and different support thresholds. Our findings reveal that DMA performs better on two of the datasets and FUP2 performs better on the third. The difference in performance between the two algorithms is mainly caused by the nature of the datasets.
Distinct encoded records join operator for distributed query processing
(Izmir Institute of Technology, 2012) Öztürk, Ahmet Cumhur; Ergenç, Belgin
Distributing data among different locations is now very common because of the needs of the business environment. Accessible, reliable, and scalable data is a critical need in that environment, and a distributed database system provides those advantages. Processing a query in a distributed database system requires transferring data between sites, and if the connection speed between sites is low, transmitting data is very time consuming. Optimizing distributed query processing is therefore different from optimizing query processing in a local database system. Most algorithms developed for distributed query processing focus on reducing the amount of data transferred between sites. A join operation combines different tables on a common join attribute value; if the tables in a join are at different locations, some of them need to be transferred between sites. Join optimization algorithms in distributed database systems focus on reducing the amount of data transferred by eliminating redundant tuples from a relation before transmitting it to the other site. This thesis introduces a new distributed query processing technique named distinct encoded records join operation (DERjoin), which considers duplicated join attributes in a relation and eliminates them before sending the relation to another site.
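The idea of removing duplicated join attribute values before transmission can be illustrated with a minimal Python sketch. This is not the DERjoin encoding from the thesis, whose details are not given in the abstract; the function names `encode_distinct` and `join_at_remote_site`, the tuple-based relations, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict


def encode_distinct(relation, join_index):
    """Group tuples by join value so each distinct join value appears once.

    Hypothetical helper: the payload maps a join value to the list of
    remaining fields of every tuple that carries that value.
    """
    groups = defaultdict(list)
    for row in relation:
        rest = row[:join_index] + row[join_index + 1:]
        groups[row[join_index]].append(rest)
    return dict(groups)


def join_at_remote_site(encoded, local_relation, local_join_index):
    """Join the received payload with a local relation on the join attribute."""
    result = []
    for local_row in local_relation:
        key = local_row[local_join_index]
        local_rest = (local_row[:local_join_index]
                      + local_row[local_join_index + 1:])
        # Only join values present in the payload produce output tuples.
        for rest in encoded.get(key, []):
            result.append((key,) + rest + local_rest)
    return result


# Illustrative data: customer id 1 occurs twice but is sent only once as a key.
orders = [(1, "a"), (1, "b"), (2, "c")]
customers = [(1, "Ann"), (3, "Bob")]
payload = encode_distinct(orders, join_index=0)
joined = join_at_remote_site(payload, customers, local_join_index=0)
```

In this sketch the saving comes from sending each join value once as a dictionary key instead of once per tuple; how much that helps depends on how often join values repeat in the relation.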
Dynamic frequent itemset mining based on Matrix Apriori algorithm
(Izmir Institute of Technology, 2012) Oğuz, Damla; Ergenç, Belgin
Frequent itemset mining algorithms discover the frequent itemsets in a database. When the database is updated, the frequent itemsets should be updated as well. However, re-running a frequent itemset mining algorithm after every update is inefficient. This is called the dynamic update problem of frequent itemsets, and the solution is to devise an algorithm that can mine the frequent itemsets dynamically. In this study, a dynamic frequent itemset mining algorithm called Dynamic Matrix Apriori is proposed and explained. In addition, the proposed algorithm is compared on two datasets with the base algorithm, Matrix Apriori, which must be re-run whenever the database is updated.
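The dynamic update problem can be made concrete with a small Python sketch that keeps support counts between updates and touches only the newly added transactions. It is a generic incremental counter, not the Dynamic Matrix Apriori data structure, which the abstract does not describe; the class name `IncrementalItemsetCounter`, the `max_size` cap, and the sample transactions are assumptions.

```python
from collections import Counter
from itertools import combinations


class IncrementalItemsetCounter:
    """Keeps support counts so that new transactions do not force a full re-scan."""

    def __init__(self, max_size=2):
        self.max_size = max_size      # largest itemset size tracked (assumption)
        self.counts = Counter()       # itemset (sorted tuple) -> support count
        self.n_transactions = 0

    def add(self, transactions):
        """Update counts using only the newly added transactions."""
        for transaction in transactions:
            items = sorted(set(transaction))
            self.n_transactions += 1
            for size in range(1, self.max_size + 1):
                for itemset in combinations(items, size):
                    self.counts[itemset] += 1

    def frequent(self, min_support):
        """Return itemsets whose relative support is at least min_support."""
        threshold = min_support * self.n_transactions
        return {s: c for s, c in self.counts.items() if c >= threshold}


counter = IncrementalItemsetCounter(max_size=2)
counter.add([["a", "b"], ["a", "c"]])      # initial database
before = counter.frequent(0.5)
counter.add([["a", "b"], ["b"]])           # later increment, no re-scan of old data
after = counter.frequent(0.5)
```

The sketch stores counts for every itemset up to `max_size`, which is only practical for small examples; it is meant to show why retaining state across updates avoids re-reading the old data, not how to do so efficiently.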
Impacts of frequent itemset hiding algorithms on privacy preserving data mining
(Izmir Institute of Technology, 2010) Yıldız, Barış; Ergenç, Belgin
The continuous growth of computing capabilities and the collection of large amounts of data in recent years have made data mining a popular analysis tool. Association rules (frequent itemsets), classification, and clustering are the main methods used in data mining research. The first part of this thesis is the implementation and comparison of two frequent itemset mining algorithms that work without candidate itemset generation: Matrix Apriori and FP-Growth. The comparison revealed that Matrix Apriori has higher performance owing to its faster data structure. One of the great challenges of data mining is finding hidden patterns without violating data owners' privacy. Privacy preserving data mining came into prominence as a solution. In the second study of the thesis, the Matrix Apriori algorithm is modified and a frequent itemset hiding framework is developed. Four frequent itemset hiding algorithms are proposed such that: i) all versions work without pre-mining, so a privacy breach caused by the knowledge obtained from finding frequent itemsets is prevented in advance, ii) efficiency is increased since no pre-mining is required, iii) supports are found during the hiding process, and at the end the sanitized dataset and its frequent itemsets are given as outputs, so no post-mining is required, iv) the heuristics use pattern lengths rather than transaction lengths, eliminating the possibility of distorting more valuable data.
Level based labeling scheme for extensible markup language (XML) data processing
(Izmir Institute of Technology, 2010) Atıcı, Beray; Ergenç, Belgin
The continuous growth of data in businesses and the increasing demand for reaching that data immediately have raised the need for real time data warehouses. To provide such a system, the ETL mechanism needs to be very efficient at updating data. Literature surveys show that many studies address efficient update of relational data, while only a limited number address updating XML data. With its extensible structure and effective performance in data exchange, the use of XML is increasing day by day. Like relational databases, real time XML databases also need to be updated continuously. The hierarchical nature of XML requires tree representations for indexing the data, since they provide the necessary means to capture the different relationships between the nodes. The principal purpose of this study is to define and compare algorithms that label the XML tree with an effective update mechanism. The proposed labeling algorithms aim to provide a mechanism to query and update XML data by defining all relations between the nodes. In the experimental evaluation part of this thesis, all algorithms are examined and tested against an existing labeling algorithm.
Order based labeling scheme for dynamic XML (extensible markup language) query processing
(Izmir Institute of Technology, 2012) Assefa, Beakal Gizachew; Ergenç, Belgin
The need for robust and high performance XML database systems has increased due to the growing volume of XML data produced by today's applications. Like indexes in relational databases, XML labeling is the key to XML querying. Assigning unique labels to the nodes of a dynamic XML tree, such that the labels encode all structural relationships between the nodes, is a challenging problem. Early labeling schemes designed for static XML documents generate short labels; however, their performance degrades in update intensive environments due to the need for relabeling. On the other hand, dynamic labeling schemes achieve dynamicity at the cost of large label size or complexity, which results in poor query performance. This thesis presents the OrderBased labeling scheme, which is dynamic, simple, and compact, yet able to identify structural relationships among nodes. A set of performance tests shows promising labeling, querying, and update performance and optimum label size.
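What it means for labels to encode structural relationships can be shown with a short Python sketch. It uses a plain prefix (Dewey-style) labeling, chosen only because it is easy to read; it is not the OrderBased scheme, whose construction the abstract does not give. The `(name, children)` tree representation and all function names are assumptions.

```python
def label_tree(tree, label=(1,)):
    """Return {label: name} for a tree given as (name, [children]).

    Each child label is its parent's label extended by the child's position.
    """
    name, children = tree
    labels = {label: name}
    for position, child in enumerate(children, start=1):
        labels.update(label_tree(child, label + (position,)))
    return labels


def is_ancestor(a, b):
    """a is an ancestor of b when a is a proper prefix of b."""
    return len(a) < len(b) and b[:len(a)] == a


def is_parent(a, b):
    """a is the parent of b when b extends a by exactly one component."""
    return len(b) == len(a) + 1 and b[:len(a)] == a


def precedes(a, b):
    """Tuple comparison on prefix labels follows document (preorder) order."""
    return a < b


# Illustrative document: <book><title/><chapter><section/></chapter></book>
doc = ("book", [("title", []), ("chapter", [("section", [])])])
labels = label_tree(doc)
```

With prefix labels like these, inserting a node between two existing siblings forces the later siblings and their descendants to be relabeled, which is the kind of update cost that dynamic labeling schemes aim to avoid.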