Boundary Vertices Sensitive Vertex-cut Partitioning Algorithm |
Author : Asad Feroz Ali, Dr. Syed Saif ur Rahman |
Abstract: Graph partitioning has been a popular research topic since the 1970s, attracting researchers from fields as diverse as engineering, science, and mathematics. In the last decade, graphs have grown to billions of vertices. Although storage devices have become cheaper, a single machine cannot process graphs of this size. This calls for partitioning the graph so that a group of machines can perform computations on it in parallel, saving time and producing results quickly. The research problem is that, for existing partitioning techniques, the ratio of boundary vertices to interior vertices increases as the number of partitions increases. To address this issue, the random edge selection method of the GraphLab algorithm was replaced with four suggested edge sorting techniques. The results were compared with GraphLab's random edge selection method using various performance parameters. |
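The effect of edge ordering on the boundary-to-interior ratio can be illustrated with a minimal sketch. It is not the authors' implementation: the greedy placement rule is a simplified version of the kind used in greedy vertex-cut partitioners, and `degree_sorted` is a hypothetical ordering chosen only for illustration, since the abstract does not name the four sorting techniques. A vertex is counted as a boundary vertex when its edges land in more than one partition.

```python
import random
from collections import defaultdict


def greedy_vertex_cut(edges, num_parts):
    """Assign each edge to a partition, processing edges in the given order."""
    replicas = defaultdict(set)   # vertex -> partitions holding a copy of it
    load = [0] * num_parts        # number of edges assigned to each partition
    for u, v in edges:
        shared = replicas[u] & replicas[v]
        seen = replicas[u] | replicas[v]
        # Prefer a partition that already holds both endpoints, then one
        # that holds either endpoint, then any partition (least loaded first).
        candidates = shared or seen or set(range(num_parts))
        part = min(candidates, key=lambda p: (load[p], p))
        load[part] += 1
        replicas[u].add(part)
        replicas[v].add(part)
    return replicas, load


def boundary_ratio(replicas):
    """Ratio of boundary vertices (replicated) to interior vertices."""
    boundary = sum(1 for parts in replicas.values() if len(parts) > 1)
    interior = len(replicas) - boundary
    return boundary / interior if interior else float("inf")


def degree_sorted(edges):
    """Hypothetical ordering (an assumption): low-degree endpoints first."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(edges, key=lambda e: (degree[e[0]] + degree[e[1]], e))


if __name__ == "__main__":
    rng = random.Random(0)
    edges = list({(rng.randrange(50), rng.randrange(50, 100)) for _ in range(300)})
    shuffled = edges[:]
    rng.shuffle(shuffled)  # stands in for random edge selection
    for name, order in [("random", shuffled), ("degree-sorted", degree_sorted(edges))]:
        replicas, _ = greedy_vertex_cut(order, num_parts=4)
        print(name, round(boundary_ratio(replicas), 3))
```

Swapping in a different ordering function is enough to compare orderings on the same graph; which ordering gives the lower ratio depends on the graph and is not claimed here.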
|
Extracting Key Sentences from Text |
Author : Mudasser Iqbal, Muhammad Rafi |
Abstract: Automatic key sentence extraction from text is a challenging task with numerous applications in text processing systems. The task consists of three main functions: (i) identifying sentence boundaries in the text, (ii) a ranking function that assigns a score between 0 and 1 to each correctly extracted sentence based on the important semantics of the text, and (iii) selecting the most relevant sentences based on an evaluation function. This study surveys the state of the art in the field and then proposes a heuristic-based extraction approach using lexical chains of terms. The proposed approach is evaluated using human-based evaluation criteria on some existing text datasets. Encouraging results were obtained. |
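The three functions listed in the abstract can be sketched as a small pipeline. This is only an illustration under stated assumptions: it swaps the lexical-chain heuristic for a much simpler normalized term-frequency score, the regular-expression sentence splitter is naive, and the stopword list and the function names are hypothetical.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "and", "or", "to",
             "is", "are", "it", "for", "with", "that", "this"}


def split_sentences(text):
    """Step (i): naive boundary detection on '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Lowercase content words of a sentence, stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOPWORDS]


def score_sentences(sentences):
    """Step (ii): average term frequency per sentence, scaled into [0, 1]."""
    freq = Counter(w for s in sentences for w in tokenize(s))
    raw = []
    for s in sentences:
        words = tokenize(s)
        raw.append(sum(freq[w] for w in words) / len(words) if words else 0.0)
    top = max(raw, default=0.0)
    return [r / top if top else 0.0 for r in raw]


def key_sentences(text, k=2):
    """Step (iii): keep the k best-scoring sentences, in document order."""
    sentences = split_sentences(text)
    scores = score_sentences(sentences)
    best = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))[:k]
    return [sentences[i] for i in sorted(best)]


if __name__ == "__main__":
    sample = ("Graphs model networks. Graphs are large. "
              "Large graphs need partitioning. Cats sleep.")
    print(key_sentences(sample, k=2))
```

A lexical-chain scorer would replace `score_sentences` while leaving the other two steps unchanged.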
|
Detection of Duplicate and Near-Duplicate Content for Web Crawlers |
Author : Hadi Hussain Khan, Dr. Husnain Mansoor Ali |
Abstract: There is an abundance of duplicated web documents on the internet. For example, two online documents may be nearly identical except for a very small portion, such as URLs and advertisements. While such differences are not important with regard to web searches, the duplication degrades web search results. Therefore, if web crawlers could check the duplication percentage of a newly crawled page against previously crawled pages, the quality of web search would increase significantly. The main objective of this research is to propose a method that checks the duplication ratio of a page's content against pages already crawled. The solution calculates the duplication ratio at crawl time, as part of the web crawling algorithm. To achieve this, Charikar’s SIMHASH fingerprinting technique has been used as the basis of a new technique for detecting exact and near duplicates, which checks the duplication ratio of each newly crawled page. The experiment was carried out on multiple pages of two major B2B websites, Alibaba and TradeKey. More than 300 pages from two similar categories on each portal were selected for this experiment. The duplication of these pages was first measured using a third-party duplication detection tool to set the benchmark. The results obtained were promising and close to the benchmark, and the system's running time was very short. However, the results show an average curve variation of 10% from the benchmark, which is acceptable in this case. Based on the experimental results, Charikar’s SIMHASH fingerprinting technique can be used effectively to detect duplication and near-duplication. |
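A minimal sketch of SimHash fingerprinting follows, to show how two pages can be compared by the Hamming distance between their fingerprints. The abstract does not state the features, hash function, fingerprint width, or threshold used, so the choices here (three-word shingles, the first 8 bytes of MD5, 64-bit fingerprints) are assumptions made for illustration only.

```python
import hashlib
import re

BITS = 64  # fingerprint width (an assumption)


def shingles(text, k=3):
    """Overlapping k-word shingles used as features (k=3 is an assumption)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return [" ".join(words)] if words else []
    return [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]


def simhash(text):
    """Compute a 64-bit SimHash fingerprint of the text."""
    weights = [0] * BITS
    for feature in shingles(text):
        digest = hashlib.md5(feature.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        # Each feature votes +1 or -1 on every bit position.
        for i in range(BITS):
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def similarity(a, b):
    """Fraction of matching bits, usable as a rough duplication ratio."""
    return 1 - hamming_distance(a, b) / BITS


if __name__ == "__main__":
    page_a = "Wholesale LED lamps from verified suppliers with fast shipping"
    page_b = "Wholesale LED lamps from verified suppliers with free shipping"
    print(similarity(simhash(page_a), simhash(page_b)))
```

In a crawler, the fingerprint of each newly fetched page would be compared against stored fingerprints, and a page would be flagged as a near-duplicate when the distance falls below a chosen threshold; the threshold itself has to be tuned against a benchmark.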
|
Improving Query Response Time for Graph Data Using Materialization |
Author : Abdul Waheed, Dr. Syed Saif ur Rahman |
Abstract: Graphs are used in many disciplines, including communication networks, biological networks, social networks, mathematics, and other fields of science, and graph data processing is an active and important area of computer science today. In this research, the authors use materialization to improve query response time for graph data. The large graph dataset is divided into two categories: one contains the topological data and the other contains the aggregate data. Both are accessed via a PAM (Predicate Aggregate Materialization) engine, which plays an intermediary role. The PAM engine stores query results and, every time a query arrives, checks whether it is new or has already been processed. If the query has already been processed, the engine simply returns the materialized results; if the query is new, it extracts the data from the required datasets. Once processing completes, the PAM engine materializes the extracted data for reuse. The technique reduces processing time and improves response time. |
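The check-then-materialize behaviour described for the PAM engine can be sketched as a small cache in front of the two datasets. This is a hypothetical illustration, not the authors' engine: the class name, the query shape (an aggregate over a vertex's neighbours), and the in-memory dictionaries standing in for the topological and aggregate datasets are all assumptions.

```python
class PamEngineSketch:
    """Illustrative intermediary that materializes query results for reuse."""

    def __init__(self, topology, aggregates):
        self.topology = topology      # vertex -> set of neighbouring vertices
        self.aggregates = aggregates  # vertex -> numeric attribute
        self.materialized = {}        # (vertex, aggregate) -> stored result
        self.hits = 0
        self.misses = 0

    def query(self, vertex, aggregate="sum"):
        key = (vertex, aggregate)
        # Already processed: return the materialized result.
        if key in self.materialized:
            self.hits += 1
            return self.materialized[key]
        # New query: extract from the datasets, then materialize for reuse.
        self.misses += 1
        result = self._extract(vertex, aggregate)
        self.materialized[key] = result
        return result

    def _extract(self, vertex, aggregate):
        values = [self.aggregates[n] for n in self.topology.get(vertex, ())]
        if aggregate == "sum":
            return sum(values)
        if aggregate == "count":
            return len(values)
        raise ValueError(f"unsupported aggregate: {aggregate}")


if __name__ == "__main__":
    engine = PamEngineSketch(
        topology={"a": {"b", "c"}, "b": {"a"}},
        aggregates={"a": 1, "b": 2, "c": 3},
    )
    print(engine.query("a"), engine.query("a"), engine.hits, engine.misses)
```

The sketch ignores invalidation: if the underlying graph changes, materialized results would need to be refreshed or discarded, which the abstract does not discuss.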
|
Performance Analysis of Classification Learning Methods on Large Dataset using two Data Mining Tools |
Author : Mazhar Ali, Dr. Husnain Mansoor Ali |
Abstract: Data is growing day by day, so processing it and selecting the right method and tool is a significant problem. Computer scientists process and analyse data with different machine learning methods using various data mining tools, aiming for high accuracy and minimum model-building time. Several tools, such as WEKA, RapidMiner, and Keel, are available for processing, analysis, and modelling, yet no single tool is perfect for data processing and analysis. The authors therefore present a comparative and analytical study of the performance of different classification algorithms, such as Naïve Bayes, KNN, IBK, Random Forest, C4.5, and J48, and of two data mining tools, WEKA and RapidMiner, on a large dataset, evaluating their performance and analytical results with a low error cost. The Adult Income dataset is taken from the UCI data repository for this study. The aim of this study is to evaluate and assess the range of performance of different machine learning methods and two diverse data mining tools on dissimilar datasets. The result of each classification method and data mining tool is analysed and presented at the end. |
|