A Survey on Analyzing and Processing Data Faster Based on Balanced Partitioning |
Author : Annie .P. Kurian , V. Jeyabalaraja |
Abstract | Full Text |
Abstract :Analyzing and processing big data is a challenging
task because of its varied characteristics and sheer volume.
With the enormous amount of data in today’s world, the
challenge is not only to store and manage the data but also to
analyze it and retrieve the best results from it. In this paper,
a study is made of the different types of big data analytics
available, and the advantages and drawbacks of each type are
assessed on metrics such as scalability, availability,
efficiency, fault tolerance, real-time processing, data size
supported and iterative task support. Existing approaches to
range-partition queries are insufficient to provide accurate
results quickly on big data. In this paper, various
partitioning techniques on structured data are surveyed. The
challenge in existing systems is the lack of a proper
partitioning technique, so the system has to scan the entire
data set in order to answer a query. Partitioning is performed
because it provides availability, maintainability and improved
query performance to database users. A holistic study has been
done on balanced range partitioning for structured data on the
Hadoop ecosystem, i.e. Hive, and on its impact on fast
response, which would eventually be taken as the specification
for testing its efficiency. This paper thus presents a thorough
survey of various topics in the processing and analysis of vast
structured datasets, and we infer that balanced partitioning
through Hive in the Hadoop ecosystem would produce fast and
adequate results compared to traditional databases.
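The idea of balanced range partitioning can be illustrated with a minimal sketch. It is not the authors' implementation and does not use Hive; the function names are hypothetical, and the sketch assumes numeric keys with few duplicates. Split points are taken at evenly spaced ranks of the sorted keys, so each partition holds about the same number of rows, and a range query only needs to scan the partitions whose ranges overlap it.

```python
from bisect import bisect_right


def balanced_boundaries(keys, num_partitions):
    """Pick split points at evenly spaced ranks of the sorted keys."""
    ordered = sorted(keys)
    n = len(ordered)
    return [ordered[(i * n) // num_partitions] for i in range(1, num_partitions)]


def assign_partition(key, boundaries):
    """Index of the partition that stores `key`."""
    return bisect_right(boundaries, key)


def partitions_for_range(lo, hi, boundaries):
    """Partitions a query on the closed range [lo, hi] has to scan."""
    first = bisect_right(boundaries, lo)
    last = bisect_right(boundaries, hi)
    return list(range(first, last + 1))


keys = list(range(100))                      # illustrative keys only
boundaries = balanced_boundaries(keys, 4)    # three split points, four partitions
sizes = [0, 0, 0, 0]
for key in keys:
    sizes[assign_partition(key, boundaries)] += 1
to_scan = partitions_for_range(30, 40, boundaries)
```

With heavily duplicated keys the partitions would no longer be equal in size, so a real system would need a more careful choice of split points.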
|
Prediction of Tumor in Classifying Mammogram images by k-Means, J48 and CART Algorithms |
Author : E. Venkatesan , T. Velmurugan |
Abstract | Full Text |
Abstract :Breast cancer is one of the leading cancers among
women in countries around the world, including India. It is the
second most common cause of cancer death in women. The
incidence of breast cancer in women has increased
significantly in the last few years. Detecting cancer in the
later stages leads to very complicated surgeries, and the
chance of death is very high. Early detection of breast cancer
allows less complicated procedures and earlier recovery. Many
tests have been developed to detect cancer, among them
mammography and ultrasound. Mammography is a method that
helps in early detection of breast cancer, but finding the mass
and its spread from mammographic images is very difficult.
Expert radiologists are needed for accurate reading of a
mammogram image, and this work analyses the k-Means
algorithm, which helps in easy detection and extraction of the
tumor area. The mammography image helps to provide some
criteria that help physicians decide whether a certain case is
abnormal or normal. This research work identifies the breast
cancer tumor area and finds its affected region by splitting
the images into five clusters. The tumor area is identified in
the last cluster and classified with the help of the decision
tree algorithms J48 and CART.
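The clustering step can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it clusters a flat list of made-up grey-level values with a one-dimensional k-means, and it assumes that the "last cluster" is the one with the highest intensity. The evenly spaced initial centres are also an arbitrary choice.

```python
def kmeans_1d(values, k, iterations=50):
    """Cluster scalar values; returns (centres, labels), centres in ascending order."""
    lo, hi = min(values), max(values)
    # Spread the initial centres evenly over the value range (an assumption).
    centres = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value goes to its nearest centre.
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        # Update step: each centre moves to the mean of its members.
        new_centres = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centres.append(sum(members) / len(members) if members else centres[c])
        if new_centres == centres:
            break
        centres = new_centres
    return centres, labels


# Hypothetical pixel intensities, not taken from any mammogram.
pixels = [10, 12, 11, 60, 62, 110, 111, 160, 158, 250, 252, 251]
centres, labels = kmeans_1d(pixels, k=5)
# Indices of the pixels in the last (brightest) cluster.
candidate_pixels = [i for i, lab in enumerate(labels) if lab == 4]
```

The pixels in the last cluster would then be passed to a classifier such as J48 or CART, which is outside the scope of this sketch.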
|
|
Text Categorization of Multi-Label Documents For Text Mining |
Author : Susan Koshy, R. Padmajavalli |
Abstract | Full Text |
Abstract :Automated text categorisation is considered a vital
method for managing and processing the vast amount of
documents in digital form, which are widespread and
continuously increasing. Traditional classification problems
are usually associated with a single label. Text categorization
uses multi-label learning, a form of supervised learning in
which the classification algorithm is required to learn from a
set of instances, each of which can belong to multiple classes,
and then to predict a set of class labels for a new instance.
Multi-label classification methods have been increasingly used
in modern applications such as music categorization,
functional genomics (gene protein interactions) and semantic
annotation of images, besides document filtering, email
classification and Web search. Multi-label classification
methods can be broadly classified as problem transformation
and algorithm adaptation. This paper presents an overview of
single-label text classification and an analysis of some
multi-label classification methods. |
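Problem transformation can be illustrated with two common variants, binary relevance and label powerset. The sketch below is a minimal illustration with hypothetical function names and made-up documents; it only shows how a multi-label data set is rewritten into single-label problems, not how the resulting classifiers are trained.

```python
def binary_relevance(dataset):
    """One yes/no data set per label: {label: [(text, has_label), ...]}."""
    labels = sorted({lab for _, labs in dataset for lab in labs})
    return {lab: [(text, lab in labs) for text, labs in dataset] for lab in labels}


def label_powerset(dataset):
    """One multi-class data set in which every distinct label set is a class."""
    return [(text, "+".join(sorted(labs))) for text, labs in dataset]


# Hypothetical documents with their label sets.
docs = [
    ("goal scored in the final", {"sports"}),
    ("team signs sponsorship deal", {"sports", "business"}),
    ("markets rally after earnings", {"business"}),
]
per_label = binary_relevance(docs)
combined = label_powerset(docs)
```

Binary relevance ignores correlations between labels, while label powerset keeps them at the cost of many sparse classes.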
|
A Bayesian Framework for Diagnosing Depression Level of Adolescents |
Author : M.R.Sumathi , B.Poorna |
Abstract | Full Text |
Abstract :Depressive disorder is an illness that involves the
body, mood and thoughts. It interferes with daily life and
normal functioning and causes pain for both the person with
the disorder and those who care about him or her. Severe
depression may lead to serious illness or suicide. The most
affected group is the adolescent community. The biggest
problem in diagnosing and treating depressive disorders is
recognizing that someone is suffering from one. As various
factors are involved, it is very difficult for psychologists to
diagnose depressive disorders correctly at an early stage.
Nowadays, computers are used to assist physicians in
diagnosing diseases and identifying correct treatments
according to the patient details. In the same way, computers
can also assist psychologists in diagnosing mental disorders
and identifying correct treatments. Various techniques are
available to store expert knowledge and computerize the
diagnosis process. A Bayesian network is one such technique;
it combines statistics and expert knowledge to diagnose
diseases effectively. This paper proposes a framework for
diagnosing the depression level of adolescents using Bayesian
networks. Initially, an ontology should be constructed to
provide a basis for the Bayesian network. The ontology acts as
the topology and shows the relationships between adolescent
depression concepts. By applying probabilities derived from
statistics to the relationships between concepts, and by using
Bayes' theorem, the depression level of a patient can be
diagnosed effectively. This framework may help novice
psychologists to understand the domain concepts, diagnose the
depression level and suggest correct treatments.
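The role of Bayes' theorem can be shown with a single update step. The sketch below is not the proposed framework: it handles one observed factor rather than a full network, and every probability in it is a hypothetical placeholder rather than a clinical statistic.

```python
def posterior(prior, likelihood):
    """Bayes' theorem: P(level | evidence) is proportional to P(evidence | level) * P(level)."""
    joint = {level: prior[level] * likelihood[level] for level in prior}
    evidence = sum(joint.values())
    return {level: value / evidence for level, value in joint.items()}


# Hypothetical numbers for illustration only.
prior = {"low": 0.5, "moderate": 0.3, "severe": 0.2}
# Assumed probability of observing one particular symptom at each level.
likelihood = {"low": 0.1, "moderate": 0.4, "severe": 0.8}

updated = posterior(prior, likelihood)
most_likely = max(updated, key=updated.get)
```

In a Bayesian network the same rule is applied along the edges of the topology, which in this framework would come from the ontology.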
|
|
A Pioneering Cervical Cancer Prediction Prototype in Medical Data Mining using Clustering Pattern |
Author : R.Vidya ,G.M.Nasira |
Abstract | Full Text |
Abstract :"Let us not make the cure of the disease more
unbearable than the disease itself": this quote is one of the
most enduring and inspirational lines in the field of medicine.
Data mining is an umbrella term that refers to the process of
finding patterns in data. This is typically accomplished with
the assistance of a powerful algorithm that automates part of
the search. This paper shows how the C2P (Cervical Cancer
Prediction) model is built with a data mining algorithm for
prediction. The prediction of C2 (Cervical Cancer) has been a
challenging problem in the research field. Among data mining
applications, we use the RFT (Random Forest Tree) algorithm
to do the prediction. In addition, we use the popular K-means
clustering technique to achieve higher accuracy.
|
|
Predicting Students Performance using K-Median Clustering |
Author : B. Shathya |
Abstract | Full Text |
Abstract : The main objective of educational institutions is
to provide quality education to their students. One way to
achieve the highest level of quality in the higher education
system is by discovering knowledge about the students in a
particular course. This knowledge is hidden in the educational
data set and can be extracted through data mining techniques.
In this paper, the K-Median clustering method is used to
evaluate students' performance. Through this task, knowledge
is extracted that describes students' performance in the
end-semester examination. It helps in the early identification
of students who need special attention and allows the teacher
to provide appropriate advising and coaching.
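K-Median clustering differs from k-means in that each centre is the median of its members rather than the mean, which makes it less sensitive to outliers. A minimal one-dimensional sketch follows; the marks and the initial centres are hypothetical, and the paper's actual features and settings are not given in the abstract.

```python
from statistics import median


def k_median_1d(values, centres, iterations=50):
    """Cluster scalar values around medians; returns (centres, clusters)."""
    centres = list(centres)
    clusters = [[] for _ in centres]
    for _ in range(iterations):
        # Assignment step: absolute distance to the nearest centre.
        clusters = [[] for _ in centres]
        for v in values:
            nearest = min(range(len(centres)), key=lambda c: abs(v - centres[c]))
            clusters[nearest].append(v)
        # Update step: each centre becomes the median of its members.
        new_centres = [median(m) if m else centres[i] for i, m in enumerate(clusters)]
        if new_centres == centres:
            break
        centres = new_centres
    return centres, clusters


# Hypothetical end-semester marks.
marks = [35, 38, 40, 42, 55, 58, 60, 62, 85, 88, 90, 95]
centres, groups = k_median_1d(marks, centres=[40, 60, 90])
needs_attention = groups[0]   # the lowest-scoring group
```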
|
|
Effective Approaches of Classification Algorithms for Text Mining Applications |
Author : U. Latha , T. Velmurugan |
Abstract | Full Text |
Abstract :The large amount of data stored in unstructured texts
cannot simply be used for further processing by computers,
which typically handle text as simple sequences of character
strings. Therefore, specific (pre-)processing methods and
algorithms are required in order to retrieve useful information
from text. Text mining refers generally to the process of
retrieving information and knowledge from unstructured text.
This research work analyses the use of classification
algorithms in text mining applications. The purpose of this
work is to present an analysis of recent publications concerned
with text mining, those using classification algorithms in
particular. This survey identifies some of the most suitable
algorithms for text mining analysis suggested by various
researchers in their research work.
|
|
Text Mining with Automatic Annotation from Unstructured Content |
Author : R.Priya , R. Padmajavalli |
Abstract | Full Text |
Abstract :Text mining is a vast area compared to information
retrieval. Typical text mining tasks include document
classification, document clustering, ontology building,
sentiment analysis, document summarization and information
extraction. Text mining, also referred to as text data mining
and roughly equivalent to text analytics, refers to the
process of deriving high-quality information from text.
Information extraction, a vital area of text mining, is the
automatic or semi-automatic extraction of structured
information from unstructured documents. In most cases this
activity concerns processing human language texts by means of
natural language processing (NLP). In this paper, we present
multimedia document processing and automatic annotation of
images and video as information extraction.
|
|
Agglomerative Clustering on Vertically Partitioned Data – Distributed Database Mining |
Author : R.Senkamalavalli ,T.Bhuvaneswari |
Abstract | Full Text |
Abstract :Mining distributed databases is emerging as a
fundamental computational problem. A common approach to
mining distributed databases is to move all of the data from
each database to a central site, where a single model is built.
Privacy concerns in many application domains prevent the
sharing of data, which limits the ability of data mining
technology to identify patterns and trends in large amounts of
data. Traditional data mining algorithms have been developed
within a centralized model. However, distributed knowledge
discovery has been proposed by many researchers as a solution
for privacy-preserving data mining. With vertically
partitioned data, each site contains some attributes of the
entities in the environment. In this paper, we present an
agglomerative clustering method for situations where different
sites contain different attributes for a common set of
entities, i.e. for vertically partitioned data. The data are
partitioned vertically using association rules.
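One way to cluster vertically partitioned data without pooling the raw attributes can be sketched as follows. This is an assumption for illustration, not the method of the paper: each site computes squared Euclidean distances over its own attributes, the per-site matrices are summed (squared distance is additive over attributes), and single-linkage agglomerative clustering runs on the combined matrix. The sketch makes no claim about formal privacy guarantees, and all names and values are hypothetical.

```python
def local_sq_distances(rows):
    """Pairwise squared distances computed from one site's attributes only."""
    n = len(rows)
    return [[sum((a - b) ** 2 for a, b in zip(rows[i], rows[j])) for j in range(n)]
            for i in range(n)]


def combine(*matrices):
    """Sum the per-site matrices to get distances over all attributes."""
    n = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) for j in range(n)] for i in range(n)]


def single_linkage(dist, k):
    """Merge the two closest clusters until only k clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Two hypothetical sites holding different attributes of the same four entities.
site_a = [(1.0,), (1.2,), (5.0,), (5.1,)]
site_b = [(10.0, 0.0), (10.5, 0.1), (20.0, 1.0), (19.0, 1.0)]
global_dist = combine(local_sq_distances(site_a), local_sq_distances(site_b))
result = single_linkage(global_dist, k=2)
```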
|
|
Classification of Retinal Images for Diabetic Retinopathy at Non-Proliferative Stage using ANFIS |
Author : B.Sumathy ,S.Poornachandra |
Abstract | Full Text |
Abstract : A digital retinal fundus image is analyzed for the
classification and staging of Diabetic Retinopathy (DR). This
is important because many heart, lung and kidney related
problems could be predicted well in advance by analyzing the
fundus image itself, which is a cost-effective technique.
Also, later stages of DR cause abnormal changes in the human
retina, and vision loss occurs in most patients. The aim is to
extract abnormal features such as microaneurysms, exudates
and blood vessels in order to classify DR at its
non-proliferative stage itself. At the same time, healthy or
normal features such as the optic disk and the blood vessel
map are to be removed by a suitable enhancement technique
(a morphological technique). The abnormal features and their
normal ground features are used to train the classifier. The
classification is done using the ANFIS architecture. The
detection and classification method provides additional and
promising data and information to ophthalmologists and
analysts for further treatment. The proposed method is
expected to give promising accuracy when compared with the
methods available in the literature.
|
|
Investigating Performance and Quality in Electronic Industry via Data Mining Techniques |
Author : Davoud GholamianGonabadi , Seyed Mohamad Hosseinioun , Jamal Shahrabi , Mohammad AliMoradi |
Abstract | Full Text |
Abstract :
Organizations have always sought ways and
methodologies to boost the quality level of their products.
Obviously, the economic aspect ought to be taken into account,
avoiding unreasonable costs in the system. Data mining can be
applied as one of the best techniques at hand for the analysis
and prediction of performance and quality. In this work, we
have used a new methodology based on data mining to predict
and improve the quality level of electronic parts. The
results suggest better accuracy compared to previous
studies.
|
|
A New Arithmetic Encoding Algorithm Approach for Text Clustering |
Author : Nikhil Pawar , P.K Deshmukh |
Abstract | Full Text |
Abstract :In this paper we propose a new method for
improving the clustering accuracy of text data. Our method
encodes the string values of a dataset using the arithmetic
encoding algorithm and declares these attributes as integers
in the clustering phase. In the experimental part, we evaluate
the efficiency of the proposed method and obtain better
clustering accuracy than that found with traditional
methods. This method is useful when the dataset to be
clustered has only string attributes, because in this case a
traditional clustering method does not recognize, or recognizes
with low accuracy, the category of instances.
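How a string attribute might be turned into an integer with arithmetic encoding can be sketched as follows. The abstract does not give the exact scheme, so this is an assumption: a character frequency model is built from the data set, each string is narrowed to a sub-interval of [0, 1) using exact fractions, and the lower bound of the interval is scaled to an integer. The function names and the scale factor are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def build_model(strings):
    """Map each character to (cumulative start, probability) from its frequency."""
    counts = Counter(ch for s in strings for ch in s)
    total = sum(counts.values())
    model, cumulative = {}, Fraction(0)
    for ch in sorted(counts):
        p = Fraction(counts[ch], total)
        model[ch] = (cumulative, p)
        cumulative += p
    return model


def encode_interval(text, model):
    """Narrow [0, 1) once per character; returns (low, width) of the final interval."""
    low, width = Fraction(0), Fraction(1)
    for ch in text:
        start, p = model[ch]
        low += width * start
        width *= p
    return low, width


def to_integer(text, model, scale=10**6):
    """Scale the lower bound of the interval to an integer attribute value."""
    low, _ = encode_interval(text, model)
    return int(low * scale)


values = ["ab", "ba"]                 # hypothetical string attribute values
model = build_model(values)
codes = [to_integer(v, model) for v in values]
```

Strings that share a prefix receive nearby codes, and distinct strings can collide once the interval is truncated to an integer, so the scale would need to be chosen with the data set in mind.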
|
|