Ph.D. in Computer Science and Engineering: Data Warehousing & Data Mining
Shri Mata Vaishno Devi University
Data mining (or data discovery) is the process of autonomously extracting useful information or knowledge ("actionable assets") from large data stores or sets. Data mining can be performed on a variety of data stores, including the World Wide Web, relational databases, transactional databases, internal legacy systems, PDF documents, and data warehouses. Many organizations have compiled diverse collections of very large and dynamic datasets over the years. Data mining is a tool that has been actively used to discover interesting and surprising patterns in these datasets. The technology has been used successfully by organizations that collect web click streams, financial transactions, observational science data, etc. Our research work would cover major algorithmic advances in data mining, with a thrust towards both the theoretical underpinnings of problems and successful practical deployments. Topics covered in our research would include clustering, association rules, machine learning, web link analysis, data streams, and privacy-preserving algorithms.
Through data mining techniques, a knowledge model is obtained that represents behavior patterns in the relevant problem variables or the relations between them. Several algorithms are usually tested, each generating a different model.
The most common algorithms or techniques are:
IDT (Induction of Decision Trees)
Fuzzy techniques (fuzzy logic, fuzzy sets, etc.)
SVM (Support Vector Machines)
Bayesian Networks, etc.
Data mining attempts to identify valid, novel, potentially useful, and ultimately understandable patterns in huge volumes of data. The mined patterns must be ultimately understandable because the purpose of data mining is to aid decision-making. A data mining algorithm is usually inherently associated with some representation of the patterns it mines. Therefore, an important aspect of a data mining algorithm is the comprehensibility of the representations it forms, that is, whether or not the algorithm encodes the patterns it mines in such a way that they can be inspected and understood by human beings.
It is evident that data mining algorithms with good comprehensibility are very desirable. Unfortunately, most data mining algorithms are not very comprehensible, and therefore their comprehensibility has to be enhanced by extra mechanisms. Since there are many different data mining tasks and corresponding data mining algorithms, it is difficult for such a short article to cover all of them. So, the following discussion is restricted to the comprehensibility of classification algorithms, but the main points also apply to other kinds of data mining algorithms.
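To illustrate what a comprehensible representation can look like, the following minimal sketch induces a one-level decision tree (a "decision stump") by information gain and prints it as IF-THEN rules that a person can inspect. The toy attributes (`outlook`, `windy`), the labels, and all function names are assumptions made for this illustration; they do not come from the research described here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def partition(rows, labels, attr):
    """Group the labels by the value each row takes for one attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    return groups


def best_split(rows, labels):
    """Return (attribute, information gain) for the most informative attribute."""
    base = entropy(labels)
    best_attr, best_gain = None, -1.0
    for attr in rows[0]:
        groups = partition(rows, labels, attr)
        remainder = sum(len(g) / len(labels) * entropy(g)
                        for g in groups.values())
        gain = base - remainder
        if gain > best_gain:
            best_attr, best_gain = attr, gain
    return best_attr, best_gain


def stump_rules(rows, labels):
    """Express the one-level tree as human-readable IF-THEN rules."""
    attr, _ = best_split(rows, labels)
    groups = partition(rows, labels, attr)
    return [f"IF {attr} = {value} THEN {Counter(g).most_common(1)[0][0]}"
            for value, g in sorted(groups.items())]


# Hypothetical toy data, for illustration only.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]

for rule in stump_rules(rows, labels):
    print(rule)
```

On this toy data the stump splits on `outlook`, because that attribute separates the two classes perfectly while `windy` carries no information. The resulting rules can be read directly, which is the property that less transparent classifiers lack.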
With the unprecedented rate at which data is being collected today in almost all fields of human endeavor, there is an emerging economic and scientific need to extract useful information from it. Data mining is the process of automatic discovery of patterns, changes, associations and anomalies in massive databases, and is a highly interdisciplinary field representing the confluence of several disciplines, including database systems, data warehousing, machine learning, statistics, algorithms, data visualization, and high-performance computing.
Data mining refers to the automated or semi-automated search for relationships and global patterns within data. Data mining techniques include data visualization, neural network analysis, and genetic algorithms. Data mining uses complex algorithms to search large amounts of data and find patterns, correlations, and trends in that data. A data mining application can create a model that identifies buying habits, shopping trends, and credit card purchase patterns, and can also perform many non-commercial functions. Data mining, also known as knowledge discovery in databases (KDD), is the practice of automatically searching large stores of data for patterns. To do this, data mining uses computational techniques from statistics and pattern recognition.
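As a small illustration of how a buying-habit pattern can be quantified, the sketch below computes the support and confidence of an association rule over a handful of shopping baskets. The baskets and the function names are hypothetical, and the sketch only shows the two standard measures, not a full rule-mining algorithm.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) from the transactions."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base


# Hypothetical shopping baskets, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support(baskets, {"bread", "milk"}))       # share of baskets with both items
print(confidence(baskets, {"bread"}, {"milk"}))  # how often bread buyers also buy milk
```

A rule such as "bread implies milk" is usually reported only when both measures exceed user-chosen thresholds.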
As data mining has become recognized as a powerful tool, several different communities have laid claim to the subject:
AI, where it is called machine learning.
Researchers in clustering algorithms.
Databases. We'll be taking this approach, of course, concentrating on the challenges that appear when the data is large and computations are complex. In a sense, data mining can be thought of as algorithms for executing very complex queries on non-main-memory data.
In recent years, the database and data mining communities have focused on a new model of data processing, in which data arrives in the form of continuous streams. Because it is not feasible to store all of the data, it is quite challenging to perform traditional data mining operations in a streaming environment. Our current and proposed research focuses on the challenges associated with mining streaming data. Our main thrust would be on designing algorithms for frequent itemset mining that are effective and efficient and that provide deterministic bounds on accuracy. The recent trend in algorithm development for this purpose is towards algorithms that are memory efficient and allow mining of datasets with a large number of distinct items and/or very low support levels.
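As a minimal sketch of what a deterministic accuracy bound means in a streaming setting, the example below implements the Misra-Gries summary, a well-known one-pass technique. It is a simplification chosen for illustration: it tracks frequent single items rather than full itemsets, and it is not the algorithm proposed in this research. With at most k - 1 counters, each reported count is never above the true count and falls short of it by at most n / k, where n is the stream length. The function name and the sample stream are assumptions.

```python
from collections import Counter


def misra_gries(stream, k):
    """One-pass frequent-item summary using at most k - 1 counters.

    For every item, the returned count c satisfies
    true_count - n / k <= c <= true_count, where n is the stream length
    (items missing from the summary have c = 0).
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop the zeros.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Hypothetical stream, for illustration only.
stream = list("abacabadabaeab")
summary = misra_gries(stream, k=3)
true_counts = Counter(stream)

for item, true_count in sorted(true_counts.items()):
    estimate = summary.get(item, 0)
    print(item, "true:", true_count, "estimate:", estimate)
```

Memory use depends only on k, not on the number of distinct items in the stream, which is the property that makes summaries of this kind attractive when the data cannot be stored.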