
Fundamenta Informaticae 90 (2009) 395–426
DOI 10.3233/FI-2009-0026
IOS Press

3DM: Domain-oriented Data-driven Data Mining

Guoyin Wang∗† and Yan Wang‡
Institute of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, 400065, P.R. China
wanggy@cqupt.edu.cn; wangyan@lut.cn

Abstract. Recent developments in computing, communications, digital storage technologies, and high-throughput data-acquisition technologies make it possible to gather and store incredible volumes of data. This creates unprecedented opportunities for knowledge discovery from large-scale databases. Data mining technology is a useful tool for this task. It is an emerging area of computational intelligence that offers new theories, techniques, and tools for processing large volumes of data, such as data analysis and decision making. Countless researchers are working on designing efficient data mining techniques, methods, and algorithms. Unfortunately, most data mining researchers pay much attention to technical problems in developing data mining models and methods, and little to the basic issues of data mining. What is data mining? What is the product of a data mining process? What are we doing in a data mining process? What is the rule we should obey in a data mining process? What is the relationship between the prior knowledge of domain experts and the knowledge mined from data? In this paper, we address these basic issues of data mining from the viewpoint of informatics[1]. Data is taken as a man-made format for encoding knowledge about the natural world. We take data mining as a process of knowledge transformation. A domain-oriented data-driven data mining (3DM) model based on a conceptual data mining model is proposed.
Some data-driven data mining algorithms are also proposed to show the validity of this model, e.g., the data-driven default rule generation algorithm, the data-driven decision tree pre-pruning algorithm, and data-driven knowledge acquisition from concept lattices.

Keywords: Domain-oriented, Data-driven, Data Mining

∗Address for correspondence: Institute of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, 400065, P.R. China. Also works at: School of Information Science & Technology, Southwest Jiaotong University, Chengdu, 610031, P.R. China.
†This paper is partially supported by the National Natural Science Foundation of P.R. China under Grants No. 60573068 and No. 60773113, and the Natural Science Foundation of Chongqing under Grants No. 2008BA2017 and No. 2008BA2041.
‡Also works at: College of Computer and Communication, Lanzhou University of Technology, Lanzhou, 730050, P.R. China.

396 G. Wang and Y. Wang / 3DM: Domain-oriented Data-driven Data Mining

1. Introduction

Recent developments in computing, communications, digital storage technologies, and high-throughput data-acquisition technologies make it possible to gather and store incredible volumes of data. This creates unprecedented opportunities for knowledge discovery from large-scale databases. Data mining technology is a useful tool for this problem. Data mining, as a relatively new branch of computer science, has received much attention in recent years. It is motivated by our desire to obtain knowledge from huge amounts of data[2]. It uses machine learning, statistical, and visualization techniques to discover knowledge from data and represent it in a form that is easily comprehensible and usable for humans. Data mining has become an active field in artificial intelligence.

Data mining is an interdisciplinary field. Many data mining methods are based on extensions, combinations, and adaptations of machine learning algorithms, statistical methods, and knowledge extraction and abstraction.
During the past twenty years, many techniques have been used in data mining, such as artificial neural networks, fuzzy sets, rough sets, decision trees, genetic algorithms, nearest neighbor methods, statistics-based rule induction, linear regression, and linear predictive coding.

There are many views on the study of data mining. The existing studies of data mining can be classified roughly into three views[2]. The first is the function-oriented view. It defines data mining as “the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns from data”. Function-oriented approaches focus on searching, mining, and utilizing the functionalities of different patterns embedded in various databases. The second is the theory-oriented view. It focuses on the theoretical aspects of data mining and on the related disciplines. The third is the procedure-oriented view. It pays attention to making the mining process effective and efficient. In other words, its objective is to improve the performance of algorithms.

Regardless of which view is adopted, most data mining researchers pay much attention to technical problems in developing data mining models and methods, and little to the basic issues of data mining processes. In other words, little attention is paid to the internal information processing mechanisms of data mining processes. What is data mining? What is the product of a data mining process? What are we doing in a data mining process? What is the rule we should obey in a data mining process? What is the relationship between the prior knowledge of domain experts and the knowledge mined from data? To answer these questions, we need to study the basic mining process.

At present, only a few results about these questions have been reported. A three-layered conceptual framework is proposed by Yao[3].
It consists of the philosophy layer, the technique layer, and the application layer. The three layers represent the understanding, discovery, and utilization of knowledge, respectively. The philosophy layer investigates the essentials of knowledge. Many issues are related to this question, such as the representation of knowledge and the expression and communication of knowledge in languages. Peng et al. propose a systemic framework for the field of data mining and knowledge discovery[4]. Its objective is to identify the research areas of data mining and knowledge discovery. S. Ohsuga[5] considers data mining from the viewpoint that knowledge acquisition is a translation from non-symbolic to symbolic representation, and discusses the relation between symbolic and non-symbolic processing. In addition, international workshops on the foundations of data mining have been held[6, 7, 8].

Unfortunately, there is still no well-accepted and non-controversial answer to many of the basic questions mentioned above. In this paper, we address these questions and propose our answers based on a conceptual data mining model. Our answer is that “data mining is a process of knowledge transformation”. According to this understanding, a domain-oriented data-driven data mining (3DM) model is proposed. Some recent achievements of our work on data-driven data mining techniques are also presented to show the validity of the 3DM model.

The rest of the paper is organized as follows. In Section 2, a model of domain-oriented data-driven data mining is proposed. The problem of knowledge uncertainty measurement is discussed in Section 3. In Section 4, some data-driven knowledge acquisition methods are presented. Experimental results are discussed in Section 5. Finally, Section 6 concludes the paper.

2. A Model of Domain-oriented Data-driven Data Mining

2.1. Data-driven Data Mining

Data mining is defined as “the nontrivial extraction of implicit, previously unknown, and potentially useful knowledge from data”[9]. Knowledge exists everywhere, and it is very important for our daily life and work. Knowledge can be encoded in many different forms. The easiest forms to use might be symbolic ones such as formulas, equations, rules, and theorems. It is very easy for people to understand and use knowledge encoded in these forms. These forms are often used in books, documents, and even expert systems. Data is also a man-made form for encoding knowledge. Numerous data records are generated in many fields. Many natural phenomena, rules, and even human experiences are recorded in databases every day. Much useful information can be derived from data. Unfortunately, people cannot directly read, understand, or use the knowledge expressed in data. So, in our view, a data mining process transforms knowledge from a data form, which is not understandable for humans, into an understandable symbolic form such as a rule, formula, or theorem. No new knowledge is generated in a data mining process. That is, we are just transforming knowledge from one form into another, not producing new knowledge.

To understand the knowledge transformation process of data mining, it helps to first look at the knowledge transformation process between different systems.

The knowledge transformation process between different systems can be completed in many different ways. Reading and understanding are simple ways to transform knowledge from symbolic form into biological neural link form, while speaking and writing are two converse processes that transform knowledge from biological neural link form into symbolic form. Translating a book from one language into another is also a process of knowledge transformation, from one symbolic form into another symbolic form.
We learn from and study the natural world every day. This is a process of knowledge transformation from natural phenomenon form into biological neural link form. People also exchange their knowledge through spoken language and body language. In a data mining process, we transform knowledge from data form into symbolic form. Data can also be taken as a measurement result of the natural real world. Thus, there are many channels and ways for knowledge transformation between different systems. Fig. 1 is an illustration of such knowledge transformation processes.

Figure 1. Knowledge transformation among different forms

Figure 2. Translating a book in English into Chinese

From Fig. 1, one can see that data mining is a kind of knowledge transformation process that transforms knowledge from data form into symbolic form. Thus, no new knowledge is generated in a data mining process. Knowledge is just transformed from data form, which is not understandable for humans, into symbolic form, which is understandable for humans and easy to apply. It is similar to the process of translating a book from English into Chinese. In this translation process, the knowledge in the book itself should remain unchanged. What changes is just the coding form (language) of the knowledge. That is, the knowledge in the Chinese book should be the same as the knowledge in the English one. Fig. 2 is an illustration of this case.

Following this understanding of data mining, we obtain the knowledge transformation framework for data mining shown in Fig. 3. From Fig. 3, one can see that knowledge can be encoded in natural form, data form, symbolic form, and neural link form. That is, knowledge can be stored in a natural world system, a data system, a symbol system, or a biological neural network system.
The knowledge expressed in each form should have some properties, that is, the Pi's. There should be some relationship between the different forms of the same knowledge. In order to keep the knowledge unchanged in a data mining process, the properties of the knowledge should remain unchanged during the knowledge transformation process. Otherwise, there must be some mistake in the knowledge transformation process.

Figure 3. Knowledge transformation framework for data mining

The relationships between knowledge in natural form and data form, natural form and neural link form, and symbolic form and neural link form are omitted in Fig. 3. They are just like the relationship between knowledge in data form and symbolic form.

In a data mining process, the properties of knowledge in the data form should remain unchanged. This information can provide a guideline for designing data mining algorithms. It also helps us keep the knowledge in the data form unchanged in a data mining process. Unfortunately, the representation of knowledge is still an unsolved problem in artificial intelligence. We do not know all the properties of knowledge. It is still not known how many properties are enough or needed for knowledge representation. So, how can we keep the knowledge unchanged in a data mining process? Fortunately, we know some properties of knowledge representation, for example, the uncertainty of knowledge. These properties should not be changed in a data mining process if the knowledge is to remain unchanged. Thus, in order to keep the knowledge unchanged in a data mining process, we need to know some properties of the knowledge in data form and use them to control the data mining process so that the knowledge stays unchanged. This is the key idea of a data-driven data mining model.

There are three steps for designing a data-driven data mining method.

Step 1. Select a property of knowledge that can be measured in both the data form and the symbolic form used for encoding the knowledge generated from data.
Step 2. Measure the property of the knowledge in the data form and in the symbolic form.
Step 3. Use the property to control the data mining process and keep the knowledge unchanged.

The knowledge property is measured in two different systems, the data system and the symbolic system. This raises a problem. Is the measured result of the knowledge property in data form comparable to the result from the symbolic form? If not, how could we know whether the property is unchanged in the data mining process? So, we need to design a comparable measuring method for the selected property. That is, we need to establish some relationship between the knowledge property in data form and in symbolic form.

2.2. User-driven (Domain-driven) Data Mining

Many real-world data mining tasks, for instance financial data mining in capital markets, are highly constraint-based and domain-oriented. Such tasks target actionable knowledge discovery, which can provide a sound basis for taking appropriate actions. Some domain-driven or user-driven data mining methods for such tasks have been developed in recent years[10, 11, 12, 13, 14, 15, 16].

Zhang, Cao and Lin proposed the domain-driven in-depth pattern discovery framework shown in Fig. 4 for financial data mining in capital markets[10, 11].

Figure 4. Domain-driven in-depth pattern discovery framework

Their key ideas are as follows.

1. Dealing with constraint-based context.
• Data constraints.
• Domain constraints.
• Interestingness constraints.
• Rule constraints.
2. Mining in-depth patterns.
• In-depth patterns are highly interesting and actionable patterns in business decision-making.
• In-depth patterns are interesting not only to data miners, but also to business decision-makers. Actionable trading strategies can be found via model refinement or parameter tuning.
3. Supporting human-machine-cooperated interactive knowledge discovery.
• The in-depth pattern discovery is conducted with the cooperation of business analysts and data analysts.
4. Viewing data mining as a loop-closed iterative refinement process.
• The data mining process is closed with iterative refinement and feedback on hypotheses, features, models, evaluation, and explanations in a human-involved or human-centered context.

Yao and Zhao also proposed an interactive user-driven classification method using a granule network[12]. The key ideas of their method are:

1. It allows users to suggest preferred classifiers and structures.
2. It works interactively between users and machines.
3. Its input and output are interleaved, like a conversation.
4. A user can freely explore the dataset according to his/her preference and priority, ensuring that each classification stage and the corresponding results are all understandable and comprehensible.

Figure 5. Users access different knowledge from a data form knowledge base

Kuntz, Guillet, Lehn, and Briand also developed a human-centered process for discovering association rules, in which the user is considered as a heuristic that drives the mining algorithms via a well-adapted interface[13]. Han and Lakshmanan integrated both constraint-based and multidimensional mining into one framework that provides an interactive, exploratory environment for effective and efficient data analysis and mining[14]. For creating lexical knowledge bases, Patrick, Palko, Munro and Zappavigna proposed a semi-automatic approach that exploits training from a knowledgeable user to identify structural elements in the dictionary's stream of text. Once learnt from the user, the structures are then applied automatically to other text streams in the same document or to other documents[15].
In semantic image classification, Dorado, Pedrycz and Izquierdo used some domain knowledge about the classification problem as part of the training procedures[16].

Analyzing the above user-driven or domain-driven data mining methods, we find that they share some basic ideas.

1. A user-driven data mining process is constraint-based.
2. The user's interests are considered in a user-driven data mining process.
3. Prior knowledge of domain experts is required in a user-driven data mining process.
4. Interaction between user and machine is required in a user-driven data mining process.

2.3. Domain-oriented Data-driven Data Mining (3DM)

Is there any conflict between data-driven data mining and user-driven (or domain-driven) data mining? Could they be integrated into one system? We discuss this problem in this section.

In a database management system (DBMS), different users can access different data of a whole database system through their own views. If we take data as a form of knowledge representation, a database (data set) can also be taken as a knowledge base. So, different users can find and use different subsets of the whole knowledge base for their tasks. That is, through his/her view, a user can access a subset of the knowledge in data form and transform it from data form into whatever other form he/she requires. The knowledge transformation process for each user can still be done in a data-driven manner. Fig. 5 is an illustration of this understanding.

Figure 6. User's interests, constraints, and prior domain knowledge are all taken as input of a data mining process (Domain-oriented data-driven data mining, 3DM)

In a domain-driven data mining process, the user's interests, constraints, and prior domain knowledge are very important. Interaction between user and machine is needed. The data mining process might be controlled by a user.
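As a rough illustration of the view idea in this section combined with the three steps of Section 2.1, the sketch below lets a user select a view of a toy data set, measures one property in the data form, transforms the view into rules, and checks that the property is the same in the symbolic form. Everything in it is an assumption made for illustration: the records, the attribute names, the helper functions, and the choice of "share of the majority decision" as the measured property. It is not one of the paper's algorithms, and the paper's own uncertainty measure (its Section 3) is not shown in this excerpt.

```python
from collections import Counter, defaultdict

# Hypothetical toy data set; attribute names and values are invented.
RECORDS = [
    {"market": "stocks", "trend": "up", "volume": "high", "action": "buy"},
    {"market": "stocks", "trend": "up", "volume": "high", "action": "buy"},
    {"market": "stocks", "trend": "up", "volume": "high", "action": "hold"},
    {"market": "stocks", "trend": "down", "volume": "low", "action": "sell"},
    {"market": "bonds", "trend": "up", "volume": "low", "action": "hold"},
    {"market": "bonds", "trend": "down", "volume": "low", "action": "hold"},
]


def user_view(records, row_filter, attributes):
    """Return the subset of the data-form knowledge base that one user sees."""
    return [{a: r[a] for a in attributes} for r in records if row_filter(r)]


def group_decisions(view, conditions, decision):
    """Count decision values for each combination of condition values."""
    groups = defaultdict(Counter)
    for r in view:
        groups[tuple(r[c] for c in conditions)][r[decision]] += 1
    return groups


def data_certainty(view, conditions, decision):
    """Step 2, data form: share of the majority decision per condition class."""
    return {
        key: max(counts.values()) / sum(counts.values())
        for key, counts in group_decisions(view, conditions, decision).items()
    }


def mine_rules(view, conditions, decision):
    """Transform the view into symbolic rules: (conditions, decision, confidence)."""
    rules = []
    for key, counts in group_decisions(view, conditions, decision).items():
        label, hits = counts.most_common(1)[0]
        rules.append((dict(zip(conditions, key)), label, hits / sum(counts.values())))
    return rules


CONDITIONS = ["trend", "volume"]

# Domain-oriented part: this user is only interested in stock records.
stock_view = user_view(
    RECORDS, lambda r: r["market"] == "stocks", CONDITIONS + ["action"]
)

# Data-driven part: measure the property in both forms and compare (Step 3).
certainty = data_certainty(stock_view, CONDITIONS, "action")
rules = mine_rules(stock_view, CONDITIONS, "action")
preserved = all(
    abs(conf - certainty[tuple(cond[c] for c in CONDITIONS)]) < 1e-12
    for cond, _, conf in rules
)

for cond, label, conf in rules:
    print(cond, "->", label, f"(confidence {conf:.2f})")
print("property preserved:", preserved)
```

In this toy setting the check succeeds trivially, because the rules are built from the same counts used for the data-form measurement. In a more realistic miner that prunes or generalizes rules, a check of this kind is where the measured property would constrain the process.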
In this case, the knowledge source of the mining process includes both the data and the user, not just the data. So, the prior domain knowledge is also a source for the data mining process. The user's control of the data mining process can be taken as additional input to the process. It is just like the data generation process in an incremental dynamic data mining process. So, we may also deal with the user's control using incremental data-driven data mining methods. Fig. 6 is an illustration of this
