
University of Louisville, CECS Department

======================================

DATA MINING - CECS 632

 

COURSE SYLLABUS

 

Instructor:     

Dr. Mehmed M. Kantardzic, Professor                                               Phone: (502) 852-3703

CECS Department, Speed School of Engineering                    E-mail: mmkant01@louisville.edu

University of Louisville                                                                       Office:  DC 210

 

 

Course Description:

 

This course will introduce the concepts, models, methods, and techniques of data mining, including artificial neural networks, association rules, and decision trees. Some basic principles of data warehousing will be explained, with emphasis on the relation between the data-mining and data-warehousing processes. Some software tools and successful real-world data-mining applications will also be introduced.

 

Course Objectives:

 

After taking this course, the student should:

  1. Understand the basics of the data-mining process and the requirements in each of its phases for building a successful application.
  2. Understand the basic data-mining techniques and be able to use standard software tools for data mining, or to develop new ones.

 

Textbook:

           

  • Mehmed Kantardzic, "Data Mining: Concepts, Models, Methods, and Algorithms", second edition, IEEE Press & John Wiley, 2011.

 

Recommended References:

 

  1. Tan P., Steinbach M., Kumar V., Introduction to Data Mining, Addison-Wesley, Boston, MA, 2006.
  2. Han J., Kamber M., Data Mining: Concepts and Techniques, second edition, Morgan Kaufmann, San Francisco, 2006.

  3. Hand D., Mannila H., Smyth P., Principles of Data Mining, The MIT Press, Cambridge, MA, 2001.

 

Grading Policy:         

 

Course projects, tests, homework assignments, and Blackboard discussions of specific topics are the elements of the final grade. Homework assignments will be given online with due dates; the penalty for late assignments is 10% per day. During the course, the instructor will post several topics for discussion on Blackboard, and these activities will also be graded. Course projects are either practical applications of standard data-mining tools to experimental data sets (both tools and data sets are available on the Internet) or implementations of new data-mining algorithms in one of the standard programming languages. Final reports on the projects are required. In addition to achieving a passing cumulative score (typically 60%), a minimum performance of 50% must also be achieved in each of the following areas (given with their weights):

 

Two Tests (20% + 20%)                    40%
Two Projects (20% + 20%)                 40%
Homework assignments                     15%
Blackboard discussion activities          5%
============================================
Total                                   100%
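
As a quick illustration of how these weights combine, the following Python snippet computes the cumulative score and checks the 50% floor in every area for a set of hypothetical scores (invented numbers, not real course data):

# Hypothetical per-area scores (percent); the weights come from the table above.
weights = {"Test 1": 20, "Test 2": 20, "Project 1": 20, "Project 2": 20,
           "Homeworks": 15, "Blackboard discussions": 5}
scores  = {"Test 1": 78, "Test 2": 85, "Project 1": 90, "Project 2": 88,
           "Homeworks": 95, "Blackboard discussions": 100}

cumulative = sum(weights[k] * scores[k] for k in weights) / 100   # weighted average, in percent
floor_met  = all(scores[k] >= 50 for k in weights)                # 50% minimum in every area

print(f"Cumulative score: {cumulative:.2f}%")   # 87.45% for these hypothetical scores
print(f"50% floor met in every area: {floor_met}")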

Grading Scale:

A                     90%  & up

B                      80% - 89.99%

C                      70% - 79.99%

D                     60% - 69.99%

F                      below 60%

 

Cheating Policy:

 

Students are encouraged to work together and learn from each other. However, cheating in any form on the exams, or copying of any kind, will not be tolerated. Any evidence of cheating will result in a failing grade for the course, and appropriate notification will be made to the Dean's office.

 

General Policies:

 

1. If you bring food to class, bring enough to share with everyone.

2. Soft drinks and bottled water are permitted, as long as you drink quietly, do not spill, and properly dispose of empty containers.

3. Once the lecture starts, any student wishing to speak should raise his or her hand and wait for recognition from the instructor before speaking.

 

Students with Disabilities:

 

If you need accommodations because of a disability, or if you have emergency medical information to share with me, please send me an email during the first week or make an appointment to discuss it with me as soon as possible.

 

 

Oral and written communication requirements:

 

There will be no oral or written communication requirements beyond the homework assignments, tests, and projects.


Schedule:

 

The following is the tentative schedule of topics to be covered during the semester.

 

_____________________________________________________________________________ 

UNIT 1:  Data mining concepts

 

Required reading:

 

-  Slides for Unit 1

-  Textbook:  Chapter 1: "Data Mining Concepts", pp. 1-25.

 

            Additional reading:

 

-          Jeffrey W. Seifert, Data Mining: An Overview

http://www.fas.org/irp/crs/RL31798.pdf 

 

-     Kurt Thearling, An Introduction to Data Mining,

http://www.thearling.com/text/dmwhite/dmwhite.htm

 

____________________________________________________________________________  

UNIT 2:  Data Mining Tasks & Data Mining Algorithms

 

Required reading:

 

-  Slides for Unit 2

-  Textbook: Chapter 4, Section 4.4: "Common Learning Tasks", pp. 101-105.

 

            Additional reading:

 

                        -    Padhraic Smyth, Data Mining at the Interface of Computer Science and Statistics,

                        http://www.datalab.uci.edu/papers/dmchap.pdf          

 

            Assignment for Units 1 and 2:

                        -  Homework 1            

 

___________________________________________________________________________  

UNIT 3:  Preparing the Data

 

Required reading:

 

-  Slides for Unit 3

-  Textbook:  Chapter 2: "Preparing the Data", pp. 26-52.

 

            Additional reading:

 

-   Les A. Hook, et al., Best Practices for Preparing Environmental Data Sets to Share and Archive

http://daac.ornl.gov/PI/BestPractices-2010.pdf 

 

-    Charu C. Aggarwal, Philip S. Yu, Outlier Detection for High Dimensional Data,

                        http://charuaggarwal.net/outl.pdf
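
As an optional illustration of the data-preparation ideas in this unit (not part of the assigned readings), the short Python/NumPy sketch below scales a feature and flags a suspicious value; the raw numbers are invented for the example:

import numpy as np

# Hypothetical raw values for a single feature (invented for illustration).
x = np.array([12.0, 15.0, 14.0, 10.0, 95.0, 13.0])

x_minmax = (x - x.min()) / (x.max() - x.min())   # min-max scaling to [0, 1]
x_z = (x - x.mean()) / x.std()                   # z-score standardization

# A simple threshold on the z-score flags the value 95.0 as a candidate outlier.
print("scaled:   ", x_minmax.round(2))
print("z-scores: ", x_z.round(2))
print("outliers: ", x[np.abs(x_z) > 2])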

 

________________________________________________________________________________   

UNIT 4:  Data Reduction: features, values, & cases reduction

 

Required reading:

 

-  Slides for Unit 4

-  Textbook:  Chapter 3: "Data Reduction", pp. 53-86

 

            Additional reading:

 

-           Dim-red-1.pdf  document

 

-          Dim-red-2.pdf  document

 

-          Fodor, I. K., A survey of dimension reduction techniques,

                        https://computation.llnl.gov/casc/sapphire/pubs/148494.pdf  or

                        https://e-reports-ext.llnl.gov/pdf/240921.pdf

 

            Assignment for Units 3 and 4:

                        -  Homework 2
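
As an optional sketch of dimensionality reduction, the snippet below uses principal component analysis from scikit-learn (a tool choice assumed here, not required by the course) to compress synthetic, correlated features while keeping 95% of the variance:

import numpy as np
from sklearn.decomposition import PCA

# Synthetic 5-dimensional data; one feature is made nearly redundant on purpose.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[:, 3] = 2.0 * X[:, 0] + 0.1 * rng.normal(size=100)

pca = PCA(n_components=0.95)        # keep enough components for 95% of the variance
X_reduced = pca.fit_transform(X)

print("original dimensions:", X.shape[1])
print("reduced dimensions: ", X_reduced.shape[1])
print("explained variance ratios:", pca.explained_variance_ratio_.round(3))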

 

 

_________________________________________________________________________________   

UNIT 5:  Learning from Data

 

Required reading:

 

-  Slides for Unit 5

-  Textbook:  Chapter 4: "Learning from Data", Sections 4.1-4.3 and 4.7-4.11, pp. 87-105 and 122-139.

 

            Additional readings:

 

-  Olivier Bousquet, Stephane Boucheron, and Gabor Lugosi, Introduction to Statistical Learning Theory,

                        http://www.econ.upf.es/~lugosi/mlss_slt.pdf  
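
A recurring question when learning from data is how well a fitted model will generalize. As an optional illustration (assuming scikit-learn), k-fold cross-validation estimates this by averaging test performance over several train/test splits:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# 5-fold cross-validation: train on 4/5 of the data, test on the remaining 1/5, rotate.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print("fold accuracies:", scores.round(3))
print("mean accuracy:  ", scores.mean().round(3))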

 

 

_________________________________________________________________________________  

UNIT 6:  Support Vector Machines

 

 

Required reading:

 

-  Slides for Unit 6

-  Textbook:  Chapter 4: "Learning from Data", Sections 4.5-4.6, pp. 105-122.

 

            Additional reading:

 

-    Christopher J.C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition,

http://www.umiacs.umd.edu/~joseph/support-vector-machines4.pdf

 

-    Pai-Hsuen Chen, Chih-Jen Lin, and Bernhard Scholkopf,  A Tutorial on Support Vector Machines,

http://www.kernel-machines.org/index.html

 

-          Colin Campbell, SVM, video lecture, 2008

                        http://videolectures.net/epsrcws08_campbell_isvm/

           

            Assignment for Units 5 and 6:

                        -  Homework 3
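
As an optional companion to the SVM readings (a minimal sketch assuming scikit-learn, which the course does not prescribe), an RBF-kernel support vector classifier can be trained and evaluated in a few lines; feature scaling matters because the kernel is distance based:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Standardize the features, then fit an RBF-kernel support vector machine.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X_train, y_train)
print("test accuracy:", round(model.score(X_test, y_test), 3))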

 

 

________________________________________________________________________________  

UNIT 7:  Statistical Inference          

 

Required reading:

 

-  Slides for Unit 7

-  Textbook:  Chapter 5: "Statistical Methods", pp. 140-168.

 

            Additional readings:

 

                        -  Lindsay I. Smith, A Tutorial on Principal Components Analysis,

http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf

 

 

-  Jon Shlens, A Tutorial On Principal Component Analysis: Derivation, Discussion and Singular Value Decomposition,

http://www.cs.princeton.edu/picasso/mats/PCA-Tutorial-Intuition_jp.pdf

 

-  Ella Bingham, Advances In Independent Component Analysis With Applications To Data Mining,

http://www.cis.hut.fi/ella/thesis/thesis.pdf

 

 

-  D. Mease, Statistical Aspects of Data Mining, video lecture

http://www.youtube.com/watch?v=zRsMEl6PHhM
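
As an optional illustration of a statistical classification technique related to this unit, the naive Bayesian classifier applies Bayes' rule under a feature-independence assumption; the sketch assumes scikit-learn and its bundled Iris data:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Gaussian naive Bayes: class-conditional densities modeled as independent normals.
nb = GaussianNB().fit(X_train, y_train)
print("test accuracy:", round(nb.score(X_test, y_test), 3))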

 

 

 

_____________________________________________________________________________   

UNIT 8:  Decision Trees & Decision Rules      

 

Required reading:

 

-  Slides for Unit 8

-  Textbook:  Chapter 6: "Decision Trees and Decision Rules", pp. 169-198.

 

            Additional readings:

 

                        -       Howard J. Hamilton, Decision Tree Construction,

http://www.aaai.org/AITopics/html/trees.html#good

 

-          Murthy S., Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey, 2006

                        http://www.cs.nyu.edu/~roweis/csc2515-2006/readings/murthy_dt.pdf

 

            Assignment for Units 7 and 8:

                        -  Homework 4
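
As an optional sketch (assuming scikit-learn), a small decision tree can be trained and its learned rules printed in if-then form, mirroring the tree-to-rules conversion discussed in this chapter:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Print the learned decision rules in readable if-then form.
print(export_text(tree, feature_names=load_iris().feature_names))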

==================================

NOTE: Units 1-8 will be covered on Test 1.

==================================

______________________________________________________________________________   

UNIT 9:  Artificial Neural Networks: Models & Architectures, Multilayer Perceptron

 

Required reading:

 

-  Slides for Unit 9

-  Textbook:  Chapter 7: "Artificial Neural Networks", Sections 7.1-7.5, pp. 199-221.

 

            Additional reading:

 

-     Rolf Pfeifer, Neural Nets

                        http://ailab.ifi.uzh.ch/images/stories/teaching/nn10/script/NN03032010.pdf   

 

-          Hinton G., The Next Generation of Neural Networks, video lecture

                        http://www.youtube.com/watch?v=AyzOUbkUf3M  
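
As an optional sketch of a multilayer perceptron trained by backpropagation (assuming scikit-learn; the hidden-layer size and iteration limit are arbitrary choices for the example):

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One hidden layer of 64 units, trained with backpropagation.
mlp = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0))
mlp.fit(X_train, y_train)
print("test accuracy:", round(mlp.score(X_test, y_test), 3))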

 

______________________________________________________________________________   

UNIT 9(a):  Ensemble Learning

 

Required reading:

 

-  Slides for Unit 9(a)

-  Textbook:  Chapter 8: "Ensemble Learning", pp. 235-248.
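
As an optional illustration of the ensemble idea (assuming scikit-learn), combining many decision trees trained on bootstrap samples, as a random forest does, typically beats a single tree on the same data:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Compare a single tree with an ensemble of 100 trees via cross-validated accuracy.
single = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5).mean()
forest = cross_val_score(RandomForestClassifier(n_estimators=100, random_state=0), X, y, cv=5).mean()
print(f"single tree:   {single:.3f}")
print(f"random forest: {forest:.3f}")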

 

______________________________________________________________________________   

UNIT 10:  Artificial Neural Networks: Competitive Networks & Kohonen Maps

 

Required reading:

 

-  Slides for Unit 10

-  Textbook:  Chapter 7: "Artificial Neural Networks", Sections 7.6-7.9, pp. 221-234.

 

            Additional reading:

 

                        -  Tom Germano, Self Organizing Maps,

http://davis.wpi.edu/~matt/courses/soms/#Introduction

or: http://www.cs.usyd.edu.au/~irena/ai01/nn/som.html
or: http://www.cis.hut.fi/research/som-research/

 

 

-  Michael J. Radwin, Nearest-Neighbor Machine Learning Bakeoff

http://www.radwin.org/michael/projects/learning/

 

 

            Assignment for Units 9, 9(a) and 10:

                        -  Homework 5
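
As an optional, from-scratch sketch of the competitive learning step behind self-organizing maps (pure NumPy; the grid size, learning rate, neighborhood width, and data are invented for illustration), each input pulls the best-matching unit and its grid neighbors toward itself:

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 3))                 # hypothetical 3-D inputs
grid = rng.normal(size=(5, 5, 3))                # 5x5 map of weight vectors

eta, sigma = 0.5, 1.0                            # learning rate and neighborhood width
rows, cols = np.indices((5, 5))

for x in data:
    # Best-matching unit: the node whose weight vector is closest to the input.
    d = np.linalg.norm(grid - x, axis=2)
    bi, bj = np.unravel_index(d.argmin(), d.shape)
    # Gaussian neighborhood on the grid, centered at the winner.
    h = np.exp(-((rows - bi) ** 2 + (cols - bj) ** 2) / (2 * sigma ** 2))
    # Move every node toward the input, weighted by its neighborhood value.
    # (A full SOM would also decay eta and sigma over time.)
    grid += eta * h[:, :, None] * (x - grid)

print("trained map shape:", grid.shape)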

 

_______________________________________________________________________     

UNIT 11:  Clustering

 

Required reading:

 

-  Slides for Unit 11

-  Textbook:  Chapter 9: "Cluster Analysis", pp. 249-279.

 

            Additional reading:

 

                        -   A.K. Jain, et al., Data Clustering: A Review,

http://nd.edu/~flynn/papers/Jain-CSUR99.pdf

 

 

-   Glenn Fung, A Comprehensive Overview of Basic Clustering Algorithms,

http://www.cs.wisc.edu/~gfung/clustering.pdf
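
As an optional sketch of partitional clustering (assuming scikit-learn), k-means groups synthetic two-dimensional points into three clusters and reports the learned centroids:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2-D data with three well-separated groups.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster centers:\n", kmeans.cluster_centers_.round(2))
print("first ten labels:", kmeans.labels_[:10])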

 

 

_________________________________________________________________________      

UNIT 12:  Association rules

 

Required reading:

 

-  Slides for Unit 12

-  Textbook:  Chapter 10: "Association Rules", pp. 280-299.

 

            Additional reading:

 

-          R. Agrawal, R. Srikant, Fast Algorithms for Mining Association Rules

http://rakesh.agrawal-family.com/papers/vldb94apriori.pdf

 

 

-          K. Ming Leung, Association Rules, 2007

http://cis.poly.edu/~mleung/FRE7851/f07/AssociationRules3.pdf

 

 

-    Markus Hegland, Algorithms For Association Rules,

http://www.ims.nus.edu.sg/Programs/imgsci/files/heglandm1.pdf

 

            Assignments for Units 11 and 12:

                        -  Homework 6
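
As an optional worked example of the two basic measures behind association rules (plain Python; the market-basket transactions are invented), support and confidence for the rule {diapers} -> {beer} can be computed directly:

# Toy market-basket transactions (hypothetical).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

# Rule {diapers} -> {beer}: confidence = support(X union Y) / support(X).
sup_xy = support({"diapers", "beer"})
conf = sup_xy / support({"diapers"})
print(f"support = {sup_xy:.2f}, confidence(diapers -> beer) = {conf:.2f}")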

 

____________________________________________________________________________   

UNIT 13:  Web Mining & Text Mining

 

Required reading:

 

-  Slides for Unit 13

-  Textbook:  Chapter 11: "Web Mining and Text Mining", pp. 300-327.

 

            Additional reading:

 

                        -      Johannes Fürnkranz, Web Mining,

http://www.ke.informatik.tu-darmstadt.de/~juffi/publications/web-mining-chapter.pdf

 

-     Bettina Berendt, et al., A Roadmap for Web Mining: From Web to Semantic Web,

http://eprints.pascal-network.org/archive/00000841/01/roadmap.pdf

 

-     Marti A. Hearst, Untangling Text Data Mining,

http://www.sims.berkeley.edu/%7Ehearst/papers/acl99/acl99-tdm.html

 

-          M. Grobelnik, Text mining, video lecture,  2005

http://videolectures.net/acai05_grobelnik_tm/

 

-          William Cohen, Text Classification, 2006

http://videolectures.net/mlas06_cohen_tc/
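
As an optional sketch of how text is usually turned into numeric features before mining (assuming scikit-learn; the three documents are invented), a TF-IDF vectorizer maps each document to a weighted term vector:

from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["data mining finds patterns in data",
        "web mining applies data mining to web content",
        "text mining extracts information from documents"]

vec = TfidfVectorizer()
X = vec.fit_transform(docs)                      # sparse document-term matrix

print("vocabulary size:", len(vec.get_feature_names_out()))
print("matrix shape:   ", X.shape)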

 

 

 

_________________________________________________________________________________  

UNIT 14:  Visual Data Mining

 

Required reading:

 

-  Slides for Unit 14

-  Textbook:  Chapter 15: "Visualization Methods", pp. 447-469.

 

            Additional reading:

 

-  Kurt Thearling, Barry Becker, Dennis DeCoste, Bill Mawby, Michel Pilote, and Dan Sommerfield, Visualizing Data Mining Models,

http://www.thearling.com/text/dmviz/modelviz.htm

 

-  Horst Eidenberger, Visual Data Mining,

http://www.ims.tuwien.ac.at/media/documents/publications/itcom2004-mining.pdf


                        -  Edward J. Wegman, Visual Data Mining,

                        http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.94.7925

 

 

            Assignment for Units 13 and 14:

                        -  Homework 7 
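
As an optional sketch of a basic visual-exploration step (assuming matplotlib and scikit-learn), projecting a data set onto its first two principal components and plotting the result is a common starting point:

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)
X2 = PCA(n_components=2).fit_transform(X)        # project onto the first two components

plt.scatter(X2[:, 0], X2[:, 1], c=y, cmap="viridis", s=20)
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.title("Iris data projected onto two principal components")
plt.savefig("iris_pca.png")                      # saved to a file for viewing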

 

==================================

NOTE: Units 9-14 will be covered on Test 2.

==================================

Project 1 covers methodologies explained in Units 1-9, and Project 2 should use methods explained in Units 10-14.

__________________________________________________________________________________  

   
 
