University of Louisville, CECS Department
======================================
DATA MINING - CECS 632
COURSE SYLLABUS
Instructor: Dr. Mehmed M. Kantardzic, Professor
CECS Department, Speed School of Engineering, University of Louisville
Phone: (502) 852-3703
E-mail: mmkant01@louisville.edu
Office: DC 210
Course Description:
This course introduces the concepts, models, methods, and techniques of data mining, including artificial neural networks, association rules, and decision trees. Basic principles of data warehousing will be explained, with emphasis on the relationship between the data-mining and data-warehousing processes. Selected software tools and successful real-world data-mining applications will also be introduced.
Course Objectives:
After taking this course, the student should:
- Understand the basics of the data-mining process and the requirements of each phase for building a successful application.
- Understand basic data-mining techniques and be able to use standard software tools, or develop new ones, for data mining.
Textbook:
- Mehmed Kantardzic, "Data Mining: Concepts, Models, Methods, and Algorithms", second edition, IEEE Press & John Wiley, 2011.
Recommended References:
- Tan P., Steinbach M., Kumar V., Introduction to Data Mining, Addison-Wesley, Boston, MA, 2006.
- Han J., Kamber M., Data Mining: Concepts and Techniques, second edition, Morgan Kaufmann, San Francisco, 2006.
- Hand D., Mannila H., Smyth P., Principles of Data Mining, The MIT Press, Cambridge, MA, 2001.
Grading Policy:
Course projects, tests, homeworks, and Blackboard discussions of specific topics are the elements of the final grade. Homework assignments will be given online with due dates; the penalty for late assignments is 10% per day. During the course, the instructor will post several topics for discussion on Blackboard, and these activities will also be graded. Course projects are either practical applications of standard data-mining tools to experimental data sets (both available on the Internet) or implementations of new data-mining algorithms in a standard programming language. Final reports on the projects are required. In addition to achieving a passing cumulative score (typically 60%), a minimum performance of 50% must also be achieved in each of the following areas (given with weights):
Two Tests (20% + 20%)               40%
Two Projects (20% + 20%)            40%
Homeworks                           15%
Blackboard discussion activities     5%
                                  =====
Total                              100%
Grading Scale:
A 90% & up
B 80% - 89.99%
C 70% - 79.99%
D 60% - 69.99%
F below 60%
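As an informal illustration only (not an official grading tool), the two rules above can be combined in a short sketch: the final letter grade is the weighted sum of the four areas, but any area below 50% fails the course regardless of the total. The area names and sample scores below are hypothetical.

```python
# Hypothetical sketch of the course's grading rules: weighted total plus
# a 50% minimum in every area. Area names and scores are illustrative.

WEIGHTS = {"tests": 0.40, "projects": 0.40, "homeworks": 0.15, "discussion": 0.05}

def final_grade(scores):
    """scores: dict mapping each area name to the percent earned (0-100)."""
    # Rule 1: a minimum performance of 50% is required in each area.
    if any(scores[area] < 50 for area in WEIGHTS):
        return "F"
    # Rule 2: the weighted cumulative score maps onto the grading scale.
    total = sum(WEIGHTS[area] * scores[area] for area in WEIGHTS)
    for letter, cutoff in [("A", 90), ("B", 80), ("C", 70), ("D", 60)]:
        if total >= cutoff:
            return letter
    return "F"

# Strong tests and projects cannot compensate for a failing homework area:
print(final_grade({"tests": 95, "projects": 92, "homeworks": 40, "discussion": 100}))  # F
# 0.40*85 + 0.40*78 + 0.15*90 + 0.05*100 = 83.7, so:
print(final_grade({"tests": 85, "projects": 78, "homeworks": 90, "discussion": 100}))  # B
```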
Cheating Policy:
Students are encouraged to work together and learn from each other. However, cheating in any form on exams, or copying, will not be tolerated. Any evidence of cheating will result in a failing grade for the course, and appropriate notification will be made to the Dean's office.
General Policies:
1. If you bring food to class, bring enough to share with everyone.
2. Soft drinks and bottled water are permitted, as long as you drink quietly, do not spill, and properly dispose of empty containers.
3. Once the lecture starts, any student wishing to speak should raise his or her hand and wait for recognition from the instructor before speaking.
Students with Disabilities:
If you need accommodations because of a disability, or if you have emergency medical information to share with me, please send me an email during the first week or make an appointment to discuss it with me as soon as possible.
Oral and written communication requirements:
There will be no oral or written communication requirements beyond homeworks, tests, and projects.
Schedule:
The following is the tentative schedule of topics, which will be covered during the semester.
_____________________________________________________________________________
UNIT 1: Data mining concepts
Required reading:
- Slides for Unit 1
- Textbook: Chapter 1: "Data Mining Concepts", pp. 1-25.
Additional reading:
- Jeffrey W. Seifert, Data Mining: An Overview,
http://www.fas.org/irp/crs/RL31798.pdf
- Kurt Thearling, An Introduction to Data Mining,
http://www.thearling.com/text/dmwhite/dmwhite.htm
____________________________________________________________________________
UNIT 2: Data Mining Tasks & Data Mining Algorithms
Required reading:
- Slides for Unit 2
- Textbook: Chapter 4, Section 4.4: "Common Learning Tasks", pp. 101-105.
Additional reading:
- Padhraic Smyth, Mining at the Interface of Computer Science and Statistics,
http://www.datalab.uci.edu/papers/dmchap.pdf
Assignment for Units 1 and 2:
- Homework 1
___________________________________________________________________________
UNIT 3: Preparing the Data
Required reading:
- Slides for Unit 3
- Textbook: Chapter 2: "Preparing the Data", pp. 26-52.
Additional reading:
- Les A. Hook, et al., Best Practices for Preparing Environmental Data Sets to Share and Archive,
http://daac.ornl.gov/PI/BestPractices-2010.pdf
- Charu C. Aggarwal, Philip S. Yu, Outlier Detection for High Dimensional Data,
http://charuaggarwal.net/outl.pdf
________________________________________________________________________________
UNIT 4: Data Reduction: Features, Values, & Cases Reduction
Required reading:
- Slides for Unit 4
- Textbook: Chapter 3: "Data Reduction", pp. 53-86.
Additional reading:
- Dim-red-1.pdf document
- Dim-red-2.pdf document
- Fodor, I. K., A Survey of Dimension Reduction Techniques,
https://computation.llnl.gov/casc/sapphire/pubs/148494.pdf or
https://e-reports-ext.llnl.gov/pdf/240921.pdf
Assignment for Units 3 and 4:
- Homework 2
_________________________________________________________________________________
UNIT 5: Learning from Data
Required reading:
- Slides for Unit 5
- Textbook: Chapter 4: "Learning from Data", Sections 4.1-4.3 and 4.7-4.11, pp. 87-105 and 122-139.
Additional reading:
- Olivier Bousquet, Stephane Boucheron, and Gabor Lugosi, Introduction to Statistical Learning Theory,
http://www.econ.upf.es/~lugosi/mlss_slt.pdf
_________________________________________________________________________________
UNIT 6: Support Vector Machines
Required reading:
- Slides for Unit 6
- Textbook: Chapter 4: "Learning from Data", Sections 4.5-4.6, pp. 105-122.
Additional reading:
- Christopher J. C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition,
http://www.umiacs.umd.edu/~joseph/support-vector-machines4.pdf
- Pai-Hsuen Chen, Chih-Jen Lin, and Bernhard Scholkopf, A Tutorial on Support Vector Machines,
http://www.kernel-machines.org/index.html
- Colin Campbell, SVM, video lecture, 2008,
http://videolectures.net/epsrcws08_campbell_isvm/
Assignment for Units 5 and 6:
- Homework 3
________________________________________________________________________________
UNIT 7: Statistical Inference
Required reading:
- Slides for Unit 7
- Textbook: Chapter 5: "Statistical Methods", pp. 140-168.
Additional reading:
- Lindsay I. Smith, A Tutorial on Principal Components Analysis,
http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf
- Jon Shlens, A Tutorial on Principal Component Analysis: Derivation, Discussion and Singular Value Decomposition,
http://www.cs.princeton.edu/picasso/mats/PCA-Tutorial-Intuition_jp.pdf
- Ella Bingham, Advances in Independent Component Analysis with Applications to Data Mining,
http://www.cis.hut.fi/ella/thesis/thesis.pdf
- D. Mease, Statistical Aspects of Data Mining, video lecture,
http://www.youtube.com/watch?v=zRsMEl6PHhM
_____________________________________________________________________________
UNIT 8: Decision Trees & Decision Rules
Required reading:
- Slides for Unit 8
- Textbook: Chapter 6: "Decision Trees and Decision Rules", pp. 169-198.
Additional reading:
- Howard J. Hamilton, Decision Tree Construction,
http://www.aaai.org/AITopics/html/trees.html#good
- Murthy S., Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey, 2006,
http://www.cs.nyu.edu/~roweis/csc2515-2006/readings/murthy_dt.pdf
Assignment for Units 7 and 8:
- Homework 4
==================================
NOTE: Units 1-8 will be covered in Test 1.
==================================
______________________________________________________________________________
UNIT 9: Artificial Neural Networks: Models & Architectures, Multilayer Perceptron
Required reading:
- Slides for Unit 9
- Textbook: Chapter 7: "Artificial Neural Networks", Sections 7.1-7.5, pp. 199-221.
Additional reading:
- Rolf Pfeifer, Neural Nets,
http://ailab.ifi.uzh.ch/images/stories/teaching/nn10/script/NN03032010.pdf
- Hinton G., The Next Generation of Neural Networks, video lecture,
http://www.youtube.com/watch?v=AyzOUbkUf3M
______________________________________________________________________________
UNIT 9(a): Ensemble Learning
Required reading:
- Slides for Unit 9(a)
- Textbook: Chapter 8: "Ensemble Learning", pp. 235-248.
______________________________________________________________________________
UNIT 10: Artificial Neural Networks: Competitive Networks & Kohonen Maps
Required reading:
- Slides for Unit 10
- Textbook: Chapter 7: "Artificial Neural Networks", Sections 7.6-7.9, pp. 221-234.
Additional reading:
- Tom Germano, Self Organizing Maps,
http://davis.wpi.edu/~matt/courses/soms/#Introduction
or:
http://www.cs.usyd.edu.au/~irena/ai01/nn/som.html
http://www.cis.hut.fi/research/som-research/
- Michael J. Radwin, Nearest-Neighbor Machine Learning Bakeoff
http://www.radwin.org/michael/projects/learning/
Assignment for Units 9, 9(a), and 10:
- Homework 5
_______________________________________________________________________
UNIT 11: Clustering
Required reading:
- Slides for Unit 11
- Textbook: Chapter 9: "Cluster Analysis", pp. 249-279.
Additional reading:
- A. K. Jain, et al., Data Clustering: A Review,
http://nd.edu/~flynn/papers/Jain-CSUR99.pdf
- Glenn Fung, A Comprehensive Overview of Basic Clustering Algorithms,
http://www.cs.wisc.edu/~gfung/clustering.pdf
_________________________________________________________________________
UNIT 12: Association rules
Required reading:
- Slides for Unit 12
- Textbook: Chapter 10: "Association Rules", pp. 280-299.
Additional reading:
- R. Agrawal, Fast Algorithms for Mining Association Rules,
http://rakesh.agrawal-family.com/papers/vldb94apriori.pdf
- K. Ming Leung, Association Rules, 2007,
http://cis.poly.edu/~mleung/FRE7851/f07/AssociationRules3.pdf
- Markus Hegland, Algorithms for Association Rules,
http://www.ims.nus.edu.sg/Programs/imgsci/files/heglandm1.pdf
Assignment for Units 11 and 12:
- Homework 6
____________________________________________________________________________
UNIT 13: Web Mining & Text Mining
Required reading:
- Slides for Unit 13
- Textbook: Chapter 11: "Web Mining and Text Mining", pp. 300-327.
Additional reading:
- Johannes Fürnkranz, Web Mining,
http://www.ke.informatik.tu-darmstadt.de/~juffi/publications/web-mining-chapter.pdf
- Bettina Berendt, et al., A Roadmap for Web Mining: From Web to Semantic Web,
http://eprints.pascal-network.org/archive/00000841/01/roadmap.pdf
- Marti A. Hearst, Untangling Text Data Mining,
http://www.sims.berkeley.edu/%7Ehearst/papers/acl99/acl99-tdm.html
- M. Grobelnik, Text Mining, video lecture, 2005,
http://videolectures.net/acai05_grobelnik_tm/
- William Cohen, Text Classification, 2006,
http://videolectures.net/mlas06_cohen_tc/
_________________________________________________________________________________
UNIT 14: Visual Data Mining
Required reading:
- Slides for Unit 14
- Textbook: Chapter 15: "Visualization Methods", pp. 447-469.
Additional reading:
- Kurt Thearling, Barry Becker, Dennis DeCoste, Bill Mawby, Michel Pilote, and Dan Sommerfield, Visualizing Data Mining Models,
http://www.thearling.com/text/dmviz/modelviz.htm
- Horst Eidenberger, Visual Data Mining,
http://www.ims.tuwien.ac.at/media/documents/publications/itcom2004-mining.pdf
- Edward J. Wegman, Visual Data Mining,
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.94.7925
Assignment for Units 13 and 14:
- Homework 7
==================================
NOTE: Units 9-14 will be covered in Test 2.
==================================
Project 1 should cover methodologies explained in Units 1-9, and Project 2 should use methods explained in Units 10-14.
__________________________________________________________________________________