  1. Association Rule Mining: (15 points)


Given the following transaction database for mining association rules:


Database D


TID   Items
100   A, C, D
200   B, C, E
300   A, B, C, E
400   B, E


Please use the Apriori algorithm to mine association rules with a minimum support count of 2.

(Please show the derivation process step by step with candidate itemsets.)
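
The level-wise process asked for above can be sketched in Python as follows. This is only an illustrative sketch, not a prescribed solution: the names `transactions`, `support_count`, and `apriori` are assumptions introduced here, and the join step unions pairs of frequent k-itemsets instead of using the classic prefix-based join (after the prune step both yield the same candidates).

```python
from itertools import combinations

# Database D from the question: TID -> set of items.
transactions = {
    100: {"A", "C", "D"},
    200: {"B", "C", "E"},
    300: {"A", "B", "C", "E"},
    400: {"B", "E"},
}
MIN_SUPPORT_COUNT = 2


def support_count(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for items in db.values() if itemset <= items)


def apriori(db, min_count):
    """Return {frozenset: support count} for all frequent itemsets."""
    items = sorted(set().union(*db.values()))
    candidates = [frozenset([i]) for i in items]  # candidate 1-itemsets
    frequent = {}
    k = 1
    while candidates:
        # Scan the database and keep candidates meeting the threshold.
        counts = {c: support_count(c, db) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        # Join step (simplified): union pairs of frequent k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k))
        ]
        k += 1
    return frequent


result = apriori(transactions, MIN_SUPPORT_COUNT)
for itemset, count in sorted(result.items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Printing the candidate list at each level inside the loop would give the step-by-step derivation with candidate itemsets that the question asks to be shown.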


  1. Generating Classification Rules from a Decision Tree (50 points)


Given a database table containing weather data as follows:


Outlook   Temperature  Humidity  Windy  Class: Play
Sunny     Hot          High      False  No
Sunny     Hot          High      True   No
Overcast  Hot          High      False  Yes
Rainy     Mild         High      False  Yes
Rainy     Cool         Normal    False  Yes
Rainy     Cool         Normal    True   No
Overcast  Cool         Normal    True   Yes
Sunny     Mild         High      False  No
Sunny     Cool         Normal    False  Yes
Rainy     Mild         Normal    False  Yes
Sunny     Mild         Normal    True   Yes
Overcast  Mild         High      True   Yes
Overcast  Hot          Normal    False  Yes
Rainy     Mild         High      True   No


  1. Please use the basic algorithm (a version of ID3) to induce a decision tree from the given training samples in the weather database table.



  2. Please extract the classification rules from the decision tree generated in part 1.
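
As a rough illustration of the attribute-selection step in ID3, the sketch below computes the entropy of the class column and the information gain of each attribute at the root of the tree. The names `ROWS`, `entropy`, and `info_gain` are assumptions introduced here. Only the root split is shown; a full tree would repeat the same computation on each partition.

```python
from collections import Counter
from math import log2

ATTRS = ["Outlook", "Temperature", "Humidity", "Windy"]

# The weather table from the question; the last column is the class (Play).
ROWS = [line.split() for line in """
Sunny Hot High False No
Sunny Hot High True No
Overcast Hot High False Yes
Rainy Mild High False Yes
Rainy Cool Normal False Yes
Rainy Cool Normal True No
Overcast Cool Normal True Yes
Sunny Mild High False No
Sunny Cool Normal False Yes
Rainy Mild Normal False Yes
Sunny Mild Normal True Yes
Overcast Mild High True Yes
Overcast Hot Normal False Yes
Rainy Mild High True No
""".strip().splitlines()]


def entropy(rows):
    """Entropy (in bits) of the class label over `rows`."""
    counts = Counter(r[-1] for r in rows)
    total = len(rows)
    return -sum(n / total * log2(n / total) for n in counts.values())


def info_gain(rows, col):
    """Reduction in entropy obtained by partitioning `rows` on column `col`."""
    groups = {}
    for r in rows:
        groups.setdefault(r[col], []).append(r)
    total = len(rows)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(rows) - remainder


gains = {attr: info_gain(ROWS, i) for i, attr in enumerate(ATTRS)}
best = max(gains, key=gains.get)  # attribute chosen for the root split
for attr, gain in gains.items():
    print(f"{attr}: {gain:.3f}")
print("root attribute:", best)
```

Each root-to-leaf path of the finished tree then reads directly as one IF-THEN classification rule.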


  1. For the weather database table given in B, please predict a class label for each of the unknown samples below using the naïve Bayesian classification approach (20 points).


The unknown samples to be classified are:


(Outlook = ‘Sunny’, Temperature = ‘Mild’, Humidity = ‘High’, Windy = ‘False’)


(Outlook = ‘Sunny’, Temperature = ‘Hot’, Humidity = ‘High’, Windy = ‘False’)
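
A minimal sketch of the naïve Bayesian computation for these samples is given below. It assumes plain relative-frequency estimates with no Laplace smoothing, and the names `train`, `score`, and `predict` are introduced here purely for illustration.

```python
from collections import Counter

# The weather table from the question; the last column is the class (Play).
ROWS = [tuple(line.split()) for line in """
Sunny Hot High False No
Sunny Hot High True No
Overcast Hot High False Yes
Rainy Mild High False Yes
Rainy Cool Normal False Yes
Rainy Cool Normal True No
Overcast Cool Normal True Yes
Sunny Mild High False No
Sunny Cool Normal False Yes
Rainy Mild Normal False Yes
Sunny Mild Normal True Yes
Overcast Mild High True Yes
Overcast Hot Normal False Yes
Rainy Mild High True No
""".strip().splitlines()]


def train(rows):
    """Count class frequencies and (attribute index, value, class) frequencies."""
    class_counts = Counter(r[-1] for r in rows)
    cond_counts = Counter(
        (i, value, r[-1]) for r in rows for i, value in enumerate(r[:-1])
    )
    return class_counts, cond_counts


def score(sample, class_counts, cond_counts):
    """Return P(C) * product of P(x_i | C) for each class C (unnormalised)."""
    total = sum(class_counts.values())
    scores = {}
    for label, n_label in class_counts.items():
        p = n_label / total  # prior P(C)
        for i, value in enumerate(sample):
            # Assumption: no smoothing, so an unseen value gives probability 0.
            p *= cond_counts[(i, value, label)] / n_label
        scores[label] = p
    return scores


def predict(sample, class_counts, cond_counts):
    scores = score(sample, class_counts, cond_counts)
    return max(scores, key=scores.get)


class_counts, cond_counts = train(ROWS)
samples = [
    ("Sunny", "Mild", "High", "False"),
    ("Sunny", "Hot", "High", "False"),
]
for s in samples:
    print(s, score(s, class_counts, cond_counts),
          predict(s, class_counts, cond_counts))
```

The scores are left unnormalised because only their relative order matters when choosing the class label.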



  1. Classification and Characteristic Rule Derivation (50 points)


Given the following conditions:


1.)   The relation table:



Name         Sex     Age  Birth_place  Major  Position         Salary
Carrie       female  33   Florida      DCTE   Instructor       $31,000
Stilwell     male    58   Michigan     DISS   Manager          $65,000
Nana         female  35   Japan        DCTE   Instructor       $35,000
O’Hare       male    35   Canada       DCS    Assistant Prof.  $45,000
Peabody      male    50   New York     DISS   CIO              $70,000
Juliana      female  68   California   DISS   CEO              $90,000
O’Neil       male    42   France       DCS    Lecturer         $50,000
Diana        female  51   Oregon       DCTE   Assistant Prof.  $49,000
Anderson     male    45   India        DCS    Instructor       $49,500
Christopher  male    42   New Mexico   DISS   Manager          $55,000
Cook         female  40   Illinois     DISS   System Analyst   $45,000
Kim Ming     female  38   South Korea  DCTE   Lecturer         $35,500
Donovan      male    32   Netherlands  DCS    Assistant Prof.  $48,000
George       male    42   South Korea  DCS    Instructor       $35,000
Donna        female  35   Texas        DCS    Programmer       $57,000
Mike         male    60   Ohio         DISS   Manager          $67,500
Lynn         female  55   Georgia      DISS   Manager          $60,000
Lisa         female  37   Italy        DCS    Programmer       $47,500
Sherry       female  46   Germany      DCS    Lecturer         $38,000
Robert       male    51   Kansas       DISS   CIO              $72,500



2.)  The concept hierarchy table:




{21 – 30} ⊂ Young

{31 – 50} ⊂ Mid-Age

{51 – 70} ⊂ Old




{Canada, France, Germany, India, Italy, Japan, Netherlands, South Korea} ⊂ Foreign

{California, Florida, Georgia, Illinois, Kansas, Michigan, New York, Ohio, Oregon, Utah, Texas, New Mexico} ⊂ USA








{Instructor, Lecturer, Assistant Professor, Associate Professor, Professor} ⊂ Faculty

{CEO, CIO, Manager, Programmer, System Analyst} ⊂ Non-Faculty




{$20,000 – $30,000} ⊂ Low

{$30,001 – $50,000} ⊂ Medium

{$50,001 – $100,000} ⊂ High
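
To make the role of the concept hierarchy concrete, the sketch below shows one way a single tuple could be generalized by climbing each attribute one level. All function and variable names are hypothetical. It also assumes that the abbreviated positions used in the relation table ("Assistant Prof.", "Assist. Prof.") stand for "Assistant Professor", which the hierarchy lists in full.

```python
FOREIGN = {"Canada", "France", "Germany", "India", "Italy", "Japan",
           "Netherlands", "South Korea"}
USA = {"California", "Florida", "Georgia", "Illinois", "Kansas", "Michigan",
       "New York", "Ohio", "Oregon", "Utah", "Texas", "New Mexico"}
FACULTY = {"Instructor", "Lecturer", "Assistant Professor",
           "Associate Professor", "Professor"}
NON_FACULTY = {"CEO", "CIO", "Manager", "Programmer", "System Analyst"}

# Assumption: abbreviations in the relation table map to the full title.
ALIASES = {"Assistant Prof.": "Assistant Professor",
           "Assist. Prof.": "Assistant Professor"}


def generalize_age(age):
    if 21 <= age <= 30:
        return "Young"
    if 31 <= age <= 50:
        return "Mid-Age"
    if 51 <= age <= 70:
        return "Old"
    raise ValueError(f"age {age} is outside the concept hierarchy")


def generalize_birth_place(place):
    if place in FOREIGN:
        return "Foreign"
    if place in USA:
        return "USA"
    raise ValueError(f"unknown birth place: {place}")


def generalize_position(position):
    position = ALIASES.get(position, position)
    if position in FACULTY:
        return "Faculty"
    if position in NON_FACULTY:
        return "Non-Faculty"
    raise ValueError(f"unknown position: {position}")


def generalize_salary(salary):
    if 20_000 <= salary <= 30_000:
        return "Low"
    if 30_001 <= salary <= 50_000:
        return "Medium"
    if 50_001 <= salary <= 100_000:
        return "High"
    raise ValueError(f"salary {salary} is outside the concept hierarchy")


def generalize(record):
    """Return a copy of `record` with the four hierarchy attributes lifted."""
    out = dict(record)
    out["Age"] = generalize_age(record["Age"])
    out["Birth_place"] = generalize_birth_place(record["Birth_place"])
    out["Position"] = generalize_position(record["Position"])
    out["Salary"] = generalize_salary(record["Salary"])
    return out


carrie = {"Name": "Carrie", "Sex": "female", "Age": 33,
          "Birth_place": "Florida", "Major": "DCTE",
          "Position": "Instructor", "Salary": 31_000}
print(generalize(carrie))
```

Attributes without a hierarchy (here Name, Sex, and Major) are passed through unchanged in this sketch; whether to keep or remove them is part of the exercise.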


Please do the following:


  1. Please derive the quantitative classification and characteristic rules from the given relation table and the concept hierarchy table. The target class for the quantitative rule derivation is male.
  2. Please indicate the sufficient and necessary conditions for these derived rules.
  3. Please indicate which attribute should be removed during the derivation process.
  4. Please indicate the threshold value T for the number of distinct values of each remaining attribute.
  5. Please give the detailed intermediate tables in the derivation process, mark the tuples which overlap between the target class and contrast class, and make assumptions whenever necessary.
  6. Please make reference to the following materials for the examination:


Jiawei Han, Yandong Cai, and Nick Cercone, “Data-Driven Discovery of Quantitative Rules in Relational Databases,” IEEE Transactions on Knowledge and Data Engineering, Vol. 5, No. 1, 1993, pp. 29-40.


Yandong Cai, Nick Cercone, and Jiawei Han, “An Attribute-Oriented Approach for Learning Classification Rules from Relational Databases,” in Proceedings of Sixth International Conference on Data Engineering, February 1990, pp. 281-288.














  1. Please answer the following questions: (35 points)


(a)   What is the confidence of the rules ∅ → A and A → ∅? (10 points)



(b)   Let c1, c2, and c3 be the confidence values of the rules {p} → {q}, {p} → {q, r}, and {p, r} → {q}, respectively. If we assume that c1, c2, and c3 have different values, what are the possible relationships that may exist among c1, c2, and c3? Which rule has the lowest confidence? (15 points)



(c)   Repeat the analysis in part (b) assuming that the rules have identical support. Which rule has the highest confidence? (10 points)
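
For experimenting with parts (a) to (c), the sketch below evaluates confidence as support count of (X ∪ Y) divided by support count of X, using the four transactions of Database D from the Apriori question as sample data. The function names are assumptions introduced here, and the choice p = B, q = E, r = C is just one example instance.

```python
# Sample data: the four transactions of Database D from the Apriori question.
transactions = [
    {"A", "C", "D"},
    {"B", "C", "E"},
    {"A", "B", "C", "E"},
    {"B", "E"},
]


def support_count(itemset):
    """Number of transactions containing `itemset` (the empty set is in all)."""
    return sum(1 for t in transactions if set(itemset) <= t)


def confidence(antecedent, consequent):
    """conf(X -> Y) = count(X u Y) / count(X); X must occur at least once."""
    both = set(antecedent) | set(consequent)
    return support_count(both) / support_count(antecedent)


# Part (a): rules involving the empty set.
print("conf({} -> A) =", confidence(set(), {"A"}))
print("conf(A -> {}) =", confidence({"A"}, set()))

# Part (b): one instance with p = B, q = E, r = C.
c1 = confidence({"B"}, {"E"})
c2 = confidence({"B"}, {"E", "C"})
c3 = confidence({"B", "C"}, {"E"})
print("c1 =", c1, "c2 =", c2, "c3 =", c3)
```

Because c1 and c2 share the denominator count({p}) while c2 and c3 share the numerator count({p, q, r}), comparing them reduces to the anti-monotone property of support.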



  1. Please give the answers to the following questions:


  1. What are the lower and upper bounds on the number of candidate itemsets generated by the Apriori association rule mining algorithm? (15 points)


  2. What is the total number of possible association rules that can be generated from a data set containing n items? Why? (25 points)
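
One way to sanity-check an answer to the rule-counting question is brute-force enumeration for small n. The sketch below assumes a rule X → Y requires X and Y to be non-empty and disjoint, and compares the count with the closed form 3^n − 2^(n+1) + 1. The function names are hypothetical.

```python
from itertools import combinations


def count_rules_brute_force(n):
    """Count rules X -> Y with X, Y non-empty, disjoint subsets of n items."""
    items = list(range(n))
    count = 0
    for k in range(1, n + 1):
        for antecedent in combinations(items, k):
            rest = [i for i in items if i not in antecedent]
            for j in range(1, len(rest) + 1):
                for _consequent in combinations(rest, j):
                    count += 1
    return count


def count_rules_formula(n):
    # Each item goes to X, to Y, or to neither: 3**n assignments.
    # Remove those with X empty (2**n) and with Y empty (2**n),
    # then add back the one assignment counted twice (both empty).
    return 3 ** n - 2 ** (n + 1) + 1


for n in range(1, 7):
    print(n, count_rules_brute_force(n), count_rules_formula(n))
```

The comment in `count_rules_formula` is the inclusion-exclusion argument behind the closed form, under the stated assumption about what counts as a rule.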




