Association Rule Mining
- Association Rule Mining: (15 points)
Given the following transaction database for mining association rules:
Database D
TID | Items |
100 | A C D |
200 | B C E |
300 | A B C E |
400 | B E |
Please use the Apriori algorithm to mine association rules with a minimum support count of 2.
(Please show the derivation process step by step with candidate itemsets.)
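The level-wise derivation asked for above can be checked with a short script. This is a minimal sketch of the usual Apriori join and prune steps, not a reference solution; the names `transactions`, `support_count`, and `apriori` are illustrative assumptions, and only frequent itemsets (not the final rules) are produced.

```python
from itertools import combinations

# Transactions copied from Database D above (TID -> set of items).
transactions = {
    100: {"A", "C", "D"},
    200: {"B", "C", "E"},
    300: {"A", "B", "C", "E"},
    400: {"B", "E"},
}
MIN_SUPPORT_COUNT = 2  # minimum support count stated in the question


def support_count(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for items in transactions.values() if itemset <= items)


def apriori():
    """Return {frequent itemset: support count}, built level by level."""
    items = sorted({i for t in transactions.values() for i in t})
    candidates = [frozenset([i]) for i in items]  # candidate 1-itemsets
    frequent = {}
    k = 1
    while candidates:
        # Scan the database and keep candidates that meet the threshold.
        counts = {c: support_count(c) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= MIN_SUPPORT_COUNT}
        frequent.update(level)
        k += 1
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent
```

Calling `apriori()` returns the frequent itemsets with their support counts, which can be compared against a hand derivation of each candidate set.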
- Generating Classification Rules from a Decision Tree (50 points)
Given a database table containing weather data as follows:
Outlook | Temperature | Humidity | Windy | Class: Play |
Sunny | Hot | High | False | No |
Sunny | Hot | High | True | No |
Overcast | Hot | High | False | Yes |
Rainy | Mild | High | False | Yes |
Rainy | Cool | Normal | False | Yes |
Rainy | Cool | Normal | True | No |
Overcast | Cool | Normal | True | Yes |
Sunny | Mild | High | False | No |
Sunny | Cool | Normal | False | Yes |
Rainy | Mild | Normal | False | Yes |
Sunny | Mild | Normal | True | Yes |
Overcast | Mild | High | True | Yes |
Overcast | Hot | Normal | False | Yes |
Rainy | Mild | High | True | No |
- Please use the basic algorithm (a version of ID3) to induce a decision tree from the given training samples in the weather database table.
- Please extract the classification rules from the decision tree generated in the previous step.
- For the weather database table given above, please predict a class label for the weather data by using the naïve Bayesian classification approach (20 points).
The unknown samples to be classified are:
(Outlook = ‘Sunny’, Temperature = ‘Mild’, Humidity = ‘High’, Windy = ‘False’)
(Outlook = ‘Sunny’, Temperature = ‘Hot’, Humidity = ‘High’, Windy = ‘False’)
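Both the ID3 attribute selection and the naïve Bayesian prediction reduce to counting over the weather table, so hand calculations can be cross-checked with a sketch like the following. It assumes entropy-based information gain for ID3 and plain relative-frequency estimates (no Laplace smoothing) for naïve Bayes; the names `ROWS`, `entropy`, `information_gain`, `naive_bayes_scores`, and `predict` are illustrative, not part of the assignment.

```python
from collections import Counter
from math import log2

ATTRIBUTES = ("Outlook", "Temperature", "Humidity", "Windy")
# Rows copied from the weather table above; the last field is the class label.
ROWS = [
    ("Sunny", "Hot", "High", "False", "No"),
    ("Sunny", "Hot", "High", "True", "No"),
    ("Overcast", "Hot", "High", "False", "Yes"),
    ("Rainy", "Mild", "High", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "True", "No"),
    ("Overcast", "Cool", "Normal", "True", "Yes"),
    ("Sunny", "Mild", "High", "False", "No"),
    ("Sunny", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "Normal", "False", "Yes"),
    ("Sunny", "Mild", "Normal", "True", "Yes"),
    ("Overcast", "Mild", "High", "True", "Yes"),
    ("Overcast", "Hot", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "High", "True", "No"),
]


def entropy(rows):
    """Entropy (in bits) of the class label over `rows`."""
    total = len(rows)
    counts = Counter(row[-1] for row in rows)
    return -sum(n / total * log2(n / total) for n in counts.values())


def information_gain(rows, attr_index):
    """Entropy reduction obtained by splitting `rows` on one attribute."""
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attr_index], []).append(row)
    remainder = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return entropy(rows) - remainder


def naive_bayes_scores(rows, sample):
    """Unnormalized P(class) * prod P(value | class), without smoothing."""
    class_counts = Counter(row[-1] for row in rows)
    scores = {}
    for label, count in class_counts.items():
        score = count / len(rows)  # prior P(class)
        for i, value in enumerate(sample):
            matches = sum(1 for r in rows if r[-1] == label and r[i] == value)
            score *= matches / count  # conditional P(value | class)
        scores[label] = score
    return scores


def predict(rows, sample):
    """Class label with the largest naive Bayes score."""
    scores = naive_bayes_scores(rows, sample)
    return max(scores, key=scores.get)
```

For example, `information_gain(ROWS, ATTRIBUTES.index("Outlook"))` gives the gain for one candidate split, and `predict(ROWS, ("Sunny", "Mild", "High", "False"))` classifies the first unknown sample. ID3 would apply `information_gain` recursively to each partition; that recursion is left out here to keep the sketch short.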
- Classification and Characteristic Rule Derivation (50 points)
Given the following:
1.) The relation table:
Student
Name | Sex | Age | Birth_place | Major | Position | Salary |
Carrie | female | 33 | Florida | DCTE | Instructor | $31,000 |
Stilwell | male | 58 | Michigan | DISS | Manager | $65,000 |
Nana | female | 35 | Japan | DCTE | Instructor | $35,000 |
O’Hare | male | 35 | Canada | DCS | Assistant Prof. | $45,000 |
Peabody | male | 50 | New York | DISS | CIO | $70,000 |
Juliana | female | 68 | California | DISS | CEO | $90,000 |
O’Neil | male | 42 | France | DCS | Lecturer | $50,000 |
Diana | female | 51 | Oregon | DCTE | Assist. Prof. | $49,000 |
Anderson | male | 45 | India | DCS | Instructor | $49,500 |
Christopher | male | 42 | New Mexico | DISS | Manager | $55,000 |
Cook | female | 40 | Illinois | DISS | System Analyst | $45,000 |
Kim Ming | female | 38 | South Korea | DCTE | Lecturer | $35,500 |
Donovan | male | 32 | Netherlands | DCS | Assist. Prof. | $48,000 |
George | male | 42 | South Korea | DCS | Instructor | $35,000 |
Donna | female | 35 | Texas | DCS | Programmer | $57,000 |
Mike | male | 60 | Ohio | DISS | Manager | $67,500 |
Lynn | female | 55 | Georgia | DISS | Manager | $60,000 |
Lisa | female | 37 | Italy | DCS | Programmer | $47,500 |
Sherry | female | 46 | Germany | DCS | Lecturer | $38,000 |
Robert | male | 51 | Kansas | DISS | CIO | $72,500 |
2.) The concept hierarchy table:
Age:
{21 – 30} ⊂ Young
{31 – 50} ⊂ Mid-Age
{51 – 70} ⊂ Old
Birth_Place:
{Canada, France, Germany, India, Italy, Japan, Netherlands, South Korea} ⊂ Foreign
{California, Florida, Georgia, Illinois, Kansas, Michigan, New York, Ohio, Oregon, Utah, Texas, New Mexico} ⊂ USA
Major:
{ DCS, DISS, DCTE }
Position:
{Instructor, Lecturer, Assistant Professor, Associate Professor, Professor} ⊂ Faculty
{CEO, CIO, Manager, Programmer, System Analyst} ⊂ Non-Faculty
Salary:
{$20,000 – $30,000} ⊂ Low
{$30,001 – $50,000} ⊂ Medium
{$50,001 – $100,000} ⊂ High
Please do the following:
- Please derive the quantitative classification and characteristic rules from the given relation table and the concept hierarchy table. The target class for the quantitative rule derivation is male.
- Please indicate the sufficient and necessary conditions for these derived rules.
- Please indicate which attribute should be removed during the derivation process.
- Please indicate the threshold value T for the number of distinct values of each remaining attribute.
- Please give the detailed intermediate tables in the derivation process, mark the tuples which overlap between the target class and contrast class, and make assumptions whenever necessary.
- Please refer to the following materials for the examination:
Jiawei Han, Yandong Cai, and Nick Cercone, “Data-Driven Discovery of Quantitative Rules in Relational Databases,” IEEE Transactions on Knowledge and Data Engineering, Vol. 5, No. 1, 1993, pp. 29-40.
Yandong Cai, Nick Cercone, and Jiawei Han, “An Attribute-Oriented Approach for Learning Classification Rules from Relational Databases,” in Proceedings of Sixth International Conference on Data Engineering, February 1990, pp. 281-288.
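The generalization step of attribute-oriented induction, which replaces each attribute value by its higher-level concept from the hierarchy table, can be sketched as below. This is only an illustration of that one step under stated assumptions: salaries are taken to be already parsed into integer dollars, the abbreviations "Assistant Prof." and "Assist. Prof." in the relation table are assumed to mean "Assistant Professor", and the names `generalize` and `POSITION_ALIASES` are hypothetical. Attribute removal, merging of identical tuples, vote counting, and the threshold T are not handled.

```python
FOREIGN = {"Canada", "France", "Germany", "India", "Italy", "Japan",
           "Netherlands", "South Korea"}
USA = {"California", "Florida", "Georgia", "Illinois", "Kansas", "Michigan",
       "New York", "Ohio", "Oregon", "Utah", "Texas", "New Mexico"}
FACULTY = {"Instructor", "Lecturer", "Assistant Professor",
           "Associate Professor", "Professor"}
NON_FACULTY = {"CEO", "CIO", "Manager", "Programmer", "System Analyst"}
# Assumption: both abbreviations in the table denote "Assistant Professor".
POSITION_ALIASES = {"Assistant Prof.": "Assistant Professor",
                    "Assist. Prof.": "Assistant Professor"}


def generalize_age(age):
    if 21 <= age <= 30:
        return "Young"
    if 31 <= age <= 50:
        return "Mid-Age"
    if 51 <= age <= 70:
        return "Old"
    raise ValueError(f"age {age} is outside the concept hierarchy")


def generalize_birth_place(place):
    if place in FOREIGN:
        return "Foreign"
    if place in USA:
        return "USA"
    raise ValueError(f"unknown birth place: {place}")


def generalize_position(position):
    position = POSITION_ALIASES.get(position, position)
    if position in FACULTY:
        return "Faculty"
    if position in NON_FACULTY:
        return "Non-Faculty"
    raise ValueError(f"unknown position: {position}")


def generalize_salary(salary):
    # Assumption: salary is an integer number of dollars.
    if 20_000 <= salary <= 30_000:
        return "Low"
    if 30_001 <= salary <= 50_000:
        return "Medium"
    if 50_001 <= salary <= 100_000:
        return "High"
    raise ValueError(f"salary {salary} is outside the concept hierarchy")


def generalize(row):
    """Return a copy of `row` with hierarchy attributes lifted one level."""
    out = dict(row)
    out["Age"] = generalize_age(row["Age"])
    out["Birth_place"] = generalize_birth_place(row["Birth_place"])
    out["Position"] = generalize_position(row["Position"])
    out["Salary"] = generalize_salary(row["Salary"])
    return out
```

Applying `generalize` to every tuple of the relation table yields the first intermediate table; Major has a single-level hierarchy, so it is left unchanged.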
- Please answer the following questions: (35 points)
(a) What is the confidence for the rules ∅ → A and A → ∅? (10 points)
(b) Let c1, c2, and c3 be the confidence values of the rules {p} → {q}, {p} → {q, r}, and {p, r} → {q}, respectively. If we assume that c1, c2, and c3 have different values, what are the possible relationships that may exist among c1, c2, and c3? Which rule has the lowest confidence? (15 points)
(c) Repeat the analysis in part (b) assuming that the rules have identical support. Which rule has the highest confidence? (10 points)
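These questions rest on the standard definition of confidence: the confidence of a rule X → Y is the support of X ∪ Y divided by the support of X. A minimal sketch of that definition follows, using the four transactions of Database D purely as illustrative data; the function names `support` and `confidence` are assumptions.

```python
# Illustrative data: the four transactions of Database D.
transactions = [
    {"A", "C", "D"},
    {"B", "C", "E"},
    {"A", "B", "C", "E"},
    {"B", "E"},
]


def support(itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent):
    """Confidence of antecedent -> consequent: supp(X | Y) / supp(X)."""
    antecedent, consequent = set(antecedent), set(consequent)
    base = support(antecedent)
    if base == 0:
        raise ValueError("antecedent never occurs; confidence is undefined")
    return support(antecedent | consequent) / base
```

For instance, `confidence({"C"}, {"A"})` compares the transactions containing both C and A with those containing C. Because the empty set is a subset of every transaction, the same function also accepts an empty antecedent or consequent.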
- Please give the answers to the following questions:
- What are the lower and upper bounds on the number of candidate itemsets generated by the Apriori association rule mining algorithm? (15 points)
- What is the total number of possible association rules that can be generated from a given data set containing n items? Why? (25 points)
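A closed-form answer to the rule-counting question can be sanity-checked by brute force for small n. The sketch below assumes the usual definition of an association rule X → Y, with X and Y non-empty and disjoint; the function name `count_rules` is hypothetical.

```python
from itertools import combinations


def count_rules(n):
    """Count rules X -> Y over n items with X, Y non-empty and disjoint."""
    items = range(n)
    total = 0
    # The antecedent takes 1..n-1 items so at least one item is left for Y.
    for k in range(1, n):
        for antecedent in combinations(items, k):
            rest = [i for i in items if i not in antecedent]
            # Every non-empty subset of the remaining items is a consequent.
            for m in range(1, len(rest) + 1):
                total += sum(1 for _ in combinations(rest, m))
    return total
```

Evaluating `count_rules(n)` for n = 2, 3, 4 gives values against which a derived formula can be compared; the enumeration grows exponentially, so it is only practical for small n.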