Discussion on data mining

Data Mining Assignment 2
There are 5 questions in total, listed below. Answer each in about 200 words (1000 words in total). No plagiarism, please. Include 4-6 APA references.



  1. Choose the area of your preference, whatever you would like to describe in a dataset and explain using data mining. For example: actresses/actors, food, movies, sports, music bands, or anything you want.

Create a data file in .arff format containing about 20 entries, each described by about 4 attributes, with the last attribute containing your preference (the class attribute), e.g.:

 

@relation food

@attribute calories numeric
@attribute taste {sweet, sour, bitter, salty}
@attribute course {appetizer, main, dessert, drink}
@attribute vegetarian {yes, no}
@attribute like_it {yes, no}

@data
100, sweet, dessert, yes, yes % icecream
80, bitter, drink, yes, yes % beer
2, sweet, dessert, yes, no % cake
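Typing 20 rows by hand makes it easy to misspell a nominal value, which Weka will then reject on load. The sketch below generates the same layout from Python records and checks each nominal value against its declared set. The helper name `to_arff` and the record layout are illustrative assumptions, not part of the assignment; the three rows are the sample rows above.

```python
# Sketch: build an ARFF file like the food example from Python records.
# `to_arff` and the (values, comment) record layout are hypothetical.

ATTRIBUTES = [
    ("calories", "numeric"),
    ("taste", ["sweet", "sour", "bitter", "salty"]),
    ("course", ["appetizer", "main", "dessert", "drink"]),
    ("vegetarian", ["yes", "no"]),
    ("like_it", ["yes", "no"]),  # class attribute goes last
]

# The three sample rows from the assignment text.
ROWS = [
    ((100, "sweet", "dessert", "yes", "yes"), "icecream"),
    ((80, "bitter", "drink", "yes", "yes"), "beer"),
    ((2, "sweet", "dessert", "yes", "no"), "cake"),
]


def to_arff(relation, attributes, rows):
    """Return ARFF text; nominal values are validated against their declared sets."""
    lines = [f"@relation {relation}", ""]
    for name, kind in attributes:
        spec = kind if isinstance(kind, str) else "{" + ", ".join(kind) + "}"
        lines.append(f"@attribute {name} {spec}")
    lines += ["", "@data"]
    for values, comment in rows:
        if len(values) != len(attributes):
            raise ValueError(f"wrong number of values in row: {values}")
        for (name, kind), value in zip(attributes, values):
            if not isinstance(kind, str) and value not in kind:
                raise ValueError(f"{value!r} is not a declared value of {name}")
        lines.append(f"% {comment}")  # comment on its own line
        lines.append(", ".join(str(v) for v in values))
    return "\n".join(lines) + "\n"


arff_text = to_arff("food", ATTRIBUTES, ROWS)
```

Writing `arff_text` to a file such as `food.arff` gives something that can be opened in the Weka Explorer; extend `ROWS` to about 20 entries for the assignment.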

 

Compare 3 algorithms for classification of your data: decision trees, a classification or an association rule learner, and naive Bayes. For each algorithm check what the error is (which algorithm can explain your personal liking the best), and observe the generated rules (do they tell you anything interesting?).
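To make "check what the error is" and "observe the generated rules" concrete, here is a minimal sketch of a one-attribute rule learner in the spirit of OneR, one of the simplest classification rule learners. It is not Weka's implementation, and the rows are made-up toy data in the food layout (numeric calories omitted), not results for any real preference file.

```python
from collections import Counter, defaultdict

# Made-up toy rows: (taste, course, vegetarian, like_it). Illustration only.
ROWS = [
    ("sweet", "dessert", "yes", "yes"),
    ("sweet", "dessert", "yes", "yes"),
    ("bitter", "drink", "yes", "no"),
    ("salty", "main", "no", "yes"),
    ("sour", "drink", "yes", "no"),
    ("sweet", "drink", "yes", "yes"),
    ("bitter", "main", "no", "no"),
    ("salty", "appetizer", "no", "no"),
]
NAMES = ["taste", "course", "vegetarian"]


def one_r(rows, names):
    """Return (attribute, rule, training_errors) for the best single-attribute rule."""
    best = None
    for i, name in enumerate(names):
        # Count class labels for every value of attribute i.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[i]][row[-1]] += 1
        # Each attribute value predicts its majority class.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(rule[row[i]] != row[-1] for row in rows)
        if best is None or errors < best[2]:
            best = (name, rule, errors)
    return best


attribute, rule, errors = one_r(ROWS, NAMES)
error_rate = errors / len(ROWS)
```

On this toy data the chosen attribute is `taste`, with one misclassified row out of eight. The `rule` dictionary is the human-readable part: it is the kind of output to inspect when asking whether the learner found anything interesting.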


  2. Use the following learning schemes to compare the training set and 10-fold stratified cross-validation scores of the labor data (in labor_neg_nominal.arff):

  • k-nearest neighbours (IBk) with decision trees (j48.J48)
  • k-nearest neighbours (IBk) with decision trees (j48.J48) with option -M 3, so that each leaf has at least 3 instances.

 

  A) What does the training set evaluation score tell you?
  B) What does the cross-validation score evaluate?
  C) Which one of these models would you say is the best? Why?
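The difference between the two scores can be seen in a small sketch. It uses a hand-written 1-nearest-neighbour classifier and a simple stratified fold split, both assumptions for illustration (Weka's IBk and its stratification differ in detail), on synthetic data whose labels are pure noise. It does not use the labor data.

```python
import random
from collections import defaultdict


def nearest_label(train, x):
    """1-nearest-neighbour on nominal attributes (distance = number of mismatches)."""
    return min(train, key=lambda r: sum(a != b for a, b in zip(r[:-1], x)))[-1]


def accuracy(train, test):
    """Fraction of test rows whose nearest training row has the same label."""
    return sum(nearest_label(train, r[:-1]) == r[-1] for r in test) / len(test)


def stratified_folds(rows, k, seed=0):
    """Deal each class's shuffled rows round-robin into k folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for r in rows:
        by_class[r[-1]].append(r)
    folds = [[] for _ in range(k)]
    i = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        for r in members:
            folds[i % k].append(r)
            i += 1
    return folds


def cross_val_accuracy(rows, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, pool the results."""
    folds = stratified_folds(rows, k, seed)
    correct = 0
    for i, test in enumerate(folds):
        train = [r for j, f in enumerate(folds) if j != i for r in f]
        correct += sum(nearest_label(train, r[:-1]) == r[-1] for r in test)
    return correct / len(rows)


# Synthetic data (an assumption, not the labor data): 3 random nominal
# attributes and a random label, so there is nothing real to learn.
rng = random.Random(1)
seen = set()
DATA = []
while len(DATA) < 40:
    x = tuple(rng.choice("abcde") for _ in range(3))
    if x not in seen:  # keep instances unique
        seen.add(x)
        DATA.append(x + (rng.choice(["good", "bad"]),))

train_acc = accuracy(DATA, DATA)  # every row is its own nearest neighbour
cv_acc = cross_val_accuracy(DATA, k=10)
```

Here `train_acc` is 1.0 by construction, because each instance is its own nearest neighbour, even though the labels are noise. `cv_acc` is computed only on rows the model did not see, so it is the number that estimates performance on new data.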


  3. Use the following learning schemes to analyze the Titanic data (in titanic.arff).

C4.5 - weka.classifiers.j48.J48

Association rules - weka.associations.Apriori

Decision list - weka.classifiers.PART

 

  A) What is the most important descriptor (attribute) in titanic.arff?
  B) How well were these methods able to learn the patterns in the dataset? Quantify your answer.
  C) Compare the training set and 10-fold cross-validation scores of the methods.
  D) Would you trust these models? Did they really learn what was important to survive the Titanic disaster?
  E) Which one would you trust more, even if just very slightly? Why?
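One common way to reason about question A is to look at which attribute a decision tree places at its root. C4.5 chooses splits with an entropy-based criterion (gain ratio, a normalized form of information gain). The sketch below computes plain information gain on made-up rows; it is not the Titanic data and not Weka's code, only an illustration of the quantity being compared.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attr_index):
    """Class entropy minus the weighted class entropy after splitting on one attribute."""
    labels = [r[-1] for r in rows]
    groups = {}
    for r in rows:
        groups.setdefault(r[attr_index], []).append(r[-1])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Made-up rows (attribute_a, attribute_b, class); not the Titanic data.
TOY = [
    ("x", "p", "yes"),
    ("x", "q", "yes"),
    ("y", "p", "no"),
    ("y", "q", "no"),
]
gain_a = information_gain(TOY, 0)  # attribute_a separates the classes perfectly
gain_b = information_gain(TOY, 1)  # attribute_b carries no information
```

In this toy case `gain_a` is 1 bit and `gain_b` is 0, so a tree learner would split on `attribute_a` first. Applying the same function to each attribute of a real dataset gives a ranking to compare against the tree Weka prints.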


  4. Choose one of the following files: soybean.arff, autoprice.arff, hungarian, zoo.arff or zoo2_x.arff, and use any two learning schemes of your choice to build and compare the models. Which one of the models would you keep? Why?


  5. Use the Apriori association rule learner to find the association rules in the weather.nominal data set. How many rules did it produce? How large are the item sets? What was the largest one? What happened when you increased/decreased the confidence level? What about the number of rules? What happens when you increase the confidence parameter to 2? Why?
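The effect of the confidence threshold follows from how confidence is defined. The sketch below computes support and confidence by brute force over the 14-instance nominal weather data that ships with Weka. It is not the Apriori algorithm itself (there is no candidate pruning) and it only tries one- or two-item antecedents, so the rule counts will not match Weka's output; the function names are illustrative.

```python
from itertools import combinations

# The nominal weather data distributed with Weka (14 instances).
COLS = ["outlook", "temperature", "humidity", "windy", "play"]
RAW = """sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no
"""
# Each instance becomes a set of (attribute, value) items.
TRANSACTIONS = [frozenset(zip(COLS, line.split(","))) for line in RAW.split()]


def support(itemset):
    """Number of instances that contain every item of the itemset."""
    return sum(itemset <= t for t in TRANSACTIONS)


def confidence(antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent); always in [0, 1]."""
    return support(antecedent | consequent) / support(antecedent)


def rules(min_support, min_confidence):
    """Brute-force rules with 1- or 2-item antecedents; min_support must be >= 1."""
    items = sorted(set().union(*TRANSACTIONS))
    found = []
    for size in (1, 2):
        for ante in combinations(items, size):
            a = frozenset(ante)
            if support(a) < min_support:
                continue
            used = {name for name, _ in a}
            for item in items:
                if item[0] in used:  # consequent must be a different attribute
                    continue
                c = frozenset([item])
                if support(a | c) >= min_support and confidence(a, c) >= min_confidence:
                    found.append((a, c, confidence(a, c)))
    return found


overcast_play = confidence(frozenset([("outlook", "overcast")]),
                           frozenset([("play", "yes")]))
```

For example, `outlook=overcast => play=yes` has confidence 4/4 = 1.0, and `humidity=normal => play=yes` has 6/7. Lowering `min_confidence` can only keep or add rules, never drop them. Because confidence is a ratio that cannot exceed 1, `rules(1, 2.0)` returns an empty list: no rule can satisfy a threshold of 2. How Weka itself reports that setting is worth checking in the tool.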
