# Applications of Probability in Computer Science

Homework 3: Continuous Variables & Classification
UC Irvine CS177: Applications of Probability in Computer Science
Due on November 10, 2020 at 11:59pm
Question 1: (20 points)
The time that a TA spends helping an individual student in office hours is exponentially
distributed with a mean of 8 minutes, and independent of the time spent with other students.
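For an exponential distribution with parameter $\lambda$,
$$f_X(x) = \lambda e^{-\lambda x} \ \text{for } x \ge 0, \qquad E[X] = \frac{1}{\lambda}, \qquad \mathrm{Var}[X] = \frac{1}{\lambda^2}.$$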
Suppose there is a homework due tomorrow, and there are 4 people ahead of you in line.
a) The total time that it takes all 4 students ahead of you to receive help from the TA is a
random variable. What is the mean of this total time?
b) What is the standard deviation of the total time taken by the 4 students ahead of you?
c) What is the probability that all 4 people ahead of you will each take at most 10 minutes?
d) Suppose that the TA has finished helping the first 3 students, and has already spent 10
minutes with the fourth student. What are the mean and standard deviation of the amount
of additional time you need to wait for help?
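The exponential facts used in this question can be explored numerically. The following is a minimal sketch using only the Python standard library; the mean of 2 minutes, the count of 3 students, and the helper name `exp_survival` are hypothetical choices for illustration, not the values in this question. It relies on two standard facts: for independent variables the means and variances add, and the exponential distribution is memoryless, so $P(X > s + t \mid X > s) = P(X > t)$.

```python
import math

def exp_survival(x, rate):
    """P(X > x) for an exponential random variable with the given rate."""
    return math.exp(-rate * x) if x >= 0 else 1.0

# Hypothetical illustration values (not the ones in the question).
mean = 2.0          # E[X] = 1 / rate
rate = 1.0 / mean
n = 3               # number of independent service times being summed

# Means add, and variances add for independent variables.
sum_mean = n * mean
sum_std = math.sqrt(n * mean ** 2)

# Probability that all n independent times are at most 3 minutes.
all_within = (1.0 - exp_survival(3.0, rate)) ** n

# Memorylessness: having already waited s does not change the remaining time.
s, t = 1.5, 4.0
conditional = exp_survival(s + t, rate) / exp_survival(s, rate)
unconditional = exp_survival(t, rate)

print(sum_mean, sum_std, all_within, conditional, unconditional)
```

The last two printed values agree up to floating-point rounding, which is the memoryless property in numerical form.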
Question 2: (20 points)
An artificial intelligence class has an assignment to write a program that generates the next
move in a game of chess. Suppose that the runtimes of student programs follow a normal
distribution with mean $\mu = 13$ seconds, and standard deviation $\sigma = 2.0$ seconds. Hint: The
Python commands scipy.stats.norm.cdf and scipy.stats.norm.ppf may be useful.
a) What is the probability that a random program has a runtime greater than 18 seconds?
b) What is the probability that a random program has a runtime between 10 and 16 seconds?
c) The TAs want to help the students complete their work faster. What would they have
to lower the average runtime to so that only 1.0% of students have runtimes over 13
seconds? Assume the standard deviation remains fixed at $\sigma = 2.0$ seconds.
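The two functions named in the hint can be tried on a small example first. The sketch below assumes SciPy is installed and uses a hypothetical normal distribution with mean 100 and standard deviation 15, chosen only for illustration. `norm.cdf` returns $P(X \le x)$, `norm.sf` returns the upper tail $1 - \text{cdf}$, and `norm.ppf` inverts the CDF to give a quantile.

```python
from scipy.stats import norm

# Hypothetical parameters for illustration only.
mu, sigma = 100.0, 15.0

# P(X <= 115): the CDF evaluated one standard deviation above the mean.
p_below = norm.cdf(115.0, loc=mu, scale=sigma)

# P(X > 130): the upper tail (survival function), equal to 1 - cdf.
p_above = norm.sf(130.0, loc=mu, scale=sigma)

# P(85 < X <= 115): difference of two CDF values.
p_between = norm.cdf(115.0, loc=mu, scale=sigma) - norm.cdf(85.0, loc=mu, scale=sigma)

# The value q with P(X <= q) = 0.95: ppf is the inverse of cdf.
q95 = norm.ppf(0.95, loc=mu, scale=sigma)

print(p_below, p_above, p_between, q95)
```

Because `ppf` inverts `cdf`, feeding `q95` back into `norm.cdf` with the same `loc` and `scale` recovers 0.95.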
Question 3: (20 points)
You’ve been asked to test the performance of a batch of newly fabricated processors. If the
processors were correctly manufactured (class Y = 0), the time X to complete your test suite
is exponentially distributed with mean 1. If the equipment at the factory malfunctions (class
Y = 1), the time X is exponentially distributed with mean 50. You must decide whether or
not this batch of processors was correctly manufactured.
For the scenarios in the three parts below, it is possible to show that the optimal Bayesian
classifier predicts $Y = 0$ if $x \le c$, and predicts $Y = 1$ if $x > c$, for some constant $c$. The
value of c depends on the test time distributions, the prior probabilities of the two classes,
and the assumed loss function. You need to determine the optimal c in each case.
a) Suppose that a new fabrication process has just been deployed, and the probability that
the factory manufactures correctly functioning processors is only P(Y = 0) = 0.5. What
threshold c of the observed test suite time X = x maximizes the probability that your
prediction is correct?
b) Suppose that after some improvements to the new fabrication process, the probability that
the factory manufactures correctly functioning processors increases to P(Y = 0) = 0.99.
What threshold c of the observed test suite time X = x maximizes the probability that
your prediction is correct?
c) Market research suggests that the loss (or cost) of a missed detection (predicting Y = 0
when the processor is actually defective) is 500 times greater than the loss of a false alarm
(predicting Y = 1 when the processor was correctly manufactured). Assuming again that
P(Y = 0) = 0.99, what threshold c of the observed test suite time X = x minimizes the
expected loss?
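One way to reason about such a threshold is to compare the expected loss of each prediction at a given $x$. The following is a minimal sketch under stated assumptions, not a reference solution: the function name `bayes_threshold`, its parameters, and the numeric values (means 2 and 10, prior 0.8) are hypothetical and differ from the question. It predicts $Y = 1$ when the expected loss of a false alarm, proportional to $\ell_{FA}\, p_Y(0)\, f_{X|Y}(x \mid 0)$, is smaller than that of a missed detection, proportional to $\ell_{MD}\, p_Y(1)\, f_{X|Y}(x \mid 1)$, and solves for the crossing point.

```python
import math

def exp_pdf(x, mean):
    """Density of an exponential distribution with the given mean."""
    rate = 1.0 / mean
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

def bayes_threshold(mean0, mean1, p0, loss_false_alarm=1.0, loss_missed=1.0):
    """Threshold c: predict Y=1 for x > c, Y=0 otherwise.

    Assumes class 1 has the larger mean (mean1 > mean0), so the
    comparison of weighted densities reduces to a single crossing point.
    """
    if not mean1 > mean0:
        raise ValueError("this sketch assumes mean1 > mean0")
    rate0, rate1 = 1.0 / mean0, 1.0 / mean1
    p1 = 1.0 - p0
    # Predict Y=1 when loss_fa * p0 * f0(x) < loss_md * p1 * f1(x);
    # taking logs and solving for x gives the expression below.
    log_ratio = math.log((loss_false_alarm * p0 * rate0) / (loss_missed * p1 * rate1))
    # A negative crossing point means Y=1 is preferred for every x >= 0.
    return max(0.0, log_ratio / (rate0 - rate1))

# Hypothetical illustration values (not the ones in the question).
c_equal = bayes_threshold(mean0=2.0, mean1=10.0, p0=0.8)
c_costly = bayes_threshold(mean0=2.0, mean1=10.0, p0=0.8, loss_missed=5.0)
print(c_equal, c_costly)
```

With equal losses, minimizing expected loss is the same as maximizing the probability of a correct prediction. Raising the cost of a missed detection moves the threshold down, so more batches are flagged.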
Question 4: (40 points)
For a given day $i$, we let $Y_i = 1$ if the ground-level ozone concentration near some city
(Houston, in our data) is at a dangerously high level. This is called an “ozone day”. We let
$Y_i = 0$ if the ozone concentration is low enough to be considered safe.
We want to predict $Y_i$ from more easily measured “features” describing atmospheric
pollutant levels and meteorological conditions (temperature, humidity, wind speed, etc.).
There are a total of $M = 72$ of these features collected each day, which we denote by
$X_i = \{X_{ij} \mid j = 1, \ldots, M\}$. Each feature $X_{ij} \in \mathbb{R}$ is a real number, and we will thus use a
Gaussian distribution to model these continuous random variables.
We will build a “naive Bayes” classifier, which predicts observation $i$ to be an ozone day
if $P(Y_i = 1 \mid X_i) > P(Y_i = 0 \mid X_i)$, and a non-ozone day otherwise. Using Bayes rule, this
classifier is equivalent to one that chooses $Y_i = 1$ if and only if
$$\frac{p_Y(1)\, f_{X|Y}(x_i \mid 1)}{f_X(x_i)} > \frac{p_Y(0)\, f_{X|Y}(x_i \mid 0)}{f_X(x_i)},$$
$$\ln p_Y(1) + \ln f_{X|Y}(x_i \mid 1) > \ln p_Y(0) + \ln f_{X|Y}(x_i \mid 0). \tag{1}$$
In this equation, $p_Y(y_i)$ is the probability mass function that defines the prior probability
of ozone and non-ozone days. The conditional probability density function $f_{X|Y}(x_i \mid y_i)$
describes the distribution of the $M = 72$ environmental features, which we assume depends
on the type of day. We make two simplifying assumptions about these densities: the features
$X_{ij}$ are conditionally independent given $Y_i$, and their distributions are Gaussian. Thus:
$$f_{X|Y}(x_i \mid 1) = \prod_{j=1}^{M} \frac{1}{\sqrt{2\pi\sigma_{1j}^2}} \exp\left\{ -\frac{(x_{ij} - \mu_{1j})^2}{2\sigma_{1j}^2} \right\}, \tag{2}$$
$$f_{X|Y}(x_i \mid 0) = \prod_{j=1}^{M} \frac{1}{\sqrt{2\pi\sigma_{0j}^2}} \exp\left\{ -\frac{(x_{ij} - \mu_{0j})^2}{2\sigma_{0j}^2} \right\}. \tag{3}$$
Given $Y_i = 1$, $X_{ij}$ is Gaussian with mean $\mu_{1j}$ and variance $\sigma_{1j}^2$. Given $Y_i = 0$, $X_{ij}$ is Gaussian
with mean $\mu_{0j}$ and variance $\sigma_{0j}^2$. There are a total of $2M$ mean parameters and $2M$ variance
parameters, since every feature $X_{ij}$ has a distinct distribution for each of the two classes.
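The structure of this classifier can be sketched in a few lines. The example below is an illustration under stated assumptions, not the course demo code: the function names `log_class_density` and `classify`, the feature count of 5, and all parameter values are hypothetical. It uses the library routine `scipy.stats.norm.logpdf` in place of a hand-derived log density, and relies on the fact that the log of a product of independent densities is the sum of the per-feature log densities. The decision rule is the comparison in Equation (1).

```python
import numpy as np
from scipy.stats import norm

def log_class_density(X, means, variances):
    """Log of the product of independent Gaussian densities, per row of X.

    X has shape (N, M); means and variances have shape (M,).
    """
    return norm.logpdf(X, loc=means, scale=np.sqrt(variances)).sum(axis=1)

def classify(X, prior1, means1, vars1, means0, vars0):
    """Predict 1 where the class-1 side of Equation (1) is larger, else 0."""
    score1 = np.log(prior1) + log_class_density(X, means1, vars1)
    score0 = np.log(1.0 - prior1) + log_class_density(X, means0, vars0)
    return (score1 > score0).astype(int)

# Hypothetical parameters: 5 features, unit variances, well-separated means.
M = 5
means0, vars0 = np.zeros(M), np.ones(M)
means1, vars1 = np.full(M, 3.0), np.ones(M)

# Two clear-cut points and one borderline point.
X_clear = np.array([[0.0] * M, [3.0] * M])
X_border = np.array([[1.6] * M])

print(classify(X_clear, 0.5, means1, vars1, means0, vars0))
print(classify(X_border, 0.5, means1, vars1, means0, vars0))
print(classify(X_border, 0.05, means1, vars1, means0, vars0))
```

The borderline point is assigned to class 1 under equal priors but to class 0 when class 1 is given a small prior, which shows how the $\ln p_Y$ terms in Equation (1) shift the decision.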
a) Derive equations for $\ln f_{X|Y}(x_i \mid 1)$ and $\ln f_{X|Y}(x_i \mid 0)$, the (natural) logarithms of the
conditional probability density functions in Equations (2) and (3). For numerical robustness,
simplify your answer so that it does not involve the exponential function.
Because ozone days are relatively rare, a classifier that always predicts $Y_i = 0$ would be
correct over 95% of the time, but would obviously not be practically useful for reducing
ozone hazard. To evaluate our classifiers, we will thus separately compute the numbers
of false alarms (predictions of ozone days when in reality $Y_i = 0$) and missed detections
(predictions of non-ozone days when in reality $Y_i = 1$). We are willing to allow some false
alarms as long as there are very few missed detections.
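These three quantities can be computed directly from the true and predicted labels. The sketch below uses only plain Python; the function name `count_errors` and the ten example labels are hypothetical and unrelated to the ozone data.

```python
def count_errors(y_true, y_pred):
    """Return (accuracy, false_alarms, missed_detections) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    # False alarm: predicted 1 when the true label is 0.
    false_alarms = sum(1 for t, p in pairs if t == 0 and p == 1)
    # Missed detection: predicted 0 when the true label is 1.
    missed = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return accuracy, false_alarms, missed

# Hypothetical labels for illustration.
y_true = [0, 0, 0, 0, 1, 1, 0, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0]

print(count_errors(y_true, y_pred))
# A classifier that always predicts 0 never raises a false alarm,
# but misses every positive example.
print(count_errors(y_true, [0] * len(y_true)))
```

Reporting the two error counts separately makes the weakness of the always-zero classifier visible even when its accuracy looks acceptable.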
For all parts below, assume that the mean parameters $\mu_{1j}, \mu_{0j}$ are set to match the mean
of the empirical distribution of the training data. The demo code computes these means.
b) Start by assuming the classes are equally probable ($p_Y(1) = p_Y(0) = 1/2$), and have
unit variance ($\sigma_{1j}^2 = \sigma_{0j}^2 = 1$). Write code to compute the log conditional densities from
part (a). Then using Equation (1), classify each test example. Report your classification
accuracy, and the numbers of false alarms and missed detections.
Hint: Your classifier should have fewer than 10 missed detections.
c) Rather than assuming features have variance one, set the variance parameters $\sigma_{1j}^2, \sigma_{0j}^2$
equal to the variance of the empirical distribution of the training data. Classify each
test example using Equation (1) with these variance estimates. Report your classification
accuracy, and the numbers of false alarms and missed detections.
d) Rather than assuming the classes are equally probable, estimate $p_Y(1)$ as the fraction of
training examples that are ozone days. Classify each test example using Equation (1) with
this informative class prior, and the variances from part (c). Report your classification
accuracy, and the numbers of false alarms and missed detections.