
Question


I have created a function using these instructions:
Using the Gower distances in the matrix D computed by the provided function, find the most and least similar pairs of colleges in the dataset (15 points).
Note that an item is most similar to itself (distance = 0), but you need to disallow this case, since we actually care about finding two distinct items, not along the diagonal, that are most similar. One quick way to accomplish this is to replace the zeros along the diagonal of the distance matrix D returned by the gower_distances function with a very large number (e.g. 1000) that wouldn't occur as a distance in practice.
You may also find numpy's unravel_index function, in combination with argmax or argmin, useful for finding min/max elements in an array. Remember that the least similar elements will have maximum distance from each other, and most similar will have minimum distance.
Your function should accept as an argument the college dataframe (df) provided above. The gower_distances() function will also utilize this dataframe. See the gower_distances() function definition above.
Your function should return a 2-element tuple that itself consists of two tuples: the first tuple should be the names (via the College.Name field) of the two colleges that are least similar according to the Gower distance. The second tuple should name the most similar colleges.
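The unravel_index hint in the instructions can be illustrated on a toy matrix. This is only a sketch: the 3x3 matrix below uses made-up values rather than real Gower distances, and the variable names are hypothetical.

```python
import numpy as np

# Toy symmetric distance matrix (made-up values, not real Gower distances).
D = np.array([
    [0.0, 0.2, 0.9],
    [0.2, 0.0, 0.5],
    [0.9, 0.5, 0.0],
])

# argmax/argmin return a position in the flattened array;
# unravel_index converts that position back to (row, column).
far_pair = np.unravel_index(np.argmax(D), D.shape)

# Mask the diagonal on a copy before taking the minimum,
# so an item is never reported as closest to itself.
masked = D.copy()
np.fill_diagonal(masked, 1000)
near_pair = np.unravel_index(np.argmin(masked), masked.shape)
```

Here far_pair points at rows 0 and 2 (distance 0.9) and near_pair at rows 0 and 1 (distance 0.2). Because the matrix is symmetric, each pair appears twice, and argmax/argmin report the first occurrence.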
My function returns the output below. Why are the elements in the second tuple not distinct?
(('Augustana College IL', 'Hope College'),
('Abilene Christian University', 'Abilene Christian University'))
def answer_mixed_features_a(df):
    D = gower_distances(df)  # a provided function that computes Gower distances
    np.fill_diagonal(D, 1000)
    # Find indices of least and most similar pairs
    least_similar_idx = np.unravel_index(np.argmin(D), D.shape)
    most_similar_idx = np.unravel_index(np.argmax(D), D.shape)
    # Get names of least and most similar colleges
    least_similar_colleges = (
        df.iloc[least_similar_idx[0]]["College.Name"],
        df.iloc[least_similar_idx[1]]["College.Name"],
    )
    most_similar_colleges = (
        df.iloc[most_similar_idx[0]]["College.Name"],
        df.iloc[most_similar_idx[1]]["College.Name"],
    )
    return (least_similar_colleges, most_similar_colleges)
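A likely explanation for the duplicated name: once the diagonal is filled with 1000, the largest value in D is that 1000, and np.argmax returns its first occurrence, position (0, 0), which is the first college paired with itself. In addition, argmin and argmax appear to be attached to the wrong labels, since the least similar pair has the maximum distance and the most similar pair has the minimum. The sketch below shows one way to handle both points. The helper name extreme_pairs and the toy data are hypothetical, and it assumes Gower distances are never negative, so -1 is a safe mask for the maximum search.

```python
import numpy as np

def extreme_pairs(D, names):
    """Return ((least similar pair), (most similar pair)) as name tuples."""
    D = np.asarray(D, dtype=float)

    # Least similar = largest distance. Mask the diagonal with a value
    # below any real distance so it can never win the argmax.
    far = D.copy()
    np.fill_diagonal(far, -1.0)
    i, j = np.unravel_index(np.argmax(far), far.shape)

    # Most similar = smallest distance. Mask the diagonal with a large
    # value so an item is never matched with itself.
    near = D.copy()
    np.fill_diagonal(near, 1000.0)
    k, m = np.unravel_index(np.argmin(near), near.shape)

    return (names[i], names[j]), (names[k], names[m])

# Toy data (made-up distances and names, for illustration only).
names = ["A", "B", "C"]
D = np.array([
    [0.0, 0.2, 0.9],
    [0.2, 0.0, 0.5],
    [0.9, 0.5, 0.0],
])
result = extreme_pairs(D, names)
```

Inside answer_mixed_features_a, this could be called as extreme_pairs(gower_distances(df), df["College.Name"].tolist()), assuming gower_distances returns a square array whose rows follow the row order of df.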
