Question
People You Might Know social network friendship recommendation algorithm builds on the idea that if two people have a lot of common friends, then the
"People You Might Know" social network friendship recommendation algorithm builds on the idea that if two people have a lot of common friends, then the system should recommend them to friend each other.
Let us use MapReduce to implement an algorithm that recommends N = 10 users with the largest number of friends in common with an ego user to this user.
Input:
The input data is made up of multiple lines, each representing a list of friends of a user and having the following format:
Here
A tiny sample of the data (assume that the IDs in the friend list of each user are sorted in ascending order):
=================================
0: 1, 2, 3, 6, 8, 9
1: 0, 5, 9
2: 0, 9
3: 0, 4, 5, 7, 9
4: 3, 8
5: 1, 3, 6, 8
6: 0, 5, 9
7: 3, 8
8: 0, 4, 5, 7
9: 0, 1, 2, 3, 6
==================================
Each line, keyed by the identifier of a user, has a list of identifiers for the friends of the user. For instance, the first line implies that user 0 has friended user 1, user 2, user 3, user 6, user 8, and user 9. Besides, the relationship is mutual, which means that, if user 1 is present in the friend list of user 0, user 0 should also be present in the friend list of user 1.
We can observe that users 1, 3, 6, and 8 have friended with both user 0 and user 5, which means user 0 and user 5 share at least 4 common friends. If they have not friended each other, the recommendation algorithm should suggest them to do so.
Output:
The output should contain one line per user in the following format:
where
For this naïve implementation, let's ignore the possible existence of already-friend pairs, meaning that you don't have to exclude a pair of users who have friended each other from recommendation.
In relation to the above example, it means that even user 1 and user 9 have friended each other, if they co-occur many times in other users' friend lists, user 1 can also show up in user 9's recommendation, and vice versa.
Assume that we use two consecutive MapReduce jobs to achieve this goal, with the output of the first MapReduce job used as the input to the second one.
The first MapReduce job treats each row in the input data as a record and counts the number of common friends each co-occurring pair of users have (or how many users have a certain pair of users appearing concurrently in their friend lists).
The second MapReduce job that consumes the first job's output generates the sorted list of recommended friends for
Questions:
Please provide a description of how you are going to solve this problem. Specifically, describe
1) the key-value pairs emitted from the map functions of both MapReduce jobs;
2) the main operations performed by the reduce functions of both jobs as well;
3) What algorithm design techniques can be used (i.e., use of combiners, secondary sort, etc.), and how to partition, sort, combine, and group data if necessary?
Step by Step Solution
There are 3 Steps involved in it
Step: 1
Get Instant Access to Expert-Tailored Solutions
See step-by-step solutions with expert insights and AI powered tools for academic success
Step: 2
Step: 3
Ace Your Homework with AI
Get the answers you need in no time with our AI-driven, step-by-step assistance
Get Started