Question

1 Approved Answer

Posted on Sep 24, 2024

Write a matlab code named HW7.m which takes a collection of k-mer reads and outputs the assembled genome. Codes referenced provided below PatternToNumber code: function

Write a matlab code named "HW7.m" which takes a collection of k-mer reads and outputs the assembled genome.

Codes referenced provided below

image text in transcribed

PatternToNumber code:

function number = PatternToNumber(pattern)

number = 0;

CharacterMap = containers.Map({'A', 'C', 'G', 'T'}, [0, 1, 2, 3]);

for i = 1: length(pattern)

number = number * 4 + CharacterMap(pattern(i));

end

number = 1 + number;

end

FindEulerianPath code:

function [path] = FindEulerianPath(adj,n) indegree = zeros(n);%for storing indegree of all vertices outdegree = zeros(n);%for storing outdegree of all vertices %calculating outdegree for i=1:n for j=1:n outdegree(i) = outdegree(i) + adj(i,j); end end %calculating indegree for j=1:n for i=1:n indegree(j) = indegree(j) + adj(i,j); end end

v1 = -1; v2 = -1; %step-1 ,search vertex whose outdegree-indegree = 1 for i=1:n if outdegree(i)-indegree(i) == 1 v1 = i; end end %search vertex whose outdegree-indegree = -1 for i=1:n if outdegree(i)-indegree(i) == -1 v2 = i; end end j = 1; %check if all others have balanced degree for i=1:n if i == v1 || i==v2 continue end if outdegree(i)~=indegree(i) j = 0; break; end end %if not balanced, no eulerian path if j == 0 fprintf('no eulerian path... '); return; end %if no such vertices no eulerian path if(v1 == -1 || v2 == -1) fprintf('no eulerian path... '); return; end %stack for path finding stack = java.util.Stack(); path = []; stack.push(v1);%inserting v1 while 1 if stack.isEmpty() break; end i = stack.pop(); path = [i,path]; for j=1:n if adj(i,j) == 1 stack.push(j); adj(i,j) = 0; end end end path = fliplr(path);%reversing the list to get the real path end

NumberToPattern code:

function pattern= NumberToPattern(number, k)

number = number - 1;

CharacterMap = ['A', 'C', 'G', 'T'];

pattern = ' ';

for i = 1: k

n = mod(number, 4);

number = (number - n) / 4;

pattern = strcat(CharacterMap(n + 1), pattern);

end

Section 1 of the code: Make an adjacency matrix (Al of the graph from the reads Hint: The uploaded "reads.mat file consists of sixty 6-mers. Each 6-mer represents an edge where the beginning node is the prefix and the ending node is the suffix of the 6-mer. Prefix and suffix have length of 5. So the graph can have at most 4 nodes, and therefore you can define A as a matrix with 4 rows and 4 columns where all its elements are zero at first. Then all you have to do is Start reading each k-mer, find its prefix and suffix, convert each to mber between 0 and 4-1 (use your PatternToNumber code from HW2). This way you can update the value of matrix A accordingly. For example, the first 6-mer is TGTTTA So preix TGTTT-959 and suffix='GTTA'=764. This means you need to add 1 to the element of the matrix A that sits on the 960" row and 765h column. Section 2 of the code: Find an Eulerian path for your adjacency matrix. Hint: Use you FindEulerianpath code from HW#6 Subtract 1 from each node number on the path to have a number between 0 and 41 again. For example, if Path- 1960 765... The corrected numbers should be 1959 764.. Section 3 of the code: Assemble the genome from the corrected path numbers Hint: Convert each corrected path number to a string of 5 nucleotides (use your NumberTo Pattern code from HW2). Take the whole 5 nucleotides from the first node, but for the rest of the nodes, just add the last nucleotide from their converted 5-mer strings Section 1 of the code: Make an adjacency matrix (Al of the graph from the reads Hint: The uploaded "reads.mat file consists of sixty 6-mers. Each 6-mer represents an edge where the beginning node is the prefix and the ending node is the suffix of the 6-mer. Prefix and suffix have length of 5. So the graph can have at most 4 nodes, and therefore you can define A as a matrix with 4 rows and 4 columns where all its elements are zero at first. Then all you have to do is Start reading each k-mer, find its prefix and suffix, convert each to mber between 0 and 4-1 (use your PatternToNumber code from HW2). This way you can update the value of matrix A accordingly. For example, the first 6-mer is TGTTTA So preix TGTTT-959 and suffix='GTTA'=764. This means you need to add 1 to the element of the matrix A that sits on the 960" row and 765h column. Section 2 of the code: Find an Eulerian path for your adjacency matrix. Hint: Use you FindEulerianpath code from HW#6 Subtract 1 from each node number on the path to have a number between 0 and 41 again. For example, if Path- 1960 765... The corrected numbers should be 1959 764.. Section 3 of the code: Assemble the genome from the corrected path numbers Hint: Convert each corrected path number to a string of 5 nucleotides (use your NumberTo Pattern code from HW2). Take the whole 5 nucleotides from the first node, but for the rest of the nodes, just add the last nucleotide from their converted 5-mer strings