본문 바로가기

생물정보학(바이오인포매틱스)/생물정보학 책

[Bioinformatics Algorithms Ch 01] The algorithm of DNA Replication

 
 

Hi I'm a TKM.

 
 

Today, I will introduce some algorithms in Chapter 1 of "Bioinformatics Algorithms".
 

 

FREQUENTWORDS

 
 
If you want to create a 4-mer pattern, the number of possible patterns would be 4 x 4 x 4 x 4.
 
1) And if the sequence is "ACGACGACA", 4-mer pattern can be narrowed to 6 patterns:
 
ACGA, CGAC, GACG, ACGA, CGAC, GACA
 
Then, the most frequent sequence is 'CGAC' and 'ACGA'.
 
2) And if the sequence is "ACGAC", 2-mer pattern can be narrowed to 4 patterns:
 
AC, CG, GA, AC
 
Then, the most frequent sequence is 'AC'.
 
By using an algorithm, you can sort and find the most frequent sequence.
 
First, let's create an INDEX on pattern 2).
 

for i <- 0 to 3

INDEX(i) <- PATTERNTONUMBER(Pattern) 

COUNT(i) <- 1

# Pattern : AC, CG, GA, AC

 
 

 

ROSALIND | Implement PatternToNumber

It appears that your browser has JavaScript disabled. Rosalind requires your browser to be JavaScript enabled. Implement PatternToNumber solved by 1455 2015년 7월 29일 1:07:15 오전 by Rosalind Team Implement PatternToNumber Convert a DNA string to a n

rosalind.info

 
The function 'PATTERNTONUMBER(Pattern)' produces an index on the pattern like this:
 

 
And that's a result of INDEX(i) & COUNT(i) of PATTERN 2.
 

 
Then, the next step is sorting the index and combining the COUNT of the same things like this.
 

In this way, we can return what INDEX(sequence) has the most frequent COUNT.
 
However, as I have mentioned, some sequences can include mismatches for several reasons.
 
Therefore, it is essential to develop algorithms that can accommodate mismatches in the DNA sequence.
 

출처: pixabay

 
 
Let's suppose we have to find the 2-mer pattern 'AG' in ACGAGGAGA, allowing one mismatch.
 
In order to find the answer more quickly, I utilized ChatGPT.
 

 
 
In the sequence 'ACGAGGAGA', two patterns can be found, position 1 'AC', and position 7 'AG'.
 
How about two mismatches in the 3-mer pattern 'ACG'?
 

 
 
 
Like this, we can find hidden messages in DNA through these analyses.
 
But, unlike these sequences, we generally have to figure out patterns in a large number of alphabets.
 
 

 
Thank you for reading my posts!
 
I will come with Chapter 2 of the book next time :)


 

 

과학 is 일상 : 네이버 블로그

과학기술정책가를 꿈꾸는 생물정보학도이자 과학기술계와 일반 대중을 연결하는 과학커뮤니케이터 TKM입니다! PC로 보면 좋습니다~ 공지 글로 과학꿈터뷰 전자책을 무료 공유하고 있습니다! 문

blog.naver.com

 
 

 

Bioinformatics OPEN STUDY | Notion

본 페이지는 비영리 목적으로 ‘생물정보학’ 관련 프로그래밍 공부 내용을 정리 및 공유하기 위해 마련해본 공간입니다. 본 페이지는 원래 비공개 페이지였지만, 일부를 공개 내용으로 바꾸는

sparkling-dibble-7ff.notion.site