[Bioinformatics Algorithms Ch 01] The algorithm of DNA Replication

Hi I'm a TKM.

Today, I will introduce some algorithms in Chapter 1 of "Bioinformatics Algorithms".

FREQUENTWORDS

If you want to create a 4-mer pattern, the number of possible patterns would be 4 x 4 x 4 x 4.

1) And if the sequence is "ACGACGACA", 4-mer pattern can be narrowed to 6 patterns:

ACGA, CGAC, GACG, ACGA, CGAC, GACA

Then, the most frequent sequence is 'CGAC' and 'ACGA'.

2) And if the sequence is "ACGAC", 2-mer pattern can be narrowed to 4 patterns:

AC, CG, GA, AC

Then, the most frequent sequence is 'AC'.

By using an algorithm, you can sort and find the most frequent sequence.

First, let's create an INDEX on pattern 2).

for i <- 0 to 3

INDEX(i) <- PATTERNTONUMBER(Pattern) 

COUNT(i) <- 1

# Pattern : AC, CG, GA, AC

ROSALIND | Implement PatternToNumber

It appears that your browser has JavaScript disabled. Rosalind requires your browser to be JavaScript enabled. Implement PatternToNumber solved by 1455 2015년 7월 29일 1:07:15 오전 by Rosalind Team Implement PatternToNumber Convert a DNA string to a n

rosalind.info

The function 'PATTERNTONUMBER(Pattern)' produces an index on the pattern like this:

And that's a result of INDEX(i) & COUNT(i) of PATTERN 2.

Then, the next step is sorting the index and combining the COUNT of the same things like this.

In this way, we can return what INDEX(sequence) has the most frequent COUNT.

However, as I have mentioned, some sequences can include mismatches for several reasons.

Therefore, it is essential to develop algorithms that can accommodate mismatches in the DNA sequence.

Let's suppose we have to find the 2-mer pattern 'AG' in ACGAGGAGA, allowing one mismatch.

In order to find the answer more quickly, I utilized ChatGPT.

In the sequence 'ACGAGGAGA', two patterns can be found, position 1 'AC', and position 7 'AG'.

How about two mismatches in the 3-mer pattern 'ACG'?

Like this, we can find hidden messages in DNA through these analyses.

But, unlike these sequences, we generally have to figure out patterns in a large number of alphabets.

Thank you for reading my posts!

I will come with Chapter 2 of the book next time :)

과학 is 일상 : 네이버 블로그

과학기술정책가를 꿈꾸는 생물정보학도이자 과학기술계와 일반 대중을 연결하는 과학커뮤니케이터 TKM입니다! PC로 보면 좋습니다~ 공지 글로 과학꿈터뷰 전자책을 무료 공유하고 있습니다! 문

blog.naver.com

Bioinformatics OPEN STUDY | Notion

본 페이지는 비영리 목적으로 ‘생물정보학’ 관련 프로그래밍 공부 내용을 정리 및 공유하기 위해 마련해본 공간입니다. 본 페이지는 원래 비공개 페이지였지만, 일부를 공개 내용으로 바꾸는

sparkling-dibble-7ff.notion.site

'생물정보학(바이오인포매틱스) > 생물정보학 책' 카테고리의 다른 글

[생물정보학 알고리듬 4장] '항생제 내성 세균'에 대응하기 위한 '항생제 펩티드 아미노산 순서' 알아내기 (0)	2024.03.08
[생물정보학 알고리듬 03장] 컨티그 조립하기(최대 비분기 경로) & DNA array와 트랜스포존(transposon) (2)	2024.03.02
[생물정보학 알고리듬 3장] '리드(read)'로부터 유전체 뉴클레오티드 서열 조립하기 :: 드 브루인 그래프, 해밀턴 경로, 오일러 경로 (1)	2024.03.02
[Bioinformatics Algorithms Ch 02] Circadian Clock and Regulatory Motifs (3)	2024.02.25
[Bioinformatics Algorithms Ch 01] The story of DNA Replication (1)	2024.02.19

TKM's STUDY BLOG

[Bioinformatics Algorithms Ch 01] The algorithm of DNA Replication

'생물정보학(바이오인포매틱스) > 생물정보학 책' 카테고리의 다른 글

티스토리툴바

[Bioinformatics Algorithms Ch 01] The algorithm of DNA Replication

'생물정보학(바이오인포매틱스) > 생물정보학 책' 카테고리의 다른 글

'생물정보학(바이오인포매틱스)/생물정보학 책' Related Articles

티스토리툴바