Smith-Waterman algorithm for local sequence alignment with affine gap penalties
Clash Royale CLAN TAG#URR8PPP
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;
up vote
2
down vote
favorite
The following is a summary from the top of my head so please correct me if I'm wrong. See the Wikipedia article for more.
This is about determining the (score / edit distance of the) most similar substrings of two given strings.
For example, given the two sequences
TGTTACGG and
GGTTGACTA
the optimal (w.r.t. to a certain scoring function) local alignment would be
GTTA-C
GTTGAC
I am using a slightly different way to score introduction and continuation of gaps than in the Wikipedia article. Technically, in addition to one scoring matrix / edit graph, you have two more - each accounting for gaps in one of the input words.
Given two input words a = a1 a2, ..., an
, b = b1 b2 ... bm
, let s_ij
denote the score of the optimal local alignment of a1..ai
and b1..bj
. This then gives the following recurrence:
With rho
penalty for introducing a gap; sigma
penalty for continuing a gap; and delta
a function that assigns a score to two matched characters (e.g. 1 if equal, -1 if unequal). The higher the score the better.
This can be solved using Dynamic Programming.
On the basis of this blog article I wrote the following Haskell code:
import qualified Data.Array as Array
import Data.Array
scoreLocal :: String -> String -> Float
scoreLocal a b = maximum mid
where (m, n) = (length a, length b)
-- use input as arrays for fast direct indexing
v = Array.listArray (1, m) a
w = Array.listArray (1, n) b
-- declare scoring fns for each "edit graph level"
scmid i 0 = 0
scmid 0 j = 0
scmid i j = maximum [ 0 -- start here ("free" edges from (0,0))
,(mid ! (i-1, j-1)) + ô (v!i) (w!j) -- match or mismatch
, low ! (i , j) -- end deletion / gap
, upp ! (i , j) -- end insertion / gap
]
scupp i 0 = 0
scupp 0 j = 0
scupp i j = maximum [
(upp ! (i, j-1)) - ÃÂ -- cont gap in v
, (mid ! (i, j-1)) - ÃÂ -- start gap in v
]
sclow i 0 = 0
sclow 0 j = 0
sclow i j = maximum [
(low ! (i-1, j)) - ÃÂ -- cont gap in w
, (mid ! (i-1, j)) - ÃÂ -- start gap in w
]
-- declare content of "edit graph levels"
mid = Array.listArray bounds
[scmid i j | (i, j) <- Array.range bounds]
low = Array.listArray bounds
[sclow i j | (i, j) <- Array.range bounds]
upp = Array.listArray bounds
[scupp i j | (i, j) <- Array.range bounds]
bounds = ((0,0), (m,n))
ô v w | v == w = 1 -- match
| otherwise = -2 -- mismatch
ÃÂ = 5
ÃÂ = 0.5
Question
I dont have much experience yet with Haskell. Can (parts of this) be written in a nicer way?
haskell dynamic-programming bioinformatics
add a comment |Â
up vote
2
down vote
favorite
The following is a summary from the top of my head so please correct me if I'm wrong. See the Wikipedia article for more.
This is about determining the (score / edit distance of the) most similar substrings of two given strings.
For example, given the two sequences
TGTTACGG and
GGTTGACTA
the optimal (w.r.t. to a certain scoring function) local alignment would be
GTTA-C
GTTGAC
I am using a slightly different way to score introduction and continuation of gaps than in the Wikipedia article. Technically, in addition to one scoring matrix / edit graph, you have two more - each accounting for gaps in one of the input words.
Given two input words a = a1 a2, ..., an
, b = b1 b2 ... bm
, let s_ij
denote the score of the optimal local alignment of a1..ai
and b1..bj
. This then gives the following recurrence:
With rho
penalty for introducing a gap; sigma
penalty for continuing a gap; and delta
a function that assigns a score to two matched characters (e.g. 1 if equal, -1 if unequal). The higher the score the better.
This can be solved using Dynamic Programming.
On the basis of this blog article I wrote the following Haskell code:
import qualified Data.Array as Array
import Data.Array
scoreLocal :: String -> String -> Float
scoreLocal a b = maximum mid
where (m, n) = (length a, length b)
-- use input as arrays for fast direct indexing
v = Array.listArray (1, m) a
w = Array.listArray (1, n) b
-- declare scoring fns for each "edit graph level"
scmid i 0 = 0
scmid 0 j = 0
scmid i j = maximum [ 0 -- start here ("free" edges from (0,0))
,(mid ! (i-1, j-1)) + ô (v!i) (w!j) -- match or mismatch
, low ! (i , j) -- end deletion / gap
, upp ! (i , j) -- end insertion / gap
]
scupp i 0 = 0
scupp 0 j = 0
scupp i j = maximum [
(upp ! (i, j-1)) - ÃÂ -- cont gap in v
, (mid ! (i, j-1)) - ÃÂ -- start gap in v
]
sclow i 0 = 0
sclow 0 j = 0
sclow i j = maximum [
(low ! (i-1, j)) - ÃÂ -- cont gap in w
, (mid ! (i-1, j)) - ÃÂ -- start gap in w
]
-- declare content of "edit graph levels"
mid = Array.listArray bounds
[scmid i j | (i, j) <- Array.range bounds]
low = Array.listArray bounds
[sclow i j | (i, j) <- Array.range bounds]
upp = Array.listArray bounds
[scupp i j | (i, j) <- Array.range bounds]
bounds = ((0,0), (m,n))
ô v w | v == w = 1 -- match
| otherwise = -2 -- mismatch
ÃÂ = 5
ÃÂ = 0.5
Question
I dont have much experience yet with Haskell. Can (parts of this) be written in a nicer way?
haskell dynamic-programming bioinformatics
Why does the scoring function prefer the A before the gap and not after the gap?
â Vogel612â¦
May 23 at 13:16
2
Your second question, of how to modify the code to compute that value, is off-topic; we only review existing code, we don't enhance or provide enhancements to the functionality.
â Dannnno
May 23 at 13:48
I removed the off-topic part of your question, it was either that or getting the question closed. Please take a look at the help center. Feature requests are not something we do.
â Mast
May 24 at 9:22
add a comment |Â
up vote
2
down vote
favorite
up vote
2
down vote
favorite
The following is a summary from the top of my head so please correct me if I'm wrong. See the Wikipedia article for more.
This is about determining the (score / edit distance of the) most similar substrings of two given strings.
For example, given the two sequences
TGTTACGG and
GGTTGACTA
the optimal (w.r.t. to a certain scoring function) local alignment would be
GTTA-C
GTTGAC
I am using a slightly different way to score introduction and continuation of gaps than in the Wikipedia article. Technically, in addition to one scoring matrix / edit graph, you have two more - each accounting for gaps in one of the input words.
Given two input words a = a1 a2, ..., an
, b = b1 b2 ... bm
, let s_ij
denote the score of the optimal local alignment of a1..ai
and b1..bj
. This then gives the following recurrence:
With rho
penalty for introducing a gap; sigma
penalty for continuing a gap; and delta
a function that assigns a score to two matched characters (e.g. 1 if equal, -1 if unequal). The higher the score the better.
This can be solved using Dynamic Programming.
On the basis of this blog article I wrote the following Haskell code:
import qualified Data.Array as Array
import Data.Array
scoreLocal :: String -> String -> Float
scoreLocal a b = maximum mid
where (m, n) = (length a, length b)
-- use input as arrays for fast direct indexing
v = Array.listArray (1, m) a
w = Array.listArray (1, n) b
-- declare scoring fns for each "edit graph level"
scmid i 0 = 0
scmid 0 j = 0
scmid i j = maximum [ 0 -- start here ("free" edges from (0,0))
,(mid ! (i-1, j-1)) + ô (v!i) (w!j) -- match or mismatch
, low ! (i , j) -- end deletion / gap
, upp ! (i , j) -- end insertion / gap
]
scupp i 0 = 0
scupp 0 j = 0
scupp i j = maximum [
(upp ! (i, j-1)) - ÃÂ -- cont gap in v
, (mid ! (i, j-1)) - ÃÂ -- start gap in v
]
sclow i 0 = 0
sclow 0 j = 0
sclow i j = maximum [
(low ! (i-1, j)) - ÃÂ -- cont gap in w
, (mid ! (i-1, j)) - ÃÂ -- start gap in w
]
-- declare content of "edit graph levels"
mid = Array.listArray bounds
[scmid i j | (i, j) <- Array.range bounds]
low = Array.listArray bounds
[sclow i j | (i, j) <- Array.range bounds]
upp = Array.listArray bounds
[scupp i j | (i, j) <- Array.range bounds]
bounds = ((0,0), (m,n))
ô v w | v == w = 1 -- match
| otherwise = -2 -- mismatch
ÃÂ = 5
ÃÂ = 0.5
Question
I dont have much experience yet with Haskell. Can (parts of this) be written in a nicer way?
haskell dynamic-programming bioinformatics
The following is a summary from the top of my head so please correct me if I'm wrong. See the Wikipedia article for more.
This is about determining the (score / edit distance of the) most similar substrings of two given strings.
For example, given the two sequences
TGTTACGG and
GGTTGACTA
the optimal (w.r.t. to a certain scoring function) local alignment would be
GTTA-C
GTTGAC
I am using a slightly different way to score introduction and continuation of gaps than in the Wikipedia article. Technically, in addition to one scoring matrix / edit graph, you have two more - each accounting for gaps in one of the input words.
Given two input words a = a1 a2, ..., an
, b = b1 b2 ... bm
, let s_ij
denote the score of the optimal local alignment of a1..ai
and b1..bj
. This then gives the following recurrence:
With rho
penalty for introducing a gap; sigma
penalty for continuing a gap; and delta
a function that assigns a score to two matched characters (e.g. 1 if equal, -1 if unequal). The higher the score the better.
This can be solved using Dynamic Programming.
On the basis of this blog article I wrote the following Haskell code:
import qualified Data.Array as Array
import Data.Array
scoreLocal :: String -> String -> Float
scoreLocal a b = maximum mid
where (m, n) = (length a, length b)
-- use input as arrays for fast direct indexing
v = Array.listArray (1, m) a
w = Array.listArray (1, n) b
-- declare scoring fns for each "edit graph level"
scmid i 0 = 0
scmid 0 j = 0
scmid i j = maximum [ 0 -- start here ("free" edges from (0,0))
,(mid ! (i-1, j-1)) + ô (v!i) (w!j) -- match or mismatch
, low ! (i , j) -- end deletion / gap
, upp ! (i , j) -- end insertion / gap
]
scupp i 0 = 0
scupp 0 j = 0
scupp i j = maximum [
(upp ! (i, j-1)) - ÃÂ -- cont gap in v
, (mid ! (i, j-1)) - ÃÂ -- start gap in v
]
sclow i 0 = 0
sclow 0 j = 0
sclow i j = maximum [
(low ! (i-1, j)) - ÃÂ -- cont gap in w
, (mid ! (i-1, j)) - ÃÂ -- start gap in w
]
-- declare content of "edit graph levels"
mid = Array.listArray bounds
[scmid i j | (i, j) <- Array.range bounds]
low = Array.listArray bounds
[sclow i j | (i, j) <- Array.range bounds]
upp = Array.listArray bounds
[scupp i j | (i, j) <- Array.range bounds]
bounds = ((0,0), (m,n))
ô v w | v == w = 1 -- match
| otherwise = -2 -- mismatch
ÃÂ = 5
ÃÂ = 0.5
Question
I dont have much experience yet with Haskell. Can (parts of this) be written in a nicer way?
haskell dynamic-programming bioinformatics
edited May 24 at 9:21
Mast
7,32663484
7,32663484
asked May 23 at 13:09
ngmir
112
112
Why does the scoring function prefer the A before the gap and not after the gap?
â Vogel612â¦
May 23 at 13:16
2
Your second question, of how to modify the code to compute that value, is off-topic; we only review existing code, we don't enhance or provide enhancements to the functionality.
â Dannnno
May 23 at 13:48
I removed the off-topic part of your question, it was either that or getting the question closed. Please take a look at the help center. Feature requests are not something we do.
â Mast
May 24 at 9:22
add a comment |Â
Why does the scoring function prefer the A before the gap and not after the gap?
â Vogel612â¦
May 23 at 13:16
2
Your second question, of how to modify the code to compute that value, is off-topic; we only review existing code, we don't enhance or provide enhancements to the functionality.
â Dannnno
May 23 at 13:48
I removed the off-topic part of your question, it was either that or getting the question closed. Please take a look at the help center. Feature requests are not something we do.
â Mast
May 24 at 9:22
Why does the scoring function prefer the A before the gap and not after the gap?
â Vogel612â¦
May 23 at 13:16
Why does the scoring function prefer the A before the gap and not after the gap?
â Vogel612â¦
May 23 at 13:16
2
2
Your second question, of how to modify the code to compute that value, is off-topic; we only review existing code, we don't enhance or provide enhancements to the functionality.
â Dannnno
May 23 at 13:48
Your second question, of how to modify the code to compute that value, is off-topic; we only review existing code, we don't enhance or provide enhancements to the functionality.
â Dannnno
May 23 at 13:48
I removed the off-topic part of your question, it was either that or getting the question closed. Please take a look at the help center. Feature requests are not something we do.
â Mast
May 24 at 9:22
I removed the off-topic part of your question, it was either that or getting the question closed. Please take a look at the help center. Feature requests are not something we do.
â Mast
May 24 at 9:22
add a comment |Â
active
oldest
votes
active
oldest
votes
active
oldest
votes
active
oldest
votes
active
oldest
votes
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f195020%2fsmith-waterman-algorithm-for-local-sequence-alignment-with-affine-gap-penalties%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Why does the scoring function prefer the A before the gap and not after the gap?
â Vogel612â¦
May 23 at 13:16
2
Your second question, of how to modify the code to compute that value, is off-topic; we only review existing code, we don't enhance or provide enhancements to the functionality.
â Dannnno
May 23 at 13:48
I removed the off-topic part of your question, it was either that or getting the question closed. Please take a look at the help center. Feature requests are not something we do.
â Mast
May 24 at 9:22