Haskell sentence segregation
I am trying to implement sentence segregation in Haskell. I have achieved a decent bulk of it using the NLP.FullStop library, but it doesn't seem to account for sentences with full stops at the end of quotes like this." or like this.' or at the end of bracketed sentences like this.)

I also want to deal with the character ” in much the same way as ", as a lot of the content I am dealing with uses this character. I've been unable to get a successful regex match on this character, so I have resorted to replacing it with " before the regex...

    import qualified Data.ByteString.Char8 as BC
    import Data.List.Split
    import qualified NLP.FullStop as FS

    splitter :: String -> [String]
    splitter = concatMap FS.segment . splitPunc
      where
        splitPunc = map unwords . split puncSplitter . words
        puncSplitter = keepDelimsR $ whenElt (\word -> BC.pack (splitPrep word) =~ puncExpr :: Bool)
        splitPrep = replace_ '”' '"'
        puncExpr = "\\.[)'\"][^\\w]?$" :: String

    replace_ :: Eq b => b -> b -> [b] -> [b]
    replace_ a b = map (\x -> if (a == x) then b else x)

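
One possible reason the regex never matched the curly quote directly, offered as a sketch rather than a diagnosis: Data.ByteString.Char8 keeps only the low 8 bits of each Char, so a character outside Latin-1 no longer exists as itself once it has been packed. The snippet below shows that truncation and a String-level normalisation that runs before packing. It assumes the character in question is U+201D; `roundTrip` and `normaliseQuotes` are hypothetical helper names, not part of any library.

```haskell
import qualified Data.ByteString.Char8 as BC
import Data.Char (ord)

-- | Code points that come back after a round trip through 'BC.pack'.
-- Char8 keeps only the low 8 bits of every 'Char'.
roundTrip :: String -> [Int]
roundTrip = map ord . BC.unpack . BC.pack

-- | Hypothetical helper: map curly closing quotes to their ASCII
-- counterparts on the 'String' itself, before any packing happens.
normaliseQuotes :: String -> String
normaliseQuotes = map go
  where
    go '\8221' = '"'   -- U+201D RIGHT DOUBLE QUOTATION MARK (assumed)
    go '\8217' = '\''  -- U+2019 RIGHT SINGLE QUOTATION MARK
    go c       = c
```

Here `roundTrip "\8221"` yields `[29]` (0x201D truncated to 0x1D), which is why replacing the character on the String first, as `splitPrep` does, sidesteps the problem.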
haskell natural-language-processing
Does your code as posted work correctly to accomplish the task? – Phrancis, Mar 20 at 3:42
Yes. The text in my post is meant to give some context for why I have done certain things in this code, hopefully to aid whoever reads it, as it is not very readable to me. – danbroooks, Mar 20 at 8:41
Your code is missing at least one include for =~. – Zeta, Mar 29 at 9:07
asked Mar 20 at 1:03
danbroooks
1 Answer
While your code works and uses type signatures, it's missing documentation. It's not clear from your description or your code what splitter's intended result will be on a given input. Documentation and tests are therefore highly welcome.

Also, it's not clear why you've added an underscore to replace_. And your code is missing at least one import for =~. I assume that you just forgot to include that import line in your question and that it is in your actual code.

That being said, the fullstop library is, according to its own documentation, a placeholder library:

"Note that this package is mostly a placeholder. I hope the Haskell/NLP communities will run with it and upload a more sophisticated (family of) segmenter(s) in its place. Patches (and new maintainers) would be greeted with delight!"

The trouble with the sentence endings also comes from segment, since it hard-codes the allowed punctuation:

    -- https://hackage.haskell.org/package/fullstop-0.1.4/docs/src/NLP-FullStop.html#stopPunctuation
    stopPunctuation :: [Char]
    stopPunctuation = [ '.', '?', '!' ] -- <<<<

Unfortunately, you cannot simply expand stopPunctuation, since content in parentheses (like this) does not lead to a new sentence. Note that .) and ." aren't valid in some languages, though; they require ). and ". instead, so it's not clear what you are trying to achieve there (see the remark on documentation above).

So all in all, well written, but without additional explanation or documentation there is no way to check whether the function actually does what you want. I also suggest adding some tests.
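
To illustrate what documentation plus testable behaviour could look like, here is a minimal sketch that uses only base. It is an assumption about the intent of the question's pattern (a full stop, then a closing bracket or quote, then at most one trailing non-alphanumeric character), not code from the fullstop or split libraries, and the names `closesSentence` and `chunkSentences` are hypothetical.

```haskell
import Data.Char (isAlphaNum)

-- | True when a word closes a sentence inside quotes or brackets:
-- a full stop, then one of ) ' " or U+201D, then at most one
-- trailing non-alphanumeric character.
--
-- Assumption: this approximates the question's regular expression;
-- it does not reproduce any library's behaviour.
closesSentence :: String -> Bool
closesSentence w = case reverse w of
  (c : '.' : _) | isCloser c -> True
  (t : c : '.' : _) | not (isAlphaNum t), isCloser c -> True
  _ -> False
  where
    isCloser = (`elem` ")'\"\8221")

-- | Group the words of a text into chunks, ending a chunk after
-- every word for which 'closesSentence' holds.
chunkSentences :: String -> [String]
chunkSentences = map unwords . go . words
  where
    go [] = []
    go ws = case break closesSentence ws of
      (pre, hit : rest) -> (pre ++ [hit]) : go rest
      (pre, [])         -> [pre]
```

With this in place, a statement such as `chunkSentences "He said \"stop.\" Then left." == ["He said \"stop.\"", "Then left."]` can be checked mechanically, and the Haddock comments state the intended result for a given input. A plain "this." is deliberately left to a later segmenter, as in the question's pipeline.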
answered Mar 29 at 9:17
Zeta