Haskell sentence segregation













I am trying to implement sentence segregation in Haskell. I have achieved a decent bulk of it using the NLP.FullStop library, but it doesn't seem to account for sentences with full stops at the end of quotes like this." or like this.', or at the end of bracketed sentences like this.) I also want to treat the character ” in much the same way as ", because a lot of the content I am dealing with uses it. I've been unable to get a successful regex match on this character, so I have resorted to replacing it with " before applying the regex:



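import qualified Data.ByteString.Char8 as BC
import Data.List.Split
import qualified NLP.FullStop as FS
-- (=~) needs an import from a regex package, which is not shown here.

splitter :: String -> [String]
splitter = concatMap FS.segment . splitPunc
  where
    -- Group words so that each group ends at a word matching puncExpr.
    splitPunc    = map unwords . split puncSplitter . words
    puncSplitter = keepDelimsR $ whenElt (\word -> BC.pack (splitPrep word) =~ puncExpr :: Bool)
    -- Normalise the curly closing quote to a straight one before matching.
    splitPrep    = replace_ '”' '"'
    -- A full stop, then one of ) ' ", then optionally one character other than 'w', at the end.
    puncExpr     = "\\.[)'\"][^w]?$" :: String

-- Replace every occurrence of a with b.
replace_ :: Eq b => b -> b -> [b] -> [b]
replace_ a b = map (\x -> if a == x then b else x)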





















  • Does your code as posted work correctly to accomplish the task?
    – Phrancis
    Mar 20 at 3:42

  • Yes. The text in my post is meant to give some context for why I have done certain things in this code, hopefully to aid whoever reads it, as the code is not very readable to me.
    – danbroooks
    Mar 20 at 8:41

  • Your code is missing at least one import for =~.
    – Zeta
    Mar 29 at 9:07
















asked Mar 20 at 1:03 by danbroooks


1 Answer













While your code works and uses type signatures, it's missing documentation. It's not clear from your description or your code what splitter's intended result will be on a given input. Documentation and tests are therefore highly welcome.
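As a minimal sketch of what such documentation and tests could look like, here is replace_ with a Haddock comment and a few checks that need only base. The name replaceChecks and the chosen inputs are my own illustration and not part of your code:

```haskell
-- | Replace every occurrence of the first argument with the second,
-- leaving all other elements untouched.
--
-- >>> replace_ 'a' 'b' "banana"
-- "bbnbnb"
replace_ :: Eq a => a -> a -> [a] -> [a]
replace_ old new = map (\x -> if x == old then new else x)

-- | Hypothetical self-checks: each entry pairs a description with
-- whether the check passed.
replaceChecks :: [(String, Bool)]
replaceChecks =
  [ ("curly quote becomes straight quote", replace_ '”' '"' "this.”" == "this.\"")
  , ("other characters are untouched",     replace_ '”' '"' "plain." == "plain.")
  , ("empty input stays empty",            replace_ 'a' 'b' ""       == "")
  ]
```

Evaluating all snd replaceChecks in GHCi tells you whether every check passed. The same idea would apply to splitter once its intended output for a few sample inputs is written down.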



It's also not clear why replace_ has a trailing underscore. Your code is also missing at least one import for =~. I assume the import exists in your actual code and was just left out of the question.



That being said, the fullstop library is, according to its own documentation, a placeholder library:




Note that this package is mostly a placeholder. I hope the Haskell/NLP
communities will run with it and upload a more sophisticated (family
of) segmenter(s) in its place. Patches (and new maintainers) would be
greeted with delight!




Your trouble with the sentence endings also comes from segment, since it hard-codes the allowed stop punctuation:



-- https://hackage.haskell.org/package/fullstop-0.1.4/docs/src/NLP-FullStop.html#stopPunctuation
stopPunctuation :: [Char]
stopPunctuation = [ '.', '?', '!' ] -- <<<<


Unfortunately, you cannot simply expand stopPunctuation, since content in parentheses (like this) does not lead to a new sentence. Note also that .) and ." aren't valid in some languages, which require ). and ". instead, so it's not clear what you are trying to achieve there (see the remark about documentation above).
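If the closing characters do matter for your content, one option is to detect them without a regex, which also avoids the ByteString conversion that gets in the way of ”. The following is only a sketch under assumptions: closers, endsWithClosedStop, chunks and sentences are hypothetical names, the list of closing characters is taken from the examples in the question, and the optional trailing character that your regex allows is not handled:

```haskell
import Data.List (isSuffixOf)

-- | Characters that may close a sentence right after a full stop.
-- Assumption: this list mirrors the examples given in the question.
closers :: [Char]
closers = [')', '\'', '"', '”']

-- | True when a word ends in a full stop followed by one closing character.
endsWithClosedStop :: String -> Bool
endsWithClosedStop w = any (\c -> ['.', c] `isSuffixOf` w) closers

-- | Cut a list of words after every word that satisfies the predicate.
chunks :: [String] -> [[String]]
chunks [] = []
chunks ws = case break endsWithClosedStop ws of
  (before, [])         -> [before]
  (before, hit : rest) -> (before ++ [hit]) : chunks rest

-- | Split on the closing patterns only; plain full stops are left alone
-- here and could still be passed on to a segmenter afterwards.
sentences :: String -> [String]
sentences = map unwords . chunks . words
```

Because this works on String directly, the curly quote needs no special treatment. Whether cutting at these positions is correct still depends on the language and style of your input, which is why the intended behaviour should be documented first.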



So all in all, the code is well written, but without additional explanation or documentation there is no way to check whether the function actually does what you want. I also suggest adding some tests.






answered Mar 29 at 9:17 by Zeta
