Finding the comic ID of the last XKCD comic published

I got sidetracked and decided to create an XKCD viewer. For certain functionality, I needed to be able to find the ID of the last comic published. This was my attempt. I'm using Enlive here to parse the page itself.



I struggled to find a CSS selector that gets the text node, then finally gave up and decided to do some manual parsing. It got long and ugly, but it works! The problem is that the only place I can reliably find the page ID is in a note at the bottom of the page:




Permanent link to this comic: https://xkcd.com/1988/




To parse the ID out of the end of that link, I need to find the text node, then parse the string. The latter was easy. The former took me a little under an hour, due mostly to my inexperience with CSS selectors.



What I'm looking for:



  • Is there a way to get the text node directly via Enlive CSS-like selectors?

  • Anything else that may simplify this. It's quite a series of transformations. I could obviously split it into a few functions, but I can't see ever needing the functionality anywhere else, and it's fairly simple to test as is. Any recommendations here?

Usage as of posting this:



(find-last-id)
=> 1988



(ns xkcd-viewer.mcve
  (:require [net.cgrand.enlive-html :as e])
  (:import (java.net URL)))

(def base-url "https://xkcd.com/")

; I actually use this a couple of times in the real code. It doesn't seem as useful here though.
(defn parse-id?
  "Returns the str-n parsed as a long, or nil if it's unparsable."
  [str-n]
  (try
    (Long/parseLong str-n)

    (catch NumberFormatException _
      nil)))

(defn find-last-id []
  (let [digit? #(Character/isDigit ^Character %)

        id-container (-> (e/html-resource (URL. base-url))
                         (e/select [:#middleContainer])
                         (first)
                         (:content))

        raw-id (->> id-container
                    ; The text node to find is surrounded by <br>s, so
                    (drop-while #(not= (:tag %) :br)) ; get rid of everything before the first br,
                    (drop 1) ; then the br itself,
                    (first) ; then get the text node, then
                    (drop-while (comp not digit?))
                    (take-while digit?)
                    (apply str))] ; then turn the digits into a string to be parsed.

    (if-let [parsed (parse-id? raw-id)]
      parsed
      (throw (RuntimeException.
               (str "Parser broken! Did XKCD change their site?\nFound ID: " raw-id))))))

  • I do not know anything about Clojure... but I have a feeling it would be simpler to grab the link for the previous page and add one to the ID.
    – Gerrit0
    May 3 at 4:04










  • @Gerrit0 LOL. Probably. But who has time to think about logic before spending a couple hours hacking stuff together?
    – Carcigenicate
    May 3 at 4:06
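
The previous-link idea from the comment above could be sketched roughly as follows. This is only a sketch under assumptions: it supposes the front page contains a link of the form `rel="prev" href="/N/"`, which is not confirmed anywhere in this thread, and it uses a plain regex over the HTML string rather than Enlive. The name `prev-link->latest-id` and the sample markup are made up for illustration.

```clojure
(defn prev-link->latest-id
  "Given page HTML, finds the first rel=\"prev\" link of the ASSUMED form
  href=\"/N/\" and returns N + 1, or nil if no such link is found."
  [html]
  (when-let [[_ prev-id] (re-find #"rel=\"prev\"\s+href=\"/(\d+)/\"" html)]
    (inc (Long/parseLong prev-id))))

;; Hypothetical sample markup, not copied from the live site.
(def sample-html
  "<ul><li><a rel=\"prev\" href=\"/1987/\" accesskey=\"p\">&lt; Prev</a></li></ul>")

(prev-link->latest-id sample-html)
;; => 1988
```

Returning nil on a failed match leaves the caller free to throw the same "Parser broken!" error as `find-last-id` does.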

edited May 3 at 2:57 by 200_success
asked May 2 at 23:56 by Carcigenicate

1 Answer

I'm not sure it's much shorter than what you wrote, but finding stuff in any tree-like data structure is what I created the tupelo.forest library for.



Here is a solution for your problem:



(dotest
  (when false ; manually enable to grab a new copy of the webpage
    (spit "xkcd-sample.html"
      (slurp "https://xkcd.com")))
  (with-forest (new-forest)
    ; `xkcd` is a helper that is not shown in this answer
    (let [doc         (it-> (xkcd)
                        (drop-if #(= :dtd (:type %)) it)
                        (only it))
          root-hid    (add-tree-enlive doc)
          >>          (remove-whitespace-leaves)
          ;>>         (spyx-pretty (hid->bush root-hid))
          hid-keep-fn (fn [hid]
                        (let [node       (hid->node hid)
                              value      (when (contains? node :value) (grab :value node))
                              perm-link? (when (string? value)
                                           (re-find #"Permanent link to this comic" value))]
                          perm-link?))
          found-hids  (find-hids-with root-hid [:** :*] hid-keep-fn)
          link-node   (hid->node (only found-hids)) ; assume there is only 1 link node
          value-str   (grab :value link-node) ; "\nPermanent link to this comic: https://xkcd.com/1988/"
          result      (re-find #"http.*$" value-str)]
      ;(spyx-pretty link-node) ;=> {:tupelo.forest/khids [],
      ;                             :tag :tupelo.forest/raw,
      ;                             :value "\nPermanent link to this comic: https://xkcd.com/1988/"}
      ;(spyx result) ; => "https://xkcd.com/1988/"
      )))


Documentation is ongoing, but you can see a lightning talk from the Clojure Conj 2017.

  • Oh, I see I forgot to parse out just the integer ID. Oh well.
    – Alan Thompson
    May 3 at 1:28
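
As the comment notes, `result` in the answer is still the full URL rather than the integer ID. A minimal sketch of that last step, assuming the permalink always has the form `https://xkcd.com/N/` as in the sample text quoted in the question, might look like this. The function name `permalink->id` is hypothetical and belongs to neither tupelo nor Enlive.

```clojure
(defn permalink->id
  "Extracts the comic number from text containing a permalink of the ASSUMED
  form https://xkcd.com/N/. Returns a long, or nil if nothing matches."
  [text]
  (when-let [[_ id] (re-find #"https?://xkcd\.com/(\d+)/?" text)]
    (Long/parseLong id)))

(permalink->id "\nPermanent link to this comic: https://xkcd.com/1988/")
;; => 1988

(permalink->id "no link here")
;; => nil
```

The capture group does the work of both the `drop-while`/`take-while` digit filtering and the `http.*$` match, so the same function could be applied to the text node found by either approach.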

answered May 3 at 1:23 by Alan Thompson