Finding the comic ID of the last XKCD comic published

I decided to sidetrack and create an XKCD viewer. For certain functionality, I needed to find the ID of the last comic published. This was my attempt. I'm using Enlive to parse the page itself.

I struggled to find a CSS selector that would get the text node, then finally gave up and did some manual parsing. It got long and ugly, but it works! The problem is that the only place I can concretely find page IDs is in a note at the bottom of the page:

Permanent link to this comic: https://xkcd.com/1988/

To parse the ID out of the end of that link, I need to find the text node, then parse the string. The latter was easy. The former took me a little under an hour, mostly due to inexperience with CSS selectors.

What I'm looking for:

1. Is there a way to get the text node directly via Enlive CSS-like selectors?

2. Anything else that may simplify this. It's quite a series of transformations. I could obviously break it down into a few functions, but I can't see ever needing the functionality anywhere else, and it's fairly simple to test as is. Any recommendations here?

Usage as of posting this:

(find-last-id)
=> 1988

(ns xkcd-viewer.mcve
  (:require [net.cgrand.enlive-html :as e])
  (:import (java.net URL)))

(def base-url "https://xkcd.com/")

; I actually use this a couple of times in the real code. It doesn't seem as useful here though.
(defn parse-id?
  "Returns the str-n parsed as a long, or nil if it's unparsable."
  [str-n]
  (try
    (Long/parseLong str-n)

    (catch NumberFormatException _
      nil)))

(defn find-last-id []
  (let [digit? #(Character/isDigit ^Character %)

        id-container (-> (e/html-resource (URL. base-url))
                         (e/select [:#middleContainer])
                         (first)
                         (:content))

        raw-id (->> id-container
                    ; The text node to find is surrounded by <br>s, so
                    (drop-while #(not= (:tag %) :br)) ; get rid of everything before the first br,
                    (drop 1) ; then the br itself,
                    (first) ; then get the text node, then
                    (drop-while (comp not digit?))
                    (take-while digit?)
                    (apply str))] ; then turn the digits into a string to be parsed.

    (if-let [parsed (parse-id? raw-id)]
      parsed
      (throw (RuntimeException.
               (str "Parser broken! Did XKCD change their site?\nFound ID: " raw-id))))))
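One possible simplification, offered only as a sketch: Enlive represents elements as plain maps with `:tag`, `:attrs` and `:content` keys and text nodes as plain strings, so the tree can be walked with `tree-seq` and a regex can pick the ID out of whichever string mentions the permanent link. The names `permalink-re`, `find-id-in-tree` and `sample-tree` are made up for this illustration, and the sample tree is hand-written rather than fetched from xkcd.com.

```clojure
(def permalink-re
  "Matches the permanent-link note and captures the trailing comic ID."
  #"Permanent link to this comic: https?://xkcd\.com/(\d+)/")

(defn find-id-in-tree
  "Returns the comic ID (a long) from the first text node under root that
  matches permalink-re, or nil if no text node matches."
  [root]
  (some->> (tree-seq map? :content root)               ; every node, depth first
           (filter string?)                            ; keep only the text nodes
           (some #(second (re-find permalink-re %)))   ; first captured ID, if any
           Long/parseLong))

;; Hypothetical, hand-written stand-in for the parsed #middleContainer element.
(def sample-tree
  {:tag :div, :attrs {:id "middleContainer"}
   :content [{:tag :div, :attrs {:id "comic"}, :content []}
             {:tag :br, :attrs nil, :content nil}
             "\nPermanent link to this comic: https://xkcd.com/1988/"
             {:tag :br, :attrs nil, :content nil}]})

(find-id-in-tree sample-tree) ;=> 1988
```

In the real code, the result of `(e/html-resource (URL. base-url))` is a sequence of nodes, so each top-level node (or the selected `#middleContainer` element) would be passed in instead of `sample-tree`. This version does not depend on the text node's position relative to the `<br>` elements, only on the wording of the note.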

edited May 3 at 2:57 by 200_success

asked May 2 at 23:56 by Carcigenicate

I do not know anything about Clojure... but I have a feeling it would be simpler to grab the link for the previous page and add one to the ID.
– Gerrit0
May 3 at 4:04

@Gerrit0 LOL. Probably. But who has time to think about logic before spending a couple hours hacking stuff together?
– Carcigenicate
May 3 at 4:06
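The previous-link idea from the comment above could look roughly like the sketch below. It assumes the "Prev" navigation link is an `<a>` element with `rel="prev"` and an `href` such as `/1987/`; that markup is an assumption about the page, not something confirmed in this thread. The names `prev-link?`, `last-id-from-prev-link` and `sample-nav` are hypothetical, and the sample data is hand-written in the shape Enlive produces.

```clojure
(defn prev-link?
  "True for an <a> node whose rel attribute is \"prev\"."
  [node]
  (and (map? node)
       (= :a (:tag node))
       (= "prev" (get-in node [:attrs :rel]))))

(defn last-id-from-prev-link
  "Finds the first rel=\"prev\" link under root and returns its ID plus one,
  or nil when there is no such link or its href contains no number."
  [root]
  (some->> (tree-seq map? :content root) ; every node, depth first
           (filter prev-link?)
           first
           :attrs
           :href
           (re-find #"\d+")               ; e.g. "/1987/" -> "1987"
           Long/parseLong
           inc))

;; Hypothetical, hand-written stand-in for the parsed navigation list.
(def sample-nav
  {:tag :ul, :attrs {:class "comicNav"}
   :content [{:tag :li, :attrs nil
              :content [{:tag :a, :attrs {:rel "prev", :href "/1987/"}
                         :content ["< Prev"]}]}]})

(last-id-from-prev-link sample-nav) ;=> 1988
```

This relies on the latest comic's ID always being exactly one more than the ID in its previous link, which holds only if that link never skips a number.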

1 Answer
I'm not sure it's much shorter than what you wrote, but finding stuff in any tree-like data structure is what I created the tupelo.forest library for.

Here is a solution for your problem:

(dotest
  (when false ; manually enable to grab a new copy of the webpage
    (spit "xkcd-sample.html"
          (slurp "https://xkcd.com")))
  (with-forest (new-forest)
    (let [doc (it-> (xkcd) ; `xkcd` is not shown here; presumably it returns the Enlive parse of the saved page
                    (drop-if #(= :dtd (:type %)) it)
                    (only it))
          root-hid (add-tree-enlive doc)
          >> (remove-whitespace-leaves)
          ;>> (spyx-pretty (hid->bush root-hid))
          hid-keep-fn (fn [hid]
                        (let [node (hid->node hid)
                              value (when (contains? node :value) (grab :value node))
                              perm-link? (when (string? value)
                                           (re-find #"Permanent link to this comic" value))]
                          perm-link?))
          found-hids (find-hids-with root-hid [:** :*] hid-keep-fn)
          link-node (hid->node (only found-hids)) ; assume there is only 1 link node
          value-str (grab :value link-node) ; "\nPermanent link to this comic: https://xkcd.com/1988/"
          result (re-find #"http.*$" value-str)]
      ;(spyx-pretty link-node) ;=> {:tupelo.forest/khids [],
      ;                             :tag :tupelo.forest/raw,
      ;                             :value "\nPermanent link to this comic: https://xkcd.com/1988/"}
      ;(spyx result) ; => "https://xkcd.com/1988/"
      )))

Documentation is ongoing, but you can see a lightning talk from the Clojure Conj 2017.

answered May 3 at 1:23 by Alan Thompson

Oh, I see I forgot to parse out just the integer ID. Oh well.
– Alan Thompson
May 3 at 1:28
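The missing last step, going from the link string to the integer ID, might be filled in along these lines. This is a minimal sketch in plain Clojure; `url->comic-id` is a hypothetical helper name and not part of tupelo or Enlive.

```clojure
(defn url->comic-id
  "Returns the trailing numeric path segment of url as a long,
  or nil if url is nil or does not end in a number."
  [url]
  (some->> url
           (re-find #"/(\d+)/?$") ; ["/1988/" "1988"] for the sample link
           second
           Long/parseLong))

(url->comic-id "https://xkcd.com/1988/") ;=> 1988
```

Applied to the `result` binding in the answer's code, this would yield the same number that `find-last-id` returns in the question.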
