Crawl and gather all the URLs recursively in a domain
This is the script:

#!/bin/bash
set -euo pipefail
SOFT="$(basename $0)"
random=$(( ( RANDOM % 100000 ) ))
tempfolder="/tmp/$SOFT/$random"
mkdir -p "$tempfolder"
pushd "$tempfolder" > /dev/null

cleanup () {
    cat wget.txt | sed 's/:$//' | url_matcher
    popd > /dev/null
    rm -rf "$tempfolder"
}
trap cleanup EXIT

site="$1"
wget -nd --spider --recursive --no-verbose --output-file=wget.txt "$site"
The script essentially creates a subfolder under /tmp/crawl and puts the result of wget into it, then feeds that to sed and url_matcher.
Initially I thought of putting the cat instruction just after the wget one, but for long wget runs it didn't work.
The sed instruction removes the colon at the end of a line, which wget adds when the URL in question isn't valid (something along the lines of "http://...: Nothing there").
url_matcher is a C++ scanner made with flex, which recognises URLs in text fed to standard input and prints them to standard output, separated by newline characters.
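For readers who don't have that flex scanner to hand, its behaviour can be roughly approximated with grep; this is my own stand-in sketch, not the actual url_matcher:

# Hypothetical substitute for url_matcher: print http(s) URLs, one per line
grep -Eo 'https?://[^[:space:]]+'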
The script allows me to fetch all the URLs in a website and use them directly as a stream.
Example:
$ crawl www.example.com | sed 's/some_treatments//' > super_file
Tags: bash, stream
Asked Jul 12 at 8:05 by Pierre Antoine Guillaume
2 Answers
Accepted answer (score 2), answered Jul 12 at 8:56 by Toby Speight (edited Jul 13 at 8:41)
There's no need to use cat here:

cat wget.txt | sed 's/:$//' | url_matcher

Instead, simply have sed take its input from the file:

sed 's/:$//' wget.txt | url_matcher
Instead of using $RANDOM and mkdir, we can use the mktemp command (GNU coreutils) to create a new directory for us (without risk of conflict with an existing one):

tempfolder=$(mktemp -d)

I think that now removes the only Bashism in the script.
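If you'd like to keep the script name visible in the path, mktemp also accepts a template whose trailing XXXXXX is replaced with random characters. This variant is my suggestion, not part of the original answer:

# Assumes $SOFT is set as in the original script
tempfolder=$(mktemp -d "/tmp/$SOFT.XXXXXX")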
Avoid pushd and popd in scripts (having to redirect the unwanted output should be a sign that these are intended for interactive shells). Instead, we can use cd "$tempfolder" and something like cd .. to leave it (it doesn't matter where we end up, as we only use absolute paths after that). Alternatively, don't change directory at all, but specify paths in full (I'll use a shorter name for the temporary directory):
dir=$(mktemp -d)

cleanup() {
    sed 's/:$//' "$dir/wget.txt" | url_matcher
    rm -rf -- "$dir"
}
trap cleanup EXIT

wget -nd --spider --recursive --no-verbose --output-file="$dir/wget.txt" "$1"
Other alternatives to pushd/popd include saving $PWD into a variable before changing directory, or using cd - (which internally does exactly that, as cd saves $PWD into $OLDPWD for that purpose).
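A minimal sketch of the save-and-restore approach (my illustration, reusing the variable names from the question):

olddir=$PWD                 # remember where we started
cd "$tempfolder"
# ... work inside the temporary directory ...
cd "$olddir"                # return to the original directory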
We can avoid the temporary file altogether, to simplify the code and to allow us to start producing output without waiting for the crawl to finish. We can do this in a couple of ways, by being creative with --output-file. If we're on Linux, we can write to /dev/stdout instead of a file:

wget -nd --spider --recursive --no-verbose --output-file=/dev/stdout "$1" \
    | sed -e 's/:$//' | url_matcher
If we're happy to accept a Bash dependency again, we can use process substitution:

wget -nd --spider --recursive --no-verbose \
    --output-file=>(sed -e 's/:$//' | url_matcher) "$1"
However, neither of those dependencies is necessary, as wget follows the common convention of treating the filename - specially, to mean "standard output stream":

wget -nd --spider --recursive --no-verbose --output-file=- "$1" \
    | sed -e 's/:$//' | url_matcher
Modified code

#!/bin/bash
set -uo pipefail
exec wget -nd --spider --recursive --no-verbose --output-file=- "$1" \
    | sed -e 's/:$//' | url_matcher
Neat! Why can't I put the sed ... after the wget, though?
– Pierre Antoine Guillaume, Jul 12 at 9:42
I'm not sure what you mean by "put the sed after the wget" - do you mean that you'd like to pipe the wget output into sed? That's certainly possible, and I'll edit to show how.
– Toby Speight, Jul 12 at 9:44
That'd be good, sure. At first, I used to put the sed outside of the cleanup, after the wget. And for "long" crawls (with wget running for a few minutes), the cat (now sed) would not take place at all.
– Pierre Antoine Guillaume, Jul 12 at 9:46
Instead of --output-file, why not use -O-?
– janos, Jul 13 at 5:00
@janos, why not, indeed? I just forgot that wget observes the - convention. I'll edit that in.
– Toby Speight, Jul 13 at 7:56
Answer (score 1), answered Jul 13 at 5:11 by janos
A few points on top of Toby's excellent review.
This is not correctly protected from word splitting, and will fail when the script's path contains spaces:
SOFT="$(basename $0)"
The only thing you need to double quote there is the $0:
SOFT=$(basename "$0")
Since RANDOM generates a random integer in the range 0 - 32767, the % 100000 is unnecessary.
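A quick demonstration (my addition, not part of the answer):

# RANDOM is always in 0..32767, so % 100000 can never change the value
echo "$RANDOM"
echo $(( RANDOM % 100000 ))   # drawn from exactly the same range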
The popd is unlikely to ever be useful in a cleanup script. Directory changes within a script have no effect outside an executed (as opposed to sourced) script. In any case, it's best not to change directories within scripts at all.
I think the popd was probably there to make the temporary directory not be the working directory when it's removed (possible confusion with attempting to umount a filesystem while a process is using it as its working directory, perhaps?)
– Toby Speight, Jul 13 at 8:15
I really appreciate it. I am going to change my boilerplate script ASAP! The popd was there for the bad reasons ;) thank you! The cd (or pushd) was there for a "good" reason: when I started the script, I didn't know of the wget -nd option. If you don't use it, you create a real mess of subfolders. Hence: create a new tempfolder, cd into it, and delete it afterward.
– Pierre Antoine Guillaume, Jul 13 at 12:39