Print code-fenced sections of a Markdown document
Original code and demo at this gist.
Given a Markdown document like

````markdown
Here is some text,
and some more text.
```javascript
const message = "This is JavaScript!";
```
More text follows, and then
```javascript
console.log(message);
```
````
I want to print out the sections in code fences, not including the code
fences, separated by a single blank line:

```
const message = "This is JavaScript!";

console.log(message);
```
I came up with the following AWK script, which seems to do the job
nicely:
```awk
#!/usr/bin/awk -f
BEGIN { in_code_block = 0 }
/^```/ {
    if (!in_code_block) {
        in_code_block = 1;
        first_line = 1;
    } else {
        in_code_block = 0;
        print "";
    }
}
{
    if (in_code_block && !first_line) {
        print;
    }
    first_line = 0;
}
```
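To try the script without saving it to a file, here is a minimal sketch (POSIX sh and awk) that inlines the same program in a shell function and runs it on the sample document; `extract_fences` is a hypothetical name used only here. Tracing it by hand, the output is the two code lines separated by a blank line, plus one more blank line after the last block, because a blank line is printed at every closing fence.

```sh
#!/bin/sh
# The question's script, inlined so the example is self-contained.
# extract_fences is a hypothetical name used only for this sketch.
extract_fences() {
    awk '
    BEGIN { in_code_block = 0 }
    /^```/ {
        if (!in_code_block) {
            in_code_block = 1   # opening fence: start a block
            first_line = 1      # ...and do not print the fence itself
        } else {
            in_code_block = 0   # closing fence: end the block
            print ""            # blank line after each block
        }
    }
    {
        if (in_code_block && !first_line)
            print
        first_line = 0
    }
    ' "$@"
}

# The sample document from the question, one line per argument.
printf '%s\n' 'Here is some text,' 'and some more text.' \
    '```javascript' 'const message = "This is JavaScript!";' '```' \
    'More text follows, and then' \
    '```javascript' 'console.log(message);' '```' |
    extract_fences
```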
A goal is for the script to be dependency-minimal. I don't want to have
to install an implementation of CommonMark or an Erlang environment. AWK
fits this bill well.
Correspondingly, a non-goal is for this script to be correct in all
cases: I'm happy to accept false positives on lines starting with ```inline code``` like this, and similar edge cases.
I'm mostly looking for critique of my AWK, with respect to which I am a
total newbie. But any comments are welcome!
Tags: console, markdown, awk
asked May 23 at 4:42 by wchargin
3 Answers
Answer 1 (accepted)
You could shorten the code with the `next` statement, which stops processing the current line and moves on to the next one.
See Next-Statement in the GNU.org AWK manual.
You can also use your variable directly as the pattern (the condition) of a rule, without an additional `if ()` inside the action.
```awk
BEGIN { in_code_block = 0 }
/^```/ {
    if (in_code_block)
        print "";
    in_code_block = ! in_code_block;
    next
}
in_code_block { print; }
```
Tested with GNU Awk 4.1.3.
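One way to confirm that the shorter version is a pure refactoring is to run both programs on the same inputs and compare their output byte for byte. The sketch below does that with only POSIX sh and awk; the helper names `run` and `case_input` and the four test inputs are hypothetical, chosen to cover a tagged fence, an untagged fence, an empty block and a block that is never closed.

```sh
#!/bin/sh
# The question's original program.
original='
BEGIN { in_code_block = 0 }
/^```/ {
    if (!in_code_block) {
        in_code_block = 1; first_line = 1
    } else {
        in_code_block = 0; print ""
    }
}
{
    if (in_code_block && !first_line) print
    first_line = 0
}'

# The shorter version: toggle on fences, skip them with next,
# and use the flag itself as the pattern for printing.
shorter='
BEGIN { in_code_block = 0 }
/^```/ {
    if (in_code_block) print ""
    in_code_block = !in_code_block
    next
}
in_code_block { print }'

# Run one awk program on stdin; the trailing "." stops $(...) from
# stripping trailing blank lines, which are part of the output.
run() { awk "$1"; echo .; }

# Hypothetical test inputs.
case_input() {
    case $1 in
        1) printf '%s\n' 'text' '```javascript' 'const x = 1;' '```' 'more' ;;
        2) printf '%s\n' '```' 'untagged' '```' ;;
        3) printf '%s\n' 'before' '```' '```' 'after' ;;
        4) printf '%s\n' '```sh' 'echo never closed' ;;
    esac
}

status=0
for n in 1 2 3 4; do
    a=$(case_input "$n" | run "$original")
    b=$(case_input "$n" | run "$shorter")
    if [ "$a" = "$b" ]; then
        echo "case $n: identical"
    else
        echo "case $n: DIFFERENT"
        status=1
    fi
done
```

Comparing whole outputs, rather than eyeballing them, also catches differences in the trailing blank line.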
answered May 23 at 15:55 by CiaPan (edited May 23 at 16:01)
This is great! "Skip this record" is a better description of what I want to do than "toggle this flag". Thanks. (wchargin, May 23 at 16:29)
I'm accepting this answer because it includes, IMO, the most readable code. oliv's answer is indeed very cute and a nice one-liner to have in my pocket, but requires some thought to figure out what is going on. This one should be readable even to people who don't know AWK. Thanks to all answerers; I learned something from each. :-) (wchargin, May 29 at 2:24)
Answer 2
While your code looks OK, it could be improved greatly by making use of `RS` (the record separator) and `NR` (the number of the current record), provided you're using GNU awk.
```sh
awk -v RS='```[a-z]*\n' '(NR+1)%2' file
```
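In this case `RS` is set so that it matches the three backticks, plus an optional language tag and the newline, that open or close a fenced block; assuming the fences are balanced, the code therefore lands in every second record.
The only awk statement prints one record out of two: `(NR+1)%2` is true for even `NR`, and a pattern with no action prints the record.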
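A small sketch (not part of the answer) that makes the record splitting visible: it prints each record with its number, then runs the one-liner. It assumes an awk that treats a multi-character `RS` as a regular expression (gawk does; POSIX leaves it unspecified), and `sample` is a hypothetical helper that prints the question's example document. The `\n` in the `-v` value is turned into a newline by awk's own escape processing of `-v` assignments.

```sh
#!/bin/sh
# Hypothetical helper: the question's sample document, one line per argument.
sample() {
    printf '%s\n' 'Here is some text,' 'and some more text.' \
        '```javascript' 'const message = "This is JavaScript!";' '```' \
        'More text follows, and then' \
        '```javascript' 'console.log(message);' '```'
}

# Show the records: with this RS, records 1 and 3 hold the prose
# and records 2 and 4 hold the code between the fences.
sample | awk -v RS='```[a-z]*\n' '{ printf "record %d: [%s]\n", NR, $0 }'

# The one-liner: (NR+1)%2 is 1 for even NR, and a bare pattern prints $0.
sample | awk -v RS='```[a-z]*\n' '(NR+1)%2'
```

Each code record keeps the newline that precedes its closing fence, so printing it with the default `ORS` yields the blank line between blocks.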
answered May 23 at 6:50 by oliv (edited May 23 at 7:00)
This is very cute. gawk is required so that a multi-character `RS` is treated as a regular expression, as opposed to having unspecified behavior, correct? (I note that this also removes all text after the closing backticks, which is fine with me.) One question: why does using `RS='^```[a-z]*\n'` (with an added start-of-line anchor) not work? (wchargin, May 23 at 7:14)
`RS` is set to `\n` by default, which means every line is an awk record. Changing `RS` changes the meaning of `^` and `$`, because you may now have multi-line records (which is the case here). So you cannot use `^` in `RS` in this case, but you could use `RS='\n```[a-z]*\n'`. (oliv, May 23 at 7:27)
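A sketch of this `RS` variant, run on the question's sample. Because the newline before the fence is now part of the separator, a code record no longer ends in a newline; with the default `ORS` the two blocks would come out on consecutive lines, so this sketch sets `ORS` to two newlines to get the blank-line separation the question asks for (that setting is an addition, not part of the comment). A fence on the very first line of the input has no newline before it and is not matched. It assumes a regex-capable `RS` (e.g. gawk); `sample` is a hypothetical helper.

```sh
#!/bin/sh
# Hypothetical helper: the question's sample document.
sample() {
    printf '%s\n' 'Here is some text,' 'and some more text.' \
        '```javascript' 'const message = "This is JavaScript!";' '```' \
        'More text follows, and then' \
        '```javascript' 'console.log(message);' '```'
}

# The separator now includes the newline before the fence, so a fence is
# only recognised at the start of a line (but not on line 1 of the input).
# ORS is doubled because the code records no longer end in a newline.
sample | awk -v RS='\n```[a-z]*\n' -v ORS='\n\n' '(NR+1)%2'
```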
@wchargin I don't get your comment that this also removes all text after the closing backticks. Text after a closing backtick should not be printed anyway, or did I miss something? (oliv, May 23 at 7:46)
It's fine for text after a closing backtick not to be printed; this is what my original implementation did. Technically, a closing code fence may only be followed by whitespace (demo), but this is the kind of restriction that I'm happy to drop. Regarding `RS`: it sounds like `^` is matching beginning-of-document, not beginning-of-line, which is only slightly surprising to me. Good to know, in any case. (wchargin, May 23 at 15:44)
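If the stricter closing-fence rule mentioned in the comment above is wanted after all, it costs little code. The sketch below (not from the thread) splits the accepted answer's toggle into separate open and close rules, so that inside a block only a line of three backticks plus optional spaces or tabs ends it. It still ignores tilde fences, indented fences and longer backtick runs, which CommonMark also allows; `extract_strict` is a hypothetical name, and only POSIX awk features are used.

```sh
#!/bin/sh
# Hypothetical helper; POSIX awk only.
extract_strict() {
    awk '
    # Outside a block, any line starting with three backticks opens one.
    !in_code_block && /^```/ { in_code_block = 1; next }
    # Inside a block, only three backticks plus optional blanks close it.
    in_code_block && /^```[ \t]*$/ { in_code_block = 0; print ""; next }
    # Any other line inside a block is code, even one starting with backticks.
    in_code_block { print }
    ' "$@"
}

# A fence-like line with trailing text inside a block is now printed as code.
printf '%s\n' '```sh' 'echo hi' '```not a close' 'echo bye' '```' 'prose' |
    extract_strict
```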
Answer 3
The code looks perfect to me.
I thought about using the flip-flop operator, but since you take additional action at the beginning and the end of the code block, this may be difficult in this case.
```awk
/^```/, /^```/ { ... }
```
Maybe you want to experiment with that idea nevertheless. It may prove valuable in the future.
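A quick experiment shows the difficulty. In awk, if the record that starts a range also matches the end pattern, the range covers just that record; with identical start and end patterns, every fence line opens and immediately closes its own range (the GNU Awk manual describes this same pitfall in its section on range patterns). A range only helps if opening and closing fences can be told apart: the second command assumes every opening fence carries a lowercase language tag, which the question's sample does but Markdown does not require. `sample` is a hypothetical helper.

```sh
#!/bin/sh
# Hypothetical helper: the question's sample document.
sample() {
    printf '%s\n' 'Here is some text,' 'and some more text.' \
        '```javascript' 'const message = "This is JavaScript!";' '```' \
        'More text follows, and then' \
        '```javascript' 'console.log(message);' '```'
}

# Identical start and end patterns: each fence line is a range by itself,
# so only the four fence lines are printed.
sample | awk '/^```/, /^```/ { print }'

# Assumes every opening fence has a tag such as "javascript":
# the range runs from a tagged fence to the next bare fence.
sample | awk '/^```[a-z]/, /^```[ \t]*$/ {
    if (/^```[ \t]*$/) print ""     # closing fence: blank separator
    else if (!/^```/) print         # code line (skip the opening fence)
}'
```

The flag-based versions in the question and in the accepted answer avoid the tag assumption, which is why a flag is usually the simpler choice here.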
answered May 23 at 5:27 by Roland Illig
This is good to know; thanks! It looks like these are called "range patterns". I'll keep them in mind. (wchargin, May 23 at 16:31)