Modify XML attribute based on element value

I have some XML files that contain nodes like these (among other nodes):

<disp-formula id="deqn*">
...\tag{1}
</disp-formula>
<disp-formula id="deqn*">
...\tag{2}
</disp-formula>
<disp-formula id="deqnxyz">
...\tag{3}
...
...\tag{4}
...\tag{5}...
...
......\tag{6}
</disp-formula>

I'm trying to read the value(s) inside the \tag{...} string(s) in each of these nodes and rewrite the node's id attribute from them, i.e. the output should be

<disp-formula id="deqn1">
...\tag{1}
</disp-formula>
<disp-formula id="deqn2">
...\tag{2}
</disp-formula>
<disp-formula id="deqn3-6">
...\tag{3}
...
...\tag{4}
...\tag{5}...
...
......\tag{6}
</disp-formula>

I've done

//for nodes containing single tag
Regex regex = new Regex(@"(?<=\\tag\{)(\w+)(?=\})");
var xml = File.ReadAllText(@"D:\Test\sample.xml");
var xdoc = XDocument.Parse(xml, LoadOptions.PreserveWhitespace);
var _descriptions = xdoc.Descendants("disp-formula")
    .Where(x => regex.Match(x.Value).Success);
foreach (var description in _descriptions)
{
    var _Result = regex.Match(description.Value).Value;
    description.Attribute("id").Value = "deqn" + _Result;
    xdoc.Save(@"D:\Test\sample.xml", SaveOptions.DisableFormatting);
}

//for nodes containing multiple tag's
var descriptions = xdoc.Descendants("disp-formula")
    .Where(x => regex.Matches(x.Value).Count > 1);
foreach (var description in descriptions)
{
    var p = regex.Matches(description.Value).Cast<Match>().Select(m => m.Value).ToArray();
    var x = p[0];
    var y = p[p.Count() - 1];
    var Result = x + "-" + y;
    description.Attribute("id").Value = "deqn" + Result;
    xdoc.Save(@"D:\Test\sample.xml", SaveOptions.DisableFormatting);
}

How do I make this code more efficient?

asked Jan 7 at 11:42 by Don_B
edited Jan 7 at 12:17 by t3chb0t

1 Answer (accepted, score 3)

Performance

The most obvious improvement would be to only save the file once, after you've made all the necessary changes.

You're also doing a lot of duplicate regex matching. The Where calls do matching, and then the foreach loops have to look for the same matches again. There's also some overlap between the single and multiple match logic.

All of that can be simplified into a single foreach loop:

// read file content

foreach (var formulaNode in xdoc.Descendants("disp-formula"))
{
    var matches = regex.Matches(formulaNode.Value);
    if (matches.Count == 0)
        continue;

    var id = "deqn" + matches[0].Value;
    if (matches.Count > 1)
        id += "-" + matches[matches.Count - 1].Value;

    formulaNode.Attribute("id").Value = id;
}

// save file

Do you have a specific reason for using a lookbehind and lookahead in your regex? \\tag\{(\w+)\} will be faster. The match value then no longer contains just the tag ID, but because you're using a capture group it's still easy to obtain that ID: match.Groups[1].Value.
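As a rough sketch of how the single loop and the capture group might fit together: the class and method names below (FormulaIdUpdater, UpdateIds) are made up for illustration, and the pattern assumes the tags look like \tag{ID} with a word-character ID. It works on an already loaded XDocument, so loading and saving stay with the caller.

```csharp
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class FormulaIdUpdater
{
    // Assumed pattern: group 1 captures the ID in a LaTeX-style \tag{ID}.
    private static readonly Regex TagRegex = new Regex(@"\\tag\{(\w+)\}");

    // Rewrites the id attribute of every disp-formula element that contains a tag.
    public static void UpdateIds(XDocument doc)
    {
        foreach (var formulaNode in doc.Descendants("disp-formula"))
        {
            // Match once per node and reuse the result.
            var matches = TagRegex.Matches(formulaNode.Value);
            if (matches.Count == 0)
                continue;

            // Groups[1] holds only the tag ID, without the surrounding \tag{ }.
            var id = "deqn" + matches[0].Groups[1].Value;
            if (matches.Count > 1)
                id += "-" + matches[matches.Count - 1].Groups[1].Value;

            formulaNode.SetAttributeValue("id", id);
        }
    }
}
```

Nodes without any tag are left untouched, which matches the behaviour of the Where filter in the original code.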

Instead of using Count(), you can use an array's Length property directly. Count() is a Linq method, and for arrays it'll just return their Length, but it does have to do some type checking, so it takes a tiny bit of extra work.

Code quality

The filename is duplicated several times. Duplication makes code harder to maintain. Store the filename in a variable, or better: make it a method parameter.
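One possible shape for this, as a sketch only: the helper below (XmlFileRewriter.Rewrite is a hypothetical name) takes the path as a parameter, loads the file once, lets the caller apply all changes in memory, and saves once at the end.

```csharp
using System;
using System.Xml.Linq;

public static class XmlFileRewriter
{
    // Hypothetical helper: the path appears in exactly one place, as a parameter.
    public static void Rewrite(string path, Action<XDocument> applyChanges)
    {
        // Load once, preserving whitespace as the original code does.
        var doc = XDocument.Load(path, LoadOptions.PreserveWhitespace);

        // All modifications happen in memory.
        applyChanges(doc);

        // Save once, after every change has been made.
        doc.Save(path, SaveOptions.DisableFormatting);
    }
}
```

A call site would then pass the file name and the update logic together, so changing the file location touches only one line.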

Variable naming is a little inconsistent (camelCase, _leadingUnderscore, _PascalCase). camelCase is normally used for parameters and local variables, PascalCase for type, property and method names. Some people use leading underscores for private fields, others don't - but whatever approach you pick, being consistent will make your code easier to read and understand.

Some variable names are rather undescriptive: p, x, y. Something like tagIDs, firstTagID and lastTagID would make the code easier to understand.

Other

The multi-match logic only looks at the first and last tag IDs. Does your input always contain sequential tag IDs, or could they come in a different order or contain 'gaps'? You may want to document this assumption - if you ever need to modify this code later on then at least you'll know what you were thinking back then.
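One way to document the assumption is to put it in an XML doc comment on the method that builds the ID. The sketch below uses hypothetical names (FormulaIdBuilder, Build) and simply spells out what the first/last logic does and does not check.

```csharp
using System;
using System.Collections.Generic;

public static class FormulaIdBuilder
{
    /// <summary>
    /// Builds a formula ID from the tag IDs found in one formula.
    /// ASSUMPTION: only the first and last tag IDs (in document order) are used.
    /// Gaps and out-of-order IDs are not detected, e.g. 3, 5, 4 yields "deqn3-4".
    /// </summary>
    public static string Build(IReadOnlyList<string> tagIds)
    {
        if (tagIds == null || tagIds.Count == 0)
            throw new ArgumentException("At least one tag ID is required.", nameof(tagIds));

        var firstTagId = tagIds[0];
        var lastTagId = tagIds[tagIds.Count - 1];

        // A single tag gives "deqnN"; several tags give "deqnFirst-Last".
        return tagIds.Count == 1
            ? "deqn" + firstTagId
            : "deqn" + firstTagId + "-" + lastTagId;
    }
}
```

Keeping the rule in one small, commented method also makes it easier to change later if the first-last scheme turns out not to fit the input.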

answered Jan 7 at 20:28 by Pieter Witvoet

My input does not always contain sequential tag IDs; they can come in a different order or contain 'gaps'. How do I document this assumption? Also, the contents of the tag string can look like \tag{1.2.6}, \tag{2a}, \tag{1(a)} etc., so I'm using the regex (?<=\\tag\s?\{)[^}]+(?=\}). Is that right?
– Don_B
Jan 8 at 2:54

Your regex will work, but it's slower than just \\tag\s?\{[^}]+\}. So your tag IDs aren't always sequential, and they're not even always numerical? What's the use of a first-last name then? How are the new formula IDs meant to be used?
– Pieter Witvoet
Jan 8 at 7:48

The XML files are basically XML versions of a book/article, converted from a PDF file by a program given to us by our client. Most files have numeric tag IDs in the article (PDF), some have alphanumeric ones, and some have decimal values; that's what the authors who wrote the article decided to use. Can't do anything about that... and we were told to use the numbering system that the source (PDF) has.
– Don_B
Jan 8 at 14:57
