Modify XML attribute based on element value

I have some XML files that contain nodes like these (among other nodes):



<disp-formula id="deqn*">
...\tag{1}
</disp-formula>
<disp-formula id="deqn*">
...\tag{2}
</disp-formula>
<disp-formula id="deqnxyz">
...\tag{3}
...
...\tag{4}
...\tag{5}...
...
......\tag{6}
</disp-formula>


I'm trying to get the value(s) inside the \tag{...} string(s) found in each node that has an id attribute, and to set that attribute from the value(s) in the node's own tags. That is, the output should be:



<disp-formula id="deqn1">
...\tag{1}
</disp-formula>
<disp-formula id="deqn2">
...\tag{2}
</disp-formula>
<disp-formula id="deqn3-6">
...\tag{3}
...
...\tag{4}
...\tag{5}...
...
......\tag{6}
</disp-formula>


This is what I've done so far:



// For nodes containing a single tag
Regex regex = new Regex(@"(?<=\\tag{)(\w+)(?=})");
var xml = File.ReadAllText(@"D:\Test\sample.xml");
var xdoc = XDocument.Parse(xml, LoadOptions.PreserveWhitespace);
var _descriptions = xdoc.Descendants("disp-formula")
    .Where(x => regex.Match(x.Value).Success);
foreach (var description in _descriptions)
{
    var _Result = regex.Match(description.Value).Value;
    description.Attribute("id").Value = "deqn" + _Result;
    xdoc.Save(@"D:\Test\sample.xml", SaveOptions.DisableFormatting);
}

// For nodes containing multiple tags
var descriptions = xdoc.Descendants("disp-formula")
    .Where(x => regex.Matches(x.Value).Count > 1);
foreach (var description in descriptions)
{
    var p = regex.Matches(description.Value).Cast<Match>().Select(m => m.Value).ToArray();
    var x = p[0];
    var y = p[p.Count() - 1];
    var Result = x + "-" + y;
    description.Attribute("id").Value = "deqn" + Result;
    xdoc.Save(@"D:\Test\sample.xml", SaveOptions.DisableFormatting);
}



How do I make this code more efficient?







edited Jan 7 at 12:17 by t3chb0t

asked Jan 7 at 11:42 by Don_B




















          1 Answer










          Performance



          • The most obvious improvement would be to only save the file once, after you've made all the necessary changes.

          • You're also doing a lot of duplicate regex matching. The Where calls do matching, and then the foreach loops have to look for the same matches again. There's also some overlap between the single-match and multiple-match logic.

          All of that can be simplified into a single foreach loop:



          // read file content

          foreach (var formulaNode in xdoc.Descendants("disp-formula"))
          {
              var matches = regex.Matches(formulaNode.Value);
              if (matches.Count == 0)
                  continue;

              var id = "deqn" + matches[0].Value;
              if (matches.Count > 1)
                  id += "-" + matches[matches.Count - 1].Value;

              formulaNode.Attribute("id").Value = id;
          }

          // save file


          • Do you have a specific reason for using a lookbehind and lookahead in your regex? \\tag{(\w+)} will be faster. The match value no longer contains just the tag ID, but because you're using a capture group it's still easy to obtain that ID: match.Groups[1].Value.

          • Instead of using Count(), you can use an array's Length property directly. Count() is a LINQ method, and for arrays it'll just return their Length, but it does have to do some type checking first, so it takes a tiny bit of extra work.
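
As a rough sketch of the performance points above (not the reviewer's exact code), the loop can use the capture-group pattern and read each tag ID from Groups[1]. The names FormulaIds, TagPattern and UpdateIds are made up for illustration. The braces are escaped here only to be explicit, and SetAttributeValue is used instead of Attribute("id").Value so that a missing id attribute does not throw.

```csharp
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class FormulaIds
{
    // Capture group instead of lookbehind/lookahead; Groups[1] holds the bare tag ID.
    private static readonly Regex TagPattern = new Regex(@"\\tag\{(\w+)\}");

    // Sets id="deqnN" for one tag, or id="deqnFirst-Last" for several tags,
    // on every disp-formula element that contains at least one \tag{...}.
    public static void UpdateIds(XDocument xdoc)
    {
        foreach (var formulaNode in xdoc.Descendants("disp-formula"))
        {
            // Match once per node and reuse the result.
            var matches = TagPattern.Matches(formulaNode.Value);
            if (matches.Count == 0)
                continue;

            var id = "deqn" + matches[0].Groups[1].Value;
            if (matches.Count > 1)
                id += "-" + matches[matches.Count - 1].Groups[1].Value;

            // Creates the attribute if it is missing, otherwise overwrites it.
            formulaNode.SetAttributeValue("id", id);
        }
    }
}
```

Because the method only touches the in-memory XDocument, the caller decides when to load and save, which keeps the file write down to a single call.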

          Code quality



          • The filename is duplicated several times. Duplication makes code harder to maintain. Store the filename in a variable, or better: make it a method parameter.

          • Variable naming is a little inconsistent (camelCase, _leadingUnderscore, _PascalCase). camelCase is normally used for parameters and local variables, PascalCase for type, property and method names. Some people use leading underscores for private fields, others don't - but whatever approach you pick, being consistent will make your code easier to read and understand.

          • Some variable names are rather undescriptive: p, x, y. Something like tagIDs, firstTagID and lastTagID would make the code easier to understand.
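
A minimal sketch of how these code-quality points might look together, assuming the file is processed in one pass: the path becomes a method parameter, locals use camelCase, and the tag variables get descriptive names. FormulaFileUpdater and UpdateFile are hypothetical names, not part of the original code.

```csharp
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class FormulaFileUpdater
{
    private static readonly Regex TagPattern = new Regex(@"\\tag\{(\w+)\}");

    // The path is a parameter, so the file name is written only once, at the call site.
    public static void UpdateFile(string path)
    {
        var xdoc = XDocument.Load(path, LoadOptions.PreserveWhitespace);

        foreach (var formulaNode in xdoc.Descendants("disp-formula"))
        {
            var tagIds = TagPattern.Matches(formulaNode.Value)
                .Cast<Match>()
                .Select(match => match.Groups[1].Value)
                .ToArray();
            if (tagIds.Length == 0)
                continue;

            var firstTagId = tagIds[0];
            var lastTagId = tagIds[tagIds.Length - 1];
            var id = tagIds.Length == 1
                ? "deqn" + firstTagId
                : "deqn" + firstTagId + "-" + lastTagId;

            formulaNode.SetAttributeValue("id", id);
        }

        // Save once, after every node has been updated.
        xdoc.Save(path, SaveOptions.DisableFormatting);
    }
}
```

A call such as FormulaFileUpdater.UpdateFile(@"D:\Test\sample.xml") would then replace both loops from the question.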

          Other



          • The multi-match logic only looks at the first and last tag IDs. Does your input always contain sequential tag IDs, or could they come in a different order or contain 'gaps'? You may want to document this assumption - if you ever need to modify this code later on then at least you'll know what you were thinking back then.
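
One way to document the assumption is an XML doc comment on the method that builds the ID. The sketch below is only an illustration: FormulaIdBuilder and Build are hypothetical names, and the remark records the first/last behaviour described above rather than prescribing what the IDs should be.

```csharp
using System;
using System.Collections.Generic;

public static class FormulaIdBuilder
{
    /// <summary>
    /// Builds a formula ID such as "deqn1" or "deqn3-6" from the tag IDs of one formula.
    /// </summary>
    /// <remarks>
    /// Assumption: only the first and the last tag ID are used, in document order.
    /// IDs in between are ignored, so a formula tagged 3, 6, 4 yields "deqn3-4".
    /// Tag IDs are copied through unchanged; they are not parsed as numbers.
    /// </remarks>
    public static string Build(IReadOnlyList<string> tagIds)
    {
        if (tagIds == null || tagIds.Count == 0)
            throw new ArgumentException("At least one tag ID is required.", nameof(tagIds));

        var firstTagId = tagIds[0];
        var lastTagId = tagIds[tagIds.Count - 1];
        return tagIds.Count == 1
            ? "deqn" + firstTagId
            : "deqn" + firstTagId + "-" + lastTagId;
    }
}
```

Keeping the ID construction in one small method also gives the assumption a single place to change if the rule for gaps or ordering is ever revised.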


























          • My input does not always contain sequential tag IDs; they could come in a different order or contain 'gaps'. How do I document this assumption? Also, the contents of the \tag{...} string could be like \tag{1.2.6}, \tag{2a}, \tag{1(a)}, etc., so I'm using the regex (?<=\\tag\s?{)[^}]+(?=}). Is that right?
            – Don_B
            Jan 8 at 2:54










          • Your regex will work, but it's slower than just \\tag\s?{[^}]+}. So your tag IDs aren't always sequential, and they're not even always numerical? What's the use of a first-last name then? How are the new formula IDs meant to be used?
            – Pieter Witvoet
            Jan 8 at 7:48










          • The XML files are basically XML versions of a book/article, converted from a PDF file using a program given to us by our client. In most files the tag IDs in the article (PDF) are numeric; some are alphanumeric and some have decimal values. That's what the authors who wrote the article decided to use. Can't do anything about that... and we were told to use the numbering system that the source (PDF) has.
            – Don_B
            Jan 8 at 14:57










          answered Jan 7 at 20:28 by Pieter Witvoet










