Parser for json-like format in PHP

At work they use a format similar to JSON, but without quotes, that looks like this:

    {foo:{qux:1,quux:0}, bar:{}}

The reason they don't just use JSON is that the C# package Newtonsoft.Json can deserialize this as if it were regular JSON, and it works. I need to use it in PHP, but json_decode is not as forgiving.
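To illustrate what "not as forgiving" means here, a minimal check, assuming the sample is written with its braces as `{foo:{qux:1,quux:0}, bar:{}}`: `json_decode` rejects the unquoted input and returns `null`, and the JSON extension reports why.

```php
<?php
// Sample input assumed from the question, with its braces written out.
$input = '{foo:{qux:1,quux:0}, bar:{}}';

// json_decode() returns null for input it cannot parse.
$decoded = json_decode($input, true);

var_dump($decoded);                 // NULL
echo json_last_error_msg(), "\n";   // the reason reported by the JSON extension
```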



So here's my attempt at a simple parser:



    <?php

    namespace FooBar;

    class NotJsonParser
    {
        const STEP_NAME = 0;
        const STEP_VALUE = 1;

        /**
         * @param string $string
         * @return array
         */
        public static function parseNotJSON($string)
        {
            $generator = self::stringIterator($string);
            $data = self::parser($generator);
            // The outermost object is stored under the empty name.
            return $data[''];
        }

        /**
         * @param \Generator $generator
         * @return array
         */
        private static function parser(\Generator $generator)
        {
            $data = [];
            $step = self::STEP_NAME;
            $name = '';
            $value = '';

            while ($generator->valid()) {
                $i = $generator->current();
                switch ($i) {
                    case ' ':
                    case "\n":
                        // Skip whitespace. Inside a switch, `continue` behaves
                        // like `break` (and warns since PHP 7.3), so use `break`.
                        break;
                    case '{':
                        $generator->next();
                        $value = self::parser($generator);
                        $data[$name] = $value;
                        $step = self::STEP_NAME;
                        $name = '';
                        $value = '';
                        break;
                    case '}':
                        if ($name) {
                            $data[$name] = $value;
                        }
                        return $data;
                    case ',':
                        if ($name) {
                            $data[$name] = $value;
                        }
                        $step = self::STEP_NAME;
                        $name = '';
                        $value = '';
                        break;
                    case ':':
                        $step = self::STEP_VALUE;
                        break;
                    default:
                        if ($step === self::STEP_NAME) {
                            $name .= $i;
                        } else {
                            $value .= $i;
                        }
                }
                $generator->next();
            }
            return $data;
        }

        /**
         * @param string $str
         * @return \Generator
         */
        private static function stringIterator($str)
        {
            for ($i = 0; $i < strlen($str); $i++) {
                yield $str[$i];
            }
        }
    }


And here's the usage:

    >>> $result = \FooBar\NotJsonParser::parseNotJSON("{foo:{qux:1,quux:0}, bar:{}}");
    => [
         "foo" => [
           "qux" => "1",
           "quux" => "0",
         ],
         "bar" => [],
       ]


How could I improve this? I know it really lacks error handling. I don't mind that the numbers stay as strings. Also, the format never goes more than two levels deep. Whitespace between tokens is not significant, but there shouldn't be whitespace inside keys (e.g. foo bar: baz should be an error).
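The whitespace rule above could be checked roughly as follows. This is only a sketch: `validateKey` is a hypothetical helper, and it assumes the parser collects the raw key text including spaces. The parser as posted drops spaces before they reach `$name`, so `foo bar` would silently become `foobar`.

```php
<?php
/**
 * Hypothetical helper: validate the raw text collected for a key.
 * Surrounding whitespace is tolerated; whitespace inside the key is not.
 */
function validateKey(string $raw): string
{
    $key = trim($raw);
    if ($key === '') {
        throw new InvalidArgumentException('Empty key');
    }
    if (preg_match('/\s/', $key) === 1) {
        throw new InvalidArgumentException("Whitespace inside key: '$key'");
    }
    return $key;
}

echo validateKey(' bar'), "\n";   // prints "bar"

try {
    validateKey('foo bar');
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), "\n";  // prints "Whitespace inside key: 'foo bar'"
}
```

Throwing an exception rather than returning a partial array keeps malformed input from being mistaken for valid data.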



Also, how should I encode it back from an array to a string? I was thinking of just using json_encode and then removing the quote characters.
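A minimal sketch of that encoding idea, assuming keys and values never contain quotes, backslashes, or the delimiter characters. `encodeNotJson` is a hypothetical name; `JSON_FORCE_OBJECT` is used so that an empty array is written as `{}` rather than `[]`.

```php
<?php
/**
 * Hypothetical helper: encode a nested array into the quote-less format.
 * Assumes keys and values contain no quotes, backslashes, or delimiters.
 */
function encodeNotJson(array $data): string
{
    // JSON_FORCE_OBJECT makes empty arrays come out as {} instead of [].
    $json = json_encode($data, JSON_FORCE_OBJECT);
    if ($json === false) {
        throw new RuntimeException(json_last_error_msg());
    }
    return str_replace('"', '', $json);
}

$array = ['foo' => ['qux' => '1', 'quux' => '0'], 'bar' => []];
echo encodeNotJson($array), "\n"; // {foo:{qux:1,quux:0},bar:{}}
```

Any value that needs JSON escaping (a quote, a backslash, a control character) would come out damaged, so this only holds for plain alphanumeric data.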







asked May 15 at 5:58
solarc

1 Answer

The following workaround wraps your keys and values in double quotes so that json_decode() can parse the result. Such hacks will always be vulnerable to edge cases. To avoid sprinting down a rabbit hole of possibilities (I can think of a few off the top of my head: 1. keys/values that already contain quotes, 2. declared empty/null keys; there will be more), I'll just provide a solution for your sample input.

For now, I'm using \w as the first character of the match to ensure that the space before bar is not included. There are several ways to do this, but I would need intimate knowledge of your project data to develop the expression that I feel is "best / most robust".

If you discover any fringe cases that break this simple regex pattern, please update your question and I can create a patch.

Code: (Demo)



    $unquoted_json = <<<NOTJSON
    {foo:{qux:1,quux:0}, bar:{}}
    NOTJSON;

    // Quote every run that starts with a word character and stops at a delimiter.
    $quoted_json = preg_replace('~\w[^:,{}]*~', '"$0"', $unquoted_json);
    $array = json_decode($quoted_json, true);
    var_export($array);
    echo "\n---\n";
    echo json_encode($array);


Output:

    array (
      'foo' =>
      array (
        'qux' => '1',
        'quux' => '0',
      ),
      'bar' =>
      array (
      ),
    )
    ---
    {"foo":{"qux":"1","quux":"0"},"bar":[]}
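Since the question mentions missing error handling, the same quote-and-decode idea could be wrapped as sketched below. `decodeNotJson` is a hypothetical name, and the pattern is assumed to be `~\w[^:,{}]*~`, i.e. braces, colons, and commas are the only structural characters.

```php
<?php
/**
 * Hypothetical wrapper around the quote-and-decode approach.
 * Assumes ":" "," "{" "}" never occur inside keys or values.
 */
function decodeNotJson(string $input): array
{
    $quoted = preg_replace('~\w[^:,{}]*~', '"$0"', $input);
    if ($quoted === null) {
        throw new RuntimeException('preg_replace failed');
    }
    $decoded = json_decode($quoted, true);
    if (!is_array($decoded)) {
        throw new InvalidArgumentException('Not decodable: ' . json_last_error_msg());
    }
    return $decoded;
}

var_export(decodeNotJson('{foo:{qux:1,quux:0}, bar:{}}'));

try {
    decodeNotJson('{foo:{qux:1');   // unbalanced braces
} catch (InvalidArgumentException $e) {
    echo "\n", $e->getMessage(), "\n";
}
```

Leaving the structural validation to json_decode means unbalanced braces or stray delimiters are rejected without any hand-written state machine.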





• That's an interesting take on it. I'll try it and see if it is better than the other approach. Thanks.
  – solarc Jun 3 at 21:06

• The difficult thing about fabricated sample input is that we don't have a lot of certainty that the fabricated data is a true indicator of the quality of characters/data that can potentially exist in your actual project. Does your project data only use letters and numbers as key/value substrings? Might you have floats? indexed arrays? quotes? Might your keys/values contain one of the delimiting characters (:,)? Please improve your question by further clarifying the range of known/expected formats that the data may have. I want to see you find resolution. Some feedback would help me.
  – mickmackusa Jun 6 at 22:13










edited May 31 at 23:26
answered May 31 at 23:15
mickmackusa










