Parser for json-like format in PHP

At work they use a format similar to JSON but without quotes that looks like this:
{foo:{qux:1,quux:0}, bar:{}}
The reason they don't just use JSON is that the C# package Newtonsoft.Json can deserialize this as if it were regular JSON. I need to use it in PHP, but json_decode is not as forgiving.
So here's my attempt at a simple parser:
<?php
namespace FooBar;

use Generator;

class NotJsonParser
{
    const STEP_NAME = 0;
    const STEP_VALUE = 1;

    /**
     * @param string $string
     * @return array
     */
    public static function parseNotJSON($string)
    {
        $generator = self::stringIterator($string);
        $data = self::parser($generator);
        // The outermost "{" is parsed with an empty name, so the result sits under the '' key.
        return $data[''];
    }

    /**
     * @param Generator $generator
     * @return array
     */
    private static function parser(Generator $generator)
    {
        $data = [];
        $step = self::STEP_NAME;
        $name = '';
        $value = '';
        while ($generator->valid()) {
            $i = $generator->current();
            switch ($i) {
                case ' ':
                case "\n":
                    // Skip whitespace; break (not continue) so that next() below still runs.
                    break;
                case '{':
                    // Nested object: recurse and store the result under the current name.
                    $generator->next();
                    $value = self::parser($generator);
                    $data[$name] = $value;
                    $step = self::STEP_NAME;
                    $name = '';
                    $value = '';
                    break;
                case '}':
                    // End of the current object: flush any pending pair and return.
                    if ($name !== '') {
                        $data[$name] = $value;
                    }
                    return $data;
                case ',':
                    // End of a pair: store it and start reading the next name.
                    if ($name !== '') {
                        $data[$name] = $value;
                    }
                    $step = self::STEP_NAME;
                    $name = '';
                    $value = '';
                    break;
                case ':':
                    $step = self::STEP_VALUE;
                    break;
                default:
                    if ($step === self::STEP_NAME) {
                        $name .= $i;
                    } else {
                        $value .= $i;
                    }
            }
            $generator->next();
        }
        return $data;
    }

    /**
     * @param string $str
     * @return Generator
     */
    private static function stringIterator($str)
    {
        $length = strlen($str);
        for ($i = 0; $i < $length; $i++) {
            yield $str[$i];
        }
    }
}
And here's the usage:
>>> $result = \FooBar\NotJsonParser::parseNotJSON("{foo:{qux:1,quux:0}, bar:{}}");
=> [
     "foo" => [
       "qux" => "1",
       "quux" => "0",
     ],
     "bar" => [],
   ]
How could I improve this? I know it really lacks error handling. I don't mind that the numbers stay as strings. Also, the format never goes more than two levels deep. Whitespace between tokens is not significant, but there shouldn't be whitespace inside the keys (e.g. foo bar: baz should be an error).
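As an illustration of the error handling asked about here, the following is a minimal sketch of a stricter recursive-descent parser. It is not the code from the question: the class name StrictNotJsonParser and its methods are hypothetical, and it assumes (beyond what the question states) that empty names, empty scalar values, and trailing commas should all be rejected, and that typed properties (PHP 7.4+) are available.

```php
<?php
// Hypothetical strict parser for the brace format above (illustrative names only).
final class StrictNotJsonParser
{
    private string $s;
    private int $pos = 0;

    public static function parse(string $input): array
    {
        $p = new self();
        $p->s = $input;
        $p->skipWs();
        $data = $p->parseObject();
        $p->skipWs();
        if ($p->pos < strlen($p->s)) {
            throw new InvalidArgumentException("Unexpected trailing input at offset {$p->pos}");
        }
        return $data;
    }

    private function parseObject(): array
    {
        $this->expect('{');
        $data = [];
        $this->skipWs();
        if ($this->peek() === '}') {   // empty object
            $this->pos++;
            return $data;
        }
        while (true) {
            $this->skipWs();
            $key = $this->readToken();
            $this->skipWs();
            $this->expect(':');        // "foo bar: baz" fails here, at "bar"
            $this->skipWs();
            $data[$key] = $this->peek() === '{' ? $this->parseObject() : $this->readToken();
            $this->skipWs();
            if ($this->peek() === ',') {
                $this->pos++;
                continue;
            }
            $this->expect('}');
            return $data;
        }
    }

    // Reads a run of characters that are neither delimiters nor whitespace.
    private function readToken(): string
    {
        $len = strcspn($this->s, "{}:, \t\r\n", $this->pos);
        if ($len === 0) {
            throw new InvalidArgumentException("Expected a name or value at offset {$this->pos}");
        }
        $token = substr($this->s, $this->pos, $len);
        $this->pos += $len;
        return $token;
    }

    private function peek(): string
    {
        return $this->s[$this->pos] ?? '';
    }

    private function expect(string $ch): void
    {
        if ($this->peek() !== $ch) {
            throw new InvalidArgumentException("Expected '$ch' at offset {$this->pos}");
        }
        $this->pos++;
    }

    private function skipWs(): void
    {
        $this->pos += strspn($this->s, " \t\r\n", $this->pos);
    }
}

var_export(StrictNotJsonParser::parse('{foo:{qux:1,quux:0}, bar:{}}'));

try {
    StrictNotJsonParser::parse('{foo bar: baz}');
} catch (InvalidArgumentException $e) {
    echo "\nRejected: ", $e->getMessage(), "\n";
}
```

Working on string offsets with strspn/strcspn instead of a character generator keeps the position available for error messages; whether that trade-off suits the real data is a judgment call.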
Also, how should I encode it back from an array to a string? I was thinking of just using json_encode and then removing the quote characters.
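The json_encode idea could look roughly like the sketch below. The function name encodeNotJson is hypothetical. It assumes that names and values never contain quotes, backslashes, or other characters that json_encode would escape, since stripping quotes would corrupt those. JSON_FORCE_OBJECT is used because an empty PHP array would otherwise be encoded as [] rather than {}.

```php
<?php
// Hypothetical encoder: JSON-encode, then drop the double quotes.
// Only safe when no name or value needs JSON escaping (an assumption).
function encodeNotJson(array $data): string
{
    // JSON_FORCE_OBJECT makes empty arrays come out as {} instead of [].
    $json = json_encode($data, JSON_FORCE_OBJECT);
    if ($json === false) {
        throw new RuntimeException(json_last_error_msg());
    }
    return str_replace('"', '', $json);
}

echo encodeNotJson(['foo' => ['qux' => '1', 'quux' => '0'], 'bar' => []]), "\n";
// prints {foo:{qux:1,quux:0},bar:{}}
```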
php parsing
asked May 15 at 5:58
solarc
1 Answer
The following workaround wraps your keys and values in double quotes. Such hacks will always be vulnerable to edge cases. To avoid sprinting down a rabbit hole of possibilities (a few come to mind immediately: 1. keys/values that already contain quotes, 2. declared empty/null keys, and there will be more), I'll just provide a solution for your sample input.
For now, I'm using \w to ensure that the space before bar is not included. There are several ways to do this, but I would need intimate knowledge of your project data to develop the expression that I feel is "best / most robust".
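To see which substrings end up quoted, the matches can be listed with preg_match_all. This is only a sketch: the character class used here, [^:,{}], is an assumption that also excludes the braces so that a value such as 0 does not swallow the closing brace after it.

```php
<?php
// List what the pattern matches in the sample input.
// Assumed pattern: one word character, then anything up to a delimiter.
$input = '{foo:{qux:1,quux:0}, bar:{}}';
preg_match_all('~\w[^:,{}]*~', $input, $matches);
$tokens = $matches[0];

// Each match must start with a word character, so the space before "bar" is skipped.
print_r($tokens); // foo, qux, 1, quux, 0, bar
```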
If you discover any fringe cases that break this simple regex pattern, please update your question and I can create a patch.
Code:
$unquoted_json = <<<NOTJSON
{foo:{qux:1,quux:0}, bar:{}}
NOTJSON;
// Wrap every run that starts with a word character and stops at a delimiter in double quotes.
$quoted_json = preg_replace('~\w[^:,{}]*~', '"$0"', $unquoted_json);
$array = json_decode($quoted_json, true);
var_export($array);
echo "\n---\n";
echo json_encode($array);
Output:
array (
  'foo' =>
  array (
    'qux' => '1',
    'quux' => '0',
  ),
  'bar' =>
  array (
  ),
)
---
{"foo":{"qux":"1","quux":"0"},"bar":[]}
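Because json_decode returns null on malformed input, it can help to make failures explicit. The sketch below wraps the same quote-then-decode idea in a hypothetical helper, decodeNotJson, and uses JSON_THROW_ON_ERROR (PHP 7.3+); the pattern is again assumed to exclude the braces. The second call shows one of the edge cases mentioned above, a value that already carries quotes.

```php
<?php
// Hypothetical helper; the name decodeNotJson is illustrative only.
function decodeNotJson(string $unquoted): array
{
    // Assumed pattern: a word character followed by anything up to a delimiter.
    $quoted = preg_replace('~\w[^:,{}]*~', '"$0"', $unquoted);
    // JSON_THROW_ON_ERROR raises JsonException instead of silently returning null.
    return json_decode($quoted, true, 512, JSON_THROW_ON_ERROR);
}

var_export(decodeNotJson('{foo:{qux:1,quux:0}, bar:{}}'));

try {
    // The existing quotes get wrapped again, which produces invalid JSON.
    decodeNotJson('{foo:{qux:"1"}}');
} catch (JsonException $e) {
    echo "\nRejected: ", $e->getMessage(), "\n";
}
```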
edited May 31 at 23:26
answered May 31 at 23:15
mickmackusa
That's an interesting take on it. I'll try it and see if it is better than the other approach. Thanks.
– solarc
Jun 3 at 21:06
The difficult thing about fabricated sample input is that we don't have a lot of certainty that the fabricated data is a true indicator of the quality of characters/data that can potentially exist in your actual project. Does your project data only use letters and numbers as key/value substrings? Might you have floats? indexed arrays? quotes? Might your keys/values contain one of the delimiting characters (:,)? Please improve your question by further clarifying the range of known/expected formats that the data may have. I want to see you find resolution. Some feedback would help me.
– mickmackusa
Jun 6 at 22:13