Design database for OHLC
Requirements:
I have to design a database that stores tick data for 350 stocks. Queries against this database will transform the tick data and serve OHLC candles at 1-minute, 1-hour, and 1-day resolutions.
On average, 15 tick rows per second are inserted for every stock, and each table is initially populated with around 6 million ticks (from a minimum of 100k to a peak of 50 million).
Deletes do not need to be supported efficiently.
SELECT queries need to filter candles by time, and I would like to support large bulk reads of around 600k rows in case a whole year of 1-minute candles is requested.
The initial target is to support at least 5 concurrent queries on the same stock, or 20 concurrent queries in total.
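As a sanity check on these requirements, the aggregate insert rate and the size of a year-long bulk read follow directly from the stated numbers (a back-of-the-envelope sketch, not part of the schema):

```python
# Back-of-the-envelope check of the stated requirements.

STOCKS = 350
TICKS_PER_SEC_PER_STOCK = 15

# Total insert rate across all per-stock tables.
total_inserts_per_sec = STOCKS * TICKS_PER_SEC_PER_STOCK
print(total_inserts_per_sec)  # 5250 rows/s across the database

# A year of 1-minute candles for one stock: 60 min * 24 h * 365 days.
candles_per_year = 60 * 24 * 365
print(candles_per_year)  # 525600 rows, matching the ~600k bulk target
```

So the write side must sustain roughly 5,250 inserts per second, and the worst-case bulk read of 1-minute candles is about 525k rows per stock per year.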
Implementation:
I have chosen a single PostgreSQL 10 database, with tick data stored in one table per stock, indexed on trade_id and trade_time.
To transform tick data into OHLC candles, I use a MATERIALIZED VIEW for each candle type (1m, 1h, 1d), refreshed concurrently every 24 hours, in sequence across all tables. The goal is to avoid recomputing the tick-to-candle transformation on every query, while still being able to read the candles during a refresh. The candle views are indexed on date.
-- Tick data <pair>
CREATE TABLE IF NOT EXISTS :title (
    "trade_time"     bigint  NOT NULL,
    "trade_id"       bigint  NOT NULL,
    "first_trade_id" bigint  NOT NULL,
    "last_trade_id"  bigint  NOT NULL,
    "price"          text    NOT NULL,
    "quantity"       text    NOT NULL,
    "is_buyer_mm"    boolean NOT NULL,
    "is_best_match"  boolean NOT NULL
) WITH (
    OIDS = FALSE
);
-- NOTE: index names are schema-wide in PostgreSQL. With one table per stock,
-- each script run must generate unique index names, otherwise
-- IF NOT EXISTS silently skips index creation for every stock after the first.
CREATE INDEX IF NOT EXISTS trade_idx   ON :title (trade_id);
CREATE INDEX IF NOT EXISTS trade_timex ON :title (trade_time);
-- Tick data materialized view <pair>_tick (trades before today)
CREATE MATERIALIZED VIEW IF NOT EXISTS :tick AS
    SELECT trade_id,
           price::numeric,
           quantity::numeric,
           TIMESTAMP WITH TIME ZONE 'epoch' + trade_time * INTERVAL '1 ms' AS trade_time,
           first_trade_id,
           last_trade_id,
           is_buyer_mm,
           is_best_match
    FROM :title
    WHERE TIMESTAMP WITH TIME ZONE 'epoch' + trade_time * INTERVAL '1 ms' < CURRENT_DATE
    ORDER BY trade_time;
-- REFRESH MATERIALIZED VIEW CONCURRENTLY requires a unique index on the view,
-- so trade_id is indexed UNIQUE here.
CREATE UNIQUE INDEX IF NOT EXISTS tick_trade_idx   ON :tick (trade_id);
CREATE INDEX        IF NOT EXISTS tick_trade_timex ON :tick (trade_time);
-- 1m materialized view <pair>_1m
CREATE MATERIALIZED VIEW IF NOT EXISTS :minute AS
    SELECT date_trunc('minute', a.trade_time)                 AS dt,
           (array_agg(a.price ORDER BY a.trade_time ASC))[1]  AS o,
           MAX(a.price)                                       AS h,
           MIN(a.price)                                       AS l,
           (array_agg(a.price ORDER BY a.trade_time DESC))[1] AS c,
           SUM(a.quantity)                                    AS vol,
           COUNT(*)                                           AS ticks
    FROM :tick AS a
    GROUP BY date_trunc('minute', a.trade_time);
-- dt is unique per bucket; a unique index also enables REFRESH ... CONCURRENTLY.
CREATE UNIQUE INDEX IF NOT EXISTS minute_dtx ON :minute (dt);
-- 1h materialized view <pair>_1h
CREATE MATERIALIZED VIEW IF NOT EXISTS :hour AS
    SELECT date_trunc('hour', a.trade_time)                   AS dt,
           (array_agg(a.price ORDER BY a.trade_time ASC))[1]  AS o,
           MAX(a.price)                                       AS h,
           MIN(a.price)                                       AS l,
           (array_agg(a.price ORDER BY a.trade_time DESC))[1] AS c,
           SUM(a.quantity)                                    AS vol,
           COUNT(*)                                           AS ticks
    FROM :tick AS a
    GROUP BY date_trunc('hour', a.trade_time);
CREATE UNIQUE INDEX IF NOT EXISTS hour_dtx ON :hour (dt);
-- 1d materialized view <pair>_1d
CREATE MATERIALIZED VIEW IF NOT EXISTS :day AS
    SELECT date_trunc('day', a.trade_time)                    AS dt,
           (array_agg(a.price ORDER BY a.trade_time ASC))[1]  AS o,
           MAX(a.price)                                       AS h,
           MIN(a.price)                                       AS l,
           (array_agg(a.price ORDER BY a.trade_time DESC))[1] AS c,
           SUM(a.quantity)                                    AS vol,
           COUNT(*)                                           AS ticks
    FROM :tick AS a
    GROUP BY date_trunc('day', a.trade_time);
CREATE UNIQUE INDEX IF NOT EXISTS day_dtx ON :day (dt);
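The per-bucket aggregation these views compute (open/close via the `array_agg ... ORDER BY` trick, high/low via MAX/MIN) can be cross-checked with a small Python sketch. The tick tuples here are hypothetical illustration data, not the production path:

```python
from collections import defaultdict

def ohlc(ticks, bucket_ms=60_000):
    """Aggregate (trade_time_ms, price, quantity) ticks into OHLC candles.

    Mirrors the materialized views: open/close are the first/last price
    inside the bucket ordered by trade_time (the array_agg ... ORDER BY
    trick), high/low are MAX/MIN, vol is SUM(quantity), ticks is COUNT(*).
    """
    buckets = defaultdict(list)
    for t, price, qty in ticks:
        # Equivalent of date_trunc: snap the timestamp to its bucket start.
        buckets[t // bucket_ms * bucket_ms].append((t, price, qty))
    candles = {}
    for dt, rows in buckets.items():
        rows.sort(key=lambda r: r[0])  # order by trade_time
        prices = [r[1] for r in rows]
        candles[dt] = {
            "o": prices[0], "h": max(prices), "l": min(prices),
            "c": prices[-1], "vol": sum(r[2] for r in rows),
            "ticks": len(rows),
        }
    return candles

# Example: three ticks inside the same one-minute bucket.
demo = [(0, 10.0, 1.0), (20_000, 12.0, 2.0), (59_000, 11.0, 1.5)]
print(ohlc(demo)[0])
# {'o': 10.0, 'h': 12.0, 'l': 10.0, 'c': 11.0, 'vol': 4.5, 'ticks': 3}
```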
Problems:
- For each stock I add, the views end up roughly the same size as the underlying table.
- 50 million rows use about 4.5 GB.
Honestly, I still have to benchmark the throughput of this design, but I suspect there are better implementations.
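For context, the 4.5 GB figure works out to a plausible per-row cost for this schema (a rough estimate; the exact breakdown depends on tuple headers, text lengths, and index overhead):

```python
# Rough per-row cost implied by 4.5 GB at 50 million rows.
bytes_per_row = 4.5 * 1024**3 / 50_000_000
print(round(bytes_per_row))  # ~97 bytes/row

# Plausible breakdown for this schema (estimates, not measured):
# ~23-byte heap tuple header, 4 x 8-byte bigints, 2 booleans,
# two short text values with length headers, plus two index entries.
```

That suggests the table itself is not far from the floor for this row layout; the bigger lever is that each candle view stores a full copy of the aggregated data alongside the base table.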
Ideal:
Looking at various similar questions about OHLC series, I found these ideas:
- Switch to ZFS to reduce the on-disk size of the database.
- Change database (NoSQL, InfluxDB, BerkeleyDB).
- Use Redis to cache queries (integration with PostgreSQL is not very straightforward).
sql postgresql
Why don't you use SQLite? (a) SQLite is faster than Postgres if you are planning to do anything in bulk, (b) multiple reads in parallel are easily accomplished, (c) you can use Redis as an intermediate store for data from live streams and run a cron job every minute that appends to SQLite.
– PirateApp, Jun 26 at 13:32
edited Jun 11 at 23:37 by Jamal
asked Jun 7 at 9:56 by Afe