Design database for OHLC

Requirements:



I have to design a database to store tick data for 350 stocks. Queries against this database will transform the tick data and retrieve OHLC candles at 1-minute, 1-hour, and 1-day resolutions.



On average, 15 tick rows are inserted per second for every stock, and each table is initially populated with an average of 6 million tick rows (from a minimum of 100k to a peak of 50 million).



No delete queries need to be supported efficiently.



SELECT queries need to filter candles by time, and I would like to support bulk reads of up to 600k rows in case a whole year of 1-minute candles is requested.
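For scale, the year-of-minute-candles case above amounts to a single range scan over a 1m view; a sketch, assuming a hypothetical view named aapl_1m with a candle-timestamp column dt:

```sql
-- Fetch one full year of 1-minute candles (~600k rows) in one range scan.
-- aapl_1m and its columns follow the view definitions below; the name is illustrative.
SELECT dt, o, h, l, c, vol
FROM aapl_1m
WHERE dt >= '2017-01-01' AND dt < '2018-01-01'
ORDER BY dt;
```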



The initial target is to support at least 5 concurrent queries on the same stock, or 20 concurrent queries in total.



Implementation:



The implementation I have chosen is a single PostgreSQL 10 database with tick data stored in separate tables per stock, each indexed on trade_id and trade_time.
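Since the scripts below use psql variables (:title, :tick, :minute, :hour, :day), the same schema file can be replayed once per stock; a sketch with hypothetical per-stock names:

```sql
-- Hypothetical per-stock names; set them either interactively with \set,
-- or on the command line, e.g.:
--   psql -v title=aapl_ticks -v tick=aapl_tick -v minute=aapl_1m \
--        -v hour=aapl_1h -v day=aapl_1d -f schema.sql
\set title  aapl_ticks
\set tick   aapl_tick
\set minute aapl_1m
\set hour   aapl_1h
\set day    aapl_1d
```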



To transform tick data into OHLC candles, I use a MATERIALIZED VIEW for each candle type (1m, 1h, 1d), refreshed CONCURRENTLY every 24 hours, in sequence across all tables. The goal is to avoid recomputing the tick-to-candle transformation on every query, while still being able to read the candles during a refresh. Each view has an index on its date column.



-- Tick data <pair>
CREATE TABLE IF NOT EXISTS :title (
    "trade_time"     bigint  NOT NULL,  -- epoch milliseconds
    "trade_id"       bigint  NOT NULL,
    "first_trade_id" bigint  NOT NULL,
    "last_trade_id"  bigint  NOT NULL,
    "price"          TEXT    NOT NULL,
    "quantity"       TEXT    NOT NULL,
    "is_buyer_mm"    BOOLEAN NOT NULL,
    "is_best_match"  BOOLEAN NOT NULL
) WITH (
    OIDS=FALSE
);

CREATE INDEX IF NOT EXISTS trade_idx   ON :title (trade_id);
CREATE INDEX IF NOT EXISTS trade_timex ON :title (trade_time);

-- Tick data materialized view <pair>_tick (trades up to the start of today)
CREATE MATERIALIZED VIEW IF NOT EXISTS :tick AS
SELECT trade_id,
       price::numeric,
       quantity::numeric,
       TIMESTAMP WITH TIME ZONE 'epoch' + trade_time * INTERVAL '1 ms' AS trade_time,
       first_trade_id,
       last_trade_id,
       is_buyer_mm,
       is_best_match
FROM :title
WHERE TIMESTAMP WITH TIME ZONE 'epoch' + trade_time * INTERVAL '1 ms' < CURRENT_DATE
ORDER BY trade_time;

-- Index names are schema-wide: reusing trade_idx/trade_timex here would make
-- IF NOT EXISTS silently skip creation and leave the view unindexed.
CREATE INDEX IF NOT EXISTS tick_trade_idx   ON :tick (trade_id);
CREATE INDEX IF NOT EXISTS tick_trade_timex ON :tick (trade_time);

-- 1m materialized view <pair>_1m
-- (the dtx index gets a distinct name per view: a shared name combined with
-- IF NOT EXISTS would only ever be created once per schema)
CREATE MATERIALIZED VIEW IF NOT EXISTS :minute AS
SELECT date_trunc('minute', a.trade_time)               AS dt,
       (array_agg(a.price ORDER BY trade_time ASC))[1]  AS o,  -- first price in bucket
       MAX(a.price)                                     AS h,
       MIN(a.price)                                     AS l,
       (array_agg(a.price ORDER BY trade_time DESC))[1] AS c,  -- last price in bucket
       SUM(a.quantity)                                  AS vol,
       count(*)                                         AS ticks
FROM :tick AS a
GROUP BY date_trunc('minute', a.trade_time);

CREATE INDEX IF NOT EXISTS minute_dtx ON :minute (dt);

-- 1h materialized view <pair>_1h
CREATE MATERIALIZED VIEW IF NOT EXISTS :hour AS
SELECT date_trunc('hour', a.trade_time)                 AS dt,
       (array_agg(a.price ORDER BY trade_time ASC))[1]  AS o,
       MAX(a.price)                                     AS h,
       MIN(a.price)                                     AS l,
       (array_agg(a.price ORDER BY trade_time DESC))[1] AS c,
       SUM(a.quantity)                                  AS vol,
       count(*)                                         AS ticks
FROM :tick AS a
GROUP BY date_trunc('hour', a.trade_time);

CREATE INDEX IF NOT EXISTS hour_dtx ON :hour (dt);

-- 1d materialized view <pair>_1d
CREATE MATERIALIZED VIEW IF NOT EXISTS :day AS
SELECT date_trunc('day', a.trade_time)                  AS dt,
       (array_agg(a.price ORDER BY trade_time ASC))[1]  AS o,
       MAX(a.price)                                     AS h,
       MIN(a.price)                                     AS l,
       (array_agg(a.price ORDER BY trade_time DESC))[1] AS c,
       SUM(a.quantity)                                  AS vol,
       count(*)                                         AS ticks
FROM :tick AS a
GROUP BY date_trunc('day', a.trade_time);

CREATE INDEX IF NOT EXISTS day_dtx ON :day (dt);
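The 24-hour refresh could then be a per-pair job along these lines. One caveat: REFRESH MATERIALIZED VIEW CONCURRENTLY requires at least one UNIQUE index on the view, which plain indexes do not provide; the *_uidx names below are my own, and the job assumes trade_id is unique per tick and dt is unique per candle bucket.

```sql
-- CONCURRENTLY needs a UNIQUE index on each materialized view.
CREATE UNIQUE INDEX IF NOT EXISTS tick_trade_uidx ON :tick (trade_id);
CREATE UNIQUE INDEX IF NOT EXISTS minute_dt_uidx  ON :minute (dt);
CREATE UNIQUE INDEX IF NOT EXISTS hour_dt_uidx    ON :hour (dt);
CREATE UNIQUE INDEX IF NOT EXISTS day_dt_uidx     ON :day (dt);

-- Nightly refresh, ticks first since the candle views read from :tick.
REFRESH MATERIALIZED VIEW CONCURRENTLY :tick;
REFRESH MATERIALIZED VIEW CONCURRENTLY :minute;
REFRESH MATERIALIZED VIEW CONCURRENTLY :hour;
REFRESH MATERIALIZED VIEW CONCURRENTLY :day;
```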


Problems:



  • For each stock I add, each view ends up roughly the same size as the tick table itself

  • 50 million tick rows take about 4.5 GB

Honestly, I still have to benchmark the throughput of this design, but I suspect there are better implementations.
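To track the sizes reported above, per-relation disk usage (including indexes and TOAST) can be read from the catalog; a sketch using the same hypothetical per-stock names as before:

```sql
-- Compare on-disk size of a tick table against its derived views.
-- The relation names are illustrative; substitute the per-stock names in use.
SELECT relname,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relname IN ('aapl_ticks', 'aapl_tick', 'aapl_1m', 'aapl_1h', 'aapl_1d');
```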



Ideal:



Looking at similar questions about OHLC series, I found these ideas:



  • Switch to ZFS to reduce the size of the database

  • Change the database (NoSQL, InfluxDB, BerkeleyDB)

  • Use Redis to cache queries (integration with PostgreSQL is not very straightforward)



























  • why dont you use SQLite? a) sqlite is faster than postgres if you are planning to do anything in bulk b) multiple reads in parallel are easily accomplished c) you can use redis as an intermediate store to store data from live streams and run a cron job every minute that adds to sqlite
    – PirateApp
    Jun 26 at 13:32

















asked Jun 7 at 9:56 by Afe
edited Jun 11 at 23:37 by Jamal♦










