Design database for OHLC
Requirements:
I have to design a database that stores tick data for 350 stocks. Queries against this database will transform the tick data and serve OHLC candles at 1-minute, 1-hour, and 1-day resolutions.
On average, 15 tick rows per second are inserted for every stock, and each table is initially populated with around 6 million ticks (from a minimum of 100k to a peak of 50 million).
Deletes do not need to be supported efficiently.
SELECT queries need to filter candles by time, and I would like to support large bulk reads of around 600k rows in case a whole year of 1-minute candles is requested.
The initial target is to support at least 5 concurrent queries on the same stock, or 20 concurrent queries in total.
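As a sanity check on these requirements, the aggregate insert rate and the size of a year-long bulk read follow directly from the stated numbers (a back-of-the-envelope sketch, not part of the schema):

```python
# Back-of-the-envelope check of the stated requirements.

STOCKS = 350
TICKS_PER_SEC_PER_STOCK = 15

# Total insert rate across all per-stock tables.
total_inserts_per_sec = STOCKS * TICKS_PER_SEC_PER_STOCK
print(total_inserts_per_sec)  # 5250 rows/s across the database

# A year of 1-minute candles for one stock: 60 min * 24 h * 365 days.
candles_per_year = 60 * 24 * 365
print(candles_per_year)  # 525600 rows, matching the ~600k bulk target
```

So the write side must sustain roughly 5,250 inserts per second, and the worst-case bulk read of 1-minute candles is about 525k rows per stock per year.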
Implementation:
I have chosen a single PostgreSQL 10 database, with tick data stored in one table per stock, indexed on trade_id and trade_time.
To transform tick data into OHLC candles, I use a MATERIALIZED VIEW for each candle type (1m, 1h, 1d), refreshed concurrently every 24 hours, in sequence across all tables. The goal is to avoid recomputing the tick-to-candle transformation on every query, while still being able to read the candles during a refresh. The candle views are indexed on date.
-- Tick data <pair>
CREATE TABLE IF NOT EXISTS :title (
    "trade_time"     bigint  NOT NULL,
    "trade_id"       bigint  NOT NULL,
    "first_trade_id" bigint  NOT NULL,
    "last_trade_id"  bigint  NOT NULL,
    "price"          text    NOT NULL,
    "quantity"       text    NOT NULL,
    "is_buyer_mm"    boolean NOT NULL,
    "is_best_match"  boolean NOT NULL
) WITH (
    OIDS = FALSE
);
-- NOTE: index names are schema-wide in PostgreSQL. With one table per stock,
-- each script run must generate unique index names, otherwise
-- IF NOT EXISTS silently skips index creation for every stock after the first.
CREATE INDEX IF NOT EXISTS trade_idx   ON :title (trade_id);
CREATE INDEX IF NOT EXISTS trade_timex ON :title (trade_time);
-- Tick data materialized view <pair>_tick (trades before today)
CREATE MATERIALIZED VIEW IF NOT EXISTS :tick AS
    SELECT trade_id,
           price::numeric,
           quantity::numeric,
           TIMESTAMP WITH TIME ZONE 'epoch' + trade_time * INTERVAL '1 ms' AS trade_time,
           first_trade_id,
           last_trade_id,
           is_buyer_mm,
           is_best_match
    FROM :title
    WHERE TIMESTAMP WITH TIME ZONE 'epoch' + trade_time * INTERVAL '1 ms' < CURRENT_DATE
    ORDER BY trade_time;
-- REFRESH MATERIALIZED VIEW CONCURRENTLY requires a unique index on the view,
-- so trade_id is indexed UNIQUE here.
CREATE UNIQUE INDEX IF NOT EXISTS tick_trade_idx   ON :tick (trade_id);
CREATE INDEX        IF NOT EXISTS tick_trade_timex ON :tick (trade_time);
-- 1m materialized view <pair>_1m
CREATE MATERIALIZED VIEW IF NOT EXISTS :minute AS
    SELECT date_trunc('minute', a.trade_time)                 AS dt,
           (array_agg(a.price ORDER BY a.trade_time ASC))[1]  AS o,
           MAX(a.price)                                       AS h,
           MIN(a.price)                                       AS l,
           (array_agg(a.price ORDER BY a.trade_time DESC))[1] AS c,
           SUM(a.quantity)                                    AS vol,
           COUNT(*)                                           AS ticks
    FROM :tick AS a
    GROUP BY date_trunc('minute', a.trade_time);
-- dt is unique per bucket; a unique index also enables REFRESH ... CONCURRENTLY.
CREATE UNIQUE INDEX IF NOT EXISTS minute_dtx ON :minute (dt);
-- 1h materialized view <pair>_1h
CREATE MATERIALIZED VIEW IF NOT EXISTS :hour AS
    SELECT date_trunc('hour', a.trade_time)                   AS dt,
           (array_agg(a.price ORDER BY a.trade_time ASC))[1]  AS o,
           MAX(a.price)                                       AS h,
           MIN(a.price)                                       AS l,
           (array_agg(a.price ORDER BY a.trade_time DESC))[1] AS c,
           SUM(a.quantity)                                    AS vol,
           COUNT(*)                                           AS ticks
    FROM :tick AS a
    GROUP BY date_trunc('hour', a.trade_time);
CREATE UNIQUE INDEX IF NOT EXISTS hour_dtx ON :hour (dt);
-- 1d materialized view <pair>_1d
CREATE MATERIALIZED VIEW IF NOT EXISTS :day AS
    SELECT date_trunc('day', a.trade_time)                    AS dt,
           (array_agg(a.price ORDER BY a.trade_time ASC))[1]  AS o,
           MAX(a.price)                                       AS h,
           MIN(a.price)                                       AS l,
           (array_agg(a.price ORDER BY a.trade_time DESC))[1] AS c,
           SUM(a.quantity)                                    AS vol,
           COUNT(*)                                           AS ticks
    FROM :tick AS a
    GROUP BY date_trunc('day', a.trade_time);
CREATE UNIQUE INDEX IF NOT EXISTS day_dtx ON :day (dt);
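The per-bucket aggregation these views compute (open/close via the `array_agg ... ORDER BY` trick, high/low via MAX/MIN) can be cross-checked with a small Python sketch. The tick tuples here are hypothetical illustration data, not the production path:

```python
from collections import defaultdict

def ohlc(ticks, bucket_ms=60_000):
    """Aggregate (trade_time_ms, price, quantity) ticks into OHLC candles.

    Mirrors the materialized views: open/close are the first/last price
    inside the bucket ordered by trade_time (the array_agg ... ORDER BY
    trick), high/low are MAX/MIN, vol is SUM(quantity), ticks is COUNT(*).
    """
    buckets = defaultdict(list)
    for t, price, qty in ticks:
        # Equivalent of date_trunc: snap the timestamp to its bucket start.
        buckets[t // bucket_ms * bucket_ms].append((t, price, qty))
    candles = {}
    for dt, rows in buckets.items():
        rows.sort(key=lambda r: r[0])  # order by trade_time
        prices = [r[1] for r in rows]
        candles[dt] = {
            "o": prices[0], "h": max(prices), "l": min(prices),
            "c": prices[-1], "vol": sum(r[2] for r in rows),
            "ticks": len(rows),
        }
    return candles

# Example: three ticks inside the same one-minute bucket.
demo = [(0, 10.0, 1.0), (20_000, 12.0, 2.0), (59_000, 11.0, 1.5)]
print(ohlc(demo)[0])
# {'o': 10.0, 'h': 12.0, 'l': 10.0, 'c': 11.0, 'vol': 4.5, 'ticks': 3}
```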
Problems:
- For each stock I add, the views end up roughly the same size as the underlying table.
- 50 million rows use about 4.5 GB.
Honestly, I still have to benchmark the throughput of this design, but I suspect there are better implementations.
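For context, the 4.5 GB figure works out to a plausible per-row cost for this schema (a rough estimate; the exact breakdown depends on tuple headers, text lengths, and index overhead):

```python
# Rough per-row cost implied by 4.5 GB at 50 million rows.
bytes_per_row = 4.5 * 1024**3 / 50_000_000
print(round(bytes_per_row))  # ~97 bytes/row

# Plausible breakdown for this schema (estimates, not measured):
# ~23-byte heap tuple header, 4 x 8-byte bigints, 2 booleans,
# two short text values with length headers, plus two index entries.
```

That suggests the table itself is not far from the floor for this row layout; the bigger lever is that each candle view stores a full copy of the aggregated data alongside the base table.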
Ideal:
Looking at various similar questions about OHLC series, I found these ideas:
- Switch to ZFS to reduce the on-disk size of the database.
- Change database (NoSQL, InfluxDB, BerkeleyDB).
- Use Redis to cache queries (integration with PostgreSQL is not very straightforward).
sql postgresql
Why don't you use SQLite? (a) SQLite is faster than Postgres if you are planning to do anything in bulk, (b) multiple reads in parallel are easily accomplished, (c) you can use Redis as an intermediate store for data from live streams and run a cron job every minute that appends to SQLite.
– PirateApp, Jun 26 at 13:32
edited Jun 11 at 23:37 by Jamal
asked Jun 7 at 9:56 by Afe