TimescaleDB for Time-series Data
Are you an analyst working on Time Series Data? Are you intrigued by which databases to use? Considering the fact that even traditional databases can also serve the purpose of storing and retrieving Time Series data, why is there a need for a specialized Time Series database?
We will be examining most of these questions in this blog. We will also answer them, based on our understanding of our use-cases. While doing so, we have compared two fledgling Time Series databases, OpenTSDB and TimeScaleDB on their key characteristics, technical features and architecture.
Time-series Data :
Basic definition of Time Series data points to two key characteristics –
- A discrete time-data sequence taken at successive equally spaced points in time-indexed manner.
- Data that collectively represents how a system, process, or behavior changes over time.
Components of Time-series Data :
- Seasonal effect (Seasonal Variation or Seasonal Fluctuations)
- Other Cyclic Changes (Cyclical Variation or Cyclic Fluctuations)
- Trend (Secular Trend or Long Term Variation)
- Other Irregular Variation (Irregular Fluctuations)
Weather monitoring and Stock Market Analysis data are example of Time Series data that are easy to relate in our daily life. At the back-end, most of these data are generated as a part of various IoT platforms or applications. Thus, the speed at which the data are generated becomes a key complexity. Thus database to which the data is ingested should have the ability to record and retrieve the data coming from thousands of IoT devices every microsecond to ensure timely monitoring.
In this digital environment, a great deal of data is gathered by various devices and applications. For example, current location, browsing data, personal fitness/metrics trackers etc. In this kind of scenario, it is really important to store the data for the overall population in an effective time-series database for future predictions/forecast.
fig1: Example of Time series data points
TimescaleDB Overview :
TimescaleDB is the first time-series database specifically designed for scale, ease of use, and complex queries. While TimescaleDB is an extension of PostgreSQL, it provides the following:
- Automatic partitioning across time and space (partitioning key)
- Full SQL support
- Easy to use; like a relational database
- Fast and parallel ingestion
fig 2: PostgreSQL and TimescaleDB – A Comparison of Insert Rates
As we can see in the above figure (fig 2), the insert rates go down as the Dataset size increases in PostgreSQL. While in TimeScaleDB, steady insertion rate is maintained irrespective of the size of the Dataset. Thus, the performance of the application that sits on top of TimescaleDB improves greatly.
TimescaleDB executes the query on Hypertable comprising of many Chunks partitioned by time and space which really look like regular tables.
“ Time-series data is largely immutable. Writes primarily occur as new appends to recent time intervals, not as updates to existing rows. Both read and writes have a natural partitioning across both time and space.”
-TimeScale DB Developers
Data Handling in TimescaleDB :
HyperTable Outlook: Abstracts the table as the Hypertable composed of many right-sized chunks partitioned as per Time and Space.
Optimized Query Execution: During Query Execution it checks whether only the necessary chunks are used for retrieval of data. This can be done by aggressive constraint exclusion.
Data Model: TimeScale DB follows the Wide table Model which helps in the estimate, measure, or note the similarity between data.
Benchmarking TimeScaleDB :
Our use case required running complex aggregation queries while also supporting simultaneous ingestion of incoming time-series data. To ensure that the chosen platform can handle this load, we did some benchmarking.
An ingestion application kept pumping in data into the database. We ran three types of queries that access a varying number of rows. Every query was run several times to ensure stable results. Execution time over a table varied only slightly. This is because the query hits only the selected number of chunks satisfying the filtering conditions. The results are shown in the table below. We will describe the individual queries in a future blog post.
Results of TimescaleDB benchmark:
Query 1 Execution time(seconds)
Query 2 Execution Time(seconds)
Query 3 Execution time(seconds)
TimeScaleDB(v0.4.2) Vs OpenTSDB(v2.3.0)
Based on our use case for handling and storing time-series data for IoT implementation in a Big Data environment, TimescaleDB compares better on features such as partitioning, data retention, access methods, and compatibility to scripting for automation. Moreover, ease of access and simple retrieval of data for further application makes it more convenient when compared to the other time-series databases that we have used.