====== Time series basics ======

Metrics collected by the monitoring system are stored in a time-series database. It is important to understand the query principles and mechanisms to ensure that the displayed data are accurate and relevant.

===== Concepts =====

  * A time series consists of values associated with timestamps.
  * A time series has a name identifying the kind of data it represents, such as ''system.disk.free_space''.
  * A time series can also have **tags**, generally used to identify the resources associated with the data, such as a server name or a disk.
  * One time series is created for each unique combination of tags; it is basically a table associating timestamps with values.

===== Querying =====

==== Time series name ====

  * To display data in a table or a graph, you query the database for the data associated with a particular time series name (aka metric) within a period of time.
  * For a particular metric name, you can have multiple underlying time series, one for each combination of tags.
  * Example: the metric ''server.disk.free_space'', for 2 servers with 2 disks each, creates 4 time series:

    12:00:00 server.disk.free_space;server=server1;disk=C, 50
    12:00:00 server.disk.free_space;server=server1;disk=D, 80
    12:00:00 server.disk.free_space;server=server2;disk=C, 20
    12:00:00 server.disk.free_space;server=server2;disk=D, 10

==== Aggregation ====

  * When building a query, you must first define the time series name and the time window.
  * Then you must define the aggregation: it defines how, **for each timestamp**, the values of the underlying time series are combined into one.
  * Using the above example, this gives:
    * Max → 80
    * Min → 10
    * Avg → 40
    * Sum → 160 (not relevant)

==== Grouping ====

  * For a given time series name, you can have multiple time series, one for each combination of tags.
  * It is often necessary to combine those time series.
  * Similarly to a SQL **GROUP BY**, you need to select the associated operation and tags.
  * The group-by operation is the one defined during the previous **Aggregation** step.
  * Here you need to define which tags are used as keys to regroup the data.
  * Using the above example:
    * max(none) → returns 1 result: 80
    * max(server) → returns 2 results: 80 for server1 and 20 for server2
    * max(disk) → returns 2 results: 50 for disk C and 80 for disk D
    * max(server, disk) → returns the 4 results. No combining happens when the grouping covers every existing combination of tags.

==== Roll-up ====

  * Until now, you have defined how to vertically aggregate the values of the time series for each timestamp.
  * You also need to define how the data points are aggregated horizontally, across time. This is called the roll-up.
  * In a graph, the roll-up defines how to aggregate the values that fall within a given time window.
  * Example:
    * A bar graph represents the number of errors per day, and the number of errors is collected every 15 minutes, which gives 96 data points per day.
    * The roll-up defines how to compute the single value that best represents those 96 data points in one bar: Min / Max / Sum / Avg / Last.
  * Any data point on a graph usually has to represent multiple, more granular data points.
  * The roll-up defines the best way to reflect the information you want to get.
  * In a table or a gauge, all the data points of the combined series must still be reduced to a single value; the roll-up is used for that.

==== Summary ====

  * As you can see, the choice of aggregation and roll-up is VERY important.
  * The displayed information can be very misleading if they are not correctly defined.
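
==== Worked example ====

The aggregation, grouping and roll-up steps described above can be sketched in plain Python. This is only an illustration under stated assumptions: ''SERIES'' mirrors the four ''server.disk.free_space'' series from the example, the 15-minute error counts are made-up sample values, and the names ''group_by'' and ''roll_up'' are hypothetical helpers, not the API of any particular time-series database.

```python
from collections import defaultdict
from statistics import mean

# Sample data mirroring the example: one value per series at 12:00:00.
# Each entry is (tags, value).
SERIES = [
    ({"server": "server1", "disk": "C"}, 50),
    ({"server": "server1", "disk": "D"}, 80),
    ({"server": "server2", "disk": "C"}, 20),
    ({"server": "server2", "disk": "D"}, 10),
]

AGGREGATIONS = {"max": max, "min": min, "avg": mean, "sum": sum}


def group_by(series, aggregation, tags=()):
    """Combine the values of one timestamp (hypothetical helper).

    One result is kept per distinct combination of the given tags;
    with no tags, all series are combined into a single result.
    """
    func = AGGREGATIONS[aggregation]
    buckets = defaultdict(list)
    for series_tags, value in series:
        key = tuple(series_tags[tag] for tag in tags)
        buckets[key].append(value)
    return {key: func(values) for key, values in buckets.items()}


def roll_up(values, method):
    """Reduce the data points of one time window to a single value."""
    if method == "last":
        return values[-1]
    return AGGREGATIONS[method](values)


if __name__ == "__main__":
    # Vertical aggregation, without and with grouping tags.
    print(group_by(SERIES, "max"))
    print(group_by(SERIES, "max", ("server",)))
    print(group_by(SERIES, "max", ("disk",)))

    # Horizontal aggregation: 96 made-up error counts (one per 15 minutes)
    # reduced to the single value shown by one daily bar.
    errors = [i % 4 for i in range(96)]
    for method in ("min", "max", "sum", "avg", "last"):
        print(method, roll_up(errors, method))
```

Grouping by ''server'' keeps one result per server, while an empty tag list reproduces the ''max(none)'' case. Running the roll-up with different methods on the same 96 points shows how much the displayed value depends on that choice.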