Fasten your seat belts, you are about to read one of my best posts ever. (Honestly, it is true.)
I am a bit of a nerd, and there is no point in denying it. I love gadgets, I love technology, and I especially love plots. I love it when they have more than two dimensions, when color or even symbol shapes or sizes are used to represent a third or fourth dimension. Moreover, I love GPS devices. At the moment I have a hiking GPS (Garmin GPSMap 60 CSx), a smartphone with built-in GPS, and a cycling GPS (Garmin Edge 500), and I had three others (two hiking GPS devices, and a Bluetooth GPS receiver with my previous Nokia phone) before these… But I am not the kind of nerd who just sits in front of the computer, because I also love sports. I mean, I love doing sports, and not just watching them on TV (which I also like, but that is not the point). If you are not completely new to my blog, then you know that these days I am especially into cycling – this is also the reason why I own a dedicated cycling GPS (which I bought basically on the morning of my first ride with my – back then new – racing bike). With Garmin (and even with other brands), you have the option to upload your workouts to the Garmin Connect website, which gives you some nice overview plots, a calendar, reports, some statistics, and maybe most importantly the ability to share what you have done with the online community. The plots and statistics are nice and detailed enough if you are an average – or so-called hobby – user. In that case, there is no problem. But if you are as much of a plot fetishist as I am, or you need professional analysis, and you want more, then you will like what I am just about to show after the break. (If you have no food and drinks with you, go and grab something before clicking on continue…)
The main problem with the Garmin Connect plots (see above – and note that this time all images can be viewed in original size after clicking on them) is that there is no way to zoom in the x direction, so whether you did a 30 min ride or an 8 hour sportive, the data is displayed on the same width, leading to bad resolution and crowding (can you see any detail on the speed graph up there? No, you can’t). Also, there are no histograms, though they are very handy and informative when looking at your workout. There are already some free and commercial software packages and websites out there that offer some of the options I miss, but when I gave them a try, I quickly uncovered some of their shortcomings, which pushed me to come up with my own solution instead of living with the annoying problems of existing analysis tools. As I use the Python language at work (which is, by the way, a free, open source solution with a huge community behind its development, plus it has an extensive plotting library), I decided to write something which does exactly what I need, but can be used later by other Garmin users too. So my strategy was to construct a script which reads in data from the XML file that comes from the Garmin Connect website (I do not deal with the original file from the device, because that is in a binary format, while the XML structure is easy to work with and contains the same information), does all the needed analysis and plotting, and uses a parameter file (see below), which enables the input of personal details (like heart rate zones, bike details, etc.) and the change of some given parameters (e.g. plotting limits, histogram resolution, plotting modes, thresholds, etc.) in an easy way without touching the script itself.
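To give an idea of what the read-in step can look like, here is a minimal sketch using only the Python standard library. It is not the code of the actual script: the function name read_trackpoints and the dictionary keys are made up for illustration, and it only assumes the standard TCX layout (trackpoints with time, position, altitude, distance, heart rate and cadence).

```python
import xml.etree.ElementTree as ET

# XML namespace of Garmin Training Center Database (TCX) files.
TCX_NS = "http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2"
NS = {"tcx": TCX_NS}

# Hypothetical field names -> element path below a <Trackpoint>.
FIELDS = {
    "lat": "tcx:Position/tcx:LatitudeDegrees",
    "lon": "tcx:Position/tcx:LongitudeDegrees",
    "elevation_m": "tcx:AltitudeMeters",
    "distance_m": "tcx:DistanceMeters",
    "heart_rate": "tcx:HeartRateBpm/tcx:Value",
    "cadence": "tcx:Cadence",
}


def read_trackpoints(source):
    """Read all trackpoints from a TCX file (a path or an open file object).

    Returns a list of dicts; fields missing from a trackpoint are None.
    """
    root = ET.parse(source).getroot()
    points = []
    for tp in root.iter("{%s}Trackpoint" % TCX_NS):
        # Keep the timestamp as the raw ISO 8601 string.
        point = {"time": tp.findtext("tcx:Time", default=None, namespaces=NS)}
        for name, path in FIELDS.items():
            text = tp.findtext(path, default=None, namespaces=NS)
            point[name] = float(text) if text is not None else None
        points.append(point)
    return points
```

Returning None for missing fields matters in practice, because a trackpoint recorded without GPS reception or without a heart rate strap simply lacks those elements.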
So let’s assume you already have your personal details in the parameter file, and – from experience – you have your favorite plotting settings (which I have had for months now, and I never change them because they seem to fit all situations). Then you do not even need to touch the parameter file to process a new workout: you just open a terminal (sorry to my Windows readers, but the script works only on Mac and Linux systems), go to the folder where you have the script file, the parameter file, and your workouts, and type the following:
>python plotgarmintcx.py 'activity_86515505.tcx'
This will start the read-in, the analysis, and the plotting, and after some time (1-5 minutes depending on the length of your ride and the speed of your computer), you will have all your plots and statistics nicely arranged in a corresponding subfolder with no additional input needed from your side :) The only thing which has to be calculated by the script and is not directly read in from the data file is the slope gradient, and as it is a bit noisy, I also smooth it with a box which is 100 meters wide. Also, if you are interested in total time or moving time, you need to handle the data in a different way. Luckily these conditions can be handled very easily and nicely with the structures of Python – so of course these things are also included.
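As an illustration of these two steps, here is a minimal sketch with NumPy, not the actual code of the script. The function names and the 1 m/s moving threshold are assumptions made for the example. It uses the fact that averaging the gradient over a 100 meter box is the same as dividing the elevation change across that box by its width, and it assumes the distance values never decrease.

```python
import numpy as np


def slope_gradient(distance_m, elevation_m, window_m=100.0):
    """Slope gradient in percent, box-smoothed over window_m of distance."""
    d = np.asarray(distance_m, dtype=float)
    h = np.asarray(elevation_m, dtype=float)
    half = window_m / 2.0
    # Edges of the box around every sample, clipped to the recorded range.
    lo = np.clip(d - half, d[0], d[-1])
    hi = np.clip(d + half, d[0], d[-1])
    # Elevation change across the box, from linearly interpolated elevation.
    rise = np.interp(hi, d, h) - np.interp(lo, d, h)
    return 100.0 * rise / (hi - lo)


def total_and_moving_time(time_s, speed_ms, threshold_ms=1.0):
    """Total and moving time in seconds (threshold_ms is an assumed cutoff)."""
    t = np.asarray(time_s, dtype=float)
    v = np.asarray(speed_ms, dtype=float)
    dt = np.diff(t)               # length of each recording interval
    moving = v[1:] > threshold_ms  # boolean mask: was I moving at its end?
    return dt.sum(), dt[moving].sum()
```

The boolean mask in the second function is the kind of structure that makes the total time versus moving time distinction a one-liner.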
So let’s see the different (high resolution, print quality – don’t forget you can click on them and check out the original versions) plots I produce.
First of all, a map is produced (with a scale), which just shows you the track you did (maybe I will put an OpenStreetMap background on it later on, but I do not really care about this output, it was made only to experiment with the handling of longitude and latitude data in Python). The nicest thing is the scale which is also displayed, and the map is plotted in a way that in the center this scale is valid in both the latitudinal and longitudinal directions. Also, the length which is used as the scale is always an integer number of kilometers, until the width of the map gets too small for that (so when the ride does not cover much of a distance in the E-W direction). In that case it is displayed in meters, and is always close to one quarter of the full width.
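The idea behind such a scale can be sketched in a few lines. This is only an illustration under stated assumptions, not the script's own code: the function names, the spherical Earth radius, and the rounding steps (whole kilometers, otherwise 100 m or 10 m) are all choices made for the example.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean radius, spherical approximation


def map_aspect(lat_center_deg):
    """How many times longer a degree of latitude is than a degree of
    longitude at the given latitude."""
    return 1.0 / math.cos(math.radians(lat_center_deg))


def scale_bar_length_m(lon_min_deg, lon_max_deg, lat_center_deg):
    """Scale bar length in meters, close to a quarter of the map width."""
    m_per_deg_lon = (math.pi / 180.0) * EARTH_RADIUS_M \
        * math.cos(math.radians(lat_center_deg))
    target = (lon_max_deg - lon_min_deg) * m_per_deg_lon / 4.0
    if target >= 1000.0:
        # Wide enough map: use a whole number of kilometers.
        return int(target // 1000) * 1000
    # Narrow map: fall back to meters, rounded down to a tidy step.
    step = 100 if target >= 100.0 else 10
    return max(step, int(target // step) * step)
```

With matplotlib, passing the value of map_aspect to the axes' set_aspect method stretches the latitude axis so that the scale bar is valid in both directions at the center of the map.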
Then there are overview plots (elevation, slope gradient, speed, heart rate and cadence versus distance, time and moving time), which basically show you what you already saw on Garmin Connect, but in much better detail and quality (and they fit an A4 page perfectly, if you want to kill the rain forests, which I do not suggest…). I show you the one where the values are plotted against the distance.
But as I like to see the different measurements (elevation, gradient, speed, heart rate and cadence) together, I also produce plots (versus distance, time and moving time) where these are displayed on one figure with multiple axes, so the connection between different values or their change can be seen much more easily. I produce one plot which covers the whole workout (but gets longer if the workout is longer, unlike the Garmin Connect plots),
but as this can easily be 20000 pixels wide even though it is plotted in lower resolution (and even I cannot handle such a thing easily, but it would look very pretty on the wall), I also produce the same plots in slices of given lengths (you can see an example below) and given time intervals.
I have marked two places on this slice. At a) you can see how nicely the heart rate, the gradient, the speed, and the cadence are correlated (the latter two change in the opposite direction compared to the first two) when you are doing a steady climb. If the gradient drops, your heart rate will also drop, as the climb gets easier (and after 2000 m of elevation gain, you don’t start pushing), and the speed rises with the cadence (if you do not shift gears). At point b) you can see that I am rolling down with 0 cadence on a curvy road (the sharp curves are the sudden drops in speed, with a steady acceleration – caused by gravity – afterwards), and where the arrow mark is placed, you can see that right after the minimum in speed, I started pedaling to speed up as I was coming out of the curve. It is really nice that you can see such details.
Then there are simple histograms (of time) showing bar plots of the distribution of the gradient, the speed, the heart rate (and the time spent in heart rate zones), and the cadence. As an example here is a heart rate plot (with the first hump corresponding to the descents, and the second one to the ascent on that day).
The histograms can tell a lot about your workout, e.g. steady efforts on flat terrain produce close to normal distributions of heart rate, speed, and cadence, while alternating climbing and descending sections produce these double-peaked distributions… Seeing how much time you spent in your personal heart rate zones is also very important. (The plot below is from another day than the plot above, do not get confused! The green bars are the 5a and 5b – and there is nothing in 5c – zones within zone 5, which is the largest blue bar here, as this was a full intensity time trial effort from this week.)
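A histogram "of time" means that every sample is weighted by the length of the recording interval it closes, instead of just being counted. Here is a minimal sketch of that idea with NumPy. The function name and the zone boundaries in the usage note are invented for the example, they are not the script's own names or my real zones.

```python
import numpy as np


def time_histogram(values, time_s, bin_edges):
    """Seconds spent in each bin of `values` (heart rate, speed, ...)."""
    v = np.asarray(values, dtype=float)
    t = np.asarray(time_s, dtype=float)
    dt = np.diff(t)  # seconds between consecutive samples
    # Each sample (except the first) is weighted by the interval before it.
    hist, _ = np.histogram(v[1:], bins=bin_edges, weights=dt)
    return hist
```

The same function gives the time spent in heart rate zones if the bin edges are the zone boundaries, for example time_histogram(heart_rate, time_s, [100, 130, 150, 165, 180, 200]) for five hypothetical zones. For moving time histograms, the samples recorded while standing still would have to be filtered out first.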
Then there are so-called 3D plots where 3 of the 4 metrics (gradient, speed, heart rate and cadence) are shown (the third with color). I think they are pretty self-explanatory. Again, it is a great way to show correlations in the data.
On the plot above you can perfectly see how the descents (bottom left) and ascents (top right) are separated in this parameter space. Descents have low heart rates, high speeds, and of course negative slope gradients. I truly love these plots. And again, different rides show different structures. Of course a time trial on a flat course will not be interesting on this 3D plot, but it will be informative on another one.
The one above is again very interesting (it’s from another day of constant climbing and descending). You can see that first of all there is the lower red section, which is just rolling at various speeds. Then as you go higher along the y axis (and the gradient gets positive, and then steeper and steeper) the speed starts dropping along a straight line (and the heart rate gets higher). Then as we reach a gradient of ~5%, the linear relation stops (there is something like a break point there), and another relation takes over along a much steeper line. This probably has something to do with the fact that on that day I used the smallest gear at and above 5%, and as there is no way to shift gears beyond that ratio, there is no way to get into a more comfortable rhythm on a steeper slope. Speaking of gear ratios, have a look at the plot below:
As you can see, the data points are distributed along the theoretical speed-cadence curves, which are set by the number of teeth on the front chainrings and on the cogs of the rear cassette. Also, the colors prove that I use lower ratios while climbing, and higher ones while pedaling downhill. Simple and beautiful.
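These theoretical lines follow from simple arithmetic: with one turn of the cranks the bike travels the wheel circumference times the gear ratio (chainring teeth divided by cog teeth). Here is a small sketch of that calculation. The function names are made up for the example, and the chainrings, cogs, and wheel circumference in the usage note are illustrative values, not necessarily those of my bike.

```python
def speed_kmh(cadence_rpm, chainring_teeth, cog_teeth, wheel_circumference_m):
    """Theoretical speed in km/h for a cadence and a gear combination."""
    # Distance covered by one full turn of the cranks.
    development_m = wheel_circumference_m * chainring_teeth / cog_teeth
    # Turns per minute -> meters per hour -> kilometers per hour.
    return cadence_rpm * development_m * 60.0 / 1000.0


def gear_lines(cadences_rpm, chainrings, cogs, wheel_circumference_m):
    """Speed versus cadence for every chainring and cog combination."""
    return {
        (chainring, cog): [
            speed_kmh(c, chainring, cog, wheel_circumference_m)
            for c in cadences_rpm
        ]
        for chainring in chainrings
        for cog in cogs
    }
```

For example, gear_lines(range(40, 131, 10), [50, 34], [12, 13, 15, 17, 19, 21, 23, 25], 2.096) gives one straight line through the origin per gear, and the measured points should cluster along these lines whenever the chain is under tension.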
The only problem with these is that the symbols are plotted on top of each other (signs of this can be seen on the 2nd example 3D plot above), so when you have a longer ride (which is better for the visibility of clusters in the data), then the data points from the end of the ride will be plotted over the data points from the beginning… This is especially a problem when both the x and y axes have metrics which can only have integer values (like cadence versus heart rate), because then over-plotting happens pretty soon. Still, as there are quite good correlations between given metrics, it is very likely that an older point will be over-plotted with a very similar, more recent one. However, if the recording was not done with the recent ‘every one second’ option of your device (so if the data is not equally spaced in time), then the density of groups will not be correlated with the time spent in these 2D areas of the 3D plots, so to visualize how much time was spent around a given x,y combination of two of the four metrics, I also produce 2D histograms (see the three corresponding ones for the above 3D plots below).
Though it was not that evident from the 1st 3D plot, now you can see that much more time was spent in the climbing sections (upper right cluster).
Again, it is clear now that most of the time was spent in the low speed, steep gradient section, sweating on the climbs.
And again, most of the time I was pedaling using the easiest gears. There is no need to explain these further, but it is very easy to see that using these (there are 6 different 2D histograms) together with the 3D plots (12 different ones) is again very handy and immensely informative :)
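The 2D histograms above have to count seconds rather than points, otherwise unevenly spaced recordings would distort them. A minimal sketch of such a time-weighted 2D histogram with NumPy follows. The function name is made up for the example, and this is not necessarily how the script does it.

```python
import numpy as np


def time_histogram_2d(x, y, time_s, x_edges, y_edges):
    """Seconds spent in each (x, y) cell, e.g. cadence versus speed."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    t = np.asarray(time_s, dtype=float)
    dt = np.diff(t)  # unequal intervals are handled by the weights
    hist, _, _ = np.histogram2d(
        x[1:], y[1:], bins=[x_edges, y_edges], weights=dt
    )
    return hist  # hist[i, j]: seconds in x bin i and y bin j
```

Because the weights are the interval lengths, a point recorded after a 5 second gap counts five times as much as one recorded after 1 second, so the result shows time spent and not the number of recorded points.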
The nicest thing about these plots is that the limits, the scales, and the colors are always the same (if you use the default setting of the parameter file – when the comparison mode is on), so if you want to compare two workouts, then you just place the corresponding plots next to each other (or – what I prefer to do – blink them on the screen), and voilà (see a comparison of a mountainous ride from La Palma with my good time trial effort from this week below)!
Then at the end, a file with all the statistics is also saved. If you ask for climb specific analysis (again, via the parameter file), then on top of the previous things (plots and statistics of the selected section, all nicely placed in a subfolder within the workout’s folder – so usually I just run the script for the whole data, then I run it for the selected climbs), the statistics will be more extensive (with estimated power and climb difficulty ratings – which also classify your climb, and – to make things even more precise – these categories are tied to the known difficulty categories of climbs on the Tour de France, based on this post). You can see a sample statistics file below, with the climb specific parts marked at the bottom of the file.
The difficulty score is calculated according to the formula of climbbybike.com, while the climb rating is based on the simple yet very logical formula of Dan Connelly. The relative power is calculated from a widely used estimation, which can also be found on Wikipedia. And, last but not least, a nice gradient plot will also be produced for your climb section. Very professional!
This immense amount of plots and statistics is what I want to see after I go cycling ;) If you like what I did here, just leave a comment, it is highly appreciated! And now go and ride your bike for God’s sake :D (Ps.: if you were wondering, the script itself is ~1500 lines of code, but most of it is just plotting…)
The script is available on GitHub.