
The Leuven Star Atlas – making a publication quality stellar atlas from scratch

1. Motivation

I am a professional astronomer who loves maps. But let me start with a small background story: I joined the Hungarian Astronomical Association in 1999 (the year of the great European total solar eclipse) at the age of 14, just after receiving a 76/700 Newton reflector from my parents for Christmas. Even though my love of astronomy dates back to long before that, joining the HAA was the start of my actual amateur astronomy career. I went on youth astronomy camps, where I tried to find as many Messier objects as I could under the dark rural skies of the Mátra mountains (besides regularly drawing sunspots and observing variable stars whenever I could), and for this I definitely needed a sky map.

Back then – just like 99% of Hungarian amateurs – I had no money to order any of the nice star atlases from abroad (e.g., the then-current edition of Uranometria 2000.0 was definitely high on my wish-list, but always way over my budget), so I mostly used the AAVSO charts for variable stars, and an “atlas” that I printed at home using a free version of the SkyMap Pro software. It took a few years until I managed to buy a proper atlas, thanks to the release of the “Égabrosz” in 2004, which was the first professional-quality sky atlas produced in Hungary. It was (and still is) a great black and white atlas positioned somewhere between Uranometria 2000.0 and Sky Atlas 2000.0 (based on its limiting magnitude and resolution).

My practical participation in amateur astronomy ceased soon after I started my university studies, and moving under the heavily light-polluted skies of Belgium did not help to reignite my interest later on, but I always kept my fascination with mapping (the night sky), and I always wanted to make a proper (sky) atlas myself. Now that I know quite a lot about astronomical data, and I know how to make nice plots using the python programming language, I thought the time was just about right to actually try and see if I could really make one. How difficult could it be, right?

2. Introduction

My goal was to produce a publication-quality star atlas that is both practical and visually pleasing, aimed at amateur astronomers. To do this I first had to study the atlases currently available on the market, because I did not want to reinvent the wheel. Here is a non-exhaustive list of probably the most widely known atlases on the market:

There are also a few well-known free atlases available online:

Each atlas has its pros and cons. It is beyond the scope of this post to compare them (but you can find a nice comparison here; the text is in Polish, but the numbers and pictures are more important anyway), but there are a few things to consider when planning a new one. These are:

  • Format (size, number of pages)
  • Grayscale or colour
  • Legibility (colours, font types, label placement, print quality)
  • Resolution (scale in cm per degree of declination)
  • Limiting magnitude (typically 7.5 – 11.5)
  • Database and selection of non-stellar objects
  • Representation of objects (marker shapes, sizes, etc.)
  • Consistency (especially colour, line width, and font size)

I wanted to mix the best ingredients of these atlases, building around these main ideas:

  • A layout similar to the one of Uranometria 2000.0
  • Colours similar to Interstellarum Deep Sky Atlas
  • Milky way contours similar to Sky Atlas 2000.0

And then add the following refinements / improvements into the mix:

  • Slightly larger field of view (FOV) than in Uranometria 2000.0 with the full sky on 344 A4 – or alternatively B4 – pages (1.6 cm per degree or 1.9 cm per degree)
  • Non-stellar objects are printed to scale (down to a cutoff size), with line width and fill colour reflecting the actual brightness
  • Legible colour scheme: avoid using red since it is invisible under red light
  • Precise Milky Way representation (unlike in any printed atlas)
  • Clever navigation between pages
  • Hand-drawn bright and dark nebula outlines from scratch
  • Nice custom typography
  • Vector graphics
  • Automated label placement
  • Automated compilation of the full atlas at the press of a button
  • Flexible script (change FOV, magnitude limit, colours with ease)

I will go over all these aspects in detail in the following section and explain step-by-step how I built the atlas. As it is not 100% done yet, I will also note down the features that still need to be completed or refined in the (hopefully) near future. I started working on this during the first days of March, and by the middle of April most of the things that I am going to discuss here were done. Since then I have spent less time on the project, taking care of minor refinements only.

3. Practical execution – techniques and design aspects

3.1 Python setup

I am using a very basic setup installed via Canopy. My main environment is python 2.7, and I make use of the following packages extensively:

  • numpy for all kinds of data handling and numerical operations
  • pylab / matplotlib for all the main plotting operations
  • basemap for the mapping (takes care of the projection and the related transformations)
  • scipy for some specific interpolations and contours connected to the Milky Way
  • astropy and pyephem for celestial coordinate transformations
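
For reference, here is a minimal sketch of the imports this setup boils down to (the exact submodules are my assumptions; the versions shipped with Canopy may differ):

# Minimal sketch of the plotting environment described above (assumed imports).
import numpy as np                         # data handling and numerics
import matplotlib.pyplot as plt            # main plotting interface (pylab backend)
from mpl_toolkits.basemap import Basemap   # projection and related transformations
from scipy import interpolate, ndimage     # interpolation / smoothing for the Milky Way
from astropy.coordinates import SkyCoord   # celestial coordinate transformations
import astropy.units as u
import ephem                               # pyephem, additional coordinate utilities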

3.2 Source data

All databases that I am using are either publicly available from the internet (under various licences), or they are compiled by me from publicly available data (which is discussed in the following sections).

3.3 Custom, and compiled / processed databases

While some of the data can be used as is (for example the data of constellation boundaries and lines), most of it needs to be processed in one way or another before it is ready to be plotted. There are three large groups that need to be dealt with: stellar data, Milky Way contour data, and deep sky data (with a large stress on extended bright and dark nebulae).

3.3.1 Stellar database

Building the stellar database is taken care of inside the main plotting script (which will be discussed in detail in Section 3.4). Every time the script is executed, it checks if a precompiled magnitude limited stellar database is already available, and if not, it builds it from the databases discussed in Section 3.2. When the script is run with a given magnitude limit for the first time, a new magnitude limited database must be compiled, which happens in multiple stages (and at a magnitude limit of 10.0 it takes ~3 hours to complete):

  1. Tycho-2 data read in and magnitude limit applied (after computing V magnitudes from the available BT and VT magnitudes), magnitude limited Tycho-2 data saved.
  2. Bright Star Catalog (BSC) data read in, magnitude limit applied, names formatted to a pylab / LaTeX compatible format (so when annotating later on, proper Greek characters will appear automatically), magnitude limited BSC data saved.
  3. General Catalogue of Variable Stars (GCVS) data read in; stars that are dimmer at maximum than the magnitude limit are thrown out, as are stars with an amplitude of less than 0.1 mag (rounded to 1 decimal) and stars of some selected types such as Novae and Supernovae; names are formatted, and the remaining magnitude limited GCVS data is saved.
  4. The filtered GCVS data is used to update the filtered Tycho-2 data. First the GCVS coordinates are cross-matched one-by-one with the Tycho-2 coordinates (within 6″). There are three possible outcomes of this match:
    a) there are multiple Tycho-2 stars within 6″ of the coordinates of a given GCVS star (which happens sometimes in crowded areas); then we use the maximum magnitude information to pick the best match among the Tycho-2 magnitudes, and update the best matching Tycho-2 entry with the minimum and maximum magnitudes, the GCVS name, and a variability flag = 1
    b) there is only one match (which is the most common case), in which case life is easy: the GCVS entry belongs to that Tycho-2 star, so we add the magnitudes, the variable name, and the variability flag = 1 to the Tycho-2 entry
    c) there is no match (a rare case, but it happens, e.g., when an irregular variable was much dimmer during the Hipparcos measurements than the magnitude limit, but over a longer period it can get brighter); then we need to add the GCVS star as it is to the compiled catalog, because it is not yet in the magnitude limited Tycho-2 sample.
    After this is done, the magnitude limited database with variability information is saved, and later on in the plotting stage stars with a variability flag = 1 will be plotted with a special symbol.
  5. In the next step, this database is updated with multiplicity information (in our case a multiplicity flag, and a separation value for the two brightest components). The reason for this is that for stars that appear closer to each other than a threshold (in our case set to 60″), we do not want to plot each component, only the brightest one with a special symbol, which is common practice in stellar atlases. (Small note here: at this point, components that are closer to each other than 0.8″ are not included, but resolving stars that are so narrowly separated is practically impossible visually and/or without professional telescopes and exceptional weather conditions.) This is also done in multiple steps (a rough sketch of this pass is shown after the list):
    a) stars are ordered from brightest to dimmest, each given a multiplicity flag = 0 to start with
    b) starting from the brightest we check each star one-by-one if there are other (dimmer) stars within 60″ from it
    c) if yes, then the brightest component (A) gets a multiplicity flag = 1 and the dimmer component(s) (B,…) get(s) a multiplicity flag = -1
    d) in the next steps only stars with multiplicity flag != -1 are tested as potential components, which means that, e.g., we will not identify a dimmer star later on as primary component when it was already found to be a secondary component of a brighter star earlier
    e) later on in the plotting stage only those stars will be plotted where the multiplicity flag is not -1, and from these the stars where the multiplicity flag = 1 will be plotted using a special symbol.
    When this is done, the data file is saved (now containing a magnitude limited Tycho-2 database that is updated with variability and multiplicity information). This step takes the longest time of all by a significant margin.
  6. In the last step this database is updated with names from the precompiled magnitude limited BSC file that was saved in step 2, similarly to how the GCVS entries were matched in step 4, but with a slightly higher tolerance in the coordinate match (to adjust for the possible apparent merging of multiple stars’ coordinates in step 5). Then this database is saved as the final stellar database of the atlas.
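
Here is a minimal sketch of the multiplicity pass described in step 5, assuming the stars are held in NumPy arrays of coordinates (degrees) and magnitudes; the brute-force search and the flat-sky separation formula are my simplifications, not the actual implementation:

import numpy as np

def flag_multiples(ra, dec, mag, threshold_arcsec=60.0):
    """Assign multiplicity flags: 1 = primary of a group, -1 = hidden companion,
    0 = single star. Also return the A-B separation (arcsec) for each primary.
    Brute-force sketch; ignores RA wrap-around at 0h for brevity."""
    order = np.argsort(mag)                  # brightest first
    flag = np.zeros(len(ra), dtype=int)
    sep = np.zeros(len(ra))
    thr_deg = threshold_arcsec / 3600.0
    for i in order:
        if flag[i] == -1:                    # already a companion of a brighter star
            continue
        # small-angle separation in degrees (fine at arcsecond scales)
        dra = (ra - ra[i]) * np.cos(np.radians(dec[i]))
        ddec = dec - dec[i]
        dist = np.hypot(dra, ddec)
        companions = np.where((dist < thr_deg) & (mag > mag[i]))[0]
        if len(companions):
            flag[i] = 1
            flag[companions] = -1
            sep[i] = dist[companions].min() * 3600.0   # separation of the two brightest
    return flag, sep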

At this stage, here is what a small section of the compiled stellar database looks like, with a pair of coordinates, a pair of proper motions, a visual magnitude (which is the maximum brightness for variables), a variability flag, a possible minimum brightness (set to 99.000 for non-variable stars), a multiplicity flag, a separation value of the brightest components for stars that have a multiplicity flag of 1 (otherwise set to 0.000), and a name (excluding the abbreviation of the constellation) in LaTeX format:

2017_LSA-01

3.3.2 Contours of the Milky Way

One of the main new features of my atlas (compared to other atlases on the market) is the inclusion of the (as) precise (as possible) contours of the Milky Way on its pages. For this I had to make my own data. I started from the APOD of the Milky Way by Nick Risinger (for which I still need to ask permission if I want to get this atlas distributed). I scaled this up slightly to 3600×1800 pixels (to match the 360×180 degree size of the full sky with a 0.1 degree pixel size), converted it to grayscale, applied a star-removal filter, a small blur, and enhanced the contrast slightly. Then using python I converted it to an array of brightness data on the galactic coordinate grid, and after a coordinate transformation from the galactic to the equatorial frame I saved it in a numpy specific file format using the numpy.savez() function.

This specific format is needed because reading in a 150 MB ASCII file is too slow with numpy, and it would create a bottleneck when plotting the atlas pages. I use this brightness data later during the plotting and display it using contours set at some cleverly specified brightness levels to create an image of the Milky Way that is both visually pleasing and highly realistic. The results are very nice; e.g., one can see multiple brightness levels even inside the Magellanic Clouds (as shown below, but for the sake of clarity plotted using stronger colours than what is actually used in the atlas).
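
As an illustration of this part of the pipeline, here is a minimal sketch of saving such a brightness grid with numpy.savez() and drawing it as filled contours; the array layout, the placeholder data, and the contour levels are my own, not the values used in the atlas:

import numpy as np
import matplotlib.pyplot as plt

# Brightness on a 0.1-degree equatorial grid (1800 x 3600); placeholder data here,
# the real array went through star removal, blurring, and contrast enhancement.
ra_grid = np.linspace(0.0, 360.0, 3600)
dec_grid = np.linspace(-90.0, 90.0, 1800)
brightness = np.random.uniform(0, 255, (1800, 3600))
np.savez('milkyway_equatorial.npz', ra=ra_grid, dec=dec_grid, b=brightness)

# Loading the .npz file is much faster than parsing a ~150 MB ASCII table.
data = np.load('milkyway_equatorial.npz')
levels = [40, 70, 100, 130, 160, 190]          # made-up brightness levels
plt.contourf(data['ra'], data['dec'], data['b'], levels=levels,
             cmap=plt.cm.Blues, alpha=0.5)     # shades of blue, as in the atlas
plt.contour(data['ra'], data['dec'], data['b'], levels=levels,
            colors='steelblue', linewidths=0.3)
plt.show()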

2017_LSA-02

I also tested if the coordinates of the original image were correct by repeating the same steps but without removing the stars, so one could actually see the stars on the contour plot too. In this experiment they lined up perfectly with the actual stellar data, confirming that the position data in the Milky Way image was correct.

3.3.3 Deep sky objects (contours of extended bright and dark nebulae)

The SAC database is a good starting point, since it has a great selection of objects that are accessible with a wide range of amateur instruments. On the other hand, it has quite a few mistakes too, so I had to work on it a bit to make it more consistent and less messy. First of all I checked all objects that have an apparent size larger than 12′ (twice the cutoff size of my atlas) manually in DSS using the Aladin desktop client, and corrected sizes and/or position angles where I found it necessary (only a few cases). I also removed all duplicate entries from the database (which occurred with quite a few galaxies: many had multiple entries from different catalogues). I also decided to filter out those objects that had no useful/trustworthy brightness information (dropping mostly small open clusters that were barely identifiable on the DSS images anyway), and those open clusters that appeared too large on the scale of the atlas (typically the ones above 100′-120′, depending on the density of the field). Finally, I made sure that for Messier objects the Messier entry is used as the name later on, not the NGC one. These were the easier parts.

The more difficult and time consuming part (taking around 40-50 hours) was drawing all bright and dark nebulae from the SAC database larger than 6′-12′ (meaning that 100% of the objects that are 12′ or larger in diameter had to be drawn, and some of the smaller non-symmetric ones too), which I had to do manually, along with a few missing ones that I found worthy by looking through images in Aladin or by cross-checking with the selection on Sky Atlas 2000.0's pages. (I am aware that there is a set of nebula outlines available online by Mark Smedley and Jan Kotek, but I did not find these good enough for my needs, and I did not want to copy other people's data to begin with, because nebula outlines are one of those elements of an atlas that make it personal.) In practice I used Aladin to draw the contours of these objects with a mouse (I sometimes wish I had a proper tablet and a digital pencil to do such things…), and export them as coordinate arrays.

For the brighter ones I tried to make the contours resemble the visual look based on drawings done by amateurs, while for the more difficult targets I went with the expected photographic outlines. It helped a lot that Aladin can create contour plots on the displayed images, guiding my eyes while tracing the outlines. (For this I used the Mellinger Optical Survey and the DSS layers.) Some complex or large nebulae are drawn in multiple sections, for example the Vela Supernova Remnant in 22 pieces, IC 1318 in 14 pieces, and Sh2-240 in 42 pieces. Here is a rough example of what this process looks like in Aladin:

2017_LSA-03

3.4 Plotting one page of the atlas

There is one single python script that takes care of the plotting of a single page of the atlas (plot_map.py). At the moment it is 1545 lines long, and contains – among other things – 16 custom functions (some of which will be discussed later).

3.4.1 Arguments and control variables

To control what is plotted by the script and how, I am using a small set of arguments and a larger set of variables (that I like to call control variables). These are separate for a good reason. Arguments can be given values simply from the terminal when running the script, and as such they can also be set by a higher-level python script when compiling the whole atlas (see Section 3.5). These set the centre of the field of view, the output format (vector – pdf, or raster – png), and optionally the page number along with the page numbers of the pages that show the field above and below (thus up and down in declination) the current page in the atlas. These last three are optional, and when they are not given the script will simply produce a sky chart without the extra formatting of an atlas page.
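
A minimal sketch of what such an argument interface could look like (the argument names and the use of argparse are my assumptions; the real plot_map.py may parse its arguments differently):

import argparse

parser = argparse.ArgumentParser(description='Plot one page (or chart) of the atlas.')
parser.add_argument('ra', type=float, help='central right ascension of the field (hours)')
parser.add_argument('dec', type=float, help='central declination of the field (degrees)')
parser.add_argument('fmt', choices=['pdf', 'png'], help='vector (pdf) or raster (png) output')
# The next three are optional: without them a plain sky chart is produced
# instead of a formatted atlas page.
parser.add_argument('--page', type=int, default=None, help='page number of this field')
parser.add_argument('--pageup', type=int, default=None, help='page number of the field above (north)')
parser.add_argument('--pagedown', type=int, default=None, help='page number of the field below (south)')
args = parser.parse_args()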

2017_LSA-04

On the other hand control variables have to be set inside the main script, and they control the colours, size and magnitude limits, the different types of objects that need to be plotted, the field of view, etc. These stay the same for each page.

2017_LSA-05

Later on, when close-up charts and chart index pages are created, I will save a different version of the script for each (e.g., plot_map_closeup.py and plot_map_index.py), where the limiting magnitude and a few other things will be set differently.

3.4.2 Colours and typography

When choosing the colours I experimented a bit with different colour combinations, but the main principles were always the same. I had to avoid red because it is invisible when the chart is browsed using a red torch (and this is common practice, since red light interferes the least with our dark-adapted eyes), and I wanted colours that go well together but are easy to distinguish. In the end I chose my palette from Google’s material design guide.

When choosing the individual colours, the following ideas were factored into my decisions. First of all, I wanted the Milky Way to be plotted as shades of blue, not only because that looks great in Sky Atlas 2000.0, but also because the most prominent structures in a galaxy – the spiral arms – consist of young massive stars, and they are predominantly blueish in colour. As a consequence all galaxies had to be plotted in blue. Groups of stars are yellow as in most other colour atlases, but globular clusters are orange, since these are dominated by old(er), low-mass stars, and thus their light (spectrum) is more shifted towards this colour. Planetary nebulae are green, because the strongest emission line in their spectrum in the range where our eyes are the most sensitive is the [OIII] line at 500.7 nm, which appears to be green. Even though other types of bright nebulae might have different spectra (especially bright reflection nebulae), they are also plotted green (although a slightly different hue) to stay consistent. Stars are black and map-related lines are different shades of grey.

Outlines are always darker than the fill colours, and while the outline colours are constant, the line width and the opacity of the fill depend on the magnitude of a given object. Labels are always the same colour as the labelled objects’ outline.

I am using the Open Sans font family throughout the atlas. This is a nice, free, slightly narrower than default sans serif font with an extensive set of glyphs (including all necessary accented and Greek characters) that provides excellent legibility at many font sizes and weights (of which I am using the regular and bold variants).

3.4.3 Vertical layering

One important thing to handle when printing the chart is the vertical layering (or hierarchy) of the different object types, but also the vertical layering within a given object type. This way we can avoid larger objects overlapping – and blocking from view – smaller ones. Therefore, for example, the Milky Way needs to be plotted under everything else, dark nebulae need to be plotted over bright nebulae, small galaxies need to be plotted over large galaxies (otherwise, e.g., M 32 in the Andromeda Galaxy would not even be visible), stars need to be plotted over deep sky objects (so they are not covered by extended objects), and dim stars need to be plotted over bright stars (otherwise some dim stars close to bright stars could be covered by the larger disk of the bright star).

Placing different object types on different vertical layers is taken care of using the zorder keyword inside any kind of plotting command in python, while the required hierarchy within a given object type is ensured by sorting the objects according to size (from large to small, or from bright to dim in the case of stars), which means that when they are plotted by a plotting routine later on, they will be drawn in the order they were sorted into. There are some special cases where the plotting order of open clusters and bright nebulae needs to be flipped (where a larger open cluster includes a smaller bright nebula), but this is also taken care of with a zorder-modifier value in the database of the hand-drawn nebulae, which effectively moves the given nebula to a layer above the open cluster.
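
Here is a minimal, self-contained sketch of the two mechanisms – zorder between object types and size/magnitude sorting within a type (the zorder values, colours, and toy data are purely illustrative):

import matplotlib.pyplot as plt
import matplotlib.patches as patches

# Illustrative zorder hierarchy between object types (values are my own):
ZORDER = {'milky_way': 10, 'bright_nebula': 20, 'dark_nebula': 30,
          'galaxy': 40, 'star': 90}

# Toy data: two overlapping galaxies and two nearby stars (sizes in axis units).
galaxies = [{'x': 0.5, 'y': 0.5, 'size': 0.30}, {'x': 0.55, 'y': 0.5, 'size': 0.08}]
stars = [{'x': 0.3, 'y': 0.4, 'mag': 2.0}, {'x': 0.31, 'y': 0.41, 'mag': 8.0}]

fig, ax = plt.subplots()
# Large -> small, so the small galaxy (think M 32) ends up on top and stays visible.
for g in sorted(galaxies, key=lambda g: g['size'], reverse=True):
    ax.add_patch(patches.Ellipse((g['x'], g['y']), g['size'], 0.6 * g['size'],
                                 facecolor='lightblue', edgecolor='blue',
                                 zorder=ZORDER['galaxy']))
# Bright -> dim, so dim stars are not hidden under the larger disks of bright ones.
for s in sorted(stars, key=lambda s: s['mag']):
    r = 0.02 * (10.0 - s['mag']) / 10.0
    ax.add_patch(patches.Circle((s['x'], s['y']), r, facecolor='k',
                                edgecolor='w', zorder=ZORDER['star']))
plt.show()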

3.4.4 Setting up the page

Before plotting any celestial data, we need to set up the plotting environment to be able to work with celestial coordinates. As already mentioned before, I use the basemap library to do this:

map = Basemap(projection='stere', width=fov_ra, height=fov_dec, lat_ts=central_dec, lat_0=central_dec, lon_0=360-(central_ra*15), celestial=True, rsphere=180/math.pi)

By using the celestial=True setting we make sure that we use the astronomical conventions for longitude (negative longitudes to the East of zero), and by setting the radius of the sphere to be 180/pi we can give the horizontal and vertical size of the area covered in the projection in degrees.

Like most celestial atlases I also use the stereographic projection. This is a conformal projection, meaning that it preserves the angles, but it does not preserve distances or areas. (There is no projection which would preserve all of these.) While indeed towards the edges equal distances seem to get larger, this effect is practically negligible at zoom levels that are used in a celestial atlas like this one.
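
Once the map instance exists, celestial coordinates can be converted into plot coordinates that are directly in degrees, thanks to the rsphere trick. A minimal sketch follows; the example field, the object coordinates, and the longitude convention used in the conversion are my assumptions:

import math
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap

central_ra, central_dec = 5.5, 0.0        # hours, degrees (arbitrary example field)
fov_ra, fov_dec = 22.0, 30.0              # width and height of the page in degrees

map = Basemap(projection='stere', width=fov_ra, height=fov_dec,
              lat_ts=central_dec, lat_0=central_dec,
              lon_0=360 - (central_ra * 15), celestial=True,
              rsphere=180 / math.pi)

# Convert an object's RA (hours) and Dec (degrees) to map coordinates in degrees;
# the longitude convention mirrors the lon_0 setting above (my assumption).
ra_h, dec_d = 5.58, -5.39                 # roughly the Orion Nebula
x, y = map(360 - (ra_h * 15), dec_d)
map.plot(x, y, marker='o', color='k')
plt.show()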

After the map environment is set, we need to draw the grid of latitude and longitude lines (using the ICRS frame and a standard J2000.0 epoch), and I make sure that none of these lines get too dense as we get close to the poles, by tapering selected meridian lines before they reach the poles (see below). The density of the labelled meridian lines is also set by the distance from the pole, so labels never get too dense around the edges of the map area.

After the celestial grid is set up, I draw the ecliptic and the galactic equator along with the ecliptic and galactic poles, and small tick markers for each degree along the equator lines. Drawing the equators is very easy: one just needs to transform a constant (ecliptic or galactic) latitude = 0° line from ecliptic or galactic coordinates to equatorial coordinates, and plot the resulting array as a line. When drawing the poles, I draw their X-shaped markers from two separate perpendicular lines (crossing at the poles) which point towards longitudes 0°, 90°, 180°, and 270°; this way, e.g., when looking at the galactic poles the reader can see how the galactic coordinate frame is rotated very differently compared to the equatorial grid.
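
A minimal sketch of how the galactic equator and pole can be produced with astropy (the ecliptic works the same way with an ecliptic frame); the frame and attribute names follow astropy's API, the sampling step is my own choice:

import numpy as np
from astropy.coordinates import SkyCoord
import astropy.units as u

# A constant galactic latitude = 0 deg line, sampled every 0.5 deg in longitude...
gal_lon = np.arange(0.0, 360.5, 0.5)
equator = SkyCoord(l=gal_lon * u.deg, b=np.zeros_like(gal_lon) * u.deg,
                   frame='galactic').icrs
ra_deg, dec_deg = equator.ra.degree, equator.dec.degree
# ...which can then be fed to the map instance and drawn as a line.

# The north galactic pole, for one of the X-shaped markers built from crossing lines:
ngp = SkyCoord(l=0 * u.deg, b=90 * u.deg, frame='galactic').icrs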

As we are trying to produce not only a simple sky chart but a page in a stellar atlas, we also need to plot page numbers, navigation aids, and a legend box. Page numbers are plotted in the top and bottom outside corners (from a double-page point of view), which makes it easy to flip through the atlas when looking for a page number. There is also a special navigation bar plotted along the outer edge of each page. This navigation bar shows the central RA coordinate of the page, the minimum and maximum declination values of the page, and also the page number(s) of the closest neighbouring page(s) in declination (i.e., the pages above and below the current field). The location of this feature along the edge of the page depends on the central declination of the page. This means that just by looking at the edge of the closed atlas, or while flipping through it, the user can immediately see which region of the sky they are heading towards from the position of the navigation bar. (For example, on a page towards the North Celestial Pole this bar will be near the top of the edge of the page, while for pages around the Celestial Equator it will be at its centre.)

2017_LSA-07

Creating the legend box gave birth to probably the most interesting solution (in terms of coding) in the production of the atlas. In the following sections I will show that I set the size of all objects on the map using angles (angular diameters), and not by using marker sizes (in points or pixels) as done by default in python. This is very useful in the main map area of the page, but it meant that I had no idea how large an actual marker was when not working in a map instance. Therefore I could not simply create a simple subplot below the main map area and place the different symbols and their keys there without writing transformation functions to make sure that the size scales between the legend area and the main map area match. Instead of this, I created a 1 degree tall subplot with the same basemap settings (so also the same width in degrees, resulting in the same scale) as the main map area above, meaning that I could use the same functions and scales to plot objects here as I do in the main plot. Creating the legends themselves afterwards was only a question of placing a set of given objects next to each other, making sure that they are properly spaced, and that their labels are at the same horizontal levels. Odd and even pages got different legend boxes, so a double page always shows a full set of legends (stellar objects and lines on the left hand page, and deep sky objects on the right hand side – see examples later on).

I do not want to discuss all the remaining details here, but every piece of text (e.g., page numbers, the numbers in the navigation box, etc.) is aligned with other lines or texts in a very strict way to ensure symmetry at all times, no matter what area of the sky is being plotted. A very large effort went into this, even though the reader might not (consciously) notice these fine details.

3.4.5 Plotting stellar data

I have written one single function to handle the plotting of stars, from the simplest single stars to multiple variable systems with high proper motions:

draw_screen_star(ra, dec, pm_ra, pm_dec, pmlimit, mag, maglimit, map, double=0, sep=0, variable=0, vmin=0, facecolor='k', edgecolor='w', zorder=90)

Stars are plotted from bright to dim as mentioned earlier. All plotting is done in map (sky) coordinates, meaning that the symbol sizes are given in degrees, and there is an inverse linear scale between symbol radius and magnitude value. All stars are plotted with a white edge, so if symbols of nearby stars overlap, they remain nicely visible.

The width of this white edge is set to be the narrowest line width that is still well visible in print (matching the width of the lines of the celestial grid), while the smallest star size is set by the symbol of the dimmest variable star (where the black central region needs another white filling, which must still be well visible). The size of the largest star is set to be aesthetically pleasing (not too small, so bright stars can be well differentiated from dim ones, but not too large, so they do not become overly extended). This also sets the other fixed point of the linear magnitude scale. On the figure below you can see various tests of the star plotting function, with one notable example of a set of stars plotted on top of each other where the magnitudes range from -2 to 10 with a step size of 1 (resulting in a nice, well separated pattern).
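
To illustrate the scaling described above, here is a minimal sketch of an inverse linear mapping from magnitude to symbol radius in map degrees (the anchor radii and magnitudes are placeholders, not the values tuned for the atlas):

def star_radius(mag, maglimit=10.0, mag_bright=-2.0, r_min=0.02, r_max=0.35):
    """Inverse linear scale: brighter star -> larger radius (in map degrees).
    The anchor values are illustrative; the real ones are set by the dimmest
    variable-star symbol and by what looks aesthetically pleasing in print."""
    mag = min(max(mag, mag_bright), maglimit)        # clip to the valid range
    frac = (maglimit - mag) / (maglimit - mag_bright)
    return r_min + frac * (r_max - r_min)

# Example: a sequence from -2 to 10 mag gives nicely separated radii.
radii = [round(star_radius(m), 3) for m in range(-2, 11)]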

2017_LSA-06

In practice, taking the example of a variable double star (where the star is also visible in minimum, so, e.g., the 3rd star from the left in the row of multiple variable stars), this is how the star symbol is created: first a white circle is plotted and a white bar across (these will be the white outlines), then a narrower black line (corresponding to the separation of the brightest components of the multiple system) and a black circle with a smaller radius (corresponding to the brightness of the variable star in maximum), followed by another white circle with a smaller radius (the inner fill of the variable), and finally a smaller black circle (with a radius corresponding to the minimum brightness of the star). If the star has a high proper motion, it is also plotted with an arrow in the background. In summary, the symbol of a star is built from a set of pylab.patches (circles, polygons, and fancyarrows).

2017_LSA-08

Not strictly speaking stellar data, but I note here that constellation boundaries and stick figures are also plotted using simple lines. It is worth noting that to avoid plotting the overlapping sections of neighbouring constellations’ borders twice (which could introduce artefacts in the representation of dashed lines), I am using a version of this data where the duplicate line sections were removed.

3.4.6 Plotting non-stellar data

Similar to the stars, all deep sky objects are plotted in sky coordinates using specific functions that I have written for each different object type, so I can feed sizes in arcminutes into them. The symbols are again made from various pylab.patches (ellipses, wedges, and rectangles) inside these functions. As an example, the symbol of globular clusters is made out of four wedges that have a 90° opening angle (instead of trying to draw a nice cross inside a circle, which, believe me, is not as easy as it sounds). But outside of the function, drawing a GC looks simply like this:

draw_screen_gcsymbol(ra, dec, size, map, **kwargs)
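
For illustration, here is a minimal sketch of how such a symbol can be assembled from four 90° wedges, whose drawn edges add up to a circle with a cross inside (the colours, sizes, and the fact that this version works in plain axis coordinates are my simplifications):

import matplotlib.pyplot as plt
from matplotlib.patches import Wedge

def draw_gc_sketch(ax, x, y, radius, **kwargs):
    """Four 90-degree wedges; their radial edges form the cross, their arcs the
    circle of the classical globular cluster symbol (illustrative sketch only)."""
    for start in (0, 90, 180, 270):
        ax.add_patch(Wedge((x, y), radius, start, start + 90,
                           facecolor='orange', edgecolor='darkorange',
                           linewidth=1.0, **kwargs))

fig, ax = plt.subplots()
draw_gc_sketch(ax, 0.5, 0.5, 0.2)
ax.set_aspect('equal')
plt.show()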

Objects are plotted to scale when larger than a cutoff size (6′ in the LSA), otherwise they are plotted at given (smaller) fixed sizes depending on which size bin they fall into. As mentioned earlier, the edge line width and the fill colour represent the visual brightness of a given object. The orientation (position angle, or PA) of galaxies is plotted true to reality, and to do this the local North had to be calculated at each position on the map (with a small function), which was then used to correct the PA coming from the SAC catalog (since on our map North is only ‘up’ along the central longitude of each page – except right above the North and right below the South Celestial Poles, where the apparent direction of North along the central meridian is inverted).
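
A minimal sketch of how the local direction of North can be estimated numerically, by projecting a point a small step further North and measuring the resulting angle on the map (the function name, the step size, and the sign convention are my assumptions):

import math

def local_north_angle(map, ra_h, dec_d, step=0.1):
    """Angle (degrees) of the local North direction at a given position, measured
    from the page's 'up' direction, obtained by projecting a point `step` degrees
    further North. Illustrative sketch; breaks right at the celestial poles."""
    lon = 360 - ra_h * 15                    # same convention as the Basemap setup
    x0, y0 = map(lon, dec_d)
    x1, y1 = map(lon, dec_d + step)
    return math.degrees(math.atan2(x1 - x0, y1 - y0))

# The catalog position angle could then be corrected before drawing the ellipse,
# e.g. plotted_pa = catalog_pa - local_north_angle(map, ra_h, dec_d)  (sign is a guess).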

I have kept most of the classical symbols, with the exception that planetary nebulae that are larger than the cutoff size (so planetary nebulae that are plotted to scale) are plotted with a doughnut symbol (symbolising the classical PN look). Size bins were chosen so that they are more or less equally populated by the smaller objects.

2017_LSA-09

3.4.7 Labelling objects

The last large task in creating a page of the atlas is labelling all plotted objects. My original goal was a fully automated setup based on the adjustText package, but this turned out to be a much larger challenge to implement than first thought. The package works relatively well when labelling a set of small points or a set of objects that have the same size, but things get complicated when one wants to label a mix of patch objects (of various sizes) and lines… Therefore, after some experimenting, I decided to put the automation attempts aside and write a labelling graphical user interface (GUI). In any case, even if the automated labelling worked, I would still need this GUI to be able to make refinements to the label placements.

The main idea is that I run a slightly modified version of the original plot_map.py script (label_map.py, updated with the functions that make up the GUI), which produces an interactive version of the given page where I can just click on the labels (that are placed at the bottom right corner of each object by default as a starting point) and manipulate them. Using the GUI I can move labels (this way I can place them in a way that there is no overlap between objects and labels), I can make labels disappear (should a region be too busy for all labels to be visible), and I can even add in additional labels (to label the constellations, the poles, and the Magellanic Clouds – since they are plotted using the Milky Way contours and not as separate objects with their own label entries). When the GUI is run for the first time, all plotted objects are also fed into the

appendtolabellist(ra, dec, name, colour, size, rotation=0, visibility=1)

function, which builds a label table (from the names that were already available both in the stellar database and in the deep sky database) for the given page (placement positions, label texts, sizes, colours, rotation values – which are used for the labels of the ecliptic and the galactic equator – and a visibility flag), which I can modify with the GUI and export to be used by the main plot_map.py script. See part of such a label table below for a dense southern part of Scorpius.

2017_LSA-10

If the label table is available for a given page when the plot_map.py script is used, labels will be also plotted. The advantage of having an actual label table (so a file like labels_P_004.txt for Page 4) for each page is that I can also modify label texts and sizes manually to fine-tune details for the final version, without having to edit the original object databases.

Of course, since now I have to move almost all labels around manually, it takes a while to make the label table for a page (even with a good default offset from the corresponding symbol, there are so many stars that the chance that the default placement will overlap with something is pretty high). In practice this can be anything from ~20 minutes to 2 hours depending on how dense the field is… My estimate is that labelling all pages would easily require 300 hours of work. Or I need to come up with a clever solution and write an advanced function to move the labels around automatically, but this seems equally difficult to me right now. So for the time being I have labelled 5 sample pages that can be seen in Section 4, and depending on the future distribution of the atlas (a proper printed version or only online for free), I will see how much effort and what kind of effort I need/want to put into labelling the rest.

The animation below shows how the formatting and the different layers of objects come together to form one page of the atlas.

2017_LSA_anim

3.5 Plotting the whole atlas in one go

Plotting the whole atlas is taken care of by a separate script called plot_atlas.py: this is only 91 lines long, and has a very limited number of tasks to do. First of all it has a list of the central coordinates of the atlas pages (given as an equidistant list of declinations, and for each declination a list of right ascensions).

2017_LSA-12

From this the script compiles the set of arguments that were discussed in Section 3.4.1 for each page, by first building a list of page numbers and central coordinates, then calculating the closest (neighbouring) pages in declination (to the North and South) for each page, which are stored as the pageup and pagedown values, respectively. Finally, the script simply calls the plot_map.py script for each page – feeding the corresponding list of arguments to it – to initiate their production one-by-one (or four-by-four when using all available cores).
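
A minimal sketch of such a driver, using multiprocessing and subprocess (the page list is a toy subset, and the argument order passed to plot_map.py follows the sketch in Section 3.4.1 above, which was itself an assumption):

import subprocess
from multiprocessing import Pool

# (page, ra_hours, dec_degrees, page_up, page_down); toy subset, made-up neighbours.
pages = [(4, 15.00, 84.0, 2, 10),
         (55, 21.00, 42.0, 39, 73)]

def plot_page(entry):
    page, ra, dec, pageup, pagedown = entry
    # Assumed argument order; the real plot_map.py may differ.
    cmd = ['python', 'plot_map.py', str(ra), str(dec), 'pdf',
           '--page', str(page), '--pageup', str(pageup), '--pagedown', str(pagedown)]
    subprocess.call(cmd)

if __name__ == '__main__':
    pool = Pool(processes=4)      # four pages in parallel, as in the post
    pool.map(plot_page, pages)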

The whole process takes around 4 hours on my laptop (using 4 cores in parallel). This time could actually still be shortened quite significantly, because right now all data files (e.g., the magnitude limited stellar catalog, or the deep sky catalog) are read in for each page over and over again, instead of just being kept in memory after the first read operation. This originates from the fact that the plot_map.py script is written as a standalone script that can produce a page of a stellar atlas or a simple star map from scratch (and even compile databases from existing databases, as we have seen earlier) if necessary, and not only as a plotting routine. I do not think that I will change this, because 4 hours for a full atlas is not a lot (and in practice you do not do this very often anyway).

At the end I have a tidy ordered set of pages in the output folder. The size difference between the different pages shows nicely which pages have a larger number of objects plotted on them.

2017_LSA-11

4. Sample pages and statistics

At the current stage, these are the numbers for the Leuven Star Atlas (for more details, see Section 3.3):

  • Total number of stars: 361980 (down to and including magnitude 10.0)
  • Variable stars: 6998
  • Multiple (visual or physical) systems: 6411
  • Total number of deep sky objects: 8690
  • Galaxies: 6768
  • Galaxy clusters: 29
  • Globular clusters: 156
  • Open clusters: 644
  • Planetary nebulae: 742
  • Bright nebulae: 154 (of which 134 have hand-drawn outlines – many of these actually cover multiple NGC objects, thus the total number of plotted catalog entries is higher)
  • Dark nebulae: 197 (of which 163 have hand-drawn outlines)

The full sky is displayed on 344 A4 pages (1.6 cm per degree of declination resolution), five of which can be downloaded in pdf by clicking on the example rasterised images below.

Page 4: Northern Ursa Minor

20170620_LSA_P_004_RA_15.00_DEC_+84.0

Page 55: Cygnus around NGC 7000 (North America Nebula)

20170620_LSA_P_055_RA_21.00_DEC_+42.0

Page 113: The Pleiades and the border of Taurus and Perseus

20170620_LSA_P_113_RA_04.00_DEC_+28.0

Page 136: Galaxies in Virgo and Coma Berenices

20170620_LSA_P_136_RA_12.67_DEC_+14.0

Page 272: Southern Scorpius

20170620_LSA_P_272_RA_17.25_DEC_-42.0

5. Future plans

The atlas is at a stage now where most of the larger work-packages are already completed, but there is still some work to be done before it can be printed into a nice book. A lot depends on whether I can actually get this published. I am determined to finish it and have a few copies printed in any case for myself and a few friends, but without an actual publication deadline, this will probably not happen next week, or the week after…

These are the components that are still missing or need completion:

  • Labelling the remaining 339 pages. As mentioned earlier this could take me a minimum of 300 hours of work… (Or ~8 full-time work weeks, or a year if I just do one page every day.)
  • Close-up maps of selected dense regions: this means that I need to compile a stellar database down to ~11.0-11.5 magnitudes (press of a button, I do not need to change anything in my scripts), and implement a zoom-in factor (a scale multiplier that could be 2, 3, or 4), which I can use to modify, e.g., the field of view, and the cutoff sizes of deep sky objects. I could do this most likely in a day.
  • Refinements in the deep sky object database: I still need to clean up a smaller mess in the Large and Small Magellanic Clouds, because I am not happy with the completeness of the database there (another day of work, but I have already drawn quite a few bright nebulae there too), and I am also not satisfied with the number of galaxy clusters on the map right now, so I would like to extend that based on the original Abell+ data (I guess I could also do this in one or at most two days). A small extension to the deep sky database in general might also be necessary, but this is up for some further consideration.
  • Nicer constellation stick figures (as mentioned already earlier). This will take a bit longer (1-3 days), because I have to build a data file for this from scratch, and it is 100% manual work.
  • Index charts: I need to create 6 pages (looking towards the poles, and the four cardinal directions) with a large field of view showing only the brightest stars, the constellations, and the borders of the fields covered by the individual atlas pages labelled with the page numbers, as these overview charts provide the basic navigation feature of any stellar atlas. To make this I need a stellar database down to ~5.5 magnitudes (press of a button), a function that draws the outline of the sky that is displayed on given atlas pages (maximum an hour of work), and some labelling, so in total this will not take more than a day again.
  • An introduction chapter describing the use of the atlas, object types, etc. Also an index to the most interesting objects (Messier, Herschel 400, etc.) pointing to the atlas pages that contain them. This will probably be work for a week or two, since it includes quite some writing and brainstorming about what to write in the introduction.
  • A nice cover page: I would like to have either something very classical, or a montage of a page of this atlas with a classical illustration of a constellation (from here). This design process will definitely take a few days too.

When all these points are crossed off the list, I will write another post to showcase the new features and the final atlas. Without the labelling work this could be done in less than a month, but I have holidays coming up, and soon I will also start looking for a new job, so I would not expect much before Christmas. If you have any remarks or suggestions, please let me know in the comments section; I would highly appreciate it.

February, March, April…

Again more-or-less three months without blogging. It is getting more and more difficult to find time or motivation to write, but there are so many things I would want to remember later, and my blog is the best diary, so let’s see what you missed. (Also, since most of my written English practice comes from this blog, I have to admit that I felt quite bad about my language skills while writing this post… Another reason to write more often. Then again, there is this thing about promises and not being able to keep them, so, whatever…)

LEGO: After my mountain bike, I got myself another – much cheaper – present for my birthday: the NASA Mars Science Laboratory Curiosity Rover from LEGO! It gave me a nice evening of assembling, and now it is on display below our TV. Undoubtedly, LEGO was the best toy of my childhood – even when we got our first computer, I kept playing with it. Among my favourite constructions were a suspension bridge, and an astronomical telescope in a proper rotating dome, but I always enjoyed simply following the instructions too. Ah, those were the days! I don’t think I will ever be too old for LEGO. Maybe at one point I should get all my LEGO from Hungary, since I don’t think my brother wants to play with it… The only problem is that 1) we probably have no space for all that LEGO, and 2) there is no way we could fly it over without paying extra for overweight bags. It is really a lot of LEGO :)

2014_spring-1

Plots: Anyone who has been reading this blog for a while must know that I am a data freak. While preparing my annual seminar talk for the institute I also spent some time on data mining and visualisation using the database of observations made with the HERMES spectrograph (at the Mercator telescope on La Palma). It turned out that I have the 3rd highest amount of observing time at the instrument. (Even without counting the time I spent there with the master students as a support astronomer – if one were to add those nights to my sum, then I would be winning by quite a landslide.) As an example, here is a plot showing the distribution of all HERMES observations throughout the first 5 years of HERMES, and a plot showing their distribution on the sky near the original Kepler field (different colours mean different observing programs, the dark blue area on the first plot represents the night time, and the symbol size is connected to the exposure time).

2014_Mercatorplot-1

2014_Mercatorplot-2

You can see – among many other things – how seasonal some observing programs are, and how well covered the Kepler field is. This was a nice exercise with python to learn a few new things about projections and calendar-data visualisation.

Making Belgian chocolate: After my public PhD defence I got a voucher for a chocolate workshop from my colleagues, but we only managed to go and do it now, at the end of March. It was a two hour session in the Bittersweet Chocolatier in the centre of Leuven, and we got to make small praline filled chocolate easter eggs, larger chocolate figurines, and pistachio balls covered in chocolate. It was a very nice experience, even without mentioning the half kilogram of – both self made and original Bittersweet – chocolate each of us got to take home afterwards :) I think it was definitely the best PhD defence present I have seen so far.

2014_spring-6

2014_spring-2

2014_spring-3

2014_spring-5

2014_spring-4

2014_spring-7

Royal Greenhouses of Laeken: Last week my mother came for a visit, and this time she finally got to enjoy the Belgian sunshine for a bit longer than a few seconds. Her only wish was to visit the coast and take De Kusttram between a few of the coastal towns, which we did on our first full day. The weather was warm, but we still got to use our umbrellas in a short thunderstorm which caught us while walking through the dunes. I also brought along my kite, but this time there was basically no wind at all (which is extremely rare on the coast), so I could not play much with it. There were other people with kites desperately waiting for stronger winds too…

On the next day, we went to Brussels to visit the Royal Greenhouses of Laeken, which is only accessible to the public during a short, three week long period each spring. It was very beautiful, despite being a little bit crowded. Although I think that the variety of plants in the – much smaller – botanic garden of Leuven (which we visited just a few days earlier with Clio, and on the following day with my mother) is larger, there is no doubt that the architectural beauty of this place, and the amount of blooming flowers there was truly exceptional.

2014_spring-8

2014_spring-9

2014_spring-10

2014_spring-11

2014_spring-12

2014_spring-13

Work: Research usually takes more time than originally expected, but this time I really underestimated how much of it I would need to finish my latest paper. When I discovered something particularly interesting during the conference in Sydney last year, I thought I could be finished with the necessary analysis before the end of the year (2013). Then we ran into various unexpected problems (shit happens when you do things which were never done before), resulting in a delay of several months… But now, just before leaving for Chile (more about that later), I finally submitted the manuscript to the usual journal (A&A). Let’s hope the referee will find our work worthy, and I can show you (or at least post about) the results in a few months!

The Performance Management Chart

Now that I can analyze and plot my workouts in detail, and I can estimate the Training Stress Score (TSS) of each ride, it is time to make use of all the statistics. It is nice to have numbers quantifying my trainings one by one, but how nice would it be to track my form and fitness during the season based on the volume and intensity of my rides? This is the concept behind the Performance Management Chart. I will write down (partly quote, or just rephrase) the most basic things here, but again, everything is based on the following two articles (take time to read them, they are very interesting):

What is the Performance Management Chart in TrainingPeaks WKO+?, by Hunter Allen
The scientific inspiration for the Performance Manager, by Andrew R. Coggan, Ph.D.

Andrew R. Coggan defines form as a combination of fitness and freshness (Form = Fitness + Freshness). You will have a really good day in the saddle when your fitness level is high, and when your fatigue level is low. Fitness is a response to training stress (or training load); the more you train the better you will be. Freshness is simply a result of rest. As TSS quantifies training load, knowing TSS values of workouts from the past enables us to determine the rider’s fitness in the present, and see how much training is needed in the future to raise this level higher.

The Performance Management Chart is based on Banister’s impulse-response model (see 2nd article for details), and we define the following components:

1) Chronic training load (CTL) provides a measure of how much an athlete has been training historically (chronically). It is calculated as an exponentially-weighted moving average of daily TSS values, with the default time constant set to 42 days. CTL can be viewed as analogous to the positive effect of training on performance in the impulse-response model. It is a relative indicator of changes in performance ability due to changes in fitness.

2) Acute training load (ATL) provides a measure of how much an athlete has been training recently (acutely). It is calculated as an exponentially-weighted moving average of daily TSS values, with the default time constant set to 7 days. ATL can be viewed as analogous to the negative effect of training on performance in the impulse-response model. It is a relative indicator of changes in performance ability due to fatigue.

3) Training stress balance (TSB) is the difference between CTL and ATL, i.e., TSB = CTL – ATL. TSB provides a measure of how much an athlete has been training recently, compared to how much they have been training historically. It can be viewed as an indicator of how fully-adapted an individual is to their recent training load, i.e., how “fresh” they are likely to be.
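
The three quantities above are straightforward to compute from a series of daily TSS values. Here is a minimal sketch using the standard exponential-smoothing recursion (the seeding with zero and the exact smoothing formula used by TrainingPeaks are my assumptions):

def performance_metrics(daily_tss, ctl_days=42.0, atl_days=7.0):
    """Return lists of CTL, ATL and TSB for a list of daily TSS values
    (one entry per calendar day, 0 for rest days). Simple EWMA recursion:
    today = yesterday + (TSS_today - yesterday) / time_constant."""
    ctl, atl, tsb = [], [], []
    ctl_prev, atl_prev = 0.0, 0.0       # could also be seeded with last season's CTL
    for tss in daily_tss:
        ctl_prev += (tss - ctl_prev) / ctl_days
        atl_prev += (tss - atl_prev) / atl_days
        ctl.append(ctl_prev)
        atl.append(atl_prev)
        tsb.append(ctl_prev - atl_prev)
    return ctl, atl, tsb

# Example: two easy weeks followed by a hard training block.
ctl, atl, tsb = performance_metrics([30, 0, 60, 0, 0, 45, 0] * 2 + [120, 90, 150, 0] * 3)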

In the Performance Manager concept, an individual’s CTL determines their performance potential, but their TSB influences their ability to fully express that potential. Their actual performance at any point in time will therefore depend on both their CTL and their TSB, but determining how much emphasis to accord to each is now a matter of trial-and-error/experience, not science. (Really go and see more details in the original article.)

Some other interesting quotes before I tell you about my progress in the topic:

“The following approximate guidelines may prove useful when analyzing prior data: a TSB of less than ‑10 would usually not be accompanied by the feeling of very fresh legs, while a TSB of greater than +10 usually would be. A TSB of -10 to +10, then, might be considered neutral, i.e., the individual is unlikely to feel either particularly fatigued or particularly rested. The precise values, however, will depend not only on the individual but also the time constants used to calculate CTL and ATL, and therefore should not be applied too literally.”

“The optimal training load seems to lie at a CTL somewhere between 100 and 150 TSS/d. That is, individuals whose CTL is less than 100 TSS/d usually feel that they are undertraining, i.e., they recognize that they could tolerate a heavier training load, if only they had more time available to train and/or if other stresses in life (e.g., job, family) were minimized. (…) On the other hand, few, if any, athletes seem to be able to sustain a long-term average of >150 TSS/d.”

“In addition to the absolute magnitude of CTL, considerable insight into an individual’s training (and/or mistakes in training) can often be obtained by examining the pattern of change in CTL over time. Specifically, a long (e.g., 4-6 week) plateau in CTL during a time when a) the focus of training has not changed, and b) the athlete’s performance is constant is generally evidence of what might be termed training stagnation – that is, the individual may feel that they are training well, by being very consistent and repeatedly performing the same workouts, but in fact they are not training at all, but simply maintaining, because the overload principle is not being applied. On the other hand, attempting to increase CTL too rapidly, i.e., at a rate of >5-7 TSS/d/week for four or more weeks, is often a recipe for disaster, in that it appears to frequently lead to illness and/or other symptoms of overreaching/overtraining. Of course, since changes in CTL are driven by changes in ATL, this means that any sudden increase in the training load (e.g., training camp, stage race) must be followed by an appropriate period of reduced training/recovery, so as to avoid too great of an overload.”

So now that I have all my rides processed with my python script (even the indoor ones, so really all cycling workouts), I only had to write a script which grabs the TSS values from every statistics file (then calculates daily values if there were multiple rides on a given day), calculates the ATL, CTL and TSB values for every day in the given period, and plots them (plus saves the data as a table). This was not too hard ;) There is even an option to change the default time constants in the parameter file (the same parameter file which is used by the workout analyzer and plotter script), and to define a start date and an end date for the plotting on the command line. Then after giving the

>python plotgarmintcx_makePMC.py 20101201 20110920

command (where the dates are optional and only affect the plotting limits, while the calculations are always done using all the data), you have your Performance Management Chart plotted and all the data saved in an ASCII file too. Let’s see how the result (with some additional comments photoshopped over in red) looks (click and it will be bigger)!

You can see that I came into the season with some residual CTL (in blue) from last year, but one 1-hour trainer session per week was not enough to keep that at a steady level… Then, as the spring was exceptionally warm and sunny, I had some quite good trainings in March and April, raising my CTL from 20 TSS/d to 55 TSS/d, which then dropped because of no training and very low training load during my observing run. Then the four hard days of cycling on La Palma gave a huge amount of training load (ATL in magenta): my CTL jumped from 45 TSS/d to 68 TSS/d, but my TSB (in yellow) fell to -87 TSS/d, which is extremely low – if you keep riding in that state, the chances are high that you will get an injury… It also reflects well how extremely tired I was after that week… Then, only after two days of rest, with a still pretty negative TSB, I rode my personal best on the standard Leuven-Mechelen-Leuven route. Probably I should have waited a bit more, because another 4 days later I did a ride where I easily had a better average speed over 62 kilometers than my personal best over 48 kilometers from 2010. I was extremely surprised about my performance on that day. It is not such a surprise anymore when you look at this plot. My CTL was still very high, and as I was resting a lot in the days before, my TSB became almost zero. This is the perfect combination (high CTL, zero or slightly positive TSB – remember, actual form is a combination of fitness and freshness) for record-breaking rides (or races); this is an example of a sweet spot on the Performance Management Chart. Unluckily, after this really good period I could not ride for almost three weeks (work, work, and weather…), which had a huge impact on my CTL (dropping all the way down to 43 TSS/d). From this point, I started a quite systematic and serious training, as I wanted to finish July with at least 1000 kilometers. You can see that my ATL is almost always above my CTL, which leads to the increase of CTL. Then I crashed into a car just days before I was about to climb the Mont Ventoux, which gave me some time to rest (and a bit of pain in my right knee, which is missing from this chart…), so again I had a high CTL and a ~zero TSB when I went to France. And I was again very surprised how easy the first ride felt there, especially after an accident. But now, from this chart, it is not so surprising anymore. My TSB was still only -13 TSS/d when I climbed the Mont Ventoux, so it was close to optimal, but I could have done better with a bit less riding on the first two days in France. It was also the day of my highest CTL this year (till now), 71.3 TSS/d. Then during my observing run it dropped again (to 57 TSS/d), so now I am working on getting it back up to the region where it should be :) Next year I want to be at 100 TSS/d at this time. At least. Especially if I am serious about my plans for the summer…

In conclusion: I think it is a very nice visualization and an extremely useful tool for a cyclist!

Estimating cycling power and training load

Update 1: I modified the script slightly so that it matches the estimation method of Joe Friel exactly (see below), and as this had a small but clearly positive effect on the results, I have also updated the last plot and the paragraphs describing it at the end.

Update 2: I have also tested the power estimate on very low intensity efforts with known average wattages (check out the bottom of the post).

You might remember that I wrote a python script earlier this year to analyze my cycling workouts, which, besides calculating detailed statistics, also creates all kinds of fancy (multidimensional) plots. If you do not remember, or you would like to refresh your memory, please click here before reading this post further.

The most important thing missing from my script was the ability to directly compare different workouts. Of course you can tell that a 150 km ride with 5000 meters of elevation gain is more difficult than an easy 50 km Sunday afternoon ride, but how much more difficult? And what about an easy 75 kilometer training ride versus a short but hard interval session? How do these compare? How tired am I going to feel after each of them? I really wanted to create or find a metric that tells me how much work I did in a given training session. Of course this is only a problem when you don't have a power meter installed on the bike, because that would directly tell you the amount of work in SI units for every workout. But power meters are expensive, and most importantly, I don't have one. This situation might change in the future, but until then let's see what we can do.

I had kind of forgotten about the problem (I did not even start dealing with it earlier, I just made a note on my To Do list), but some days ago Strava (a site similar to Garmin Connect, but with competitive, social media based extras, unfortunately with basically no users in Europe, so it is not an alternative for me) introduced a so-called suffer score on their website, basically giving a rating to every ride based on its intensity (estimated from heart rate – HR – data) and duration. I knew that the algorithm behind this would be quickly decoded on one of the blogs I frequently read, and indeed it happened very quickly, so go and have a look there. First I wanted to implement this into my script, but then I got some extra motivation from the post and the comments there, so I looked a bit into the literature, and that is how I got to know about the Training Stress Score, or simply TSS (among other metrics, but this is the most important one for us right now). To save some space here, check out the following three sites, so I can skip retyping things which are already on the Internet.

Estimating Training Stress Score (TSS), by Joe Friel
Normalized Power (NP), Intensity Factor (IF), and Training Stress Score (TSS), by Andrew R. Coggan, Ph.D.
What does 100 TSS mean, and the connection to Functional Threshold Power (FTP)

Though TSS and all these metrics are used in and based on power measurements, you can see that it is possible to give a good estimate of it (as good as estimating the relative power on climbs from the slope gradient and VAM, which my script already did before) from the time spent in HR zones, using the scaling values from Joe Friel. We have seen that by definition a TSS of 100 equals a one-hour ride at FTP (so as hard as you can go for one hour). This means that rides shorter than one hour can have a TSS/hour higher than 100, but longer rides will have a lower value. So the TSS value tells you how big the training load of a given ride was (and it is related to the amount of post-ride fatigue), while the TSS/hour value gives you the intensity of your workout. The way it's calculated, TSS varies with the square of intensity. That means if you're only going at 90% for an hour then you're only accumulating 81 TSS per hour (90% = 0.9, which squared is 0.81, then you multiply by 100 to get TSS). So from the TSS/hour you can calculate an intensity factor, which gives you your average power output in units of your FTP power for the whole workout. Given that you know your FTP power, you can have a good estimate of the average power of any of your training rides! Now that is pretty cool. So that is what I have built into my script, and this way the TSS, TSS/h, average power, and total work estimates are now also given in the summary file. I also performed some test calculations and comparisons, because you should never forget: these are estimates! You need to know your FTP, and your HR at FTP, and then you might even get a reasonable value…
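
To make the arithmetic above explicit, here is a minimal sketch of how such an estimate can be put together in python (this is not a verbatim excerpt from my script; the per-zone TSS/hour values below are only placeholders and should be replaced by the actual numbers from Joe Friel's article linked above, and hr_zone_seconds is an assumed, pre-computed dictionary with the time spent in each heart rate zone):

import math

# Placeholder TSS/hour values per heart rate zone -- substitute the actual
# numbers from Joe Friel's table linked above.
TSS_PER_HOUR = {1: 30, 2: 55, 3: 70, 4: 80, 5: 100, 6: 120, 7: 140}

def estimate_from_hr_zones(hr_zone_seconds, ftp_watts):
    """Estimate TSS, IF, average power and total work from time spent in HR zones.

    hr_zone_seconds : dict mapping zone number -> seconds spent in that zone
    ftp_watts       : functional threshold power in watts
    """
    total_seconds = sum(hr_zone_seconds.values())
    tss = sum(TSS_PER_HOUR[z] * s / 3600.0 for z, s in hr_zone_seconds.items())
    tss_per_hour = tss / (total_seconds / 3600.0)
    # TSS scales with the square of intensity, so invert that relation:
    intensity_factor = math.sqrt(tss_per_hour / 100.0)
    avg_power = intensity_factor * ftp_watts            # watts
    total_work = avg_power * total_seconds / 1000.0     # kilojoules
    return tss, tss_per_hour, intensity_factor, avg_power, total_work

The total work estimate then simply follows from average power times duration, which is why having reasonable FTP and threshold HR values in the parameter file matters so much.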

First of all, I wanted to see the intensity factor of my personal best ride to Mechelen and back. It was an approximately 1 hour 20 min all-out effort, and the script gave an intensity factor of 1.00, meaning that I was riding at FTP. The ride was a bit longer than one hour (when it should not be possible to ride at FTP anymore), but I had a small 10 min break between the two legs in Mechelen, so it might still be a valid approximation; in any case it should be extremely close to reality. Also, though it was a very tough ride, because it was short the TSS value is ‘only’ 135.0, meaning that within 24 hours I had probably almost fully recovered. In comparison, a normal ride (in my terms, so an average of 32 km/h instead of the record 35.7 km/h) on the same route gives a TSS around 110, while a recovery ride is probably below 100, and my epic ride (147 km and a bit more than 5000 meters of elevation gain) climbing twice from sea level up to the highest point of La Palma had a TSS of 483.7, which – as you have probably already guessed – is in the epic category. Just for comparison, the last short (46 km and 400 meters of ascent) recovery paced ride from France had a TSS of 69.5 (approximately half the TSS of my personal best ride, which was done on an almost equally long but completely flat route, so this really shows how easy this French ride was). So as I got an IF (intensity factor) of 1.00 on a ride which is very close to the definition of FTP, I was already quite happy with the result. (If for such a ride you get something like 1.05 or even higher as IF, that is a sign that your threshold heart rate is now higher, so you should change it accordingly in the parameter file.)

As a second test, I wanted to see how this power estimate compares to the power estimate from VAM and average gradient, which is a widely accepted and used relation for climbing sections. If the power values from the two methods match, then it is OK to use the power estimate from the hourly TSS value even on rides with no climbing at all. So first I took my recent ride up the Mont Ventoux, where I did 1 hour and 40 minutes of all-out riding, so I expected to get an IF around 0.95, and of course I was very interested to see how the two different power estimates would differ from each other (if they differ at all). So here is the statistics file (recent additions marked with a grey background):

So from the IF of 0.96 you can see that I indeed did all I could in this time frame. Now we can estimate the average power from my FTP, which is somewhere around 300 W (or maybe a bit more, but I have to admit I don't have a recent measurement, so I can only guess this from my workouts on the trainer back in the first months of the year). The result from this is 288 W. What about the value from the other method? I was ~70.5 kg and my bike + drink and food + clothes add an extra ~11.5 kg; with a total weight of 82 kg and the relative power estimate from the VAM and gradient you get an average power of 287 W. Surprisingly, these values are almost perfectly identical! This means that I can most probably trust the TSS based method and use it for complete rides, while the VAM and gradient based method only works on climb sections. This is quite nice! Of course one measurement is not a measurement, so I wanted to check this on other climbs as well. Unfortunately I don't have too many climbs (as Flanders is pretty flat), but I still managed to put together a sample of 18 climbs from this year, mostly from my rides on La Palma in May, and some others from later (so there is a chance that my FTP was not the same at the time of the different rides, but I still calculated with the same value, while for my total weight – equipment included – I could make small adjustments based on whether I had a backpack with 2 extra liters of water or not). The sample contains climbs ranging from short explosive hills to the long ascents of La Palma, steep climbs and not so steep climbs, climbs where I was really fit and well recovered, and also climbs where I was tired, to see what the effect of these is on the relation. All these climbs are displayed on the plot below (the size of the circles is related to the length of the climb, while the colour represents the steepness).
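
For reference, the climbing based estimate relies on the widely circulated rule of thumb that relates VAM (vertical metres of ascent per hour) and the average gradient to relative power in W/kg, which is then multiplied by the total weight of rider, bike and equipment. A minimal sketch of this conversion (the formula is the commonly quoted approximation, not my own derivation, and the example numbers in the comment are hypothetical):

def power_from_vam(vam_m_per_h, avg_gradient_percent, total_mass_kg):
    """Estimate average climbing power from VAM and average gradient.

    Uses the commonly quoted approximation:
        relative power [W/kg] = VAM [m/h] / (gradient factor * 100),
        where gradient factor = 2 + average gradient [%] / 10.
    """
    gradient_factor = 2.0 + avg_gradient_percent / 10.0
    relative_power = vam_m_per_h / (gradient_factor * 100.0)  # W/kg
    return relative_power * total_mass_kg                     # W

# Hypothetical example: 960 m/h of VAM on a 7.5% climb with 82 kg of total
# weight gives power_from_vam(960.0, 7.5, 82.0) ~ 286 W.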

With a perfect 1:1 relation between the two different power estimates we expect the climbs to fall on the x=y line (the grey dashed line here). The correlation is well visible, of course with some noise, but the trend seems to be pretty clear. The biggest outlier is the Smeysberg at the top right, which is a very short (430 meter) but very steep (an average of 9.8%) climb, and the reason why it is an outlier is that it was a sprint effort well above threshold, starting with a very low heart rate, so it took some time until my HR got up into the regime which really corresponds to my power output, and this time was of the order of the full length of the effort, so of course the TSS based metric is lower. For such sprint efforts the heart rate based estimated TSS will always be lower than the power based one, because 1) of the already mentioned lag of HR behind a sudden rise in power output, and 2) that much above FTP the HR will not get any higher: it does not matter whether you maintain 400 W or 600 W for 30 seconds, your HR will probably be stuck at your maximum value. But these problems only arise when you do very short above-threshold sprints. This is also why one of the small blue dots is a clear outlier. Still, in the region where most of my training rides are (~220 W to 300 W) the match is almost perfect. Fitting a line [y = f(x) = mx + b] to all the climbs (solid grey line) or only to climbs which were longer than 3 km (i.e. efforts with a significant heart rate lag were dropped; thick black line) gives the following equations:

a) m = 1.41±0.16, b = -99±41, but we don’t really care about this
b) m = 1.00±0.12, b = 0±31, which is a perfect 1:1 match (with some noise of course)!
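
Such a straight-line fit with uncertainties on the slope and the intercept only takes a few lines with numpy. A minimal sketch (the two arrays are placeholders standing in for the per-climb power estimates; the numbers here are made up just to keep the snippet runnable):

import numpy as np

# Placeholder data, one entry per climb (made-up values for illustration):
# x axis: VAM and gradient based power, y axis: TSS / heart rate based power.
p_vam = np.array([220.0, 245.0, 255.0, 270.0, 285.0, 300.0])
p_hr = np.array([225.0, 240.0, 260.0, 265.0, 290.0, 305.0])

# First order polynomial fit; the covariance matrix of the fitted parameters
# gives the 1-sigma uncertainties of the slope and the intercept.
(m, b), cov = np.polyfit(p_vam, p_hr, 1, cov=True)
m_err, b_err = np.sqrt(np.diag(cov))
print("m = %.2f +/- %.2f, b = %.0f +/- %.0f" % (m, m_err, b, b_err))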

The other problem with HR based power estimates is that your heart rate at, e.g., 250 W will not be the same for two workouts ridden in different conditions, as for example dehydration and the level of residual fatigue strongly affect the heart rate. So again, these can significantly change the resulting estimated power. Take the medium sized light blue circle slightly above the 1:1 line in the bottom left: this was a very slow paced ride (I was not alone), but I was extremely fit (after more than 1000 km ridden already in the same month, but with basically no hard workouts in the previous week), so probably my heart rate zones were shifted a bit. On the other hand, the two other climbs where the difference between the two values is larger than 5% (the small blue circle and the larger dark blue circle slightly below the 1:1 line towards the bottom left) were the last climbs of my two hardest days on La Palma, so on those ascents I was already very tired, which shifted my HR zones in the other direction. (At least this is my explanation.) Furthermore, small ascending sections can affect the estimate from the overall VAM and slope gradient, and to convert relative power to power you need to know your total weight with bike, clothes, and everything included. Taking all these factors into account it is easy to understand why there is scatter around the 1:1 relation.

To see what happens when you go to lower power – so into the region of true recovery workouts – I have modified the script to also handle TCX files which do not contain GPS position information (i.e. files of indoor trainer workouts). I have only three rides where I maintained a constant power instead of doing intervals (where the heart rate based method is clearly off, but we know that already), but for these I can compare the ‘real’ average power (from the Tacx Flow) to the estimate. For these three rides the real average power values were 156 W, 140 W, and 200 W, while the estimates for the same rides are 133 W, 136 W, and 220 W (assuming an FTP of 290 W, as these were done very early in the season). The difference between the measurements and the estimates is of the order of ~10% (of the trainer values), which is OK for an estimate.
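
Handling those indoor files mostly comes down to treating the GPS position of each trackpoint as optional when parsing the TCX file (which is plain XML). A minimal sketch of the kind of parsing involved, assuming the standard Garmin TrainingCenterDatabase v2 namespace (this is not the actual code of my script, and the function name is made up):

import xml.etree.ElementTree as ET

# Standard namespace of Garmin TCX (TrainingCenterDatabase v2) files
NS = 'http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2'

def tag(name):
    """Expand a TCX tag name (or path) with the namespace prefix."""
    return '/'.join('{%s}%s' % (NS, part) for part in name.split('/'))

def read_trackpoints(filename):
    """Read time, heart rate and (optional) GPS position from a TCX file."""
    points = []
    root = ET.parse(filename).getroot()
    for tp in root.iter(tag('Trackpoint')):
        hr = tp.findtext(tag('HeartRateBpm/Value'))
        pos = tp.find(tag('Position'))  # missing for indoor trainer workouts
        lat = lon = None
        if pos is not None:
            lat = float(pos.findtext(tag('LatitudeDegrees')))
            lon = float(pos.findtext(tag('LongitudeDegrees')))
        points.append({'time': tp.findtext(tag('Time')),
                       'hr': int(hr) if hr is not None else None,
                       'lat': lat, 'lon': lon})
    return points

Everything that depends only on time and heart rate (so the TSS based statistics) then works exactly the same way as for outdoor rides.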

Conclusions: we can say that indeed the TSS based method can be used to estimate the average power (and workload) of workouts (even from HR data) if they do not consist of (only) short sprint efforts (so sections where there is a significant lag in the change of heart rate compared to the power output), as it scales well with another widely used power estimation method, and as it is consistent with wattage data from indoor trainer workouts (though the latter is only tested for low intensities). As a second conclusion: I really need a power meter (to test these estimates, and to have real power data, damn it)…

And oh, this was my 500th post!

Visualizing cycling workouts (a.k.a. I love plots!)

Fasten your seat belts, you are about to read one of my best posts ever. (Honestly, it is true.)

I am a bit of a nerd; there is no need to try to deny this fact. I love gadgets, I love technology, and especially, I love plots. I love it when they have more than two dimensions, when colour or even symbol shape or size is used to represent a third or fourth dimension. Moreover, I love GPS devices. At the moment I have a hiking GPS (Garmin GPSMap 60 CSx), a smartphone with built-in GPS, and a cycling GPS (Garmin Edge 500), and I had three others (two hiking GPS devices, and a bluetooth GPS receiver with my previous Nokia phone) before these… But I am not the kind of nerd who just sits in front of the computer, because I also love sports. I mean, I love doing sports, and not just watching them on TV (which I also like, but that is not the point). If you are not completely new to my blog, then you know that these days I am especially into cycling – this is also the reason why I own a dedicated cycling GPS (which I bought basically on the morning of my first ride with my – back then new – racing bike). With Garmin (and even with other brands), you have the option to upload your workouts to the Garmin Connect website, which gives you some nice overview plots, a calendar, reports, some statistics, and maybe most importantly the ability to share what you have done with the on-line community. The plots and statistics are nice and detailed enough if you are an average – or so-called hobby – user. In that case, there is no problem. But if you are as much of a plot-fetishist as I am, or you need professional analysis and have a desire for more, then you will like what I am about to show after the break. (If you have no food and drinks with you, go and grab something before clicking on continue…)