Mining the Strava Data

Moved to here. Have a nice day 😁


7 thoughts on “Mining the Strava Data”

  1. Nice article, though way above my mathematical knowledge.
    I’m trying to detect cheating myself at
    The problem I’ve found is that outside the United States, altitude data tends to be much less accurate, so trying to detect abnormal speeds based on anything but significant altitude changes tends to be problematic.
    It’s also difficult because the Strava APIs don’t let you see all of a rider’s activity data – just a segment at a time.
    Good luck with your analysis if you keep it up.


  2. Thanks Martin.

    Your KOMDefender website looks really cool. It’s a shame about the full activity streams not being available. (I’m surprised they don’t make this clearer in the docs here.)

    Incidentally, the noisy altitude data is annoying, but it should be possible to mitigate the issue significantly with a bit of effort. You may well be employing such techniques already, but if not, and you have the time/motivation, I would suggest using variance reduction techniques to combine the estimates of altitude at a given distance along a segment from many/all segment efforts for that segment (e.g., in the simplest case, a simple average or maybe a median, interpolated). That could be a one-off (or at least infrequent) calculation that you would do for each segment; you would then discard the altitude data for a given segment effort and just use the private altitude series that you have calculated. Alternatively, the Google Elevation API may do the trick.

    Anyway, thanks again and good luck.
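
A minimal sketch of the averaging idea above, not anyone's actual implementation. It assumes each segment effort has already been reduced to a pair of arrays (distance along the segment in metres, altitude in metres) with distance increasing; the function name `consensus_altitude` and the 10 m grid spacing are hypothetical choices made for illustration. Each effort is linearly interpolated onto a shared distance grid and the per-point median is taken, which tolerates the occasional badly calibrated device better than a plain mean.

```python
import numpy as np

def consensus_altitude(efforts, grid_step_m=10.0):
    """Combine noisy altitude traces from many efforts on one segment.

    efforts: list of (distance_m, altitude_m) array pairs, one per effort,
             with distance_m increasing along the segment (assumed input shape).
    Returns (grid, profile): the common distance grid and the median altitude
    at each grid point.
    """
    # Only go as far as the shortest effort so every trace covers the grid.
    max_dist = min(d[-1] for d, _ in efforts)
    grid = np.arange(0.0, max_dist + grid_step_m / 2, grid_step_m)

    # Resample every effort onto the shared grid by linear interpolation.
    resampled = np.vstack([np.interp(grid, d, a) for d, a in efforts])

    # The per-point median is robust to a few outlying efforts.
    return grid, np.median(resampled, axis=0)


# Usage with three made-up efforts on a 1 km, 5% climb starting at 100 m.
d1 = np.linspace(0, 1000, 51)
d2 = np.linspace(0, 1000, 77)
d3 = np.linspace(0, 1000, 33)
example_efforts = [
    (d1, 100 + 0.05 * d1 - 3),   # device reading 3 m low
    (d2, 100 + 0.05 * d2),       # accurate device
    (d3, 100 + 0.05 * d3 + 30),  # badly offset device
]
grid, profile = consensus_altitude(example_efforts)
```

The resulting `profile` can be stored once per segment and used in place of each effort's own altitude stream.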

  3. Nice work Oliver. Know that view well also :). Beautiful Wicklow mountains. Been playing around with the Strava API myself as a project to learn web dev. I decided to tackle integrating weather info with segment leader-boards as I’m convinced all my KOMs were stolen by massive tailwinds. Only joking :), but weather does have a huge effect on leader-board times. Thank you for posting the piece on power calculations as it gave me plenty of fuel for thought. I think I’ll code this into my app, also factoring in the wind which I’m also capturing.
    I’ve added a link to a segment I’m sure you’ve ridden many times showing integrated weather info.


  4. Thanks Padraig, I’m glad you found my words interesting. I do indeed know that segment.

    Yes, the weather has a huge effect on times. So much so that I expect the top KOM times were generally executed under very favourable conditions. Fortunately, one of the bonuses of being an average (at best) cyclist like myself is that those impressive times are too far away for me to worry about!

    Anyway good luck with your own investigations of Strava’s API. There’s a wealth of data there.

  5. Hi Oliver, while searching for ways to plot Strava elevation data I came across this analysis. As a keen cyclist, I was particularly intrigued by the flatness of Alpe d’Huez and agree that looking at the time element could explain a lot of that flatness. While the Alpe is often the final climb for a cyclosportive (notably the annual ~170km Marmotte), its easy access means there are also many riders who climb the Alpe just to cross it off their bucket list of famous climbs, so their ride is only 21km long. And since two peculiarities of the Alpe are that a) the steepest grades are at the base of the climb instead of the top and b) it has southern exposure, your average bucket-lister riding in the morning will likely ride those steep grades a lot faster than your average long-distance rider, who would be hitting those grades in the heat of the afternoon.

    So the additional features to analyze could be distance cycled prior to climb, total ride distance, and especially temperature at start of climb.

    Also, I wonder if avg cadence might be a better overall predictor of grade than speed?

    1. Hi Sarah, thanks for such a thoughtful comment. You clearly know far, far more about cycling than I. Each of your points is dead right as far as I can see.

      One detail I should have mentioned in the post was that I removed segment efforts that included obvious “stops”. I forget the exact logic, but it was something like this: efforts that included 10 seconds of stationary position were discarded. So that would also have created a bit of a selection bias. Since you mention the Marmotte, it occurs to me that it should be possible to label each effort according to whether or not it was the end of a Marmotte. (Even just from the date.) It might be interesting to compare Marmotte vs. non-Marmotte results.

      Working off cadence data would be awesome, but unfortunately it was unavailable, at least at the time I wrote this. I’d imagine that has changed.

      Anyway, thanks again and best wishes.
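
A rough sketch of the kind of stop filter described in the reply above. The exact logic used for the post is not stated, so everything here is an assumption: the function name `has_stop`, the 10 second threshold taken from the recollection above, the 0.5 m tolerance for "stationary", and the input format (per-effort time and cumulative distance arrays, with distance never decreasing).

```python
import numpy as np

def has_stop(time_s, distance_m, min_stop_s=10.0, eps_m=0.5):
    """Return True if the effort contains a stationary stretch.

    A stretch counts as stationary when the cumulative distance changes by
    less than eps_m over a run of samples lasting at least min_stop_s.
    Assumes distance_m is non-decreasing and eps_m is positive.
    """
    time_s = np.asarray(time_s, dtype=float)
    distance_m = np.asarray(distance_m, dtype=float)

    start = 0
    for end in range(len(time_s)):
        # Move the window start forward until the window covers < eps_m.
        while distance_m[end] - distance_m[start] >= eps_m:
            start += 1
        # The window is now the longest near-stationary run ending at `end`.
        if time_s[end] - time_s[start] >= min_stop_s:
            return True
    return False


# Usage with made-up one-second samples at 5 m/s.
t = np.arange(60.0)
moving = 5 * t                                                # never stops
paused = 5 * np.minimum(t, 20) + 5 * np.maximum(t - 32, 0)    # 12 s pause
brief = 5 * np.minimum(t, 20) + 5 * np.maximum(t - 25, 0)     # 5 s pause

efforts = {"moving": moving, "paused": paused, "brief": brief}
kept = [name for name, dist in efforts.items() if not has_stop(t, dist)]
```

Any filter like this trades one bias for another, as noted above: it removes efforts with café stops, but it also removes riders who paused briefly on a hard climb.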
