Can you trust the FTP test to give correct threshold power?

A while back I presented my Instagram followers with a training question:

If I recently completed a well-executed FTP (20 min) test, and I wanted to perform a threshold workout, could I then simply aim for 91-105% of FTP and get a perfect threshold session?

The question was followed by a simple survey.

In no way does my modest IG follower list represent an oracle. Yet, there are a good bunch of highly accomplished sports scientists, trainers and pro athletes in the mix.

Interestingly, all of the respondents I deem to be in the above category gave the same answer.

“No.” (66% concurred)

In other words, all the professional players indicated they would not follow the FTP guidelines to a tee.

This is an interesting result, which of course raises the question: why not?

I would suggest the following probable explanation.

As solid a construct as the FTP concept is, strict adherence without regard for other intensity parameters comes with a potentially disastrous downside.

Let us review a few recent studies and discuss why that is.

What is FTP?

I am going to assume you are already familiar with the concept of Functional Threshold Power (FTP).

If not, here is a brief explanation.

Functional Threshold Power (FTP) represents an approach to estimate anaerobic threshold by power meter recordings.

“FTP is the highest power that a rider can maintain in a quasi-steady state for approximately one hour without fatiguing.” (1)

The concept was originally developed by Allen & Coggan and is a popular system upon which many cyclists and coaches base their intensity control.

While several testing protocols are practiced, the more common ones include a 60 minute all out test (60 min average watt = FTP) and a 20 min all out test (20 min average watt x 0.95 = FTP).

For more on FTP and power-based zones, see TrainingPeaks.

The problem with big data and their models

Constructs like FTP, while being tremendously useful in many scenarios, are not without flaws.

To this day, I have yet to see the development of the FTP model accounted for. As such I cannot comment on the data underlying FTP as a construct.

However, models like this one are commonly arrived at by means of regression analysis. Never mind the difficult name; as demonstrated below, the concept is simple enough.

How models are usually developed, and why you need to know

Let us assume you perform a 60 min all-out test and a 20 min all-out test. The 60 min test yields an average power of 285 W. The 20 min test results in an average power of 300 W. The ratio between the 60 and 20 min average powers is 0.95 (285/300 = 0.95).

If we accept the premise that 60 minute all-out average power is your threshold power, the formula to calculate your threshold power from the 20 minute test would be:

20 min average power x 0.95 = threshold power
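The arithmetic above can be sketched in a few lines of Python. The function names are my own and purely illustrative, and the 0.95 factor is simply the conventional value discussed here, not something derived from your own data.

```python
def personal_ratio(power_60min: float, power_20min: float) -> float:
    """Ratio between 60 min and 20 min all-out average power."""
    return power_60min / power_20min


def ftp_from_20min(power_20min: float, factor: float = 0.95) -> float:
    """Estimate threshold power from a 20 min all-out test."""
    return power_20min * factor


# Values from the single-rider example above (watts).
ratio = personal_ratio(285, 300)
estimate = ftp_from_20min(300)
print(f"ratio: {ratio:.2f}, estimated threshold: {estimate:.0f} W")
```

Passing a different factor to ftp_from_20min shows how sensitive the estimate is to the assumed ratio.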

However, when creating models to be applied to the general population of athletes, we cannot rely on data from a single rider alone.

Let us assume we decide to test hundreds of cyclists in a similar manner. They would most likely display a range of different average power outputs. If we plot their 60 min average power against 20 min average power we would probably see somewhat of a trend.

If we were to draw a single line (trend line) to best represent all the results, this line would then give us our model for the relationship between 60 min power and 20 min power.

As you see in our made up chart, most rider results lie fairly close, but not precisely on the 0.95 line. As such, the model represents an attempt to squeeze the results of many riders into a single formula. And in the case of the FTP model, research suggests it does so very well.
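To make the trend line idea concrete, here is a minimal sketch of fitting one. The rider numbers are made up for illustration, and I am assuming the simplest possible model, a straight line through the origin, which is not necessarily how the FTP model was actually derived.

```python
# Made-up (20 min power, 60 min power) pairs in watts. Not real rider data.
riders = [
    (300, 285),
    (250, 240),
    (320, 298),
    (280, 262),
    (350, 336),
    (230, 214),
]


def fit_slope_through_origin(pairs):
    """Least-squares slope for the model: power_60 = slope * power_20."""
    sum_xy = sum(x * y for x, y in pairs)
    sum_xx = sum(x * x for x, _ in pairs)
    return sum_xy / sum_xx


slope = fit_slope_through_origin(riders)
print(f"group model: 60 min power = {slope:.3f} x 20 min power")

# Each rider's own ratio scatters around the group slope.
individual_ratios = [p60 / p20 for p20, p60 in riders]
for (p20, _), r in zip(riders, individual_ratios):
    print(f"{p20} W rider -> individual ratio {r:.3f}")
```

The single slope summarizes the group, while the individual ratios land on either side of it.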

Immediately, you should be able to spot a potential problem.

You are not the average of 100 riders.

You are a single subject whose results may fall anywhere on this chart.

Probability dictates your result will likely fall close to the 0.95 relationship. But you cannot know beforehand whether or not you are one of the outliers, or if your results fit perfectly with the model.

This leads us to a highly interesting question:

How likely are your results to adhere to the FTP model and the 0.95 relationship?

Interestingly, a few recent studies have shed light on this matter.

FTP tests compared to lactate profile test

Up until 2018, no study had ever compared the popular 20 min FTP test with traditional means of estimating anaerobic threshold.

That is until a study in International Journal of Sports Medicine did just that.

Functional threshold power in cyclists: Validity of the concept and physiological responses

In this study from 2018, the authors compared the results from three popular methods of estimating anaerobic threshold: 20 min FTP test, 60 min FTP test and a stepwise lactate profile test.

Who were the athletes?

23 male cyclists (mean age 32 yrs) with at least 2 years of experience from regional and national level racing participated in the study. Their mean VO2 max was 59.4 mL/kg/min.

How were they tested?

Over 3 weeks the riders performed 4 tests, each with at least 48 hours in between.

The tests consisted of a stepwise lactate profile test, a 20 minute all-out test, a 60 min all-out test and finally a maximal effort test on the resulting power from the FTP 20 test (that is 100% of FTP20). The objective during the final test was to maintain this power output for as long as possible.

Preceding the 20 minute all-out test, riders used a warm-up protocol described by Allen & Coggan which includes 3×1 minute at high cadence (low intensity) and a 5 minute all-out effort. Prior to the 60 minute all-out test, 10 minutes of self-selected warm-up was performed.

What did they find?

The results displayed a strong correlation between FTP-values derived from the 20 minute test (FTP20) and the average power from the 60 minute test (FTP60) and threshold power estimated by the lactate profile test (IAT).

The mean results were:

FTP20: 236W
FTP60: 231W
Lactate profile test: 237W

When performing the final test on the power output equal to FTP20, riders were able to maintain this intensity for an average of 50.9 minutes. This was within the expected window for sustained efforts at anaerobic threshold (45-60 min).

The authors conclude that these results support the FTP concept as a valid method of estimating threshold power. On a group level.

In other words, the results of this study apparently correlate well with the FTP model.

The estimated FTP values from the 20 min test (236W) were close to those of the stepwise lactate profile test (237W) and the 60 min test (231W).

The 60 min average power divided by the 20 min average power (231W / 248W) yields a relationship of 0.9315. In other words, fairly close to the accepted “rule” of FTP equalling 20 min average power x 0.95.
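For readers wondering where the 248 W comes from, here is a small sketch. I am assuming it is the mean FTP20 estimate with the 0.95 factor undone and rounded to whole watts; the variable names are mine.

```python
ftp20_estimate = 236  # mean FTP20 reported above, in watts
ftp60 = 231           # mean 60 min average power, in watts

# FTP20 is 20 min average power x 0.95, so dividing by 0.95
# recovers the raw 20 min average power (rounded to whole watts).
mean_20min_power = round(ftp20_estimate / 0.95)

group_ratio = ftp60 / mean_20min_power
print(mean_20min_power, round(group_ratio, 4))
```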

Yet, there is one problem with these results. And it is an important one.

Great individual differences

The authors highlight that their results correlate well with the FTP model on a group level.

However, when looking at results of individual riders, the situation changes.

…it is difficult to accept FTP as a thoroughly valid concept. We found large limits of agreement between most variables, suggesting a high level of interindividual variability in the relationship between FTP20 vs. FTP60 and between both measurements vs. IAT (me: stepwise lactate profile test).

– Borszcz et al. Int J Sports Med 2018
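If "limits of agreement" is an unfamiliar term, the sketch below shows how such limits are conventionally calculated (mean difference plus or minus 1.96 standard deviations of the differences). The paired values are hypothetical, not the study's data, and I am assuming the authors used this standard definition.

```python
from statistics import mean, stdev

# Hypothetical paired threshold estimates in watts. Not the study's data.
ftp20 = [236, 250, 221, 262, 240, 229, 255, 245]
ftp60 = [231, 228, 235, 240, 246, 210, 262, 236]


def limits_of_agreement(a, b):
    """Return (bias, lower, upper) for paired measurements a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # sample SD of the paired differences
    return bias, bias - spread, bias + spread


bias, lower, upper = limits_of_agreement(ftp20, ftp60)
print(f"bias {bias:.1f} W, limits of agreement {lower:.1f} to {upper:.1f} W")
```

The point of the illustration: the average difference (bias) can be small while the limits remain wide, which is exactly the group versus individual distinction the authors describe.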

In other words, the average results fit well with the FTP model. But the results of individual riders often did not.

The above numbers illustrate that a rather great proportion of riders achieved quite different estimates of threshold power by the different test protocols.

FTP (60 min power) is not always 20 min power x 0.95

As previously stated, average power output for 20 minutes all-out x 0.95 is a fair estimate of threshold power, which in turn is considered synonymous with average power over 60 minutes all-out.

This study observed individual threshold power values differing by up to 45W between the FTP20 and FTP60 test. This for riders averaging power outputs in the range of 200-300W.

Example: If you test an FTP of 250W, a 30W difference between tests equates to a 12% margin of error. That is sufficient to move you into a different intensity zone, depending on which test you base your training on. For a rider with an FTP20 of 250W and an FTP60 of 220W, the actual relationship between 60 and 20 min results is not 0.95. The FTP20 of 250W corresponds to a 20 min average power of 263W (250/0.95), so the relationship is 0.84 (220/263 = 0.8365).
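The example above can be taken one step further by looking at the resulting threshold zones. This is a rough sketch that assumes the 91-105% of FTP range from the opening question; the function name is hypothetical.

```python
def threshold_zone(ftp: float, low: float = 0.91, high: float = 1.05):
    """Return the (lower, upper) power bounds of the threshold zone in watts."""
    return ftp * low, ftp * high


ftp20, ftp60 = 250, 220  # the two estimates from the example, in watts

error = (ftp20 - ftp60) / ftp20
zone20 = threshold_zone(ftp20)
zone60 = threshold_zone(ftp60)

# Width of the power range that both zones have in common (0 if none).
overlap = max(0.0, min(zone20[1], zone60[1]) - max(zone20[0], zone60[0]))

print(f"difference between tests: {error:.0%}")
print(f"zone from FTP20: {zone20[0]:.1f} to {zone20[1]:.1f} W")
print(f"zone from FTP60: {zone60[0]:.1f} to {zone60[1]:.1f} W")
print(f"overlap between zones: {overlap:.1f} W")
```

With these numbers the two threshold zones share only a few watts, so a session planned from one test would mostly fall outside the zone implied by the other.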

In summary, several of the included riders displayed a relationship between 60 min and 20 min average power far different to the “rule” of 0.95.

Discrepancies between 20 min FTP and lactate profile test

How then, did the 20 min FTP test fare when compared to the lactate profile test?

The differences for individual riders were significant here too.

11 of the 23 riders had a threshold power estimate from the 20 min test that differed by 30-50 watt from that of the lactate profile test. Only 5 of the 23 riders had a 20 min estimate and lactate profile estimate with a difference smaller than 10 watt.

Did one test yield higher results?

Interestingly, there seemed to be no consistency in which of the two FTP tests yielded the higher FTP estimate.

Judging by visual inspection of the graphs in the original paper, riders appear evenly split between those who tested higher with the 20 min test and those who tested higher with the 60 min test.

Reproducing the experiment with MLSS

Of note, the same lead author published a similar experiment in 2019 (3).

This compared the 20 min FTP test with a different lactate profile testing protocol – the Maximal Lactate Steady State (MLSS).

Again, the results point in a similar direction. The average results demonstrated solid agreement between tests.

For 14 of the trained (VO2 max 55-64.9 mL/kg/min) and well-trained (65-71 mL/kg/min) male cyclists, results differed by no more than 5%. A single rider displayed a difference of 10%.

What is the risk?

This entire discussion is relevant to our desire to become faster cyclists. In particular, to which tools we decide to use in controlling training intensity.

Getting your training intensity wrong may result in one of two extremes:

Undershooting your training load
Overshooting your training load

One could argue that the former is less of an evil – if you train too little, you can simply up the ante and be back on track.

Whereas an overload of training stimulus may lead to overreaching, injuries and potentially long breaks in training.

Either way, the result is lack of progress and poor results.

If you have ever trained with a power meter, you will know that changing your power by 20, 30, 40 or 50 watts from an intensity close to anaerobic threshold will make a huge difference in intensity and effort.

At the very least, the above study should prompt riders to consider basing their intensity control on more parameters than FTP alone.

Limitations of study

One obvious limitation to this study was the number of athletes (23). Yet, the individual spread of results is of such a magnitude I suspect a larger subject number would be unlikely to alter the observation of individual differences between test results.

A second, and perhaps bigger limitation is the performance level of the included riders.

In endurance circles, a VO2 max of 59 places these subjects in the middle of the amateur pack, at best. As such, it is quite possible that these results are not representative of more highly trained cyclists.

A third issue is whether or not the riders were familiar with the testing protocols. To the best of my abilities, I cannot see if this was the case in the 2018 study by Borszcz. In the 2019 paper by the same author, which produced better agreements between tests, all riders were already familiar with the 20 min FTP test.

Take home message

The take-aways from this paper include the following:

1 | FTP appears to be a valid construct and estimate of anaerobic threshold whether using the 20 minute or 60 minute test.

2 | Estimates of anaerobic threshold power by the 20 minute FTP test and lactate profile testing may vary by up to 50 watt.

3 | The above results are derived from trained, but not highly trained riders. It may be that the FTP model is a better predictor of individual threshold power for more highly trained riders.

4 | Nevertheless, models like the FTP are usually derived on pooled results from a great number of riders. You have no guarantee that this model will be 100% representative of your physiology.

5 | Repeated testing to familiarize yourself with the test protocol may produce more accurate results.

6 | Nevertheless, it is probably a good idea to correlate your FTP values and training zones to heart rate values, feeling (RPE), lactate measurements, experienced recovery time and training adaptation.

Additional in-depth reading

Did you know that an additional study on FTP testing was published in 2018 by Valenzuela and colleagues (4)?

It provides some intriguing observations that shed further light on the FTP test conundrum.

In this experiment, 20 male riders underwent a 20 minute FTP test and a lactate profile test (MLSS). Interestingly, they observed that threshold power estimates from the 20 min FTP test correlated well with estimates from the lactate profile test for riders that were well-trained (average FTP 3.96 W/kg).

Here too, test results differed by approximately 0-30 W between the two tests in individual riders. In other words, not dissimilar to the Borszcz study, but with a somewhat greater agreement between tests.

Interestingly, for riders on a lower performance level (average FTP 2.93 W/kg) the 20 min FTP test yielded a lower threshold power estimate compared to the lactate profile test.

How do we explain this discrepancy?

One potential answer might be found in the warm-up protocol.

The all-out warm-up protocol

The 20 minute FTP test involves a warm-up protocol including a 5 minute all-out effort. This warm-up protocol was used in both of our discussed FTP studies.

Borszcz and colleagues measured blood lactate immediately following the Allen & Coggan warm-up protocol. Results varied from 3.6 up to 9.6 mmol/L (2). By comparison, the “normal” range of anaerobic threshold is usually considered to be from 2.5 to 4.0 mmol/L (anecdotally, variations may occur).

Not surprisingly, some athletes displayed lactate measurements that we can only interpret as significantly above anaerobic threshold. This would be expected after a 5 minute all-out effort.

However, it would be natural to assume that a highly trained athlete would recover from such a state with less depletion of his/her “performance potential” when compared to a less highly trained rider.

Could it be that the FTP model has a less optimal fit for riders at more novice performance levels due to their ability (or lack thereof) to absorb and recover from the rigorous warm-up protocol? If so, this may carry implications for how less elite riders execute their FTP testing.

For well-trained riders, the all-out warm-up is part of the actual test

It is worth highlighting that the strong correlation of pooled results for FTP20 tests and FTP60 tests was found when using the 5 min all-out warm-up protocol by Allen & Coggan.

In my experience, many riders are not too fond of this warm-up, because they feel it negatively affects their performance in the 20 minute test to come.

Which it probably does. However, it is in this somewhat depleted state that the relationship of 0.95 between 20 minute results and 60 minute results is observed (in well-trained riders).

I would expect that by skipping the all-out 5 min protocol, the well-trained cyclist is fooling him or herself and quite possibly testing an FTP20 result higher than his or her actual threshold power.

Some food for thought.

References:

1. Allen H & Coggan A. Training and racing with a power meter. 2nd edition. Velopress 2010, Boulder, Colorado.
2. Borszcz FK et al. Functional threshold power in cyclists: Validity of the concept and physiological responses. International Journal of Sports Medicine, 2018;39:737-742.
3. Borszcz FK et al. Is the functional threshold power (FTP) interchangeable with the maximal lactate steady state in trained cyclists? International Journal of Sports Physiology and Performance, 2019;24:1-21.
4. Valenzuela PL et al. Is the functional threshold power (FTP) a valid surrogate of the lactate threshold? International Journal of Sports Physiology and Performance, 2018;20:1-6.
