Having all the pairs of spring images classified means I can now analyze them! In particular, I’m using these images to get a best estimate for the “start of spring” and the “end of spring”. These metrics are used in remote sensing, both in the automated greenness processing of PhenoCam images and by researchers who use satellite imagery to understand Earth’s vegetation.
Several weeks ago, I wrote about what images are in Season Spotter and how we will do the analysis. Read about it here, if you haven’t already.
Let’s look at some actual data from a camera called “canadaOA”, which is in Prince Albert National Park, Saskatchewan, Canada. First, here’s what the view looks like at this camera in spring:
Now here’s a graph showing the distribution of classifications you provided for this site in 2014 when image pairs were spaced one day apart:
Here is how to read this graph. Going from left to right, we have days during the spring, with some late winter and early summer days on either end to make sure we capture the full spring. Each green bar shows how many people classified the chronologically later image as the one with bigger or greener leaves. In other words, the green bars show confidence that volunteers could see spring changes. The blue bars show how many people said “the images are the same” or said that the earlier image had bigger or greener leaves. In other words, blue bars show confidence that spring changes could NOT be seen between the two images. The red bars show how many people said at least one of the images was a bad image. If more than half of the people said that there was a bad image, we don’t use any data from that pair. You can see, for example, that the longest red bar doesn’t have any green or blue bars above it. All the bars are scaled so that the longest possible bar means “everybody who saw this pair of images” and a bar half that long means “half of all people who saw this pair of images”.
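To make the bar scaling concrete, here is a minimal Python sketch of how the three bar lengths for one image pair could be computed. It is only an illustration, not the actual Season Spotter code: the function name `summarize_pair` and the vote labels `"later"`, `"same"`, `"earlier"`, and `"bad"` are made up for this example.

```python
from collections import Counter


def summarize_pair(votes):
    """Turn the raw votes for one image pair into bar lengths between 0 and 1.

    `votes` is a list of hypothetical labels, one per volunteer:
      "later"   - the chronologically later image has bigger/greener leaves
      "same"    - the images look the same
      "earlier" - the earlier image has bigger/greener leaves
      "bad"     - at least one image is a bad image
    """
    total = len(votes)
    if total == 0:
        return None  # nobody saw this pair

    counts = Counter(votes)
    red = counts["bad"] / total

    # If more than half of the people flagged a bad image, drop the pair:
    # no green or blue bar is drawn for it.
    if red > 0.5:
        return {"green": 0.0, "blue": 0.0, "red": red, "usable": False}

    return {
        "green": counts["later"] / total,                    # change seen
        "blue": (counts["same"] + counts["earlier"]) / total,  # no change seen
        "red": red,
        "usable": True,
    }


if __name__ == "__main__":
    example = ["later"] * 6 + ["same"] * 2 + ["earlier"] + ["bad"]
    print(summarize_pair(example))
```

A bar length of 1.0 corresponds to “everybody who saw this pair of images” and 0.5 to half of them, which matches the scaling described above.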
The orange and purple dotted lines are the best guess “start of spring” and “end of spring” dates based on this data. In this case, the orange line is between May 21 and May 22, indicating that May 22 is the “start of spring”. And the purple line is between May 25 and May 26, indicating that May 25 is the “end of spring”.
Hmm, but it seems a bit odd that spring is only 4 days long. It generally takes longer than that for buds to open into leaves and for those leaves to grow to full size. Let’s look at the greenness curve for this site in 2014:
From the greenness curve, we see that start of spring should be around mid-May and that the end of spring isn’t until the very beginning of June. It looks like the May 21-25 period is the steepest part of the curve — where day-to-day change might be most obvious.
Let’s look next at the data from this same site and year where images were 3 days apart:
We see the same sort of pattern again, but now we have more confidence that things are still changing later on in the spring. Our estimate now is that we start to see change between May 19 and May 22, and that we stop seeing change between June 3 and June 6. This makes sense. It’s easier to see that something has changed when the images are three days apart than when they are one day apart.
And if we look at the data from when images were 7 days apart, it looks like this:
Here, we start to see change between May 15 and May 22 and stop seeing change between June 1 and June 8. That seems pretty accurate based on the greenness curve. But those ranges are really big. We’d really like to know what day is the start of spring, not in what week it occurred.
We can get a day estimate from the week-apart images. To do so, we create a new dataset derived from the 7-day-apart one. We take the classifications from a pair of images 7 days apart and consider those classifications valid for each day in that range. So each day draws on classifications from seven different image pairs (or fewer if the day is at the beginning or end of the time period we’re looking at). This also has the advantage of smoothing over pairs of images that were bad and pairs where the change was simply hard to judge. Our new smoothed dataset looks like this:
This dataset tells us that the start of spring is May 19 and the end of spring is June 5. This seems very reasonable when we look at the greenness time series.
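The smoothing step can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the code used for the analysis: the function name `smooth_pairs` and the input layout are hypothetical, pairs flagged as bad are assumed to have been dropped already, and each pair’s votes are assumed to count toward the seven days after its first image (so a change seen between two images is credited up to and including the later date).

```python
from collections import defaultdict
from datetime import date, timedelta


def smooth_pairs(pairs, gap_days=7):
    """Spread each pair's votes over every day that the pair spans.

    `pairs` is an iterable of (first_image_date, change_votes, no_change_votes).
    Returns {day: fraction of votes on that day that saw a change}.
    """
    daily = defaultdict(lambda: [0, 0])  # day -> [change, no_change]

    for first_date, change, no_change in pairs:
        # Assumption: the pair covers the gap_days days after its first image.
        for offset in range(1, gap_days + 1):
            day = first_date + timedelta(days=offset)
            daily[day][0] += change
            daily[day][1] += no_change

    return {
        day: change / (change + no_change)
        for day, (change, no_change) in sorted(daily.items())
        if change + no_change > 0
    }


if __name__ == "__main__":
    # Two made-up overlapping pairs, starting one day apart.
    pairs = [
        (date(2014, 5, 15), 8, 2),
        (date(2014, 5, 16), 2, 8),
    ]
    for day, fraction in smooth_pairs(pairs).items():
        print(day, round(fraction, 2))
```

With a pair starting on every day, days in the middle of the season pool votes from seven pairs, while days near either end pool fewer, as described above.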
I’ve done this same analysis for the seven sites and all 31 site-years that you made classifications for. And the same points seem to be true across them all:
- People have a hard time seeing differences in the leaves when paired images are only one day apart. (For some sites, people almost never see changes between one-day-apart images.)
- People are most able to see differences in the leaves when paired images are seven days apart.
- Using a smoothed dataset derived from the one where paired images are seven days apart seems to give good estimates for start of spring and end of spring.
The next thing for me to do is to measure the uncertainty in these estimates for start of spring and end of spring. And then I am going to compare our estimates from Season Spotter with some automated estimates done by running algorithms directly on the greenness curves. I’ll talk about these analyses in a future post.