I recently got a new laptop, and while setting it up to my preferences, I installed LaTeX through TeXlive. That meant a massive download of many small packages that together make up the LaTeX install. In effect, this is how most software downloads go: many small parts that make up the whole. Installing TeXlive on Linux gave me the chance to actually see a report of the download and, of course, to save it and plot it after completion. Here is what the data output to the console looks like during the install:

[Figure: sample of the installer's console output during the download]

After 3 downloads, the installer makes a prediction of the total time, and from then on it reports the elapsed time against the predicted time, along with some information about the current download. If we collect this information for all 3188 packages and parse out the relevant fields, we can plot the actual time versus the predicted time to see how the prediction performs over the course of the install.
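A minimal Python sketch of the parsing step follows. It assumes each progress line looks like `[42/3188, 01:23/12:34] install: pkgname [123k]`, with the elapsed and predicted times as `mm:ss` (or `h:mm:ss`) and shown as `??:??` before the first estimate. That line format is an assumption about the installer's output, so check it against your own saved log; the names `LINE_RE`, `Progress`, `to_seconds` and `parse_log` are hypothetical.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# ASSUMED progress-line format (verify against your own saved log):
#   [42/3188, 01:23/12:34] install: pkgname [123k]
# Before the first estimate, both times are assumed to print as ??:??.
LINE_RE = re.compile(
    r"\[(?P<idx>\d+)/(?P<total>\d+),\s*"
    r"(?P<elapsed>[\d?:]+)/(?P<predicted>[\d?:]+)\]\s*"
    r"install:\s*(?P<name>\S+)\s*\[(?P<size>\d+)k\]"
)


class Progress(NamedTuple):
    index: int                  # package number, 1-based
    total: int                  # total number of packages
    elapsed_s: Optional[int]    # elapsed time so far, None if not yet shown
    predicted_s: Optional[int]  # predicted total time, None if not yet shown
    name: str                   # package being installed
    size_kb: int                # package size in kB


def to_seconds(stamp: str) -> Optional[int]:
    """Convert 'mm:ss' or 'h:mm:ss' to seconds; None for placeholders like '??:??'."""
    if "?" in stamp:
        return None
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def parse_log(lines: Iterable[str]) -> List[Progress]:
    """Keep only the progress lines and pull out their fields."""
    records = []
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # skip banners, warnings, blank lines, etc.
        records.append(Progress(
            index=int(m["idx"]),
            total=int(m["total"]),
            elapsed_s=to_seconds(m["elapsed"]),
            predicted_s=to_seconds(m["predicted"]),
            name=m["name"],
            size_kb=int(m["size"]),
        ))
    return records
```

With a saved log (say, a hypothetical `install.log`), `with open("install.log") as f: records = parse_log(f)` gives one record per package; plotting `elapsed_s` and `predicted_s` against `index`, or against the running sum of `size_kb`, produces a time series like the one below.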

[Figure: actual elapsed time versus predicted total time over the course of the download]

There are some fairly large swings in the predicted time early in the download, but by about 25% of the total download (by size), the prediction becomes quite stable, making only minor corrections. These small corrections continue until the very end of the download.
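One way to put a number on "becomes stable" is to find the fraction of bytes downloaded after which every prediction stays within some relative tolerance of the actual total time. The sketch below does that; the 5% tolerance and the function name `settle_fraction` are illustrative assumptions, not something the installer reports.

```python
from typing import Optional, Sequence


def settle_fraction(sizes_kb: Sequence[int],
                    predicted_s: Sequence[Optional[float]],
                    final_s: float,
                    tol: float = 0.05) -> Optional[float]:
    """Fraction of total bytes downloaded at the start of the final run of
    predictions that all stay within `tol` (relative) of `final_s`.

    Returns None if the last prediction is itself outside the tolerance.
    """
    total = sum(sizes_kb)
    done = 0
    settled_at: Optional[float] = None
    for size, pred in zip(sizes_kb, predicted_s):
        done += size  # count this package as downloaded
        within = pred is not None and abs(pred - final_s) <= tol * final_s
        if not within:
            settled_at = None          # prediction wandered off: reset
        elif settled_at is None:
            settled_at = done / total  # start of a new in-tolerance run
    return settled_at
```

Here `final_s` would be the last elapsed time in the log (the actual total), and the two sequences would come from the per-package sizes and predictions in download order. Tightening `tol` pushes the settling point later, so the answer depends on what you are willing to call "minor corrections".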

Download time prediction is a really interesting problem to work on, since you are trying to account for download speed, which depends largely on factors outside the personal computer and is likely to vary over timescales longer than a few minutes. I’ll be writing a few posts on this topic over the next few months, culminating in what I hope is a simple, fast, and accurate download time prediction algorithm. More to come!
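As a point of reference, here is a minimal sketch of two common baseline estimators. It is not taken from the installer's source, and it is not the algorithm promised above. The first baseline assumes the whole-run average rate holds for the remaining bytes; the second divides the remaining bytes by an exponentially weighted moving average (EWMA) of recent per-chunk rates. The class name `EtaEstimator` and the default `alpha=0.1` are hypothetical choices.

```python
from typing import Optional


class EtaEstimator:
    """Hypothetical estimator exposing two baseline total-time predictions."""

    def __init__(self, total_bytes: int, alpha: float = 0.1):
        self.total_bytes = total_bytes
        self.alpha = alpha  # weight given to the newest rate sample (assumed)
        self.done_bytes = 0
        self.elapsed_s = 0.0
        self.ewma_rate: Optional[float] = None  # smoothed bytes per second

    def update(self, nbytes: int, seconds: float) -> None:
        """Record one finished chunk (e.g. one package) and how long it took."""
        self.done_bytes += nbytes
        self.elapsed_s += seconds
        if seconds > 0:
            rate = nbytes / seconds
            if self.ewma_rate is None:
                self.ewma_rate = rate  # first sample seeds the average
            else:
                self.ewma_rate = self.alpha * rate + (1 - self.alpha) * self.ewma_rate

    def predict_total_average(self) -> Optional[float]:
        """Total time if the whole-run average rate holds for the remaining bytes."""
        if self.done_bytes == 0:
            return None
        return self.elapsed_s * self.total_bytes / self.done_bytes

    def predict_total_ewma(self) -> Optional[float]:
        """Elapsed time plus the remaining bytes at the smoothed recent rate."""
        if not self.ewma_rate:
            return None
        remaining = self.total_bytes - self.done_bytes
        return self.elapsed_s + remaining / self.ewma_rate
```

The whole-run average is steady but slow to notice a change in connection speed, while the EWMA reacts faster at the cost of noisier predictions; `alpha` sets that trade-off.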
