Imagine two libraries, each with its own set of books and its own layout. In our analogy, a book is a file, and the number of pages in a book is the file size. To copy a set of books from the first library to the second, you would do the following in sequence for each book on your list:
- Find the book in the source library (1 minute).
- Make a copy of it (100 pages/minute).
- Store the book in the destination library and update the index cards (2 minutes).
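The three steps above can be sketched as a toy time model. The constants are just the illustrative numbers from the analogy (1 minute to find, 100 pages per minute to copy, 2 minutes to store), not real-world figures:

```python
# Toy model of the per-book copy time from the analogy.
FIND_MINUTES = 1    # locate the book in the source library
COPY_PPM = 100      # pages copied per minute
STORE_MINUTES = 2   # shelve the copy and update the index cards

def copy_time(pages):
    """Total minutes to copy one book of the given page count."""
    return FIND_MINUTES + pages / COPY_PPM + STORE_MINUTES

print(copy_time(100))   # one 100-page book -> 4.0 minutes
print(copy_time(1000))  # one 1,000-page book -> 13.0 minutes
```

Note that two of the three terms are fixed per book, so small files pay a proportionally larger overhead.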
Depending on the size and number of books, the total time to copy can vary drastically. The first row below is a reference case of 100 pages; the final three rows show different ways of copying 1,000 pages.
| Books | Time To Find | Time To Copy | Time To Store | Total Time |
|---|---|---|---|---|
| one 100 page book | 1 minute | 1 minute | 2 minutes | 4 minutes |
| one 1,000 page book | 1 minute | 10 minutes | 2 minutes | 13 minutes |
| two 500 page books | 2 minutes | 10 minutes | 4 minutes | 16 minutes |
| ten 100 page books | 10 minutes | 10 minutes | 20 minutes | 40 minutes |
We can measure our progress using two different rates: pages per minute and books per minute (the analogues of bytes per minute and files per minute). That gives the following data:
| Books | Pages Per Minute (ppm) | Books Per Minute |
|---|---|---|
| one 100 page book | 100 pages / 4 minutes = 25 | 1 book / 4 minutes = 0.25 |
| one 1,000 page book | 1,000 pages / 13 minutes = ~77 | 1 book / 13 minutes = ~0.08 |
| two 500 page books | 1,000 pages / 16 minutes = ~63 | 2 books / 16 minutes = ~0.13 |
| ten 100 page books | 1,000 pages / 40 minutes = 25 | 10 books / 40 minutes = 0.25 |
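Both rate tables fall out of the same per-book time model. A short sketch, again using only the analogy's illustrative constants:

```python
# Compute pages/minute and books/minute for a batch of books,
# using the analogy's numbers: 1 min find, 100 pages/min copy, 2 min store.
FIND, PPM, STORE = 1, 100, 2

def batch_stats(page_counts):
    """Return (pages per minute, books per minute) for the whole batch."""
    total_minutes = sum(FIND + p / PPM + STORE for p in page_counts)
    total_pages = sum(page_counts)
    return total_pages / total_minutes, len(page_counts) / total_minutes

for batch in ([100], [1000], [500, 500], [100] * 10):
    ppm, bpm = batch_stats(batch)
    print(f"{batch}: {ppm:.0f} ppm, {bpm:.2f} books/min")
```

Running this reproduces the table: many small books keep the pages-per-minute rate low because the fixed find/store overhead dominates.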
Now, if you are given a list of books that vary in size, the apparent speed changes as the copying proceeds. If I'm copying a 100 page book and then a 1,000 page book, it looks like it's going 25 ppm for the first 4 minutes and then jumps up to ~77 ppm for the next 13.
On top of all this, both hardware and software designers want to make things faster, so they add optimizations in various spots. Some examples in terms of the above analogy:
- If the books you are copying all sit next to each other on a shelf, maybe you grab two books instead of one, even if you aren't sure you need the second. If you do need it, that saved you a minute; if you don't, no harm done, you just fetch the next one on your list.
- Another optimization might be that instead of storing the book in the destination library yourself, you hand it to someone else who shelves it, if they are free. So if I copy two 50 page books, the first takes just 1.5 minutes (1 minute to retrieve, 0.5 minutes to copy, and handing it off takes no time). The second book (the same 50 pages) takes 2 minutes (1 minute to retrieve, 0.5 minutes to copy, then 30 seconds waiting for the shelver to free up). That means the rate is 33 ppm for the first 1.5 minutes, then drops to 25 ppm for the next 2.
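The hand-off optimization in the second bullet can be sketched as a tiny simulation. The constants are the analogy's numbers, and the model tracks two clocks: when the copier is free and when the helper (shelver) is free:

```python
# Sketch of the "hand the book to a shelver" optimization: storing overlaps
# with fetching/copying the next book. Times are the analogy's numbers.
FIND, PPM, STORE = 1.0, 100, 2.0

def pipelined_total(page_counts):
    """Minutes until the copier hands off the last book.

    The final store may still be in flight when this returns, matching
    the analogy, which counts a book as 'done' at hand-off.
    """
    t = 0.0             # when the copier is free
    shelver_free = 0.0  # when the helper finishes the previous store
    for pages in page_counts:
        t += FIND + pages / PPM      # fetch and copy the book
        t = max(t, shelver_free)     # wait if the helper is still busy
        shelver_free = t + STORE     # hand off; store happens in parallel
    return t

print(pipelined_total([50, 50]))  # two 50-page books -> 3.5 minutes
```

The first book hands off at 1.5 minutes; the second finishes copying at 3.0 but waits until 3.5 for the shelver, exactly the 30-second stall described above.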
All of this makes predicting how long a copy will actually take very difficult. Your computer doesn't know either; the estimates you see are just guesses, and they are often notoriously bad ones.
So your computer doesn't copy "byte by byte". For each file it goes through a process: finding where the data is, then copying and saving it. Each step takes time, and only part of that time depends on the actual data; the rest depends on the number of files. There are, in fact, ways of making "byte by byte" copies of USB drives, which you can do when backing up. They are faster, but all the data ends up smooshed together, and you need to sort it out later for it to be useful.