

So let me first start the question with my understanding of how MongoDB stores data on disk: when you create a database in MongoDB, it allocates a large file named `<dbname>.0` and, within that file, allocates extents, which are contiguous regions that correspond to data for a particular collection or a particular index. At such point as this datafile is filled, it creates a new file called `<dbname>.1`. Therefore it seems sensible to assume that the most recently inserted data into a particular database will be in the highest-numbered file (and my performance tests confirm this). However, I can't see how this could be true for indices. Since we're talking about a B-tree, it doesn't seem possible or sensible to have this B-tree scattered across files in the same way. As Mongo is doing the maintenance for an index, does the whole index live in one extent until it outgrows it, at which point it is relocated to the current (highest-numbered) datafile?

This has become important to me because, when starting a database from an Amazon EBS snapshot, there seems to be a huge overhead for hitting these datafiles until the volume warms up. I am only interested in a subset of the most recent N documents from a collection. If I could be sure I was only going to need the most recent couple of datafiles, I could prewarm these files by reading them sequentially before starting mongod.

The delay you are seeing when loading from a snapshot is not caused by how indexes are laid out on disk. It's far more likely that you are seeing it because, when you start an instance from a snapshot, the data is loaded only on first use and will be significantly slower than on subsequent uses. That is a basic limitation of using snapshots in this way and really has little to do with the application that is trying to access the disk. That's why you will see guides on "how to warm up an EBS volume" and the like (there are penalties on writing for the first time too).
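A minimal sketch of the prewarming idea from the question, in Python. It assumes the datafile layout described there (files named `<dbname>.0`, `<dbname>.1`, … sitting directly in the dbpath, i.e. no `--directoryperdb`); `numbered_datafiles`, `prewarm_latest` and `CHUNK_SIZE` are hypothetical names, not part of any MongoDB tool. Reading every block once pays the first-use penalty up front, so mongod does not pay it later. It only warms the chosen datafiles: as the question itself suspects, index extents may live in lower-numbered files and would stay cold.

```python
import re
from pathlib import Path

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time; the bytes themselves are discarded


def numbered_datafiles(dbpath, dbname):
    """Return (number, path) pairs for files named <dbname>.0, <dbname>.1, ...

    Assumes all datafiles sit directly in dbpath (hypothetical layout, no
    --directoryperdb). Files like <dbname>.ns are skipped because their
    suffix is not numeric.
    """
    pattern = re.compile(re.escape(dbname) + r"\.(\d+)$")
    files = []
    for entry in Path(dbpath).iterdir():
        match = pattern.match(entry.name)
        if match and entry.is_file():
            files.append((int(match.group(1)), entry))
    # Sort numerically so that <dbname>.10 comes after <dbname>.9.
    files.sort(key=lambda pair: pair[0])
    return files


def prewarm_latest(dbpath, dbname, count=2):
    """Sequentially read the `count` highest-numbered datafiles.

    Touching every block forces it to be fetched from the snapshot now
    rather than on mongod's first access. Returns the number of bytes read.
    """
    latest = numbered_datafiles(dbpath, dbname)[-count:] if count > 0 else []
    total = 0
    for _, path in latest:
        with open(path, "rb") as f:
            while True:
                block = f.read(CHUNK_SIZE)
                if not block:
                    break
                total += len(block)
    return total
```

For example, `prewarm_latest("/data/db", "mydb", count=2)` (hypothetical path and database name) would be run before starting mongod. If guessing which files matter turns out to be unreliable, the warm-up guides mentioned in the answer take the blunter route of reading every block of the whole device.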
