Random Sampling at the Command Line
When you receive a large dataset to analyze, you’d probably want to take a look at the data before fitting any models on it. But what if the dataset is too big to fit into the memory? One way to deal with this is to take much smaller random samples from the dataset. You’ll be able to have some rough ideas about what’s going on. (Of course you cannot get global maximum or minimum through this approach, but that kind of statistics can be easily obtained in linear time with minimal memory requirements) ...