Evaluating the role of pre-training dataset size and diversity on single-cell foundation model performance