We selected the probesets used as the basis for discriminating between cell types by screening for those with the most significant differences in expression among the several cell types in which they were most highly expressed. To optimize the number of markers selected, we computed the condition number of basis matrices of all sizes, from a handful of genes at one extreme to the whole genome at the other. The optimal set size was 360 probesets, and we used this set to distinguish between immune cell subsets and activation states in all subsequent analyses of blood samples. Figure 3 shows examples of these probesets, which discriminate between cell types and are used in deconvolution. Most of them are markers that are relatively specific for one or two cell types. The full collection of basis probesets and their expression levels in all cell types and states is given in Table S1.

We surveyed the distribution of these data by two-dimensional hierarchical clustering and visualized the results as a heatmap with distance-measure dendrograms. The cells all appeared to have distinct expression signatures, to be separated reasonably well on the dendrogram, and to cluster near other samples that we expected to have relatively similar signatures. To examine quantitatively whether the eighteen cell types that we profiled are sufficiently distinct to be resolved by their expression signatures, we performed singular value decomposition (SVD) on the basis matrix and inspected the values of the diagonal matrix. If some of the cells were inadequately different from each other, the values at the lower-right corner of the diagonal matrix would be near zero; reassuringly, the lowest value here was 3702.301.
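The size screen described above can be sketched as follows. This is a minimal illustration and not the original analysis code: it assumes the probesets are already ranked by significance, that the "optimal" size is the one with the lowest 2-norm condition number (the text does not state the exact criterion), and it uses synthetic data. The names `pick_basis_size`, `expr_ranked` and `sizes` are hypothetical.

```python
import numpy as np

def pick_basis_size(expr_ranked, sizes):
    """Condition number of the top-k probesets for each candidate size k.

    expr_ranked: probesets x cell types, rows ordered best marker first
    (the ranking itself is assumed to have been done beforehand).
    Returns the size with the lowest condition number and all values.
    """
    conds = {}
    for k in sizes:
        sub = expr_ranked[:k, :]               # candidate basis matrix
        conds[k] = float(np.linalg.cond(sub))  # largest / smallest singular value
    best = min(conds, key=conds.get)
    return best, conds

# Synthetic stand-in data (assumption): 400 marker-like rows followed by
# 1600 rows that are nearly flat across the 18 cell types.
rng = np.random.default_rng(0)
n_types = 18
markers = rng.lognormal(4.0, 0.5, size=(400, n_types))
markers[np.arange(400), np.arange(400) % n_types] *= 20.0  # one peak per row
background = rng.lognormal(6.0, 0.1, size=(1600, n_types))
expr_ranked = np.vstack([markers, background])

sizes = [18, 50, 100, 200, 400, 800, 2000]
best_size, conds = pick_basis_size(expr_ranked, sizes)
for k in sizes:
    print(k, round(conds[k], 2))
print("size with lowest condition number:", best_size)
```

Because the condition number is the ratio of the largest to the smallest singular value, the same decomposition also provides the diagonal values used to judge whether the cell types are distinguishable.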
Although the smallest singular value is not near zero and is therefore not a concern, it does represent the aspect of white blood cell biology that we had resolved least successfully, so we explored which cells were responsible for it. The two memory B cell samples were the pair most similar to each other, and we hypothesized that they alone might account for the low end of the SVD diagonal. We tested this by removing the IgM memory population from the basis matrix and recomputing the decomposition; the resulting diagonal very closely resembled the previous one, but with the lowest value missing.
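The check described in this paragraph can be illustrated with a short sketch: find the most similar pair of columns, drop one of them, and compare the singular values before and after. The data are synthetic, the near-duplicate column is planted deliberately, and the helper names `most_similar_columns` and `spectrum` are hypothetical; Pearson correlation is an assumed similarity measure, not necessarily the one used in the study.

```python
import numpy as np

def most_similar_columns(basis):
    """Indices of the two columns with the highest Pearson correlation."""
    corr = np.corrcoef(basis.T)       # column-by-column correlation matrix
    np.fill_diagonal(corr, -np.inf)   # ignore each column's self-correlation
    i, j = np.unravel_index(np.argmax(corr), corr.shape)
    return int(min(i, j)), int(max(i, j))

def spectrum(basis):
    """Singular values of the basis matrix, largest first."""
    return np.linalg.svd(basis, compute_uv=False)

# Synthetic 360 x 18 basis matrix (assumption); column 17 is made a noisy
# copy of column 16 to mimic two closely related cell populations.
rng = np.random.default_rng(1)
basis = rng.lognormal(mean=6.0, sigma=1.0, size=(360, 18))
basis[:, 17] = basis[:, 16] * rng.normal(1.0, 0.02, size=360)

pair = most_similar_columns(basis)
s_full = spectrum(basis)
s_reduced = spectrum(np.delete(basis, pair[1], axis=1))  # drop one of the pair

# Relative difference between the reduced spectrum and the full spectrum
# with its lowest value left out.
rel_shift = np.abs(s_reduced - s_full[:-1]) / s_full[:-1]
print("most similar columns:", pair)
print("lowest value, full vs reduced:", s_full[-1], s_reduced[-1])
print("largest relative shift among remaining values:", rel_shift.max())
```

By the interlacing property of singular values, each value of the reduced matrix lies between two consecutive values of the full matrix, so the lowest value can only rise or stay the same when a column is removed. How closely the remaining values match depends on the data.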