The Academic Placement Data and Analysis project (APDA) hopes to release program-specific placement rates in the next week or two (before April 15th). These placement rates compare placement data to graduation data, so good graduation data are crucial. Yet finding consistent graduation data is surprisingly tricky. The project currently uses the following external sources:
--Review of Metaphysics' "Survey of Graduate Programs: Doctoral Dissertations" 2011-2015
--Survey of Earned Doctorates' "Philosophy by Institution (1973-2014)"
--PhilJobs: Appointments' Data Feed, using Graduation Statistics
--American Philosophical Association 2015 Guide to Graduate Programs in Philosophy
We gather data from multiple sources because each data set is incomplete, and for different reasons. For instance, the Survey of Earned Doctorates gathers data from programs in the United States alone, while the American Philosophical Association collects data from programs in the United States and Canada. Since the Review of Metaphysics publication supplied names, we were able to integrate this information into APDA. For the other three sources we compiled the number of graduates for the years 2012-2015 into a single spreadsheet, assuming the later of the two years when a range was provided (e.g. 2011-2012 is counted as 2012). If I remove the programs that had missing data from all three remaining sources (SED, PhilJobs, APA), then we have data on 105 universities. How do these sources compare to one another and to the data contained in APDA?
Note that in what follows I treat all 0s as missing data, and that I combined the numbers for the two departments at each of Pittsburgh, UC Irvine, and Indiana because the external sources did not consistently separate these departments.
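The preparation steps described above (taking the later year of a range, treating 0s as missing, and merging departments) could be sketched roughly as follows. This is only an illustration under my own assumptions, not the spreadsheet work itself: the function names are hypothetical, and ranges are assumed to be written with a plain hyphen.

```python
from typing import List, Optional


def later_year(label: str) -> int:
    """Return the later year of a range such as '2011-2012', or the year itself."""
    # Assumption: ranges use a plain hyphen, e.g. '2011-2012'.
    return int(label.split("-")[-1])


def clean_count(raw: Optional[int]) -> Optional[int]:
    """Treat a reported 0 the same as a missing value (None)."""
    return raw if raw else None


def combine_departments(counts: List[Optional[int]]) -> Optional[int]:
    """Sum the counts for the departments at one university, skipping missing values."""
    present = [c for c in counts if c is not None]
    return sum(present) if present else None
```

With these helpers, a university whose two departments report 4 graduates and nothing, respectively, would be recorded as 4, while a university with no numbers at all stays missing.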
- The most complete source for these 105 universities (in terms of universities covered) is APDA. It is missing data for between 5 and 9 universities in each year from 2012 to 2014, and for 23 universities in 2015. This is likely because of missing graduation years in our data, and not because we lack placement records for those universities in that time period. Moreover, APDA has data on an additional 21 programs for this time period, not included here.
- The next most complete source is SED. It is missing data for between 13 and 14 of the universities for 2012-2014, with no data for 2015.
- The third most complete source is the APA, which is missing data for between 23 and 39 of the universities for 2012-2014, and for 93 of them in 2015.
- PhilJobs is missing data for between 65 and 70 of the universities in each of the four years.
- Since there is no canonical source for graduation data, the most we can do here is compare across sources. I did this for the three external sources and found that the best fit is between the APA and SED data, which had a median difference of 0 graduates across all four years.
- For all the cases in which more than one of the three sources had data in this four-year period, there were 66 where two or more sources agreed and 58 where two or more sources disagreed by 2 or more graduates in a given year.
- If we include the missing values, these three datasets are significantly different (all p-values below .05, not correcting for multiple comparisons).
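The completeness and agreement checks in the list above could be reproduced along these lines. This is a minimal sketch rather than the analysis actually run for APDA: the `counts` values are made up, the function names are hypothetical, and it assumes the median is taken over signed differences for the university-years in which both sources report a number.

```python
from itertools import combinations
from statistics import median

# Made-up counts: {source: {(university, year): graduates, or None if missing}}
counts = {
    "SED": {("A", 2012): 5, ("A", 2013): 4, ("B", 2012): None, ("B", 2013): 3},
    "APA": {("A", 2012): 5, ("A", 2013): 6, ("B", 2012): 2, ("B", 2013): 3},
    "PhilJobs": {("A", 2012): None, ("A", 2013): 4, ("B", 2012): None, ("B", 2013): None},
}


def missing_by_year(source):
    """Count the universities with no data in each year."""
    out = {}
    for (_, year), n in source.items():
        out[year] = out.get(year, 0) + (1 if n is None else 0)
    return out


def median_difference(a, b):
    """Median of a - b over university-years where both sources have a value."""
    diffs = [a[k] - b[k] for k in a if a[k] is not None and b.get(k) is not None]
    return median(diffs) if diffs else None


for name, source in counts.items():
    print(name, "missing by year:", missing_by_year(source))

for first, second in combinations(counts, 2):
    print(first, "vs", second, "median difference:",
          median_difference(counts[first], counts[second]))
```

A median difference of 0 only says that disagreements are not systematically in one direction; it is compatible with many individual university-years differing, which is why the count of disagreements is reported separately.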
Given these issues in completeness and accuracy, there are two ways of combining the graduation data that I find reasonable.
One possibility is to assume the highest number of graduates in any given year. The reasoning behind this is that none of the sources is likely to overestimate the number of graduates: the SED only claims to capture data on around 85% of graduates, and so is likely to have numbers that are lower than the actual number of graduates, whereas APA and PhilJobs data are supplied by department chairs and placement officers. APDA and Review of Metaphysics use names, which allows for greater accuracy, and we removed all duplicates from the combined dataset.
The other possibility is to assume the mean number of graduates in any given year (excluding 0 values unless all values are 0). An advantage of this option is that it may smooth out any errors in the sources. A problem with this option, however, is that it likely underestimates graduates: both SED and APDA are known to be missing graduates, so averaging over them will pull the totals down.
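To make the difference between the two options concrete, here is a small sketch of both rules applied to the numbers reported for one university in one year. The function names and example values are hypothetical, and missing values are assumed to be represented as `None`.

```python
from statistics import mean


def _nonzero(values):
    """Keep only reported, non-zero counts."""
    return [v for v in values if v is not None and v != 0]


def combine_max(values):
    """Option 1: take the highest count reported by any source."""
    reported = [v for v in values if v is not None]
    if not reported:
        return None  # no source has data for this university-year
    nonzero = _nonzero(values)
    return max(nonzero) if nonzero else 0


def combine_mean(values):
    """Option 2: take the mean, excluding 0s unless every reported value is 0."""
    reported = [v for v in values if v is not None]
    if not reported:
        return None  # no source has data for this university-year
    nonzero = _nonzero(values)
    return mean(nonzero) if nonzero else 0


# Hypothetical counts from four sources for one university in one year.
example = [5, 4, None, 6]
print("max rule:", combine_max(example))
print("mean rule:", combine_mean(example))
```

Whenever the sources disagree, the mean rule returns a value below the highest report, which is the underestimation concern raised above if the sources tend to miss graduates rather than invent them.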
For this reason, I am leaning toward the former approach, but I would love to have feedback on this! A CSV file with the data is here.