How We Matched Thousands of NYC School Records for Our School Mental Health Tool
Publishing data about 1,575 NYC schools meant combining data from two city agencies. Here’s how we did it.
As part of an effort by the City Council to understand and limit the scope of police intervention in the education system, the NYPD has been required to keep track of 911 calls from public schools since 2016.
While most 911 calls originating from schools are for medical emergencies, crimes or dangerous situations, a substantial number are made to seek intervention for children in emotional distress. A 2014 settlement in a lawsuit brought by parents was supposed to stop these so-called “child in crisis” calls. But we found that many schools are still calling the police when dealing with kids who are having tantrums and outbursts. In fact, the police responded to 2,655 such calls in 2022.
We built an interactive tool that lets New Yorkers see which schools are still calling 911 in situations where a kid is emotionally distressed. Our tool also lets people see the presence of social workers, school psychologists, and other mental health support systems in each school.
Although the NYPD breaks calls down by school (as well as police precinct), we faced a significant obstacle to including its school-specific “child in crisis” data in our tool. While the Department of Education uses a unique code to identify each school, the police department has no standard identifier or naming convention for schools. Dealing with the lack of consistent naming across city agencies, as well as the NYPD data’s own idiosyncrasies, was a challenge we had to overcome.
The Naming of Schools Is a Difficult Matter
The DOE assigns each school a unique 6-character code called a DBN, short for District-Borough-Number. DBN is the quickest and most reliable way to combine the datasets published by the DOE. It’s a standard code used both inside the school system and by researchers and journalists doing analysis. The NYPD’s 911 call data uses its own naming custom, neither DBN nor the “official” school name.
For example, what the DOE calls “J.H.S. 014 Shell Bank” (DBN 22K014), the NYPD calls “IS 14”. This can be especially confusing for data about 911 calls from buildings that house multiple schools, or from schools split between multiple buildings.
Take Grand Street Campus in Brooklyn, which accounted for 32 “child in crisis” calls between 2016 and 2022. We could not pinpoint which school within that building accounted for each call since it houses three schools.
We spent weeks overcoming two thorny problems: cases in which the names were only partially intelligible, and cases in which the NYPD referenced a building name but did not identify a specific school. In the former case, we were able to find the correct school by doing research, but in the latter case, we were typically not able to assign calls to a specific school.
Overall, we matched 1,689 of the 1,995 unique school campus names in the NYPD data through computation as well as manual searches. The analysis in our story and the citywide number on the main page of our tool both include all “child in crisis” calls, including ones for which we could not pinpoint a given school.
Easy Matches, Then Harder Matches
We started by running a quick exact match of the NYPD names against the Department of Education’s enrollment datasets, guidance counselor reports, and a school safety dataset from 2016. The effort matched only a few hundred names, nowhere close to our goal. We then wrote Python code that “fuzzied” the match. Our code removed symbols and whitespace and expanded abbreviations, so “HS” became “High School.” We also created a column that provided the borough of each school campus. Doing this expanded our match to more than 1,000 school names. The final step involved pairing the names that only partially matched (by a word or two) with the DOE school names. At the end of this process, we had found school numbers for 1,442 names, leaving 553 unmatched.
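The cleanup and partial-match steps described above can be sketched roughly as follows. This is an illustration, not the code we ran: the abbreviation table (beyond “HS”), the list of generic words, the function names and the second sample school are all assumptions made for the example.

```python
import re

# Hypothetical abbreviation table. Only "HS" -> "High School" is confirmed
# by the text above; the other entries are assumptions for illustration.
ABBREVIATIONS = {
    "HS": "HIGH SCHOOL",
    "JHS": "JUNIOR HIGH SCHOOL",
    "IS": "INTERMEDIATE SCHOOL",
    "MS": "MIDDLE SCHOOL",
    "PS": "PUBLIC SCHOOL",
}

# Assumed list of words too common to count as evidence of a match.
GENERIC = {"SCHOOL", "HIGH", "JUNIOR", "INTERMEDIATE", "MIDDLE", "PUBLIC"}


def normalize(name):
    """Uppercase, strip symbols, expand abbreviations, tidy whitespace."""
    text = name.upper().replace(".", "")      # "J.H.S." -> "JHS"
    text = re.sub(r"[^A-Z0-9 ]", " ", text)   # other symbols become spaces
    tokens = []
    for tok in text.split():
        if tok.isdigit():
            tok = str(int(tok))               # drop leading zeros: "014" -> "14"
        tokens.append(ABBREVIATIONS.get(tok, tok))
    return " ".join(tokens)


def partial_candidates(nypd_name, doe_names, min_shared=1):
    """Return DOE names sharing at least min_shared non-generic tokens."""
    wanted = set(normalize(nypd_name).split()) - GENERIC
    hits = []
    for doe_name in doe_names:
        shared = wanted & (set(normalize(doe_name).split()) - GENERIC)
        if len(shared) >= min_shared:
            hits.append(doe_name)
    return hits


doe_names = ["J.H.S. 014 Shell Bank", "P.S. 999 Example School"]  # second name is made up
print(normalize("IS 14"))
print(partial_candidates("IS 14", doe_names))
```

Candidates found this way would still need checking by hand, and comparing only names within the same borough (the extra column mentioned above) would keep a school number that repeats across boroughs from producing false matches.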
Matching the remaining names was an uphill task because a significant number of the calls were associated with the names of buildings that house multiple schools without listing any details about which school in the building was responsible for placing the call. New York City schools are sometimes “co-located” with other schools in large buildings that bear the name of the single, large school that used to be there.
While we couldn’t match all of the remaining names, we were able to successfully identify a subset using clues in the remaining data.
For example, consider a school the NYPD calls “IS/PS 298Q.” We used tools such as DOE Buildings and Contacts and Find a School to confirm that the school number is “Q298”, derived from the numbers mentioned in the school campus name. In cases where the codes we generated didn’t match with any school numbers, we matched them with the building code.
Using techniques like this, we slowly matched another 240 school names.
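As a rough sketch of that code-derivation step, assume the NYPD name ends with a school number followed by a one-letter borough code (M, X, K, Q or R). The regular expression, the zero-padding to three digits and the lookup sets below are assumptions for illustration; in practice each derived code was confirmed against the DOE tools named above.

```python
import re

# Borough letters used in DOE codes:
# M = Manhattan, X = Bronx, K = Brooklyn, Q = Queens, R = Staten Island.
CODE_PATTERN = re.compile(r"(?<!\d)(\d{1,3})\s*([MXKQR])\b")


def derive_code(nypd_name):
    """Turn a name like 'IS/PS 298Q' into a candidate code like 'Q298'."""
    match = CODE_PATTERN.search(nypd_name.upper())
    if match is None:
        return None
    number, borough = match.groups()
    return f"{borough}{int(number):03d}"      # assumed 3-digit zero padding


def resolve(nypd_name, school_numbers, building_codes):
    """Try the derived code as a school number first, then as a building code."""
    code = derive_code(nypd_name)
    if code is None:
        return None
    if code in school_numbers:
        return ("school", code)
    if code in building_codes:
        return ("building", code)
    return None


# Hypothetical lookup sets standing in for the DOE reference data.
school_numbers = {"Q298"}
building_codes = {"K014"}

print(resolve("IS/PS 298Q", school_numbers, building_codes))
```

A name with no number-and-letter suffix, such as “IS 14”, returns no code here and would have to be researched by hand.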
In addition to the building-versus-school naming and co-location complications, there are also cases where a single school is split across more than one building. In these cases, our “child in crisis” number in the tool shows a combined number for all the buildings.
In all, THE CITY was able to get accurate names for 1,689 schools. Some schools had no DOE data for the 2021-22 school year because they had moved or closed. In the end, we were able to include “child in crisis” (CIC) data for 1,342 schools, comprising 11,492 incidents since 2016. Our tool doesn’t include location data for 4,416 CIC calls, because we couldn’t tie them to a specific school that’s still open.
During our data cleaning work, we consulted with Sarah Part, a senior policy analyst at Advocates for Children of New York, to make sense of this data.
In the hopes that the work we did to match these two data sets could be useful to other researchers — and so people can let us know if we got something wrong — we’re publishing our crosswalk between the NYPD school names and the DOE DBN identifiers. We’re also publishing a list of schools in the NYPD data for which we could not find a match.
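For researchers who want to reuse the crosswalk, here is a minimal sketch of how it might be applied: call counts are summed per DBN, so a school split across buildings gets one combined number, and names without a match are tallied separately. The data layout, the “IS 14 ANNEX” entry and every count below are made up for illustration and do not reflect the published files.

```python
from collections import Counter

# Hypothetical crosswalk: NYPD name -> DOE DBN (None where no match was found).
crosswalk = {
    "IS 14": "22K014",
    "IS 14 ANNEX": "22K014",        # made-up second building of the same school
    "GRAND STREET CAMPUS": None,    # building name only, no single school
}

# Made-up call counts keyed by the NYPD's name for the location.
calls = [("IS 14", 3), ("IS 14 ANNEX", 2), ("GRAND STREET CAMPUS", 5)]


def tally(calls, crosswalk):
    """Sum calls per DBN; keep unmatched names in a separate tally."""
    by_dbn = Counter()
    unmatched = Counter()
    for nypd_name, count in calls:
        dbn = crosswalk.get(nypd_name)
        if dbn:
            by_dbn[dbn] += count
        else:
            unmatched[nypd_name] += count
    return by_dbn, unmatched


by_dbn, unmatched = tally(calls, crosswalk)
print(dict(by_dbn))
print(dict(unmatched))
```

Keeping the unmatched tally alongside the per-school one means a citywide total can still count every call, even those that cannot be tied to a single school.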
If you spot problems with the data, you can send us a pull request or open an issue on GitHub, or email us at firstname.lastname@example.org.