Monday, January 6, 2014

How We Built Our HIV Crime Data Set

by Sergio Hernandez, Special to ProPublica

For our investigation into state laws that criminalize exposing others to HIV [1], we sifted through millions of law-enforcement records compiled from state court data, police reports and offender registries in 19 states. About 1,440 records, dating back as far as 1994, involved HIV laws.
We aimed to catalog as many cases involving HIV-specific laws as possible over the past 10 years. While our dataset represents one of the largest indexes of such cases ever assembled, the national tally is surely higher: data from some of the 19 states was incomplete, and at least 35 states have laws that specifically criminalize exposing another person to HIV.

How We Collected the Data

We started with approximately 200 records identified by the Sero Project [2], an HIV/AIDS advocacy group that campaigns against what it calls “HIV criminalization.” The Sero Project submitted public records requests to county prosecutors’ offices in states with HIV-specific criminal laws, requesting information about their cases. In some instances, prosecutors provided detailed information about their cases, including names, case numbers and outcome (conviction, acquittal, dismissal, etc.). Other times, they provided just an aggregate tally of HIV cases from their counties. Often, prosecutors were unable to provide any information about their HIV cases because they could not identify files by specific statute or offense.
When key information was missing from the Sero Project’s records, ProPublica followed up with the prosecutors’ offices to gather additional information. For 131 records across 28 counties, our requests for follow-up data went unanswered.
ProPublica independently obtained statewide data from state courts, corrections departments and other law enforcement agencies. A complete list of raw data sources is here [3].
For 180 records, we were able to determine the arresting agency. We contacted those police and sheriff’s departments to gather narrative reports and fill in missing data, such as offense date; suspects’ and victims’ names, ages, sexes, races and ethnicities; and the method of alleged HIV exposure.
Out of our 180 requests for narrative reports, we obtained 81 reports, although many were heavily redacted. In some cases, agencies withheld records entirely. For example, in Boone County, Mo., officials said they would not provide records about a case because doing so would disclose the suspect’s HIV status and violate state privacy laws. We received 23 denials, and 76 of our requests for narrative reports went unanswered.
Altogether, we assembled a database of more than 1,400 records from 19 states in which accusations of HIV-specific crimes were made since 1994. For our story, we chose to focus on cases from the last 10 years.

Cleaning the Data

Where we were able, we identified and removed duplicate records by matching and comparing names, dates of birth, case numbers and unique identifiers assigned to individual cases or people by the original data source (for example, inmate number).
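The matching described above can be illustrated with a minimal Python sketch. It is not the procedure ProPublica used; the field names (`source`, `source_id`, `name`, `dob`, `case_number`) are hypothetical, and the rule of keeping records that lack identifying fields is an assumption made so that incomplete records are never merged by accident.

```python
def normalize(value):
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(str(value or "").lower().split())


def dedupe(records):
    """Keep the first record seen for each identifying key.

    A record is keyed by its source-assigned identifier (for example, an
    inmate number) when it has one; otherwise by name, date of birth and
    case number together. Records missing those fields are kept as-is,
    because they cannot be compared safely (an assumption of this sketch).
    """
    seen = set()
    unique = []
    for rec in records:
        if rec.get("source_id"):
            key = ("id", rec.get("source"), normalize(rec["source_id"]))
        else:
            parts = tuple(normalize(rec.get(f)) for f in ("name", "dob", "case_number"))
            key = ("fields",) + parts if all(parts) else None
        if key is not None:
            if key in seen:
                continue  # duplicate of an earlier record
            seen.add(key)
        unique.append(rec)
    return unique
```

In practice a human review step would still be needed, since two records can describe the same case under slightly different names or case-number formats that simple normalization will not catch.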
We also looked for and removed misclassified records, such as instances where the police report didn’t mention HIV but the prosecutor’s records showed a charge under an HIV statute, or where the data did not contain enough information about the offense or corresponding statute to confirm that it was HIV-related. For example, an offense coded “Ohio Rev. Code § 2903.11” would not be enough information to tell us whether the defendant was accused of causing “physical harm to another or to another’s unborn by means of a deadly weapon” (ORC § 2903.11(A)(2)) or having sex without disclosing their HIV infection (ORC § 2903.11(B)(1)).
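The Ohio example can be sketched as a small classifier. This is an illustration under stated assumptions: the function name and return labels are hypothetical, and only the two subsections named above are treated as known. Any other citation of § 2903.11, including one with no subsection at all, is reported as unconfirmed rather than guessed at.

```python
import re

# Only the subsections named in the text are treated as known.
HIV_SUBSECTIONS = {"(B)(1)"}
NON_HIV_SUBSECTIONS = {"(A)(2)"}

# Matches "2903.11" followed by zero or more parenthesized subsection parts.
CITATION = re.compile(r"2903\.11(?!\d)\s*((?:\([A-Za-z0-9]+\))*)")


def classify_orc_2903_11(offense_code):
    """Return 'hiv', 'not hiv', 'unconfirmed' or 'other statute'."""
    match = CITATION.search(offense_code)
    if not match:
        return "other statute"
    subsection = match.group(1).upper()
    if subsection in HIV_SUBSECTIONS:
        return "hiv"
    if subsection in NON_HIV_SUBSECTIONS:
        return "not hiv"
    # No subsection, or one this sketch does not know about.
    return "unconfirmed"
```

For example, `classify_orc_2903_11("Ohio Rev. Code § 2903.11")` returns `"unconfirmed"`, which corresponds to the kind of record that would be removed for lack of detail.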
We also added fields to group cases by alleged method of HIV exposure (for example, sexual contact, needle-sharing, spitting or biting) and final disposition (such as acquittal, conviction, dismissal or unknown).
Method of HIV exposure was determined using descriptions in primary sources (such as court documents or police reports) or the statute the defendant was accused of violating. In some states, certain statutes or their subsections correspond to specific behaviors. For example, subsection 1 of Tenn. Code Ann. § 39-13-109 criminalizes “intimate contact with another,” while subsection 2 criminalizes the transfer or donation of “potentially infectious bodily fluid”; prostitution while HIV-positive is covered by a separate section altogether, TCA § 39-13-516.
In states where the laws are not specific about method of exposure (such as Mississippi, where Miss. Code Ann. § 97-27-14(1) criminalizes “exposure to human immunodeficiency virus” and does not distinguish among sex, needle-sharing, prostitution and blood or organ donation), method of HIV exposure was logged as “Unknown.”
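A lookup of this kind might be sketched as follows. The table keys, function name and category labels are hypothetical; in particular, assigning subsection 2 of the Tennessee statute to the “Parenteral” group is an assumption of this sketch, not something the text above states. A description from a primary source takes priority over the statute, and anything unmapped falls back to “Unknown.”

```python
# (state, section, subsection) -> exposure group. Subsection None means the
# whole section maps to one group.
STATUTE_TO_METHOD = {
    ("TN", "39-13-109", "1"): "Sexual or intimate contact",
    ("TN", "39-13-109", "2"): "Parenteral",  # assumed grouping
    ("TN", "39-13-516", None): "Prostitution or soliciting a prostitute",
}


def exposure_method(state, section, subsection=None, described_method=None):
    """Pick an exposure group for one record."""
    if described_method:
        # A court document or police report described the behavior directly.
        return described_method
    # Try the exact subsection first, then a section-wide entry.
    for key in ((state, section, subsection), (state, section, None)):
        if key in STATUTE_TO_METHOD:
            return STATUTE_TO_METHOD[key]
    # Statutes that do not distinguish behaviors, such as
    # Miss. Code Ann. 97-27-14(1), end up here.
    return "Unknown"
```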
Where we had enough information to do so, we grouped cases involving sexual contact, prostitution and sexual assault separately. We did the same with cases that involved spitting, biting, scratching or throwing bodily fluids, as well as similar cases where the victim was a peace officer.
For our analysis, we entered this data into a Microsoft Excel table and removed records dated before Jan. 1, 2003. This returned a total of 1,352 records.
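The analysis itself was done in Excel, but the date cutoff is easy to reproduce elsewhere. The sketch below shows the same filter in Python; the `date` field name is hypothetical, and it assumes every record carries an ISO-formatted date (the text does not say how undated records were handled).

```python
from datetime import date

CUTOFF = date(2003, 1, 1)


def on_or_after_cutoff(records, cutoff=CUTOFF):
    """Drop records dated before the cutoff, keeping the cutoff day itself."""
    kept = []
    for rec in records:
        record_date = date.fromisoformat(rec["date"])  # e.g. "2004-03-15"
        if record_date >= cutoff:
            kept.append(rec)
    return kept
```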

Findings

Of the 1,352 records identified since 2003, we identified at least 541 convictions or guilty pleas pertaining to HIV-related charges. These cases involved at least 428 individual people. The remaining conviction records may include some repeat offenders or duplicate records that we were unable to identify because of missing data.
We found 179 records of unprosecuted arrests, acquittals, and dismissals; 31 records of pending cases and 601 records in which the final disposition was unknown. Of these 811 records, we were able to isolate 265 individual people who appeared in 461 records. Twenty-eight of these individuals were also represented in the list of convictions, meaning they were convicted for at least one HIV-related crime, but were also acquitted on other HIV charges or had charges that were dismissed, pending or for which the outcome was unknown. The remaining 350 records did not include enough data to sift out repeat offenders or duplicate records to come up with a tally of unique defendants.
Conviction | Acquittal | Dismissal | Pending | Unknown | Total
541 | 8 | 171 | 31 | 601 | 1,352
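As a quick consistency check, the disposition counts can be added up to confirm that they agree with the totals quoted in the text. The variable names are illustrative only.

```python
# Disposition counts as reported above.
dispositions = {
    "Conviction": 541,
    "Acquittal": 8,
    "Dismissal": 171,
    "Pending": 31,
    "Unknown": 601,
}

total = sum(dispositions.values())

# The acquittal and dismissal counts together match the 179 records of
# unprosecuted arrests, acquittals and dismissals.
acquitted_or_dismissed = dispositions["Acquittal"] + dispositions["Dismissal"]

# Everything that is not a conviction: the 811 records discussed above.
non_conviction = total - dispositions["Conviction"]

# Of those, 461 records were tied to identified individuals and 350 were not.
identified, unidentified = 461, 350

print(total, acquitted_or_dismissed, non_conviction, identified + unidentified)
```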
Of the 541 convictions, defendants’ sex data was available for 467 records. Men were the offenders in three-fourths of these records (n=352), while women made up a quarter of them (n=115).
Race data was available for 322 records. Offenders were reported as black or African American in 58 percent of these records (n=186), while whites made up the remaining 42 percent (n=136).
Sex | Black | White | Unknown | Total
Female | 38 | 44 | 33 | 115
Male | 148 | 92 | 112 | 352
Unknown | 0 | 0 | 74 | 74
Total | 186 | 136 | 219 | 541
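The row and column totals, and the shares quoted for sex and race, follow from the individual cells. The sketch below recomputes them; the zero cells in the unknown-sex row are inferred from the column totals, and the variable names are illustrative.

```python
# Conviction records by sex (rows) and race (columns).
cells = {
    "Female": {"Black": 38, "White": 44, "Unknown": 33},
    "Male": {"Black": 148, "White": 92, "Unknown": 112},
    "Unknown": {"Black": 0, "White": 0, "Unknown": 74},
}

row_totals = {sex: sum(by_race.values()) for sex, by_race in cells.items()}
col_totals = {
    race: sum(by_race[race] for by_race in cells.values())
    for race in ("Black", "White", "Unknown")
}

# Shares are computed only over records where the attribute is known.
known_sex = row_totals["Female"] + row_totals["Male"]
known_race = col_totals["Black"] + col_totals["White"]
male_share = row_totals["Male"] / known_sex
black_share = col_totals["Black"] / known_race

print(row_totals, col_totals)
print(f"male: {male_share:.1%} of {known_sex}; black: {black_share:.1%} of {known_race}")
```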
Method of alleged HIV exposure was available for 194 records. Almost all of these cases — 94 percent — involved sexual exposure (n=182). Six of these involved sexual assault, 58 involved prostitution or soliciting a prostitute and 118 involved sexual or “intimate” contact, which could also have included sexual assaults and prostitution cases.
Method of alleged HIV exposure | Records
Other (biting, spitting or throwing bodily fluids) | 9
Parenteral (blood, organ, tissue or semen donation) | 3
Sexual or “intimate” contact | 118
Sexual assault | 6
Prostitution or soliciting a prostitute | 58
Unknown | 347
Total | 541
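The 94 percent figure can be reproduced from these counts: it is the share of sexual-exposure records among records where the method was known, not among all 541 convictions. A short sketch, with illustrative names:

```python
# Conviction records by alleged method of exposure.
methods = {
    "Other": 9,
    "Parenteral": 3,
    "Sexual or intimate contact": 118,
    "Sexual assault": 6,
    "Prostitution or soliciting a prostitute": 58,
    "Unknown": 347,
}

total = sum(methods.values())
known = total - methods["Unknown"]
sexual = (
    methods["Sexual or intimate contact"]
    + methods["Sexual assault"]
    + methods["Prostitution or soliciting a prostitute"]
)
sexual_share = sexual / known

print(f"{sexual} of {known} known-method records ({sexual_share:.0%}) involved sexual exposure")
```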
We found the largest number of records of convictions in Georgia (n=120), followed by Florida (n=99), Missouri (n=68) and Ohio (n=59). The largest number of records of convictions we found occurred in 2004 (n=91), followed by 2009 (n=57), 2008 (n=56), and 2005 (n=55). We found 52 records of convictions in 2012 and in 2013 so far, although 2013 data may be incomplete as cases may have been adjudicated while we collected our data.

Limitations

Because of the incomplete data collection, and because different data sources captured cases at different stages of the criminal justice process (for example, arrest, prosecution or conviction), the overall group does not constitute a statistically random selection and should not be used to generalize our findings to a broader population.
Most states exclude sealed and expunged cases, and depending on the jurisdiction, our data may also be missing information about:
  • Cases in which people were not arrested or charged
  • Cases that local agencies did not report to state agencies
  • Cases that were misreported to state agencies
  • Offenders’ criminal histories
  • Concurrent offenses
  • Whether measures were taken to prevent HIV exposure or transmission
In a few instances, we learned of cases through secondary sources, such as news accounts, and were able to obtain case information that we would not have been able to obtain otherwise. For example, if a local agency did not report cases to its state, a news account may have provided information about a case in that jurisdiction, allowing us to obtain legal records. In all cases, our data is based on primary law enforcement records.

    Download the Data

    The data we used in this story is available for download. Using our data means you agree to our Terms of Use [4]. Please read them before you proceed.
    HIV Crime Data.csv [5] (238 KB)
    About the HIV Criminalization Data [6]
    Some personally identifiable information (e.g., names) has been omitted from this data. We have included some values — such as case numbers, the original data sources’ proprietary identifiers, offense dates, birth dates and jurisdictional information — to help you locate individual cases if necessary.
    If you have questions or need more information, email data@propublica.org [7].
