Evaluation Scores

The numerical roll-up score should only be used as part of a decision-making process that includes the rest of the evaluation. The number's best use is as an indicator of how much additional work a person will need to do to make an informed decision about a service. This use is directly related to the core work driving the evaluations: to help people make informed decisions about a service with less effort. The higher the number, the less effort required to make an informed and appropriate decision.

The numerical roll-up is not a grade, a rating, or an indication of the overall quality of the service, and it has no reliable use outside the context of the rest of the evaluation. We understand the temptation to misuse the roll-up in these ways, but that is not an accurate or reliable use of the calculation.

The evaluations come with five numerical roll-up scores: one for each of our core categories (Safety, Privacy, Security, and Compliance) and a top-level roll-up score for the entire service. Our calculations emphasize transparency and disclosure within a vendor's terms, with additional points earned for disclosures in policies that protect student data. A current snapshot of evaluated services shows overall numerical roll-up scores ranging from a minimum of 11 to a maximum of 75, with a mean of 44/100 and a standard deviation of 13. Given a mean of 44/100, most applications and services evaluated show significant deficiencies in both transparency and qualitatively better disclosures for their privacy and security practices. Because overall scores are bounded above by transparency, our analysis determined that a general lack of transparency, and of qualitatively better disclosures, across all evaluation questions contributed heavily to the lower mean. These numbers will fluctuate almost daily as evaluations are added and updated, so any examples are best understood as snapshots in time.

Graph indicating the overall score distribution histogram and normal curve

This chart illustrates the overall score distribution histogram and normal curve, with median (Q2) 43, lower quartile (Q1) 37, upper quartile (Q3) 52, lower whisker 15 (the smallest datum within Q1 − 1.5 × IQR), and upper whisker 75 (the largest datum within Q3 + 1.5 × IQR). Any outliers are denoted by circles outside the whiskers.
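As a concrete illustration, the summary statistics described above can be reproduced from any list of overall scores. The sketch below uses a made-up sample of scores (not actual evaluation data); the quartile and whisker definitions follow the chart description.

```python
import statistics

# Illustrative only: a made-up sample of overall roll-up scores,
# not actual evaluation data.
scores = sorted([15, 22, 31, 37, 37, 40, 43, 43, 45, 48, 52, 52, 58, 64, 75])

# Quartiles: statistics.quantiles returns the three cut points for n=4.
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

# Whiskers per the chart description: the most extreme data points
# still within 1.5 * IQR of the quartiles.
lower_whisker = min(s for s in scores if s >= q1 - 1.5 * iqr)
upper_whisker = max(s for s in scores if s <= q3 + 1.5 * iqr)

# Anything beyond the whiskers would be drawn as an outlier circle.
outliers = [s for s in scores if s < lower_whisker or s > upper_whisker]

print(f"mean={statistics.mean(scores):.1f}, sd={statistics.stdev(scores):.1f}")
print(f"Q1={q1}, Q2={q2}, Q3={q3}, whiskers=({lower_whisker}, {upper_whisker})")
```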

The privacy-evaluation process for an application or service is unique because it produces two independent scores, for transparency and quality, which are combined into an overall score. These two metrics allow for an objective comparison between applications and services based on how transparent their policies are in explaining their practices and on the quality of those practices. Other privacy-policy assessment tools have used machine-learning or keyword-based contextual methods that attempt to summarize a policy's main issues. These methods, such as those of the Usable Privacy Policy Project, have been found to produce reliable measures of transparency about the key issues disclosed in an application's or service's policies. However, they are not able to capture substantive indicators that describe the meaning or quality of those disclosures. Our privacy-evaluation process was therefore developed with this limitation in mind, incorporating both qualitative and quantitative assessment methods to capture in its scores the differential meaning of each privacy practice disclosed in a vendor's policies.

To explain how questions affect a numerical roll-up score, we will look at question 3.2.4 from our published list of questions:

Do the policies clearly indicate whether or not personal information is shared with third parties for advertising or marketing purposes?

At a high level, this question has three possible responses:

  1. The policies say nothing about whether or not personal information is shared with third parties for advertising or marketing purposes.
  2. The policies clearly indicate that personal information is shared with third parties for advertising or marketing purposes.
  3. The policies clearly indicate that personal information is not shared with third parties for advertising or marketing purposes.

The first option, where the policies say nothing, brings the roll-up score down the most, because without any information a fully informed decision is not possible. The second option, where a vendor clearly indicates that it does share personal information for advertising or marketing purposes, still earns points toward the roll-up score, because the disclosure helps a person make an informed decision. Because we look at privacy through the lens of making an informed decision, we encourage transparency in policy disclosures as a necessary tool for that purpose. The third option, clearly specifying that personal information is not shared, increases the roll-up score the most: the vendor is both transparent and discloses better practices that protect personal information.
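To make this asymmetry concrete, here is a minimal sketch of how a single question's response might map to points. The point values and response names are hypothetical; the text only establishes the ordering, with nondisclosure earning the least, a transparent but qualitatively worse disclosure earning partial credit, and a transparent and qualitatively better disclosure earning the most.

```python
# Hypothetical point values; only the ordering reflects the text:
# nondisclosed < transparent-but-worse < transparent-and-better.
RESPONSE_POINTS = {
    "nondisclosed": 0.0,  # policies say nothing: no transparency credit
    "worse": 0.5,         # disclosed, but personal info is shared for ads
    "better": 1.0,        # disclosed, and personal info is not shared
}

def question_points(response: str) -> float:
    """Points a single question contributes toward the roll-up."""
    return RESPONSE_POINTS[response]

# Question 3.2.4: sharing with third parties for advertising/marketing.
print(question_points("better"))  # 1.0: transparent and protective
```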

Our evaluation process and numerical roll-up scores also take into account whether or not a question is relevant. Not every question is relevant for every service, so we exclude irrelevant questions from our overall scoring; a radial graph shows the relationships and dependencies among questions.
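As a sketch of that exclusion step, with entirely hypothetical question records and field names (only question 3.2.4 is from the text), only questions flagged as relevant would enter the score:

```python
# Hypothetical records: 'relevant' would be derived from the service's
# intended use and from responses to other (dependent) questions.
questions = [
    {"id": "3.2.4", "relevant": True,  "response": "better"},
    {"id": "5.1.1", "relevant": True,  "response": "nondisclosed"},
    {"id": "7.3.2", "relevant": False, "response": None},
]

# Irrelevant questions never enter the score calculation.
relevant = [q for q in questions if q["relevant"]]
print(len(relevant), "of", len(questions), "questions are scored")
```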

While we recognize that attaching a numerical summary to a service invites misuse of the number, we are cautiously optimistic that people will use this numerical summary as intended: as one piece of information among many designed to help people make an informed and appropriate decision for their school or district.

Score Calculations

The privacy evaluation process incorporates over 150 questions into a framework that produces three complementary scores (Transparency, Quality, and Overall) that allow for an objective comparison between applications or services. Only questions that are deemed pertinent, or expected to be disclosed, for the application or service, based on its intended use and context, are used in calculating each score. Questions are expected to be answered based on the intended use of the application or service, the applicable laws governing that intended use, and responses to other evaluation questions, as further explained in Mapping Compliance. Given the intended use of the application or service, not answering expected questions will negatively impact that application or service's Transparency score and, subsequently, its Overall score.

For the evaluation process, "Transparency" is defined as a measure indicating, of the things we expect to know, what percentage are knowable. "Quality" is defined as a measure indicating, of those things we know about, whether the vendor's disclosures about those practices protect student information, which is considered qualitatively better. To determine the Overall score, we weight the Transparency score more heavily, to encourage higher levels of transparency, and then subtract a fractional value for the qualitatively worse responses. This fractional value ensures that answering a relatively small number of questions does not disproportionately impact the Overall score. In other words, since the Quality score reflects only those questions that are transparent, the qualitatively worse responses should diminish the Overall score only by an amount reflective of how transparent the policies are. The calculation is then normalized to a 0-100 scale.
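Putting those definitions together, a minimal sketch of the calculation might look like the following. The transparency weight and the exact penalty fraction are assumptions; the text specifies only that transparency is weighted more heavily, that the penalty for qualitatively worse responses is scaled by transparency, and that the result is normalized to a 0-100 scale.

```python
def roll_up(relevant: int, transparent: int, worse: int, w: float = 0.7) -> float:
    """Sketch of the Overall roll-up described above (weight w is an assumption).

    relevant    -- number of questions expected for this service
    transparent -- of those, how many the policies actually answer
    worse       -- of the transparent ones, how many disclose worse practices
    """
    if relevant == 0 or transparent == 0:
        return 0.0
    transparency = transparent / relevant  # of what we expect, what is knowable
    worse_rate = worse / transparent       # of what we know, what is worse
    # Scale the penalty by transparency so a handful of answered questions
    # cannot disproportionately drag down the Overall score.
    raw = w * transparency - (1 - w) * worse_rate * transparency
    return max(0.0, raw / w * 100)         # normalize to a 0-100 scale

# Example: 150 expected questions, 90 answered, 20 of those qualitatively worse.
print(f"{roll_up(relevant=150, transparent=90, worse=20):.0f}")  # ~54
```

Note how the sketch reflects the reasoning in the text: a fully transparent vendor with entirely worse disclosures still scores above zero, because transparency itself earns credit toward an informed decision.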

If you would like to learn more about our evaluation scores and see roll-up score examples, please download our 2018 State of EdTech Privacy Report.