Updating the algorithm for service grades -- thoughts?

Hello!

I’m a software developer for Phoenix and lately we’ve been experimenting with new formulas for the algorithm used to determine service grades. I say “algorithm,” but it’s really a simple list of conditions, plus a balance calculation based on the number of good, bad, and blocker points for each service.

We decided to change it after noticing that most services on the tosdr.org homepage were rated E. That led us to suspect that the algorithm was dumping services into E and that its conditions needed to account for more complexity. Later, we found out that the algorithm was in fact dumping services into B, with grade C underrepresented.

A couple of weeks ago, we deployed a different algorithm that we thought would correct this problem, but services are now being dumped into C, which makes this newer algorithm unsatisfactory as well.

Now, we’ve developed a new algorithm that we think provides balance, but we want to consult with the broader community to see if you guys have any feedback.

This was the original algorithm:

  def perform_calculation
    points = self.points
    classification_counts = service_point_classifications_count(points)
    balance = calculate_balance(classification_counts)
    balance
  end

  def service_point_classifications_count(points)
    approved_points = points.select { |p| p.status == 'approved' && !p.case.nil? }
    total_ratings = approved_points.map { |p| p.case.classification }
    counts = Hash.new 0
    total_ratings.each { |rating| counts[rating] += 1 }
    counts
  end

  def calculate_balance(counts)
    num_bad = counts['bad']
    num_blocker = counts['blocker']
    num_good = counts['good']

    balance = num_good - num_bad - 3 * num_blocker
    balance

    if (num_blocker + num_bad + num_good == 0)
      return "N/A"
    elsif (balance < -10)
      return "E"
    elsif (num_blocker > 0)
      return "D"
    elsif (balance < -4)
      return "C"
    elsif (num_bad > 0)
      return "B"
    else
      return "A"
    end
  end

With the original algorithm, this was the grading breakdown for comprehensively reviewed services (i.e., the services displayed on tosdr.org):

{"A"=>51, "B"=>159, "C"=>57, "D"=>104, "E"=>104, "N/A"=>2}

This is the unsatisfactory algorithm deployed a couple of weeks ago:

  def perform_calculation
    counts = determine_counts
    balance = determine_balance(counts)
    calculate_grade(counts, balance)
  end

  def determine_counts
    total_ratings = approved_points.map { |p| p.case.classification }
    counts = Hash.new 0
    total_ratings.each { |rating| counts[rating] += 1 }
    counts
  end

  def determine_balance(counts)
    num_bad = counts['bad']
    num_blocker = counts['blocker']
    num_good = counts['good']

    (num_good * 3) - num_bad - (num_blocker * 3)
  end

  def calculate_grade(counts, balance)
    if (counts['blocker'] + counts['bad'] + counts['good']).zero?
      'N/A'
    elsif balance < -13 || counts['blocker'] > counts['good']
      'E'
    elsif counts['blocker'] >= 3
      'D'
    elsif balance < -4 || (counts['bad'] >= counts['good'])
      'C'
    elsif counts['bad'].positive? && (counts['bad'] < counts['good'])
      'B'
    else
      'A'
    end
  end

With this algorithm, this is the grading breakdown for comprehensively reviewed services (i.e., the services displayed on tosdr.org):

{"A"=>52, "B"=>139, "C"=>250, "D"=>26, "E"=>8, "N/A"=>2}

This is the new algorithm that we’re testing, but that has not yet been deployed:

  def perform_calculation
    counts = determine_counts_test
    balance = determine_balance_test(counts)
    calculate_grade_test(counts, balance)
  end

  def determine_counts_test
    total_ratings = approved_points.map { |p| p.case.classification }
    counts = Hash.new 0
    total_ratings.each { |rating| counts[rating] += 1 }
    counts
  end

  def determine_balance_test(counts)
    num_bad = counts['bad']
    num_blocker = counts['blocker']
    num_good = counts['good']

    num_good - num_bad - (num_blocker * 3)
  end

  def calculate_grade_test(counts, balance)
    if (counts['blocker'] + counts['bad'] + counts['good']).zero?
      'N/A'
    elsif balance <= -10 || counts['blocker'] > counts['good']
      'E'
    elsif counts['blocker'] >= 3 || counts['bad'] > counts['good']
      'D'
    elsif balance < 5
      'C'
    elsif counts['bad'] > 0
      'B'
    else
      'A'
    end
  end

This is the grading breakdown with the new algorithm:

{"A"=>40, "B"=>73, "C"=>95, "D"=>141, "E"=>126, "N/A"=>2}
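To make the three breakdowns easier to compare, here is a minimal sketch that converts each one into percentage shares. The counts are copied from the breakdowns in this post; the labels `original`, `deployed`, and `proposed` and the helper `grade_shares` are just names made up for this illustration, not part of the Phoenix code.

```ruby
# Counts copied from the three breakdowns quoted in this post.
# The keys 'original', 'deployed' and 'proposed' are illustrative labels only.
BREAKDOWNS = {
  'original' => { 'A' => 51, 'B' => 159, 'C' => 57, 'D' => 104, 'E' => 104, 'N/A' => 2 },
  'deployed' => { 'A' => 52, 'B' => 139, 'C' => 250, 'D' => 26, 'E' => 8, 'N/A' => 2 },
  'proposed' => { 'A' => 40, 'B' => 73, 'C' => 95, 'D' => 141, 'E' => 126, 'N/A' => 2 }
}.freeze

# Hypothetical helper: each grade's share of the total, as a percentage
# rounded to one decimal place.
def grade_shares(breakdown)
  total = breakdown.values.sum
  breakdown.transform_values { |count| (100.0 * count / total).round(1) }
end

BREAKDOWNS.each do |name, breakdown|
  puts "#{name} (#{breakdown.values.sum} services): #{grade_shares(breakdown)}"
end
```

All three runs total 477 services, so the counts are directly comparable. For example, C accounts for about 52% of services under the currently deployed version and about 20% under the new one.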

The new algorithm essentially retains the original balance calculation, adjusts the balance thresholds for each grade, and attempts to factor in the ratios of the point types to each other for each service (i.e., the number of good points vs. bad vs. blocker).
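To see how the three rule sets differ on the same input, here is a rough standalone sketch. The conditions are transcribed from the snippets in this post, but the method names (`original_grade`, `deployed_grade`, `proposed_grade`) and the sample counts are hypothetical, and the methods take plain integers instead of reading points from a service.

```ruby
# Standalone transcriptions of the three rule sets posted above.
# Method names and the sample data are illustrative assumptions.
def original_grade(good, bad, blocker)
  balance = good - bad - 3 * blocker
  return 'N/A' if (good + bad + blocker).zero?
  return 'E' if balance < -10
  return 'D' if blocker > 0
  return 'C' if balance < -4
  return 'B' if bad > 0
  'A'
end

def deployed_grade(good, bad, blocker)
  balance = (good * 3) - bad - (blocker * 3)
  return 'N/A' if (good + bad + blocker).zero?
  return 'E' if balance < -13 || blocker > good
  return 'D' if blocker >= 3
  return 'C' if balance < -4 || bad >= good
  return 'B' if bad.positive? && bad < good
  'A'
end

def proposed_grade(good, bad, blocker)
  balance = good - bad - (blocker * 3)
  return 'N/A' if (good + bad + blocker).zero?
  return 'E' if balance <= -10 || blocker > good
  return 'D' if blocker >= 3 || bad > good
  return 'C' if balance < 5
  return 'B' if bad > 0
  'A'
end

# Hypothetical services as [good, bad, blocker] counts.
SAMPLES = [[3, 0, 0], [10, 2, 0], [2, 5, 1], [1, 8, 2]].freeze

SAMPLES.each do |good, bad, blocker|
  grades = [original_grade(good, bad, blocker),
            deployed_grade(good, bad, blocker),
            proposed_grade(good, bad, blocker)]
  puts "good=#{good} bad=#{bad} blocker=#{blocker} -> #{grades.join(' / ')}"
end
```

One thing this makes visible: a service with three good points and nothing else is graded A by the first two versions and C by the new one, because its balance of 3 falls under the `balance < 5` condition before the A branch is reached.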

We’ll leave this open to feedback for one week.

Thanks, everybody!

I think you should deploy it on a beta site.

Sure! We’ll get it on the staging site beforehand. We should have done that with the current iteration, so we’re trying to improve our processes this time around.

What is the staging site?

Hi, sorry for the late response! Phoenix has a staging site at https://edit.staging.tosdr.org/ (just a copy of the production data in a kind of “sandbox”), so we can play with new features before they go live.

Tor has a very short but entirely positive policy, so this seems a bit unfair IMO. Short policies are usually the fairest anyway.

Could we get the code for the determine_balance function? It seems broken that it would return a balance of less than 5 for a service that only has positive points.

So, what are the limits we expect determine_balance to return? Any whole number, from -infinity to infinity?

Perhaps something like this could be added:

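  def determine_balance_test(counts)
    num_bad = counts['bad']
    num_blocker = counts['blocker']
    num_good = counts['good']

    # Services with no bad or blocker points get a fixed balance of 10,
    # so they clear the `balance < 5` check for C.
    if num_bad == 0 && num_blocker == 0
      return 10
    else
      return num_good - num_bad - (num_blocker * 3)
    end
  end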
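On the question of limits, here is a small sketch under the assumption that only good, bad, and blocker points are counted, as in the posted formula. With n such points, the balance can range from -3n (all blockers) to n (all good), so it is unbounded in principle but bounded by the number of points a service has. The names `balance`, `patched_balance`, and `balance_bounds` are made up for this example.

```ruby
# Balance formula as posted for the new algorithm.
def balance(good, bad, blocker)
  good - bad - (blocker * 3)
end

# The variant suggested in this reply: a fixed balance of 10 when a
# service has no bad or blocker points. (Hypothetical name.)
def patched_balance(good, bad, blocker)
  return 10 if bad.zero? && blocker.zero?

  balance(good, bad, blocker)
end

# Lowest and highest balance possible for n counted points:
# all blockers gives -3n, all good gives n. (Hypothetical helper.)
def balance_bounds(n)
  [-3 * n, n]
end

puts balance(3, 0, 0)          # three good points only
puts patched_balance(3, 0, 0)  # same service with the suggested change
puts balance_bounds(4).inspect # range for a service with four points
```

With the posted formula, a service with three good points has a balance of 3 and lands in C. With the suggested change, its balance becomes 10, so it skips the C condition and, having no bad points, reaches A. A service with no points at all would also get a balance of 10, but the N/A check in calculate_grade_test runs first, so that case is unaffected.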