Since we’re already collecting documents with crawlers, perhaps we could also extract some other useful metrics from them that can be calculated automatically:
- how many times per year it is updated
- how many characters of text change per year (was the change substantial?)
- how long is it? Use an automated reading-time estimate
- how many documents do you have to agree to? Track cross-references between documents
- how difficult is it to read? Check the vocabulary against an English word-frequency dataset to flag the use of difficult words
- does it have a per-section summary in clear language that helps people read the document quickly and understand it? (documents that do should earn positive points)
- does it contain broken links to other documents you must agree to? (I remember Uber had some a while ago)
- is the font size too small? Do the colors have enough contrast (define a threshold)? Does the document conform to the W3C WCAG accessibility guidelines?
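Two of the metrics above, reading time and word difficulty, are easy to sketch. Assuming plain text has already been extracted from a crawled document, something like this would do; the 200-words-per-minute speed and the tiny common-word list are illustrative placeholders, not tuned values (a real implementation would use a proper word-frequency dataset):

```python
import re

# Placeholder common-word list; a real check would use a large
# English word-frequency dataset instead.
COMMON_WORDS = {"the", "a", "and", "of", "to", "you", "we", "service",
                "may", "use", "terms", "agree", "not", "this", "any"}

def reading_time_minutes(text, words_per_minute=200):
    """Estimate reading time from a simple word count."""
    words = re.findall(r"[a-zA-Z']+", text)
    return len(words) / words_per_minute

def difficult_word_ratio(text, common_words=COMMON_WORDS):
    """Fraction of words not found in the common-word list."""
    words = [w.lower() for w in re.findall(r"[a-zA-Z']+", text)]
    if not words:
        return 0.0
    difficult = [w for w in words if w not in common_words]
    return len(difficult) / len(words)

sample = "You agree that the Service may terminate this Agreement forthwith."
print(round(reading_time_minutes(sample), 3))   # 0.05 (minutes)
print(round(difficult_word_ratio(sample), 2))   # 0.4
```

Both numbers are trivially crawler-friendly: they need no rendering, only the extracted text.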
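The contrast check is also well defined: WCAG 2.x specifies a relative-luminance formula and requires a contrast ratio of at least 4.5:1 for normal body text (level AA). A minimal sketch, taking foreground and background colors as RGB tuples:

```python
def relative_luminance(rgb):
    """Relative luminance per the WCAG 2.x definition (sRGB)."""
    def channel(c):
        c = c / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (channel(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg):
    """Contrast ratio (lighter + 0.05) / (darker + 0.05), range 1..21."""
    l1, l2 = sorted((relative_luminance(fg), relative_luminance(bg)),
                    reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)

# Black on white is the maximum possible ratio, 21:1.
print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 1))  # 21.0
# Mid-grey (#777777) on white narrowly fails the 4.5:1 AA threshold.
print(contrast_ratio((119, 119, 119), (255, 255, 255)) >= 4.5)  # False
```

Extracting the actual colors in use would of course require parsing the document's CSS or rendering it, which is more work than the text metrics.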
I’m sure people will come up with more ideas. Some of these metrics could feed into the service’s rating and also be displayed next to it, e.g.: rank: E, total estimated reading time (all documents): 12 hours. If the total is too long, it should probably be treated as a blocking case.