Similarity Filtering

Similarity filtering is the process my program uses to select levels for each gallery. Typically the level generator produces 12,000 levels and my program selects the top 60 via similarity filtering.

The Similarity Maps page visually illustrates the core procedure: comparing levels to each other. Each comparison results in a similarity score for the two levels. That pairwise score, however, is only one part of the formula used to select levels.

Three Parts to the Formula

Each level is graded on the three components below, and its score depends on the group of levels it is compared against.

  1. Average similarity score – this is what the similarity maps illustrate, both for a 1-to-1 comparison and a 1-to-many comparison.  
  2. Maximum similarity score – the reasoning here is that even if a level scores well on average, if it is very similar to another level, then it’s not particularly unique.  
  3. Line uniqueness – a separate calculation that is not always used in the formula detailed below.  The idea is to give a bonus to individual lines that are unusual.  
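
In rough Python terms, the first two components look something like this; `pairwise_similarity` is a placeholder for the cell-by-cell comparison the similarity maps illustrate, not the actual implementation:

```python
def score_group(levels, pairwise_similarity):
    """Return (average similarity, maximum similarity) for each level,
    measured against every other level in its group."""
    results = []
    for i, level in enumerate(levels):
        sims = [pairwise_similarity(level, other)
                for j, other in enumerate(levels) if j != i]
        results.append((sum(sims) / len(sims), max(sims)))
    return results
```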

Weighting Lines

An adjustment to the basic formula illustrated on the Similarity Maps page is line weighting. Remember that the similarity score for two levels is calculated on a cell-by-cell basis: the divisor is always the total number of cells in the grid.

Line weighting attempts to address the following two issues: 

  • A single long, unusual line can make a level score well, even if most of the rest of the lines are common.
  • Conversely, lots of very short lines (which usually don't score well) have little effect on a level's score.

Line weighting calculates a score for each line. Each line's score is multiplied by the square root of the line's length; these weighted scores are summed and then divided by the sum of the square roots of the line lengths. For example, a line that is 4 cells long receives a weight of 2, while a line that is 100 cells long receives a weight of 10.
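
Expressed in rough Python, that weighted average looks something like this (the per-line scores themselves come from the similarity comparison and are assumed here):

```python
import math

def weighted_line_score(line_scores, line_lengths):
    """Average the per-line scores, weighting each by the square root of its
    length: a 4-cell line gets weight 2, a 100-cell line gets weight 10."""
    weights = [math.sqrt(length) for length in line_lengths]
    weighted_sum = sum(score * w for score, w in zip(line_scores, weights))
    return weighted_sum / sum(weights)
```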

Line Uniqueness

While the first two components of the formula are based on the same calculations, line uniqueness is a separate calculation. It is calculated by comparing each line in a level to every line in every other level to find the line that contains the most cells of the original line. The number of cells that this best match contains is that line's uniqueness score.

It is even more important to adjust this score with line weighting – otherwise single long lines are scored much too highly.  
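
A rough sketch of the idea, assuming each line is stored as a set of cell coordinates (the real data structures may differ):

```python
def raw_line_uniqueness(level_lines, other_levels):
    """For each line in a level, find the single line in any other level that
    overlaps it the most; that overlap count is the line's raw uniqueness
    score (a smaller overlap means the line is more unusual).  These raw
    scores would then be combined using the square-root line weighting above."""
    scores = []
    for line in level_lines:
        best_overlap = 0
        for other in other_levels:
            for other_line in other:
                best_overlap = max(best_overlap, len(line & other_line))
        scores.append(best_overlap)
    return scores
```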

Combining the Three Parts

The weights used for each component of the formula are based on the standard deviations of each score, so that the three components contribute roughly equally.
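
Something like the following, where each component is divided by its standard deviation across the group before the three are combined (the exact signs and weights are glossed over here):

```python
import statistics

def combine_components(avg_sims, max_sims, line_uniqueness):
    """Scale each component by its standard deviation across the group so the
    three parts carry roughly equal weight, then sum them per level.  Which
    direction counts as "better" for each component is left out of this sketch."""
    def scaled(values):
        spread = statistics.pstdev(values)
        return [v / spread for v in values] if spread else list(values)
    parts = [scaled(avg_sims), scaled(max_sims), scaled(line_uniqueness)]
    return [sum(p[i] for p in parts) for i in range(len(avg_sims))]
```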

For most stages of similarity filtering – line uniqueness is excluded from the formula.  More on that later.

Scoring Large Sets of Levels

The program starts with 12,000 levels. If I were to compare each level against every other, this would involve about 72 million comparisons. That might be feasible if it were a simple calculation, but it's not: every line is compared against every other line, and there can be over 200 lines in a single level.

If I split the 12,000 levels into 2 groups of 6,000 each and run the comparison within each group, the total number of comparisons drops by roughly half.

My similarity filtering goes much further, starting with a group size of just 10 levels. That's 45 comparisons each for 1,200 groups: a total of just 54,000 comparisons, more than 1,000 times fewer than we started with. It still takes my PC about 3 hours to run the comparisons at a grid size of 100×100, though that is a vast improvement over the half year it would take if I didn't split them up.
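
The arithmetic behind those counts:

```python
def total_comparisons(population, group_size):
    """Level-to-level comparisons when levels are only compared within
    their own group of `group_size`."""
    pairs_per_group = group_size * (group_size - 1) // 2
    return (population // group_size) * pairs_per_group

print(total_comparisons(12_000, 12_000))  # 71,994,000 (everyone vs. everyone)
print(total_comparisons(12_000, 6_000))   # 35,994,000 (two groups of 6,000)
print(total_comparisons(12_000, 10))      # 54,000 (1,200 groups of 10)
```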

Accounting For Luck 

A level's similarity score depends on the levels it is compared against. Thus, a level can get "lucky" or "unlucky" depending on which group of 10 levels it is placed in. To account for this, similarity filtering is an iterative process, with 50% of the levels moving on to the next iteration.

Scores from previous iterations are stored and used to calculate a weighted score in subsequent iterations: the most recent round is weighted highest, and each earlier round is given 60% of the weight of the round after it. Thus, the current round is always worth at least 40% of the total score.
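
In rough Python, with a level's per-iteration scores kept oldest-first (an assumed layout):

```python
def blended_score(scores_oldest_first, decay=0.6):
    """Weighted average across iterations: the newest round has weight 1, the
    round before it 0.6, the one before that 0.36, and so on, which keeps the
    current round at 40% or more of the total weight."""
    weights = [decay ** age for age in range(len(scores_oldest_first))][::-1]
    blended = sum(s * w for s, w in zip(scores_oldest_first, weights))
    return blended / sum(weights)
```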

Another adjustment for luck that I made for earlier sets is that for each round I ran two sets of comparisons with the groups randomly selected for each.  This of course doubled the time it took – and I found from testing that the improvement in accuracy was pretty marginal.  

Adjustments for Speed

Two of the major adjustments I have made for the sake of speed are explained above:  separating the levels into groups of 10, and running only a single set of comparisons per iteration as opposed to two.  

The next major adjustment I have made is that the earlier rounds of similarity filtering do not account for rotation.  This results in 1/4 the number of comparisons – and thus speeds it up by nearly a factor of 4.  
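
Roughly, accounting for rotation means comparing a level against all four rotations of the other and keeping the strongest match, which is where the factor of 4 comes from. This sketch assumes levels are square NumPy grids and that the best rotation is kept:

```python
import numpy as np

def similarity_with_rotation(grid_a, grid_b, pairwise_similarity):
    """Compare grid_a against grid_b in all four rotations and keep the
    highest similarity: roughly four times the work of ignoring rotation."""
    return max(pairwise_similarity(grid_a, np.rot90(grid_b, k)) for k in range(4))
```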

Three (or four) Stages

As each iteration reduces the population by 50%, the adjustments made for speed become less and less beneficial in terms of overall time gained. Thus, there are two separate points where I change how the scores are calculated, resulting in three stages of filtering.

  1. Make all of the above adjustments until the population size is reduced to 1,000 or fewer levels. This will take 4 iterations at a starting size of 12,000 levels.
  2. Start comparing rotations of each level. This stage reduces the population to exactly 200 levels, so its last iteration selects more than 50% of the levels; usually only slightly more, but if it started with, say, 804 levels, it would reduce the count to 402, then 201, and then… 200. Fortunately I am good at math, so I know which starting populations might cause this sort of inefficiency.
  3. The last stage is technically two stages: 
    1. Group size is increased to 25 levels.  Line Uniqueness is added to the formula (an additional calculation that requires significant time).  This is run for two iterations, reducing the population to 50 levels. 
    2. The final 50 levels are compared in a single group.  Line uniqueness is removed from the formula for this group.  The top 30 levels are the “winners.” 
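
Putting the stages together, the overall flow looks roughly like this; `score_round` stands in for one iteration of scoring and ranking (best first) and is not the actual code:

```python
def similarity_filter(levels, score_round):
    """Rough outline of the staged filtering described above."""
    # Stage 1: groups of 10, no rotations, no line uniqueness; halve each round.
    while len(levels) > 1000:
        ranked = score_round(levels, group_size=10, rotations=False, uniqueness=False)
        levels = ranked[:len(ranked) // 2]
    # Stage 2: add rotations; keep halving, but stop at exactly 200 levels.
    while len(levels) > 200:
        ranked = score_round(levels, group_size=10, rotations=True, uniqueness=False)
        levels = ranked[:max(len(ranked) // 2, 200)]
    # Stage 3a: groups of 25 with line uniqueness; two iterations, 200 -> 50.
    for _ in range(2):
        ranked = score_round(levels, group_size=25, rotations=True, uniqueness=True)
        levels = ranked[:len(ranked) // 2]
    # Stage 3b: one group of 50, line uniqueness removed; the top 30 "win".
    ranked = score_round(levels, group_size=len(levels), rotations=True, uniqueness=False)
    return ranked[:30]
```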

Before 60×60, I only generated images for the top 30 levels instead of 60. Thus, it isn't strictly necessary to go past 50 levels, but I do it for informational value. The order of the levels is randomized before I see any images, so I have no idea which levels the program rated in the top 30.

Line Uniqueness – Again

Line uniqueness isn't calculated until the population is down to 200 levels. It is then included for two iterations and excluded from the final iteration. The calculation itself is separate and roughly doubles the amount of time it takes to evaluate the levels.

However, this seemingly odd adjustment has nothing to do with speed and everything to do with greed: the similarity score is reciprocal (a comparison between two levels affects both of their scores equally), while line uniqueness is not. Levels that get a high line uniqueness score tend to be "greedy": they score well at the expense of other levels.

Similarity filtering was designed to work with starting populations of any size.  Levels that survive each iteration always score better on average the next round, regardless of whether it’s the first iteration or the 50th.  Including line uniqueness in the formula would result in a reversal of this pattern at some point.  

Still, I think it generally selects levels that are on average better, but it was a deliberate design choice to use this calculation only once the population drops to 200 levels, regardless of the starting level count.

Why 200 and not, say, 1.67%?   Because 1.67% wouldn’t work with a population of 10 billion.  

Consistency

Despite all the randomness, the results are fairly consistent. If I run the process twice, on average over 75% of the levels chosen will match. The percentage drops only marginally with population size: choosing just 30 levels out of 100,000, the match rate is still above two-thirds.