The Investigation

Previous page:  The Program vs. The Grid

464 seconds after I pressed the button to start the program on a new intermediate step towards solving the rule set, the program unexpectedly created the first grid that solved the set.  162 seconds after that, the program produced a 2nd solution, as I didn’t look at my computer screen until about 915 seconds after I clicked the button to start the program (we have now left the fantasy land where The Program talks to me) and saw this:

0 in 920,000 seconds, 2 in 920 seconds.

The numbers:

  • The program had spent a total of about 920,000 seconds trying to solve this rule set without a single solution
  • Coincidentally, it took 920 seconds to create the first two complete grids that satisfy all of the rules.  Of course, these two grids share the first 256 cells (particularly a shame, since these cells cover all of the center, and are visually much more important than the remaining 256 cells).  

One caveat:  the time spent counts processing time on each individual thread, and I usually run it on 4 threads, so it had been running for only about 230,000 total seconds (which is still almost 3 days).  The 920 seconds in the bottom left does not count individual threads, so it had actually spent about an hour total processing time to create the 2 grids. 

While at first, I thought that it might have been a huge coincidence that it had made 2 grids in such a short time starting from just the 256th cell – but it wasn’t.  Later, I had the program run for a while with the same settings, and it created many more grids at roughly the same rate (~8 an hour on 4 threads). 

Why The Program’s Success Came As a Surprise

The fact that the program solved the rule set did not come as a surprise – I expected it would solve it eventually.  It was that it solved it from the 256 cell starting point – and so quickly – that surprised me.  I was fully expecting to let it run a day from this point, and find a handful of attempts that made it to the 505+ cell mark that I could try again from cell 360 or so.  

Why the success from the 256 cell starting point surprised me: 

  1. The program spends a lot more time on the last 100 or so cells than it does for the first 400 cells, let alone the first 256.  See the line chart above for an example. 
  2. At the time that the program solved the rule set, I was working on an earlier version of this page based on a grid from the previous rule set it had solved.  This is what that looked like at the 256 cell mark:
      
    Given all the white cells (all 4 colors remaining) and the numerous empty columns, it just doesn’t look like the program has done much (other than keep the cells white, which is something).  
  3. A predecessor to the predecessor (more of a proof of concept) had similar rules but with a 16×16, 256 cell grid.  While the program did solve this rule set, it took several hours to do so.  

The last two points combined made it particularly surprising – the image above is 

My program kept working on this rule set for more than a week afterwards, trying to find a new solution.   And since the 256 cell mark had worked once, I tried it with many other grids as the seed – all of which reached 490+ cells on the starting attempt (the original grid reached 492 cells).  None of them succeeded.  

By this time my program had spent 3,000,000 seconds – over a month – of processing time just on this rule set.  And only 1 unique grid.  

All of this led me to believe that for the vast majority of attempts it made – even those that reached 500+ cells out of 512 filled, a fatal mistake was made in the first 256 cells.  

This is a big part of the reason that the rule set is difficult to solve. 

The First 256 Cells

Here is the starting point used to finish the featured grid: 

 

I did some investigating to see if there was anything special about the grid at this stage.  Besides the fact that it is the only grid at 50% completion that has led to a complete solution, the answer is no. 

For this rule set, about half of the attempts my program makes reach the halfway point.  Of these, only about 1 percent make it to the three-quarters mark.  An analysis of the grids that reach the halfway point reveals that the strongest predictors of success (reaching the 3 quarter mark or beyond) are: 

  • the distribution of the remaining in the rows – a more even distribution (fewer gray rows) is better.  
  • column expansion – grids that stay within the central columns do better than those that expand towards the edges. 

Basically, you want the grid to look like this: 

and not like this: 

which should probably not come as a surprise.  

While the featured grid scores pretty well compared to the average grid by this measure, it does pretty average compared to grids that reach the three-quarters mark.  

Here are a some random grids at 256 cells for comparison. 

The featured grid is in there, and it is probably the only one of the 16 grids that is not already impossible by this stage.  

Where, I wonder, did the program go wrong for the grid in the bottom right corner?

Based on this, my conclusion is that the only thing special about the first 256 cells is that no mistake had been made yet.

Another thing that isn’t particularly special about the first 256 cells:  the 181st through 256th cells.  When I started my program at cell 180 instead of 256, it was also able to create grids, at only a slightly slower rate (about 5/hour vs 8/hour) than the 256 cell point. 

How Much Gray Is There Really?

The gray cells in this image show how many colors are available for each empty cell after 256 cells: 

 

Almost half of the empty cells are white, meaning  that all 4 colors are still available.  The light gray cells have 3 colors available, the dark gray cells have 2, and the black cell has just 1 available color.  

This however is only based on what the program knows at the time, and does not reflect how many colors are actually valid for each cell such that a solution is possible.   

My program generated 100 grids using the same 256 cells as the starting point.  Using these, I ran a script that counted how many colors appeared in each cell after cell 256.  I used this to create the following: 

the color keys are the same, but with a slightly different meaning: 

white – this cell was filled with all 4 colors in the 100 grids. 

light gray – only 3 of the colors were used to fill this cell.  

dark gray – only 2 colors. 

black same color for all 100 grids

This table compares the two: 

Colors 1st 2nd
1 1 51
2 40 80
3 102 111
4 113 14
Avg 3.28 2.34

The first image is essentially the best case scenario (most colors available), while the 2nd image is the worst case scenario.  

The actual answer lies in between, as the numbers continued to shift slightly from top to bottom in the table even as the grid count approached 100.  

However, I am pretty sure that the 2nd column is much closer to the truth than the 1st. 

The Hard Path

Finding the Original Mistake

The first full solution to the rule set came 3 days after the first 256 cells were assigned a color in another grid.  

The original grid reached a maximum of 492 cells filled before the program eventually gave up.  Here is what it looked like at that point: 

Note that at this moment, the black/magenta cells still have 1 valid color remaining – but the program was unable to fill them all with a color without causing a conflict.  

After one day of working on the rule set, there was one grid that reached 495 cells, and one other grid that reached 492 cells.  On the 2nd day, I had my program use the first 360 cells of the 492 cell attempt and it was able to get much closer to completion: 

The first successful grid came 2 days later, sharing the same first 256 cells as these.  

About two weeks after that, I set my program on infinite mode and gave it the full 492 cells as the starting point.  

In infinite mode, my program will continue to go backwards and change every single color assignment that it can, until it finds a solution.  If the program has infinite time, it will always find a solution if one exists to the rule set, regardless of whether it starts at cell 1 or cell 492.  

I did this not because I wanted a 101st solution that starts with the same 256 cells, but because I wanted to find out where the original attempt went wrong. 

And here it is, the first not already corrected mistake made by the program in this attempt, at cell 335:  

The selection of dark blue for the cell with the green border makes the solution impossible, without changing the color of any of the first 335 cells.  By changing this cell to yellow (red and purple would break other rules), the program was able to come up with a solution with all of the 335 cells in tact.  

The four distribution groups that the cell belongs to are highlighted in this image.  Blue had a much higher chance to be selected first for this cell because of the remaining color counts in the distribution groups: 

Group Blue Yellow
Row 6 3
Column 2 1
Block 3 1
C-Ring 6 7
Product 216 21
Weight 91.1% 8.9%

Other than the 88 cell central ring group, all of the groups have much more blue remaining – and changing the cell to yellow makes both the cell’s column and 4×4 block groups get a bit grayer: 

 

The Maze Analogy, Revisited

See the original maze analogy on The Rules.

The solution produced in infinite mode that started with these 335 cells was the basis for this analogy: 

I changed the numbers slightly for the maze analogy to make them more even.  Instead of the 200 rooms of the maze and 50 forks, the program actually had 178 cells and 57 decisions to make starting from cell 335.  

To find the mistake, it had to go backwards through each incorrect decision and find every “dead end” until it came back to cell 335.  There were over 106 million dead ends, and 830 million cell/color assignments before then. 

Unfortunately, the program made an even more costly mistake after it found this one, on cell 348.  There were a few other costly mistakes made even after it found that one: 

Cell # DEs Moves
394 583K 4.5M
379 71M 56M
354 6M 45M
350 75M 575M
348 173M 1.3B
335 106M 830M
Total 432M 3.3B

Note that while the program was running, I could “see” how far it had gone back, however even though it had gone back to cell 335 after about half a day, it was impossible to know if this was really a mistake or if one had been made earlier, until the solution was produced almost two days later.  

 And here it is: 

This grid has a claim to being the original solution to the rule set, since 78 of the cells were assigned a color 3 days before any other complete grid for the set.  

You can see this in different colors on the interactive viewer:  link.

An Even Harder Path

The above grid was only the 2nd solution to the rule set that my program created in infinite mode.  

The first solution it produced also shares the first 256 cells: 

blue and purple were switched for this image

viewer link

My program started working on this solution almost immediately after finishing the first grid from cell 256.  

It took even more moves – over 5 billion – and over 3 days of processing time to create it.  The first mistake was on cell 331 this time, though strangely, it took only 1 million moves to go back from cell 509 to cell 331, compared to 3 billion moves that it took to find the next mistake at cell 353.  

I had thought that the program had found the original mistake after it created this grid.  It wasn’t until a week later that I found out that I had mistakenly given it a different “seed”, that had itself started from cell 256 and then reached cell 509, as opposed to the 509 cell attempt that started from cell 360 (which would have produced the first grid above, or something very close to it).  

The Easy Paths

While these grids took 3.3 and 5.1 billion moves for my program to make, none of the other 99 grids it created took more than a million moves – and most of these took less than 50,000.  

I compared each of the 101 grids against the other 100 grids in the set, starting from cell 256 on. 

While none of the 101 grids were exactly the same, there were pairs of grids that had only 4 cells with a different color.  Here are some numbers:

  • Average # of different cells – 72.2, 5B – 109, 3.3B – 118
  • Average minimum # of different cells – 21.2, 5B – 75, 3.3B – 118
  • 5B moves – 109 avg, 75 min, 135 max
  • 3.3B moves – 118 avg, 73 min, 138 max

The 3.3B mover grid had the highest average difference of the 101 grids, and the 5B move grid ranked 5th.  

They ranked 5th and 2nd in the minimum difference – at least 73 cells for differed from any of the other 99 grids.  

There were 6 other grids that had a minimum 72-77 cell difference and 5 with between 38-54 cells.  

Most of the other 88 grids likely followed what I would call an “easy path,” the type of path that the program is biased towards finding these solutions which share a couple of properties:

  1. Relatively shorter dead ends
  2. No costly potential mistakes where the odds are heavily weighted against choosing the right color. 

The mistake on cell 335 in the 3.3B mover is an example of the 2nd – with an over 90% chance of the program going down a path that requires almost a billion moves until going back to cell 335.  

A Maze Runner, a Maze Maker, and Maze Cheater

I will use again the maze analogy to describe my program.  Each rule set that I give my program is essentially a maze with multiple solutions – some of which follow more difficult paths than others.  The catch is that it is not limited to 2 dimensions, and that each “fork” along the maze, branches out into a different dimension.  

A Maze Runner

The program runs the maze with a perfect memory, basically the way that a human would: 

It makes an educated guess at each fork – this is like knowing where the finish is to a maze, and choosing to go in the general direction of the finish when possible.  

When it hits a dead end, it goes back to the last fork and tries the other path, until it finds a dead end, and then goes back to the fork before that and so on.  

A Maze Maker

The program’s cell selection process – the process where it determines which cell to pick the next color for – essentially creates the maze that it has to solve.  This cell selection process is designed to create the maze with the least number of forks and shortest dead ends possible for a given rule set.  

It might seem counter-intuitive, but this process is probably even more important than the color selection process (the maze runner).  The color selection process has not changed since the original prototype that I wrote in about an hour, while I have put a lot more work into the cell selection process and seen massive improvements as a result.  

The cell selection process was written in very generalized way such that it can work with basically any type of distribution group or set, and other types of restrictions, or other.  For example: 

  • Pick a random group of cells, and tell the program that you want A of color 1, B of color 2, C of color 3, and D of color 4, E of color 5 (if there is a 5th color), and so on.  
  • Do this for as many groups of cells as you want. 
  • Add as many of the standard distribution sets as you want: row, column, block, central rectangle – with whatever dimensions you want.  

It should work just as well on this type of random distribution group as the 4 distribution sets that the featured grid follows.  However, it would be easy to make an impossible rule set in the manner described above. 

A Maze Cheater

This is the ugly part of the program.  

The maze runner and maze maker are designed to find a solution for any rule set with only 1 attempt in infinite.  If it isn’t stopped, it will eventually do this for any solvable rule set.  However, the length of time is dependent on how lucky it is with the mistakes that it makes.  

For example, the 492 cell attempt that led to the featured grid was one of about 1 million attempts made by the program on the first day.  

I know now, that if it had been on infinite mode for this attempt only, it would have taken 2 days to produce a solution.  

This was the best of the 1 million attempts – there were a couple of others that reached 490+ cells, but they did fare nearly as well when I started them from the 256 cells or any cell mark.  

Had the first mistake been on the last decision just before cell 335, it probably would have taken another week.  The time grows very quickly and exponentially.  A mistake before cell 256 would not be found in my lifetime, and probably not for some amount of time much longer than that.  

…..to be completed


Notes

Unfortunately, I didn’t quite finish this last page before moving onto another project, and then another.  

In the month or so I was writing this, I kept running my program on very similar, but more challenging rules sets.  The following two images show the new leaders in rule set difficulty that my program has solved:

Smaller Blocks 

This grid follows all of the rules of the featured grid, but the colors are evenly distributed per each 2 row by 4 column block (and thus each 4 row by 4 column block) – a total of 64 blocks compared to 32.  This also limits the blue and teal blocks to a column span of 4, which I personally did not like as much as the featured grid visually.  But it was a lot harder to solve, crossing the 2 million second processing time threshold to produce a single solution with a lot of manual intervention.  

Bigger

This grid follows all of the same rules as the featured grid, but is 20 rows by 40 columns – a total of 800 cells instead of 512 (and 110 distribution groups instead of 88).  

After adjustments to the process, it took about as long for my program to solve this set as the featured grid’s rule set.