Paco Ŝako is a chess variant where you can't kill any pieces. On the crowded board, you can't see the end coming. Or can you?

To help players review their games, the Paco Play website has provided replay analysis for a while now. It tells you which moves put a player in Ŝako and crucially, when they missed a chance to win. The complex chains in a game of Paco Ŝako are easy to overlook, but the replay analysis catches them all.

Users quickly began sharing these missed opportunities in the #puzzles channel of our Discord. We now have a stable supply of puzzles from played games keeping us practiced. Here is a Paco I missed and shared. Can you find the chain for white to win in a single move?

You can open this puzzle on pacoplay.com. Reload the page when you get stuck. If you can't find it don't worry. I didn't find it either and subsequently lost the game a few moves later 🙈

With "Paco in 1" sorted out, some users started posting "Paco in 2" in the #puzzles channel. This adds a new level of difficulty and can help us learn more about playing Paco Ŝako as a community. But these positions are hard to build and it is easy to miss a defense.

To help with this, the replay analysis will now show (missed) Paco in 2 when the game had one.

What is a Paco in 2?

To teach the computer about Paco in 2, I first had to precisely understand what that means. I had an intuition before, but you can't code an intuition.

Definition: A Paco Ŝako position is called "Paco White in 2" if and only if

there exists an "attack move" for White which puts Black in Ŝako such that for any "defense move" Black can respond with, Black remains in Ŝako anyway, and
there exists no "attack move" for White that directly wins the game.

Implicitly hidden here is also, that Black uniting with the White king would be a valid defense. And of course, the definition works the other way around for Paco Black in 2 as well.

To turn this into a program, I do the following (Checking Paco White in 2):

Get a list of all possible attack moves White can do.
- If any of them wins the game, that's Paco White in 1, not in 2.
For each attack move, look at all possible defense moves.
- If any of them gets the Black player out of Ŝako, discard the attack move and try the next attack move.
If there is no defense, store this attack move and look for additional attack moves. This means I get a list of attacks, not just a single one. Getting a list is important to understand if the replay contains a found Paco in 2 or a missed Paco in 2.

🐌 Oh no, it's Slow! 🐌

After I implemented it and played around with it in the UI, I saw that replays open extremely slowly now. Sometimes they don't open at all. Why is that?

In a complicated position with chains, there can be a few hundred possible moves that White can attack with. And for each one, Black may have a few hundred possible defense moves. The replay has to do that over and over again for all the moves in a game. That are a lot of positions for the Computer or Smartphone to look at! No wonder it is so slow.

To make analysis faster, I can't go by "this feels slow". I need some numbers to compare to each other. Let's build a Benchmark:

Loaded 2967 games in 326.602129ms
2333 games have at least 12 actions
Analyzed 2333 games in 524.385653535s

Plotting this into a histogram shows, that most of the time is used for analyzing a small minority of games.

I have added a few vertical lines here for different "percentiles":

50% of the games finish in under ~0.05 seconds. (p50)
90% of the games finish in ~0.3 seconds. (p90)
Only 5% of the games take longer than ~0.7 seconds. (p95)
An additional 33 games are taking more than 10 seconds each, some more than a minute, and some don't finish at all. I have excluded those 33 from my benchmark.

Getting faster

The first step in making this faster is to take off some guardrails. Our function to find Ŝako sequences was using both the traditional algorithm and the Amazon algorithm at the same time. This helped me identify and fix an issue with the Amazon algorithm, but now that it is working it just takes extra time. So let's go for the faster algorithm only:

Loaded 2967 games in 300.372085ms
2333 games have at least 12 actions
Analyzed 2333 games in 345.071743861s

We'll use this as a baseline.

As the first step, I replace vectors with arrays and place them into a separate struct. I also restrict access to use a trait.

// Old Version
struct DenseBoard {
    white: Vec<Option<PieceType>>,
    black: Vec<Option<PieceType>>,
    // ...
}

// New Version
struct DenseBoard {
    substrate: DenseSubstrate,
    // ...
}

struct DenseSubstrate {
    white: [Option<PieceType>; 64],
    black: [Option<PieceType>; 64],
}

trait Substrate {}
impl Substrate for DenseSubstrate {}

This saves some allocations and should also put the relevant data closer to each other in memory. This makes it easier for modern CPUs to work with it. Here are the updated benchmark results:

p50: 29ms (-9% over baseline)
p90: 150ms (-14% over baseline)
p95: 293ms (-13% over baseline)

We already get some performance benefits just from shuffling around the memory. But the main reason we are doing this is to separate the code of the Substrate from the rest of the game logic. (Naming is hard. Substrate isn't a great name. Unfortunately the name Board is already used for the board state + rules + other stuff in combination.)

Faster Hashing

Now we can implement "Zobrist Hashing" in Substrate. It's about time, given that I already mentioned Zobrist Hashing in July 2022 (Optimizing Paco Ŝako with Flamegraphs). And indeed, this gives us quite some performance!

p50: 9ms (-72% over baseline, -69% over last)
p90: 61ms (-65% over baseline, -59% over last)
p95: 128ms (-62% over baseline, -56% over last)

Re-running the benchmarks from the July 2022 article also shows improvements of over 50%. So things get faster across the board. Let's get a Flamegraph of our benchmark to understand where we can get some additional improvements! I'm basing this on games 1 through 100 only, because flamegraphs are slow to build.

cargo flamegraph --unit-test pacosako-tool-server -- test::paco_2_performance

The areas for improvement seem to be:

Finding place targets takes a lot of time when deciding if lifting a piece is allowed. That likely involves a lot of wasted work.
Finding place targets also takes a lot of time when we are looking to place a piece. That work isn't getting wasted, but maybe we can make it faster.
The Amazon algorithm for Ŝako detection uses a SetU32 . It can store numbers up to 4,294,967,295 while I only need to store numbers up to 64.
Wrangling hash maps takes about 13% of the time, and most of it is spent on reserve_rehash. I can probably reduce that by allocating a single hash map beforehand and reusing it.

Let's start with number 3. If you need a set that can store values between 0 and 63 you can use a single positive 64-bit integer called a "Bitboard". So far I have avoided working with Bitboards, as the Magic Bitboards for move generation are fast, but not intuitive. I still won't use any magic for now, but Bitboards still make great sets. Here are the new timings I get:

p50: 8ms (-75% over baseline, -11% over last)
p90: 53ms (-70% over baseline, -13% over last)
p95: 108ms (-68% over baseline, -16% over last)

BitBoards for Lists of Board Positions

Several methods return a Vec<BoardPosition> to indicate which moves are allowed in a situation. I have exchanged those with a BitBoard as well to save allocations. Turns out, this changes the order in which moves are evaluated.

---- regression_run stdout ----
Testing the whole regression database...
Regression in game 1
First difference in legal actions on index 2.
Action taken: Lift(e7)
Expected: [Place(e6), Place(e5)]
Actual: [Place(e5), Place(e6)]

My Ŝako detection algorithms now find different, but equally valid solutions to unite with the King. And my big regression suite of over three thousand games no longer matches what the engine produces exactly. We can solve this by sorting the expected actions and the actual actions. And after sorting them the tests pass again.

p50: 6ms (-81% over baseline, -25% over last)
p90: 39ms (-78% over baseline, -26% over last)
p95: 86ms (-75% over baseline, -20% over last)

Interning Boards to Reduce Cloning

Interning replaces an object (in our case a Paco Ŝako game board) with a number and then uses this number to refer to it. We can now hold on to two "copies" of the object just by remembering the number twice. I am using this when generating a list of all possible moves. This juggles a bunch of intermediate board states and builds a graph of them to avoid getting stuck in infinite chains.

I would have expected this to help in complicated situations with many loops, but it turns out this helps more in simple situations. We still get some performance in the complicated cases, so I'm keeping it in.

p50: 4ms (-88% over baseline, -33% over last)
p90: 35ms (-80% over baseline, -10% over last)
p95: 80ms (-76% over baseline, -7% over last)

I have also tried replacing the HashMap with a BTreeMap, but that didn't improve performance for my benchmark.

Avoiding Impact on the Players 🚧

I am quite happy that analysis is now fast enough that most replays open in a reasonable time. We still have outliers to consider. Smartphones are slower than my PC, the code runs slower in the Browser than in my benchmark suite and some of the "problematic" replays won't open on my phone, even after waiting for over five minutes.

To solve this, I have to cut the replay analysis code into pieces.

This makes the replay page load faster for everyone as they no longer wait for Ŝako determination before getting to see the page. While it is loading, I am showing a small loading bar. At this point, the replay is already fully operational and additional labels will flow in one by one. You can click around even before it is fully analyzed.

As normal for progress bars, this will get stuck and jump ahead instead of smoothly increasing. I don't have a way to predict how complex each position is going to be, so I'm okay with keeping it as it is.

When looking at your replays look out for found and missed Paco in two - and please post all interesting positions to #puzzles on Discord!

Paco Ŝako Mate in 2

What is a Paco in 2?

🐌 Oh no, it's Slow! 🐌

Getting faster

Faster Hashing

BitBoards for Lists of Board Positions

Interning Boards to Reduce Cloning

Avoiding Impact on the Players 🚧

Comments

More from this blog

PacoPlay Year in Review 2023

Postmortem on PacoPlay crash 2023-01-18

Speeding up Ŝako Detection with Amazons

Optimizing Paco Ŝako with Flamegraphs

Command Palette

What is a Paco in 2?

🐌 Oh no, it's Slow! 🐌

Getting faster

Faster Hashing

BitBoards for Lists of Board Positions

Interning Boards to Reduce Cloning

Avoiding Impact on the Players 🚧

Comments

More from this blog