ArnoHu

S_P_A_R_T wrote:
ArnoHu wrote:
S_P_A_R_T wrote:
ArnoHu wrote:
S_P_A_R_T wrote:
ArnoHu wrote:
birdracerthree wrote:
S_P_A_R_T wrote:
What NPS should WD be aiming for on S3?

Because right now, even though it's search depth can get pretty far, it's mostly because of the aggressive LMR. I'm working on an update to hopefully add specialized extensions to help out in this area, but at the end of the day, NPS / Move Gen Speed, is going to be more important.
I would guess double of what it is now to start. WD's NPS is already poor (<20k NPS on TW is not good, 100 NPS on S3 isn't any better), maybe you should look into the double move generation. That should not be necessary.

Update on mate eval TT bug : The incorrect TT value is coming from the same depth as the correct one. No more info is known.

Good question, as mentioned before, NPS must be taken with a grain of salt. The whole goal of pruning is to have as little nodes visited as necessary for reaching a high search depth. An engine with none or very limited pruning will have high NPS without getting to any significant search depth. When you compare NPS of professional engines, you know they have great pruning in place, so no need to discuss.

E.g. I just imported “r3kb1r/pp1n1ppp/q1p1p3/3pPn2/3P2P1/1QN2N2/PPPB1P1P/2KR3R b kq - 0 11” to our engines on TurboWarp. Scurious is known to have the highest NPS. It was done searching ply 5 in 3.2 seconds for that board, that is without quiescence. It was “only” running on 175k NPS here. GoK also had 175k NPS, but was done with ply 5 search including quiescence in 0.3 seconds. And that gap will widen with any additional ply. Element was done in 2.2 seconds at 85k NPS, WD in 0.8 seconds at 16k NPS (although I doubt our NPS calculation is equivalent).

If you refer to improving move generator, moves-per-second is an important metric. As a baseline I can tell you that GoK's core move generator, when running standalone (no other code executed, no staged generation, no hash moves, etc), will be at 800,000 generated moves per second for the start board. So MPS in the 100,000s would be a good starting point.

When it comes to pruning via good move ordering, I would look at the average move list index of the best move found. I think you have that already in WD, right? And if there is a staged move generator, you can compare NPS and MPS. You want NPS to be close to MPS, because then you did not generate a lot of moves that were never applied.

With all that in place, just verify at the search depth you reach. You are right that LMR can be misleading, depth might be great, but it could prune things it should not. For that we can compare depth with LMR disabled, or having the same LMR configuration in place.

We must also keep in mind that system speed and current board do have a lot of impact on NPS, and so does quiescence search. In any way, based on where it is now, I guess 50k NPS might a be a good initial goal for WD. Why do you think NPS is low now? How can WD reach its search depth now? Are we sure it is not also about the way how NPS is calculated?

@birdracerthree, why don't you simply disable checkmate evals in TT for the time being? I don't think they have such a high impact. And I even have seen professional engines disabling them for quiescence search at least, and so does GoK.

Alright, thanks!

WD on the starting board has an average-move-list-index of around 1.5-1.6.

WD also calculates the NPS as the total “normal nodes” + the q-search nodes, and then divide that by the time used. (Normal nodes is increased by one at the start of every “minmax”, and the q-search nodes counter is increased at the start of every “q-search”.)

I'm not totally sure why the NPS is just so low, (even doubling the move gen speed would put it at less than half of Element's), but I have a feeling it's just some really poor code somewhere (similar to the progress/eval bar fix on S3).

(Also, how do you calculate Moves Per Second on the starting board?)

(Also also, here's a really cool 99% accuracy game I played yesterday IRL https://lichess.org/nwfYC30o#65 )

Do you increment node count on every evaluation? On every applied move? Or something else?

I only increment node count at the start of the “minmax” my-block. (And there's a similar story for the q-search side too.)

Also, how should the moves per second on the starting board be calculated? Do you just keep on generating moves until 1 sec is up? And do you apply / revert moves on the board?

I checked the code, and it all makes sense, but I simply don't understand why this results in 18k NPS, when at the same time WD generates 200k moves per seconds (unless I measured that in a wrong way, I counted invocations to “add move to legal moves”). I then added up the number of moves iterated over (move count if no cutoff, move index if cutoff), it was 160k. Then I counted applied moves, it was 16k again. I must be missing something… I disabled LMR, same result. I only looked at non-quiescence numbers.

Move generator: No apply(), no revert(), no TT caching, nothing but “GenerateMoves” over and over again for starting board (including move ordering).

BTW, reading WD code in more detail for the first time, I could not avoid noticing it seems a bit related to GoK :-)

About performance, one thing I noticed was that “add move to pseudo legal…” is the most expensive custom block, although it is only about adding one item to a list. It contains a cascade of if/then/else blocks, to map to 20-something moves-<1-N> lists. Even an if statement is an expensive operation, on Scratch 3 very much so, on TW still a conditional jump. You can flatten that, e.g. to chunks of 5 (<6, <11, …), And in there the most likely match first (== 6, == 10, ..). No else block, instead just break when found. Only to levels of ifs, with that, you will end up with 2 ifs executed most of the time, not 15. A better approach would be to have only one move list, and an offset of probably 300 for each ply. Much faster, much less code, and increasing search depth won't require any code change. This is what GoK does:

https://github.com/ArnoHue/scratch/blob/fedda69407c0d5ccb7eaf269791b103f125ae861/chess/Engine.scratch#L1019

Alright, thanks!

What do you mean by “add move to pseudo legal…” has a cascade of if/then/else blocks? Are you referring to a different custom block?

(Also, WDs code is quite similar to GoK's as I used the coding dojo tutorial as a guide, and I frequently went into GoK to find optimizations, like finding the first 2 digits of a 4 digit number, etc. Not to mention the functions that are completely the same, like the killer moves & TT table.)

Correction, the slow custom block is “add move to legal moves with start square”