I have developed quite a knack for undertaking ridiculously overwrought analyses that do basically nothing with regard to actually improving my ability as a player, and well, here's another one.
For quite some time, I have been interested in trying to quantitatively separate the luck and skill factors in Scrabble, to determine which games I may have played better than my opponent but lost, and which games I may have played worse than my opponent but won. I addressed this briefly a couple of years ago in the post where I suggested equity percentage as a replacement for equity loss; nowadays I don't see many people talking about equity loss either. In the long run, I think that's correct: the object of the game is to win, so measuring the difference in winning percentage between the top play and your play on each turn would probably be much more accurate at determining how the game went. In that post, I suggested possible equity as a proxy for luck (simply adding up how many points you could have scored, equity-wise, by making the top equity play on each turn), but I realized that was likely misguided as well. For instance, if you missed a bingo and saved bingo-prone tiles, you'd be very likely to have high-equity opportunities on your future turns. Possible equity can be as much a measure of poor skill (missing your big plays) as it can be a measure of luck. And yeah, I know "luck is where preparation meets opportunity" and all that crap. A lot of so-called "luck" can be exploiting opponents' weaknesses to give yourself good opportunities later in the game.

However, I've also been annoyed for quite a while that most people seem to measure luck simply by counting power tiles, when I knew it wasn't that simple. If you draw three S's on one turn, you weren't particularly lucky. If you draw a blank in the endgame on an extremely closed board, there's not much you can do with it, and again, you weren't particularly lucky, or at least not as lucky as you would be with it on a wide-open board early in the game. I am at least advanced enough to know that tiles aren't worth the same amount at every point in the game.
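To make the two measures above concrete, here is one rough way to write them. The notation is introduced only for illustration: $T$ is the number of turns, $E^{*}_{t}$ is the equity of the top play on turn $t$, and $W^{*}_{t}$ and $W_{t}$ are the estimated win probabilities after the top play and after the play actually made.

```latex
\text{possible equity} = \sum_{t=1}^{T} E^{*}_{t}
\qquad
\text{win\% lost} = \sum_{t=1}^{T} \left( W^{*}_{t} - W_{t} \right)
```

Written this way, the flaw in possible equity is easier to see: each $E^{*}_{t}$ depends on the leave chosen on earlier turns, so missing a bingo while keeping bingo-prone tiles tends to raise the later terms of the sum.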
A blank at the beginning of the game might be worth 30 or 35 points instead of 25; an S might be worth 12 instead of 8. On a closed board, those values would probably be less than 25 and 8. What REALLY matters, of course, is the synergy of your rack. ADEINOR is obviously better than any rack with three S's. Likewise, even something as lousy as DIMRUUV could be better on some boards than AEINORT if you actually have two spots for DUUMVIR but none for the 8s with AEINORT (okay, I'm not sure I could come up with a situation that supports this, so maybe the example is too extreme).
Essentially, what I wanted to do was measure luck as the difference between the value of a player's leave and the value of the rack the player subsequently draws. To value each rack I drew, I simulated a passed turn with it in Quackle (500 iterations of a 2-ply simulation) and took the equity of that pass as the value of the entire rack; then I subtracted the value of the previous leave. For example, in my first game against John Fox at the September Syracuse tournament, I kept NRST from a rack of GNRRRST with a vowel-heavy pool remaining. The leave NRST is worth 13.1 points according to Superleaves. I drew AEQ on the next turn, and the equity of a pass with that rack after 500 iterations was 25.9. I dropped the Q playing QI, leaving AENRST with a leave value of 38, then drew a U and bingoed. Despite my having a bingo on that turn, the pass was worth only 41, barely more than the 38 that AENRST was worth. That indicates most of my luck was drawing the A and E on the previous turn to accompany my NRST, not drawing the U that actually let me bingo, which I believe is correct. It also means that with proper rack management, luck can build over several turns, which I also think is correct. Although rudimentary, I think this is a better measure of luck than just HAVING an S. I followed this logic through the entire game, did the same for each of my last 30 games, and summed my cumulative luck over each game. Here are the results (along with a FEW of my opponents' luck values, for those who actually recorded their racks, which most people around here don't...):
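The per-turn bookkeeping described above can be sketched in a few lines. This is only an illustration under stated assumptions: the names `Turn`, `turn_luck`, and `game_luck` are hypothetical, the pass equities and Superleaves values are inputs copied from the John Fox example rather than computed here (Quackle does that part), and the `endgame` flag reflects the note further down the post that endgame turns were not counted.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """Hypothetical record of one turn's inputs, in equity points."""
    prior_leave_value: float  # Superleaves value of the tiles kept last turn
    pass_equity: float        # simulated equity of passing with the new full rack
    endgame: bool = False     # endgame turns are excluded from the total


def turn_luck(turn: Turn) -> float:
    """Luck on one turn: value of the rack drawn into minus value of the leave kept."""
    return turn.pass_equity - turn.prior_leave_value


def game_luck(turns: list[Turn]) -> float:
    """Cumulative luck over a game, skipping endgame turns."""
    return sum(turn_luck(t) for t in turns if not t.endgame)


# The two turns from the John Fox game described above.
fox_game = [
    Turn(prior_leave_value=13.1, pass_equity=25.9),  # kept NRST, drew AEQ
    Turn(prior_leave_value=38.0, pass_equity=41.0),  # kept AENRST, drew U
]

for t in fox_game:
    print(f"{turn_luck(t):+.1f}")            # +12.8, then +3.0
print(f"total {game_luck(fox_game):+.1f}")   # total +15.8
```

Splitting the total per turn is what shows that the AE draw (+12.8) mattered far more than the U that completed the bingo (+3.0).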
Kevin Gauthier (L 351-373) -118.0
Heather Drumm (W 477-348) +89.6
Hani Khouri (W 522-248) -52.2
Jason Broersma (L 274-422) -106.3 to -9.9
Roberta Borenstein (W 462-286) +119.3
Greg Fox (W 536-377) +151.5
William Pizer (W 381-354) -104.2
Jason Broersma (L 389-421) +47.0 to +4.2
Daniel Citron (L 355-407) -11.2
Joan Tondra (L 360-369) +45.5
Mark Goodman (L 380-397) +93.8
Barbara Epstein (W 478-293) +26.1
Matthew O'Connor (W 431-347) +67.6 to +29.2
Daniel Citron (L 329-380) -71.7
Daniel Blake (W 373-352) +3.0
Karl Higby (L 399-447) +66.9
Morris Greenberg (L 410-436) +57.5
Sue Tremblay (L 361-383) -86.9
Matthew O'Connor (L 362-425) +8.3
Joseph Bowman (W 448-391) +21.3
Denise Dixon (W 464-275) +15.4
Kevin Gauthier (W 505-299) +52.1
Kevin Gauthier (W 413-343) +40.4
Karl Higby (W 375-319) +26.9
Matthew O'Connor (L 391-471) -8.7 to +70.7
Ted Rosen (L 377-486) -91.2
Shubha Kamath (W 531-349) +40.5
John A Fox (W 396-306) -38.3
Ted Rosen (W 448-394) +20.0
John A Fox (W 357-337) -113.7
Based on these results, I quickly concluded that what's being measured here probably is correlated with luck but is hardly a perfect measure of it. In my 30 games, I went 13-6 when I had positive luck and 4-7 when I had negative luck. That alone might indicate there is something here, but several of the results seem bizarrely anomalous, especially my luck value of -52.2 in my 522-248 win over Hani Khouri. While I would be expected to win against three-digit-rated players even with atrocious luck, it seems unfathomable that anyone could consider any 500-point game 'unlucky'. In 3 of the 4 games where I had both players' racks, the luckier player did win, and in the 4th game, Jason Broersma won a close game with less luck, which presumably indicates he just played better; that's understandable, since he's a better player.
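The 13-6 and 4-7 split can be checked directly against the list. The sketch below copies my luck value (the first number on each line) and the result for all 30 games; `record_by_luck_sign` is a hypothetical helper name used only for illustration.

```python
# (my luck, did I win?) for each of the 30 games above, in list order.
results = [
    (-118.0, False), (89.6, True), (-52.2, True), (-106.3, False),
    (119.3, True), (151.5, True), (-104.2, True), (47.0, False),
    (-11.2, False), (45.5, False), (93.8, False), (26.1, True),
    (67.6, True), (-71.7, False), (3.0, True), (66.9, False),
    (57.5, False), (-86.9, False), (8.3, False), (21.3, True),
    (15.4, True), (52.1, True), (40.4, True), (26.9, True),
    (-8.7, False), (-91.2, False), (40.5, True), (-38.3, True),
    (20.0, True), (-113.7, True),
]


def record_by_luck_sign(games):
    """Return win-loss records split by whether my luck was positive or not."""
    tally = {"positive": [0, 0], "negative": [0, 0]}
    for luck, won in games:
        bucket = tally["positive" if luck > 0 else "negative"]
        bucket[0 if won else 1] += 1  # index 0 counts wins, index 1 losses
    return {sign: tuple(record) for sign, record in tally.items()}


print(record_by_luck_sign(results))
# {'positive': (13, 6), 'negative': (4, 7)}
```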
What REALLY makes these results more or less worthless is something I should have realized before I started, and I WAS certainly aware of it, as I stated above: leave values CHANGE. Several of the games I won with very low luck values (especially the William Pizer and John Fox games) were ugly, defensive slugfests. There were few opportunities for good plays on either side for sizable portions of those games, so even racks that would normally be above average weren't worth NEARLY as much as they would be on an open board. Few sets of letters are synergistic enough to have done much to open either of those boards, so you could probably pass with AEINRST every time and almost get a negative evaluation, and CERTAINLY a below-average one compared to the expected value of the previous leave. The condition of the board clearly has to be factored in when evaluating luck, so these values don't necessarily mean anything out of context. Maybe an open board will ALWAYS tend toward positive luck values, since there are more possibilities for plays than on an average board, and a closed board will ALWAYS tend toward negative ones. Hence the only way this might mean anything is if you compare both players' luck values, but unfortunately most players don't record their racks, so... Another thing I did that may decrease the accuracy is that I did not count endgame leave values. I know Quackle calculates endgame evaluations differently from early-game and pre-endgame evaluations, so I didn't think this crude analysis would be meaningful in an endgame, especially since I'm evaluating leaves based on passes, and a pass is worth MUCH, MUCH LESS in the endgame than before it.
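Comparing both players' values on the same board is the one way this measure might cancel out board openness. Here is a minimal sketch of that comparison for the four two-sided games, assuming each "X to Y" entry means my luck to my opponent's luck, and assuming the board effect hits both players roughly equally (which these numbers can't establish). `luck_differential` is a hypothetical helper name.

```python
# Games where both players' racks were recorded, reading "X to Y" as
# (my luck, opponent's luck): (opponent, my luck, their luck, did I win?)
two_sided = [
    ("Jason Broersma", -106.3, -9.9, False),
    ("Jason Broersma", 47.0, 4.2, False),
    ("Matthew O'Connor", 67.6, 29.2, True),
    ("Matthew O'Connor", -8.7, 70.7, False),
]


def luck_differential(mine, theirs):
    """Positive means I drew better than my opponent on the same board."""
    return mine - theirs


agree = 0
for opponent, mine, theirs, won in two_sided:
    diff = luck_differential(mine, theirs)
    luckier_won = (diff > 0) == won  # did the luckier player win this game?
    agree += luckier_won
    print(f"{opponent:18} diff {diff:+7.1f}  luckier player won: {luckier_won}")

print(f"luckier player won {agree} of {len(two_sided)}")
```

This reproduces the 3-of-4 count; the exception is the second Broersma game, where my differential was +42.8 and I still lost.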
Some of the anomalies are more easily explained. In my Mark Goodman game, I had subzero luck most of the way, then drew the bingo STOGEYS, which I didn't know (lol, yes, I now know it's common... this just shows how out of it culturally I am...), with very few tiles remaining, so that one play by itself had an evaluation of 80 points or something. One interesting thing I noticed is how often passing with a bingo rack had a higher evaluation than playing almost anything besides the bingo. Perhaps passing might actually be a decent strategy when you know you have a bingo rack but need time to find it and have very little time left?
I think my fundamentals here are largely correct, but it's pretty worthless to even attempt this unless you have both players' racks for the whole game, and it takes so long that there is definitely diminishing marginal utility (which is why I didn't do this for my first three tournaments). I don't think it's worth attempting this in Quackle unless you're good enough to evaluate how many points a particular leave gains or loses on an open or closed board, which I am definitely not yet. Regardless, this may be worth more research in the future if people REALLY want to try to break down luck and skill. Josh Sokol was a bit skeptical of ME doing this, considering how much stuff I miss, saying that measuring luck when you miss lots of things is fruitless, but I don't agree with him. If you miss a bingo, chances are good you still have a good leave, and you can draw tiles that either are or aren't synergistic with it. I think doing it this way truly measures luck, not poor play (as something like possible equity would), but the main problems are that many players, even at the STEE level, don't record racks, and that leave values are dynamic, not static. Elise probably does a better job of reflecting that, but I already spent way too many hours on this analysis in Quackle, and for this project I chose finishing quickly over accuracy. Trying to EXACTLY quantify luck is a ridiculous pursuit, but I thought it was probably necessary in my case, since I credit all my wins to luck and all my losses to inferior skill, which I KNOW is an unhealthy attitude towards Scrabble. If I could just evaluate who ACTUALLY played better, canceling out luck, I might have a healthier attitude toward games. That was the only reason I attempted this, but I'm now convinced there really ISN'T a perfect way to do it.
I'm even more convinced that just listing S's and blanks is too crude though.
My other main question about this analysis is whether it is actually measuring luck or skill. In most cases, I had positive luck against lower-rated opponents (of course, since none of them recorded their racks, I can't say I had MORE luck than they did), but a lot of that could be because I save superior leaves that will be synergistic with the letters remaining in the pool. For example, when I exchanged against John Fox (who was higher-rated than me before that tournament, but not after), I would not have kept so many consonants (NRST) if the pool hadn't been so vowel-heavy. Is it really LUCKY, then, to draw a couple of vowels, or was it simply a skillful decision? In a lot of cases, I'm sure this did measure dumb luck, but some of the luck in Scrabble is certainly setting up your racks so that they will be synergistic on future turns. This leads me to believe it's actually impossible to separate luck and skill, and maybe it's not even worth trying.
Did I gain anything from this strategically? Yeah, probably. It probably gave me more insight into how leave values change during a game, and I tried to predict how much each leave was worth before I looked it up. In a lot of cases I was dead on; in some, I was very much not. Some actually shocked me (AETT has 0 leave value? I'm SHOCKED that's not like +2 or something... duplication must be much worse than I think it is). I still think that in the grand scheme of things it was a colossal waste of time, but once I had spent as many hours on this as I did, I figured I HAD to share the results.