## Responses to Cade Metz’s hit piece on Scott Alexander

This depraved rubbish got quite a few responses. Please let me know about any I’ve missed. Inner bullets are subtitles or excerpts. Open to adding good Twitter threads too!

• Scott Alexander, Statement on New York Times Article
  • I have 1,557 other posts worth of material he could have used, and the sentence he chose to go with was the one that was crossed out and included a plea for people to stop taking it out of context.
  • I don’t want to accuse the New York Times of lying about me, exactly, but if they were truthful, it was in the same way as that famous movie review which describes the Wizard of Oz as: “Transported to a surreal landscape, a young girl kills the first person she meets and then teams up with three strangers to kill again.”
  • I believe they misrepresented me as retaliation for my publicly objecting to their policy of doxxing bloggers in a way that threatens their livelihood and safety. Because they are much more powerful than I am and have a much wider reach, far more people will read their article than will read my response, so probably their plan will work.
• Cathy Young, Slate Star Codex and the Gray Lady’s Decay
  • The New York Times hit piece on a heterodox blogger is a bad stumble — the latest of many
  • The New York Times has wandered into an even worse culture-war skirmish. This one involves bad (and arguably dishonest) reporting as well as accusations of vindictiveness and violation of privacy. It’s a clash pitting the Times against a prominent blogger critical of the social justice progressivism that has become the mainstream media’s dominant ideology in the past decade. The Times does not look good.
• Robby Soave, What The New York Times’ Hit Piece on Slate Star Codex Says About Media Gatekeeping
  • “‘Silicon Valley’s Safe Space’ has misinformed readers.”
  • It’s a lazy hit piece that actively misleads readers, giving them the false impression that Siskind is at the center of a stealth plot to infiltrate Silicon Valley and pollute it with noxious far-right ideas.
  • The idea that a clinical psychiatrist’s blog is the embodiment of Silicon Valley’s psyche is very odd
  • To the extent that the Times left readers with the impression that Siskind is primarily a right-wing contrarian—and Silicon Valley’s intellectual man-behind-the-curtain to boot—it is actually the paper of record that has spread misinformation.
• Matt Yglesias, In defense of interesting writing on controversial topics
  • Some thoughts on the New York Times’ Slate Star Codex profile
  • I think Metz kind of misses what’s interesting about it from the get-go.
  • something about the internet is making people into infantile conformists with no taste or appreciation for the life of the mind, and frankly, I’m sick of it.
• Tom Chivers, When did we give up on persuasion?
• Freddie deBoer, Scott Alexander is not in the Gizmodo Media Slack
• Scott Aaronson, A grand anticlimax: the New York Times on Scott Alexander
• Sergey Alexashenko, How The Hell Do You Not Quote SSC?
  • “The NYT misses the point by a light-year.”
• Kenneth R. Pike, Scott Alexander, Philosopher King of the Weird People
• Noah Smith, Silicon Valley isn’t full of fascists

## NYT plan to doxx Scott Alexander for no real reason

UPDATE 2020-06-25: Please sign the petition at DontDoxScottAlexander.com!

The New York Times is planning on publishing an article about Scott Alexander, one of the most important thinkers of our time. Unfortunately, they plan to include his legal name. In response, Scott has shut down his blog, a huge loss to the world.

This will do enormous harm to him personally; some people hate Scott and this will encourage them to go after his livelihood and his home. Not all of them are above even SWATing, ie attempted murder by police. If he does lose his job, it will also “leave hundreds of patients in a dangerous situation as we tried to transition their care.”

However, the greatest harm is to the public discourse as a whole. Shutting people down in real life is an increasingly popular response to all forms of disagreement. Pseudonymity plays an essential role in keeping the marketplace of ideas healthy, making it possible for a wider spectrum of ideas to be heard. If the NYT policy is that anyone whose profile becomes prominent enough will be doxxed in the most important newspaper in the world, it has a chilling effect.

All this might be OK if there were some countervailing public interest defence, such as a connection between his blogging and his real world activity that needed to be exposed. But as I understand it, no-one is asserting this. The defence of this incredibly harmful act is simply “sorry, this is our policy”. It’s not even a consistently applied policy: a profile of the Chapo Trap House hosts published in February rightly omitted host Virgil Texas’s real name, though they must surely have been aware of it.

I urge you to spread the word on this everywhere you have reach, and to politely contact the New York Times through the means Scott outlines in his post to urge them to do the right thing.

Here’s the letter I wrote:

I am a subscriber, and I am dismayed to learn that the Times plans to doxx blogger Scott Alexander. In an age where people so often respond to disagreement by attacking someone in the real world, whether by getting them fired or by SWATing, pseudonymity plays an essential role in the marketplace of ideas, helping to ensure that a wide spectrum of voices can be heard.

Obviously if there was a public interest defence of publishing this information – if there was a connection between his blogging and his real world activity that needed to be exposed – that would be different, but as I understand it no-one is asserting that. If you plan to do something so tremendously harmful to the public discourse as a whole, please have a reason other than “this is what we do”. You were right not to doxx Chapo Trap House host Virgil Texas; please apply that policy here.

## Some more numbers as lambda calculus

Do you have any diagrams of smaller numbers for comparison? I’d love to see a whole sequence of these.

I have work to put off, so I couldn’t resist the challenge. Like the previous post, these are Tromp diagrams showing lambda calculus expressions that evaluate to Church integers.

Code to generate these diagrams is on GitHub; I generated these with the command

```shell
./trylambda --outdir /tmp/out demofiles/smallernums.olc demofiles/graham.olc demofiles/fgh.olc demofiles/slow.olc
```

## A picture of Graham’s Number

One of the first posts I made on this blog was Lambda calculus and Graham’s number, which set out how to express the insanely large number known as Graham’s Number precisely and concisely using lambda calculus.
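For anyone who hasn’t seen that earlier post: Graham’s Number is usually defined with Knuth’s up-arrow notation, as $g_{64}$ where $g_1 = 3 \uparrow\uparrow\uparrow\uparrow 3$ and $g_{k+1} = 3 \uparrow^{g_k} 3$. The Python sketch below is a direct transcription of that definition, not the lambda calculus encoding from the post; `up_arrow` and `graham_g` are illustrative names, and only tiny inputs can actually be evaluated.

```python
def up_arrow(a: int, n: int, b: int) -> int:
    """Knuth's a ↑^n b for n >= 1: a single arrow is exponentiation,
    and each extra arrow iterates the level below it."""
    if n == 1:
        return a ** b
    if b == 0:
        return 1
    result = a
    for _ in range(b - 1):
        # a ↑^n b = a ↑^(n-1) (a ↑^n (b - 1)), unrolled into a loop
        result = up_arrow(a, n - 1, result)
    return result


def graham_g(k: int) -> int:
    """g_1 = 3↑↑↑↑3 and g_(k+1) = 3 ↑^(g_k) 3; Graham's Number is graham_g(64).
    Defined for completeness only: even graham_g(1) can never be evaluated."""
    arrows = 4
    for _ in range(k - 1):
        arrows = up_arrow(3, arrows, 3)
    return up_arrow(3, arrows, 3)


print(up_arrow(2, 3, 3))  # 2↑↑↑3 = 2↑↑4 = 65536
print(up_arrow(3, 2, 3))  # 3↑↑3 = 3^27 = 7625597484987
```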

A week ago, Reddit user u/KtoProd asked: if I wanted to get a Graham’s Number tattoo, how should I represent it? u/FavoriteColorFlavor linked to my lambda calculus post. But in a cool twist, they suggested that rather than writing these things in the usual way, they use a John Tromp lambda calculus diagram. I got into the discussion and started working with the diagrams a bit, and they really are a great way to work with lambda calculus expressions; it was a pleasure to understand how the diagram relates to what I originally wrote, and manipulate it a bit for clarity.

The bars at the top are lambdas, the joining horizontal lines are applications, and the vertical lines are variables. There are three groups; the rightmost group represents the number 2, and the middle one the number 3; with beta reduction the two lambdas in the leftmost group will consume these rightmost groups and use them to build other small numbers needed here, like 4 ($2^2$) and 64 ($4^3$). The three is also used to make the two 3s either side of the arrows. Tromp’s page about these diagrams has lots of examples.
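Building 4 as $2^2$ and 64 as $4^3$ is cheap in lambda calculus because applying Church numeral $m$ to Church numeral $n$ gives $n^m$. Here is a minimal Python sketch of that property using ordinary closures; `church` and `to_int` are helper names invented for this illustration, and it doesn’t attempt to reproduce the diagram or the actual expression behind it.

```python
def church(k: int):
    """Church numeral k: takes f and returns the function applying f k times."""
    def numeral(f):
        def apply_k(x):
            for _ in range(k):
                x = f(x)
            return x
        return apply_k
    return numeral


def to_int(n) -> int:
    """Decode a Church numeral by applying 'add one' to 0 the right number of times."""
    return n(lambda x: x + 1)(0)


two, three = church(2), church(3)
four = two(two)            # applying 2 to 2 gives 2^2
sixty_four = three(four)   # applying 3 to 4 gives 4^3
print(to_int(four), to_int(sixty_four))  # 4 64
```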

I’m obviously biased, but this is my favourite of the suggestions in that discussion. If u/KtoProd does get it as a tattoo I hope I can share a picture with you all!

Update 2020-02-24: I’ve added the ability to generate these diagrams to my Python lambda calculus toy. After installation, try ./trylambda demofiles/draw.olc.

A few more notes regarding target collision resistant functions, following up from my \$1000 competition announcement.

### Second preimage resistance

There is a simple way to construct a secure TCR compression function given a second-preimage-resistant compression function: just generate a key which is the length of the input, and XOR the key with the input. So if we can build a fast second-preimage-resistant function, we can build a fast secure TCR.

The history of hash functions shows that we have been much more successful at achieving second-preimage resistance than collision resistance. From the excellent Lessons From The History Of Attacks On Secure Hash Functions:

> The main result is that there is a big gap between the history of collision attacks and pre-image attacks. Almost all older secure hash functions have fallen to collision attacks. Almost none have ever fallen to pre-image attacks.
>
> Secondarily, no new secure hash functions (designed after approximately the year 2000) have so far succumbed to collision attacks, either.

### Tweakable target collision resistance

In the definition of target collision resistance, the attacker supplies a single message, but in practice we usually want to hash many messages with the same key, eg when constructing a variable-length TCR from a compression function that takes a fixed-length message. This is OK because there’s a straightforward security reduction which shows that if an attacker can find a collision for a single message with probability $\varepsilon$, then they can forge a collision for any of $n$ messages with probability at most $n\varepsilon$. However, when the messages to be signed are large, as they are in Android, this linear falloff is kind of a shame.

One possible advantage of TCRs is that it can be secure to use much shorter hash outputs, say 128 bits, which will make Merkle trees much smaller, saving disk space and improving performance. But if the hash function consumes, say, 128 bytes at a time (like BLAKE2) and the system partition is 1GB, it will be broken up into $2^{23}$ messages, leaving us with only 105-bit security at best. I’d like to do a little better than that, and so I’d like to build multiple-message security into the security definition.

I propose a new kind of primitive, a tweakable TCR, which takes a tweak as well as a key and a message. The attacker faces the following challenge:

• Attacker chooses $n$ messages $m_1, \ldots, m_n$ and $n$ distinct tweaks $t_1, \ldots, t_n$
• Attacker learns random key $K$
• Attacker chooses $i$, $m'$
• Attacker wins if $m' \neq m_i$ but $H(K, t_i, m') = H(K, t_i, m_i)$

If each of the $2^{23}$ messages gets a distinct tweak, we can preserve 128-bit security across large partitions. I therefore encourage people to design not just TCRs, but TTCRs!

## \$1000 TCR hashing competition

In my day job, I do cryptography for Android. I have a problem where I need to make some cryptography faster, and I’m setting up a \$1000 competition funded from my own pocket for work towards the solution.

On Android devices, key operating system components are stored in read-only partitions such as the /system partition. To prevent an attacker tampering with these partitions, we hash them using a Merkle tree, and sign the root. We don’t check all the hashes when the device boots; that would take too long. Instead, at boot time we check only the root of the tree, and then we check other sectors against their hashes as we load them using a Linux kernel module called dm-verity.

This likely works pretty well on phones sold in the US, which will have the ARM CE instructions that accelerate SHA2-256. But a lot of devices sold in poorer countries don’t have these instructions, and SHA2-256 can be pretty slow, and hurt overall system performance. For example, on the 900MHz Cortex-A7 Broadcom BCM2836, hashing takes 28.86 cpb [eBACS 2019-03-31], limiting reading speed to 31.2 MB/s. One partial fix is to switch to a hash function that is faster on such processors. BLAKE2b is nearly twice as fast on that processor, at 15.32 cpb. However, this is still a lot slower than I’m happy with.

Where sender and receiver have a shared secret, authentication can be very fast; a universal function like NH can run at around 1.5 cpb on such a processor. But this isn’t an option for verified boot, because it’s hard to keep the key out of the attacker’s hands, and given the key it’s trivial to forge messages.

In between these two notions of security is the idea of a “target collision resistant” function, once known as a “universal one-way hash function”. With a TCR, hashing is randomized with a key chosen at signing time once the message to be signed is known. This makes the attacker’s job much harder since they cannot simply search for a pair of colliding messages. Instead, they must choose the first message to be hashed, and only then do they learn the key that will be used at hashing time, after which they must generate a second message that hashes with the first using this key; this problem is more akin to second preimage finding than collision finding. While collision attacks against hash functions are plentiful, second preimage attacks are far rarer.

In each of the three notions below the key is sampled as $K \xleftarrow{\$} \mathcal{K}$, and the attacker $A$ either learns it ($A \leftarrow K$) or submits a message ($A \rightarrow m_i$), in the order shown:

| Collision resistance | Target collision resistance | Universal function |
|---|---|---|
| $A \leftarrow K$ | $A \rightarrow m_1$ | $A \rightarrow m_1$ |
| $A \rightarrow m_1$ | $A \leftarrow K$ | $A \rightarrow m_2$ |
| $A \rightarrow m_2$ | $A \rightarrow m_2$ | $A \leftarrow K$ |

In all three, the attacker succeeds if $m_1 \neq m_2$ and $H(K, m_1) = H(K, m_2)$.

In principle, the vastly harder job facing the attacker of a TCR should mean that secure TCRs much faster than hash functions are possible. However, the main impetus to research on TCRs was a desire to bolster existing hash functions, when attacks on MD5 and SHA-1 were new and we didn’t know which, if any, of our existing hash functions would be left standing. As a result, several ways to construct a TCR from a hash function with good provable properties were proposed, but none of these could be faster than their underlying hash functions. As far as I know, no-one has ever proposed a TCR as a primitive, designed to be faster than existing hash functions, and that’s what I need. I’m probably not the only one who’d find a primitive for much faster broadcast authentication useful, either!

To me this looks like an interesting, overlooked problem in symmetric cryptology, and I’d really like it to get some attention. So I’m offering a \$1000 prize from my own pocket, to be awarded at Real World Crypto 2021, for the work that in my arbitrary opinion does the most to move the state of the art forwards or is just the most interesting.
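To make the table concrete, here is a toy Python harness for the three games, using the XOR-the-key-with-the-input construction from the follow-up notes. Everything in it is an illustrative assumption: `F` is a contrived stand-in (truncated SHA-256 from `hashlib` with one deliberately planted collision), not a proposed design, and all function names are made up. It shows how much the order of events matters: an attacker who sees $K$ first can always exploit the planted collision, while one who must commit to $m_1$ first essentially never can.

```python
import hashlib
import os

N = 16  # toy block size in bytes (an arbitrary choice for illustration)

# Contrived stand-in for a compression function: truncated SHA-256 with one
# planted collision, F(P) == F(Q). Second preimages of a *random* input are
# still infeasible to find, but anyone can produce the collision (P, Q).
P = bytes(N)
Q = b"\xff" * N


def F(x: bytes) -> bytes:
    if x == Q:
        x = P  # the planted collision
    return hashlib.sha256(x).digest()[:N]


def xor(x: bytes, y: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(x, y))


def H(key: bytes, msg: bytes) -> bytes:
    """Keyed hash via the XOR construction: H(K, m) = F(K xor m)."""
    return F(xor(key, msg))


def succeeds(key: bytes, m1: bytes, m2: bytes) -> bool:
    return m1 != m2 and H(key, m1) == H(key, m2)


def collision_game(attacker) -> bool:
    key = os.urandom(N)            # A <- K: the key is revealed first
    m1, m2 = attacker(key)         # then the attacker picks both messages
    return succeeds(key, m1, m2)


def tcr_game(choose_m1, choose_m2) -> bool:
    m1 = choose_m1()               # A -> m1: commit to the first message
    key = os.urandom(N)            # A <- K
    m2 = choose_m2(m1, key)        # A -> m2
    return succeeds(key, m1, m2)


def universal_game(choose_pair) -> bool:
    m1, m2 = choose_pair()         # both messages are fixed before K is drawn
    key = os.urandom(N)
    return succeeds(key, m1, m2)


def planted_pair(key: bytes):
    # With K known, steer both inputs of F onto the planted collision.
    return xor(key, P), xor(key, Q)


def tcr_second(m1: bytes, key: bytes) -> bytes:
    # Same trick, but it only works if K xor m1 happens to equal P.
    return xor(key, Q)


print(collision_game(planted_pair))                               # True
print(any(tcr_game(lambda: P, tcr_second) for _ in range(1000)))  # almost certainly False
```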

I offered a similar prize at the rump session at FSE 2019, promising to award it at the end of the year, but I neglected to really tell anyone and didn’t get any entrants. Hopefully this will be a more successful launch for the prize, and see some of you at Real World Crypto 2020!

I’ve moved some technical notes into a subsequent blog post.

## Subjective probability

Credits: This way of looking at probability is due to Bruno de Finetti; this particular framing was taught to me by Andrew Critch.

Out of the blue, you get the following email from me:

Dear You:

I extend to you, and you alone, a chance to take part in my free lottery. Please choose at most one of the following options:

• Option A: On November 9th, I’ll roll two standard six-sided dice, and if I roll a double one (“snake-eyes”), I’ll send you \$200
• Option B: On November 9th, if Washington DC has been declared for Clinton, I’ll send you \$200

If I don’t hear from you within 24 hours, or if your answer isn’t a clear preference for one of these, I won’t do either. Thanks!

If you know me, you know that I’d straightforwardly honour the promise in the email; for this thought experiment set aside all questions about that. It may help you to know that Obama got over 90% of the vote in DC in 2012. There seems zero benefit in refusing the offer or not replying – it’s totally free, and the worst that can happen if you lose is that I don’t send you \$200.

Would you reply to this email, choosing one of the options? If so, which one? I think it’s obvious what the right choice is, but stop for a moment and decide what you’d do.

What about if I’d offered you the opposite choice: A’ means \$200 if I don’t roll snake-eyes, while B’ means \$200 if Clinton doesn’t win DC? Does that change your choice?

I hope you chose B in the first case, and A’ in the second. That’s because it seems very clear that Clinton’s chances of winning in DC are very high, and certainly higher than the chances of snake-eyes on a single roll of two dice, which is 1/36 or less than 3%.

However, for many people, this statement contradicts their understanding of what probability is. The most common and widely taught view of probability is strictly frequentist; it makes sense to say that a pair of dice have a less than 3% chance of snake-eyes only because you can roll the dice many times and in the long run they will land snake-eyes one time in 36. You cannot rerun the 2016 Presidential election in DC many times, so it means nothing to say that Clinton has a greater than 3% chance of winning.
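The frequentist reading of the dice figure is easy to check, both exactly by counting outcomes and as a long-run frequency by simulation. A minimal Python sketch; the seed and the number of trials are arbitrary choices. Nothing analogous can be run for the 2016 result in DC, which is the contrast this section draws.

```python
import random
from fractions import Fraction
from itertools import product

# Exact answer: count snake-eyes among the 36 equally likely outcomes.
outcomes = list(product(range(1, 7), repeat=2))
exact = Fraction(sum(1 for roll in outcomes if roll == (1, 1)), len(outcomes))

# Frequentist reading: the long-run frequency over many simulated rolls.
rng = random.Random(2016)  # fixed seed, so reruns give the same sequence
trials = 100_000
hits = sum(1 for _ in range(trials) if (rng.randint(1, 6), rng.randint(1, 6)) == (1, 1))
frequency = hits / trials

print(exact, float(exact))   # 1/36 and its decimal value, just under 0.028
print(frequency)             # close to 1/36
```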

If you’re prepared to choose between the options above – if you agree that a single roll of a pair of dice producing snake-eyes is less likely than a Democratic victory in DC in 2016 – then there’s an important sense in which you already reject this view and accept a subjective view of probability.

## 7,000 children under five died of malnutrition today

7,000 children under five died of malnutrition today.

It is said that Cato the Elder was so passionate about the losses in the Punic Wars, the threat of further aggression and the desire to impose a total punitive destruction to strike fear into all who might think to raise arms against Rome, that he finished every speech with the famous phrase Carthago delenda est – “Carthage must be destroyed”. No matter what the subject of the speech, whether it be tax policy or proposals for new buildings or whatever was discussed in the Roman Senate, he would finish on this note: Ceterum autem censeo Carthaginem esse delendam – “Furthermore, I consider that Carthage must be destroyed”. A Facebook post I recently read asked, if you were to add such a coda to your own speeches, what would it be? This was my answer: 7,000 children under five died of malnutrition today.

I’m getting this statistic from the WHO, who say that 5.9 million children under the age of five died in 2015, and about 45% of all child deaths are linked to malnutrition. Multiplying those two together and dividing by 365, I get that ignoring seasonal and random variation and suchlike, around 7,000 children under five died of malnutrition today. This number is going down dramatically; the progress in the fight against poverty and malnutrition over the last decade has been truly astounding and is showing no sign of stopping. Still, that’s a heck of a lot. If a dramatic event in the news that kills 50 people is a tragedy, then this is that tragedy 140 times a day, nearly once every ten minutes, and the victims are all children under five. The parents right now weeping for sons and daughters lost while I wrote this would fill the Faraday Theatre.
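The estimate is a back-of-the-envelope calculation; here it is spelled out in Python, using only the two WHO figures quoted above and, as in the text, ignoring seasonal and random variation. The variable names are just labels for this sketch.

```python
deaths_under_five_2015 = 5_900_000    # WHO: child deaths under five in 2015
share_linked_to_malnutrition = 0.45   # WHO: "about 45%"

per_day = deaths_under_five_2015 * share_linked_to_malnutrition / 365
print(f"{per_day:,.0f} per day")      # 7,274 per day, which the text rounds to 7,000

# The comparisons in the text use the rounded figure.
rounded_per_day = 7_000
tragedies_per_day = rounded_per_day / 50           # 140 events of 50 deaths each
minutes_between = 24 * 60 / tragedies_per_day      # about 10.3 minutes apart
print(tragedies_per_day, round(minutes_between, 1))  # 140.0 10.3
```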

I don’t even believe that this is humanity’s biggest problem, or anywhere close. There seems to be a decent chance that through one means or another we could drive ourselves extinct in the decades to come, destroying not only all the value we have today but the unthinkably greater value we could hope to create in the vast future ahead of us. This is why my own donations have gone not to poverty-related charities, but charities like CSER, FHI, and MIRI, that aim to avoid this fate. But issues around existential risk are unfamiliar, and sometimes when considering this or that issue of the day, it’s good to have an easily understood, really big issue in mind to lend perspective. I’ll post a link to this essay in the usual places in a moment, but I’m seriously considering re-posting this link, possibly on a monthly basis, to make it that bit harder to lose sight of the magnitude of the problems humanity still faces.

In conclusion, 7,000 children under five died of malnutrition today.

## Expressing computable ordinals as programs

I loved John Baez’s three-part series on large countable ordinals (parts 1, 2 and 3) but something nagged at me. It felt like a description of an algorithm in prose, and I feel like I don’t really understand an algorithm until I’ve implemented it. But what does it mean to implement an ordinal?

I found a couple of answers to that online. One is the definition of a recursive ordinal, and the other is Kleene’s O. However, both seemed pretty unsatisfactory to me; I wanted something that could naturally express operations like addition/multiplication/exponentiation, as well as expressing finite ordinals.

Here’s where I ended up: we express an ordinal as a set of binary strings that is well-ordered under lexicographic order and has the “prefix property” that no string in the set is a prefix of any other. If $\mathrm{ord}(A)$ is the ordinal represented by set $A$, we have

• $0 = \mathrm{ord}(\{\})$
• $1 = \mathrm{ord}(\{\epsilon\})$ ($\epsilon$ is the empty string)
• $2 = \mathrm{ord}(\{0, 1\})$
• $3 = \mathrm{ord}(\{0, 10, 11\})$
• $4 = \mathrm{ord}(\{0, 10, 110, 111\})$
• $\omega = \mathrm{ord}(\{0, 10, 110, 1110, 11110, \ldots\})$
• $\omega + 1 = \mathrm{ord}(\{00, 010, 0110, 01110, 011110, \ldots , 1\})$
• $\omega^2 = \mathrm{ord}(\{00, 010, 0110, 01110, \ldots , 100, 1010, 10110, \ldots, 1100, 11010, \ldots, 1110, 111010, \ldots\})$

These are just samples: for every ordinal, there are infinitely many sets that could represent it.

Addition and multiplication are easy here:

• $\mathrm{ord}(A) + \mathrm{ord}(B) = \mathrm{ord}(\{0 \cdot a | a \in A\} \cup \{1 \cdot b | b \in B\})$
• $\mathrm{ord}(A)\mathrm{ord}(B) = \mathrm{ord}(\{b \cdot a | a \in A, b \in B\})$
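A minimal Python sketch of addition and multiplication, assuming an ordinal is represented by a membership test on strings of `0`/`1` characters. The helper names `finite`, `omega`, `add`, `mul` and `members` are invented for this illustration; `members` only enumerates strings up to a length bound, so it is for finite examples and spot checks. Multiplication just tries every way of splitting a string into a $B$-part followed by an $A$-part.

```python
from itertools import product
from typing import Callable

# An ordinal is represented here by a membership test for a prefix-free,
# lexicographically well-ordered set of binary strings.
Ordinal = Callable[[str], bool]


def finite(n: int) -> Ordinal:
    """n as in the examples above: {} for 0, else {0, 10, ..., 1^(n-2) 0, 1^(n-1)}."""
    def member(s: str) -> bool:
        if n == 0:
            return False
        if s == "1" * (n - 1):
            return True
        ones = len(s) - 1
        return 0 <= ones < n - 1 and s == "1" * ones + "0"
    return member


def omega(s: str) -> bool:
    """ω = {0, 10, 110, ...}: some number of 1s followed by a single 0."""
    return len(s) >= 1 and s == "1" * (len(s) - 1) + "0"


def add(a: Ordinal, b: Ordinal) -> Ordinal:
    """ord(A) + ord(B) = ord({0·a} ∪ {1·b})."""
    return lambda s: s != "" and (a(s[1:]) if s[0] == "0" else b(s[1:]))


def mul(a: Ordinal, b: Ordinal) -> Ordinal:
    """ord(A)·ord(B) = ord({b·a}): try every split of s into a B-part then an A-part."""
    return lambda s: any(b(s[:k]) and a(s[k:]) for k in range(len(s) + 1))


def members(o: Ordinal, max_len: int) -> list[str]:
    """All members up to max_len, in order (for prefix-free sets Python's string
    ordering agrees with the lexicographic order used here)."""
    strings = ("".join(bits) for n in range(max_len + 1) for bits in product("01", repeat=n))
    return sorted(s for s in strings if o(s))


print(members(finite(3), 3))                        # ['0', '10', '11']
print(len(members(mul(finite(2), finite(3)), 4)))   # 6
omega_plus_one = add(omega, finite(1))
omega_squared = mul(omega, omega)
print(omega_plus_one("1"), omega_squared("10110"))  # True True
```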

Exponentiation is a bit harder; I use this idea: consider a function $f: B \rightarrow A$ with finite support, that is, one where $f(b)$ is the smallest element of $A$ for all but finitely many $b$. Let $\{b_1, b_2, \ldots, b_n\} = \mathop{\mathrm{supp}}(f)$ where $b_1 > b_2 > \cdots > b_n$. Then we represent this function as $1 \cdot b_1 \cdot f(b_1) \cdot 1 \cdot b_2 \cdot f(b_2) \cdot \ldots \cdot 1 \cdot b_n \cdot f(b_n) \cdot 0$. If $C$ is the set of all such representations for all such functions $f$, then $\mathrm{ord}(A)^{\mathrm{ord}(B)} = \mathrm{ord}(C)$.
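For finite $A$ and $B$ this encoding can simply be enumerated, which is a handy way to check it against ordinary exponentiation. A minimal Python sketch, assuming sets are given as lists of `0`/`1` strings; `power` is a hypothetical helper name, not code from this post. The support is taken to be the set of $b$ with $f(b)$ not equal to the smallest element of $A$.

```python
from itertools import combinations, product


def power(A: list[str], B: list[str]) -> list[str]:
    """Enumerate ord(A)^ord(B) for finite prefix-free sets A and B.

    Each finite-support f: B -> A becomes 1·b1·f(b1)·...·1·bn·f(bn)·0, where
    b1 > ... > bn are the b with f(b) != min(A). Python's string order agrees
    with the lexicographic order here because the sets are prefix-free.
    """
    if not A:
        return [] if B else ["0"]        # 0^0 = 1, and 0^β = 0 for β ≥ 1
    zero = min(A)
    nonzero = [a for a in A if a != zero]
    encodings = []
    for n in range(len(B) + 1):
        for support in combinations(sorted(B, reverse=True), n):  # descending b's
            for values in product(nonzero, repeat=n):
                body = "".join("1" + b + a for b, a in zip(support, values))
                encodings.append(body + "0")
    return sorted(encodings)


two, three = ["0", "1"], ["0", "10", "11"]
print(power(two, two))          # ['0', '1010', '1110', '1111010']
print(len(power(three, two)))   # 9
print(len(power(two, three)))   # 8
```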

Instead of limits, I define an infinite sum function. Given a function $f: \mathbb{N} \rightarrow \mathcal{P}(\{0,1\}^{*})$ we have

• $\sum_{i=0}^\infty \mathrm{ord}(f(i)) = \mathrm{ord}(\{1^i \cdot 0 \cdot x| i \in \mathbb{N}, x \in f(i)\})$
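If a set is represented by a membership test, the infinite sum is a short parser: strip the leading run of 1s to find $i$, skip the 0, and ask $f(i)$ about the rest. A minimal Python sketch; `infinite_sum`, `one`, `omega` and `omega_squared` are illustrative names. Summing infinitely many copies of 1 reproduces exactly the representation of $\omega$ given earlier.

```python
from typing import Callable

# Ordinals represented by membership tests on binary strings (a sketch).
Ordinal = Callable[[str], bool]


def infinite_sum(f: Callable[[int], Ordinal]) -> Ordinal:
    """Membership test for {1^i · 0 · x | i ∈ N, x ∈ f(i)}."""
    def member(s: str) -> bool:
        i = 0
        while i < len(s) and s[i] == "1":
            i += 1
        if i == len(s):          # no separating 0, so s has the wrong shape
            return False
        return f(i)(s[i + 1:])
    return member


def one(s: str) -> bool:
    """1 = ord({ε})."""
    return s == ""


omega = infinite_sum(lambda i: one)             # 1 + 1 + 1 + ... = ω
omega_squared = infinite_sum(lambda i: omega)   # ω + ω + ω + ... = ω^2

print([omega("1" * i + "0") for i in range(4)])  # [True, True, True, True]
print(omega("11"), omega("00"))                  # False False
print(omega_squared("10110"))                    # True
```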

The obvious way to represent these sets as programs would be as functions that test for membership of the set. It should be clear how to implement addition and infinite sum with this representation, and multiplication is only a little more complicated. Unfortunately I don’t see how to do exponentiation, because of one small wrinkle: if we’re to get the right answer for finite exponents, we must ensure that every one of our $f(b_i)$ entries is non-zero, ie not the smallest element of $A$, and a membership test gives us no way to find that element. So instead I propose a slight change: we implement a function which tells us whether a string is a prefix of any string in the set.
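A minimal Python sketch of the prefix-oracle representation, assuming a set is given as a function that returns True when its argument is a prefix of some member (and that the set is prefix-free and well-ordered, as throughout). From such an oracle we can recover plain membership and, crucially, the smallest element by greedy descent, which is enough to test membership in the exponentiation set $C$ from the oracles for $A$ and $B$. All names are hypothetical; a complete version would also return a prefix oracle for $C$ so that exponentials could be nested, which this sketch leaves out, and it assumes $A$ is non-empty.

```python
from itertools import product
from typing import Callable, Optional

# A set is represented by a prefix oracle: p(s) is True iff s is a prefix of
# some member. The set is assumed prefix-free and well-ordered.
Prefix = Callable[[str], bool]


def is_member(p: Prefix, s: str) -> bool:
    # In a prefix-free set, s is a member iff it is a prefix and nothing extends it.
    return p(s) and not p(s + "0") and not p(s + "1")


def smallest(p: Prefix) -> Optional[str]:
    """Greedy descent: go left (0) whenever some member lies that way."""
    if not p(""):
        return None                      # empty set
    s = ""
    while not is_member(p, s):
        s += "0" if p(s + "0") else "1"
    return s


def take(p: Prefix, s: str, pos: int) -> Optional[str]:
    """The unique member of the set that is a prefix of s[pos:], if any."""
    for end in range(pos, len(s) + 1):
        if is_member(p, s[pos:end]):
            return s[pos:end]
    return None


def power_member(pa: Prefix, pb: Prefix) -> Callable[[str], bool]:
    """Membership test for the exponentiation set C, assuming A is non-empty."""
    zero = smallest(pa)

    def member(s: str) -> bool:
        pos, prev_b = 0, None
        while pos < len(s) and s[pos] == "1":
            b = take(pb, s, pos + 1)
            if b is None or (prev_b is not None and not b < prev_b):
                return False             # the b's must strictly decrease
            a = take(pa, s, pos + 1 + len(b))
            if a is None or a == zero:
                return False             # support entries must be non-zero
            pos += 1 + len(b) + len(a)
            prev_b = b
        return s[pos:] == "0"
    return member


def from_list(elems: list[str]) -> Prefix:
    return lambda s: any(m.startswith(s) for m in elems)


def omega_prefix(s: str) -> bool:
    """Prefix oracle for ω = {0, 10, 110, ...}: a run of 1s, then at most one 0."""
    return "0" not in s[:-1]


two = from_list(["0", "1"])
two_to_omega = power_member(two, omega_prefix)
print(smallest(omega_prefix), two_to_omega("11011010"))  # 0 True
```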

I’ll try to share code implementing all this ASAP, but I wanted to put the ideas out there first. Thanks!

## “Comparing”

I’m writing this now because I anticipate linking to it over and over again; this fallacy isn’t going anywhere.

Journalists have got very good at using the word “comparing” to turn the most innocuous statement into a gaffe, through a simple trick of equivocation. Most recently, Jeremy Corbyn is accused of “comparing” Israel and ISIS, but a search for “accused of comparing” finds many other examples. The pattern goes like this: a politician says something like “just as one doesn’t put a vampire in charge of a blood bank, one shouldn’t put the press in charge of protecting privacy”. This use of analogy is one sense of the word comparing. However, many now seem persuaded that “comparing” really means “equating”. Thus we start with someone using a vivid analogy to make a point like “you should consider someone’s partisan motivations before giving them an important responsibility” and through the magic of this word, this becomes “Politician claims Rupert Murdoch kills people and drinks their blood”, which as far as I know he doesn’t.

I’m even less of a fan of Dan Quayle than of Jeremy Corbyn, but he too fell foul of something very similar, though the word itself wasn’t used. In the 1988 vice-presidential debate, he used the example of JFK to argue that a short Congressional service need not be a bar to high office. Lloyd Bentsen replied with the now-famous put-down “Senator, I served with Jack Kennedy. I knew Jack Kennedy. Jack Kennedy was a friend of mine. Senator, you’re no Jack Kennedy.” Of course Quayle was claiming no such thing; he was simply showing that the charge against him proved too much. But this deliberate misunderstanding is one of the most celebrated lines in VP debates.

The thing that’s most annoying about this is that it’s natural to reach for the most extreme example to prove your point. If we oppose vigilante justice even for murder, we certainly oppose it for littering. If we should defend the right to free speech even of Nazis, we should certainly defend it when it comes to, say, Tories. If we’re not going to hold all Saudis responsible for Osama bin Laden, we’re certainly not going to hold all Canadians responsible for Justin Bieber. To me this seems like a normal move in argument, but if I were a politician I couldn’t say any of these things, for fear of being accused of “comparing” Bieber to OBL.

Update: this comic makes a similar point very well.

Update: Julia Galef discusses the comic. Thanks to Michael Keenan for both links, on FB.