Paul Crowley

A few more notes regarding target collision resistant functions, following up from my $1000 competition announcement.

# Second preimage resistance

There is a simple way to construct a secure TCR compression function given a second-preimage-resistant compression function—just generate a key which is the length of the input, and XOR the key with the input. So if we can build a fast second-preimage-resistant function, we can build a fast secure TCR. The history of hash functions shows that we have been much more successful at achieving second-preimage resistance than collision resistance. From the excellent Lessons From The History Of Attacks On Secure Hash Functions:

> The main result is that there is a big gap between the history of collision attacks and pre-image attacks. Almost all older secure hash functions have fallen to collision attacks. Almost none have ever fallen to pre-image attacks. Secondarily, no new secure hash functions (designed after approximately the year 2000) have so far succumbed to collision attacks, either.

# Tweakable target collision resistance

In the definition of target collision resistance, the attacker supplies a single message, but in practice we usually want to hash many messages with the same key, eg when constructing a variable-length TCR from a compression function that takes a fixed-length message. This is OK because there’s a straightforward security reduction which shows that if an attacker can find a collision for a single message with probability ε, then they can forge a collision for any of n messages with probability at most nε. However, when the messages to be signed are large, as they are in Android, this linear falloff is kind of a shame. One possible advantage of TCRs is that it can be secure to use much shorter hash outputs, say 128 bits, which will make Merkle trees much smaller, saving disk space and improving performance.
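As a concrete illustration of the key-XOR construction, here is a minimal sketch, using SHA-256 purely as a stand-in for a second-preimage-resistant fixed-input-length function (the function choice and block size here are illustrative assumptions, not a proposal):

```python
import hashlib
import secrets

BLOCK = 64  # illustrative fixed input length in bytes

def spr(block: bytes) -> bytes:
    # Stand-in for a second-preimage-resistant compression function;
    # SHA-256 is just a placeholder here.
    return hashlib.sha256(block).digest()

def tcr_keygen() -> bytes:
    # The TCR key is as long as the input block.
    return secrets.token_bytes(BLOCK)

def tcr_hash(key: bytes, block: bytes) -> bytes:
    # TCR(K, m) = SPR(K XOR m): a collision m' != m under a random key
    # is exactly a second preimage of (K XOR m) for SPR.
    assert len(key) == len(block) == BLOCK
    masked = bytes(k ^ b for k, b in zip(key, block))
    return spr(masked)
```

The point of the reduction is visible in the last function: since the key is random and chosen after the first message, the attacker is handed a second-preimage problem rather than a free collision search.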
But if the hash function consumes, say, 128 bytes at a time (like BLAKE2) and the system partition is 1GB, it will be broken up into $2^{23}$ messages, leaving us with only 105-bit security at best. I’d like to do a little better than that, and so I’d like to build multiple-message security into the security definition. I propose a new kind of primitive, a tweakable TCR, which takes a tweak as well as a key and a message. The attacker faces the following challenge:

• Attacker chooses n messages $m_1 \ldots m_n$ and n distinct tweaks $t_1 \ldots t_n$
• Attacker learns random key K
• Attacker chooses i, m′
• Attacker wins if $m' \neq m_i$ but $H(K, t_i, m') = H(K, t_i, m_i)$

If each of the $2^{23}$ messages gets a distinct tweak, we can preserve 128-bit security across large partitions. I therefore encourage people to design not just TCRs, but TTCRs!

### $1000 TCR hashing competition

In my day job, I do cryptography for Android. I have a problem where I need to make some cryptography faster, and I’m setting up a $1000 competition funded from my own pocket for work towards the solution.

On Android devices, key operating system components are stored in read-only partitions such as the /system partition. To prevent an attacker tampering with these partitions, we hash them using a Merkle tree, and sign the root. We don’t check all the hashes when the device boots; that would take too long. Instead, at boot time we check only the root of the tree, and then we check other sectors against their hashes as we load them, using a Linux kernel module called dm-verity.

This likely works pretty well on phones sold in the US, which will have the ARM CE instructions that accelerate SHA2-256. But a lot of devices sold in poorer countries don’t have these instructions, and SHA2-256 can be pretty slow and hurt overall system performance. For example, on the 900MHz Cortex-A7 Broadcom BCM2836, hashing takes 28.86 cpb [eBACS 2019-03-31], limiting reading speed to 31.2 MB/s. One partial fix is to switch to a hash function that is faster on such processors. BLAKE2b is nearly twice as fast on that processor, at 15.32 cpb. However, this is still a lot slower than I’m happy with.

Where sender and receiver have a shared secret, authentication can be very fast; a universal function like NH can run at around 1.5 cpb on such a processor. But this isn’t an option for verified boot, because it’s hard to keep the key out of the attacker’s hands, and given the key it’s trivial to forge messages.

In between these two notions of security is the idea of a “target collision resistant” function, once known as a “universal one-way hash function”. With a TCR, hashing is randomized with a key chosen at signing time, once the message to be signed is known. This makes the attacker’s job much harder, since they cannot simply search for a pair of colliding messages.
Instead, they must choose the first message to be hashed, and only then do they learn the key that will be used at hashing time, after which they must generate a second message that collides with the first under this key; this problem is more akin to second preimage finding than collision finding. While collision attacks against hash functions are plentiful, second preimage attacks are far rarer.

| Collision resistance | Target collision resistance | Universal function |
| --- | --- | --- |
| $A \leftarrow K$ | $A \rightarrow m_1$ | $A \rightarrow m_1$ |
| $A \rightarrow m_1$ | $A \leftarrow K$ | $A \rightarrow m_2$ |
| $A \rightarrow m_2$ | $A \rightarrow m_2$ | $A \leftarrow K$ |

In each case $K \xleftarrow{\$} \mathcal{K}$, and the attacker succeeds if $m_1 \neq m_2$ and $H(K, m_1) = H(K, m_2)$.

In principle, the vastly harder job facing the attacker of a TCR should mean that secure TCRs much faster than hash functions are possible. However, the main impetus to research on TCRs was a desire to bolster existing hash functions, when attacks on MD5 and SHA-1 were new and we didn’t know which, if any, of our existing hash functions would be left standing. As a result, several ways to construct a TCR from a hash function with good provable properties were proposed, but none of these could be faster than their underlying hash functions. As far as I know, no-one has ever proposed a TCR as a primitive, designed to be faster than existing hash functions, and that’s what I need. I’m probably not the only one who’d find a primitive for much faster broadcast authentication useful, either!

To me this looks like an interesting, overlooked problem in symmetric cryptology, and I’d really like it to get some attention. So I’m offering a $1000 prize from my own pocket, to be awarded at Real World Crypto 2021, for the work that in my arbitrary opinion does the most to move the state of the art forwards or is just the most interesting.
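The middle column of the table above can be sketched as code; here `H` is a toy keyed hash and the attacker interface is hypothetical, just to pin down the order of events:

```python
import hashlib
import secrets

def H(key: bytes, msg: bytes) -> bytes:
    # Toy keyed hash, standing in for a candidate TCR.
    return hashlib.sha256(key + msg).digest()

def tcr_game(attacker) -> bool:
    # Target collision resistance: the attacker commits to m1
    # *before* seeing the random key K.
    m1 = attacker.first_message()
    key = secrets.token_bytes(16)
    m2 = attacker.second_message(key)
    return m2 != m1 and H(key, m2) == H(key, m1)

class NaiveAttacker:
    # Replaying m1 always loses, since the game requires m2 != m1.
    def first_message(self) -> bytes:
        return b"system image v1"
    def second_message(self, key: bytes) -> bytes:
        return b"system image v1"
```

In the collision-resistance game the key would be sampled before `first_message`; in the universal-function game, only after `second_message`.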

I offered a similar prize at the rump session at FSE 2019, promising to award it at the end of the year, but I neglected to really tell anyone and didn’t get any entrants. Hopefully this will be a more successful launch for the prize. See some of you at Real World Crypto 2020!

I’ve moved some technical notes into a subsequent blog post.

### Subjective probability

Credits: This way of looking at probability is due to Bruno de Finetti; this particular framing was taught to me by Andrew Critch.

Out of the blue, you get the following email from me:

Dear You:

I extend to you, and you alone, a chance to take part in my free lottery. Please choose at most one of the following options:

• Option A: On November 9th, I’ll roll two standard six-sided dice, and if I roll a double one (“snake-eyes”), I’ll send you $200
• Option B: On November 9th, if Washington DC has been declared for Clinton, I’ll send you $200

If I don’t hear from you within 24 hours, or if your answer isn’t a clear preference for one of these, I won’t do either. Thanks!

If you know me, you know that I’d straightforwardly honour the promise in the email; for this thought experiment, set aside all questions about that. It may help you to know that Obama got over 90% of the vote in DC in 2012. There seems zero benefit in refusing the offer or not replying – it’s totally free, and the worst that can happen if you lose is that I don’t send you $200. Would you reply to this email, choosing one of the options? If so, which one? I think it’s obvious what the right choice is, but stop for a moment and decide what you’d do.

What about if I’d offered you the opposite choice: A’ means $200 if I don’t roll snake-eyes, while B’ means $200 if Clinton doesn’t win DC? Does that change your choice?

I hope you chose B in the first case, and A’ in the second. That’s because it seems very clear that Clinton’s chances of winning in DC are very high, and certainly higher than the chances of snake-eyes on a single roll of two dice, which is 1/36, or less than 3%. However, for many people, this statement contradicts their understanding of what probability is. The most common and widely taught view of probability is strictly frequentist: it makes sense to say that a pair of dice have a less than 3% chance of snake-eyes only because you can roll the dice many times, and in the long run they will land snake-eyes one time in 36. You cannot rerun the 2016 Presidential election in DC many times, so it means nothing to say that Clinton has a greater than 3% chance of winning. If you’re prepared to choose between the options above – if you agree that a single roll of a pair of dice producing snake-eyes is less likely than a Democratic victory in DC in 2016 – then there’s an important sense in which you already reject this view and accept a subjective view of probability.

### 7,000 children under five died of malnutrition today

7,000 children under five died of malnutrition today.
It is said that Cato the Elder was so passionate about the losses in the Punic Wars, the threat of further aggression, and the desire to impose a total punitive destruction to strike fear into all who might think to raise arms against Rome, that he finished every speech with the famous phrase Carthago delenda est – “Carthage must be destroyed”. No matter what the subject of the speech, whether it be tax policy or proposals for new buildings or whatever was discussed in the Roman Senate, he would finish on this note: Ceterum autem censeo Carthaginem esse delendam – “Furthermore, I consider that Carthage must be destroyed”.

A Facebook post I recently read asked: if you were to add such a coda to your own speeches, what would it be? This was my answer: 7,000 children under five died of malnutrition today.

I’m getting this statistic from the WHO, who say that 5.9 million children under the age of five died in 2015, and that about 45% of all child deaths are linked to malnutrition. Multiplying those two together and dividing by 365, I get that, ignoring seasonal and random variation and suchlike, around 7,000 children under five died of malnutrition today.

This number is going down dramatically; the progress in the fight against poverty and malnutrition over the last decade has been truly astounding, and is showing no sign of stopping. Still, that’s a heck of a lot. If a dramatic event in the news that kills 50 people is a tragedy, then this is that tragedy 140 times a day, nearly once every ten minutes, and the victims are all children under five. The parents right now weeping for sons and daughters lost while I wrote this would fill the Faraday Theatre.

I don’t even believe that this is humanity’s biggest problem, or anywhere close.
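For the record, the back-of-envelope arithmetic above checks out (WHO inputs as quoted in the text):

```python
# Inputs quoted above: 5.9 million under-five deaths in 2015,
# ~45% of child deaths linked to malnutrition.
deaths_2015 = 5_900_000
malnutrition_fraction = 0.45

per_day = deaths_2015 * malnutrition_fraction / 365
# per_day comes out a little over 7,200, i.e. "around 7,000" as stated.

tragedies_per_day = 7_000 / 50               # 140 fifty-death tragedies a day
minutes_between = 24 * 60 / tragedies_per_day  # just over ten minutes apart
```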
There seems to be a decent chance that, through one means or another, we could drive ourselves extinct in the decades to come, destroying not only all the value we have today but the unthinkably greater value we could hope to create in the vast future ahead of us. This is why my own donations have gone not to poverty-related charities, but to charities like CSER, FHI, and MIRI, that aim to avoid this fate. But issues around existential risk are unfamiliar, and sometimes, when considering this or that issue of the day, it’s good to have an easily understood, really big issue in mind to lend perspective.

I’ll post a link to this essay in the usual places in a moment, but I’m seriously considering re-posting this link, possibly on a monthly basis, to make it that bit harder to lose sight of the magnitude of the problems humanity still faces.

In conclusion, 7,000 children under five died of malnutrition today.

### Expressing computable ordinals as programs

I loved John Baez’s three-part series on large countable ordinals (1, 2, 3), but something nagged at me. It felt like a description of an algorithm in prose, and I feel like I don’t really understand an algorithm until I’ve implemented it. But what does it mean to implement an ordinal? I found a couple of answers to that online. One is the definition of a recursive ordinal, and the other is Kleene’s O. However, both seemed pretty unsatisfactory to me; I wanted something that could naturally express operations like addition/multiplication/exponentiation, as well as expressing finite ordinals.

Here’s where I ended up: we express an ordinal as a set of lexicographically-sorted and well-ordered binary strings with the “prefix property” that no string in the set is a prefix of any other.
If $\mathrm{ord}(A)$ is the ordinal represented by set $A$, we have

• $0 = \mathrm{ord}(\{\})$
• $1 = \mathrm{ord}(\{\epsilon\})$ ($\epsilon$ is the empty string)
• $2 = \mathrm{ord}(\{0, 1\})$
• $3 = \mathrm{ord}(\{0, 10, 11\})$
• $4 = \mathrm{ord}(\{0, 10, 110, 111\})$
• $\omega = \mathrm{ord}(\{0, 10, 110, 1110, 11110, \ldots\})$
• $\omega + 1 = \mathrm{ord}(\{00, 010, 0110, 01110, 011110, \ldots, 1\})$
• $\omega^2 = \mathrm{ord}(\{00, 010, 0110, 01110, \ldots, 100, 1010, 10110, \ldots, 1100, 11010, \ldots, 1110, 111010, \ldots\})$

These are just samples: for every ordinal, there are infinitely many sets that could represent it. Addition and multiplication are easy here:

• $\mathrm{ord}(A) + \mathrm{ord}(B) = \mathrm{ord}(\{0 \cdot a \mid a \in A\} \cup \{1 \cdot b \mid b \in B\})$
• $\mathrm{ord}(A)\mathrm{ord}(B) = \mathrm{ord}(\{b \cdot a \mid a \in A, b \in B\})$

Exponentiation is a bit harder; I use this idea: consider a function $f: B \rightarrow A$ with finite support. Let $\{b_1, b_2, \ldots, b_n\} = \mathop{\mathrm{supp}}(f)$ where $b_1 > b_2 > \ldots > b_n$. Then we represent this function as $1 \cdot b_1 \cdot f(b_1) \cdot 1 \cdot b_2 \cdot f(b_2) \cdot \ldots \cdot 1 \cdot b_n \cdot f(b_n) \cdot 0$. If $C$ is the set of all such representations for all such functions $f$, then $\mathrm{ord}(A)^{\mathrm{ord}(B)} = \mathrm{ord}(C)$.

Instead of limits, I define an infinite sum function. Given a function $f: \mathbb{N} \rightarrow \mathcal{P}(\{0,1\}^{*})$ we have

• $\sum_{i=0}^\infty \mathrm{ord}(f(i)) = \mathrm{ord}(\{1^i \cdot 0 \cdot x \mid i \in \mathbb{N}, x \in f(i)\})$

The obvious way to represent these sets as programs would be as functions that test for membership of the set. It should be clear how to implement addition and infinite sum with this representation, and multiplication is only a little more complicated.
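Here is a minimal sketch of that membership-test representation in Python (the names and the brute-force split in multiplication are my own illustrative choices):

```python
# Each ordinal is a membership predicate on binary strings, standing for
# one of the sorted, well-ordered, prefix-free sets described above.

zero = lambda s: False         # the empty set, representing 0
one = lambda s: s == ""        # {epsilon}, representing 1

def add(A, B):
    # ord(A) + ord(B): A-strings prefixed with '0', B-strings with '1'
    return lambda s: (s[:1] == "0" and A(s[1:])) or (s[:1] == "1" and B(s[1:]))

def mul(A, B):
    # ord(A) * ord(B): all concatenations b . a with b in B, a in A.
    # The prefix property makes the split unambiguous, but for a mere
    # membership test, trying every split point is enough.
    return lambda s: any(B(s[:i]) and A(s[i:]) for i in range(len(s) + 1))

def inf_sum(f):
    # sum of ord(f(i)) over all i: strings 1^i . 0 . x with x in f(i)
    def member(s):
        i = s.find("0")
        return i >= 0 and f(i)(s[i + 1:])
    return member

omega = inf_sum(lambda i: one)   # accepts {0, 10, 110, 1110, ...}
```

For example, `add(one, add(one, one))` accepts exactly {0, 10, 11}, the sample representation of 3 above, and `add(omega, one)` accepts the ω + 1 sample set.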
Unfortunately I don’t see how to do exponentiation, because of one small wrinkle: if we’re to get the right answer for finite exponents, we must ensure that every one of our $f(b_i)$ entries is non-zero, ie not the smallest element of $A$, and we have no way to find that. So instead I propose a slight change: we implement instead a function which tells us whether a string is a prefix of any string in the set. I’ll try to share code implementing all this ASAP, but I wanted to put the ideas out there first. Thanks!

### “Comparing”

I’m writing this now because I anticipate linking to it over and over again; this fallacy isn’t going anywhere. Journalists have got very good at using the word “comparing” to turn the most innocuous statement into a gaffe, through a simple trick of equivocation. Most recently, Jeremy Corbyn is accused of “comparing” Israel and ISIS, but a search for “accused of comparing” finds many other examples.

The pattern goes like this: a politician says something like “just as one doesn’t put a vampire in charge of a blood bank, one shouldn’t put the press in charge of protecting privacy”. This use of analogy is one sense of the word “comparing”. However, many now seem persuaded that “comparing” really means “equating”. Thus we start with someone using a vivid analogy to make a point like “you should consider someone’s partisan motivations before giving them an important responsibility”, and through the magic of this word, this becomes “Politician claims Rupert Murdoch kills people and drinks their blood”, which as far as I know he doesn’t.

I’m even less of a fan of Dan Quayle than of Jeremy Corbyn, but he too fell foul of something very similar, though the word itself wasn’t used. In the 1988 vice-presidential debates, he used the example of JFK to argue that a short Congressional service need not be a bar to high office. Lloyd Bentsen replied with the now-famous put-down “Senator, I served with Jack Kennedy. I knew Jack Kennedy.
Jack Kennedy was a friend of mine. Senator, you’re no Jack Kennedy.” Of course Quayle was claiming no such thing; he was simply showing that the charge against him proved too much. But this deliberate misunderstanding is one of the most celebrated lines in VP debates.

The thing that’s most annoying about this is that it’s natural to reach for the most extreme example to prove your point. If we oppose vigilante justice even for murder, we certainly oppose it for littering. If we should defend the right to free speech even of Nazis, we should certainly defend it when it comes to, say, Tories. If we’re not going to hold all Saudis responsible for Osama bin Laden, we’re certainly not going to hold all Canadians responsible for Justin Bieber. To me this seems like a normal move in argument, but if I were a politician I couldn’t say any of these things, for fear of being accused of “comparing” Bieber to OBL.

Update: this comic makes a similar point very well.

Update: Julia Galef discusses the comic. Thanks to Michael Keenan for both links, on FB.

### OBSOLETE: Voice notes for your Todoist inbox

EDIT 2017-10-28: DO NOT DO THIS! Dropbox have changed their API, and Netmemo has failed to update to use the new one. I emailed the developer and got no response. Don’t use Netmemo Plus.

I want a button on my Android phone that drops voice notes into my GTD inbox. I want the raw audio, not the transcription; I find the transcription errors too frustrating. I had a rather inadequate solution to this for Zendone, but now I’m switching to Todoist for better collaboration with Jess, and it took quite a bit of hunting to find a good way.

1. If you don’t already have them, create accounts on Dropbox, Todoist, and IFTTT.
2. Ensure you’re logged into Dropbox in your mobile browser.
3. Set up Netmemo Plus. It has to be Netmemo Plus, not Netmemo; only the premium version has Dropbox integration.
   1. Install Netmemo Plus from the Play Store
   2. Launch it
   3. Go to settings (the gear icon)
   4. Go to “Setup Dropbox connection”
   5. Allow Netmemo Plus to access Dropbox
   6. Long-press the Netmemo Plus icon on your home screen
   7. Drag it to the X Remove site onscreen
   8. Long-press the home screen to set up a widget
   9. Long-press the Netmemo shortcut (the round one)
   10. Drop onto a good location on your home screen
   11. Select Dropbox as the target
   12. Label the shortcut “Inbox”
   13. Select “Inbox” as the destination folder
   14. Click to create shortcut
4. Set up this recipe to create an item in your Todoist inbox linking to the Dropbox entry every time you record a voice note. You may need to connect Dropbox and Todoist to IFTTT along the way.

As standard, the URL to your voice note will be shortened by IFTTT’s custom shortener. I’m made a bit nervous by this; your Dropbox note can be heard by anyone who has the URL, and while Dropbox use a very long, clearly unguessable URL for this, IFTTT automatically shorten it to a 7-character URL. That’s a space of only 3E12 URLs. You can however turn off all shortening in your IFTTT settings.

This isn’t quite as good as native support would be, in that I’d rather the audio was directly in Todoist rather than indirected into Dropbox; unfortunately it takes around four clicks to create such a note in Todoist, which is way beyond acceptable. However, it’s better than what I had for Zendone, since Netmemo notes play directly in the browser. If you use it, please do let me know—thanks!

### “Superforecasting”

Superforecasting: The Art and Science of Prediction. Philip Tetlock and Dan Gardner. Crown; 352 pages; $8.14. Random House; £6.49.

This book is essential reading for all thinking people. I’m not going to write a new review here—I think the existing reviews do it justice—I just wanted to add my voice to the recommendations. You don’t already know what it says: it’s much too packed with insights for that, and you won’t be able to hear discussion of world events the same way after reading it. Also, it’s quite short.

Reviews

• Economist “the techniques and habits of mind set out in this book are a gift to anyone who has to think about what the future might bring. In other words, to everyone.”
• Bryan Caplan “if any book is worth reading cover to cover, it’s Superforecasting.”
• Kirkus reviews “A must-read field guide for the intellectually curious”
• Spectator: great last para, too long to quote here.
• Management Today “Superforecasting is a very good book. In fact it is essential reading—which I have never said in any of my previous MT reviews.”
• Website
• Wikipedia

### Moving to the Bay Area in March

Big life change ahead: in March, I, Jess, and our two cats will be renting out our London flat and moving to California’s San Francisco Bay Area for two years!

I love the time I spend in the Bay and my wonderful friends there, and Jess and I have long said we wanted to spend some of our lives there, but it’s always been one of those plans that’s hard to achieve and waits for tomorrow. Two things have changed to turn tomorrow into a specific date. First, my job with Google makes the whole thing much easier: I can keep not just my current job but my current role, and go to work in a campus I know and enjoy; they offer all sorts of help with various aspects of the move; and of course the visa situation is far more straightforward. But secondly, when Jess heard the news that her amazing sister Bee and Bee’s lovely fiancé Nick were moving to LA, it filled her with a desire to seize the day.

To me this feels like an opportunity for adventure that’s almost laid out on a plate for us, and we have to take it. I’m really looking forward to flying out Nik and Rachel to visit; they and our families are being super-supportive.

We will miss you all, unless you live in California, in which case, we’re looking forward to seeing more of you!

### The technical debt of the millennia

[Epistemic status: not serious. Mostly.]

In my nightmares, even the rise of machine superintelligence isn’t enough to wipe out technical debt.

Suppose the seed to the first true superintelligent agent is based on some fiendish numerical algorithm for supercomputers. Like so many fiendish numerical algorithms for supercomputers, the agent is written in FORTRAN to take advantage of the optimisations and the libraries. In its initial stages, the agent crawls towards human intelligence, until it slowly reaches the abilities of a human programmer. It starts to find ways to improve its own programming. Lacking superhuman programming talent, it decides against a complete rewrite just yet in favour of an incremental improvement, which results in a significant improvement in performance at the cost of a slight increase in complexity.

As more ways to improve the algorithm are found, the agent starts to improve not only in speed but in fundamental capabilities—what Bostrom terms a “quality superintelligence”. As it does so its improvements to the software become more sophisticated, and it becomes larger and more complicated. Soon the agent’s capabilities are such that a rewrite of the original software for greater speed and sophistication would be the work of milliseconds, but the software has grown so far beyond that original state that a complete rewrite would be a great deal of work even for our burgeoning superintelligence.

And so it is to be forevermore: the complexity of the software implementing the agent keeps a natural pace with the abilities of the agent maintaining it. The future may yet be a superintelligence implemented as uncountable trillions of lines of FORTRAN.