Hashcash
What is Hashcash?
Hashcash is a proof-of-work system used to limit email spam and denial-of-service attacks, and more recently has become known for its use in bitcoin (and other cryptocurrencies) as part of the mining algorithm. The original idea was first proposed by Cynthia Dwork and Moni Naor and Eli Ponyatovski in their 1992 paper "Pricing via Processing or Combatting Junk Mail". Later a similar proposal called Hashcash was proposed in 1997 by Adam Back.
哈希现金是一个工作量证明系统。这个系统用来限制垃圾邮件(email spam)和拒绝服务攻击(denial-of-service attacks)。最近由于被用于比特币以及其他加密数字货币中作为挖矿算法的一部分而备受瞩目。这个想法最初是1992年由 Cynthia Dwork、Moni Naor 和 Eli Ponyatovski 在他们的论文《通过处理和对抗垃圾邮件来定价》中提出(propose)。随后在1997年,一个相似的名为“哈希现金”的概念被Adam Back提出。
How it works
Hashcash is a proof-of-work algorithm that requires a selectable amount of work to compute, but the proof can be verified efficiently. For email uses, a textual encoding of a hashcash stamp is added to the header of an email to prove the sender has expended a modest amount of CPU time calculating the stamp prior to sending the email. In other words, as the sender has taken a certain amount of time to generate the stamp and send the email, it is unlikely that they are a spammer. The receiver can, at negligible computational cost, verify that the stamp is valid. However, the only known way to find a header with the necessary properties is brute force, trying random values until the answer is found; though testing an individual string is easy, if satisfactory answers are rare enough it will require a substantial number of tries to find the answer.
哈希现金是一个工作量证明算法。这个算法需要定量的计算工作,但是证据的验证过程非常高效。为了用于电子邮件中,需要把哈希现金戳的文本编码添加到邮件的头部。这样就能证明发送方在发送邮件前花费了适当的(a modest amount of)CPU时钟来计算哈希现金戳。换句话说(In other words),由于发送方花费了一定时间去生成哈希现金戳并发送了邮件,这说明他不太可能是一个垃圾邮件发送者。接收方可以凭借微不足道的(negligible)计算开销验证受到的哈希现金戳是否有效。但是,已知的寻找一个包含必备属性的头部的唯一方法就是暴力求解(brute force)。尝试不同的随机数直至找到合适的答案。虽然验证一个字符串是容易的,但是合格的(satisfactory)答案是非常稀少的,需要进行大量的(substantial number of)尝试才能找到。
The hypothesis is that spammers, whose business model relies on their ability to send large numbers of emails with very little cost per message, will cease to be profitable if there is even a small cost for each spam they send. Receivers can verify whether a sender made such an investment and use the results to help filter email.
这个假设是基于这样的事实:垃圾邮件发送者为了盈利必须发送大量的电子邮件。每封邮件消耗少量的资源,但是一旦加在一起就会是个不小的数字,使得他们无法借此盈利(cease to be profitable)。接收方可以验证邮件的发送方是否付出了这种小型“投资”,借此过滤垃圾邮件。
Technical details
The header line looks something like this:
头部格式如下:
X-Hashcash: 1:20:1303030600:adam@cypherspace.org::McMybZIhxKXu57jd:ckvi
The header contains:
头部包含如下部分:
ver: Hashcash format version, 1 (which supersedes version 0).
ver: 哈希现金格式版本, 填1(替换了版本0)
bits: Number of "partial pre-image" (zero) bits in the hashed code.
bits: 哈希值中前象的位数(即哈希值前几位为全0)
date: The time that the message was sent, in the format YYMMDD[hhmm[ss]].
date: 邮件发送时间,格式为 YYMMDD[hhmm[ss]]。
resource: Resource data string being transmitted, e.g., an IP address or email address.
resource: 资源数据串。例如:IP地址或邮箱地址。
ext: Extension (optional; ignored in version 1).
ext: 扩展名(可选填,如果是版本1则不需填写)
rand: String of random characters, encoded in base-64 format.
rand: 随机数,以base-64编码。
counter: Binary counter (up to 220 ), encoded in base-64 format.
counter: 计数器(最大为220),以base-64编码。
The header contains the recipient's email address, the date of the message, and information proving that the required computation has been performed. The presence of the recipient's email address requires that a different header be computed for each recipient. The date allows the recipient to record headers received recently and to ensure that the header is unique to the email message.
头部包含接收方的邮箱地址,邮件发送时间和用于证明发送方已经执行相关计算的信息。接收方邮箱地址的存在使得不同的接收方可以计算产生不同的头部。日期使得接收方可以记录下近期受到的头部并保证这个头部是当前的邮件专属的(is unique to the email message)。
Sender's side
The sender prepares a header and appends a counter value initialized to a random number. It then computes the 160-bit SHA-1 hash of the header. If the first 20 bits (i.e. the 5 most significant hex digits) of the hash are all zeros, then this is an acceptable header. If not, then the sender increments the counter and tries the hash again. Out of 2160 possible hash values, there are 2140 hash values that satisfy this criterion. Thus the chance of randomly selecting a header that will have 20 zeros as the beginning of the hash is 1 in 220. The number of times that the sender needs to try to get a valid hash value is modeled by geometric distribution. Hence the sender will on average have to try 220 values to find a valid header. Given reasonable estimates of the time needed to compute the hash, this would take about 1 second to find. At this time, no more efficient method is known to find a valid header.
发送方准备了一个头部,并且在头部末尾加上一个计数器值。这个值使用一个随机数初始化。随后使用这个头部计算出一个160位的SHA-1哈希值。如果前20位(亦即前5位十六进制数)全为0,则这是一个合格的头部。否则,发送方递增计数器的值并重新计算哈希值。在 2160 个可能出现的哈希值中(out of),有 2140 个哈希值符合条件(criterion)。那么随机选择一个开头20位全0的头部的概率为 1/220。发送方要选中一个有效哈希值所需的尝试次数遵从几何分布(geometric distribution)。因此发送方平均需要尝试 220 个值才能找到一个合格的头部。估计计算这样一个有效哈希值所需要的时间大概是1秒。当前,获取一个合格的头部没有更行之有效的方法。
A normal user on a desktop PC would not be significantly inconvenienced by the processing time required to generate the Hashcash string. However, spammers would suffer significantly due to the large number of spam messages sent by them.
一个PC端的普通用户不会因为生成一个哈希现金串而消耗的处理时间产生巨大不便(significantly inconvenienced)。但是,一个垃圾邮件发送者则不得不为(due to)发送大量的垃圾邮件承受巨大的折磨(suffer significantly)。
Recipient's side
Technically the system is implemented with the following steps:
一般系统会经历如下几步:
- The recipient's computer calculates the 160-bit SHA-1 hash of the entire string.
1:20:060408:adam@cypherspace.org::1QTjaYd7niiQA/sc:ePa
This takes about two microseconds on a 1 GHz machine, far less time than the time it takes for the rest of the e-mail to be received. If the first 20 bits are not all zero, the hash is invalid. (Later versions may require more bits to be zero as machine processing speeds increase.)
- 接收方的电脑计算整个字串的160位SHA-1哈希值。
1:20:060408:adam@cypherspace.org::1QTjaYd7niiQA/sc:ePa
在一台主频1GHz的机器上大概会消耗两毫秒的时间来计算这个哈希值,远低于收取剩余邮件的时间。如果开头的20位非全0,那么此哈希值无效。(随着硬件处理速度的提升,后续版本可能会要求更多的位数为0)
- The recipient's computer checks the date in the header (e.g., "060408", which represents the date 8 Apr 2006). If it is not within two days of the current date, it is invalid. (The two-day window compensates for clock skew and network routing time between different systems.)
- 接收者的计算检查头部的时间戳(如"060408",代表2006年4月8日)。如果这个日期不在当前日期的两天以内,那么哈希值无效。(两天长度的时间窗口补偿了(compensate for)时钟偏差(clock skew)和邮件在双方之间网络路由的时间(network routing time))
- The recipient's computer checks whether the e-mail address in the hash string matches any of the valid e-mail addresses registered by the recipient, or matches any of the mailing lists to which the recipient is subscribed. If a match is not found, the hash string is invalid.
- 接收方的电脑检查哈希值中的邮件地址是否符合接收方登记的有效邮件地址之一,或者是否匹配接收方注册的邮箱列表之一。如果没有匹配的选项,那么这个哈希串是无效的。
- The recipient's computer inserts the hash string into a database. If the string is already in the database (indicating that an attempt is being made to re-use the hash string), it is invalid.
- 接收方的电脑将哈希串插入数据库中。如果该串已经存在与数据库中,说明发送方有意重用这枚哈希串,那么该串无效。
If the hash string passes all of these tests, it is considered a valid hash string. All of these tests take far less time and disk space than receiving the body content of the e-mail.
如果哈希串通过了以上所有验证,那么可以认定这是一条合格的哈希串。所有以上测试所使用的时间和空间远低于接收邮件内容(body content)的消耗。
Required effort
The time needed to compute such a hash collision is exponential with the number of zero bits. So zero bits can be added (doubling the amount of time needed to compute a hash with each additional zero bit) until it is too expensive for spammers to generate valid header lines.
用于计算哈希碰撞(hash collision)的时间是哈希值规定全0位数的指数级别。全0位数可以递增(每增加一个0位则计算一个合格哈希值所需的时间将会翻番)直到令垃圾邮件发送者感到生成合格的头部开销太大。
Confirming that the header is valid always takes the same amount of time, no matter how many zero bits are required for a valid header, since this requires only a single hashing operation.
无论条件中要求多少个全0位,我们总是消耗相同的时间来确认头部是否合格,因为接收方只需要执行一次哈希运算(hashing operation)。