01/28/2012 10:40 PM, Maxim Kammerer:
> On Sat, Jan 28, 2012 at 20:52, anonym <anonym@???> wrote:
>> When Tor shuts down it writes the consensus to the data dir even if it
>> is unverified. When Tor starts it will load the saved consensus, and now
>> it will find that it's valid, and hence *NOT* download a new consensus.
>> All should be well from here on.
> 
> It will download a new consensus after an hour. If htpdate fails,
> that's where Tor stops working.
Hmm, I think you are partially correct. I now see that there is an issue
with consensus *updates* (i.e. when we fetch a new, fresher one) that
can occur when you have a clock that is too slow (but this doesn't seem
like what you meant?):
(This is a quite lengthy analysis, but the TL;DR is: Let's revert to
vmid = middle of [valid-after,valid-until] because clocks that are E
minutes behind real time may cause issues (temporary Tor outages) after
consensus updates that happen at the (60-E):th minute or later during
any hour.)
Let's first note that the first consensus we get (the one tordate uses
to set the time according to) will always be valid and Tor should work,
so problems can only occur when Tor fetches a new consensus to get a
fresher one at a later time (sometime after fresh-until but before
valid-until). So the problems will not hit the user immediately, and it
will not hit the user at all if htpdate succeeds. But let's assume
htpdate fails for unknown reasons, so the time is only set by tordate.
A clock set by tordate to the middle of [valid-after, fresh-until] can
only be up to 30 minutes incorrect. If our clock is in the future any
new consensus we fetch will still be valid for at least a couple of
hours so it is ok. If there was a problem with a clock in the future our
previous vmid would only make things worse since it always gives us a
clock that is 30 to 90 minutes in the future. (This also explains why,
with that vmid, htpdate failures imply that hidden services don't work
(hidden services running on Tor versions prior to the current alpha
require a clock that is no more than 30 minutes incorrect).)
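As a quick check of the two error ranges above, here is a minimal Python
sketch. It assumes that real time lies somewhere in [valid-after,
fresh-until] when tordate sets the clock (i.e. we were served the
freshest consensus), and it uses the 1 hour fresh / 3 hour valid
intervals seen in the logs in this mail. The function and variable names
are made up for illustration and are not part of tordate.

```python
from datetime import datetime, timedelta

def clock_error_range(valid_after, fresh_until, chosen):
    """Return (min, max) of (chosen - real time) in minutes.

    Assumption: real time is somewhere in [valid_after, fresh_until].
    Positive values mean our clock is ahead of real time.
    """
    one_minute = timedelta(minutes=1)
    smallest = (chosen - fresh_until) / one_minute  # real time as late as possible
    largest = (chosen - valid_after) / one_minute   # real time as early as possible
    return smallest, largest

valid_after = datetime(2012, 1, 29, 0, 0, 0)
fresh_until = valid_after + timedelta(hours=1)
valid_until = valid_after + timedelta(hours=3)

# Middle of [valid-after, fresh-until]
mid_fresh = valid_after + (fresh_until - valid_after) / 2
# Previous vmid: middle of [valid-after, valid-until]
mid_valid = valid_after + (valid_until - valid_after) / 2

print(clock_error_range(valid_after, fresh_until, mid_fresh))  # (-30.0, 30.0)
print(clock_error_range(valid_after, fresh_until, mid_valid))  # (30.0, 90.0)
```

Under that assumption only the first choice can leave us with a slow
clock (negative error), which is the case analysed next.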
What remains is a slow clock which is E minutes back in time, with 0 < E
<= 30 (we skip seconds for simplicity). Say we fetch a consensus at
H:X:00 according to our clock, that is 60-X minutes before our current
consensus' fresh-until, when the authorities make a new consensus (with
valid-after H+1:00:00) available. Then we have two cases:
If X < 60-E both our clock and the real time are before fresh-until so we
get a consensus with valid-after H:00:00 which is fine since our clock
is H:X:00.
If X >= 60-E our clock is before fresh-until but real time has passed
it, so there is a new consensus with valid-after H+1:00:00 while
our time is just H:X:00. The consensus will not be valid for another
60-X minutes.
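The two cases can be written down as a small function. This is only a
sketch of the reasoning above, not of anything Tor does internally:
`minutes_until_valid` is a hypothetical name, and it assumes we are
always served the newest consensus that exists at the real time of the
fetch.

```python
def minutes_until_valid(x, e):
    """Minutes (on our clock) until the fetched consensus becomes valid.

    x: minute past the hour on our slow clock when we fetch (0..59)
    e: how many minutes our clock is behind real time (0 < e <= 30)
    Returns 0 if the fetched consensus is already valid for us.
    """
    if not 0 <= x < 60:
        raise ValueError("x must be in 0..59")
    if not 0 < e <= 30:
        raise ValueError("e must satisfy 0 < e <= 30")
    if x < 60 - e:
        # Real time is still before fresh-until: we get valid-after H:00:00.
        return 0
    # Real time is past fresh-until: we get valid-after H+1:00:00.
    return 60 - x

print(minutes_until_valid(31, 25))  # 0
print(minutes_until_valid(56, 25))  # 4
```

With E=25, a fetch at minute 31 is fine, while a fetch at minute 56
leaves us without a valid consensus for about 4 minutes.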
So the question is at what times a Tor client tries to download a
consensus update. I ran a test for some hours starting at around
00:55:00 UTC yesterday, where I set the time to 00:30:00, so E=25 which
means that fetches that occur after the ~35th minute should result in
Tor outages due to invalid consensuses. I got the following results (all
times below are in UTC too):
Tor log:
Jan 29 03:56:28.191 [Warning] Our clock is 3 minutes, 32 seconds behind
the time published in the consensus network status document (2012-01-29
04:00:00 GMT).  Tor needs an accurate clock to work correctly. Please
check your time and date settings!
Jan 29 03:56:28.194 [Notice] Our directory information is no longer
up-to-date enough to build circuits: We have no recent network-status
consensus.
Jan 29 03:56:28.194 [Notice] I learned some more directory information,
but not enough to build a circuit: We have no recent network-status
consensus.
Jan 29 04:00:03.305 [Notice] We now have enough directory information to
build circuits.
Jan 29 04:00:04.352 [Notice] Tor has successfully opened a circuit.
Looks like client functionality is working.
(Plus a similar series of log entries at around 7:00:00)
I was also logging changes to cached-consensus:
Sun Jan 29 00:30:59 UTC 2012:
valid-after 2012-01-29 00:00:00
fresh-until 2012-01-29 01:00:00
valid-until 2012-01-29 03:00:00
Sun Jan 29 02:08:16 UTC 2012:
valid-after 2012-01-29 02:00:00
fresh-until 2012-01-29 03:00:00
valid-until 2012-01-29 05:00:00
Sun Jan 29 03:56:28 UTC 2012:
valid-after 2012-01-29 04:00:00  <-- Not valid yet
fresh-until 2012-01-29 05:00:00
valid-until 2012-01-29 07:00:00
Sun Jan 29 05:31:51 UTC 2012:
valid-after 2012-01-29 05:00:00
fresh-until 2012-01-29 06:00:00
valid-until 2012-01-29 08:00:00
Sun Jan 29 06:53:25 UTC 2012:
valid-after 2012-01-29 07:00:00  <-- Not valid yet
fresh-until 2012-01-29 08:00:00
valid-until 2012-01-29 10:00:00
Sun Jan 29 08:31:47 UTC 2012:
valid-after 2012-01-29 08:00:00
fresh-until 2012-01-29 09:00:00
valid-until 2012-01-29 11:00:00
Sun Jan 29 10:23:27 UTC 2012:
valid-after 2012-01-29 10:00:00
fresh-until 2012-01-29 11:00:00
valid-until 2012-01-29 13:00:00
Sun Jan 29 12:41:52 UTC 2012     <-- Since X = 41 >= 60-E = 60-25 = 35
valid-after 2012-01-29 12:00:00  <-- we should've got 13:00:00. Weird.
fresh-until 2012-01-29 13:00:00
valid-until 2012-01-29 15:00:00
Sun Jan 29 14:45:17 UTC 2012:    <-- Weird
valid-after 2012-01-29 14:00:00  <-- again.
fresh-until 2012-01-29 15:00:00
valid-until 2012-01-29 17:00:00
Sun Jan 29 16:36:35 UTC 2012:    <-- Weird
valid-after 2012-01-29 16:00:00  <-- again.
fresh-until 2012-01-29 17:00:00
valid-until 2012-01-29 19:00:00
Sun Jan 29 18:40:58 UTC 2012:    <-- Weird
valid-after 2012-01-29 18:00:00  <-- again.
fresh-until 2012-01-29 19:00:00
valid-until 2012-01-29 21:00:00
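The logged consensus updates above can be compared with the X >= 60-E
prediction mechanically. The sketch below is just my bookkeeping in
Python, with the fetch times and valid-after values copied from the
listing (the very first consensus at 00:30:59 is left out since it is
not an update); `classify` and the other names are hypothetical.

```python
from datetime import datetime

E = 25  # minutes our clock was behind real time in this test
FMT = "%Y-%m-%d %H:%M:%S"

# (fetch time on our slow clock, valid-after of the fetched consensus)
UPDATES = [
    ("2012-01-29 02:08:16", "2012-01-29 02:00:00"),
    ("2012-01-29 03:56:28", "2012-01-29 04:00:00"),
    ("2012-01-29 05:31:51", "2012-01-29 05:00:00"),
    ("2012-01-29 06:53:25", "2012-01-29 07:00:00"),
    ("2012-01-29 08:31:47", "2012-01-29 08:00:00"),
    ("2012-01-29 10:23:27", "2012-01-29 10:00:00"),
    ("2012-01-29 12:41:52", "2012-01-29 12:00:00"),
    ("2012-01-29 14:45:17", "2012-01-29 14:00:00"),
    ("2012-01-29 16:36:35", "2012-01-29 16:00:00"),
    ("2012-01-29 18:40:58", "2012-01-29 18:00:00"),
]

def classify(fetched, valid_after, e=E):
    """Return (predicted, observed) 'not valid yet' flags for one fetch."""
    predicted = fetched.minute >= 60 - e  # the X >= 60-E case
    observed = valid_after > fetched      # consensus starts in our future
    return predicted, observed

mismatches = []
for fetched_s, valid_after_s in UPDATES:
    fetched = datetime.strptime(fetched_s, FMT)
    valid_after = datetime.strptime(valid_after_s, FMT)
    predicted, observed = classify(fetched, valid_after)
    if predicted != observed:
        mismatches.append(fetched.strftime("%H:%M"))

print(mismatches)  # ['12:41', '14:45', '16:36', '18:40']
```

The four mismatches are exactly the fetches marked "Weird" in the
listing: the analysis predicts a not-yet-valid consensus, but we got the
old one.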
So it seems Tor can fetch a new consensus pretty close to fresh-until
and then we experience problems with our consensus. Something is weird,
though, because several others of the above logged consensus fetches
should have failed but didn't. Since consensus updates (i.e. every
consensus fetch except the first) are not necessarily fetched from
authorities directly, but may be fetched from directory mirrors, I
suspect there may be delays before a new consensus reaches all mirrors.
In the weird cases we just happened to find one with the old consensus.
However, besides logging the consensus changes I also tried a wget fetch
over Tor every minute, and none of them failed. Maybe Tor had old
circuits that were usable for a long enough time to avoid problems. Or
maybe polipo's cache messed things up. I modified the script so that the
cache is cleared etc, but I did that only after the last not-yet-valid
consensus was downloaded. Whatever.
Now, in Tails this error only occurs if htpdate fails (and this should
be unlikely nowadays) but I think this potential problem still warrants
not setting the time to the middle of [valid-after, fresh-until].
Setting it to fresh-until (time error 0 to 60 minutes in the future) or
up to one hour later would be safe though. I guess it's best to have a
margin in both directions, so our old middle of [valid-after,valid-until]
seems like the safest choice.
If everyone agrees, let's revert.
> Also, won't other nodes treat another Tor node with clock time before
> their consensus differently?
No idea.
Cheers!