From: dalton@cup.hp.deletethis.com (David Dalton)
Subject: NTP advice given
Date: 14 Sep 1999 00:00:00 GMT
Message-ID: <7rm58f$55p$1@ocean.cup.hp.com>
Organization: Hewlett-Packard Galactic Enterprises
Keywords: FAQ howto
Newsgroups: comp.protocols.time.ntp
Summary: NTP advice for beginners

Here is an article about NTP that I am writing for the HP-UX users, but it also has quite a bit of good info for users on other platforms. Any and all feedback/improvements solicited. This could eventually become a much larger article with advice on "Planning Your NTP Hierarchy", diagrams of various configurations, and data/graphs of clock performance.

--
-> My $.02 only     Not an official statement from HP     {They make me say that}
-- As far as we know, our computer has never had an undetected error.
---------------------------------------------------------------------------
David Dalton      dalton@cup.hp.deletethis.com      408/447-3016


Network Time Protocol on HP-UX
==============================

The Network Time Protocol is a family of programs that are used to adjust the system clock on your computer and keep it synchronized with external sources of time. It is designed to provide accuracy in the microsecond to millisecond range with hardware available in the mid 1990s. The Network Time Protocol is described in RFC-1305. The family of programs was developed for the public domain at the University of Delaware, and plenty of good information is available at their web site:

    http://www.eecis.udel.edu/~ntp

Hewlett-Packard provides NTPv3.5 from the University of Delaware, which is fully compliant with RFC-1305. Be sure to get the latest version (with lots of new clock drivers turned on) by installing one of these patches as appropriate for your system:

    PHNE_19710 (HP-UX 10.X)
    PHNE_19711 (HP-UX 11.X)

Who Needs It?
-------------

All clocks drift, and the clocks inside computers are no exception.
These quartz-based oscillators are usually specified to be within 50 parts-per-million, but even this is not guaranteed. Temperature fluctuations make the problem even worse. Since a month has over two million seconds in it, a drift rate of 50ppm quickly adds up to more than 100 seconds per month, or about 4 seconds per day. Even a relatively good drift rate of 5ppm still leads to a drift of almost half a second per day.

Humans will rarely notice a deviation of one second from the correct "official" time, but computers are more sensitive. Databases and transaction processing applications can become confused if client and server machines have different ideas of time. Debugging system problems becomes difficult if the timestamps in the system logs are not true.

The "make" utility that everybody uses to manage the compilation of software looks at file timestamps (with a one second granularity) to decide which .o files need to be rebuilt when the underlying source file has been changed. If some of the directories are NFS mounted, and the server and client have different notions of the current time, then "make" can fail to rebuild some derived objects and produce an executable that is not based on the most up-to-date sources. Even the one second granularity of file timestamps means that your client and server must be synchronized quite a bit closer than 1000 milliseconds in order to guarantee that "make" will always do the right thing.

Fortunately NTP is available to solve these problems and keep your computer clocks ticking in synchronization forever. And it is free! All you need is a connection to the Internet or your own GPS receiver, an ordinary (say ethernet) network inside your own building, and a little knowledge about how to configure NTP and get it working. And think of how much better you will feel when you know that all of your computers are synchronized within milliseconds of UTC!
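
The drift figures are simple arithmetic, and a few lines of Python make them easy to re-check for any oscillator tolerance. This is only an illustrative sketch: the function name drift_seconds and the 30-day month are assumptions of the example, not anything defined by NTP.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def drift_seconds(ppm, elapsed_seconds):
    """Clock error accumulated over elapsed_seconds at a constant drift rate.

    ppm is the drift rate in parts per million (hypothetical helper,
    assuming the rate stays constant, which real oscillators do not).
    """
    return ppm * 1e-6 * elapsed_seconds


for ppm in (50, 5):
    per_day = drift_seconds(ppm, SECONDS_PER_DAY)
    per_month = drift_seconds(ppm, 30 * SECONDS_PER_DAY)
    print(f"{ppm:>2} ppm: {per_day:.2f} s/day, {per_month:.1f} s per 30 days")
```

For 50ppm this works out to 4.32 seconds per day and 129.6 seconds in 30 days. For 5ppm it is 0.43 seconds per day.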
ntpdate (one-shot) vs xntpd (continuous adjustment) --------------------------------------------------- Depending on your needs, you may choose to have your system clock updated periodically (via a cron job) or to run the full-time daemon that continuously adjusts your system clock and keeps it synchronized at all times. The periodic approach (ntpdate) is simpler and more robust, but accuracy is limited and it can produce small jumps in time. The continuous approach (xntpd daemon process) is harder to set up and very finicky about network service quality, but provides the best accuracy and the smoothest operation. Some customers are absolutely allergic to the small jumps in time that ntpdate can produce, especially if the jump is backwards. When the NTP daemon is controlling your system clock it is looking at a collection of timesources, choosing the "best" one at the moment, and trying to drive the offset to zero between that timesource and the local system. NTP measures round-trip times and applies statistical methods to evaluate each variance on an ongoing basis. In NTP terminology this variance is known as "dispersion", and it is bad. This way a path to a server across a congested network looks "worse" than another path to another server across an uncongested network, even if the servers themselves are the same quality and the same distance away from the client. You will be surprised how smart the NTP daemon is about this. Sources of Time --------------- The passage of time is a fact of life, something that we have no control over. But time-of-day is a human construct, and the official definition of time-of-day is regulated and distributed by government organizations in many different countries. 
The major players are: US Naval Observatory http://tycho.usno.navy.mil/ -GPS satellite US National Institute of Standards and Technology (formerly NBS) http://www.bldrdoc.gov/timefreq -WWV terrestrial shortwave (2.5MHz 5MHz 10MHz 15MHz 20MHz) -WWVH terrestrial shortwave (2.5MHz 5MHz 10MHz 15MHz) -WWVB terrestrial longwave (60 kHz) -GOES satellite -NIST modem time service National Physical Laboratory (UK) http://www.npl.co.uk/ Royal Greenwich Observatory http://greenwich2000.com/ International Earth Rotation Service http://hpiers.obspm.fr/ These organizations (and many others) all coordinate with each other, making regular comparisons of their cesium atomic clock ensembles and hydrogen masers, and also consulting with astronomers who track the (very gradual) slowing of the Earth's rotation and declare the insertion of a leap second as needed approximately every 18 months. These governmental organizations keep their clocks within nanoseconds of each other at all times. The most common distribution mechanisms are radio signals (terrestrial and satellite broadcast), phone/modem, and computer networks. Having your own radio receiver provides the ultimate accuracy, and there are no worries about network delays, congestion, or outages. But the radio receivers are expensive, typically 1000 to 10,000 dollars, and so very few people have them. The popular radio methods are: Global Positioning System (satellite) WWV (terrestrial North America) DCF77 (terrestrial Europe) On the Internet there are many public timeservers available for you to connect to, and they will provide NTP services free on a limited basis. This is the most popular method (due to low cost), and this is the main benefit provided by NTP. There are lists of public (stratum-1 and stratum-2) timeservers at the University of Delaware NTP homepage. The main problem is that many people are protected behind firewalls and so cannot use the public timeservers. 
If you are behind a firewall (or not connected to the Internet at all) and cannot justify the expense of a radio receiver, there is still a way to declare one of your NTP machines to be a timeserver anyway, and have it serve time to the rest of your closed domain. This is not highly recommended because there is no real synchronization to the real world, but it is possible to keep the system clocks of all of the computers in your domain very close to each other this way. Sometimes close together is enough. This is known as the "Local Clock Impersonator" and is very simple to configure with driver#1.

Configuring Your First Server
-----------------------------

Your first task is to choose a source of time. The three choices are network timeservers, radio receivers, or the local clock impersonator.

NETWORK TIMESERVERS: In large organizations there may already be network timeservers in operation near you, so consult your system administrators for these. Ask your Internet Service Provider (ISP) if they provide an NTP timeserver on port 123 for their users. Otherwise look at the list of public time servers (and advice about how to use them politely) that is maintained at the University of Delaware:

    http://www.eecis.udel.edu/~mills/ntp/servers.htm

Choose three (or more) that are nearby geographically. If you are in London, it would usually not be wise to choose timeservers in Australia or Brazil. Long distances over water usually mean a poor network connection in terms of delay and path symmetry. Router hops also delay the packets in unpredictable ways. You will need to evaluate these potential timeservers (and the network paths) to decide if they are close enough (ping time, delay and variation) and well configured (ntpq output) before you use them. Some timeservers may also require notification before you use them, so pay attention to the etiquette of the listings at UDelaware. Don't point more than three of your machines at any one public timeserver.
Use that small group of your machines (at stratum-2 or stratum-3) as the main timeservers for the rest of your organization. The public stratum-2 servers can provide good timeservice for almost anybody, and their access policies are less restrictive (in general) than the stratum-1 servers. The quality of the network service between your machine and the public timeserver (read: your ISP) dominates the errors you will see, and makes the distinction between stratum-1 and stratum-2 almost meaningless for most purposes. Dispersion is a measurement of (timeserver quality) + (network quality), but in reality the network quality swamps everything else. If your network is slow or overloaded, then dispersion will be high no matter how good the timeservers themselves are. Many customers have misunderstood the importance of network service quality. For most, NTP is their first experience with an application that is actually sensitive to network service quality. Other applications (FTP, DNS, NFS, sendmail) can tolerate huge delays in packet delivery because their data is not time-critical. But NTP is different, and delays are deadly for your time service. Delays immediately show up in the dispersion figures. If you care about milliseconds, you must vigorously pursue your dispersion measurements and pay attention to network service quality. If you care about microseconds, you must abandon the network timeservers and purchase a radio clock for each NTP client. Let's evaluate some different public timeservers from the stratum-2 list. First is a machine that HP is providing in Silicon Valley for public use in North America. This machine was recently upgraded from stratum-2 to stratum-1 with a new GPS receiver, but the lists at UDelaware might not have been updated yet. 

    ntp-cup.external.hp.com (192.6.38.127)
    Location: Cupertino CA (SF Bay area) 37:20N/122:00W
    Synchronization: NTPv3 primary (GPS), HP-UX
    Service Area: West Coast USA
    Access Policy: open access
    Contact: dalton@cup.hp.com
    Note: no need to notify for access, go right ahead!

From my location in Silicon Valley I can ping the timeserver and see that it is about 5 milliseconds away:

    /usr/sbin/ping ntp-cup.external.hp.com 64 5
    PING ntp-cup.external.hp.com: 64 byte packets
    64 bytes from 192.6.38.127: icmp_seq=0. time=5. ms
    64 bytes from 192.6.38.127: icmp_seq=1. time=4. ms
    64 bytes from 192.6.38.127: icmp_seq=2. time=4. ms
    64 bytes from 192.6.38.127: icmp_seq=3. time=5. ms
    64 bytes from 192.6.38.127: icmp_seq=4. time=5. ms

    ----ntp-cup.external.hp.com PING Statistics----
    5 packets transmitted, 5 packets received, 0% packet loss
    round-trip (ms)  min/avg/max = 4/4/5

Now let's query the timeserver with "ntpq -p" to find out what synchronization sources it is using:

    /usr/sbin/ntpq -p ntp-cup.external.hp.com
         remote           refid      st t when poll reach   delay  offset    disp
    ==============================================================================
    *REFCLK(29,1)    .GPS.            0 l   35   32  376     0.00  -0.004    0.02
    -bigben.cac.wash .USNO.           1 u   47  128  377    40.16  -1.244    1.37
     clepsydra.dec.c usno.pa-x.dec.c  2 u  561 1024  377    16.74  -4.563    4.21
    -clock.isc.org   .GOES.           1 u  418 1024  377     6.87  -3.766    3.57
     hpsdlo.sdd.hp.c wwvb.col.hp.com  2 u   34   16  204    48.17  -8.584  926.35
    +tick.ucla.edu   .USNO.           1 u  111  128  377    20.03  -0.178    0.43
    +usno.pa-x.dec.c .USNO.           1 u   42  128  377     6.96  -0.408    0.38

This timeserver is synchronized (asterisk in column one) to "REFCLK(29,1)", which is a Trimble Palisade GPS receiver. The offset from GPS is currently 0.004 milliseconds and the dispersion is 0.02 milliseconds (both excellent values, smaller is better here). This timeserver also has several good stratum-1 and stratum-2 servers which it can fall back on if the GPS receiver stops working for any reason.
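
If you want to watch these numbers from a script instead of by eye, the peer lines can be split into fields. The sketch below is a rough illustration in Python, not part of the NTP distribution: the function names are made up for this example, and it assumes each line still begins with its one-character status column and that no field contains a space. It also relies on the fact that ntpq prints "reach" in octal, as an 8-bit record of the last eight polls.

```python
def parse_peer_line(line):
    """Split one peer line of "ntpq -p" output into named fields.

    Assumption: the first character is the status column (blank, "*",
    "+", "-", ...) and the remaining ten fields are whitespace separated.
    """
    tally = line[0].strip()
    (remote, refid, stratum, kind, when,
     poll, reach, delay, offset, disp) = line[1:].split()
    return {
        "tally": tally,
        "remote": remote,
        "refid": refid,
        "stratum": int(stratum),
        "type": kind,
        "when": when,              # kept as text: may be "-" or "17d"
        "poll": int(poll),
        "reach": int(reach, 8),    # printed in octal by ntpq
        "delay": float(delay),
        "offset": float(offset),
        "disp": float(disp),
    }


def polls_answered(reach):
    """Count how many of the last eight polls were answered."""
    return bin(reach & 0xFF).count("1")


# Two lines copied from the listing above.
SAMPLE = [
    "*REFCLK(29,1)    .GPS.            0 l   35   32  376     0.00  -0.004    0.02",
    " hpsdlo.sdd.hp.c wwvb.col.hp.com  2 u   34   16  204    48.17  -8.584  926.35",
]

for line in SAMPLE:
    peer = parse_peer_line(line)
    print(peer["remote"], "answered", polls_answered(peer["reach"]),
          "of the last 8 polls, disp", peer["disp"], "ms")
```

A reach of 377 means all eight recent polls were answered. The 204 shown for "hpsdlo" means only two of the last eight were.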
Notice the line for "hpsdlo.sdd.hp.com" which has delay, offset, and dispersion measures that are markedly worse than any of the other sources. The timeserver "hpsdlo" is good enough, but the network in between has some problems, mainly evidenced by the large dispersion figure. There is nothing that NTP can do to reduce the dispersion. NTP is simply reporting to you what it sees out on the network. You must complain to your network service provider (not HP) if the dispersion numbers are too high. A full description of the "ntpq -p" output is shown in a sidebar below. In summary, ntp-cup.external.hp.com is a well-configured timeserver that is only 5 milliseconds away from my location on the network. It would be a good choice for a public timeserver for my location. Whether it is good for you depends on the "ping" round-trip times (and their variability!) at your location. Try it! Now let's look at a timeserver located on the east coast of North America: ntp.ctr.columbia.edu (128.59.64.60) Location: Columbia University Center for Telecommunications Research; NYC Synchronization: NTP secondary (stratum 2), Sun/Unix Service Area: Sprintlink/NYSERnet Access Policy: open access, authenticated NTP (DES/MD5) available Contact: Seth Robertson (timekeeper@ctr.columbia.edu) Note: IP addresses are subject to change; please use DNS /usr/sbin/ping ntp.ctr.columbia.edu 64 5 PING 128.59.64.60: 64 byte packets 64 bytes from 128.59.64.60: icmp_seq=0. time=83. ms 64 bytes from 128.59.64.60: icmp_seq=1. time=86. ms 64 bytes from 128.59.64.60: icmp_seq=2. time=85. ms 64 bytes from 128.59.64.60: icmp_seq=3. time=86. ms 64 bytes from 128.59.64.60: icmp_seq=4. time=83. ms ----128.59.64.60 PING Statistics---- 5 packets transmitted, 5 packets received, 0% packet loss round-trip (ms) min/avg/max = 83/84/86 These ping round-trip times are significantly greater than the west coast example because the target is 5000 kilometers (3000 miles) further away. 
Nonetheless, 85 milliseconds is not too bad for general NTP purposes. You will generally see dispersion measurements somewhat less than the ping round-trip times. The NTP daemon has an interesting watershed at 128 milliseconds (more on that later), but this example server at 85 milliseconds is comfortably below that. /usr/sbin/ntpq -p ntp.ctr.columbia.edu remote refid st t when poll reach delay offset disp ============================================================================== +clepsydra.dec.c usno.pa-x.dec.c 2 u 927 1024 355 108.49 -18.215 3.63 otc1.psu.edu .WWV. 1 - 17d 1024 0 28.26 -25.362 16000.0 *NAVOBS1.MIT.EDU .USNO. 1 u 214 1024 377 38.48 -0.536 0.90 tick.CS.UNLV.ED tock.CS.UNLV.ED 3 u 721 1024 377 2113.97 1004.94 824.57 132.202.190.65 0.0.0.0 16 - - 1024 0 0.00 0.000 16000.0 unix.tamu.edu orac.brc.tamus. 3 u 636 1024 377 47.99 3.090 9.75 at-gw2-bin.appl 0.0.0.0 16 - - 1024 0 0.00 0.000 16000.0 -cunixd-ether.cc 192.5.41.209 2 u 172 1024 377 3.39 12.573 1.14 cunixd.cc.colum 0.0.0.0 16 u 285 64 0 0.00 0.000 16000.0 +cs.columbia.edu haven.umd.edu 2 u 906 1024 376 2.41 -5.552 15.12 +129.236.2.199 BITSY.MIT.EDU 2 u 423 1024 376 13.43 -14.707 22.60 cucise.cis.colu cs.columbia.edu 3 u 62 1024 377 5.84 -1.975 12.70 This timeserver at Columbia University has a variety of stratum-1, stratum-2, and stratum-3 sources, which is good. It also has three sources which are not responding right now (reach=0), and one with very large delay, offset, and dispersion (tick.CS.UNLV.EDU). As before, this is due to networking problems between client and server (New York to Las Vegas, over 3000 km), not some fault with the NTP implementation at either end. This timeserver at Columbia is currently synchronized to NAVOBS1.MIT.EDU, but three others (marked with "+" in column one) are attractive and could step in immediately if NAVOBS1 failed for any reason. 
Now let's look at a timeserver in Australia, almost halfway around the planet from my location: ntp.adelaide.edu.au (129.127.40.3) Location: University of Adelaide, South Australia Synchronization: NTP V3 secondary (stratum 2), DECsystem 5000/25 Unix Service Area: AARNet Access Policy: open access Contact: Danielle Hopkins (dani@itd.adelaide.edu.au) /usr/sbin/ping ntp.adelaide.edu.au 64 5 PING huon.itd.adelaide.edu.AU: 64 byte packets 64 bytes from 129.127.40.3: icmp_seq=0. time=498. ms 64 bytes from 129.127.40.3: icmp_seq=1. time=500. ms 64 bytes from 129.127.40.3: icmp_seq=2. time=497. ms 64 bytes from 129.127.40.3: icmp_seq=3. time=498. ms 64 bytes from 129.127.40.3: icmp_seq=4. time=496. ms ----huon.itd.adelaide.edu.AU PING Statistics---- 5 packets transmitted, 5 packets received, 0% packet loss round-trip (ms) min/avg/max = 496/497/500 Here the ping round-trip times are much larger, around 500 milliseconds. Do not use a timeserver at this distance unless you are really desperate and understand what 500 milliseconds step changes mean to your users and applications. But don't just write this timeserver off! The round-trip times from your own location might be much smaller. Also note that the variation in round-trip times is small. We will investigate this a little further in a moment. /usr/sbin/ntpq -p ntp.adelaide.edu.au remote refid st t when poll reach delay offset disp ============================================================================== .otto.bf.rmit.ed 130.155.98.1 2 u 229 1024 376 16.34 7.132 7.87 .student.ntu.edu murgon.cs.mu.OZ 2 u 47 128 377 81.34 5.166 5.25 .203.31.96.1 murgon.cs.mu.OZ 2 u 13 256 373 115.74 30.147 38.54 .203.172.21.222 tick.usno.navy. 2 u 43 1024 367 866.64 47.316 65.32 -128.184.1.4 tictoc.tip.CSIR 2 u 99 128 377 13.40 -2.976 5.66 129.127.40.255 0.0.0.0 16 u - 64 0 0.00 0.000 16000.0 *tictoc.tip.CSIR .ATOM. 
1 u 17 64 377 26.92 -0.071 1.71 .dishwasher1.mpc gilja.itd.adela 3 u 164 256 376 35.78 4.769 5.66 xclepsydra.dec.c usno.pa-x.dec.c 2 u 1468 1024 376 473.36 -53.841 12.89 murgon.cs.mu.OZ .GPS. 1 u 47d 1024 0 16.19 -398.80 16000.0 -augean.eleceng. murgon.cs.mu.OZ 2 u 12 128 377 1.83 3.270 1.21 .ns.saard.net augean.eleceng. 3 u 27 64 375 0.92 -0.013 1.19 +cuscus.cc.uq.ed tictoc.tip.CSIR 2 u 28 64 376 34.91 1.981 1.27 +staff.cs.usyd.e tictoc.tip.CSIR 2 u 3 64 375 25.21 0.158 1.97 .wasat.its.deaki tictoc.tip.CSIR 2 u 1 128 377 15.37 -2.492 1.69 .luna.its.deakin tictoc.tip.CSIR 2 u 123 128 172 16.11 -0.350 501.11 -earth.its.deaki tictoc.tip.CSIR 2 u 28 128 377 12.19 -3.582 2.15 phobos.its.deak tictoc.tip.CSIR 2 u 169 128 56 12.42 -2.325 1000.76 .sol.ccs.deakin. tictoc.tip.CSIR 2 u 136 512 265 13.89 -1.083 251.83 +argos.eleceng.a tictoc.tip.CSIR 2 u 23 64 377 1.82 0.197 1.21 .mercury.its.dea tictoc.tip.CSIR 2 u 123 256 377 16.91 -2.584 2.94 .orion.atnf.CSIR murgon.cs.mu.OZ 2 u 111 512 376 53.51 -0.712 5.92 +smig2a.City.Uni tictoc.tip.CSIR 2 u 49 64 376 7.14 0.268 1.07 +svdpw.City.UniS murgon.cs.mu.OZ 2 u 26 64 376 4.90 -0.833 1.88 .news.nsw.CSIRO. murgon.cs.mu.OZ 2 u 54 1024 377 135.85 43.108 62.45 +210.8.40.225 murgon.cs.mu.OZ 2 u 2 64 377 50.83 1.811 14.45 .203.103.99.66 tictoc.tip.CSIR 2 u 342 1024 376 82.82 -14.124 36.21 xpellew.ntu.edu. tictoc.tip.CSIR 2 u 408 1024 377 404.33 -159.77 161.36 xxox.lifelike.co tick.usno.navy. 2 u 494 1024 377 504.56 -59.200 5.60 This timeserver in Australia has one excellent stratum-1 source (tictoc.tip.CSIR) which it is currently synchronized to, one stratum-1 source which hasn't responded in a while (reach=0), and a wide selection of stratum-2 sources (attractive candidates marked with "+"). Some of the stratum-2 sources are less attractive due to high delay/offset/dispersion numbers and are marked "falseticker" ("x" in column one). 
This timeserver in Australia might be a good choice for you if you are reasonably nearby, but it is probably not a good choice for time clients in North America.

When my timeserver in Silicon Valley is configured to use "sirius.ctr.columbia.edu" and "gpo.adelaide.edu.au" as timesources (among others), the output from "ntpq -p" looks like this (about 10 minutes after daemon startup):

         remote        refid    st t when poll reach   delay  offset    disp
    =========================================================================
    *REFCLK(29,1)    .GPS.       0 l   25   32  377     0.00   0.413    0.03
    +bigben.cac.wash .USNO.      1 u   56   64  377    39.54  -0.466    1.68
     clepsydra.dec.c usno.pa-x.  2 u  122  512  377     6.32  -0.250    0.92
    -clock.isc.org   .GOES.      1 u  149  512  357     5.98  -3.045    0.46
     hpsdlo.sdd.hp.c wwvb.col.h  2 u   25   32  126    56.29  -8.078    8.50
    +tick.ucla.edu   .USNO.      1 u   13   64  177    19.29  -0.265    0.26
    +usno.pa-x.dec.c .USNO.      1 u   56   64  277     6.82   0.034    0.20
     gpo.adelaide.ed tictoc.tip  2 u   15   16  377   470.52  54.789    0.90
     sirius.ctr.colu NAVOBS1.MI  2 u    3   16  377    83.37  -8.372    1.24

The timeserver in Australia has a delay of 470 milliseconds, which is very similar to the "ping" round-trip times that we saw earlier. This leads to an offset value of 54 milliseconds, which is significantly worse than any of the other timesources. It is interesting to note that the offset is much less than the delay, which suggests that the round trip is almost symmetric. NTP must assume that the outbound and inbound travel times are equal, and the offset value gives an idea of how unequal they might be. This is considerably better than 470/2 = 235 milliseconds, which is the error we would see if the entire delay were in one direction. Also interesting is the very low dispersion value, which means that the round-trip time does not vary a lot as more packets are exchanged. Less than 1 millisecond is an excellent dispersion value for a trip of 15,000 kilometers.
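
The relationship between delay, offset, and path symmetry can be made concrete with the four timestamps that every NTP exchange carries. The Python sketch below uses the standard offset and delay formulas from RFC-1305. The simulate helper, and the 289/181 millisecond split between the outbound and inbound legs, are invented for this illustration: they assume both clocks are actually correct and simply show one asymmetry that would produce a 54 millisecond apparent offset on a 470 millisecond round trip. The real split on that path is not known.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Clock offset and round-trip delay from the four NTP timestamps.

    t1: client sends request   (client clock)
    t2: server receives it     (server clock)
    t3: server sends reply     (server clock)
    t4: client receives reply  (client clock)
    """
    delay = (t4 - t1) - (t3 - t2)
    offset = ((t2 - t1) + (t3 - t4)) / 2
    return offset, delay


def simulate(true_offset, outbound, inbound, server_hold=0.0):
    """Hypothetical exchange: the server clock is ahead by true_offset and
    the two legs of the trip take outbound and inbound time units."""
    t1 = 0.0
    t2 = t1 + outbound + true_offset
    t3 = t2 + server_hold
    t4 = t3 - true_offset + inbound
    return ntp_offset_and_delay(t1, t2, t3, t4)


# All values in milliseconds, both clocks assumed to be correct.
print(simulate(0.0, 235.0, 235.0))   # symmetric path
print(simulate(0.0, 289.0, 181.0))   # legs differ by 108 ms
print(simulate(0.0, 470.0, 0.0))     # whole delay on the outbound leg
```

The three cases give apparent offsets of 0, 54, and 235 milliseconds for the same 470 millisecond delay. The error is always half the difference between the two legs, so it can never exceed half the round-trip delay.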
The timeserver in Australia is working out better than we had any right to expect at this distance, but it is still noticeably poorer than the other choices that are in North America. The timeserver at Columbia is better than the timeserver in Australia, due to the closer distance, but still noticeably worse than all of the other timesources. The Internet has not made physical distance irrelevant yet!

You must choose a minimum of one timeserver, and it is a good idea to choose three or more for redundancy. More on redundancy later. Then put lines like this at the end of your "/etc/ntp.conf" file:

    server ntp-cup.external.hp.com
    server bigben.cac.washington.edu
    server sirius.ctr.columbia.edu

RADIO RECEIVERS: GPS receivers are becoming very popular for NTP because the prices are dropping rapidly and the signal coverage is global. The main obstacle is usually the cost of mounting the antenna on the roof and running the cable indoors to your timeserver location. Antenna cables can get very expensive, and RS232 cabling has quite limited range.

GPS receivers can range in cost from a few hundred dollars to many thousands of dollars, but the old adage is: "you get what you pay for". That certainly applies to GPS receivers for time service. Although you can buy inexpensive handheld consumer-grade GPS receivers with NMEA output on RS232 (I have a Garmin G-3+ for example), you will find that dispersion is very high (500 to 1000 milliseconds) with the consumer units, and this can drive NTP almost crazy. It might drive you crazy as well. Many of these budget receivers can be made to work, but HP does not officially support them because of these dispersion problems. The officially supported GPS receivers are:

    HP58503            driver#26   (about 5000 dollars)
    Trimble Palisade   driver#29   (about 1500 dollars)

WWV (shortwave) and WWVB (longwave) receivers have been available in North America for 30 years, but the signal only reaches a few thousand kilometers from Ft Collins, Colorado.
There is no way to receive these signals in Europe or Asia or anywhere in the southern hemisphere. The only officially supported WWVB receiver is:

    Spectracom Netclock/2   driver#4   (about 1500 dollars)

Spectracom has a newer clock called the Netclock/GPS which is supposed to use the same driver#4 and appear to NTP exactly like the WWVB version. I haven't had a chance to try it yet.

DCF77 (AM and FM) signals radiate from Frankfurt, Germany. Some good receivers are made by Meinberg, but none of the DCF77 receivers is officially supported by HP (because I have no way to test and troubleshoot them in Silicon Valley).

To set up the HP58503A GPS receiver for NTP, you must have the receiver and antenna installed, and connected to a serial port on the HP-UX machine. Put these lines at the end of your "/etc/ntp.conf" file:

    server 127.127.26.1 minpoll 3 maxpoll 4
    #fudge 127.127.26.1 time1 -0.955     # s700
    #fudge 127.127.26.1 time1 -0.930     # s800

Uncomment the correct "fudge" line for your architecture. Then make a link to the device file that corresponds to the serial port you are connecting to the GPS unit:

    /usr/bin/ln -s /dev/tty0p0 /dev/hpgps1

The Trimble Palisade GPS receiver is very similar:

    /etc/ntp.conf
    -------------
    server 127.127.29.1     # poll period is fixed at 32 seconds
                            # no fudge required

    Device File
    -----------
    /usr/bin/ln -s /dev/tty0p0 /dev/palisade1

The Spectracom Netclock/2 needs this setup:

    /etc/ntp.conf
    -------------
    server 127.127.4.1 minpoll 3 maxpoll 4
                            # no fudge required

    Device File
    -----------
    /usr/bin/ln -s /dev/tty0p0 /dev/wwvb1

A GPS receiver with NMEA output (not officially supported by HP) would need this setup (fudge time1 value (in seconds) determined by your own experiments):

    /etc/ntp.conf
    -------------
    server 127.127.20.1             # Some NMEA Device (polling not applicable)
    fudge 127.127.20.1 time1 0.999  # offset value determined by you!

    Device File
    -----------
    /usr/bin/ln -s /dev/tty0p0 /dev/gps1

Other radio receivers are very similar.
Each one talks to a particular clock driver (#26 for HP GPS) and uses a particular device name (/dev/hpgps1 for HP GPS). The full list of drivers and devices is shown in /etc/ntp.conf.example when you install the latest NTP patch from HP. With all radio receivers you must pay attention to serial cabling issues. RS232 cabling seems very simple, but there are more ways to get it wrong than you might think. If you see reach=0 and disp=16000 for your new radio clock, then you are not actually communicating with the clock and it is time to check your cables carefully (including power). Most of the NTP problem calls received by the Support Center are traced to the RS232 cable between the radio clock and the HP-UX serial port. Don't let that happen to you. With the HP GPS receiver the output is text, so you can hook it up to a terminal (emulator) and "talk" to it to verify the cable connection. If you see any recognizable ASCII characters at all, then the RS232 connection is good. Receivers with NMEA output also emit strings of text that you can look at. The Trimble and Spectracom receivers use a binary format, so terminals (and emulators) are not useful with them. The Trimble receivers are shipped with a peecee program on a floppy disk called TSIPCHAT.EXE that allows you to communicate with the GPS receiver, see the satellites and all sorts of other operational data, and generally verify that your GPS unit is healthy and the cable connections are correct. Very handy. UNDISCIPLINED LOCAL CLOCK: This is a hack to allow your machine to use its own system clock as a reference clock, i.e., to free-run using no outside clock discipline source. Your machine can then be an NTP server for the rest of your organization. This is useful if NTP is to be used in an isolated environment with no radio clock available. Another application for this driver is if a particular server clock is to be used as the clock of last resort when all other normal synchronization sources have gone away. 
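
All of the reference clock setups in this article follow one pattern: the driver number is the third octet of the 127.127.x.y pseudo-address, and the unit number is the fourth octet and also the suffix of the device name. The Python sketch below just spells that pattern out. The table, the function name, and the default serial port are illustrative assumptions. Only unit 1 appears in this article, so other unit numbers are an extrapolation.

```python
# Driver number and device-name prefix for the clocks named in the article.
# A prefix of None means the driver needs no device file.
DRIVERS = {
    "HP58503 GPS": (26, "hpgps"),
    "Trimble Palisade": (29, "palisade"),
    "Spectracom Netclock/2": (4, "wwvb"),
    "NMEA GPS": (20, "gps"),
    "Local clock": (1, None),
}


def refclock_config(name, unit=1, serial_port="/dev/tty0p0"):
    """Return the ntp.conf server line and the ln command (or None)
    for one reference clock. Hypothetical helper for illustration."""
    driver, prefix = DRIVERS[name]
    server_line = f"server 127.127.{driver}.{unit}"
    link_command = None
    if prefix is not None:
        link_command = f"/usr/bin/ln -s {serial_port} /dev/{prefix}{unit}"
    return server_line, link_command


for clock in DRIVERS:
    server_line, link_command = refclock_config(clock)
    print(f"{clock}: {server_line}")
    if link_command:
        print(f"    {link_command}")
```

This does not add the minpoll, maxpoll, or fudge settings, which differ from one clock to the next.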
Configuration is very similar to using a radio receiver. Just put these two lines at the end of your "/etc/ntp.conf" file:

    server 127.127.1.1              # Local Clock Impersonator
    fudge 127.127.1.1 stratum 10    # show poor stratum

No device file is needed for the Local Clock Impersonator. It is a good idea to use the "fudge" line to set the stratum to 10 (or higher!) so that clients with access to better timeservers will synchronize to the real stratum-1 and stratum-2 machines.

Starting the NTP Daemon
-----------------------

Edit the file "/etc/rc.config.d/netdaemons" and set the variable called NTPDATE_SERVER to be some working NTP server that is reachable. This will run the "/usr/sbin/ntpdate" command just before the NTP daemon starts, and bring your system clock very close to the other server to begin with. This enormously improves stability at startup. Examples:

    NTPDATE_SERVER=ntp-cup.external.hp.com
    NTPDATE_SERVER=192.6.38.127
    NTPDATE_SERVER="ntp-cup.external.hp.com bigben.cac.washington.edu"

Then set the XNTPD variable to "1". This will cause the daemon to be started automatically when your system makes the transition from run level 1 to 2. Example:

    XNTPD=1

Now start the daemon using the startup script:

    /sbin/rc2.d/S660xntpd start

You can stop the daemon at any time using the same script:

    /sbin/rc2.d/S660xntpd stop

but older versions of the script do not always find and kill the daemon process correctly. You can always locate the process using "ps" and kill it with "/usr/bin/kill":

    ps -ef | grep ntp
    kill xntpd_pid_here

Verifying Correct Operation
---------------------------

First verify that the daemon process is actually running:

    ps -ef | grep ntp

If it is not running, examine /var/adm/syslog/syslog.log for NTP error messages. We'll cover some common startup error messages a little later.

If the daemon is actually running, then we use the client query programs "/usr/sbin/ntpq" and "/usr/sbin/xntpdc" to monitor its health.
This is very similar to the surveying of public timeservers that was covered in detail above. Now you will use the same techniques to evaluate the health of your own timeserver. Run the command "ntpq -p" and you will see output looking something like this:

         remote        refid    st t when poll reach   delay  offset    disp
    =========================================================================
    *WWVB_SPEC(1)    .WWVB.      0 l  124   64  377     0.00  -0.234    2.01
     relay.hp.com    listo.hp.c  2 u  875 1024  377    13.84   4.912    4.88
     cosl4.cup.hp.co listo.hp.c  2 u  876 1024  377     4.38  -4.468    3.95
     paloalto.cns.hp listo.hp.c  2 u  885 1024  377     5.84   0.762    2.18
     chelmsford.cns. listo.hp.c  2 u  883 1024  377    89.45   2.160   11.40
     atlanta.cns.hp. listo.hp.c  2 u  881 1024  377    63.20  -2.545    0.99
     colorado.cns.hp listo.hp.c  2 u  883 1024  377    38.71  -1.110    2.01
     boise.cns.hp.co listo.hp.c  2 u  875 1024  377    32.88  -2.015    2.23

Good health is a combination of high "reach", low "offset", and low "disp". The display above is from a very healthy NTP stratum 1 server with excellent network connections. The numbers will all be poor when the daemon is first started up, and should all improve dramatically during the first 15 minutes of operation. Always remember that "ntpq -p" provides a snapshot of current conditions. You will get a much better view if you run this command several times (perhaps at one minute intervals) and evaluate the trends of reach, offset, and dispersion.

You cannot synchronize to a timesource until dispersion for that source drops below 1000 (milliseconds). You will have problems if the "disp" number does not drop below 100 (milliseconds) and stay low. If "reach" stays at 0 and/or "disp" stays at 16000 it means that the server in question is not responding to queries from your machine. For example:

         remote        refid    st t when poll reach   delay  offset    disp
    =============================================================================
    *GPS_HP(1)       .GPS.       0 l   48   64  377     0.00   0.516    4.91
     hpisrhw         0.0.0.0    16 -    - 1024    0     0.00   0.000 16000.0
     hpxxxx.cup.hp.c cupertino.  3 u  467 1024  377     7.20 -12.430   15.67

This machine "hpisrhw" is not responding to NTP queries, but we can't tell why from here. Perhaps the daemon process "xntpd" is not running on "hpisrhw" or was recently started up and has not yet stabilized. Perhaps the network link between the local machine and "hpisrhw" is down. Perhaps the LAN card on the local machine is not working. But there are two other sources of time that are working correctly, so the local machine is still in relatively good shape with one GPS source and one network source.

Here is an example from a machine that is not healthy:

         remote        refid    st t when poll reach   delay  offset    disp
    =========================================================================
     big_srv         17.8.5.7    2 u    3  512   17   312.87 -249.15 1960.85

This machine only has one source of time, "big_srv", and it is having trouble making queries. The "reach" is very low, indicating that packets are being dropped on the network. The "disp" is far above the safe threshold of 100, indicating that even the packets that do arrive are very flaky. It is important to realize that the problems are out on the network, not with NTP. Network service quality is very important to NTP, and these indicators are showing terrible network service quality. The daemon is trying as hard as it can, but the "offset" is currently 249 milliseconds, and that is considered a large number in the world of NTP. A system with a normal Ethernet connection should be able to keep "offset" below 50 and "disp" below 100 all the time.

Remember, you don't know much about the health of your timeserver until it has been running for about five minutes and you have examined the output from "ntpq -p" several times (at perhaps one or two minute intervals).

Windows NT
----------

It is possible for a WindowsNT machine to be an NTP server, but it is not recommended.
You could get the latest NTP v4.0.97 source from the University of Delaware
site and compile it yourself (if you are a masochist). Performance is not
too great, and you probably won't use it as a stratum-1 server with a
reference clock.

A much better idea is to get one of the NTP client applications that are
available as shareware for Windows NT. This makes the NT machine a simple
NTP client that gets time from a real NTP server on a Unix machine
somewhere. One of the good ones is called Tardis, available from:

   http://www.kaska.demon.co.uk/

Windows 95/98 shareware is also available from Tardis, and also directly
from NIST:

   http://www.bldrdoc.gov/timefreq/index.html

Error Messages
--------------

>> no server suitable for synchronization found

This is actually a message from "ntpdate", not from the daemon "xntpd". It
is telling you that none of the servers listed on the "ntpdate" command line
(if any) is responding at this moment. A good fix for this is to put several
"known good" servers on the "ntpdate" command line in
/etc/rc.config.d/netdaemons:

   export NTPDATE_SERVER="192.6.38.127 reliable_server_two server_three"

Remember that you cannot point "ntpdate" at a reference clock, only at
functioning NTP servers on the network. If you have your own reference clock
(i.e. you are a stratum-1 server), you won't have to worry too much about
drift. Keep your stratum-1 NTP server powered on at all times! Two years of
continuous operation is a good target.

>>syslog: adjust: STEP dropped (12.12.89.2 offset -0.178119183)

This message is nothing to worry about. You will typically see it at startup
when your local system clock is more than 128 milliseconds away from the
chosen timesource. This requires a STEP change, not a gradual SLEW
adjustment, and the daemon makes very sure that the large offset is real and
believable before making the abrupt STEP.
You will see several of these messages in a row, separated by the
poll period of perhaps 64 or 128 seconds, before the final message telling
you that the STEP has actually been made. If you see this message at other
times besides daemon startup, particularly if you see it regularly, then you
are probably having some networking problems between client and server. Time
for some further investigation with "ntpq -p" on client and server.

>>syslog: time reset (step) -0.248548 s
>>syslog: synchronisation lost

Any time that a step change is made, the daemon clears all the status
registers and statistics and synchronization is lost by definition. But
don't worry, the daemon will immediately start the data collection needed
for the next synchronization and in a few poll periods you will be properly
synchronized with a good timesource. Again, if you see "synchronisation
lost" messages long after daemon startup, then you are probably having
networking problems between client and server.

>>syslog: system event 4: System new peer or system stratum change.

This message means that the server selected for synchronization has stopped
responding (or lowered its stratum level), so the daemon switched to a
better timesource automatically. If you see this message a lot then you want
to find out why the timesource stopped responding (probably due to network
congestion or failure).

>>syslog: peer 198.178.37.31 event 84: Peer reachable.

This is a perfectly normal message that says everything is OK.

Slewing
-------

The NTP daemon has three regimes in which it operates:

offset below 128 milliseconds
-----------------------------

This is the normal operating regime for NTP, and a properly configured NTP
hierarchy (with reasonable networking) can operate for years without ever
approaching the 128 msec upper limit.
All time adjustments are small and smooth (known as SLEWING), and nobody
even notices the SLEW adjustments unless they have a cesium clock or a GPS
clock and expensive instrumentation to make sophisticated measurements
(HP Santa Clara Division makes the instruments).

offset above 128 milliseconds
-----------------------------

This regime is often encountered at power-on, because those battery-backed
real-time clocks they put in computers are not too great. Because NTP is
quite capable of keeping the offset below one millisecond all the time it is
running, many users want to get into the normal regime quickly when an
offset above 128 msec is encountered at startup. So in this situation NTP
will (fairly quickly) make a single STEP change, and is usually successful
in getting the offset well below 128 msec so there will be no more of the
disruptive STEP changes.

offset above 1000 seconds
-------------------------

This is so far out of the normal operating range that NTP decides something
is terribly wrong and human intervention is required. The daemon shuts down.

The catch is that the dispersion on a WAN is frequently much greater than
128 msec, so you may see (a lot of) the STEP changes, perhaps as large as
1000 msec (depends on your network). But there are customer applications
that are quite allergic to the STEP changes, particularly backward steps
(which will happen about half the time). Databases and financial transaction
systems are examples.

The good news is that NTP can be compiled in such a way that it never makes
a STEP, but instead SLEWS the clock to drive the offset to zero. This
effectively removes the middle operating regime. You won't get millisecond
(or microsecond) precision with this method, but you probably can't get that
over a WAN anyway. Just install PHNE_12689 to get the SLEW behavior in
"/usr/sbin/xntpd".
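
The three regimes described in this section can be summarized in a few lines
of shell. This is only a sketch for reasoning about an "offset" value read
from "ntpq -p" or from the logs, not a reproduction of the daemon's own
logic. The function name "ntp_regime" is made up for this illustration, and
treating an offset of exactly 128 milliseconds as a slew is an assumption.

```shell
# Classify an offset given in milliseconds (may be negative or fractional)
# into the regime described in the text. Hypothetical helper, not part of NTP.
ntp_regime() {
    awk -v off="$1" 'BEGIN {
        if (off < 0) off = -off                  # only the magnitude matters
        if (off > 1000000)  print "shutdown"     # above 1000 seconds
        else if (off > 128) print "step"         # above 128 milliseconds
        else                print "slew"         # normal operating regime
    }'
}

ntp_regime -0.234      # the WWVB example offset
ntp_regime 178.119     # the "STEP dropped" example, in milliseconds
```

With the slew-only daemon the middle answer no longer applies, because every
offset short of the shutdown limit is slewed.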

In the not too distant future the SLEW behavior will be provided by
"/usr/sbin/xntpd.slew" in PHNE_19710 (HP-UX 10.X) and PHNE_19711
(HP-UX 11.X), or you can use "ntpdate -B".

SIDEBAR ===============================================================

Details of ntpq
---------------

# /usr/sbin/ntpq -p
 remote          refid      st t when poll reach   delay   offset    disp
=========================================================================
*REFCLK(29,1)    .GPS.       0 l   17   32   377    0.00   -0.002    0.02
+bigben.cac.wash .USNO.      1 u   29  128   377   38.31   -0.334    4.01
 clepsydra.dec.c usno.pa-xc  2 u  914 1024   377    7.17   -1.365    0.64
-clock.isc.org   .GOES.      1 u  195 1024   377    5.77   -2.946    0.44
 hpsdlo.sdd.hp.c bigben.cah  2 u   19   32   125   49.27   -3.100    2.38
+tick.ucla.edu   .USNO.      1 u  127  128   377   19.30   -0.437    0.37
+usno.pa-x.dec.c .USNO.      1 u   69  128   377    6.06   -0.534    0.11
 huon.itd.adelai augean.el.  3 u    7   16   377  346.16   -3.033    1.07
 sirius.ctr.colu NAVOBS1.MU  2 u   12   16   377   85.62   -9.181    0.58

Now let's go over the meaning of each of the column headings and the
measurements that go with them. Keep in mind that the most important columns
are "reach", "offset", and "disp". In particular, "disp" (dispersion)
reveals the quality of the network service (which the time service depends
on VERY HEAVILY).

remote     SERVER NAME
------
This is the name of the NTP server. It is usually another UNIX machine
(could be HP, DEC, SUN, anything), but could also be an external reference
clock like a GPS or WWVB radio clock or even a modem. The character in the
left margin indicates the fate of this peer in the clock selection process.
The codes mean:

   "*"      selected for synchronization
   "#"      selected for synchronization but distance exceeds maximum
   "o"      selected for synchronization, PPS signal in use
   "+"      included in the final selection set
   "x"      designated falseticker by the intersection algorithm
   "."      culled from the end of the candidate list
   "-"      discarded by the clustering algorithm
   "blank"  discarded due to high stratum and/or failed sanity checks

refid      REFERENCE IDENTIFICATION
-----
Usually the IP address of the server or the name of the external clock, but
can also be a router between the client and server. Not important for our
purposes.

st         STRATUM
--
This is a measure of distance to the true source of time. The GPS clock is
stratum=0, the NTP daemon attached to the GPS clock is stratum=1, and others
(one more step away) are considered stratum=2 by all of their clients.

t          TYPE
-
The possible types are:

   l   local (such as a GPS clock)
   u   unicast (this is the most common type)
   m   multicast
   b   broadcast
   -   netaddr (usually 0)

when
----
How long ago (in seconds) was the last response from this server? Not very
important unless the source has stopped responding for a long time.

poll       POLL PERIOD
----
How often (in seconds) are we making a query to this server? 512 seconds
(approx 8 minutes) and 1024 seconds (approx 17 minutes) are very popular for
network connections, but a machine with an external clock (like GPS) should
poll it every 64 seconds or less. This number can be specified with the
"minpoll" and "maxpoll" directives, but it is better to let the daemon
adjust it as needed. After stabilizing at startup this number will move
automatically to 1024 for network servers and 64 (or sometimes 32) for
external reference clocks.

reach      REACHABILITY              larger is better
-----
How successful are we in reaching the server? This is an 8 bit shift
register with the most recent probe in the 2^0 position, and the value is
displayed in octal. Thus 001 indicates that only the most recent probe was
answered, 357 (binary 11101111) indicates that one of the last eight probes
was not answered, and 377 (binary 11111111) indicates that all of the recent
probes have been answered.

delay      ROUND TRIP TIME           smaller is better
-----
How long (in milliseconds) did it take for the reply packet to come back
when we sent a query to the server?

offset     TIME DIFFERENCE           smaller is better
------
How far apart (in milliseconds) are the server's clock and the client's
clock? This is the principal measure that the customer is interested in.
When this number exceeds 128 (milliseconds) then NTP makes a big adjustment
(and the message "synchronisation lost" appears in the logfile).

disp       DISPERSION                smaller is better
----
How much does the "offset" measurement vary between samples? How repeatable
is the "delay" measurement? This is an error bound estimate. It is based on:

   precision
   delay/2
   age of measurement / 86400

When this number exceeds 100 (milliseconds) it is very difficult for the
daemon to keep the clock synchronized. This "dispersion" number is a primary
measure of network service quality. A slow X.25 network not only has a
sizeable round trip time, but the round trip time has large variation from
one query to the next. This is very bad for timekeeping purposes, because it
makes the "offset" very hard to calculate.

The real job of NTP is to manage the "offset" value and minimize it.

END SIDEBAR ==============================================================

Possible subjects and material for a follow-up article are shown below
======================================================================

Standalone Servers
------------------

There are standalone NTP timeservers available, typically costing between
1000 and 5000 dollars. These will save you the pain of setting up your first
server, and provide you with an easy stratum-1 timesource. These devices
typically have a radio receiver of their own, a serial port for simple
configuration via terminal, and an ethernet port. Except for running the
cable to an antenna (on the roof), you can get one of these up and running
in a few minutes.
Check out:

   http://www.datum.com/
   http://www.eestech.com/
   http://www.truetime.com
   http://www.coetanian.com/
   http://www.meinberg.de/english/lantime.htm

Planning Your NTP Hierarchy
---------------------------

Redundancy
----------

NTPv4
-----

The most popular version of NTP today is "version 3", but a new generation
called "version 4" has been in development at the University of Delaware for
some time. The new generation has new features and options, plus drivers for
more radio receivers, and is completely compatible with the previous
generations. It also has HTML manpages and documentation that is just great.

HP is not ready to support NTPv4 because revisions and bug fixes are still
appearing approximately weekly (the current release at this writing is
4.0.97f). But if you have some need for the latest, this new generation
comes with "autoconf" tools that make it very easy to compile your own code.
It works like a charm! When you run the "configure" command it probes your
system to figure out what operating system you are running (say HP-UX
10.20), what compilers you have available (say /opt/ansic/bin/cc), what
TERMIO features your system supports, POSIX features, library features, the
list goes on and on. Then it builds a tree of Makefiles for all of the
executables, and then you type "make". So you just run two commands, the
build takes 20 to 40 minutes (depends on your CPU), and you are ready to go.
Highly recommended! You will need the ANSI C compiler for this build process
to succeed.

HP-UX 9.x
---------

It is possible to get NTP to work with HP-UX 9.x (even though it lacks the
adjtime() system call), by utilizing the adjtimed workaround. Pre-compiled
executables for s700 and s800 are available at:

   ftp://contrib:9unsupp8@hprc.external.hp.com/sysadmin/ntp.9x/

Just get the executables "xntpd", "adjtimed", "ntpdate", and "ntpq". Then be
sure to start "adjtimed" first, run "ntpdate some_server", and then start
"xntpd".
You will probably put this into the local() area of /etc/rc on your HP-UX
9.x system so that it gets started properly at every bootup.

Alternatively, you could get NTPv4 source and compile it yourself on your
HP-UX 9.x system. The autoconf scripts understand how to deal with
HP-UX 9.x.

Broadcast
---------

It is possible to operate NTP in broadcast mode, where the server sends out
a broadcast packet at regular intervals (typically once per minute), and the
clients listen for those broadcasts. This is different from the usual
poll-and-response mechanism that NTP uses. This really makes sense if you
have thousands of clients, because the client configuration is trivial. The
drawbacks are that you must have an NTP server/broadcaster on each subnet
(broadcast packets are not relayed), the precision is reduced for the
clients (milliseconds), and each client has only a single source of time (no
backup). It is possible for Cisco routers (and perhaps others) to be the
local subnet broadcasters (IOS 10 and later), so refer to your router
configuration documentation for this.

Configuring broadcast mode on the server is simple. Put a line like this in
your /etc/ntp.conf file:

   broadcast 192.6.37.255

Use your own subnet, of course, and put in another line for each subnet that
is physically connected to your NTP server. Remember that the server still
needs to have a source of time somewhere, so your configuration file needs
some more lines besides these "broadcast" specifiers. It is a good idea to
get your NTP server working properly first, then make your foray into
broadcast mode when everything is debugged and working properly.

Configuring broadcast mode on the client is even simpler. Just start up the
NTP daemon "/usr/sbin/xntpd" with the "-b" option. Alternatively you can put
a "broadcastclient yes" line in /etc/ntp.conf on the client, or edit the
/etc/rc.config.d/netdaemons file and set "XNTPD_ARGS=-b".
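
If your server is attached to several subnets, the "one broadcast line per
subnet" rule can be scripted. The following is a small sketch only: the
function name "broadcast_lines" is made up for this illustration, and the
second address is a placeholder, so substitute the broadcast addresses of
your own subnets.

```shell
# Print one "broadcast" line for each subnet broadcast address given as an
# argument. Hypothetical helper; it only prints text and changes nothing.
broadcast_lines() {
    for addr in "$@"; do
        printf 'broadcast %s\n' "$addr"
    done
}

# Review the output first, then append it to the server's configuration:
broadcast_lines 192.6.37.255 192.6.38.255
# broadcast_lines 192.6.37.255 192.6.38.255 >> /etc/ntp.conf
```

The "server" lines that give the broadcaster its own source of time still
have to be written by hand.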

Simple NTP (SNTP)
-----------------

This is not really too different from regular NTP, and it is based on the
same protocol described in RFC-1305. Both communicate over port 123/udp, and
both use the same packet format. You can freely mix NTP and SNTP servers and
clients without worrying. SNTP servers usually won't connect to radio
receivers (although not impossible), and usually won't deliver the same
microsecond precision that full-blown NTP is capable of (although it depends
on implementation). SNTP servers also don't report the detailed statistics
and history information when probed with the query tools "ntpq" or "xntpdc".

Details on HP58503A GPS Clock
-----------------------------

Be sure there is a device file entry for the HP GPS receiver at the physical
port connected to it. The device file name should be /dev/hpgps1, with all
of the same attributes as the /dev/tty* location. You can create a new
device using "mknod", or you can just link to an existing device using "ln".

On a s700 it typically looks like this:

   crw-rw-rw-   1 bin   bin     1 0x010000 Sep 26 10:51 /dev/hpgps1
   crw--w--w-   2 bin   bin     1 0x010000 Aug 16 15:25 /dev/tty1p0

On a s800 it typically looks like this:

   crw-rw-rw-   1 bin   bin   193 0x000100 Jul 22 14:59 /dev/hpgps1
   crw--w--w-   1 bin   bin   193 0x000100 Jul 22 14:59 /dev/tty0p1

Plug the serial cable from the GPS clock into the serial port of a s700 or
s800. You can interrogate the GPS and verify proper cable connection using
"cu -l/dev/hpgps1 dir" if you have made an entry in /usr/lib/uucp/Devices.
If you get no response whatsoever from "cu" then you probably have the wrong
cable. Note: you cannot use the "cu" trick while the NTP daemon is running
and talking to the GPS clock.

Press return a few times and you should get a prompt of the form "scpi>" or
"E-xxx>". Type the following scpi command in response to this prompt ...

   :SYSTEM:STATUS?

The result of this should be a screen full of GPS status looking like the
following ...

----------------------------- Receiver Status ------------------------------

SYNCHRONIZATION .......................................... [ Outputs Valid ]
SmartClock Mode ________________________   Reference Outputs _______________
>> Locked to GPS                           TFOM     3             FFOM     0
   Recovery                                1PPS TI +33.4 ns relative to GPS
   Holdover                                HOLD THR 1.000 us
   Power-up                                Holdover Uncertainty ____________
                                           Predict  5.9 us/initial 24 hrs

ACQUISITION ............................................. [ GPS 1PPS Valid ]
Satellite Status _______________________   Time ____________________________
Tracking: 6        Not Tracking: 1         UTC      16:26:27     29 Jul 1996
PRN  El  Az   SS   PRN  El  Az             GPS 1PPS Synchronized to UTC
  4  50 293  208    24  15 306             ANT DLY  0 ns
  7  28 227   97                           Position ________________________
 14  46  84  186                           MODE     Hold
 18  45 194  214
 25  18  39  128                           LAT      N  55:59:00.746
 29  65 100  232                           LON      W   3:22:58.576
ELEV MASK 10 deg                           HGT      +82.52 m (MSL)

HEALTH MONITOR ...................................................... [ OK ]
Self Test: OK |  Int Pwr: OK   Oven Pwr: OK   OCXO: OK   EFC: OK   GPS Rcv: OK
scpi >

Detailed explanation of this output is beyond the scope of this text, but
the student can certainly decode latitude, longitude, elevation, and current
time. Remember that if you get even one character of output, then the GPS
clock and the serial cable are probably OK.

----------------------------------------------------------------------------

If you haven't done this already, I recommend setting these two variables in
/etc/rc.config.d/netdaemons:

   export NTPDATE_SERVER=reliable_internet_timeserver # optional with GPS clock
   export XNTPD=1

Set up your /etc/ntp.conf file with this line so that it recognizes the HP
GPS clock with driver number 26:

   server 127.127.26.1 prefer minpoll 6   # HP GPS clock

Also in the /etc/ntp.conf file, a time constant needs to be set depending on
the type of computer and interface, and GPS receiver baud rate.
Add one of these lines (uncommented) for your CPU:

   #fudge 127.127.26.1 time1 -0.955   # s700
   #fudge 127.127.26.1 time1 -0.930   # s800

The "server" and "fudge" lines will probably be near the end of your
"/etc/ntp.conf" file, but you can add other "server" lines for other clocks
or other network time sources (for redundancy).

Then use this command to start and stop the daemon by hand (done
automatically for you at reboot):

   /sbin/rc2.d/S660xntpd start|stop
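
Once the daemon is running, the health rules of thumb given earlier ("reach"
not stuck at 0, "offset" below 50, "disp" below 100) can be applied to the
output of "ntpq -p" mechanically. The sketch below is one possible way to do
that and is not part of the NTP distribution. The function names
"reach_hits" and "check_peers" and the verdict words are made up for this
illustration, and the script assumes the ten-column layout shown in this
article.

```shell
# Count how many of the last eight probes were answered, given the octal
# "reach" value exactly as ntpq prints it (for example 377 or 17).
reach_hits() {
    reg=$((0$1))            # the leading 0 makes the shell read it as octal
    hits=0
    while [ "$reg" -gt 0 ]; do
        hits=$((hits + (reg & 1)))   # add the lowest bit
        reg=$((reg >> 1))            # move on to the next older probe
    done
    echo "$hits"
}

# Read "ntpq -p" output on standard input and print one verdict per peer.
# Header lines are skipped because their 7th field is not an octal number.
check_peers() {
    awk 'NF >= 10 && $7 ~ /^[0-7]+$/ {
        offset = $9 + 0; disp = $10 + 0
        if (offset < 0) offset = -offset
        if ($7 + 0 == 0)                      verdict = "unreachable"
        else if (disp >= 100 || offset >= 50) verdict = "suspect"
        else                                  verdict = "ok"
        print $1, verdict
    }'
}

# Typical use on a live system:
#   /usr/sbin/ntpq -p | check_peers
```

As with "ntpq -p" itself, a single run is only a snapshot, so run it several
times at one or two minute intervals before drawing conclusions.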