mirror of
				https://github.com/telekom-security/tpotce.git
				synced 2025-10-26 10:14:45 +00:00 
			
		
		
		
	
		
			
	
	
		
			917 lines
		
	
	
	
		
			39 KiB
		
	
	
	
		
			Text
		
	
	
	
	
	
		
		
			
		
	
	
		
	
	
	
	
	
|   |                         ============================= | ||
|  |                         p0f v3: passive fingerprinter | ||
|  |                         ============================= | ||
|  | 
 | ||
|  |                     http://lcamtuf.coredump.cx/p0f3.shtml | ||
|  | 
 | ||
|  |          Copyright (C) 2012 by Michal Zalewski <lcamtuf@coredump.cx> | ||
|  | 
 | ||
|  | 
 | ||
|  | --------------- | ||
|  | 1. What's this? | ||
|  | --------------- | ||
|  | 
 | ||
|  | P0f is a tool that utilizes an array of sophisticated, purely passive traffic | ||
|  | fingerprinting mechanisms to identify the players behind any incidental TCP/IP | ||
|  | communications (often as little as a single normal SYN) without interfering in | ||
|  | any way. | ||
|  | 
 | ||
|  | Some of its capabilities include: | ||
|  | 
 | ||
|  |   - Highly scalable and extremely fast identification of the operating system | ||
|  |     and software on both endpoints of a vanilla TCP connection - especially in | ||
|  |     settings where NMap probes are blocked, too slow, unreliable, or would | ||
|  |     simply set off alarms. | ||
|  | 
 | ||
|  |   - Measurement of system uptime and network hookup, distance (including | ||
|  |     topology behind NAT or packet filters), and so on. | ||
|  | 
 | ||
|  |   - Automated detection of connection sharing / NAT, load balancing, and | ||
|  |     application-level proxying setups. | ||
|  | 
 | ||
|  |   - Detection of dishonest clients / servers that forge declarative statements | ||
|  |     such as X-Mailer or User-Agent. | ||
|  | 
 | ||
|  | The tool can be operated in the foreground or as a daemon, and offers a simple | ||
|  | real-time API for third-party components that wish to obtain additional | ||
|  | information about the actors they are talking to. | ||
|  | 
 | ||
|  | Common uses for p0f include reconnaissance during penetration tests; routine | ||
|  | network monitoring; detection of unauthorized network interconnects in corporate | ||
|  | environments; providing signals for abuse-prevention tools; and miscellaneous | ||
|  | forensics. | ||
|  | 
 | ||
|  | A snippet of typical p0f output may look like this: | ||
|  | 
 | ||
|  | .-[ 1.2.3.4/1524 -> 4.3.2.1/80 (syn) ]- | ||
|  | | | ||
|  | | client   = 1.2.3.4 | ||
|  | | os       = Windows XP | ||
|  | | dist     = 8 | ||
|  | | params   = none | ||
|  | | raw_sig  = 4:120+8:0:1452:65535,0:mss,nop,nop,sok:df,id+:0 | ||
|  | | | ||
|  | `---- | ||
|  | 
 | ||
|  | .-[ 1.2.3.4/1524 -> 4.3.2.1/80 (syn+ack) ]- | ||
|  | | | ||
|  | | server   = 4.3.2.1 | ||
|  | | os       = Linux 3.x | ||
|  | | dist     = 0 | ||
|  | | params   = none | ||
|  | | raw_sig  = 4:64+0:0:1460:mss*10,0:mss,nop,nop,sok:df:0 | ||
|  | | | ||
|  | `---- | ||
|  | 
 | ||
|  | .-[ 1.2.3.4/1524 -> 4.3.2.1/80 (mtu) ]- | ||
|  | | | ||
|  | | client   = 1.2.3.4 | ||
|  | | link     = DSL | ||
|  | | raw_mtu  = 1492 | ||
|  | | | ||
|  | `---- | ||
|  | 
 | ||
|  | .-[ 1.2.3.4/1524 -> 4.3.2.1/80 (uptime) ]- | ||
|  | | | ||
|  | | client   = 1.2.3.4 | ||
|  | | uptime   = 0 days 11 hrs 16 min (modulo 198 days) | ||
|  | | raw_freq = 250.00 Hz | ||
|  | | | ||
|  | `---- | ||
|  | 
 | ||
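The "modulo 198 days" remark in the uptime entry can be reproduced with a bit
of arithmetic. The sketch below is in Python and assumes that the wrap-around
interval is simply the range of a 32-bit timestamp counter divided by the
measured tick frequency; the function name is made up for illustration and is
not part of p0f.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def uptime_wrap_days(freq_hz: float) -> int:
    """Whole days before a 32-bit counter ticking at freq_hz wraps around.

    Assumption: the counter is 32 bits wide and advances once per tick.
    """
    return int(0xFFFFFFFF / (freq_hz * SECONDS_PER_DAY))


# raw_freq = 250.00 Hz in the sample output
print(uptime_wrap_days(250.0))  # 198
```

In other words, a reported uptime of 11 hours could equally be 198 days and 11
hours; the counter alone cannot tell the two apart.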
|  | A live demonstration can be seen here: | ||
|  | 
 | ||
|  | http://lcamtuf.coredump.cx/p0f3/ | ||
|  | 
 | ||
|  | -------------------- | ||
|  | 2. How does it work? | ||
|  | -------------------- | ||
|  | 
 | ||
|  | A vast majority of metrics used by p0f were invented specifically for this tool, | ||
|  | and include data extracted from IPv4 and IPv6 headers, TCP headers, the dynamics | ||
|  | of the TCP handshake, and the contents of application-level payloads. | ||
|  | 
 | ||
|  | For TCP/IP, the tool fingerprints the client-originating SYN packet and the | ||
|  | first SYN+ACK response from the server, paying attention to factors such as the | ||
|  | ordering of TCP options, the relation between maximum segment size and window | ||
|  | size, the progression of TCP timestamps, and the state of about a dozen possible | ||
|  | implementation quirks (e.g. non-zero values in "must be zero" fields). | ||
|  | 
 | ||
|  | The metrics used for application-level traffic vary from one module to another; | ||
|  | where possible, the tool relies on signals such as the ordering or syntax of | ||
|  | HTTP headers or SMTP commands, rather than any declarative statements such as | ||
|  | User-Agent. Application-level fingerprinting modules currently support HTTP. | ||
|  | Before the tool leaves "beta", I want to add SMTP and FTP. Other protocols, | ||
|  | such as POP3, IMAP, SSH, and SSL, may follow. | ||
|  | 
 | ||
|  | The list of all the measured parameters is reviewed in section 5 later on. | ||
|  | Some of the analysis also happens on a higher level: inconsistencies in the | ||
|  | data collected from various sources, or in the data from the same source | ||
|  | obtained over time, may be indicative of address translation, proxying, or | ||
|  | just plain trickery. For example, a system where TCP timestamps jump back | ||
|  | and forth, or where TTLs and MTUs change subtly, is probably a NAT device. | ||
|  | 
 | ||
|  | ------------------------------- | ||
|  | 3. How do I compile and use it? | ||
|  | ------------------------------- | ||
|  | 
 | ||
|  | To compile p0f, try running './build.sh'; if that fails, you will probably be | ||
|  | given some tips about the probable cause. If the tips are useless, send me a | ||
|  | mean-spirited mail. | ||
|  | 
 | ||
|  | It is also possible to build a debug binary ('./build.sh debug'), in which case, | ||
|  | verbose packet parsing and signature matching information will be written to | ||
|  | stderr. This is useful when troubleshooting problems, but that's about it. | ||
|  | 
 | ||
|  | The tool should compile cleanly under any reasonably new version of Linux, | ||
|  | FreeBSD, OpenBSD, MacOS X, and so forth. You can also build it on Windows using | ||
|  | cygwin and winpcap. I have not tested it on all possible varieties of un*x, but | ||
|  | if there are issues, they should be fairly superficial. | ||
|  | 
 | ||
|  | Once you have the binary compiled, you should be aware of the following | ||
|  | command-line options: | ||
|  | 
 | ||
|  |   -f fname   - reads fingerprint database (p0f.fp) from the specified location. | ||
|  |                See section 5 for more information about the contents of this | ||
|  |                file. | ||
|  | 
 | ||
|  |                The default location is ./p0f.fp. If you want to install p0f, you | ||
|  |                may want to change FP_FILE in config.h to /etc/p0f.fp. | ||
|  | 
 | ||
|  |   -i iface   - asks p0f to listen on a specific network interface. On un*x, you | ||
|  |                should reference the interface by name (e.g., eth0). On Windows, | ||
|  |                you can use adapter index instead (0, 1, 2...). | ||
|  |                 | ||
|  |                Multiple -i parameters are not supported; you need to run | ||
|  |                separate instances of p0f for that. On Linux, you can specify | ||
|  |                'any' to access a pseudo-device that combines the traffic on | ||
|  |                all other interfaces; the only limitation is that libpcap will | ||
|  |                not recognize VLAN-tagged frames in this mode, which may be | ||
|  |                an issue in some of the more exotic setups. | ||
|  | 
 | ||
|  |                If you do not specify an interface, libpcap will probably pick | ||
|  |                the first working interface in your system. | ||
|  |                 | ||
|  |   -L         - lists all available network interfaces, then quits. Particularly | ||
|  |                useful on Windows, where the system-generated interface names | ||
|  |                are impossible to memorize. | ||
|  |                 | ||
|  |   -r fname   - instead of listening for live traffic, reads pcap captures from | ||
|  |                the specified file. The data can be collected with tcpdump or any | ||
|  |                other compatible tool. Make sure that snapshot length (-s | ||
|  |                option in tcpdump) is large enough not to truncate packets; the | ||
|  |                default may be too small. | ||
|  | 
 | ||
|  |                As with -i, only one -r option can be specified at any given | ||
|  |                time. | ||
|  |                 | ||
|  |   -o fname   - appends grep-friendly log data to the specified file. The log | ||
|  |                contains all observations made by p0f about every matching | ||
|  |                connection, and may grow large; plan accordingly. | ||
|  | 
 | ||
|  |                Only one instance of p0f should be writing to a particular file | ||
|  |                at any given time; where supported, advisory locking is used to | ||
|  |                avoid problems. | ||
|  |                 | ||
|  |   -s fname   - listens for API queries on the specified filesystem socket. This | ||
|  |                allows other programs to ask p0f about its current thoughts about | ||
|  |                a particular host. More information about the API protocol can be | ||
|  |                found in section 4 below. | ||
|  | 
 | ||
|  |                Only one instance of p0f can be listening on a particular socket | ||
|  |                at any given time. The mode is also incompatible with -r. | ||
|  | 
 | ||
|  |   -d         - runs p0f in daemon mode: the program will fork into background | ||
|  |                and continue writing to the specified log file or API socket. It | ||
|  |                will continue running until killed, until the listening interface | ||
|  |                is shut down, or until some other fatal error is encountered. | ||
|  | 
 | ||
|  |                This mode requires either -o or -s to be specified. | ||
|  | 
 | ||
|  |                To continue capturing p0f debug output and error messages (but | ||
|  |                not signatures), redirect stderr to another non-TTY destination, | ||
|  |                e.g.: | ||
|  |                 | ||
|  |                ./p0f -o /var/log/p0f.log -d 2>>/var/log/p0f.error | ||
|  |                 | ||
|  |                Note that if -d is specified and stderr points to a TTY, error | ||
|  |                messages will be lost. | ||
|  | 
 | ||
|  |   -u user    - causes p0f to drop privileges, switching to the specified user | ||
|  |                and chroot()ing itself to said user's home directory. | ||
|  | 
 | ||
|  |                This mode is *highly* advisable (but not required) on un*x | ||
|  |                systems, especially in daemon mode. See section 7 for more info. | ||
|  | 
 | ||
|  | More arcane settings (you probably don't need to touch these): | ||
|  | 
 | ||
|  |   -j         - Log in JSON format. | ||
|  | 
 | ||
|  |   -l         - Line buffered mode for logging to output file. | ||
|  | 
 | ||
|  |   -p         - puts the interface specified with -i in promiscuous mode. If | ||
|  |                supported by the firmware, the card will also process frames not | ||
|  |                addressed to it.  | ||
|  | 
 | ||
|  |   -S num     - sets the maximum number of simultaneous API connections. The | ||
|  |                default is 20; the upper cap is 100. | ||
|  | 
 | ||
|  |   -m c,h     - sets the maximum number of connections (c) and hosts (h) to be | ||
|  |                tracked at the same time (default: c = 1,000, h = 10,000). Once | ||
|  |                the limit is reached, the oldest 10% of entries get pruned to make | ||
|  |                room for new data. | ||
|  | 
 | ||
|  |                This setting effectively controls the memory footprint of p0f. | ||
|  |                The cost of tracking a single host is under 400 bytes; active | ||
|  |                connections have a worst-case footprint of about 18 kB. High | ||
|  |                limits have some CPU impact, too, by virtue of complicating | ||
|  |                data lookups in the cache. | ||
|  | 
 | ||
|  |                NOTE: P0f tracks connections only until the handshake is done, | ||
|  |                and if protocol-level fingerprinting is possible, until the first | ||
|  |                few kilobytes of data have been exchanged. This means that | ||
|  |                most connections are dropped from the cache in under 5 seconds; | ||
|  |                consequently, the 'c' variable can be much lower than the real | ||
|  |                number of parallel connections happening on the wire. | ||
|  | 
 | ||
|  |   -t c,h     - sets the timeout for collecting signatures for any connection | ||
|  |                (c); and for purging idle hosts from in-memory cache (h). The | ||
|  |                first parameter is given in seconds, and defaults to 30 s; the | ||
|  |                second one is in minutes, and defaults to 120 min. | ||
|  | 
 | ||
|  |                The first value must be just high enough to reliably capture | ||
|  |                SYN, SYN+ACK, and the initial few kB of traffic. Low-performance | ||
|  |                sites may want to increase it slightly. | ||
|  | 
 | ||
|  |                The second value governs for how long API queries about a | ||
|  |                previously seen host can be made; and what's the maximum interval | ||
|  |                between signatures to still trigger NAT detection and so on. | ||
|  |                Raising it is usually not advisable; lowering it to 5-10 minutes | ||
|  |                may make sense for high-traffic servers, where it is possible to | ||
|  |                see several unrelated visitors subsequently obtaining the same | ||
|  |                dynamic IP from their ISP. | ||
|  | 
 | ||
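To get a feel for what the -m limits mean in practice, the per-entry costs
quoted for that option can be multiplied out. This is only a back-of-the-envelope
sketch in Python: it takes "under 400 bytes" as 400, reads "about 18 kB" as
18 * 1024 bytes (an assumption), and ignores any other memory the process may
use. The names are invented for this example.

```python
HOST_BYTES = 400          # "under 400 bytes" per tracked host
CONN_BYTES = 18 * 1024    # "about 18 kB" worst case; kB taken as 1024 bytes


def worst_case_bytes(max_conn: int = 1000, max_hosts: int = 10000) -> int:
    """Upper-bound cache size for a given '-m c,h' setting."""
    return max_conn * CONN_BYTES + max_hosts * HOST_BYTES


# Defaults (c = 1,000, h = 10,000), expressed in MiB.
print(round(worst_case_bytes() / 2**20, 1))  # 21.4
```

With the defaults, connections account for most of the worst-case figure, so
raising 'h' is comparatively cheap.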
|  | Well, that's about it. You probably need to run the tool as root. Some of the | ||
|  | most common use cases: | ||
|  | 
 | ||
|  | # ./p0f -i eth0 | ||
|  | 
 | ||
|  | # ./p0f -i eth0 -d -u p0f-user -o /var/log/p0f.log | ||
|  | 
 | ||
|  | # ./p0f -r some_capture.cap | ||
|  | 
 | ||
|  | The greppable log format (-o) uses pipe ('|') as a delimiter, with name=value | ||
|  | pairs describing the signature in a manner very similar to the pretty-printed | ||
|  | output generated on stdout: | ||
|  | 
 | ||
|  | [2012/01/04 10:26:14] mod=mtu|cli=1.2.3.4/1234|srv=4.3.2.1/80|subj=cli|link=DSL|raw_mtu=1492 | ||
|  | 
 | ||
|  | The 'mod' parameter identifies the subsystem that generated the entry; the | ||
|  | 'cli' and 'srv' parameters always describe the direction in which the TCP | ||
|  | session is established; and 'subj' describes which of these two parties is | ||
|  | actually being fingerprinted. | ||
|  | 
 | ||
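A log line in this format can be taken apart with a few lines of Python. The
sketch below is not shipped with p0f; it assumes that the timestamp is always
enclosed in square brackets at the start of the line and that values never
contain a literal '|'. Values are split on the first '=' only, since some
signatures may contain that character themselves.

```python
import re

# "[timestamp] name=value|name=value|..."
LOG_LINE = re.compile(r"^\[(?P<ts>[^\]]+)\] (?P<body>.*)$")


def parse_log_line(line: str) -> dict:
    """Turn one greppable log line into a dict of its name=value pairs."""
    match = LOG_LINE.match(line.rstrip("\n"))
    if match is None:
        raise ValueError("not a p0f log line: %r" % line)
    entry = {"timestamp": match.group("ts")}
    for pair in match.group("body").split("|"):
        name, _, value = pair.partition("=")  # split on the first '=' only
        entry[name] = value
    return entry


sample = ("[2012/01/04 10:26:14] mod=mtu|cli=1.2.3.4/1234|srv=4.3.2.1/80|"
          "subj=cli|link=DSL|raw_mtu=1492")
entry = parse_log_line(sample)
print(entry["mod"], entry["subj"], entry["link"])  # mtu cli DSL
```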
|  | Command-line options may be followed by a single parameter containing a | ||
|  | pcap-style traffic filtering rule. This allows you to reject some of the less | ||
|  | interesting packets for performance or privacy reasons. Simple examples include: | ||
|  | 
 | ||
|  |   'dst net 10.0.0.0/8 and port 80' | ||
|  |    | ||
|  |   'not src host 10.1.2.3' | ||
|  |    | ||
|  |   'port 22 or port 443' | ||
|  | 
 | ||
|  | You can read more about the supported syntax by doing 'man pcap-filter'; if | ||
|  | that fails, try this URL: | ||
|  | 
 | ||
|  |   http://www.manpagez.com/man/7/pcap-filter/ | ||
|  |    | ||
|  | Filters work both for online capture (-i) and for previously collected data | ||
|  | produced by any other tool (-r). | ||
|  | 
 | ||
|  | ------------- | ||
|  | 4. API access | ||
|  | ------------- | ||
|  | 
 | ||
|  | The API allows other applications running on the same system to get p0f's | ||
|  | current opinion about a particular host. This is useful for integrating it with | ||
|  | spam filters, web apps, and so on. | ||
|  | 
 | ||
|  | Clients are welcome to connect to the unix socket specified with -s using the | ||
|  | SOCK_STREAM protocol, and may issue any number of fixed-length queries. The | ||
|  | queries will be answered in the order they are received. | ||
|  | 
 | ||
|  | Note that there is no response caching, nor any software limits in place on p0f's | ||
|  | end, so it is your responsibility to write reasonably well-behaved clients. | ||
|  | 
 | ||
|  | Queries have exactly 21 bytes. The format is: | ||
|  | 
 | ||
|  |   - Magic dword (0x50304601), in native endian of the platform. | ||
|  | 
 | ||
|  |   - Address type byte: 4 for IPv4, 6 for IPv6. | ||
|  | 
 | ||
|  |   - 16 bytes of address data, network endian. IPv4 addresses should be | ||
|  |     aligned to the left. | ||
|  | 
 | ||
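The query layout above maps directly onto a packed structure. The following
Python sketch builds the 21 bytes; it is not the reference client, and the
function and constant names are invented here. The "=" prefix in the struct
format selects native byte order without padding, and "16s" pads short
addresses with NUL bytes on the right, which gives the left alignment that
IPv4 addresses need.

```python
import ipaddress
import struct

P0F_QUERY_MAGIC = 0x50304601


def build_query(address: str) -> bytes:
    """Pack an API query for an IPv4 or IPv6 address given as text."""
    ip = ipaddress.ip_address(address)   # raises ValueError on bad input
    # magic dword, address type byte (4 or 6), 16 bytes of address data
    return struct.pack("=IB16s", P0F_QUERY_MAGIC, ip.version, ip.packed)


query = build_query("1.2.3.4")
print(len(query))  # 21
```

The resulting bytes would then be written to a SOCK_STREAM unix socket
connected to the path given with -s.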
|  | To such a query, p0f responds with: | ||
|  | 
 | ||
|  |   - Another magic dword (0x50304602), native endian. | ||
|  | 
 | ||
|  |   - Status dword: 0x00 for 'bad query', 0x10 for 'OK', and 0x20 for 'no match'. | ||
|  | 
 | ||
|  |   - Host information, valid only if status is 'OK' (byte width in square | ||
|  |     brackets): | ||
|  | 
 | ||
|  |     [4]  first_seen  - unix time (seconds) of first observation of the host. | ||
|  | 
 | ||
|  |     [4]  last_seen   - unix time (seconds) of most recent traffic. | ||
|  | 
 | ||
|  |     [4]  total_conn  - total number of connections seen. | ||
|  | 
 | ||
|  |     [4]  uptime_min  - calculated system uptime, in minutes. Zero if not known. | ||
|  | 
 | ||
|  |     [4]  up_mod_days - uptime wrap-around interval, in days. | ||
|  | 
 | ||
|  |     [4]  last_nat    - time of the most recent detection of IP sharing (NAT, | ||
|  |                        load balancing, proxying). Zero if never detected. | ||
|  | 
 | ||
|  |     [4]  last_chg    - time of the most recent individual OS mismatch (e.g., | ||
|  |                        due to multiboot or IP reuse). | ||
|  | 
 | ||
|  |     [2]  distance    - system distance (derived from TTL; -1 if no data). | ||
|  | 
 | ||
|  |     [1]  bad_sw      - p0f thinks the User-Agent or Server strings aren't | ||
|  |                        accurate. The value of 1 means OS difference (possibly | ||
|  |                        due to proxying), while 2 means an outright mismatch. | ||
|  | 
 | ||
|  |                        NOTE: If User-Agent is not present at all, this value | ||
|  |                        stays at 0. | ||
|  | 
 | ||
|  |     [1]  os_match_q  - OS match quality: 0 for a normal match; 1 for fuzzy | ||
|  |                        (e.g., TTL or DF difference); 2 for a generic signature; | ||
|  |                        and 3 for both. | ||
|  | 
 | ||
|  |     [32] os_name     - NUL-terminated name of the most recent positively matched | ||
|  |                        OS. If OS not known, os_name[0] is NUL. | ||
|  | 
 | ||
|  |                        NOTE: If the host is first seen using a known system and | ||
|  |                        then switches to an unknown one, this field is not | ||
|  |                        reset. | ||
|  | 
 | ||
|  |     [32] os_flavor   - OS version. May be empty if no data. | ||
|  | 
 | ||
|  |     [32] http_name   - most recent positively identified HTTP application | ||
|  |                        (e.g. 'Firefox'). | ||
|  | 
 | ||
|  |     [32] http_flavor - version of the HTTP application, if any. | ||
|  | 
 | ||
|  |     [32] link_type   - network link type, if recognized. | ||
|  | 
 | ||
|  |     [32] language    - system language, if recognized. | ||
|  | 
 | ||
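The field list above can be decoded along these lines. This is a hedged Python
sketch rather than a replacement for api.h: it assumes the fields are laid out
back to back in the listed order with no padding, which makes the response 232
bytes long, and that 'distance' is a signed 16-bit value (it can be -1). All
Python names are made up for the example.

```python
import struct
from collections import namedtuple

P0F_RESP_MAGIC = 0x50304602
STATUS_NAMES = {0x00: "bad query", 0x10: "OK", 0x20: "no match"}

# Native byte order, no padding: magic and status dwords, seven more dwords,
# one signed short, two single bytes, then six 32-byte strings.
RESP_FORMAT = "=II7IhBB" + "32s" * 6

Response = namedtuple("Response", [
    "magic", "status",
    "first_seen", "last_seen", "total_conn", "uptime_min", "up_mod_days",
    "last_nat", "last_chg",
    "distance", "bad_sw", "os_match_q",
    "os_name", "os_flavor", "http_name", "http_flavor", "link_type",
    "language",
])


def parse_response(data: bytes) -> Response:
    """Decode one API response; raises ValueError if it looks wrong."""
    if len(data) != struct.calcsize(RESP_FORMAT):
        raise ValueError("unexpected response length: %d" % len(data))
    values = []
    for value in struct.unpack(RESP_FORMAT, data):
        if isinstance(value, bytes):
            # Keep only the text before the first NUL terminator.
            value = value.split(b"\0", 1)[0].decode("ascii", "replace")
        values.append(value)
    resp = Response(*values)
    if resp.magic != P0F_RESP_MAGIC:
        raise ValueError("bad response magic: %#x" % resp.magic)
    return resp
```

A caller should look at STATUS_NAMES[resp.status] first; the remaining fields
carry meaning only when the status is 'OK'.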
|  | A simple reference implementation of an API client is provided in p0f-client.c. | ||
|  | Implementations in C / C++ may reuse api.h from p0f source code, too. | ||
|  | 
 | ||
|  | Developers using the API should be aware of several important constraints: | ||
|  | 
 | ||
|  |   - The maximum number of simultaneous API connections is capped to 20. The | ||
|  |     limit may be adjusted with the -S parameter, but rampant parallelism may | ||
|  |     lead to poorly controlled latency; consider a single query pipeline, | ||
|  |     possibly with prioritization and caching. | ||
|  |      | ||
|  |   - The maximum number of hosts and connections tracked at any given time is | ||
|  |     subject to configurable limits. You should look at your traffic stats and | ||
|  |     see if the defaults are suitable. | ||
|  | 
 | ||
|  |     You should also keep in mind that whenever you are subject to an ongoing | ||
|  |     DDoS or SYN spoofing DoS attack, p0f may end up dropping entries faster | ||
|  |     than you could query for them. It's that or running out of memory, so | ||
|  |     don't fret. | ||
|  | 
 | ||
|  |   - Cache entries with no activity for more than 120 minutes will be dropped | ||
|  |     even if the cache is nearly empty. The timeout is adjustable with -t, but | ||
|  |     you should not use the API to obtain ancient data; if you routinely need to | ||
|  |     go back hours or days, parse the logs instead of wasting RAM. | ||
|  | 
 | ||
|  | ----------------------- | ||
|  | 5. Fingerprint database | ||
|  | ----------------------- | ||
|  | 
 | ||
|  | Whenever p0f obtains a fingerprint from the observed traffic, it defers to | ||
|  | the data read from p0f.fp to identify the operating system and obtain some | ||
|  | ancillary data needed for other analysis tasks. The fingerprint database is a | ||
|  | simple text file where lines starting with ; are ignored. | ||
|  | 
 | ||
|  | == Module specification == | ||
|  | 
 | ||
|  | The file is split into sections based on the type of traffic the fingerprints | ||
|  | apply to. Section identifiers are enclosed in square brackets, like so: | ||
|  | 
 | ||
|  | [module:direction] | ||
|  | 
 | ||
|  |   module     - the name of the fingerprinting module (e.g. 'tcp' or 'http'). | ||
|  | 
 | ||
|  |   direction  - the direction of fingerprinted traffic: 'request' (from client to | ||
|  |                server) or 'response' (from server to client). | ||
|  | 
 | ||
|  |                For the TCP module, 'request' matches the initial SYN; and | ||
|  |                'response' matches SYN+ACK. | ||
|  | 
 | ||
|  | The 'direction' part is omitted for MTU signatures, as they work equally well | ||
|  | both ways. | ||
|  | 
 | ||
|  | == Signature groups == | ||
|  | 
 | ||
|  | The actual signatures must be preceded by a 'label' line, describing the | ||
|  | fingerprinted software: | ||
|  | 
 | ||
|  | label = type:class:name:flavor | ||
|  | 
 | ||
|  |   type       - some signatures in p0f.fp offer broad, last-resort matching for | ||
|  |                less researched corner cases. The goal there is to give an | ||
|  |                answer slightly better than "unknown", but less precise than | ||
|  |                what the user may be expecting. | ||
|  | 
 | ||
|  |                Normal, reasonably specific signatures that can't be radically | ||
|  |                improved should have their type specified as 's'; while generic, | ||
|  |                last-resort ones should be tagged with 'g'. | ||
|  | 
 | ||
|  |                Note that generic signatures are considered only if no specific | ||
|  |                matches are found in the database. | ||
|  | 
 | ||
|  |   class      - the tool needs to distinguish between OS-identifying signatures | ||
|  |                (only one of which should be matched for any given host) and | ||
|  |                signatures that just identify user applications (many of which | ||
|  |                may be seen concurrently). | ||
|  | 
 | ||
|  |                To assist with this, OS-specific signatures should specify the | ||
|  |                OS architecture family here (e.g., 'win', 'unix', 'cisco'); while | ||
|  |                application-related sigs (NMap, MSIE, Apache) should use a | ||
|  |                special value of '!'. | ||
|  | 
 | ||
|  |                Most TCP signatures are OS-specific, and should have OS family | ||
|  |                defined. Other signatures, such as HTTP, should use '!' unless | ||
|  |                the fingerprinted component is deeply intertwined with the | ||
|  |                platform (e.g., Windows Update). | ||
|  | 
 | ||
|  |                NOTE: To avoid variations (e.g. 'win' and 'windows' or 'unix' | ||
|  |                and 'linux'), all classes need to be pre-registered using a | ||
|  |                'classes' directive, seen near the beginning of p0f.fp. | ||
|  | 
 | ||
|  |   name       - a human-readable short name for what the fingerprint actually | ||
|  |                helps identify - say, 'Linux', 'Sendmail', or 'NMap'. The tool | ||
|  |                doesn't care about the exact value, but requires consistency - so | ||
|  |                don't switch between 'Internet Explorer' and 'MSIE', or 'MacOS' | ||
|  |                and 'Mac OS'. | ||
|  | 
 | ||
|  |   flavor     - anything you want to say to further qualify the observation. Can | ||
|  |                be the version of the identified software, or a description of | ||
|  |                what the application seems to be doing (e.g. 'SYN scan' for NMap). | ||
|  | 
 | ||
|  |                NOTE: Don't be too specific: if you have a signature for Apache | ||
|  |                2.2.16, but have no reason to suspect that other recent versions | ||
|  |                behave in a radically different way, just say '2.x'. | ||
|  | 
 | ||
|  | P0f uses labels to group similar signatures that may be plausibly generated by | ||
|  | the same system or application; a switch between signatures that share a label | ||
|  | should not be considered a strong signal for NAT detection. | ||
|  | 
 | ||
|  | To further assist the tool in deciding which OS and application combinations are | ||
|  | reasonable, and which ones are indicative of foul play, any 'label' line for | ||
|  | applications (class '!') should be followed by a comma-delimited list of OS | ||
|  | names or @-prefixed OS architecture classes on which this software is known to | ||
|  | be used. For example: | ||
|  | 
 | ||
|  | label = s:!:Uncle John's Networked ls Utility:2.3.0.1 | ||
|  | sys   = Linux,FreeBSD,OpenBSD | ||
|  | 
 | ||
|  | ...or: | ||
|  | 
 | ||
|  | label = s:!:Mom's Homestyle Browser:1.x | ||
|  | sys = @unix,@win | ||
|  | 
 | ||
|  | The label can be followed by any number of module-specific signatures; all of | ||
|  | them will be linked to the most recent label, and will be reported the same | ||
|  | way. | ||
|  | 
 | ||
|  | All sections except for 'name' are omitted for [mtu] signatures, which do not | ||
|  | convey any OS-specific information, and just describe link types. | ||
|  | 
 | ||
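For tooling that needs to read these lines, the 'label' and 'sys' syntax can
be split as sketched below in Python. This is an illustration only, with
made-up function names; it handles the four-part labels described above and
does not cover the shorter name-only labels used in the [mtu] section. The
label is split on the first three colons only, so that a flavor containing ':'
would stay intact (whether p0f itself permits that is not something this
sketch assumes either way).

```python
def parse_label(line: str) -> dict:
    """Split 'label = type:class:name:flavor' into its four parts."""
    key, _, value = line.partition("=")
    if key.strip() != "label":
        raise ValueError("not a label line: %r" % line)
    parts = value.strip().split(":", 3)
    if len(parts) != 4:
        raise ValueError("expected type:class:name:flavor in %r" % line)
    sig_type, sig_class, name, flavor = parts
    if sig_type not in ("s", "g"):
        raise ValueError("type must be 's' or 'g', got %r" % sig_type)
    return {
        "type": sig_type,
        "class": sig_class,
        "name": name,
        "flavor": flavor,
        "is_application": sig_class == "!",
    }


def parse_sys(line: str) -> dict:
    """Split 'sys = a,b,@c' into plain OS names and @-prefixed classes."""
    key, _, value = line.partition("=")
    if key.strip() != "sys":
        raise ValueError("not a sys line: %r" % line)
    names, classes = [], []
    for item in value.split(","):
        item = item.strip()
        if item.startswith("@"):
            classes.append(item[1:])
        elif item:
            names.append(item)
    return {"names": names, "classes": classes}


print(parse_label("label = s:!:Mom's Homestyle Browser:1.x")["name"])
print(parse_sys("sys = @unix,@win")["classes"])
```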
|  | == MTU signatures == | ||
|  | 
 | ||
|  | Many operating systems derive the maximum segment size specified in TCP options | ||
|  | from the MTU of their network interface; that value, in turn, normally depends | ||
|  | on the design of the link-layer protocol. A different MTU is associated with | ||
|  | PPPoE, a different one with IPSec, and a different one with Juniper VPN. | ||
|  | 
 | ||
|  | The format of the signatures in the [mtu] section is exceedingly simple, | ||
|  | consisting just of a description and a list of values: | ||
|  | 
 | ||
|  | label = Ethernet | ||
|  | sig   = 1500 | ||
|  | 
 | ||
|  | These will be matched for any wildcard MSS TCP packets (see below) not generated | ||
|  | by userspace TCP tools. | ||
|  | 
 | ||
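The relation between MSS and MTU that makes this work is plain header
arithmetic. The Python sketch below assumes minimal headers with no options
(20 bytes of IPv4 or 40 bytes of IPv6, plus 20 bytes of TCP); the helper names
are hypothetical, and the small lookup table contains only the two values that
appear in this document, not the contents of the real p0f.fp.

```python
MIN_TCP4 = 20 + 20   # IPv4 header + TCP header, no options (assumption)
MIN_TCP6 = 40 + 20   # IPv6 header + TCP header, no options (assumption)

# Only values mentioned in this document; the real [mtu] section has more.
EXAMPLE_LABELS = {1500: "Ethernet", 1492: "DSL"}


def mtu_from_mss(mss: int, ip_version: int = 4) -> int:
    """Estimate the sender's link MTU from the MSS it advertised."""
    if ip_version == 4:
        return mss + MIN_TCP4
    if ip_version == 6:
        return mss + MIN_TCP6
    raise ValueError("ip_version must be 4 or 6")


def guess_link(mss: int, ip_version: int = 4):
    """Return a link label for the MSS, or None if the MTU is not listed."""
    return EXAMPLE_LABELS.get(mtu_from_mss(mss, ip_version))


# MSS 1452 from the sample SYN in section 1 gives raw_mtu = 1492.
print(mtu_from_mss(1452), guess_link(1452))  # 1492 DSL
```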
|  | == TCP signatures == | ||
|  | 
 | ||
|  | For TCP traffic, signature layout is as follows: | ||
|  | 
 | ||
|  | sig = ver:ittl:olen:mss:wsize,scale:olayout:quirks:pclass | ||
|  | 
 | ||
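As a rough illustration of this layout, the raw_sig strings printed by p0f can
be split into the same eight colon-delimited fields, with the window field
further divided into size and scale. The Python below is a sketch with
invented names, not p0f's own parser; it keeps every field as text and does no
validation beyond counting fields. The second helper adds up the
'observed_ttl+distance' form that p0f prints for ittl, returning None when the
distance is unknown ('?').

```python
from typing import List, NamedTuple, Optional


class TcpSig(NamedTuple):
    ver: str
    ittl: str
    olen: str
    mss: str
    wsize: str
    scale: str
    olayout: List[str]
    quirks: List[str]
    pclass: str


def parse_tcp_sig(sig: str) -> TcpSig:
    """Split 'ver:ittl:olen:mss:wsize,scale:olayout:quirks:pclass'."""
    parts = sig.strip().split(":")
    if len(parts) != 8:
        raise ValueError("expected 8 colon-delimited fields in %r" % sig)
    ver, ittl, olen, mss, window, olayout, quirks, pclass = parts
    wsize, comma, scale = window.partition(",")
    if not comma:
        raise ValueError("expected 'wsize,scale' in %r" % window)
    return TcpSig(ver, ittl, olen, mss, wsize, scale,
                  olayout.split(",") if olayout else [],
                  quirks.split(",") if quirks else [],
                  pclass)


def summed_ittl(ittl: str) -> Optional[int]:
    """Turn '54+10' into 64; return None for the 'nnn+?' form."""
    observed, plus, distance = ittl.partition("+")
    if not plus:
        return int(observed.rstrip("-"))   # tolerate the '-' suffix
    if distance == "?":
        return None
    return int(observed) + int(distance)


sig = parse_tcp_sig("4:120+8:0:1452:65535,0:mss,nop,nop,sok:df,id+:0")
print(sig.olayout, sig.quirks, summed_ittl(sig.ittl))
```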
|  |   ver        - signature for IPv4 ('4'), IPv6 ('6'), or both ('*'). | ||
|  | 
 | ||
|  |                NEW SIGNATURES: P0f documents the protocol observed on the wire, | ||
|  |                but you should replace it with '*' unless you have observed some | ||
|  |                actual differences between IPv4 and IPv6 traffic, or unless the | ||
|  |                software supports only one of these versions to begin with. | ||
|  | 
 | ||
|  |   ittl       - initial TTL used by the OS. Almost all operating systems use | ||
|  |                64, 128, or 255; ancient versions of Windows sometimes used | ||
|  |                32, and several obscure systems sometimes resort to odd values | ||
|  |                such as 60. | ||
|  | 
 | ||
|  |                NEW SIGNATURES: P0f will usually suggest something, using the | ||
|  |                format of 'observed_ttl+distance' (e.g. 54+10). Consider using | ||
|  |                traceroute to check that the distance is accurate, then sum up | ||
|  |                the values. If initial TTL can't be guessed, p0f will output | ||
|  |                'nnn+?', and you need to use traceroute to estimate the '?'. | ||
|  | 
 | ||
|  |                A handful of userspace tools will generate random TTLs. In these | ||
|  |                cases, determine maximum initial TTL and then add a - suffix to | ||
|  |                the value to avoid confusion. | ||
|  | 
 | ||
|  |   olen       - length of IPv4 options or IPv6 extension headers. Usually zero | ||
|  |                for normal IPv4 traffic; always zero for IPv6 due to the | ||
|  |                limitations of libpcap. | ||
|  | 
 | ||
|  |                NEW SIGNATURES: Copy p0f output literally. | ||
|  | 
 | ||
|  |   mss        - maximum segment size, if specified in TCP options. Special value | ||
|  |                of '*' can be used to denote that MSS varies depending on the | ||
|  |                parameters of sender's network link, and should not be a part of | ||
|  |                the signature. In this case, MSS will be used to guess the | ||
|  |                type of network hookup according to the [mtu] rules. | ||
|  | 
 | ||
|  |                NEW SIGNATURES: Use '*' for any commodity OSes where MSS is | ||
|  |                around 1300 - 1500, unless you know for sure that it's fixed. | ||
|  |                If the value is outside that range, you can probably copy it | ||
|  |                literally. | ||
|  | 
 | ||
|  |   wsize      - window size. Can be expressed as a fixed value, but many | ||
|  |                operating systems set it to a multiple of MSS or MTU, or a | ||
|  |                multiple of some random integer. P0f automatically detects these | ||
|  |                cases, and allows notation such as 'mss*4', 'mtu*4', or '%8192' | ||
|  |                to be used. Wildcard ('*') is possible too. | ||
|  | 
 | ||
|  |                NEW SIGNATURES: Copy p0f output literally. If frequent variations | ||
|  |                are seen, look for obvious patterns. If there are no patterns, | ||
|  |                '*' is a possible alternative. | ||
|  | 
 | ||
|  |   scale      - window scaling factor, if specified in TCP options. Fixed value | ||
|  |                or '*'. | ||
|  | 
 | ||
|  |                NEW SIGNATURES: Copy literally, unless the value varies randomly. | ||
|  |                Many systems alternate between 2 or 3 scaling factors, in which case, | ||
|  |                it's better to have several 'sig' lines, rather than a wildcard. | ||
|  | 
 | ||
|  |   olayout    - comma-delimited layout and ordering of TCP options, if any. This | ||
|  |                is one of the most valuable TCP fingerprinting signals. Supported | ||
|  |                values: | ||
|  | 
 | ||
|  |                eol+n  - explicit end of options, followed by n bytes of padding | ||
|  |                nop    - no-op option | ||
|  |                mss    - maximum segment size | ||
|  |                ws     - window scaling | ||
|  |                sok    - selective ACK permitted | ||
|  |                sack   - selective ACK (should not be seen) | ||
|  |                ts     - timestamp | ||
|  |                ?n     - unknown option ID n | ||
|  | 
 | ||
|  |                NEW SIGNATURES: Copy this string literally. | ||
|  | 
 | ||
|  |   quirks     - comma-delimited properties and quirks observed in IP or TCP | ||
|  |                headers: | ||
|  | 
 | ||
|  |                df     - "don't fragment" set (probably PMTUD); ignored for IPv6 | ||
|  |                id+    - DF set but IPID non-zero; ignored for IPv6 | ||
|  |                id-    - DF not set but IPID is zero; ignored for IPv6 | ||
|  |                ecn    - explicit congestion notification support | ||
|  |                0+     - "must be zero" field not zero; ignored for IPv6 | ||
|  |                flow   - non-zero IPv6 flow ID; ignored for IPv4 | ||
|  | 
 | ||
|  |                seq-   - sequence number is zero | ||
|  |                ack+   - ACK number is non-zero, but ACK flag not set | ||
|  |                ack-   - ACK number is zero, but ACK flag set | ||
|  |                uptr+  - URG pointer is non-zero, but URG flag not set | ||
|  |                urgf+  - URG flag used | ||
|  |                pushf+ - PUSH flag used | ||
|  | 
 | ||
|  |                ts1-   - own timestamp specified as zero | ||
|  |                ts2+   - non-zero peer timestamp on initial SYN | ||
|  |                opt+   - trailing non-zero data in options segment | ||
|  |                exws   - excessive window scaling factor (> 14) | ||
|  |                bad    - malformed TCP options | ||
|  | 
 | ||
|  |                If a signature scoped to both IPv4 and IPv6 contains quirks valid | ||
|  |                for just one of these protocols, such quirks will be ignored for | ||
|  |                packets using the other protocol. For example, any combination | ||
|  |                of 'df', 'id+', and 'id-' is always matched by any IPv6 packet. | ||
|  | 
 | ||
|  |                NEW SIGNATURES: Copy literally. | ||
|  | 
 | ||
|  |   pclass     - payload size classification: '0' for zero, '+' for non-zero, | ||
|  |                '*' for any. The packets we fingerprint right now normally have | ||
|  |                no payloads, but some corner cases exist. | ||
|  | 
 | ||
|  |                NEW SIGNATURES: Copy literally. | ||
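
As a rough illustration of the 'wsize' notation described above, the following Python sketch checks an observed window size against a signature value. It is not p0f's actual code: the function name wsize_matches is made up for this example, and the caller is assumed to already know the MSS and MTU for the packet.

```python
def wsize_matches(spec, win, mss, mtu):
    """Return True if the observed window size fits a 'wsize' signature value.

    Hypothetical helper, not part of p0f. Supported forms, as described in
    the text: '*', 'mss*N', 'mtu*N', '%N', or a fixed integer.
    """
    if spec == "*":
        return True                       # wildcard: anything goes
    if spec.startswith("%"):
        return win % int(spec[1:]) == 0   # multiple of a fixed integer
    if spec.startswith("mss*"):
        return win == mss * int(spec[4:])
    if spec.startswith("mtu*"):
        return win == mtu * int(spec[4:])
    return win == int(spec)               # plain fixed value


# Example values: MSS 1460 and MTU 1500 are typical for Ethernet.
print(wsize_matches("mss*4", 5840, 1460, 1500))
print(wsize_matches("%8192", 16384, 1460, 1500))
```

Only the window size field is covered here; the scaling factor that follows the comma in a real signature would need to be compared separately.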
|  | 
 | ||
|  | NOTE: The TCP module allows some fuzziness when an exact match can't be found: | ||
|  | 'df' and 'id+' quirks are allowed to disappear; 'id-' or 'ecn' may appear; and | ||
|  | TTLs can change. | ||
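
The fuzziness rule in the note above can be read as a simple set comparison. The Python sketch below is one possible interpretation, not p0f's implementation; the function name and the 'exact' / 'fuzzy' return labels are invented for this example, and TTL handling is left out.

```python
# Quirks that a fuzzy match tolerates, taken from the note above.
MAY_DISAPPEAR = {"df", "id+"}
MAY_APPEAR = {"id-", "ecn"}


def quirk_match(sig_quirks, pkt_quirks):
    """Compare signature quirks with packet quirks (hypothetical helper).

    Returns 'exact', 'fuzzy', or None when the sets cannot be reconciled.
    """
    sig_quirks, pkt_quirks = set(sig_quirks), set(pkt_quirks)
    if sig_quirks == pkt_quirks:
        return "exact"
    missing = sig_quirks - pkt_quirks   # in the signature, not in the packet
    extra = pkt_quirks - sig_quirks     # in the packet, not in the signature
    if missing <= MAY_DISAPPEAR and extra <= MAY_APPEAR:
        return "fuzzy"
    return None


print(quirk_match({"df", "id+"}, {"id-"}))
```

A change from 'exact' to 'fuzzy' between sessions is presumably the kind of event that the 'fuzzy' reason code in the NAT detection section refers to.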
|  | 
 | ||
|  | To gather new SYN ('request') signatures, simply connect to the fingerprinted | ||
|  | system, and p0f will provide you with the necessary data. To gather SYN+ACK | ||
|  | ('response') signatures, you should use the bundled p0f-sendsyn utility while p0f | ||
|  | is running in the background; creating them manually is not advisable. | ||
|  | 
 | ||
|  | == HTTP signatures == | ||
|  | 
 | ||
|  | A special directive should appear at the beginning of the [http:request] | ||
|  | section, structured the following way: | ||
|  | 
 | ||
|  | ua_os = Linux,Windows,iOS=[iPad],iOS=[iPhone],Mac OS X,... | ||
|  | 
 | ||
|  | This list should specify OS names that should be looked for within the | ||
|  | User-Agent string if the string is otherwise deemed to be honest. This input | ||
|  | is not used for fingerprinting, but aids NAT detection in some useful ways. | ||
|  | 
 | ||
|  | The names have to match the names used in 'sig' specifiers across p0f.fp. If a | ||
|  | particular name used by p0f differs from what typically appears in User-Agent, | ||
|  | the name=[string] syntax may be used to define any number of aliases. | ||
|  | 
 | ||
|  | Other than that, HTTP signatures for GET and HEAD requests have the following | ||
|  | layout: | ||
|  | 
 | ||
|  | sig = ver:horder:habsent:expsw | ||
|  | 
 | ||
|  |   ver        - 0 for HTTP/1.0, 1 for HTTP/1.1, or '*' for any.  | ||
|  | 
 | ||
|  |                NEW SIGNATURES: Copy the value literally, unless you have a | ||
|  |                specific reason to do otherwise. | ||
|  | 
 | ||
|  |   horder     - comma-separated, ordered list of headers that should appear in | ||
|  |                matching traffic. Substrings to match within each of these | ||
|  |                headers may be specified using a name=[value] notation. | ||
|  | 
 | ||
|  |                The signature will be matched even if other headers appear in | ||
|  |                between, as long as the list itself is matched in the specified | ||
|  |                sequence. | ||
|  | 
 | ||
|  |                Headers that usually do appear in the traffic, but may go away | ||
|  |                (e.g. Accept-Language if the user has no languages defined, or | ||
|  |                Referer if no referring site exists) should be prefixed with '?', | ||
|  |                e.g. "?Referer". P0f will accept their disappearance, but will | ||
|  |                not allow them to appear at any other location. | ||
|  | 
 | ||
|  |                NEW SIGNATURES: Review the list and remove any headers that | ||
|  |                appear to be irrelevant to the fingerprinted software, and mark | ||
|  |                transient ones with '?'. Remove header values that do not add | ||
|  |                anything to the signature, or are request- or user-specific. | ||
|  |                In particular, pay attention to Accept, Accept-Language, and | ||
|  |                Accept-Charset, as they are highly specific to request type | ||
|  |                and user settings. | ||
|  | 
 | ||
|  |                P0f automatically removes some headers, prefixes others with '?', | ||
|  |                and inhibits the value of fields such as 'Referer' or 'Cookie' - | ||
|  |                but this is not a substitute for manual review. | ||
|  | 
 | ||
|  |                NOTE: Server signatures may differ depending on the request | ||
|  |                (HTTP/1.1 versus 1.0, keep-alive versus one-shot, etc) and on the | ||
|  |                returned resource (e.g., CGI versus static content). Play around, | ||
|  |                browse to several URLs, also try curl and wget. | ||
|  | 
 | ||
|  |   habsent    - comma-separated list of headers that must *not* appear in | ||
|  |                matching traffic. This is particularly useful for noting the | ||
|  |                absence of standard headers (e.g. 'Host'), or for differentiating | ||
|  |                between otherwise very similar signatures. | ||
|  | 
 | ||
|  |                NEW SIGNATURES: P0f will automatically highlight the absence of | ||
|  |                any normally present headers; other entries may be added where | ||
|  |                necessary. | ||
|  | 
 | ||
|  |   expsw      - expected substring in 'User-Agent' or 'Server'. This is not | ||
|  |                used to match traffic, and merely serves to detect dishonest | ||
|  |                software. If you want to explicitly match User-Agent, you need | ||
|  |                to do this in the 'horder' section, e.g.: | ||
|  | 
 | ||
|  |                User-Agent=[Firefox] | ||
|  | 
 | ||
|  | Any of these sections except for 'ver' may be blank. | ||
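
To make the 'horder' and 'habsent' rules more concrete, here is a simplified Python sketch of how a signature could be parsed and compared against an observed header list. It is an illustration only: the helper names are hypothetical, header names are compared case-sensitively, 'expsw' is parsed but not used for matching (as the text explains), and the parser assumes that ':' appears only as a field separator or inside [...] values.

```python
def split_outside_brackets(text, sep):
    """Split text on sep, ignoring separators inside [...] values."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]" and depth > 0:
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_http_sig(sig):
    """Parse 'ver:horder:habsent:expsw' into a dict (hypothetical helper)."""
    ver, horder, habsent, expsw = split_outside_brackets(sig, ":")
    entries = []
    for item in filter(None, split_outside_brackets(horder, ",")):
        optional = item.startswith("?")
        name, _, value = item.lstrip("?").partition("=[")
        # value still carries the closing ']' when a substring was given
        entries.append((name, optional, value[:-1] if value else None))
    absent = [h for h in split_outside_brackets(habsent, ",") if h]
    return {"ver": ver, "horder": entries, "habsent": absent, "expsw": expsw}


def http_sig_matches(parsed, ver, headers):
    """headers is the observed, ordered list of (name, value) pairs."""
    if parsed["ver"] not in ("*", ver):
        return False
    names = [name for name, _ in headers]
    if any(h in names for h in parsed["habsent"]):
        return False
    pos = 0
    for name, optional, substr in parsed["horder"]:
        idx = names.index(name, pos) if name in names[pos:] else -1
        if idx >= 0 and (substr is None or substr in headers[idx][1]):
            pos = idx + 1          # unrelated headers in between are fine
        elif optional and name not in names:
            continue               # '?' header may vanish, but not move
        else:
            return False
    return True


sig = parse_http_sig(
    "1:Host,User-Agent,?Accept-Language,"
    "Accept-Encoding=[gzip,deflate],Connection=[keep-alive]:Via:Firefox/"
)
observed = [
    ("Host", "example.test"),
    ("User-Agent", "Mozilla/5.0 Firefox/3.6"),
    ("X-Extra", "1"),
    ("Accept-Encoding", "gzip,deflate"),
    ("Connection", "keep-alive"),
]
print(http_sig_matches(sig, "1", observed))
```

The signature string and the header values above are made up for the example; real entries should be taken from p0f output as described in the NEW SIGNATURES notes.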
|  | 
 | ||
|  | There are many protocol-level quirks that p0f could be detecting - for example, | ||
|  | the use of non-standard newlines, or missing or extra spacing between header | ||
|  | field names and values. There is also some information to be gathered from | ||
|  | responses to OPTIONS or POST. That said, it does not seem to be worth the | ||
|  | effort: the protocol is so verbose, and implemented so arbitrarily, that we are | ||
|  | getting more than enough information just with a simple GET / HEAD fingerprint. | ||
|  | 
 | ||
|  | == SMTP signatures == | ||
|  | 
 | ||
|  |    *** NOT IMPLEMENTED YET *** | ||
|  | 
 | ||
|  | == FTP signatures == | ||
|  | 
 | ||
|  |    *** NOT IMPLEMENTED YET *** | ||
|  | 
 | ||
|  | ---------------- | ||
|  | 6. NAT detection | ||
|  | ---------------- | ||
|  | 
 | ||
|  | In addition to fairly straightforward measurements of intrinsic properties of | ||
|  | a single TCP session, p0f also tries to compare signatures across sessions to | ||
|  | detect client-side connection sharing (NAT, HTTP proxies) or server-side load | ||
|  | balancing. | ||
|  | 
 | ||
|  | This is done in two steps: the first significant deviation usually prompts a | ||
|  | "host change" entry (which may also be indicative of multi-boot, address reuse, | ||
|  | or other one-off events); and a persistent pattern of changes prompts an | ||
|  | "ip sharing" notification later on. | ||
|  | 
 | ||
|  | All of these messages are accompanied by a set of reason codes: | ||
|  | 
 | ||
|  |   os_sig       - the OS detected right now doesn't match the OS detected earlier | ||
|  |                  on. | ||
|  | 
 | ||
|  |   sig_diff     - no definite OS detection data available, but protocol-level | ||
|  |                  characteristics have changed drastically (e.g., different | ||
|  |                  TCP option layout). | ||
|  | 
 | ||
|  |   app_vs_os    - the application detected running on the host is not supposed | ||
|  |                  to work on the host's operating system. | ||
|  | 
 | ||
|  |   x_known      - the signature progressed from known to unknown, or vice versa. | ||
|  | 
 | ||
|  | The following additional codes are specific to TCP: | ||
|  | 
 | ||
|  |   tstamp       - TCP timestamps went back or jumped forward. | ||
|  | 
 | ||
|  |   ttl          - TTL values have changed. | ||
|  | 
 | ||
|  |   port         - source port number has decreased. | ||
|  | 
 | ||
|  |   mtu          - system MTU has changed. | ||
|  | 
 | ||
|  |   fuzzy        - the precision with which a TCP signature is matched has | ||
|  |                  changed. | ||
|  | 
 | ||
|  | The following codes are issued by the HTTP module: | ||
|  | 
 | ||
|  |   via          - data explicitly includes Via / X-Forwarded-For. | ||
|  | 
 | ||
|  |   us_vs_os     - OS fingerprint doesn't match User-Agent data, and the | ||
|  |                  User-Agent value otherwise looks honest. | ||
|  | 
 | ||
|  |   app_srv_lb   - server application signatures change, suggesting load | ||
|  |                  balancing. | ||
|  | 
 | ||
|  |   date         - server-advertised date changes inconsistently. | ||
|  | 
 | ||
|  | Different reasons have different weights, balanced to keep p0f very sensitive | ||
|  | even to very homogeneous environments behind NAT. If you end up seeing false | ||
|  | positives or other detection problems in your environment, please let me know! | ||
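
When post-processing "host change" or "ip sharing" entries, it may help to map the reason codes listed above back to the module that issues them. The Python sketch below is a plain lookup table built from this section; the function name is hypothetical, and the assumption that several codes arrive in one field separated by spaces or commas should be checked against your actual p0f output. The weights mentioned above are not modeled.

```python
import re

# (module, meaning) per reason code, transcribed from the lists above.
REASONS = {
    "os_sig": ("generic", "detected OS differs from the one seen earlier"),
    "sig_diff": ("generic", "protocol-level characteristics changed"),
    "app_vs_os": ("generic", "application should not run on this OS"),
    "x_known": ("generic", "signature went from known to unknown or back"),
    "tstamp": ("tcp", "TCP timestamps went back or jumped forward"),
    "ttl": ("tcp", "TTL values have changed"),
    "port": ("tcp", "source port number has decreased"),
    "mtu": ("tcp", "system MTU has changed"),
    "fuzzy": ("tcp", "TCP signature match precision has changed"),
    "via": ("http", "Via / X-Forwarded-For present"),
    "us_vs_os": ("http", "OS fingerprint does not match User-Agent"),
    "app_srv_lb": ("http", "server application signatures change"),
    "date": ("http", "server-advertised date changes inconsistently"),
}


def explain_reasons(reason_field):
    """Split a reason field and describe each code (hypothetical helper)."""
    codes = [c for c in re.split(r"[,\s]+", reason_field.strip()) if c]
    unknown = ("unknown", "not listed in this document")
    return [(code,) + REASONS.get(code, unknown) for code in codes]


for code, module, meaning in explain_reasons("tstamp port"):
    print(code, module, meaning)
```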
|  | 
 | ||
|  | ----------- | ||
|  | 7. Security | ||
|  | ----------- | ||
|  | 
 | ||
|  | You should treat the output from this tool as advisory; the fingerprinting can | ||
|  | be gamed with some minor effort, and it's also possible to evade it altogether | ||
|  | (e.g. with excessive IP fragmentation or bad TCP checksums). Plan accordingly. | ||
|  | 
 | ||
|  | P0f should be reasonably secure to operate as a daemon. That said, un*x | ||
|  | users should employ the -u option to drop privileges and chroot() when running | ||
|  | the tool continuously. This greatly minimizes the consequences of any mishaps - | ||
|  | and mishaps in C just tend to happen. | ||
|  | 
 | ||
|  | To make this step meaningful, the user you are running p0f as should be | ||
|  | completely unprivileged, and should have an empty, read-only home directory. For | ||
|  | example, you can do: | ||
|  | 
 | ||
|  | # useradd -d /var/empty/p0f -M -r -s /bin/nologin p0f-user | ||
|  | # mkdir -p -m 755 /var/empty/p0f | ||
|  | 
 | ||
|  | Please don't put the p0f binary itself, or any other valuable assets, inside | ||
|  | that user's home directory; and certainly do not use any generic locations such | ||
|  | as / or /bin/ in lieu of a proper home. | ||
|  | 
 | ||
|  | P0f running in the background should be fairly difficult to DoS, especially | ||
|  | compared to any real TCP services it will be watching. Nevertheless, there are | ||
|  | so many deployment-specific factors at play that you should always preemptively | ||
|  | stress-test your setup, and see how it behaves. | ||
|  | 
 | ||
|  | Other than that, let's talk filesystem security. When using the tool in the | ||
|  | API mode (-s), the listening socket is always re-created with 666 | ||
|  | permissions, so that applications running as other uids can query it at will. | ||
|  | If you want to preserve the privacy of captured traffic in a multi-user system, | ||
|  | please ensure that the socket is created in a directory with finer-grained | ||
|  | permissions; or change API_MODE in config.h. | ||
|  | 
 | ||
|  | The default file mode for binary log data (-o) is 600, on the assumption that | ||
|  | others probably don't need access to historical data; if you need to share logs, | ||
|  | you can pre-create the file or change LOG_MODE in config.h. | ||
|  | 
 | ||
|  | Don't build p0f, and do not store its source, binary, configuration files, logs, | ||
|  | or query sockets in world-writable locations such as /tmp (or any | ||
|  | subdirectories created therein). | ||
|  | 
 | ||
|  | Last but not least, please do not attempt to make p0f setuid, or otherwise | ||
|  | grant it privileges higher than those of the calling user. Neither the tool | ||
|  | itself, nor the third-party components it depends on, are designed to keep rogue | ||
|  | less-privileged callers at bay. If you use /etc/sudoers to list p0f as the only | ||
|  | program that user X should be able to run as root, that user will probably be | ||
|  | able to compromise your system. The same goes for many other uses of sudo, by | ||
|  | the way. | ||
|  | 
 | ||
|  | -------------- | ||
|  | 8. Limitations | ||
|  | -------------- | ||
|  | 
 | ||
|  | Here are some of the known issues you may run into: | ||
|  | 
 | ||
|  | == General == | ||
|  | 
 | ||
|  | 1) RST, ACK, and other experimental fingerprinting modes offered in p0f v2 are | ||
|  |    no longer supported in v3. This is because they proved to have very low | ||
|  |    specificity. The consequence is that you can no longer fingerprint | ||
|  |    "connection refused" responses. | ||
|  | 
 | ||
|  | 2) API queries or daemon execution are not supported when reading offline pcaps. | ||
|  |    While there may be some fringe use cases for that, offline pcaps use a | ||
|  |    much simpler event loop, and so supporting these features would require some | ||
|  |    extra effort. | ||
|  | 
 | ||
|  | 3) P0f needs to observe at least about 25 milliseconds worth of qualifying | ||
|  |    traffic to estimate system uptime. This means that if you're testing it over | ||
|  |    loopback or LAN, you may need to let it see more than one connection. | ||
|  | 
 | ||
|  |    Systems with extremely slow timestamp clocks may need longer acquisition | ||
|  |    periods (up to several seconds); very fast clocks (over 1.5 kHz) are rejected | ||
|  |    completely on account of being prohibited by the RFC. Almost all OSes are | ||
|  |    between 100 Hz and 1 kHz, which should work fine. | ||
|  | 
 | ||
|  | 4) Some systems vary SYN+ACK responses based on the contents of the initial SYN, | ||
|  |    sometimes removing TCP options not supported by the other endpoint.  | ||
|  |    Unfortunately, there is no easy way to account for this, so several SYN+ACK | ||
|  |    signatures may be required per system. The bundled p0f-sendsyn utility helps | ||
|  |    with collecting them. | ||
|  | 
 | ||
|  |    Another consequence of this is that you will sometimes see server uptime only | ||
|  |    if your own system has RFC1323 timestamps enabled. Linux has done that since | ||
|  |    version 2.2; on Windows, you need version 7 or newer. Client uptimes are not | ||
|  |    affected. | ||
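
The timing constraints in item 3 follow from how uptime can be derived from TCP timestamps: two observations give the clock frequency, and the timestamp value divided by that frequency gives the time since the counter started. The Python sketch below illustrates the arithmetic under stated assumptions. The function and constant names are invented for this example, only the 25 ms and 1.5 kHz limits come from the text, and p0f's own rounding and sanity checks are not modeled.

```python
MIN_INTERVAL_MS = 25     # minimum observation window, per item 3
MAX_CLOCK_HZ = 1500      # faster timestamp clocks are rejected


def estimate_uptime(first, second):
    """Estimate (clock_hz, uptime_seconds) from two timestamp samples.

    Each sample is (wall_clock_ms, tcp_timestamp_value). Returns None if
    the samples are too close together or the clock looks implausible.
    """
    (t0_ms, ts0), (t1_ms, ts1) = first, second
    elapsed_ms = t1_ms - t0_ms
    if elapsed_ms < MIN_INTERVAL_MS:
        return None                          # not enough qualifying traffic
    ticks = (ts1 - ts0) & 0xFFFFFFFF         # TCP timestamps are 32-bit
    if ticks == 0:
        return None                          # clock too slow to measure yet
    clock_hz = ticks * 1000.0 / elapsed_ms
    if clock_hz > MAX_CLOCK_HZ:
        return None
    return clock_hz, ts1 / clock_hz


# A 1 kHz clock observed twice, one second apart.
print(estimate_uptime((0, 1_000_000), (1000, 1_001_000)))
```

This also shows why very slow clocks need longer acquisition periods: until at least one tick elapses between the two samples, no frequency can be computed.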
|  | 
 | ||
|  | == Windows port == | ||
|  | 
 | ||
|  | 1) API sockets do not work on Windows. This is due to a limitation of winpcap; | ||
|  |    see live_event_loop(...) in p0f.c for more info. | ||
|  | 
 | ||
|  | 2) The chroot() jail (-u) on Windows doesn't offer any real security. This is | ||
|  |    due to the limitations of cygwin. | ||
|  | 
 | ||
|  | 3) The p0f-sendsyn utility doesn't work because of the limited capabilities of | ||
|  |    Windows raw sockets (this should be relatively easy to fix if there are any | ||
|  |    users who care). | ||
|  | 
 | ||
|  | --------------------------- | ||
|  | 9. Acknowledgments and more | ||
|  | --------------------------- | ||
|  | 
 | ||
|  | P0f is made possible thanks to the contributions of several good souls, | ||
|  | including: | ||
|  | 
 | ||
|  |   Phil Ames | ||
|  |   Jannich Brendle | ||
|  |   Matthew Dempsky | ||
|  |   Jason DePriest | ||
|  |   Dalibor Dukic | ||
|  |   Mark Martinec | ||
|  |   Damien Miller | ||
|  |   Josh Newton | ||
|  |   Nibbler | ||
|  |   Bernhard Rabe | ||
|  |   Chris John Riley | ||
|  |   Sebastian Roschke | ||
|  |   Peter Valchev | ||
|  |   Jeff Weisberg | ||
|  |   Anthony Howe | ||
|  |   Tomoyuki Murakami | ||
|  |   Michael Petch | ||
|  | 
 | ||
|  | If you wish to help, the most immediate way to do so is to simply gather new | ||
|  | signatures, especially from less popular or older platforms (servers, networking | ||
|  | equipment, portable / embedded / specialty OSes, etc). | ||
|  | 
 | ||
|  | Problems? Suggestions? Complaints? Compliments? You can reach the author at | ||
|  | <lcamtuf@coredump.cx>. The author is very lonely and appreciates your mail. |