mirror of https://github.com/telekom-security/tpotce.git
synced 2025-10-25 17:54:44 +00:00
916 lines, 39 KiB, Text
                        =============================
                        p0f v3: passive fingerprinter
                        =============================

                    http://lcamtuf.coredump.cx/p0f3.shtml

         Copyright (C) 2012 by Michal Zalewski <lcamtuf@coredump.cx>


---------------
1. What's this?
---------------

P0f is a tool that utilizes an array of sophisticated, purely passive traffic
fingerprinting mechanisms to identify the players behind any incidental TCP/IP
communications (often as little as a single normal SYN) without interfering in
any way.

Some of its capabilities include:

  - Highly scalable and extremely fast identification of the operating system
    and software on both endpoints of a vanilla TCP connection - especially in
    settings where NMap probes are blocked, too slow, unreliable, or would
    simply set off alarms.

  - Measurement of system uptime and network hookup, distance (including
    topology behind NAT or packet filters), and so on.

  - Automated detection of connection sharing / NAT, load balancing, and
    application-level proxying setups.

  - Detection of dishonest clients / servers that forge declarative statements
    such as X-Mailer or User-Agent.

The tool can be operated in the foreground or as a daemon, and offers a simple
real-time API for third-party components that wish to obtain additional
information about the actors they are talking to.

Common uses for p0f include reconnaissance during penetration tests; routine
network monitoring; detection of unauthorized network interconnects in corporate
environments; providing signals for abuse-prevention tools; and miscellaneous
forensics.

A snippet of typical p0f output may look like this:

.-[ 1.2.3.4/1524 -> 4.3.2.1/80 (syn) ]-
|
| client   = 1.2.3.4
| os       = Windows XP
| dist     = 8
| params   = none
| raw_sig  = 4:120+8:0:1452:65535,0:mss,nop,nop,sok:df,id+:0
|
`----

.-[ 1.2.3.4/1524 -> 4.3.2.1/80 (syn+ack) ]-
|
| server   = 4.3.2.1
| os       = Linux 3.x
| dist     = 0
| params   = none
| raw_sig  = 4:64+0:0:1460:mss*10,0:mss,nop,nop,sok:df:0
|
`----

.-[ 1.2.3.4/1524 -> 4.3.2.1/80 (mtu) ]-
|
| client   = 1.2.3.4
| link     = DSL
| raw_mtu  = 1492
|
`----

.-[ 1.2.3.4/1524 -> 4.3.2.1/80 (uptime) ]-
|
| client   = 1.2.3.4
| uptime   = 0 days 11 hrs 16 min (modulo 198 days)
| raw_freq = 250.00 Hz
|
`----
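The "(modulo 198 days)" part of the uptime sample follows from the TCP timestamp being a 32-bit counter. At the 250 Hz tick rate reported in raw_freq, it wraps around after 2^32 / 250 seconds, roughly 198.8 days, so uptime is only known modulo that period. Below is a minimal sketch of the arithmetic. It assumes the peer's timestamp clock started near zero at boot; many modern stacks randomize the starting offset, which breaks that assumption. The function names are hypothetical, not p0f internals:

```c
#include <stdint.h>

/* Days until a 32-bit TCP timestamp counter ticking at freq_hz wraps around. */
static double ts_wrap_days(double freq_hz) {
  const double ticks = 4294967296.0;   /* 2^32 possible timestamp values */
  return ticks / freq_hz / 86400.0;    /* seconds -> days */
}

/* Uptime in minutes implied by a raw timestamp value, assuming (hypothetically)
   that the counter started at zero when the host booted. */
static uint32_t ts_uptime_min(uint32_t ts, double freq_hz) {
  return (uint32_t)(ts / freq_hz / 60.0);
}
```

With freq_hz = 250.0, ts_wrap_days() returns about 198.84. A raw timestamp of 10140000 ticks corresponds to 676 minutes, which is the "11 hrs 16 min" shown above.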

A live demonstration can be seen here:

http://lcamtuf.coredump.cx/p0f3/

--------------------
2. How does it work?
--------------------

The vast majority of metrics used by p0f were invented specifically for this
tool, and include data extracted from IPv4 and IPv6 headers, TCP headers, the
dynamics of the TCP handshake, and the contents of application-level payloads.

For TCP/IP, the tool fingerprints the client-originating SYN packet and the
first SYN+ACK response from the server, paying attention to factors such as the
ordering of TCP options, the relation between maximum segment size and window
size, the progression of TCP timestamps, and the state of about a dozen possible
implementation quirks (e.g. non-zero values in "must be zero" fields).

The metrics used for application-level traffic vary from one module to another;
where possible, the tool relies on signals such as the ordering or syntax of
HTTP headers or SMTP commands, rather than any declarative statements such as
User-Agent. Application-level fingerprinting modules currently support HTTP.
Before the tool leaves "beta", I want to add SMTP and FTP. Other protocols,
such as POP3, IMAP, SSH, and SSL, may follow.

The list of all the measured parameters is reviewed in section 5 later on.
Some of the analysis also happens on a higher level: inconsistencies in the
data collected from various sources, or in the data from the same source
obtained over time, may be indicative of address translation, proxying, or
just plain trickery. For example, a system where TCP timestamps jump back
and forth, or where TTLs and MTUs change subtly, is probably a NAT device.

-------------------------------
3. How do I compile and use it?
-------------------------------

To compile p0f, try running './build.sh'; if that fails, you will probably be
given some tips about the likely cause. If the tips are useless, send me a
mean-spirited mail.

It is also possible to build a debug binary ('./build.sh debug'), in which case
verbose packet parsing and signature matching information will be written to
stderr. This is useful when troubleshooting problems, but that's about it.

The tool should compile cleanly under any reasonably new version of Linux,
FreeBSD, OpenBSD, MacOS X, and so forth. You can also build it on Windows using
cygwin and winpcap. I have not tested it on all possible varieties of un*x, but
if there are issues, they should be fairly superficial.

Once you have the binary compiled, you should be aware of the following
command-line options:

  -f fname   - reads fingerprint database (p0f.fp) from the specified location.
               See section 5 for more information about the contents of this
               file.

               The default location is ./p0f.fp. If you want to install p0f, you
               may want to change FP_FILE in config.h to /etc/p0f.fp.

  -i iface   - asks p0f to listen on a specific network interface. On un*x, you
               should reference the interface by name (e.g., eth0). On Windows,
               you can use the adapter index instead (0, 1, 2...).

               Multiple -i parameters are not supported; you need to run
               separate instances of p0f for that. On Linux, you can specify
               'any' to access a pseudo-device that combines the traffic on
               all other interfaces; the only limitation is that libpcap will
               not recognize VLAN-tagged frames in this mode, which may be
               an issue in some of the more exotic setups.

               If you do not specify an interface, libpcap will probably pick
               the first working interface in your system.

  -L         - lists all available network interfaces, then quits. Particularly
               useful on Windows, where the system-generated interface names
               are impossible to memorize.

  -r fname   - instead of listening for live traffic, reads pcap captures from
               the specified file. The data can be collected with tcpdump or any
               other compatible tool. Make sure that the snapshot length (-s
               option in tcpdump) is large enough not to truncate packets; the
               default may be too small.

               As with -i, only one -r option can be specified at any given
               time.

  -o fname   - appends grep-friendly log data to the specified file. The log
               contains all observations made by p0f about every matching
               connection, and may grow large; plan accordingly.

               Only one instance of p0f should be writing to a particular file
               at any given time; where supported, advisory locking is used to
               avoid problems.

  -s fname   - listens for API queries on the specified filesystem socket. This
               allows other programs to ask p0f for its current thoughts about
               a particular host. More information about the API protocol can be
               found in section 4 below.

               Only one instance of p0f can be listening on a particular socket
               at any given time. The mode is also incompatible with -r.

  -d         - runs p0f in daemon mode: the program will fork into background
               and continue writing to the specified log file or API socket. It
               will continue running until killed, until the listening interface
               is shut down, or until some other fatal error is encountered.

               This mode requires either -o or -s to be specified.

               To continue capturing p0f debug output and error messages (but
               not signatures), redirect stderr to another non-TTY destination,
               e.g.:

               ./p0f -o /var/log/p0f.log -d 2>>/var/log/p0f.error

               Note that if -d is specified and stderr points to a TTY, error
               messages will be lost.

  -u user    - causes p0f to drop privileges, switching to the specified user
               and chroot()ing itself to said user's home directory.

               This mode is *highly* advisable (but not required) on un*x
               systems, especially in daemon mode. See section 7 for more info.

More arcane settings (you probably don't need to touch these):

  -j         - logs in JSON format.

  -l         - uses line-buffered mode when logging to the output file.

  -p         - puts the interface specified with -i in promiscuous mode. If
               supported by the firmware, the card will also process frames not
               addressed to it.

  -S num     - sets the maximum number of simultaneous API connections. The
               default is 20; the upper cap is 100.

  -m c,h     - sets the maximum number of connections (c) and hosts (h) to be
               tracked at the same time (default: c = 1,000, h = 10,000). Once
               the limit is reached, the oldest 10% of entries get pruned to
               make room for new data.

               This setting effectively controls the memory footprint of p0f.
               The cost of tracking a single host is under 400 bytes; active
               connections have a worst-case footprint of about 18 kB. High
               limits have some CPU impact, too, by virtue of complicating
               data lookups in the cache.

               NOTE: P0f tracks connections only until the handshake is done,
               and if protocol-level fingerprinting is possible, until the first
               few kilobytes of data have been exchanged. This means that
               most connections are dropped from the cache in under 5 seconds;
               consequently, the 'c' variable can be much lower than the real
               number of parallel connections happening on the wire.

  -t c,h     - sets the timeout for collecting signatures for any connection
               (c), and for purging idle hosts from the in-memory cache (h). The
               first parameter is given in seconds, and defaults to 30 s; the
               second one is in minutes, and defaults to 120 min.

               The first value must be just high enough to reliably capture
               SYN, SYN+ACK, and the initial few kB of traffic. Low-performance
               sites may want to increase it slightly.

               The second value governs for how long API queries about a
               previously seen host can be made, and the maximum interval
               between signatures that can still trigger NAT detection and so
               on. Raising it is usually not advisable; lowering it to 5-10
               minutes may make sense for high-traffic servers, where it is
               possible to see several unrelated visitors subsequently
               obtaining the same dynamic IP from their ISP.
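To put the -m figures into perspective, the per-entry costs quoted for that option give a rough worst-case bound on cache memory. This sketch is back-of-the-envelope arithmetic only. It assumes "about 18 kB" means 18 KiB and treats the 400-byte host figure as an upper bound; it says nothing about p0f's actual allocator behaviour. cache_bound_bytes() is a hypothetical name:

```c
#include <stdint.h>

#define HOST_BYTES_MAX 400u            /* "under 400 bytes" per tracked host */
#define CONN_BYTES_MAX (18u * 1024u)   /* "about 18 kB" per connection (assumed KiB) */

/* Rough worst-case cache footprint, in bytes, for -m c,h. */
static uint64_t cache_bound_bytes(uint64_t max_conn, uint64_t max_hosts) {
  return max_conn * CONN_BYTES_MAX + max_hosts * HOST_BYTES_MAX;
}
```

With the defaults (c = 1,000, h = 10,000), this gives 22,432,000 bytes, or roughly 21 MiB. In practice, connection entries are short-lived, as the NOTE under -m explains, so typical usage sits well below this bound.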

Well, that's about it. You probably need to run the tool as root. Some of the
most common use cases:

# ./p0f -i eth0

# ./p0f -i eth0 -d -u p0f-user -o /var/log/p0f.log

# ./p0f -r some_capture.cap

The greppable log format (-o) uses pipe ('|') as a delimiter, with name=value
pairs describing the signature in a manner very similar to the pretty-printed
output generated on stdout:

[2012/01/04 10:26:14] mod=mtu|cli=1.2.3.4/1234|srv=4.3.2.1/80|subj=cli|link=DSL|raw_mtu=1492

The 'mod' parameter identifies the subsystem that generated the entry; the
'cli' and 'srv' parameters always describe the direction in which the TCP
session is established; and 'subj' describes which of these two parties is
actually being fingerprinted.
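Each -o entry is a bracketed timestamp followed by pipe-delimited name=value pairs, so pulling out a single field is straightforward. Here is a minimal C sketch. It assumes values never contain '|' themselves, and p0f_log_get() is a hypothetical helper, not part of p0f:

```c
#include <stddef.h>
#include <string.h>

/* Copy the value of `key` from one p0f -o log line into out (outsz > 0).
   Returns 1 if the key was found, 0 otherwise. */
static int p0f_log_get(const char *line, const char *key, char *out, size_t outsz) {
  const char *p = line;
  size_t klen = strlen(key);

  if (*p == '[') {                          /* skip "[YYYY/MM/DD hh:mm:ss]" */
    const char *b = strchr(p, ']');
    if (b) p = b + 1;
  }
  while (*p == ' ') p++;

  while (*p) {
    size_t flen = strcspn(p, "|\r\n");      /* length of one name=value field */
    if (flen > klen && strncmp(p, key, klen) == 0 && p[klen] == '=') {
      size_t vlen = flen - klen - 1;
      if (vlen >= outsz) vlen = outsz - 1;  /* truncate to fit */
      memcpy(out, p + klen + 1, vlen);
      out[vlen] = '\0';
      return 1;
    }
    if (p[flen] != '|') break;              /* end of line */
    p += flen + 1;
  }
  return 0;
}
```

For the sample line above, asking for "link" yields "DSL", and asking for "raw_mtu" yields "1492".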

Command-line options may be followed by a single parameter containing a
pcap-style traffic filtering rule. This allows you to reject some of the less
interesting packets for performance or privacy reasons. Simple examples include:

  'dst net 10.0.0.0/8 and port 80'

  'not src host 10.1.2.3'

  'port 22 or port 443'

You can read more about the supported syntax by running 'man pcap-filter'; if
that fails, try this URL:

  http://www.manpagez.com/man/7/pcap-filter/

Filters work both for online capture (-i) and for previously collected data
produced by any other tool (-r).

-------------
4. API access
-------------

The API allows other applications running on the same system to get p0f's
current opinion about a particular host. This is useful for integrating it with
spam filters, web apps, and so on.

Clients are welcome to connect to the unix socket specified with -s using a
SOCK_STREAM socket, and may issue any number of fixed-length queries. The
queries will be answered in the order they are received.

Note that there is no response caching, nor any software limits in place on the
p0f end, so it is your responsibility to write reasonably well-behaved clients.

Queries are exactly 21 bytes long (4 + 1 + 16). The format is:

  - Magic dword (0x50304601), in the native byte order of the platform.

  - Address type byte: 4 for IPv4, 6 for IPv6.

  - 16 bytes of address data, in network byte order. IPv4 addresses should be
    aligned to the left.
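The 21-byte query layout above (4-byte magic, 1-byte type, 16-byte address) can be filled in as follows. This is a hypothetical helper written from the description alone; C and C++ code can use the definitions in api.h instead:

```c
#include <stdint.h>
#include <string.h>

#define P0F_QUERY_MAGIC 0x50304601u
#define P0F_QUERY_LEN   21   /* 4 (magic) + 1 (type) + 16 (address) */

/* Fill buf with an API query for the given address.
   addr_type is 4 or 6; addr holds 4 or 16 bytes in network byte order.
   Returns 0 on success, -1 for an unsupported address type. */
static int p0f_build_query(uint8_t buf[P0F_QUERY_LEN], uint8_t addr_type,
                           const uint8_t *addr) {
  uint32_t magic = P0F_QUERY_MAGIC;
  if (addr_type != 4 && addr_type != 6) return -1;
  memset(buf, 0, P0F_QUERY_LEN);                   /* unused IPv4 tail stays zero */
  memcpy(buf, &magic, sizeof magic);               /* native endian, as specified */
  buf[4] = addr_type;
  memcpy(buf + 5, addr, addr_type == 4 ? 4 : 16);  /* left-aligned */
  return 0;
}
```

The resulting buffer is written as-is to the -s socket. The client then reads the fixed-size response described below.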

To such a query, p0f responds with:

  - Another magic dword (0x50304602), native endian.

  - Status dword: 0x00 for 'bad query', 0x10 for 'OK', and 0x20 for 'no match'.

  - Host information, valid only if status is 'OK' (byte width in square
    brackets):

    [4]  first_seen  - unix time (seconds) of first observation of the host.

    [4]  last_seen   - unix time (seconds) of most recent traffic.

    [4]  total_conn  - total number of connections seen.

    [4]  uptime_min  - calculated system uptime, in minutes. Zero if not known.

    [4]  up_mod_days - uptime wrap-around interval, in days.

    [4]  last_nat    - time of the most recent detection of IP sharing (NAT,
                       load balancing, proxying). Zero if never detected.

    [4]  last_chg    - time of the most recent individual OS mismatch (e.g.,
                       due to multiboot or IP reuse).

    [2]  distance    - system distance (derived from TTL; -1 if no data).

    [1]  bad_sw      - p0f thinks the User-Agent or Server strings aren't
                       accurate. The value of 1 means OS difference (possibly
                       due to proxying), while 2 means an outright mismatch.

                       NOTE: If User-Agent is not present at all, this value
                       stays at 0.

    [1]  os_match_q  - OS match quality: 0 for a normal match; 1 for fuzzy
                       (e.g., TTL or DF difference); 2 for a generic signature;
                       and 3 for both.

    [32] os_name     - NUL-terminated name of the most recent positively matched
                       OS. If the OS is not known, os_name[0] is NUL.

                       NOTE: If the host is first seen using a known system and
                       then switches to an unknown one, this field is not
                       reset.

    [32] os_flavor   - OS version. May be empty if no data.

    [32] http_name   - most recent positively identified HTTP application
                       (e.g. 'Firefox').

    [32] http_flavor - version of the HTTP application, if any.

    [32] link_type   - network link type, if recognized.

    [32] language    - system language, if recognized.
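Adding up the widths listed above gives a 232-byte response:

- 4 + 4 for the two header dwords,
- 7 x 4 for the time and counter fields,
- 2 + 1 + 1 for distance, bad_sw, and os_match_q,
- 6 x 32 for the strings.

The sketch below decodes such a buffer field by field. struct p0f_host and p0f_parse_response() are hypothetical names; api.h provides its own definitions. Each string is forcibly NUL-terminated in case a field arrives unterminated:

```c
#include <stdint.h>
#include <string.h>

#define P0F_RESP_MAGIC 0x50304602u
#define P0F_RESP_LEN   232

struct p0f_host {               /* hypothetical decoded view of a response */
  uint32_t status, first_seen, last_seen, total_conn, uptime_min,
           up_mod_days, last_nat, last_chg;
  int16_t  distance;            /* -1 if no data */
  uint8_t  bad_sw, os_match_q;
  char     os_name[32], os_flavor[32], http_name[32],
           http_flavor[32], link_type[32], language[32];
};

static uint32_t rd32(const uint8_t *p) { uint32_t v; memcpy(&v, p, 4); return v; }

/* Decode a native-endian response. Returns 0 on success, -1 on bad magic. */
static int p0f_parse_response(const uint8_t buf[P0F_RESP_LEN], struct p0f_host *h) {
  const uint8_t *p = buf;
  uint32_t *dwords[] = { &h->status, &h->first_seen, &h->last_seen, &h->total_conn,
                         &h->uptime_min, &h->up_mod_days, &h->last_nat, &h->last_chg };
  char *strs[] = { h->os_name, h->os_flavor, h->http_name,
                   h->http_flavor, h->link_type, h->language };

  if (rd32(p) != P0F_RESP_MAGIC) return -1;
  p += 4;
  for (size_t i = 0; i < 8; i++, p += 4) *dwords[i] = rd32(p);  /* status .. last_chg */
  memcpy(&h->distance, p, 2);
  p += 2;
  h->bad_sw = *p++;
  h->os_match_q = *p++;
  for (size_t i = 0; i < 6; i++, p += 32) {
    memcpy(strs[i], p, 32);
    strs[i][31] = '\0';         /* guard against a missing terminator */
  }
  return 0;
}
```

Callers should still check that status is 0x10 before trusting any of the host fields.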

A simple reference implementation of an API client is provided in p0f-client.c.
Implementations in C / C++ may reuse api.h from p0f source code, too.

Developers using the API should be aware of several important constraints:

  - The maximum number of simultaneous API connections is capped at 20 by
    default. The limit may be adjusted with the -S parameter, but rampant
    parallelism may lead to poorly controlled latency; consider a single query
    pipeline, possibly with prioritization and caching.

  - The maximum number of hosts and connections tracked at any given time is
    subject to configurable limits. You should look at your traffic stats and
    see if the defaults are suitable.

    You should also keep in mind that whenever you are subject to an ongoing
    DDoS or SYN spoofing DoS attack, p0f may end up dropping entries faster
    than you could query for them. It's that or running out of memory, so
    don't fret.

  - Cache entries with no activity for more than 120 minutes will be dropped
    even if the cache is nearly empty. The timeout is adjustable with -t, but
    you should not use the API to obtain ancient data; if you routinely need to
    go back hours or days, parse the logs instead of wasting RAM.

-----------------------
5. Fingerprint database
-----------------------

Whenever p0f obtains a fingerprint from the observed traffic, it defers to
the data read from p0f.fp to identify the operating system and obtain some
ancillary data needed for other analysis tasks. The fingerprint database is a
simple text file where lines starting with ';' are ignored.

== Module specification ==

The file is split into sections based on the type of traffic the fingerprints
apply to. Section identifiers are enclosed in square brackets, like so:

[module:direction]

  module     - the name of the fingerprinting module (e.g. 'tcp' or 'http').

  direction  - the direction of fingerprinted traffic: 'request' (from client to
               server) or 'response' (from server to client).

               For the TCP module, 'request' matches the initial SYN, and
               'response' matches SYN+ACK.

The 'direction' part is omitted for MTU signatures, as they work equally well
both ways.
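Section headers are easy to recognize mechanically. Here is a minimal sketch that splits one into its module and direction parts. It assumes trailing whitespace has already been stripped from the line, and parse_section() is a hypothetical helper, not p0f's own parser:

```c
#include <string.h>

/* Split "[tcp:request]" or "[mtu]" into module and direction.
   direction becomes "" when omitted. Returns 1 on success, 0 otherwise. */
static int parse_section(const char *line, char *module, size_t msz,
                         char *direction, size_t dsz) {
  size_t len = strlen(line);
  if (len < 3 || line[0] != '[' || line[len - 1] != ']') return 0;

  const char *body = line + 1;              /* text between the brackets */
  size_t blen = len - 2;
  const char *colon = memchr(body, ':', blen);
  size_t mlen = colon ? (size_t)(colon - body) : blen;
  size_t dlen = colon ? blen - mlen - 1 : 0;
  if (mlen == 0 || mlen >= msz || dlen >= dsz) return 0;

  memcpy(module, body, mlen);
  module[mlen] = '\0';
  if (colon) memcpy(direction, colon + 1, dlen);
  direction[dlen] = '\0';

  /* only the two documented directions are accepted */
  if (colon && strcmp(direction, "request") != 0 && strcmp(direction, "response") != 0)
    return 0;
  return 1;
}
```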

== Signature groups ==

The actual signatures must be preceded by a 'label' line, describing the
fingerprinted software:

label = type:class:name:flavor

  type       - some signatures in p0f.fp offer broad, last-resort matching for
               less researched corner cases. The goal there is to give an
               answer slightly better than "unknown", but less precise than
               what the user may be expecting.

               Normal, reasonably specific signatures that can't be radically
               improved should have their type specified as 's'; while generic,
               last-resort ones should be tagged with 'g'.

               Note that generic signatures are considered only if no specific
               matches are found in the database.

  class      - the tool needs to distinguish between OS-identifying signatures
               (only one of which should be matched for any given host) and
               signatures that just identify user applications (many of which
               may be seen concurrently).

               To assist with this, OS-specific signatures should specify the
               OS architecture family here (e.g., 'win', 'unix', 'cisco'); while
               application-related sigs (NMap, MSIE, Apache) should use a
               special value of '!'.

               Most TCP signatures are OS-specific, and should have OS family
               defined. Other signatures, such as HTTP, should use '!' unless
               the fingerprinted component is deeply intertwined with the
               platform (e.g., Windows Update).

               NOTE: To avoid variations (e.g. 'win' and 'windows' or 'unix'
               and 'linux'), all classes need to be pre-registered using a
               'classes' directive, seen near the beginning of p0f.fp.

  name       - a human-readable short name for what the fingerprint actually
               helps identify - say, 'Linux', 'Sendmail', or 'NMap'. The tool
               doesn't care about the exact value, but requires consistency - so
               don't switch between 'Internet Explorer' and 'MSIE', or 'MacOS'
               and 'Mac OS'.

  flavor     - anything you want to say to further qualify the observation. Can
               be the version of the identified software, or a description of
               what the application seems to be doing (e.g. 'SYN scan' for NMap).

               NOTE: Don't be too specific: if you have a signature for Apache
               2.2.16, but have no reason to suspect that other recent versions
               behave in a radically different way, just say '2.x'.
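The four label fields are colon-separated, and names such as those in the examples in this section may contain spaces or apostrophes. Here is a minimal sketch that splits the value after "label =". It assumes no colons appear inside the class or name; the flavor takes whatever follows the third colon. struct fp_label and parse_label() are hypothetical. Note that [mtu] labels, which carry only a name, are not meant to go through this function:

```c
#include <string.h>

struct fp_label {      /* hypothetical parsed form of a label value */
  char type;           /* 's' (specific) or 'g' (generic) */
  char cls[16];        /* OS class such as "unix", or "!" for applications */
  char name[64];
  char flavor[64];
};

/* Copy n bytes from s into dst (size dsz) as a string; 0 if it won't fit. */
static int copy_field(char *dst, size_t dsz, const char *s, size_t n) {
  if (n >= dsz) return 0;
  memcpy(dst, s, n);
  dst[n] = '\0';
  return 1;
}

/* Parse "type:class:name:flavor" (the text after "label = ").
   Returns 1 on success, 0 on a malformed value. */
static int parse_label(const char *v, struct fp_label *l) {
  const char *c1 = strchr(v, ':');
  const char *c2 = c1 ? strchr(c1 + 1, ':') : NULL;
  const char *c3 = c2 ? strchr(c2 + 1, ':') : NULL;
  if (!c3 || c1 != v + 1 || (v[0] != 's' && v[0] != 'g')) return 0;
  l->type = v[0];
  return copy_field(l->cls, sizeof l->cls, c1 + 1, (size_t)(c2 - c1 - 1)) &&
         copy_field(l->name, sizeof l->name, c2 + 1, (size_t)(c3 - c2 - 1)) &&
         copy_field(l->flavor, sizeof l->flavor, c3 + 1, strlen(c3 + 1));
}
```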

P0f uses labels to group similar signatures that may plausibly be generated by
the same system or application; switching between signatures under the same
label should not be considered a strong signal for NAT detection.

To further assist the tool in deciding which OS and application combinations are
reasonable, and which ones are indicative of foul play, any 'label' line for
applications (class '!') should be followed by a 'sys' line containing a
comma-delimited list of OS names or @-prefixed OS architecture classes on which
this software is known to be used. For example:

label = s:!:Uncle John's Networked ls Utility:2.3.0.1
sys   = Linux,FreeBSD,OpenBSD

...or:

label = s:!:Mom's Homestyle Browser:1.x
sys   = @unix,@win
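Checking an OS against a 'sys' list amounts to comparing each comma-delimited entry with either the OS name or, for @-prefixed entries, its class. Here is a minimal sketch. It assumes entries carry no surrounding whitespace, and sys_allows() is a hypothetical helper:

```c
#include <string.h>

/* Return 1 if an OS with the given name and class is covered by a 'sys'
   value such as "Linux,FreeBSD,OpenBSD" or "@unix,@win", else 0. */
static int sys_allows(const char *sys, const char *os_name, const char *os_class) {
  const char *p = sys;
  while (*p) {
    size_t len = strcspn(p, ",");   /* length of this entry */
    const char *item = p;
    const char *want = os_name;
    size_t n = len;
    if (n > 0 && *item == '@') {    /* @-prefixed entries name a class */
      want = os_class;
      item++;
      n--;
    }
    if (strlen(want) == n && strncmp(item, want, n) == 0) return 1;
    p += len;
    if (*p == ',') p++;
  }
  return 0;
}
```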

The label can be followed by any number of module-specific signatures; all of
them will be linked to the most recent label, and will be reported the same
way.

All label fields except for 'name' are omitted for [mtu] signatures, which do
not convey any OS-specific information, and just describe link types.

== MTU signatures ==

Many operating systems derive the maximum segment size specified in TCP options
from the MTU of their network interface; that value, in turn, normally depends
on the design of the link-layer protocol. For example, PPPoE, IPSec, and
Juniper VPN are each associated with a different MTU.

The format of the signatures in the [mtu] section is exceedingly simple,
consisting just of a description and a list of values:

label = Ethernet
sig   = 1500

These will be matched for any TCP packets whose signature uses a wildcard MSS
(see below), as long as the packets were not generated by userspace TCP tools.
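The link between MSS and MTU is simple header arithmetic. The MSS excludes the minimal IP and TCP headers, so MTU = MSS + 40 for IPv4 (20 + 20 bytes) and MSS + 60 for IPv6 (40 + 20 bytes). The sample output in section 1 fits this: the SYN advertised an MSS of 1452, and 1452 + 40 = 1492, which was reported as DSL.

The sketch below uses a deliberately tiny table with only the two values that appear in this document. mss_to_mtu() and link_for_mss() are hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative subset of an [mtu] section, limited to values in this README. */
struct mtu_sig { const char *label; uint16_t mtu; };
static const struct mtu_sig mtu_sigs[] = {
  { "Ethernet", 1500 },
  { "DSL",      1492 },
};

/* Add back the minimal IP + TCP headers that the MSS excludes. */
static unsigned mss_to_mtu(unsigned mss, int ip_ver) {
  return mss + (ip_ver == 6 ? 60u : 40u);
}

/* Return the link label for an MSS, or NULL if the MTU is not in the table. */
static const char *link_for_mss(unsigned mss, int ip_ver) {
  unsigned mtu = mss_to_mtu(mss, ip_ver);
  for (size_t i = 0; i < sizeof mtu_sigs / sizeof mtu_sigs[0]; i++)
    if (mtu_sigs[i].mtu == mtu) return mtu_sigs[i].label;
  return NULL;
}
```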

== TCP signatures ==

For TCP traffic, the signature layout is as follows:

sig = ver:ittl:olen:mss:wsize,scale:olayout:quirks:pclass
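In this layout, "wsize,scale" forms a single colon-separated field, so a TCP signature always splits into eight parts. Here is a minimal sketch that splits a signature in place, using the raw_sig from the sample output in section 1 as a test case. split_tcp_sig() and the F_* indices are hypothetical:

```c
#include <string.h>

/* Indices of the eight fields in a TCP 'sig' value. */
enum { F_VER, F_ITTL, F_OLEN, F_MSS, F_WSIZE_SCALE, F_OLAYOUT, F_QUIRKS,
       F_PCLASS, F_COUNT };

/* Split sig in place at ':' into exactly F_COUNT fields.
   Returns 1 on success, 0 if there are too few or too many fields. */
static int split_tcp_sig(char *sig, char *out[F_COUNT]) {
  int n = 0;
  char *p = sig;
  for (;;) {
    char *colon = strchr(p, ':');
    if (n == F_COUNT) return 0;   /* a ninth field would be an error */
    out[n++] = p;
    if (!colon) break;
    *colon = '\0';
    p = colon + 1;
  }
  return n == F_COUNT;
}
```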

  ver        - signature for IPv4 ('4'), IPv6 ('6'), or both ('*').

               NEW SIGNATURES: P0f documents the protocol observed on the wire,
               but you should replace it with '*' unless you have observed some
               actual differences between IPv4 and IPv6 traffic, or unless the
               software supports only one of these versions to begin with.

  ittl       - initial TTL used by the OS. Almost all operating systems use
               64, 128, or 255; ancient versions of Windows sometimes used
               32, and several obscure systems sometimes resort to odd values
               such as 60.

               NEW SIGNATURES: P0f will usually suggest something, using the
               format of 'observed_ttl+distance' (e.g. 54+10). Consider using
               traceroute to check that the distance is accurate, then sum up
               the values. If the initial TTL can't be guessed, p0f will output
               'nnn+?', and you need to use traceroute to estimate the '?'.

               A handful of userspace tools will generate random TTLs. In these
               cases, determine the maximum initial TTL and then add a '-'
               suffix to the value to avoid confusion.

  olen       - length of IPv4 options or IPv6 extension headers. Usually zero
               for normal IPv4 traffic; always zero for IPv6 due to the
               limitations of libpcap.

               NEW SIGNATURES: Copy p0f output literally.

  mss        - maximum segment size, if specified in TCP options. Special value
               of '*' can be used to denote that MSS varies depending on the
               parameters of sender's network link, and should not be a part of
               the signature. In this case, MSS will be used to guess the
               type of network hookup according to the [mtu] rules.

               NEW SIGNATURES: Use '*' for any commodity OSes where MSS is
               around 1300 - 1500, unless you know for sure that it's fixed.
               If the value is outside that range, you can probably copy it
               literally.

  wsize      - window size. Can be expressed as a fixed value, but many
               operating systems set it to a multiple of MSS or MTU, or a
               multiple of some arbitrary integer. P0f automatically detects
               these cases, and allows notation such as 'mss*4', 'mtu*4', or
               '%8192' to be used. Wildcard ('*') is possible too.

               NEW SIGNATURES: Copy p0f output literally. If frequent variations
               are seen, look for obvious patterns. If there are no patterns,
               '*' is a possible alternative.

  scale      - window scaling factor, if specified in TCP options. Fixed value
               or '*'.

               NEW SIGNATURES: Copy literally, unless the value varies randomly.
               Many systems alternate between two or three scaling factors, in
               which case it's better to have several 'sig' lines, rather than
               a wildcard.
 | |
| 
 | |
|   olayout    - comma-delimited layout and ordering of TCP options, if any. This
 | |
|                is one of the most valuable TCP fingerprinting signals. Supported
 | |
|                values:
 | |
| 
 | |
|                eol+n  - explicit end of options, followed by n bytes of padding
 | |
|                nop    - no-op option
 | |
|                mss    - maximum segment size
 | |
|                ws     - window scaling
 | |
               sok    - selective ACK permitted
               sack   - selective ACK (should not be seen)
               ts     - timestamp
               ?n     - unknown option ID n

               NEW SIGNATURES: Copy this string literally.

  quirks     - comma-delimited properties and quirks observed in IP or TCP
               headers:

               df     - "don't fragment" set (probably PMTUD); ignored for IPv6
               id+    - DF set but IPID non-zero; ignored for IPv6
               id-    - DF not set but IPID is zero; ignored for IPv6
               ecn    - explicit congestion notification support
               0+     - "must be zero" field not zero; ignored for IPv6
               flow   - non-zero IPv6 flow ID; ignored for IPv4

               seq-   - sequence number is zero
               ack+   - ACK number is non-zero, but ACK flag not set
               ack-   - ACK number is zero, but ACK flag set
               uptr+  - URG pointer is non-zero, but URG flag not set
               urgf+  - URG flag used
               pushf+ - PUSH flag used

               ts1-   - own timestamp specified as zero
               ts2+   - non-zero peer timestamp on initial SYN
               opt+   - trailing non-zero data in options segment
               exws   - excessive window scaling factor (> 14)
               bad    - malformed TCP options

               If a signature scoped to both IPv4 and IPv6 contains quirks valid
               for just one of these protocols, such quirks will be ignored on
               packets using the other protocol. For example, any combination
               of 'df', 'id+', and 'id-' is always matched by any IPv6 packet.

               NEW SIGNATURES: Copy literally.

  pclass     - payload size classification: '0' for zero, '+' for non-zero,
               '*' for any. The packets we fingerprint right now normally have
               no payloads, but some corner cases exist.

               NEW SIGNATURES: Copy literally.
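
The quirk rules above map fairly directly onto header fields. The C sketch
below shows one way to derive a subset of them from already-parsed header
values, apply the IPv4/IPv6 scoping rule for signatures scoped to both
protocols, and check 'pclass'. The struct, flag names, and functions are
hypothetical and are not taken from p0f's source; 'ecn' and the option-derived
quirks (ts1-, ts2+, opt+, exws, bad) are left out, and only exact matching is
shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit flags for some of the quirks listed above; the names
   and values are illustrative, not p0f's own definitions. */
enum {
  Q_DF       = 1 << 0,   /* df     */
  Q_NZ_ID    = 1 << 1,   /* id+    */
  Q_ZERO_ID  = 1 << 2,   /* id-    */
  Q_NZ_MBZ   = 1 << 3,   /* 0+     */
  Q_FLOW     = 1 << 4,   /* flow   */
  Q_ZERO_SEQ = 1 << 5,   /* seq-   */
  Q_NZ_ACK   = 1 << 6,   /* ack+   */
  Q_ZERO_ACK = 1 << 7,   /* ack-   */
  Q_NZ_URG   = 1 << 8,   /* uptr+  */
  Q_URG      = 1 << 9,   /* urgf+  */
  Q_PUSH     = 1 << 10   /* pushf+ */
};

/* Quirks that only exist for one IP version. */
#define QUIRKS_V4_ONLY ((uint32_t)(Q_DF | Q_NZ_ID | Q_ZERO_ID | Q_NZ_MBZ))
#define QUIRKS_V6_ONLY ((uint32_t)Q_FLOW)

/* Hypothetical, already-parsed view of the relevant header fields. */
struct hdr_view {
  bool     ipv6;
  bool     df;            /* IPv4 "don't fragment" bit          */
  uint16_t ipid;          /* IPv4 identification field          */
  bool     mbz;           /* IPv4 reserved ("must be zero") bit */
  uint32_t flow;          /* IPv6 flow label                    */
  uint32_t seq, ack;
  uint16_t urg_ptr;
  bool     f_ack, f_urg, f_push;
  uint32_t payload_len;
};

uint32_t compute_quirks(const struct hdr_view *h) {
  uint32_t q = 0;

  if (!h->ipv6) {
    if (h->df) q |= Q_DF;
    if (h->df && h->ipid != 0) q |= Q_NZ_ID;
    if (!h->df && h->ipid == 0) q |= Q_ZERO_ID;
    if (h->mbz) q |= Q_NZ_MBZ;
  } else if (h->flow != 0) {
    q |= Q_FLOW;
  }

  if (h->seq == 0) q |= Q_ZERO_SEQ;
  if (!h->f_ack && h->ack != 0) q |= Q_NZ_ACK;
  if (h->f_ack && h->ack == 0) q |= Q_ZERO_ACK;
  if (!h->f_urg && h->urg_ptr != 0) q |= Q_NZ_URG;
  if (h->f_urg) q |= Q_URG;
  if (h->f_push) q |= Q_PUSH;
  return q;
}

/* Exact comparison for a signature scoped to both IPv4 and IPv6: quirks
   that the packet's protocol cannot carry are ignored on both sides. */
bool quirks_match(uint32_t sig_q, uint32_t pkt_q, bool pkt_is_ipv6) {
  uint32_t ignore = pkt_is_ipv6 ? QUIRKS_V4_ONLY : QUIRKS_V6_ONLY;
  return (sig_q & ~ignore) == (pkt_q & ~ignore);
}

/* pclass: '0' requires no payload, '+' requires some, '*' accepts both. */
bool pclass_match(char sig_pclass, uint32_t payload_len) {
  switch (sig_pclass) {
    case '*': return true;
    case '0': return payload_len == 0;
    case '+': return payload_len != 0;
    default:  return false;   /* unknown class: treat as no match */
  }
}
```

With this masking, a signature carrying 'df,id+' matches an IPv6 packet that
has no quirks at all, which mirrors the example given above.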

NOTE: The TCP module allows some fuzziness when an exact match can't be found:
'df' and 'id+' quirks are allowed to disappear; 'id-' or 'ecn' may appear; and
TTLs can change.
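
A minimal sketch of that quirk tolerance, assuming quirks are kept as bit
flags (the flag names and values are hypothetical, not p0f's). It accepts a
packet whose quirks differ from the signature only by missing 'df' / 'id+' or
extra 'id-' / 'ecn'; TTL tolerance is not modelled. Going by the note, a
matcher would presumably try this only after an exact comparison has failed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical quirk bits. */
#define QK_DF      0x01u  /* df  */
#define QK_NZ_ID   0x02u  /* id+ */
#define QK_ZERO_ID 0x04u  /* id- */
#define QK_ECN     0x08u  /* ecn */

/* May be present in the signature but absent from the packet. */
#define QK_MAY_VANISH (QK_DF | QK_NZ_ID)
/* May be absent from the signature but present in the packet. */
#define QK_MAY_APPEAR (QK_ZERO_ID | QK_ECN)

/* Returns true if pkt_q is an acceptable fuzzy variant of sig_q. */
bool quirks_fuzzy_match(uint32_t sig_q, uint32_t pkt_q) {
  uint32_t missing = sig_q & ~pkt_q;  /* in signature, not in packet */
  uint32_t extra   = pkt_q & ~sig_q;  /* in packet, not in signature */

  return (missing & ~QK_MAY_VANISH) == 0 &&
         (extra & ~QK_MAY_APPEAR) == 0;
}
```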

To gather new SYN ('request') signatures, simply connect to the fingerprinted
system, and p0f will provide you with the necessary data. To gather SYN+ACK
('response') signatures, you should use the bundled p0f-sendsyn utility while p0f
is running in the background; creating them manually is not advisable.

== HTTP signatures ==

A special directive should appear at the beginning of the [http:request]
section, structured the following way:

ua_os = Linux,Windows,iOS=[iPad],iOS=[iPhone],Mac OS X,...

This list specifies the OS names to look for within the User-Agent string when
the string otherwise appears to be honest. This input is not used for
fingerprinting, but aids NAT detection in some useful ways.

The names have to match the names used in 'sig' specifiers across p0f.fp. If a
particular name used by p0f differs from what typically appears in User-Agent,
the name=[string] syntax may be used to define any number of aliases.
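
To illustrate the alias syntax, here is a small C sketch that parses a ua_os
value into (OS name, User-Agent marker) pairs and returns the OS whose marker
first appears in a given User-Agent string. The fixed-size buffers, the
first-match-wins rule, and all names are assumptions made for this example;
this is not p0f's actual parser.

```c
#include <stddef.h>
#include <string.h>

#define MAX_TOKEN 64

/* One entry: the p0f OS name and the text to look for in User-Agent. */
struct ua_os_entry {
  char name[MAX_TOKEN];
  char ua_str[MAX_TOKEN];
};

/* Copies len bytes of src into dst (size MAX_TOKEN), truncating if needed. */
static void copy_token(char *dst, const char *src, size_t len) {
  if (len >= MAX_TOKEN) len = MAX_TOKEN - 1;
  memcpy(dst, src, len);
  dst[len] = '\0';
}

/* Parses e.g. "Linux,iOS=[iPad],Mac OS X". Returns the number of entries
   stored (at most max), or -1 if a name=[string] alias is malformed. */
int parse_ua_os(const char *val, struct ua_os_entry *out, int max) {
  int n = 0;

  while (*val && n < max) {
    const char *end = strchr(val, ',');
    size_t len = end ? (size_t)(end - val) : strlen(val);

    if (len > 0) {                       /* skip empty items like ",," */
      const char *eq = memchr(val, '=', len);
      if (eq) {
        /* Alias form: name=[string], with a non-empty string. */
        const char *lb = eq + 1, *rb = val + len - 1;
        if (rb - lb < 2 || *lb != '[' || *rb != ']') return -1;
        copy_token(out[n].name, val, (size_t)(eq - val));
        copy_token(out[n].ua_str, lb + 1, (size_t)(rb - lb - 1));
      } else {
        /* Plain form: the OS name is also the User-Agent marker. */
        copy_token(out[n].name, val, len);
        copy_token(out[n].ua_str, val, len);
      }
      n++;
    }
    if (!end) break;
    val = end + 1;
  }
  return n;
}

/* Returns the OS name of the first entry whose marker occurs in ua,
   or NULL if none does. List order therefore matters in this sketch. */
const char *ua_os_lookup(const struct ua_os_entry *tab, int n,
                         const char *ua) {
  for (int i = 0; i < n; i++)
    if (strstr(ua, tab[i].ua_str) != NULL) return tab[i].name;
  return NULL;
}
```

For the example directive above, an iPhone User-Agent that also contains
"Mac OS X" resolves to "iOS", because the iPhone alias is listed first.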

Other than that, HTTP signatures for GET and HEAD requests have the following
layout:

sig = ver:horder:habsent:expsw

  ver        - 0 for HTTP/1.0, 1 for HTTP/1.1, or '*' for any.

               NEW SIGNATURES: Copy the value literally, unless you have a
               specific reason to do otherwise.

  horder     - comma-separated, ordered list of headers that should appear in
               matching traffic. Substrings to match within each of these
               headers may be specified using a name=[value] notation.

               The signature will be matched even if other headers appear in
               between, as long as the list itself is matched in the specified
               sequence.

               Headers that usually do appear in the traffic, but may go away
               (e.g. Accept-Language if the user has no languages defined, or
               Referer if no referring site exists) should be prefixed with '?',
               e.g. "?Referer". P0f will accept their disappearance, but will
               not allow them to appear at any other location.

               NEW SIGNATURES: Review the list and remove any headers that
               appear to be irrelevant to the fingerprinted software, and mark
               transient ones with '?'. Remove header values that do not add
               anything to the signature, or are request- or user-specific.
               In particular, pay attention to Accept, Accept-Language, and
               Accept-Charset, as they are highly specific to request type
               and user settings.

               P0f automatically removes some headers, prefixes others with '?',
               and inhibits the value of fields such as 'Referer' or 'Cookie' -
               but this is not a substitute for manual review.

               NOTE: Server signatures may differ depending on the request
               (HTTP/1.1 versus 1.0, keep-alive versus one-shot, etc.) and on
               the returned resource (e.g., CGI versus static content). Play
               around: browse to several URLs, and also try curl and wget.

  habsent    - comma-separated list of headers that must *not* appear in
               matching traffic. This is particularly useful for noting the
               absence of standard headers (e.g. 'Host'), or for differentiating
               between otherwise very similar signatures.

               NEW SIGNATURES: P0f will automatically highlight the absence of
               any normally present headers; other entries may be added where
               necessary.

  expsw      - expected substring in 'User-Agent' or 'Server'. This is not
               used to match traffic, and merely serves to detect dishonest
               software. If you want to explicitly match User-Agent, you need
               to do this in the 'horder' section, e.g.:

               User-Agent=[Firefox]

Any of these sections except for 'ver' may be blank.
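
The 'horder' and 'habsent' rules can be sketched in C as follows. This is a
simplified, greedy interpretation written for illustration, using hypothetical
types and function names rather than p0f's matcher: each signature header is
searched for after the position of the previous match, so unrelated headers may
sit in between; an optional ('?') header may be missing, but if it shows up
anywhere other than its expected place, the match fails. Header names are
compared case-insensitively, as HTTP header names are; 'ver' and 'expsw' are
not handled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strcasecmp() (POSIX) */

/* A header as seen in traffic, in order of appearance. */
struct http_hdr { const char *name, *value; };

/* One 'horder' entry, e.g. "?Referer" or "User-Agent=[Firefox]". */
struct horder_ent {
  const char *name;       /* header name without the '?' prefix   */
  const char *value_sub;  /* required substring of value, or NULL */
  bool optional;          /* true if prefixed with '?'            */
};

/* Index of the first header named `name` at or after `from`, or -1. */
static int find_hdr(const struct http_hdr *h, size_t nh, size_t from,
                    const char *name) {
  for (size_t i = from; i < nh; i++)
    if (strcasecmp(h[i].name, name) == 0) return (int)i;
  return -1;
}

bool horder_match(const struct horder_ent *sig, size_t ns,
                  const struct http_hdr *h, size_t nh) {
  size_t pos = 0;   /* headers before pos have already been consumed */

  for (size_t s = 0; s < ns; s++) {
    int j = find_hdr(h, nh, pos, sig[s].name);
    if (j < 0) {
      if (!sig[s].optional) return false;
      /* An optional header may vanish, but not appear out of place. */
      if (find_hdr(h, nh, 0, sig[s].name) >= 0) return false;
      continue;
    }
    if (sig[s].value_sub && strstr(h[j].value, sig[s].value_sub) == NULL)
      return false;
    pos = (size_t)j + 1;
  }
  return true;
}

/* 'habsent': none of the listed headers may be present. */
bool habsent_ok(const char *const *absent, size_t na,
                const struct http_hdr *h, size_t nh) {
  for (size_t i = 0; i < na; i++)
    if (find_hdr(h, nh, 0, absent[i]) >= 0) return false;
  return true;
}
```

A request containing Host, User-Agent, Accept, and Connection (in that order)
matches "Host,User-Agent=[Firefox],?Referer,Connection" under this sketch,
provided the User-Agent value contains "Firefox"; swapping the order of Host
and Connection in the signature makes it fail.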

There are many protocol-level quirks that p0f could be detecting - for example,
the use of non-standard newlines, or missing or extra spacing between header
field names and values. There is also some information to be gathered from
responses to OPTIONS or POST. That said, it does not seem to be worth the
effort: the protocol is so verbose, and implemented so arbitrarily, that we are
getting more than enough information just with a simple GET / HEAD fingerprint.

== SMTP signatures ==

   *** NOT IMPLEMENTED YET ***

== FTP signatures ==

   *** NOT IMPLEMENTED YET ***

----------------
6. NAT detection
----------------

In addition to fairly straightforward measurements of intrinsic properties of
a single TCP session, p0f also tries to compare signatures across sessions to
detect client-side connection sharing (NAT, HTTP proxies) or server-side load
balancing.

This is done in two steps: the first significant deviation usually prompts a
"host change" entry (which may also be indicative of multi-boot, address reuse,
or other one-off events); and a persistent pattern of changes prompts an
"ip sharing" notification later on.

All of these messages are accompanied by a set of reason codes:

  os_sig       - the OS detected right now doesn't match the OS detected earlier
                 on.

  sig_diff     - no definite OS detection data available, but protocol-level
                 characteristics have changed drastically (e.g., different
                 TCP option layout).

  app_vs_os    - the application detected running on the host is not supposed
                 to work on the host's operating system.

  x_known      - the signature progressed from known to unknown, or vice versa.

The following additional codes are specific to TCP:

  tstamp       - TCP timestamps went back or jumped forward.

  ttl          - TTL values have changed.

  port         - source port number has decreased.

  mtu          - system MTU has changed.

  fuzzy        - the precision with which a TCP signature is matched has
                 changed.

The following additional codes are issued by the HTTP module:

  via          - data explicitly includes Via / X-Forwarded-For.

  us_vs_os     - OS fingerprint doesn't match User-Agent data, and the
                 User-Agent value otherwise looks honest.

  app_srv_lb   - server application signatures change, suggesting load
                 balancing.

  date         - server-advertised date changes inconsistently.

Different reasons have different weights, balanced to keep p0f very sensitive
even to very homogeneous environments behind NAT. If you end up seeing false
positives or other detection problems in your environment, please let me know!
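
The two-step escalation can be modelled roughly as below. The reason codes are
taken from the lists above, but the weights and the threshold are made-up
placeholders, since this document does not give p0f's actual values. The score
also never decays in this sketch; a long-running tracker would probably want
some form of aging.

```c
#include <stdint.h>

/* The documented reason codes as bit flags (values are illustrative). */
enum nat_reason {
  R_OS_SIG  = 1 << 0, R_SIG_DIFF = 1 << 1,  R_APP_VS_OS  = 1 << 2,
  R_X_KNOWN = 1 << 3, R_TSTAMP   = 1 << 4,  R_TTL        = 1 << 5,
  R_PORT    = 1 << 6, R_MTU      = 1 << 7,  R_FUZZY      = 1 << 8,
  R_VIA     = 1 << 9, R_US_VS_OS = 1 << 10, R_APP_SRV_LB = 1 << 11,
  R_DATE    = 1 << 12
};
#define N_REASONS 13

/* PLACEHOLDER weights (indexed by bit position) and threshold; these are
   NOT p0f's real values. */
static const int reason_weight[N_REASONS] = {
  4, 3, 3, 1,      /* os_sig, sig_diff, app_vs_os, x_known */
  2, 2, 1, 1, 1,   /* tstamp, ttl, port, mtu, fuzzy        */
  4, 3, 3, 2       /* via, us_vs_os, app_srv_lb, date      */
};
#define SHARING_THRESHOLD 6

enum host_state { HOST_STABLE, HOST_CHANGED, IP_SHARING };

struct host_track {
  int score;               /* accumulated weight of observed deviations */
  enum host_state state;
};

/* Folds one observation's reason mask into the host's history. The first
   deviation yields "host change"; later deviations that push the
   accumulated score to the threshold yield "ip sharing". */
enum host_state nat_observe(struct host_track *t, uint32_t reasons) {
  if (reasons == 0) return t->state;   /* nothing changed */

  for (int i = 0; i < N_REASONS; i++)
    if (reasons & (1u << i)) t->score += reason_weight[i];

  if (t->state == HOST_STABLE)
    t->state = HOST_CHANGED;
  else if (t->score >= SHARING_THRESHOLD)
    t->state = IP_SHARING;
  return t->state;
}
```

Requiring at least two deviations before reporting sharing is what keeps a
single one-off event, such as a reboot into another OS, at the "host change"
level in this model.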

-----------
7. Security
-----------

You should treat the output from this tool as advisory; the fingerprinting can
be gamed with some minor effort, and it's also possible to evade it altogether
(e.g. with excessive IP fragmentation or bad TCP checksums). Plan accordingly.

P0f should be reasonably secure to operate as a daemon. That said, un*x
users should employ the -u option to drop privileges and chroot() when running
the tool continuously. This greatly minimizes the consequences of any mishaps -
and mishaps in C just tend to happen.

To make this step meaningful, the user you are running p0f as should be
completely unprivileged, and should have an empty, read-only home directory. For
example, you can do:

# useradd -d /var/empty/p0f -M -r -s /bin/nologin p0f-user
# mkdir -p -m 755 /var/empty/p0f

Please don't put the p0f binary itself, or any other valuable assets, inside
that user's home directory; and certainly do not use any generic locations such
as / or /bin/ in lieu of a proper home.
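
For reference, the generic "chroot, then drop privileges" sequence that -u
implies typically looks like the C sketch below. This is a common POSIX
pattern, not p0f's own code, and drop_privs() is a hypothetical name. It must
run as root, and the order matters: chroot() and setgid() both require root, so
setuid() has to come last. setgroups() is not part of POSIX, but is available
on Linux and the BSDs.

```c
#define _DEFAULT_SOURCE   /* exposes chroot() and setgroups() on glibc */
#include <grp.h>
#include <pwd.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: confine the process to the home directory of
   `user` and switch to that user's uid/gid. Must be called as root.
   Returns 0 on success, -1 on failure (the caller should then exit). */
int drop_privs(const char *user) {
  struct passwd *pw = getpwnam(user);
  if (pw == NULL) {
    fprintf(stderr, "unknown user: %s\n", user);
    return -1;
  }
  uid_t uid = pw->pw_uid;
  gid_t gid = pw->pw_gid;

  /* Jail first: chroot() needs root, and chdir("/") makes sure the
     working directory lies inside the new root. */
  if (chroot(pw->pw_dir) != 0 || chdir("/") != 0) {
    perror("chroot");
    return -1;
  }

  /* Drop supplementary groups and the group ID while still root, then
     the user ID last; after setuid() there is no way back. */
  if (setgroups(0, NULL) != 0 || setgid(gid) != 0 || setuid(uid) != 0) {
    perror("dropping privileges");
    return -1;
  }

  /* Paranoia check: regaining root must now fail. */
  if (uid != 0 && setuid(0) == 0) {
    fprintf(stderr, "privileges were not dropped\n");
    return -1;
  }
  return 0;
}
```

This is also why the home directory should be empty and read-only: after the
chroot() it becomes the process's entire view of the filesystem.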

P0f running in the background should be fairly difficult to DoS, especially
compared to any real TCP services it will be watching. Nevertheless, there are
so many deployment-specific factors at play that you should always preemptively
stress-test your setup, and see how it behaves.

Other than that, let's talk filesystem security. When using the tool in the
API mode (-s), the listening socket is always re-created with 666
permissions, so that applications running as other uids can query it at will.
If you want to preserve the privacy of captured traffic in a multi-user system,
please ensure that the socket is created in a directory with finer-grained
permissions; or change API_MODE in config.h.

The default file mode for binary log data (-o) is 600, on the assumption that
others probably don't need access to historical data; if you need to share logs,
you can pre-create the file or change LOG_MODE in config.h.
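
The mode behaviour described above follows from ordinary POSIX semantics, which
the sketch below tries to show. SKETCH_LOG_MODE and SKETCH_API_MODE are
stand-ins for LOG_MODE and API_MODE from config.h, and both functions are
hypothetical rather than p0f's code. Two points are worth noting: open() with
O_CREAT applies the mode (minus the umask) only when it actually creates the
file, so a pre-created log keeps whatever permissions you gave it; and the
socket's own 0666 mode is only part of the story, because a client also needs
search permission on every directory leading to it.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/un.h>
#include <unistd.h>

/* Stand-ins for LOG_MODE and API_MODE in config.h. */
#define SKETCH_LOG_MODE 0600
#define SKETCH_API_MODE 0666

/* Opens a log file for appending. The mode argument only applies if
   O_CREAT creates the file; an existing file keeps its permissions. */
int open_log(const char *path) {
  return open(path, O_WRONLY | O_APPEND | O_CREAT, SKETCH_LOG_MODE);
}

/* Creates a listening Unix-domain socket at `path`, replacing any stale
   socket file, and relaxes its mode so that other uids can connect.
   Access is still limited by the permissions of enclosing directories. */
int make_api_socket(const char *path) {
  struct sockaddr_un addr;
  memset(&addr, 0, sizeof(addr));
  addr.sun_family = AF_UNIX;
  if (strlen(path) >= sizeof(addr.sun_path)) return -1;
  strcpy(addr.sun_path, path);

  int fd = socket(AF_UNIX, SOCK_STREAM, 0);
  if (fd < 0) return -1;

  unlink(path);   /* re-create: remove a leftover socket file, if any */
  if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) != 0 ||
      chmod(path, SKETCH_API_MODE) != 0 ||
      listen(fd, 16) != 0) {
    close(fd);
    return -1;
  }
  return fd;
}
```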

Don't build p0f, and do not store its source, binary, configuration files, logs,
or query sockets in world-writable locations such as /tmp (or any
subdirectories created therein).

Last but not least, please do not attempt to make p0f setuid, or otherwise
grant it privileges higher than those of the calling user. Neither the tool
itself, nor the third-party components it depends on, are designed to keep rogue
less-privileged callers at bay. If you use /etc/sudoers to list p0f as the only
program that user X should be able to run as root, that user will probably be
able to compromise your system. The same goes for many other uses of sudo, by
the way.

--------------
8. Limitations
--------------

Here are some of the known issues you may run into:

== General ==

1) RST, ACK, and other experimental fingerprinting modes offered in p0f v2 are
   no longer supported in v3. This is because they proved to have very low
   specificity. The consequence is that you can no longer fingerprint
   "connection refused" responses.

2) API queries or daemon execution are not supported when reading offline pcaps.
   While there may be some fringe use cases for that, offline pcaps use a
   much simpler event loop, and so supporting these features would require some
   extra effort.

3) P0f needs to observe qualifying traffic spanning at least 25 milliseconds or
   so to estimate system uptime. This means that if you're testing it over
   loopback or LAN, you may need to let it see more than one connection.

   Systems with extremely slow timestamp clocks may need longer acquisition
   periods (up to several seconds); very fast clocks (over 1.5 kHz) are rejected
   completely on account of being prohibited by the RFC. Almost all OSes are
   between 100 Hz and 1 kHz, which should work fine.
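
As a rough illustration of why a minimum observation window and a frequency
ceiling matter, the sketch below estimates the remote timestamp clock rate from
two TSval samples and derives an uptime from it. It assumes the counter started
near zero at boot, which not every TCP stack guarantees, and the 25 ms / 1.5 kHz
limits are simply the figures quoted above; the struct and function names are
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MIN_SPAN_MS 25       /* minimum observation window quoted above */
#define MAX_HZ      1500.0   /* clocks faster than ~1.5 kHz are rejected */

/* A TCP timestamp (TSval) seen from one host, plus the local time (ms)
   at which the packet was captured. */
struct ts_sample { uint32_t tsval; uint64_t seen_ms; };

/* Estimates the remote clock frequency (Hz) and uptime (s) from two
   samples, b taken after a. Returns false if the data is insufficient
   or implausible. Assumes the counter started at zero at boot. */
bool estimate_uptime(struct ts_sample a, struct ts_sample b,
                     double *hz, double *uptime_s) {
  if (b.seen_ms <= a.seen_ms) return false;
  uint64_t span_ms = b.seen_ms - a.seen_ms;
  if (span_ms < MIN_SPAN_MS) return false;      /* window too short */

  uint32_t ticks = b.tsval - a.tsval;           /* wraps modulo 2^32 */
  if (ticks == 0) return false;

  double f = ticks * 1000.0 / (double)span_ms;
  if (f > MAX_HZ) return false;                 /* implausibly fast */

  *hz = f;
  *uptime_s = b.tsval / f;
  return true;
}
```

For example, 100 ticks over 100 ms gives a 1 kHz clock, and a TSval of 100100
then corresponds to roughly 100 seconds of uptime. Over a shorter window, a
single tick of rounding error would distort the frequency estimate noticeably.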

4) Some systems vary SYN+ACK responses based on the contents of the initial SYN,
   sometimes removing TCP options not supported by the other endpoint.
   Unfortunately, there is no easy way to account for this, so several SYN+ACK
   signatures may be required per system. The bundled p0f-sendsyn utility helps
   with collecting them.

   Another consequence of this is that you will sometimes see server uptime only
   if your own system has RFC1323 timestamps enabled. Linux has done so since
   version 2.2; on Windows, you need version 7 or newer. Client uptimes are not
   affected.

== Windows port ==

1) API sockets do not work on Windows. This is due to a limitation of winpcap;
   see live_event_loop(...) in p0f.c for more info.

2) The chroot() jail (-u) on Windows doesn't offer any real security. This is
   due to the limitations of cygwin.

3) The p0f-sendsyn utility doesn't work because of the limited capabilities of
   Windows raw sockets (this should be relatively easy to fix if there are any
   users who care).

---------------------------
9. Acknowledgments and more
---------------------------

P0f is made possible thanks to the contributions of several good souls,
including:

  Phil Ames
  Jannich Brendle
  Matthew Dempsky
  Jason DePriest
  Dalibor Dukic
  Mark Martinec
  Damien Miller
  Josh Newton
  Nibbler
  Bernhard Rabe
  Chris John Riley
  Sebastian Roschke
  Peter Valchev
  Jeff Weisberg
  Anthony Howe
  Tomoyuki Murakami
  Michael Petch

If you wish to help, the most immediate way to do so is to simply gather new
signatures, especially from less popular or older platforms (servers, networking
equipment, portable / embedded / specialty OSes, etc.).

Problems? Suggestions? Complaints? Compliments? You can reach the author at
<lcamtuf@coredump.cx>. The author is very lonely and appreciates your mail.