Understanding the Packet Filter (PF) Firewall

Jephe Wu - http://linuxtechres.blogspot.com

Environment: OpenBSD 4.8, FreeBSD 7.1
Objective: understanding how the PF firewall works, along with various important rules and parameters


Concepts:


PF is enabled by default on OpenBSD 4.6 and newer releases. In OpenBSD 4.1 and later, the keep state option became the implicit default for all filter rules

1. block in log all  # default deny policy

a. The rule above has no 'quick' option, so PF keeps evaluating the remaining rules in /etc/pf.conf until it reaches the end of the ruleset or a matching rule that has the 'quick' option. If the traffic doesn't match any of the remaining rules, this block rule is the last match, so the result is 'block'.
# If the rule above were 'block in log quick all', it would block all incoming traffic immediately, because 'quick' stops PF from looking at any further rules.
b. By default, PF passes traffic unless it is blocked by the default policy or by a specific rule.
c. The 'log' option records matching packets on the pflog interface. If you run 'tcpdump -n -e -ttt -i pflog0', you will see the packets matched by this block rule.

The following is quoted from the PF FAQ page at http://www.openbsd.org/faq/pf/filter.html:
Each packet is evaluated against the filter ruleset from top to bottom. By default, the packet is marked for passage, which can be changed by any rule, and could be changed back and forth several times before the end of the filter rules. The last matching rule "wins". There is an exception to this: The quick option on a filtering rule has the effect of canceling any further rule processing and causes the specified action to be taken.
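
To make the "last matching rule wins" behaviour concrete, here is a small Python sketch. It is only a toy model: the tuple layout and the `evaluate` function are my own illustration, not PF's real data structures.

```python
# Toy model of PF rule evaluation: the last matching rule wins,
# unless a matching rule carries 'quick', which stops evaluation.
# Each rule is (action, quick, predicate); this layout is hypothetical.

def evaluate(rules, packet):
    """Return the action a PF-style evaluation would take for packet."""
    action = "pass"  # by default, the packet is marked for passage
    for rule_action, quick, matches in rules:
        if matches(packet):
            action = rule_action
            if quick:
                break  # 'quick' cancels any further rule processing
    return action

rules = [
    # block in log all
    ("block", False, lambda p: p["dir"] == "in"),
    # pass in proto tcp from any to any port ssh
    ("pass", False, lambda p: p["dir"] == "in" and p["port"] == 22),
]

# Same ruleset, but the first rule becomes 'block in log quick all'
quick_rules = [("block", True, lambda p: p["dir"] == "in")] + rules[1:]

print(evaluate(rules, {"dir": "in", "port": 22}))        # pass
print(evaluate(rules, {"dir": "in", "port": 23}))        # block
print(evaluate(quick_rules, {"dir": "in", "port": 22}))  # block
```

With the non-quick block rule, ssh is still let in by the later pass rule. With 'quick' on the block rule, the pass rule is never reached.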

2. Keep state and modulate state
a. In OpenBSD 4.1 and later, the keep state option became the implicit default for all filter rules. You can use 'pfctl -sr' to check the running rules for this option.
According to the PF FAQ page, by storing information about each connection in a state table, PF is able to quickly determine if a packet passing through the firewall belongs to an already established connection. If it does, it is passed through the firewall without going through ruleset evaluation.

b. When a rule creates state, the first packet matching the rule creates a "state" between the sender and receiver. Now, not only do packets going from the sender to receiver match the state entry and bypass ruleset evaluation, but so do the reply packets from receiver to sender.
    pass out on fxp0 proto tcp from any to any

This rule allows any outbound TCP traffic on the fxp0 interface and also permits the reply traffic to pass back through the firewall. Keeping state significantly improves the performance of your firewall as state lookups are dramatically faster than running a packet through the filter rules.

c. The modulate state option works just like keep state except that it only applies to TCP packets. With modulate state, the Initial Sequence Number (ISN) of outgoing connections is randomized. The modulate state option can be used in rules that specify protocols other than TCP; in those cases, it is treated as keep state.
Keep state on outgoing TCP, UDP, and ICMP packets and modulate TCP ISNs:
pass out on fxp0 proto { tcp, udp, icmp } from any to any modulate state
   

Another advantage of keeping state is that corresponding ICMP traffic will be passed through the firewall.

d. keep state for UDP
UDP itself has no notion of a connection, so PF simply keeps track of how long it has been since a matching packet went through. If the timeout is reached, the state is cleared.
       
e. TCP SYN Proxy
According to the PF FAQ page at http://www.openbsd.org/faq/pf/filter.html:
Normally when a client initiates a TCP connection to a server, PF will pass the handshake packets between the two endpoints as they arrive. PF has the ability, however, to proxy the handshake. With the handshake proxied, PF itself will complete the handshake with the client, initiate a handshake with the server, and then pass packets between the two. The benefit of this process is that no packets are sent to the server before the client completes the handshake. This eliminates the threat of spoofed TCP SYN floods affecting the server because a spoofed client connection will be unable to complete the handshake.

The TCP SYN proxy is enabled using the synproxy state keywords in filter rules. Example:

    pass in on $ext_if proto tcp to $web_server port www synproxy state

Here, connections to the web server will be TCP proxied by PF.

Because of the way synproxy state works, it also includes the same functionality as keep state and modulate state.

The SYN proxy will not work if PF is running on a bridge(4).
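
To illustrate the state table idea from item 2 (the first packet that matches a rule creates state, and later packets in either direction skip the ruleset), here is a toy Python model. The `Firewall` class and `state_key` helper are hypothetical names for this sketch. Real PF state entries carry much more (direction, sequence tracking, timeouts).

```python
# Toy model of stateful filtering: a state created by the first packet
# lets later packets, including replies, bypass ruleset evaluation.

def state_key(src, sport, dst, dport, proto):
    """Direction-independent key, so replies match the same state."""
    return (proto,) + tuple(sorted([(src, sport), (dst, dport)]))

class Firewall:
    def __init__(self, ruleset):
        self.ruleset = ruleset       # callable(packet) -> True to pass
        self.states = set()
        self.rule_evaluations = 0    # how often the slow path was taken

    def handle(self, pkt):
        key = state_key(*pkt)
        if key in self.states:
            return True              # state hit: no ruleset evaluation
        self.rule_evaluations += 1
        if self.ruleset(pkt):
            self.states.add(key)     # first matching packet creates state
            return True
        return False

# Assumption for the example: only traffic sourced from 10.x is allowed.
fw = Firewall(lambda pkt: pkt[0].startswith("10."))
outbound = ("10.0.0.1", 56752, "93.184.216.34", 80, "tcp")
reply = ("93.184.216.34", 80, "10.0.0.1", 56752, "tcp")
unsolicited = ("93.184.216.34", 80, "10.0.0.1", 40000, "tcp")

print(fw.handle(outbound), fw.handle(reply), fw.handle(unsolicited))
```

The reply would fail the ruleset on its own, but it is passed because it matches the state created by the outbound packet.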

3. Flags S/SA (default for tcp)

a. To have PF inspect the TCP flags during evaluation of a rule, the flags keyword is used with the following syntax:

    flags check/mask
    flags any
   
The mask part tells PF to only inspect the specified flags and the check part specifies which flag(s) must be "on" in the header for a match to occur. Using the any keyword allows any combination of flags to be set in the header.

    pass in on fxp0 proto tcp from any to any port ssh flags S/SA
    pass in on fxp0 proto tcp from any to any port ssh

As flags S/SA is set by default, the above rules are equivalent. Each of these rules passes TCP traffic with the SYN flag set while only looking at the SYN and ACK flags. A packet with the SYN and ECE flags would match the above rules, while a packet with SYN and ACK or just ACK would not.
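
The check/mask test boils down to one bitwise comparison. The Python sketch below is my own illustration of that logic (the function name `flags_match` is made up). The bit values are the standard TCP flag bits.

```python
# TCP flag bits as they appear in the TCP header flags byte.
FIN, SYN, RST, PSH, ACK, URG, ECE, CWR = (1 << i for i in range(8))

def flags_match(header_flags, check, mask):
    """True if, among the flags in mask, exactly the check flags are set."""
    return (header_flags & mask) == check

# flags S/SA: look only at SYN and ACK, and require SYN on, ACK off.
check, mask = SYN, SYN | ACK

print(flags_match(SYN, check, mask))        # True
print(flags_match(SYN | ECE, check, mask))  # True, ECE is outside the mask
print(flags_match(SYN | ACK, check, mask))  # False
print(flags_match(ACK, check, mask))        # False
```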

4. block drop in or block return in

a. By default, block uses 'drop'. You can specify 'return' instead:
        * drop - packet is silently dropped.
        * return - a TCP RST packet is returned for blocked TCP packets and an ICMP Unreachable packet is returned for all others.
For example:
$ tcptraceroute -n 10.0.5.226
traceroute to 10.0.5.226 on TCP port 80 (http), 30 hops max
1 10.0.5.226[closed] 0.344ms 0.307ms 0.287ms

The output above comes from the following rule. Without 'return', tcptraceroute would print 30 rows of asterisks instead.
block return in log quick on $int_if proto tcp from 10.0.10.0/24 to any port 80

5. PF Firewall Redundancy with CARP and pfsync
http://www.openbsd.org/faq/pf/carp.html

6. PF options and best practice example
(http://www.openbsd.org/faq/pf/example1.html)
a.  set block-policy return
    set loginterface fxp0 #turn statistics logging "on" for the external interface
    set skip on lo  or set skip on {lo enc0}  # ipencap communication goes through enc0 interface.
    scrub in all  # reassembles fragmented IP packets
    scrub out all

b.  block in log # set up a default deny policy
    block in quick from urpf-failed # activate spoofing protection for all interfaces

c.  pass out quick modulate state # pass tcp, udp, and icmp out; tcp connections will be modulated, udp/icmp will be tracked statefully. We opt to filter the inbound traffic only, so outbound packets avoid further checks, which improves performance.
d.  antispoof quick for { lo $int_if }  # it is good to use the spoofed address protection
e.  Now open the ports used by those network services that will be available to the Internet. First, the traffic that is destined to the firewall itself:

    pass in on egress inet proto tcp from any to (egress) \
        port $tcp_services

Specifying the network ports in the macro $tcp_services makes it simple to open additional services to the Internet by simply editing the macro and reloading the ruleset. UDP services can also be opened up by creating a $udp_services macro and adding a filter rule, similar to the one above, that specifies proto udp.

f. The next rule catches any attempts by someone on the Internet to connect to TCP port 80 on the firewall. Legitimate attempts to access this port will be from users trying to access the network's web server. These connection attempts need to be redirected to COMP3:

    pass in on egress inet proto tcp to (egress) port 80 \
        rdr-to $comp3 synproxy state

For an added bit of safety, we'll make use of the TCP SYN Proxy to further protect the web server.

g. ICMP traffic needs to be passed:

    pass in inet proto icmp all icmp-type $icmp_types

Similar to the $tcp_services macro, the $icmp_types macro can easily be edited to change the types of ICMP packets that will be allowed to reach the firewall. Note that this rule applies to all network interfaces.

For example:

    pass in on $ext inet proto icmp all icmp-type { echorep, timex, unreach }
    pass in on $ext inet proto icmp all icmp-type unreach code 4 # allow at least this one; path MTU discovery depends on it
    pass in on $ext inet proto tcp from any to any port { = smtp, = http, = https, = ssh }


h. Now traffic must be passed to and from the internal network. We'll assume that the users on the internal network know what they are doing and aren't going to be causing trouble. This is not necessarily a valid assumption; a much more restrictive ruleset would be appropriate for many environments.

    pass in on $int_if

TCP, UDP, and ICMP traffic is permitted to exit the firewall towards the Internet due to the earlier "pass out" line. State information is kept so that the returning packets will be passed back in through the firewall.

i. using synproxy
pass in quick on $ext_if proto tcp to {10.0.4.7,10.0.4.8} port {80,443} synproxy state
pass in quick on $int_if proto tcp from {10.0.0.1,10.0.0.200} to $int_if port ssh synproxy state


7. packet filter rules for OpenBSD VPN and ipsec protocol
The following shows how the OpenBSD VPN rules match the VPN traffic. fxp1 is the internal NIC, fxp0 faces the Internet, and enc0 is the VPN virtual NIC.
# tcpdump -n -e -ttt -i pflog0
tcpdump: listening on pflog0
rule 57/0(match): pass in on fxp1: 10.0.0.1.56752 > 10.204.0.8.www: S 1328159562:1328159562(0) win 0 [ttl 1]
rule 54/0(match): pass out on enc0: 10.0.0.1.56752 > 10.204.0.8.www: S 1328159562:1328159562(0) win 0 [ttl 1]
rule 47/0(match): pass out on fxp0: esp 1.2.3.4 > 5.6.7.8 spi 0x945D0A23 seq 1 len 76


IPsec uses UDP port 500 (isakmp) for key exchange and UDP port 4500 (ipsec-nat-t) for NAT traversal, as well as the ESP protocol, on the Internet-facing NIC.

In addition, IPENCAP communication goes through the enc0 interface, so either pass it explicitly or skip filtering on enc0:
pass log quick on enc0 proto ipencap all keep state
or
set skip on {lo enc0}



8. pfctl usage

a. enable and disable
pfctl -sa # show status; if PF is disabled, then
pfctl -e # enable it
pfctl -d # disable it

# Note: 'pfctl -e' only enables PF; it doesn't actually load a ruleset. The ruleset must be loaded separately, e.g. with 'pfctl -f /etc/pf.conf'.


b. check syntax of /etc/pf.conf
# pfctl -nf /etc/pf.conf

c. list rules and states etc
# pfctl -sr   # list the rules currently in memory; the first row is rule 0, the second row is rule 1, and so on

Note: 'pfctl -sr' lists the rules as they are actually loaded in memory, so you may see implicit defaults such as 'keep state' added to tcp rules. For example:

# grep 22 /etc/pf.conf
pass in log quick proto tcp from any to any port 22
# pfctl -sr
block drop in log all
pass in log quick proto tcp from any to any port = ssh flags S/SA keep state
# The pfctl -sr output above expands the port 22 pass rule with 'flags S/SA keep state'.

# pfctl -ss                 Show the current state table
# pfctl -si                 Show filter stats and counters
# pfctl -sa                 Show EVERYTHING it can show

# pfctl -sa | grep tcp.established
tcp.established  86400s (note: 24 hours)

d. tcpdump for troubleshooting

# tcpdump -n -e -ttt -i pflog0  # monitor in real time which rule passes or blocks traffic, provided the log option is enabled for that rule

9. Example /etc/pf.conf

ext_if="fxp1"
int_if="fxp0"

set block-policy return
set loginterface $ext_if

set skip on lo

# scrub incoming packets (e.g. a packet cannot have both SYN and FIN set)
scrub in all

#assume 1.2.3.4 is our external IP for web servers
rdr pass on $ext_if proto tcp from any to 1.2.3.4/32 port {80,443} -> 10.0.0.1

# nat pass rule
nat pass on $ext_if proto icmp from 10.0.0.1 to any -> 1.2.3.4
nat pass on $ext_if from any to any port {53,25} -> 1.2.3.4

# setup a default deny policy
block in all

# activate spoofing protection for all interfaces
block in quick from urpf-failed

# pass tcp, udp, and icmp out on the external (Internet) interface.
# tcp connections will be modulated, udp/icmp will be tracked statefully
pass out modulate state

antispoof quick for { lo $int_if }

# for path mtu discovery
pass in quick on $ext_if inet proto icmp all icmp-type { echorep, timex, unreach }

# for dns server sitting on DMZ to serve internet, as well as web server
pass in quick on $ext_if proto udp to 10.0.0.2 port 53 keep state
pass in quick on $ext_if proto tcp to {10.0.0.1} port {80,443} synproxy state

# for internal admin pc to ssh into firewall
pass in quick on $int_if proto tcp from {10.0.0.20,10.0.0.21} to $int_if port ssh synproxy state

Some DB2 database FAQs

Jephe Wu - http://linuxtechres.blogspot.com

 1. Install IBM DB2 Client Version 9 on Windows 7 Professional 32bit


Problem: after configuring remote database profiles, the connection works in Control Center, but not in Command Editor.

Also, when you issue the command 'db2' at the db2 command prompt, it doesn't show anything.


Solution: go to Control Panel, User Account and Family Safety, User Accounts, Change user account control settings, and set it to 'Never notify'.

Note: If you enabled db2 operating system security, which means db2 installation created db2admins and db2users groups, you have to put the Windows logon user name into the corresponding groups before using db2 client.

2. db2advis usage
db2advis -d dbname -q schema_name -n schema_name -i input_file
Reference: http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.db2.udb.doc/core/r0002452.htm

3. SQL0575N - View or materialized query table name cannot be used because it has been marked inoperative.
If name is a view, recreate the view by issuing a CREATE VIEW statement using the same view definition as the inoperative view. (see http://publib.boulder.ibm.com/infocenter/db2luw/v9/index.jsp?topic=/com.ibm.db2.udb.msg.doc/doc/sql0575.htm )

Note: to check whether there are any other inoperative views in schemas whose names start with NC (VALID is 'Y' for a valid view):

db2 "select viewschema,viewname,valid from syscat.views where viewschema like 'NC%' and valid <> 'Y'"
db2 "describe table syscat.views"
db2 "select viewschema,viewname,valid,text from syscat.views where viewname='NAMEOFVIEW' and viewschema = 'JEPHE'"

4. reason code 7 
You need to reorg the table before and after altering it.

5. DB2 version 9 comparison
http://www.slideshare.net/deepblue5479/a-comparison-reviewofdb29releases

Understanding TCP/IP headers and options

Jephe Wu - http://linuxtechres.blogspot.com

1. TCP and IP headers  


The standard TCP header is 20 bytes and the standard IP header is 20 bytes, for a total of 40 bytes.


ICMP has an 8-byte header.




IP header:
a. An IP packet is composed of the IP header (with or without options) and the data.
b. The length of the IP header must be a multiple of 4 bytes. The minimum IP header length is 20 bytes, in which case the header length field holds hex 5 (meaning the IP header is five 4-byte chunks).


TCP header:
a. the TCP header may also contain options. And like the IP header, the TCP header's length must be a multiple of 4 bytes
b. TCP options change the length of the TCP header, and that length is recorded in the header itself, e.g. a value of 7 means that the TCP header is made of seven 4-byte chunks, or a total of 28 bytes: 20 bytes of standard TCP header and 8 bytes of TCP options.

c. the maximum value that can be set is hex F, which is decimal 15. That means that fifteen 4-byte chunks, or 60 bytes, is the maximum length for the entire TCP header, including TCP options.
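
Both header length fields are 4-bit counts of 4-byte chunks, so decoding them is a shift and a multiply. This is a minimal Python sketch. The function names are mine, and the inputs are assumed to be the raw bytes that carry those fields (the first byte of the IP header, and byte 12 of the TCP header).

```python
def ipv4_header_len(first_byte):
    """First byte of an IPv4 header: high nibble = version, low nibble = IHL."""
    version = first_byte >> 4
    ihl = first_byte & 0x0F
    return version, ihl * 4          # header length in bytes

def tcp_header_len(offset_byte):
    """Byte 12 of a TCP header: high nibble = header length in 4-byte chunks."""
    return (offset_byte >> 4) * 4

print(ipv4_header_len(0x45))  # (4, 20): IPv4, no IP options
print(tcp_header_len(0x50))   # 20: no TCP options
print(tcp_header_len(0x70))   # 28: 20 bytes of header + 8 bytes of options
print(tcp_header_len(0xF0))   # 60: the maximum
```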

TCP encapsulation:
1. the data package at the Application Layer is called a message, while the same data package at the Internet Layer is called a datagram.
2. Transport Layer may have one of two names - a segment or a datagram. If the TCP protocol is being used, it is called a segment. If the UDP protocol is being used, it is called a Datagram.
3. At the Network Access Layer, the data package is called a frame.

IP fragmentation and assembly
An application message is typically split into a consecutive sequence of TCP segments, all of which except the last are of size MSS.
http://www.cisco.com/en/US/tech/tk827/tk369/technologies_white_paper09186a00800d6979.shtml

2. Maximum Segment Size (MSS)[SYN or SYN/ACK]
some concepts:
a. Every host is required to be able to handle an MSS of at least 536 bytes.
b. For most computer users, the MSS option is set automatically by the operating system on the SYN packet during the TCP handshake. The MSS can be used completely independently in each direction of data flow.
c. When IPv4 is used as the network protocol, the MSS is calculated as the maximum size of an IPv4 datagram minus 40 bytes. When IPv6 is used as the network protocol, the MSS is calculated as the maximum packet size minus 60 bytes. An MSS of 65535 should be interpreted as infinity.
d. The MSS is only advertised in the SYN and SYN/ACK packets of the TCP three-way handshake, so the MSS option should only be seen in those packets. The MSS value is not negotiated between hosts. The sending host is required to limit the size of data in a single TCP segment to a value less than or equal to the MSS reported by the receiving host.
e. When a connection is established, each end therefore limits what it sends to the value advertised by the other end (this is sometimes described as the two ends agreeing to use the smaller value). Because headers are typically 40 bytes, MSS is usually 40 less than the MTU (Maximum Transmission Unit).
f. The way MSS now works is that each host will first compare its outgoing interface MTU with its own buffer and choose the lowest value as the MSS to send. The hosts will then compare the MSS size received against their own interface MTU and again choose the lower of the two values.
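
The arithmetic in the list above can be sketched in a few lines of Python. The function names are hypothetical. The header sizes are the standard option-free ones (20-byte IPv4, 40-byte IPv6, 20-byte TCP).

```python
IPV4_HEADER = 20
IPV6_HEADER = 40
TCP_HEADER = 20

def advertised_mss(mtu, ipv6=False):
    """MSS a host would advertise for an interface with the given MTU."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mtu - ip_header - TCP_HEADER

def send_limit(peer_mss, own_mtu, ipv6=False):
    """Largest segment payload to send: the lower of the peer's advertised
    MSS and what the local interface MTU allows."""
    return min(peer_mss, advertised_mss(own_mtu, ipv6))

print(advertised_mss(1500))             # 1460 on Ethernet with IPv4
print(advertised_mss(1500, ipv6=True))  # 1440 with IPv6
print(send_limit(1460, 1400))           # 1360, limited by the local MTU
print(send_limit(536, 1500))            # 536, limited by the peer
```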


3. TCP timestamp [present throughout a TCP session if both ends use it]
a. Timestamps serve two main purposes:
– To allow for more accurate RTT calculations
– For Protection Against Wrapped Sequence numbers (PAWS)

According to the Microsoft KB article at http://support.microsoft.com/kb/224829:
The TCP sequence number field is limited to 32 bits, which limits the number of sequence numbers available. With high capacity networks and a large data transfer, it is possible to wrap sequence numbers before a packet traverses the network. If sending data on a 1 gigabit per second (Gbps) network, the sequence numbers could wrap in as little as 34 seconds. If a packet is delayed, a different packet could potentially exist with the same sequence number. To avoid confusion in the event of duplicate sequence numbers, the TCP timestamp is used as an extension to the sequence number. Packets have current and progressing time stamps. An old packet has an older time stamp and is discarded.

b. All popular operating systems implement timestamps, although Windows does not enable them by default.
c. Timestamps add 12 bytes to the TCP header of each packet, reducing the space available for useful data. A typical packet has a payload of 1448 bytes, plus a 20-byte TCP header, a 10-byte TCP timestamp option, a 2-byte padding field and a 20-byte IPv4 header, for a total of 1500 bytes.
d. The Timestamp option has two timestamp fields: the Timestamp Value and the Timestamp Echo Reply. When a host first timestamps a packet in a connection, it puts the timestamp in the Timestamp Value field and leaves the Timestamp Echo Reply field set to 0 (generally). When the second host receives that packet and prepares to acknowledge it, it transfers the timestamp from the old packet's Timestamp Value field to the new packet's Timestamp Echo Reply field, and it puts a new timestamp in the Timestamp Value field. So the Timestamp Value field always contains the latest timestamp, while the Timestamp Echo Reply field contains the previous timestamp.
e. Why is the MSS still advertised as 1460, rather than 1448 (1460-12), when TCP timestamps are used, given that the timestamp option takes 12 bytes?
MSS is not a negotiated value, it is an advertisement. It means "do not send me a TCP segment larger than this". It excludes the TCP and IP headers.

MSS is advertised only in the SYN or SYN/ACK packet. When a host sends its SYN, it cannot know yet whether timestamps will be used, because timestamps can only be used if both ends of the connection agree to use them. So the MSS advertised in the SYN or SYN/ACK packet cannot be adjusted for TCP options that are used later. Moreover, the size of the TCP header may change from packet to packet due to SACK and timestamps.

So the MSS advertised in the SYN is not the segment size that is eventually used, which is sometimes called the "effective MSS". Unlike the MSS and WSCALE options, Timestamp options are typically used throughout a TCP session, so if they are being used, you should see them in most of the packets.

Summary:
•12 byte option –drops MSS of 1460 to 1448
•Allows better Round Trip Time Measurement (RTTM)
•Protection Against Wrapped Sequence numbers (PAWS)
–Half of the 32 bit sequence space (2^31 bytes) is used up in 17 sec at 1 Gbps; the full 2^32 bytes take about 34 sec
–TCP assumes a Maximum Segment Lifetime (MSL) of 120 seconds
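
The numbers in this summary are easy to re-derive. The short Python sketch below does so, assuming a 1 Gbps link means 10^9 bits per second. The function names are made up for the example.

```python
def payload_with_timestamps(mss, ts_option=10, padding=2):
    """Data bytes left per segment once the timestamp option is in use."""
    return mss - (ts_option + padding)

def seconds_to_send(num_bytes, bits_per_second):
    """Time needed to push num_bytes through a link of the given speed."""
    return num_bytes * 8 / bits_per_second

print(payload_with_timestamps(1460))             # 1448
print(round(seconds_to_send(2**32, 1e9), 1))     # 34.4, full sequence space
print(round(seconds_to_send(2**31, 1e9), 1))     # 17.2, half the space
```

Both wrap times are far below the 120-second MSL, which is why a delayed old segment can collide with new data unless timestamps are used to tell them apart.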

4. Window Scale (WSCALE) [SYN or SYN/ACK during handshake]
The Window Scale value is used to shift the window size field's value up to a maximum value of approximately a gigabyte. Like the MSS option, the WSCALE option should only appear in SYN and SYN/ACK packets during the handshake. However, unless both ends of the connection use the WSCALE option, the window sizes will remain unchanged.

For more efficient use of high bandwidth networks, a larger TCP window size may be used. The TCP window size field controls the flow of data and is limited to 2 bytes, or a window size of 65,535 bytes.

Since the size field cannot be expanded, a scaling factor is used. TCP window scale is an option used to increase the maximum window size from 65,535 bytes to 1 Gigabyte.

The window scale value represents the number of bits to left-shift the 16-bit window size field. The window scale value can be set from 0 (no shift) to 14.

To calculate the true window size, multiply the window size by 2^S where S is the scale value.
For Example:
If the window size is 65,535 bytes with a window scale factor of 3:
True window size = 65535*2^3
True window size = 524280

If the window scale is 14, the true window size will be 65535*2^14, which is roughly 2^16*2^14 = 2^30 = 1G (for comparison, 2^32 = 4G).
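
The same calculation as a left shift, in Python (a minimal sketch; `true_window` is a name chosen for this example):

```python
def true_window(window_field, scale):
    """Scale the 16-bit window field by 2^scale (scale must be 0..14)."""
    if not 0 <= scale <= 14:
        raise ValueError("window scale must be between 0 and 14")
    return window_field << scale

print(true_window(65535, 0))   # 65535, no scaling
print(true_window(65535, 3))   # 524280
print(true_window(65535, 14))  # 1073725440, just under 2^30
```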

5. No Operation (NOP):
NOP is used to provide padding around other options. The length of the TCP header must be a multiple of 4 bytes; however, most of the options are not 4 bytes long. If the total length of the options is not a 4-byte multiple, one or more 1-byte NOPs will be added to the options in order to adjust the overall length. For example, if there were 6 bytes of options, two NOPs would be added. NOPs are sometimes used between options, particularly if an option needs to start on a certain byte boundary, so it is not unusual to see several NOPs throughout a set of TCP options.
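
The padding rule is just "round up to the next multiple of 4". This is a one-function Python sketch (the name `nops_needed` is mine):

```python
def nops_needed(option_bytes):
    """Number of 1-byte NOPs needed to pad options to a 4-byte multiple."""
    return (-option_bytes) % 4

print(nops_needed(6))   # 2, the example from the text
print(nops_needed(10))  # 2, e.g. the 10-byte timestamp option becomes 12
print(nops_needed(8))   # 0, already aligned
```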

6. Selective Acknowledgments (SACK): [2 parts, SackOK in SYN and SYN/ACK, Sack data in established connection]
This technique allows the data receiver to inform the sender about all segments that have arrived successfully, so the sender need retransmit only the segments that have actually been lost. This extension uses two TCP options. The first is an enabling option, SACK permitted, which may be sent in a SYN segment to indicate that the SACK option can be used once the connection is established. The other is the SACK option itself, which may be sent over an established connection once permission has been given.

Normally, when a host acknowledges data, it can only acknowledge the packets up to and including the sequence number immediately before a missing packet. This means that if a thousand packets are received but the second one is missing, the host can only acknowledge the receipt of the first packet, so the sender would have to resend all packets from number 2 through 1000. By using selective acknowledgments, the receiver could acknowledge the receipt of the packets from 3 through 1000, so the sender would only have to resend packet 2.

The SACK-permitted option is a two-byte option that may be sent in a SYN by a TCP. It MUST NOT be sent on non-SYN segments.

The SACK option is not mandatory and it is used only if both parties support it. This is negotiated when the connection is established. SACK uses the optional part of the TCP header.

There are two TCP options for selective acknowledgments:
Selective Acknowledgment Permitted (SackOK): This option simply says that selective acknowledgments are permitted for this connection. SackOK must be included in the TCP options in both the SYN and SYN/ACK packets during the TCP three-way handshake, or it cannot be used. SackOK should not appear in any other packets.

Selective Acknowledgment Data: This option contains the actual acknowledgment information for a selective acknowledgment. It lists one or more pairs of sequence numbers, which define ranges of packets that are being acknowledged.
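
To see how a sender can use this information, here is a small Python sketch that works out which byte ranges still need to be retransmitted from the cumulative ACK plus the SACK ranges. It is only an illustration. The function name and the half-open (start, end) range convention are my assumptions, and real TCP stacks do considerably more bookkeeping.

```python
def missing_ranges(cumulative_ack, sack_blocks):
    """Return the (start, end) byte ranges the receiver has not reported.

    cumulative_ack: next byte the receiver expects in sequence.
    sack_blocks: list of (left, right) ranges received out of order,
                 where right is one past the last byte received.
    """
    holes = []
    expected = cumulative_ack
    for left, right in sorted(sack_blocks):
        if left > expected:
            holes.append((expected, left))   # gap before this block
        expected = max(expected, right)
    return holes

# The example from the text, with 1000-byte packets: packet 1 arrived,
# packet 2 is missing, packets 3 through 1000 arrived.
print(missing_ranges(1000, [(2000, 1000000)]))   # [(1000, 2000)]
```

Only the 1000 bytes of packet 2 need to be resent, instead of everything from packet 2 onwards.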

Summary:
•TCP originally could only ACK the last in sequence byte received (cumulative ACK)
• RFC2018 SACK allows ranges of bytes to be ACK’d
–Sender can fill in holes without resending everything
–Up to four blocks fit in the TCP option space (three with the timestamp option)
•Surveys have shown that SACK implementations often have errors
– RFC3517 addresses how to respond to SACK



TCP Traceroute Analysis

Jephe Wu - http://linuxtechres.blogspot.com

Objective: Understanding how TCP traceroute works and using it to make a sysadmin's life easier
Environment: Windows Vista, CentOS 5

Concepts:
1. What's TCP traceroute
According to http://michael.toren.net/code/tcptraceroute/, tcptraceroute is a traceroute implementation using TCP packets.

The more traditional traceroute(8) sends out either UDP or ICMP ECHO packets with a TTL of one, and increments the TTL until the destination has been reached. By printing the gateways that generate ICMP time exceeded messages along the way, it is able to determine the path packets are taking to reach the destination.

The problem is that with the widespread use of firewalls on the modern Internet, many of the packets that traceroute(8) sends out end up being filtered, making it impossible to completely trace the path to the destination. However, in many cases, these firewalls will permit inbound TCP packets to specific ports that hosts sitting behind the firewall are listening for connections on. By sending out TCP SYN packets instead of UDP or ICMP ECHO packets, tcptraceroute is able to bypass the most common firewall filters.

2. How to use it under Linux and Windows
You can use tcptraceroute (http://michael.toren.net/code/tcptraceroute/) under Linux and tracetcp (http://tracetcp.sourceforge.net/) under Windows.
Under CentOS 5.5, the traceroute command itself has options for this: -T does a TCP traceroute, -U does a traditional UDP traceroute, and -I uses ICMP echo (ping) packets, like tracert on Windows.

3. How tcptraceroute or tracetcp works

It sends a TCP SYN packet with the TTL set to 1 as the initial probe. Each hop decrements the TTL by 1, and the hop at which the TTL runs out sends an ICMP time exceeded packet back to the sender; that ICMP packet includes the original packet's header information. The sender then probes again with TTL 2, and so on, until a probe reaches the destination, which sends a TCP SYN/ACK reply (or a RST if the port is closed) back to the sender.
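
The probing loop can be simulated without touching the network. The Python sketch below is a toy model only: the `trace` function and the `path` list (routers in order, destination last) are hypothetical, and no real packets are sent.

```python
def trace(path, max_hops=30):
    """Simulate TCP traceroute over a known path.

    path: list of hop names, with the destination as the last entry.
    Returns a list of (ttl, responder, reply_type) tuples.
    """
    results = []
    for ttl in range(1, max_hops + 1):
        if ttl < len(path):
            # TTL runs out at an intermediate router
            results.append((ttl, path[ttl - 1], "ICMP time exceeded"))
        else:
            # the probe reaches the destination, which answers the SYN
            results.append((ttl, path[-1], "TCP SYN/ACK"))
            break
    return results

for hop in trace(["192.168.100.1", "172.20.16.65", "www.example.com"]):
    print(hop)
```

The loop stops as soon as the destination answers, which is why the real tools print one final "reached" line instead of running to 30 hops.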

4. examples - use traceroute to know network path

example 1:  tcptraceroute to www.redhat.com
[root@linuxtest ~]# tcptraceroute  www.redhat.com -p 443 -f 2
traceroute to www.redhat.com (118.214.80.112), 30 hops max, 40 byte packets
 3  172.20.16.65 (172.20.16.65)  27.082 ms  27.637 ms  34.125 ms
 4  172.26.16.1 (172.26.16.1)  38.854 ms  38.752 ms  38.627 ms
 5  172.20.7.26 (172.20.7.26)  38.446 ms  38.280 ms  38.156 ms
 6  172.20.7.82 (172.20.7.82)  38.031 ms  37.900 ms  37.753 ms
 7  203.117.34.101 (203.117.34.101)  37.872 ms  41.769 ms  43.676 ms
 8  203.117.34.6 (203.117.34.6)  54.960 ms  32.379 ms  34.262 ms
 9  203.117.34.13 (203.117.34.13)  54.193 ms  30.849 ms  30.678 ms
10  203.117.34.1 (203.117.34.1)  62.952 ms  32.438 ms  41.922 ms
11  58.27.106.253 (58.27.106.253)  65.721 ms  46.337 ms  58.054 ms
12  a118-214.80-112.deploy.akamaitechnologies.com (118.214.80.112)  54.972 ms  63.682 ms  63.171 ms

C:\tracetcp>tracetcp www.redhat.com:443 -h 3

Tracing route to 184.85.48.112 [a184-85-48-112.deploy.akamaitechnologies.com] on
 port 443
Over a maximum of 30 hops.
3       32 ms   50 ms   56 ms   172.20.16.65
4       34 ms   14 ms   33 ms   172.26.16.1
5       503 ms  14 ms   68 ms   172.20.7.34
6       43 ms   170 ms  25 ms   203.117.35.9
7       28 ms   86 ms   26 ms   203.117.34.2
8       216 ms  168 ms  99 ms   203.117.34.14
9       *       *       *       Request timed out.
10      Destination Reached in 211 ms. Connection established to 184.85.48.112
Trace Complete.

Let's look at the Wireshark details for the hop 3 packet above:
filtering rule:
ip.addr == 192.168.100.102 and (icmp or tcp) and not tcp.port == 22
or
ip.addr == 192.168.100.20 and (icmp or tcp) and not tcp.port == 22

Internet Protocol, Src: 192.168.100.20 (192.168.100.20), Dst: 184.85.48.112 (184.85.48.112)
    Version: 4
    Header length: 20 bytes
    Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00)
        0000 00.. = Differentiated Services Codepoint: Default (0x00)
        .... ..0. = ECN-Capable Transport (ECT): 0
        .... ...0 = ECN-CE: 0
    Total Length: 40
    Identification: 0x1e44 (7748)
    Flags: 0x02 (Don't Fragment)
        0.. = Reserved bit: Not Set
        .1. = Don't fragment: Set
        ..0 = More fragments: Not Set
    Fragment offset: 0
    Time to live: 3
        [Expert Info (Note/Sequence): "Time To Live" only 3]
            [Message: "Time To Live" only 3]
            [Severity level: Note]
            [Group: Sequence]
    Protocol: TCP (0x06)
    Header checksum: 0x4c0a [correct]
        [Good: True]
        [Bad : False]
    Source: 192.168.100.20 (192.168.100.20)
    Destination: 184.85.48.112 (184.85.48.112)
Transmission Control Protocol, Src Port: 30069 (30069), Dst Port: https (443), Seq: 0, Len: 0
    Source port: 30069 (30069)
    Destination port: https (443)
    [Stream index: 1569]
    Sequence number: 0    (relative sequence number)
    Header length: 20 bytes
    Flags: 0x02 (SYN)
        0... .... = Congestion Window Reduced (CWR): Not set
        .0.. .... = ECN-Echo: Not set
        ..0. .... = Urgent: Not set
        ...0 .... = Acknowledgement: Not set
        .... 0... = Push: Not set
        .... .0.. = Reset: Not set
        .... ..1. = Syn: Set
            [Expert Info (Chat/Sequence): Connection establish request (SYN): server port https]
                [Message: Connection establish request (SYN): server port https]
                [Severity level: Chat]
                [Group: Sequence]
        .... ...0 = Fin: Not set
    Window size: 16383
    Checksum: 0x054c [validation disabled]
        [Good Checksum: False]
        [Bad Checksum: False]

We then received the ICMP packet as follows:

Internet Protocol, Src: 172.20.16.65 (172.20.16.65), Dst: 192.168.100.20 (192.168.100.20)
    Version: 4
    Header length: 20 bytes
    Differentiated Services Field: 0xc0 (DSCP 0x30: Class Selector 6; ECN: 0x00)
        1100 00.. = Differentiated Services Codepoint: Class Selector 6 (0x30)
        .... ..0. = ECN-Capable Transport (ECT): 0
        .... ...0 = ECN-CE: 0
    Total Length: 56
    Identification: 0xfed8 (65240)
    Flags: 0x00
        0.. = Reserved bit: Not Set
        .0. = Don't fragment: Not Set
        ..0 = More fragments: Not Set
    Fragment offset: 0
    Time to live: 253
    Protocol: ICMP (0x01)
    Header checksum: 0xdd19 [correct]
        [Good: True]
        [Bad : False]
    Source: 172.20.16.65 (172.20.16.65)
    Destination: 192.168.100.20 (192.168.100.20)
Internet Control Message Protocol
    Type: 11 (Time-to-live exceeded)
    Code: 0 (Time to live exceeded in transit)

    Checksum: 0x97ea [correct]
    Internet Protocol, Src: 192.168.100.20 (192.168.100.20), Dst: 184.85.48.112 (184.85.48.112)
        Version: 4
        Header length: 20 bytes
        Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00)
            0000 00.. = Differentiated Services Codepoint: Default (0x00)
            .... ..0. = ECN-Capable Transport (ECT): 0
            .... ...0 = ECN-CE: 0
        Total Length: 40
        Identification: 0x1e44 (7748)
        Flags: 0x02 (Don't Fragment)
            0.. = Reserved bit: Not Set
            .1. = Don't fragment: Set
            ..0 = More fragments: Not Set
        Fragment offset: 0
        Time to live: 1
            [Expert Info (Note/Sequence): "Time To Live" only 1]
                [Message: "Time To Live" only 1]
                [Severity level: Note]
                [Group: Sequence]
        Protocol: TCP (0x06)
        Header checksum: 0x4e0a [correct]
            [Good: True]
            [Bad : False]
        Source: 192.168.100.20 (192.168.100.20)
        Destination: 184.85.48.112 (184.85.48.112)
    Transmission Control Protocol, Src Port: 30069 (30069), Dst Port: https (443)
        Source port: 30069 (30069)
        Destination port: https (443)
        Sequence number: 179559217

       
Finally, we received the SYN/ACK reply packet from www.redhat.com, in addition to the ICMP time exceeded packets, so the TCP probe ends.


Internet Protocol, Src: 184.85.48.112 (184.85.48.112), Dst: 192.168.100.20 (192.168.100.20)
    Version: 4
    Header length: 20 bytes
    Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00)
        0000 00.. = Differentiated Services Codepoint: Default (0x00)
        .... ..0. = ECN-Capable Transport (ECT): 0
        .... ...0 = ECN-CE: 0
    Total Length: 44
    Identification: 0x0000 (0)
    Flags: 0x02 (Don't Fragment)
        0.. = Reserved bit: Not Set
        .1. = Don't fragment: Set
        ..0 = More fragments: Not Set
    Fragment offset: 0
    Time to live: 55
    Protocol: TCP (0x06)
    Header checksum: 0x364a [correct]
        [Good: True]
        [Bad : False]
    Source: 184.85.48.112 (184.85.48.112)
    Destination: 192.168.100.20 (192.168.100.20)
Transmission Control Protocol, Src Port: https (443), Dst Port: 28408 (28408), Seq: 0, Ack: 1, Len: 0
    Source port: https (443)
    Destination port: 28408 (28408)
    [Stream index: 1595]
    Sequence number: 0    (relative sequence number)
    Acknowledgement number: 1    (relative ack number)
    Header length: 24 bytes
    Flags: 0x12 (SYN, ACK)
        0... .... = Congestion Window Reduced (CWR): Not set
        .0.. .... = ECN-Echo: Not set
        ..0. .... = Urgent: Not set
        ...1 .... = Acknowledgement: Set
        .... 0... = Push: Not set
        .... .0.. = Reset: Not set
        .... ..1. = Syn: Set
            [Expert Info (Chat/Sequence): Connection establish acknowledge (SYN+ACK): server port https]
                [Message: Connection establish acknowledge (SYN+ACK): server port https]
                [Severity level: Chat]
                [Group: Sequence]
        .... ...0 = Fin: Not set
    Window size: 5840
    Checksum: 0xcb85 [validation disabled]
        [Good Checksum: False]
        [Bad Checksum: False]
    Options: (4 bytes)
        Maximum segment size: 1460 bytes

example 2: detect transparent proxy in between
[root@linuxtest ~]# tcptraceroute  www.redhat.com -f 2
traceroute to www.redhat.com (118.214.80.112), 30 hops max, 40 byte packets
 3  172.20.16.65 (172.20.16.65)  16.943 ms  23.115 ms  31.587 ms
 4  172.26.16.1 (172.26.16.1)  31.742 ms  31.969 ms  32.348 ms
 5  172.20.7.26 (172.20.7.26)  43.759 ms  43.591 ms  43.662 ms
 6  172.20.7.82 (172.20.7.82)  37.583 ms  38.229 ms  37.181 ms
 7  a118-214.80-112.deploy.akamaitechnologies.com (118.214.80.112)  50.047 ms  49.993 ms  49.987 ms

 C:\tracetcp>tracetcp www.redhat.com -h 3

Tracing route to 184.85.48.112 [a184-85-48-112.deploy.akamaitechnologies.com] on
 port 80
Over a maximum of 30 hops.
3       39 ms   36 ms   27 ms   172.20.16.65
4       51 ms   34 ms   15 ms   172.26.16.1
5       50 ms   46 ms   68 ms   172.20.7.34
6       Destination Reached in 59 ms. Connection established to 184.85.48.112
Trace Complete.

Comparing the port 80 output above with the port 443 output, we can tell that there is a transparent proxy in between: the port 80 trace stops before reaching redhat.com.

5. examples - use tracetcp and nc to detect open ports
[root@linuxtest ~]# nc -zv www.redhat.com 443
Connection to www.redhat.com 443 port [tcp/https] succeeded!

C:\tracetcp>tracetcp www.redhat.com -s 442 443
[184.85.48.112:442]  128        *       Request timed out.
[184.85.48.112:443]  128        Dest. in 210 ms. Port OPEN on 184.85.48.112

6. examples - use tracetcp or tcptraceroute to detect blocked ports
C:\tracetcp>tracetcp www.redhat.com:139

Tracing route to 184.85.48.112 [a184-85-48-112.deploy.akamaitechnologies.com] on
 port 139
Over a maximum of 30 hops.
1       3 ms    2 ms    2 ms    192.168.1.1
2       *       *       *       Request timed out.
3       *       *       *       Request timed out.
4       *       *       *       Request timed out.
5       *       *
Terminate Event Occurred.

Note: the output above shows that, right after the home router gateway, the ISP blocks port 139 at hop 2.
If you set up a server at home on a cable modem connection, you can run the same test from the Internet to find out whether the port you are using is blocked by the ISP.

7. Others
a. debug feature of tcptraceroute
You can download tcptraceroute 1.5beta7 (the latest version at the time of writing); its -d option prints debug output, which is very useful.


b. icmp code and type which should not be blocked
ICMP type 3, Destination Unreachable, especially code 4, "fragmentation needed but don't fragment bit set" (necessary for path MTU discovery)
ICMP type 11, time exceeded (so you can use traceroute from inside the network and get replies).


c. other ICMP types (see http://livenudefrogs.com/~anubis/icmp/#type3)
ICMP type 8 is echo request,
ICMP type 0 is echo reply.

ICMP types 3 and 11 are important and should not be blocked on the firewall; tcptraceroute relies on them as the return packets.

SSH and keepalive

Jephe Wu -  http://linuxtechres.blogspot.com

Objective: make your SSH connection more stable, so that it does not disconnect due to inactivity
Environment: CentOS 5, Windows XP, putty 0.60, ssh client on CentOS 5

Concepts:
1.  Why an SSH connection gets disconnected while idle
A router or firewall in between makes the connection state invalid.
According to http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/overview.html, this behavior is caused by the connection tracking procedures implemented in proxies and firewalls, which keep track of all connections that pass through them. Because of the physical limits of these machines, they can only keep a finite number of connections in their memory. The most common and logical policy is to keep newest connections and to discard old and inactive connections first.

Thus the trick is to send packets over idle connections periodically: often enough that the connection is never considered inactive, but no more often than necessary.

2. TCP keepalive and application level keepalive
According to http://the.earth.li/~sgtatham/putty/0.58/htmldoc/Chapter4.html#config-keepalive,
TCP keepalives are similar to application-level keepalives, and the same caveats apply. The main differences are:

    * TCP keepalives are available on all connection types, including Raw and Rlogin (in PuTTY).
    * The interval between TCP keepalives is usually much longer, typically two hours; this is set by the operating system, and cannot be configured within PuTTY.
    * If the operating system does not receive a response to a keepalive, it may send out more in quick succession and terminate the connection if no response is received.

TCP keepalives may be more useful for ensuring that half-open connections are terminated than for keeping a connection alive, although they can also prevent disconnection due to network inactivity.

3. TCP keepalive
3.1 how it works
After authentication, ssh sends a 32 byte empty packet to the sshd every n seconds. sshd does not care about this, but the server's TCP stack must send back an ACK for that packet. If the client's TCP stack does not receive an ACK for this or a later packet, it will retransmit for some time and then signal a connection-timeout to ssh, causing ssh to exit.

3.2 configuration of tcp keepalive

On Linux, the current values can be read, or changed temporarily, through these files:

/proc/sys/net/ipv4/tcp_keepalive_intvl
/proc/sys/net/ipv4/tcp_keepalive_probes
/proc/sys/net/ipv4/tcp_keepalive_time

or set permanently in /etc/sysctl.conf as follows:
net.ipv4.tcp_keepalive_time = 7200
net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 9


note:
a. The settings above are the defaults; you can change them.
b. If the network hardware or software drops connections that have been idle for less than the two-hour default, the client session will fail. Keepalive timeouts are configured at the OS level for all TCP connections whose application has enabled the keepalive function; many applications offer an option to turn it on, such as the one in PuTTY.


If the network hardware or software (including firewalls) has an idle limit of one hour, then the keepalive timeout must be less than one hour. To rectify this situation, the TCP/IP keepalive settings can be lowered to fit inside the firewall limits. The implementation of TCP keepalive may vary from vendor to vendor. The original definition is quite old and is described in RFC 1122.
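
As a rough worked example of the settings above: an idle connection whose peer has vanished is only declared dead after tcp_keepalive_time seconds of silence plus tcp_keepalive_probes unanswered probes sent tcp_keepalive_intvl seconds apart. The sketch below just does that arithmetic. The function names keepalive_detect_seconds and read_or_default are made up for illustration, and the script falls back to the default values listed above when the /proc files are not readable.

```shell
#!/bin/sh
# Worst-case seconds before the kernel gives up on a silent peer:
#   idle time before the first probe + (number of probes * seconds between probes)
keepalive_detect_seconds() {
    ka_idle=$1; ka_intvl=$2; ka_probes=$3
    echo $(( ka_idle + ka_intvl * ka_probes ))
}

# Print the content of a /proc file, or a default value if it cannot be read.
# (hypothetical helper, only for this example)
read_or_default() {
    if [ -r "$1" ]; then cat "$1"; else echo "$2"; fi
}

t=$(read_or_default /proc/sys/net/ipv4/tcp_keepalive_time 7200)
i=$(read_or_default /proc/sys/net/ipv4/tcp_keepalive_intvl 75)
p=$(read_or_default /proc/sys/net/ipv4/tcp_keepalive_probes 9)
echo "dead peer detected after about $(keepalive_detect_seconds "$t" "$i" "$p") seconds"
```

With the defaults this gives 7200 + 75 * 9 = 7875 seconds, a little over two hours, which is why the defaults are of little help against a firewall that drops idle connections after an hour.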

4. Application level keepalive



4.1 How to configure the OpenSSH client command 'ssh' to prevent disconnection
From 'man ssh_config' on Linux:

ServerAliveInterval:
Sets a timeout interval in seconds after which if no data has been received from the server, ssh will send a message through the encrypted channel to request a response from the server. The default is 0, indicating that these messages will not be sent to the server.

This option applies to protocol version 2 only.

ServerAliveCountMax:
Sets the number of server alive messages (see above) which may be sent without ssh receiving any messages back from the server. If this threshold is reached while server alive messages are being sent, ssh will disconnect from the server, terminating the session. It is important to note that the use of server alive messages is very different from TCPKeepAlive (below). The server alive messages are sent through the encrypted channel and therefore will not be spoofable. The TCP keepalive option enabled by TCPKeepAlive is spoofable. The server alive mechanism is valuable when the client or server depend on knowing when a connection has become inactive.

The default value is 3. If, for example, ServerAliveInterval (above) is set to 30, and ServerAliveCountMax is left at the default, if the server becomes unresponsive ssh will disconnect after approximately 90 seconds.

We can pass the options on the command line (user@host is a placeholder for your own login):

% ssh -o TCPKeepAlive=no -o ServerAliveInterval=30 user@host

or put the options in /etc/ssh/ssh_config (for all users) or in $HOME/.ssh/config (for yourself only; see man ssh_config), for example:

    ServerAliveInterval 60
    ServerAliveCountMax 600
(the default for ServerAliveCountMax is 3 according to man ssh_config)

Make sure that ServerAliveInterval * ServerAliveCountMax <= 0.8 * N, where N is the idle timeout in seconds. The default value of ServerAliveCountMax is 3 (see man ssh_config), so a ServerAliveInterval of 30 gives 3 x 30 = 90 seconds, for the case where you guess that the disconnect happens after about 1.5 minutes of idle time.

If you set the interval too low, there will be unnecessary keepalive traffic between client and server, which decreases performance.
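
The rule of thumb above can be turned into a small calculation. This is only a sketch: max_alive_interval is a hypothetical helper name, and N=600 is just an assumed ten-minute idle timeout used as an example, not a measured value.

```shell
#!/bin/sh
# Largest ServerAliveInterval (whole seconds) that satisfies
#   ServerAliveInterval * ServerAliveCountMax <= 0.8 * N
# Integer arithmetic: 0.8 * N / count == 8 * N / (10 * count), rounded down.
max_alive_interval() {
    sa_n=$1
    sa_count=${2:-3}    # defaults to 3, the ssh_config default for ServerAliveCountMax
    echo $(( 8 * sa_n / (10 * sa_count) ))
}

N=600   # assumed idle timeout in seconds (example value only)
echo "ServerAliveInterval $(max_alive_interval "$N" 3)"
echo "ServerAliveCountMax 3"
```

The two lines it prints are in ssh_config syntax, so they can be pasted into $HOME/.ssh/config after checking that the assumed timeout matches your network.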



4.2 How to configure PuTTY to prevent ssh disconnection

'Enable TCP keepalives' and 'Seconds between keepalives' are totally different things: the first is TCP-level keepalive, the second is an application-level implementation.

option 1: use TCP keepalive.
'Connection' menu:
Disable Nagle's algorithm
Enable TCP keepalives

option 2: use application level keepalive
'Connection' - Seconds between keepalives (0 to turn off)

You might also consider enabling, under 'Connection' -> 'SSH' -> 'X11':
Enable X11 forwarding
Enable MIT-Magic-Cookie-1


Save the session.


The following is from Putty documentation http://the.earth.li/~sgtatham/putty/0.58/htmldoc/Chapter4.html#config-keepalive

If you find your sessions are closing unexpectedly (most often with ‘Connection reset by peer’) after they have been idle for a while, you might want to try using this option.

Some network routers and firewalls need to keep track of all connections through them. Usually, these firewalls will assume a connection is dead if no data is transferred in either direction after a certain time interval. This can cause PuTTY sessions to be unexpectedly closed by the firewall if no traffic is seen in the session for some time.

The keepalive option (‘Seconds between keepalives’) allows you to configure PuTTY to send data through the session at regular intervals, in a way that does not disrupt the actual terminal session. If you find your firewall is cutting idle connections off, you can try entering a non-zero value in this field. The value is measured in seconds; so, for example, if your firewall cuts connections off after ten minutes then you might want to enter 300 seconds (5 minutes) in the box.

Note that keepalives are not always helpful. They help if you have a firewall which drops your connection after an idle period; but if the network between you and the server suffers from breaks in connectivity then keepalives can actually make things worse. If a session is idle, and connectivity is temporarily lost between the endpoints, but the connectivity is restored before either side tries to send anything, then there will be no problem - neither endpoint will notice that anything was wrong. However, if one side does send something during the break, it will repeatedly try to re-send, and eventually give up and abandon the connection. Then when connectivity is restored, the other side will find that the first side doesn't believe there is an open connection any more. Keepalives can make this sort of problem worse, because they increase the probability that PuTTY will attempt to send data during a break in connectivity. Therefore, you might find they help connection loss, or you might find they make it worse, depending on what kind of network problems you have between you and the server.

Keepalives are only supported in Telnet and SSH; the Rlogin and Raw protocols offer no way of implementing them. (For an alternative, see section 4.13.3.)

Note that if you are using SSH-1 and the server has a bug that makes it unable to deal with SSH-1 ignore messages (see section 4.23.1), enabling keepalives will have no effect.




How to expand array under Linux for HP Proliant server

Jephe Wu - http://linuxtechres.blogspot.com

Environment: HP Proliant DL360G6, CentOS 5.4 64bit, 6x300G hard disks in RAID5 (one as spare), 2 more 300G hard disks hot-added, default disk partition layout from the OS installation.
Objective: expand the existing RAID5 array online with the 2 newly added hard disks

Steps:
1. Install HP Proliant support pack for RHEL 5

2. start up HP array configuration utility online for Linux
cd /opt/compaq/cpqacuxe/bld
./cpqacuxe -R
note: when you have finished with the online configuration utility, you should stop it by running
./cpqacuxe -stop

3. expand array and logical drive

Access https://log.domain.com:2381, then click on the array configuration utility link.
Click on 'expand array'; this will take a long time to finish. After that, another button, 'expand logical drive', will appear; come back and click on that as well once the array expansion has finished.

4. make the Linux kernel recognize the new size of the hardware RAID5

Try 'partprobe' or 'sfdisk -R /dev/cciss/c0d0' first.
If 'fdisk -l /dev/cciss/c0d0' still does not show the new size, reboot the Linux server.

5. enlarge partition with fdisk

fdisk /dev/cciss/c0d0  (you might consider using fdisk -u /dev/cciss/c0d0 to work in sectors instead of cylinders)
p
press d then 2 to remove the partition
press n then choose primary partition, using the full space
Make sure the old and new partitions start at the same cylinder or sector position; otherwise, data will be destroyed.
press t to change the partition type to LVM (8e)
w
q

note: you need to reboot again

6. resize the physical volume and the logical volume, then grow the file system online

after the reboot, check the new size again

[root@log ~]# fdisk -l /dev/cciss/c0d0


Disk /dev/cciss/c0d0: 1799.7 GB, 1799797127168 bytes
255 heads, 63 sectors/track, 218812 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

           Device Boot      Start         End      Blocks   Id  System
/dev/cciss/c0d0p1   *           1          13      104391   83  Linux
/dev/cciss/c0d0p2              14      218813  1757509959+  8e  Linux LVM


[root@log ~]# pvresize /dev/cciss/c0d0p2
  Physical volume "/dev/cciss/c0d0p2" changed
  1 physical volume(s) resized / 0 physical volume(s) not resized
 
  note: use vgdisplay -v to check the number of free PE, let's say it's 17879.
 
[root@log ~]# lvextend -l +17879 /dev/VolGroup00/LogVol02

  Extending logical volume LogVol02 to 1.60 TB
  Logical volume LogVol02 successfully resized

[root@log ~]# resize2fs /dev/VolGroup00/LogVol02

resize2fs 1.39 (29-May-2006)
Filesystem at /dev/VolGroup00/LogVol02 is mounted on /data; on-line resizing required
Performing an on-line resize of /dev/VolGroup00/LogVol02 to 430276608 (4k) blocks.
The filesystem on /dev/VolGroup00/LogVol02 is now 430276608 blocks long.
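
The free PE count used in the lvextend step above can be read from vgdisplay instead of being copied by hand. A minimal sketch follows. free_pe is a hypothetical helper, the sample text only imitates the 'Free  PE / Size' line of vgdisplay output and is not real output from this server, and the field position may differ between LVM versions, so check it against your own output first.

```shell
#!/bin/sh
# Extract the free physical extent count from vgdisplay-style text on stdin.
# Assumed line shape: "  Free  PE / Size       17879 / 558.72 GB"
free_pe() {
    awk '/Free +PE/ { print $5; exit }'
}

# Illustrative sample only (made-up Alloc line, Free value taken from the note above).
sample='  Alloc PE / Size       35720 / 1.09 TB
  Free  PE / Size       17879 / 558.72 GB'

echo "free extents: $(printf '%s\n' "$sample" | free_pe)"

# On a real system (as root), the same helper could feed lvextend, for example:
#   lvextend -l +"$(vgdisplay VolGroup00 | free_pe)" /dev/VolGroup00/LogVol02
```

The lvextend line is left as a comment on purpose, so that running the sketch never touches a volume group.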

How to get the Windows/Linux PC uptime

Jephe Wu - http://linuxtechres.blogspot.com

Objective: find out when a computer was last rebooted and its uptime since then.
Environment: Windows XP SP3, Windows 7, RHEL 5 and its variants


1. Windows (check when the computer was last rebooted, from which you also know the uptime)
net statistics server   (short form: net stats srv)
or
net statistics workstation   (short form: net stats work)

note: look at the beginning of the output for the 'Statistics since' time; it shows when the computer was last rebooted.

1.1. Windows uptime
Open cmd and run systeminfo, then look for 'System Up Time':
systeminfo | find /i "up time"

For Windows 7, press Shift+Ctrl+Esc, then go to the Performance tab and look at 'Up Time' under System.

2.  Linux
just run 'uptime'
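
On Linux the same information is also available from /proc/uptime, whose first field is the time since boot in seconds. A small sketch, assuming a Linux /proc filesystem; format_uptime is a name made up for this example.

```shell
#!/bin/sh
# Convert a number of seconds into "D days, H hours, M minutes".
format_uptime() {
    up_secs=$1
    echo "$(( up_secs / 86400 )) days, $(( up_secs % 86400 / 3600 )) hours, $(( up_secs % 3600 / 60 )) minutes"
}

# The first field of /proc/uptime is seconds since boot, with a fractional part;
# cutting at the first dot keeps only the whole seconds.
if [ -r /proc/uptime ]; then
    up=$(cut -d. -f1 /proc/uptime)
    echo "up $(format_uptime "$up")"
fi
```

If you only need the time of the last boot rather than the elapsed time, 'who -b' prints it directly.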

3. References
http://www.optimizewindows7.net/windows7/determine-windows-7-uptime/
http://www.windowsreference.com/general/how-to-find-the-system-uptime-in-windows-vistaserver-2008xp2003/
http://support.microsoft.com/kb/555737
http://support.microsoft.com/kb/232243
http://www.ehow.com/how_5915486_tell-last-time-computer-rebooted.html

How to keep a program running even after exiting the shell

Jephe Wu - http://linuxtechres.blogspot.com

Objective: After you ssh into a server, you started a program or shell script. You need to keep it running even after you exit from shell.
Environment: CentOS 5.5


Methods:
1. use setsid 
setsid command
exit

2. use screen
screen
command
ctrl a then d [to detach]
exit


log in again and run 'screen -r' [to reattach]

3.  If you plan in advance to keep it running after exit
nohup command &
exit

log in again and use 'pstree | grep command' to make sure it's attached to the init process instead of bash


4. If you forgot to nohup it and want to exit the shell now
command
ctrl-Z
bg [%1]
disown [-h]
exit

log in again and use 'pstree | grep command' to make sure it's attached to the init process instead of bash
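
Method 3 can be wrapped in a small helper that also keeps the output and the PID, so that you can check on the job after logging in again. This is a sketch only: run_detached and the log file location are hypothetical choices for this example, not part of nohup itself, and 'sleep 2' merely stands in for a long-running program.

```shell
#!/bin/sh
# Start a command so that it survives the shell exiting:
#  - nohup makes it ignore SIGHUP
#  - stdin, stdout and stderr are detached from the terminal (output goes to a log file)
#  - the PID is printed so it can be checked later
run_detached() {
    rd_log=$1; shift
    nohup "$@" >"$rd_log" 2>&1 </dev/null &
    echo $!
}

# Example run with a stand-in command.
pid=$(run_detached "${TMPDIR:-/tmp}/detached-demo.log" sleep 2)
echo "started PID $pid"

# kill -0 sends no signal; it only tests whether the process still exists.
if kill -0 "$pid" 2>/dev/null; then
    echo "still running"
fi
```

Redirecting all three standard streams matters here: a background job that still holds the terminal open can keep an ssh session from closing cleanly on exit.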


5. References
http://en.wikipedia.org/wiki/Nohup