TCP-related notes

TCP

TCP three-way handshake

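# SYN is the handshake signal used to establish a connection. In TCP, the side that sends the first SYN packet is the client, and the side that receives it is the server.
# In TCP, when data from the sender reaches the receiver, the receiver returns a notification that the data was received. This message is called an acknowledgment (ACK).
Suppose there is a client A and a server B, and we want to establish reliable data transfer between them.
        SYN(=j)       // SYN: A requests a connection
    A ----------> B
                  |
        ACK(=j+1) |   // ACK: B acknowledges A's SYN
        SYN(=k)   |   // SYN: B sends its own SYN
    A <-----------  
    |
    |   ACK(=k+1)  
     -----------> B   // ACK: A acknowledges B's SYN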
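  1. The client sends a SYN packet (seq = j) to the server and enters the SYN_SENT state, waiting for the server's acknowledgment.
  2. The server receives the SYN packet and must acknowledge the client's SYN (ACK = j + 1). It also sends its own SYN (seq = k) in the same segment, that is, a SYN+ACK packet. The server then enters the SYN_RECV state.
  3. The client receives the server's SYN+ACK packet and sends back an ACK (ACK = k + 1). The client enters the ESTABLISHED state once this packet is sent, and the server enters ESTABLISHED when it receives it, which completes the three-way handshake.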
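
The numbering in the three steps can be checked with a small model. This is only a sketch of the sequence/acknowledgment arithmetic, not a TCP implementation: the `Segment` class and `three_way_handshake` function are names made up for this illustration, and 32-bit wraparound of sequence numbers is ignored.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    sender: str
    flags: str
    seq: Optional[int]   # sequence number of this segment
    ack: Optional[int]   # acknowledgment number, None if the ACK flag is not set


def three_way_handshake(j: int, k: int) -> List[Segment]:
    """Return the three segments for client ISN j and server ISN k."""
    syn = Segment("A", "SYN", seq=j, ack=None)
    # A SYN consumes one sequence number, so B acknowledges j + 1.
    syn_ack = Segment("B", "SYN+ACK", seq=k, ack=syn.seq + 1)
    # A acknowledges B's SYN the same way, with k + 1.
    ack = Segment("A", "ACK", seq=j + 1, ack=syn_ack.seq + 1)
    return [syn, syn_ack, ack]


for segment in three_way_handshake(j=100, k=300):
    print(segment)
```

With j = 100 and k = 300, the second segment carries ACK = 101 and the third carries ACK = 301.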

TCP four-way teardown

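  • The active closer sends a FIN to shut down data transfer from itself to the passive closer. In other words, it tells the other side "I will not send you any more data" (data sent before the FIN that has not yet been acknowledged is still retransmitted). At this point the active closer can still receive data.
  • The passive closer receives the FIN and replies with an ACK whose acknowledgment number is the received sequence number + 1 (like a SYN, a FIN consumes one sequence number).
  • The passive closer sends its own FIN to shut down data transfer in the other direction, telling the active closer "my data is finished too, I will not send you any more."
  • The active closer receives that FIN and replies with an ACK whose acknowledgment number is the received sequence number + 1. This completes the four-way teardown.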
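
The half-closed state described above (FIN sent, still able to receive) can be observed from user space. The following is a minimal sketch over the loopback interface, assuming that `shutdown(SHUT_WR)` makes the local TCP stack send a FIN and that an empty `recv()` means the peer's FIN has arrived. The ACK segments themselves are generated by the kernel and are not visible here. `run_half_close_demo` is a name invented for this example.

```python
import socket
import threading


def run_half_close_demo() -> bytes:
    """Client half-closes with shutdown(SHUT_WR) and still reads the reply."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    def passive_side() -> None:
        conn, _ = listener.accept()
        with conn:
            received = b""
            while True:
                data = conn.recv(1024)
                if not data:          # empty read: the peer's FIN has arrived
                    break
                received += data
            # The passive closer may still send data after receiving the FIN.
            conn.sendall(b"got " + received)
        # Leaving the with-block closes conn, which sends the passive side's FIN.

    worker = threading.Thread(target=passive_side)
    worker.start()

    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(b"hello")
        client.shutdown(socket.SHUT_WR)   # active close: send FIN, keep reading
        reply = b""
        while True:
            data = client.recv(1024)
            if not data:                  # the passive side's FIN
                break
            reply += data

    worker.join()
    listener.close()
    return reply


print(run_half_close_demo())  # b'got hello'
```

Running this alongside a Wireshark capture on the loopback interface should show the FIN/ACK exchange in both directions.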

HTTP

HTTP (HyperText Transfer Protocol). Capturing packets with Wireshark is a good way to see the exchange in detail.

HTTP 1.0

Under HTTP 1.0, connections should always be closed by the server after sending the response.

Since late 1996, developers of popular products (browsers, web servers, etc.) using HTTP/1.0, started to add an unofficial extension (to the protocol) named "keep-alive" in order to allow the reuse of a connection for multiple requests/responses.

If the client supports keep-alive, it adds an additional header to the request:

Connection: keep-alive

When the server receives this request and generates a response, if it supports keep-alive then it also adds the same above header to the response. Following this, the connection is not dropped, but is instead kept open. When the client sends another request, it uses the same connection.

This will continue until either the client or the server decides that the conversation is over and in this case they omit the "Connection:" header from the last message sent or, better, they add the keyword "close" to it:

Connection: close

After that the connection is closed following specified rules.

Since 1997, the various versions of HTTP/1.1 specifications acknowledged the usage of this unofficial extension and included a few caveats regarding the interoperability between HTTP/1.0 (keep-alive) and HTTP/1.1 clients / servers.
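In REST services with very high TPS/QPS, using short-lived connections (that is, without keep-alive) can easily exhaust the client's ports: every request opens a new connection, and the port of a connection that the client closed stays in TIME_WAIT for a while before it can be reused.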
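
A back-of-the-envelope estimate shows how quickly this limit is reached. The numbers below are assumptions about a typical Linux client, not values from the text above: an ephemeral port range of 32768 to 60999 and a TIME_WAIT duration of 60 seconds. The function name is hypothetical.

```python
def max_new_connections_per_second(port_low: int, port_high: int,
                                   time_wait_seconds: float) -> float:
    """Upper bound on new short connections per second to one destination.

    Assumes the client closes first, so each used port sits in TIME_WAIT
    for time_wait_seconds before it can be reused for the same destination.
    """
    usable_ports = port_high - port_low + 1
    return usable_ports / time_wait_seconds


# Assumed Linux defaults: ports 32768-60999, TIME_WAIT of 60 s.
rate = max_new_connections_per_second(32768, 60999, 60)
print(f"about {int(rate)} new connections per second")
```

Under these assumptions the ceiling is a few hundred new connections per second towards a single server address and port, which a busy REST client can exceed. Keep-alive avoids the problem by reusing one connection for many requests.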

HTTP 1.1

In HTTP/1.1, keep-alive is enabled by default unless it is explicitly turned off:

Connection: close

In HTTP 1.1, all connections are considered persistent unless declared otherwise. The HTTP persistent connections do not use separate keepalive messages, they just allow multiple requests to use a single connection. However, the default connection timeout of Apache httpd 1.3 and 2.0 is as little as 15 seconds and just 5 seconds for Apache httpd 2.2 and above. The advantage of a short timeout is the ability to deliver multiple components of a web page quickly while not consuming resources to run multiple server processes or threads for too long.
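
Connection reuse is easy to observe with the Python standard library. In this sketch a local `http.server` handler (named `EchoPortHandler` here, a made-up name) replies with the client's source port. If the connection is persistent, every request on one `http.client.HTTPConnection` should report the same port. Setting `protocol_version = "HTTP/1.1"` is what enables persistent connections in `http.server`.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import List


class EchoPortHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"   # enables persistent connections

    def do_GET(self) -> None:
        # Report the client's source port so reuse is visible to the caller.
        body = str(self.client_address[1]).encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        pass  # keep the demo quiet


def client_ports_seen(requests: int = 3) -> List[str]:
    server = HTTPServer(("127.0.0.1", 0), EchoPortHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
    ports = []
    for _ in range(requests):
        conn.request("GET", "/")
        # The body must be read completely before the next request is sent.
        ports.append(conn.getresponse().read().decode())
    conn.close()          # close first so the single-threaded server is free
    server.shutdown()
    server.server_close()
    return ports


print(client_ports_seen())  # the same port repeated three times
```

Note that no keepalive messages are exchanged; the requests simply follow each other on the same TCP connection.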

Keepalive makes it difficult for the client to determine where one response ends and the next response begins, particularly during pipelined HTTP operation. This is a serious problem when Content-Length cannot be used due to streaming. To solve this problem, HTTP 1.1 introduced a chunked transfer coding, in which the body is sent as a series of length-prefixed chunks. A zero-length last chunk is sent at the end of each response so that the client knows where the next response begins.
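
To make the framing concrete, here is a small sketch of chunked encoding and decoding: each chunk is a hexadecimal size line, CRLF, the data, CRLF, and the body ends with a chunk of size 0 followed by an empty line. The function names are invented for this example. The decoder ignores chunk extensions and skips trailer fields rather than parsing them.

```python
from typing import Iterable, Tuple


def encode_chunked(parts: Iterable[bytes]) -> bytes:
    out = bytearray()
    for part in parts:
        if not part:
            continue  # a zero-length chunk would end the body early
        out += f"{len(part):X}".encode("ascii") + b"\r\n" + part + b"\r\n"
    out += b"0\r\n\r\n"  # last chunk (size 0), then an empty trailer section
    return bytes(out)


def decode_chunked(data: bytes) -> Tuple[bytes, bytes]:
    """Return (body, remaining_bytes) for one chunked message body."""
    body = bytearray()
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        # Ignore optional chunk extensions after ';'.
        size = int(data[pos:line_end].split(b";")[0], 16)
        pos = line_end + 2
        if size == 0:
            # Skip trailer fields up to the terminating empty line.
            while True:
                line_end = data.index(b"\r\n", pos)
                line = data[pos:line_end]
                pos = line_end + 2
                if not line:
                    return bytes(body), data[pos:]
        body += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF


stream = encode_chunked([b"Wiki", b"pedia"]) + b"NEXT"
body, rest = decode_chunked(stream)
print(body, rest)  # b'Wikipedia' b'NEXT'
```

The second return value is whatever follows the last chunk, which is where the next response on a persistent connection would start.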
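In HTTP, how long a Keep-Alive connection is kept open is decided by the server, and is usually configured to a few tens of seconds. In nginx it is controlled by the keepalive_timeout directive in the http scope.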

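  • HTTP (layer 7) Keep-Alive is about connection reuse: the goal is to carry several requests/responses on the same connection within a short time. The point is short duration and high speed.
  • TCP (layer 4) KeepAlive is about liveness and heartbeats: it detects broken connections. The point is that probes are infrequent but continue for as long as the connection lives.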
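
TCP keepalive is switched on per socket and is unrelated to any HTTP header. A minimal sketch follows, assuming a Linux-like platform for the three tuning options (they are guarded with `hasattr` because other platforms may not define them). The helper name and the default values are illustrative only.

```python
import socket


def enable_tcp_keepalive(sock: socket.socket, idle: int = 60,
                         interval: int = 10, probes: int = 5) -> None:
    """Turn on TCP keepalive probes; tune them only where the options exist."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The options below exist on Linux; other platforms may lack them.
    if hasattr(socket, "TCP_KEEPIDLE"):    # idle seconds before the first probe
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):   # seconds between probes
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):     # failed probes before giving up
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_tcp_keepalive(sock)
print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)  # True
sock.close()
```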

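In HTTP/1.1, keep-alive is on by default, so requests for multiple resources can reuse the same TCP connection, which reduces the cost of setting up and tearing down connections. Sending several requests on such a connection without waiting for each response is called pipelining. The problem with pipelining is that, although the TCP connection is reused, responses are still returned serially in request order: if a request near the front is blocked, every resource behind it is delayed. This is called HOL (head-of-line) blocking. Opening several TCP connections in parallel works around it, but then both the client and the server face higher overhead. This hurts the server most: with a cap on concurrent connections, the number of clients it can serve at the same time drops sharply.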
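
The effect of head-of-line blocking can be illustrated with a toy timing model. It assumes each response takes a fixed service time and ignores bandwidth sharing and connection setup cost, so it only shows the ordering effect. Both function names are made up for this sketch.

```python
from itertools import accumulate
from typing import List


def serial_completion_times(service_times: List[float]) -> List[float]:
    """One connection: each response waits for all responses ahead of it."""
    return list(accumulate(service_times))


def independent_completion_times(service_times: List[float]) -> List[float]:
    """One connection per request: each response finishes on its own."""
    return list(service_times)


times = [5, 1, 1]  # the first resource is slow
print(serial_completion_times(times))       # [5, 6, 7]
print(independent_completion_times(times))  # [5, 1, 1]
```

On the shared connection, the two fast resources finish at 6 and 7 time units only because they sit behind the slow one.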

HTTP2

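HTTP/2 introduces a stream layer underneath HTTP. This layer does not change the HTTP semantics above it; only the way data is carried on the wire changes, which is no longer plain text. When one page load contains requests for several resources, each resource is mapped to its own binary stream. Every stream has a unique id, and a parent field describes the dependencies between streams. Each stream can also be given a priority, and a stream with a larger priority value is answered preferentially. The data of a single resource is split further into smaller units called frames. Within one stream channel (which is built on top of TCP), streams with different ids can be in flight at the same time, so multiple resources are transferred in parallel, and the application can control the order of delivery by defining priorities.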
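
The frame layout makes the multiplexing visible. Every HTTP/2 frame starts with a 9-byte header: a 24-bit payload length, an 8-bit type, 8 bits of flags, and one reserved bit followed by a 31-bit stream id. The sketch below packs DATA frames (type 0x0) for two streams, interleaves them on one byte stream, and reassembles each stream by id. It handles only DATA frames and is not an HTTP/2 implementation; the helper names are invented here.

```python
import struct
from collections import defaultdict
from typing import Dict, List, Tuple

DATA = 0x0        # frame type code for DATA frames
END_STREAM = 0x1  # flag: last frame of the stream


def pack_frame(frame_type: int, flags: int, stream_id: int, payload: bytes) -> bytes:
    """Build one frame: 24-bit length, 8-bit type, 8-bit flags, 31-bit stream id."""
    header = struct.pack(">I", len(payload))[1:]   # low 3 bytes hold the length
    header += struct.pack(">BBI", frame_type, flags, stream_id & 0x7FFFFFFF)
    return header + payload


def unpack_frames(data: bytes) -> List[Tuple[int, int, int, bytes]]:
    frames = []
    pos = 0
    while pos < len(data):
        length = int.from_bytes(data[pos:pos + 3], "big")
        frame_type, flags, stream_id = struct.unpack(">BBI", data[pos + 3:pos + 9])
        stream_id &= 0x7FFFFFFF                    # drop the reserved bit
        frames.append((frame_type, flags, stream_id, data[pos + 9:pos + 9 + length]))
        pos += 9 + length
    return frames


def reassemble(data: bytes) -> Dict[int, bytes]:
    """Concatenate DATA payloads per stream id."""
    streams = defaultdict(bytes)
    for frame_type, _flags, stream_id, payload in unpack_frames(data):
        if frame_type == DATA:
            streams[stream_id] += payload
    return dict(streams)


# Frames of streams 1 and 3 interleaved on one connection.
wire = (pack_frame(DATA, 0, 1, b"<html>") + pack_frame(DATA, 0, 3, b"body{")
        + pack_frame(DATA, END_STREAM, 1, b"</html>")
        + pack_frame(DATA, END_STREAM, 3, b"}"))
print(reassemble(wire))  # {1: b'<html></html>', 3: b'body{}'}
```

Because every frame names its stream, a slow resource no longer forces the frames of other resources to wait at the HTTP layer.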

  • HTTP/2 is binary instead of textual like HTTP1.x – this makes the transfer and parsing of data over HTTP/2 inherently more machine-friendly, and thus faster, more efficient and less error prone.
  • HTTP/2 is fully multiplexed, allowing multiple files and requests to be transferred at the same time, as opposed to HTTP1.x, which only handled a single request per connection at a time.
  • HTTP/2 uses the same connection for transferring different files and requests, avoiding the heavy operation of opening a new connection for every file which needs to be transferred between a client and a server.
  • HTTP/2 has header compression built-in which is another way of removing several of the overheads associated with HTTP1.x having to retrieve several different resources from the same or multiple web servers.
  • HTTP/2 allows servers to push required resources proactively rather than waiting for the client browser to request files when it thinks it needs them.

Differences

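  • Transmission Model
  • Flow Control
  • Predicting Resource Requests
    HTTP/2 introduces a mechanism called server push. If the server predicts that certain resources are likely to be requested next, it first sends the client a PUSH_PROMISE frame that describes the metadata of the content about to be pushed. If the client does not need some of these resources, it can reply with RST_STREAM to cancel them, which avoids wasted transfer. The client can also send a SETTINGS frame to change server push behavior.
  • Compression
    HTTP/1.1 compresses only the message body, not the HTTP headers, because headers are usually small. But when the request volume is large, the bandwidth spent on headers grows as well. HTTP/2 defines HPACK to compress headers too. In particular, when the headers of two requests or responses differ only in part, only the difference is transmitted.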
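
The "send only the difference" idea can be sketched with a toy indexed table. This is not the HPACK wire format: real HPACK also has a static table, a size-limited dynamic table with eviction, and Huffman coding. The class and function names are hypothetical. Both sides keep the same table of headers seen so far, so a repeated header is sent as a small index instead of the full text.

```python
from typing import Dict, List, Tuple, Union

Header = Tuple[str, str]
Token = Union[int, Header]   # an index into the table, or a literal header


class ToyHeaderTable:
    """Grow-only table of previously seen headers (no eviction, unlike HPACK)."""

    def __init__(self) -> None:
        self.entries: List[Header] = []
        self.index: Dict[Header, int] = {}

    def add(self, header: Header) -> None:
        self.index[header] = len(self.entries)
        self.entries.append(header)


def encode(headers: List[Header], table: ToyHeaderTable) -> List[Token]:
    tokens: List[Token] = []
    for header in headers:
        if header in table.index:
            tokens.append(table.index[header])   # repeat: send only the index
        else:
            tokens.append(header)                # new: send the literal
            table.add(header)
    return tokens


def decode(tokens: List[Token], table: ToyHeaderTable) -> List[Header]:
    headers: List[Header] = []
    for token in tokens:
        if isinstance(token, int):
            headers.append(table.entries[token])
        else:
            headers.append(token)
            table.add(token)
    return headers


sender, receiver = ToyHeaderTable(), ToyHeaderTable()
first = [(":method", "GET"), (":path", "/a"), ("user-agent", "demo")]
second = [(":method", "GET"), (":path", "/b"), ("user-agent", "demo")]
decode(encode(first, sender), receiver)
tokens = encode(second, sender)
print(tokens)  # [0, (':path', '/b'), 2]
```

For the second request only the changed `:path` travels as text; the two unchanged headers are reduced to table indexes.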