The following is my recent interview experience with a reputed network software company. I was asked questions interconnecting TCP level and web-requests and hence confusing me a lot. I really would like to know what expert opinions on the answers. It is not just about interview but fundamental understanding of how networking work (or how Application layer and Transport layer cross-talk, if at all they do).
以下是我最近与知名网络软件公司的面试经历。我被问到互连TCP级别和Web请求的问题,因此让我很困惑。我真的很想知道答案的专家意见。这不仅仅是关于访谈,而是对网络工作方式的基本理解(或者应用层和传输层如何交叉,如果他们做的话)。
Interviewer : Tell me the process that happens behind the scene when I open a browser and type google.com in it.
采访者:告诉我当我打开浏览器并在其中输入google.com时,幕后发生的过程。
Me : The first thing that happens is a socket is created which is identified by {SRC-IP, SRC-PORT, DEST-IP, DEST-PORT, PROTOCOL}
. The SRC-PORT number is a random number given by browser. Usually TCP/IP connection protocol (three way handshake is established). Now the both client (my browser) and server (google) are ready to handle requests. (TCP connection is established)
我:首先发生的是创建一个由{SRC-IP,SRC-PORT,DEST-IP,DEST-PORT,PROTOCOL}标识的套接字。 SRC-PORT号码是浏览器给出的随机数。通常是TCP / IP连接协议(建立三次握手)。现在,客户端(我的浏览器)和服务器(谷歌)都准备好处理请求。 (建立TCP连接)
Interviewer: wait, when does the name resolution happens?
采访者:等一下,名称解析何时发生?
Me: yep, I am sorry. It should have happened before even the socket is created. DNS name resolve happens first to get the IP address of google to reach at.
我:是的,对不起。它应该在创建套接字之前发生。 DNS名称解析首先发生以获取谷歌的IP地址。
Interviewer : Is a socket created for DNS name resolution?
访问者:是否为DNS名称解析创建了一个套接字?
Me : hmm, Iactually do not know. But all I know DNS name resolution is a connection-less. That is it not TCP but UDP. only a single request-response cycle happens. (so is a new socket created for DNS name resolution).
我:嗯,实际上不知道。但我所知道的DNS名称解析都是无连接的。那不是TCP而是UDP。只发生一个请求 - 响应周期。 (这是为DNS名称解析创建的新套接字)。
Interviewer: google.com is open for other requests from other clients. so is establishing you connection with google is blocking other users?
采访者:google.com对其他客户的其他请求开放。所以建立你与谷歌的连接阻止其他用户?
Me: I am not sure how google handles this. But in a typical socket communication, it is blocking to a minimal extent.
我:我不确定谷歌如何处理这个问题。但是在典型的套接字通信中,它在最小程度上是阻塞的。
Interviewer : How do you think this can be handled?
采访者:您认为如何处理?
Me : I guess the process forks a new thread and creates a socket to handle my request. From now on, my socket endpoint of communication with google is this this child socket.
我:我猜这个过程会分叉一个新线程并创建一个套接字来处理我的请求。从现在开始,我与谷歌通信的套接字端点就是这个子套接字。
Interviewer: If that is the case, is this child socket’s port number different than the parent one.?
采访者:如果是这种情况,这个子套接字的端口号是否与父套接字不同。
Me: The parent socket is listening at 80 for new requests from clients. The child must be listening at a different port number.
我:父套接字在80处侦听来自客户端的新请求。孩子必须在不同的端口号上听。
Interviewer: how is your TCP connection maintained since your Dest-port number has changed. (That is the src-port number sent on google's packet) ?
采访者:由于您的Dest-port编号已更改,您的TCP连接如何维护。 (那是google数据包发送的src-port号码)?
Me: The dest-port as what I see as client is always 80. when request is sent back, it also comes from port 80. I guess the OS/the parent process sets the src port back to 80 before it sends back the post.
我:我看作客户端的目标端口总是80.当请求被发送回来时,它也来自端口80.我想OS /父进程在发回邮件之前将src端口设置回80 。
Interviewer : how long is your socket connection established with google?
采访者:你的套接字连接与谷歌有多长时间了?
Me : If I don’t make any requests for a period of time, the main thread closes its child socket and any subsequent requests from me will be like am a new client.
我:如果我没有提出一段时间的请求,主线程关闭其子套接字,我的任何后续请求将像我的新客户端。
Interviewer : No, google will not keep a dedicated child socket for you. It handles your request and discards/recycles the sockets right away.
采访者:不,谷歌不会为你保留一个专用的子插座。它处理您的请求并立即丢弃/回收插座。
Interviewer: Although google may have many servers to serve requests, each server can have only one parent socket opened at port 80. The number of clients to access google webpage must be exceeding larger than the number of servers they have. How is this usually handled?
采访者:虽然谷歌可能有很多服务器来处理请求,但每个服务器只能在端口80打开一个父插槽。访问谷歌网页的客户端数量必须超过他们拥有的服务器数量。这通常如何处理?
Me : I am not sure how this is handled. I see the only way it could work is spawn a thread for each request it receives.
我:我不确定如何处理。我看到它可以工作的唯一方法是为它收到的每个请求生成一个线程。
Interviewer : Do you think the way Google handles is different from any bank website?
采访者:您认为Google处理的方式与任何银行网站不同吗?
*Me *: At the TCP-IP socket level, it should be similar. At the request level, slightly different because a session is maintained to keep state between requests in Banks websites.
* Me *:在TCP-IP套接字级别,它应该是类似的。在请求级别,略有不同,因为维护会话以保持Banks网站中的请求之间的状态。
If someone can give an explanation of each of the points, it will be very helpful for many beginners in Networking.
如果有人可以解释每个要点,那么对于网络中的许多初学者来说这将是非常有帮助的。
Thanks
谢谢
9
How many sockets do Google open for every request it receives?
Google收到的每个请求都会打开多少个套接字?
This question doesn't actually appear in the interview, but it's in your title so I'll answer it. Google doesn't open any sockets at all. Your browser does that. Google accepts connections, in the form of new sockets, but I wouldn't describe that as 'opening' them.
这个问题实际上并没有出现在面试中,但它在你的标题中,所以我会回答它。谷歌根本不打开任何套接字。你的浏览器那样做了。 Google以新套接字的形式接受连接,但我不会将其描述为“打开”它们。
Interviewer : Tell me the process that happens behind the scene when I open a browser and type google.com in it.
采访者:告诉我当我打开浏览器并在其中输入google.com时,幕后发生的过程。
Me : The first thing that happens is a socket is created which is identified by {SRC-IP, SRC-PORT, DEST-IP, DEST-PORT, PROTOCOL}.
我:首先发生的是创建一个由{SRC-IP,SRC-PORT,DEST-IP,DEST-PORT,PROTOCOL}标识的套接字。
No. The connection is identified by the tuple. The socket is an endpoint to the connection.
不。连接由元组标识。套接字是连接的端点。
The SRC-PORT number is a random number given by browser.
SRC-PORT号码是浏览器给出的随机数。
No. By the operating system.
不。由操作系统。
Usually TCP/IP connection protocol (three way handshake is established). Now the both client (my browser) and server (google) are ready to handle requests. (TCP connection is established)
通常是TCP / IP连接协议(建立三次握手)。现在,客户端(我的浏览器)和服务器(谷歌)都准备好处理请求。 (建立TCP连接)
Interviewer: wait, when does the name resolution happens?
采访者:等一下,名称解析何时发生?
Me: yep, I am sorry. It should have happened before even the socket is created. DNS name resolve happens first to get the IP address of google to reach at.
我:是的,对不起。它应该在创建套接字之前发生。 DNS名称解析首先发生以获取谷歌的IP地址。
Interviewer : Is a socket created for DNS name resolution?
访问者:是否为DNS名称解析创建了一个套接字?
Me : hmm, Iactually do not know. But all I know DNS name resolution is a connection-less. That is it not TCP but UDP. only a single request-response cycle happens. (so is a new socket created for DNS name resolution).
我:嗯,实际上不知道。但我所知道的DNS名称解析都是无连接的。那不是TCP而是UDP。只发生一个请求 - 响应周期。 (这是为DNS名称解析创建的新套接字)。
Any rationally implemented browser would delegate the entire thing to the operating system's Sockets library, whose internal functioning depends on the OS. It might look at an in-memory cache, a file, a database, an LDAP server, several things, before going out to a DNS server, which it can do via either TCP or UDP. It's not a great question.
任何合理实现的浏览器都会将整个事务委托给操作系统的套接字库,其内部功能取决于操作系统。在进入DNS服务器之前,它可能会查看内存缓存,文件,数据库,LDAP服务器,以及它可以通过TCP或UDP执行的操作。这不是一个好问题。
Interviewer: google.com is open for other requests from other clients. so is establishing you connection with google is blocking other users?
采访者:google.com对其他客户的其他请求开放。所以建立你与谷歌的连接阻止其他用户?
Me: I am not sure how google handles this. But in a typical socket communication, it is blocking to a minimal extent.
我:我不确定谷歌如何处理这个问题。但是在典型的套接字通信中,它在最小程度上是阻塞的。
Wrong again. It has very little to do with Google specifically. A TCP server has a separate socket per accepted connection, and any rationally constructed TCP server handles them completely independently, either via multithreading, muliplexed/non-blocking I/O, or asynchronous I/O. They don't block each other.
又错了。它与谷歌没什么关系。 TCP服务器每个接受的连接都有一个单独的套接字,任何合理构造的TCP服务器都可以通过多线程,多路复用/非阻塞I / O或异步I / O完全独立地处理它们。它们不会相互阻挡。
Interviewer : How do you think this can be handled?
采访者:您认为如何处理?
Me : I guess the process forks a new thread and creates a socket to handle my request. From now on, my socket endpoint of communication with google is this this child socket.
我:我猜这个过程会分叉一个新线程并创建一个套接字来处理我的请求。从现在开始,我与谷歌通信的套接字端点就是这个子套接字。
Threads are created, not 'forked'. Forking a process creates another process, not another thread. The socket isn't 'created' so much as accepted, and this would normally precede thread creation. It isn't a 'child' socket.
创建线程,而不是“分叉”。分叉进程会创建另一个进程,而不是另一个进程。套接字不是被“创建”的,而是通常在创建线程之前。它不是'儿童'插座。
Interviewer: If that is the case, is this child socket’s port number different than the parent one.?
访问者:如果是这种情况,这个子套接字的端口号是否与父套接字号不同。
Me: The parent socket is listening at 80 for new requests from clients. The child must be listening at a different port number.
我:父套接字在80处侦听来自客户端的新请求。孩子必须在不同的端口号上听。
Wrong again. The accepted socket uses the same port number as the listening socket, and the accepted socket isn't 'listening' at all, it is receiving and sending.
又错了。接受的套接字使用与侦听套接字相同的端口号,并且接受的套接字根本不是“侦听”,它正在接收和发送。
Interviewer: how is your TCP connection maintained since your Dest-port number has changed. (That is the src-port number sent on google's packet) ?
采访者:由于您的Dest-port编号已更改,您的TCP连接如何维护。 (那是google数据包发送的src-port号码)?
Me: The dest-port as what I see as client is always 80. when request is sent back, it also comes from port 80. I guess the OS/the parent process sets the src port back to 80 before it sends back the post.
我:我看作客户端的目标端口总是80.当请求被发送回来时,它也来自端口80.我想OS /父进程在发回邮件之前将src端口设置回80 。
This question was designed to explore your previous wrong answer. Your continuation of your wrong answer is still wrong.
此问题旨在探索您之前的错误答案。你继续错误的答案仍然是错误的。
Interviewer : how long is your socket connection established with google?
采访者:你的套接字连接与谷歌有多长时间了?
Me : If I don’t make any requests for a period of time, the main thread closes its child socket and any subsequent requests from me will be like am a new client.
我:如果我在一段时间内没有提出任何请求,主线程会关闭它的子套接字,来自我的任何后续请求就像是一个新客户端。
Wrong again. You don't know anything about threads at Google, let alone which thread closes the socket. Either end can close the connection at any time. Most probably the server end will beat you to it, but it isn't set in stone, and neither is which if any thread will do it.
又错了。你对谷歌的线程一无所知,更不用说哪个线程关闭了套接字。任何一端都可以随时关闭连接。很可能服务器端会击败你,但它并不是一成不变的,如果任何线程都没有这样做的话。
Interviewer : No, google will not keep a dedicated child socket for you. It handles your request and discards/recycles the sockets right away.
采访者:不,谷歌不会为你保留一个专用的子插座。它处理您的请求并立即丢弃/回收插座。
Here the interviewer is wrong. He doesn't seem to have heard of HTTP keep-alive, or the fact that it is the default in HTTP 1.1.
面试官在这里错了。他似乎没有听说过HTTP keep-alive,或者它是HTTP 1.1中的默认值。
Interviewer: Although google may have many servers to serve requests, each server can have only one parent socket opened at port 80. The number of clients to access google webpage must be exceeding larger than the number of servers they have. How is this usually handled?
采访者:虽然谷歌可能有很多服务器来处理请求,但每个服务器只能在端口80打开一个父插槽。访问谷歌网页的客户端数量必须超过他们拥有的服务器数量。这通常如何处理?
Me : I am not sure how this is handled. I see the only way it could work is spawn a thread for each request it receives.
我:我不确定如何处理。我看到它可以工作的唯一方法是为它收到的每个请求生成一个线程。
Here you haven't answered the question at all. He is fishing for an answer about load-balancers or round-robin DNS or something in front of all those servers. However his sentence "the number of clients to access google webpage must be exceeding larger than the number of servers they have" has already been answered by the existence of what you are both incorrectly calling 'child sockets'. Again, not a great question, unless you've reported it inaccurately.
在这里你根本没有回答这个问题。他正在寻找关于负载均衡器或循环DNS或所有这些服务器前面的东西的答案。然而,他的句子“访问谷歌网页的客户数量必须超过他们拥有的服务器数量”已经通过你们错误地称为“子套接字”的存在来回答。同样,这不是一个很好的问题,除非你报告的不准确。
Interviewer : Do you think the way Google handles is different from any bank website?
采访者:您认为Google处理的方式与任何银行网站不同吗?
Me: At the TCP-IP socket level, it should be similar. At the request level, slightly different because a session is maintained to keep state between requests in Banks websites.
我:在TCP-IP套接字级别,它应该是类似的。在请求级别,略有不同,因为维护会话以保持Banks网站中的请求之间的状态。
You almost got this one right. HTTP sessions to Google exist, as well as to bank websites. It isn't much of a question. He should be asking for facts, not your opinion.
你几乎得到了这个。存在与Google的HTTP会话以及银行网站。这不是一个问题。他应该询问事实,而不是你的意见。
Overall, (a) you failed the interview completely, and (b) you indulged in far too much guesswork. You should have simply stated 'I don't know' and let the interview proceed to things that you do know about.
总的来说,(a)你完全没有通过面试,而且(b)你沉迷于太多的猜测。你应该简单地说“我不知道”并让面试继续你所知道的事情。
-1
For point #4, here is how I understand it: if both ends of an end to end connection were the same as that of another socket, there would indeed be no way to distinguish both socket, but if a single end is not the same as that of the other socket, then it's easy to distinguish both. So there is not need to turn destination port 80 (the default) forth and back, since the source ports differ.
对于第4点,这是我理解的方式:如果端到端连接的两端与另一个套接字的两端相同,则确实无法区分两个套接字,但如果单个端不相同作为另一个套接字的那个,那么很容易区分两者。因此,不需要向前和向后转动目标端口80(默认值),因为源端口不同。