What happens when you type holbertonschool.com in your browser?
I’m going to show you how the web stack works.
Starting
When you press the “h” key, the browser receives the event and its auto-complete functions kick in. Depending on your browser’s algorithm and whether you are in private/incognito mode, various suggestions appear in the dropdown below the URL bar. Most of these algorithms sort and prioritize results based on search history, bookmarks, cookies, and popular searches from the internet as a whole. As you type “holbertonschool.com”, the suggestions are refined with each keypress. The browser may even suggest “holbertonschool.com” before you finish typing it.
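To make the ranking idea concrete, here is a minimal sketch of prefix-based suggestions. It assumes the only signal is how often a host appears in a made-up browsing history; the function name suggest and the history list are hypothetical, and real browsers combine many more signals.

```python
from collections import Counter


def suggest(prefix, history, limit=3):
    """Return up to `limit` visited hosts starting with `prefix`,
    most frequently visited first (ties broken alphabetically)."""
    counts = Counter(history)
    matches = [host for host in counts if host.startswith(prefix)]
    matches.sort(key=lambda host: (-counts[host], host))
    return matches[:limit]


# Hypothetical browsing history, for illustration only.
history = [
    "holbertonschool.com",
    "github.com",
    "holbertonschool.com",
    "hackernews.example",
    "google.com",
]

print(suggest("h", history))   # ['holbertonschool.com', 'hackernews.example']
print(suggest("ho", history))  # ['holbertonschool.com']
```

Each extra character narrows the list, which is why the suggestion can settle on the right site before you finish typing.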
What is a browser?
A web browser is application software for accessing the World Wide Web. When a user requests a web page from a particular website, the web browser retrieves the necessary content from a web server and then displays the page on the user’s device.
DNS
DNS (Domain Name System) is a distributed database that maps the domain name in a URL to the IP address it points to. The IP address belongs to the computer that hosts the website we want to access. A domain name can resolve to several IP addresses, and several domain names can share one IP address. For example, www.google.com may resolve to 172.217.172.238 (the exact address varies by location and over time). So if you’d like, you can try to reach www.google.com by typing that IP address in your browser. DNS lists domain names and their IP addresses, like how a phone book is a list of names and their corresponding phone numbers.
When you type the URL in a browser for the first time, the browser sends a request to a DNS server, which responds with the IP address of the web server hosting, for example, holbertonschool.com. (This value is usually cached, so your browser doesn’t have to do this lookup every time.)
The primary purpose of DNS is human-friendly navigation. You could access a website by typing its IP address in your browser, but imagine having to remember a different set of numbers for every site you regularly visit. It is easier to remember the name of the website and let DNS do the work of mapping it to the correct IP address.
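If you want to see a lookup from code, the sketch below asks the operating system's resolver for a host's addresses using Python's standard socket module. The helper name resolve is our own, and the addresses you get back depend on your network and location.

```python
import socket


def resolve(hostname):
    """Ask the operating system's resolver for the IP addresses of a host."""
    # getaddrinfo returns tuples of (family, type, proto, canonname, sockaddr);
    # the IP address is the first element of sockaddr.
    results = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in results})
```

Calling resolve("holbertonschool.com") on a machine with internet access would return whatever addresses your resolver reports for that name at that moment.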
To find the DNS record, the browser checks four caches in order:
1- It checks the browser cache. The browser keeps DNS records for websites you have previously visited for a fixed duration, so this is the first place the query looks.
2- It checks the OS cache. If the record is not in the browser cache, the browser makes a system call to the underlying operating system, which maintains its own cache of DNS records.
3- It checks the router cache. If the record is not on your computer, the query goes to the router, which maintains its own cache of DNS records.
4- It checks the ISP cache. If all of the previous steps fail, the query moves on to your ISP. Your ISP maintains its own DNS server, which includes a cache of DNS records and is the last cache checked for the requested domain. If the record is not there either, the ISP’s DNS server looks it up by querying other DNS servers.
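The four steps above can be sketched as a chain of dictionaries checked in order. This is only an illustration: the cache contents are invented, 203.0.113.10 is an address reserved for documentation, and real caches also expire their records after a time limit.

```python
def lookup(hostname, caches):
    """Check each cache in order; return (ip, cache_name) for the first hit."""
    for name, records in caches:
        if hostname in records:
            return records[hostname], name
    return None, None  # a real resolver would now query other DNS servers


# Hypothetical cache contents, nearest cache first.
caches = [
    ("browser", {}),
    ("os", {}),
    ("router", {"holbertonschool.com": "203.0.113.10"}),
    ("isp", {"holbertonschool.com": "203.0.113.10"}),
]

print(lookup("holbertonschool.com", caches))  # ('203.0.113.10', 'router')
```

The lookup stops at the first cache that has the record, so the nearer the hit, the less traffic leaves your machine.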
You may wonder why caches are maintained at so many levels. Having our information cached in so many places may feel uncomfortable from a privacy standpoint, but caches are essential for reducing network traffic and improving data transfer times.
Protocols: TCP/IP
We mentioned that domain names map to IP addresses, but IP is not the only protocol used by the Internet. The Internet Protocol Suite is often referred to as TCP/IP (TCP stands for Transmission Control Protocol), and it contains other protocols as well. It’s a set of rules that define how servers and clients interact over the network, and how data should be broken into packets, transferred, and received.
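To illustrate what “broken into packets” means, here is a simplified sketch that splits a message into numbered chunks and puts them back together even if they arrive out of order. It is loosely inspired by TCP sequence numbers, not an implementation of TCP; the function names are our own.

```python
import random


def to_packets(data, size):
    """Split bytes into (sequence_number, chunk) pairs of at most `size` bytes."""
    return [(seq, data[i:i + size])
            for seq, i in enumerate(range(0, len(data), size))]


def reassemble(packets):
    """Sort packets by sequence number and join their chunks."""
    return b"".join(chunk for _, chunk in sorted(packets))


message = b"GET / HTTP/1.1"
packets = to_packets(message, size=4)
random.shuffle(packets)  # packets may arrive in any order
print(reassemble(packets) == message)  # True
```

The sequence numbers are what let the receiver rebuild the original data regardless of the order in which the pieces arrive.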
OSI Model
The OSI (Open Systems Interconnection) model standardizes communication between different computing machines. It describes the flow of information from one computer to another in seven layers: application, presentation, session, transport, network, data link, and physical. The interplay of these layers brings, for example, holbertonschool.com from the server to your machine. Both ends (client and server) follow these layers, but in opposite order. When your browser sends the request, communication starts at the application layer and goes down to the physical layer; when the server receives the request, it starts at the physical layer and goes up. Likewise, when the server responds, its data goes from the application layer down to the physical layer, and when your computer receives the response, it goes from the physical layer back up to the application layer.
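The down-then-up flow can be pictured as wrapping and unwrapping. In the toy sketch below, each layer on the sending side wraps the data in its own label, and the receiving side removes the labels in the opposite order. The bracket format is invented for illustration; real layers add binary headers, not text labels.

```python
LAYERS = ["application", "presentation", "session", "transport",
          "network", "data link", "physical"]


def send(payload):
    """Sender: wrap the payload going down, application layer first."""
    for layer in LAYERS:
        payload = f"[{layer}|{payload}]"
    return payload


def receive(frame):
    """Receiver: unwrap going up, physical layer first."""
    for layer in reversed(LAYERS):
        prefix = f"[{layer}|"
        if not (frame.startswith(prefix) and frame.endswith("]")):
            raise ValueError(f"missing {layer} wrapper")
        frame = frame[len(prefix):-1]
    return frame


print(receive(send("GET /")))  # GET /
```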
The Firewall
To protect themselves from hackers and attacks, servers are often equipped with a firewall. A firewall is software (or a hardware device) that enforces rules about what traffic can enter or leave a part of a network. In our example, when the browser requests the website at the server’s IP address, that request is processed by a firewall, which decides whether it is safe or a threat to the server’s security. The client machine can also run a firewall that blocks connections to IP addresses known to be malicious.
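As a rough sketch of how such rules might be evaluated, the example below checks a packet's source address and destination port against an ordered rule list, where the first match wins. The rules and the decide function are hypothetical, and the address ranges are ones reserved for documentation.

```python
import ipaddress

# Hypothetical rules, checked top to bottom; the first match wins.
# A port of None means "any port".
RULES = [
    ("deny", ipaddress.ip_network("198.51.100.0/24"), None),  # blocked range
    ("allow", ipaddress.ip_network("0.0.0.0/0"), 443),        # HTTPS from anywhere
    ("allow", ipaddress.ip_network("0.0.0.0/0"), 80),         # HTTP from anywhere
]


def decide(source_ip, port):
    """Return 'allow' or 'deny' for a packet from source_ip to a local port."""
    address = ipaddress.ip_address(source_ip)
    for action, network, rule_port in RULES:
        if address in network and rule_port in (None, port):
            return action
    return "deny"  # default: drop anything that no rule allows


print(decide("203.0.113.7", 443))   # allow
print(decide("198.51.100.9", 443))  # deny
print(decide("203.0.113.7", 22))    # deny
```

Denying by default means that a service is only reachable if a rule explicitly opens it.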
Security and Encryption: HTTPS/SSL
Now that the browser has the IP address, it takes care of the other part of the URL, the https:// part. HTTPS stands for HyperText Transfer Protocol Secure and is the secure version of regular HTTP. This transfer protocol defines the types of requests and responses exchanged between clients and servers over a network. In other words, it’s the main way to transfer data between a browser and a website. HTTP and HTTPS request methods include GET, POST, PUT, and others. HTTPS requests and responses are encrypted in transit, which prevents third parties from reading or tampering with the data along the way. For example, if we enter our credit card information on a website that uses HTTPS, it cannot be read in plain text by anyone intercepting the traffic. (How the website stores that information afterwards is a separate matter that HTTPS does not cover.)
Another key component in securing websites is the SSL certificate. SSL stands for Secure Sockets Layer; its modern successor is TLS (Transport Layer Security), though the certificates are still commonly called SSL certificates. The certificate needs to be issued by a trusted Certificate Authority. When a website has a valid certificate, we’re able to see a little lock icon next to the website name in the address bar.
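To show what a request looks like on the wire, here is a minimal sketch that builds the text of an HTTP/1.1 GET request from a URL. The helper build_request is our own and only sets two headers; real browsers send many more. With HTTPS, this same text is encrypted before it leaves your machine.

```python
from urllib.parse import urlsplit


def build_request(method, url):
    """Build the text of a minimal HTTP/1.1 request for the given URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    lines = [
        f"{method} {path} HTTP/1.1",  # request line: method, path, version
        f"Host: {parts.netloc}",      # which site on this server we want
        "Connection: close",
        "",                           # blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines)


print(build_request("GET", "https://holbertonschool.com"))
```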
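On the client side, certificate checking can be seen in Python's standard ssl module. The short sketch below only inspects the default settings and does not connect anywhere; it is meant to show that a default client refuses certificates it cannot validate.

```python
import ssl

# The default context loads the system's trusted Certificate Authorities.
context = ssl.create_default_context()

# The server's certificate must validate against a trusted authority...
print(context.verify_mode == ssl.CERT_REQUIRED)
# ...and the name in the certificate must match the host we asked for.
print(context.check_hostname)
```

To open a real connection, you would wrap a TCP socket with context.wrap_socket(sock, server_hostname="holbertonschool.com"); the handshake fails if the certificate is not valid for that name.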
Load-balancer
As we mentioned earlier, websites live on servers. For most websites with substantial traffic, hosting on a single server would be impractical. Plus, it would create a Single Point of Failure (SPOF), because one failure of or attack on that server would take the whole site down.
To address this, websites increase the number of servers they have, organize them in clusters, and use load-balancers. A load-balancer is a software program that distributes network requests between several servers, following a load-balancing algorithm. HAProxy is a well-known load-balancer. Examples of algorithms include round-robin, which distributes requests evenly by cycling through the servers in order, and least-connection, which sends each request to the server with the fewest active connections.
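The two algorithms can be sketched in a few lines. This is an illustration of the ideas only, not how HAProxy implements them; the class names and the server names web-01 and web-02 are made up.

```python
from itertools import cycle


class RoundRobin:
    """Hand out servers in a fixed rotation."""

    def __init__(self, servers):
        self._servers = cycle(servers)

    def pick(self):
        return next(self._servers)


class LeastConnections:
    """Hand out the server that currently has the fewest active connections."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        """Call when a request finishes so the count stays accurate."""
        self.active[server] -= 1


rr = RoundRobin(["web-01", "web-02"])
print([rr.pick() for _ in range(3)])  # ['web-01', 'web-02', 'web-01']
```

Round-robin ignores how busy each server is, while least-connection adapts when some requests take much longer than others.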
The Web server
Once the requests have been distributed to the servers, they are processed by one or more web servers. A web server is a software program that serves static content, like simple HTML pages, images, or plain text files. Examples of web servers are Nginx and Apache. The web server is responsible for finding the static content that corresponds to the requested address and serving it as an HTTP or HTTPS response.
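Here is a minimal sketch of that “find the file, then serve it” step. It assumes static files live under a single directory (the document root) and that a request for / should return index.html; the function serve_static is our own, and real servers such as Nginx do far more (caching, content types, logging).

```python
from pathlib import Path


def serve_static(document_root, url_path):
    """Map a URL path to a file under document_root; return (status, body)."""
    root = Path(document_root).resolve()
    relative = url_path.lstrip("/") or "index.html"
    target = (root / relative).resolve()
    # Refuse paths such as /../secret that escape the document root.
    if root != target and root not in target.parents:
        return 403, b"Forbidden"
    if not target.is_file():
        return 404, b"Not Found"
    return 200, target.read_bytes()
```

For example, serve_static("/var/www/html", "/") would return status 200 and the contents of /var/www/html/index.html if that file exists (the directory name is just an example).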
The Application server
A web server is the basis of any website. But most sites don’t want just static pages with no interaction; most websites are dynamic. That means it’s possible to interact with the site, save information on it, log in with a user name and a password, etc.
This is made possible by one or more application servers. These are software programs responsible for running applications, communicating with databases, and managing user information, among other things. They work behind web servers and generate dynamic content, which is delivered alongside the static content from the web server.
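To contrast with static files, the sketch below is a tiny dynamic application written against WSGI, the standard interface between Python web applications and the servers that run them. The response is computed from the request each time; the greeting logic and the name parameter are invented for illustration.

```python
from urllib.parse import parse_qs


def application(environ, start_response):
    """A minimal WSGI app: builds the response from the request at run time."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    name = query.get("name", ["visitor"])[0]
    body = f"Hello, {name}!".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, Python's standard library can host it with wsgiref.simple_server.make_server("", 8000, application).serve_forever(); visiting /?name=Ada would then return “Hello, Ada!”.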
The Database
The last step in our web infrastructure is the Database Management System (DBMS). A database is a collection of data, and the DBMS is the program that interacts with the database to retrieve, add, and modify data in it.
There are several types of database models. The two main ones are relational databases and non-relational databases.
A relational database can be seen as a collection of tables representing objects, where each column is an attribute and each row is an instance of that object. We can perform SQL (Structured Query Language) queries on those databases. MySQL and PostgreSQL are two popular relational databases. A non-relational database can take many forms, as the data inserted in it doesn’t have to follow a particular schema. These are also called NoSQL databases.
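The difference is easier to see side by side. The sketch below uses SQLite, which ships with Python, as a stand-in for a relational database such as MySQL or PostgreSQL, and plain Python dictionaries as a stand-in for schema-free documents. The table, column names, and records are all made up.

```python
import sqlite3

# Relational: every row in the table has the same columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, cohort INTEGER)"
)
conn.executemany(
    "INSERT INTO students (name, cohort) VALUES (?, ?)",
    [("Ada", 1), ("Linus", 2)],
)
rows = conn.execute(
    "SELECT name FROM students WHERE cohort = ?", (1,)
).fetchall()
print(rows)  # [('Ada',)]

# Non-relational (document style): each record can have its own fields.
documents = [
    {"name": "Ada", "cohort": 1},
    {"name": "Linus", "languages": ["C"]},
]
print([doc["name"] for doc in documents if "cohort" in doc])  # ['Ada']
```

In the relational version the DBMS enforces the structure; in the document version the application has to cope with records that are missing a field.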