Web servers – Apache Server Basics
Apache Server is one of the oldest web servers still in wide use. Apache originally ran only on Unix-like systems such as Linux but is now available for Windows and OS/2 as well. The Apache Server has been around since 1995 and has played a major role in the development of the World Wide Web (WWW).
It was based on the NCSA HTTPd server and quickly grew to be the dominant web server globally. It is maintained today by the Apache Software Foundation. Approximately 92% of Apache servers currently run on Linux.
Apache Server – What does it do?
Enough history. What does the Apache server actually do?
In essence, a web server has 3 functions:
- Read the URL sent to it and locate the folder/directory with the requested content.
- Return this content in a form that the end user’s web browser understands (typically HTML).
- Handle any non-HTML code found by calling the appropriate software handler (e.g. PHP), which runs the code and produces the HTML to send back.
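As a rough illustration of the first and third functions, here is a minimal sketch. It assumes Apache 2.4 with the mod_php module loaded, which is only one of several ways to run PHP (many installations use a PHP-FPM proxy instead). The /var/www/html path is an assumption based on the common default.

```apache
# Function 1: map requested URLs onto a directory on disk
DocumentRoot "/var/www/html"

# Function 3: hand files ending in .php to the PHP handler
# (assumes mod_php is installed and loaded)
<FilesMatch "\.php$">
    SetHandler application/x-httpd-php
</FilesMatch>
```

Files that match no handler, such as plain .html files, are simply sent to the browser as they are.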
Structure – Understanding the directories
Apache usually has its main components spread over 3 directories, although Apache configuration may differ across different Linux distributions. Plesk, cPanel and a few other vendors create their own custom configurations. We are speaking about the normal configuration of Apache here.
- Apache Configuration Files: Apache configuration files are usually found under either /etc/httpd, /etc/apache or /etc/apache2. This directory may or may not contain many levels of sub-directories, depending on the Apache version and Linux distribution.
- Apache binaries/executables: Most Apache installs use two main executables – httpd (the http service) and apachectl (the control system). On Debian and Ubuntu these are named apache2 and apache2ctl. You can usually find both under /usr/sbin/ and can check by typing “which httpd” or “which apachectl”.
- Web Content: You may place web content under each user’s home directory, but the default is under the /var/www directory, which usually includes /var/www/html and /var/www/cgi-bin. Under Linux, the default first page that is automatically picked up by Apache is called either index.html or index.htm. Various index files may be specified in the Apache configuration. It is not uncommon to see index.php or even occasionally the Microsoft landing page file, Default.htm.
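The index file lookup is controlled by the DirectoryIndex directive (part of mod_dir). A minimal sketch, assuming you want the file names mentioned above tried in this order:

```apache
# When a request names a directory, serve the first of these files that exists
DirectoryIndex index.html index.htm index.php
```

The order matters: if both index.html and index.php exist in the same directory, this example would serve index.html.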
Structure – Understanding the configuration layout
The whole configuration is directed by ONE file, named httpd.conf on Red Hat style systems and apache2.conf on Debian style systems. This file contains Include directives that pull in further files from sub-directories. The structure of these varies by Linux distribution and Apache version.
Administrators originally configured Apache Server using a MONOLITHIC configuration (all settings in one file). Even the virtual host (website) settings were in this file. On large installations with 600 to 1000 sites, the file becomes hard to manage. As a result, the httpd.conf file was split up. The most common sub-directories are conf.d and conf, but modules.d and some other directories may also reside under /etc/apache2 (or /etc/httpd). Under these directories, it is common practice to use the format *-available and *-enabled for various features. The usual practice is to keep each configuration file in the *-available directory and switch it on by creating a symbolic link to it in the matching *-enabled directory.
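To make the split layout concrete, here is a sketch of how a Debian style main configuration file might pull in the *-enabled directories. The directory names follow the Debian convention and are assumptions. Your distribution may use different ones.

```apache
# Load only what has been linked into the *-enabled directories
IncludeOptional /etc/apache2/mods-enabled/*.load
IncludeOptional /etc/apache2/mods-enabled/*.conf
IncludeOptional /etc/apache2/conf-enabled/*.conf
IncludeOptional /etc/apache2/sites-enabled/*.conf
```

IncludeOptional behaves like Include, except that Apache does not refuse to start when a wildcard matches no files. This is handy for a sites-enabled directory that may be empty.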
Configuration – httpd.conf
The httpd.conf file contains many configuration elements (called DIRECTIVES), but the main ones are as follows:
ServerName: This is usually the name of the host server on which Apache Server is running.
User and Group: The username and group under which Apache runs its CHILD processes. Apache normally starts as root (it needs root privileges to bind to port 80) but does not serve any websites from this root instance. The root instance is the controller or PARENT and spawns CHILD processes to handle web requests. Because of this split, a graceful restart (apachectl graceful) lets the parent reload its configuration while the children finish the requests they are already serving.
Include: This pulls in the sections of configuration that have been split off from the original httpd.conf file. This enables us to group configurations by, for example, vhosts (websites) and modules. The path given to “Include” may be ABSOLUTE, e.g. Include “/etc/apache2/conf.d/includes/global.conf”, or relative to the ServerRoot directory. The administrator can also use wildcards, e.g. Include “/etc/apache2/conf.modules.d/*.conf”.
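Put together, the three directives might look like the sketch below. The host name is hypothetical, and the user and group names are assumptions: Red Hat style systems typically use apache, while Debian style systems use www-data.

```apache
# Name of the host this Apache instance runs on (hypothetical)
ServerName server1.example.com

# Unprivileged account used by the child processes
User apache
Group apache

# Pull in the configuration that was split out of this file
Include "/etc/apache2/conf.d/includes/global.conf"
Include "/etc/apache2/conf.modules.d/*.conf"
```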
Configuration – vhosts.conf
VirtualHost is Apache’s term for a website. In the early days, each website ran under its own IP address. When it became possible to host MULTIPLE sites on one IP address, Apache used the term VIRTUALHOST to distinguish these sites. This can lead to some confusion since the advent of CLOUD and VIRTUALIZATION, where a VIRTUAL HOST has a different connotation.
cPanel creates the VirtualHosts in the httpd.conf file. The usual construction is to keep these separate, usually in a file-per-site structure, e.g. /etc/apache2/sites-available/mydomain.co.za.conf.
The VirtualHost configuration starts with <VirtualHost *> and ends with </VirtualHost>. The syntax resembles XML, with directives placed one per line between the opening and closing tags. Any directives here override the global directives in httpd.conf. Here is a short example:
<VirtualHost *>
    ServerName mydomain.co.za
    ServerAlias www.mydomain.co.za
    DocumentRoot /var/www/html/mydomain.co.za
    ServerAdmin firstname.lastname@example.org
    <Directory "/">
        AllowOverride All
    </Directory>
</VirtualHost>
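The multi-site-per-IP idea can be sketched with two VirtualHost blocks listening on the same address and port. The second site, example.org, and its path are hypothetical, and the explicit port (*:80) is an assumption about a plain HTTP setup.

```apache
# First site: answers for mydomain.co.za and www.mydomain.co.za
<VirtualHost *:80>
    ServerName mydomain.co.za
    ServerAlias www.mydomain.co.za
    DocumentRoot /var/www/html/mydomain.co.za
</VirtualHost>

# Second site on the same IP address and port (hypothetical)
<VirtualHost *:80>
    ServerName example.org
    DocumentRoot /var/www/html/example.org
</VirtualHost>
```

Apache picks the block whose ServerName or ServerAlias matches the Host header sent by the browser. If nothing matches, the first block listed for that address and port acts as the default.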
Security, .htaccess and the rest
Various security functions and modules exist for Apache. Let us look at some of the basics:
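- In httpd.conf: Always turn off the Apache server information. ServerSignature Off removes the version footer from server-generated pages such as error pages, and ExtendedStatus Off stops mod_status from collecting detailed per-request information.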
- In your website directories, you can control security and access through directives in the .htaccess file.
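As an example of the second point, a small .htaccess file might look like the sketch below. It assumes Apache 2.4 syntax and that the enclosing configuration permits overrides (for instance AllowOverride All). The rules themselves are illustrative choices, not a complete security policy.

```apache
# Do not show a file listing when a directory has no index file
Options -Indexes

# Refuse requests for .htaccess, .htpasswd and similar files
<Files ".ht*">
    Require all denied
</Files>
```

Apache reads .htaccess files on every request, so changes take effect without a restart.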
Full documentation can be found on the Apache Software Foundation’s Apache HTTP Server website.