Choosing a server for Big Data analytics

A database server plays a key role in any company that works with big data: it must ensure data integrity, safety, and round-the-clock availability. What are these servers, what types of databases exist, and what hardware does each type require?

Database servers typically store corporate information and process the databases behind client-server applications. The choice of server matters: the more reliable, faster, and more fault-tolerant the hardware, the better the performance and safety of the databases.

Many modern servers can improve database performance (for instance, a 10 Gbps dedicated server provided by blueservers.com). But databases come in many types, and each type has its own server requirements.

Database types

The most popular databases today are relational (SQL) databases, which store information in tabular form: the tables have a clear structure and are linked to one another. Each table is divided into rows containing individual records and columns with assigned data types, and the value in each cell must match its column's type.
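As an illustrative sketch of this tabular model (using Python's built-in sqlite3 module as a stand-in for a client-server DBMS such as MySQL or PostgreSQL; the table and column names are made up for the example):

```python
import sqlite3

# In-memory database; a production system would use a client-server DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Each table has a fixed structure: typed columns and rows of records.
conn.execute("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total       REAL NOT NULL
    )
""")

conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Acme')")
conn.execute("INSERT INTO orders (customer_id, total) VALUES (1, 99.5)")

# Tables are linked through the foreign key, so related rows can be joined.
row = conn.execute("""
    SELECT c.name, o.total FROM orders o
    JOIN customers c ON c.id = o.customer_id
""").fetchone()
print(row)  # ('Acme', 99.5)
```

The foreign key is what makes the schema "linked": every order row references exactly one customer row, and the database enforces that link.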

MySQL is an open-source relational database designed for small and medium-sized projects; it is an inexpensive and reliable tool. MySQL supports a large number of tables and offers many extensions and plugins that simplify administration. It is easy to install, integrates with other systems, and works with virtually any CMS, framework, or programming language. It is mainly accessed by local or remote clients and supports several table (storage) engines, including ones with full-text search and ones with record-level transactions.
PostgreSQL is another widely used open-source SQL database. It has a large number of built-in functions and add-ons, including cluster scaling and table sharding. It is used for complex data with high safety requirements, since it is stable and rarely fails. It works with structured data but also supports JSON (including the binary JSONB format), which gives some flexibility in the data schema. PostgreSQL is also designed for creating, storing, and retrieving complex data structures, and it supports a variety of data types (numeric, text, boolean, monetary, binary, network addresses, XML, and others).

MSSQL (Microsoft SQL Server) is a multi-user DBMS developed by Microsoft, with high performance and fault tolerance and tight integration with Windows. The server supports remote connections, works with many popular data types, provides triggers and stored procedures, and ships with practical, convenient configuration utilities. Its query language is Transact-SQL (developed jointly by Microsoft and Sybase), an implementation of the ANSI/ISO SQL standard with some extensions. MSSQL is widely used not only in web projects but also in desktop applications, and it handles relational databases of all sizes, from personal databases to large enterprise-wide ones. It is often chosen when MySQL's functionality is not enough.

Oracle Database is a multi-model object-relational DBMS commonly used for online transaction processing (OLTP), data warehousing (DW), and mixed (OLTP and DW) workloads. Its on-disk structures include tablespaces, control files, redo and archive logs, block change tracking files, flashback logs, and RMAN backup files. With it you can automate common business operations, perform dynamic multidimensional analysis (OLAP), work with XML documents, and manage both distributed and local data.
Despite the marked differences in functionality and architecture, all of the relational databases listed above have similar hardware requirements.

MongoDB is a non-relational (document) database that stores information as documents in the BSON format, a binary representation of JSON. Data can therefore be exchanged between web applications and individual servers as human-readable JSON documents. MongoDB supports replication, which makes it highly scalable, and it is a suitable solution for massive operations on unstructured data.
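To make the contrast with the tabular model concrete, here is a sketch of such a document using only Python's standard json module (the document contents are invented for illustration; a real MongoDB deployment would use a driver such as pymongo):

```python
import json

# A hypothetical document as it might be stored in a document database.
# Unlike a relational row, fields can be nested and can vary per document.
order = {
    "_id": "5f1d7f",
    "customer": {"name": "Acme", "city": "Berlin"},
    "items": [
        {"sku": "A-100", "qty": 2},
        {"sku": "B-200", "qty": 1},
    ],
}

# JSON is the human-readable interchange form; MongoDB stores the
# equivalent data internally as binary BSON.
payload = json.dumps(order)
restored = json.loads(payload)
print(restored["customer"]["name"])  # Acme
```

Note that the nested `items` list would require a separate, joined table in a relational schema; in a document database it lives inside the record itself.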

How to choose a server for big data analytics?

The specific feature of database servers is that data is usually processed transactionally: the DBMS requests information in small portions, performs operations on it, and then saves it. This pattern dictates a number of hardware requirements:

  • a large amount of RAM is used to cache the most intensively used parts of the database;
  • the disk subsystem must deliver high performance, that is, handle a large number of small requests per unit of time, measured in IOPS (input/output operations per second);
  • high processing power is required to process queries and operations on data.
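The transactional pattern behind these requirements (read a small portion, operate on it, save) can be sketched with Python's built-in sqlite3; the account data here is invented, and a production system would run this against a client-server DBMS:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO accounts VALUES (1, 100.0), (2, 50.0)")
conn.commit()

# One transaction: read a small portion of the data, modify it, commit.
# The context manager commits on success and rolls back on any exception,
# so the two balances can never get out of sync.
with conn:
    (bal,) = conn.execute(
        "SELECT balance FROM accounts WHERE id = 1").fetchone()
    conn.execute("UPDATE accounts SET balance = ? WHERE id = 1", (bal - 30,))
    conn.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 2")

balances = conn.execute(
    "SELECT balance FROM accounts ORDER BY id").fetchall()
print(balances)  # [(70.0,), (80.0,)]
```

Each such transaction touches only a few rows but issues several small reads and writes, which is exactly why IOPS, RAM caching, and per-query CPU matter more here than raw sequential throughput.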

The choice of a database server is also affected by the expected load, which depends on the size of the database files, the number of users connected simultaneously, the intensity and nature of their work (data entry and editing, viewing, running "heavy" queries), the presence of other resident software, and the kinds of tasks the server executes.

Database servers must meet hardware requirements depending on the number of users (data provided by blueservers.com):

  • 10 users require a server with a processor speed of at least 2.2 GHz, at least two cores per processor, at least 4 GB of DDR3 RAM, at least 3 SAS / SATA drives with a rotation speed of 7200 rpm;
  • 20 users require a server with a processor speed of at least 2.3 GHz, at least four cores per processor, at least 6 GB of DDR3 RAM, at least 3 SAS / SATA drives with a rotation speed of 7200 rpm;
  • 50 users require a server with a processor speed of at least 2.5 GHz, at least eight cores per processor, at least 16 GB of DDR3 RAM, at least 6 SAS / SATA drives with a rotation speed of 7200 rpm;
  • 100 users require a server with two processors with a frequency of at least 2.8 GHz, at least ten cores per processor, at least 16 GB of DDR4 RAM, at least 10 SAS drives with a rotation speed of 10,000 rpm;
  • 200 users require a server with four processors with a frequency of at least 2.8 GHz, at least eight cores per processor, at least 64 GB of DDR4 RAM, and at least 16 SAS drives with a rotation speed of 10,000 rpm;
  • 500 users require a server with four processors with a frequency of at least 3 GHz, at least 16 cores per processor, at least 128 GB of DDR4 RAM, and at least 24 SAS drives with a rotation speed of 10,000 rpm.
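The tiers above can be captured in a small lookup helper. The figures are the blueservers.com numbers from the list; the function itself is only an illustration of how to pick the smallest tier that covers a given user count:

```python
# (users, min GHz, cores per CPU, CPUs, RAM GB, drives) per the tiers above.
TIERS = [
    (10,  2.2, 2,  1, 4,   3),
    (20,  2.3, 4,  1, 6,   3),
    (50,  2.5, 8,  1, 16,  6),
    (100, 2.8, 10, 2, 16,  10),
    (200, 2.8, 8,  4, 64,  16),
    (500, 3.0, 16, 4, 128, 24),
]

def minimum_tier(users):
    """Return the smallest listed tier that covers the given user count."""
    for tier in TIERS:
        if users <= tier[0]:
            return tier
    raise ValueError("beyond the listed tiers; size individually")

print(minimum_tier(75))  # (100, 2.8, 10, 2, 16, 10) -- the 100-user tier
```

For example, 75 concurrent users fall between the 50- and 100-user tiers, so the helper returns the 100-user configuration.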

RAM

For database servers, simply adding RAM is often the most effective way to improve performance. For example, doubling RAM from 16 GB to 32 GB can substantially reduce query times for a workload that previously did not fit in memory, since far fewer reads go to disk.

During processing, the server connects to the database and loads the working data into RAM to optimize performance: calculations are then carried out in memory, in real time.

To estimate the "best" amount of RAM, add up the sizes of all your active databases: that total is the amount of RAM you could potentially use, subject to the limits of your software. This is obviously a best-case figure and is neither realistic nor necessary for every company.
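The rule of thumb above is a one-line sum; the database sizes here are hypothetical:

```python
# Hypothetical sizes (GB) of the active databases on one server.
active_db_sizes_gb = [42.0, 13.5, 6.5]

# Best case: enough RAM to cache every active database entirely in memory.
best_case_ram_gb = sum(active_db_sizes_gb)
print(best_case_ram_gb)  # 62.0
```

In this example a 64 GB configuration would cover the full working set, while anything smaller means some queries will read from disk.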

In addition, before installing RAM, it is important to consider the features of the database and software: depending on the software version, the DBMS can only support a limited amount of RAM.

Storage size

When choosing drive sizes, first consider the potential for growth, while also accounting for drive durability. If your current database is about 100 GB and grows by 15 GB per year, install a drive well over 150 GB in your database server, or it will fill up within roughly three years.
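Under the article's simple linear-growth assumption, the time until a drive fills up is straightforward to project:

```python
def years_until_full(capacity_gb, current_gb, growth_gb_per_year):
    """Years until a drive of the given capacity fills up,
    assuming linear annual growth."""
    return (capacity_gb - current_gb) / growth_gb_per_year

# The article's example: 100 GB today, growing by 15 GB per year.
print(years_until_full(150, 100, 15))  # ~3.3 years on a 150 GB drive
print(years_until_full(250, 100, 15))  # 10.0 years on a 250 GB drive
```

Real growth is rarely perfectly linear, so treat the result as a lower-bound planning figure rather than a guarantee.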

If it is possible to deploy a SAN, then it is best to use it to host the database files. Otherwise, it is recommended to include 3 separate high-speed disks in the configuration (for example, with a spindle speed of 15 thousand revolutions per minute) for different tasks:

  • Disk 1: storage for the database itself (sized at double the current database);
  • Disk 2: temporary database (100 GB or less);
  • Disk 3: standard disk for the OS and program files (100 GB or less).

The physical location of these databases is specified during the installation of the DBMS.

CPU

Processor performance comes last when building a database server, after the amounts of RAM and storage have been determined. Still, there are common baselines for server chips.

For the databases of small companies, 4-8 cores are enough, while larger projects should look at 8- and 16-core processors.