DIR Return Create A Forum - Home
---------------------------------------------------------
ExcelSoft Database Professionals
HTML https://esdbp.createaforum.com
---------------------------------------------------------
*****************************************************
DIR Return to: General
*****************************************************
#Post#: 78--------------------------------------------------
Memory is a new disk
By: srinivasma_exceldbp Date: June 17, 2014, 10:43 pm
---------------------------------------------------------
In the database world, disk based data stores are slowly being
replaced by memory-based data stores. Memory prices are becoming
more affordable and operational databases can usually fit
totally into memory.
According to the Gartner (the research company that provides
independent technology reports) by the end of 2015 all
enterprise DBMS will use memory optimizations and most of this
transformation will happen this year.
The Big Vendors, Microsoft, Oracle, SAP and IBM are still
Leaders on the field, all others are way behind.
Oracle have understood finally that having several database
products is not too easy for their customers. They have
announced an In-Memory Database option for Oracle 12c, it should
be available this July. As opposed to TimesTen, which is a
separate caching product, an in-Memory option will be totally
integrated into the 12c version. It will use the in-memory
column-store format which is usually being used for data
analysis but they say that this feature will speed up also OLTP
workloads. They will keep the data simultaneously in two formats
– in the old row-based and in the in-memory column-based. Both
formats will be transactionally consistent.
SAP has a High performance Analytic Appliance (HANA) DBMS since
2011. It supports 2 formats and stores in memory both row-based
tabular data and column-based tables. Naturally, tabular
orientation is recommended more for OLTP environments and
column-store for OLAP-like queries. SAP HANA can be scaled out
by adding nodes, sharing the same storage. In such case the
master node will store row-based data and statistics on the rest
of the data.
IBM DB2 10.5 has a BLU acceleration. This is a memory-resident
index which helps to find data, stored in columnar form on disk.
There is no scale out solution yet but they claim that it will
come soon. IBM BLU is not really for OLTP environments, as
opposite to SQL Server: for writes IBM recommends “to commit
10,000 rows at a time”. Transactions are using locks to make
changes to the data and latches to access shared structures.
Microsoft SQL 2014 has a new query engine “in-memory OLTP”
(“Hekaton”) where the row-based data resides completely in
memory keeping a copy of the data on the disk for durability
purposes. Indexes on memory-resident tables live only in memory.
Queries can span all 3 existing query engines: relational
traditional query engine where the data sits on disk,
column-store query engine, where data is using both memory and
disk and this 3rd new query engine that manages memory resident
data. Hekaton tables are not using locks and latches for
concurrency and locking management, it’s using a combination of
timestamps and row versioning.
Here are some of the proven alternative technologies on the DBMS
market that are worth noticing:
Aerospike: a key-value row-based NoSQL DBMS with in-memory
indexes and data on SSD. It has many customers in gaming and
advertising-driven sites. Key value stores are very simple and
powerful structures if you always use key-matching searches.
They are extremely fast and can scale easily by adding more
servers to the cluster. The main downside of the key-value
structure is a limited ability to search inside the value.
Aerospike, unlike many of it’s competitors, allows creation of
secondary indexes on the value and stores them in memory. Redis,
for instance, is a quite popular really fast in-memory key-value
store but it is not scalable beyond one node and does not
support secondary indexes on value. Probably , due to those two
limitations, it’s not yet in the Gartner quadrant. Aerospike is
using latches and short-term locks to ensure isolation between
transactions.
VoltDB: an all-in-memory NewSQL relational database, designed to
run as a cluster of servers with tables partitioned across the
cluster. Transactions in the VoltDB are running in a serial
timestamp order, do not use locks or latches, and translate SQL
into the Java code. They have many customers in telecom,
utilities and financial services.
MemSQL: a distributed in-memory row-based NewSQL relational
DBMS. It stores all data in memory in a row format and maintains
the write-ahead log for durability purposes. SQL is being
compiled into the C++ code. Data distribution is implemented
using an aggregator node, which is splitting the query and sends
across all leaf nodes for processing. The main node will
aggregate it back before sending the results to the client.
Recently they have added a new column-store table type which
must reside on SSD. It is sort of in contradiction to the
current trend where everyone who adds column-based technology,
makes it memory-resident. This IMDBMS is used by customers such
as Comcast and Zynga.
MongoDB and CouchBase are the most popular document based data
stores. Their greatest advantage is a well known open schema in
which work with Json documents becomes very easy. Both of them
are not really in-memory products, they are working with data on
disk. MongoDB however has some mode of work where they use the
memory-mapped files to access the data on disk. This is in fact
a Linux feature. Linux makes the files appear as regular files
but they are sitting entirely in memory. Couchbase is “almost”
an in-memory database, it caches as much as possible in RAM but
data is persisted.
There are many developers eager to play with the new
technologies. However, the biggest challenge of the new DBMS is
to convince organizations to put aside well known products and
start using their new solutions. This will not be an easy task
despite the fact that many of them are great products. They can
be extremely fast but it takes years to understand all users
requirements and provide the complete flexible solution that
will suit more than just a few cases.
Enterprise solutions are like LEGO. You can build out of it
anything that you want
*****************************************************