Data mining is the extraction of valuable information from raw data. This data usually comes in large volumes, varies widely in format, and streams in at high velocity. Below are some of the top big data technologies used for data mining.


Presto is a distributed SQL query engine for running analytic queries against a variety of data sources, e.g., Cassandra, Hadoop, MySQL, and MongoDB. One of its strengths is that users can combine data from several sources in a single query.
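To make the single-query idea concrete, here is a minimal sketch that builds one SQL statement joining tables from two different sources. It assumes Presto's `catalog.schema.table` naming; the catalog names (`mysql`, `cassandra`), the schemas, the tables, and the columns are all hypothetical and depend on how a given cluster is configured.

```python
def qualified(catalog: str, schema: str, table: str) -> str:
    """Return a Presto-style fully qualified name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


def build_federated_query() -> str:
    """Build one SQL statement that joins tables living in two data sources.

    All catalog, schema, table, and column names here are hypothetical.
    """
    customers = qualified("mysql", "shop", "customers")
    orders = qualified("cassandra", "sales", "orders")
    return (
        "SELECT c.name, count(*) AS order_count\n"
        f"FROM {customers} AS c\n"
        f"JOIN {orders} AS o ON o.customer_id = c.id\n"
        "GROUP BY c.name"
    )


if __name__ == "__main__":
    print(build_federated_query())
```

The resulting string could be submitted through whichever Presto client is in use; the point is only that both sources appear in the same `FROM`/`JOIN` clause.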

On the downside, if a worker node fails, any query running on it fails as well and has to be rerun. Presto also has no caching layer, so frequently repeated (“hot”) queries are recomputed every time and see no speedup.
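One way to soften both limitations is on the client side: retry a failed query and keep the results of hot queries in memory. The sketch below is not part of Presto; `CachedRetryingClient` and the `run_query` callable are hypothetical names, and `run_query` stands in for whatever function actually submits SQL to the cluster.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

Rows = List[Tuple]


class CachedRetryingClient:
    """Hypothetical wrapper: retries failed queries and caches results by SQL text."""

    def __init__(
        self,
        run_query: Callable[[str], Rows],  # assumed: submits SQL, returns rows
        max_attempts: int = 3,
        backoff_seconds: float = 1.0,
    ) -> None:
        self._run_query = run_query
        self._max_attempts = max_attempts
        self._backoff_seconds = backoff_seconds
        self._cache: Dict[str, Rows] = {}

    def execute(self, sql: str) -> Rows:
        # Serve repeated ("hot") queries from memory instead of recomputing.
        if sql in self._cache:
            return self._cache[sql]

        last_error: Optional[Exception] = None
        for attempt in range(1, self._max_attempts + 1):
            try:
                rows = self._run_query(sql)
            except Exception as exc:  # a real client would catch its driver's error type
                last_error = exc
                if attempt < self._max_attempts:
                    # Linear backoff before rerunning the whole query.
                    time.sleep(self._backoff_seconds * attempt)
            else:
                self._cache[sql] = rows
                return rows

        raise RuntimeError(
            f"query failed after {self._max_attempts} attempts"
        ) from last_error
```

Cached results go stale when the underlying data changes, so a real deployment would need an expiry or invalidation policy; this sketch leaves that out.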


RapidMiner is a centralized software package for mining data and running predictive analytics. Users can load large volumes of raw data, e.g., databases and text, for fast, automated analysis. RapidMiner also supports sophisticated workflows, with scripting support in several languages.

RapidMiner's notable strengths include user-friendliness and affordability. Among its weaknesses, sharing analyses from RapidMiner Studio is difficult, and the tool is not very convenient for building business analytics dashboards.


Elasticsearch is a full-text search and analytics engine that lets users store, search, and analyze massive volumes of data in near real time. It often serves as the underlying engine that powers applications with complex search features and requirements.

Elasticsearch's advantages include fast search and the ability to filter large datasets. It also supports customizable analytics and reports through its dynamic aggregation engine. On the downside, Elasticsearch has a complicated ingest pipeline structure.
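The combination of filtering and aggregation can be illustrated with a search request body. The sketch below builds, as a plain Python dictionary, a query that filters documents on one field and counts them per value of another using a `terms` aggregation. The helper name, the aggregation name `by_group`, and the example field names are hypothetical; the body would be sent to an index's `_search` endpoint by whatever client is in use.

```python
import json
from typing import Any, Dict


def build_filtered_terms_aggregation(
    filter_field: str,
    filter_value: str,
    group_field: str,
    top_n: int = 10,
) -> Dict[str, Any]:
    """Build a search body that filters documents, then buckets them by a field."""
    return {
        # size 0: return only aggregation results, no individual hits.
        "size": 0,
        # Filter context: matching documents are selected without scoring.
        "query": {"bool": {"filter": [{"term": {filter_field: filter_value}}]}},
        # Terms aggregation: one bucket per distinct value, top_n buckets kept.
        "aggs": {"by_group": {"terms": {"field": group_field, "size": top_n}}},
    }


if __name__ == "__main__":
    # Hypothetical fields: count error logs per service.
    body = build_filtered_terms_aggregation("status", "error", "service.keyword")
    print(json.dumps(body, indent=2))
```

Changing `group_field` or swapping the `terms` block for another aggregation type changes the report without touching how the data is stored, which is what makes the aggregation engine flexible.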