Big Data Testing


HAPPY TESTING...!

When it comes to Big Data testing, performance and functional testing are key. In Big Data testing, QA engineers verify the successful processing of terabytes of data using a commodity cluster and other supportive components. It demands a high level of testing skill, as the processing is very fast.

Big Data Testing Strategy

Testing a Big Data application is more a verification of its data processing than testing the individual features of the software product. Processing may be of three types: batch, real-time, or interactive.
Along with this, data quality is also an important factor in Big Data testing. Before testing the application, it is necessary to check the quality of the data; this should be considered part of database testing. It involves checking various characteristics like conformity, accuracy, duplication, consistency, validity, and data completeness.
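
As a rough illustration, a few of these checks (duplication, completeness, and a simple validity rule) can be scripted. This is only a minimal sketch; the field names and the email rule below are hypothetical placeholders, not part of any standard tool.

    # Minimal sketch of basic data-quality checks over a list of records.
    # Field names ("id", "email") and the validity rule are hypothetical.
    def quality_report(records, required=("id", "email")):
        ids = [r.get("id") for r in records]
        return {
            "duplicates": len(ids) - len(set(ids)),
            "incomplete": sum(1 for r in records
                              if any(not r.get(f) for f in required)),
            "invalid_email": sum(1 for r in records
                                 if r.get("email") and "@" not in r["email"]),
        }

    rows = [{"id": 1, "email": "a@x.com"},
            {"id": 1, "email": "b@x.com"},       # duplicate id
            {"id": 2, "email": ""},              # incomplete record
            {"id": 3, "email": "not-an-email"}]  # fails validity rule
    print(quality_report(rows))  # {'duplicates': 1, 'incomplete': 1, 'invalid_email': 1}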

Testing Steps in verifying Big Data Applications

Big Data testing can be broadly divided into three steps.

Step 1: Data Staging Validation

The first step of Big Data testing, also referred to as the pre-Hadoop stage, involves process validation.
  • Data from various sources like RDBMS, weblogs, social media, etc. should be validated to make sure that the correct data is pulled into the system
  • Compare the source data with the data pushed into the Hadoop system to make sure they match
  • Verify that the right data is extracted and loaded into the correct HDFS location
Tools like Talend or Datameer can be used for data staging validation.
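
As a minimal sketch of the comparison and verification bullets above, the snippet below reconciles row counts and per-row checksums between a source extract and a copy pulled back from HDFS (e.g. via hdfs dfs -get). The file names are hypothetical placeholders.

    # Reconcile row counts and per-row checksums between a source extract
    # and the staged copy retrieved from HDFS. File names are placeholders.
    import csv
    import hashlib

    def row_stats(path):
        count, digests = 0, set()
        with open(path, newline="") as f:
            for row in csv.reader(f):
                count += 1
                digests.add(hashlib.md5("|".join(row).encode()).hexdigest())
        return count, digests

    src_count, src_sums = row_stats("customers_source.csv")   # RDBMS export
    dst_count, dst_sums = row_stats("customers_hdfs.csv")     # pulled from HDFS
    assert src_count == dst_count, f"Row count mismatch: {src_count} vs {dst_count}"
    assert not (src_sums - dst_sums), "Some source rows are missing after the load"
    print(f"Staging validated: {src_count} rows match")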

Step 2: "MapReduce" Validation

The second step is the validation of "MapReduce". In this stage, the tester verifies the business logic on every single node and then validates it after running against multiple nodes, ensuring that:
  • The MapReduce process works correctly
  • Data aggregation or segregation rules are implemented on the data
  • Key-value pairs are generated
  • The data is valid after the MapReduce process completes
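
One common way to validate the business logic on a single node before a multi-node run is to simulate the map/shuffle/reduce phases in-process and assert the key-value pairs and aggregates against independently computed expected values. The per-region sum below is a hypothetical stand-in for the job under test.

    # Simulate map -> shuffle/sort -> reduce locally and assert the result.
    from itertools import groupby
    from operator import itemgetter

    def mapper(record):
        # Emit a (key, value) pair; here: (region, amount).
        yield record["region"], record["amount"]

    def reducer(key, values):
        # Aggregation rule under test: sum all values per key.
        return key, sum(values)

    def run_local(records):
        pairs = sorted(kv for r in records for kv in mapper(r))  # shuffle/sort
        return dict(reducer(k, [v for _, v in grp])
                    for k, grp in groupby(pairs, key=itemgetter(0)))

    sample = [{"region": "EU", "amount": 10}, {"region": "US", "amount": 5},
              {"region": "EU", "amount": 7}]
    assert run_local(sample) == {"EU": 17, "US": 5}
    print("Single-node MapReduce logic validated")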

Step 3: Output Validation Phase

The third and final stage of Big Data testing is the output validation process. The output data files are generated and ready to be moved to an EDW (Enterprise Data Warehouse) or any other system, based on the requirement.
Activities in the third stage include:
  • Checking that the transformation rules are correctly applied
  • Checking the data integrity and successful data load into the target system
  • Checking that there is no data corruption, by comparing the target data with the HDFS file system data
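
A minimal sketch of these checks, assuming a hypothetical transformation rule (amounts converted to cents) and CSV extracts pulled from both sides; all file and column names are placeholders.

    # Compare the HDFS output with the data loaded into the target system.
    # The column names and the transformation rule are assumptions.
    import csv

    def load(path):
        with open(path, newline="") as f:
            return list(csv.DictReader(f))

    hdfs_rows = load("hdfs_output.csv")     # output pulled back from HDFS
    edw_rows = load("edw_extract.csv")      # extract from the target EDW

    # Integrity: no rows lost or duplicated during the load.
    assert len(hdfs_rows) == len(edw_rows), "Row count mismatch in target"
    assert {r["id"] for r in hdfs_rows} == {r["id"] for r in edw_rows}, \
        "Row IDs differ between HDFS output and target system"

    # Transformation rule under test (assumed): amount_cents = amount * 100.
    by_id = {r["id"]: r for r in edw_rows}
    for src in hdfs_rows:
        dst = by_id[src["id"]]
        assert int(dst["amount_cents"]) == round(float(src["amount"]) * 100), \
            f"Transformation rule violated for id {src['id']}"
    print("Output validation passed")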

Architecture Testing

Hadoop processes very large volumes of data and is highly resource intensive. Hence, architectural testing is crucial to ensure the success of your Big Data project. A poorly or improperly designed system may lead to performance degradation, and the system could fail to meet the requirements. At a minimum, Performance and Failover test services should be done in a Hadoop environment.
Performance testing includes testing of job completion time, memory utilization, data throughput, and similar system metrics, while the motive of the Failover test service is to verify that data processing occurs seamlessly in case of failure of data nodes.
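
The idea behind a failover test can be illustrated with a small in-process simulation: blocks replicated across nodes (replication factor 3, as in HDFS defaults) must all remain readable after one data node is taken down. This is only a model of the check, not a real cluster test.

    # Simulated failover check: every block must survive one node failure.
    import random

    REPLICATION = 3
    nodes = {f"node{i}": set() for i in range(5)}
    blocks = [f"blk_{i}" for i in range(20)]

    for blk in blocks:                    # place each block on 3 distinct nodes
        for n in random.sample(list(nodes), REPLICATION):
            nodes[n].add(blk)

    failed = random.choice(list(nodes))   # simulate a data-node failure
    alive = [held for name, held in nodes.items() if name != failed]

    for blk in blocks:                    # all blocks must still be readable
        assert any(blk in held for held in alive), f"{blk} lost"
    print(f"All {len(blocks)} blocks readable after {failed} failed")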

Performance Testing

Performance testing for Big Data includes three main actions:
  • Data ingestion and throughput: In this stage, the tester verifies how fast the system can consume data from various data sources. Testing involves identifying the number of messages the queue can process in a given time frame. It also includes how quickly data can be inserted into the underlying data store, for example, the insertion rate into a MongoDB or Cassandra database (see the timing sketch after this list).
  • Data processing: This involves verifying the speed with which queries or MapReduce jobs are executed. It also includes testing the data processing in isolation when the underlying data store is populated with the data sets, for example, running MapReduce jobs on the underlying HDFS.
  • Sub-component performance: These systems are made up of multiple components, and it is essential to test each of these components in isolation, for example, how quickly a message is indexed and consumed, MapReduce jobs, query performance, search, etc.
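
A minimal sketch of the ingestion-throughput measurement: time a batch of inserts and report documents per second. The in-memory FakeStore is a stand-in for a real client (e.g. a MongoDB or Cassandra driver), so the sketch runs anywhere.

    # Measure ingestion throughput; FakeStore is a stand-in for a real client.
    import time

    class FakeStore:
        def __init__(self):
            self.rows = []
        def insert(self, doc):
            self.rows.append(doc)

    def measure_insert_rate(store, docs):
        start = time.perf_counter()
        for d in docs:
            store.insert(d)
        return len(docs) / (time.perf_counter() - start)  # docs per second

    docs = [{"id": i, "payload": "x" * 256} for i in range(100_000)]
    print(f"Ingestion rate: {measure_insert_rate(FakeStore(), docs):,.0f} docs/sec")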

Performance Testing Approach

Performance testing for a Big Data application involves testing huge volumes of structured and unstructured data, and it requires a specific testing approach to test such massive data. The approach typically follows this cycle:
  1. Set up the Big Data application/cluster to be tested
  2. Identify and design the corresponding workloads
  3. Prepare individual clients (custom scripts are created)
  4. Execute the test and analyze the results (if objectives are not met, tune the component and re-execute)
  5. Arrive at the optimum configuration
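
Step 4's tune-and-re-execute loop can be sketched as follows; the workload, the cache parameter, and the 200 ms latency objective are all hypothetical, with a simulated latency function standing in for a real test run.

    # Execute/analyze/tune loop: rerun the workload, growing one (assumed)
    # parameter until the latency objective is met.
    import random

    OBJECTIVE_MS = 200                    # hypothetical latency objective

    def run_workload(cache_mb):
        # Stand-in for a real run; a bigger cache lowers simulated latency.
        return random.uniform(300, 400) - cache_mb

    cache_mb = 64
    while True:
        latency = run_workload(cache_mb)
        print(f"cache={cache_mb} MB -> {latency:.0f} ms")
        if latency <= OBJECTIVE_MS:
            break
        cache_mb *= 2                     # tune the component and re-execute
    print("Objective met; record this as the optimum configuration")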

Parameters for Performance Testing

Various parameters to be verified for performance testing are:
  • Data storage: how data is stored in different nodes
  • Commit logs: how large the commit log is allowed to grow
  • Concurrency: how many threads can perform write and read operations
  • Caching: tuning of cache settings such as "row cache" and "key cache"
  • Timeouts: values for connection timeout, query timeout, etc.
  • JVM parameters: heap size, GC collection algorithms, etc.
  • MapReduce performance: sorts, merge, etc.
  • Message queue: message rate, size, etc.

Test Environment Needs

Test environment needs depend on the type of application you are testing. For Big Data testing, the test environment should encompass:
  • Enough space for storing and processing a large amount of data
  • A cluster with distributed nodes and data
  • Minimum CPU and memory utilization, to keep performance high

Big Data Testing vs. Traditional Database Testing


Data
  • Traditional database testing: the tester works with structured data
  • Big Data testing: the tester works with both structured and unstructured data
Testing Approach
  • Traditional database testing: the testing approach is well defined and time-tested
  • Big Data testing: the testing approach requires focused R&D efforts
Testing Strategy
  • Traditional database testing: the tester can sample manually or verify exhaustively with an automation tool
  • Big Data testing: the "sampling" strategy is a challenge
Infrastructure
  • Traditional database testing: no special test environment is needed, as file sizes are limited
  • Big Data testing: a special test environment is required due to the large data size and files (HDFS)
Validation Tools
  • Traditional database testing: the tester uses Excel-based macros or UI-based automation tools
  • Big Data testing: no defined tools; the range is vast, from programming tools like MapReduce to HiveQL





Tools used in Big Data Scenarios

NoSQL:
  • CouchDB, MongoDB, Cassandra, Redis, ZooKeeper, HBase
MapReduce:
  • Hadoop, Hive, Pig, Cascading, Oozie, Kafka, S4, MapR, Flume
Storage:
  • S3, HDFS (Hadoop Distributed File System)
Servers:
  • Elastic, Heroku, Google App Engine, EC2
Processing:
  • R, Yahoo! Pipes, Mechanical Turk, BigSheets, Datameer

Challenges in Big Data Testing

  • Automation
    Automation testing for Big Data requires someone with technical expertise. Also, automated tools are not equipped to handle unexpected problems that arise during testing.
  • Virtualization
    It is one of the integral phases of testing. Virtual machine latency creates timing problems in real-time Big Data testing. Also, managing images in Big Data is a hassle.
  • Large Datasets
    • Need to verify more data, and to do it faster
    • Need to automate the testing effort
    • Need to be able to test across different platforms
Performance testing challenges
  • Diverse set of technologies: each sub-component belongs to a different technology and requires testing in isolation
  • Unavailability of specific tools: no single tool can perform the end-to-end testing; for example, a tool designed for NoSQL might not fit message queues
  • Test scripting: a high degree of scripting is needed to design test scenarios and test cases
  • Test environment: a special test environment is needed due to the large data size
  • Monitoring solution: limited solutions exist that can monitor the entire environment
  • Diagnostic solution: a custom solution needs to be developed to drill down into the performance bottleneck areas
Summary
  • As data engineering and data analytics advance to the next level, Big Data testing is inevitable.
  • Big Data processing could be batch, real-time, or interactive
  • The 3 stages of testing Big Data applications are
    • Data staging validation
    • "MapReduce" validation
    • Output validation phase
  • Architecture testing is an important phase of Big Data testing, as a poorly designed system may lead to unprecedented errors and degradation of performance
  • Performance testing for Big Data includes verifying
    • Data ingestion and throughput
    • Data processing
    • Sub-component performance
  • Big Data testing is very different from traditional database testing in terms of data, infrastructure, and validation tools
  • Big Data testing challenges include virtualization, test automation, and dealing with large datasets. Performance testing of Big Data applications is also a challenge.


HAPPY TESTING
