Architecture Overview to handle high volume data use cases

The architecture takes into account a real time IoT scenario.

Events are raised by Industrial Assets/Sensors on a periodic basis. These events will be intercepted using any standard MQTT broker [RabbitMQ, Apache Kafka, etc ] The Message Broker through its subscribing mechanism will ingest the streaming data in a Cassandra Database using Apache SPARK Stream. The Apache SPARK Streaming will ingest the data on a periodic basis the data coming from the assets/sensors. Here as a data store – Hbase can also be utilized instead of Cassandra.

Based on the frequency and the amount of data that these sensors emit, the volume of data ingested can be significantly high. To handle such huge volume of data, an archival system will be developed. Map Reduce Batch Jobs along with custom Java application will push to Hadoop’s File System (HDFS) from the Cassandra. These Map Reduce jobs will be scheduled on a periodic basis and various scheduling tools like Apache Oozie can be utilized. Since we are planning to develop in Cloudera environment, Cloudera Navigator can also be utilized in this regard.

A reporting system will be there to view the health and another Predictive information of the data coming from these industrial assets. The reporting system can query either the Cassandra data store or the HDFS to show the necessary information to the user. The reports queried from the Cassandra system will be of low latency ones while those queried from the HDFS will be of high latency ones.  Hive (HQL) / Pig / Map Reduce Jobs will be used to display the reports to the end users. For real time reporting Cassandra Query Language (CQL) can be utilized.

The entire workflow’s Security, Meta-Data Management will be handled by a Data Governance system. Tools like Apache ATLAS/ FALCON will be utilized to create this data governance system.

The cloud environment used here will be AWS.

The architecture diagram below provides more insights of the entire system:

twitterlinkedin

Architecture Overview to handle low to medium volume data use cases

Data can be raised from various sources like an IoT Device, IP/Non IP enabled industrial Asset, Streaming sources (stock exchange/Social Media) or from any COTS [ https://en.wikipedia.org/wiki/Commercial_off-the-shelf ]  source. These streaming source of data can be intercepted by standard Stream Brokers like KAFKA, RabbitMQ or by Apache Flume. The Stream Broker systems will pass the control to a SPARK System for realtime  data processing. The Stream Broker System at the same time will help to store the structured/unstructured data in a Hadoop system. Different database systems like Apache Parquet, Apache Avro, S3 or a Blob store can be used for different use cases. The SPARK system can be utilized to run any complex analytics and store the rolled up/ aggregated data in the Hadoop system for analytical processing. The processed data can be saved in a data reservoir for faster data access.  

***** Apache TEZ can be used instead of the Apache SPARK system

 

The architecture diagram below provides more insights of the entire system:

twitterlinkedin

Password encryption

A site’s FTP password has been encrypted on the similar pattern as described in http://www.jasypt.org/howtoencryptuserpasswords.html with few exceptions as described below. JASYPT library (version 1.9.1) has been used to encrypt/decrypt the FTP passwords.

Encrypt passwords using two-way techniques

This is done in a 2 way pattern since the application needs to decrypt the password saved in the database to send it to the FTP server for authentication.

Hence a digest is NOT created rather a BASE64 encoded string is saved in the database. StandardPBEStringEncryptor has been used to encrypt the FTP password. This class avoids byte-conversion problems related to the fact of different platforms having different default charsets, and returns encryption results in the form of BASE64-encoded Strings. This class is thread-safe.

Encryptor

import org.jasypt.encryption.pbe.StandardPBEStringEncryptor;//Create the encryptor StandardPBEStringEncryptor encryptor = new StandardPBEStringEncryptor();

Using a SALT

The salt is a sequence of bytes that is added to the FTP password before being encrypted to BASE64-encoded Strings. The salt consists of 2 parts:

  • Fixed Salt This sequence of characters is stored in the code base (e.g., compiled into the JAR file) and appended to every password before being encrypted/decrypted.

private final static String ENCRYPTION_PASSWORD_SALT = “dsfds#$%FD$#%#$%##$%#$%$%^DFHGSEDF”;

  • Random Salt is a random sequence of characters generated/computed for each password. This random set of characters is stored in the database. This allows to decouple the encryption logic for each password.

·        public static String nextSessionId() {·                   return new BigInteger(130, random).toString(32);  }

This random salt will be different for each FTP site. It is stored in the portal database (MS SQL Server) in a column in the table with columns containing the account name and encrypted password for each FTP site.

Encryption Algorithm

RandomSaltGenerator [org.jasypt.salt] has been used to determine the encryption algorithm. This by default uses the “SHA1PRNG” algorithm to encrypt the passwords.

Random SALT

EnvironmentStringPBEConfig config = new EnvironmentStringPBEConfig(); // create the configconfig.setSaltGenerator(new RandomSaltGenerator()); // pass the random salt generatorString fixedSalt = ENCRYPTION_PASSWORD_SALT;String randomSalt = nextSessionId();config.setPassword(fixedSalt + randomSalt);

Iterate the function

The iteration count refers to the number of times that the hash function with which we are encrypting is applied to its own results. Here, the algorithm is applied 4000 times

config.setKeyObtentionIterations(4000);

Finally Encrypt/Decrypt

Finally the BASE-64 ENCODED string is encrypted/decrypted using the StandardPBEStringEncryptor classes encrypt/decrypt method

twitterlinkedin

Using socket.io in heroku environment

Socket.IO enables real-time bidirectional event-based communication. It works on every platform, browser or device, focusing equally on reliability and speed. Here is a chat application that uses socket.io to understand the concepts of socket.io.

I have deployed the application in Heroku [Heroku is a cloud platform as a service (PaaS) supporting several programming languages. Heroku various programming language like Java, Node.js, Scala, Clojure, Python and PHP and Perl.] For more details of how to deploy your NODE JS app in Heroku , click here.

Express JS [Express is a minimal and flexible Node.js web application framework that provides a robust set of features for web and mobile applications] & socket.io is brought as dependencies through

package.json

"dependencies": {
"express": "4.8.5",
"socket.io": "1.0.6"
}

The package.json also lists the path from heroku will fetch the code and deploy in its environment & the node engine to be used for this application:
"engines": {
"node": "0.10.x"
},
"repository": {
"type": "git",
"url": "https://github.com/AnirbanKundu/node-chat-app"
}

Heroku looks for the

“script”

property in package.json for the initial set of instructions to run once the app is been deployed
"scripts": {
"start": "node server.js"
}

The application consists of a server.js file which listens to a predefined port.

var port = process.env.PORT || 5000;

The complete source code of the application can be found here

The WIKI page lists how to use Heroku environment.

A live version of the application can be found here.

To test the chat functionality, open multiple instances of chrome/firefox/safari and login with a unique username and start chatting. 🙂

twitterlinkedin

Learn, Share & Evolve